Abstract
Single-particle electron cryo-microscopy and computational image classification can be used to analyze structural variability in macromolecules and their assemblies. In some cases, a particle may contain different regions that each display a range of distinct conformations. We have developed strategies, implemented within the Frealign and cisTEM image processing packages, to focus classification on specific regions of a particle and to detect potential covariance. The strategies are based on masking the region of interest using either a 2-D mask applied to reference projections and particle images, or a 3-D mask applied to the 3-D volume. We show that focused classification approaches can be used to study structural covariance, a concept that is likely to gain importance as datasets grow in size and allow more structural states, and smaller differences between states, to be distinguished. Finally, we apply the approaches to an experimental dataset containing the HIV-1 Transactivation Response (TAR) element RNA fused into the large bacterial ribosomal subunit, to deconvolve structural mobility within localized regions of interest, and to a dataset containing assembly intermediates of the large subunit, to measure structural covariance.
Keywords: Single-particle cryo-EM, cisTEM, Frealign, classification, heterogeneity, ribosome
1. Introduction
Single-particle electron cryo-microscopy (cryo-EM) enables the visualization of macromolecules and their assemblies under near-native conditions [1]. In recent years, the technique has gained popularity, in part due to its ability to determine macromolecular structures at near-atomic resolution and without the need for crystallization [2]. While advances in resolution [3,4] have expanded the scope of the technique over the last five years, the ability to decipher structural heterogeneity is an ongoing area of development in the field [5,6]. Given that macromolecules, and especially their assemblies, are dynamic, image classification opens up the possibility to address novel types of questions pertaining to the molecular mechanisms underlying their function.
Structural heterogeneity can be either compositional or conformational in nature. Compositional heterogeneity means that the stoichiometry of subunits within an assembly varies within the dataset, such as particles containing or missing an additional, loosely associated protein factor. Conformational heterogeneity means that particles are uniform in composition, but the constituent components within each object can be flexible and can adopt one of several structurally different states. Conformational heterogeneity can be further subdivided into either discrete or continuous conformational heterogeneity. In the former case, the macromolecule would adopt one of several distinct structural states, each represented by a local minimum within the energy landscape describing all possible states. In the latter case, local energy minima are less distinct and the flexible regions can adopt many intermediate states to produce a quasi-continuum of states. Finally, a fourth case can be defined as containing a combination of the above scenarios.
To understand structural heterogeneity within a single-particle experiment, the particle images are subject to a classification procedure, which assigns each particle to one of potentially many different classes. In the simplest scenario, a global classification strategy assigns each particle to a specific class on the basis of differences between the particle image and a set of references, evaluated across the entire image. Different classification approaches have been developed, including supervised and unsupervised techniques, and numerous variations have been implemented to analyze structural heterogeneity [6-11]. Global 3-D classification does not require specific knowledge about the type and location of the heterogeneity, making it an integral part of today’s processing workflow of virtually all single-particle software packages. Given that macromolecular assemblies can be highly dynamic, and because every subdivision leads to fewer particles within each class (and thus lower signal and loss of resolution), the fundamental disadvantage of a global classification strategy is the limited number of well-defined classes that can be recovered from a dataset of a given size. This is particularly true when one wants to resolve variability in small, heterogeneous regions that may easily be lost during a global classification procedure. In contrast to a global classification strategy, “focused classification” zooms in on a region or feature of interest, in order to understand structural heterogeneity in a localized manner [12-15]. Focused classification can overcome the potential particle number limit associated with global classification by reducing the number of classes needed to represent the local variability and (in principle) excluding other regions of the particle from the analysis. This approach is particularly advantageous when regions outside of the area of interest are themselves dominated by structural heterogeneity. 
For example, minor domain movements within an otherwise dynamic macromolecular assembly might be difficult to resolve using global classification techniques alone because the majority of the signal guiding the classification procedure is dominated by regions outside of the area of interest. In another example, two large regions can exhibit independent variability, and a global classification may not converge on a solution that represents all possible states, or the number of states required leaves too few particles in the corresponding reconstructions, limiting their resolution. In general, focused classification provides an alternative means to deconstruct highly dynamic and/or heterogeneous datasets, reducing the analysis to a more tractable problem. Numerous successful applications of focused classification have been used to understand the independent movements of regions of large macromolecular complexes, such as the spliceosome and the ribosome [16-19].
Focused classification requires defining a region of interest within the particle and excluding density outside it. In its original description and implementation [14], a 3-D mask is defined for a region of interest, projected along the view determined for each particle and applied as a 2-D mask to the particle images and reference projections. Regions of interest were those of high variance in a bootstrap 3-D variance analysis, and classification was based on cross-correlation functions [14]. Later implementations in Frealign [8,20] and cisTEM [21] rely on user-defined regions of interest and are based on a likelihood function for classification. Alternatively, classification can be focused on a region by applying a 3-D mask, followed by standard 3-D classification using the masked reconstructions as references and no further masking in 2-D (masked 3-D classification, [13]). Typical applications of masked 3-D classification are membrane proteins that contain detergent micelles: the 3-D mask is used to exclude the heterogeneous micelle while focusing on the protein [22]. The primary disadvantage of this 3-D masking approach is that a projection of the density, which only contains the masked region, is compared with the particle image, which contains the masked region in addition to all other overlapping density, and this additional density can obscure the features to be classified. To reduce the problem of density discrepancy, the density outside the mask could be included in the reference after applying a low-pass filter [20,23]. The filter removes noise from the disordered regions of the particle while maintaining valid low-resolution signal to minimize the mismatch between reference and images. To further reduce density mismatch, another approach was introduced, whereby, in addition to masking the 3-D object, the density outside the mask is computationally subtracted from the particle images [12,13,15]. 
This leaves a projection of the masked 3-D object and a density-subtracted 2-D particle image, which contains comparable features that can be used for classification. An advantage of the density subtraction approach is that it can, in principle, be implemented in a hierarchical fashion, in order to subtract increasingly finer features in a step-wise manner. The (non-hierarchical) density subtraction approach has been used to improve heterogeneous regions of numerous macromolecular complexes that could not be improved using a global classification approach alone [12,13,15,19,24]. However, there are also disadvantages to this method. First, density subtraction requires an accurate model of the signal in each particle image to properly subtract the desired density. Especially when looking at small regions and subtracting density corresponding to larger volumes, the subtraction may leave residual signal in the raw images, a problem that is exacerbated if the complex exhibits greater heterogeneity than is accounted for in the references used for density subtraction. The residual signal from the incomplete density subtraction can interfere with subsequent classification and obscure the variability in smaller regions (especially if applied in a hierarchical context). These complications are avoided by the 2-D masking approach [8,14,22,24]. In the following, we will use the term “focused classification” for both approaches and distinguish them by their masking operation in 2-D and 3-D, respectively.
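The contrast between the three strategies can be illustrated with a toy, noiseless example. This is a minimal sketch, not the Frealign/cisTEM implementation: it assumes a projection along the z axis (a plain sum over one array axis) in place of a projection along an arbitrary view, ignores the CTF and noise, and the function names are hypothetical. It reports how far each compared image/reference pair is from agreement, first with an exact reference model and then with a deliberately inaccurate one.

```python
import numpy as np


def project_z(volume):
    """Toy projection: sum the density along the first (z) axis."""
    return volume.sum(axis=0)


def masking_mismatch(volume, roi_mask, model=None):
    """Sum of absolute differences between compared image and reference.

    volume:   true 3-D density of the particle (the image is its projection)
    roi_mask: boolean 3-D mask around the region of interest
    model:    reference density used for classification and subtraction
              (defaults to the true density, i.e. a perfect model)
    """
    model = volume if model is None else model
    roi = roi_mask.astype(float)
    image = project_z(volume)                  # contains all overlapping density
    # 3-D masking: the reference projection contains only the ROI
    ref_3d = project_z(model * roi)
    # 2-D masking: project the 3-D mask, apply it to image and full reference
    mask_2d = project_z(roi) > 0
    image_2d = image * mask_2d
    ref_2d = project_z(model) * mask_2d
    # density subtraction: remove the modelled density outside the ROI
    subtracted = image - project_z(model * (1.0 - roi))
    return {
        "3-D mask": float(np.abs(image - ref_3d).sum()),
        "2-D mask": float(np.abs(image_2d - ref_2d).sum()),
        "subtraction": float(np.abs(subtracted - ref_3d).sum()),
    }


vol = np.zeros((8, 8, 8))
vol[2, 3, 3] = 1.0   # density inside the ROI
vol[5, 3, 3] = 2.0   # density outside the ROI that overlaps it in projection
roi_mask = np.zeros_like(vol, dtype=bool)
roi_mask[0:4, 2:5, 2:5] = True

print(masking_mismatch(vol, roi_mask))                   # exact model
print(masking_mismatch(vol, roi_mask, model=0.9 * vol))  # inaccurate model
```

With the exact model, only the 3-D masking comparison disagrees, by exactly the overlapping density outside the ROI. Scaling the model by 0.9 leaves a residual after subtraction, mirroring the dependence of density subtraction on an accurate model.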
A major advantage of any focused classification approach is its ability to selectively classify features of interest within a distinct region of a cryo-EM map, which opens up numerous potential directions. First, it enables classification of pseudo-symmetric features in a particle that are related by a symmetry operator but are not strictly symmetric because they move independently [15,25,26]. For example, surface-exposed regions of macromolecules may not obey the strict symmetry that may apply to the particle core, leading to loss of resolution in the peripheral regions of otherwise symmetric particles such as icosahedral viruses (reviewed in [27]). To classify pseudo-symmetric regions of a particle, the images are first aligned according to a common reference frame compatible with the pseudo-symmetry. The symmetry is then dropped, all pseudo-symmetry-related reference projections are generated for each particle image, alignments are determined for each of these projections, and an asymmetric reconstruction is calculated using each particle image multiple times to include all pseudo-symmetry-related alignments. This effectively multiplies the number of particles in a dataset by the number of possible symmetry operations and enables classification of different views into different classes, thereby resolving the heterogeneity in the pseudo-symmetric regions. This approach can, therefore, improve the resolution of density that would otherwise be an average of multiple structural states due to symmetrization. The approach has been applied, for example, to resolve density detail that was not visible after global classification alone [26], and to reveal genome structures within viral particles [28] (for other examples, see [27]). Second, selectively focusing on discrete asymmetric units can reveal covariant heterogeneity within the data.
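The symmetry-expansion step described above can be sketched as follows. This is a minimal illustration, assuming each particle orientation is stored as a 3×3 rotation matrix R and each (pseudo-)symmetry operator as a rotation S, so that each expanded orientation is R·S. Real packages store Euler angles and may compose rotations in the opposite order; the convention, the helper names, and the twofold axis along z are assumptions.

```python
import numpy as np


def rot_z(degrees):
    """Rotation matrix for a rotation about the z axis."""
    t = np.deg2rad(degrees)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def symmetry_expand(orientations, operators):
    """Return (particle index, operator index, R @ S) for every pair.

    Each particle image is reused once per operator, so N particles become
    N * len(operators) entries that can be classified independently.
    """
    return [
        (i, k, R @ S)
        for i, R in enumerate(orientations)
        for k, S in enumerate(operators)
    ]


c2_operators = [np.eye(3), rot_z(180.0)]      # twofold axis along z (assumed)
particles = [rot_z(30.0), rot_z(75.0), rot_z(120.0)]
expanded = symmetry_expand(particles, c2_operators)
print(len(expanded))                           # 6
```

Keeping the particle index with each expanded entry is what later allows the class assignments of the two asymmetric units of the same particle to be compared.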
For example, two different regions located on opposite sides of a particle might be structurally coupled by allostery. If the variability of the two regions is random, there should be no correlation in the assignment of these regions to different classes during pseudo-symmetric classification. If correlation is present, however, this indicates covariance between the two regions. In the simplest case, counting the number of matching asymmetric units within the same class and comparing it with a random distribution would provide evidence for structural covariance. This represents an area of development that may facilitate understanding of the global structural landscapes of dynamic macromolecular machines.
In this manuscript, we explore different strategies to focus classification on regions of interest, using both synthetic and experimental data. We show the advantages and disadvantages of the 2-D and 3-D masking approaches, and additionally explore their ability to calculate density covariances within otherwise distinct regions of a reconstruction. Finally, we show how focused and masked 3-D classification can be applied to heterogeneous experimental datasets, highlighting a test case relevant to visualizing targets mounted on scaffolds by single-particle cryo-EM, and a case of structural covariance.
2. Materials and methods
2.1. Generation of synthetic humanoid datasets.
Synthetic datasets were generated as previously described [8]. Briefly, we randomly shifted and rotated projection images of humanoid structures and added noise, a CTF (so that part of the noise was CTF-modulated), an envelope function, and a final layer of noise. The final signal-to-noise ratios (SNRs) were 0.100, 0.050, 0.025, 0.013, and 0.006, covering a wide range consistent with experimental estimates [29]. The SNR was computed as the ratio of the signal variance to the noise variance in a given image. To reduce spurious CTF-associated correlations in the covariance analysis, the data were projected in a 640-pixel box prior to the addition of noise and the CTF. Twenty-eight distinct datasets were made, corresponding to the different structural combinations of arms, hands, and feet (Figure 1). Combined datasets corresponding to the three distinct scenarios were then generated from these 28 individual datasets. Each combined dataset contained 10,000 particles (pixel size 5.24 Å, box size 80 pixels after Fourier resampling), with particles selected randomly from the individual sub-datasets. To initiate classifications, the orientations were either used as is (Tables 1 and 2); perturbed by small random errors drawn from a normal distribution, with an average offset of ±2.5 pixels for shifts and ±4° for Euler angles (Supplementary Table 1); or derived de novo in cisTEM by running one cycle of global search, limited to 20 Å resolution, followed by one cycle of local refinement, limited to 10 Å (Supplementary Table 2).
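The SNR definition and the orientation perturbation can be sketched as below. This is a minimal sketch, not the generation pipeline used here: it models only the final noise layer (no CTF or envelope), uses a random array as a stand-in for a projection, and the function names are hypothetical. Treating the quoted ±2.5 pixels and ±4° as standard deviations of the normal distribution is also an assumption.

```python
import numpy as np


def add_noise_for_snr(signal, snr, rng):
    """Add white Gaussian noise so that var(signal) / var(noise) is about snr."""
    noise_sd = np.sqrt(signal.var() / snr)
    return signal + rng.normal(0.0, noise_sd, size=signal.shape)


def perturb_orientations(shifts, eulers, rng, shift_sd=2.5, angle_sd=4.0):
    """Add normally distributed errors to shifts (pixels) and Euler angles (degrees).

    Interpreting 2.5 px and 4 degrees as standard deviations is an assumption.
    """
    shifts = np.asarray(shifts, dtype=float)
    eulers = np.asarray(eulers, dtype=float)
    return (shifts + rng.normal(0.0, shift_sd, size=shifts.shape),
            eulers + rng.normal(0.0, angle_sd, size=eulers.shape))


rng = np.random.default_rng(0)
clean = rng.normal(size=(256, 256))    # stand-in for a CTF-modulated projection
for snr in (0.100, 0.050, 0.025, 0.013, 0.006):
    noisy = add_noise_for_snr(clean, snr, rng)
    measured = clean.var() / (noisy - clean).var()
    print(f"target {snr:.3f}  measured {measured:.3f}")
```

Because the SNR is a variance ratio, halving it doubles the noise variance, so the lowest level (0.006) corresponds to roughly 17 times more noise variance than the highest (0.100).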
Table 1. Results of focused classification (κ coefficients) on an asymmetric unit for the three different scenarios.
SNR | Scenario 1, 2-D mask | Scenario 1, 3-D mask | Scenario 2, 2-D mask | Scenario 2, 3-D mask | Scenario 3, 2-D mask | Scenario 3, 3-D mask |
---|---|---|---|---|---|---|
0.100 | 0.99 | 0.91 | 0.85 | 0.70 | 0.87 | 0.71 |
0.050 | 0.96 | 0.85 | 0.73 | 0.56 | 0.75 | 0.61 |
0.025 | 0.87 | 0.74 | 0.42 | 0.41 | 0.46 | 0.39 |
0.013 | 0.72 | 0.57 | 0.21 | 0.17 | 0.21 | 0.17 |
0.006 | 0.47 | 0.36 | 0.09 | 0.08 | 0.08 | 0.09 |
Table 2. Results of focused classification (κ coefficients) on an asymmetric unit when the mask is applied to the wrong region.
SNR | 2-D mask | 3-D mask |
---|---|---|
0.100 | 0.23 | −0.01 |
0.050 | 0.11 | 0.00 |
0.025 | 0.01 | 0.01 |
0.013 | 0.00 | 0.01 |
0.006 | 0.00 | 0.00 |
pure noise | −0.01 | 0.00 |
2.2. Particle assignment during classification.
To facilitate quantitative assessment, we assumed that each classified particle belongs to the class with the highest probability (occupancy in Frealign/cisTEM). At higher SNRs, this assumption had little effect, as most occupancies were close to 1 or 0; at lower SNRs, however, particles were distributed across multiple classes with lower occupancies and only slight differences between them. By assigning each asymmetric unit to the class with the highest occupancy, we could simplify the calculation of Kappa coefficients (a measure of classification success, see below) and other analyses.
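The hard assignment can be sketched in a few lines, assuming the occupancies have been collected into an (n_particles × n_classes) array, for example from the occupancy column of the Frealign/cisTEM parameter files (parsing is not shown, and the function names are hypothetical). The margin between the two highest occupancies shows where the simplification matters most.

```python
import numpy as np


def hard_assign(occupancies):
    """Assign each particle (row) to the class (column) with the highest occupancy.

    Ties go to the lowest class index, because np.argmax returns the first maximum.
    """
    occ = np.asarray(occupancies, dtype=float)
    return np.argmax(occ, axis=1)


def assignment_margin(occupancies):
    """Difference between the two highest occupancies of each particle.

    Small margins flag ambiguous, low-SNR particles for which the hard
    assignment is a strong simplification.
    """
    occ = np.sort(np.asarray(occupancies, dtype=float), axis=1)
    return occ[:, -1] - occ[:, -2]


occ = [[98.0, 2.0],    # confident assignment (typical at high SNR)
       [45.0, 55.0]]   # ambiguous assignment (typical at low SNR)
print(hard_assign(occ))        # [0 1]
print(assignment_margin(occ))  # [96. 10.]
```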
2.3. Measures for evaluating the accuracy of classification.
To evaluate the accuracy of each classification trajectory, we define the following measures. For each asymmetric unit in each class:
TP (true positive): starting occupancy 100, ending marginal occupancy greater than that of all other classes.
FP (false positive): starting occupancy 0, ending marginal occupancy greater than that of all other classes.
TN (true negative): starting occupancy 0, ending occupancy less than that of the class with the greatest marginal occupancy.
FN (false negative): starting occupancy 100, ending occupancy less than that of the class with the greatest marginal occupancy.
N (number of observations) = TP + FP + TN + FN
Using the definitions above, the following metrics are defined:
Accuracy (the relative observed agreement among raters, or Po) = (TP + TN) / N
Sensitivity = True Positive Rate (TPR) = TP / (TP + FN)
Specificity = True Negative Rate (TNR) = TN / (TN + FP)
Kappa (κ):
$$\kappa = \frac{P_o - P_e}{1 - P_e}$$
where Po is the accuracy defined above and Pe is the probability of chance agreement.
Youden’s Index (J Statistic) = TPR + TNR – 1.
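The counts and metrics above can be computed as in the following sketch. Several choices are assumptions rather than details given here: each class is treated as a one-vs-rest yes/no test with counts summed over classes, the arbitrary output class labels are assumed to have already been matched to the ground-truth labels, and Pe uses the usual Cohen definition for a 2×2 table (products of the marginal proportions, summed over the two outcomes). The function names are hypothetical.

```python
import numpy as np


def confusion_counts(true_labels, assigned_labels, n_classes):
    """Sum TP, FP, TN and FN over classes, treating each class as a yes/no test.

    Assumes the class labels produced by classification have already been
    matched to the ground-truth labels.
    """
    t = np.asarray(true_labels)
    a = np.asarray(assigned_labels)
    tp = fp = tn = fn = 0
    for c in range(n_classes):
        truth, pred = t == c, a == c
        tp += int(np.sum(truth & pred))
        fp += int(np.sum(~truth & pred))
        tn += int(np.sum(~truth & ~pred))
        fn += int(np.sum(truth & ~pred))
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Accuracy (Po), sensitivity, specificity, Cohen's kappa and Youden's J."""
    n = tp + fp + tn + fn
    po = (tp + tn) / n
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    # chance agreement from the row and column marginals of the 2x2 table
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    return {
        "accuracy": po,
        "sensitivity": tpr,
        "specificity": tnr,
        "kappa": (po - pe) / (1 - pe),
        "youden_j": tpr + tnr - 1,
    }


counts = confusion_counts([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
print(counts)                            # (3, 1, 3, 1)
print(classification_metrics(*counts))
```

In this small example, three of four units are assigned correctly (accuracy 0.75), but chance agreement is 0.5, so κ drops to 0.5; a perfect classification gives κ = 1 and a random one gives κ close to 0.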
2.4. Calculation and merging of cryo-EM difference maps.
Difference maps between pairs of reconstructions were calculated using the “diffmap.exe” program, which is distributed with Frealign and scales the amplitudes of the maps in resolution zones before computing their difference. Figure 4 shows the positive values of difference maps computed between each map obtained after global or focused classification and a consensus map without classification. The difference maps in Figure 4 were merged as follows. An empty merge volume was initialized with zeros. Then, for each pairwise difference map and for each voxel, if the voxel value exceeded the value of the corresponding voxel in the merged map, the merged map was set to the greater value. The merged map therefore holds the voxel-wise maximum of all positive difference values.
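The merging rule amounts to a running voxel-wise maximum, sketched below. It assumes the difference maps have already been loaded as NumPy arrays of identical shape on the same grid (file reading is not shown); the function name is hypothetical.

```python
import numpy as np


def merge_difference_maps(diff_maps):
    """Voxel-wise maximum of positive difference values across maps.

    Starting from a zero-filled volume means negative differences never
    enter the merged map, matching a display of positive values only.
    """
    merged = np.zeros_like(diff_maps[0], dtype=float)
    for diff in diff_maps:
        merged = np.maximum(merged, diff)   # keep the larger value at every voxel
    return merged


a = np.array([[-1.0, 2.0], [0.5, 0.0]])
b = np.array([[0.3, 1.0], [-2.0, 4.0]])
print(merge_difference_maps([a, b]))
```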
2.5. Covariance analysis of separate regions of cryo-EM density maps.
To determine whether different regions correlate with one another, correlation coefficients were computed between fractional density values in these regions. An identical procedure was used for scenarios 2 and 3. First, we performed focused classification, setting the requested number of classes, k, equal to the expected number of non-degenerate asymmetric units. Binary masks were created for each region of interest (ROI), namely the hand in each of its two positions, the near foot, and the far foot. Each mask encompassed its ROI with minimal incursion into neighboring density. No soft edge was applied, because the masks were used solely to compute fractional density occupancy values. For each of the k resulting maps and for each ROI, the mask was used to extract the corresponding density. Subsequently, the approximate mass in the ROI was calculated using the “volume” command implemented in the EMAN1 processing suite [30]. The resulting mass was optionally normalized to the true mass expected from a perfect classification to judge the quality of the classification, although this step is not strictly necessary for the correlation analysis. Finally, the correlation matrix Rij was computed as:
$$R_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\,C_{jj}}}$$
where Cij refers to the covariance between two components i and j. To ensure adequate sampling, the resulting volumes represent an average of three independent runs, each initialized with random starting class occupancies.
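Given a table of ROI masses, one row per class map and one column per ROI, the correlation matrix can be sketched as below. The mass values are hypothetical and are not results from this study; the EMAN1 mass estimation step is not reproduced, and the function name is hypothetical. The explicit formula agrees with NumPy's `np.corrcoef` applied to the same columns.

```python
import numpy as np


def roi_correlation(masses):
    """Correlation matrix R between ROIs from a (k classes x n ROIs) mass table.

    R_ij = C_ij / sqrt(C_ii * C_jj), where C is the covariance of the ROI
    masses across the k class maps. An ROI whose mass does not vary across
    classes has zero variance and yields NaN entries.
    """
    m = np.asarray(masses, dtype=float)
    cov = np.cov(m, rowvar=False)          # C_ij between ROI columns
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical normalized masses for two ROIs (e.g. a hand and the foot on
# the same side) in k = 4 class maps.
masses = [[1.0, 0.9],
          [0.0, 0.1],
          [1.0, 1.0],
          [0.0, 0.0]]
print(roi_correlation(masses))
```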
For the experimental dataset of assembling ribosomal subunits [31], focused classification was performed for 100 cycles using 2-D masks with a radius of 30 Å on selected regions, specifying 10 classes (Supplementary Figure 4A-B). We observed similar classification results with 3-D masks. The mass inside a specific region of the maps was calculated using the “volume” command in EMAN1 after masking the region with either a shaped mask (“H58” and “H63”) or a spherical mask (“mask 1”, “mask2”, “L10”, and “large subunit base” in Supplementary Figure 4). Covariance was calculated as above.
2.6. Ribosome preparation.
The 57-nt HIV-1 TAR element was inserted into twelve different helices (H9, H12, H19, H24, H25, H31, H45, H46, H59, H63, H68 and H98) by replacing the loop residues, to screen for optimal attachment sites. These twelve were chosen based on their location on the periphery of the ribosome and their lack of tertiary contacts. All insertions resulted in viable bacterial growth (albeit much slower in some cases). H45 qualitatively yielded the most complete density with the least apparent mobility of the attached RNA (data not shown). Uniformly labeled ribosomes were prepared in the same way for all insertions. To ensure that all ribosomes contained the appended construct, a well-established protocol for introducing and characterizing site-specific mutations in Escherichia coli ribosomes was used [32,33]. Briefly, the Δ7 prrn E. coli strain SQZ10 [34], which has a genomic deletion of all rRNA genes, was used. The rRNA genes are supplied by a plasmid that also contains the levansucrase gene and confers kanamycin resistance (Plasmid 1, pHK-rrnC-sacB). Levansucrase expression is lethal to E. coli grown on sucrose-containing media [35]. An additional ampicillin-resistant plasmid containing the rRNA genes with the RNA construct of interest inserted (Plasmid 2, p278) was then transformed, and cells were grown in liquid culture. Cells were plated on media containing ampicillin and 5% sucrose to select for those that had lost Plasmid 1 but retained Plasmid 2. To confirm the selection, colonies were plated on kanamycin-containing media to ensure that they could not grow.
Insertion of TAR into helix 45 of p278 was carried out using site-directed ligase-independent mutagenesis [36]. Mutant plasmids were then transformed into SQZ10 cells and selected using the strategy described above. To purify mutant ribosomes, cells were first grown to mid-logarithmic phase (OD550 = 0.3-0.5) in 500 mL Luria Broth with shaking at 37 °C, then chilled on ice for 30 minutes and pelleted by centrifugation. The cell pellet was resuspended in 20 mL Resuspension Buffer (20 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 100 mM NH4Cl, 0.5 mM EDTA, 2 mM CaCl2, 6 mM β-mercaptoethanol). The resuspended cells were lysed by three passes through a French Press, filtered through a 0.45 μm syringe filter, and clarified twice by centrifugation at 18,000g for 30 minutes. The supernatant was concentrated to ~500 μL using a 50K MWCO filter (Amicon), layered onto a 36 mL 10-40% sucrose gradient in Gradient Buffer (50 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 100 mM NH4Cl, 6 mM β-mercaptoethanol), and ultracentrifuged in an SW-32Ti rotor at 16,700g for 18.5 hours at 4 °C. Fractions containing 70S ribosomes were collected, buffer-exchanged into Storage Buffer (20 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 100 mM NH4Cl, 6 mM β-mercaptoethanol), aliquoted, and stored at 4 °C until grid preparation.
2.7. Cryo-EM grid preparation and data acquisition.
A 2.5 μl aliquot of purified ribosomes from sucrose fractionation, diluted to a concentration of 4 mg/ml with Storage Buffer, was placed on UltrAuFoil R1.2/1.3 300-mesh grids (Quantifoil) that had been plasma-cleaned (75% argon/25% oxygen atmosphere, 15 W for 7 s using a Gatan Solarus). After 1 min incubation under >80% humidity at 4 °C, grids were blotted manually with filter paper (Whatman No. 1) and plunged into liquid ethane cooled by liquid nitrogen using a manual plunger. Leginon was used for automated EM image acquisition [37]. Grids were imaged on a Titan Krios microscope (FEI) operating at 300 kV and equipped with a K2 Summit direct electron detector (Gatan). A nominal magnification of 22,500× was used for data collection, giving a pixel size of 1.31 Å at the specimen level, with a defocus range of −0.5 μm to −2.5 μm. Movies were recorded in counting mode with an accumulated total dose of ~50 electrons/Å² fractionated into 60 frames, at an exposure rate of ~7 electrons/pixel/s.
2.8. Image processing and model generation.
All pre-processing was performed within the Appion suite [38]. Motion correction was carried out using MotionCor2 [39], and frames were exposure-filtered according to the relevant radiation damage curves [40]. The CTF of each micrograph was estimated using CTFFind4 [41] during data collection. 70S ribosomes served as a template for automatic particle picking using FindEM [42]. A total of 346K particles were selected and subjected to per-particle CTF estimation using GCTF [43]. After 2-D and 3-D classification in GPU-enabled Relion [44,45], selected classes containing 232K particles were combined into a single stack and imported into Frealign for global refinement with 8 classes. Every ten cycles of refinement/classification, the reconstructed maps of all 8 classes were aligned to a common 50S scaffold using custom scripts that perform a 3-D alignment within the Chimera package [46] while Frealign/cisTEM was running, in order to maintain a common reference frame for subsequent classification. A total of 50 cycles of global refinement/classification were performed. Subsequently, the best orientations were combined into a single parameter file for classification. Focused classification was then performed for 500 cycles, without further changes to the orientations, using a spherical mask of 30 Å centered on the expected region of TAR. Global resolution of the final map was estimated using the Fourier shell correlation (FSC [47]) at the 0.143 threshold, and directional resolution anisotropy was evaluated with the 3-D FSC server [48]. Local resolution was estimated using sxlocres.py implemented within Sparx [49].
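For readers unfamiliar with the 0.143 criterion, the FSC calculation can be sketched as follows. This is a generic, minimal illustration, not the Frealign/cisTEM code: it assumes two unmasked cubic half-maps as NumPy arrays, uses one-voxel-wide shells, and reports the resolution of the last shell above the threshold without interpolation. The function names are hypothetical.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier shell correlation between two cubic half-maps, one value per shell."""
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    freq = np.fft.fftfreq(n) * n                     # integer frequency indices
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    n_shells = n // 2 + 1                            # shells 0..Nyquist
    num = np.bincount(shell, weights=np.real(f1 * np.conj(f2)).ravel(),
                      minlength=n_shells)
    p1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel(), minlength=n_shells)
    p2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel(), minlength=n_shells)
    return num[:n_shells] / np.sqrt(p1[:n_shells] * p2[:n_shells])


def resolution_at_threshold(curve, box_size, pixel_size, threshold=0.143):
    """Resolution (units of pixel_size) of the last shell above the threshold."""
    below = np.nonzero(curve[1:] < threshold)[0]
    last_above = below[0] if below.size else len(curve) - 1
    return box_size * pixel_size / max(last_above, 1)


rng = np.random.default_rng(0)
v = rng.normal(size=(32, 32, 32))
w = rng.normal(size=(32, 32, 32))
print(resolution_at_threshold(fsc_curve(v, v), 32, 1.31))  # identical maps: Nyquist
print(resolution_at_threshold(fsc_curve(v, w), 32, 1.31))  # independent noise
```

Shell s corresponds to a spatial frequency of s/(box size × pixel size), so identical half-maps reach Nyquist (twice the pixel size), while independent noise falls below 0.143 almost immediately.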
The model of TAR attached to H45 of the 23S rRNA was prepared by first removing the loop residues of H45 from a recent 2.9 Å structure (PDB ID 5AFI [50]) and removing the polyA nucleotides from a model of TAR based on small-angle X-ray scattering data. The terminal backbone atoms were then docked and aligned in UCSF Chimera [46], and the TAR region was rigid-body refined into the cryo-EM density in Coot [51].
3. Results
3.1. Quantitative characterization of focused classification with 2-D and 3-D masking
To quantify the performance of different focused classification approaches, including 2-D and 3-D masking, we used the algorithms implemented in Frealign [8,20] and applied them to simulated data. We generated multiple synthetic datasets that are characterized by various degrees of heterogeneity. Figure 1 shows the distinct components of a “humanoid” reconstruction, with the legs, body, neck, and head positioned identically, and representing the constant, homogeneous regions of a particle, characterized by twofold rotational symmetry. In contrast, the arms can belong to one of two conformations, and are therefore characterized by pseudo-symmetry. Lastly, the hands and feet, which represent small features of a map that might be lost during global classification, can be either present or absent. We generated maps representing all possible combinations of these features and created multiple synthetic datasets containing random translations and rotations, a contrast transfer function (CTF), an envelope function, and multiple levels of noise, bringing the final CTF-modulated SNR down to 0.100, 0.050, 0.025, 0.013, or 0.006, as previously described (Supplementary Figure 1 and [8,52]). Below, we describe three scenarios, which serve to demonstrate different aspects of focused classification. Importantly, in all described cases, focused classification is performed on an asymmetric subunit basis, which allows one to break down and constrain the heterogeneity problem [27] and reveal discrete movements within a more complex landscape of heterogeneity.
First scenario – the base, pseudo-symmetric case:
In the base scenario, only the arms/hands are mobile and can adopt one of two distinct positions within an asymmetric unit, and the hand always remains co-occupied with an arm (Figure 1A). This case represents a common problem with pseudo-symmetric experimental datasets, whereby most of the molecule is homogeneous and characterized by symmetry (here, twofold), but one feature does not obey symmetry constraints (here, the arms/hands). There are four combinatorial possibilities, three of which would be expected to be recovered using a global classification strategy (structures A2 and A3 are degenerate and are related by 180° rotation). However, in an asymmetric focused classification centered on one side of the humanoid, one would expect to find only two non-degenerate possibilities, because the arm/hand can reside in only one of two structural states.
Second scenario – resolving small variable densities:
In the second scenario, we use focused classification to recover finer features within a more complex structural landscape. In addition to the arms occupying one of two distinct positions, the hands can be either present or absent, and their occupancy is completely randomized (Figure 1B). Thus, for each of the four structural states described in the base scenario, one would see four additional structural states represented by the presence or absence of each of the two hands. In sum, there are 16 combinatorial possibilities; global classification would be expected to uncover 10 non-degenerate classes, whereas only four classes should be resolved using asymmetric classification.
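The class counts quoted for the first two scenarios follow from simple enumeration. The sketch below assumes that the 180° rotation maps each asymmetric unit onto the other with the same state label, so that the ordered pairs (a, b) and (b, a) describe the same particle; the state labels and function name are hypothetical.

```python
from itertools import product


def count_states(per_unit_states):
    """Count particle classes with and without the twofold symmetry.

    Returns (all ordered combinations, classes distinguishable by global
    classification, classes expected from asymmetric focused classification).
    """
    combos = list(product(per_unit_states, repeat=2))   # (unit 1, unit 2)
    # The 180 degree rotation swaps the two units, so (a, b) and (b, a) coincide.
    unique = {tuple(sorted(pair)) for pair in combos}
    return len(combos), len(unique), len(per_unit_states)


scenario_1 = ["arm position 1", "arm position 2"]
scenario_2 = list(product(scenario_1, ["hand present", "hand absent"]))
print(count_states(scenario_1))   # (4, 3, 2)
print(count_states(scenario_2))   # (16, 10, 4)
```

In general, n states per asymmetric unit give n² combinations, n(n + 1)/2 non-degenerate global classes, and only n classes for asymmetric focused classification.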
Third scenario – resolving small variable densities and their covariances:
The third scenario is identical to the second scenario, except that a hand on each asymmetric unit is always co-associated with its corresponding foot (Figure 1C). For example, if the left hand is present, so is the left foot, and if it is absent, the foot too is absent; the same applies to the opposite asymmetric unit. One can then classify on the hand only, but look at both the hand and foot areas in the resulting maps and count the number of times that density for the hand co-occurs with density for the foot. In doing so, one can begin to decipher patterns and relationships within distinct components.
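The counting idea, comparing observed hand/foot co-occurrence with a random distribution as outlined in the Introduction, can be sketched as a permutation test. This is an illustration under stated assumptions, not the procedure used in this study (which correlated ROI masses, see Methods). It assumes one binary call per asymmetric unit indicating whether hand or foot density is present in the class to which that unit was assigned; the data and function name are hypothetical.

```python
import numpy as np


def cooccurrence_test(hand, foot, n_perm=10000, seed=0):
    """Permutation test for co-occurrence of two binary density calls.

    hand, foot: one boolean per asymmetric unit (is density present in the
    class the unit was assigned to?).
    Returns (observed co-occurrences, mean under shuffling, one-sided p-value).
    """
    hand = np.asarray(hand, dtype=bool)
    foot = np.asarray(foot, dtype=bool)
    observed = int(np.sum(hand & foot))
    rng = np.random.default_rng(seed)
    # Shuffling one label breaks any coupling while keeping both occupancies fixed.
    null = np.array([np.sum(hand & rng.permutation(foot)) for _ in range(n_perm)])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, float(null.mean()), float(p_value)


# Hypothetical calls for 100 asymmetric units
coupled = np.tile([True, False], 50)
independent = np.tile([True, True, False, False], 25)
print(cooccurrence_test(coupled, coupled, n_perm=2000))      # strong covariance
print(cooccurrence_test(coupled, independent, n_perm=2000))  # no covariance
```

When hand and foot always co-occur, the observed count (50) far exceeds the shuffled mean (about 25) and the p-value is small; for independent calls the observed count sits at the shuffled mean.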
3.2. Focused classification on an asymmetric subunit of a synthetic humanoid
For each of the three cases described above, and for all five levels of noise, we performed focused classifications on a single asymmetric unit, with a mask around the region encompassing an arm and hand (Figure 2A). For these experiments, the particle alignment parameters were set to the correct values used to generate the data and were kept fixed during classification. To quantitatively evaluate the accuracy of classification, we used the κ coefficient, a statistical measure that captures the performance of a diagnostic test while taking into account the possibility of agreement by chance [53]. We also used Youden’s J statistic (informedness, [54]), but found that the results largely paralleled those of κ (data not shown). The κ coefficient evaluates the agreement of raters classifying N items into mutually exclusive classes and relies on precise knowledge of the numbers of false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN), which we can obtain from the data (see Methods). Importantly, κ estimates the probability of an “informed” decision by taking random chance into account, returning 0 when classification is random (chance) and 1 when classification is perfect. Qualitatively, it is simple to assess visually how “clean” the classification is, and whether the particles were correctly partitioned, by looking at the separation of the arms in our data. Supplementary Figure 2 shows the results when classification is nearly perfect (Supplementary Figure 2A), when it is completely random (Supplementary Figure 2D), and for two intermediate cases (Supplementary Figure 2B-C).
A correct classification partitions the arms within a single asymmetric unit (and not its counterpart) into two distinct classes, with no signs of contaminating density (κ close to 1); as more errors are introduced, the two classes become progressively more mixed, up to a point where one cannot distinguish between the two volumes within or outside the asymmetric unit (κ close to 0, Supplementary Figure 2). In this manner, we could also determine which parameters provide optimal classification results (e.g. mask size, soft edge drop-off, etc., as demonstrated in Supplementary Figure 3), which we determined prior to evaluating the test cases.
Table 1 shows the results of focused classification for all three scenarios, using both the 2-D and the 3-D masking approaches as implemented in Frealign and evaluated using the κ coefficient. The resulting numbers indicate three general trends. First, for all three cases and for virtually all SNRs, the 2-D masking approach was superior to the 3-D masking approach. This result is not surprising: as noted in the introduction, in the absence of density subtraction, the experimental projection images used in the 3-D masking approach still contain overlapping density along the projection path, whereas the reference projections of the masked region do not. Second, the more mobile components a dataset contains, and the smaller the features to be detected, the lower the κ value and the more challenging it is to classify the data correctly. We observe major differences in accuracy between case 1 and either case 2 or 3, because the latter contain more moving parts. However, the accuracies for cases 2 and 3 are similar, likely because the two datasets differ only in small structural details. Third, a lower SNR makes it more challenging to classify the data correctly, which is not surprising. It was surprising, however, that for the base scenario, even at the lowest SNRs and given how small a feature we were trying to detect, we could still recover meaningful information and reasonably clean classes, particularly using the 2-D masking approach and, to a lesser extent, using the 3-D masking approach. In scenarios 2 and 3, higher SNRs were required to recover the correct classes (0.025 compared to 0.006, or ~4 times as high). The analyses were repeated after perturbing the orientation parameters (Supplementary Table 1) or after deriving them completely de novo (Supplementary Table 2).
The same general trends were observed in both cases, although, as expected, the accuracies and corresponding κ values were lower. Clearly, accurate orientation parameters benefit focused classification.
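The mismatch behind the first trend can be made concrete with a toy, noise-free volume projected along a single axis. In the sketch below (hypothetical array sizes and densities, and a plain residual rather than Frealign's likelihood target), the 2-D approach compares like with like, whereas the 3-D approach leaves the overlapping density in the image but removes it from the reference projection.

```python
import numpy as np


def project(vol):
    """Projection along axis 0, standing in for the beam direction."""
    return vol.sum(axis=0)


box = 32
reference = np.zeros((box, box, box))
reference[4:10, 12:20, 12:20] = 1.0   # region of interest (e.g. an arm)
reference[20:28, 12:20, 12:20] = 0.5  # unrelated density on the same rays

image = project(reference)            # noise-free "particle" in this view

mask3d = np.zeros_like(reference)
mask3d[4:10, 10:22, 10:22] = 1.0      # 3-D mask around the region of interest
mask2d = mask3d.max(axis=0)           # footprint of that mask in this view

# 2-D masking: the same 2-D mask is applied to the image and the reference projection
residual_2d = mask2d * (image - project(reference))
# 3-D masking without density subtraction: the reference is the projection of the
# masked map, but the image still contains the overlapping density
residual_3d = mask2d * (image - project(mask3d * reference))

print(np.abs(residual_2d).max())  # -> 0.0
print(residual_3d.max())          # -> 4.0 (8 voxels x 0.5 along each ray)
```

Subtracting the projected out-of-mask density from the image, as in signal-subtraction schemes, would remove the unmatched term in the 3-D case.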
Our experiments reveal that the 2-D masking approach, in its implementation within the likelihood-based framework of Frealign/cisTEM, does not completely isolate the area of interest from its surrounding density. While the 2-D masking approach produces more accurate results in the cases analyzed, its primary disadvantage is that projection images can contain additional density along the direction of the projection; if this density is homogeneous, it should be neutral in terms of classification, but if it is itself heterogeneous, it can bias the classification results. To quantify this bias, we returned to the base scenario, in which only the arm/hand combinations can move, but applied the mask to an area of a leg and classified in that region (Figure 2B). We thus asked whether we could recover density for the arms despite the mask being situated in a different location. As before, the number of correctly assigned particles was judged based on the arm/hand classes. If the arms completely determined the classification results, we would expect a κ coefficient of 1, whereas in the absence of crosstalk between arms and legs, the arms/hands would be randomly assigned and the κ coefficient would be 0. Table 2 shows that heterogeneity outside the area of interest influences the classification only at the highest SNRs, and even then, with a maximum κ coefficient of 0.23, the bias is modest. For SNR values of 0.025 and below, the results are effectively random. For comparison, with the same dataset, a κ coefficient of 0.87 is obtained at an SNR of 0.025 when the mask is in its correct position around an arm. In contrast to the 2-D mask, when a 3-D mask is applied to the same location, the results are completely random at all SNRs. This is exactly what we would expect, because after application of a 3-D mask, density outside the mask is not introduced into the reference projections.
The above results indicate that bias generated by heterogeneity outside the area of interest is present but minor when using the 2-D masking approach, and absent in the 3-D masking approach.
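A simple geometric argument helps explain why the crosstalk in Table 2 is present but small. If the off-target component (an arm) and the masked region (a leg) are approximated as spheres, the off-target density can only fall inside the 2-D mask in views where their projected centres come within the sum of the two radii. The sketch below estimates that fraction of views analytically and by Monte Carlo, assuming uniformly distributed orientations. The radii and separation are hypothetical, and overlap is a necessary rather than sufficient condition for bias, because the overlapping density must also be heterogeneous and carry enough signal.

```python
import numpy as np


def overlap_fraction_analytic(distance, mask_radius, blob_radius):
    """Fraction of uniformly distributed view directions in which a sphere of
    radius `blob_radius`, centred `distance` from the mask centre, projects at
    least partly inside a circular 2-D mask of radius `mask_radius`."""
    s = (mask_radius + blob_radius) / distance
    if s >= 1.0:
        return 1.0
    # Overlap requires sin(angle) < s, and cos(angle) is uniform on [-1, 1]
    return float(1.0 - np.sqrt(1.0 - s * s))


def overlap_fraction_mc(offset, mask_radius, blob_radius, n=200_000, seed=0):
    """Monte Carlo estimate of the same fraction for a 3-D offset vector."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)  # random unit view directions
    offset = np.asarray(offset, dtype=float)
    along = d @ offset
    # Distance between the two centres after projection along each direction
    perp = np.sqrt(np.maximum(offset @ offset - along**2, 0.0))
    return float(np.mean(perp < mask_radius + blob_radius))


# Hypothetical geometry (Å): 15 Å leg mask, 10 Å arm blob, centres 60 Å apart
print(overlap_fraction_analytic(60.0, 15.0, 10.0))    # ~0.091
print(overlap_fraction_mc([60.0, 0.0, 0.0], 15.0, 10.0))  # close to the analytic value
```

With these numbers, only about one view in eleven can carry arm density into the leg mask at all, which fits a bias that is detectable only at high SNR.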
3.3. Focused classification can identify covariant components in distinct regions of a map
Each individual particle in a heterogeneous single-particle cryo-EM dataset can contain a unique combination of dynamic elements residing in distinct structural states. When multiple components are dynamic, or when they bind (or dissociate) at different sites, the conformational or compositional states of the components can be linked, for example by allostery. Using focused classification, one can treat two distinct regions separately and then ask whether they are interdependent by calculating covariances within masked regions.
To evaluate covariance between distinct regions of a map, we used the datasets prepared for scenarios 2 and 3. In scenario 2, the presence of either hand or either foot is random, and these components are unrelated to one another. In contrast, in scenario 3, the presence of a hand on one side of the humanoid is always correlated with the presence of a foot on the same side, whereas the opposite foot is randomly occupied and is not correlated with anything. Thus, one can focus classification on the hands and evaluate the density in the region of the feet, which are not included in the focus region. Quantitatively, once the dataset is classified and subdivided into groups, one calculates the fractional density occupied by each component within each class (e.g. hand in position 1, hand in position 2, near foot, and far foot), normalized to its expected value, and computes a correlation matrix between the components (see Methods). Since the presence of a foot is always correlated with the hand on the same side of the humanoid, irrespective of the conformation of the arm/hand, we further simplify the analysis by grouping both mutually exclusive hand positions into a single "near hand" component. Thus, there are three regions for which fractional occupancies are computed: a "near hand" (blue in Figure 3), where the mask is applied for classification; a "near foot" (purple in Figure 3) on the same side of the humanoid; and a "far foot" (pink in Figure 3) on the opposite side of the humanoid. Given the nature of the mask, everything except the hands is excluded from the classification. Because the mask is applied on a per-asymmetric-unit basis, the region that would otherwise constitute the "far hands" is not separated, and both conformations are observed mixed together.
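The occupancy step can be sketched as follows, assuming one map per class, a fully occupied reference map, and one binary (or soft) mask per component. Normalizing to the masked density of the reference is our stand-in for "normalized to its expected value"; the Methods may use a different normalization, and `occupancy_table` is a hypothetical name.

```python
import numpy as np


def occupancy_table(class_maps, reference_map, component_masks):
    """Fractional occupancy of each component (columns) in each class (rows).

    Occupancy is the masked density of a class map divided by the masked
    density of a fully occupied reference map (assumed normalization).
    """
    table = np.empty((len(class_maps), len(component_masks)))
    for j, mask in enumerate(component_masks):
        expected = np.sum(reference_map * mask)
        for i, class_map in enumerate(class_maps):
            table[i, j] = np.sum(class_map * mask) / expected
    return table


# Toy 8x8x8 example with two components ("hand" and "foot")
reference = np.zeros((8, 8, 8))
reference[1:3] = 1.0
reference[5:7] = 1.0
hand = np.zeros_like(reference)
hand[1:3] = 1.0
foot = np.zeros_like(reference)
foot[5:7] = 1.0

class_a = reference.copy()   # hand and foot fully present
class_b = reference.copy()
class_b[5:7] *= 0.5          # foot at half occupancy in this class

table = occupancy_table([class_a, class_b], reference, [hand, foot])
# rows: classes a, b; columns: hand, foot -> values [[1.0, 1.0], [1.0, 0.5]]
```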
For scenario 2, in which no covariance is expected, the volumes captured through focused classification on an asymmetric-unit basis, representing the four non-degenerate classes, are displayed in Figure 3A. As expected, they differ in the presence, absence, and overall conformation of the hands. For example, classes 1 and 2 (or 3 and 4) differ by the presence or absence of a single hand; classes 1 and 3 (both with hands) and classes 2 and 4 (both without hands) differ in the conformation of the arms; finally, classes 1 and 4 (or 2 and 3) differ in both hand occupancy and arm conformation. Other than the hand/arm differences, no other regions of the maps show any apparent variability. Quantitatively, this is summarized by a correlation matrix that describes the interdependence between the different components (Figure 3B). A value of 1 means that the pairwise occupancies of two components are perfectly correlated, a value of 0 means that they are uncorrelated, and a value of −1 means that they are anti-correlated. The diagonal entries, which compare each component with itself, equal 1 by definition. Otherwise, no two regions of the map are correlated with one another. The situation is different for scenario 3, which was designed so that the nearby hand and foot covary. The volumes captured through focused classification again represent the expected non-degenerate classes, and the hands/arms are related to one another in the same manner as before. This time, however, classes 2 and 4 are clearly missing the nearby foot, whereas classes 1 and 3 maintain full occupancy. The correlation matrix now shows that the hand is always co-associated with the nearby foot. The occupancy of the far foot, on the other hand, remains random and is accordingly associated with a low correlation value.
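Given an occupancy table (classes as rows, components as columns), the correlation matrix is the component-by-component Pearson correlation computed across classes. The sketch below optionally weights classes by particle count through the `aweights` argument of `numpy.cov`; the weighting and the toy numbers, chosen to mimic scenario 3, are assumptions for illustration.

```python
import numpy as np


def occupancy_correlation(occupancy, class_sizes=None):
    """Pearson correlation between components (columns) across classes (rows).

    `class_sizes`, if given, weights each class by its particle count.
    """
    cov = np.cov(occupancy, rowvar=False, aweights=class_sizes)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Toy occupancies mimicking scenario 3; columns: near hand, near foot, far foot
occ = np.array([
    [1.0, 1.0, 0.51],
    [0.0, 0.0, 0.48],
    [1.0, 1.0, 0.48],
    [0.0, 0.0, 0.51],
])
corr = occupancy_correlation(occ, class_sizes=[250, 250, 250, 250])
print(np.round(corr, 2))  # near hand vs near foot ~1; far foot ~0 with both
```

A component whose occupancy is identical in every class (for example, a fully occupied control region) has zero variance, so its correlations are undefined and NumPy returns NaN with a runtime warning.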
The same experiment can be performed for more complicated combinations of hands and feet, but the principle is the same – that assessing the inter-dependence of density occupancies within distinct regions of a macromolecular complex can provide insight into structural covariance within the data.
We also performed an analysis of covariance within an experimental dataset. Previously, we showed that assembling large ribosomal subunits can be isolated from ribosomal protein rpL17-depleted cells and subjected to structural analysis using single-particle cryo-EM [31]. The resulting dataset is highly heterogeneous: 13 distinct intermediates could be recovered, ranging from the large subunit core to the mature subunit, with substructures in between. The structures assemble in a block-wise manner, and the occupancies of nearby proteins and RNA elements covary with one another to some extent. Here, we reclassified 20k randomly selected particles from the original data (EMPIAR-10076) and aligned them to the common core ribosomal scaffold. We then performed focused classification on a masked region, specifying 10 classes, and calculated the fractional volume occupancy in regions either inside or outside the mask. Covariance analysis for masked region 1 (Supplementary Figure 4A) showed, as expected, covariance with Helix 63. Some covariance was also detected with the neighboring Helix 58, which is part of the same "block" within the assembly pathway but was not "seen" during classification. In contrast, no covariance was detected with the L1 stalk, which is distal from the region and does not covary within the reconstructed volumes, or with the large subunit base, which remains constant throughout. We then performed the same analysis on masked region 2 (Supplementary Figure 4B). We found the highest covariance with the corresponding Helix 58, lower covariance with the neighboring Helix 63, which was not "seen" during classification, and little to no covariance with the L1 stalk or the control large subunit base. Such analyses provide a conceptual approach to determining covariance in single-particle analysis, but further detailed analyses would be needed to define the best protocol for heterogeneous datasets.
3.4. Focused classification facilitates deconvolving heterogeneous regions within an experimental dataset
The techniques described here have been used to decipher both conformational and compositional heterogeneity within biological samples (for example, [16,26,55]). Beyond these published applications, one area where they will be particularly useful is in deconvolving conformational heterogeneity when scaffolds are used for structure determination. Several groups have shown that larger protein and/or nucleic-acid scaffolds can aid in the determination of smaller structures that by themselves would be too challenging to analyze [56,57]. However, the problem with all current approaches is that the structures of interest are not necessarily rigidly bound. Thus, regions closer to the attachment site will be characterized by less heterogeneity (and a lower B-factor), whereas regions farther from the attachment site will exhibit more heterogeneity (and a higher B-factor). To demonstrate this, we used a bacterial 70S ribosome as a scaffold and engineered into it a fusion RNA representing the HIV-1 Transactivation Response (TAR) element. We then performed either global classifications on the entire dataset or focused classifications on the region around TAR.
The HIV-1 TAR element was uniformly inserted into Helix 45 of the E. coli 23S large-subunit ribosomal RNA. Ribosomes containing the TAR knock-in were selectively purified (see Methods) and subjected to single-particle cryo-EM analysis. We collected 929 micrographs, yielding a dataset of 346,851 particles (Supplementary Figure 5A). A single-model refinement, in the absence of any classification, showed high resolution in the ribosome core and lower resolution in the regions characterized by structural heterogeneity (Supplementary Figure 5B-C). Because of its extensive mobility, the site of TAR fusion was only partially visible at the thresholds normally used for displaying the density of the 70S ribosome. We then performed a global classification of the data using a soft-edge spherical mask. This procedure resulted in distinct classes, separated according to the heterogeneity expected for purified bacterial ribosomes [58] (Supplementary Figure 5D). The combined differences are summarized in a merged map, demonstrating the full extent of heterogeneity for the global classification case (Figure 4A); notably, resolving this heterogeneity did not improve the density at the site of fusion. We then performed a focused classification of the data using 2-D masks, applying the mask to the area where TAR was inserted. As expected, the resulting maps clearly separated some of the different conformations of TAR (Supplementary Figure 5E). However, most of the normal ribosomal heterogeneity was ignored, as summarized by the merged difference maps (Figure 4B) and an overlay of the reconstructed classes (Figure 4C). In terms of characterizing classification performance, this result is important for two reasons. First, even though the area of interest is small, the focused classification approach using 2-D masks can partially deconvolve the density.
Second, despite the extensive "normal" structural heterogeneity present in bacterial ribosomes (e.g. Figure 4A), which could confound the 2-D focused classification approach (e.g. Figure 2 and Table 2), we do not observe such confounding in our results. We also performed focused classifications using 3-D masks, but the quality of the reconstructed TAR region was noticeably poorer (not shown), consistent with the poorer performance of the 3-D masking approach on synthetic data (e.g. Table 1). These experimental results further demonstrate the ability of the 2-D masking approach to separate local structural variability in the context of otherwise extensive global structural differences.
The best reconstruction of HIV-1 TAR showed a clearly defined RNA helix, a marked improvement over a global classification strategy alone (Figure 4D). The resolution of the density became progressively poorer with increasing distance from the site of attachment. For a largely A-form HIV-1 TAR RNA helix, the fusion can be thought of as a lever pivoting around a fulcrum: the farther from the attachment site, the greater the inherent mobility, and thus the lower the resolution. Similar behavior was observed with other scaffolding strategies, in which the peripheral regions were characterized by lower resolution [56,57]. In addition to providing novel biological insight, focused classification can broadly facilitate scaffolding approaches for solving structures of small proteins and RNAs.
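The lever analogy can be turned into a rough, order-of-magnitude estimate. Assuming small, Gaussian rocking of the helix about the attachment point with angular standard deviation σθ, a point at distance r moves by about rσθ, and the Debye–Waller relation B = 8π²⟨u²⟩ then gives B ≈ 8π²r²σθ², a blurring that grows with the square of the distance. The sketch treats this displacement as isotropic, which is a simplification, and the 3° rocking amplitude is hypothetical rather than a value measured for TAR.

```python
import numpy as np


def lever_b_factor(distance, rock_sd_deg):
    """Rough B-factor (Å^2) from rigid rocking about a pivot.

    Assumes small Gaussian angular fluctuations with standard deviation
    `rock_sd_deg`, so a point `distance` Å from the pivot moves about
    distance * theta, and applies B = 8 * pi^2 * <u^2> as if that
    displacement were isotropic (a simplification).
    """
    u_rms = np.asarray(distance, dtype=float) * np.deg2rad(rock_sd_deg)
    return 8.0 * np.pi**2 * u_rms**2


for r in (10.0, 20.0, 40.0):
    print(r, round(float(lever_b_factor(r, 3.0))))
```

With 3° of rocking, the estimate rises from roughly 22 Å² at 10 Å to roughly 346 Å² at 40 Å from the pivot, which illustrates why the peripheral regions of scaffolded targets lose resolution fastest.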
4. Discussion
Using a synthetic dataset, we quantitatively assessed several focused classification implementations within the Frealign/cisTEM processing packages. The algorithms have been used to classify features in several experimental studies [16,26,55], and we further demonstrate the applicability of these approaches for deconvolving heterogeneous regions within small scaffolded RNAs, to facilitate the development of substrate supports for cryo-EM [56,57].
The present study will help users decide which strategy to use in a particular case. Focused classification using 2-D masks can be applied to individual asymmetric features (also known as symmetry expansion [27]) and, as implemented within Frealign/cisTEM, has generally been found to perform better than 3-D masking approaches, owing to the density mismatch between particle images and reference projections after 3-D masking. A possible disadvantage of the 2-D masking approach arises from the projection nature of the data. Any area within a 2-D projection image contains not only density relevant to the region of interest but also residual density along the projection path. If the residual density is itself heterogeneous, it can potentially confuse or bias the classification procedure (especially if the variability within the region of interest is significantly smaller than the variability elsewhere). In Table 2, we demonstrate that this effect is real, at least with high-SNR data. In practice, however, this problem appears to be small, based on the results obtained with the synthetic data (compare Tables 1 and 2), on the experimental data in the current work, which contain large-scale global heterogeneity (Figure 4A-B), and on previous biological studies [16,18]. Confounding heterogeneity along the projection path would then be treated as noise, in a manner perhaps analogous to incomplete density subtraction.
Our tests with the synthetic dataset demonstrate that additional questions, such as those pertaining to structural covariance, can be addressed in single-particle experiments. We showed how classifying variability in one region of a density map can reveal covariance with a second region, in this case between a hand and a foot. With synthetic data, such analyses are predicated upon knowledge of the true density; in an experimental setting, an analogous approach would mask out regions corresponding to, for example, known components prior to analyzing the resulting correlation matrices, as has been shown previously in one simplified example with ribosome-associated factors [59]. In general, the ability to classify independently on separate regions of a map provides opportunities to inter-relate distinct regions of an object beyond simply recovering densities, a form of computational identification of structural covariance within a system. If the local variability occurs on a larger domain of a complex that itself undergoes discrete or continuous motion, it may be necessary first to align the particles based on the larger mobile domain and then to analyze the local variability. The alignment of the larger domain in this case could be accomplished using signal subtraction [13] or the recently introduced multi-body alignment approach [60]. Some caution should be exercised in analyses of covariance. First, to avoid under-sampling, it is advisable to compute at least as many classes as the number of expected states. Second, and related to the previous point, classifications should be run multiple times, starting from different random seeds for the particle assignments. Both precautions help ensure that enough pairwise occupancies have been calculated to reach statistical significance and to avoid spurious correlations.
Third, some caution should be exercised in interpreting results obtained with 2-D masks, because of the possibility of "leaky" biases during classification, although our experimental observations suggest that these biases should be minimal (Figure 4B). Finally, global classifications can also be used for covariance analysis, and they can have specific advantages, because they recover non-degenerate differences that are lost during classification on an individual asymmetric unit. This is easily seen with the humanoid test object, for which the number of globally non-degenerate structures far exceeds the number of distinct asymmetric units. Whereas focused classification helps constrain the number of classes and can simplify the analysis, the results should ideally be related back to the global context of heterogeneity. In the future, more elaborate methods could be devised for broader applicability beyond pairwise covariances.
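One generic way to check whether a pairwise occupancy correlation could have arisen by chance is a permutation test across classes. This is a swapped-in technique shown for illustration, not the procedure used in this study, and the function names are hypothetical. The sketch also shows why the number of classes matters: with only four classes, even perfect co-occurrence cannot be distinguished from chance.

```python
import itertools

import numpy as np


def _corr_with_rows(a, rows):
    """Pearson correlation between vector `a` and each row of `rows`."""
    a = a - a.mean()
    rows = rows - rows.mean(axis=1, keepdims=True)
    return (rows @ a) / (np.linalg.norm(rows, axis=1) * np.linalg.norm(a))


def permutation_pvalue(a, b, max_exact=8, n_random=20_000, seed=0):
    """Two-sided permutation p-value for the correlation between two
    occupancy vectors measured over the same classes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    r_obs = _corr_with_rows(a, b[None, :])[0]
    if len(b) <= max_exact:
        # Enumerate every reassignment of b's values to the classes
        perms = np.array(list(itertools.permutations(b)))
        r_perm = _corr_with_rows(a, perms)
        return float(np.mean(np.abs(r_perm) >= abs(r_obs) - 1e-12))
    # Too many classes to enumerate: sample random reassignments instead
    rng = np.random.default_rng(seed)
    perms = np.array([rng.permutation(b) for _ in range(n_random)])
    r_perm = _corr_with_rows(a, perms)
    hits = np.sum(np.abs(r_perm) >= abs(r_obs) - 1e-12)
    return float((hits + 1) / (n_random + 1))


# Perfectly co-occurring hand/foot occupancies over 4 versus 8 classes
print(permutation_pvalue([1, 0, 1, 0], [1, 0, 1, 0]))  # ~0.33 (8 of 24 permutations)
print(permutation_pvalue([1, 0] * 4, [1, 0] * 4))      # ~0.029 (2 of 70 arrangements)
```

Pooling occupancies from repeated classifications started from different random seeds increases the number of class-level observations in a similar way, although such observations are not fully independent.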
Our results using HIV-1 TAR fused to bacterial large ribosomal subunits show how focused classification can help computationally deconvolve highly mobile features within experimental cryo-EM datasets. These data are particularly relevant to the development of structural scaffolds for the analysis of small proteins and RNAs [56,57]. The TAR fusions are universally mobile about a central fulcrum point, which corresponds approximately to the site of attachment, and the density is lost in the absence of proper classification. However, careful application of masks during focused classification enables partial recovery of the structural elements within the TAR fusion, visualizing most of the A-form RNA helix. Scaffolding approaches are gaining popularity in single-particle analysis because small proteins may not have sufficient signal for accurate assignment of Eulerian orientations. Focused classification can help ameliorate problems associated with structural mobility and recover as much of the structure of interest as possible.
Supplementary Material
Highlights.
Quantitative evaluation of two single-particle cryo-EM classification methods implemented in the cisTEM and Frealign image processing packages
Detection of discrete states and structural covariance in macromolecular complexes
Classification of a small flexible RNA segment attached to a ribosomal large subunit
Acknowledgements
We thank Dr. Kurt Fredrick for assistance and helpful discussions regarding the ribosome purifications and Bill Anderson for help with cryo-EM data collection. We acknowledge the support of NIH grants R01 GM065056 (to KMF) and P50 GM103368 (HIVE Center, to KMF and DL). DL also acknowledges the support of DP5 OD021396. WAC was supported by a Pelotonia Postdoctoral Fellowship from the Ohio State University. NG is an Investigator of the Howard Hughes Medical Institute.
Footnotes
Conflict of Interest
The authors declare no competing financial interests.
Code availability
Frealign and cisTEM are open-source and distributed under the Janelia Research Campus Software License (http://license.janelia.org/license/janelia_license_1_2.html). All scripts and datasets for these studies will be made available from the Lyumkis laboratory upon request.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- [1] Cheng Y, Grigorieff N, Penczek PA, Walz T, A primer to single-particle cryo-electron microscopy, Cell. 161 (2015) 438–449. doi: 10.1016/j.cell.2015.03.050.
- [2] Nogales E, The development of cryo-EM into a mainstream structural biology technique, Nature Methods. 13 (2016) 24–27. doi: 10.1038/nmeth.3694.
- [3] Bartesaghi A, Aguerrebere C, Falconieri V, Banerjee S, Earl LA, Zhu X, et al., Atomic Resolution Cryo-EM Structure of β-Galactosidase, Structure. (2018). doi: 10.1016/j.str.2018.04.004.
- [4] Tan YZ, Aiyer S, Mietzsch M, Hull JA, McKenna R, Grieger J, et al., Sub-2 Å Ewald Curvature Corrected Single-Particle Cryo-EM, bioRxiv. (2018) 305599. doi: 10.1101/305599.
- [5] Murata K, Wolf M, Cryo-electron microscopy for structural analysis of dynamic biological macromolecules, Biochim. Biophys. Acta. 1862 (2018) 324–334. doi: 10.1016/j.bbagen.2017.07.020.
- [6] Scheres SHW, Processing of Structurally Heterogeneous Cryo-EM Data in RELION, in: The Resolution Revolution: Recent Advances in cryoEM, Elsevier, 2016: pp. 125–157. doi: 10.1016/bs.mie.2016.04.012.
- [7] Gao H, Valle M, Ehrenberg M, Frank J, Dynamics of EF-G interaction with the ribosome explored by classification of a heterogeneous cryo-EM dataset, J. Struct. Biol. 147 (2004) 283–290. doi: 10.1016/j.jsb.2004.02.008.
- [8] Lyumkis D, Brilot AF, Theobald DL, Grigorieff N, Likelihood-based classification of cryo-EM images using FREALIGN, J. Struct. Biol. 183 (2013) 377–388. doi: 10.1016/j.jsb.2013.07.005.
- [9] Scheres SHW, Gao H, Valle M, Herman GT, Eggermont PPB, Frank J, et al., Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization, Nature Methods. 4 (2007) 27–29. doi: 10.1038/nmeth992.
- [10] Spahn CM, Penczek PA, Exploring conformational modes of macromolecular assemblies by multiparticle cryo-EM, Curr. Opin. Struct. Biol. 19 (2009) 623–631. doi: 10.1016/j.sbi.2009.08.001.
- [11] Valle M, Sengupta J, Swami NK, Grassucci RA, Burkhardt N, Nierhaus KH, et al., Cryo-EM reveals an active role for aminoacyl-tRNA in the accommodation process, EMBO J. 21 (2002) 3557–3567. doi: 10.1093/emboj/cdf326.
- [12] Zhou Q, Huang X, Sun S, Li X, Wang H-W, Sui S-F, Cryo-EM structure of SNAP-SNARE assembly in 20S particle, Cell Res. 25 (2015) 551–560. doi: 10.1038/cr.2015.47.
- [13] Bai X-C, Rajendra E, Yang G, Shi Y, Scheres SHW, Sampling the conformational space of the catalytic subunit of human γ-secretase, eLife. 4 (2015). doi: 10.7554/eLife.11182.
- [14] Penczek PA, Frank J, Spahn CM, A method of focused classification, based on the bootstrap 3D variance analysis, and its application to EF-G-dependent translocation, J. Struct. Biol. 154 (2006) 184–194. doi: 10.1016/j.jsb.2005.12.013.
- [15] Ilca SL, Kotecha A, Sun X, Poranen MM, Stuart DI, Huiskonen JT, Localized reconstruction of subunits from electron cryomicroscopy images of macromolecular complexes, Nat. Commun. 6 (2015) 8843. doi: 10.1038/ncomms9843.
- [16] Abeyrathne PD, Koh CS, Grant T, Grigorieff N, Korostelev AA, Ensemble cryo-EM uncovers inchworm-like translocation of a viral IRES through the ribosome, eLife. 5 (2016) e14874. doi: 10.7554/eLife.14874.
- [17] von Loeffelholz O, Natchiar SK, Djabeur N, Myasnikov AG, Kratzat H, Ménétret J-F, et al., Focused classification and refinement in high-resolution cryo-EM structural analysis of ribosome complexes, Curr. Opin. Struct. Biol. 46 (2017) 140–148. doi: 10.1016/j.sbi.2017.07.007.
- [18] Loveland AB, Demo G, Grigorieff N, Korostelev AA, Ensemble cryo-EM elucidates the mechanism of translation fidelity, Nature. 546 (2017) 113–117. doi: 10.1038/nature22397.
- [19] Nguyen THD, Galej WP, Bai X-C, Savva CG, Newman AJ, Scheres SHW, et al., The architecture of the spliceosomal U4/U6.U5 tri-snRNP, Nature. 523 (2015) 47–52. doi: 10.1038/nature14548.
- [20] Grigorieff N, Frealign: An Exploratory Tool for Single-Particle Cryo-EM, Methods Enzymol. 579 (2016) 191–226. doi: 10.1016/bs.mie.2016.04.013.
- [21] Grant T, Rohou A, Grigorieff N, cisTEM, user-friendly software for single-particle image processing, eLife. 7 (2018) e35383. doi: 10.7554/eLife.35383.
- [22] Autzen HE, Myasnikov AG, Campbell MG, Asarnow D, Julius D, Cheng Y, Structure of the human TRPM4 ion channel in a lipid nanodisc, Science. 359 (2018) 228–232. doi: 10.1126/science.aar4510.
- [23] Oldham ML, Grigorieff N, Chen J, Structure of the transporter associated with antigen processing trapped by herpes simplex virus, eLife. 5 (2016) e21829. doi: 10.7554/eLife.21829.
- [24] Ballandras-Colas A, Brown M, Cook NJ, Dewdney TG, Demeler B, Cherepanov P, et al., Cryo-EM reveals a novel octameric integrase structure for betaretroviral intasome function, Nature. 530 (2016) 358–361. doi: 10.1038/nature16955.
- [25] Huiskonen JT, Jäälinoja HT, Briggs JAG, Fuller SD, Butcher SJ, Structure of a hexameric RNA packaging motor in a viral polymerase complex, J. Struct. Biol. 158 (2007) 156–164. doi: 10.1016/j.jsb.2006.08.021.
- [26] Passos DO, Li M, Yang R, Rebensburg SV, Ghirlando R, Jeon Y, et al., Cryo-EM structures and atomic model of the HIV-1 strand transfer complex intasome, Science. 355 (2017) 89–92. doi: 10.1126/science.aah5163.
- [27] Huiskonen JT, Image processing for cryogenic transmission electron microscopy of symmetry-mismatched complexes, Bioscience Reports. (2018) BSR20170203. doi: 10.1042/BSR20170203.
- [28] Koning RI, Gomez-Blanco J, Akopjana I, Vargas J, Kazaks A, Tars K, et al., Asymmetric cryo-EM reconstruction of phage MS2 reveals genome structure in situ, Nat. Commun. 7 (2016) 12524. doi: 10.1038/ncomms12524.
- [29] Baxter WT, Grassucci RA, Gao H, Frank J, Determination of signal-to-noise ratios and spectral SNRs in cryo-EM low-dose imaging of molecules, J. Struct. Biol. 166 (2009) 126–132.
- [30] Ludtke SJ, Baldwin PR, Chiu W, EMAN: semiautomated software for high-resolution single-particle reconstructions, J. Struct. Biol. 128 (1999) 82–97. doi: 10.1006/jsbi.1999.4174.
- [31] Davis JH, Tan YZ, Carragher B, Potter CS, Lyumkis D, Williamson JR, Modular Assembly of the Bacterial Large Ribosomal Subunit, Cell. 167 (2016) 1610–1622.e15. doi: 10.1016/j.cell.2016.11.020.
- [32] Liu Q, Fredrick K, Contribution of intersubunit bridges to the energy barrier of ribosomal translocation, Nucleic Acids Res. 41 (2013) 565–574. doi: 10.1093/nar/gks1074.
- [33] Qin D, Abdi NM, Fredrick K, Characterization of 16S rRNA mutations that decrease the fidelity of translation initiation, RNA. 13 (2007) 2348–2355. doi: 10.1261/rna.715307.
- [34] Asai T, Zaporojets D, Squires C, Squires CL, An Escherichia coli strain with all chromosomal rRNA operons inactivated: Complete exchange of rRNA genes between bacteria, Proceedings of the National Academy of Sciences. 96 (1999) 1971–1976. doi: 10.1073/pnas.96.5.1971.
- [35] Gay P, Le Coq D, Steinmetz M, Berkelman T, Kado CI, Positive selection procedure for entrapment of insertion sequence elements in gram-negative bacteria, J. Bacteriol. 164 (1985) 918–921.
- [36] Chiu J, March PE, Lee R, Tillett D, Site-directed, Ligase-Independent Mutagenesis (SLIM): a single-tube methodology approaching 100% efficiency in 4 h, Nucleic Acids Res. 32 (2004) e174. doi: 10.1093/nar/gnh172.
- [37] Suloway C, Pulokas J, Fellmann D, Cheng A, Guerra F, Quispe J, et al., Automated molecular microscopy: the new Leginon system, J. Struct. Biol. 151 (2005) 41–60. doi: 10.1016/j.jsb.2005.03.010.
- [38] Lander GC, Stagg SM, Voss NR, Cheng A, Fellmann D, Pulokas J, et al., Appion: an integrated, database-driven pipeline to facilitate EM image processing, J. Struct. Biol. 166 (2009) 95–102.
- [39] Zheng SQ, Palovcak E, Armache J-P, Verba KA, Cheng Y, Agard DA, MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy, Nature Methods. 14 (2017) 331–332. doi: 10.1038/nmeth.4193.
- [40] Grant T, Grigorieff N, Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6, eLife. 4 (2015) e06980. doi: 10.7554/eLife.06980.
- [41] Rohou A, Grigorieff N, CTFFIND4: Fast and accurate defocus estimation from electron micrographs, J. Struct. Biol. 192 (2015) 216–221. doi: 10.1016/j.jsb.2015.08.008.
- [42] Roseman AM, FindEM--a fast, efficient program for automatic selection of particles from electron micrographs, J. Struct. Biol. 145 (2004) 91–99.
- [43] Zhang K, Gctf: Real-time CTF determination and correction, J. Struct. Biol. 193 (2016) 1–12. doi: 10.1016/j.jsb.2015.11.003.
- [44]. Kimanius D, Forsberg BO, Scheres SH, Lindahl E, Accelerated cryo-EM structure determination with parallelisation using GPUs in RELION-2, eLife 5 (2016) e18722. doi: 10.7554/eLife.18722.
- [45]. Scheres SHW, RELION: Implementation of a Bayesian approach to cryo-EM structure determination, J. Struct. Biol. 180 (2012) 519–530. doi: 10.1016/j.jsb.2012.09.006.
- [46]. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al., UCSF Chimera: a visualization system for exploratory research and analysis, J. Comput. Chem. 25 (2004) 1605–1612. doi: 10.1002/jcc.20084.
- [47]. Harauz G, van Heel M, Exact filters for general geometry 3-dimensional reconstruction, Optik 73 (1986) 146–156.
- [48]. Tan YZ, Baldwin PR, Davis JH, Williamson JR, Potter CS, Carragher B, et al., Addressing preferred specimen orientation in single-particle cryo-EM through tilting, Nat. Methods 14 (2017) 793–796. doi: 10.1038/nmeth.4347.
- [49]. Hohn M, Tang G, Goodyear G, Baldwin PR, Huang Z, Penczek PA, et al., SPARX, a new environment for Cryo-EM image processing, J. Struct. Biol. 157 (2007) 47–55. doi: 10.1016/j.jsb.2006.07.003.
- [50]. Fischer N, Neumann P, Konevega AL, Bock LV, Ficner R, Rodnina MV, et al., Structure of the E. coli ribosome-EF-Tu complex at <3 Å resolution by Cs-corrected cryo-EM, Nature 520 (2015) 567–570. doi: 10.1038/nature14275.
- [51]. Emsley P, Lohkamp B, Scott WG, Cowtan K, Features and development of Coot, Acta Crystallogr. D Biol. Crystallogr. 66 (2010) 486–501. doi: 10.1107/S0907444910007493.
- [52]. Voss NR, Lyumkis D, Cheng A, Lau P-W, Mulder A, Lander GC, et al., A toolbox for ab initio 3-D reconstructions in single-particle electron microscopy, J. Struct. Biol. 169 (2010) 389–398. doi: 10.1016/j.jsb.2009.12.005.
- [53]. Cohen J, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1960) 37–46.
- [54]. Youden WJ, Index for rating diagnostic tests, Cancer 3 (1950) 32–35.
- [55]. Loveland AB, Korostelev AA, Structural dynamics of protein S1 on the 70S ribosome visualized by ensemble cryo-EM, Methods 137 (2018) 55–66. doi: 10.1016/j.ymeth.2017.12.004.
- [56]. Liu Y, Gonen S, Gonen T, Yeates TO, Near-atomic cryo-EM imaging of a small protein displayed on a designed scaffolding system, Proc. Natl. Acad. Sci. U.S.A. 115 (2018) 3362–3367. doi: 10.1073/pnas.1718825115.
- [57]. Martin TG, Bharat TAM, Joerger AC, Bai X-C, Praetorius F, Fersht AR, et al., Design of a molecular support for cryo-EM structure determination, Proc. Natl. Acad. Sci. U.S.A. 113 (2016) E7456–E7463. doi: 10.1073/pnas.1612720113.
- [58]. Agirrezabala X, Lei J, Brunelle JL, Ortiz-Meoz RF, Green R, Frank J, Visualization of the hybrid state of tRNA binding promoted by spontaneous ratcheting of the ribosome, Mol. Cell 32 (2008) 190–197. doi: 10.1016/j.molcel.2008.10.001.
- [59]. Lyumkis D, Oliveira Dos Passos D, Tahara EB, Webb K, Bennett EJ, Vinterbo S, et al., Structural basis for translational surveillance by the large ribosomal subunit-associated protein quality control complex, Proc. Natl. Acad. Sci. U.S.A. 111 (2014) 15981–15986. doi: 10.1073/pnas.1413882111.
- [60]. Nakane T, Kimanius D, Lindahl E, Scheres SH, Characterisation of molecular motions in cryo-EM single-particle data by multi-body refinement in RELION, eLife 7 (2018) e36861. doi: 10.7554/eLife.36861.
Associated Data