Skip to main content
eLife logoLink to eLife
. 2018 Mar 7;7:e35383. doi: 10.7554/eLife.35383

cisTEM, user-friendly software for single-particle image processing

Timothy Grant 1,, Alexis Rohou 1,†,, Nikolaus Grigorieff 1,
Editor: Edward H Egelman2
PMCID: PMC5854467  PMID: 29513216

Abstract

We have developed new open-source software called cisTEM (computational imaging system for transmission electron microscopy) for the processing of data for high-resolution electron cryo-microscopy and single-particle averaging. cisTEM features a graphical user interface that is used to submit jobs, monitor their progress, and display results. It implements a full processing pipeline including movie processing, image defocus determination, automatic particle picking, 2D classification, ab-initio 3D map generation from random parameters, 3D classification, and high-resolution refinement and reconstruction. Some of these steps implement newly-developed algorithms; others were adapted from previously published algorithms. The software is optimized to enable processing of typical datasets (2000 micrographs, 200 k – 300 k particles) on a high-end, CPU-based workstation in half a day or less, comparable to GPU-accelerated processing. Jobs can also be scheduled on large computer clusters using flexible run profiles that can be adapted for most computing environments. cisTEM is available for download from cistem.org.

Research organism: None

Introduction

The three-dimensional (3D) visualization of biological macromolecules and their assemblies by single-particle electron cryo-microscopy (cryo-EM) has become a prominent approach in the study of molecular mechanisms (Cheng et al., 2015; Subramaniam et al., 2016). Recent advances have been primarily due to the introduction of direct electron detectors (McMullan et al., 2016). With the improved data quality, there is increasing demand for advanced computational algorithms to extract signal from the noisy image data and reconstruct 3D density maps from them at the highest possible resolution. The promise of near-atomic resolution (3–4 Å), where densities can be interpreted reliably with atomic models, has been realized by many software tools and suites (Frank et al., 1996; Hohn et al., 2007; Lyumkis et al., 2013; Punjani et al., 2017; Scheres, 2012; Tang et al., 2007; van Heel et al., 1996). Many of these tools implement a standard set of image processing steps that are now routinely performed in a single particle project. These typically include movie frame alignment, contrast transfer function (CTF) determination, particle picking, two-dimensional (2D) classification, 3D reconstruction, refinement and classification, and sharpening of the final reconstructions.

We have written new software called cisTEM to implement a complete image processing pipeline for single-particle cryo-EM, including all these steps, accessible through an easy-to-use graphical user interface (GUI). Some of these steps implement newly-developed algorithms described below; others were adapted from previously published algorithms. cisTEM consists of a set of compiled programs and tools, as well as a wxWidgets-based GUI. The GUI launches programs and controls them by sending specific commands and receiving results via TCP/IP sockets. Each program can also be run manually, in which case it solicits user input on the command line. The design of cisTEM, therefore, allows users who would like to have more control over the different processing steps to design their own procedures outside the GUI. To adopt this new architecture, a number of previously existing Fortran-based programs were rewritten in C++, including Unblur and Summovie (Grant and Grigorieff, 2015b), mag_distortion_correct (Grant and Grigorieff, 2015a), CTFFIND4 (Rohou and Grigorieff, 2015), and Frealign (Lyumkis et al., 2013). Additionally, algorithms described previously were added for particle picking (Sigworth, 2004), 2D classification (Scheres et al., 2005) and ab-initio 3D reconstruction (Grigorieff, 2016), sometimes with modifications to optimize their performance. cisTEM is open-source and distributed under the Janelia Research Campus Software License (http://license.janelia.org/license/janelia_license_1_2.html).

cisTEM currently does not support computation on graphical processing units (GPUs). Benchmarking of a hotspot identified in the global orientational search to determine particle alignment parameters showed that an NVIDIA K40 GPU performs approximately as well as 16 Xeon E5-2687W CPU cores after the code was carefully optimized for the respective hardware in both cases. Since CPU code is more easily maintained and more generally compatible with existing computer hardware, the potential benefit of GPU-adapted code is primarily the lower cost of a high-end GPU compared with a high-end CPU. We chose to focus on optimizing our code for CPUs.

Results

Movie alignment and CTF determination

Movie alignment and CTF determination are based on published algorithms previously implemented in Unblur and Summovie (Grant and Grigorieff, 2015b), and CTFFIND4 (Rohou and Grigorieff, 2015), respectively, and these are therefore only briefly described here. Unblur determines the translations of individual movie frames necessary to bring features (particles) visible in the frames into register. Each frame is aligned against a sum of all other frames that is iteratively updated until there is no further change in the translations. The trajectories along the x- and y-axes are smoothed using a Savitzky–Golay filter to reduce the possibility of spurious translations. Summovie uses the translations to calculate a final frame average with optional exposure filtering to take into account radiation damage of protein and maximize its signal in the final average. cisTEM combines the functionality of Unblur and Summovie into a single panel and exposes all relevant parameters to the user (Figure 1). Both programs were originally written in Fortran and have been rewritten entirely in C++.

Figure 1. Movie alignment panel of the cisTEM GUI.

Figure 1.

All Action panels provide background information on the operation they control, as well as a section with detailed explanations of all user-accessible parameters. All Action panels also have an Expert Options section that exposes additional parameters.

CTFFIND4 fits a calculated two-dimensional CTF to Thon rings (Thon, 1966) visible in the power spectrum calculated from either images or movies. The fitted parameters include astigmatism and, optionally, phase shifts generated by phase plates. When computed from movies, the Thon rings are often more clearly visible compared to Thon rings calculated from images (Figure 2; [Bartesaghi et al., 2014]). When selecting movies as inputs, the user can specify how many frames should be summed to calculate power spectra. An optimal value to amplify Thon rings would be to sum the number of frames that correspond to an exposure of about four electrons/Å2 (McMullan et al., 2015).

Figure 2. Thon ring pattern calculated for micrograph ‘0000’ of the high-resolution dataset of β-galactosidase (Bartesaghi et al., 2015) used to benchmark cisTEM.

Figure 2.

The left pattern was calculated from the average of non-exposure filtered and aligned frames while the right pattern was calculated using the original movie with 3-frame sub-averages. The pattern calculated using the movie shows significantly stronger rings compared to the other pattern.

Since our original description of the CTFFFIND4 algorithm (Rohou and Grigorieff, 2015), several significant changes were introduced. (1) The initial exhaustive search over defocus values can now be performed using a one-dimensional version of the CTF (i.e. with only two parameters: defocus and phase shift) against a radial average of the amplitude spectrum. This search is much faster than the equivalent search over the 2D CTF parameters (i.e., four parameters: two for defocus, one for astigmatism angle and one for phase shift) and can be expected to perform well except in cases of very large astigmatism (Zhang, 2016). Once an initial estimate of the defocus parameter has been obtained, it is refined by a conjugate gradient minimizer against the 2D amplitude spectrum, as done previously. In cisTEM, the default behavior is to perform the initial search over the 1D amplitude spectrum, but the user can revert to previous behavior by setting a flag in the ‘Expert Options’ of the ‘Find CTF’ Action panel. (2) If the input micrograph’s pixel size is smaller than 1.4 Å, the resampling and clipping of its 2D amplitude spectrum will be adjusted so as to give a final spectrum for fitting with an edge corresponding to 1/2.8 Å−1, to avoid all of the Thon rings being located near the origin of the spectrum, where they can be very poorly sampled. (3) The computation of the quality of fit (CCfit in [Rohou and Grigorieff, 2015]) is now computed over a moving window, similar to (Sheth et al., 2015), rather than at intervals delimited by nodes in the CTF. (4) Following background subtraction as described in Mindell and Grigorieff (2003), a radial, sine-edged mask is applied to the spectrum, and this masked version is used during search and refinement of defocus, astigmatism and phase shift parameters. The sine is 0.0 at the Fourier space origin, and 1.0 at a radius corresponding to 1/4 Å−1, and serves to emphasize high-resolution Thon rings, which are less susceptible to artefacts caused by imperfect background subtraction. For all outputs from the program (diagnostic image of the amplitude spectrum, 1D plots, etc.), the background-subtracted, but non-masked, version of the amplitude spectrum is used. (5) Users receive a warning if the box size of the amplitude spectrum and the estimated defocus parameters suggest that significant CTF aliasing occurred (Penczek et al., 2014).

Particle picking

Putative particles are found by matching to a soft-edged disk template, which is related to a convolution with Gaussians (Voss et al., 2009) but uses additional statistics based on an algorithm originally described by Sigworth (2004). The use of a soft-edged disk template as opposed to structured templates has two main advantages. It greatly speeds up calculation, enabling picking in ‘real time’, and alleviates the problem of templates biasing the result of all subsequent processing towards those templates (Henderson, 2013; Subramaniam, 2013; van Heel, 2013). Any bias that is introduced will be towards a featureless ‘blob’ and will likely be obvious if present.

Rather than fully describing the original algorithm by (Sigworth, 2004), we will emphasize here where we deviated from it. The user must specify three parameters: the radius of the template disk, the maximum radius of the particle, which sets the minimum distance between picks, and the detection threshold value, given as a number of standard deviations of the (Gaussian) distribution of scores expected if no particles were present in the input micrograph. Values of 1.0 to 6.0 for this threshold generally give acceptable results. All other parameters mentioned below can usually remain set to their default values.

Prior to matched filtering, micrographs are resampled by Fourier cropping to a pixel size of 15 Å (the user can override this by changing the ‘Highest resolution used in picking’ value from its default 30 Å), and then filtered with a high-pass cosine-edged aperture to remove very low-frequency density ramps caused by variations in ice thickness or uneven illumination.

The background noise spectrum of the micrograph is estimated by computing the average rotational power spectrum of 50 areas devoid of particles, and is then used to ‘whiten’ the background (shot +solvent) noise of the micrograph. Normalization, including CTF effects, and matched filtering are then performed as described (Sigworth, 2004), except using a single reference image and no principal components’ decomposition. When particles are very densely packed on micrographs, this approach can significantly over-estimate the background noise power so that users may find they have to use lower thresholds for picking. It might also be expected that under those circumstances, micrographs with much lower particle density will suffer from a higher rate of false-positive picks.

One difficulty in estimating the background noise spectrum of the micrograph is to locate areas devoid of particles without a priori knowledge of their locations. Our algorithm first computes a map of the local variance and local mean in the micrograph (computed over the area defined by the maximum radius given by the user [Roseman, 2004; Van Heel, 1982]) and the distribution of values of these mean and variance maps. The average radial power spectrum of the 50 areas of the micrograph with the lowest local variance is then used as an estimate of the background noise spectrum. Optionally, the user can set a different number of areas to be used for this estimate (for example if the density of particles is very high or very low) or use areas with local variances closest to the mode of the distribution of variances, which may also be expected to be devoid of particles.

Matched-filter methods are susceptible to picking high-contrast features such as contaminating ice crystals or carbon films. (Sigworth, 2004) suggests subtracting matched references from the extracted boxes and examining the remainder in order to discriminate between real particles and false positives. In the interest of performance, we decided instead to pick using a single artificial reference (disk) and to forgo such subtraction approaches. To avoid picking these kinds of artifacts, the user can choose to ignore areas with abnormal local variance or local mean. We find that ignoring high-variance areas often helps avoid edges of problematic objects, e.g. ice crystals or carbon foils, and that avoiding high- and low-mean areas helps avoid picking from areas within them, e.g. the carbon foil itself or within an ice crystal (Figure 3). The thresholds used are set to Mo+2FWHM for the variance and Mo±2FWHMfor the mean, where Mo is the mode (i.e. the most-commonly-occurring value) and FWHM the full width at half-maximum of the distribution of the relevant statistic. For micrographs with additional phase plate phase shifts between 0.1 and 0.9 π, where much higher contrast is expected, the variance threshold is increased to Mo+8FWHM. We have found that in favorable cases many erroneous picks can be avoided. Remaining false-positive picks are removed later during 2D classification.

Figure 3. Particle picking panel of the cisTEM GUI.

Figure 3.

The panel shows the preview mode, which allows interactive tuning of the picking parameters for optimal picking. The red circles overlaying the image of the sample indicate candidate particles. The picking algorithm avoids areas of high variance, such as the ice contamination visible in the image.

Because of our emphasis on performance, our algorithm can be run nearly instantaneously on a typical ~4K image, using a single processor. In the Action panel, the user is presented with an ‘Auto preview’ mode to enable interactive adjustment of the picking parameters (Figure 3). In this mode, the micrograph is displayed with optional and adjustable low-pass and high-pass filters, and the results of picking using the currently selected parameters are overlaid on top. Changing one or more of the parameters leads to a fast re-picking of the displayed micrograph, so that the parameters can be optimized in real-time. Once the three main parameters have been adjusted appropriately, the full complement of input micrographs can be picked, usually in a few seconds or minutes.

A possible disadvantage of using a single disk template exists when the particles to be picked are non-uniform in size or shape (e.g. in the case of an elongated particle). In this case, it may be expected that a single template would have difficulty in picking all the different types and views of particles present, and that in this case using a number of different templates would lead to a more accurate picking. In practice, we found that with careful optimization of the parameters, elongated particles and particles with size variation (Figure 3) were picked adequately.

The underlying implementation of the algorithm supports multiple references as well as reference rotation. These features may be exposed to the graphical user interface in future versions, for example enabling the use of 2D class averages as picking templates (Scheres, 2015).

2D classification

2D classification is a relatively quick and robust way to assess the quality of a single-particle dataset. cisTEM implements a maximum likelihood algorithm (Scheres et al., 2005) and generates fully CTF-corrected class averages that typically display clear high-resolution detail, such as secondary structure. Integration of the likelihood function is done by evaluating the function at defined angular steps dα that are calculated according to

dα=R/D (1)

where R is the resolution limit of the data and D is the diameter of the particle (twice the mask radius that is applied to the iteratively-refined class averages). cisTEM runs a user-defined number of iterations n defaulting to 20. To speed up convergence, the resolution limit is adjusted as a function of iteration cycle l (0 l <n):

R=Rstart+lRfinish-Rstart/n-1 (2)

where Rstart and Rfinish are user-defined resolution limits at the first and last iteration, defaulting to 40 Å and 8 Å, respectively. The user also sets K, the number of classes to calculate. Depending on this number and the number of particles N in the dataset, only a percentage p of the particles are included in the calculation. These particles are randomly reselected for each iteration and p is typically small, for example 0.1, in the first 10 iterations (p0-9), then increases to 0.3 for iteration 10 to 14 (p10-14) and finishes with five iterations including all data (p15-19):

p09={300K/N, 300K/N<11,300K/N1
p1014={0.3, p09<0.3p09, p090.3 (3)
p1519=1.

For example, for a dataset containing N=100,000 particles, p0-9=0.15, that is, 15% of the data will be used to obtain K=50 classes. Apart from speeding up the calculation, the stepwise increase of the resolution limit and the random selection of subsets of the data also reduce the chance of overfitting (see also the calculation of ab-initio 3D reconstructions and 3D refinement below) and, therefore, increase the convergence radius of the 2D classification algorithm.

For the calculation of the likelihood function, the particle images Xi are noise-whitened by dividing their Fourier transforms {Xi} by the square root of the radially average noise power spectrum, NPS:

{Xi}(g)={Xi}(g)/NPS(g) (4)

where g is the 2D reciprocal space coordinate and g=|g| its magnitude. The noise power spectrum is calculated from the boxed particle images using the area outside the circular mask set by the user according to the expected particle size. To increase accuracy, it is further averaged across 2000 randomly selected particles. The background (density outside the mask) is further normalized by adding a constant to each particle that yields a background average of zero.

Finally, at the beginning of each iteration, noise features in the class averages Ai are suppressed by resetting negative values below a threshold ti to the threshold:

ti=-0.3maxjAi,j (5)

where j runs over all pixels in average Ai.

3D refinement (FrealignX)

The refinement of 3D reconstructions in cisTEM uses a version of Frealign (Lyumkis et al., 2013) that was specifically designed to work with cisTEM. Most of Frealign’s control parameters are exposed to the user in the ‘Manual Refine’ Action panel (Figure 4). The ‘Auto Refine’ and ‘Ab-Initio’ panels also use Frealign but manage many of the parameters automatically (see below). Frealign’s algorithm was described previously (Grigorieff, 2007; Lyumkis et al., 2013) and this section will mostly cover important differences, including a new objective function used in the refinement, different particle weighting used in reconstructions, optional likelihood-based blurring, as well as new masking options.

Figure 4. Manual refinement panel with Expert Options exposed.

Figure 4.

Most of the parameters needed to run FrealignX can be accessed on this panel. The panel also allows application of a 3D mask, which can be imported as a Volume Asset.

Matched filter

To make Frealign compatible with cisTEM’s GUI, the code was completely rewritten in C++, and it will be referred to here as Frealign v10, or FrealignX. The new version makes use of a matched filter (McDonough and Whalen, 1995) to maximize the signal in cross correlation maps calculated between particle images and reference projections. This requires whitening of the noise present in the images and resolution-dependent scaling of the reference projections to match the signal in the noise-whitened images. Both can be achieved if the spectral signal-to-noise ratio (SSNR) of the data is known. As part of a 3D reconstruction, Frealign calculates the resolution-dependent PSSNR, the radially averaged SSNR present in the particle images before they are affected by the CTF (Sindelar and Grigorieff, 2012). Using PSSNR and the CTF determined for a particle, the SSNR in the particle image can be calculated as

SNRg=PSSNRg×CTF2g (6)

(as before, g is the 2D reciprocal space coordinate and g=g). Here, SNR is defined as the ratio of the variance of the signal and the noise. The Fourier transform {Xi} of the noise-whitened particle image X~i can then be calculated as

{Xi}(g)={Xi}(g)|{Xi}|r2(g)1+SNR(g) (7)

where {Xi} is the Fourier transform of the original image Xi, is the absolute value, and |{Xi}|r2 is the radially averaged spectrum of the squared 2D Fourier transform amplitudes of image Xi. To implement Equation (7), a particle image is first divided by its amplitude spectrum, which includes power from both signal and noise, and then multiplied by a term that amplifies the image amplitudes according to the signal strength in the image. The reference projection Ai can be matched by calculating

{Ai}(g)={Ai}(g)|{Ai}|r2(g)SNR(g)

Equation (8) scales the variance of the signal in the reference to be proportional to the measured signal-to-noise ratio in the noise-whitened images. The main term in the objective function Oϕ maximized in FrealignX is therefore given by the cross-correlation function

CC(ϕ)=Re(R0,R3{Ai(ϕ)}R1,R3{Xi})R1,R3{Ai(ϕ)}R1,R3{Xi} (9a)

where ϕ is a set of parameters describing the particle view, x,y position, magnification and defocus, Re is the real part of a complex number, is the Euclidean norm, i.e. the square root of the sum of the squared pixel values, and R1,R3{  } is the conjugate complex value of the Fourier transform R1,R3{  }. The subscripts R1 and R3 specify the low- and high-resolution limits of the Fourier transforms included in the calculation of Equation (9a), as specified by the user. To reduce noise overfitting, the user has the option to specify also a resolution range in which the absolute value of the cross terms in the numerator of Equation (9a) are used (Grigorieff, 2000; Stewart and Grigorieff, 2004), instead of the signed values (option ‘Signed CC Resolution Limit’ under ‘Expert Options’ in the ‘Manual Refine’ Action panel). In this case

CC(ϕ)=Re(R1,R2{Ai(ϕ)}R1,R2{Xi})+|Re(R2,R3{Ai(ϕ)}R2,R3{Xi})|R1,R3{Ai(ϕ)}R1,R3{Xi} (9b)

where R2 is specified by the ‘Signed CC Resolution Limit.’ The objective function also includes a term RϕΘ to restrain alignment parameters (Chen et al., 2009; Lyumkis et al., 2013; Sigworth, 2004), which currently only includes the x,y positions:

RϕΘ=-σ2Mx-x-22σx2+y-y-22σy2 (10)

where σ is the standard deviation of the noise in the particle image and Θ represents a set of model parameters including the average particle positions in a dataset, x- and y-, and the standard deviations of the x,y positions from the average values, σx and σy, and M is the number of pixels in the mask applied to the particle before alignment. The complete objective function is therefore

Oϕ=CCϕ+RϕΘ

The maximized values determined in a refinement are converted to particle scores by multiplication with 100.

CTF refinement

FrealignX can refine the defocus assigned to each particle. Given typical imaging conditions with current instrumentation (300 kV, direct electron detector), this may be useful when particles have a size of about 400 kDa or larger. Depending on the quality of the sample and images, these particles may generate sufficient signal to yield per-particle defocus values that are more accurate than the average defocus values determined for whole micrographs by CTFFIND4 (see above). Refinement is achieved by a simple one-dimensional grid search of a defocus offset applied to both defocus values determined in the 2D CTF fit obtained by CTFFIND4. FrealignX applies this offset to the starting values in a refinement, typically determined by CTFFIND4, and evaluates the objective function, Equation (11), for each offset. The offset yielding the maximum is then used to assign refined defocus values. In a typical refinement, the defocus offset is searched in steps of 50 Å, in a range of ±500 Å. In the case of β-galactosidase (see below), a single round of defocus refinement changed the defocus on average by 60 Å; the RMS change was 80 Å. For this refinement, the resolution for the signed cross terms equaled the overall refinement resolution limit (3.1 Å), i.e. no unsigned cross terms were used. The refinement produced a marginal improvement of 0.05 Å in the Fourier Shell Correlation (FSC) threshold of 0.143, suggesting that the defocus values determined by CTFFIND4 were already close to optimal. In a different dataset of rotavirus double-layer particles, a single round of defocus refinement changed the defocus on average by 160 Å; the RMS change was 220 Å. In this case, the refinement increased the resolution from ~3.0 Å to ~2.8 Å.

Masking

FrealignX has a 3D masking function to help in the refinement of structures that contain significant disordered regions, such as micelles in detergent-solubilized membrane proteins. To apply a 3D mask, the user supplies a 3D volume that contains positive and negative values. cisTEM will binarize this volume by zeroing all voxels with values less than or equal to zero, and setting all other voxels to 1, indicating the region of the volume that is inside the mask. A soft cosine-shaped falloff of specified width (e.g. 10 Å) is then applied to soften the edge of the masked region and avoid sharp edges when the mask is applied to a 3D reconstruction. The region of the reconstruction outside the mask can be set to zero (simple multiplication of the mask volume), or to a low-pass filtered version of the original density, optionally downweighted by multiplication by a scaling factor set by the user. At the edge of the mask, the low-pass filtered density is blended with the unfiltered density inside the mask to produce a smooth transition. Figure 5 shows the result of masking the reconstruction of an ABC transporter associated with antigen processing (TAP, [Oldham et al., 2016]). The mask was designed to contain only density corresponding to protein and the outside density was low-pass filtered at 30 Å resolution and kept with a weight of 100% in the final masked reconstruction. The combination of masking and low-pass filtering in this case keeps a low-pass filtered version of the density outside the mask in the reconstruction, including the detergent micelle. Detergent micelles can be a source of noise in the particle images because the density represents disordered material. However, at low, 20 to 30 Å resolution, micelles generate features in the images that can help in the alignment of the particles. In the case of TAP, this masking prevented noise overfitting in the detergent micelle and helped obtain a reconstruction at 4 Å resolution (Oldham et al., 2016).

Figure 5. 3D masking with low-pass filtering outside the mask.

Figure 5.

(A) Orthogonal sections through the 3D reconstruction of the transporter associated with antigen processing (TAP), an ABC transporter (Oldham et al., 2016). Density corresponding to the protein, as well as the detergent micelle (n-Dodecyl b-D-maltoside; highlighted with arrows), is visible. (B) Orthogonal sections through a 3D mask corresponding to the sections shown in A). The sharp edges of this mask are smoothed before the mask is applied to the map. (C) Orthogonal sections through the masked 3D reconstruction. The regions outside the mask are low-pass filtered at 30 Å resolution to remove high-resolution noise from the disordered detergent micelle, but keeping its low-resolution signal to help particle alignment.

3D reconstruction

In Frealign, a 3D reconstruction Vk of class average k and containing N images is calculated as (Lyumkis et al., 2013; Sindelar and Grigorieff, 2012)

Vk=1{i=1Nqikσi2(ϕi, wikCTFi{X^i})i=1Nqikσi2(ϕi,wikCTFi2 )+1/PSSNRk} (12)

where qik is the probability of particle i belonging to class k, σi is the standard deviation of the noise in particle image i, ϕi are its alignment parameters, wik the score-based weights (Equation (14), see below), CTFi the CTF of the particle image, (ϕi, ) the reconstruction operator merging data into a 3D volume according to alignment parameters ϕi, PSSNR the radially averaged particle SSNR derived from the FSC between half-maps (Sindelar and Grigorieff, 2012), X^i the noise-whitened image i, and 1{  } the inverse Fourier transform. For the calculation of the 3D reconstructions, as well as 3D classification (see below) the particle images are not whitened according to Equation (7). Instead, they are whitened using the radially- and particle-averaged power spectrum of the background around the particles:

{X^i}(g)={Xi}(g)|{B(Xi)}|r2(g) (13)

where BXi is a masked version of image Xi with the area inside a circular mask centered on the particle replaced with the average values at the edge of the mask, and scaled variance to produce an average pixel variance of 1 in the whitened image X^i. Using the procedure in Equation (13) has the advantage that whitening does not depend on the knowledge of the SSNR of the data, and reconstructions can therefore be calculated even when the SSNR is not known.

Score-based weighting

In previous versions of Frealign, resolution-dependent weighting was applied to the particle images during reconstruction (the Frealign parameter was called ‘PBC’, (Grigorieff, 2007)). The weighting function took the form of a B-factor dependent exponential that attenuates the image data at higher resolution. FrealignX still uses B-factor weighting but the weighting function is now derived from the particle scores (see above) as

w(score, g)=eBSC4(scorescore)g2. (14)

BSC converts the difference between a particle score and score¯, the score average, into a B-factor. Setting BSC to zero will turn off score-based particle weighting. Scores typically vary by about 10, and values for BSC that produce reasonable discrimination between high-scoring and low-scoring particles are between 2 and 10 Å2, resulting in B-factor differences between particles of 20 to 100 Å2.

3D classification 

FrealignX uses a maximum-likelihood approach for 3D classification (Lyumkis et al., 2013). Assuming that all images were noise-whitened according to Equation (13), which scales the variance of each image such that the average standard deviation of the noise in a pixel is 1, the probability density function (PDF) of observing image Xi, given alignment parameters ϕi and reconstruction Vk, is calculated as (Lyumkis et al., 2013)

Γ(Xi|ϕik,Vk)=(12π)Mexp[X^iCTFi(Vk,ϕik)M22]γ(ϕik|Θk). (15)

As before, ϕik are the alignment parameters (usually just Euler angles and x,y shifts) determined for image i with respect to class average k, is the projection operator producing an aligned 2D projection of reconstruction Vk according to parameters ϕik, X^i-CTFiVk,ϕikM~2 is the sum of the squared pixel value differences between whitened image X^i and the reference projection inside a circular mask defining the area of the particle with user-defined diameter, M~ is the number of pixels inside this mask, and γϕikΘk is a hierarchical prior describing the probability of observing alignment parameters ϕik given model parameters Θk (see Equation 10). Equation (15) does not include marginalization over alignment parameters. Marginalization could be added to improve classification when particle alignments suffer from significant errors. However, this is currently not implemented in cisTEM. Given the joint probability, Equation (15), determined in a refinement, the probability qik of particle i belonging to class k can be updated as (Lyumkis et al., 2013)

qik=ΓXiΘik,Vkπkk=1KΓXiΘik,Vkπk (16)

where the summation in the denominator is taken over all classes and the average probabilities πk for a particle to belong to class k are given by the average values of qik determined in a prior iteration, calculated for the entire dataset of N particles:

πk=1Ni=1Nqik. (17)

An example of 3D classification is shown in Figure 6 for F1FO-ATPase, revealing different conformational states of the γ subunit (Zhou et al., 2015).

Figure 6. 3D classification of a dataset of F1FO-ATPase, revealing different conformational states (reproduced from Figure 6A and B in Zhou et al., 2015).

Figure 6.

Sections through the F1 domain showing the γ subunit (arrows) in three different states related by 120° rotations are shown on the left. A surface rendering of the map corresponding to State 1a is shown on the right. Scale bars, 25 Å.

Focused classification

3D classification can be improved by focusing on conformationally- or compositionally-variable regions of the map. To achieve this, a mask is applied to the particle images and reference projections, the area of which is defined as the projection of a sphere with user-specified center (within the 3D reconstruction) and radius. This 2D mask is therefore defined independently for each particle, as a function of its orientation. When using focused classification, M~ in Equation (15) is adjusted to the number of pixels inside the projected mask and the sum of the squared pixel value differences in Equation (15) is limited to the area of the 2D mask. By applying the same mask to image and reference, only variability inside the masked region is used for 3D classification. Other regions of the map are ignored, leading to a ‘focusing’ on the region of interest. The focused mask also excludes noise contained in the particle images outside the mask and therefore improves classification results that often depend on detecting small differences between particles and references. A typical application of a focused mask is in the classification of ribosome complexes that may exhibit localized conformational and/or compositional variability, for example the variable conformations of an IRES (Abeyrathne et al., 2016) or different states of tRNA accommodation (Loveland et al., 2017).

Likelihood-based blurring

In some cases, the convergence radius of refinement can be improved by blurring the reconstruction according to a likelihood function. This procedure is similar to the maximization step in a maximum likelihood approach (Scheres, 2012). The likelihood-blurred reconstruction is given by

Vkn=i=1N1σi2ϕαxyΓ(Xi|ϕi,Vkn1)(ϕi, wiCTFiXi)dϕαxyi=1Nqikσi2(ϕi,wiCTFi2 )+1/PSSNRk (18)

where, in the case of FrealignX, ϕαxy only includes the x,y particle positions and in-plane rotation angle α, which are a subset of the alignment parameters ϕi, and Vkn-1 is the reconstruction from an earlier refinement iteration. As before, ΓXiϕi,Vkn-1 is the probability of observing image i, given alignment parameters ϕi and reconstruction Vkn-1. Integration over these three parameters can be efficiently implemented and, therefore, does not produce a significant additional computational burden.

Resolution assessment

The resolution of reconstructions generated by FrealignX is assessed using the FSC criterion (Harauz and van Heel, 1986) using the 0.143 threshold (Rosenthal and Henderson, 2003). FSC curves in cisTEM are calculated using two reconstructions (‘half-maps’) calculated either from the even-numbered and odd-numbered particles, or by dividing the dataset into 100 equal subsets and using the even- and odd-numbered subsets to calculate the two reconstructions (in the cisTEM GUI, the latter is always used). The latter method has the advantage that accidental duplication of particles in a stack is less likely to affect the FSC calculation. All particles are refined against a single reference and, therefore, the calculated FSC values may be biased towards higher values (Grigorieff, 2000; Stewart and Grigorieff, 2004). This bias extends slightly beyond the resolution limit imposed during refinement, by approximately 2/Dmask, where Dmask is the mask radius used to mask the reconstructions (see above). During auto-refinement (see below), the resolution limit imposed during refinement is carefully adjusted to stay well below the estimated resolution of the reconstruction and the resolution estimate is therefore unbiased (Scheres and Chen, 2012). However, users have full control over all parameters during manual refinement and will have to make sure that they do not bias the resolution estimate by choosing a resolution limit that is close to, or higher than, the estimated resolution of the final reconstruction. Calculated FSC curves are smoothed using a Savitzky–Golay cubic polynomial that reduces the noise often affecting FSC curves at the high-resolution end.

The FSC calculated between two density maps is dependent on the amount of solvent included inside the mask applied to the maps. A larger mask that includes more solvent background will yield lower FSC values than a tighter mask. To obtain an accurate resolution estimate in the region of the particle density, one possibility is to apply a tight mask that closely follows the boundary of the particle. This approach bears the risk of generating artifacts because the particle boundary is not always well defined, especially when the particle includes disordered domains that generate weak density in the reconstruction. The approach in Frealign avoids tight masking and instead calculates an FSC curve using generously masked density maps, corrected for the solvent content inside the mask (Sindelar and Grigorieff, 2012). The corrected FSC curve is referred to as Part_FSC and is calculated from the uncorrected FSCuncor as (Oldham et al., 2016)

Part_FSChalfmaps=fFSCuncor1+(f1)FSCuncor, (19)

where f is the ratio of mask volume to estimated particle volume. The particle volume can be estimated from its molecular mass Mw as 30.81DaMw (Matthews, 1968). FSC curves obtained with the generous masking and subsequent solvent correction yield resolution estimates that are very close to those obtained with tight masking (Figure 7C). Equation (19) assumes that both maps have similar SSNR values, as is normally the case for the two reconstructions calculated from two halves of the dataset, indicated by the subscript half-maps. If one of the maps does not contain noise from solvent background, for example when calculating the FSC between a reconstruction and a map derived from an atomic model, the solvent-corrected FSC is given as

Part_FSCmodelmap=fFSCuncor21+(f1)FSCuncor2. (20)
Figure 7. Processing results of the β-galactosidase dataset (Bartesaghi et al., 2015) used to benchmark cisTEM.

Figure 7.

(A) Different stages of the ab-initio reconstruction procedure, starting from a reconstruction from randomly assigned Euler angles. The process takes less than an hour to complete on a high-end CPU-based workstation. (B) High-resolution detail of the refined β-galactosidase reconstruction with an average resolution of 2.2 Å, showing sidechain details for most amino acids. (C) FSC plots for the refined β-galactosidase reconstruction. The black curve was calculated using a tight mask applied to the half maps (Masked_FSC). A correction for potential masking artifacts (Chen et al., 2013) did not lead to adjustments of this curve. The red curved was calculated with a more generous spherical mask and adjusted for the solvent background within that mask (Part_FSC, Equation (19)). The resolution limit of 3.1 Å, which was not exceeded during refinement, as well as the FSC = 0.143 threshold are indicated by lines.

Figure 7—source data 1. Source data for the curves shown in Figure 7C.
DOI: 10.7554/eLife.35383.009

Speed optimization

FrealignX has been optimized for execution on multiple CPU cores. Apart from using optimized library functions for FFT calculation and vector multiplication (Intel Math Kernel Library), the processing speed is also increased by on-the-fly cropping in real and reciprocal space of particle images and 3D reference maps. Real-space cropping reduces the interpolation accuracy in reciprocal space and is therefore limited to global parameter searches that do not require the highest accuracy in the calculation of search projections. Reciprocal-space cropping is used whenever a resolution limit is specified by the user or in an automated refinement (ab-initio 3D reconstruction and auto-refinement). For the calculation of in-plane rotated references, reciprocal-space padding is used to increase the image size four-fold, allowing fast nearest-neighbor resampling in real space with sufficient accuracy to produce rotated images with high fidelity.

Ab-initio 3D reconstruction

Ab-initio reconstruction offers a convenient way to proceed from single particle images to a 3D structure when a suitable reference is not available to initialize 3D reconstruction and refinement. Different ab-initio methods have been described (Hohn et al., 2007; Punjani et al., 2017; Reboul et al., 2018) and cisTEM’s implementation follows a strategy published originally by (Grigorieff, 2016). It is based on the premise that iterative refinement of a reconstruction initialized with random angular parameters is likely to converge on the correct structure if overfitting is avoided and the refinement proceeds in small steps to reduce the chance of premature convergence onto an incorrect structure. The procedure is implemented as part of cisTEM’s GUI and uses FrealignX to perform the refinements and reconstructions.

After initialization with random angles, cisTEM performs a user-specified number of global alignment parameter searches, recalculating the reconstruction after each search and applying an automatic masking procedure to it before the next global search. Similar to 2D classification (see above), only a randomly selected subset of the data is used in each iteration and the resolution limit applied during the search is increased with every iteration. The number of iterations n defaults to 40, the starting and final resolution limits Rstart and Rfinish default to 20 Å and 8 Å, respectively, and the starting and final percentage of included particles in the reconstruction, pstart and pfinish default to 2500K/N and 10,000K/N, respectively (results larger than one are reset to 1), with K the number of 3D classes to be calculated as specified by the user, and N the number of particles in the dataset. If symmetry is applied, N is replaced by NOsym where Osym is the number of asymmetric units present in one particle. The resolution limit is then updated in each iteration l as in Equation (2), and the percentage is updated as

p=pstart+l(pfinishpstart)/(n1)

again resetting results larger than 1 to 1. cisTEM actually performs a global search for a percentage 3p of the particle stack, that is, three times as many particles as are included in the reconstructions for each iteration. The particles included in the reconstructions are then chosen to be those with the highest scores as calculated by FrealignX.

The global alignment parameters are performed using the ‘general’ FrealignX procedure with the following changes. Firstly, the PSSNR is not directly estimated from the FSC calculated at each round. Instead, for the first three iterations, a default PSSNR is calculated based on the molecular weight. From the fourth iteration onwards, the PSSNR is calculated from the FSC, however if the calculated PSSNR is higher than the default PSSNR, the default PSSNR is taken instead. This is done in order to avoid some of the overfitting that will occur during refinement. Secondly, during a normal global search the top h (where h defaults to 20) results of the grid search are locally refined, and the best locally refined result is taken. In the ab-initio procedure, however, the result of the global search for a given particle image is taken randomly from all results that have a score which lies in the top 15% of the difference between the worst score and the best score.

During the reconstruction steps, the calculated σ for each particle is reset to 1 prior to 3D reconstruction and score weighting is disabled. This is done because the σ and score values are not meaningful until an approximately correct solution is obtained.

The reconstructions are automatically masked before each new refinement iteration to suppress noise features that could otherwise be amplified in subsequent iterations. The same masking procedure is also applied during auto-refinement (see below). It starts by calculating the density average ρ- of the reconstruction and resetting all voxel values below ρ- to ρ-. This thresholded reconstruction is then low-pass filtered at 50 Å resolution and turned into a binary mask by setting densities equal or below a given threshold t to zero and all others to 1. The threshold is calculated as

t=ρ-filtered+0.03ρ-max_500-ρ-filtered (22)

where ρ-filtered is the density average of the low-pass filtered map and ρ-max_500 is the average of the 500 highest values in the filtered map. The largest contiguous volume in this binarized map is then identified and used as a mask for the original thresholded reconstruction, such that all voxels outside of this mask will be set to ρ-. Finally, a spherical mask, centered in the reconstruction box, is applied by resetting all densities outside the mask to zero.

The user has the option to repeat the ab-initio procedure multiple times using the result from the previous run as the starting map in each new run, to increase the convergence radius if necessary. In the case of symmetric particles, the default behavior is to perform the first 2/3rds of the iterations without applying symmetry. The non-symmetrized map is then aligned to the expected symmetry axes and the final 1/3rd of the iterations are carried out with the symmetry applied. This default behavior can be changed by the user such that symmetry is always applied, or is never applied.

Alignment of the model to the symmetry axes is achieved using the following process. A brute force grid search over rotations around the x, y and z axes is set up. At each position on the grid the 3D map is rotated using the current x, y and z parameters, and then its projection along Euler angle (0, 0, 0) is calculated. All of the symmetry-related projections are then also calculated, and for each one a cross-correlation map is calculated using the original projection as a reference, and the peak within this map is found. The sum of all peaks from all symmetry-related directions is taken and the x,y,z rotation that most closely aligns the original 3D map along the symmetry axes should provide the highest peak sum. To improve robustness, this process is repeated for two additional angles (−45,–45, −45 and 15, 70,–15) that were chosen with the aim of including different-looking areas when the map to be aligned is unusual in some way. The x,y,z rotation that results in the largest sum of all peaks, over all three angles, is taken as the final rotation result. Shifts for this rotation are then calculated based on the found 2D x,y shifts between the initial and symmetry-related projections, with the z shift being set to 0 for C symmetries. The symmetry alignment is also included as a command-line program, which can be used to align a volume to the symmetry axis when the ab-initio is carried out in C1 only, or when using a reference obtained by some other means.

Automatic refinement

Like ab-initio 3D reconstruction, auto-refinement makes use of randomly selected subsets of the data and of an increasing resolution limit as refinement proceeds. However, unlike the ab-initio procedure, the percentage of particles pl and the resolution limit Rl used in iteration l depend on the resolution of the reconstructions estimated in iteration l-1. When the estimated resolution improved in the previous cycle,

pl=maxpR,pl-1 (23)

with

pR=8000Ke75/Rl-12/N (24)

where K is the number of 3D classes to be calculated and N the number of particles in the dataset. As before, if the particle has symmetry, N is replaced by NOsym where Osym is the number of asymmetric units present in one particle. If the calculated pl exceeds 1, it is reset to 1. The resolution limit is estimated as

R=FSC0.5-2/Dmask (25)

where FSC0.5 is the point at which the FSC, unadjusted for the solvent within the mask (see above) crosses the 0.5 threshold and Dmask is the user-specified diameter of the spherical mask applied to the 3D reference at the beginning of each iteration, and to the half-maps used to calculate the FSC. The term 2/Dmask accounts for correlations between the two half-maps due to the masking (see above). When the resolution did not improve in the previous iteration,

pl=1.5pl-1 (26)

(reset to one if resulting in a value larger than 1). At least five refinement iterations are run and refinement stops when pl reaches 1 (all particles are included) and there was no improvement in the estimated resolution for the last three iterations.

If multiple classes are refined, the resolution limit in Equation (25) is set independently for each class, however the highest resolution used for classification is fixed at 8 Å. At least nine iterations are run and refinement stops when pl reaches 1, the average change in the particle occupancy in the last cycle was 1% or less, and there was no improvement in the estimated resolution for the last three iterations.

In a similar manner to the ab-initio procedure, σ values for each particle are set to one and score weighting is turned off. This is done until the refinement resolution is better than 7 Å, at which point it is assumed the model is of a reasonable quality.

Map sharpening

Most single-particle reconstructions require some degree of sharpening that is usually achieved by applying a negative B-factor to the map. cisTEM includes a map sharpening tool that allows the application of an arbitrary B-factor. Additionally, maps can be sharpened by whitening the power spectrum of the reconstruction beyond a user-specified resolution (the default is 8 Å). The whitening amplifies terms at higher resolution similar to a negative B-factor but avoids the over-amplification at the high-resolution end of the spectrum that sometimes occurs with the B-factor method due to its exponential behavior. Whitening is applied after masking of the map, either with a hollow spherical mask of defined inner and outer radius, or with a user-defined mask supplied as a separate 3D volume. The masking removes background noise and makes the whitening of the particle density more accurate. Both methods can be combined in cisTEM, together with a resolution limit imposed on the final reconstruction. The whitened and B-factor-sharpened map can optionally be filtered with a figure-of-merit curve calculated using the FSC determined for the reconstruction (Rosenthal and Henderson, 2003; Sindelar and Grigorieff, 2012).

GUI design and workflow

cisTEM’s GUI required extensive development because it is an integral part of the processing pipeline. GUIs have become more commonplace in cryo-EM software tools to make them more accessible to users (Conesa Mingo et al., 2018; Desfosses et al., 2014; Moriya et al., 2017; Punjani et al., 2017; Scheres, 2012; Tang et al., 2007). Many of the interfaces are designed as so-called wrappers of command-line driven tools, i.e. they take user input and translate it into a command line that launches the tool. Feedback to the user takes place by checking output files, which are also the main repository of processing results, such as movie frame alignments, image defocus measurements and particle alignment parameters. As processing strategies become more complex and the number of users new to cryo-EM grows, the demands on the GUI increase in the quest for obtaining the best possible results. Useful GUI functions include guided user input (so-called wizards) that adjust to specific situations, graphical presentation of relevant results, user interaction with running processes to allow early intervention and make adjustments, tools to manipulate data (e.g. masking), implementation of higher-level procedures that combine more primitive processing steps to achieve specific goals, and a global searchable database that keeps track of all processing steps and result. While some of these functions can be or have been implemented in wrapper GUIs, the lack of control of these GUIs over the data and processes makes a reliable implementation more difficult. For example, keeping track of results from multiple processing steps, some of them perhaps repeated with different parameters or run many times during an iterative refinement, can become challenging if each step produces a separate results file. Communicating with running processes via files can be slow and is sometimes unreliable due to file system caching. Communication via files may complicate the implementation of higher-level procedures, which rely on the parsing of results from the more primitive processing steps.

The cisTEM GUI is more than a wrapper as it implements some of the new algorithms in the processing pipeline directly, adjusting the input of running jobs as the refinement proceeds. It enables more complex data processing strategies by tracking all results in a single searchable database. All processing steps are run and controlled by the GUI, which communicates with master and slave processes through TCP/IP. cisTEM uses an SQL database, similar to Appion (Lander et al., 2009), to store all results (except image files), offers input functions that guide the user or set appropriate defaults, and implements more complex procedures to automate processing where possible. cisTEM’s design is flexible to allow execution in many different environments, including single workstations, multiple networked workstations and large computer clusters.

User input and the display of results is organized into different panels that make up cisTEM’s GUI, each panel dedicated to specific processing steps (for examples, see Figures 1, 3 and 4). This design guides users through a standard workflow that most single particle projects follow: movie alignment, CTF determination, particle picking, 2D classification, 3D reconstruction, refinement and classification, and sharpening of the final reconstructions. Three types of panels exist, dealing with Assets, Actions and Results. Assets are mostly data that can be used in processing steps called Actions. They include Movies, Images, Particle Positions and Volumes. One type of Asset, a Refinement Package, defines the data and parameters necessary to carry out refinement of a 3D structure (or a set of structures if 3D classification is done), it contains a particle stack, as well as information about the sample (e.g. particle size and molecular weight) along with parameters for each particle (e.g. orientations and defocus values). Actions comprise the above mentioned workflow steps, with additional options for ab-initio 3D reconstruction, as well as automatic and manual 3D refinement to enable users to obtain the best possible results from their data. The results of most of these Actions are stored in the database and can be viewed in the related Results panels, which display relevant data necessary to evaluate the success of each processing step. The option to sort and select results by a number of different metrics is available in the movie alignment and CTF estimation Results panels. For example images can be sorted/selected based on the CTF fit resolution (Rohou and Grigorieff, 2015). Movie alignment, 3D refinement and reconstruction also produce new Image and Volume Assets, respectively.

Importing or generating new Assets is accomplished by wizards that solicit the necessary user input and perform checks were possible to avoid nonsensical input. In the more complex case of creating a new Refinement Package Asset, the wizard allows the specification of input data, for example based on particle picking results or the selection of 2D and 3D classes. Once an Action has been launched, results are displayed as soon as they become available, together with an overall progress bar, giving users an estimate of how long a processing step will take and of whether the results are as expected. If desired, an Action can be aborted and restarted with a different set of parameters, or the Action can be run again after regular termination to test different parameters. In the latter case, all prior results remain accessible and users can specify those to be used for the next step in the workflow. This provides users with the flexibility to pick and choose the best results in cases where different parts of a dataset require different settings to yield optimal results.

Parallelization

cisTEM uses a home-grown scheme to accelerate processing in parallel environments. Image processing of single-particle data is an embarrassingly parallel problem, i.e. the parallelization of most tasks can be achieved simply by dividing the data to be processed into smaller chunks that are each processed by a separate program thread, without the need for inter-process communication. Only certain steps require merging of data, such as the calculation of a 3D reconstruction from the entire dataset. cisTEM parallelizes processing steps by running multiple instances of the same program, each dealing with a subset of the data, then directly communicating with the launched processes over TCP/IP sockets. This enables the cisTEM GUI to distribute jobs and receive results in real time. Communication is directed through a ‘manager’ process, which enables jobs to be run on a cluster, while the GUI itself can run on a local workstation.

How each copy of the program is run is specified by the user by setting up a ‘Run Profile’. This profile is a user defined command, or script that will be run to launch the job, and is designed to be flexible to enable the user to set up parallelization in many different environments. For example, users can design profiles to run on multiple machines via SSH, or to submit to a cluster (e.g. using qsub) etc., or even merge the two in a single profile. One disadvantage of this system is that it may be difficult to create profiles for clusters that require many jobs to be submitted using one command.

Another advantage of using a home-grown scheme over existing schemes (e.g. MPI) occurs when jobs are run on a multi-node computing cluster. In this case, jobs will complete even if the full number of requested processors is not available. For example, if a user requests 300 CPUs for a processing step but only 100 are available, cisTEM launches 300 jobs of which 200 will remain in the job scheduler’s queue. Data processing starts immediately with the 100 jobs that are allowed to run and will complete even if the remaining jobs never run. In such a scenario, an MPI-based job could only run when 300 CPUs become available, potentially delaying execution. In the few cases were a step requires merging of an entire dataset, for example in a 3D reconstruction, parallelization is achieved by calculating multiple intermediate 3D reconstructions for subsets of the data, dumping the intermediate reconstructions to disk and merging them after all reconstruction jobs have finished. It can therefore help to designate a fast disk as a scratch disk to allow rapid dumping and reading of the relatively large files (200 MB – 5 GB).

Benchmarking with β-galactosidase

A high-resolution dataset of β-galactosidase (Bartesaghi et al., 2015) was used to benchmark Relion 2 (Kimanius et al., 2016) and is also used here to illustrate the workflow of cisTEM and assess the time for the completion of different processing steps. The entire dataset was downloaded from the EMPIAR database (Iudin et al., 2016) and consists of 1539 movies containing 38 frames, recorded at super-resolution on a K2 Summit camera (Gatan, Inc., Pleasanton, CA) and stored locally as tif files using LZW compression (the conversion to tiff and compression was performed by mrc2tif (Mastronarde and Held, 2017)). The pixel size of the super-resolution frames was 0.3185 Å, and images were binned to a pixel size of 0.75 Å after movie processing. For 2D classification and ab-initio 3D reconstruction, particles were boxed using 384 × 384 pixel boxes. For auto- and manual refinement, the particles were re-boxed into 648 × 648 pixel boxes (boxing is part of the creation of Refinement Packages, see above). For all processing steps, a Dell Precision T7910 workstation containing two E5-2699 v4 Xeon processors with a total of 44 cores was used. Processing parameters were left on default settings except for CTF determination, which was performed at 3.5 Å resolution using the movie averages instead of the frames, and particle picking, which used optimized parameters based on previewing a few selected images (Figure 3). The resolution limit during refinement (auto, manual and CTF) never exceeded 3.1 Å. The data were stored on a local SSD Raid 0 disk for fast access. Table 1 lists the timings of the different processing steps using all 44 CPU cores. Results obtained at different points in the workflow are shown in Figure 7.

Table 1. Benchmarking of cisTEM using a high-resolution dataset of β-galactosidase (Bartesaghi et al., 2015).

Processing step Details Time (hours)
Movie processing 1539 movies, 38 frames, super-resolution 1.1
CTF determination Using aligned movie average as input 0.1
Particle picking 131,298 particles 0.1
2D classification 50 classes, 28 selected with 119,523 particles 0.8
Ab initio 3D reconstruction 40 iterations 0.8
Auto refinement 8 iterations, final resolution 2.2 Å 1.4
Manual refinement 1 iteration (incl. defocus), final resolution 2.2 Å 0.4
Total 4.7

Discussion

The implementation of a complete image processing workflow in cisTEM offers users a uniform experience and guarantees smooth transitions between processing steps. It also helps developers maintain the software as all the tools and algorithms are developed in-house.

The main focus of cisTEM is on the processing of single-particle cryo-EM data and high-resolution 3D reconstruction. Future releases of cisTEM may include on-the-fly processing of data as it is collected, particle-based movie alignment, support for helical particles, improved 3D masking tools, more reliable resolution and quality indicators, as well as miscellaneous tools such as the determination of the detective quantum efficiency of electron detectors.

Since cisTEM does not rely on third-party libraries, such as Python, MPI or CUDA, that usually have to be installed and compiled separately on the target system, ready-to-run binaries can be made available for download that are optimized for different architectures. Using the wxWidgets library also means that cisTEM can be compiled for different operating systems, including Linux, Windows and OSX. Using a configure script, different options for the fast Fourier transforms (FFTs) can be specified, including the FFTW (http://www.fftw.org) and Intel MKL (http://software.intel.com/en-us/mkl) libraries. The downloadable binaries are statically linked against the MKL library as this exhibits superior speeds compared to the FFTW library on Intel-based CPUs.

While the lack of support for GPUs simplifies the installation and execution of cisTEM, it can also be a limitation on workstations that are optimized for GPU-accelerated code. These workstations often do not have many CPU cores and, therefore, cisTEM will run significantly more slowly than code that can take advantage of the GPU hardware. Users who would like to run both CPU and GPU-optimized software may therefore have to invest in both types of hardware.

Materials and methods

Development of cisTEM

The entire cisTEM image processing package was written in C++ using the wxWidgets toolkit (http://wxwidgets.org) to implement the GUI, as well as the libtiff library (http://www.libtiff.org) to support the tiff image format, the SQLite library (https://sqlite.org) to implement the SQL database, and Intel’s MKL library (http://software.intel.com/en-us/mkl) for the calculation of Fourier transforms and vector products. Optionally, cisTEM can also be linked against the FFTW library (http://www.fftw.org) to replace the MKL library. The code was written and edited using Eclipse (http://www.eclipse.org), and GitHub (http://github.com) was used for version control. This code is available in Source code 1.

Benchmark dataset

The performance of cisTEM was benchmarked using a cryo-EM dataset of β-galactosidase (Bartesaghi et al., 2015), entry EMPIAR-10061 in the EMPIAR database (Iudin et al., 2016).

Image and data formats

cisTEM stores all image data using the MRC format (Crowther et al., 1996). Additionally, particle parameters can be imported from, and exported to Frealign (Grigorieff, 2016) and Relion (Scheres, 2012).

Acknowledgements

The authors are grateful for feedback from early testers of cisTEM, including Ruben Diaz-Avalos, Sarah Loerch, Priyanka Abeyrathne, Peter Rickgauer, Ben Himes, Andrei Korostelev, Anna Loveland, Gabriel Demo, Jue Chen, Dmitry Lyumkis, Hiro Furukawa, Wei Lu and Juan Du.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Timothy Grant, Email: tim@tgrant.co.uk.

Alexis Rohou, Email: a.rohou@gmail.com.

Nikolaus Grigorieff, Email: niko@grigorieff.org.

Edward H Egelman, University of Virginia, United States.

Funding Information

This paper was supported by the following grant:

  • Howard Hughes Medical Institute to Timothy Grant, Alexis Rohou, Nikolaus Grigorieff.

Additional information

Competing interests

Reviewing editor, eLife.

No competing interests declared.

Author contributions

Conceptualization, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing.

Conceptualization, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing.

Conceptualization, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Additional files

Source code 1. Source code for cisTEM 1.0 beta.
elife-35383-code1.gz (6.2MB, gz)
DOI: 10.7554/eLife.35383.011
Transparent reporting form
DOI: 10.7554/eLife.35383.012

Major datasets

The following previously published dataset was used:

Bartesaghi A, author; Merk A, author; Banerjee S, author; Matthies D, author; Wu X, author; Milne JL, author; Subramaniam S, author. 2.2 A resolution cryo-EM structure of beta-galactosidase in complex with a cell-permeant inhibitor. 2015 https://www.ebi.ac.uk/pdbe/emdb/empiar/entry/10061 Publicly available at the Electron Microscopy Public Image Archive (accession no. 10061)

References

  1. Abeyrathne PD, Koh CS, Grant T, Grigorieff N, Korostelev AA. Ensemble cryo-EM uncovers inchworm-like translocation of a viral IRES through the ribosome. eLife. 2016;5:e14874. doi: 10.7554/eLife.14874. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bartesaghi A, Matthies D, Banerjee S, Merk A, Subramaniam S. Structure of β-galactosidase at 3.2-Å resolution obtained by cryo-electron microscopy. PNAS. 2014;111:11709–11714. doi: 10.1073/pnas.1402809111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bartesaghi A, Merk A, Banerjee S, Matthies D, Wu X, Milne JL, Subramaniam S. 2.2 A resolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor. Science. 2015;348:1147–1151. doi: 10.1126/science.aab1576. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chen JZ, Settembre EC, Aoki ST, Zhang X, Bellamy AR, Dormitzer PR, Harrison SC, Grigorieff N. Molecular interactions in rotavirus assembly and uncoating seen by high-resolution cryo-EM. PNAS. 2009;106:10644–10648. doi: 10.1073/pnas.0904024106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Chen S, McMullan G, Faruqi AR, Murshudov GN, Short JM, Scheres SH, Henderson R. High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy. 2013;135:24–35. doi: 10.1016/j.ultramic.2013.06.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cheng Y, Grigorieff N, Penczek PA, Walz T. A primer to single-particle cryo-electron microscopy. Cell. 2015;161:438–449. doi: 10.1016/j.cell.2015.03.050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Conesa Mingo P, Gutierrez J, Quintana A, de la Rosa Trevín JM, Zaldívar-Peraza A, Cuenca Alba J, Kazemi M, Vargas J, Del Cano L, Segura J, Sorzano COS, Carazo JM. Scipion web tools: Easy to use cryo-EM image processing over the web. Protein Science. 2018;27:269–275. doi: 10.1002/pro.3315. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Crowther RA, Henderson R, Smith JM. MRC image processing programs. Journal of Structural Biology. 1996;116:9–16. doi: 10.1006/jsbi.1996.0003. [DOI] [PubMed] [Google Scholar]
  9. Desfosses A, Ciuffa R, Gutsche I, Sachse C. SPRING - an image processing package for single-particle based helical reconstruction from electron cryomicrographs. Journal of Structural Biology. 2014;185:15–26. doi: 10.1016/j.jsb.2013.11.003. [DOI] [PubMed] [Google Scholar]
  10. Frank J, Radermacher M, Penczek P, Zhu J, Li Y, Ladjadj M, Leith A. SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. Journal of Structural Biology. 1996;116:190–199. doi: 10.1006/jsbi.1996.0030. [DOI] [PubMed] [Google Scholar]
  11. Grant T, Grigorieff N. Automatic estimation and correction of anisotropic magnification distortion in electron microscopes. Journal of Structural Biology. 2015a;192:204–208. doi: 10.1016/j.jsb.2015.08.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Grant T, Grigorieff N. Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6. eLife. 2015b;4:e06980. doi: 10.7554/eLife.06980. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Grigorieff N. Resolution measurement in structures derived from single particles. Acta Crystallographica Section D Biological Crystallography. 2000;56:1270–1277. doi: 10.1107/S0907444900009549. [DOI] [PubMed] [Google Scholar]
  14. Grigorieff N. FREALIGN: high-resolution refinement of single particle structures. Journal of Structural Biology. 2007;157:117–125. doi: 10.1016/j.jsb.2006.05.004. [DOI] [PubMed] [Google Scholar]
  15. Grigorieff N. Frealign: An exploratory tool for single-particle Cryo-EM. Methods in enzymology. 2016;579:191–226. doi: 10.1016/bs.mie.2016.04.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Harauz G, van Heel M. Exact filters for general geometry 3-dimensional reconstruction. Optik. 1986;73:146–156. [Google Scholar]
  17. Henderson R. Avoiding the pitfalls of single particle cryo-electron microscopy: Einstein from noise. PNAS. 2013;110:18037–18041. doi: 10.1073/pnas.1314449110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Hohn M, Tang G, Goodyear G, Baldwin PR, Huang Z, Penczek PA, Yang C, Glaeser RM, Adams PD, Ludtke SJ. SPARX, a new environment for Cryo-EM image processing. Journal of Structural Biology. 2007;157:47–55. doi: 10.1016/j.jsb.2006.07.003. [DOI] [PubMed] [Google Scholar]
  19. Iudin A, Korir PK, Salavert-Torres J, Kleywegt GJ, Patwardhan A. EMPIAR: a public archive for raw electron microscopy image data. Nature Methods. 2016;13:387–388. doi: 10.1038/nmeth.3806. [DOI] [PubMed] [Google Scholar]
  20. Kimanius D, Forsberg BO, Scheres SH, Lindahl E. Accelerated cryo-EM structure determination with parallelisation using GPUs in RELION-2. eLife. 2016;5:e18722. doi: 10.7554/eLife.18722. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Lander GC, Stagg SM, Voss NR, Cheng A, Fellmann D, Pulokas J, Yoshioka C, Irving C, Mulder A, Lau PW, Lyumkis D, Potter CS, Carragher B. Appion: an integrated, database-driven pipeline to facilitate EM image processing. Journal of Structural Biology. 2009;166:95–102. doi: 10.1016/j.jsb.2009.01.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Loveland AB, Demo G, Grigorieff N, Korostelev AA. Ensemble cryo-EM elucidates the mechanism of translation fidelity. Nature. 2017;546:113–117. doi: 10.1038/nature22397. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Lyumkis D, Brilot AF, Theobald DL, Grigorieff N. Likelihood-based classification of cryo-EM images using FREALIGN. Journal of Structural Biology. 2013;183:377–388. doi: 10.1016/j.jsb.2013.07.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Mastronarde DN, Held SR. Automated tilt series alignment and tomographic reconstruction in IMOD. Journal of Structural Biology. 2017;197:102–113. doi: 10.1016/j.jsb.2016.07.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Matthews BW. Solvent content of protein crystals. Journal of Molecular Biology. 1968;33:491–497. doi: 10.1016/0022-2836(68)90205-2. [DOI] [PubMed] [Google Scholar]
  26. McDonough RN. Whalen AD. Detection of Signals in Noise. 2nd edn. San Diego: Academic Press; 1995. [Google Scholar]
  27. McMullan G, Faruqi AR, Henderson R. Direct electron detectors. Methods in Enzymology. 2016;579:1–17. doi: 10.1016/bs.mie.2016.05.056. [DOI] [PubMed] [Google Scholar]
  28. McMullan G, Vinothkumar KR, Henderson R. Thon rings from amorphous ice and implications of beam-induced Brownian motion in single particle electron cryo-microscopy. Ultramicroscopy. 2015;158:26–32. doi: 10.1016/j.ultramic.2015.05.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Mindell JA, Grigorieff N. Accurate determination of local defocus and specimen tilt in electron microscopy. Journal of Structural Biology. 2003;142:334–347. doi: 10.1016/S1047-8477(03)00069-8. [DOI] [PubMed] [Google Scholar]
  30. Moriya T, Saur M, Stabrin M, Merino F, Voicu H, Huang Z, Penczek PA, Raunser S, Gatsogiannis C. High-resolution single particle analysis from electron cryo-microscopy images using SPHIRE. Journal of Visualized Experiments. 2017 doi: 10.3791/55448. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Oldham ML, Grigorieff N, Chen J. Structure of the transporter associated with antigen processing trapped by herpes simplex virus. eLife. 2016;5:e21829. doi: 10.7554/eLife.21829. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Penczek PA, Fang J, Li X, Cheng Y, Loerke J, Spahn CM. CTER-rapid estimation of CTF parameters with error assessment. Ultramicroscopy. 2014;140:9–19. doi: 10.1016/j.ultramic.2014.01.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Punjani A, Rubinstein JL, Fleet DJ, Brubaker MA. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nature Methods. 2017;14:290–296. doi: 10.1038/nmeth.4169. [DOI] [PubMed] [Google Scholar]
  34. Reboul CF, Eager M, Elmlund D, Elmlund H. Single-particle cryo-EM-Improved ab initio 3D reconstruction with SIMPLE/PRIME. Protein Science. 2018;27:51–61. doi: 10.1002/pro.3266. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Rohou A, Grigorieff N. CTFFIND4: Fast and accurate defocus estimation from electron micrographs. Journal of Structural Biology. 2015;192:216–221. doi: 10.1016/j.jsb.2015.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Roseman AM. FindEM--a fast, efficient program for automatic selection of particles from electron micrographs. Journal of Structural Biology. 2004;145:91–99. doi: 10.1016/j.jsb.2003.11.007. [DOI] [PubMed] [Google Scholar]
  37. Rosenthal PB, Henderson R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. Journal of Molecular Biology. 2003;333:721–745. doi: 10.1016/j.jmb.2003.07.013. [DOI] [PubMed] [Google Scholar]
  38. Scheres SH, Chen S. Prevention of overfitting in cryo-EM structure determination. Nature Methods. 2012;9:853–854. doi: 10.1038/nmeth.2115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Scheres SH, Valle M, Nuñez R, Sorzano CO, Marabini R, Herman GT, Carazo JM. Maximum-likelihood multi-reference refinement for electron microscopy images. Journal of Molecular Biology. 2005;348:139–149. doi: 10.1016/j.jmb.2005.02.031. [DOI] [PubMed] [Google Scholar]
  40. Scheres SH. RELION: implementation of a Bayesian approach to cryo-EM structure determination. Journal of Structural Biology. 2012;180:519–530. doi: 10.1016/j.jsb.2012.09.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Scheres SH. Semi-automated selection of cryo-EM particles in RELION-1.3. Journal of Structural Biology. 2015;189:114–122. doi: 10.1016/j.jsb.2014.11.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Sheth LK, Piotrowski AL, Voss NR. Visualization and quality assessment of the contrast transfer function estimation. Journal of Structural Biology. 2015;192:222–234. doi: 10.1016/j.jsb.2015.06.012. [DOI] [PubMed] [Google Scholar]
  43. Sigworth FJ. Classical detection theory and the cryo-EM particle selection problem. Journal of Structural Biology. 2004;145:111–122. doi: 10.1016/j.jsb.2003.10.025. [DOI] [PubMed] [Google Scholar]
  44. Sindelar CV, Grigorieff N. Optimal noise reduction in 3D reconstructions of single particles using a volume-normalized filter. Journal of Structural Biology. 2012;180:26–38. doi: 10.1016/j.jsb.2012.05.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Stewart A, Grigorieff N. Noise bias in the refinement of structures derived from single particles. Ultramicroscopy. 2004;102:67–84. doi: 10.1016/j.ultramic.2004.08.008. [DOI] [PubMed] [Google Scholar]
  46. Subramaniam S, Earl LA, Falconieri V, Milne JL, Egelman EH. Resolution advances in cryo-EM enable application to drug discovery. Current Opinion in Structural Biology. 2016;41:194–202. doi: 10.1016/j.sbi.2016.07.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Subramaniam S. Structure of trimeric HIV-1 envelope glycoproteins. PNAS. 2013;110:E4172–E4174. doi: 10.1073/pnas.1313802110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Tang G, Peng L, Baldwin PR, Mann DS, Jiang W, Rees I, Ludtke SJ. EMAN2: an extensible image processing suite for electron microscopy. Journal of Structural Biology. 2007;157:38–46. doi: 10.1016/j.jsb.2006.05.009. [DOI] [PubMed] [Google Scholar]
  49. Thon F. Zur Defokussierungsabhängigkeit des Phasenkontrastes bei der elektronenmikroskopischen Abbildung. Zeitschrift für Naturforschung A. 1966;21a:476–478. doi: 10.1515/zna-1966-0417. [DOI] [Google Scholar]
  50. van Heel M, Harauz G, Orlova EV, Schmidt R, Schatz M. A new generation of the IMAGIC image processing system. Journal of Structural Biology. 1996;116:17–24. doi: 10.1006/jsbi.1996.0004. [DOI] [PubMed] [Google Scholar]
  51. Van Heel M. Detection of objects in quantum-noise-limited images. Ultramicroscopy. 1982;7:331–341. doi: 10.1016/0304-3991(82)90258-3. [DOI] [Google Scholar]
  52. van Heel M. Finding trimeric HIV-1 envelope glycoproteins in random noise. PNAS. 2013;110:E4175–E4177. doi: 10.1073/pnas.1314353110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Voss NR, Yoshioka CK, Radermacher M, Potter CS, Carragher B. DoG Picker and TiltPicker: software tools to facilitate particle selection in single particle electron microscopy. Journal of Structural Biology. 2009;166:205–213. doi: 10.1016/j.jsb.2009.01.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Zhang K. Gctf: Real-time CTF determination and correction. Journal of Structural Biology. 2016;193:1–12. doi: 10.1016/j.jsb.2015.11.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Zhou A, Rohou A, Schep DG, Bason JV, Montgomery MG, Walker JE, Grigorieff N, Rubinstein JL. Structure and conformational states of the bovine mitochondrial ATP synthase by cryo-EM. eLife. 2015;4:e10180. doi: 10.7554/eLife.10180. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: Edward H Egelman1

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "cisTEM: User-friendly software for single-particle image processing" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor (Ed Egelman) and John Kuriyan as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Sjors HW Scheres (Reviewer #2); Henning Stahlberg (Reviewer #3); Bridget Carragher (Reviewer #4).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary

This paper describes a new image processing software solution for cryo-EM single-particle analysis called cisTEM. cisTEM combines many existing and very popular tools from the Grigorieff lab (CTFFIND, UNBLUR, FREALIGN) with several new functionalities, such as auto-picking, 2D classification and ab initio model calculation. The existing tools were all very popular solutions in the field, and their inclusion into cisTEM has motivated a complete re-write in C++. By careful code optimisation, the new programs now run at speeds that are comparable with (or even surpass) implementations by others on GPUs (e.g. RELION or cryoSPARC). In line with their excellent track-record, the authors release the code as open-source, which is laudable in a time where many new software implementations have moved towards licensing solutions for non-academic users. Modifications to existing methods and new methods are described in sufficient detail to be useful for potential users, and in many cases even for those programmers eager to implement their own versions. The GUI described appears to be well thought-through and thereby a good help for less experienced users. This is particularly relevant for access to the 3D classification and refinement approaches, which were less accessible in previous versions of FREALIGN. The results presented for β-gal are impressive: a 2.2A reconstruction is state-of-the-art, and the execution time below 5 hours is extremely fast! Therefore, this tool will undoubtedly be extremely useful for the field. The performance of cisTEM is amazing, in terms of resolution and speed. The authors are to be congratulated for making such a wonderful package available as open-source.

Essential revisions:

1) Eq 15 does not have a CTF_i component. Is that part of the projection operator (with the funny P)? If so, that would be good to spell out. Otherwise, CTF_i should be added to the equation.

2) It was unclear whether the 3D classification approach only marginalises over classes, or whether the marginalisation over the 3 in-plane orientations can also be performed. Perhaps an extra sentence would help.

3) Is cisTEM capable of on-the-fly processing? If so, it might be worth spelling that out, as many EM centres in the world are moving towards automated processing of images while they are being acquired.

4) In subsection “Parallelization”, the authors describe a flexible home-made solution to queueing many little sub-jobs, which could provide additional flexibility on busy clusters. However, it might very well be that many clusters require a given number of free nodes to be specified in their own queueing system in order to use the cluster. Perhaps an additional sentence about this could be added?

5) cisTEM is available under the GNU General Public License. This is not stated in this manuscript but should be.

6) Introduction: Use "direct electron detectors", not "direct detectors".

7) Subsection “Particle picking”: Please define "mode" in this context.

8) Subsection “Particle picking”: It is obvious that the "need" is there now for external users. This statement could be rephrased.

9) Subsection “3D refinement (FrealignX)”: The size threshold of 400kDa, up to which individual particle defocus refinement might be helpful, will depend on various other parameters, such as kV, defocus, ice thickness, electron detector, etc. This statement could be extended to state that this threshold applies to current typical hardware and sample preparation methods.

10) Subsection “3D refinement (FrealignX)”: For the description of applications of particle-wise CTF refinements, it would be helpful to specify, if this was done using the absolute values of the CCterms in Equation (9b), and if so, which Resolution R2 was used as transition from signed to unsigned CC values.

Also, how much does the particle-wise CTF refinement improve the final resolution, if the signed versus the unsigned CC calculation is used?

11) Subsection “3D refinement (FrealignX)”: FSC calculation is done for particles aligned against a single reference, which is resolution limited to a value well below the current resolution estimate minus 2/Dmask. However, here the authors also state that the users can set the resolution limit of the reference to arbitrary values, at their own risk. This sounds like a dangerous option for unexperienced users. Does cisTEM in such cases print out a big fat warning somewhere?

12) Subsection “Ab-initio 3D reconstruction”: "3 direction are chosen". ("s" is missing in "directions").

13) Subsection “Ab-initio 3D reconstruction”: What are the 3 directions that are chosen here? Are those random, linearly independent directions? Or are these related to symmetry axes? Please specify or explain better.

14) Discussion section: GPU hardware doesn't have to be noisier that CPU hardware. Water-cooled housings and water-cooled GPU cards are available, even though at higher costs. Several high-end GPU systems are absolutely silent, even under load. This statement is only valid under very specific settings. I would remove or rephrase it.

15) It would be helpful to know how projects are tracked in a multi user facility. In some facilities there are 100's of different projects, many of which must be well separated for confidentiality reasons. How does cisTEM handle this?

16) Can new programs easily be incorporated into cisTEM? For example, on one of the few features lacking right now is per particle unblurring and I suspect that many people will want to go with motioncorr2. While the authors discuss the intimate link between the GUI and the code as a positive, it would also be good if there were some options to "wrap" other programs as needed. If not, I suspect users will continue to leap in and out of a multitude of packages.

17) Does the package provide a way for sorting particles or images based on confidence values or outputs. E.g. reject all images where the CTF fit is poor etc.

18) While I accept that the DoG algorithm can be modified to select β-gal I would like to know if it really can be optimized on something much more stick like. If not, then I suspect users will have to exit the program to use a template picker until one becomes available. Again, pointing to the value of wrapping one of the many very good available template pickers.

19) It has been discussed by Gabe Lander (and other groups, too) that there is an advantage in packing particles very closely together across holes (e.g. see the Aldolase images in https://www.biorxiv.org/content/early/2017/05/25/141994). How will this affect programs that need to select background regions for determining optimal particle picking parameters or for noise whitening etc.

20) Per particles defocus is also likely to be very useful for images that are tilted relative to the electron beam or that are in two layers (see for example: https://www.biorxiv.org/content/early/2017/12/11/230276). Could the per particle be done in patches? How do the cisTEM results compare to those of gCTF for small particles? Is it possible that the lack of improvement for β-gal compared to the virus is because the method just failed due to the lower signal? How would a user know this? One way might be to plot the defocus values as z-heights which if particles really do all adhere to an air water interface will be in a plane. This might also help smooth out outliers.

21) Should cryoSparc be part of the initial set of software referenced in the Introduction.

22) What FSC is reported, 0.143?

23) Masking instructions are a bit confusing (Subsection “3D refinement (FrealignX)”).

24) Subsection “3D refinement (FrealignX)”: What happened without the mask?

25) BSC scoring (Subsection “3D refinement (FrealignX)”) description is a bit vague.

26) How does the masking affect the FSC exactly – does it come out the same?

27) Subsection “Map sharpening” calculate should be calculated (and this was the only typo I noticed!)

28) Database is interesting, but it is not clear if is it useful to the end user or just the programmers. It might be polite to reference Appion which has been using a very effective database as a tool for delivering results to users and wrapping disparate software packages for over a decade.

29) Can cisTEM be run remotely? Does your computer have to stay awake during a long action?

eLife. 2018 Mar 7;7:e35383. doi: 10.7554/eLife.35383.017

Author response


Essential revisions:

1) Eq 15 does not have a CTF_i component. Is that part of the projection operator (with the funny P)? If so, that would be good to spell out. Otherwise, CTF_i should be added to the equation.

Done.

2) It was unclear whether the 3D classification approach only marginalises over classes, or whether the marginalisation over the 3 in-plane orientations can also be performed. Perhaps an extra sentence would help.

Done (see added text, Subsection “3D refinement (FrealignX)).

3) Is cisTEM capable of on-the-fly processing? If so, it might be worth spelling that out, as many EM centres in the world are moving towards automated processing of images while they are being acquired.

This is currently not implemented but on our to-do list (see added text, Discussion section).

4) In subsection “Parallelization”, the authors describe a flexible home-made solution to queueing many little sub-jobs, which could provide additional flexibility on busy clusters. However, it might very well be that many clusters require a given number of free nodes to be specified in their own queueing system in order to use the cluster. Perhaps an additional sentence about this could be added?

The current run profiles are designed to be flexible in that the user can provide any command or script they choose to run the command. In some cases, the user may prefer to have one command to run many processes, e.g. using mpirun. This is not easily achievable in the current setup, although this ability will likely be added in a future version. We have added a paragraph (subsection “Parallelization”) that discusses the job submission and this limitation in more detail.

5) cisTEM is available under the GNU General Public License. This is not stated in this manuscript but should be.

cisTEM is distributed under the Janelia Research Campus Software License (http://license.janelia.org/license/janelia_license_1_2.html) (see added text, Introduction).

6) Introduction: Use "direct electron detectors", not "direct detectors".

Done.

7) Subsection “Particle picking”: Please define "mode" in this context.

The mode of a distribution is the location of the peak of that distribution. Text has been modified (subsection “Particle picking”).

8) Subsection “Particle picking”: It is obvious that the "need" is there now for external users. This statement could be rephrased.

Done (see text, subsection “Particle picking”).

9) Subsection “3D refinement (FrealignX)”: The size threshold of 400kDa, up to which individual particle defocus refinement might be helpful, will depend on various other parameters, such as kV, defocus, ice thickness, electron detector, etc. This statement could be extended to state that this threshold applies to current typical hardware and sample preparation methods.

Done (see text, subsection “3D refinement (FrealignX)”).

10) Subsection “3D refinement (FrealignX)”: For the description of applications of particle-wise CTF refinements, it would be helpful to specify, if this was done using the absolute values of the CCterms in Equation (9b), and if so, which Resolution R2 was used as transition from signed to unsigned CC values.

Also, how much does the particle-wise CTF refinement improve the final resolution, if the signed versus the unsigned CC calculation is used?

Absolute CC terms were not used (see added text, subsection “3D refinement (FrealignX)”).

11) Subsection “3D refinement (FrealignX)”: FSC calculation is done for particles aligned against a single reference, which is resolution limited to a value well below the current resolution estimate minus 2/Dmask. However, here the authors also state that the users can set the resolution limit of the reference to arbitrary values, at their own risk. This sounds like a dangerous option for unexperienced users. Does cisTEM in such cases print out a big fat warning somewhere?

We will consider adding a general warning to the Manual Refinement panel in the next release. The risk of biased FSC is only one of several that an inexperienced user is exposed to on this panel.

12) Subsection “Ab-initio 3D reconstruction”: "3 direction are chosen". ("s" is missing in "directions").

Done.

13) Subsection “Ab-initio 3D reconstruction”: What are the 3 directions that are chosen here? Are those random, linearly independent directions? Or are these related to symmetry axes? Please specify or explain better.

The three sets of Euler angles used in the format (psi, theta, phi) are (0, 0, 0), (-45, -45, -45) and (15, 70, -15), which were chosen to be sufficiently different to improve robustness of the algorithm. In theory the method should work with any angle, and only one should be needed. In our method we use three and average over the results, decreasing the chance of failure due to a pathologically oriented map, however we have not tested this to determine whether it is needed. We have changed the text to add more detail (subsection “Ab-initio 3D reconstruction”).

14) Discussion section: GPU hardware doesn't have to be noisier that CPU hardware. Water-cooled housings and water-cooled GPU cards are available, even though at higher costs. Several high-end GPU systems are absolutely silent, even under load. This statement is only valid under very specific settings. I would remove or rephrase it.

Removed.

15) It would be helpful to know how projects are tracked in a multi user facility. In some facilities there are 100's of different projects, many of which must be well separated for confidentiality reasons. How does cisTEM handle this?

cisTEM keeps all results of a project in its project directory. In a multi-user facility where confidentiality is important, users can limit read access to this directory to prevent unauthorized access.

16) Can new programs easily be incorporated into cisTEM? For example, on one of the few features lacking right now is per particle unblurring and I suspect that many people will want to go with motioncorr2. While the authors discuss the intimate link between the GUI and the code as a positive, it would also be good if there were some options to "wrap" other programs as needed. If not, I suspect users will continue to leap in and out of a multitude of packages.

Unfortunately, cisTEM’s current design does not allow easy incorporation of other software. cisTEM enables data import and export at all stages, and it should be easy for users to perform motion correction in MotionCorr2, and then import the aligned sums for further processing in cisTEM.

17) Does the package provide a way for sorting particles or images based on confidence values or outputs. E.g. reject all images where the CTF fit is poor etc.

The option to sort results is available in some panels, most notably in the CTF estimation Results panel, where the images can be sorted / selected based on the CTF fit. Due to the fact the results are stored in a database, sorting / selecting is relatively easy to implement, and future versions will have more sorting facilities. We have added a brief description of this in subsection “GUI design and workflow”.

18) While I accept that the DoG algorithm can be modified to select β-gal I would like to know if it really can be optimized on something much more stick like. If not, then I suspect users will have to exit the program to use a template picker until one becomes available. Again, pointing to the value of wrapping one of the many very good available template pickers.

A simple disk as a template will perform increasingly worse as the shape of the particle becomes more stick-like. As indicated in the manuscript, a template-based picker is in the planning. In the meantime, particle coordinates found by a different picker can be imported into cisTEM. We have added a reference to the DoG algorithm (subsection “Particle picking”).

19) It has been discussed by Gabe Lander (and other groups, too) that there is an advantage in packing particles very closely together across holes (e.g. see the Aldolase images in https://www.biorxiv.org/content/early/2017/05/25/141994). How will this affect programs that need to select background regions for determining optimal particle picking parameters or for noise whitening etc.

Our particle picking algorithm will indeed be susceptible to misestimating the background noise spectrum under such circumstances. Picking will still work, but a higher false-positive rate might be expected. This is now spelt out in the text (subsection “Particle picking”).

20) Per particles defocus is also likely to be very useful for images that are tilted relative to the electron beam or that are in two layers (see for example: https://www.biorxiv.org/content/early/2017/12/11/230276). Could the per particle be done in patches? How do the cisTEM results compare to those of gCTF for small particles? Is it possible that the lack of improvement for β-gal compared to the virus is because the method just failed due to the lower signal? How would a user know this? One way might be to plot the defocus values as z-heights which if particles really do all adhere to an air water interface will be in a plane. This might also help smooth out outliers.

The lack of improvement in the case of β-galactosidase could certainly be due to inaccuracies in the defocus refinement. CTF refinement in patches, as it is already implemented in gCTF, is an excellent idea. A simplified patch-based refinement was implemented in Frealign 9 (a “patch” consisted of a whole micrograph). We will add such a feature in a future release of cisTEM.

21) Should cryoSparc be part of the initial set of software referenced in the Introduction.

Done.

22) What FSC is reported, 0.143?

Yes (see added text, subsection “3D refinement (FrealignX)”).

23) Masking instructions are a bit confusing (Subsection “3D refinement (FrealignX)”).

We rewrote the description of the masking procedure (see changed text, Subsection “3D refinement (FrealignX)”).

24) Subsection “3D refinement (FrealignX)”: What happened without the mask?

Without masking, the detergent micelle showed typical signs of noise overfitting (see added text, Subsection “3D refinement (FrealignX)”).

25) BSC scoring (Subsection “3D refinement (FrealignX)”) description is a bit vague.

We added an explanation of how much the effective B-factors used for weighting vary in a typical case (see added text, Subsection “3D refinement (FrealignX)”)).

26) How does the masking affect the FSC exactly – does it come out the same?

Yes, the more generous masking and subsequent solvent correction yields FSC curves that are very close to curves obtained with tightly masked half volumes (see added text, Subsection “3D refinement (FrealignX)”, caption and added curve, Figure 7C).

27) Subsection “Map sharpening” calculate should be calculated (and this was the only typo I noticed!).

Done.

28) Database is interesting, but it is not clear if is it useful to the end user or just the programmers. It might be polite to reference Appion which has been using a very effective database as a tool for delivering results to users and wrapping disparate software packages for over a decade.

Some of the sorting functions (see above) depend on the database. Also, results keep a record of which input data were used to generate them. These functions are useful to both programmers and users. We have added a reference to Appion (Subsection “GUI design and workflow”).

29) Can cisTEM be run remotely? Does your computer have to stay awake during a long action?

Since cisTEM is part of the processing software, it must remain running during processing. Protocols such as NX, VNC or RDP can be used to start cisTEM locally, then log off and connect again remotely.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Figure 7—source data 1. Source data for the curves shown in Figure 7C.
    DOI: 10.7554/eLife.35383.009
    Source code 1. Source code for cisTEM 1.0 beta.
    elife-35383-code1.gz (6.2MB, gz)
    DOI: 10.7554/eLife.35383.011
    Transparent reporting form
    DOI: 10.7554/eLife.35383.012

    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd

    RESOURCES