Author manuscript; available in PMC: 2019 Nov 1.
Published in final edited form as: J Struct Biol. 2018 Aug 11;204(2):301–312. doi: 10.1016/j.jsb.2018.08.007

Assessment of detailed conformations suggests strategies for improving cryoEM models: helix at lower resolution, ensembles, pre-refinement fixups, and validation at multi-residue length scale

Jane S Richardson 1, Christopher J Williams 1, Lizbeth L Videau 1, Vincent B Chen 1, David C Richardson 1
PMCID: PMC6163098  NIHMSID: NIHMS1503621  PMID: 30107233

Abstract

We find that the overall quite good methods used in the CryoEM Model Challenge could still benefit greatly from several strategies for improving local conformations. Our assessments primarily use validation criteria from the MolProbity web service. Those criteria include MolProbity’s all-atom contact analysis, updated versions of standard conformational validations for protein and RNA, plus two recent additions: first, flags for cis-nonPro and twisted peptides, and second, the CaBLAM system for diagnosing secondary structure, validating Cα backbone, and validating adjacent peptide CO orientations in the context of the Cα trace. In general, automated ab initio building of starting models is quite good at backbone connectivity but often fails at local conformation or sequence register, especially at poorer than 3.5Å resolution. However, we show that even if criteria (such as Ramachandran or rotamer) are explicitly restrained to improve refinement behavior and overall validation scores, automated optimization of a deposited structure seldom corrects specific misfittings that start in the wrong local minimum, but just hides them. Therefore, local problems should be identified, and as many as possible corrected, before starting refinement. Secondary structures are confusing at 3–4Å but can be better recognized at 6–8Å. In future model challenges, specific steps being tested (such as segmentation) and the required documentation (such as PDB code of starting model) should each be explicitly defined, so competing methods on a given task can be meaningfully compared. Individual local examples are presented here, to understand what local mistakes and corrections look like in 3D, how they probably arise, and what possible improvements to methodology might help avoid them. At these resolutions, both structural biologists and end-users need meaningful estimates of local uncertainty, perhaps through explicit ensembles. Fitting problems can best be diagnosed by validation that spans multiple residues; CaBLAM is such a multi-residue tool, and its effectiveness is demonstrated.

Keywords: CaBLAM, cryoEM model challenge, 3–4Å resolution, model validation, MolProbity

Introduction

Our laboratory developed all-atom contact analysis and the MolProbity validation web service (Word 1999; Davis 2004) to successfully diagnose and guide correction of local model errors in macromolecular crystal structures at 2.5Å or better (Chen 2010; Read 2011; Richardson 2013b). More recently we have worked on tools that could extend these benefits to lower resolutions in the 2.5–4Å range (Richardson 2018a; Williams 2018). Initially these new or modified tools were aimed at crystal structures, but since the cryoEM “revolution” we are exploring how best to extend and apply them to cryoEM structures as well.

Recently, the EMDataBank set up a CryoEM Model Challenge (Lawson 2018; Kryshtafovych 2018), in which challenge modelers built automated models for some or all of eight different cryoEM-structure targets (https://doi.org/10.5281/zenodo.1165999), either ab initio from the maps or by refinement of the cryoEM coordinates. Our lab at Duke was one of the assessors of the challengers’ models, with our results reported here. That challenge provided a very productive learning experience and a boost to software development, for assessors such as ourselves as well as for the modelers. Our laboratory’s approach in the assessment was to examine individual, local examples in order to understand the meaning, and also the gaps, of validation by overall statistics. However, here we also tabulate statistics from several of our newer criteria not included on the EMDB’s model-comparison website: cis-nonPro and twisted peptides, ribose pucker and RNA backbone conformers, and especially CaBLAM outliers. Our emphasis, though not exclusive, is on ab initio models and on the higher-resolution maps, both for assessment of the model submissions and for assessing the productive applicability of each validation criterion.

Methods

Target choice, model submission, availability of the relevant files, and overall characteristics, validation, and comparisons of the models were done centrally by the EMDB, as seen at https://doi.org/10.5281/zenodo.1165999.

The all-atom contact method evaluates hydrogen bonds, van der Waals contacts, and steric clashes (unfavorable overlaps ≥0.4Å), after adding and optimizing all explicit hydrogen atoms (Word 1999). Graphical markup for the contacts is shown at the top of Figure 1a, with hotpink spikes for clashes, light green convex dot pillows for H-bonds, and paired concave dot surfaces for van der Waals contacts. An all-atom “clashscore” is reported (number of clashes per 1000 atoms), but more useful, and used here, are individual clashes, which are both local and directional and can guide refitting of problem areas. Another diagnostic feature is poor H-bonding in secondary-structure regions. MolProbity reports the same up-to-date Ramachandran outliers used at PDB deposition, but unfortunately the absence of such outliers does not always imply correct backbone at 2.5–4Å resolution. MolProbity also reports recently updated sidechain rotamers (Hintze 2016), which are useful at any resolution; however, the target for rotamers is 0.3% outliers, not zero, and effort is required to ensure the right rotamer choice.

Figure 1.

MolProbity markup. a) Key to graphical MolProbity representations of model validation measures: clashes, H-bonds & van der Waals contacts, Cβ deviations (magenta spheres), cis-nonPro peptides (lime green), sidechain rotamer outliers (gold), Ramachandran ϕ,ψ outliers (green), RNA ribose pucker outliers (magenta), CaBLAM outlier (hotpink) and disfavored (purple) for CO vs Cα-trace, bond angle and bond length deviations (red if too large, blue if too small). b) CaBLAM’s validation parameters: two partially overlapping Cα virtual dihedrals (blue and green) for backbone analysis of the central residue, virtual CO-CO dihedral between successive peptides (thick red line), and Cα virtual angle (thin red line). c) CaBLAM mark-up on a cryoEM model for target 5 (3j7L; Wang 2016): magenta lines mark two “outlier” sets of 3 consecutive COs pointing in the same direction in a β-strand pair; the annotated secondary-structure probability here is 0% α and 31% β, based on occurrences in our Top8000 quality-filtered database.

Cis peptides occur before 5% of prolines, but before only 0.03% of non-prolines; genuine cis-nonPro are usually involved in biological function. For about 10 years cis-nonPro peptides were over-used by orders of magnitude at low resolution or in disordered regions of crystal structures (Croll 2015). In response, MolProbity, Coot, and Phenix now flag them prominently (Williams 2015; top right in Figure 1a), and their incidence has since been dropping. We also flag twisted peptides (with the omega dihedral >30° off planar), which are almost never correct. Both are tabulated for the Challenge models in Table 1.
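The cis/trans/twisted classification follows directly from the peptide omega dihedral (Cα–C–N′–Cα′). A minimal sketch using the thresholds stated above (within 30° of 0° is cis, within 30° of 180° is trans, anything else twisted); in a real model the four points come from consecutive residues i and i+1:

```python
import math

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees for four points (0 = cis, 180 = trans)."""
    def sub(a, b): return [a[k] - b[k] for k in range(3)]
    def cross(a, b): return [a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]]
    def dot(a, b): return sum(x*y for x, y in zip(a, b))
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    nb2 = math.sqrt(dot(b2, b2))
    m1 = cross(n1, [x / nb2 for x in b2])
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

def classify_omega(ca1, c1, n2, ca2):
    """Classify a peptide by omega = dihedral(CA, C, N', CA'):
    within 30 deg of 0 -> cis; within 30 deg of 180 -> trans;
    otherwise twisted (the flagged >30-degrees-off-planar case)."""
    omega = dihedral(ca1, c1, n2, ca2)
    if abs(omega) <= 30.0:
        return "cis"
    if abs(omega) >= 150.0:
        return "trans"
    return "twisted"
```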

Table 1.

cis-nonPro, twisted peptide, and CaBLAM validation

Model  #cis-nonP  %cis-nonP  #twist-pept  %twist-pept  #cablam-res  cablam-%out  ca-geo-%out  helix-%  beta-%
4udv T1 0 0 149 4.0 0.7 53.0 4.0
119_1 optimized 0 0 149 1.3 0.7 53.0 4.0
123_1 optimized 0 0 5513 0.7 0 53.0 4.0
133_1 fitted another 1 0.7 1 0.6 1694 6.5 2.0 50.0 7.1
164_1 optimized 0 0 19584 4.2 0.7 54.9 4.2
192_1 optimized 0 0 7301 4.0 0.7 52.4 5.4
123_2 ab initio 0 0 149 1.3 0.7 53.0 4.7
130_1 ab initio 0 0 7320 6.7 1.7 55.0 3.3
130_2 ab initio 0 0 4068 0 0 72.2 0
181_1 ab initio 5 3.5 4 2.6 149 10.7 7.4 40.3 5.4
194_1 ab initio 1 0.7 6 4.0 129 9.3 3.9 55.0 3.1
3j9i T2, mapA 1 0.2 1 0.2 5866 2.4 0.5 38.7 18.1
6bdf, xray T2, mapB
120_1 A, optimized 0 0 5866 1.6 0.8 38.7 18.1
123_1 A, optimized 0 1 0.2 5866 4.5 1.2 36.5 17.2
123_2 B, optimized 0 0 5866 2.9 1.0 38.4 17.7
131_1 A, optimized 0 1 0.2 5866 2.6 0.7 38.4 19.1
164_1 B, optimized 1 0.2 1 0.2 5866 2.4 0.5 38.7 18.1
189_1 A, fitted another 0 8 0.1 5824 3.6 0.3 37.4 18.6
189_2 B, fitted another 1 0.0 15 0.3 5824 3.6 0.4 37.5 18.8
192_1 A, optimized 1 0.2 0 5866 2.6 0.8 37.7 18.7
123_3 B, ab initio 0 0 199 2.5 2.0 36.7 20.6
130_1 B, ab initio 0 2 0.6 3724 2.6 0.4 36.8 15.8
130_2 A, ab initio 1 0.4 0 2548 1.6 0.6 52.8 7.7
130_3 B, ab initio 1 0.4 1 0.4 3122 2.2 0.9 36.8 19.7
130_4 A, ab initio 1 0.5 1 0.4 2394 5.3 1.2 57.3 4.1
181_1 A, ab initio 7 3.7 6 3.1 193 13.0 5.2 7.5 11.9
183_1 A, ab initio 1 0.1 10 0.5 2200 20.7 9.1 21.0 10.4
1ss8 xray T3 0 0 3640 1.0 0.7 48.3 12.0
1grL xray T3
119_1 fitted another 6 1.2 3 0.6 7252 10.8 2.9 46.3 9.3
123_1 optimized 0 2 0.4 7280 1.3 0.6 49.4 12.9
133_1 fitted another 3 0.6 1 0.2 7280 4.8 1.5 47.3 12.7
164_1 optimized 0 3 0.6 7252 3.3 0.8 50.4 12.0
164_2 optimized 0 3 0.6 7252 3.3 0.8 50.4 12.0
192_1 fitted another 0 0 7280 1.5 0.4 49.0 12.1
130_1 ab initio 3 1.0 1 0.3 2898 5.3 0.5 63.8 2.9
130_2 ab initio 2 0.7 0 2968 4.2 1.9 69.8 2.4
3j5p T4 4 0.7 6 1.0 2320 2.8 2.2 55.3 6.3
119_1 optimized 6 1.0 6 1.0 2453 5.2 2.9 51.2 6.7
120_1 optimized 0 0 2472 2.8 1.0 54.4 6.5
123_1 optimized 0 0 1924 2.5 1.3 57.6 4.6
131_1 optimized 3 0.5 1 0.2 2280 3.3 1.6 56.1 6.1
133_1 fitted another 3 0.6 1 0.2 1981 6.8 1.4 54.4 6.4
164_1 optimized 3 0.5 6 1.0 2292 3.0 2.2 56.0 5.2
164_2 optimized 0 0 1244 2.6 1.6 69.8 0.6
192_1 optimized 3 0.5 0 2292 4.4 1.6 54.6 5.9
193_1 optimized 3 0.6 9 1.9 1836 5.9 2.2 58.4 4.7
130_1 ab initio 0 1 0.3 984 2.4 1.2 69.5 0.4
130_2 ab initio 0 0 588 3.4 0.7 64.0 0.7
183_1 ab initio 1 0.3 64 2.0 1393 23.2 10.3 19.9 7.0
3j7L T5 2 0.4 0 465 2.6 0.2 8.8 29.7
119_1 optimized 2 0.4 4 0.8 467 2.8 0 9.4 28.7
123_2 optimized 0 180 0.6 27900 0.2 0.2 9.7 28.0
133_1 optimized 0 3 0.6 465 5.2 0 9.0 31.4
164_1 optimized 0.7 0.5 0 9600 1.9 0.6 10.0 32.5
192_1 optimized 360 1.3 0 27900 5.2 0.1 11.0 30.8
123_1 ab initio 0 0 160 1.2 0.6 10.6 29.4
130_1 ab initio 180 1.0 420 3.6 8220 19.0 8.0 21.9 13.1
130_2 ab initio 240 1.6 240 1.5 10860 14.4 3.3 20.4 13.8
181_1 ab initio 1 0.7 3 2.0 145 15.9 5.5 3.5 26.2
183_1 ab initio 0 4 0.3 1450 26.8 10.3 2.8 32.5
194_1 ab initio 1 0.2 16 3.4 463 15.1 6.9 4.3 24.2
3j7h T6, mapA 11 1.2 0 4072 2.4 1.1 12.9 27.4
5a1a T6, mapB 9 0.9 4 0.1 4072 2.8 1.3 12.4 27.1
119_1 A, optimized 10 1.0 1 0.1 4072 3.7 1.0 12.5 26.7
119_2 B, optimized 9 0.9 0 4072 3.3 0.7 11.8 26.8
123_1 A, optimized 0 0 4072 2.2 0.8 12.5 26.1
123_2 B, optimized 0 0 4072 1.9 0.6 12.2 26.8
128_1 A, fitted another 4 0.4 0 4072 2.2 0.6 12.3 27.9
133_1 B, fitted another 3 0.3 18 1.8 4028 3.9 1.0 12.0 26.8
133_2 A, fitted another 4 0.4 8 0.8 4028 3.5 1.4 12.2 26.8
130_1 A, ab initio 9 0.9 12 1.9 1800 17.1 4.4 11.6 12.0
130_2 B, ab initio 3 0.4 9 1.1 2776 4.5 1.2 14.0 22.9
130_3 A, ab initio 5 1.0 7 1.2 1492 10.2 2.4 9.7 16.6
130_4 B, ab initio 5 0.7 7 0.9 2468 5.3 1.6 13.1 22.0
193_1 A, ab initio 0 10 1.0 1014 8.0 1.5 10.6 26.5
5a63 T7, mapA 0 3 0.3 1191 4.0 1.0 52.3 10.5
4upc T7, mapB 0 4 1.0 391 9.1 3.2 27.6 16.1
118_1 A, optimized 3 0.5 8 1.1 705 10.2 3.0 22.6 12.3
119_1 A, fitted another 0 7 0.6 1191 5.8 1.4 50.4 9.7
119_2 B, optimized 0 2 0.2 1199 3.4 1.2 52.5 9.7
120_1 B, optimized 0 1 0.1 1199 1.3 0.8 53.2 9.9
123_1 A, optimized 1 0.1 0 1199 1.5 0.8 52.5 9.3
123_2 B, optimized 0 2 0.2 1199 1.8 0.8 52.5 9.1
133_1 B, fitted another 0 10 0.8 1194 5.3 1.0 52.7 9.7
133_2 A, fitted another 0 12 1.0 1194 5.5 0.9 50.4 10.0
164_1 A, optimized 0 3 0.3 900 5.0 1.3 44.4 13.7
164_2 B, optimized 0 3 0.3 1199 4.0 1.0 52.3 10.5
189_1 A, fitted another 1 0.2 0 610 5.6 2.8 23.8 20.5
192_1 A, optimized 0 0 345 12.2 2.6 27.0 16.2
192_2 B, optimized 0 0 1199 3.8 1.1 52.3 9.3
130_1 A, ab initio 9 1.2 38 4.8 507 3.2 1.8 70.6 0.4
130_2 B, ab initio 2 0.3 8 1.0 638 6.0 2.8 52.8 6.3
130_3 A, ab initio 9 1.1 46 5.4 399 5.3 2.0 58.4 2.5
130_4 B, ab initio 2 0.2 7 0.8 645 5.3 1.0 55.4 4.2
181_1 B, ab initio 14 2.2 28 4.2 661 19.1 8.6 25.6 14.4
183_1 B, ab initio 0.6 0.1 8 1.8 6610 25.2 10.4 14.4 9.4
185_1 B, ab initio 0 0 306 0.0 0.0 88.2 0.0
194_1 B, ab initio 1 0.3 5 1.5 223 4.5 0.0 75.3 0.0
5afi T8, mapA 0 0 6108 6.6 1.6 28.5 18.3
3ja1 T8, mapB 12 0.2 180 2.6 6913 10.1 3.1 28.2 17.1
120_1 A, optimized 0 2 0.3 6116 2.7 0.9 29.6 18.5
131_1 A, optimized 0 0 3043 5.8 1.5 25.6 18.7
192_1 B, optimized 12 0.2 2 0.0 6909 9.3 2.6 27.8 17.2
192_2 A, optimized 0 0 6108 5.7 1.6 28.7 18.3
130_1 B, ab initio 116 2.0 137 2.3 3842 14.8 4.1 38.6 1.7
130_2 A, ab initio 37 0.8 88 1.9 2965 9.4 2.4 43.2 5.4

MolProbity (Williams 2018) and phenix.molprobity within the PHENIX software system (Adams 2010) have tools to validate RNA structure (Richardson 2008; Jain 2015), important as a component in many large complexes. It turns out that ribose pucker, a strong influence on surrounding conformation but not directly visible even at 2Å, can be determined from the robustly seen position of the phosphates and direction of the glycosidic bond between base and sugar, with diagnostic markup for the “Pperp” criterion shown in Figure 1a. The community-consensus list of valid, full-detail RNA backbone conformers can help guide better modeling at any resolution. Because sampling of good reference data is still sparse for the 7 dihedral-angle parameters per sugar-to-sugar “suite”, at least 5% suite outliers, not zero, can be expected in validation. Both pucker and suite measures are reported in Table 2 for RNA chains in the Challenge models (present in targets 1 and 8).

The most generally useful validation tool for 2.5–4Å resolution that we employ here is CaBLAM (Williams 2018; full protocol details in Williams 2015b), which utilizes 5 successive Cα atoms and the two peptides surrounding each residue reported. CaBLAM’s multidimensional parameter space includes two Cα virtual dihedrals, a Cα virtual angle, and a virtual dihedral between successive peptide CO directions (see Figure 1b).

The primary CaBLAM validation flags residues whose 3-D combination of the virtual CO dihedral with the two Cα virtual dihedrals is seen in less than 1% of the reference data; those are reported as CaBLAM outliers (see Figure 1c). These outliers can diagnose misfit local backbone even when other criteria have been pushed over the border into allowed regions. The 5% level is also reported, as CaBLAM disfavored. Since it is nearly always the CO dihedral that is in error, one of the two peptides can usually be reoriented to reach a favorable region of the 3-D CaBLAM plot. CaBLAM outliers are reported for the Challenge models in Table 1, and many examples are shown in the Results section.
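The virtual parameters themselves are straightforward to compute from coordinates. The sketch below derives the two overlapping Cα virtual dihedrals and the Cα virtual angle for residue i; the CO–CO virtual dihedral is approximated here as dihedral(O[i−1], CA[i−1], CA[i], O[i]), a plausible construction for illustration, not necessarily CaBLAM’s exact definition. Flagging a residue then amounts to looking up this point in reference contours at the 1% (outlier) and 5% (disfavored) levels, which are omitted here.

```python
import math

def _dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees for four points."""
    def sub(a, b): return [a[k] - b[k] for k in range(3)]
    def cross(a, b): return [a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]]
    def dot(a, b): return sum(x*y for x, y in zip(a, b))
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    nb2 = math.sqrt(dot(b2, b2))
    m1 = cross(n1, [x / nb2 for x in b2])
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

def _angle(p1, p2, p3):
    """Virtual angle at p2, in degrees."""
    v1 = [a - b for a, b in zip(p1, p2)]
    v2 = [a - b for a, b in zip(p3, p2)]
    d = sum(x * y for x, y in zip(v1, v2))
    n = math.sqrt(sum(x*x for x in v1)) * math.sqrt(sum(x*x for x in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, d / n))))

def cablam_params(ca, o, i):
    """CaBLAM-style virtual parameters for residue i, from parallel lists
    of CA and carbonyl-O coordinates (needs CA for i-2..i+2).
    mu_in/mu_out: the two partially overlapping CA virtual dihedrals;
    tau: the CA virtual angle; nu: a CO-CO virtual dihedral, sketched as
    dihedral(O[i-1], CA[i-1], CA[i], O[i]) for illustration only."""
    mu_in = _dihedral(ca[i-2], ca[i-1], ca[i], ca[i+1])
    mu_out = _dihedral(ca[i-1], ca[i], ca[i+1], ca[i+2])
    tau = _angle(ca[i-1], ca[i], ca[i+1])
    nu = _dihedral(o[i-1], ca[i-1], ca[i], o[i])
    return mu_in, mu_out, tau, nu
```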

The 2-D space of successive Cα virtual dihedrals, when analyzed across several residues, can diagnose the probability of helical or β-sheet secondary structure even when the peptides are modeled incorrectly (for full details of this protocol, see Williams 2013; 2015b). That broadened measure gives CaBLAM an advantage over Ramachandran or DSSP criteria, both of which are derailed by bad peptide orientations.

CaBLAM also reports a Cα-geometry outlier for combinations of Cα dihedrals and angle seen for less than 0.5% of our Top8000 quality-filtered reference data (Richardson 2013a; Williams 2018). This provides an effective model-quality validation of Cα-only structures, reported in the Results section for Cα-only Challenge models. Cα outliers also define regions which have such a deviant Cα trace that we could not trust further CaBLAM analysis.

Results

Crucial trivia: Formats

Some of the submitted coordinate files were not in valid PDB format, and thus often not readable by standard software. Some of the problems were relatively easy to fix, such as a section of junk text, or invalid segIDs for atom type (PDB columns 77–80), or the use of HETATM rather than ATOM__ record type in residues that are standard components of protein or nucleic acid polymer chains. Many models used 2-character chainIDs, which can be handled by MolProbity or Phenix but not by all software. This usage is understandable, because target structures may have more distinct chains, or more chain copies, than the 62 that can be expressed with the PDB single-character alternatives of upper-case, lower-case, and 0–9. The best solution for this problem would be mmCIF format, which allows 4-character chainIDs, and in future model challenges mmCIF format should be accepted.
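A minimal clean-up along the lines described above can be sketched in a few lines of Python; the record list and the residue-name table are abbreviated assumptions, not a complete PDB parser:

```python
# Standard polymer components (abbreviated for illustration; extend as needed).
STANDARD = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "A", "C", "G", "U", "DA", "DC", "DG", "DT",
}

def fix_pdb_lines(lines):
    """Sketch of two of the fixes described in the text: drop junk lines
    that are not recognizable PDB records, and convert HETATM to ATOM for
    residues that are standard polymer components (residue name occupies
    PDB columns 18-20, i.e. Python slice [17:20])."""
    records = ("ATOM  ", "HETATM", "TER", "MODEL", "ENDMDL", "END", "HEADER", "REMARK")
    fixed = []
    for line in lines:
        if not line.startswith(records):
            continue  # junk text: skip it
        if line.startswith("HETATM") and line[17:20].strip() in STANDARD:
            line = "ATOM  " + line[6:]  # preserve the rest of the record
        fixed.append(line)
    return fixed
```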

The most problematic coordinate formats either mixed residues of different molecular types within a single chain, or else listed sequential, connected residues in widely non-sequential order. Target 1 model 164_1 alternates residues of protein and RNA for each copy in the tobacco mosaic virus spiral, and the copies are in a random order, making it difficult to assess contacts between chains. Target 8 model 131_1 very sensibly refines the big ribosomal RNA chains in 375-residue segments, but then lists the output coordinates in the order of first residue in each segment, then 2nd residue in each segment, etc. (that is, 1, 376, 751, 1129, 1504, 2, 377, 752, 1130, 1505, 3, 378 ...). No program we know of checks all-against-all residue connectivity for entire molecules rather than just between successive residues in the file, so for these models Ramachandran, ribose pucker, CaBLAM, and other properties that cross between adjacent residues cannot be evaluated without a re-sorting step.
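The re-sorting step itself is trivial once the file is parseable: a stable sort on (chainID, residue number), read from the fixed PDB columns, restores sequential order while preserving atom order within each residue. A sketch:

```python
def resort_residues(atom_lines):
    """Re-sort ATOM/HETATM records into sequential residue order within
    each chain, as needed for the interleaved file described in the text.
    Chain ID is PDB column 22 (index 21); residue number is columns 23-26
    (slice [22:26]).  Python's sort is stable, so atoms keep their
    original order within each residue."""
    def key(line):
        return (line[21], int(line[22:26]))
    return sorted(atom_lines, key=key)
```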

Format conventions are sometimes necessary and sometimes historical artifacts, but following them lets one participate as a functional member of this scientific community. Within the Model Challenge, format problems can make a file unreadable and therefore ignored, or worse, can cause misinterpretation which usually results in scores lower than the model’s content actually deserves.

A related but distinct issue is that “model ... endmdl” designations were used for challenge submissions in two quite different meanings. The two distinct usages are: 1) the traditional meaning of a thermodynamic or experimental ensemble of alternative models for a given molecule and 2) the Protein Data Bank’s overloading of “model” to also designate crystallographically identical copies in a “biological unit” of the functional molecule. The PDB should instead have defined a new term such as “instance” or “copy”. Given that initial infelicity, some challenge “model” sets represent true ensembles, where each model is an alternative structure for the same molecule, while others are used when there are simply too many (more than 62) chains, or fragments, to be expressed in classic PDB format. The ensemble versus biological-unit usages of “model” imply a different logic of analysis: the models in an ensemble do not interact with one another in covalent, H-bond, or steric contacts, while biological-unit type models do.

Crucial trivia: Model categories, and stumbling-blocks, for assessment

Ab initio models versus optimized models are clearly tackling very different tasks, and different steps in the process. Both tasks are important, but their assessment should be compared separately, and in some cases by different criteria. In addition, it turns out that in practice there was no clear distinction between models labeled as “optimized” versus “fitted another”; when the full method descriptions became available we learned that many of the models designated as optimized had actually used a starting model other than the cryoEM target PDB, and many others did not say one way or the other. This experience can help us formulate the questions and requirements more clearly next time around.

Within ab initio models there is also an important distinction not designated explicitly: whether segmentation between chains was done from scratch or chain boundaries were taken from the target. Segmentation is an important and difficult step for any truly unknown molecule, but it can only be meaningfully assessed if it was actually attempted. In a few Challenge cases we know that segmentation was done, because it was imperfect, and was done well, coming close to a match: for instance, for the T7 model 130_4 shown in Figure 2.

Figure 2.

Superposition of model (white) and target (peach) Cα backbones shows near-perfect application of segmentation step in ab initio model 130_4 submitted for γ-secretase target 7 (5a63; Bai 2015).

Once the submitted models became available, we discovered a variety of features in some of them that put stumbling-blocks in the way of meaningful automated assessment, in addition to the format problems mentioned above. Most of these involve model fragmentation, either real or artifactual. Optimized models were not fragmented within a chain unless their starting-points were, but automated ab initio models are almost always, quite justifiably, incomplete. The methods for model-to-target comparison adopted from CASP (Zemla 2003) assume that a prediction will cover the entire sequence and only assess the largest fragment; they had to be modified for the Challenge. Most crystallographic validation software works properly with a modest number of breaks for unseen and unmodeled sections in a chain (what we are calling “real” fragmentation). However, they rely on some cutoff thresholds of plausible bond lengths to detect chain breaks, since, unfortunately, covalent connectivity has never been an explicit feature of either PDB or cif format. Therefore, “artifactual” fragmentation occurs in many Challenge models when a first-cut, approximate starting model is allowed to have extremely loose geometry, as in 40° bond angles or 6Å C-C bond lengths (e.g., T7 model 181_2), causing two related problems. First, wildly over-long bonds will not be flagged as outliers, since the programs assume those atoms must not actually be connected despite their names. Second, criteria such as Ramachandran or CaBLAM use more than one residue and are undefined close to a chain break, so that only a fraction of the residues can be assessed in such a model, making overall scores very misleading. Such a model cannot be meaningfully assessed on its own, but only as to whether the related software can successfully progress from it to build a more final model.
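The break-detection logic described above can be made explicit. The sketch below labels each successive C(i)–N(i+1) link as a plausible peptide bond or an assumed chain break; the 2.5Å cutoff is an assumption for illustration (real programs differ). Reporting the “break” links, rather than silently skipping them, is what exposes artifactual fragmentation:

```python
import math

def classify_peptide_links(c_coords, n_coords, break_cutoff=2.5):
    """c_coords[i]: carbonyl C of residue i; n_coords[i]: amide N of
    residue i.  Returns one label per sequential link: 'ok' for a
    plausible peptide bond (ideal C-N is ~1.33 A), 'break' where
    validation software would assume a chain break and stop evaluating
    multi-residue criteria such as Ramachandran or CaBLAM."""
    labels = []
    for i in range(len(c_coords) - 1):
        d = math.dist(c_coords[i], n_coords[i + 1])
        labels.append("ok" if d <= break_cutoff else "break")
    return labels
```

Counting the fraction of residues adjacent to a “break” link gives a quick estimate of how much of a model the multi-residue criteria can actually assess.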

Cis-nonPro peptides

Especially for scoring ab initio models, the target is presumed correct; however, improvement by model optimization would be impossible if the target were perfect. We are particularly interested in finding local conformations where we can tell definitively that either the target or the Challenge model is significantly misfit, and then identifying the tools or strategies that would work best to avoid or correct specific types of systematic errors. By fortunate happenstance, Challenge target 6 provides an especially clear such case in the form of the extremely rare cis-nonPro peptide conformation. Cis versus trans is inherently two-state, and it is known that E. coli β-galactosidase has 3 and only 3 genuine cis-nonPro peptides: two at catalytic and binding sites near the ends of β-strands 2 and 8 of its TIM barrel domain, and the third on a Greek key connection of one of its β-barrel domains. The excellent 5a1a target at 2.2Å (Bartesaghi 2015) models those 3 cis-nonProlines correctly, but also models 6 other, incorrect ones. At 2.2Å resolution, this map density (EMD-2984) prefers the correct answer, whereas at 3 to 4Å resolution the density presumably could not distinguish. Figure 3a shows the very highly conserved TIM barrel cis-nonPros in 5a1a, Trp-Asp569 and Ser-His391, along with the map B density and Challenge models 119_2 and 192_2, all of which modeled the cis peptides and fit the density well. Figure 3b shows Challenge models 123_1 and 133_1, which fit trans peptides that do not fit the density at all convincingly. Figure 3c shows Gly-cis-Gly995, one of the incorrect cis-nonPro in 5a1a, and Figure 3d shows the clearly better trans conformation in the similarly shaped 1.6Å x-ray density of 4ttg (Wheatley 2015). Figure 3e shows the misfit Gly-Gly cis-nonPro in the broad 3.2Å map A density of the 3j7h target (Bartesaghi 2014).

Figure 3.

Analysis of genuine versus incorrect cis-nonPro conformations for target 6. a) Overlay of the map B target (5a1a; Bartesaghi 2015) and correctly built optimized Challenge models 119_1 and 192_1, for two genuine and functionally important cis-nonPro peptides at the β-galactosidase active site, showing their good fit to the 2.2Å density. b) Overlay of two incorrect trans peptides in optimized Challenge models 123_1 and 133_1, for the same residues shown in Fig. 3a, showing poor fit to map density. c) CaBLAM Cα-geometry outlier (red) as well as CaBLAM outlier (hotpink) on peptide Gly-Gly 995 in target 5a1a indicates a probable backbone modeling error for this non-proline built as cis. d) The same Gly-Gly 995 peptide in 4ttg (Wheatley 2015), at 1.6Å with no error flags and excellent fit to electron density, shows unambiguously that it should be trans and would better fit the density in panel c. e) The Gly-Gly 995 peptide in the lower-resolution target 6 map (3j7h; Bartesaghi 2014). In less informative density such as this, the CaBLAM outliers and multiple clashes can still guide model-builders away from an incorrect cis-nonPro conformation.

Two optimized and one ab initio Challenge models allowed only trans non-Pro, thus missing the 3 genuine ones but doing better statistically. The other optimized models matched very closely the cis-nonPro peptides fit in their starting structure (9 to 11 if they used one of the cryoEM targets, and 3 to 4 if they used an x-ray structure). The other ab initio models varied from 3 to 9 cis-nonPro, including only one correct example. Across all targets, ab initio models had up to 100 times too many cis-nonProlines (3%, rather than the 0.03% in quality-filtered reference data), and optimized models had up to 50 times too many, almost always kept from the target model. Similar over-use is also often seen in deposited PDB entries, cryoEM as well as x-ray.

It appears that in good density at 2–3Å, whenever a cis-nonPro is fit or is tempting, the trans version should be tried and optimized, to see which fits better. At 3–4Å, however, a cis-nonPro cannot be recognized from the density and is justifiable only if it is structurally or biochemically known to occur in closely related proteins, preferably with a functional role to support conservation. Two helpful rules of thumb are, first, that cis-nonPro are about 5 times more likely, and more than one cis-nonPro about 30 times more likely, in carbohydrate-active enzymes such as β-galactosidase than anywhere else (Williams 2018b); second, if a vicinal disulfide between adjacent Cys is present (extremely rare), then 2 of its 4 possible conformations are cis (Richardson 2017).

Cis-nonPro and twisted peptide occurrence is tabulated for the Challenge models in Table 1, and the strong presumption is that they should be zero for all targets other than T6 β-galactosidase. The best strategy, statistically, at 2.5–4Å is to allow no cis-nonPro, but that will miss the rare genuine examples that are almost always biologically important. This is one of the issues that demonstrates why trying for better than 3Å resolution data is truly worthwhile.

RNA validation

The appearance of nucleic-acid density in cryoEM maps is somewhat different than in x-ray maps at similar resolutions. Presumably because of their negative charge, phosphates are relatively weaker in cryoEM, although still visible and round at 3Å resolution, while the positively charged bases are stronger (Figure 4). However, by 4Å resolution, base-pair density forms a continuous slab along the stacking direction rather than separate pairs, so intermediate resolutions can be confusing. For nucleic acids, most model validation only checks covalent geometry (bond lengths and angles) and heavy-atom bumps. MolProbity provides the enhanced sterics of all-atom contact analysis, which is very diagnostic for either RNA or DNA (Word 1999). For RNA, it also includes two powerful criteria for backbone conformation, useful in model building as well as validation.

Figure 4.

CryoEM map density for target 8 map A at 2.9Å resolution (5afi; Fischer 2015) shows consistently higher sigma and better-defined contours for base pairs than for phosphates, presumably because of the negatively charged phosphates.

Ribose pucker is two-state in RNA (either C3’-endo or C2’-endo) unless captured in a transition state. This variable is extremely important because each of the two states is compatible only with entirely different relationships among the three base and backbone directions attached to the ribose ring. The pucker is directly observable in the density only at resolutions better than about 2Å. Fortunately, we discovered that pucker state can be very reliably determined from the robustly visible position of the phosphate and direction of the glycosidic bond joining the blobs of ribose and base (Richardson 2008; Methods).
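The Pperp construction reduces to point-to-line geometry: project the phosphate onto the line through the glycosidic bond and measure the perpendicular offset. A sketch, with the roughly 2.9Å threshold treated as approximate:

```python
import math

def pperp_pucker(c1p, n_glyco, p3, cutoff=2.9):
    """Diagnose ribose pucker from the 'Pperp' construction described in
    the text: the perpendicular distance from the 3' phosphate (p3) to
    the line through the glycosidic bond (C1' -> base N1/N9).  Larger
    distances indicate C3'-endo, smaller ones C2'-endo; the ~2.9 A
    cutoff here is approximate."""
    v = [b - a for a, b in zip(c1p, n_glyco)]   # glycosidic bond vector
    w = [b - a for a, b in zip(c1p, p3)]        # C1' -> phosphate
    t = sum(x * y for x, y in zip(v, w)) / sum(x * x for x in v)
    perp = math.sqrt(sum((w[k] - t * v[k]) ** 2 for k in range(3)))
    return "C3'-endo" if perp >= cutoff else "C2'-endo"
```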

After the discovery that RNA backbone conformation can be better represented if parsed as suite (sugar-to-sugar) rather than nucleotide (PO4-to-PO4) units (Murray 2003), a community-consensus set of 54 valid RNA backbone conformations was defined (Richardson 2008; Methods). These conformers cover only about 95% of genuine conformations, because of their 7-dimensional parameter space and relatively sparse dataset. However, they provide full-detail fragments for model building, 2-character definitions for RNA-structure comparisons, and very useful diagnostic validation for trying out possible corrections.

Table 2 shows these validation scores for Challenge models which contain RNA. Target 1 (tobacco mosaic virus) contains a single long RNA chain, with 3 nucleotides binding to each protein subunit. The target structure 4udv (Fromm 2015) has no ribose pucker or backbone suite outliers. The target 1 Challenge models either include no RNA or else follow the target in having no outliers.

Table 2.

RNA validation by ribose puckers and backbone suite conformers

Model  #RNA-residues  pucker-outliers  pucker-%out  suite-outliers  suite-%out
4udv, Target 1 3 0 0
T1 119_1 optimized 3 0 0
T1 123_1 optimized 3 0 0
T1 133_1 fitted another 3 0 0
T1 164_1 opt, no RNA 0
T1 192_1 optimized 3 0 0
T1 123_2 ab initio 3 0 0
T1 130_1 ab initio 3 0 0
T1 130_2 ab initio 3 0 0
T1 181_1 ab init, no RNA 0
T1 194_1 ab initio 3 0 0
5afi T8, mapA 4763 103 2.16% 858 18.0%
3ja1 T8, mapB 4690 280 5.97% 1114 23.8%
T8 120_1 A, optimized 4763 109 2.49% 859 18.0%
T8 131_1 A, optimized 4763 104 2.17% 858 18.0%
T8 192_1 B, optimized 4690 65 1.39% 1903 18.0%
T8 192_2 A, optimized 4763 40 0.91% 1045 21.9%
T8 130_1 B, ab initio 1580 14 0.89% 773 48.9%
T8 130_2 A, ab initio 2852 5 0.18% 633 22.2%

The target 8 70S ribosome is more interesting. With two very large and one small ribosomal RNAs, 3 tRNAs, and an RNA message, the 2.9Å 5afi target (Fischer 2015) has 103 pucker outliers (2.16%) and 858 suite conformer outliers (18%), as shown in Table 2. The ab initio models 130_1 and 130_2 did not fit all of the RNA. Within what they did fit, they perform worse than the target on suite conformers but much better on ribose puckers, with only 14 and 5 outliers respectively (0.89% and 0.18%), presumably because they used Phenix, which diagnoses pucker to enable pucker-specific target parameters in refinement (Adams 2010). The other four models were optimized. Model 120_1, and model 131_1 (after being re-sorted and hand-edited for all RNA and most proteins; see above), neither improved nor degraded the target. Models 192_1 and 192_2 did significantly better on pucker outliers and about the same on backbone suite conformers.

CaBLAM: Cα-only validation

The simplest use of CaBLAM is flagging probably wrong regions in Cα-only models. The Challenge includes 4 Cα-only models from 2 groups, plus one Cα-only cryoEM target structure (3cau, used by some for target 3; Ludtke 2008), and there is no other conformational validation for them. They range from 13.5% to 40.3% Cα-geometry outliers, averaging 23.5% (see Table 3). In some cases this may be an underestimate, because CaBLAM treats a Cα-Cα distance over 4.5Å as a chain break, and it cannot diagnose within 2 residues from a break. In comparison, the high-quality reference data has 0.5% Cα-geometry outliers. The targets average 1.5%, the full-backbone optimized models average 1.1% (an improvement over their targets), and the full-backbone ab initio models average 3.6%. This assessment therefore could potentially both evaluate and improve Cα-only models.
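The chain-break rule described above is simple to apply in practice. The following is a minimal sketch, assuming the 4.5Å Cα-Cα cutoff from the text and a ±2-residue undiagnosable zone around each break and chain end (the exact masking in CaBLAM's implementation may differ):

```python
import math

CA_BREAK_CUTOFF = 4.5  # Angstroms; a longer Calpha-Calpha step is treated as a chain break

def dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diagnosable_mask(ca_coords):
    """Mark which residues of a Calpha-only chain can be scored:
    residues within 2 positions of a chain break, or of either chain
    end, are masked out as undiagnosable."""
    n = len(ca_coords)
    ok = [True] * n
    # a "break" between residues i and i+1 is recorded at index i
    breaks = [i for i in range(n - 1)
              if dist(ca_coords[i], ca_coords[i + 1]) > CA_BREAK_CUTOFF]
    for i in range(n):
        near_end = i < 2 or i >= n - 2
        near_break = any(b - 1 <= i <= b + 2 for b in breaks)
        if near_end or near_break:
            ok[i] = False
    return ok
```

A chain with one large gap loses four residues around the gap plus two at each end, which is why outlier percentages on heavily fragmented models can be underestimates.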

Table 3-.

Cα-only models: CaBLAM Cα-geometry outliers

Model                #residues  Cα-geo out  Cα out%
T1 181_2 ab initio         155          27    17.4%
T2 181_2 ab initio         199          27    13.6%
3cau, Target 3            7299        1832    25.1%
T4 194_1 ab initio         375         101    26.9%
T5 181_2 ab initio         151          27    17.9%

As a specific case, Figure 5 shows two helices from Cα-only model 194_1 for the target 4 TRPV1 channel at 3.3Å resolution (3j5p; Liao 2013). The first example has good helical conformation at both ends, but an extremely irregular section partway through. This is the kind of case CaBLAM can help diagnose even with just the Cα's, showing favorable and helical Cα virtual dihedrals on both sides (blue) but 4 out of 5 outliers in the irregular section (red). CaBLAM's database includes real helix irregularities such as a proline bend or a widened turn, which would score as allowed. Proper correction to a long, straight helix currently depends on the user's common sense, however. The second example is so unusually deviant in virtual angles and bond lengths (yellow) that CaBLAM cannot score secondary structure in it at all. Looking down from one end, however (Figure 5c,d), a person can see that in both cases the Cα's spread out in a long, straight, round cylinder the right size and shape for a helix. This could also be "seen" by a program that finds helix density at 6–8Å resolution.

Figure 5 -.

CaBLAM validation of Cα-only models. a) Side view of a Cα-only helix for residues 476–501 in model 194_1 for TRPV1 target 4 (3j5p; Liao 2013). It has two well-built ends and a highly non-helical central section of outliers, the kind of error CaBLAM marks and can guide the model builder to repair. b) From the same model, a Cα-only region for residues 293–308, so incorrectly built that all residues are either Cα-geometry virtual-dihedral outliers (red) or virtual-angle outliers (yellow) and CaBLAM cannot recognize it as helix. c) and d) End-on views of the Cα-only models in 5a and 5b, showing an observant model builder that both areas have the normal shape and size of helices and should be built as such.

CaBLAM: Diagnosing helix and beta in full-backbone models

In Table 3, CaBLAM's secondary-structure diagnosis is reported as overall percentages for each Challenge model. In practical use, one should of course instead be guided by the local annotations along the sequence. Those are conservative and integrate across several residues, so if CaBLAM scores a significantly non-zero probability of α or of β, one should definitely try modeling a regular α or β conformation, and even try extending it a bit at either end. These CaBLAM annotations use the pattern of multiple Cα virtual dihedrals, which is very different information from matching against ideal secondary-structure fragments or from secondary-structure density analysis at lower resolution. In difficult cases perhaps all these assessments could be combined -- always with a bias toward more regularity than is immediately apparent.

CaBLAM outliers: Flipped peptides and sequence misalignments

The most characteristic and powerful feature of the novel CaBLAM parameter space is that it can assess whether the modeled relationship between two successive CO directions (poorly determined at 2.5–4Å resolution) is compatible with the surrounding Cα-trace (relatively well determined, as defined across 5 residues by two Cα virtual dihedrals). 3-dimensional combinations of the CO dihedral and the two Cα virtual dihedrals seen for less than 1% of our reference data are flagged as CaBLAM outliers (see Methods). Figure 6a shows an otherwise regular α-helix interrupted by an incorrect peptide flipped nearly backward and also cis. This error is not flagged by any covalent geometry or Ramachandran criteria, but is reported by CaBLAM, with an outlier (magenta) and a disfavored (purple) residue, and by clashes. This sort of issue is common in the Challenge models.
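The Cα virtual dihedrals at the heart of this parameter space are ordinary dihedral angles computed over four consecutive Cα positions. The sketch below shows the geometry only; the exact atom windows, the third (CO-based) dimension, and the reference contours are defined by CaBLAM's own implementation, so the window choices here are illustrative assumptions:

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    sub = lambda a, b: [a[k] - b[k] for k in range(3)]
    cross = lambda a, b: [a[1] * b[2] - a[2] * b[1],
                          a[2] * b[0] - a[0] * b[2],
                          a[0] * b[1] - a[1] * b[0]]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n0, n1 = cross(b0, b1), cross(b1, b2)
    b1_len = math.sqrt(dot(b1, b1))
    m = cross(n0, [x / b1_len for x in b1])
    return math.degrees(math.atan2(dot(m, n1), dot(n0, n1)))

def ca_virtual_dihedrals(ca, i):
    """The two overlapping Calpha virtual dihedrals spanning the 5-residue
    window around residue i: one over CA(i-2..i+1), one over CA(i-1..i+2)."""
    return (dihedral(ca[i - 2], ca[i - 1], ca[i], ca[i + 1]),
            dihedral(ca[i - 1], ca[i], ca[i + 1], ca[i + 2]))
```

For an ideal α-helix (roughly 2.3Å radius, 1.5Å rise, 100° turn per residue) both virtual dihedrals come out near 50° in magnitude, which is why consistent values in that range read as helix even with no other backbone atoms present.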

Figure 6 -.

Diagnosis of backward-fit peptides and sequence misalignments within helices. a) For a peptide within what should be helix in PDB entry 3ja8 (Li 2015), CaBLAM outliers, clashes, and an improbable cis-nonPro identify an incorrectly fit peptide with its CO orientation needing a near-180° flip. b) The target 1 TMV structure (4udv; Fromm 2015), showing well-built, regular α-helix across the RNA-binding area (residues 110–130). c) Model 181_1 for target 1 starts a sequence misalignment (brown backbone) in the RNA-binding area by incorrectly switching from α-helix to legal but incorrect 3₁₀-helix conformation. The misalignment shows rotamer outliers (gold) but no Ramachandran or other traditional error flags. However, the backbone contortions needed to bring it back into alignment at the end of the misalignment generate many clashes, and both a CaBLAM outlier (hotpink) and a Cα-geometry outlier (red) mark modeling errors.

A less common, but even more serious, problem is shown in Figure 6b and c. The target 1 TMV α-helix in 4udv is completely regular, but model 181_1 narrows to legal but incorrect 3₁₀ conformation at the RNA-binding site, starting a 10-residue sequence misalignment. Again there are no geometry or Ramachandran outliers, but multiple clashes and CaBLAM outliers flag the backbone contortions needed to bring it back into sequence alignment. CaBLAM outliers range between 1 and 10% in the target structures, and from 0 to 29% in the Challenge models, averaging 5.9%. Nearly all of those mark genuine errors, many of which are fixable once seen.

Restraints on model properties

Covalent bonds, angles, and planarity need restraints even at high resolution, usually applied according to the ESDs seen in high-quality reference data such as small-molecule crystallography. In the 2.5–4Å resolution range, these restraints are typically tighter, since the map density is too broad to convincingly justify occasional, genuine, larger departures from ideality. The bond angle plots in Figure 7a show, first, the expected normal distribution given the parameters and ESDs we use from the Phenix libraries, and then the angle distributions for all models from 11 different Challenge predictor groups. Ideal geometry values vary somewhat between different compilations (for instance, group 193 (Wang 2018) used the values in the CHARMM force field), but those differences are too small to produce 4σ geometry outliers. All groups, quite properly, show very tight angle distributions. That tightness has one unfortunate side effect: the Cβ-deviation validation criterion (Lovell 2003) will never have outliers, and thus cannot report on misfittings like backward-fit Cβ-branched sidechains. However, very tight geometry is necessary for refinement at these resolutions, and fortunately it never forces conformations further wrong by large amounts, because these parameters are single-valued with only one energy minimum.
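The 4σ flagging convention amounts to a simple z-score against each angle type's target value and ESD. A minimal sketch follows; the target/ESD numbers here are illustrative placeholders, not the actual Phenix library entries:

```python
# Illustrative (target degrees, ESD) pairs -- placeholders for this sketch,
# not the actual restraint-library values.
IDEAL_ANGLES = {
    "N-CA-C": (111.0, 2.7),
    "CA-C-N": (117.2, 2.2),
    "CA-C-O": (120.1, 2.1),
}

def angle_outliers(measured, sigma_cutoff=4.0):
    """Flag bond angles whose deviation from the target exceeds
    sigma_cutoff ESDs; returns (name, value, z-score) triples."""
    flagged = []
    for name, value in measured:
        ideal, esd = IDEAL_ANGLES[name]
        z = (value - ideal) / esd
        if abs(z) > sigma_cutoff:
            flagged.append((name, value, round(z, 1)))
    return flagged
```

Under tight refinement weights essentially no angle ever reaches 4σ, which is exactly the side effect noted above for the Cβ-deviation criterion.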

Figure 7 -.

Comparison of Challenge model group data distributions versus the high-quality reference data distributions. a) Reference (top left) and individual groups' backbone bond angle distributions, where the horizontal axis is the number of ESDs out from the target value for each angle. Color code: C-N-Cα orange, N-Cα-C (tau) purple, Cα-C-N yellow, Cα-C-O green, N-C-O black, C-Cα-Cβ dark blue, and N-Cα-Cβ light blue. b) Reference Ramachandran general-case data and contours (top left) and each of 11 individual model groups' Ramachandran ϕ,ψ points for all general-case residues in the context of the reference contours.

Other validation criteria such as Ramachandran or rotamers have multiple minima, but are also now often being restrained. This makes minor conformational improvements in many places and cosmetically improves validation scores, but it pushes common fitting errors further down into the wrong local energy well and actually makes those errors even worse than they were. Figure 7b shows, first, the reference general-case Ramachandran distribution for comparison, then composite general-case Ramachandran plots for all models from each of 11 Challenge groups. Nearly all groups have pushed Ramachandran-plot ϕ,ψ values into the nearest allowable region, usually up to very high contour levels, but in quite different and sometimes strange patterns. The most serious problem with this sort of restraint is that peptide orientations are very unreliable at these resolutions, and when a peptide is fit wrong by 60–180° it also puts both adjacent Ramachandran-plot points in wildly wrong places.

Figure 8 shows an example of this unfortunate problem, for a small β-sheet in the target 3 GroEL structure. Since there were no coordinates deposited for the target EMD-6422 map, Challenge model 119_1 optimized a fit from the original 1grL crystal structure at 2.7Å resolution. The model looks clean in this area, with no traditional geometry or conformational outliers, although the extremely sparse β H-bonds are very suspicious (Figure 8a). Clashes and CaBLAM outliers flag probable local errors (Figure 8b). When 119_1 (in brown) is superimposed on the 1.7Å crystal structure of this domain (dark green in Figure 8c), it is clear that CaBLAM outliers flag 4 peptide orientations which are incorrect by 100–180°. Most tellingly, Figure 8d shows the Ramachandran-plot locations of the 8 incorrect model values versus the correct values at 1.7Å, all but one of which has been shifted to entirely the wrong region of the plot. These quite major misfittings have been hidden from classic validation, and probably each made even more incorrect, by the use of Ramachandran restraints in refinement. Such cases are ubiquitous at these resolutions, not just in Challenge models but also in many deposited PDB entries. One well-studied instance is the 744 ϕ,ψ shifts ≥45° (mostly flipped peptides) and 132 cis-trans shifts (mostly cis-nonPro) identified and corrected in the 3ja8 MCM cryoEM structure (Li 2015) by Tristan Croll (Croll 2018).
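A tally of large ϕ,ψ shifts between two models of the same chain is easy to reproduce in outline. The sketch below takes two aligned lists of per-residue (ϕ, ψ) pairs and counts residues shifted by at least a cutoff; it is a hedged illustration of the kind of comparison described, not the tool used in the cited study:

```python
def angular_diff(a, b):
    """Smallest absolute separation in degrees between two dihedral angles,
    accounting for wraparound at +/-180."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def count_large_phi_psi_shifts(model_a, model_b, cutoff=45.0):
    """Count residues whose phi or psi differs by >= cutoff degrees between
    two models of the same sequence (each a list of (phi, psi) tuples)."""
    return sum(1 for (pa, sa), (pb, sb) in zip(model_a, model_b)
               if angular_diff(pa, pb) >= cutoff or angular_diff(sa, sb) >= cutoff)
```

A flipped peptide typically moves the ψ of one residue and the ϕ of the next by roughly 180° each, so such flips dominate any count made with a 45° cutoff.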

Figure 8 -.

How Ramachandran restraints in refinement can go badly wrong. a) Target 3 GroEL target map and Challenge model 119_1 for a β-sheet in the apical domain. Traditional geometry validation measures flag no errors in this region, although the low H-bond population is a reason for concern. b) With full validation measures run, a few clashes and a cis-nonPro appear, and CaBLAM marks several regions probably needing peptide CO rotations and/or other backbone adjustments. c) Comparison of model 119_1 β-sheet backbone (brown, red O balls) with superposed 1.7Å 1srv x-ray structure (Walsh 1999; dark green) clearly shows 4 peptides requiring large rotations as flagged by CaBLAM in panel b. d) Model 119_1 ϕ, ψ values (red balls) for each residue adjoining those incorrect peptides. Arrows from those red balls to the correct positions in 1srv show that all but one of the 8 ϕ, ψ values were not just slightly shifted but lay in entirely the wrong Ramachandran region.

Fortunately, there is a feasible strategy that we believe can avoid most cases of this serious issue. After initial model-building but before any refinement, run CaBLAM diagnosis, then try possible correction of orientation for each of the two peptides surrounding each CaBLAM outlier, briefly refine each, evaluate all-atom clashes and other problems, and look for new backbone H-bonding. After such corrections are well fit into the correct local minimum, refinement could then include H-bond and/or Ramachandran restraints, to maintain the better secondary structures and more physically reasonable conformations.
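The loop just described can be expressed as a small driver. Everything below is a hypothetical sketch: try_flip and score are stand-ins for real tools (modeling the flip plus brief local refinement, and clash/H-bond evaluation respectively), not an existing API, and lower scores are assumed to be better:

```python
def fixup_pass(outlier_residues, baseline_score, try_flip, score):
    """For each CaBLAM outlier residue, trial-flip each of the two flanking
    peptides (hypothetical try_flip performs the flip plus brief local
    refinement), score each candidate with the hypothetical score callable,
    and accept only corrections that beat the unmodified baseline."""
    accepted = []
    for res in outlier_residues:
        # the two peptides flanking residue res, indexed here by their
        # preceding residue number
        trials = [(score(try_flip(res, pep)), pep) for pep in (res - 1, res)]
        best_score, best_pep = min(trials)
        if best_score < baseline_score:
            accepted.append((res, best_pep))
    return accepted
```

Only after the accepted corrections land in the right local minimum would full refinement, now safely including H-bond or Ramachandran restraints, be run.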

Discussion

The Model Challenge assessment experience has, for us, further confirmed the observation that 3 to 4Å is an especially confusing resolution range. There is indisputably more, and more detailed, information content than at lower resolutions, but some of that detail is actively misleading: for instance, seeming to show that a very irregular helix or an extremely rare conformation such as a cis-nonPro is justified because it appears to fit the density slightly better. Near 2Å resolution, map density follows the backbone clearly and carbonyl oxygens are nearly always visible, so that α-helical density spirals around an empty axis and peptide orientations are clear. Near 6Å resolution, β-sheet is a smooth slab and α-helices are cylindrical tubes with maximum density on the axis. Both low-resolution shapes are relatively featureless but quite clearly recognizable, and approximate strand orientation can even be inferred from the slab’s twist (Richardson 2016). Between 2Å and 6Å, density shape is transitioning between these very distinct regimes, and it does not do so uniformly. At 3–4Å there are many false breaks along backbone and false connections across H-bonds, influenced by conformational details and especially by size and position of neighboring sidechains. Some of this confusion starts even at 2.5Å, both for x-ray and for cryoEM. Nucleic acids make their own awkward transition, from clear base-pairs to density slabs along the base-stacking direction, at a somewhat lower resolution than for proteins, between 3 and 5Å.

We suggest four lessons, and proposals, for working effectively in this exciting but awkward 2.5 to 4Å resolution range, based upon: 1) lowered resolution, 2) multi-residue validation, 3) pre-refinement fixups, and 4) small ensembles.

The first is to identify secondary structure at an effectively lower resolution, mimicking 6 to 8Å by further smoothing. Helix and sheet recognition techniques are available from pre-"revolution" cryoEM (e.g., Baker 2007), and we hypothesize that negative sharpening could be tuned to produce a similarly diagnostic level of smoothing. Resolution-exchange molecular dynamics (Wang 2018) should have some of this useful effect, and independent information can also be added by secondary-structure prediction, comparison with related structures, or CaBLAM secondary-structure probabilities. Then, working at the data resolution, assign helix and strand directions and emphasize their regularity in modeling and then in refinement.

Second, validation metrics that integrate information across multiple residues are needed at these resolutions. CaBLAM outliers are the most available and broadly useful such criterion at present, effective up to 4Å, and no one is yet restraining them in refinement. EMRinger outliers are an innovative way of using sidechains to report on the quality of backbone conformation, effective up to about 3Å resolution (Barad 2015). We, and hopefully others, will be developing additional multi-residue criteria such as the completeness and strength of backbone H-bonds, an especially sensitive indicator for β-sheet that can already be judged by eye from all-atom contacts or other H-bond representations.
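Backbone H-bond completeness can be approximated even without explicit hydrogens. The sketch below uses a crude N···O distance screen with rule-of-thumb values (the 3.5Å cutoff and 2-residue sequence separation are common heuristics, assumed here; this is a stand-in for, not the definition of, the all-atom-contact analysis):

```python
import math

def backbone_hbonds(n_coords, o_coords, max_no=3.5, min_seq_sep=2):
    """Crude backbone H-bond screen: pair residue i's amide N with residue
    j's carbonyl O when they lie within max_no Angstroms and are at least
    min_seq_sep residues apart in sequence. Returns (i, j) index pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    pairs = []
    for i, n in enumerate(n_coords):
        for j, o in enumerate(o_coords):
            if abs(i - j) >= min_seq_sep and dist(n, o) <= max_no:
                pairs.append((i, j))
    return pairs
```

Comparing the count of such pairs against the number expected for fully H-bonded β-strand or helix gives a rough completeness fraction of the kind discussed above.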

Third, many local conformations will initially be modeled in incorrect local minima, which should be corrected as far as feasible before restrained refinement may hide those problems (as seen in Figure 8). CaBLAM outliers most often turn out to be flagging a highly deviant peptide orientation, for which possible corrections can be tried either automatically (as above) or manually. Such a process is most powerful at the initial model stage before refinement, but can also be done later or on deposited PDB entries.

Finally, both structural biologists and end-users need some way to estimate the level of uncertainty in a model at these resolutions, and we can say with good assurance that no current validation measures provide that. The methods used for Challenge modeling can produce quite reasonable ab initio starting models and can optimize with some improvements and few degradations from the carefully worked-over deposited targets, which is an admirable achievement. There are methods that can often successfully make concerted shifts into new density when there is a large conformational change from the starting model. However, it is extremely rare that a local conformation in the wrong energy minimum can be corrected by optimization procedures; those require explicit sampling of the allowable alternatives.

Most of the Challenge models rate rather similarly by overall scores. All show local regions which are incorrect, but typically those problems are different and in different places between models. This means that these methods constitute a very valuable resource in a perhaps unintended way. At 3Å, and most especially at 4Å, the density plus current technology in modeling, refinement and validation is equally compatible with multiple, significantly distinct models. At the current stage of development, therefore, a serious practitioner solving a new structure would be well advised to use and compare three quite different methods for building initial models. For instance, phenix.map_to_model (Terwilliger 2018), PathWalker (Chen 2016), and EMRosetta (Wang 2016) are all readily available and each uses very different algorithms. The idea is not to pick one model by overall scores, but to find where they differ locally (by backbone dihedrals, or by maximum distance between the same Cα or O atoms or sidechain ends), look closely to pick the best alternative for each local region, and also report on those differences. Sampling of the possibilities is also helped by including a method that explicitly produces an ensemble, as was done by group 183 in the Challenge. A similar strategy could help at any stage: segmentation, sequence alignment, flexible fitting, or final refinement. As methodology develops further, we hope that local selection from a large explicit sample of candidate models will become a routine part of the automated tools.
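Finding where independently built models disagree locally can be automated. A minimal sketch follows, assuming the models are already superposed with matching residue numbering; the 2Å cutoff is an arbitrary illustration, not a recommended threshold:

```python
import math
from itertools import combinations

def local_disagreement(models, cutoff=2.0):
    """Per-residue maximum pairwise Calpha-Calpha distance across several
    superposed models (each a list of Calpha coordinate tuples); residues
    exceeding the cutoff in Angstroms are flagged for close inspection."""
    n_res = len(models[0])
    flagged = []
    for i in range(n_res):
        worst = max(math.dist(a[i], b[i]) for a, b in combinations(models, 2))
        if worst > cutoff:
            flagged.append((i, round(worst, 2)))
    return flagged
```

The same scan could be run on O atoms or sidechain tips to catch the flipped-peptide and misaligned-sequence cases discussed earlier, since those often leave the Cα trace nearly unchanged.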

As a crucial part of estimating uncertainty, both producers and users of 2.5 to 4Å macromolecular structures need to realize that traditional validation scores often make those structures look better than they really are.

Acknowledgements

This work was supported by the National Institutes of Health [grant numbers R01-GM073919 to DCR, P01-GM063210 Project IV to JSR]. We thank Cathy Lawson for the organizational support of the CryoEM Challenge and Andriy Kryshtafovych for the statistical analyses on the model-comparison website.

Abbreviations

EMDB

Electron Microscopy Data Bank

PDB

Protein Data Bank

Tx yyy_z, or T000xEMyyy_z

Target x model yyy_z, the zth submitted model for Challenge target x from modeling group yyy (e.g., T0001EM123_2)

H-bond

hydrogen-bond

CaBLAM

Cα-Based Low-resolution Annotation Method

Cis-nonPro

a cis peptide preceding a non-proline residue


References

1. Adams PD, Afonine PV, Bunkóczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung L-W, Kapral GJ, Grosse-Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, Zwart PH 2010. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Cryst. D 66, 213–221.
2. Bai X, Yan C, Yang G, Lu P, Ma D, Sun L, Zhou R, Scheres SHW, Shi Y 2015. An atomic structure of human gamma-secretase. Nature 525, 212–217. [5a63]
3. Baker M, Ju T, Chiu W 2007. Identification of secondary structure elements in intermediate-resolution density maps. Structure 15, 7–19.
4. Barad BA, Echols N, Wang RY-R, Cheng Y, DiMaio F, Adams PD, Fraser JS 2015. EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy. Nat. Meth. 12, 943–946.
5. Bartesaghi A, Matthies D, Banerjee S, Merk A, Subramaniam S 2014. Structure of beta-galactosidase at 3.2-Å resolution obtained by cryo-electron microscopy. Proc. Natl. Acad. Sci. USA 111, 11709–11714. [3j7h]
6. Bartesaghi A, Merk A, Banerjee S, Matthies D, Wu X, Milne J, Subramaniam S 2015. 2.2Å resolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor. Science 348, 1147–1151. [5a1a]
7. Chen M, Baldwin PR, Ludtke SJ, Baker ML 2016. De novo modeling in cryo-EM density maps with Pathwalking. J. Struct. Biol. 196, 289–298.
8. Chen VB, Arendall WB III, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC 2010. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Cryst. D 66, 12–21.
9. Croll TI 2015. The rate of cis-trans conformation errors is increasing in low-resolution crystal structures. Acta Cryst. D 71, 706–709.
10. Croll TI 2018. ISOLDE: a physically realistic environment for model building into low-resolution electron-density maps. Acta Cryst. D, in press.
11. Davis IW, Murray LW, Richardson JS, Richardson DC 2004. MolProbity: structure validation and all-atom contact analysis for nucleic acids and their complexes. Nucl. Acids Res. 32, W615–W619.
12. Fischer N, Neumann P, Konevega AL, Bock LV, Ficner R, Rodnina MV, Stark H 2015. Structure of the E. coli ribosome-EF-Tu complex at <3Å resolution by Cs-corrected cryo-EM. Nature 520, 567–570. [5afi]
13. Fromm SA, Bharat TAM, Jakobi AJ, Hagen WJH, Sachse C 2015. Seeing tobacco mosaic virus through direct electron detectors. J. Struct. Biol. 189, 87–97. [4udv]
14. Hintze BJ, Lewis SM, Richardson JS, Richardson DC 2016. MolProbity's ultimate rotamer-library distributions for model validation. Proteins: Struc. Func. Bioinf. 84, 1177–1189.
15. Jain S, Richardson DC, Richardson JS 2015. Computational methods for RNA structure validation and improvement, Chapter 7, in: Woodson S, Allain F (Eds.), Structures of Large RNA Molecules and Their Complexes, Methods in Enzymology series, vol. 558. Elsevier, Oxford, UK, pp. 181–212. (eBook ISBN 9780128019368)
16. Kryshtafovych A, Adams PD, Lawson CL, Chiu W 2018. Evaluation system and web infrastructure for the second cryo-EM model challenge. J. Struct. Biol., online.
17. Lawson CL, Patwardhan A, Baker ML, Hryc C, Garcia ES, Hudson BP, Lagerstedt I, Ludtke SJ, Pintilie G, Sala R, Westbrook JD, Berman HM, Kleywegt GJ, Chiu W 2016. EMDataBank unified data resource for 3DEM. Nucl. Acids Res. 44, D396–D403.
18. Lawson CL, Kryshtafovych A, Chiu W, Adams PD, Brunger A, et al. 2018. CryoEM models and associated data submitted to the 2015/2016 EMDataBank Model Challenge, 10.5281/zenodo.1165999
19. Li N, Zhai Y, Zhang Y, Li W, Yang M, Lei J, Tye BK, Gao N 2015. Structure of the eukaryotic MCM complex at 3.8Å. Nature 524, 186–191. [3ja8]
20. Li X, Mooney P, Zheng S, Booth CR, Braunfeld MB, Gubbens S, Agard DA, Cheng Y 2013. Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nat. Methods 10, 584–590. [3j9i]
21. Liao M, Cao E, Julius D, Cheng Y 2013. Structure of the TRPV1 ion channel determined by electron cryomicroscopy. Nature 504, 107–112. [3j5p]
22. Lovell SC, Davis IW, Arendall WB III, de Bakker PIW, Word JM, Prisant MG, Richardson JS, Richardson DC 2003. Structure validation by Cα geometry: ϕ,ψ and Cβ deviation. Proteins: Struct. Funct. Genet. 50, 437–450.
23. Ludtke SJ, Baker ML, Chen DH, Song JL, Chuang DT, Chiu W 2008. De novo backbone trace of GroEL from single particle electron cryomicroscopy. Structure 16, 441–448. [3cau]
24. Murray LW, Arendall WB III, Richardson DC, Richardson JS 2003. RNA backbone is rotameric. Proc. Natl. Acad. Sci. USA 100, 13904–13909.
25. Read RJ, Adams PD, Arendall WB III, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lütteke T, Otwinowski Z, Perrakis A, Richardson JS, Sheffler WH, Smith JL, Tickle IJ, Vriend G, Zwart PH 2011. A new generation of crystallographic validation tools for the Protein Data Bank. Structure 19, 1395–1412.
26. Richardson JS, Schneider B, Murray LW, Kapral GJ, Immormino RM, Headd JJ, Richardson DC, Ham D, Hershkovits E, Williams LD, Keating KS, Pyle AM, Micallef D, Westbrook J, Berman HM 2008. RNA backbone: consensus all-angle conformers and modular string nomenclature. RNA 14, 465–481.
27. Richardson JS, Keedy DA, Richardson DC 2013. The plot thickens: more data, more dimensions, more uses, in: Bansal M, Srinivasan N (Eds.), Biomolecular Forms and Functions: A Celebration of 50 Years of the Ramachandran Map. World Scientific Publishing, Singapore, pp. 46–61. (ISBN 978-981-4449-13-27)
28. Richardson JS, Prisant MG, Richardson DC 2013. Crystallographic model validation: from diagnosis to healing. Curr. Opin. Struct. Biol. 23, 707–714.
29. Richardson DC, Richardson JS 2016. Fitting Tip #12 - Twist tells: better β strands at ≥3.5Å in x-ray or cryoEM. Comp. Cryst. Newsletter 7, 16–19.
30. Richardson JS, Videau LL, Williams CJ, Richardson DC 2017. Broad analysis of vicinal disulfides: occurrences, conformations with cis or with trans peptides, and functional roles including sugar binding. J. Mol. Biol. 429, 1321–1335.
31. Richardson JS, Williams CJ, Hintze BJ, Chen VB, Prisant MG, Videau LL, Richardson DC 2018. Model validation -- local diagnosis, correction, and when to quit. Acta Cryst. D 74, 132–142.
32. Richardson JS, Richardson DC 2018. New help to make your 2.5–4Å cryoEM structures even better. Comp. Cryst. Newsletter 9, 21–24.
33. Terwilliger TC 2018. Automatic map interpretation with map_to_model. https://www.youtube.com/watch?v=ZYcG8dlmc18
34. Walsh MA, Dementieva I, Evans G, Sanishvili R, Joachimiak A 1999. Taking MAD to the extreme: ultrafast protein structure determination. Acta Cryst. D 55, 1168–1173. [1srv]
35. Wang RY-R, Song Y, Barad BA, Cheng Y, Fraser JS, DiMaio F 2016. Automated structure refinement of macromolecular assemblies from cryoEM maps using Rosetta. eLife 5, e17219.
36. Wang Y, Shekhar M, Thifault D, Williams C, McGreevy R, Richardson J, Singharoy A, Tajkhorshid E 2018. Constructing atomic structural models into cryo-EM densities using molecular dynamics - pros and cons. J. Struct. Biol., this issue.
37. Wang Z, Hryc CF, Bammes B, Afonine PV, Jakana J, Chen DH, Liu X, Baker ML, Kao C, Ludtke SJ, Schmid MF, Adams PD, Chiu W 2014. An atomic model of brome mosaic virus using direct electron detection and real-space optimization. Nat. Commun. 5, 4808. [3j7L]
38. Wheatley RW, Juers DH, Lev BB, Huber RE, Noskov SY 2015. Elucidating factors important for monovalent cation selectivity in enzymes: E. coli beta-galactosidase as a model. Phys. Chem. Chem. Phys. 17, 10899–10909. [4ttg]
39. Williams CJ, Hintze BJ, Richardson DC, Richardson JS 2013. CaBLAM identification and scoring of disguised secondary structure at low resolution. Comp. Cryst. Newsletter 4, 33–35.
40. Williams CJ, Richardson JS 2015a. Fitting Tips #9: avoid excess cis peptides at low resolution or high B. Comp. Cryst. Newsletter 6, 2–6.
41. Williams CJ 2015b. Using C-alpha geometry to describe protein secondary structure and motifs, PhD dissertation, Department of Biochemistry, Duke University, 248 pages.
42. Williams CJ, Hintze BJ, Headd JJ, Moriarty NW, Chen VB, Jain S, Prisant MG, Lewis SM, Videau LL, Keedy DA, Deis LN, Arendall WB III, Verma V, Snoeyink JS, Adams PD, Lovell SC, Richardson JS, Richardson DC 2018. MolProbity: more and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315.
43. Williams CJ, Videau LL, Hintze BJ, Richardson DC, Richardson JS 2018. Cis-nonPro peptides: genuine occurrences and their functional roles. (forthcoming)
44. Word JM, Lovell SC, LaBean TH, Zalis ME, Presley BK, Richardson JS, Richardson DC 1999. Visualizing and quantitating molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J. Mol. Biol. 285, 1711–1733.
45. Zemla A 2003. LGA: a method for finding 3D similarities in protein structures. Nucl. Acids Res. 31, 3370–3374.
