Protein Science. 2015 Mar 11;24(5):661–669. doi: 10.1002/pro.2639

Estimation of the quality of refined protein crystal structures

Jimin Wang 1,*
PMCID: PMC4420517  PMID: 25581292

Abstract

Crystallographic Rwork and Rfree values, which are measures of the ability of the models of macromolecular structures to explain the crystallographic data on which they are based, are often used to assess structure quality. It is widely known, and confirmed here, that both are sensitive to the methods used to compute them, and can be manipulated to improve the apparent quality of the model. As an alternative, it is proposed here that the quality of crystallographic models should be assessed using a global goodness-of-fit metric, RO2A/Rwork, where RO2A is the number of reflections used for refinement divided by the number of nonhydrogen atoms in the structure, and Rwork is the working R-factor of the refined structure. Also, analysis of structures in the Protein Data Bank suggests that many data sets have been truncated at high resolution, thereby improving the R-factor statistics. To discourage this practice, it is proposed that the resolution of a data set be defined as the resolution of the shell of data where <I/σI> falls to 1. The proposed goodness-of-fit metric encourages investigators to use all the data available rather than a truncated subset.

Keywords: resolution, protein quality, statistical gaming, statistical manipulation, free R-factors, working R-factors, data truncation

Introduction

A large number of statistics are used to assess the geometric and crystallographic quality of macromolecular crystal structures.1 Unlike the statistics used to gauge geometric quality, which are largely independent of the methods used to solve structures, those used to characterize crystallographic quality such as the agreement R-factors for the measured data (i.e., Rmerge, Rmeas, RPIM, CC1/2, etc.) and to judge the correspondence between the model being refined and the measured data (i.e., Rwork and Rfree) are sensitive to the decisions made consciously by crystallographers as they solve structures, or in some cases without their knowledge, by the programs they use. These statistics can make structures appear better than they really are.

Crystallographic quality statistics can be “improved” by systematically excluding weak-intensity, high-resolution (WIHR) reflections from data sets. The net effect is to trade an apparent improvement in structural quality for a reduction in resolution.2,3 The resolution of the crystal structure of a macromolecule is the single most important determinant of its quality. However, while everyone is sure they know what the word “resolution” means, there is no generally accepted method for estimating it. Worse, investigators may omit available high-resolution data during structure refinement because by doing so they can improve the apparent working and free R-factors of the resulting structures. One reason investigators may do so is that the free R-factor, rather than resolution, is commonly used today as a measure of structure quality.1,4 For example, the structure with PDB code 4HYO, which was solved at a resolution of 1.65 Å with an Rfree value of 18.1%, is likely to be regarded more favorably than the structure of the same molecule with PDB code 3LDC, which was solved at a resolution of 1.45 Å with an Rfree of 20.2%.5,6 The tendency of editors and authors to have this preference is perverse.7

There are many examples in the literature of the benefits to be gained by using all of the WIHR data available when refining structures. For example, over a decade ago, the structure of GroEL reported in 1DER was rerefined (PDB accession: 1KP8) using all of the WIHR data between a resolution of 2.0 Å and 2.4 Å, the latter being the resolution of the 1DER structure.8,9 At 2.4 Å, <I/σI> is 1.0 in the highest resolution shell, but at 2.0 Å, <I/σI> is only 0.5. The inclusion of these WIHR data made it possible to correct several significant errors in the 1DER structure, and upon further refinement, a structure was obtained (1KP8) that has both working and free R-factors more than 10% lower than those reported for 1DER out to a resolution of 2.4 Å (Fig. 1). It was argued then that the omission of the WIHR data from the data set used to refine the 1DER structure had trapped it in a local minimum from which it could not escape until the WIHR data were taken into account. Here, evidence will be presented that most of the structures in the Protein Data Bank (PDB) would probably be improved if they were refined using all the WIHR data available.

Figure 1. Distribution of working (filled) and free R-factors (open) for GroEL 1DER (dashed lines) and 1KP8 (solid lines) as a function of reciprocal resolution (Å−1). The reciprocal resolution of 2.4 Å is marked with a green vertical line.

In order to encourage crystallographers to use all the data available, a new metric needs to be developed for judging structural quality that takes account not only of the correspondence between the data and the model, which Rfree certainly does, but also of the explanatory power of the data, which Rfree does not. The explanatory power of the data is the ratio of the number of independent reflections in the data set used for refinement to the number of adjustable parameters in the structural model obtained using those data. Since the number of nonhydrogen atoms in a structure should be proportional to the number of parameters a molecular model must specify, the observation to atoms (O2A) ratio (RO2A), which is easily estimated, ought to be a useful measure of the explanatory power of the data sets. A structure that has a comparatively high Rfree value but also a high O2A ratio can be superior in quality to one with a low Rfree value but also a low O2A ratio. A number of global goodness-of-fit (GGOF) metrics based on this principle are proposed here to encourage further discussion.

The two GroEL structures mentioned above provide a case in point. The R-factors for the GroEL structure originally published (1DER; R = 24.7%, Rfree = 29.8%, Resolution 2.4 Å) are not much different from those that characterize the model obtained for GroEL using all the WIHR data available (1KP8; R = 24.3%, Rfree = 25.8%, Resolution at 2.0 Å).8,9 However, since the 1KP8 GroEL model explains nearly twice the number of experimental observations as the 1DER model, it is clearly superior.

Results

Rfree values can be manipulated

For at least the last decade, the Rfree values reported for the refined crystal structures of macromolecules have been used as a quality metric, in ways that were never intended.4 Investigators may be concerned that they cannot publish structures if their Rfree values are too high, or are significantly poorer than the average structure in the PDB of similar resolution. Yet, referees seldom object to structures on the grounds that their Rfree values are too low, even when, as sometimes happens, they are smaller than the corresponding working R factors, which is essentially impossible.

As the following example illustrates, it is easy to reduce the Rfree value of a structure without doing anything to improve its quality. A 2.5-Å resolution crystal structure was reported recently for E. coli YfbU (4LR3) that has a working R-factor of 21.0% and free R-factor of 24.7%.10 The free and working R-factors of the YfbU structure vary as a function of resolution the same way they do for most macromolecular crystal structures: they are large at both high and low resolution, and small in the middle (Fig. 2). Thus one can improve both the overall free and working R-factors of this structure simply by discarding the part of the data that is poorly explained by the model (Table 1). As the resolution of the data used to compute Rfree is reduced from 2.50, to 2.64, 2.79, 3.10, and finally to 3.18 Å, Rfree falls from 24.70%, to 23.46%, then to 22.33% and 20.57%, and then finally to 20.15%, while the <I/σI> values in the highest resolution shell increase from 0.46, to 1.00, then to 2.00, and 3.00, and finally to 4.00. These statistics show that it is unrealistic to compare the working and free R-factors of the 2.5-Å resolution version of the YfbU structure with those of other structures in the PDB having similar nominal resolutions but much higher values of <I/σI> in the highest resolution shells.

Figure 2. Crystallographic R-factor statistics as a function of reciprocal resolution (Å−1). (a) Working (black filled spheres) and free (red open circles) R-factors of the YfbU structure. (b) For other rerefined structures, RtcB (black lines), catalase in C2 (red), catalase in P21 (blue), and PSII (green) structures.

Table 1. Statistical Gaming of Rfree Values by Data Truncation for the YfbU Structure

Resolution range (Å) | Number of reflections | <I/σI> (a) | Rwork (%) | Rfree (%) | Reduction in RO2A (b) | Reduction in Rwork | Reduction in Rfree
56-2.50 | 131,307 | 0.46 | 20.79 | 24.70 | 1.00 | 1.00 | 1.00
56-2.64 | 112,400 | 1.00 | 18.49 | 23.46 | 0.86 | 0.89 | 0.95
56-2.79 | 95,391 | 2.00 | 17.25 | 22.33 | 0.73 | 0.83 | 0.90
56-3.10 | 69,662 | 3.00 | 15.54 | 20.57 | 0.53 | 0.75 | 0.83
56-3.18 | 64,565 | 4.00 | 15.14 | 20.15 | 0.49 | 0.73 | 0.82

(a) The mean <I/σI> value in the corresponding highest resolution shell.

(b) The total number of atoms is 23,395 and the total number of parameters is 93,580, which results in an RO2A of 5.61 at 2.50-Å resolution, the reference resolution for the reductions in RO2A and R-factors.
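The ratio columns of Table 1 can be recomputed directly from the reflection counts and R-factors. The following is a minimal sketch, assuming only the numbers printed in the table; the names `TABLE1`, `N_ATOMS`, and `truncation_summary` are hypothetical and introduced here for illustration.

```python
# Truncation series for the YfbU (4LR3) data, copied from Table 1.
# Columns: high-resolution cutoff (Å), reflections, Rwork (%), Rfree (%).
TABLE1 = [
    (2.50, 131307, 20.79, 24.70),
    (2.64, 112400, 18.49, 23.46),
    (2.79, 95391, 17.25, 22.33),
    (3.10, 69662, 15.54, 20.57),
    (3.18, 64565, 15.14, 20.15),
]
N_ATOMS = 23395  # nonhydrogen atoms, from the table footnote


def truncation_summary(rows, n_atoms):
    """Return (cutoff, RO2A, RO2A ratio, Rwork ratio, Rfree ratio) per row.

    Ratios are taken relative to the first row, which serves as the reference.
    """
    _, ref_nref, ref_rwork, ref_rfree = rows[0]
    summary = []
    for cutoff, nref, rwork, rfree in rows:
        summary.append((
            cutoff,
            nref / n_atoms,      # observation-to-atom ratio
            nref / ref_nref,     # same as the ratio of RO2A values
            rwork / ref_rwork,
            rfree / ref_rfree,
        ))
    return summary


for cutoff, ro2a, rel_ro2a, rel_rwork, rel_rfree in truncation_summary(TABLE1, N_ATOMS):
    print(f"{cutoff:.2f} Å  RO2A={ro2a:.2f}  RO2A ratio={rel_ro2a:.2f}  "
          f"Rwork ratio={rel_rwork:.2f}  Rfree ratio={rel_rfree:.2f}")
```

Laying the three ratios side by side shows how much faster the data are lost than the R-factors improve as the cutoff moves to lower resolution.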

The <I/σI> value in the highest resolution shell for many structures in the PDB is high

In connection with another study,7 structure factors (or intensities) were retrieved for all of the P212121 entries in the PDB in April, 2014, and are also used here. Of all the space groups in which macromolecules crystallize, the most common is P212121, which includes 23.0% of the entries in the PDB, followed by P21 (15.4%), and C2 (9.6%) (Supporting Information Table S1−S3). Thus, the statistical properties of these P212121 data sets should be representative of the entire PDB.11

Using the program SCALEPACK,12 the structure factors (or intensities) of 11,265 P212121 entries in the PDB were binned into 20 resolution shells each containing approximately equal numbers of reflections, and <I/σI> values were computed for each shell. The value reported for <I/σI> in the highest resolution shell was greater than 30 for 37 of these sets. Because these values were so high, these data sets were excluded from the analysis (Fig. 3, Table 2, Supporting Information Table S4, S5). Even so, the mean value for <I/σI> in the highest resolution shell for the remaining data sets is 3.88 ± 2.75 (Table 3).
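The binning step can be illustrated with a short sketch. This is not SCALEPACK's implementation; it assumes each reflection is available as a (d-spacing, intensity, sigma) tuple and that <I/σI> means the average of the per-reflection ratios. The function name `shell_mean_i_over_sigma` is hypothetical.

```python
def shell_mean_i_over_sigma(reflections, n_shells=20):
    """Split reflections into equal-count resolution shells.

    `reflections` is an iterable of (d_spacing, intensity, sigma) tuples.
    Returns a list of (d_max, d_min, mean I/sigma), ordered from low to
    high resolution.
    """
    # Sort from low resolution (large d) to high resolution (small d).
    ordered = sorted(reflections, key=lambda r: r[0], reverse=True)
    n = len(ordered)
    if n < n_shells:
        raise ValueError("need at least one reflection per shell")
    shells = []
    for k in range(n_shells):
        # Integer boundaries chosen so that shell sizes differ by at most one.
        start = k * n // n_shells
        stop = (k + 1) * n // n_shells
        chunk = ordered[start:stop]
        # Reflections with non-positive sigma are skipped (an assumption).
        ratios = [i / s for _, i, s in chunk if s > 0]
        mean_ratio = sum(ratios) / len(ratios) if ratios else float("nan")
        shells.append((chunk[0][0], chunk[-1][0], mean_ratio))
    return shells
```

The last element of the returned list corresponds to the highest resolution shell, whose mean is the quantity tabulated for each entry in this section.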

Figure 3. The distribution of <I/σI> values for the highest resolution shells as a function of reciprocal resolution (Å−1) (a) and a histogram as a function of the I/σI ratio (b) for all P212121 entries. Shell-averaged <I/σI> values for all P212121 entries are shown as a red line.

Table 2. Intensity Distribution of <I/σI> in the Highest Resolution Shells for all P212121 Entries in the PDB (a)

<I/σI> condition | Number (percentage)
<2.0 | 1550 (13.8%)
2.0 < <I/σI> < 5.0 | 7417 (65.8%)
>5.0 | 2298 (20.4%)
>10.0 | 432 (3.83%)
>20.0 | 99 (0.88%)
>30.0 | 47 (0.42%)
>40.0 | 35 (0.31%)
>50.0 | 32 (0.22%)
>100.0 | 14 (0.12%)

(a) An initial analysis included all the 14,376 P212121 entries, and the final analysis included only 11,265 entries after some entries containing questionable Friedel pair columns with the pdbx prefix were excluded following discussions with Dr. S. Burley and colleagues at the PDB.

Table 3. Distribution of the Mean <I/σI> Values in the Highest Resolution Shell for all P212121 Entries in the PDB

<I/σI> condition | Mean values
<5 | 2.96 ± 0.99
<10 | 3.59 ± 1.72
<20 | 3.88 ± 2.42
<30 | 3.98 ± 2.78
<40 | 4.01 ± 2.94
<40 4.01 ± 2.94

Surprisingly, 3.8% (432) of the data sets have been truncated at <I/σI> = 10.0 (Table 2). Another 20.4% (2,298) of the P212121 entries have excluded all the data with <I/σI> less than 5.0. For many of these entries, an analysis shows that Rwork and Rfree in the highest resolution shells are often smaller (or not much higher) than their overall values,7 which suggests that the WIHR data were truncated during structure refinement, as in the YfbU example discussed above. Given that the crystals of most macromolecules diffract weakly, if one were to use <I/σI> = 10.0 as the resolution cut-off criterion, the amount of data discarded from most data sets would often be far greater than the amount of data used for structure determination.

Discussions with the authors of a few of the structures that have very high values reported for <I/σI> in the highest resolution shell indicated that from their point of view, resolution was not an issue. It did not matter whether the resolution of a structure was 1.5 Å or 2.5 Å, as long as the structure enabled well-founded follow-up biochemical experiments.

The Rfree-Rwork differences of the structures in the PDB are useful measures of refinement quality

Both Rwork and Rfree values can be adjusted to some degree by manipulating the resolution range of the data for structure refinement. Rfree values are likely to be more sensitive to the details of the way data are treated than Rwork values because Rfree is usually calculated from only 5% of the data. For the following reasons, it is harder to manipulate the difference between Rfree and Rwork. First, this difference should always be positive because, if the data have been processed properly, Rfree must always be greater than Rwork. In addition, Rfree should gradually approach Rwork during refinement. Also, once the refinement has converged, the difference between them should vary in a predictable way as a function of both resolution and RO2A. As Figure 4 shows, the high-resolution limit of the value of that difference appears to be about 0.020 for all the P212121 entries in the PDB.

Figure 4. The distribution of differences between Rfree and Rwork as a function of reciprocal resolution (Å−1) (a) and of RO2A (b) for all P212121 entries. Shell-averaged ΔR-factors are shown in red lines, zero lines in blue, and the high-resolution (high RO2A) asymptotic line in green. Data outside the boxes that were included in this analysis are not shown.

Even though it is essentially impossible for Rfree to be less than Rwork, 24 of the P212121 structures in the PDB have differences that are zero or negative (Supporting Information Table S6). Another 57 entries are characterized by differences less than 0.005, and the total number of entries having differences less than the asymptotic limit is 1010 (6.4%) (Supporting Information Table S6). Finally, the average R-factor difference for all structures at resolutions lower than 2.85 Å is much smaller than the value predicted by the trends using all the P212121 structures in the PDB (Fig. 4). Explanations for most of these anomalies remain unknown. Nevertheless, the fact that so many structures can be refined in ways that result in very small differences between the two R-factors does call into question the wisdom of using Rfree to judge structure quality.
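The categories used in this paragraph can be expressed as a small screening function. This is a sketch, assuming R-factors are given as fractions and using the 0.020 asymptotic limit quoted above; the function name and the category labels are hypothetical.

```python
ASYMPTOTIC_LIMIT = 0.020  # high-resolution limit of Rfree - Rwork quoted in the text


def classify_r_gap(r_work, r_free, limit=ASYMPTOTIC_LIMIT):
    """Label the Rfree - Rwork difference (both R-factors given as fractions)."""
    gap = r_free - r_work
    if gap <= 0.0:
        return "zero or negative"
    if gap < 0.005:
        return "below 0.005"
    if gap < limit:
        return "below asymptotic limit"
    return "at or above asymptotic limit"


# Example with the 1DER values listed in Table 4 (Rwork 0.247, Rfree 0.283).
print(classify_r_gap(0.247, 0.283))
```

Values that sit exactly on a threshold are sensitive to floating-point rounding, so a production screen would probably compare rounded differences.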

The importance of the observation-to-atom ratios for structures in the PDB

The most obvious shortcoming of R-factors as measures of structure quality is that they do not reflect the capacity of the X-ray data to determine the parameters of the structure. A simple metric that could be used to provide this information is the number of (independent) reflections used to refine a structure, divided by the number of nonhydrogen atoms in the asymmetric unit. This ratio is designated here as RO2A. It is an imperfect measure of the quality of the structural model because the number of adjustable parameters in the model depends in part on the method used for its refinement, and on the way B-factors are treated. It is also the case that the number of parameters per nonhydrogen atom needed to model solvent molecules is not the same as the number per atom for the macromolecule itself. Nevertheless, RO2A will increase as the cube of the reciprocal of the resolution (Fig. 5), as it should, and it automatically takes into account solvent-content variations in crystals. In an analysis done in April 2014, the average solvent content for all entries in the PDB was 50.2 ± 8.2%, and 48.2 ± 8.5% for the P212121 entries (Supporting Information Table S1−S3). However, there is considerable variation from one crystal to the next. For example, 95% of the volume of the unit cell in the crystals used to solve the 2YQ3 structure is occupied by solvent.13 Thus, this structure would have an RO2A 10-fold higher than that of a structure solved at the same resolution using crystals that have a solvent content of 50%.
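The factor of 10 follows from a simple scaling argument. The sketch below assumes that the number of unique reflections to a resolution d scales with the unit-cell volume V, and that the number of nonhydrogen atoms scales with the volume occupied by the macromolecule, (1 − s)V, where s is the solvent fraction. The symbols V, d, and s are introduced here for illustration, and ordered solvent atoms are ignored.

```latex
N_{\mathrm{refl}} \propto \frac{V}{d^{3}}, \qquad
N_{\mathrm{atoms}} \propto (1 - s)\,V
\quad\Longrightarrow\quad
R_{\mathrm{O2A}} = \frac{N_{\mathrm{refl}}}{N_{\mathrm{atoms}}}
\propto \frac{1}{(1 - s)\,d^{3}},
\qquad
\frac{R_{\mathrm{O2A}}(s = 0.95)}{R_{\mathrm{O2A}}(s = 0.50)}
= \frac{1 - 0.50}{1 - 0.95} = 10 .
```

The same relation shows why RO2A is expected to grow as the cube of the reciprocal resolution at fixed solvent content.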

Figure 5. The distribution of RO2A values for all P212121 entries. (a) As a function of reciprocal resolution cubed (Å−3). (b) As a function of reciprocal resolution (Å−1) for fitted model from (a). Intercepts at RO2A of 4 (green) and 9 (blue) are also shown.

When RO2A is plotted against reciprocal resolution cubed for all the P212121 entries in the PDB, the resulting distribution can be fitted to a line with an overall correlation coefficient of 0.898 (Fig. 5). However, its intercept does not pass through the origin of the plot, as one would anticipate. This failure may be caused by a tendency of increases in solvent content to correlate with decreases in the resolution of macromolecular crystal structures. The average solvent content is about 50% for all the structures in the PDB, but it is 70% for all structures with resolutions lower than 5.0 Å (Supporting Information Table S1). Under-representation of low-resolution crystal structures in the set of structures considered may also contribute, as may systematic differences in the way WIHR data are treated, with more WIHR data being used for low-resolution structure determinations.

The regression line in Figure 5 indicates that on average, RO2A reaches 4 at 3.11 Å and 9 at 1.94 Å, which implies that unconstrained refinement of atomic positions and isotropic B-factors is likely to fail for macromolecular structures solved at resolutions of 3.11 Å or lower, and that unconstrained refinement of atomic positions and individual anisotropic B-factor parameters cannot be expected to work well unless the resolution of the data exceeds 1.94 Å. These estimates agree well with conventional wisdom in the macromolecular crystallographic community (e.g., Ref. 14).
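Because the regression is linear in reciprocal resolution cubed, the two points quoted above are enough to reconstruct an approximate version of the line. The sketch below does only that; the fitted coefficients themselves are not given in the text, so the slope and intercept here are derived from the two rounded values and should be treated as approximate. All function names are hypothetical.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def inv_d_cubed(d):
    """Reciprocal resolution cubed (Å^-3) for a resolution d in Å."""
    return 1.0 / d ** 3


# The two (resolution, RO2A) points quoted in the text.
slope, intercept = line_through((inv_d_cubed(3.11), 4.0), (inv_d_cubed(1.94), 9.0))


def ro2a_at(d):
    """RO2A implied by the two-point line at resolution d (Å)."""
    return slope * inv_d_cubed(d) + intercept


def resolution_for(ro2a):
    """Resolution (Å) at which the two-point line reaches a given RO2A."""
    return (slope / (ro2a - intercept)) ** (1.0 / 3.0)


print(f"slope = {slope:.1f} Å^3, intercept = {intercept:.2f}")
```

The intercept that comes out is positive, roughly 2.4, which is consistent with the observation that the fitted line does not pass through the origin.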

A proposal for some new measures of structure quality

The goal of all structure refinements is to arrive at the physically plausible model for the molecule of concern that best explains all the observations available. For this reason alone, all the WIHR data available ought to be included in the structure refinement. It is also proposed that structure quality be assessed using a global goodness of fit (GGOF) statistic, the simplest of which, GGOF1, is defined as follows.

$$ \mathrm{GGOF}_1 = \frac{R_{\mathrm{O2A}}}{R_{\mathrm{work}}} $$

Experience shows that structures having a GGOF1 > 100 should be considered “high quality” (Table 4), which implies that a structure with an RO2A > 15 would have to have an Rwork < 15% to be considered high quality. If the solvent content of a structure determined at a resolution of about 5 Å were 70%, which is unusually high (Supporting Information Table S1), the RO2A value would be about 3.5. In order for such a structure to be considered high quality, its Rwork would have to be less than 3.5%, which has never been achieved for a 5.0-Å resolution structure.

Table 4. Application of the GGOF structural quality metrics (a)

PDB | Reso (Å) | RO2A | Rwork | Rfree | GGOF1 | GGOF2 | Rsigma
4F1U | 0.98 | 44.7 | 0.088 | 0.096 | 508.0 | 69.6 | 0.041
4F1V | 0.88 | 61.3 | 0.125 | 0.140 | 490.4 | 55.9 | 0.059
4F1V/Trim | 0.98 | 38.2 | 0.107 | 0.123 | 357.0 | 50.3 | 0.045
4F18 | 0.96 | 47.1 | 0.095 | 0.110 | 495.8 | 62.4 | 0.053
4F19 | 0.95 | 52.0 | 0.096 | 0.111 | 541.4 | 64.9 | 0.064
2OL9 | 0.85 | 47.9 | 0.073 | 0.078 | 655.6 | 69.2 | 0.042
4AYO | 0.85 | 67.3 | 0.095 | 0.105 | 708.4 | 78.1 | 0.056
4AYP | 0.85 | 72.0 | 0.097 | 0.106 | 742.3 | 80.0 | 0.059
4AYQ | 1.10 | 33.2 | 0.088 | 0.105 | 377.3 | 54.9 | 0.053
4AYR | 1.10 | 30.6 | 0.084 | 0.102 | 364.3 | 54.2 | 0.031
4GHO | 1.10 | 44.3 | 0.097 | 0.117 | 605.2 | 56.9 | 0.092
4LTG | 1.18 | 28.6 | 0.089 | 0.113 | 321.3 | 47.3 | 0.042
4MJ9 | 0.97 | 53.0 | 0.086 | 0.096 | 616.2 | 75.8 | 0.041
4LR3 | 2.50 | 5.6 | 0.208 | 0.247 | 27.5 | 9.8 | 0.138
4LR3/Trim | 2.64 | 4.8 | 0.185 | 0.235 | 26.0 | 9.3 | 0.122
4LR3/Trim | 2.79 | 4.1 | 0.172 | 0.223 | 23.7 | 9.1 | 0.107
4LR3/Trim | 3.10 | 3.0 | 0.155 | 0.206 | 19.2 | 8.4 | 0.084
4LR3/Trim | 3.18 | 2.8 | 0.151 | 0.202 | 18.3 | 8.2 | 0.080
1DER | 2.40 | 5.8 | 0.247 | 0.283 | 23.4 | 8.5 | 0.121
1KP8 | 2.00 | 9.5 | 0.243 | 0.258 | 39.0 | 11.9 | 0.164
RtcB/Mn (b) | 1.48 | 19.1 | 0.098 | 0.144 | 194.9 | 30.3 | 0.102
C2 Catalase (b) | 1.53 | 9.9 | 0.075 | 0.134 | 131.7 | 23.5 | 0.084
3P9Q | 1.48 | 16.0 | 0.143 | 0.177 | 111.9 | 22.6 | 0.123
3P9Q/New (b) | 1.48 | 13.7 | 0.096 | 0.143 | 142.7 | 25.9 | 0.123
3ARC | 1.90 | 10.8 | 0.177 | 0.204 | 61.4 | 16.2 | 0.047
3ARC/New (b) | 1.90 | 10.3 | 0.114 | 0.162 | 90.1 | 19.8 | 0.047

(a) The first group of PDB entries consists of structures with a reported Rwork of less than 10%, with the exception of 4F1V, which is closely related to 4F1U. The second group consists of structures discussed in this manuscript, including some unpublished rerefined structures. Entries obtained by trimming the WIHR data are also included in this table. See text for definitions of the parameters used in this table. Rsigma is 1/<I/σI> for all the data.

(b) These new structures include rerefinements of structures published from the author's laboratory as well as from some other laboratories.

The results obtained with the YfbU structure by trimming the WIHR data used to refine it suggest that Rfree values fall much more slowly than Rwork values, and that the decrease of Rwork appears to be proportional to RO2A, while that of Rfree appears to be proportional to √RO2A (Supporting Information Table S1), assuming that the structure refinement has fully converged. These observations suggest that a second GGOF metric might be considered, GGOF2:

$$ \mathrm{GGOF}_2 = \frac{\sqrt{R_{\mathrm{O2A}}}}{R_{\mathrm{free}}} $$

When dealing with the WIHR data, it is important to report R-factors weighted by the measurement errors, which many refinement programs calculate but which are not quoted in many publications. These weighted R-factors can be used for calculations of GGOF metrics as well.

Applications of GGOF metrics

Table 4 provides the GGOF metrics for a handful of structures in the PDB. It should be noted that GGOF1 for 1KP8, the higher resolution structure of the two GroEL structures mentioned earlier, is much better than that of 1DER, its lower resolution mate, as it should be, and that 1KP8 is also superior to 1DER based on their GGOF2 statistics.

GGOF metrics can also be used to determine whether the version of 1EGW15 that has been rerefined with two identical copies of a DNA duplex bound in two different orientations (i.e., two alternative conformations for the entire DNA duplex) bound to the two monomers of that homo-dimeric protein is better than the one rerefined with one asymmetric DNA duplex bound to both monomers (plus three nucleotides per strand or six nucleotides in total in two alternative conformations) that differ in orientation between two monomers.7 The first model has a working R-factor of 14.2%, a free R-factor of 18.2%, and the number of atoms is 3,165 so that RO2A is 9.24. The second model has a working R-factor of 15.3%, a free R-factor of 19.2%, and the number of atoms is 2,591 (a smaller number than the first structure) so that RO2A is 11.28. Thus, the first model has a GGOF1 of 65.1 and a GGOF2 of 16.7, whereas the second model has a GGOF1 of 73.7 and a GGOF2 of 17.5. Both criteria suggest that the second model with fewer atoms is better than the first. The 4HYO structure,6 which was originally determined in P1, provides another instructive example. When it is refined in the space group appropriate for the crystals it forms, P4212, its GGOF metric is 30% better than it is when it is rerefined in P1 under identical conditions.6,7
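The 1EGW comparison can be reproduced in a few lines. The sketch below assumes GGOF1 = RO2A/Rwork, as stated in the abstract, and GGOF2 = √RO2A/Rfree, a form that reproduces the values quoted in this paragraph, with both R-factors expressed as fractions. The function names and the model labels are hypothetical.

```python
from math import sqrt


def ggof1(ro2a, r_work):
    """GGOF1 = RO2A / Rwork, with Rwork expressed as a fraction."""
    return ro2a / r_work


def ggof2(ro2a, r_free):
    """GGOF2 = sqrt(RO2A) / Rfree, with Rfree as a fraction (assumed form)."""
    return sqrt(ro2a) / r_free


# (RO2A, Rwork, Rfree) for the two 1EGW re-refinements quoted in the text.
models = {
    "two DNA orientations": (9.24, 0.142, 0.182),
    "one asymmetric duplex": (11.28, 0.153, 0.192),
}

for name, (ro2a, r_work, r_free) in models.items():
    print(f"{name}: GGOF1={ggof1(ro2a, r_work):.1f}  GGOF2={ggof2(ro2a, r_free):.1f}")
```

Under these assumptions the second model scores higher on both metrics even though its R-factors are about one percentage point worse, because it explains the same data with fewer atoms.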

To give the reader some idea of the kinds of GGOF values that should be aspired to, Table 4 includes data on some structures of exceptional quality, which were chosen from among the 67 single-crystal structures in the PDB that have working R-factors less than 10% at resolutions in the atomic to sub-atomic range.16–21 These values range from 320 for 4LTG, which was solved at a resolution of 1.1 Å, to 740 for 4AYP, which is a 0.85-Å resolution structure (Table 4).

The structure described by 4F1V, which was determined at 0.88-Å resolution with working R-factor of 12.5%, is included in this high-resolution group because it is instructive to compare its statistics with those of a closely related entry, 4F1U, which was solved at 0.98-Å resolution with working R-factor of 9.6%.16 For these structures, the number of measured observations at a resolution of 0.98 Å is 176,838, but at a resolution of 0.88 Å the number is 248,683, and the corresponding RO2A values are 44.7 and 61.3. Omission of the WIHR data from the data used for refining the 4F1V structure consistently resulted in poorer GGOF metrics, even though it led to improved apparent working and free R-factor values. Thus, if the GGOF metrics proposed here were used to assess quality, it would be apparent that there is no justification for omission of any of the WIHR data available for the 4F1V structure, no matter what the effect on R-factors (Table 4).

Discussion

Consequences of excluding poorly measured observations in data processing

It is surprising that one in every five structures in the PDB has an <I/σI> value of 5.0, or higher, in the highest resolution shell, and that this value is 10.0 or higher for one in every thirty structures in the PDB. There are at least two ways data can be processed so that such high <I/σI> values are obtained in the highest resolution shells: (i) exclusion of all the poorly measured weak-intensity reflections whatever the resolution, and/or (ii) exclusion of all the WIHR data. Both will improve the statistics of the processed data, but neither is good practice.

As the HKL Users Manual explains, individual observations of specific reflections from the data sets should not be excluded on the grounds that they have been poorly measured.21 Some reflections in any data set obtained from a macromolecular crystal are bound to be weak, and when a weak reflection crosses the Ewald sphere, the value recorded for its intensity may be negative as a consequence of counting statistics. Those apparently nonphysical values for intensities must be averaged with all the other observations made of the intensities of the corresponding reflections. If not, the intensity estimates that emerge for those reflections when observations are averaged will be larger than they should be. This practice could give the impression that intensities of reflections that are zero because they are systematically absent due to crystal symmetry are non-zero, and thus lead investigators to incorrect conclusions about crystal symmetry.7 Questions related to negative intensity measurements are best addressed using techniques that rely on Bayesian statistics.23 These measurements should not be omitted.

During structure refinement, the knowledge that a particular reflection is weak is as important as the knowledge that some other reflection is strong. What counts are the differences between predicted and measured amplitudes, not the absolute values of measured amplitudes as such. Thus, if weak data are omitted during structure refinement, and the model that emerges fails to predict that these data should be weak, the model cannot be accurate, nor can the phases calculated using that model be relied upon. Thus, an important distinction exists between setting the intensities of weak reflections to zero and omitting them altogether during refinement. If the intensity of a weak reflection is not taken into account during refinement, its value will not constrain the model being refined. By contrast, if it is taken into account but its amplitude is set to zero, models will be favored that predict that its amplitude should be weak, which is as it should be.

A proposed metric to measure crystallographic quality

The major problem with the metrics commonly used for assessing crystallographic quality today is that they place too little emphasis on resolution and too much emphasis on the correspondence between the model and the data. The primary virtue of the GGOF parameters proposed here as metrics of crystallographic quality is that they will “reward” use of all the data available.

If GGOF parameters of the sort being advocated here become widely adopted, it will become even more important than it is today for the community to arrive at an agreement as to how “resolution” is defined. In earlier times, when data collection was much more difficult than it is today, the resolution of a crystal structure was taken to be the Bragg spacing at which <I/σI> falls to 2.0. Today, as pointed out above, “resolution” turns out to be the Bragg spacing of the highest resolution shell of data used to refine the structure, no matter what the cutoff value of <I/σI> may be. Given the quality of the data sets being collected today, it would be reasonable to define the resolution of a data set, and hence that of the structure obtained from it, as the resolution at which <I/σI> falls to 1.0. By itself, this change in the operational definition of resolution might encourage the crystallographic community to be more aggressive in its use of high-resolution data than it has been in the recent past.
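The proposed definition can be applied mechanically once the highest-shell <I/σI> is known for a series of candidate cutoffs. The following sketch uses the YfbU values from Table 1 and assumes that <I/σI> decreases monotonically with increasing resolution; the function name `resolution_at_threshold` is hypothetical.

```python
def resolution_at_threshold(shells, threshold=1.0):
    """Return the smallest d-spacing (Å) whose shell mean I/sigma is >= threshold.

    `shells` is an iterable of (d_min, mean_i_over_sigma) pairs, one per
    candidate high-resolution cutoff. Returns None if no shell qualifies.
    """
    qualifying = [d_min for d_min, ratio in shells if ratio >= threshold]
    return min(qualifying) if qualifying else None


# Highest-shell <I/sigma(I)> at each cutoff of the YfbU data (Table 1).
yfbu = [(2.50, 0.46), (2.64, 1.00), (2.79, 2.00), (3.10, 3.00), (3.18, 4.00)]

print(resolution_at_threshold(yfbu))       # proposed definition, threshold 1.0
print(resolution_at_threshold(yfbu, 2.0))  # earlier convention, threshold 2.0
```

With these numbers the proposed definition assigns the YfbU data a resolution of 2.64 Å, compared with 2.79 Å under the earlier <I/σI> = 2.0 convention.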

The service the PDB provides as a repository of both structures and data cannot be overstated. As most crystallographers know, it can be remarkably hard to locate and retrieve data sets that are more than a few years old from one's own computer system. Obsolescence of the media on which the data are stored can become a problem for data sets that are more than a few years old, and but for the PDB, closure of the laboratory that collected them would be the equivalent of a death sentence. Thus, it is important that the authors responsible for the 20% of the structures in the PDB for which there are no data deposited beyond <I/σI> = 5 reprocess the data they have to the highest resolution possible before they are lost forever. It is less urgent, but also important, that the authors of the 86% of all the structures in the PDB that lack data beyond <I/σI> = 2.0 do what they can to extend the resolution of their data sets (Table 2). The availability of the WIHR data for these structures will make it possible for anyone who becomes interested in the future to re-refine them using the programs that are available then, which are all but certain to be even better than those available today.

Materials and Methods

All structure factors were retrieved in mid-April 2014 from the PDB and analyzed as described elsewhere.7 The program SCALEPACK was used to analyze the <I/σI> distribution in the highest resolution shell of the 20 shells of approximately equal number of expected observations.12

Acknowledgments

The author acknowledges Professors Peter Moore and Brian Matthews for extensively editing this manuscript. The 2YQ3 entry cited in this paper had the highest reported solvent content among all the PDB entries retrieved and analyzed in April 2014, which was no longer so after November 26, 2014, when the solvent content for the 2YQ3 entry was revised.

Supporting Information

Additional Supporting Information may be found in the online version of this article.


References

  1. Brown EN, Ramaswamy S. Quality of protein crystal structures. Acta Cryst. 2007;D63:941–950. doi: 10.1107/S0907444907033847.
  2. Diederichs K, Karplus PA. Better models by discarding data? Acta Cryst. 2013;D69:1215–1222. doi: 10.1107/S0907444913001121.
  3. Karplus PA, Diederichs K. Linking crystallographic model and data quality. Science. 2012;336:1030–1033. doi: 10.1126/science.1218231.
  4. Brunger AT. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature. 1992;355:472–475. doi: 10.1038/355472a0.
  5. Ye S, Li Y, Jiang Y. Novel insights into K+ selectivity from high-resolution structures of an open K+ channel pore. Nat Struct Mol Biol. 2010;17:1019–1023. doi: 10.1038/nsmb.1865.
  6. Posson DJ, McCoy JG, Nimigean CM. The voltage-dependent gate in MthK potassium channels is located at the selectivity filter. Nat Struct Mol Biol. 2013;20:159–166. doi: 10.1038/nsmb.2473.
  7. Wang J. On the validation of crystallographic symmetry and the quality of structures. Protein Sci. 2015;24:621–632. doi: 10.1002/pro.2595.
  8. Boisvert DC, Wang J, Otwinowski Z, Horwich AL, Sigler PB. The 2.4 A crystal structure of the bacterial chaperonin GroEL complexed with ATP gamma S. Nat Struct Biol. 1996;3:170–177. doi: 10.1038/nsb0296-170.
  9. Wang J, Boisvert DC. Structural basis for GroEL-assisted protein folding from the crystal structure of (GroEL-KMgATP)14 at 2.0A resolution. J Mol Biol. 2003;327:843–855. doi: 10.1016/s0022-2836(03)00184-0.
  10. Wang J, Wing R. Diamonds in the rough: a strong case for the inclusion of weak-intensity X-ray diffraction data. Acta Cryst. 2014;D70:1491–1497. doi: 10.1107/S1399004714005318.
  11. Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J. The Protein Data Bank and the challenge of structural genomics. Nat Struct Biol. 2000;7(Suppl):957–959. doi: 10.1038/80734.
  12. Otwinowski Z, Minor W. Processing of X-ray diffraction data collected in oscillation mode. Macromol Cryst A. 1997;276:307–326. doi: 10.1016/S0076-6879(97)76066-X.
  13. El Omari K, Iourin O, Harlos K, Grimes JM, Stuart DI. Structure of a pestivirus envelope glycoprotein E2 clarifies its role in cell entry. Cell Rep. 2013;3:30–35. doi: 10.1016/j.celrep.2012.12.001.
  14. Moore PB. Visualizing the invisible. Imaging techniques for the structural biologist. New York: Oxford University Press; 2012.
  15. Santelli E, Richmond TJ. Crystal structure of MEF2A core bound to DNA at 1.5 A resolution. J Mol Biol. 2000;297:437–449. doi: 10.1006/jmbi.2000.3568.
  16. Elias M, Wellner A, Goldin-Azulay K, Chabriere E, Vorholt JA, Erb TJ, Tawfik DS. The molecular basis of phosphate discrimination in arsenate-rich environments. Nature. 2012;491:134–137. doi: 10.1038/nature11517.
  17. Sawaya MR, Sambashivan S, Nelson R, Ivanova MI, Sievers SA, Apostol MI, Thompson MJ, Balbirnie M, Wiltzius JJ, McFarlane HT, Madsen AO, Riekel C, Eisenberg D. Atomic structures of amyloid cross-beta spines reveal varied steric zippers. Nature. 2007;447:453–457. doi: 10.1038/nature05695.
  18. Thompson AJ, Dabin J, Iglesias-Fernandez J, Ardevol A, Dinev Z, Williams SJ, Bande O, Siriwardena A, Moreland C, Hu TC, Smith DK, Gilbert HJ, Rovira C, Davies GJ. The reaction coordinate of a bacterial GH47 alpha-mannosidase: a combined quantum mechanical and structural approach. Angew Chem Int Ed Engl. 2012;51:10997–11001. doi: 10.1002/anie.201205338.
  19. Pace CN, Fu H, Lee Fryar K, Landua J, Trevino SR, Schell D, Thurlkill RL, Imura S, Scholtz JM, Gajiwala K, Sevcik J, Urbanikova L, Myers JK, Takano K, Hebert EJ, Shirley BA, Grimsley GR. Contribution of hydrogen bonds to protein stability. Protein Sci. 2014;23:652–661. doi: 10.1002/pro.2449.
  20. Hall JP, Sanches-Weatherby J, O'Sullivan K, Kelly JM, Cardin CJ. (To be published) Dehydration/rehydration of a nucleic acid system containing a polypyridyl ruthenuum complex at 74% relative humidity.
  21. Hall JP, Beer H, Buchner K, Cardin DJ, Cardin CJ. (To be published) Lamda-[Ru(TAP)2(dppz-10-Me)2+ bound to a synthetic DNA oligomer.
  22. Gerwirth D. The fourth edition of the HKL manual. New Haven, CT: Yale University Press; 1995.
  23. French S, Wilson K. Treatment of negative intensity observations. Acta Cryst. 1978;A34:517–525.

