Significance
Modern crystallographic structure determination uses maximum likelihood methods. These rely on error estimates between the working model and the unknown target that are based on a small fraction of the data. This can introduce large uncertainty and, even worse, restricts the method to projects where sufficient data are available. We investigate the Rcomplete method, which enables the use of all data for error estimation and reduces the uncertainty associated with the conventional approach for small datasets. We show that our approach reduces the effect of overfitting. This enables maximum likelihood methods to be extended to a much wider field of applications, including free electron laser experiments, high-pressure crystallography, and low-resolution structures.
Keywords: structure determination, reliability index, maximum likelihood refinement, overfitting, model bias
Abstract
The crystallographic reliability index Rcomplete is based on a method proposed more than two decades ago. Because its calculation is computationally expensive, its use did not spread into the crystallographic community, which instead adopted the cross-validation method known as Rfree. The importance of Rfree has grown beyond a pure validation tool. However, its application requires a sufficiently large dataset. In this work we assess the reliability of Rcomplete and compare it with k-fold cross-validation, bootstrapping, and jackknifing. As opposed to proper cross-validation as realized with Rfree, Rcomplete relies on a method of reducing bias from the structural model. We compare two different methods of reducing model bias and question the widespread notion that random parameter shifts are required for this purpose. We show that Rcomplete has as little statistical bias as Rfree, with the benefit of a much smaller variance. Because the calculation of Rcomplete is based on the entire dataset instead of a small subset, it allows the estimation of maximum likelihood parameters even for small datasets. Rcomplete enables maximum likelihood-based refinement to be extended to virtually all areas of crystallographic structure determination, including high-pressure studies, neutron diffraction studies, and datasets from free electron lasers.
The quality of crystallographic models is described by several quality indicators. Both for small-molecule and macromolecular structure deposition, the crystallographic reliability index R1 must be provided (1, 2). It is calculated for the dataset H of observations and a structural model P as
$$R_1 = \frac{\sum_{h \in H} \bigl|\,|F_{\text{obs}}(h)| - |F_{\text{calc}}(h)|\,\bigr|}{\sum_{h \in H} |F_{\text{obs}}(h)|} \qquad [1]$$
Depending on the data-to-parameter ratio, R1 is affected by more or less severe overfitting (3, 4). To overcome this problem, cross-validation was introduced into crystallography (5–9). For cross-validation in crystallography, a certain fraction of the observations, typically 5–10%, is withheld as test set T and never used for model building and refinement. These observations are only used to calculate the reliability index Rfree:
$$R_{\text{free}} = \frac{\sum_{h \in T} \bigl|\,|F_{\text{obs}}(h)| - |F_{\text{calc}}(h)|\,\bigr|}{\sum_{h \in T} |F_{\text{obs}}(h)|} \qquad [2]$$
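As an illustration of Eqs. 1 and 2, the following sketch computes R1, Rwork, and Rfree from arrays of observed and calculated structure factor amplitudes. It is not part of the original article; the synthetic amplitudes and the 5% test-set mask are purely illustrative.

```python
import numpy as np

def r_value(f_obs, f_calc):
    """Crystallographic R value (Eqs. 1 and 2): sum of absolute amplitude
    differences divided by the sum of observed amplitudes."""
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

# Illustrative data: |Fobs| and |Fcalc| for all reflections in H,
# plus a boolean mask marking a ~5% test set T.
rng = np.random.default_rng(0)
f_obs = np.abs(rng.normal(100.0, 20.0, size=5000))
f_calc = f_obs * (1.0 + 0.05 * rng.normal(size=5000))
test_mask = rng.random(5000) < 0.05

r1 = r_value(f_obs, f_calc)                               # Eq. 1, all data
r_free = r_value(f_obs[test_mask], f_calc[test_mask])     # Eq. 2, test set only
r_work = r_value(f_obs[~test_mask], f_calc[~test_mask])   # working set
print(f"R1 = {r1:.4f}, Rwork = {r_work:.4f}, Rfree = {r_free:.4f}")
```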
Rfree is much less affected by overfitting, and since its introduction it has gained importance beyond validation of the structural model. It is used to optimize weights for restrained refinement (4, 10–13). The concept of Rfree also paved the way for maximum likelihood methods in crystallography: it was shown that the estimation of maximum likelihood parameters based on the test set T provides much better accuracy than estimation based on the data used during refinement (14–16).
Cross-validation reduces the bias of a statistic (17, 18) but can show large variance, especially when T is small (8, 17). The relative error of the crystallographic Rfree was established in ref. 19. The test set should hold at least 500 data points to keep this error acceptably small. Maximum likelihood methods estimate parameters in resolution bins, so an even larger total number of test reflections may be required for robust estimation. To assess the accuracy of a statistic such as Rfree one could apply k-fold cross-validation, the bootstrap method, or the jackknife method (7, 8, 17, 20). k-fold cross-validation divides the dataset into k approximately equally sized and pairwise disjoint subsets, and cross-validation is carried out for each of the k parts separately; the mean and variance of Rfree are calculated from the k resulting values. As mentioned above, for small test sets the variance of Rfree becomes very large. Both the bootstrap and the jackknife method reduce the variance of an estimator such as Rfree. The jackknife artificially creates datasets H(i), that is, copies of H with the ith data point removed, so that
$$H_{(i)} = H \setminus \{h_i\} = \{h_1, \ldots, h_{i-1}, h_{i+1}, \ldots, h_n\}, \qquad i = 1, \ldots, n \qquad [3]$$
The jackknife estimator Rjack is calculated as the arithmetic mean
$$R_{\text{jack}} = \frac{1}{n} \sum_{i=1}^{n} R_{(i)} \qquad [4]$$
with the jackknife estimate of variance (8)
$$\sigma^2_{\text{jack}} = \frac{n-1}{n} \sum_{i=1}^{n} \bigl( R_{(i)} - R_{\text{jack}} \bigr)^2 \qquad [5]$$
Bootstrapping differs from jackknifing in that the bootstrap datasets H*j are generated from H by random sampling with replacement, with |H*j| = |H|. The number of distinct bootstrap samples grows very quickly with the size of H, although a few thousand samples are usually sufficient. Let b be the number of bootstrap samples. The bootstrap R value and its estimate of variance are defined as (8)
$$R_{\text{boot}} = \frac{1}{b} \sum_{j=1}^{b} R^{*}_{j} \qquad [6]$$
$$\sigma^2_{\text{boot}} = \frac{1}{b-1} \sum_{j=1}^{b} \bigl( R^{*}_{j} - R_{\text{boot}} \bigr)^2 \qquad [7]$$
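A minimal sketch of how Eqs. 3–7 might be evaluated, here applied to the per-reflection contributions of a small test set. The synthetic amplitudes and the helper names are our own illustration, not the article's implementation or any crystallographic package.

```python
import numpy as np

def r_value(f_obs, f_calc):
    # R value as in Eqs. 1 and 2
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

rng = np.random.default_rng(0)
# Illustrative test-set amplitudes (e.g., a free set of 50 reflections).
f_obs = np.abs(rng.normal(100.0, 20.0, size=50))
f_calc = f_obs * (1.0 + 0.05 * rng.normal(size=50))
n = len(f_obs)

# Jackknife (Eqs. 3-5): leave out one reflection at a time.
r_jack_i = np.array([r_value(np.delete(f_obs, i), np.delete(f_calc, i))
                     for i in range(n)])
r_jack = r_jack_i.mean()                                    # Eq. 4
var_jack = (n - 1) / n * np.sum((r_jack_i - r_jack) ** 2)   # Eq. 5

# Bootstrap (Eqs. 6-7): resample reflections with replacement b times.
b = 2000
idx = rng.integers(0, n, size=(b, n))
r_boot_j = np.array([r_value(f_obs[j], f_calc[j]) for j in idx])
r_boot = r_boot_j.mean()                                    # Eq. 6
var_boot = np.sum((r_boot_j - r_boot) ** 2) / (b - 1)       # Eq. 7

print(f"jackknife: R = {r_jack:.4f}, sigma = {np.sqrt(var_jack):.4f}")
print(f"bootstrap: R = {r_boot:.4f}, sigma = {np.sqrt(var_boot):.4f}")
```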
None of these methods avoids the deficiency that the variance of the respective R value is large when the test sets are very small. This was already shown in ref. 6 and can be seen in SI Appendix, Fig. S1: Because the R value has a lower bound of 0 but no upper bound, large outliers will drag any mean up from its real value. Our interest in alternative ways to calculate Rfree arose during the project in ref. 21. Macromolecular neutron datasets are often small, with low data completeness. Leaving out 500 or more data points during model building and refinement would destabilize these processes and thus degrade the quality of the final model. In high-pressure crystallography the situation is even worse, because the incompleteness of the data owing to shadows from the experimental setup is systematic and leads to data-to-parameter ratios too low to rely on Rfree alone. The entire dataset may have fewer than 500 observations (22).
To circumvent these difficulties, Brünger (6) suggested the method of Rcomplete validation: Instead of creating the test sets required for k-fold cross-validation at the very beginning, right after data collection, they are created only when the calculation of a reliable R value is needed. Strictly speaking, the method is not cross-validation, because the statistic of interest, the R value, is not calculated as the mean from a number of refinement runs but, in analogy to Eq. 1, from the entire dataset, as will be detailed below. The critical point for using Rcomplete is the question of how to reduce the effect of overfitting from the structural model after it has been refined against all data points. Proper cross-validation as realized with Rfree does not share this problem, because the data from the test set are never used during refinement and model building throughout the entire process from data acquisition to publication. Brünger (6) suggested simulated annealing. Others apply random parameter perturbation (13, 23, 24). A third option that has been discussed in the crystallographic community was suggested by Tickle (25): Refinement of a structural model to convergence should reduce the effect of overfitting against any observation not used during such a refinement run. Here we concentrate on “Tickle’s conjecture” for its obvious advantage: Both simulated annealing and parameter perturbation introduce random shifts into the structural model. In the worst case this may result in a nonchemical structure like that shown in SI Appendix, Fig. S4. It may also result in several structures that differ significantly, that is, so much that a biologist or chemist would speak of different structures. Hence, one could no longer speak of the structural model and its R value. In this article we present a series of experiments showing that the Rcomplete method results in as little bias as Rfree. We show that Rcomplete varies much less than Rfree in the case of very small datasets. We confirm Tickle’s conjecture and thus, in the light of ref. 15, our work enables maximum likelihood-based refinement of crystallographic models against small datasets, as in neutron diffraction, high-pressure crystallography, low-resolution macromolecular studies, supramolecular chemistry, and, at its current state, structural data from free electron lasers.
This manuscript is structured as follows. The Methods section first describes how we calculate Rcomplete; the following subsections describe the experiments we carried out. The Results section follows the same subsections and presents the respective results. The description is kept as general as possible; details about programs and parameters are given in SI Appendix.
Methods
The datasets used in this work are summarized in Table 1, including the IDs used throughout this manuscript. Throughout this manuscript we use the terms “working set” and “test set,” defined below. These terms are commonly used in crystallography; in other contexts the equivalent terms “training set” and “validation set” are used, respectively. In the presence of a test set, the reliability index defined in Eq. 1 is calculated only from the observations used in refinement, that is, only for h ∈ H \ T, and is then referred to as Rwork.
Table 1.
Dataset | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Name | n/a | n/a | n/a | Ciprofloxacin | Hormaomycin | Insulin | Insulin | Elastase |
SG | ||||||||
Resolution, Å | 0.44 | 1.00 | 1.00 | 0.70 | 1.02 | 1.10 | 2.30 | 1.37 |
No. of atoms | 56 | 60 | 60 | 60 | 215 | 436 | 802 | 2,163 |
No. of data | 42,997 | 5,117 | 5,069 | 6,227 | 7,800 | 32,598 | 3,747 | 44,784 |
Source | 31 | 32 | 32 | 33 | 34 | SI | 35 | SI |
Dataset 6′ is identical to dataset 6 but with the resolution reduced to 1.9 Å, giving 6,533 observations. Dataset 7 is the neutron dataset. n/a, not applicable. No. of atoms, number of non-H atoms per asymmetric unit; for dataset 7, number of all atoms. No. of data, number of unique observations; SG, space group; SI, see SI Appendix.
Data Preparation and Calculation of Rcomplete.
The starting point is a merged dataset H and the structural model P (i.e., the set of parameters) for which Rcomplete is to be calculated. The model should have been refined against the entire dataset until convergence. The dataset is randomly partitioned into k test sets T1, …, Tk so that their union is H and any two test sets are disjoint. If k does not divide |H|, the last test set is smaller than the other test sets. For better readability of the manuscript we generally do not point out this fact when abbreviating the test set size as |T|. The structural model is refined until convergence against each of the working sets Wi = H \ Ti, resulting in the structural models Pi. Then, related to equation 16 in ref. 6,
$$R_{\text{complete}} = \frac{\sum_{i=1}^{k} \sum_{h \in T_i} \bigl|\,|F_{\text{obs}}(h)| - |F^{(i)}_{\text{calc}}(h)|\,\bigr|}{\sum_{h \in H} |F_{\text{obs}}(h)|} \qquad [8]$$
By construction, Rcomplete is calculated from the entire dataset. In the numerator of Eq. 8, the calculated amplitude |F(i)calc(h)| is derived from the model Pi for an observation h ∈ Ti, which was not used in the refinement of model Pi.
Note the difference from k-fold cross-validation:
$$\overline{R}_{\text{free}} = \frac{1}{k} \sum_{i=1}^{k} R^{(i)}_{\text{free}} \qquad [9]$$

where R(i)free is Eq. 2 evaluated for model Pi against its test set Ti.
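The following sketch contrasts the bookkeeping of Eq. 8 with that of Eq. 9 for a random partition into k test sets. The helpers refine_to_convergence and f_calc_from are hypothetical stand-ins for the refinement program and structure factor calculation (the article performs these steps with standard crystallographic software); only the assembly of the two statistics is shown.

```python
import numpy as np

def r_value(f_obs, f_calc):
    # R value as in Eqs. 1 and 2
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

def refine_to_convergence(model, work_indices):
    """Placeholder: refine `model` against the working set only.  In practice
    this step is performed by a crystallographic refinement program."""
    return model  # no-op stand-in

def f_calc_from(model, indices, f_obs):
    """Placeholder: amplitudes calculated from `model` for the given
    reflections.  Here simply perturbed copies of |Fobs| for illustration."""
    rng = np.random.default_rng(42)
    return f_obs[indices] * (1.0 + 0.05 * rng.normal(size=len(indices)))

def r_complete_and_kfold(f_obs, model, k, rng):
    n = len(f_obs)
    perm = rng.permutation(n)
    test_sets = np.array_split(perm, k)          # random partition T_1..T_k
    num, den, r_free_i = 0.0, np.sum(f_obs), []
    for t in test_sets:
        work = np.setdiff1d(perm, t)             # working set W_i = H \ T_i
        model_i = refine_to_convergence(model, work)
        f_calc_t = f_calc_from(model_i, t, f_obs)
        num += np.sum(np.abs(f_obs[t] - f_calc_t))    # numerator of Eq. 8
        r_free_i.append(r_value(f_obs[t], f_calc_t))  # per-fold Rfree
    return num / den, float(np.mean(r_free_i))        # Eq. 8 vs Eq. 9

rng = np.random.default_rng(0)
f_obs = np.abs(rng.normal(100.0, 20.0, size=6000))
r_complete, r_kfold = r_complete_and_kfold(f_obs, model=None, k=20, rng=rng)
print(f"Rcomplete = {r_complete:.4f}, k-fold CV mean Rfree = {r_kfold:.4f}")
```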
The following subsections describe the experiments we carried out.
Stability with Respect to the Test Set Size.
Both Rfree and Rcomplete were calculated for datasets 5 and 6′. The size of the test sets was varied between 1 and 500 (see SI Appendix, Fig. S1 and Tables S1 and S2).
Stability of Rcomplete with Partition.
Unless each test set contains only a single observation, Rcomplete may depend on the partitioning of the dataset. We randomly partitioned datasets 6′, 4, and 7 20 times each and calculated Rcomplete for each partition to assess how much it varies with the partitioning. Results are listed in SI Appendix, Tables S3–S5.
Validation I: How “free” Is Rcomplete?
Dataset 8 was partitioned into 90 test sets T1, …, T90. The test sets and the working sets were separated to ensure that the test sets were not used in any of the subsequent steps. For each working set, the structure was automatically solved with standard single-wavelength anomalous dispersion of S atoms (S-SAD) and expanded to a poly-alanine model. Each poly-alanine model was subsequently completed by automated model building with the amino acid sequence as input. These models were finally refined with 200 cycles of conjugate gradient least-squares refinement. Rfree was calculated with each model against its test set. Because the test set was never used during the creation of the structural model, this Rfree is free from overfitting. For each structural model, Rcomplete was calculated as described above. Results are listed in SI Appendix, Tables S7–S9.
As a second type of experiment, the small-molecule datasets 2 and 3 were each partitioned into 20 test sets and solved by standard direct methods against the working sets Wi. Dataset 3 is similar to dataset 2 except for a disordered solvent molecule, which results in greater fluctuations. Each of the 20 resulting structural models was refined against its respective working set with 10 cycles of least-squares minimization. The Rfree values were calculated from each structural model against its test set. Results are listed in SI Appendix, Tables S10 and S11.
Validation II: Comparison with Calculated Data.
Diffraction data were calculated from the structural model of dataset 6′ and from the structural model of dataset 4. Hydrogen atoms were not included in the calculations. We checked that in both cases R1 = 0 against the calculated data without refinement. For the structural model of dataset 6′, the oxygen atoms of four water molecules were removed and two oxygen atoms were replaced by sodium atoms. For the structural model of dataset 4, the oxygen atom of one water molecule was replaced by a sodium atom (i.e., the model contains three electrons too many compared with the data). The R1 values were calculated without refinement, thus representing the real values. The small molecule from dataset 4 was refined with 50 cycles of least-squares minimization, and the macromolecule from dataset 6′ was refined with 30 cycles of conjugate gradient least-squares minimization. Rcomplete was then calculated for dataset 4 and for dataset 6′. Whereas the experiments of the previous subsection address the resistance of Rcomplete against overfitting, the experiments of this subsection also address the effects of structural model bias. The R values are listed in SI Appendix, Table S12.
Effect of Parameter Perturbation.
We use the symbol X for the amount of random perturbation of coordinates and atomic displacement parameters of the structural models Pi. Coordinates of atoms not on special positions were displaced by an average distance of X Å in a random direction. When applicable, hydrogen atoms were regenerated after the application of shifts. No shifts were applied to fixed coordinates (e.g., on special positions). Isotropic atomic displacement parameters and the main-diagonal elements U11, U22, and U33 were multiplied by a random factor so that they change by an average of X Å². The off-diagonal atomic displacement parameters U12, U13, and U23 of anisotropic atoms were not modified, to avoid generating matrices with physically impossible nonpositive eigenvalues.
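A sketch of the perturbation scheme described above, using synthetic coordinates and isotropic U values. The choice of random distributions (exponential displacement lengths, a ± sign for the U change) is our assumption, as the article does not specify them; special positions, anisotropic Uij, and hydrogen regeneration are omitted.

```python
import numpy as np

def perturb_model(coords, u_iso, x, rng):
    """Displace each atom by an average distance of X Angstrom in a random
    direction and change each isotropic U by an average of X Angstrom^2.
    Minimal sketch; distributions are illustrative assumptions."""
    n = len(coords)
    # random unit vectors
    directions = rng.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # displacement lengths with mean X
    lengths = rng.exponential(scale=x, size=n)
    new_coords = coords + directions * lengths[:, None]
    # multiplicative factor chosen so each U changes by +/- X on average
    factors = 1.0 + rng.choice([-1.0, 1.0], size=n) * (x / np.maximum(u_iso, 1e-6))
    new_u = np.clip(u_iso * factors, a_min=1e-3, a_max=None)  # keep U positive
    return new_coords, new_u

rng = np.random.default_rng(0)
coords = rng.random((100, 3)) * 20.0   # toy coordinates in Angstrom
u_iso = np.full(100, 0.40)             # toy isotropic U values in Angstrom^2
new_coords, new_u = perturb_model(coords, u_iso, x=0.2, rng=rng)
print(np.mean(np.linalg.norm(new_coords - coords, axis=1)))  # ~0.2 on average
```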
To investigate how random parameter perturbation reduces the effect of overfitting from the structural model, we created a regular grid of dummy atoms. We used the cell of dataset 6 as an example of a noncentrosymmetric space group and that of dataset 1 as an example of a centrosymmetric space group. The number of grid points corresponds roughly to the number of atoms of the respective structure, which ensures realistic data-to-parameter ratios. To introduce overfitting, the set of parameters was refined to convergence without restraints against the respective data at various resolution cut-offs (see SI Appendix, Tables S13 and S14, respectively). The parameters of both overfitted structural models were randomly perturbed with an amplitude X varying from 0.1 to 1.0, and their R1 values were calculated against all data up to the given resolution. The perturbation was repeated 500 times and the R1 values averaged.
The numerical results are listed per resolution cut-off in SI Appendix, Tables S15–S20 for dataset 6 and in SI Appendix, Tables S21–S26 for dataset 1.
Influence of Parameter Perturbation on Convergence Rate.
The Rcomplete value was monitored for the structural model of dataset 6′ with varying amplitudes of perturbation. The number of least-squares refinement cycles is listed in SI Appendix, Table S27; 4,000 and 10,000 cycles were calculated only for two of the perturbation amplitudes.
Results
Stability with Respect to the Test Set Size.
Cross-validation, and especially k-fold cross-validation, is known to produce values with theoretically little bias, yet with small test sets these values suffer from large variance (17). In addition to the large variance, the mean of any value with a lower bound but no upper bound, such as the crystallographic reliability index, will probably be pushed up by very large outliers. We compared the behavior of Rcomplete with that of Rfree for small test set sizes. For this purpose we calculated both values for the structural models of datasets 5 and 6′ as a function of the test set size. Our results show that Rcomplete is independent of the test set size; its mean value, averaged over all tested test set sizes, is effectively constant for datasets 5 and 6′ (SI Appendix, Tables S1 and S2). Rfree, on the contrary, shows the expected extremely large variance. More importantly, its value rises when the test set size is below 20, a behavior known since the introduction of Rfree (6). The bootstrapped values, listed in SI Appendix, Tables S1 and S2, replicate the values of Rfree with an SE one order of magnitude smaller. Hence, bootstrapping does not avoid the instability of Rfree for small test sets. Rcomplete, however, is reliable even when the entire dataset except a single observation is used for refinement. At the suggested lower limit for the test set size (6), Rfree has a reasonably narrow range for both datasets 5 and 6′. With much smaller test sets, however, the range increases substantially for both datasets. Note that these are the ranges for one particular partition; they do not cover all possible test sets except in the leave-one-out case. Because for the conventional Rfree the free set is chosen randomly, one might have ended up with any such value for the same model. This illustrates why we describe Rfree as unstable. Rcomplete, in contrast, can be calculated from any convenient test set size to balance optimally between computation time and the data completeness used for refinement.
Stability of Rcomplete with Partition.
Except in the leave-one-out case there are a large number of possible partitions for a dataset, and Rcomplete might vary depending on which partition is used. We computed Rcomplete and σ(Rcomplete) from 20 different partitionings. The spread is negligible for dataset 6′, for dataset 7, and for dataset 4 (SI Appendix, Tables S3–S5); that is, Rcomplete does not depend on the choice of partition. We conclude that Rcomplete can be calculated from a single partitioning. In combination with the previous subsection, the size of the subsets of the partitioning can be chosen as convenient and only a single partition needs to be considered.
Validation I: How “free” Is Rcomplete?
One of the basic questions for the relevance of our work is whether the procedure described above really reduces the effect of overfitting (i.e., whether Rcomplete is really “free”). We carried out proper k-fold cross-validation in the sense that we calculated Rfree from test sets that were never used for model building or refinement at any point in the entire process.
Dataset 8 was solved from 90 different working sets by SAD phasing, density modification, and model completion by autobuilding. Each of the resulting 90 structural models was refined to convergence. For each structural model a proper Rfree was calculated against its respective test set, and Rcomplete was calculated as described above.
Our calculations resulted in
[10]
[11]
Note that the 90 structural models can have significant differences in the number of amino acids, the orientation of side chains, and so on. Therefore, comparing individual runs directly is not meaningful, and we compare the means instead. The two means agree within less than half an SD, so we consider Rcomplete to be as free from overfitting as Rfree. The ratio of 1.12 between the mean Rfree and the mean Rwork indicates the effect of overfitting present in Rwork, as one would expect. When bootstrapping is applied to the ratio Rcomplete/Rfree, the average value remains at 0.9866, with a standard error small enough that the deviation from unity is significant (see SI Appendix). With bootstrapping as criterion we can therefore consider Rcomplete to suffer slightly from overfitting compared with proper cross-validation, but still much less than Rwork, underlining the value of Rcomplete for validation.
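For illustration, such a bootstrap of the ratio Rcomplete/Rfree over paired runs can be carried out as sketched below; the 90 paired values are synthetic stand-ins, not the article's data, and the numbers printed are therefore not those reported above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative paired R values for 90 runs (synthetic, not the article's data).
r_free = rng.normal(0.235, 0.005, size=90)
r_complete = 0.9866 * r_free + rng.normal(0.0, 0.001, size=90)

ratios = r_complete / r_free
b = 20000
idx = rng.integers(0, len(ratios), size=(b, len(ratios)))
boot_means = ratios[idx].mean(axis=1)          # bootstrap means of the ratio
mean, sigma = boot_means.mean(), boot_means.std(ddof=1)
print(f"mean ratio = {mean:.4f}, sigma = {sigma:.4f}, "
      f"(1 - mean)/sigma = {(1 - mean) / sigma:.1f}")
```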
To assess whether Rcomplete correlates with the quality of the respective structural models, we calculated the average phase difference between each structural model and the fully refined structure. The correlation between Rcomplete and this phase difference over all 90 models is 99.1%, compared with only 74.1% between Rfree and the phase difference. The correlation between Rwork and the phase difference is 98.9% (i.e., for these high-quality data Rwork compares with Rcomplete). We conclude that Rcomplete is a good estimator for the quality of a structural model.
We repeated a similar experiment with the small-molecule dataset 2. Despite the two very different structure-solution approaches, the results are remarkably similar:
[12]
[13]
The large variation of the ratio Rcomplete/Rfree once more underlines the greater stability of Rcomplete compared with Rfree for small test sets, as in this case. Bootstrapping the ratio between Rcomplete and Rfree with 20,000-fold resampling results in the same average ratio of 0.9900 (see SI Appendix).
Similarly, for dataset 3:
[14]
[15]
In this case, bootstrapping provides a comparable average ratio and σ (see SI Appendix).
The Rcomplete values for dataset 2, listed in SI Appendix, Table S10, clearly cluster about two values. Inspection of the structural models revealed that the structure solution step had wrongly assigned one particular carbon atom, having six electrons, as a nitrogen atom, having seven electrons, in exactly those models belonging to one of the two clusters. Neither Rwork nor Rfree reveals this. This is an example where Rcomplete is superior to both Rwork and Rfree.
For dataset 3, Rcomplete displays a similar sensitivity. It points at two outlier runs that neither Rwork nor Rfree makes obvious. The disordered solvent molecule is a tetrahydrofuran, a five-membered ring with four carbon atoms and one oxygen atom. In most runs, including one of the outliers, an incorrect six-membered all-carbon ring was modeled; in the other outlier run, a five-membered all-carbon ring was modeled. The decreased value of Rcomplete for the six-membered ring might be due to better modeling of the disorder, but it may also be due to the four parameters added by the extra carbon atom. One of these outlier runs contains an additional error: The oxygen of a second, ordered tetrahydrofuran molecule was assigned as nitrogen. Hence, in this case, Rcomplete is capable of distinguishing two types of structures differing by only one electron out of 385 in total.
Validation II: Comparison with Calculated Data.
The computation of Rcomplete provides a set of calculated structure factor amplitudes for the entire dataset H. With the Rcomplete method, each amplitude is computed from a structural model that was not refined against the particular observation h. We were interested in whether the structure factor amplitudes from the computation of Rcomplete result in better electron density maps. Electron density maps are difficult to compare: the differences may be very subtle, and the map quality is affected by Fourier truncation errors as well as noise from missing low-resolution observations. For this reason we used calculated data from the structural models for datasets 4 and 6′, modified as described above.
The Rcomplete-based electron density map from dataset 4 shows a stronger signal for the wrongly placed sodium than the conventional electron density map (see SI Appendix, Fig. S3A). Similar results are shown for dataset 6′ in SI Appendix, Fig. S3B. In both cases the Rcomplete-based map is less biased toward the structural model.
Because we were interested in whether parameter perturbation might have a different effect, we produced SI Appendix, Fig. S4, a nonchemical structure resulting from a small perturbation amplitude. It illustrates why we do not recommend applying random parameter perturbation if one wishes to calculate the reliability index of one particular structural model. The next two sections illustrate this further.
Effect of Parameter Perturbation.
The previous examples show that Rcomplete is as good a quality indicator as Rfree, with the benefit that it can be computed at constant reliability even for datasets with very few data. The method we propose and assess in this work uses refinement to convergence to reduce the effect of overfitting from the structural model. As mentioned in the introduction, alternatives have been suggested, such as simulated annealing and random parameter perturbation (6, 13, 23). We addressed the question of to what extent random parameter perturbation reduces the effect of overfitting. For this purpose we created one set of parameters aligned on a regular grid for the centrosymmetric space group of dataset 1 and, similarly, for the noncentrosymmetric space group of dataset 6. Neither of these sets of parameters contains any chemical information, and refinement of these parameters is purely overfitting. Even R1 = 0 can be reached when the data-to-parameter ratio is well below 1.
The effect on the reduction of overfitting was checked by calculating R1 after random parameter perturbation. When only the coordinates are perturbed, the Wilson limits, 82.8% for centric and 58.6% for noncentric space groups (26), are hardly reached even with very large amplitudes X, and only for very high data-to-parameter ratios. When both coordinates and atomic displacement parameters are perturbed, the situation is a little better, although even then the Wilson limit is only reached at high amplitudes X (see SI Appendix, Fig. S5). However, random parameter perturbation can severely compromise the structural integrity of a model (see SI Appendix, Fig. S4). We do not recommend the use of random parameter perturbation for the computation of Rcomplete.
Influence of Parameter Perturbation on Convergence Rate.
Although we have already recommended against using parameter perturbation for the calculation of Rcomplete, we were interested in the effect of random parameter perturbation on the rate of convergence. We used dataset 6′ and the corresponding structural model. The input structural model was first refined to convergence. The value of Rcomplete was monitored with an increasing number of refinement cycles, with five perturbation amplitudes applied both to the coordinates and to the atomic displacement parameters. Parameter perturbation has no beneficial effect on the rate of convergence (see SI Appendix, Fig. S6 and Table S27). After 100 cycles of refinement, Rcomplete has reached the same value, 23.8%, with and without parameter perturbation, and thereafter fluctuates about this value. Graphs such as SI Appendix, Fig. S6 could be used to determine the number of refinement cycles needed to achieve the desired precision for Rcomplete.
Conclusions
Crystallographic studies make intensive use of the Rfree concept: A structural model is cross-validated against a small test set. The data of the test set are never used for refinement or model building; therefore, cross-validation with Rfree is unaffected by overfitting. Rfree and the “free” set of observations are not only used for validation purposes: weights for restrained refinement are optimized by minimizing Rfree, and the test set is used for estimating maximum likelihood parameters (15, 16, 27–29). The calculation of Rfree should be based on at least 500 data points. For reliable parameter estimation, at least 2,000 data points are usually set aside. Many types of crystallographic studies cannot afford to exclude the required number of data points from refinement because the entire dataset is too small. Such studies include low-resolution macromolecular studies, high-pressure studies, neutron studies, and some of the latest data from free electron lasers (21, 22, 30).
In this work we assessed an alternative to Rfree, namely the Rcomplete method. Its calculation was first suggested along with Rfree (6). In contrast to Rfree, Rcomplete is calculated from the entire dataset, with observations that, at some point, were used during refinement. Therefore, the method relies on reducing the effect of overfitting from the structural model. Several methods have been suggested for this purpose, including simulated annealing (6), random parameter perturbation (13, 21, 23), and refinement until convergence. We show here that refinement until convergence is sufficient. We show that Rcomplete has at least as little statistical bias as Rfree. Unlike Rfree, the value of Rcomplete does not vary even when only a single observation is left out of each refinement run. Therefore, the Rcomplete method enables the estimation of maximum likelihood parameters for small datasets.
To carry out model building, the structural model used as input for the calculation of Rcomplete should also be refined against all data (cf. ref. 14). The bias-reduced electron density map is a byproduct of the calculation of Rcomplete. Analyzing the fluctuations of specific atoms in the structural models Pi produced by the Rcomplete method points at parts of the model that deserve special attention, such as regions of weak electron density. See SI Appendix, section 4 for an example.
Supplementary Material
Acknowledgments
We thank our referees and the expert editor for their supportive and constructive criticism. We thank R. Pannu and P. Skubak for discussion about the expansion of the presented ideas to maximum likelihood-based refinement; P. Lafond for discussion about cross-validation, bootstrap, and jackknife methods; G. Murshudov for information about implementation details in Refmac5; G. M. Sheldrick for critical reading of the manuscript; and A. Paesch for crystal growth and data collection of cubic insulin. T.G. was partially supported by the Volkswagen Stiftung via the Niedersachsenprofessur awarded to Prof. G. M. Sheldrick.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1502136112/-/DCSupplemental.
References
1. Berman HM, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–242. doi: 10.1093/nar/28.1.235.
2. Kennard O. Cambridge Crystallographic Database. Acta Crystallogr A. 1981;37:C343.
3. Kleywegt GJ, Jones TA. Where freedom is given, liberties are taken. Structure. 1995;3(6):535–540. doi: 10.1016/s0969-2126(01)00187-3.
4. Kleywegt GJ, Brünger AT. Checking your imagination: Applications of the free R value. Structure. 1996;4(8):897–904. doi: 10.1016/s0969-2126(96)00097-4.
5. Brünger AT. Free R value: A novel statistical quantity for assessing the accuracy of crystal structures. Nature. 1992;355(6359):472–475. doi: 10.1038/355472a0.
6. Brünger AT. Free R value: Cross-validation in crystallography. Methods Enzymol. 1997;277:366–396. doi: 10.1016/s0076-6879(97)77021-6.
7. Geisser S. Predictive Inference: An Introduction. Monographs on Statistics and Applied Probability, Vol 55. Chapman & Hall, London; 1993.
8. Efron B, Tibshirani R. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, Vol 57. Chapman & Hall, London; 1994.
9. Stone M. Cross-validatory choice and assessment of statistical predictions. J R Stat Soc B. 1974;36:111–147.
10. Schröder GF, Levitt M, Brunger AT. Super-resolution biomolecular crystallography with low-resolution data. Nature. 2010;464(7292):1218–1222. doi: 10.1038/nature08892.
11. Brunger AT, et al. Improving the accuracy of macromolecular structure refinement at 7 Å resolution. Structure. 2012;20(6):957–966. doi: 10.1016/j.str.2012.04.020.
12. Pražnikar J, Afonine PV, Gunčar G, Adams PD, Turk D. Averaged kick maps: Less noise, more signal... and probably less bias. Acta Crystallogr D Biol Crystallogr. 2009;65(Pt 9):921–931. doi: 10.1107/S0907444909021933.
13. Joosten RP, Long F, Murshudov GN, Perrakis A. The PDB_REDO server for macromolecular structure model optimization. IUCrJ. 2014;1(Pt 4):213–220. doi: 10.1107/S2052252514009324.
14. Brünger AT. Assessment of phase accuracy by cross validation: The free R value. Methods and applications. Acta Crystallogr D Biol Crystallogr. 1993;49(Pt 1):24–36. doi: 10.1107/S0907444992007352.
15. Lunin V, Skovoroda T. R-free likelihood-based estimates of errors for phases calculated from atomic models. Acta Crystallogr A. 1995;51(Pt 6):880–887.
16. Pannu NS, Murshudov GN, Dodson EJ, Read RJ. Incorporation of prior phase information strengthens maximum-likelihood structure refinement. Acta Crystallogr D Biol Crystallogr. 1998;54(Pt 6 Pt 2):1285–1294. doi: 10.1107/s0907444998004119.
17. Efron B, Gong G. A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Stat. 1983;37:36–48.
18. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Morgan Kaufmann, Burlington, MA; 1995. pp. 1137–1145.
19. Tickle IJ, Laskowski RA, Moss DS. Rfree and the Rfree ratio. II. Calculation of the expected values and variances of cross-validation statistics in macromolecular least-squares refinement. Acta Crystallogr D Biol Crystallogr. 2000;56(Pt 4):442–450. doi: 10.1107/s0907444999016868.
20. Gong G. Cross-validation, the jackknife, and the bootstrap: Excess error estimation in forward logistic regression. J Am Stat Assoc. 1986;81:108–113.
21. Gruene T, Hahn HW, Luebben AV, Meilleur F, Sheldrick GM. Refinement of macromolecular structures against neutron data with SHELXL2013. J Appl Cryst. 2014;47(Pt 1):462–466. doi: 10.1107/S1600576713027659.
22. Fabbiani FP, Buth G, Levendis DC, Cruz-Cabeza AJ. Pharmaceutical hydrates under ambient conditions from high-pressure seeds: A case study of GABA monohydrate. Chem Commun (Camb). 2014;50(15):1817–1819. doi: 10.1039/c3cc48466a.
23. Pražnikar J, Turk D. Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures. Acta Crystallogr D Biol Crystallogr. 2014;70(Pt 12):3124–3134. doi: 10.1107/S1399004714021336.
24. Turk D. MAIN 2011: Refining against all diffraction data – free of R-free. Acta Crystallogr A. 2011;67:C598.
25. Tickle I (2015) CCP4 Bulletin Board. Available at www.ccp4.ac.uk/ccp4bb.php. Accessed November 25, 2014.
26. Wilson A. Largest likely value for the reliability index. Acta Crystallogr. 1950;3:397–398.
27. Tronrud DE. Introduction to macromolecular refinement. Acta Crystallogr D Biol Crystallogr. 2004;60(Pt 12 Pt 1):2156–2168. doi: 10.1107/S090744490402356X.
28. Murshudov GN, et al. REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr D Biol Crystallogr. 2011;67(Pt 4):355–367. doi: 10.1107/S0907444911001314.
29. Adams PD, et al. PHENIX: A comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr. 2010;66(Pt 2):213–221. doi: 10.1107/S0907444909052925.
30. Barends TR, et al. De novo protein crystal structure determination from X-ray free-electron laser data. Nature. 2014;505(7482):244–247. doi: 10.1038/nature12773.
31. Herbst-Irmer R. Experimental charge density studies: Discard valid data and overfit? Acta Crystallogr A. 2014;70:C282.
32. Mondal KC, et al. One-electron-mediated rearrangements of 2,3-disiladicarbene. J Am Chem Soc. 2014;136(25):8919–8922. doi: 10.1021/ja504821u.
33. Holstein J, Hubschle C, Dittrich B. Electrostatic properties of nine fluoroquinolone antibiotics derived directly from their crystal structure refinements. CrystEngComm. 2012;14:2520–2531.
34. Gruene T, Sheldrick G, Zlatopolskiy B, Kozhushkov S, de Meijere A. Structure of hormaomycin, a naturally occurring cyclic octadepsipeptide, in the crystal. Z Naturforsch B. 2014;69b:945–949.
35. Ishikawa T, et al. An abnormal pK(a) value of internal histidine of the insulin molecule revealed by neutron crystallographic analysis. Biochem Biophys Res Commun. 2008;376(1):32–35. doi: 10.1016/j.bbrc.2008.08.071.