Abstract
Here we describe the updated MolProbity rotamer-library distributions derived from an order-of-magnitude larger and more stringently quality-filtered dataset of about 8000 (vs. 500) protein chains, and we explain the resulting changes and improvements to model validation as seen by users. To include only sidechains with satisfactory justification for their given conformation, we added residue-specific filters for electron-density value and model-to-density fit. The combined new protocol retains a million residues of data, while cleaning up false-positive noise in the multi-χ datapoint distributions. It enables unambiguous characterization of conformational clusters nearly 1000-fold less frequent than the most common ones. We describe examples of local interactions that favor these rare conformations, including the role of authentic covalent bond-angle deviations in enabling presumably strained sidechain conformations. Further, along with favored and outlier, an allowed category (0.3% to 2.0% occurrence in reference data) has been added, analogous to Ramachandran validation categories. The new rotamer distributions are used for current rotamer validation in MolProbity and PHENIX, and for rotamer choice in PHENIX model-building and refinement. The multi-dimensional χ distributions and Top8000 reference dataset are freely available on GitHub. These rotamers are termed “ultimate” because data sampling and quality are now fully adequate for this task, and also because we believe the future of conformational validation should integrate sidechain with backbone criteria.
Keywords: sidechain rotamer library, rare sidechain conformations, structural bioinformatics, structure validation, Phenix, high-quality dataset, protein conformation
2 Introduction
Protein sidechains take on preferred conformations which fall into distinct local energy minima known as rotamers, defined by the set of sidechain dihedral (χ) angles. For tetrahedral geometry, χ values fall into three discrete ranges: p (plus, centered near +60°), t (trans, centered near 180°), and m (minus, centered near −60°), as named in [1]. These correspond to low-energy staggered conformations expected between sp3 hybridized atoms [3]. sp3 to sp2 bonds (tetrahedral to planar geometries) have more complex and much broader distributions. Overall rotamer conformations, however, are not simply the product of their individual χ distributions, since wider steric and other atomic interactions influence the preferred, or even the possible, combinations.
Rotamers have been studied extensively since the concept was introduced by Ponder and Richards in 1987 [4], and they are important tools in structural biology [5]. Rotamer libraries classically catalog favored sidechain conformations by the mean χ values and standard deviations for each rotamer. They are created by performing statistical analysis on a selected dataset of experimentally-determined models, usually crystal structures archived in the Protein Data Bank (PDB) [6]. Along with Ramachandran backbone ϕ, ψ analysis [7, 8], these libraries form the conformational criteria used in a variety of applications including crystallographic model building and refinement [9, 10, 11, 12, 13, 14], protein structure prediction and design [15, 16, 17], and protein model validation [18, 19, 20].
Different rotamer libraries represent the allowed variability around each central conformation in one of three ways. Early rotamer libraries simply provided the mean value and some estimate of allowable range for all χ angles (or, often, just for χ1 and χ2) of each identified rotamer for each side-chain type [4, 21, 22]. A user would either simply use the mean-value conformation, or else would optimize it manually or computationally within the allowable range. Because modeling the lowest-energy conformation fails to capture allowed variation and further minimization is computationally expensive, design methods expanded to employ “grid” libraries of arbitrarily-spaced, discrete sample points in χ space around the low-energy mean of each rotamer, which allowed development of the influential Dead-End Elimination method in protein design [23, 24]. A third type of rotamer software [25, 20] evaluates a given sidechain conformation by its position in a multi-dimensional probability distribution. Early such distributions were binned (often at ≥ 10°) [18], but recent ones use smooth contour surfaces, scored by what percent of the reference data lies outside that contour [1, 26]. Design libraries and validation libraries focus on two distinct areas of the reference data distributions. While design and prediction are primarily concerned with statistics and cluster shapes inside the low-energy wells [25], validation is primarily concerned with robustly identifying the outliers beyond the edges of those wells. Such outliers are usually wrong but sometimes valid and interesting and so are always worth examining [27].
Because rotamer libraries or distributions are an integral component of modern structural biology, it is imperative that they provide only authentic, low-energy sidechain conformations so that errors are not propagated. Such errors of circular reasoning can be documented for many early cases [1], such as high-energy eclipsed χ angles added for “completeness”, or real empirical data clusters caused by incorrect backward-fit branched sidechains. The accuracy of these libraries depends on including only reliably modeled sidechains in the reference dataset. All rotamer libraries filter their datasets at the file level by resolution and redundancy. As the PDB grew in size, it became increasingly practical to filter also at the residue level. This process very effectively lowers noise and sharpens clustering, because even at high resolution the poorly ordered regions are susceptible to misfitting and are often worse than the good parts at low resolution.
Previously our group developed the “penultimate” rotamer library, using the quality-filtered Top240 PDBs [1]. Soon afterward we updated the library using our Top500 dataset [28]. This library used many file-level filters such as requiring ≤ 1.8 Å resolution, a clashscore < 30 [29], and few backbone bond-angle outliers from Engh and Huber standards [30]. If the file contained multiple identical chains, only the best-ordered one was used. The residue-level filters required B factors < 40 and occupancies of 1 for all atoms in the residue and the absence of serious steric clashes, defined as hydrogen-aware atomic overlaps ≥ 0.4 Å [29]. These filters were used in order to eliminate residues with questionable justification for their given conformation, thereby increasing reliability. Specifically, the B-factor filter (the best metric available before mandatory data deposition) was meant to eliminate residues with poor electron density or other local uncertainty. However, as we realized and as Shapovalov and Dunbrack later clearly demonstrated [31], in many structures the B factors are a poor indicator of good density, primarily because they are often restrained to change only modestly between adjacent, covalently bonded atoms. Since 2008, when the wwPDB started requiring structure-factor data with all depositions, it has become feasible to do routine analysis of electron density directly.
In the current work we use a new quality-filtered dataset curated by our lab, the Top8000, to develop a new rotamer library. The main differences between this dataset and our previous dataset are sheer size (8000 vs. 500 chains) and improved quality filters, which are stricter and better balanced at the overall file (or chain) level and especially at the residue level. We now use a three-component residue filter that adds real-space correlation coefficient (RSCC) and local map value to B factor, effectively eliminating all residues with poor electron density. We explain how these strict filters reveal authentic, rare, and interesting rotameric states, including cases where bond angles must open up significantly. The improved statistics enable a three-level rotamer classification for validation of favored, allowed, and outlier (analogous to classic Ramachandran measures), where now only 0.3%, rather than the previous 1.0%, of the high-quality, filtered reference data lies outside both the allowed and the favored regions.
3 Methods
3.1 Overall Chain-level Dataset Filters
Our previous rotamer library distributions [1] used the Top500 quality-filtered data [28] as the reference dataset. Since then the number of high-resolution structures in the PDB has skyrocketed, allowing us to create a new quality-filtered, X-ray model database, the Top8000. The Top8000 was curated by assessing all PDB crystal structures as of March 29, 2011 with a protein chain of ≥ 38 residues at < 2.0 Å resolution. Hydrogens were added to each PDB file using Reduce, including Asn, Gln, and His flip corrections [32]. MolProbity analysis was performed on each chain [20] and the results entered into a MySQL database. The chains were additionally filtered on the following criteria: chain MolProbity score < 2.0, ≤ 5% of residues with bond length outliers (> 4σ), ≤ 5% of residues with bond angle outliers (> 4σ), and ≤ 5% of residues with Cβ deviation outliers (> 0.25Å).
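The chain-level criteria listed above can be expressed as a single predicate. The following is a minimal sketch, not the Top8000 pipeline itself: the `ChainStats` record and its field names are hypothetical, and the outlier fractions are assumed to have been computed beforehand by a MolProbity-style analysis.

```python
from typing import NamedTuple


class ChainStats(NamedTuple):
    """Hypothetical per-chain summary; field names are illustrative only."""
    n_residues: int
    resolution: float            # in Angstroms
    molprobity_score: float
    frac_bond_length_out: float  # fraction of residues with >4 sigma bond-length outliers
    frac_bond_angle_out: float   # fraction of residues with >4 sigma bond-angle outliers
    frac_cbeta_out: float        # fraction of residues with C-beta deviation > 0.25 A


def passes_chain_filters(c: ChainStats) -> bool:
    """Apply the chain-level thresholds quoted in the text."""
    return (
        c.n_residues >= 38
        and c.resolution < 2.0
        and c.molprobity_score < 2.0
        and c.frac_bond_length_out <= 0.05
        and c.frac_bond_angle_out <= 0.05
        and c.frac_cbeta_out <= 0.05
    )
```

For example, a 120-residue chain at 1.5 Å with a MolProbity score of 1.2 and no geometry outliers passes, whereas the same chain at exactly 2.0 Å does not, since the resolution cutoff is strict.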
In order to control redundancy in the dataset, we made use of the PDB homology clusters, taken separately for 50% sequence identity (most stringent similarity filtering), 70%, 90%, and 95%. For each homology cluster, we selected the best chain based on the average of resolution and chain MolProbity score. This scoring scheme produced ties within some clusters (for < 1% of the final chain tallies); these were resolved, arbitrarily but reproducibly, by alphabetical order of PDB ID + single-character chain ID. At each homology level, a second list was chosen with the additional requirement of deposited structure-factor data. All 8 chain lists are available on GitHub [33] (e.g. Top8000/Top8000-SFbest_hom70_pdb_chain_list.csv, at http://github.com/rlabduke/reference_data; details on the rotamer database and access via GitHub can be found in the supplemental material). The “SF” lists are somewhat smaller, since some clusters may include no otherwise-acceptable structures with deposited data. At the 70% similarity level optimal for most data-mining purposes, the Top8000-SFbest_hom70 list contains 7,419 chains and the Top8000-best_hom70 list contains 7,957 chains. We therefore call these datasets the “Top8000”, as successor to the Top500.
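The per-cluster selection described above can be sketched as follows. This is an illustration under stated assumptions: the input tuple layout and the function name `pick_best_chains` are hypothetical, lower values are taken as better for both resolution and MolProbity score, and ties are assumed to be broken by taking the alphabetically first PDB ID + chain ID string.

```python
def pick_best_chains(chains):
    """Select one chain per homology cluster.

    chains: iterable of (cluster_id, pdb_id, chain_id, resolution, mp_score).
    Returns {cluster_id: (pdb_id, chain_id)}.
    """
    best = {}
    for cluster_id, pdb_id, chain_id, resolution, mp_score in chains:
        # Sort key: average of resolution and MolProbity score, then the
        # PDB ID + chain ID string as a reproducible tie-breaker (assumed order).
        key = ((resolution + mp_score) / 2.0, pdb_id + chain_id)
        if cluster_id not in best or key < best[cluster_id][0]:
            best[cluster_id] = (key, (pdb_id, chain_id))
    return {cid: ident for cid, (_, ident) in best.items()}


example = [
    ("c1", "2abc", "A", 1.5, 1.1),
    ("c1", "1xyz", "B", 1.1, 1.5),  # same average as 2abc A, so the tie-breaker decides
    ("c1", "3def", "A", 1.8, 1.9),
    ("c2", "4ghi", "A", 1.0, 0.9),
]
selected = pick_best_chains(example)
```

Here the first two chains of cluster `c1` tie on the averaged score, so the tie-breaker selects `1xyz` chain `B`.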
All rotamer statistical analysis presented here used the Top8000-SFbest_hom70 dataset version. Deposited structure factors not only are required for electron-density based residue filtering, but they also enable manual examination of the model in the density map for individual examples of conformations deemed dubious or interesting. Of the 7,419 chains, 28 failed in the RSCC calculation described below, and 175 were dropped because ≤ 20% of their residues remained after residue-level filtering. The final rotamer dataset of 7,216 chains is listed on GitHub at Top8000/Top8000_rotamer_pdb_chain_count.csv.
3.2 Residue-level Dataset Filters
Although the chain-level filters select for structures with high overall quality, local quality varies considerably within any model. In this context, quality refers to canonical macromolecular model validation criteria (sterics, geometry, and conformation), as well as the fit to electron density. An important aspect of this research is to include only amino-acid sidechains with sufficiently clear electron density to justify their given conformation. Our previous rotamer library was based on the Top500 dataset where only three residue-level filters were used: no clashes, no alternates, and no B factors ≥ 40 Å². The B factor filter was then (in 2003) our best available proxy for electron-density quality; however, residues with dubious sidechain density still remained. To remedy this, the current dataset was additionally filtered by a local real-space correlation coefficient (RSCC) metric [34] and by local map value.
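$$\mathrm{RSCC} = \frac{\sum_{i=1}^{n}(o_i-\bar{o})(c_i-\bar{c})}{\sqrt{\sum_{i=1}^{n}(o_i-\bar{o})^{2}\,\sum_{i=1}^{n}(c_i-\bar{c})^{2}}} \tag{1}$$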
The local RSCC looks at the correlation between the σA-weighted 2mFo-DFc and the Fc (calculated from model) electron density maps for a given local region, in this case, around each individual atom. Equation 1 shows how the RSCC was calculated, as implemented in PHENIX. o and c are σ values at grid points in the 2mFo-DFc and the Fc maps, respectively, in a radius around the atom. The radius used is resolution dependent; a 1 Å radius is used for resolutions below 1 Å and a 1.5 Å radius is used for resolutions between 1 Å and 2 Å. n is the number of grid points in the selected region and ō and c̄ are the local mean values for o and c, respectively. The RSCC alone is not an adequate density-fit metric for our purposes. However, local RSCC (for shape), local 2mFo-DFc σ value at the atom coordinate (for height), and B factor (for spread) together create a satisfactory fit-to-density metric. We used phenix.real_space_correlation in the PHENIX software package [12] to calculate the RSCC and 2mFo-DFc σ values at each atom for all structures in the Top8000. Per-residue filter values were then assigned by taking the worst atom B (greatest value), worst atom RSCC (least value), and worst atom 2mFo-DFc σ value (least value) in each residue.
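As an illustration of the quantities just described, the sketch below computes a correlation coefficient from two lists of map values and then reduces per-atom values to per-residue "worst" values. It assumes the RSCC is the standard Pearson correlation over the n grid points around an atom, as the definitions of o, c, ō, and c̄ suggest; it is not the phenix.real_space_correlation implementation, and the function names are hypothetical.

```python
import math


def rscc(obs, calc):
    """Pearson correlation between observed (2mFo-DFc) and calculated (Fc)
    map values sampled at the same n grid points around one atom."""
    n = len(obs)
    o_bar = sum(obs) / n
    c_bar = sum(calc) / n
    num = sum((o - o_bar) * (c - c_bar) for o, c in zip(obs, calc))
    den = math.sqrt(
        sum((o - o_bar) ** 2 for o in obs) * sum((c - c_bar) ** 2 for c in calc)
    )
    return num / den


def worst_per_residue(atoms):
    """atoms: list of (b_factor, rscc, map_sigma) tuples, one per atom.
    Returns (highest B, lowest RSCC, lowest map sigma) for the residue."""
    return (
        max(a[0] for a in atoms),
        min(a[1] for a in atoms),
        min(a[2] for a in atoms),
    )
```

Taking the worst atom for each metric means that a single poorly fit atom is enough to characterize the whole residue as poorly fit.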
In selecting filter thresholds for the dataset, we had the goal of keeping a large number of residues while reliably eliminating those with dubious density. To do this, we analyzed the counts of residues remaining for several combinations of filter thresholds. The best balance between filter levels and a reasonably large number of residues yielded the following thresholds: worst per-atom isotropic B factor < 40 Å², worst correlation coefficient > 0.7, and worst map value > 1.1 σ. Upon visual inspection of residues and maps close to these filter thresholds, we determined that the selected combination of thresholds was indeed effective at keeping only residues with satisfactory electron density. Additionally, residues were required to have no all-atom clashes, an occupancy of 1.0, B factors > 1.0 Å² and all backbone atoms modeled, achieved indirectly by ensuring that ϕ, ψ, ω and τ were defined for each residue. The latter test also drops first and last residue in each PDB chain, which are somewhat less and differently affected by backbone interactions.
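The residue-level criteria above combine into one acceptance test. The sketch below is a hypothetical helper, assuming the per-residue worst values and the clash and backbone flags have already been computed; the argument names are illustrative, and the B factor > 1.0 Å² requirement is assumed to apply to every atom (hence the lowest B factor is tested).

```python
def passes_residue_filters(
    worst_b,            # highest per-atom B factor in the residue (A^2)
    lowest_b,           # lowest per-atom B factor in the residue (A^2)
    worst_rscc,         # lowest per-atom RSCC
    worst_map_sigma,    # lowest per-atom 2mFo-DFc value (sigma)
    min_occupancy,      # lowest per-atom occupancy
    has_clash,          # True if any all-atom clash involves the residue
    backbone_complete,  # True if phi, psi, omega and tau are all defined
):
    """Apply the residue-level thresholds quoted in the text."""
    return (
        worst_b < 40.0
        and lowest_b > 1.0
        and worst_rscc > 0.7
        and worst_map_sigma > 1.1
        and min_occupancy == 1.0
        and not has_clash
        and backbone_complete
    )
```

A residue with worst values (B = 25, RSCC = 0.9, map = 1.8 σ), full occupancy, no clashes, and defined backbone angles passes; lowering the worst RSCC to 0.65 rejects it.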
After all filters, the final reference dataset for this work contains more than a million residues, 983,574 of which are rotamer-relevant, non-Gly non-Ala residues. A csv file of all 983,574 residues in the rotamer-relevant reference dataset is available on GitHub (See Top8000/Top8000_rotamer_residues.csv at http://github.com/rlabduke/reference_data).
3.3 Determination of Distributions, Scores, and Contours
The described rotamer library’s primary use is for validation of sidechains in protein models, which flags an outlier if the sidechain has an extremely rare (and presumably high-energy) conformation. This assessment requires a scoring system based upon the multi-dimensional distribution of observed conformations in our high-quality, residue-filtered reference data. We have taken great care to ensure smooth, accurate, and robust contours dividing outlier from allowed. To achieve this we calculated smooth distributions in the multi-dimensional χ space for each residue type, using an adaptive local-density-dependent kernel density estimation (KDE) [35]. Our method has two steps, both using a cosine kernel normalized to have an area or volume of 1.0. A cosine is used, rather than a gaussian, because it reaches zero at a well-defined edge. In the first step, the width of each cosine kernel is 5°. In the second step, the width of the kernel is varied, dependent on the density at that location, as calculated in the first step. Kernel widths are wider in sparse regions and narrower as density increases. The consequence is a distribution that remains smooth in sparse regions but preserves the sharp transitions where occurrence frequency falls off quickly. A full explanation of this method can be found in [28]. The distributions are stored as a discrete grid in χ space with coordinates and KDE values. Grid spacing depends on the number of χ dimensions: for 1, 2, 3, and 4 dimensions the spacings are 1°, 5°, 8°, and 10°, respectively. Each datapoint residue can be assigned a value by interpolating its χ values within the grid. One can then determine the grid value that lies just above the values assigned to the lowest, say, 1% of the quality-filtered reference data. The grid values are then rescaled to represent those percentage values, which are known as the rotamericity of a sidechain conformation, or simply as the rotamer score.
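The two-step estimate and the percentage rescaling can be illustrated in one dimension. The sketch below is not the method of [28]: it treats the stated kernel width as a half-width, and the rule used to widen kernels in sparse regions (width proportional to the inverse square root of the first-pass density, capped at `w_max`) is an assumption chosen only for illustration. All function names are hypothetical.

```python
import math


def wrap(d):
    """Wrap an angular difference in degrees into [-180, 180)."""
    return (d + 180.0) % 360.0 - 180.0


def cosine_kernel(d, w):
    """1-D raised-cosine kernel of half-width w; area 1, exactly zero at |d| >= w."""
    if abs(d) >= w:
        return 0.0
    return (1.0 + math.cos(math.pi * d / w)) / (2.0 * w)


def kde_at(x, data, widths):
    """Density at angle x from datapoints with per-point kernel widths."""
    total = sum(cosine_kernel(wrap(x - xi), w) for xi, w in zip(data, widths))
    return total / len(data)


def adaptive_kde(data, grid, w0=5.0, w_max=40.0):
    # Step 1: fixed-width density evaluated at each datapoint.
    pilot = [kde_at(xi, data, [w0] * len(data)) for xi in data]
    # Step 2 (assumed rule): wider kernels where the pilot density is low,
    # narrower where it is high, relative to the geometric-mean density.
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    widths = [min(w_max, w0 * math.sqrt(g / p)) for p in pilot]
    return [kde_at(x, data, widths) for x in grid]


def percent_score(value, reference_values):
    """Percent of reference datapoints whose interpolated density is below value."""
    below = sum(1 for v in reference_values if v < value)
    return 100.0 * below / len(reference_values)


chis = [-63.0, -60.0, -57.0, 180.0]
density = adaptive_kde(chis, [-60.0, 60.0])
```

Because the kernel reaches zero at a finite distance, the density at χ = 60° in this toy example is exactly zero, which is the property that motivates a cosine rather than a gaussian kernel.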
The rotamer-score grids are used in PHENIX validation (GUI, phenix.rotalyze, and phenix.molprobity) and the MolProbity web service, and they are available as plaintext numerical arrays on GitHub, under Top8000/Top8000_rotamer_pct_contour_grids at http://github.com/rlabduke/reference_data.
To visualize the rotamer distributions, smooth contours are drawn at chosen levels using our internal programs Silk, kin2Dcont, and kin3Dcont [36, 37]. For the 4-dimensional Lys and Arg cases, the 3-D plots for χ1m, χ1t, and χ1p are shown separately. For validation purposes, the interpolated grid value of a given multi-χ conformation is its rotamer score. In the new system developed here, a score of <0.3% qualifies as a rotamer outlier – its score is worse than 99.7% of the good data. Scores ≥ 0.3% and < 2.0% are considered allowed while those ≥ 2.0% are considered favored, as is traditional for Ramachandran criteria [18, 28].
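The three validation categories follow directly from the score thresholds given above. A minimal sketch (the function name is hypothetical):

```python
def classify_rotamer(score_pct):
    """Map a rotamer score (percent, 0 to 100) to a validation category."""
    if score_pct < 0.3:
        return "outlier"
    if score_pct < 2.0:
        return "allowed"
    return "favored"
```

A score of exactly 0.3% is therefore allowed, and a score of exactly 2.0% is favored.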
3.4 χ and Covalent Bond Angle Statistics
Maximum rotamer ranges were defined manually for each χ dimension by inspecting the smoothed contours (See 3.3) and placing boundaries at saddle points between rotamers or to reasonably encompass the rotamer well. To avoid wrapping complications, circular statistics were used (See Equations 2 and 3); this is important for χ distributions that cross zero, such as Asp p0 and m-30, or for rotamers near 90° in symmetric aromatics. For each rotamer, we report a central value and a standard deviation (σ), for both dihedral χ values and covalent bond angles. The mean for the bond angles (Equation 4) and the σ for both χ and bond angles (Equation 3) are calculated straightforwardly from the measures of each example in the filtered Top8000. Because of complex shapes, the central value for the χ measures is the center-of-mass (COM) in the contoured data. This is found by collecting the stored contour grid points in each rotamer well and, for each dimension, calculating the COM (Equation 5).
(2) |
(3) |
(4) |
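$$\mathrm{COM} = \arctan\!\left(\frac{\bar{y}}{\bar{x}}\right), \qquad \bar{x} = \frac{1}{W}\sum_{i=1}^{n} w_i\cos\theta_i, \qquad \bar{y} = \frac{1}{W}\sum_{i=1}^{n} w_i\sin\theta_i \tag{5}$$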
where n is the number of grid points, θi is the ith χ coordinate in radians, wi is the KDE value at the ith coordinate and W is the sum of all KDE values in the rotamer well. To avoid wrapping issues when working with angular data, circular statistics break each angle measure into its two unit-circle components, x and y. This is part of what is being done in Equation 2, where x̄ and ȳ are the average x and y components, respectively, for all angular measures, θ. In Equation 5, a weighted average of the x and y components is calculated. The arctan function simply takes the x̄ and ȳ components and returns the corresponding angular value.
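The component-averaging idea described here can be sketched as follows. This is an illustration of a circular (optionally weighted) mean, assuming the two-argument arctangent is used to recover the angle; it is not taken from the authors' code, and the function name is hypothetical.

```python
import math


def circular_mean_deg(angles_deg, weights=None):
    """Mean angle in degrees from averaged unit-circle components.

    With weights (e.g. KDE values at grid points) this gives a weighted
    center; with weights=None every angle counts equally.
    """
    if weights is None:
        weights = [1.0] * len(angles_deg)
    total = sum(weights)
    x_bar = sum(w * math.cos(math.radians(a)) for a, w in zip(angles_deg, weights)) / total
    y_bar = sum(w * math.sin(math.radians(a)) for a, w in zip(angles_deg, weights)) / total
    # atan2 keeps the correct quadrant, which a one-argument arctan would lose.
    return math.degrees(math.atan2(y_bar, x_bar))
```

For χ values of 170° and −170°, an ordinary arithmetic mean gives 0°, which is on the opposite side of the circle from both datapoints, whereas the circular mean is at ±180° as expected for a trans cluster.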
3.5 Rotamer Assignment and Score
For the purpose of setting the appropriate contour level for the new outlier cutoff, a large test dataset was created from all PDBs in the Top8000 by including all chains and all residues in each PDB file (as a surrogate for entire new PDB files later being validated), hereafter called the unfiltered dataset. phenix.rotalyze was used to assign rotamer names and scores for each residue in the unfiltered dataset. The score is calculated from the individual residue’s χ values, by interpolating over the nearest contour grid points (see section 3.3). This was done twice, once using the new Top8000 contours and once using our previous Top500 contours. Outlier counts were made at several Top8000 contour cutoffs and then compared to the outlier counts as determined by the Top500. In order to keep outlier counts approximately equal between the previous and new systems, a Top8000 outlier cutoff of 0.3% was selected. If the score is above the outlier threshold, then a rotamer name is assigned based on which rotamer well the given χ angles fall within (well ranges were described in Section 3.4). Otherwise the side-chain conformation is classified as an outlier.
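The cutoff selection amounts to finding the candidate contour level whose outlier count best matches the previous system's count. The sketch below is a simplified illustration that compares a single overall total rather than counts by residue type; the function name, the candidate list, and the toy scores are hypothetical.

```python
def choose_cutoff(new_scores, old_outlier_count, candidates):
    """Return the candidate cutoff (percent) whose outlier count under the
    new scores is closest to the outlier count from the old system."""
    def n_outliers(cutoff):
        return sum(1 for s in new_scores if s < cutoff)

    return min(candidates, key=lambda c: abs(n_outliers(c) - old_outlier_count))


# Toy example: ten residue scores under the new contours, and three
# residues flagged as outliers by the old contours.
scores = [0.1, 0.2, 0.25, 0.4, 0.8, 1.5, 5.0, 20.0, 60.0, 90.0]
cutoff = choose_cutoff(scores, old_outlier_count=3, candidates=[0.1, 0.3, 0.5, 1.0])
```

In this toy case the 0.3% candidate yields exactly three outliers and is selected.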
The contour peaks for 214 rotamer clusters reach the 0.3% level. To ensure validity, for clusters with ≤ 8 non-outlier datapoints we manually examined model, electron density, and interactions for the example residues. Four such clusters were judged reliable and only two unreliable (both for Lys and with ≤ 4 datapoints). There are therefore 212 named rotamers in the final list.
4 Results
In the Top8000 system there are a total of 212 named rotamer clusters, compared with 153 in our 2000 paper [1]. Many of the central values have shifted somewhat from the Top500 system, both because of much more, higher-quality data and because of the new center-of-mass definition (see Section 3.4), which we feel represents the clusters better than either the modal or the common-atom values defined previously [1]. MolProbity’s “ultimate” rotamers and their parameters are given in two sets of tables: Table S3 describes the frequency and count of each rotamer in each residue type and Tables S4-S21 give central dihedral and covalent angle values for each rotamer. These numbers are available in CSV format on GitHub under Top8000/Top8000_rotamer_central_values at http://github.com/rlabduke/reference_data. The more useful multi-dimensional distributions are also on GitHub, as linked in Section 3.3.
The order-of-magnitude increase in size and quality of our reference dataset greatly improves signal-to-noise, allowing us to do a better job of the same functions as before. More importantly, it enables new features and new conclusions. The simplest view of a rotamer is of a favored sidechain conformation described by the list of dihedral angles, with ideal staggered χ values between sp3 atoms and little preference for χ between sp3 and sp2 atoms. The next level of detail adds a standard deviation for each χ. However, conformational preferences are really much more complex, especially in a macromolecular interior where numerous interactions can compensate for the energy required to depart from ideal. We have high confidence in the reliability of our more nuanced conformational distributions, as we took great care to include only physically plausible sidechains that fit clear electron density (see Section 3.2). As Figure 1 demonstrates, sidechain conformations do cluster in discrete regions of χ space. Further, there is large variance in χ values within a rotamer cluster, the shape is usually not axis-oriented, and its center is rarely at the nominally ideal staggered conformation. A rotamer is better thought of as a local energy well, with a potentially complex shape, describing the favorable extent around an allowed sidechain conformation.
Figure 1.
χ1χ2 space for Leu, Ile, and His. The 0.3% contour outline is in gray. Each point of the background data cloud represents a residue in the filtered Top8000 dataset. Crosshairs mark the nominally ideal staggered values between sp3 hybridized atoms, labeled as m (−60°), t (180°), and p (+60°).
4.1 Residue Filter Effects
Our strict residue-level filtering aims to include essentially only conformations with unassailable reliability, at some expense to the overall numbers. The fraction of residues kept after filtering differed between amino-acid types (Table S1). Residue-level filtering eliminated the most data, in order, for lysine, glutamate, and arginine. This makes sense since long sidechains are more susceptible to dynamics that can blur or eliminate density. Also, charged sidechains are usually found on the molecular surface interacting with mobile solvent. Hence, surface-exposed positions such as these, lacking strong interactions that can hold them in a disfavored conformation, have no need and no ability to adopt rare, unfavored rotamers. If such sidechains are modeled, they should be in one, or more often several, common rotamers.
4.2 Rotamer Evaluation
The primary purpose of this rotamer library is to analyze the rotamericity of protein sidechain conformations and robustly distinguish outliers. Rotamericity here refers to where a given conformation lies in χ space relative to the calculated contours (see Section 3.3). This evaluation gives a rotamer score between 0 and 100, corresponding to the percentage of the high-quality reference data that lies outside that contour (i.e., how much of the reference data’s scores are worse than the given residue). In addition, for scores above the outlier cutoff, the evaluation also assigns a rotamer name. Each local minimum, or cluster, in χ space is given a name derived from the central χ values of that cluster. Values for χs between sp3 hybridized atoms are named m, p, or t roughly corresponding to the staggered values −60°, +60° and 180°. χs between sp3 and sp2 hybridized atoms are assigned a number which is the COM value (Section 3.4) of the given χ, rounded off. For instance, a Gln rotamer with COM at χ1 = −174, χ2 = −82, and χ3 = −22 is named tm-20.
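The naming convention can be illustrated with a short sketch. It makes two assumptions that go beyond the text: each sp3-sp3 χ is labeled by its nearest staggered value (the actual well boundaries were set manually, see Section 3.4), and the sp3-sp2 χ is rounded to the nearest 10°, which is consistent with the tm-20 example. The function names are hypothetical.

```python
def chi_letter(chi):
    """Label a chi between sp3 atoms by its nearest staggered value."""
    targets = {"p": 60.0, "t": 180.0, "m": -60.0}

    def angular_distance(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)

    return min(targets, key=lambda k: angular_distance(chi, targets[k]))


def rotamer_name(com_chis, n_sp3):
    """Build a rotamer name from center-of-mass chi values (degrees).

    The first n_sp3 values are chis between sp3 atoms; any remaining
    values are sp3-sp2 chis, written as numbers rounded to the nearest 10.
    """
    letters = "".join(chi_letter(c) for c in com_chis[:n_sp3])
    numbers = "".join(str(int(round(c / 10.0)) * 10) for c in com_chis[n_sp3:])
    return letters + numbers


gln_name = rotamer_name([-174.0, -82.0, -22.0], n_sp3=2)
```

With the Gln center-of-mass values quoted above, this reproduces the name tm-20.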
For validation purposes, users are often most interested in the binary issue of whether a sidechain conformation is rotameric or an outlier, i.e. falls within or outside the rotamer distribution. With the Top500 data, contours were smooth and reproducible (between different datapoint selections) only out to the 1% level. Due to greater numbers and increased reliability in the filtered Top8000 distributions, a lower outlier cutoff is now feasible as well as desirable, since an even lower percentage of the reference datapoints are now dubious. We also preferred to keep outlier numbers for unfiltered, general data roughly equal between the previous and new systems. Therefore a count of the 1% outliers by residue type in the unfiltered dataset was performed using the Top500 contours, and those numbers were compared to outlier counts at several different Top8000 contour cutoffs. A new cutoff of 0.3% matched best, was found to behave smoothly, and happens also to match the 3σ level for a normal distribution (3 out of 1000). Table S2 reports the outlier counts in the unfiltered dataset for both the Top500 and Top8000 contours. Further, along with favored and outlier, a new category has now been added: allowed, for scores ≥ 0.3% and <2.0%. This new category lets users know if a given conformation is at the edge of the given rotamer’s distribution, close to the outlier region. This change matches the 3-part system, and the division at 2.0%, long judged useful for Ramachandran criteria [38, 19, 28].
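The correspondence between the 0.3% cutoff and the 3σ level can be checked numerically. This short sketch computes the two-sided tail fraction of a normal distribution using the complementary error function from the Python standard library; the function name is hypothetical.

```python
import math


def two_sided_tail(k):
    """Fraction of a normal distribution lying more than k sigma from the mean."""
    return math.erfc(k / math.sqrt(2.0))


tail_3sigma = two_sided_tail(3.0)
```

The result is about 0.0027, i.e. roughly 3 in 1000, in line with the 0.3% outlier cutoff.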
An important question is how the new system will change rotamer analysis, specifically the number and identity of outliers. A rotamer outlier is simply a conformation that lies outside the outlier contours of the reference dataset, in this case either the Top500 or the Top8000. For residues that have low-energy sidechain conformations, nothing will change. However, a sidechain conformer in a strained position near the outlier edge could change status between the Top500 and Top8000 analyses. Figures 2 and S1 show Top500 and Top8000 allowed regions superimposed for all residues with two χs. For the most part, the contours overlap, but with some differences near the edges. Asp shows very little change, Trp has a significantly larger allowed region now than in the Top500 system, and the Ile Top8000 contours have tightened up relative to the Top500. Thus, differences are only in low-population regions of χ space, and overall outlier counts will be nearly identical.
Figure 2.
Areas in orange (from Top500 data) and in blue (from Top8000) fill the allowed regions for Asp, Trp, and Ile. The extensive areas in green are where the two systems both declare allowed conformations. See Figure S1 for the rest of the two-χ residues.
4.3 Dihedral and Bond Angle Deviations
As we have shown, rotamers are more complex than just a collection of mean χ values and standard deviations. The allowed regions are often large and complex, and the mean or modal χ angles can deviate substantially from the expected staggered conformation. Such deviations usually occur in well-packed environments where the sidechain makes tight atom-atom interactions, either via van der Waals (vdW) or H-bonding, with its own backbone (or within the sidechain). The χ deviations help allow the contact to be a small overlap rather than a steric clash.
For sidechains that have sp3 atoms out to the δ atom (Leu, Ile, Met, Glu, Gln, and Arg), there are four χ1χ2 combinations that produce close sidechain-backbone interactions; pm and mp interact with the NH while tm and pp interact with the carbonyl C. Comparing statistics on all χ1χ2 combinations for these amino acids in the filtered dataset (Table 1), we see that the backbone-interacting combinations are the least populated and that they have large χ deviations from the ideal staggered positions, especially for χ2.
Table 1.
All χ1χ2 combinations in the filtered Top8000 for Leu, Ile, Met, Glu, Gln, and Arg. The backbone-interacting combinations are in bold, as are large deviations from ideal stagger.
χ1χ2 | n | % | χ1 mean | Δ ideal | χ1 sd | χ2 mean | Δ ideal | χ2 sd |
---|---|---|---|---|---|---|---|---|
pp | 1301 | 0.4 | 60.7 | 0.7 | 8.3 | 84.0 | 24.0 | 10.4 |
pt | 20733 | 5.6 | 64.1 | 4.1 | 7.7 | 177.0 | 3.0 | 11.4 |
pm | 2086 | 0.6 | 69.5 | 9.5 | 9.3 | −83.8 | 23.8 | 9.4 |
tp | 50951 | 13.7 | −178.6 | 1.4 | 8.9 | 63.3 | 3.3 | 7.8 |
tt | 49913 | 13.4 | −176.0 | 4.0 | 8.9 | 176.1 | 3.9 | 11.7 |
tm | 2390 | 0.6 | −172.5 | 7.5 | 9.5 | −84.5 | 24.5 | 10.5 |
mp | 8917 | 2.4 | −71.3 | 11.3 | 12.0 | 78.2 | 18.2 | 14.5 |
mt | 192806 | 51.9 | −65.5 | 5.5 | 7.7 | 175.2 | 4.8 | 10.1 |
mm | 42219 | 11.4 | −62.8 | 2.8 | 8.9 | −63.9 | 3.9 | 10.1 |
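The "Δ ideal" columns can be reproduced from the mean χ values with a wrapped angular difference to the nearest staggered value. A minimal sketch, assuming ideal values of +60°, 180°, and −60° (the function name is hypothetical):

```python
def stagger_deviation(mean_chi):
    """Absolute deviation (degrees) of a chi value from the nearest of
    the ideal staggered values +60, 180 and -60, with wrapping at +/-180."""
    ideals = (60.0, 180.0, -60.0)
    return min(abs((mean_chi - ideal + 180.0) % 360.0 - 180.0) for ideal in ideals)
```

For instance, the tp χ1 mean of −178.6° is 1.4° from trans once wrapping is taken into account, and the tm χ2 mean of −84.5° is 24.5° from minus.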
Breaking down these statistics by residue type reveals that Glu and Gln make up 54.3% of mp, 59.8% of tm, and 92.2% of pm examples. The likely reason for this is that the terminal oxygens on Glu and Gln can H-bond with their own NH for pm and mp. In tm a strong vdW interaction is made with the C-terminal peptide plane, and the sidechain amide or carboxyl group is generally further stabilized by one, or more often multiple, H-bonds. However, as for all sidechain-backbone interacting rotamers, the dihedral χ deviations don’t tell the whole story – covalent bond angles also must deviate to avoid steric clashes.
It has long been known that certain sidechain conformations show large deviations in the otherwise narrowly distributed covalent bond angles [1]; this is seen in the scenarios above where sidechain-backbone interactions occur. One rotamer of this type is methionine ppp, which has just 48 examples (0.29% of Met) in the filtered Top8000. Figure 3 shows one specific example and demonstrates how deviations in both χ dihedrals and covalent bond angles are needed to avoid a clash and form a favorable non-bonded interaction. All-atom contacts demonstrate the tight vdW packing. For this residue, the angles C-Cα-Cβ, Cα-Cβ-Cγ, and Cβ-Cγ-Sδ each open up from Engh & Huber values, by +1.6°, +0.9°, and +3.0° respectively.
Figure 3.
Sidechain contacts for Met A 240 in 2bmo, which adopts the rare methionine rotamer ppp. (a) Sulfur contacts, modeled with ideal staggered dihedrals and Engh & Huber bond angles, (b) with Engh & Huber bond angles, but dihedrals as deposited, (c) for the deposited structure, and (d) all contacts for the deposited structure, showing its tight vdW packing.
4.4 Interesting Rotamer Anecdotes
Consistently large deviations from ideal values in the χ and covalent bond angles for some rotamers raise two questions, which are interesting and sometimes illuminating: (1) What is it about each of these rotamers that requires such deviations? (2) What are the structural features and roles for each of these rotamers? These questions were answered by inspecting individual examples from the filtered dataset. After an interesting rotamer was identified, at least five random examples of that rotamer were selected from the filtered dataset. To visualize interactions for the residue of interest, all-atom contacts were displayed in KiNG [39]. This process revealed commonalities of the rotamer environment, reasons for the deviations, and possible reasons for a rare rotamer frequency. What follows are select observations from this process.
4.4.1 Glutamate pm20 & mp0
Glutamate pm20 and mp0 make up 2.6% and 6.4% of all glutamates in our dataset and would not be considered rare in the context of this paper. These rotamers are of interest, however, for three reasons: (1) some average χ and bond angles have relatively large deviations from ideal (Table 2), (2) despite this, these rotamers are significantly more common than other rotamers with similar deviations, and (3) the χ distributions for both show that all three χ measures strongly depend on one another, forming correlated datapoint clusters (Figure 4a). In both pm20 and mp0, the same carboxylate oxygen makes a strong H-bond with its own NH, but the two rotamers take different paths to do so (Figure 4b). The large geometry deviations are required in order to avoid a steric clash of Cδ with the backbone. Glutamine can adopt equivalent arrangements, as previously noted even from much sparser data [40].
Table 2.
Mean bond-angle differences, relative to Engh & Huber, and modal χ-dihedral differences from stagger, for Glu pm20 and mp0
Angle | E&H Δ | χ | stagger Δ |
---|---|---|---|
Glu pm20 (n=1442) | | | |
Cα-Cβ-Cγ | +1.5° | 1 | +9.1° |
Cβ-Cγ-Cδ | +1.8° | 2 | −24.7° |
Cγ-Cδ-Oε1 | +1.6° | 3 | - |
Glu mp0 (n=3568) | | | |
Cα-Cβ-Cγ | +0.3° | 1 | −6.8° |
Cβ-Cγ-Cδ | +1.6° | 2 | +22.5° |
Cγ-Cδ-Oε1 | +1.4° | 3 | - |
Figure 4.
Panel (a) shows the 2.0% contour surfaces for the filtered Top8000 along with data points, as projected onto the χ2-χ3 plane for rotamers Glu pm20 (orange) and mp0 (blue). (b) shows how pm20 and mp0 both make a good H-bond (green dots) with the adjacent backbone NH, albeit through different conformations.
The strong sidechain-backbone H-bond, between atoms separated by only 5 covalent bonds, is an important characteristic of these conformations. It compensates for the energetic cost of distortions better than would a vdW contact, and especially it gives rise to a characteristic interdependence of the three χ angles: as one χ changes, the others change in predictably compensating ways to preserve the favorable H-bond.
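To make the "stagger Δ" quantity concrete, the sketch below computes a dihedral from four atom positions and then reports its signed offset from the nearest staggered center. It assumes the p, t, m centers of +60°, 180°, and −60° given in the Introduction; the function names are hypothetical, and the example angles are simply those centers plus the modal offsets listed in Table 2 for Glu pm20.

```python
import numpy as np

STAGGERED = {"p": 60.0, "t": 180.0, "m": -60.0}

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return float(np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))))

def signed_diff(angle, reference):
    """Smallest signed difference angle - reference, wrapped into [-180, 180)."""
    return (angle - reference + 180.0) % 360.0 - 180.0

def nearest_stagger(chi):
    """Return (label, signed offset) for the closest staggered center."""
    label = min(STAGGERED, key=lambda k: abs(signed_diff(chi, STAGGERED[k])))
    return label, signed_diff(chi, STAGGERED[label])

# Glu pm20 modal values implied by Table 2: chi1 = 60 + 9.1, chi2 = -60 - 24.7.
for chi in (69.1, -84.7):
    label, delta = nearest_stagger(chi)
    print(label, round(delta, 1))   # p 9.1, then m -24.7
```

The wrap in `signed_diff` matters for the t well, where values such as +175° and −175° lie on opposite sides of the ±180° seam but are only 10° apart.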
4.4.2 Isoleucine pp
With just 249 examples in the filtered Top8000, pp is the rarest isoleucine rotamer. This rotamer epitomizes most rare ones, by having numerous surrounding vdW interactions to pack the sidechain into this specific rotamer state, which is rare because of strain from its need for large average bond and dihedral angle distortions. The modal χ2 value is +24.3° from staggered and bond angles Cα-Cβ-Cγ1 and Cβ-Cγ1-Cδ1 open, relative to Engh & Huber, by 2° and 1°, respectively (Table 3). In pp the terminal methyl is in the “down” position approximately parallel to the course of the backbone (Figure 5). In all examples examined, local structure packed the methyl into this rare conformation by sterically prohibiting the more common pt rotamer (methyl “up”). The resulting bond-angle and dihedral openings are necessary to minimize a steric clash between Cδ1 and the backbone C.
Table 3.
Mean bond-angle differences relative to Engh & Huber, and modal χ-dihedral differences from stagger, for Ile pp
Angle | E&H Δ | χ | stagger Δ |
---|---|---|---|
Ile pp (n=249) | | | |
Cα-Cβ-Cγ1 | +2.08° | 1 | −2.1° |
Cβ-Cγ1-Cδ1 | +1.19° | 2 | +24.3° |
Figure 5.
Two different examples of Ile pp, both demonstrating the close Cδ1/C contact, and the extensive vdW interactions that prevent the more common pt conformation. (a) 1s99 Ile A 76, (b) 2wvx Ile C 235.
4.4.3 Methionine mpm
Methionine mpm is extremely rare, with only 13 examples in the filtered Top8000. The deviations from ideal are remarkable: Cα-Cβ-Cγ and Cβ-Cγ-Sδ open by more than 2° and 3°, respectively, and the modal χ3 in mpm is a mere 20° from being eclipsed – probably permitted by the fact that χ3 in Met has a lower rotational barrier than all-carbon tetrahedral torsions because of the longer C-S bond [41, 42]. Here, if χ3 were any closer to staggered, the ε-methyl would clash with its backbone NH.
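The "mere 20°" figure can be checked with simple angle arithmetic. The sketch below assumes that the χ3 stagger Δ of −41.8° listed for Met mpm in Table 4 is measured from the m center at −60°, and that eclipsed positions lie at 0° and ±120°; the function names are hypothetical.

```python
ECLIPSED = (0.0, 120.0, -120.0)

def signed_diff(angle, reference):
    """Smallest signed difference angle - reference, wrapped into [-180, 180)."""
    return (angle - reference + 180.0) % 360.0 - 180.0

def distance_to_eclipsed(chi):
    """Unsigned distance in degrees from chi to the nearest eclipsed position."""
    return min(abs(signed_diff(chi, e)) for e in ECLIPSED)

chi3 = -60.0 + (-41.8)                       # modal chi3 implied by Table 4
print(round(distance_to_eclipsed(chi3), 1))  # 18.2
```

Under these assumptions the modal χ3 sits about 18° from the eclipsed position at −120°, consistent with the rounded value quoted in the text.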
Out of the 13 Met mpm examples in the filtered Top8000, 11 are structurally very similar, with much of the surrounding sequence identical. All 11 belong to the large superfamily of subtilisin-like serine proteases, which contains six families A-F. A large multi-sequence alignment [43] reveals that the Met is completely conserved in families A, B, and C, absent in D and E, and mostly present in F. The Met occurs in the first turn of a helix that is disrupted three residues after the Met (n+3) by a proline conserved in the same families (Figure 6a). Most significantly, immediately preceding the Met is the Ser of the canonical Asp-His-Ser catalytic triad (Figure 6b). Figure 6a shows the extensive contact between the catalytic Ser and the mpm Met. Also shown is the extensive contact between the Met and Ile 246, which excludes the possibility of Met χ3 adopting either a p or t conformation. Although the identity of residue 246 changes across the 11 structures, the constraining hydrophobic contact is maintained.
Figure 6.
3D43 chain B, one of the 11 subtilisins containing a Met mpm. (a) The local arrangement of Ser 250 (red) and Met 251 (blue) on the helix, with the interacting Pro 254 and Ile 246; (b) rotated to show the active site, with its canonical catalytic triad in red and Met 251 in blue.
This case illustrates that 70% homology filtering does not always produce independent examples when a rotamer is conserved for functional reasons. In the families where the Met is conserved, it seems structurally important as part of a motif packing tightly with the catalytic Ser. However, in families where the Met is missing, no other sidechain fills in to sterically position the active Ser. Therefore the functional reason for conservation must be more subtle, at the frontier of our understanding. Experimental research will be needed to untangle its effects on catalysis.
4.5 Conformations Between Tetrahedral and Planar Atoms
Due to differing physical constraints, dihedral angles between tetrahedral and planar atoms do not follow the well-clustered ptm conformations seen between two tetrahedral atoms: when one planar sp2 branch is staggered the other is eclipsed. This is reflected in the Top8000 filtered distributions where sp3-sp2 dihedrals (always in the final χ of the sidechain) often show only weak preferences across their total range (e.g., see Figure 1 for χ2 of histidine).
For the aspartate and asparagine contours shown in Figure 7, sometimes χ2 has disallowed regions and sometimes the entire range is allowed. This fact makes it even more difficult, and less meaningful, to assign a central value and standard deviation that adequately describes the allowed conformations. Asp and Asn also show clearly that there is distinct, complex fine-structure of local clustering in the data distributions; note especially the smaller elongated clusters at top right of χ1 t. These datapoint clusters are primarily due to patterns of sidechain H-bonding to specific backbone donors or acceptors local in sequence, or occasionally due to especially favorable vdW packing against local backbone. For example, the Asn Nδ2 can either H-bond with the i-4 CO in a regular α-helix in an m-80 conformer or pack against that CO in the more common m-20 conformer [40].
Figure 7.
Filtered Top8000 Asp and Asn datapoints and outlier contours (gray outline). χ2 does not follow the ptm convention since it is an sp3-sp2 dihedral.
Both these issues confirm that a complex probability density function, such as our filtered and smoothed empirical contours, represents sidechain conformational preferences much better than a library of simple box or ellipsoid shapes. Its contour outline at low probability is a definitive way to flag rotamer outliers. For protein structure prediction or design, the details of favored local motifs at high contour levels should also be considered, preferably with the addition of information both about ϕ, ψ values [15] and local secondary structure [1]. Either as a library or as a distribution, it is preferable to avoid the “rare” rotamers (identified in Table S3) unless the other data is robust enough to support assignment of a very low-probability conformation.
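As an illustration of why a smoothed empirical density can capture shapes that a box or ellipsoid cannot, the sketch below builds a periodic two-dimensional density over χ1 and χ2 from a set of datapoints. It is not the smoothing protocol used for the Top8000 distributions: the fixed-bandwidth Gaussian kernel, the 10° grid, and the function names are all assumptions chosen for brevity.

```python
import numpy as np

def wrap(delta):
    """Wrap angle differences (degrees) into [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0

def periodic_kde(points, grid_step=10.0, bandwidth=10.0):
    """Fixed-bandwidth Gaussian density on a periodic chi1/chi2 grid.

    points: (n, 2) array-like of (chi1, chi2) in degrees.
    Returns the grid axis and a density array normalized to integrate to 1.
    """
    points = np.asarray(points, dtype=float)
    axis = np.arange(-180.0, 180.0, grid_step)
    g1, g2 = np.meshgrid(axis, axis, indexing="ij")
    density = np.zeros_like(g1)
    for c1, c2 in points:
        # Wrapped differences let a cluster straddle the +/-180 seam.
        d2 = wrap(g1 - c1) ** 2 + wrap(g2 - c2) ** 2
        density += np.exp(-0.5 * d2 / bandwidth ** 2)
    density /= density.sum() * grid_step ** 2
    return axis, density

# Two invented datapoints on either side of the chi2 = +/-180 seam.
axis, density = periodic_kde([(-60.0, 178.0), (-60.0, -178.0)])
i, j = np.unravel_index(density.argmax(), density.shape)
print(axis[i], axis[j])
```

Because angle differences are wrapped, the two points contribute to a single peak at the seam rather than to two separate half-peaks at opposite edges of the grid.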
5 Discussion
5.1 In What Sense is this Ultimate?
Calling this set of rotamer-library distributions “ultimate” is a claim that requires both explanation and justification. Sixteen years after our “penultimate” rotamers, we had to confront the issue of whether or not MolProbity’s line of validation-focused rotamer distributions had reached its ultimate stage. We decided that indeed it had, from three separate lines of argument, reflected in the wording of this work’s title.
We claim it as ultimate only for the multi-dimensional χ distributions used to validate sidechain models, such as done by the MolProbity website and in PHENIX. The rotamers presented here are not necessarily ultimate or even appropriate for other purposes.
The changes from penultimate to ultimate distributions are fairly minor in quantitative terms, even after 16 years. The importance of the new system lies in its ability to support a 3-level evaluation, to identify very rare rotamers, and to specify rotamer-dependent bond-angle deviations. A million reliably-modeled residues of data now provide fully adequate sampling for the basic task of robustly locating smooth contours that separate favored (98%), allowed, and outlier regions. The fundamental χ distributions would be unlikely to change much at all if they were redetermined in the future, and we plan to keep them freely available on GitHub for testing, for use within other software systems, and for spinoffs such as approximation by differentiable functions.
What we do expect will improve in the future is a redefinition of conformational validation, enabled by expansion of data and computing power, and driven by the needs of structure determination at lower resolution. Rather than doing separate Ramachandran and rotamer evaluation, we should move toward analyzing all backbone and sidechain torsional dimensions together [8], including allowance for the influence of secondary structure and local motifs. Thus our “ultimate” claim asserts the position that MolProbity’s type of rotamer evaluation should no longer be updated but should evolve into something better.
5.2 Use of the Top8000 Rotamer Distributions in Model Building
Although the Top8000 rotamer distributions were created specifically for MolProbity’s validation of sidechain conformations, they can also be used in model building for crystallography, design, homology modeling, etc., as the Top500 library has been [10, 44, 11, 45, 46, 12, 24, 47]. To make model building computationally tractable, the usual procedure has been to fit sidechains initially as discrete conformations at rotamer values, then to minimize or to test at neighboring sample points, using σ to inform the allowable range for each χ. However, most rotamer distributions are far from normal and often have quite complex shapes, as can be seen in Figures 1 and 7. Thus a central value and σ cannot adequately describe the allowed sidechain conformations.
The contour values in our rotamer distributions correspond to the percentage of high-quality reference data that lies outside that contour. The consequence of this treatment is that the contour data represent a probability density function in χ space describing sidechain conformations. We recommend using those contours to delimit sampling or minimization in χ space, with the added benefit that the contour levels provide prior probabilities for each position. If a contour level higher than 0.3% is chosen to limit sampling within a rotamer well, then entire rotamers which do not reach the chosen percentage level should also be omitted. Depending on details of the algorithm used, it may or may not be beneficial to model significant rotamer-dependent bond-angle deviations. It should also be noted that some rotamer wells are very elongated, and if minimization is only allowed to move by a fixed distance from the central value, then additional sample points may need to be defined. Elongated rotamers only occur when the final χ angle is around a tetrahedral-to-planar bond, easily recognizable because the rotamer name ends in a number (e.g., Trp t160 or Arg ppp-140).
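One way to read the contour definition above is as a percentile of the density values found at the reference datapoints: the x% contour is the density level below which x% of the reference data fall. The sketch below applies that reading with the 2.0% and 0.3% levels named in this paper. The helper names and the synthetic densities are hypothetical, and the actual MolProbity scoring may differ in detail.

```python
import numpy as np

def contour_threshold(ref_density, percent_outside):
    """Density level with `percent_outside` % of reference points below it."""
    return float(np.percentile(ref_density, percent_outside))

def classify(density_value, ref_density, allowed_pct=2.0, outlier_pct=0.3):
    """Three-level evaluation of one conformation from its density value."""
    if density_value < contour_threshold(ref_density, outlier_pct):
        return "outlier"
    if density_value < contour_threshold(ref_density, allowed_pct):
        return "allowed"
    return "favored"

# Synthetic stand-in: the density evaluated at 1000 reference datapoints.
ref_density = np.arange(1.0, 1001.0)

for value in (2.0, 10.0, 500.0):
    print(value, classify(value, ref_density))
```

The same thresholds could be used to bound sampling or minimization in χ space, by rejecting trial conformations whose density falls below the chosen contour level.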
5.3 I Have a Rotamer Outlier; is it Wrong?
An accurate macromolecular model gives unparalleled mechanistic knowledge at the single-molecule level. Many such models, of different cellular components, combined with structural perturbations such as point mutations and ligand binding, provide mechanistic detail on the cellular level. These methods not only provide interesting knowledge, they allow understanding crucial to the treatment of disease. The utility of a model in this process is correlated with how accurate it is. As such, structural biologists want to build models that come as close as possible to representing a valid state of the actual macromolecule. Unfortunately, the experimental data alone seldom provide all information needed to build an adequate model. Fortunately, empirical knowledge of chemical and macromolecular structure helps greatly, and in crystallography much of this knowledge (e.g. bond lengths and angles, chirality) is already part of the automated software. Structure validation such as MolProbity provides even more empirical knowledge, often highlighting errors that refinement was unable to fix.
Sometimes we get disturbing reports that people are trying to achieve a sort of “MolProbity Nirvana” by attempting to eliminate every single outlier. To approach eliminating all clashes is a worthy goal: in principle the clash target is zero, as sterics do not allow serious atomic overlaps. In practice, however, (a) all-atom contact analysis includes approximations (e.g. spherical atoms); (b) at high resolution a zero score would require the difficult reconciliation of occupancies for alternate conformations including waters; and (c) usually a few puzzles remain. An important point to keep in mind is that the primary purpose and usefulness of model validation is to help diagnose and correct places where the conformation has been fit in the wrong local energy well. Such corrections often matter to biological interpretation, and will be stable to further refinement. In contrast, a small shift across the border into the allowed region will often shift right back again, and is not a very meaningful improvement.
However, the target is very definitely not zero for Ramachandran and rotamer validation. The idea that every single torsion needs to be allowed is a misunderstanding. As outlined here, validation of torsion angles relies on contoured conformational distributions from quality-filtered datasets. The outer contour defines an outlier cutoff, meaning that even the filtered reference data has outliers but not that those conformations are wrong. There is room for valid torsion outliers, more so for sidechain rotamers than for backbone Ramachandran values. Even in 2005 the curve of rotamer-outlier percentage, fit as a function of resolution, was found to asymptote at 0.5% [10]. For quality-filtered data the rotamer outlier cutoff is now 0.3%, meaning that 3 out of 1,000 residues are expected to be valid rotamer outliers. For Ramachandran the outlier cutoff is 0.05%, so only a very low 5 out of 10,000 residues – but not zero – are expected to be valid Ramachandran outliers. In either case, to be accepted as valid, an outlier should have clear electron density to support its occurrence, and should have either H-bonds or tight packing to hold it in the presumably quite strained conformation. This is exemplified by the Asn rotamer outliers in Figure 8, which each have an eclipsed χ1 but are validated by excellent density and three sidechain H-bonds. Valid outliers are more likely to have been selected and maintained by evolution if there is a functional need of some sort (folding, catalysis, binding) for that specific conformation; therefore such cases will probably be of interest and will reward as well as require detailed examination.
Figure 8.
Shown is the χ1 = p slice of the Asn distribution as well as the allowed (green) and outlier (red) contours. Each point in the distribution represents one residue in the filtered dataset. There are several residues that lie outside the outlier contours – thus are outliers. Two examples that are far from any allowed contour (inside the circle) are shown with excellent 2mFo-DFc density. These valid outliers both exhibit extensive H-bonding which allows them to be held in an outlier conformation.
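The outlier expectations quoted above follow from simple arithmetic, sketched here for a model of a given size. The cutoffs are the 0.3% and 0.05% values stated in the text. Treating residues as independent in `prob_no_outliers` is a simplifying assumption made only for this illustration, and the function names are hypothetical.

```python
ROTAMER_CUTOFF = 0.003        # 0.3% of quality-filtered reference data
RAMACHANDRAN_CUTOFF = 0.0005  # 0.05% of quality-filtered reference data

def expected_outliers(n_residues, outlier_fraction):
    """Expected count of valid outliers at reference-data quality."""
    return n_residues * outlier_fraction

def prob_no_outliers(n_residues, outlier_fraction):
    """Chance of zero outliers, assuming residues behave independently."""
    return (1.0 - outlier_fraction) ** n_residues

n = 1000
print(f"{expected_outliers(n, ROTAMER_CUTOFF):.2f}")       # 3.00
print(f"{expected_outliers(n, RAMACHANDRAN_CUTOFF):.2f}")  # 0.50
print(f"{prob_no_outliers(n, ROTAMER_CUTOFF):.3f}")
```

Under these assumptions, a 1000-residue model with no rotamer outliers at all would be the unusual case, which is the point made in the text about not targeting zero.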
Supplementary Material
Table 4.
Mean bond-angle differences, relative to Engh & Huber, and modal χ-dihedral differences from stagger, for Met mpm.
Angle | E&H Δ | χ | stagger Δ |
---|---|---|---|
N-Cα-Cβ | 0.54° | 1 | −17.2° |
Cα-Cβ-Cγ | 2.04° | 2 | +4.0° |
Cβ-Cγ-Sδ | 3.15° | 3 | −41.8° |
Acknowledgments
We would like to thank Richardson lab members Dan Keedy and Bryan Arendall for setting up the chain-level Top8000, Jeff Headd for moving the MolProbity utilities to use the PHENIX cctbx toolbox, and Michael Prisant for help with GitHub. Also, thanks to PHENIX Project members Nigel Moriarty and Nat Echols for help with integration into PHENIX automation and GUI. This work was supported by NIH grants R01-GM073919 and Project IV of P01-GM063210, and by an NSF Graduate Research Fellowship to BJH.
Footnotes
The p, t, m nomenclature was adopted in [1] and in MolProbity [2], to give a single letter for use in rotamer strings, and because the more common g+, t, g− terminology was, in 2000, still being assigned inconsistently (see discussion in [1]).
All work performed at Duke University, Durham, North Carolina.
References
- 1. Lovell Simon C, Word J Michael, Richardson Jane S, Richardson David C. The penultimate rotamer library. Proteins: Structure, Function, and Genetics. 2000;40(3):389–408.
- 2. Davis IW, Murray LW, Richardson JS, Richardson DC. MolProbity: structure validation and all-atom contact analysis for nucleic acids and their complexes. Nucleic Acids Research. 2004 Jul;32(Web Server):W615–W619. doi: 10.1093/nar/gkh398.
- 3. Eyring Henry. Steric hindrance and collision diameters. J Am Chem Soc. 1932 Aug;54(8):3191–3203.
- 4. Ponder Jay W, Richards Frederic M. Tertiary templates for proteins. Journal of Molecular Biology. 1987 Feb;193(4):775–791. doi: 10.1016/0022-2836(87)90358-5.
- 5. Dunbrack Roland L. Rotamer libraries in the 21st century. Current Opinion in Structural Biology. 2002;12(4):431–440. doi: 10.1016/s0959-440x(02)00344-5.
- 6. Berman HM. The protein data bank. Nucleic Acids Research. 2000 Jan;28(1):235–242. doi: 10.1093/nar/28.1.235.
- 7. Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. Journal of Molecular Biology. 1963 Jul;7:95–9. doi: 10.1016/s0022-2836(63)80023-6.
- 8. Richardson Jane S, Keedy Daniel A, Richardson David C. “The Plot” thickens: more data, more dimensions, more uses. In: Bansal M, Srinivasan N, editors. Biomolecular Forms and Functions: A Celebration of 50 Years of the Ramachandran Map, IISc Press-WSPC publication. World Scientific Publishing Company Incorporated; 2013. pp. 46–61.
- 9. Jones TA, Zou JY, Cowan SW, Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst Sect A. 1991 Mar;47(2):110–119. doi: 10.1107/s0108767390010224.
- 10. Arendall W Bryan, Tempel Wolfram, Richardson Jane S, Zhou Weihong, Wang Shuren, Davis Ian W, Liu Zhi-Jie, Rose John P, Carson W Michael, Luo Ming, Richardson David C, Wang Bi-Cheng. A test of enhancing model accuracy in high-throughput crystallography. J Struct Funct Genomics. 2005 Mar;6(1):1–11. doi: 10.1007/s10969-005-3138-4.
- 11. Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Crystallographica Section D. 2010 Apr;66(4):486–501. doi: 10.1107/S0907444910007493.
- 12. Adams Paul D, Afonine Pavel V, Bunkóczi Gábor, Chen Vincent B, Davis Ian W, Echols Nathaniel, Headd Jeffrey J, Hung Li-Wei, Kapral Gary J, Grosse-Kunstleve Ralf W, McCoy Airlie J, Moriarty Nigel W, Oeffner Robert, Read Randy J, Richardson David C, Richardson Jane S, Terwilliger Thomas C, Zwart Peter H. Phenix: a comprehensive python-based system for macromolecular structure solution. Acta Cryst Sect D. 2010 Jan;66(2):213–221. doi: 10.1107/S0907444909052925.
- 13. Winn Martyn D, Ballard Charles C, Cowtan Kevin D, Dodson Eleanor J, Emsley Paul, Evans Phil R, Keegan Ronan M, Krissinel Eugene B, Leslie Andrew GW, McCoy Airlie, McNicholas Stuart J, Murshudov Garib N, Pannu Navraj S, Potterton Elizabeth A, Powell Harold R, Read Randy J, Vagin Alexei, Wilson Keith S. Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Cryst. 2011 Mar;67(4):235–242. doi: 10.1107/S0907444910045749.
- 14. Joosten RP, Joosten K, Cohen SX, Vriend G, Perrakis A. Automatic rebuilding and optimization of crystallographic structures in the protein data bank. Bioinformatics. 2011 Oct;27(24):3392–3398. doi: 10.1093/bioinformatics/btr590.
- 15. Bower Michael J, Cohen Fred E, Dunbrack Roland L. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. Journal of Molecular Biology. 1997 Apr;267(5):1268–1282. doi: 10.1006/jmbi.1997.0926.
- 16. Kuhlman B. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003 Nov;302(5649):1364–1368. doi: 10.1126/science.1089427.
- 17. Gainza Pablo, Roberts Kyle E, Donald Bruce R. Protein design using continuous rotamers. PLoS Computational Biology. 2012 Jan;8(1):e1002335. doi: 10.1371/journal.pcbi.1002335.
- 18. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography. 1993 Apr;26(2):283–291.
- 19. Hooft RW, Vriend G, Sander C, Abola EE. Errors in protein structures. Nature. 1996 May;381(6580):272. doi: 10.1038/381272a0.
- 20. Chen Vincent B, Arendall W Bryan, Headd Jeffrey J, Keedy Daniel A, Immormino Robert M, Kapral Gary J, Murray Laura W, Richardson Jane S, Richardson David C. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Cryst Sect D. 2009 Dec;66(1):12–21. doi: 10.1107/S0907444909042073.
- 21. Tuffery P, Etchebest C, Hazout S, Lavery R. A new approach to the rapid determination of protein side chain conformations. Journal of Biomolecular Structure and Dynamics. 1991 Jun;8(6):1267–1289. doi: 10.1080/07391102.1991.10507882.
- 22. Schrauber Hannelore, Eisenhaber Frank, Argos Patrick. Rotamers: To be or not to be? Journal of Molecular Biology. 1993 Mar;230(2):592–612. doi: 10.1006/jmbi.1993.1172.
- 23. De Maeyer M, Desmet J, Lasters I. All in one: a highly detailed rotamer library improves both accuracy and speed in the modelling of sidechains by dead-end elimination. Folding & Design. 1997 Jan;2(1):53–66. doi: 10.1016/s1359-0278(97)00006-0.
- 24. Gainza Pablo, Roberts Kyle E, Georgiev Ivelin, Lilien Ryan H, Keedy Daniel A, Chen Cheng-Yu, Reza Faisal, Anderson Amy C, Richardson David C, Richardson Jane S, Donald Bruce R. OSPREY: Protein Design with Ensembles, Flexibility, and Provable Algorithms. Academic Press; 2013.
- 25. Dunbrack Roland L, Cohen Fred E. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci. 1997 Aug;6(8):1661–1681. doi: 10.1002/pro.5560060807.
- 26. Read Randy J, Adams Paul D, Arendall W Bryan, Brunger Axel T, Emsley Paul, Joosten Robbie P, Kleywegt Gerard J, Krissinel Eugene B, Lütteke Thomas, Otwinowski Zbyszek, Perrakis Anastassis, Richardson Jane S, Sheffler William H, Smith Janet L, Tickle Ian J, Vriend Gert, Zwart Peter H. A new generation of crystallographic validation tools for the protein data bank. Structure. 2011 Oct;19(10):1395–1412. doi: 10.1016/j.str.2011.08.006.
- 27. Richardson Jane S, Prisant Michael G, Richardson David C. Crystallographic model validation: from diagnosis to healing. Current Opinion in Structural Biology. 2013 Oct;23(5):707–714. doi: 10.1016/j.sbi.2013.06.004.
- 28. Lovell Simon C, Davis Ian W, Arendall W Bryan, de Bakker Paul IW, Word J Michael, Prisant Michael G, Richardson Jane S, Richardson David C. Structure validation by Cα geometry: ϕ,ψ and Cβ deviation. Proteins. 2003 Jan;50(3):437–450. doi: 10.1002/prot.10286.
- 29. Word J Michael, Lovell Simon C, LaBean Thomas H, Taylor Hope C, Zalis Michael E, Presley Brent K, Richardson Jane S, Richardson David C. Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. Journal of Molecular Biology. 1999 Jan;285(4):1711–1733. doi: 10.1006/jmbi.1998.2400.
- 30. Engh RA, Huber R. Accurate bond and angle parameters for x-ray protein structure refinement. Acta Cryst Sect A. 1991 Jul;47(4):392–400.
- 31. Shapovalov Maxim V, Dunbrack Roland L. Statistical and conformational analysis of the electron density of protein side chains. Proteins: Structure, Function, and Bioinformatics. 2007;66(2):279–303. doi: 10.1002/prot.21150.
- 32. Word J Michael, Lovell Simon C, Richardson Jane S, Richardson David C. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. Journal of Molecular Biology. 1999 Jan;285(4):1735–1747. doi: 10.1006/jmbi.1998.2401.
- 33. Dabbish Laura, Stuart Colleen, Tsay Jason, Herbsleb Jim. Social coding in github: Transparency and collaboration in an open software repository. Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW ’12; New York, NY, USA. ACM; 2012. pp. 1277–1286.
- 34. Kleywegt Gerard J, Harris Mark R, Zou Jin-yu, Taylor Thomas C, Wählby Anders, Jones T Alwyn. The Uppsala Electron-Density Server. Acta Crystallographica Section D. 2004 Dec;60(12 Part 1):2240–2249. doi: 10.1107/S0907444904013253.
- 35. Breiman Leo, Meisel William, Purcell Edward. Variable kernel estimates of multivariate densities. Technometrics. 1977 May;19(2):135–144.
- 36. Word J Michael. PhD thesis. Duke University; 2000. All-Atom Small-Probe Contact Surface Analysis.
- 37. Davis Ian W. PhD thesis. Duke University; 2006. Local Motion and Local Accuracy in Protein Backbone.
- 38. Kleywegt Gerard J, Jones T Alwyn. Phi/psichology: Ramachandran revisited. Structure. 1996 Dec;4(12):1395–1400. doi: 10.1016/s0969-2126(96)00147-5.
- 39. Chen Vincent B, Davis Ian W, Richardson David C. KiNG (Kinemage, Next Generation): A versatile interactive molecular and scientific visualization program. Protein Science. 2009 Nov;18(11):2403–2409. doi: 10.1002/pro.250.
- 40. Lovell SC, Word JM, Richardson JS, Richardson DC. Asparagine and glutamine rotamers: B-factor cutoff and correction of amide flips yield distinct clustering. Proceedings of the National Academy of Sciences. 1999 Jan;96(2):400–405. doi: 10.1073/pnas.96.2.400.
- 41. Butterfoss Glenn L, Hermans Jan. Boltzmann-type distribution of side-chain conformation in proteins. Protein Sci. 2003 Dec;12(12):2719–2731. doi: 10.1110/ps.03273303.
- 42. Butterfoss Glenn L, Richardson Jane S, Hermans Jan. Protein imperfections: separating intrinsic from extrinsic variation of torsion angles. Acta Cryst Sect D. 2004 Dec;61(1):88–98. doi: 10.1107/S0907444904027325.
- 43. Siezen Roland J, Leunissen Jack AM. Subtilases: The superfamily of subtilisin-like serine proteases. Protein Science. 1997 Mar;6(3):501–523. doi: 10.1002/pro.5560060301.
- 44. Langer Gerrit, Cohen Serge X, Lamzin Victor S, Perrakis Anastassis. Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nature Protocols. 2008 Jan;3(7):1171–9. doi: 10.1038/nprot.2008.91.
- 45. Headd Jeffrey J, Immormino Robert M, Keedy Daniel A, Emsley Paul, Richardson David C, Richardson Jane S. Autofix for backward-fit sidechains: using MolProbity and real-space refinement to put misfits in their place. J Struct Funct Genomics. 2008 Nov;10(1):83–93. doi: 10.1007/s10969-008-9045-8.
- 46. Terwilliger Thomas C, Grosse-Kunstleve Ralf W, Afonine Pavel V, Moriarty Nigel W, Zwart Peter H, Hung Li Wei, Read Randy J, Adams Paul D. Iterative model building, structure refinement and density modification with the phenix autobuild wizard. Acta Crystallographica Section D, Biological Crystallography. 2008 Jan;64(Pt 1):61–9. doi: 10.1107/S090744490705024X.
- 47. Porebski Przemyslaw Jerzy, Cymborowski Marcin, Pasenkiewicz-Gierula Marta, Minor Wladek. Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations. Acta Cryst Sect D Struct Biol. 2016 Jan;72(2):266–280. doi: 10.1107/S2059798315024730.