Abstract
In this review we address a fundamental question: what is the range of conformational energies seen in ligands in protein-ligand crystal structures? This value is important biophysically, for better understanding the protein-ligand binding process; and practically, for providing a parameter to be used in many computational drug design methods such as docking and pharmacophore searches. We synthesize a selection of previously reported conflicting results from computational studies of this issue, and conclude that high ligand conformational energies really are present in some crystal structures. The main source of disagreement between different analyses appears to be due to divergent treatments of electrostatics and solvation. At the same time, however, for many ligands a high conformational energy is in error, due to either crystal structure inaccuracies or incorrect determination of the reference state. Aside from simple chemistry mistakes, we argue that crystal structure error may mainly be due to the heuristic weighting of ligand stereochemical restraints relative to the fit of the structure to the electron density. This problem cannot be fixed with improvements to electron density fitting or with simple ligand geometry checks, though better metrics are needed for evaluating ligand and binding site chemistry in addition to geometry during structure refinement. The ultimate solution for accurately determining ligand conformational energies lies in ultra-high resolution crystal structures that can be refined without restraints.
Keywords: X-ray crystallography, molecular modeling, conformational energy, quality metrics, structure refinement, thermodynamics, protein binding
TOC Image
What is the range of conformational energies seen in ligands in protein-ligand crystal structures? We synthesize results from previous computational studies of this issue and conclude that high energies are present in some crystal structures, whereas errors are mainly due to the weighting of ligand stereochemical restraints relative to structure fitting into the electron density. The solution for accurately determining ligand conformational energies is ultra-high resolution crystal structures which can be refined without restraints.

I. Introduction
The holy grail of structure-based drug design is the ability to predict the affinity with which two molecules will bind to one another. This is a difficult problem for several reasons: it involves a delicate balance of forces, with relatively small differences between very large energy components; these energy components can change dramatically with small changes in molecular structure; and the individual forces or components themselves cannot be predicted with any serious degree of accuracy (Mobley and Dill, 2009; Tirado-Rives and Jorgensen, 2006). There is disagreement even over the scale or range of some of the components of binding energy. One example is the strength of some types of non-conventional molecular interactions, such as C-H···π hydrogen bonds (Tsuzuki et al., 2006; Melandri, 2011; Berg et al., 2016) and halogen bonding complexes (Kolář et al., 2015), and their importance in molecular recognition. Another example, which is the focus of this review, is the change in conformational energy of a ligand upon binding to its receptor.
The scale of this energy change has several important implications. First, many molecular modeling techniques, including docking and pharmacophore searching, rely on the generation of a set of molecular conformations for which the allowable energy range must be established in advance. Second, in structure-based drug design the optimization of a set of interactions between a protein and a lead compound cannot come at the expense of a conformational energy penalty that is too large. Finally, ligand conformational energy contributes to the enthalpy component ΔH of the binding free energy and thus is important for understanding the fundamental biophysics of non-covalent binding and molecular recognition.
Over the past twenty years, modelers and crystallographers analyzing the conformational energies of small molecule ligands have found conflicting results as to whether binding conformational energies are always small, on the order of 5 kcal/mol or less, or whether they can be much larger, up to 25 kcal/mol or more. Some studies have directly addressed this question of the change in conformational energy upon ligand binding by calculating differences in energy between the receptor-bound conformations of a set of small molecules and their unbound reference conformations in vacuum or solvent. The first study of this kind was done in 1995, and analyzed a small set of flexible compounds found in both the Cambridge Structural Database (CSD) of small molecule crystal structures and the Protein Data Bank (PDB) of macromolecular crystal structures. The authors observed that the conformations of the CSD and PDB structures were on average 15.9 kcal/mol higher in energy than the calculated global minimum structure and 7.9 kcal/mol higher than the nearest local minimum (Nicklaus et al., 1995). An updated version of this study looked at a much larger set of well-curated protein-ligand complexes and concluded that, “conformational energy changes of small molecules binding to proteins occur in a range of 0 to ~25 kcal/mol even for the highest-quality crystal structures that can be currently found in the PDB” (Sitzmann et al., 2012). However, a different early study found conversely that, “in the great majority of ligand-protein complexes… the conformational energies required for the ligands to adopt their bioactive conformations are calculated to be less than or equal to 3 kcal/mol” (Boström et al., 1998). 
A landmark study of PDB structures with drug-like ligands of known binding affinity found that, “while approximately 60% of the ligands were calculated to bind with strain energies lower than 5 kcal/mol, strain energies over 9 kcal/mol were calculated in at least 10% of the cases regardless of the method used” (Perola and Charifson, 2004). However, a later re-analysis of this data set found instead that, “two-thirds of bioactive conformations lie within 0.5 kcal/mol of a local minimum, with penalties above 2.0 kcal/mol being generally attributable to structural determination inaccuracies” (Butler et al., 2009).
Other literature on developing or benchmarking bioactive conformer generation software or methods addresses the same question in a different context – i.e., what range of energies should be considered when searching the conformational space of a small molecule to be sure of finding its receptor-bound conformation? A study on reproducing the bound conformations of a small set of drug-like ligands with the program OMEGA found that, “all 25 bioactive conformations show conformational energy penalties within a tolerable energy cut-off (< 8 kcal/mol), and all but one are found within a satisfactorily low-energy cut-off: 3.2 kcal/mol” (Boström et al., 2003). In contrast to this, studies on larger sets of PDB complexes found that, “20 kcal/mol is the best setting for the ConFirm force field energy range” (Kirchmair et al., 2005), and “Omega’s default energy threshold setting of 25 kcal/mol was proved to be essential for best conformer generation performance since lower limits reject valuable conformations” (Kirchmair et al., 2006). Similarly, a comparison of conformer generation by the programs MOE and Catalyst, using two previously published data sets (from Boström (2001) and Perola and Charifson (2004)), concluded that, “the majority of the bioactive-like conformers have energies ≤ 3 kcal/mol relative to their unbound state… [However] some bioactive-like conformers are still quite strained… ΔE needs to be at least 15 kcal/mol to yield the largest percent of reproduced bioactive conformers” (Chen and Foloppe, 2008). Finally, a careful retrospective analysis of cases where OMEGA failed to reproduce conformations from the PDB and CSD offered the possible conclusion that, “some well-solved solid-state structures from both sources do in fact possess significant torsional strain for reasons that are not at the moment completely understood” (Hawkins and Nicholls, 2012).
In addition to analyses of conformer generation and binding conformational energies, methods for re-refining crystal structures to improve ligand structure and energy have been developed and reported. The method used in the program AFITT is to enumerate a set of low-energy ligand conformations and fit them to the electron density in the binding site. When this refinement method was tested on a set of protein-ligand complexes from the PDB, the largest strain energy observed was only 24 kJ/mol (5.7 kcal/mol) after removing the electrostatic term from the forcefield (Wlodek et al., 2006). The PHENIX/DivCon refinement method uses a semi-empirical quantum mechanical (QM) functional to describe the ligand and its surrounding atoms and was tested on a set of 50 PDB structures. In this case, after re-refinement the average strain energy of the set of test structures was still quite high, at 24.6 kcal/mol (Borbulevych et al., 2014).
A final method of conformational analysis evaluates the distribution of ligand torsion angles, instead of directly calculating conformational energies. In an interesting study, a set of common torsion motifs from drug-like molecules were selected and their angular distributions were converted into free energies using the Boltzmann equation, giving an approximate strain energy of 0.6 kcal/mol per rotatable bond (Hao et al., 2007). However, another survey of PDB ligand geometries, with the goal of attempting to distinguish between ligands that are truly strained and ligands that have been refined incorrectly, concluded that for high-resolution structures “ligand strain energy is considerably lower than some commentators have previously suggested” and probably much less than 0.6 kcal/mol per torsion (Liebeschuetz et al., 2012). Finally, an analysis of three endogenous cofactors, ATP, NAD, and FAD, which have each been crystallized hundreds of times in a variety of protein superfamilies, found that these ligands all adopt a wide range of binding conformations. The authors conclude that, “torsion angles in many bound ligands, even those from high-resolution structures, fall well outside preferred, low-energy ranges” (Stockwell and Thornton, 2006). This suggests that observations of high conformational energies are not confined to exogenous drug-like ligands, and that the same ligand across different proteins may bind with a wide range of conformational energies.
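The conversion of an observed torsion-angle distribution into approximate free energies can be illustrated with a simple Boltzmann inversion, ΔG_i = −RT ln(N_i/N_max). The following is a minimal sketch of that relation only, and is not claimed to be the procedure of Hao et al. (2007); the function name, the example bin counts, and the temperature of 298.15 K are assumptions made for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_inversion(counts, temperature=298.15):
    """Convert histogram counts of a torsion angle into relative free
    energies (kcal/mol), with the most populated bin set to zero.
    Empty bins are assigned infinite energy."""
    n_max = max(counts)
    rt = R_KCAL * temperature
    return [-rt * math.log(c / n_max) if c > 0 else math.inf
            for c in counts]

# Hypothetical bin counts: a bin populated 10-fold less often than the
# most populated bin lies about RT*ln(10) higher in free energy.
energies = boltzmann_inversion([100, 10, 0])
```

A known limitation of this kind of inversion is that crystal structure populations are not a true thermodynamic ensemble, so the resulting values are best read as rough estimates.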
At first glance, the question of the actual range of binding conformational energies seen in protein-ligand crystal structures does not seem like it should be difficult to answer, given the quantity of available experimental information: the number of ligand instances in the PDB available via the Ligand Expo database (Feng et al., 2004) has recently surpassed one million. In this review we examine closely some of the possible methodological reasons behind the discrepancies outlined above, beginning by carefully defining and examining the concept of conformational energy itself, and moving on to consider potential sources, both real and artificial, for large binding conformational energy changes. We conclude with some suggestions for future improvements in both computational and crystallographic structure analysis methods to clarify this important question.
II. Conformational energy
We begin with a discussion of what conformational energy, also called chemical strain energy, means and where it comes from. This notion of molecular strain is confusing enough that it was covered in a recent review on “fuzzy concepts” in chemistry (Gonthier et al., 2012). The idea was first developed based on observations of hydrocarbon rings, where it was noted that while 5- and 6-membered rings are common, 3- and 4-membered rings are not, because of deviations away from the ideal tetrahedral carbon bond angles which increase the internal potential energy of the molecule. From bond angle strain in rings, the concept was extended to include other deviations from ideal or preferred molecular geometry, including bond length distortions, steric strain from too-close non-bonded interactions, and torsional strain from bond rotations (Wiberg, 1986).
Strain energy can be measured experimentally or calculated ab initio by the difference in heat of formation between a strained molecule and a (possibly hypothetical) strain-free one with the same number of bonds and atoms (Wiberg, 1986; Karton et al., 2016). With receptor-bound ligands, differences in strain or conformational energy usually result from torsion angle rotations, and in this case, rather than calculating heats of formation, it is easier and more useful to calculate and compare the internal energies between two or more different conformations of the same molecule.
The full energy landscape or potential energy surface for a molecule has as many spatial dimensions as the number of rotatable bonds. In Figure 1 we show a fragment example, NH2OH, which has only one rotatable bond and therefore a one-dimensional energy landscape. The minima on a potential energy surface are located where the gradient (1st derivative) of the energy is zero and where the surface curves upward, i.e. the 2nd derivatives are positive (Schlegel, 1998). The global minimum is the point with the lowest energy, located for NH2OH where the H-N-O-H torsion angle is −125°. This molecule also has two equivalent local minima at 10° and 105°.
Figure 1.

Potential energy surface of NH2OH in the OPLS forcefield (Jorgensen et al., 1996) as a function of rotation around its central N-O bond (the H-N-O-H torsion angle). The global minimum conformation is marked with a red circle, and one of the local minima is marked with a yellow diamond. A hypothetical receptor-bound conformation is marked with a black + sign.
A hypothetical receptor-bound conformation is marked in Figure 1 with a + symbol. While such a receptor-bound conformation may not be at a minimum on the potential energy surface of the isolated ligand, it is a minimum on the free energy surface of the complex at equilibrium. When bound, a ligand may adjust its conformation, incurring a conformational energy penalty, in order to optimize its non-bonded interactions with the receptor. An energy minimization of this conformation, when it is removed from the protein binding site, would follow the slope of the potential energy surface down to the nearest minimum at +10°. The difference in energy between this minimized conformation and the initial receptor-bound conformation is called the local conformational energy:
$$\Delta E_{\text{local}} = E_{\text{bound}} - E_{\text{local min}} \qquad (1)$$
Similarly, the global conformational energy is the difference between the global minimum and the receptor-bound conformation.
In molecular mechanics (MM) forcefields, potential energy surfaces are traditionally described with a sum of cosine functions to reproduce the angular locations of energy maxima and minima as bonds are rotated. For example, in the widely-used forcefield MMFF94, the energy as a function of a torsion angle ϕ is:
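$$E_{\text{tor}}(\phi) = \tfrac{1}{2}\left[V_1(1+\cos\phi) + V_2(1-\cos 2\phi) + V_3(1+\cos 3\phi)\right] \qquad (2)$$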
where the parameters Vn are dependent on the chemical types of the bonded atoms. Parameters are assigned to fit energy curves generated from ab initio torsion scans in vacuum, typically at the MP2 level of theory (Halgren and Nachbar, 1996; Wang et al., 2004; Vanommeslaeghe et al., 2010).
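The local and global conformational energies defined above can be illustrated on a one-dimensional torsional profile. The sketch below evaluates a three-term cosine series of the form generally given for the MMFF94 torsion term, walks downhill from a hypothetical bound torsion angle to the nearest minimum, and reports both energy differences. The V1 to V3 values, the 90° bound angle, the 1° grid, and all function names are assumptions chosen for illustration; they are not fitted parameters for NH2OH or any real molecule.

```python
import numpy as np

def torsion_energy(phi_deg, v1=2.0, v2=0.0, v3=3.0):
    """Three-term cosine series (kcal/mol); v1..v3 are hypothetical."""
    phi = np.radians(phi_deg)
    return 0.5 * (v1 * (1 + np.cos(phi))
                  + v2 * (1 - np.cos(2 * phi))
                  + v3 * (1 + np.cos(3 * phi)))

def descend_to_minimum(energies, start):
    """Walk downhill on a periodic grid until neither neighbor is lower."""
    n = len(energies)
    i = start
    while True:
        lower = min(((i - 1) % n, (i + 1) % n), key=lambda k: energies[k])
        if energies[lower] >= energies[i]:
            return i
        i = lower

def conformational_energies(bound_angle_deg, step=1):
    """Local and global conformational energies for a bound torsion angle.
    The bound energy is evaluated at the nearest grid point."""
    angles = np.arange(-180, 180, step)
    energies = torsion_energy(angles)
    start = int(np.argmin(np.abs(angles - bound_angle_deg)))
    local = descend_to_minimum(energies, start)
    e_bound = energies[start]
    return {
        "nearest_minimum_deg": int(angles[local]),
        "local": float(e_bound - energies[local]),
        "global": float(e_bound - energies.min()),
    }

result = conformational_energies(90)
```

With these illustrative parameters the nearest minimum is a local one, so the local conformational energy is smaller than the global one. For molecules with many rotatable bonds an exhaustive grid of this kind is no longer feasible, and only the downhill step to the nearest minimum remains straightforward.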
It is important to note that the torsional energy profiles in MM forcefields also have a contribution from the non-bonded electrostatic and van der Waals terms. The electrostatic energy for each atom pair is calculated as a function of their partial charges and the distance between them using Coulomb’s law. Atoms that are bonded (1-2) or connected in an angle term (1-3) are excluded from the list of pairs, but pairs of atoms in a torsional relationship (1-4) are included, often with the electrostatic energy scaled down by an empirical factor. While the cosine function could be thought of as modeling the orbital interactions that determine the locations of the energy minima as a function of the torsion angle, the non-bonded interactions contribute to the relative energies of different minima and to the barrier heights between minima. This is especially true for forcefields that use a single cosine function for torsions, such as CHARMM (Smith and Karplus, 1992). Other forcefields such as MMFF94 (see Equation 2 above) can reproduce some energy barrier asymmetries with a Fourier series. It is also in theory possible to use only non-bonded interactions to fit a torsional energy profile (Darley and Popelier, 2008).
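The pair-list bookkeeping described above (1-2 and 1-3 pairs excluded, 1-4 pairs scaled) can be sketched as follows. This is an illustrative implementation only: the function names, the 1-4 scaling factor of 0.5, and the toy four-atom chain are assumptions, and real forcefields differ in their scaling factors and in how exclusions are defined.

```python
import itertools
import math
from collections import deque

COULOMB_CONSTANT = 332.0637  # kcal*Angstrom/(mol*e^2)

def bond_separations(n_atoms, bonds):
    """Return {(i, j): bonds on the shortest path} for all i < j."""
    neighbors = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    separations = {}
    for start in range(n_atoms):
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search over the bond graph
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for end, d in dist.items():
            if start < end:
                separations[(start, end)] = d
    return separations

def electrostatic_energy(coords, charges, bonds, scale_14=0.5):
    """Coulomb sum (kcal/mol): skip 1-2 and 1-3 pairs, scale 1-4 pairs.
    scale_14 = 0.5 is an assumed value, not that of a specific forcefield."""
    seps = bond_separations(len(coords), bonds)
    total = 0.0
    for i, j in itertools.combinations(range(len(coords)), 2):
        sep = seps.get((i, j))
        if sep in (1, 2):  # bonded (1-2) or angle (1-3) pair: excluded
            continue
        scale = scale_14 if sep == 3 else 1.0
        r = math.dist(coords[i], coords[j])
        total += scale * COULOMB_CONSTANT * charges[i] * charges[j] / r
    return total

# Toy four-atom chain: only the 0-3 pair (a 1-4 pair) contributes.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (3.0, 1.5, 0.0)]
charges = [0.4, -0.4, -0.4, 0.4]
bonds = [(0, 1), (1, 2), (2, 3)]
e_elec = electrostatic_energy(coords, charges, bonds)
```

Because the 0-3 distance changes as the central bond is rotated, this single scaled term already adds a torsion-dependent contribution on top of the explicit cosine series.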
Assigning or distributing the torsional conformational energies between the terms in a forcefield equation is thus to some extent arbitrary, but regardless of the details of a forcefield’s conceptual framework, its torsional and non-bonded energies are intrinsically linked. During forcefield development the torsion parameters and the atomic partial charges are generally fit and refined iteratively (Wang et al., 2000; Vanommeslaeghe et al., 2010). If the electrostatic term is “turned off” as is sometimes done both in crystallographic refinement and certain molecular modeling situations (see Section IV), the energy profile as a function of torsion rotation will change and will then no longer be correctly fitted to the ab initio target data.
III. Thermodynamic considerations
It is often argued that ligands do not, and cannot, bind with large conformational energies due to the exponential nature of binding affinities, where a 1.4 kcal/mol increase in ΔG, the binding free energy, corresponds to a 10-fold shift in the binding equilibrium constant Kd. While it is true that a change in conformational energy upon binding should contribute to the enthalpy, it is unlikely that two ligands differing in their conformational energy would not also have large differences in some of the other components of the binding free energy, any of which could match or exceed the conformational energy change.
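The relation between a free energy change and the corresponding shift in the equilibrium constant follows from ΔG = RT ln Kd. A short worked example is given below, assuming a temperature of 298.15 K; the function names are illustrative.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def kd_fold_change(delta_delta_g, temperature=298.15):
    """Fold change in Kd for a free energy change (kcal/mol)."""
    return math.exp(delta_delta_g / (R_KCAL * temperature))

def free_energy_for_fold_change(fold, temperature=298.15):
    """Free energy change (kcal/mol) giving a given fold change in Kd."""
    return R_KCAL * temperature * math.log(fold)

fold_for_1p4 = kd_fold_change(1.4)               # roughly 10-fold
ddg_for_10x = free_energy_for_fold_change(10.0)  # about 1.36 kcal/mol
```

By the same arithmetic, an uncompensated penalty of 5 kcal/mol would correspond to a shift of several thousand-fold, which is why large conformational energies are only plausible if other terms of comparable size offset them.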
The other components of binding ΔG that could counterbalance conformational energies include more-favorable solvation or increased entropy in the system as a whole. Data on the entropy of ligand binding from ITC experiments is collected in BindingDB (Liu et al., 2007; Gilson et al., 2016) for 102 protein-ligand complexes, and shows a range for −TΔS of −24.3 to +35.6 kcal/mol. The SCORPIO (Structure/Calorimetry of Reported Protein Interactions Online) database (Olsson et al., 2008) of 254 protein-ligand complexes includes a similar range of −70 to +90 kJ/mol (−16.7 to +21.5 kcal/mol) for −TΔS. The PDBcal database of 409 protein-ligand complexes (Li et al., 2008) has an even wider range for −TΔS of −32.8 to +29.6 kcal/mol. It should be noted, however, that there is concern about the accuracy of some of these experimental measurements (Chodera and Mobley, 2013; Klebe, 2015). For solvation energies, the FreeSolv database (Mobley and Guthrie, 2014), a collection of experimental hydration free energies for 643 small neutral molecules curated from the literature, contains values ranging from −25.5 to +3.4 kcal/mol. Solvation energies for larger and more polar drug-like molecules may be expected to be even larger.
The conformational strain energy in a bound ligand may also be balanced by a favorable set of protein-ligand interactions, as a component of the enthalpic part of binding ΔG. The S66 database of non-covalent interaction energies for biologically-relevant molecular complexes, calculated at the very high CCSD(T)/CBS level of theory (Rezáč et al., 2011), includes a range of energies from about −19 to −0.8 kcal/mol. The largest of these complexes is on the order of a single nucleotide, so actual protein-ligand interaction energies may be stronger. There does seem to be a limit to the maximal possible ligand binding affinity at about 1.5 kcal/mol/heavy atom, plateauing at 15-20 heavy atoms, which applies to both natural ligands and synthetic enzyme inhibitors (Kuntz et al., 1999). The mechanism for limiting binding affinity may be either that the ligand is binding in a relatively high-energy conformation, or that not all available atomic structural features are used for binding (Andrews et al., 1984; Reynolds et al., 2008). The evolutionary reason for overall limits to binding affinity is probably a function of the disadvantages of long dissociation times (Kuntz et al., 1999; Smith et al., 2012).
Although there is no thermodynamic reason why any of these terms necessarily should balance one another out, the ranges they cover are illustrative of the energy potentially “available” for compensating ligand internal strain. Based on these data, a range for conformational energies up to +25 kcal/mol may thus not be unreasonable. At the very least it seems unproductive to set a low theoretical limit for the allowable range of ligand conformational energy changes upon protein binding.
IV. Energy range discrepancies
In this section, we discuss some possible sources for the divergence in findings and views described in the Introduction as to the size and range of conformational energy changes in ligands on binding to proteins. These are: differences in energy minimization methodology and how raw data from the PDB is handled; the level of theory used for the calculations; and differences in the treatment of electrostatics and solvation. We have collected and combined a set of the literature results discussed above in an attempt to perform, as far as possible, a meta-analysis of these methodological factors that might potentially affect the conformational energy ranges observed in protein-ligand crystal structures. The full set of data plotted in the graphs below is given in the Supplementary Information.
We focus in this section on results describing local conformational energies calculated (as in Equation 1) by comparing the difference in energy between a restrained minimization of the crystal conformation and a full unrestrained minimization to the nearest local minimum. The reason for this is that the potential energy surface for a simple molecule can be systematically calculated (as with NH2OH in Figure 1) but for larger, more flexible molecules, e.g. with more than about 8-9 rotatable bonds, this becomes computationally intractable and the energy surface must be mapped stochastically. This means that it is difficult to be absolutely certain that any conformational search has converged, with all energy minima located and the global minimum identified, although with enough sampling the likelihood of convergence increases (Chen and Foloppe, 2013). The nearest local minimum, however, can always be definitively located with an energy minimization.
A. Differences in methodology for handling PDB coordinates
It is widely appreciated that ligand coordinates from the PDB should not be used directly for energy calculations because of small but significant differences in the parameters for bond lengths, angles, and torsional minima between different forcefields. (This applies generally to any change in computational protocol.) The majority of PDB structures have been refined with restraints based on simplified molecular mechanics forcefields, lacking explicit hydrogen atoms, and often with attractive non-bonded interactions turned off (Evans, 2007). This minimal functional form is combined with fitting to the electron density (see Section V for further discussion of this issue).
There are three ways of estimating the conformational energies of ligands in their crystal conformation. In the first, the coordinates are “relaxed into” a new forcefield by a restrained minimization. In the second, a conformational search is performed in the new forcefield and the conformer whose structure is most similar to the crystal coordinates is chosen as the bioactive conformation. (Note that in both of these cases the conformers are generated or the ligand is minimized outside the binding site, which removes its interactions with the protein.) The third way is by crystal structure re-refinement to improve ligand geometries or energies, in which case a fit to the experimental electron density is used as a restraint. The goal in all cases is to produce a ligand conformation compatible with both the forcefield or functional used for energy calculation and the experimental electron density, which is not necessarily a straightforward task.
Methods for performing restrained energy minimizations include harmonic or flat-bottomed positional restraints on atomic positions, holding torsions fixed but allowing bonds and angles to relax, and fitting or re-fitting to the shape of the electron density. We investigated the effect of these different minimization protocols on the size and range of the calculated ligand conformational energies.
The given or implicit rationale for the use of flat-bottomed restraints, where atoms are free to move within a certain radius of the crystal coordinates and beyond that distance an energy penalty is applied, is that there is some uncertainty in the crystallographic positions of the atoms and allowing small adjustments of bond lengths in particular can remove most spurious strain. These restraints were used in two studies. In Boström et al. (1998), the radius of the flat-bottom restraint was 0.3 Å with a penalty force constant of 120 kcal/mol/Å². In a later study by Perola and Charifson (2004) the flat-bottom radius was slightly larger, at 0.5 Å, and the force constant was 500 kcal/mol/Å².
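A flat-bottomed positional restraint of this kind can be sketched as a penalty that is zero inside the radius and grows beyond it. The quadratic form of the penalty, the absence of a factor of one half, and the function name are assumptions; the cited studies may define the penalty differently. The radii and force constants are those quoted above, taken here as kcal/mol/Å².

```python
import math

def flat_bottom_energy(position, reference, radius, force_constant):
    """Restraint energy (kcal/mol) for one atom: zero within `radius`
    (Angstrom) of the reference position, quadratic beyond it
    (assumed functional form)."""
    displacement = math.dist(position, reference)
    excess = max(0.0, displacement - radius)
    return force_constant * excess ** 2

crystal = (0.0, 0.0, 0.0)
moved = (0.5, 0.0, 0.0)  # atom displaced by 0.5 Angstrom

# Parameters quoted for Bostrom et al. (1998): 0.3 A radius, 120 force constant
e_bostrom = flat_bottom_energy(moved, crystal, radius=0.3, force_constant=120.0)
# Parameters quoted for Perola and Charifson (2004): 0.5 A radius, 500
e_perola = flat_bottom_energy(moved, crystal, radius=0.5, force_constant=500.0)
```

In this example the same 0.5 Å displacement is penalized under the narrower well but is free under the wider one, which illustrates how the choice of radius controls how much relaxation is permitted.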
In contrast to this approach, studies by Nicklaus et al. (1995) and Sitzmann et al. (2012) have used a stepwise optimization of the internal coordinates, with minimization of bond lengths, followed by angles, followed by torsions, and calculation of the conformational energy as the difference between the last two steps. These authors argue that the torsion angles are the main carriers of conformational information and that isotropic flat-bottom potentials are too lenient and may allow for the erroneous minimization and removal of real torsional changes that are occurring upon ligand binding. However, Boström et al. (1998) tested the application of torsion restraints and found that small individual torsion errors summed over the whole molecule could lead to what they believed were unnaturally high conformational energies, resulting in large (Cartesian) conformational changes upon release, whereas flat-bottomed restraints produced optimized structures that should still be able to fit within the observed electron density.
In a rebuttal to both of these earlier methods, Butler et al. (2009) have criticized flat-bottom constraints due to the essentially arbitrary selection of the parameters for well-width and force constant, and suggested that these constraints may be too permissive. They have also argued that minimizing in internal coordinates is not the best approach for crystal structures that are solved and reported in Cartesian space, and may not allow for sufficient relaxation of the structure. Instead, their minimization method used harmonic restraints where the force constant for each atom was scaled in inverse proportion to its B-factor.
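The idea of scaling a harmonic restraint in inverse proportion to the atomic B-factor can be sketched as below. The exact functional form used by Butler et al. (2009) is not reproduced here; the reference force constant `k_ref`, the reference B-factor `b_ref`, and the function name are hypothetical values introduced only to show the scaling.

```python
def bfactor_scaled_restraint_energy(displacements, b_factors,
                                    k_ref=10.0, b_ref=20.0):
    """Sum of harmonic restraint energies (kcal/mol) in which each atom's
    force constant is k_ref * b_ref / B, so that atoms with high B-factors
    (less well-defined positions) are held less tightly.
    k_ref (kcal/mol/A^2) and b_ref (A^2) are assumed values."""
    total = 0.0
    for d, b in zip(displacements, b_factors):
        k = k_ref * b_ref / b
        total += k * d ** 2
    return total

# Two atoms moved by 0.1 Angstrom; the second has twice the B-factor
# and therefore half the restraint force constant.
e_restraint = bfactor_scaled_restraint_energy([0.1, 0.1], [20.0, 40.0])
```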
In the case of QM/MM re-refinement, a measure of how well the atomic coordinates reproduce the experimental electron density is added to the forcefield (or semi-empirical functional) energy during minimization (Fu et al., 2011, 2012, 2013; Borbulevych et al., 2014). A variation of this is used in the AFITT method (Wlodek et al., 2006; Janowski et al., 2016), where a shape-based fit to the ligand’s electron density “blob” is added to the forcefield energy. The weighting of the forcefield or functional energy versus the fit to the electron density is also an arbitrarily selected parameter, and we discuss this issue further in Section V.A.1.
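The combined target described in this paragraph can be written schematically as a weighted sum. The symbols below are assumptions introduced for illustration and do not reproduce the notation of any specific refinement program: $E_{\text{model}}$ is the forcefield or semi-empirical energy of coordinates $\mathbf{x}$, $E_{\text{density}}$ is the term measuring the misfit to the experimental electron density, and $w$ is the relative weight.

```latex
T(\mathbf{x}) = E_{\text{model}}(\mathbf{x}) + w \, E_{\text{density}}(\mathbf{x})
```

A large $w$ pulls the ligand toward the density at the expense of its internal energy, while a small $w$ favors low-energy geometry at the expense of the experimental fit.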
We combined and averaged the reported ligand conformational energies from studies using each of these restraint types. These are graphed in Figure 2, along with the highest (maximum) energy seen with each method. Although there is little overlap in the protein-ligand structures sampled between different sets, the averaged conformational energies suggest that flat-bottomed restraints tend to produce lower energies than harmonic restraints, which in turn are lower in energy than fixed torsions. This is in line with the stringency of the restraints, that is, how far any atom can move before an energy penalty is applied. However, structures with high energies of over 20 kcal/mol are seen regardless of the methodology used.
Figure 2.

Values for the difference in conformational energy between the bound ligand conformation and the nearest local energy minimum, compared between literature data sets generated with different restrained minimization protocols: flat-bottomed restraints (Boström et al., 1998; Perola and Charifson, 2004), harmonic restraints (Butler et al., 2009), internal coordinate optimization with fixed torsions (Nicklaus et al., 1995; Sitzmann et al., 2012), and fitting to the electron density (Wlodek et al., 2006; Fu et al., 2011, 2012, 2013; Borbulevych et al., 2014). ‘Average’ refers to the average across all data sets for a given protocol and ‘max’ refers to the maximum value seen in any data set. Error bars indicate the standard deviation between data sets.
B. Theory level and accuracy of energy calculations
Several studies have argued that molecular mechanics forcefields may not be accurate enough to correctly estimate relative conformational energies (Tirado-Rives and Jorgensen, 2006; Foloppe and Chen, 2009; Avgy-David and Senderowitz, 2015). Avgy-David and Senderowitz (2015) found a lack of correlation between conformational energies calculated on the same structures with different forcefields, and concluded that QM-level methods are required for reliable energies. Analyses of conformer generation with the MMFF forcefield have tentatively concluded that this forcefield may be overestimating the strain in some PDB structures, or that it “seems to favor a different part of conformational space than that occupied by X-ray structures” (Sadowski and Boström, 2006). Similarly, Hawkins and Nicholls (2012) have questioned the adequacy with which MMFF may be describing torsional energies in crystal structures.
Both molecular mechanics forcefields and semi-empirical functionals are parameterized, as discussed in Section II, but the torsion energy profiles in forcefields are parameterized to fit rotational profiles calculated at the quantum mechanical level, and energies are calculated relative to an idealized unstrained molecular conformation. In contrast, semi-empirical parameters are fitted to reproduce heats of formation and other thermochemistry data, and energies are calculated relative to the elements in their standard states. While semi-empirical methods are more accurate than forcefields in many cases, they can be less accurate for calculating conformational energies (Seabra et al., 2009; Jiang et al., 2010). Improvement in this regard is seen with PM7 where this issue was specifically addressed in its development (Stewart, 2013; Hostas et al., 2013). Forcefields may even in some cases outperform density functional theory (DFT), especially when dispersion effects are not taken into account (Paton and Goodman, 2009), perhaps because density functionals are also often designed and optimized to reproduce thermochemistry and non-covalent interaction data, instead of conformational energies (Risthaus et al., 2014). Thus we believe the jury is still out on what level of theory will produce the most accurate conformational energies (and to make the situation more complicated, this may depend on the functional groups present in the ligand and how well those have been parameterized).
To shed some quantitative light on this issue, we investigated the extent to which the level of theory used for energy minimizations affects the range of the calculated conformational energies. Here, we collected data sets minimized using molecular mechanics forcefields: CHARMm (Nicklaus et al., 1995), MM3 and AMBER (Boström et al., 1998), MMFF (Perola and Charifson, 2004; Butler et al., 2009) and OPLS (Perola and Charifson, 2004). These were compared to energies from a regional QM re-refinement study using the semi-empirical functional AM1 (Borbulevych et al., 2014), as well as two studies using DFT: the structures from Butler et al. (2009) which were also optimized at the B3LYP/6-31G* level, and a higher-quality subset from Sitzmann et al. (2012), produced by filtering based on crystallographic quality parameters, using geometries optimized with B3LYP/6-31G* and single-point energies calculated at the B3LYP/6-311++G(3df,2p) level. Finally, we included a series of QM/MM re-refinement studies on three single ligands at the MP2 level with HF/6-31G* geometries (Fu et al., 2011, 2012, 2013).
These data sets for each level of theory were again combined and averaged. In Figure 3 we see that on average the ligand conformational energies calculated with molecular mechanics forcefields are in fact the lowest, though close to MP2, and highest of all are energies calculated with the semi-empirical functional. Thus while forcefield conformational energies may be inaccurate, they do not appear to be systematically higher than the energies obtained by other methods, based on the data sets examined here.
Figure 3.

Values for the difference in conformational energy between the bound ligand conformation and the nearest local energy minimum, compared between literature data sets calculated at different levels of theory: molecular mechanics forcefields (Nicklaus et al., 1995; Boström et al., 1998; Perola and Charifson, 2004; Butler et al., 2009), the semi-empirical AM1 functional (Borbulevych et al., 2014), DFT (B3LYP) (Butler et al., 2009; Sitzmann et al., 2012), and MP2 (Fu et al., 2011, 2012, 2013). ‘Average’ refers to the average across all data sets for a given protocol and ‘max’ refers to the maximum value seen in any data set. Error bars indicate the standard deviation between data sets.
C. Inclusion or not of electrostatics and solvation energy
In addition to the level of theory and the methodology used for energy minimizations, computational methods for dealing with electrostatics and solvation will also significantly affect conformational energies. Electrostatic forces are stronger than other forcefield components especially at short ranges. In the context of the surrounding environment, partial charges on ligand atoms are shielded from one another by either solvent water or the surrounding atoms in the protein binding site. When the energy of the ligand is evaluated in isolation, these electrostatic forces can be very high. The effects of environmental shielding can be modeled to some extent by various “implicit solvent” methods in which a local dielectric constant is taken into account.
Here, we compare data sets of conformational energies calculated in vacuum to those calculated in some form of implicit solvent (a polarizable continuum model, generalized Born, or a simple distance-dependent dielectric) and those calculated in fully explicit water solvent. The vacuum data sets are from the early studies by Nicklaus et al. (1995), who neutralized all charged groups as an ad hoc method for reducing some of the artificially high electrostatic energy that can occur in vacuum, and by Boström et al. (1998), who did not neutralize their compounds. The latter authors argue that for structures in their data set where the conformational energy is higher than 3 kcal/mol, it is because forcefield energies in vacuum are inaccurate for strongly polar compounds. The study by Perola and Charifson (2004) used a distance-dependent dielectric throughout. Other studies explicitly compared results in vacuum with those calculated in some form of implicit solvent: Butler et al. (2009) calculated energies in vacuum, with a distance-dependent dielectric, and with a generalized Born model, and found, interestingly, that while the distance-dependent dielectric reduced conformational energies relative to vacuum, generalized Born treatment increased them. Sitzmann et al. (2012) calculated energies in the gas phase and with the IEF-PCM solvent model and found slightly lower energies with solvent on average. Finally, a recent study by Foloppe and Chen (2016) used molecular dynamics simulations to calculate differences in conformational energy between the ensemble structures of a small set of drug-like compounds bound to their receptors and unbound in explicit water solvent.
We combined and averaged the ligand conformational energies from each type of solvent treatment. In Figure 4, we can observe that the averaged energies in vacuum, implicit solvent and in explicit water solvent are all very similar, and that the highest maximum energy found is actually in implicit solvent. However, very low maximum energies (< 10 kcal/mol) are only seen in cases of what we have labeled as “high screening” in Figure 4. These methods involving more extensive solvent screening are discussed in detail below.
Figure 4.

Values for the difference in conformational energy between the bound ligand conformation and the nearest local energy minimum, compared between literature data sets calculated in vacuum (Nicklaus et al., 1995; Boström et al., 1998; Wlodek et al., 2006; Butler et al., 2009; Sitzmann et al., 2012) or with different solvent modeling methods. Implicit solvent data sets are from (Perola and Charifson, 2004; Butler et al., 2009; Sitzmann et al., 2012), the explicit solvent data set is from (Foloppe and Chen, 2016), and high screening data sets (see text) are from (Vieth et al., 1998; Wlodek et al., 2006; Wang and Pang, 2007; Butler et al., 2009). ‘Average’ refers to the average across all data sets for a given protocol and ‘max’ refers to the maximum value seen in any data set. Error bars indicate the standard deviation between data sets.
In one set of cases, electrostatics are attenuated or removed completely from the forcefield. The study by Wlodek et al. (2006) compared the effects of including or removing the electrostatic term in the MMFF forcefield while testing their AFITT ligand re-fitting method on 11 PDB complexes. The authors found that removing electrostatics improved the fitting results and significantly lowered the conformational energies, and they argue that this is reasonable as interactions with the protein are not accounted for, which could compensate for unfavorable intra-ligand electrostatic interactions.
Turning off electrostatics is not without precedent, as many software programs for generating bioactive conformations use no electrostatics. OMEGA by default uses a version of MMFF94 from which both electrostatic and attractive van der Waals terms are removed (Hawkins et al., 2010); similarly, the version of CHARMm used in Catalyst also lacks electrostatics (Li et al., 2007), as do many others (Klebe and Mietzner, 1994; McMartin and Bohacek, 1997; Sperandio et al., 2009). But, as we have shown above in Section II, this has the effect of changing torsional energy profiles and thus the relative energies of conformers. Values calculated in this way cannot be considered true energies in the physical sense (within the limits of the MM computational paradigm). They are instead an “energy-like” function of the molecular geometry, useful for further manipulation of this geometry, which cannot be quantitatively compared with fully defined forcefield energies that include electrostatics.
A similar effect of severely dampening electrostatic interactions is achieved by setting the local dielectric constant to 80, as was done in the study by Wang and Pang (2007). This is the correct value for bulk water, but not for local interactions in a forcefield designed for condensed phase simulations. While the dielectric in molecular mechanics calculations is sometimes modified to be on the order of 2-4, forcefield electrostatic parameters (i.e. the atomic partial charges) are calibrated for a constant vacuum dielectric of 1 (Cornell et al., 1995; MacKerell et al., 1998).
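To make the scale of these dielectric choices concrete, the following minimal sketch evaluates a single pairwise Coulomb term under three treatments: vacuum, a distance-dependent dielectric, and a constant bulk-water dielectric. The function name, the `scale` parameter, and the choice of a dielectric equal to 4r are illustrative assumptions, not the settings of any study cited here.

```python
# Illustrative sketch only: names and defaults are hypothetical.
COULOMB_KCAL = 332.06  # conversion factor, kcal*Angstrom/(mol*e^2)


def coulomb_energy(q1, q2, r, dielectric="vacuum", scale=4.0):
    """Pairwise Coulomb energy in kcal/mol (charges in e, distance r in Angstrom)."""
    if dielectric == "vacuum":
        eps = 1.0
    elif dielectric == "distance":
        eps = scale * r  # distance-dependent dielectric, eps = scale * r
    elif dielectric == "bulk_water":
        eps = 80.0
    else:
        raise ValueError(f"unknown dielectric treatment: {dielectric}")
    return COULOMB_KCAL * q1 * q2 / (eps * r)


if __name__ == "__main__":
    # A hypothetical intramolecular ion pair 3 Angstrom apart.
    for mode in ("vacuum", "distance", "bulk_water"):
        print(mode, round(coulomb_energy(1.0, -1.0, 3.0, dielectric=mode), 2))
```

With a dielectric of 80 the same interaction is attenuated 80-fold relative to vacuum, which is why such protocols behave much like removing the electrostatic term altogether.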
In the second set of cases, a solvation energy is added to the internal energy to approximate some larger portion of the enthalpy or free energy. In the study by Vieth et al. (1998), a set of ten small ligand structures was compared in solution and in receptor-bound crystal structures. Solution conformations were generated by conformational searching and clustering, and a “free energy” was calculated for each cluster by summing the internal forcefield energy of a representative structure, the Poisson-Boltzmann solvation energy of a representative structure, and the cluster population (as an approximation of entropy). The conformational energy was calculated as the energy difference between the lowest-energy solution cluster and the solution cluster most similar in torsion space to the bound ligand conformation. The study by Butler et al. (2009) examined many simulation protocols but achieved uniformly low conformational energies across their data set only by calculating the solvation free energy with polarizable continuum models and adding this to the internal DFT energy.
Minimized geometries should almost certainly be generated in solvent (see Section V.B below), but adding the solvation energy to the internal energy produces a result that cannot then be considered an actual conformational energy. This is especially true for receptor-bound conformations, which are not fully solvated in water, and whose environmental energy should be calculated from specific interactions with the binding site. There is evidence that conformational energies and solvation energies may in some cases cancel out or compensate for one another, where a molecule in solution adopts a strained conformation in order to take advantage of its higher solvation energies (Nicholls et al., 2009).
A molecular mechanics forcefield is a model describing the potential energy of a molecule as a function of conformation – its internal bond lengths, angles, and torsions, and the distances between atoms that are not directly bonded. Each of these geometric terms is described by a relatively simple mathematical formulation and scaled with a set of parameters in order to reproduce experimental data and quantum mechanically-calculated energies. Although the terms in a forcefield are to some extent arbitrary and empirical, as discussed above in Section II, it is designed for calculating molecular energies using all of its terms and only those terms. Leaving out certain terms or adding in new ones can only lead to inaccuracies in the calculated energies.
V. Sources of error
In the previous section we have observed that large conformational energies are seen in some protein-ligand complexes regardless of the details of the minimization protocol, the level of computational theory, or the solvent treatment used to calculate them. We can assume, however, that at least some, and possibly the majority, of this observed ligand conformational energy is artefactual. The reaction of the crystallography community to papers detailing the presence of highly strained ligand conformations in the PDB seems mainly to be embarrassment (Reynolds, 2014), and guidance has been issued to modelers on best practices for using and understanding crystal structure data (Wlodawer et al., 2008; Davis et al., 2008; Cooper et al., 2011; Deller and Rupp, 2015; Lamb et al., 2015), along with the development of tools for ligand validation and error checking (Kleywegt and Harris, 2007; Cereto-Massagué et al., 2013; Weichenberger et al., 2013). This implies that many crystallographers favor the argument that high ligand conformational energies are spurious, based on the Bayesian idea that extraordinary claims require extraordinary evidence. In other words, according to the Boltzmann distribution function, the probability of observing a conformation falls off exponentially with its energy above the minimum, so that large energy differences are a priori improbable (Hao et al., 2007). This becomes something of a self-fulfilling prophecy as medium-resolution crystal structures do not contain enough data or information to provide unambiguous evidence for significant deviation from standard geometries for small molecule ligands (Pozharski et al., 2013).
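The Boltzmann argument can be made quantitative with a short sketch. The code below computes the population of a conformation relative to the energy minimum for a given energy difference; the function name and the default temperature of 298.15 K are assumptions chosen for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_ratio(delta_e, temperature=298.15):
    """Population of a state delta_e (kcal/mol) above the minimum, relative to the minimum."""
    return math.exp(-delta_e / (R_KCAL * temperature))


if __name__ == "__main__":
    # Hypothetical strain energies spanning the range discussed in the text.
    for delta_e in (0.0, 1.0, 3.0, 5.0, 10.0):
        print(f"{delta_e:5.1f} kcal/mol -> relative population {boltzmann_ratio(delta_e):.2e}")
```

This prior applies to the unbound ligand in isolation; it says nothing about strain that is compensated by interactions with the binding site, which is why it cannot by itself decide whether a given high-energy bound conformation is real.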
Sources of errors in ligand conformational energies can be divided into two categories, based on the terms in Equation 1. These are errors in the structure of the bound ligand in the crystal, and errors in its comparison structure: the unbound global or local energy minimum in solution.
A. The bound ligand in a protein-ligand crystal structure
In refining a protein crystal structure, a molecular model is threaded through an initial map of the electron density. The model is refined iteratively by varying parameters to optimize the agreement between the atomic coordinates and the experimental data, measured as the crystallographic R-factor (Tronrud, 2007). Model overfitting is addressed by the Rfree factor (Brünger, 1992), in which a subset of the data is reserved for testing against the model after it has been fitted to the rest of the data. As the resolution of collected data sets improves, for example with better-diffracting crystals, the number of experimental data points on which a model can be based increases steeply (roughly as the inverse cube of the resolution). However, at a medium resolution of ~2Å, a range in which one finds a large percentage of the structures in the PDB, the data-to-parameter ratio is not sufficient to refine the molecular model without risking overfitting (Kleywegt, 2007), so additional data is introduced into the model in the form of prior knowledge about the expected geometries of chemical structures.
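As a concrete illustration of these two quantities, the sketch below computes the conventional R-factor, the sum of the absolute differences between observed and calculated structure factor amplitudes divided by the sum of the observed amplitudes, and sets aside a random test set for Rfree. It assumes that the two sets of amplitudes are already on a common scale, and the 5% test fraction and function names are illustrative choices rather than the behavior of any particular refinement program.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), assuming a common scale."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Return (work, free) index lists; the free set is never used in refinement."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, int(round(free_fraction * n_reflections)))
    return sorted(indices[n_free:]), sorted(indices[:n_free])


def r_work_and_free(f_obs, f_calc, work, free):
    """Evaluate the R-factor separately on the working and the reserved reflections."""
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free
```

A model that has been overfitted to the working reflections typically shows an Rfree noticeably higher than its working R-factor, since the reserved reflections were never used to adjust the parameters.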
1. Restraints
This prior knowledge takes the form of restraints or penalty functions added to the refinement optimization. Restraints are applied to bond lengths, bond angles, planar groups, chiral centers, and non-bonded interactions and usually take the form of a harmonic oscillator function with a target value and a weight (Evans, 2007). Restraints for amino acids and nucleotides have been carefully tabulated (Engh and Huber, 1991) and are included as “dictionaries” in refinement programs. A problem arises when a set of reasonable restraints needs to be defined for a new small molecule ligand. If this is done incorrectly then the geometry of the ligand will be adversely affected (Kleywegt, 2007). Additionally, restraints do not by any means attempt to accurately capture the full energy landscape of the ligand and its surroundings. They are a simplified version of a molecular mechanics forcefield, frequently limited by the use of united atoms (because hydrogens cannot be seen except at very high resolution) and with no explicit inclusion of an electrostatic component. This simplification can lead to structural problems in the protein binding site, especially in cases involving ionic interactions (Davis et al., 2008).
Some of these issues related to restraint inadequacies can be addressed with quantum mechanically-derived starting structures for ligands and/or with QM/MM refinement or re-refinement (Ryde and Nilsson, 2003; Yu et al., 2005; Moriarty et al., 2009; Metz et al., 2014; Borbulevych et al., 2014; Fadel et al., 2015). The use of QM methods offers a definite improvement over incorrect ligand restraints, and probably also over correct restraints with the minimal forcefield used in traditional refinement. Concerns about their overall accuracy remain, however, since intermolecular interactions have not historically been a strength of lower-level ab initio methods, though this situation may be improving (Sedlak et al., 2013).
Another more subtle and more serious problem has to do with the weighting of the restraints relative to the experimental data. The basic equation optimized in refinement is
$$E_{\mathrm{total}} = E_{\mathrm{Xray}} + w\,E_{C} \qquad (3)$$
where EC is the computational energy (MM or QM) of the forcefield restraints and w establishes its weight in the model relative to the experimental diffraction data (Jack and Levitt, 1978). The relative weighting of the restraint parameters is difficult or impossible to connect mathematically to the size of any errors in the molecular coordinates. It has been shown that the deviations from ideal bond lengths observed in structures at both medium and high resolution are in fact correlated with the software program used for refinement and its default weighting of the restraints (Jaskolski et al., 2007). This echoes earlier work by Thornton, also showing the correlation of protein backbone geometries with refinement method (Laskowski et al., 1993; MacArthur and Thornton, 1996). Restraints that are too tight or too loose can be identified over a population of structures by looking at the size of RMS deviations from the average, but not a priori for a given single structure of interest.
The problem is illustrated in Figure 2 with the high maximum energies seen with electron density fitting compared to atomic coordinate restraints and in Figure 3 with the wide difference in scale between the energies in the semi-empirical vs. the MP2//HF methods for crystal structure re-refinement. It seems unlikely that this is due to the level of theory used – AM1 for the ligand and binding site and standard stereochemistry restraints for the rest of the structure (Borbulevych et al., 2014), as opposed to HF/6-31G* for the ligand and binding site with the AMBER forcefield for the rest of the protein (Fu et al., 2011).
The MP2//HF studies by Fu et al. use values for w ranging from 0.3 to 1.0 (Fu et al., 2012). The weights used in DivCon (Borbulevych et al., 2014) are not reported, but the default in phenix.refine (Afonine et al., 2012) is that the weighting of the X-ray data is determined by an automatic procedure in order to obtain the best Rfree for the structure as a whole (Adams et al., 1997). In the re-refinement ligand placement procedure of AFITT, the MMFF94 conformational energy of the ligand correlates nearly linearly with the weighting of its fit into the shape of the electron density. The best value for the weight is determined heuristically for each ligand with a series of adiabatic optimizations (Wlodek et al., 2006; Janowski et al., 2016). Regardless of the value of the weighting factor, there is a tradeoff between low ligand energies with a worse fit to the X-ray data, and higher ligand energies with a good fit to the data, as measured by the R-factor in the form of Rfree or RSR for the ligand (Ryde et al., 2002; Li et al., 2012) (see below for more details on these metrics).
Thus which end of the ligand energy spectrum is chosen for a final structure model may ultimately depend on whether the researcher favors a small value for Rfree or a small value for ligand conformational energy. It is possible to produce a structure with implausibly high energies by weighting the restraints too loosely and forcing too tight a fit to the experimental data, and it is also possible to find a low-energy structure if the restraints are chemically correct and are weighted strongly enough. The implication is that some ligand structures in the PDB might actually be too low in conformational energy due to the use of default restraint data in the absence of experimental data with enough resolution.
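The tradeoff described in this section can be reproduced with a deliberately simple one-dimensional toy model. The sketch assumes a refinement target consisting of a data term plus w times a restraint energy, with both terms harmonic; a single coordinate stands in for the whole ligand geometry, and all names, force constants, and numerical values are hypothetical.

```python
def refined_position(x_data, x_ideal, w, k_data=1.0, k_restraint=1.0):
    """Analytic minimum of k_data*(x - x_data)**2 + w*k_restraint*(x - x_ideal)**2."""
    return (k_data * x_data + w * k_restraint * x_ideal) / (k_data + w * k_restraint)


def restraint_energy(x, x_ideal, k_restraint=1.0):
    """Harmonic penalty for deviating from the ideal (restraint target) value."""
    return k_restraint * (x - x_ideal) ** 2


def data_misfit(x, x_data, k_data=1.0):
    """Harmonic penalty for deviating from the value favored by the data."""
    return k_data * (x - x_data) ** 2


def scan_weights(x_data, x_ideal, weights):
    """Return (w, refined x, restraint energy, data misfit) for each weight."""
    rows = []
    for w in weights:
        x = refined_position(x_data, x_ideal, w)
        rows.append((w, x, restraint_energy(x, x_ideal), data_misfit(x, x_data)))
    return rows


if __name__ == "__main__":
    # Hypothetical case: the density favors 1.60, the restraint dictionary 1.52.
    for w, x, e_c, misfit in scan_weights(1.60, 1.52, [0.0, 0.3, 1.0, 3.0]):
        print(f"w={w:3.1f}  x={x:.4f}  restraint energy={e_c:.5f}  misfit={misfit:.5f}")
```

In this toy model every value of w yields a fully converged answer, and the refined coordinate slides continuously from the data-favored value toward the restraint target as w increases. Nothing in the optimization itself indicates which point along that path is correct.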
2. Conformational sampling
The main reason for the tension between the R-factor as a measure of overall model fit to the electron density and the strain energy of the bound ligand is probably that experimental data is in reality an ensemble of conformations. What is observed in the crystal is an average in both time and space: different individual protein molecules may be binding ligands in different conformations or may remain unbound, and in addition the bound ligand vibrates with thermal energy and may change conformation over time.
Fitting a single ligand torsion angle to experimental data that, for example, arise from two populated low-energy conformations can have the paradoxical effect of producing an averaged value for the torsion angle that is located at or near an energy maximum between the two local minima. Methods for deconvoluting this type of conformational disorder from the electron density by fitting structural ensembles with partial occupancies are under development; for a recent review on this subject, see Woldeyes et al. (2014). However, none of these methods is in widespread use, and no consensus has yet been reached on the best approach for ensemble refinement or for clearly communicating the results (multiple structures, each with fractional occupancies, positional uncertainties, and thermal parameters). The need for progress in this area is increasingly being recognized by the crystallography community (Adams et al., 2016).
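The averaging artifact is easy to demonstrate with a generic threefold torsion potential. In the sketch below, the functional form, the 3 kcal/mol barrier, and the two equally occupied conformers at 60° and 180° are assumptions chosen for illustration and do not correspond to any specific ligand.

```python
import math


def torsion_energy(phi_deg, barrier=3.0):
    """Threefold cosine torsion term (kcal/mol) with minima at 60, 180 and 300 degrees."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3.0 * phi_deg)))


def occupancy_weighted_torsion(angles_deg, occupancies):
    """Naive occupancy-weighted mean angle, as a single-conformer fit might produce."""
    total = sum(occupancies)
    return sum(a * o for a, o in zip(angles_deg, occupancies)) / total


if __name__ == "__main__":
    conformers = [60.0, 180.0]   # two populated minima (hypothetical)
    occupancies = [0.5, 0.5]
    fitted = occupancy_weighted_torsion(conformers, occupancies)
    print("fitted torsion:", fitted)
    print("energy at each true conformer:", [torsion_energy(a) for a in conformers])
    print("energy at fitted torsion:", torsion_energy(fitted))
```

Both true conformers sit at minima of this potential, yet the single fitted angle lands on the barrier top between them, so the model reports the full barrier height as apparent strain.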
3. Resolution
In the last ten years, advances in X-ray sources and detectors have led to many more atomic (< 1.2Å) and sub-atomic (< 0.85Å) resolution structures in the PDB. At atomic resolution, individual atoms can be resolved and often stereochemical restraints can be applied with lower weights. Here there are enough experimental observables to allow anisotropic instead of spherical B-factors, such that the vibrational motion of each atom is described by six parameters (Wlodawer et al., 2008). At sub-atomic resolution, the number of multiple occupancies at the level of single atoms dramatically increases, allowing occupancy to be decoupled from the B-factors (Tronrud, 2007). It is therefore only at this resolution that atomic positions and inter-atomic distances are believed to be accurate enough to validate significant deviations away from standard stereochemical geometries (Petrova and Podjarny, 2004; Wlodawer et al., 2008; Davis et al., 2008). Well-ordered parts of the model can then also be refined without any stereochemical restraints, alleviating the issue discussed above of how to weight the restraints relative to the fit to the electron density. However, structures at atomic and sub-atomic resolution are not uniformly well-resolved, and contain regions of conformational flexibility and disorder. Jaskolski et al. (2007) have pointed out the need for improved refinement methods, to allow restraint weights to vary across different regions of the structure, and to estimate how strongly to weight each region of the structure based on the amount of disorder present.
At ultra-high resolutions, deformations in the electron density can be seen, as there are up to ~80 observables per heavy atom (Evans, 2007). In this regime, additional molecular information on connectivity and bond orders can be extracted from the electron density by fitting to a model with spherical scatterers between the atoms, located at the high points of the bonding electron density (Afonine et al., 2007). From this, details of the chemical interactions between protein and ligand (as opposed to simply their geometric conformations) may be inferred, including protonation states and even individual atomic reactivities from orbital occupancies. This can allow a determination of the extent to which any high-energy torsions may be relevant to biological function (Cachau and Podjarny, 2005). The acquisition of data at sub-atomic resolution remains an art form requiring exceptionally high-quality crystals. There is however an expectation that this resolution bin within the PDB will grow faster in the coming years, mostly because of the increasing ability to collect sub-atomic data from very small crystals (Chapman et al., 2011).
B. The unbound ligand structure in solution
1. Conformational sampling
Ideally, the bound conformation should be compared to the full ensemble of conformations found in solution, weighted by occupancy according to their relative energies. NMR experimental methods have only recently been developed for accurately observing the Boltzmann distribution of unbound ligand conformations in solution (Blundell et al., 2013). If this kind of experimental data can be compiled for a much wider collection of compounds it would provide a much-needed counterpart to the CSD, which is currently the major source of data for molecular geometry analysis (Bruno et al., 2004). Computational methods for generating and analyzing conformational ensembles in solution are also under development (Forti et al., 2012; Juárez-Jiménez et al., 2015) but accuracy requires extensive and fairly high level calculations. In a rigorous molecular dynamics study of 26 compounds, Foloppe and Chen (2016) collected data for 0.5–1 μs on each compound and found that even this was not enough sampling to achieve full convergence in some cases, where there are high energy barriers between different conformations.
For computational expedience a bound ligand has often been compared to a single lowest-energy conformer in solution (Boström et al., 1998). However, in the absence of experimental data, identifying the correct solution conformation is difficult. One problem, as discussed earlier, is that the enumeration of possible conformers for medium- or larger-sized flexible molecules must be done stochastically. But even if a conformational search to locate the global minimum is replaced by a simple energy minimization to calculate the local conformational energy, a second problem is that, especially for polar and charged molecules, intermolecular interactions with solvent are as important as intramolecular energies in determining molecular conformation.
2. Electrostatic effects
The shape and smoothness of the potential energy surface of a molecule depends on its surrounding environment. This is illustrated in a study by Zhu et al. (2012), where the potential energy surfaces for rotation around the χ1 and χ2 torsions of each of the amino acid sidechains was calculated in vacuum. This surface was termed the “intrinsic energy landscape” and compared to the distribution of rotamers found in high resolution crystal structures. For polar and charged sidechains, the deepest energy minima seen in the vacuum calculations (conformations in which strong interactions were made between the sidechains and the backbone), tended not to be observed in protein structures. This is in line with the observed tendency for charged ligands to “collapse” upon energy minimization outside of the binding site if a strong intramolecular interaction can be formed (Nicklaus et al., 1995; Boström et al., 1998; Perola and Charifson, 2004; Butler et al., 2009). These intramolecularly bonded conformations may be seen in solution, but do not dominate, because charged or polar functional groups can form equally strong electrostatic interactions with water molecules.
A computational solution to this problem is the inclusion of explicit water molecules when calculating the solution conformation, but the relatively long-range nature of solvent shielding, and thus the number of waters that need to be included, means that the configurational sampling is prohibitive. Implicit solvent models do not require sampling, but an averaged dielectric continuum lacks the strong directional interactions necessary to accurately model these interactions. Indeed, conformational collapse has been observed to occur with generalized Born solvent models as well (Foloppe and Chen, 2016). If a charged or polar molecule falls into one of these deep energy minima its conformational energy penalty, when compared to its binding conformation, will be spuriously large. This is illustrated in Figure 4 with the much higher maximum energies seen in vacuum and implicit solvent compared to explicit solvent.
Some hybrid implicit/explicit solvation methods have been proposed to address this issue, where a shell or a small set of explicit waters is surrounded by bulk solvent modeled as a continuum (Lee et al., 2004; White and Meirovitch, 2006). For a recent comprehensive review, see Skyner et al. (2015).
VI. The quest for accurate understanding
The amount of macromolecular structure data amassed in the past 40 years is truly amazing and unexpected in its variety and complexity, yet the PDB remains a rather weak source of chemical information, in particular if we compare it with the far more precise CSD. This is partly due to intrinsic limitations: the size of the macromolecular systems, the difficulties in crystallizing macromolecules, etc. Some limitations, however, are the consequence of an emphasis on global measures of structure quality at the expense of the details in functionally important regions of the structure such as the ligand binding site, and an emphasis on novel structures vs. high quality structure evaluations or the development of improved ligand and binding site chemistry descriptors. To date, there has been a greater emphasis on solving previously unknown structures rather than improving the quality of known structures, by e.g. repeating the solution of a structure multiple times at different temperatures as suggested by Gilbert et al. (1983). This trend has probably been driven, at least to some extent, by the pressing need to characterize novel targets for drug design efforts, such that many structures are abandoned when a “reasonable” resolution is obtained (frequently at or just below 2Å resolution) resulting in a dilution of the value of the structures in the PDB for the chemist. Structures at ~2Å resolution can be “presented as an elegant picture” (Wlodawer et al., 2008) or, especially in the industrial context, used as a source of ideas for inhibitor syntheses (Danley, 2006) but do not contain enough accurate information for a quantitative understanding of protein-ligand interactions, including the exact geometry and conformational energy of the ligand.
We have suggested in Section V.A.1 that the source of much of the difference in ligand conformational energy seen with different methods of crystal structure re-refinement may be the value chosen for the scaling factor w in Equation 3, which weights the modeled forcefield energy of the structure relative to its fit to the electron density data. The tightness of the fit and the strain in the ligand seem to be correlated. The relative scaling of these two competing factors in the final model is more-or-less arbitrary (from the point of view of the ligand) if it is chosen to give the best overall model R or Rfree.
R and Rfree are global measures, and do not pinpoint any local areas that are incorrectly modeled. Since a small molecule ligand is 1/100th or 1/1000th the size of a protein, problems with its structure are outweighed by the rest of the model (Pozharski et al., 2013). However, new structure quality measurements in addition to the R-factor have been explored in more recent years. Metrics for analyzing individual fragments of a crystal structure include the RSR (real-space residual) and the RSCC (real-space correlation coefficient), both of which compare the observed and calculated electron densities over grid points on a selected fragment of the structure (Jones et al., 1991). Newer methods for comparing various types of calculated density contour maps as an improvement on the correlation coefficient have been suggested (Urzhumtsev et al., 2014), and new metrics based on the difference density have also been proposed as a measure of the accuracy of the proposed conformation for a given residue in the structure (Tickle, 2012).
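For readers less familiar with these local metrics, the following sketch computes both from two lists of density values sampled at the same grid points around a fragment. It uses the commonly quoted definitions: RSR as the sum of absolute density differences divided by the sum of absolute density sums, and RSCC as the Pearson correlation coefficient. The selection of grid points and the scaling of the maps, which matter a great deal in practice, are assumed to have been handled beforehand, and the function names are illustrative.

```python
import math


def real_space_residual(rho_obs, rho_calc):
    """RSR = sum(|rho_obs - rho_calc|) / sum(|rho_obs + rho_calc|) over grid points."""
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    return numerator / denominator


def real_space_cc(rho_obs, rho_calc):
    """RSCC: Pearson correlation between observed and calculated density values."""
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    return cov / math.sqrt(var_o * var_c)
```

Because the correlation coefficient is insensitive to a uniform rescaling of either map, a fragment can score a perfect RSCC while its RSR is far from zero, which is one reason the two metrics are usually reported together.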
While evaluation of the fit of the ligand to the electron density is an important component of model validation, this does not address the issues of restraint weighting and conformational heterogeneity discussed above. These issues cannot be fixed by more mathematically sophisticated methods for contouring the electron density or for analyzing the fit in local regions of the structure. Moreover, it is entirely possible for a ligand structure to fit well within the visible electron density and still be oriented incorrectly, and these quality measures will not necessarily detect it (Malde and Mark, 2011). While ligand structures can be improved by eliminating frank errors in their chemistry, which are more common than they should be, forcing a low energy model as the final structure is not a solution either. Ligand structure validation metrics ultimately need to be disentangled from this dichotomy.
A number of tools also exist for checking the stereochemical correctness of the ligand structure. This checking, as done by Mogul (Bruno et al., 2004) in the PDB validation suite (Read et al., 2011; Gore et al., 2012), can flag torsions and ring conformations that are outliers relative to the values seen in the Cambridge Structural Database, but it does not consider whether such a deviation from the conformations seen in small molecule crystal structures may in fact be supported by strong interactions with residues in the binding site. Other issues with ligand chemistry, such as the assignment of correct bond orders, the presence of resonance structures, the identification of correct tautomeric and protonation states, and the optimization of the hydrogen bonding network, are also not supported by current tools (Deller and Rupp, 2015). Analyzing the fit and chemical complementarity of the ligand in the binding site must still be done by hand with molecular visualization software (Deller and Rupp, 2015) and requires an intuitive sense of what “looks right,” achievable by experienced modelers and structural biologists but not necessarily by every end user of PDB crystal structures. (There is conversely the danger that an unwary crystallographer may be biased toward placing a ligand into a “good-looking” pose that is not fully supported by the underlying data (Pozharski et al., 2013).)
Many of the above-mentioned methodological limitations are the result of the lack of an abundance of options in the current refinement suites and molecular modeling software packages. A crystallographer does not have, in many cases, a choice of methods for describing the ligand while refining the structure model. A remarkable proof of this is the rapid acceptance of QM methods for X-ray refinement – but only after they became available as part of a crystallography package (PHENIX), whereas QM methods had been used in the biophysics and modeling communities in general for decades. Conversely, the lack of access to structure factor data or experimental restraints in modern modeling tools is equally concerning. It creates an odd degree of sub-specialization in an area of research where such a degree of separation should be unwelcome.
Awareness is growing in the field, albeit slowly, of the need to revisit the many existing PDB protein-ligand co-crystal structures, in efforts such as PDB_REDO (Joosten et al., 2009, 2012), as well as of the need to improve the generation, validation, deposition, and archiving of structural data on protein-ligand complexes. The first wwPDB/CCDC/D3R Ligand Validation Workshop, held at the Research Collaboratory for Structural Bioinformatics at Rutgers University on July 30–31, 2015, clearly voiced this need. The published workshop report contains a general recommendation for future best practices for PDB deposition of co-crystal structures: “[Depositors should] communicate … perceived ambiguities regarding tautomers and protonation states of ligands not determined conclusively from the crystallographic data and chemical environment of the ligand by either (a) using the existing alternative conformation mechanism or (b) providing [unambiguous chemical definitions for ligands present in the crystal mother liquor and in the refined structural model, including hydrogen atoms and covalent modifications]” (Adams et al., 2016). Pushing this further, we propose that a new metric for the chemical evaluation of ligand structure quality should become part of the macromolecular structure refinement and validation process. Such a metric would take into account the balance between the internal geometry of the ligand and its interactions with the binding site and surrounding solvent, while representing the chemistry of the interacting partners as accurately as possible. Its formulation in a statistical way, so that it is separated from the stereochemical restraint weighting but can be compared between structural models, should be further considered and developed. We plan future studies on this issue.
There is also a dearth of accurate data and methods for determining and analyzing unbound ligand structures, in particular for describing the solvated ensemble of the ligand as a comparison point for calculating the conformational energy change upon binding. We note, however, that the concept of a “correct” comparison point for binding energies, while essential for quantifying the conformational energy contribution to the total free energy of binding, is not necessarily a biophysical observable. At what precise point does the binding process start, and from which point should we then determine the reference energy? Is it when the drug enters the body, when it crosses the cell membrane, when it approaches the target protein, or when it is at the entrance of the binding pocket? An arbitrarily defined energetic reference point may ultimately be less useful than a determination of the overall kinetics of the binding process and of the nature of what is most likely a series of energy barriers that must be crossed to form a bound protein-ligand complex. The understanding of how ligand binding kinetics (rather than thermodynamics) affects pharmacology is still in its infancy, but there is a recent surge of interest in the residence time of a binding interaction as a key parameter in drug efficacy (Cusack et al., 2015). The conformational changes that a ligand undergoes, which may also be stabilized in the bound state by conformational changes in the protein, are an essential component of the binding off-rate and residence time (Copeland, 2011).
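How strongly a calculated conformational energy depends on the chosen reference point can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the function names, the example energies, and the use of a simple Boltzmann-weighted average over a discrete set of unbound conformers. It is not a method taken from the studies reviewed here, and it ignores entropy and solvation terms.

```python
import math

# Thermal energy at 298.15 K in kcal/mol (gas constant times temperature).
KT_KCAL_PER_MOL = 0.0019872041 * 298.15


def boltzmann_weights(energies, kt=KT_KCAL_PER_MOL):
    """Return normalized Boltzmann populations for conformer energies (kcal/mol)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def conformational_energy(e_bound, unbound_energies, reference="global_minimum"):
    """Energy of the bound conformer relative to a chosen unbound reference.

    reference="global_minimum":   lowest-energy unbound conformer
    reference="ensemble_average": Boltzmann-weighted mean energy of the ensemble
    """
    if reference == "global_minimum":
        e_ref = min(unbound_energies)
    elif reference == "ensemble_average":
        weights = boltzmann_weights(unbound_energies)
        e_ref = sum(w * e for w, e in zip(weights, unbound_energies))
    else:
        raise ValueError(f"unknown reference: {reference}")
    return e_bound - e_ref


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) for four unbound conformers and one bound conformer.
    unbound = [0.0, 0.5, 1.2, 3.0]
    bound = 4.0
    for ref in ("global_minimum", "ensemble_average"):
        print(ref, round(conformational_energy(bound, unbound, ref), 2))
```

In this toy case the two reference definitions differ by only a fraction of a kcal/mol, but the gap grows as the unbound ensemble becomes broader, which is one reason the choice of reference state needs to be stated explicitly when conformational energies are compared between studies.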
VII. Summary
The range of conformational energies that should be seen or expected for ligands in protein-ligand crystal structures has been a controversial issue in the fields of protein crystallography and molecular modeling for twenty years. We attempt in this review to clarify and reconcile the apparent controversy, while exploring sources of uncertainty or error in the available experimental data and computational methods.
We examine several potential sources of discrepancies between different computational studies. We suggest that differences in minimization or coordinate-handling protocols do not, on average, produce large differences in calculated conformational energies for PDB ligands, and that structures with high conformational energies are seen regardless of the method used. We also suggest that forcefield inaccuracies are not the source of high conformational energies. Rather, the only methods that produce no high-energy conformers are those in which the treatment of electrostatics or solvation causes “over screening” of the conformational energies. If solvation energies are not added to the internal energy, and if electrostatics are not ignored, then ligand conformational energies, as observed in protein-ligand crystal structures, can be quite large.
We also discuss sources of errors, in both the experimentally determined structure of the bound ligand and in the calculated reference point of the unbound structure, that can cause spuriously high conformational energies. We suggest that small molecule conformations, as “solved” in medium-resolution protein-ligand crystal structures, are essentially probabilistic rather than deterministic in nature. This is due to the nature of the refinement process, in which the final single static conformation of the ligand that is typically reported is dependent on the weighting of the restraints relative to the fit to electron density and the amount of local disorder in that region of the structure.
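The dependence on restraint weighting can be written schematically. The notation below is an assumption chosen for illustration, in the spirit of the simultaneous minimization of energy and R factor (Jack and Levitt, 1978); it is not the exact target function of any particular refinement program:

```latex
T_{\mathrm{total}} = w_{\mathrm{xray}} \, T_{\mathrm{xray}} + T_{\mathrm{geom}}
```

Here $T_{\mathrm{xray}}$ measures the disagreement between the model and the diffraction data, $T_{\mathrm{geom}}$ measures the deviation of the model from the stereochemical restraints, and $w_{\mathrm{xray}}$ is the heuristically chosen relative weight. A large $w_{\mathrm{xray}}$ lets the ligand follow the density at the cost of strained geometry, whereas a small $w_{\mathrm{xray}}$ pulls it toward the restraint targets. At medium resolution a range of weights can give a comparable fit to the data, so the single reported conformation is one of several consistent with the experiment.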
If small molecule ligands in protein crystal structures do sometimes have high conformational energies, regardless of the methods by which these are determined computationally, and if some of this energy is “real” but some (maybe most) is an artifact of the limitations of the experimental data, then we must be able to distinguish between these two contributions to the energy in order for ligand structures from protein-ligand complexes to be used in a quantitative way for structure-based drug design. Ultimately, only a sufficient number of atomic and sub-atomic resolution structures combined with high-quality refinement methods will allow us to unequivocally answer the question of what conformational energy ranges are truly present in protein-ligand complexes.
Acknowledgments
The authors thank J.A. Kelley and S.H. Bloch for their helpful comments and suggestions. This work was supported in part by the Intramural Research Program of the National Institutes of Health, Center for Cancer Research, National Cancer Institute, and in part with Federal funds from the Frederick National Laboratory for Cancer Research, National Institutes of Health, under contract HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the U.S. Government.
References
- Adams PD, Aertgeerts K, Bauer C, Bell JA, Berman HM, Bhat TN, Young J, et al. Outcome of the first wwPDB/CCDC/D3R ligand validation workshop. Structure. 2016;24:502–8. doi: 10.1016/j.str.2016.02.017.
- Adams PD, Pannu NS, Read RJ, Brünger AT. Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement. Proc Natl Acad Sci USA. 1997;94:5018–23. doi: 10.1073/pnas.94.10.5018.
- Afonine PV, Grosse-Kunstleve RW, Adams PD, Lunin VY, Urzhumtsev A. On macromolecular refinement at subatomic resolution with interatomic scatterers. Acta Crystallogr D Biol Crystallogr. 2007;63:1194–7. doi: 10.1107/S0907444907046148.
- Afonine PV, Grosse-Kunstleve RW, Echols N, Headd JJ, Moriarty NW, Mustyakimov M, Terwilliger TC, Urzhumtsev A, Zwart PH, Adams PD. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr D Biol Crystallogr. 2012;68:352–67. doi: 10.1107/S0907444912001308.
- Andrews PR, Craik DJ, Martin JL. Functional group contributions to drug-receptor interactions. J Med Chem. 1984;27:1648–57. doi: 10.1021/jm00378a021.
- Avgy-David HH, Senderowitz H. Toward focusing conformational ensembles on bioactive conformations: A molecular mechanics/quantum mechanics study. J Chem Inf Model. 2015;55:2154–67. doi: 10.1021/acs.jcim.5b00259.
- Berg L, Mishra BK, Andersson CD, Ekström F, Linusson A. The nature of activated non-classical hydrogen bonds: A case study on acetylcholinesterase-ligand complexes. Chemistry. 2016;22:2672–81. doi: 10.1002/chem.201503973.
- Blundell CD, Packer MJ, Almond A. Quantification of free ligand conformational preferences by NMR and their relationship to the bioactive conformation. Bioorg Med Chem. 2013;21:4976–87. doi: 10.1016/j.bmc.2013.06.056.
- Borbulevych OY, Plumley JA, Martin RI, Merz KM, Westerhoff LM. Accurate macromolecular crystallographic refinement: incorporation of the linear scaling, semiempirical quantum-mechanics program DivCon into the PHENIX refinement package. Acta Crystallogr D Biol Crystallogr. 2014;70:1233–47. doi: 10.1107/S1399004714002260.
- Boström J. Reproducing the conformations of protein-bound ligands: a critical evaluation of several popular conformational searching tools. J Comput Aided Mol Des. 2001;15:1137–52. doi: 10.1023/a:1015930826903.
- Boström J, Greenwood JR, Gottfries J. Assessing the performance of OMEGA with respect to retrieving bioactive conformations. J Mol Graph Model. 2003;21:449–62. doi: 10.1016/s1093-3263(02)00204-8.
- Boström J, Norrby PO, Liljefors T. Conformational energy penalties of protein-bound ligands. J Comput Aided Mol Des. 1998;12:383–96. doi: 10.1023/a:1008007507641.
- Brünger AT. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature. 1992;355:472–5. doi: 10.1038/355472a0.
- Bruno IJ, Cole JC, Kessler M, Luo J, Motherwell WDS, Purkis LH, Smith BR, Taylor R, Cooper RI, Harris SE, Orpen AG. Retrieval of crystallographically-derived molecular geometry information. J Chem Inf Comput Sci. 2004;44:2133–44. doi: 10.1021/ci049780b.
- Butler KT, Luque FJ, Barril X. Toward accurate relative energy predictions of the bioactive conformation of drugs. J Comput Chem. 2009;30:601–10. doi: 10.1002/jcc.21087.
- Cachau RE, Podjarny AD. High-resolution crystallography and drug design. J Mol Recognit. 2005;18:196–202. doi: 10.1002/jmr.738.
- Cereto-Massagué A, Ojeda MJ, Joosten RP, Valls C, Mulero M, Salvado MJ, Arola-Arnal A, Arola L, Garcia-Vallvé S, Pujadas G. The good, the bad and the dubious: VHELIBS, a validation helper for ligands and binding sites. J Cheminform. 2013;5:36. doi: 10.1186/1758-2946-5-36.
- Chapman HN, Fromme P, Barty A, White TA, Kirian RA, Aquila A, Spence JCH, et al. Femtosecond X-ray protein nanocrystallography. Nature. 2011;470:73–7. doi: 10.1038/nature09750.
- Chen I-J, Foloppe N. Conformational sampling of druglike molecules with MOE and Catalyst: implications for pharmacophore modeling and virtual screening. J Chem Inf Model. 2008;48:1773–91. doi: 10.1021/ci800130k.
- Chen I-J, Foloppe N. Tackling the conformational sampling of larger flexible compounds and macrocycles in pharmacology and drug discovery. Bioorg Med Chem. 2013;21:7898–920. doi: 10.1016/j.bmc.2013.10.003.
- Chodera JD, Mobley DL. Entropy-enthalpy compensation: Role and ramifications in biomolecular ligand recognition and design. Annu Rev Biophys. 2013;42:121–42. doi: 10.1146/annurev-biophys-083012-130318.
- Cooper DR, Porebski PJ, Chruszcz M, Minor W. X-ray crystallography: Assessment and validation of protein-small molecule complexes for drug discovery. Expert Opin Drug Discov. 2011;6:771–82. doi: 10.1517/17460441.2011.585154.
- Copeland RA. Conformational adaptation in drug-target interactions and residence time. Future Med Chem. 2011;3:1491–501. doi: 10.4155/fmc.11.112.
- Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J Am Chem Soc. 1995;117:5179–97.
- Cusack KP, Wang Y, Hoemann MZ, Marjanovic J, Heym RG, Vasudevan A. Design strategies to address kinetics of drug binding and residence time. Bioorg Med Chem Lett. 2015;25:2019–27. doi: 10.1016/j.bmcl.2015.02.027.
- Danley DE. Crystallization to obtain protein-ligand complexes for structure-aided drug design. Acta Crystallogr D Biol Crystallogr. 2006;62:569–75. doi: 10.1107/S0907444906012601.
- Darley MG, Popelier PLA. Role of short-range electrostatics in torsional potentials. J Phys Chem A. 2008;112:12954–65. doi: 10.1021/jp803271w.
- Davis AM, St-Gallay SA, Kleywegt GJ. Limitations and lessons in the use of X-ray structural information in drug design. Drug Discov Today. 2008;13:831–41. doi: 10.1016/j.drudis.2008.06.006.
- Deller MC, Rupp B. Models of protein-ligand crystal structures: trust, but verify. J Comput Aided Mol Des. 2015;29:817–36. doi: 10.1007/s10822-015-9833-8.
- Engh RA, Huber R. Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr A. 1991;47:392–400.
- Evans PR. An introduction to stereochemical restraints. Acta Crystallogr D Biol Crystallogr. 2007;63:58–61. doi: 10.1107/S090744490604604X.
- Fadel F, Zhao Y, Cachau R, Cousido-Siah A, Ruiz FX, Harlos K, Howard E, Mitschler A, Podjarny A. New insights into the enzymatic mechanism of human chitotriosidase (CHIT1) catalytic domain by atomic resolution X-ray diffraction and hybrid QM/MM. Acta Crystallogr D Biol Crystallogr. 2015;71:1455–70. doi: 10.1107/S139900471500783X.
- Feng Z, Chen L, Maddula H, Akcan O, Oughtred R, Berman HM, Westbrook J. Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics. 2004;20:2153–5. doi: 10.1093/bioinformatics/bth214.
- Foloppe N, Chen I-J. Conformational sampling and energetics of drug-like molecules. Curr Med Chem. 2009;16:3381–413. doi: 10.2174/092986709789057680.
- Foloppe N, Chen I-J. Towards understanding the unbound state of drug compounds: Implications for the intramolecular reorganization energy upon binding. Bioorg Med Chem. 2016;24:2159–89. doi: 10.1016/j.bmc.2016.03.022.
- Forti F, Cavasotto CN, Orozco M, Barril X, Luque FJ. A multilevel strategy for the exploration of the conformational flexibility of small molecules. J Chem Theory Comput. 2012;8:1808–19. doi: 10.1021/ct300097s.
- Fu Z, Li X, Merz KM. Accurate assessment of the strain energy in a protein-bound drug using QM/MM X-ray refinement and converged quantum chemistry. J Comput Chem. 2011;32:2587–97. doi: 10.1002/jcc.21838.
- Fu Z, Li X, Merz KM. Conformational analysis of free and bound retinoic acid. J Chem Theory Comput. 2012;8:1436–48. doi: 10.1021/ct200813q.
- Fu Z, Li X, Miao Y, Merz KM. Conformational analysis and parallel QM/MM X-ray refinement of protein bound anti-Alzheimer drug donepezil. J Chem Theory Comput. 2013;9:1686–93. doi: 10.1021/ct300957x.
- Gilbert WA, Kuriyan J, Petsko GA, Ponzi DR. Mapping the spatial distribution of protein fluctuations by X-ray diffraction. In: Clementi E, Sarma RH, editors. Structure and Dynamics: Nucleic Acids and Proteins. Guilderland, NY: Adenine Press; 1983.
- Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res. 2016;44:D1045–1053. doi: 10.1093/nar/gkv1072.
- Gonthier JF, Steinmann SN, Wodrich MD, Corminboeuf C. Quantification of “fuzzy” chemical concepts: a computational perspective. Chem Soc Rev. 2012;41:4671–87. doi: 10.1039/c2cs35037h.
- Gore S, Velankar S, Kleywegt GJ. Implementing an X-ray validation pipeline for the Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2012;68:478–83. doi: 10.1107/S0907444911050359.
- Halgren TA, Nachbar RB. Merck molecular force field. IV. Conformational energies and geometries for MMFF94. J Comput Chem. 1996;17:587–615.
- Hao M-H, Haq O, Muegge I. Torsion angle preference and energetics of small-molecule ligands bound to proteins. J Chem Inf Model. 2007;47:2242–52. doi: 10.1021/ci700189s.
- Hawkins PCD, Nicholls A. Conformer generation with OMEGA: Learning from the data set and the analysis of failures. J Chem Inf Model. 2012;52:2919–36. doi: 10.1021/ci300314k.
- Hawkins PCD, Skillman AG, Warren GL, Ellingson BA, Stahl MT. Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. J Chem Inf Model. 2010;50:572–84. doi: 10.1021/ci100031x.
- Hostas J, Rezac J, Hobza P. On the performance of the semiempirical quantum mechanical PM6 and PM7 methods for noncovalent interactions. Chem Phys Lett. 2013;568:161–6.
- Jack A, Levitt M. Refinement of large structures by simultaneous minimization of energy and R factor. Acta Crystallogr A. 1978;34:931–5.
- Janowski PA, Moriarty NW, Kelley BP, Case DA, York DM, Adams PD, Warren GL. Improved ligand geometries in crystallographic refinement using AFITT in PHENIX. Acta Crystallogr D Struct Biol. 2016;72:1062–72. doi: 10.1107/S2059798316012225.
- Jaskolski M, Gilski M, Dauter Z, Wlodawer A. Stereochemical restraints revisited: how accurate are refinement targets and how much should protein structures be allowed to deviate from them? Acta Crystallogr D Biol Crystallogr. 2007;63:611–20. doi: 10.1107/S090744490700978X.
- Jiang J, Wu Y, Wang Z-X, Wu C. Assessing the performance of popular quantum mechanics and molecular mechanics methods and revealing the sequence-dependent energetic features using 100 tetrapeptide models. J Chem Theory Comput. 2010;6:1199–209.
- Jones TA, Zou JY, Cowan SW, Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr A. 1991;47:110–9. doi: 10.1107/s0108767390010224.
- Joosten RP, Joosten K, Murshudov GN, Perrakis A. PDB_REDO: constructive validation, more than just looking for errors. Acta Crystallogr D Biol Crystallogr. 2012;68:484–96. doi: 10.1107/S0907444911054515.
- Joosten RP, Salzemann J, Bloch V, Stockinger H, Berglund A-C, Blanchet C, Bongcam-Rudloff E, Combet C, Da Costa AL, Deleage G, Diarena M, Fabbretti R, Fettahi G, Flegel V, Gisel A, Kasam V, Kervinen T, Korpelainen E, Mattila K, Pagni M, Reichstadt M, Breton V, Tickle IJ, Vriend G. PDB_REDO: automated re-refinement of X-ray structure models in the PDB. J Appl Crystallogr. 2009;42:376–84. doi: 10.1107/S0021889809008784.
- Jorgensen WL, Maxwell DS, Tirado-Rives J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc. 1996;118:11225–36.
- Juárez-Jiménez J, Barril X, Orozco M, Pouplana R, Luque FJ. Assessing the suitability of the multilevel strategy for the conformational analysis of small ligands. J Phys Chem B. 2015;119:1164–72. doi: 10.1021/jp506779y.
- Karton A, Schreiner PR, Martin JML. Heats of formation of platonic hydrocarbon cages by means of high-level thermochemical procedures. J Comput Chem. 2016;37:49–58. doi: 10.1002/jcc.23963.
- Kirchmair J, Laggner C, Wolber G, Langer T. Comparative analysis of protein-bound ligand conformations with respect to catalyst’s conformational space subsampling algorithms. J Chem Inf Model. 2005;45:422–30. doi: 10.1021/ci049753l.
- Kirchmair J, Wolber G, Laggner C, Langer T. Comparative performance assessment of the conformational model generators omega and catalyst: a large-scale survey on the retrieval of protein-bound ligand conformations. J Chem Inf Model. 2006;46:1848–61. doi: 10.1021/ci060084g.
- Klebe G. Applying thermodynamic profiling in lead finding and optimization. Nat Rev Drug Discov. 2015;14:95–110. doi: 10.1038/nrd4486.
- Klebe G, Mietzner T. A fast and efficient method to generate biologically relevant conformations. J Comput Aided Mol Des. 1994;8:583–606. doi: 10.1007/BF00123667.
- Kleywegt GJ. Crystallographic refinement of ligand complexes. Acta Crystallogr D Biol Crystallogr. 2007;63:94–100. doi: 10.1107/S0907444906022657.
- Kleywegt GJ, Harris MR. ValLigURL: a server for ligand-structure comparison and validation. Acta Crystallogr D Biol Crystallogr. 2007;63:935–8. doi: 10.1107/S090744490703315X.
- Kolář MH, Deepa P, Ajani H, Pecina A, Hobza P. Characteristics of a σ-hole and the nature of a halogen bond. Top Curr Chem. 2015;359:1–25. doi: 10.1007/128_2014_606.
- Kuntz ID, Chen K, Sharp KA, Kollman PA. The maximal affinity of ligands. Proc Natl Acad Sci USA. 1999;96:9997–10002. doi: 10.1073/pnas.96.18.9997.
- Lamb AL, Kappock TJ, Silvaggi NR. You are lost without a map: Navigating the sea of protein structures. Biochim Biophys Acta. 2015;1854:258–68. doi: 10.1016/j.bbapap.2014.12.021.
- Laskowski RA, Moss DS, Thornton JM. Main-chain bond lengths and bond angles in protein structures. J Mol Biol. 1993;231:1049–67. doi: 10.1006/jmbi.1993.1351.
- Lee MS, Salsbury FR, Olson MA. An efficient hybrid explicit/implicit solvent method for biomolecular simulations. J Comput Chem. 2004;25:1967–78. doi: 10.1002/jcc.20119.
- Li L, Dantzer JJ, Nowacki J, O’Callaghan BJ, Meroueh SO. PDBcal: a comprehensive dataset for receptor-ligand interactions with three-dimensional structures and binding thermodynamics from isothermal titration calorimetry. Chem Biol Drug Des. 2008;71:529–32. doi: 10.1111/j.1747-0285.2008.00661.x.
- Li J, Ehlers T, Sutter J, Varma-O’brien S, Kirchmair J. CAESAR: a new conformer generation algorithm based on recursive buildup and local rotational symmetry consideration. J Chem Inf Model. 2007;47:1923–32. doi: 10.1021/ci700136x.
- Li X, Fu Z, Merz KM. QM/MM refinement and analysis of protein bound retinoic acid. J Comput Chem. 2012;33:301–10. doi: 10.1002/jcc.21978.
- Liebeschuetz J, Hennemann J, Olsson T, Groom CR. The good, the bad and the twisted: a survey of ligand geometry in protein crystal structures. J Comput Aided Mol Des. 2012;26:169–83. doi: 10.1007/s10822-011-9538-6.
- Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007;35:D198–201. doi: 10.1093/nar/gkl999.
- MacArthur MW, Thornton JM. Deviations from planarity of the peptide bond in peptides and proteins. J Mol Biol. 1996;264:1180–95. doi: 10.1006/jmbi.1996.0705.
- MacKerell AD, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph-McCarthy D, Kuchnir L, Kuczera K, Lau FT, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiórkiewicz-Kuczera J, Yin D, Karplus M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B. 1998;102:3586–616. doi: 10.1021/jp973084f.
- Malde AK, Mark AE. Challenges in the determination of the binding modes of non-standard ligands in X-ray crystal complexes. J Comput Aided Mol Des. 2011;25:1–12. doi: 10.1007/s10822-010-9397-6.
- McMartin C, Bohacek RS. QXP: powerful, rapid computer algorithms for structure-based drug design. J Comput Aided Mol Des. 1997;11:333–44. doi: 10.1023/a:1007907728892.
- Melandri S. “Union is strength”: how weak hydrogen bonds become stronger. Phys Chem Chem Phys. 2011;13:13901–11. doi: 10.1039/c1cp20824a.
- Metz S, Kästner J, Sokol AA, Keal TW, Sherwood P. ChemShell—a modular software package for QM/MM simulations. WIREs Comput Mol Sci. 2014;4:101–10.
- Mobley DL, Dill KA. Binding of small-molecule ligands to proteins: “what you see” is not always “what you get”. Structure. 2009;17:489–98. doi: 10.1016/j.str.2009.02.010.
- Mobley DL, Guthrie JP. FreeSolv: a database of experimental and calculated hydration free energies, with input files. J Comput Aided Mol Des. 2014;28:711–20. doi: 10.1007/s10822-014-9747-x.
- Moriarty NW, Grosse-Kunstleve RW, Adams PD. electronic Ligand Builder and Optimization Workbench (eLBOW): a tool for ligand coordinate and restraint generation. Acta Crystallogr D Biol Crystallogr. 2009;65:1074–80. doi: 10.1107/S0907444909029436.
- Nicholls A, Wlodek S, Grant JA. The SAMPL1 solvation challenge: Further lessons regarding the pitfalls of parametrization. J Phys Chem B. 2009;113:4521–32. doi: 10.1021/jp806855q.
- Nicklaus MC, Wang S, Driscoll JS, Milne GW. Conformational changes of small molecules binding to proteins. Bioorg Med Chem. 1995;3:411–28. doi: 10.1016/0968-0896(95)00031-b.
- Olsson TSG, Williams MA, Pitt WR, Ladbury JE. The thermodynamics of protein-ligand interaction and solvation: insights for ligand design. J Mol Biol. 2008;384:1002–17. doi: 10.1016/j.jmb.2008.09.073.
- Paton RS, Goodman JM. Hydrogen bonding and pi-stacking: how reliable are force fields? A critical evaluation of force field descriptions of nonbonded interactions. J Chem Inf Model. 2009;49:944–55. doi: 10.1021/ci900009f.
- Perola E, Charifson PS. Conformational analysis of drug-like molecules bound to proteins: an extensive study of ligand reorganization upon binding. J Med Chem. 2004;47:2499–510. doi: 10.1021/jm030563w.
- Petrova T, Podjarny A. Protein crystallography at subatomic resolution. Rep Prog Phys. 2004;67:1565–605.
- Pozharski E, Weichenberger CX, Rupp B. Techniques, tools and best practices for ligand electron-density analysis and results from their application to deposited crystal structures. Acta Crystallogr D Biol Crystallogr. 2013;69:150–67. doi: 10.1107/S0907444912044423.
- Read RJ, Adams PD, Arendall WB, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lütteke T, Otwinowski Z, Perrakis A, Richardson JS, Sheffler WH, Smith JL, Tickle IJ, Vriend G, Zwart PH. A new generation of crystallographic validation tools for the protein data bank. Structure. 2011;19:1395–412. doi: 10.1016/j.str.2011.08.006.
- Reynolds CH. Protein–ligand cocrystal structures: We can do better. ACS Med Chem Lett. 2014;5:727–9. doi: 10.1021/ml500220a.
- Reynolds CH, Tounge BA, Bembenek SD. Ligand binding efficiency: Trends, physical basis, and implications. J Med Chem. 2008;51:2432–8. doi: 10.1021/jm701255b.
- Rezáč J, Riley KE, Hobza P. S66: A well-balanced database of benchmark interaction energies relevant to biomolecular structures. J Chem Theory Comput. 2011;7:2427–38. doi: 10.1021/ct2002946.
- Risthaus T, Steinmetz M, Grimme S. Implementation of nuclear gradients of range-separated hybrid density functionals and benchmarking on rotational constants for organic molecules. J Comput Chem. 2014;35:1509–16. doi: 10.1002/jcc.23649.
- Ryde U, Nilsson K. Quantum chemistry can locally improve protein crystal structures. J Am Chem Soc. 2003;125:14232–3. doi: 10.1021/ja0365328.
- Ryde U, Olsen L, Nilsson K. Quantum chemical geometry optimizations in proteins using crystallographic raw data. J Comput Chem. 2002;23:1058–70. doi: 10.1002/jcc.10093.
- Sadowski J, Boström J. MIMUMBA revisited: torsion angle rules for conformer generation derived from X-ray structures. J Chem Inf Model. 2006;46:2305–9. doi: 10.1021/ci060042s.
- Schlegel HB. Encyclopedia of Computational Chemistry. Chichester, UK: John Wiley & Sons; 1998. Geometry optimization: 1.
- Seabra G de M, Walker RC, Roitberg AE. Are current semiempirical methods better than force fields? A study from the thermodynamics perspective. J Phys Chem A. 2009;113:11938–48. doi: 10.1021/jp903474v.
- Sedlak R, Janowski T, Pitoňák M, Rezáč J, Pulay P, Hobza P. The accuracy of quantum chemical methods for large noncovalent complexes. J Chem Theory Comput. 2013;9:3364–74. doi: 10.1021/ct400036b.
- Sitzmann M, Weidlich IE, Filippov IV, Liao C, Peach ML, Ihlenfeldt W-D, Karki RG, Borodina YV, Cachau RE, Nicklaus MC. PDB ligand conformational energies calculated quantum-mechanically. J Chem Inf Model. 2012;52:739–56. doi: 10.1021/ci200595n.
- Skyner RE, McDonagh JL, Groom CR, van Mourik T, Mitchell JBO. A review of methods for the calculation of solution free energies and the modelling of systems in solution. Phys Chem Chem Phys. 2015;17:6174–91. doi: 10.1039/c5cp00288e.
- Smith RD, Engdahl AL, Dunbar JB, Carlson HA. Biophysical limits of protein-ligand binding. J Chem Inf Model. 2012;52:2098–106. doi: 10.1021/ci200612f.
- Smith JC, Karplus M. Empirical force field study of geometries and conformational transitions of some organic molecules. J Am Chem Soc. 1992;114:801–12.
- Sperandio O, Souaille M, Delfaud F, Miteva MA, Villoutreix BO. MED-3DMC: a new tool to generate 3D conformation ensembles of small molecules with a Monte Carlo sampling of the conformational space. Eur J Med Chem. 2009;44:1405–9. doi: 10.1016/j.ejmech.2008.09.052.
- Stewart JJ. Optimization of parameters for semiempirical methods VI: more modifications to the NDDO approximations and re-optimization of parameters. J Mol Model. 2013;19:1–32. doi: 10.1007/s00894-012-1667-x.
- Stockwell GR, Thornton JM. Conformational diversity of ligands bound to proteins. J Mol Biol. 2006;356:928–44. doi: 10.1016/j.jmb.2005.12.012.
- Tickle IJ. Statistical quality indicators for electron-density maps. Acta Crystallogr D Biol Crystallogr. 2012;68:454–67. doi: 10.1107/S0907444911035918.
- Tirado-Rives J, Jorgensen WL. Contribution of conformer focusing to the uncertainty in predicting free energies for protein-ligand binding. J Med Chem. 2006;49:5880–4. doi: 10.1021/jm060763i.
- Tronrud DE. Introduction to macromolecular refinement. Methods Mol Biol. 2007;364:231–54. doi: 10.1385/1-59745-266-1:231.
- Tsuzuki S, Honda K, Uchimaru T, Mikami M, Fujii A. Magnitude and directionality of the interaction energy of the aliphatic CH/pi interaction: significant difference from hydrogen bond. J Phys Chem A. 2006;110:10163–8. doi: 10.1021/jp064206j.
- Urzhumtsev A, Afonine PV, Lunin VY, Terwilliger TC, Adams PD. Metrics for comparison of crystallographic maps. Acta Crystallogr D Biol Crystallogr. 2014;70:2593–606. doi: 10.1107/S1399004714016289. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vanommeslaeghe K, Hatcher E, Acharya C, Kundu S, Zhong S, Shim J, Darian E, Guvench O, Lopes P, Vorobyov I, Mackerell AD., Jr CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J Comput Chem. 2010;31:671–90. doi: 10.1002/jcc.21367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vieth M, Hirst JD, Brooks CL. Do active site conformations of small ligands correspond to low free-energy solution structures? J Comput Aided Mol Des. 1998;12:563–72. doi: 10.1023/a:1008055202136. [DOI] [PubMed] [Google Scholar]
- Wang J, Cieplak P, Kollman PA. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem. 2000;21:1049–74. [Google Scholar]
- Wang Q, Pang Y-P. Preference of small molecules for local minimum conformations when binding to proteins. PLoS ONE. 2007;2:e820. doi: 10.1371/journal.pone.0000820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA. Development and testing of a general Amber force field. J Comput Chem. 2004;25:1157–74. doi: 10.1002/jcc.20035. [DOI] [PubMed] [Google Scholar]
- Weichenberger CX, Pozharski E, Rupp B. Visualizing ligand molecules in Twilight electron density. Acta Crystallogr F Struct Biol Cryst Commun. 2013;69:195–200. doi: 10.1107/S1744309112044387. [DOI] [PMC free article] [PubMed] [Google Scholar]
- White RP, Meirovitch H. Minimalist explicit solvation models for surface loops in proteins. J Chem Theory Comput. 2006;2:1135–51. doi: 10.1021/ct0503217. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wiberg KB. The concept of strain in organic chemistry. Angew Chem Int Ed. 1986;25:312–22. [Google Scholar]
- Wlodawer A, Minor W, Dauter Z, Jaskolski M. Protein crystallography for non-crystallographers, or how to get the best (but not more) from published macromolecular structures. FEBS J. 2008;275:1–21. doi: 10.1111/j.1742-4658.2007.06178.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wlodek S, Skillman AG, Nicholls A. Automated ligand placement and refinement with a combined force field and shape potential. Acta Crystallogr D Biol Crystallogr. 2006;62:741–9. doi: 10.1107/S0907444906016076. [DOI] [PubMed] [Google Scholar]
- Woldeyes RA, Sivak DA, Fraser JS. E pluribus unum, no more: from one crystal, many conformations. Curr Opin Struct Biol. 2014;28C:56–62. doi: 10.1016/j.sbi.2014.07.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yu N, Yennawar HP, Merz KM. Refinement of protein crystal structures using energy restraints derived from linear-scaling quantum mechanics. Acta Crystallogr D Biol Crystallogr. 2005;61:322–32. doi: 10.1107/S0907444904033669. [DOI] [PubMed] [Google Scholar]
- Zhu X, Lopes PEM, Shim J, MacKerell AD. Intrinsic energy landscapes of amino acid side-chains. J Chem Inf Model. 2012;52:1559–72. doi: 10.1021/ci300079j. [DOI] [PMC free article] [PubMed] [Google Scholar]