Abstract
Accurate and rapid calculation of protein-small molecule interaction free energies is critical for computational drug discovery. Because of the large chemical space spanned by drug-like molecules, classical force fields contain thousands of parameters describing atom-pair distance and torsional preferences; each parameter is typically optimized independently on simple representative molecules. Here we describe a new approach in which small molecule force field parameters are jointly optimized guided by the rich source of information contained within thousands of available small molecule crystal structures. We optimize parameters by requiring that the experimentally determined molecular lattice arrangements have lower energy than all alternative lattice arrangements. Thousands of independent crystal lattice-prediction simulations were run on each of 1,386 small molecule crystal structures, and energy function parameters of an implicit solvent energy model were optimized so native crystal lattice arrangements had the lowest energy. The resulting energy model was implemented in Rosetta, together with a rapid genetic algorithm docking method employing grid-based scoring and receptor flexibility. The success rate of bound structure recapitulation in cross-docking on 1,112 complexes was improved by more than 10% over previously published methods, with solutions within 1 Å in over half of the cases. Our results demonstrate that small molecule crystal structures are a rich source of information for guiding molecular force field development, and the improved Rosetta energy function should increase accuracy in a wide range of small molecule structure prediction and design studies.
INTRODUCTION
Classical force field parameterization based on liquid thermodynamic data and quantum chemistry typically proceeds by fitting different subsets of parameters on different subsets of representative molecules independently 1–4. A challenge with this approach is the transferability of the resulting model to systems not included in the parameterization set 5,6. For example, bond torsional parameters are often obtained by computing the energies of a set of conformations of test molecules with quantum chemistry, and then subtracting the electrostatic and van der Waals contributions. However, the resulting fitted function is highly dependent on the molecules selected for training, and using such a model to evaluate the energetics of torsions with different flanking chemical groups often yields inaccurate results 7–9. Roos et al. 3 showed that this issue could be resolved by expanding to hundreds of thousands of parameters fit to reproduce quantum chemistry calculations on many thousands of small molecules. We hypothesized that a balanced and transferable energy model involving far fewer parameters could be learned by utilizing the many thousands of crystal structures of small molecules, which span a large diversity of chemical space 10,11. Since these crystal structures form spontaneously, the majority of them must be very low free energy states 12 (polymorphs determined by kinetic reasons generally have < 1 kcal/mol energy differences 13,14), and hence the sum of the intra- and interatomic interaction energies must be low compared to almost all alternative packing arrangements and conformations of the molecule in the majority of cases 15,16.
The key steps in our approach are: a) generation of large numbers of alternative “decoy” lattice packing and conformational arrangements of a set of small molecules with known crystal structures; and b) simultaneous optimization of a large set of force field parameters, such that the experimentally observed crystal structures have lower energies than all of the alternative states. The advantages of this approach are that parameters are obtained directly from structural data on molecules 10,17 that are generally larger and more similar to drug-like compounds than the simple model compounds traditionally used for QM calculations. Moreover, as the energy of a crystal involves tradeoffs between different forces, this approach should yield a balanced force field which can (for instance) accurately model the subtle interplay between deviations from bonded geometry minima and optimization of non-bonded interactions. The consideration of both experimentally observed crystal structures and large numbers of alternative decoy structures is an advance over previous approaches in which potentials of mean force (PMF) are learned only from the observed structure16.
METHODS AND MATERIALS
Overview of the approach
We sought to develop and evaluate a generalized force field for drug discovery using a three-step procedure: i) generation of large numbers of small molecule “decoy” lattices, ii) optimization of a force field to discriminate native lattices from among these decoys, and iii) evaluation of the force field in small molecule docking experiments. We first generated alternative packing arrangements for small molecules using a diverse set of 1,386 small molecule crystal lattice structures from the Cambridge Structural Database (CSD) 10,17 (870 for training and 516 for testing), by adapting the Rosetta symmetry docking machinery 18 to sample space groups, lattice parameters, and the rigid-body and internal conformation of each small molecule (Fig 1a). We simultaneously fit 175 non-bonded parameters for a generalized implicit solvent force field with 57 atom types (Table S1) plus 269 parameters for a torsion model conditioned on both constituent atom types and bond types 8. The 444 free parameters were optimized to maximize the energy gap between the experimentally observed lattice and the sampled alternative arrangements and to fit small molecule thermodynamic and protein-ligand complex structural data simultaneously (Fig 1b) using the Simplex-search-based dualOptE algorithm 19. Nine iterations of parameter optimization (each consisting of 300 to 500 rounds of Simplex optimization) followed by crystal lattice regeneration were carried out; the final energy model is referred to as RosettaGenFF. RosettaGenFF was then tested on ligand docking benchmark sets using the newly developed docking tool Rosetta GALigandDock. In the following sections, we describe in more detail the crystal lattice prediction protocol used to generate training data, the energy model and parameter optimization procedure, and the ligand docking method and dataset.
Fig1. Force field optimization using small molecule crystal structures.

a) Structure perturbation operations in Monte Carlo conformational search used for small molecule crystal structure prediction. Random space group assignment is done at the start of each simulation, followed by 50 cycles of interspersed lattice parameter and intramolecular perturbation followed by minimization over all degrees of freedom. b) Schematic overview of iterative parameter optimization procedure integrating small molecule crystal structure prediction, the KL divergence of sampled dihedral angle and distance distributions compared to reference distributions derived from ~4,000 small molecule crystal structures, ligand-protein docked pose discrimination tests on 215 complexes each containing hundreds of pre-sampled conformations 30, and agreement with experimental hydration free energy for 643 small molecules 31. At every iteration, new force field parameters are obtained by simplex optimization using dualOptE 19, atom type classification logic is updated as necessary, and new low energy decoy lattice structures are generated. c) Comparison of performance against generalized Amber force field (GAFF1), decomposed by functional groups (left) or by interaction types across symmetry units (right). Statistics are collected from all molecules containing corresponding features and hence individual molecules can be counted multiple times.
Crystal structure prediction protocol
We developed a lattice-docking protocol to sample small molecules in various crystallographic space groups. To handle space groups with mirror symmetries, Rosetta’s symmetry machinery 20 was extended to allow mirror symmetry operations. For each space group, we expose as degrees of freedom (DOFs) the internal coordinates of the asymmetric unit, the rigid-body orientation of the molecule, and the dimensions of the lattice (Fig 1a). The symmetry machinery in Rosetta allows these DOFs to be sampled as well as minimized while maintaining the overall symmetry of the system.
Each structure prediction run is carried out as a Metropolis Monte Carlo with minimization (MCM) search. At the beginning, the cell volume is selected such that the crystal has 80-120% occupancy. Then, each lattice angle is assigned a value between 60° and 120°. Finally, unit cell lengths are assigned randomly for all but one dimension, which is determined by the chosen volume. This process is repeated until the ratio of the longest to the shortest cell length is < 5:1. The input ligand conformation is also randomized by uniformly sampling all rotatable dihedral angles and rigid body placements in the lattice. Starting from this initial lattice, perturbation of one of the following sets of DOFs is attempted (Fig 1a) at each MCM cycle: i) translation or rotation of the molecule, ii) a single dihedral angle in the molecule, and iii) all lattice lengths or angles. Perturbation magnitudes are randomly selected from normal distributions with standard deviations of 0.5 Å, 2.5°, and 5.0° for ligand translation, rotation, and dihedral angles, respectively, and (0.5 × sgmultiplicity) Å for lattice dimensions, where sgmultiplicity approximates the number of symmetry operators along each axis in a given space group and is generally larger for space groups with higher symmetry. Lattice angles are sampled by allowing the random axis moves to modify the crystal axis direction as well as its magnitude. Subsequent minimization is performed simultaneously on all DOFs, and the Metropolis criterion is applied. The lowest energy conformation after 50 cycles is returned.
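The initialization rules above can be illustrated with a minimal sketch. It is not the Rosetta implementation: the definition of occupancy (total molecular volume divided by cell volume), the 0.5 to 2.0 range used to draw the two free cell lengths, and the function names `cell_volume`, `sample_initial_cell`, and `metropolis_accept` are all assumptions made for illustration. Only the 80-120% occupancy window, the 60° to 120° angle range, the < 5:1 length ratio, and the use of a Metropolis criterion come from the text.

```python
import math
import random


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general (triclinic) unit cell; lengths in Angstrom, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    term = 1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    if term <= 0.0:
        raise ValueError("angles do not define a valid cell")
    return a * b * c * math.sqrt(term)


def sample_initial_cell(mol_volume, n_copies, rng, max_ratio=5.0, max_tries=10000):
    """Draw (a, b, c, alpha, beta, gamma) for a random starting lattice."""
    for _ in range(max_tries):
        # Assumed definition: occupancy = n_copies * mol_volume / cell volume.
        occupancy = rng.uniform(0.8, 1.2)
        target_volume = n_copies * mol_volume / occupancy
        angles = [rng.uniform(60.0, 120.0) for _ in range(3)]
        try:
            unit = cell_volume(1.0, 1.0, 1.0, *angles)
        except ValueError:
            continue  # some angle triples are geometrically impossible
        # Two lengths are random (range is an assumption); the third
        # is fixed by the chosen volume.
        scale = (target_volume / unit) ** (1.0 / 3.0)
        a = scale * rng.uniform(0.5, 2.0)
        b = scale * rng.uniform(0.5, 2.0)
        c = target_volume / (unit * a * b)
        if max(a, b, c) / min(a, b, c) < max_ratio:
            return (a, b, c, angles[0], angles[1], angles[2])
    raise RuntimeError("no acceptable cell found")


def metropolis_accept(delta_e, kT, rng):
    """Standard Metropolis test applied after each perturb-and-minimize move."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kT)
```

A caller would seed `random.Random`, draw a cell, and then run 50 perturb, minimize, and accept cycles, keeping the lowest-energy state.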
Training and validation sets of crystal structures were collected from the Cambridge Structural Database (CSD) 10,17 satisfying the following conditions: (i) has one molecule per asymmetric unit; (ii) has >99% occupancy by the molecule; (iii) is composed of only the elements H, C, N, O, S, P, F, Cl, Br, and I; and (iv) has at least three and at most twelve rotatable bonds. We first curated an extended training set consisting of ~4,000 molecules and used it to derive torsion and distance statistics (Fig 1b). Of these, 870 molecules were taken to generate decoys for training. The somewhat modest size of the training set is due to the intensive computational requirements of the “lattice discrimination test” used in parameter optimization (~50 CPU hours with 870 molecules, which is run several thousand times through the entire optimization process). Incorporating more molecules, coupled with an improved parameter optimization technique, should help in learning a more robust and generalizable parameter set in the future. A separate validation set of 516 molecules was later collected from the CSD (independent of the extended training set) under the same conditions. For each small molecule crystal lattice, thousands of structures are generated by repeating independent MCM structure predictions starting from random assignments of space group and ligand conformation. Initial ligand conformations were selected from a pool of at most 10 structures sampled by the “confab” mode in Open Babel 21. The space group is randomly assigned from a list of those most commonly observed in the extended training set according to the chirality of the molecule: P 1 21/c 1, P 1 21/n 1, P-1, C 1 2/c 1, P b c a, P n a 21, C 1 c 1, P b c n, P c a 21, P c c n, and P 1 1 2 for achiral molecules; P 21 21 21, P 1 21 1, C 1 2 1, and P 21 21 2 for chiral molecules.
In addition to these “decoy” structures, near-native conformations were added to the conformation pool by running the same protocol without initial randomization. A total of > 1,000 de novo predictions and > 100 native perturbations were made for each molecule. An example command-line for performing crystal lattice prediction can be found in Supplemental Data.
RosettaGenFF
The energy model presented in this study, hereafter RosettaGenFF, integrates two distinct “sub-models.” The first is the previously developed Rosetta protein energy model 19,22, which is applied to any of the 20 canonical amino acids; more details can be found in previous works 19,22,23.
Non-protein molecules and their interactions with canonical amino acids are described by a set of generic energy terms developed in this study:
| [1] |
with atomic parameters defined for Lennard-Jones (LJ) and implicit solvation following the generic atom types (see below and also Table S1). As the partial charges used in Coulomb energy calculations are molecular rather than atomic properties, we obtain them for each compound using AM1-BCC calculations 24 and keep them fixed during model fitting. The anisotropic implicit solvation model is described in Ref 22. The functional forms of these terms are shared between the protein and generic sub-models. An exception concerns the treatment of torsion preferences: in the protein energy model, LJ and Coulombic interactions between atoms three or fewer bonds apart are ignored to avoid overlap with statistical torsion potentials, while in the generic energy model, only interactions between atoms one or two bonds apart are ignored.
Generic atom types.
Our general strategy for assigning a distinct generalized atom type to each ligand atom is inspired by the OPLS-AA force field 25. We consider 35 common and unique functional groups containing at least one O, N, S, or P atom in organic molecules, listed in Table S1. When an atom does not belong to any of these functional groups, a more general atom type is assigned based on element type and hybridization state (similar to the Tripos force field 26). The atom type is then further specified based on the number of attached hydrogens in order to take into account variations in the desolvation penalty, a unique aspect of the implicit solvation energy model 19,22,23. The initial non-bonded parameters were determined by considering the “best matching” atom in Rosetta’s protein energy model 19,23, followed by manual corrections on 9 LJ parameters to better reproduce experimental bulk liquid properties 27. Note that atom types and their definitions were refined in between rounds of parameter optimization. The final list is given in Table S1.
Generalized torsion term.
Our generalized torsion energy model follows the Karplus model, representing torsion potentials as a series of cosine functions up to 4th order for an improved description of weakly conjugated systems 28. Coefficients are assigned based on the atom types of the four constituent atoms and the bond order of the central bond. These parameters are optimized through the following procedure. First, the number of occurrences of each torsion type is counted in the extended training set of small molecule crystal structures (see Dataset below). Torsion types observed at least 50 times were assigned unique torsion coefficients, yielding 150 torsion types. The remaining torsions are handled by atom-type grouping, with a total of 65 additional torsion classes. With a 4th order expansion of the Karplus equation, there are a total of 860 (= 215 x 4) parameters. We further reduce this parameter set to 269 by restricting the coefficient order based on chemical intuition (e.g. torsions with strong preferences for planar conformations may only have non-zero first- and second-order coefficients). The initial parameter set for optimization was taken from the best matching torsion in the OPLS-AA force field 25.
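As a rough illustration of a fourth-order cosine series and the parameter counting above, the following sketch evaluates one torsion term. The exact functional form is not given in this section, so the expression E(φ) = Σ k_n (1 + cos(nφ − δ_n)), the phase offsets, and the names `torsion_energy`, `N_TORSION_CLASSES`, and `N_FULL_PARAMS` are assumptions; only the order (4) and the class counts (150 + 65) are taken from the text.

```python
import math


def torsion_energy(phi_deg, coeffs, phases_deg=(0.0, 0.0, 0.0, 0.0)):
    """Cosine-series torsion energy up to 4th order (assumed functional form).

    coeffs holds (k1, k2, k3, k4); setting a coefficient to zero removes
    that order, which is how an order restriction could be expressed.
    """
    phi = math.radians(phi_deg)
    total = 0.0
    for n, (k, delta) in enumerate(zip(coeffs, phases_deg), start=1):
        total += k * (1.0 + math.cos(n * phi - math.radians(delta)))
    return total


# Parameter bookkeeping stated in the text.
N_TORSION_CLASSES = 150 + 65           # unique types + grouped classes
N_FULL_PARAMS = N_TORSION_CLASSES * 4  # one coefficient per order

# Hypothetical torsion restricted to first- and second-order terms only.
restricted = (0.3, 2.0, 0.0, 0.0)
energy_at_180 = torsion_energy(180.0, restricted)
```

Restricting which orders may be non-zero for each class is what reduces the full 860-parameter set to the 269 parameters actually optimized.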
Parameter Optimization
Energy parameters were optimized by iteratively applying dualOptE 19 primarily to maximize the energy gap between near-native and decoy lattices (Fig 1b). First, crystal lattice conformations were generated using the previously described lattice sampling method. Then dualOptE was run for 400 to 700 cycles of Nelder-Mead simplex minimization 29, obtaining an optimal parameter set for the given atom type definition logic and decoy sets. The objective function used in dualOptE is represented as a weighted sum of metrics measuring performance on several specific tasks listed below. The number of parameters optimized at each dualOptE run ranged from 100 to 150, reduced from the 444 total parameters (269 torsional, 114 LJ, 57 solvation, and 4 hydrogen-bonding weight parameters) by grouping or sub-sampling parameters for efficiency. Finally, atom-type classification logic was updated by visually inspecting the failures originating from mistyping. This procedure -- from decoy generation to parameter optimization -- was iterated 9 times until atom typing logic converged.
A first phase of optimization (the “condensed phase”) was carried out for the first 6 iterations. Here, LJ, hydrogen bonding, and torsion parameters are optimized considering two tasks: the lattice discrimination test and atomic geometry matching (individual tasks are described below). During this phase, the solvation term was turned off, and electrostatics and hydrogen bonding terms were upweighted to their strength in a dielectric medium with an electrostatic permittivity of 2.0. A second phase (the “solvent phase”) was carried out for the final 3 iterations, beginning with parameters from the end of the condensed phase. Individual solvation and LJ parameters -- together with a global weight controlling torsional energies -- were optimized simultaneously. Two additional tasks considering solvation energies were added to the overall optimization objective function (Fig 1b): ligand pose discrimination and hydration free energy recapitulation. These two tasks were critical for balancing components in the energy model. The ligand pose discrimination task ensures: a) a detailed atomic-level balance between desolvation and other non-bonded interactions, and b) a balance between the protein and generalized energy models. The hydration free energy recapitulation task regularizes solvation parameters against the same type of data used for the protein energy model 19.
The lattice discrimination task measures how well a given energy function parameterization discriminates near-native lattice conformations from alternate “decoy” conformations for a set of 870 small molecules. Discriminative power is measured by a Boltzmann probability metric, which measures the average probability of selecting near-native conformations 19, with “near-native” defined variably as a crystal RMSD within 1, 2, 4, or 6 Å. The temperature factor (kbT) is defined as 0.1 times the gap between the 5th and 95th percentile energy values. Crystal RMSD is measured considering the asymmetric unit and all symmetry mates within 12 Å. Two values are computed and averaged: i) the Boltzmann probability for a set of 100 pre-selected conformations (native and non-native) which are only scored, and ii) the Boltzmann probability for a set of 20 pre-selected conformations that are minimized with the current energy parameters. These decoys are selected at the beginning of dualOptE with the initial parameter set and always include at least one sub-Angstrom structure with the lowest energy.
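A minimal sketch of such a Boltzmann probability metric is given below. The temperature definition (0.1 times the 5th to 95th percentile energy gap) and the RMSD cutoffs follow the text; the linear-interpolation percentile, the plain average over the four cutoffs, and the function names are assumptions, and the actual dualOptE implementation may differ.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def boltzmann_probability(energies, rmsds, cutoff):
    """Probability mass on conformations whose RMSD is within `cutoff`."""
    kT = 0.1 * (percentile(energies, 95) - percentile(energies, 5))
    if kT <= 0.0:
        raise ValueError("energy spread must be positive")
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    near = sum(w for w, r in zip(weights, rmsds) if r <= cutoff)
    return near / sum(weights)


def lattice_discrimination_score(energies, rmsds, cutoffs=(1.0, 2.0, 4.0, 6.0)):
    """Average over near-native definitions (the averaging is an assumption)."""
    probs = [boltzmann_probability(energies, rmsds, c) for c in cutoffs]
    return sum(probs) / len(probs)
```

A score near 1 means that nearly all Boltzmann weight sits on near-native lattices; a score near 0 means decoys dominate.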
The atomic geometry matching task measures the Kullback-Leibler (KL) divergence in the distribution of atomic geometries (non-bonded distances and torsion angles) optimal for an energy parameter set against statistics collected from the extended training set of ~4,000 small molecules. Atomic geometry optimal for a parameter set is collected from minimized structures of predicted crystals (see lattice discrimination task above) individually for each type of atomic distance and torsion angle.
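The KL divergence comparison can be sketched for a single torsion type as follows. The bin count, the pseudocount used to avoid empty bins, the direction of the divergence (sampled relative to reference), and the helper names are assumptions made for illustration.

```python
import math


def histogram(values, n_bins, lo, hi, pseudocount=1.0):
    """Normalized histogram; the pseudocount keeps every bin non-zero."""
    counts = [pseudocount] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical torsion angles (degrees) from reference crystals and from
# structures minimized with two candidate parameter sets.
reference = histogram([60, 65, 180, 175, 300, 295], 12, 0.0, 360.0)
candidate_a = histogram([62, 178, 298, 61, 181, 299], 12, 0.0, 360.0)
candidate_b = histogram([0, 120, 240, 5, 125, 245], 12, 0.0, 360.0)
```

Under this measure, `candidate_a` matches the reference distribution more closely than `candidate_b`, so it would contribute a smaller penalty to the objective.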
The ligand pose discrimination task measures the Boltzmann probability of selecting a near-native ligand pose against a pool of pre-sampled protein-ligand complexes. The pre-sampled complex set comprises both false and near-native poses for 215 various complexes 30, in which i) no target receptors overlap with any of the target receptors in our ligand-docking benchmarks (sequence identity threshold of 90%) and ii) maximum ligand Tanimoto coefficient < 0.4 to any of benchmark target ligands. At the beginning of a dualOptE run, 30 conformations with the lowest energy (including at least one with ligand RMSD < 1Å with the lowest energy) are chosen for each complex. At each cycle of dualOptE, each complex is minimized with the current energy parameterization (fixing the receptor conformation for efficiency), and receptor-ligand interface scores are collected. The Boltzmann probability is measured using the same criteria as in the lattice discrimination task.
The hydration free energy recapitulation task measures how well a solvation parameter set recapitulates experimental hydration free energy values of various small molecules, using a dataset of 643 small molecules 31. The hydration free energy of a molecule (dGhyd) is approximated by a summation of polar (dGpolar) and non-polar (dGnonpolar) contributions to the total solvation free energy, each of which is estimated as:
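$$dG_{\mathrm{polar}} = \alpha \sum_i dG_{\mathrm{free},i}, \qquad dG_{\mathrm{nonpolar}} = \beta \cdot SA$$ [2]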
where the dGfree,i are the atomic parameters in our implicit solvation model (eq 1; a detailed description can be found in Ref 22), SA is the surface area of the molecule, and α and β are weighting factors on each term. These weighting factors are determined by a least-squares fit of this equation to experimental free energy values of amino-acid analogues 32, using the dGfree,i values determined for protein atom types. The net agreement is measured as the sum of absolute errors in calculated values (in kcal/mol) over these 643 molecules. With this simple linear model fitting scheme, parameter determination is completed in minutes at each optimization cycle.
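The two-parameter fit can be sketched with closed-form normal equations. The sketch assumes the hydration free energy is modeled as α times the sum of atomic dGfree,i plus β times the surface area, with no intercept; that functional form, the synthetic numbers, and the function names are illustrative assumptions rather than the published implementation.

```python
def fit_alpha_beta(sum_dgfree, surface_area, dg_exp):
    """Least-squares fit of dG_exp ~ alpha * sum_dgfree + beta * surface_area."""
    sxx = sum(x * x for x in sum_dgfree)
    syy = sum(y * y for y in surface_area)
    sxy = sum(x * y for x, y in zip(sum_dgfree, surface_area))
    sxd = sum(x * d for x, d in zip(sum_dgfree, dg_exp))
    syd = sum(y * d for y, d in zip(surface_area, dg_exp))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("descriptors are collinear; fit is ill-defined")
    alpha = (sxd * syy - syd * sxy) / det
    beta = (syd * sxx - sxd * sxy) / det
    return alpha, beta


def sum_abs_error(alpha, beta, sum_dgfree, surface_area, dg_exp):
    """Sum of absolute errors (kcal/mol) over a set of molecules."""
    return sum(abs(alpha * x + beta * y - d)
               for x, y, d in zip(sum_dgfree, surface_area, dg_exp))


# Synthetic calibration data generated with alpha = 0.5 and beta = 0.01.
x_cal = [-10.0, -4.0, -7.0, -1.0, -3.0]
y_cal = [150.0, 200.0, 120.0, 300.0, 250.0]
d_cal = [0.5 * x + 0.01 * y for x, y in zip(x_cal, y_cal)]
alpha_fit, beta_fit = fit_alpha_beta(x_cal, y_cal, d_cal)
```

In the scheme described above, the weights would be fit once on the amino-acid analogue data, and `sum_abs_error` would then be evaluated on the 643-molecule set for each trial set of solvation parameters.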
Finally, we validated the parameters on a list of thermodynamic liquid properties (density and heat-of-vaporization) shown in Fig S6.
GALigandDock: A genetic algorithm based Ligand docking method in Rosetta
We developed a new small molecule docking tool within Rosetta, GALigandDock, that enables fully automated on-the-fly sampling of both receptor and ligand conformational space. This docking tool iteratively evolves a pool of protein/ligand complex conformations against RosettaGenFF. It makes use of several key features broadly utilized in the ligand docking field: a motif-guided search for initial ligand placements, genetic algorithm optimization, and a grid-based energy precomputation.
Overview of Docking method.
GALigandDock accepts a single complex structure as input and searches for a pool of structures optimal for our generalized energy model through a genetic algorithm. While its basic algorithm adopts broadly accepted ideas in the ligand docking field, several unique features are also utilized. Only DOFs describing the ligand conformation (including 6 rigid body DOFs and DOFs describing rotatable torsions) are encoded into genes. If receptor flexibility is used, additional precomputation of the energy values of flexible parts is carried out; those “implicit” DOFs are optimized on-the-fly in their internal coordinates for every structure generated during the genetic algorithm. The protocol starts by optimizing receptor side-chains and their protonation states in the apo state (except for self-docking). Then part of the initial pool is generated by a motif-guided ligand conformation search (see below); this portion varies between 50% and 70% depending on the number of possible motif match combinations (the more combinations, the higher the portion), and the rest is generated from randomized genes.
At every iteration in the genetic algorithm, a gene undergoes either mutation (20% chance) or crossover (80% chance) with a randomly selected gene. For every generated conformation, receptor side-chains are optimized by a Monte Carlo (MC) search in discrete rotameric space followed by quasi-Newton minimization in all torsions including those in the ligand, repeating this twice by first ramping LJ repulsion scale from 0.1 at the first cycle to 1.0 at the second cycle. Flexible ring torsion sampling also follows this logic with bond length and angle terms to ensure the ring closure. Both MC and minimization are efficiently carried out using a 3D grid representation of energy (see below). In the 10,000 steps of MC search, one-body and two-body energies of rotamers precalculated at the beginning are utilized 33. Input rotamers possess constant bonuses of −2.0 kcal/mol in their one-body energies in order to prevent drifting away too much from the input. The 100 “parents” and 100 “children” are then pooled and trimmed to the lowest-energy 100 not closer than 1Å to one another; these 100 serve as the next generation’s “parents.” After 10 iterations, the top 20 structures are further side-chain optimized and backbone- and sidechain-minimized using the ungridded (continuous) energy. A single structure having the lowest complex energy is taken as a single representative.
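The pooling and trimming step can be sketched as a greedy selection. The greedy lowest-energy-first strategy, the generic `distance` callback standing in for ligand RMSD, and the function names are assumptions; the pool size of 100, the 1 Å separation, and the 20%/80% mutation/crossover split come from the text.

```python
import random


def choose_operator(rng, p_mutation=0.2):
    """Pick mutation with 20% probability, otherwise crossover."""
    return "mutation" if rng.random() < p_mutation else "crossover"


def trim_pool(candidates, distance, pool_size=100, min_dist=1.0):
    """Keep the lowest-energy candidates that are at least min_dist apart.

    candidates: iterable of (energy, pose) pairs (parents plus children).
    distance:   function(pose_a, pose_b) -> float, e.g. ligand RMSD.
    """
    selected = []
    for energy, pose in sorted(candidates, key=lambda c: c[0]):
        # Reject a pose that is too close to one already kept.
        if all(distance(pose, kept) >= min_dist for _, kept in selected):
            selected.append((energy, pose))
            if len(selected) == pool_size:
                break
    return selected


# Toy example with one-dimensional "poses" and absolute difference
# standing in for RMSD.
toy = [(-5.0, 0.0), (-4.0, 0.5), (-3.0, 2.0), (-2.0, 2.4), (-1.0, 5.0)]
next_parents = trim_pool(toy, lambda a, b: abs(a - b), pool_size=3)
```

The separation filter keeps the next generation diverse: in the toy example the second-best pose is dropped because it lies within 1 unit of the best one.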
GALigandDock supports a fully automated receptor flexibility logic. Initially, an ellipsoid is constructed around the input ligand conformation. Moments of inertia are computed and are scaled by the half size of the ligand box; if the moment of inertia along an axis is < 0.1 it is increased to 0.1 (for planar molecules). All protein residues whose average side-chain position overlaps this ellipsoid are assigned as flexible. On average 9.8 side-chains are assigned as flexible in the cross-docking benchmark set. A possible caveat is that these assignments can be sensitive to the initial ligand placement for an elongated ligand.
Each simulation is repeated 5 times for rigid docking and 16 times for receptor-flexible docking. The median runtimes of a single simulation on a single CPU thread are 8.5 and 19.7 minutes for rigid and receptor-flexible docking, respectively. Multiplying by the number of repeats per task, the median cost per target in this study is 0.7 and 5.3 core-hours, respectively. Simulation replicas can be run in parallel in multiple threads, so the actual wall-clock runtime is similar to the single simulation time. All computational performance figures in this study were benchmarked on Intel E5-2650 v2 2.2 GHz processors. Examples of running GALigandDock can be found in Supplemental Data.
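The per-target cost follows directly from the repeat counts and median runtimes; the short calculation below reproduces the arithmetic (the helper name `core_hours` is illustrative).

```python
def core_hours(n_repeats, minutes_per_run):
    """Total single-thread cost of repeated runs, in hours."""
    return n_repeats * minutes_per_run / 60.0


rigid_hours = core_hours(5, 8.5)       # 42.5 minutes in total
flexible_hours = core_hours(16, 19.7)  # 315.2 minutes in total
```

Rounded to one decimal place these give 0.7 and 5.3 core-hours, matching the values quoted in the text.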
3D grid representation of energy.
RosettaGenFF is represented in 3D energy grids around the ligand pocket, which allows over 10-fold speed-up of docking simulations 34. For each atom type in the ligand, a per-atom “energy field” is computed on a 0.25 Å grid in a cubic box covering the pocket. The size of the cubic box is allocated depending on the maximum heavy atom distance from center-of-mass of the ligand (rmax), more specifically, as
| [3] |
This results in a cubic box with an average edge length of 24 Å. The energy field summarizes the interaction of all rigid receptor atoms with an atom at a particular grid point, allowing ligands to be scored against the grid without explicit enumeration over individual atomic pairs. 3D spline interpolation is used to compute and minimize energies at off-grid points. Flexible side-chains do not contribute to grid energetics.
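The idea of scoring ligand atoms against a precomputed grid can be illustrated with the sketch below. Note the substitution: it uses trilinear interpolation for brevity, whereas the method described here uses 3D spline interpolation. The class name `EnergyGrid`, its layout, and the toy linear energy field are assumptions for illustration only.

```python
class EnergyGrid:
    """Per-atom-type energy field sampled on a regular cubic grid."""

    def __init__(self, origin, spacing, values):
        self.origin = origin    # (x0, y0, z0) of grid point [0][0][0]
        self.spacing = spacing  # grid spacing in Angstrom, e.g. 0.25
        self.values = values    # nested lists: values[i][j][k]
        self.shape = (len(values), len(values[0]), len(values[0][0]))

    def energy(self, point):
        """Trilinear interpolation of the field at an off-grid point."""
        idx, frac = [], []
        for p, o, n in zip(point, self.origin, self.shape):
            f = (p - o) / self.spacing
            if f < 0.0 or f > n - 1:
                raise ValueError("point lies outside the grid")
            i = min(int(f), n - 2)  # lower corner of the enclosing cell
            idx.append(i)
            frac.append(f - i)
        (i, j, k), (tx, ty, tz) = idx, frac
        total = 0.0
        for di, wx in ((0, 1.0 - tx), (1, tx)):
            for dj, wy in ((0, 1.0 - ty), (1, ty)):
                for dk, wz in ((0, 1.0 - tz), (1, tz)):
                    total += wx * wy * wz * self.values[i + di][j + dj][k + dk]
        return total

    def score_atoms(self, coords):
        """Sum grid energies over atoms of this type; no pairwise loop."""
        return sum(self.energy(c) for c in coords)


# Toy field E = 2x + 3y - z + 1 on a 5 x 5 x 5 grid with 0.25 Angstrom spacing.
h = 0.25
toy_values = [[[2 * i * h + 3 * j * h - k * h + 1.0 for k in range(5)]
               for j in range(5)] for i in range(5)]
grid = EnergyGrid((0.0, 0.0, 0.0), h, toy_values)
```

Because each lookup touches only the eight corners of one grid cell, the cost per ligand atom is independent of the number of receptor atoms, which is the source of the speed-up over explicit pair enumeration.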
Special treatment was required for several orientation-dependent terms, highlighted below (graphical illustrations are shown in Fig 2a). For each of the attractive and repulsive contributions to ELennard-Jones, and the isotropic portion of EImplicit-Solvation (see Eq. 1), separate grid tables were generated for each of the flexible atom types present. The grid table for the Coulombic term is unified into one representing the electric field. For the orientation-dependent hydrogen-bond term, the sparsity of interactions was exploited: a 3D hash table of receptor donors and acceptors was precomputed, allowing hydrogen bonds to be quickly identified and scored exactly with full consideration of orientation. For the orientation-dependent solvation terms 19,22,23, we could not exploit similar sparsity. Instead, these were approximated as the sum of two isotropic terms per atom: one based on the atom position, and one based on a “water-binding” virtual position. Comparing exact to grid-computed energies, we see a Pearson correlation of 0.95, with most of the error coming from the orientation-dependent solvation terms (0.84 Pearson correlation).
Fig2. Improved force field leads to more accurate small molecule pose predictions.

a) Schematic description of the Rosetta GALigandDock protocol. Graphical illustrations of steps highlighted in colors are shown in insets with corresponding colors (details in Methods). b) Self-docking results using RosettaGenFF and GALigandDock compared to the best reported results using state-of-the-art docking tools taken from the literature 47–49, tested on the Astex diverse set 35. Success rates as assessed by ligand RMSD < 1 Å and < 2 Å are shown in solid and patterned bars, respectively. Note that the total docking time per ligand for the methods in comparison was ~10 times shorter (a few minutes) according to Refs 47–49. “v.GAFF” stands for GALigandDock runs using GAFF instead of RosettaGenFF. c) Success rates using energy parameters from different stages of optimization; preopt, pre-optimized version; round1.3, after 3rd iteration; round1.6, after 6th iteration; round2.1, after first iteration of solvation parameter optimization (7th iteration in total); RosettaGenFF, the final parameter set. d-g) Examples of structures with highly accurate docked ligand poses. Ligand models are colored in gold for RosettaGenFF, in magenta for GAFF, and in cyan for RosettaGenFF with the torsion term replaced with ChemPLP used in GOLD 50. d) A high accuracy prediction with ligand RMSD of 0.2 Å for a molecule with 12 rotatable internal torsions. e) An example showing the importance of balance between torsion angle preference and non-bonded interactions, 1t40. Right panels, ligand internal energy profiles as a function of χ1 and χ2 torsions are shown for different energy functions. The torsion angles in the predicted pose are indicated by arrows using the color scheme of d-g); the values in the crystal structure are indicated by black arrows. f) An example highlighting the importance of the orientation-dependent hydrogen bonding term, 1uou. RosettaGenFF prefers a ligand pose with rich hydrogen bonding (RMSD 0.3 Å) while GAFF prefers one with more solvent exposure (RMSD 5.4 Å). g) An example of the benefit provided by the orientation-dependent water-bridging energy term (detailed algorithm described in Ref 22). The crystal water depicted as a red sphere is not modeled explicitly in the docking simulation, but the water-bridging term still gives a bonus when virtual water sites overlap (bottom inset), leading to an RMSD 0.2 Å prediction; the best pose by GAFF, which lacks this term, clashes with this water position (RMSD 1.4 Å).
Motif-guided ligand conformation search.
It is critical for the genetic algorithm to start with a pool of genes that are promising but also diverse. In initial testing, we found that fully randomized starting conformations had difficulty with ligands making hydrogen bonds deep inside the pocket. Therefore, a motif-guided placement strategy was applied for about 2/3 of our starting pool (50-70 models out of 100, with greater numbers for receptors with many pocket hydrogen bond donors or acceptors). All non-solvent-exposed hydrogen bonding sites in the receptor are identified, and “ideal waters” are built from these sites representing possible hydrogen-bonding ligand atom positions. These waters are clustered using a 4 Å radius, and the N clustered motifs with the best sum-of-grid-scores are selected, where N is kept between 5 and 15 by adjusting the solvent-exposure criterion. Groups of hydrogen-bonding atoms in ligands are defined as ligand motifs with the same clustering criteria. Motif matching and optimization of ligand conformation are then carried out for every possible pair combination of M receptor-to-ligand motif matches (M <= 70): for each motif match, we first translate the ligand to the position where the centers of mass of the selected motifs overlap, followed by random sampling of ligand orientation and torsion angles; the best conformation after 200 random trials is then minimized against the grid with distance restraints favoring the designated motif match. A maximum of 70 ligand conformations are generated, each from a unique motif match, prioritizing those matches with a higher sum-of-grid-score. If M <= 50, the search on matches with higher sum-of-grid-scores is repeated until 50 conformations are generated.
Ligand Docking Dataset.
We used the Astex diverse 35 and non-native 36 sets for self- and (non-native) cross-docking benchmarks, respectively. Ligand protonation states were kept as provided in the original mol2 files. When testing ligand docking using a conformation built directly from chemical connectivity (i.e. a SMILES string), the initial conformation was generated by CORINA 37 with a few corrections to the protonation states in the output: carboxylic acids and protonated phosphates were deprotonated (as CORINA overly prefers the protonated forms). We then further optimized the geometry with an AM1 calculation 38 using Antechamber in the AMBER suite, which helped improve the conformations of 5- and 6-membered rings from the CORINA outputs. An extended self-docking benchmark set consisting of 212 complexes was taken from a subset of previous work 30 (list in Supplemental Data).
RESULTS
Small molecule crystal lattice discrimination
We evaluated the parameterized force field by predicting the crystal lattice structures of 516 small molecules from the CSD not used in training. As the primary evaluation metric we use the “top10 success”, the number of cases in which one of the 10 lowest-energy structures is less than 1 Å RMSD from the experimentally observed lattice (crystal lattice prediction is a non-trivial challenge 5). We compared performance to the generalized Amber force field (GAFF) 1, which, like our energy function, is sufficiently fast that it can be used for drug discovery studies 39,40. In addition to its popularity and broad usage, GAFF had an advantage over other such force fields for validation, as it could be readily implemented in Rosetta (Fig S7) for direct comparison to RosettaGenFF; note that GAFF was not optimized using small molecule crystal data. On the validation set, RosettaGenFF outperformed GAFF both in the Boltzmann weight 19 of the observed crystal structure in the population of sampled structures and in the top10 success rate as defined above (Table 1; 58% by RosettaGenFF compared to 30% by GAFF). Two classes of functional groups stand out when the performances of RosettaGenFF and GAFF are compared on a per-group basis (Fig 1c, Fig S1–2). Improved results were obtained for polar conjugating groups (e.g. esters or aryl-nitros), likely because of the improved balance between torsional and non-bonded energy parameters, leading to better transferability across different chemical contexts. Improved results with hydrogen-bonding groups are likely due to the explicit treatment of the orientation dependence of hydrogen bonding in RosettaGenFF, an improvement over the GAFF isotropic point-charge model.
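As an illustration of the top10 success metric defined above, the following Python sketch counts a molecule as a success when any of its 10 lowest-energy sampled lattices lies within 1 Å RMSD of the experimental one. The function names and the input layout (parallel lists of energies and RMSDs per molecule) are assumptions for this sketch, not part of the published protocol.

```python
def top10_success(energies, rmsds, n_top=10, cutoff=1.0):
    """True if any of the n_top lowest-energy structures has RMSD < cutoff (Å)."""
    ranked = sorted(zip(energies, rmsds))[:n_top]  # lowest energy first
    return any(rmsd < cutoff for _, rmsd in ranked)

def top10_success_rate(cases, n_top=10, cutoff=1.0):
    """Percentage of molecules counted as successes.

    cases: list of (energies, rmsds) pairs, one pair per molecule.
    """
    hits = sum(top10_success(e, r, n_top, cutoff) for e, r in cases)
    return 100.0 * hits / len(cases)
```

A molecule whose only sub-Angstrom structure ranks 11th by energy is therefore a failure under this metric, even though it was sampled.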
Table 1.
Performance on various training tasks following optimization
| Set | Task | Measure | Pre-optimize | Optimized | GAFF |
|---|---|---|---|---|---|
| Training | Small molecule lattice docking | Boltzmann probability 1) | 0.470 | 0.652 | - |
| Training | Small molecule lattice docking | Top10 success rate (%) 2) | 39.3 | 63.6 | - |
| Training | Dihedral distribution | Mean KL-divergence | 0.355 | 0.225 | - |
| Training | Distance distribution | Mean KL-divergence | 0.173 | 0.162 | - |
| Training | Ligand pose prediction | Boltzmann probability 1) | 0.529 | 0.610 | - |
| Training | Hydration free energy | Error (kcal/mol) | 6.4 | 2.0 | - |
| Validation | Small molecule lattice docking 3) | Boltzmann probability 1) | 0.321 | 0.640 | 0.386 |
| Validation | Small molecule lattice docking 3) | Top10 success rate (%) 2) | 23.5 | 58.3 | 29.9 |
1) Boltzmann probability of selecting near-native structures against non-native ones 19. Reported values are averaged over 4 near-native definitions, corresponding to 1, 2, 4, and 6 Å of crystal-interface RMSD measured on the central asymmetric unit and all symmetry mates within 12 Å.
2) Success is defined as any sub-Angstrom structure among the 10 lowest-energy structures sampled.
3) Compared on a common set of 430 molecules having at least 5 sub-Angstrom structures sampled in all cases.
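The Boltzmann probability reported in Table 1 can be sketched in Python as the Boltzmann-weighted fraction of sampled structures that are near-native, averaged over the four RMSD cutoffs given in the table footnote. The temperature factor `kT` below is a placeholder assumption (the value used in Ref 19 is not restated here), and the function names are hypothetical.

```python
import math

def boltzmann_probability(energies, rmsds, cutoff, kT=1.0):
    """Boltzmann-weighted fraction of structures with RMSD < cutoff (Å).

    Energies are shifted by their minimum before exponentiation for
    numerical stability; the shift cancels in the ratio.
    """
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    near_native = sum(w for w, r in zip(weights, rmsds) if r < cutoff)
    return near_native / sum(weights)

def mean_boltzmann_probability(energies, rmsds,
                               cutoffs=(1.0, 2.0, 4.0, 6.0), kT=1.0):
    """Average of the Boltzmann probability over several near-native cutoffs."""
    probs = [boltzmann_probability(energies, rmsds, c, kT) for c in cutoffs]
    return sum(probs) / len(probs)
```

With two equal-energy structures at 0.5 Å and 3.0 Å, the probability is 0.5 at the 1 Å and 2 Å cutoffs and 1.0 at the 4 Å and 6 Å cutoffs, giving an average of 0.75.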
Even with explicit fitting to lattice data, there is clear room for improvement in our energy model. Proper consideration of polarization effects, in particular a general and higher-level description of anisotropic hydrogen bonding and orbital conjugations in torsions, is an important future direction. Methods with proper treatment of polarization effects -- such as density functional theory (DFT) methods or polarizable force fields 5 -- achieve better performances in crystal structure prediction, with top10 success rates of 70-80%. However, such methods are too slow for large-scale drug discovery problems. A force field with similar efficiency to ours by Broo et al 41, specifically designed for crystal lattice docking, performed similarly to ours (50% top10 success rate on their own test set, compared to 51% with RosettaGenFF on the same set). One possible future avenue for improvement would be introducing off-atom charges 42,43.
Small molecule docking with RosettaGenFF
We investigated the use of RosettaGenFF for small molecule docking calculations using the newly developed Rosetta GALigandDock. Accurate ligand pose prediction through molecular docking is of great importance in drug discovery, as it provides detailed information about interacting protein residues and is critical for accurate estimation of the relative or absolute binding free energy of potential binders 6,44,45. A unique strength of our approach comes from the grid representation of water-bridging effects 22 and hydrogen bonding in RosettaGenFF, both of which are orientation-dependent and have been identified as important features in ligand/protein energetics.
We first tested the new energy function and docking method on 85 complexes from the Astex diverse self-docking set 35, keeping the protein backbone and side-chains fixed. RosettaGenFF incorporated into GALigandDock produced lowest-energy models with a median RMSD of 0.45 Å, with success rates of 86/94% for predicting ligand conformations within 1/2 Å RMSD of the crystal conformation, and 31/56% within 0.3/0.5 Å of the crystal structure, respectively. This high success rate and atomic accuracy suggest that the new energy model successfully identifies both the correct minimum in a large conformational space and the precise geometry within the energy basin (Fig 2d–g). When docking calculations were performed on a set of ligand conformations built from scratch using chemical connectivity (i.e. SMILES) 46, the results were slightly worse, giving a median RMSD of 0.59 Å and success rates of 80/92% using the 1/2 Å criteria. Failures arose from input ligand structures that were not handled well in our docking simulations, especially ring puckering (Fig S3), which was not sampled efficiently by the current GALigandDock (see Methods). Despite these failures, the results were better than those of other methods on the same set (Fig 2b) 47–49. The combination of RosettaGenFF and GALigandDock on an extended docking set of 212 complexes, non-overlapping with any target in the other protein-ligand training/test sets, again showed performance superior to GOLD 50, with a 7% difference in success rates (Fig S4).
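The summary statistics quoted above (median RMSD and success rates at given cutoffs) can be computed as in the following Python sketch. It assumes that ligand RMSD is measured in the receptor frame without superposition and with a fixed atom correspondence; the handling of symmetry-equivalent atoms is not described in this section and is omitted. The function names are hypothetical.

```python
import math
from statistics import median

def ligand_rmsd(pred, ref):
    """RMSD (Å) between matched atom coordinates, with no superposition."""
    if len(pred) != len(ref):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pred, ref))
    return math.sqrt(squared / len(ref))

def summarize(rmsds, cutoffs=(1.0, 2.0)):
    """Median RMSD plus the percentage of cases below each cutoff."""
    summary = {"median": median(rmsds)}
    for c in cutoffs:
        summary[f"within_{c}A"] = 100.0 * sum(r < c for r in rmsds) / len(rmsds)
    return summary
```

Here `rmsds` would hold one value per complex, taken from the lowest-energy model of each docking run.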
We then repeated the test with variants of the energy function. A clear improvement was observed (Fig 2c) over the course of the optimization of RosettaGenFF. This is an encouraging result, as the docking benchmark is quite different from the crystal structure training set (the contribution from the ligand-pose discrimination test used in training was minor). Both sampling and scoring improved together through the optimization; after optimization, a pose within 1/2 Å RMSD was sampled within 250 samples in 92/98% of cases (Fig S5). We also ran the same docking benchmark i) using GAFF energy parameters or ii) replacing the torsion term with the empirical torsion term used in GOLD, keeping the energy model on the receptor unchanged in both tests. Poorer performance was obtained in both tests (Fig 2e–f), with success rates at RMSD < 2.0 Å of 78% and 84% for i) and ii), respectively. For comparison, we further tested iii) removal of the orientation-dependent solvation and hydrogen bond terms from RosettaGenFF. This resulted in an 88% success rate at RMSD < 2.0 Å, indicating that the gain from the force field parameters (vdW and torsion) over GAFF, independent of these orientation-dependent terms, is about 10 percentage points (88% versus 78%).
We next tested the effectiveness of our energy model and docking protocol on the more realistic non-native cross-docking problem, in which compounds are docked onto structures of the same proteins that were determined independently (i.e. in the apo state or bound to other compounds). GALigandDock allows any residue that can potentially interact with the ligand to sample alternative backbone and side-chain conformations, so that as many as 20 pocket residues are optimized along with the ligand conformation. This flexibility is enabled by the ability of the underlying Rosetta protein force field 19,23 to model the energetics of protein conformational changes, and by Rosetta’s tools for side-chain conformational sampling and energy minimization 33. We tested cross-docking performance on the Astex non-native set 36, a standard benchmark set consisting of 1,112 protein-ligand complexes. On this set, RosettaGenFF incorporated into GALigandDock achieved a median RMSD of 0.86 Å with success rates of 52/74% (using the criteria of ligand RMSD within 1/2 Å, respectively). This is an improvement of over 10% in success rate over any previously reported study on this set 30,36,51–54 (Fig 3a).
Fig 3. Incorporating receptor flexibility improves cross-docking.

a) Success rates in the cross-docking benchmark for various methods taken from the literature 30,36,51–54, tested on the Astex non-native set 36. Blue and red bars represent results from docking runs with and without receptor flexibility, respectively; solid and patterned bars show results by two criteria, ligand RMSD < 2 Å and < 1 Å, respectively. Sub-Angstrom success rates were not reported for the other methods. b) Per-protein cross-docking performance comparison between docking with (Y-axis) and without (X-axis) receptor flexibility. The size of the points represents the number of alternative protein conformations, from largest (>50) to smallest (<10); colors represent the fraction of conformations with pocket RMSD improved or unchanged by flexibility, from 0.0 (black) to 0.8 (yellow). c-e) Examples in which flexible docking improves prediction. Top and bottom panels are predictions without and with receptor flexibility, respectively. Crystal poses are shown in gold, and predicted ligand poses starting from multiple receptor conformations in blue (top panels) or white (bottom panels). c) 1hww: a clash with arginine is relieved, increasing the fraction of predictions with sub-Angstrom accuracy from 18% to 95%. d) 1g9v: rotameric search on lysine helps increase sub-Angstrom accuracy from 22% to 60%. e) 1lpz, starting conformation from PDB ID 1f0s: backbone flexibility allows the orientation of tyrosine to be corrected, leading to a ligand RMSD of 0.9 Å (10.4 Å without receptor flexibility).
Comparing these results to those without receptor flexibility showed that improvements in ligand pose accuracy primarily came from complexes in which pocket side-chain accuracy also increased (Fig 3b), through relieving small clashes (Fig 3c), correcting wrong side-chain rotameric states (Fig 3d), and modeling small backbone conformational changes (Fig 3e); note that all of these were achieved with fully automated flexibility annotations. Of 276 complexes with initial models having relatively accurate backbones (i.e. RMSD < 1 Å at the backbone atoms annotated as flexible by the automated logic) but for which rigid-receptor docking failed, about half (139) were successfully docked (ligand RMSD < 2 Å) following incorporation of receptor flexibility. The remaining 137 complexes, for which both strategies failed, showed on average similar docking results (ligand RMSD of 6.9 Å and 6.8 Å with and without receptor flexibility, respectively); for these cases no clear correlation was found between ligand RMSD and the structural error in the binding pocket. When receptor flexibility was employed for the self-docking problems, the success rate decreased to 80/88% (from 86/94% without flexibility), likely due to the increased search space and additional noise in energy values introduced by the side-chains. Overall, the balance between protein and non-protein energetics is clearly important for flexible backbone docking 43,55.
DISCUSSION
The small molecule docking results described in this paper demonstrate the power of using prediction of small molecule crystal lattices, a new source of data, to drive energy model parameterization for accurate molecular docking studies. RosettaGenFF outperforms previously reported approaches in pose prediction for structure-based native docking and non-native cross-docking. In the context of the functional forms used, the current energy model may be quite close to optimal for protein/ligand docking: when any of the energy components or the flexibility treatment was varied from the current implementation, a worsening of around 10% was observed in cross-docking (Table S2). Avenues for future improvement include improving the underlying physical model, for example by a) introducing an efficient polarizable and/or multipole electrostatic model 56,57 and b) adding bonded terms for ring systems together with a ring sampling operator 58, as the current implementation has a weakness in non-aromatic ring conformation search. The large amount of small molecule crystal data that was not used in this study could be utilized for this further development, which could improve coverage of chemical diversity. Incorporation of quantum chemistry data during training could further improve the model, particularly for binding free energy calculations.
Application of the combination of RosettaGenFF and GALigandDock to high-precision virtual screening will be enhanced by increasing computational efficiency to allow higher throughput docking calculations. Improving computational efficiency while minimizing the loss in accuracy is a key direction of future studies. GPU-accelerated calculation and algorithmic improvements, such as a “competition-style” model where ligand identity can change along with ligand conformation in the genetic algorithm, should improve run-time, allowing for screening against very large ligand libraries.
Acknowledgements
We thank Dr. Timothy Craven, Dr. Gaurav Bhardwaj, Dr. Patrick Salveson, Jacob O’connor at University of Washington, Dr. Douglas Renfrew at Flatiron Institute, Dr. Rocco Moretti at Vanderbilt University, and Dr. Jason Labonte at Gettysburg College for their help in designing the project and for helpful discussions. We also thank Dr. Chaok Seok at Seoul National University, Dr. Philip Bradley at Fred Hutchinson Cancer Research Center, and Dr. Ryan Pavlovicz at Cyrus biotech for their advice on the manuscript. Computing resources for this work were provided by the Hyak supercomputer system at the University of Washington.
Footnotes
Supporting Information
List of atom types in RosettaGenFF; performance of docking benchmarks introducing variations in energy or protocol; per-group decomposition of Fig 1c; examples of small molecule crystal structure prediction highlighting difference between energy models; examples of docking failures starting from SMILES string; benchmark result on extended self-docking set; progress in self-docking scoring and sampling performance through the parameter optimization; prediction of thermodynamic properties with RosettaGenFF; reproducibility of GAFF by in-house version; list of PDB IDs in the extended docking set; example for generating a Rosetta parameter file containing generic atom types; example input for small molecule crystal structure prediction; and example input for running GALigandDock (pdf)
A zipped file containing the receptor pdbs and ligand mol2 files of the extended docking test set (zip)
The Supporting Information is available free of charge via the Internet at http://pubs.acs.org.
Code availability
- Energy function: https://www.rosettacommons.org/docs/latest/Updates-beta-genpot
References
- 1. Wang J; Wolf RM; Caldwell JW; Kollman PA; Case DA Development and Testing of a General Amber Force Field. J. Comput. Chem 2004, 25 (9), 1157–1174.
- 2. Vanommeslaeghe K; Raman EP; MacKerell AD Jr. Automation of the CHARMM General Force Field (CGenFF) II: Assignment of Bonded Parameters and Partial Atomic Charges. J. Chem. Inf. Model 2012, 52 (12), 3155–3168.
- 3. Roos K; Wu C; Damm W; Reboul M; Stevenson JM; Lu C; Dahlgren MK; Mondal S; Chen W; Wang L; Abel R; Friesner RA; Harder ED OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. J. Chem. Theory Comput 2019, 15 (3), 1863–1874.
- 4. Halgren TA Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization, and Performance of MMFF94. J. Comput. Chem 1996, 17 (5), 490–519.
- 5. Reilly AM; Cooper RI; Adjiman CS; Bhattacharya S; Boese AD; Brandenburg JG; Bygrave PJ; Bylsma R; Campbell JE; Car R; Case DH; Chadha R; Cole JC; Cosburn K; Cuppen HM; Curtis F; Day GM; DiStasio RA Jr; Dzyabchenko A; van Eijck BP; Elking DM; van den Ende JA; Facelli JC; Ferraro MB; Fusti-Molnar L; Gatsiou CA; Gee TS; de Gelder R; Ghiringhelli LM; Goto H; Grimme S; Guo R; Hofmann DWM; Hoja J; Hylton RK; Iuzzolino L; Jankiewicz W; de Jong DT; Kendrick J; de Klerk NJJ; Ko HY; Kuleshova LN; Li X; Lohani S; Leusen FJJ; Lund AM; Lv J; Ma Y; Marom N; Masunov AE; McCabe P; McMahon DP; Meekes H; Metz MP; Misquitta AJ; Mohamed S; Monserrat B; Needs RJ; Neumann MA; Nyman J; Obata S; Oberhofer H; Oganov AR; Orendt AM; Pagola GI; Pantelides CC; Pickard CJ; Podeszwa R; Price LS; Price SL; Pulido A; Read MG; Reuter K; Schneider E; Schober C; Shields GP; Singh P; Sugden IJ; Szalewicz K; Taylor CR; Tkatchenko A; Tuckerman ME; Vacarro F; Vasileiadis M; Vazquez-Mayagoitia A; Vogt L; Wang Y; Watson RE; de Wijs GA; Yang J; Zhu Q; Groom CR Report on the Sixth Blind Test of Organic Crystal Structure Prediction Methods. Acta Crystallogr B Struct Sci Cryst Eng Mater. 2016, 72 (4), 439–459.
- 6. Yin J; Henriksen NM; Slochower DR; Shirts MR; Chiu MW; Mobley DL; Gilson MK Overview of the SAMPL5 Host-Guest Challenge: Are We Doing Better? J. Comput. Aided Mol. Des 2017, 31 (1), 1–19.
- 7. Boulanger E; Huang L; Rupakheti C; MacKerell AD; Roux B Optimized Lennard-Jones Parameters for Druglike Small Molecules. J. Chem. Theory Comput 2018, 14 (6), 3121–3131.
- 8. Mobley D; Bannan CC; Rizzi A; Bayly CI; Chodera JD; Lim VT; Lim NM; Beauchamp KA; Shirts MR; Gilson MK; Eastman PK Open Force Field Consortium: Escaping Atom Types Using Direct Chemical Perception with SMIRNOFF v0.1. bioRxiv. 10.1101/286542.
- 9. Qiu Y; Smith D; Boothroyd S; Jang H; Wagner J; Bannan CC; Gokey T; Lim VT; Stern C; Rizzi A; Lucas X; Tjanaka B; Shirts MR; Gilson M; Chodera J; Bayly CI; Mobley D; Wang L-P Development and Benchmarking of Open Force Field v1.0.0, the Parsley Small Molecule Force Field. ChemRxiv. 10.26434/chemrxiv.13082561.v1.
- 10. Groom CR; Allen FH The Cambridge Structural Database in Retrospect and Prospect. Angew. Chem. Int. Ed Engl 2014, 53 (3), 662–671.
- 11. Brameld KA; Kuhn B; Reuter DC; Stahl M Small Molecule Conformational Preferences Derived from Crystal Structure Data. A Medicinal Chemistry Focused Analysis. J. Chem. Inf. Model 2008, 48 (1), 1–24.
- 12. Pillardy J; Arnautova YA; Czaplewski C; Gibson KD; Scheraga HA Conformation-Family Monte Carlo: A New Method for Crystal Structure Prediction. Proc. Natl. Acad. Sci. U. S. A 2001, 98 (22), 12351–12356.
- 13. Price SL; Braun DE; Reutzel-Edens SM Can Computed Crystal Energy Landscapes Help Understand Pharmaceutical Solids? Chem. Commun 2016, 52 (44), 7065–7077.
- 14. Cruz-Cabeza AJ; Reutzel-Edens SM; Bernstein J Facts and Fictions about Polymorphism. Chem. Soc. Rev 2015, 44 (23), 8619–8635.
- 15. Pillardy J; Wawak RJ; Arnautova YA; Czaplewski C; Scheraga HA Crystal Structure Prediction by Global Optimization as a Tool for Evaluating Potentials: Role of the Dipole Moment Correction Term in Successful Predictions. J. Am. Chem. Soc 2000, 122 (5), 907–921.
- 16. Velec HFG; Gohlke H; Klebe G DrugScore(CSD)-Knowledge-Based Scoring Function Derived from Small Molecule Crystal Data with Superior Recognition Rate of near-Native Ligand Poses and Better Affinity Prediction. J. Med. Chem 2005, 48 (20), 6296–6303.
- 17. Groom CR; Bruno IJ; Lightfoot MP; Ward SC The Cambridge Structural Database. Acta Crystallogr B Struct Sci Cryst Eng Mater 2016, 72 (2), 171–179.
- 18. André I; Bradley P; Wang C; Baker D Prediction of the Structure of Symmetrical Protein Assemblies. Proc. Natl. Acad. Sci. U. S. A 2007, 104 (45), 17656–17661.
- 19. Park H; Bradley P; Greisen P Jr; Liu Y; Mulligan VK; Kim DE; Baker D; DiMaio F Simultaneous Optimization of Biomolecular Energy Functions on Features from Small Molecules and Macromolecules. J. Chem. Theory Comput 2016, 12 (12), 6201–6212.
- 20. DiMaio F; Leaver-Fay A; Bradley P; Baker D; André I Modeling Symmetric Macromolecular Structures in Rosetta3. PLoS One 2011, 6 (6), e20450.
- 21. O’Boyle NM; Vandermeersch T; Flynn CJ; Maguire AR; Hutchison GR Confab - Systematic Generation of Diverse Low-Energy Conformers. J. Cheminform 2011, 3, 8.
- 22. Pavlovicz RE; Park H; DiMaio F Efficient Consideration of Coordinated Water Molecules Improves Computational Protein-Protein and Protein-Ligand Docking Discrimination. PLoS Comput. Biol 2020, 16 (9), e1008103.
- 23. Alford RF; Leaver-Fay A; Jeliazkov JR; O’Meara MJ; DiMaio FP; Park H; Shapovalov MV; Renfrew PD; Mulligan VK; Kappel K; Labonte JW; Pacella MS; Bonneau R; Bradley P; Dunbrack RL Jr; Das R; Baker D; Kuhlman B; Kortemme T; Gray JJ The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput 2017, 13 (6), 3031–3048.
- 24. Jakalian A; Jack DB; Bayly CI Fast, Efficient Generation of High-Quality Atomic Charges. AM1-BCC Model: II. Parameterization and Validation. J. Comput. Chem 2002, 23 (16), 1623–1641.
- 25. Jorgensen WL; Maxwell DS; Tirado-Rives J Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. J. Am. Chem. Soc 1996, 118 (45), 11225–11236.
- 26. Clark M; Cramer RD; Van Opdenbosch N Validation of the General Purpose Tripos 5.2 Force Field. J. Comput. Chem 1989, 10 (8), 982–1012.
- 27. Wang J; Hou T Application of Molecular Dynamics Simulations in Molecular Property Prediction I: Density and Heat of Vaporization. J. Chem. Theory Comput 2011, 7 (7), 2151–2165.
- 28. Dahlgren MK; Schyman P; Tirado-Rives J; Jorgensen WL Characterization of Biaryl Torsional Energetics and Its Treatment in OPLS All-Atom Force Fields. J. Chem. Inf. Model 2013, 53 (5), 1191–1199.
- 29. Nelder JA; Mead R A Simplex Method for Function Minimization. Comput. J 1965, 7 (4), 308–313.
- 30. Baek M; Shin W-H; Chung HW; Seok C GalaxyDock BP2 Score: A Hybrid Scoring Function for Accurate Protein–ligand Docking. J. Comput. Aided Mol. Des 2017, 31 (7), 653–666.
- 31. Mobley DL; Guthrie JP FreeSolv: A Database of Experimental and Calculated Hydration Free Energies, with Input Files. J. Comput. Aided Mol. Des 2014, 28 (7), 711–720.
- 32. Radzicka A; Wolfenden R Comparing the Polarities of the Amino Acids: Side-Chain Distribution Coefficients between the Vapor Phase, Cyclohexane, 1-Octanol, and Neutral Aqueous Solution. Biochemistry 1988, 27 (5), 1664–1670.
- 33. Leaver-Fay A; Tyka M; Lewis SM; Lange OF; Thompson J; Jacak R; Kaufman K; Renfrew PD; Smith CA; Sheffler W; Davis IW; Cooper S; Treuille A; Mandell DJ; Richter F; Ban Y-EA; Fleishman SJ; Corn JE; Kim DE; Lyskov S; Berrondo M; Mentzer S; Popović Z; Havranek JJ; Karanicolas J; Das R; Meiler J; Kortemme T; Gray JJ; Kuhlman B; Baker D; Bradley P ROSETTA3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules. Methods Enzymol. 2011, 487, 545–574.
- 34. Meng EC; Shoichet BK; Kuntz ID Automated Docking with Grid-Based Energy Evaluation. J. Comput. Chem 1992, 13 (4), 505–524.
- 35. Hartshorn MJ; Verdonk ML; Chessari G; Brewerton SC; Mooij WTM; Mortenson PN; Murray CW Diverse, High-Quality Test Set for the Validation of Protein–Ligand Docking Performance. J. Med. Chem 2007, 50 (4), 726–741.
- 36. Verdonk ML; Mortenson PN; Hall RJ; Hartshorn MJ; Murray CW Protein–Ligand Docking against Non-Native Protein Conformers. J. Chem. Inf. Model 2008, 48 (11), 2214–2225.
- 37. Sadowski J; Gasteiger J; Klebe G Comparison of Automatic Three-Dimensional Model Builders Using 639 X-Ray Structures. J. Chem. Inf. Comput. Sci 1994, 34 (4), 1000–1008.
- 38. Dewar MJS; Zoebisch EG; Healy EF; Stewart JJP Development and Use of Quantum Mechanical Molecular Models. 76. AM1: A New General Purpose Quantum Mechanical Molecular Model. J. Am. Chem. Soc 1985, 107 (13), 3902–3909.
- 39. Okimoto N; Futatsugi N; Fuji H; Suenaga A; Morimoto G; Yanai R; Ohno Y; Narumi T; Taiji M High-Performance Drug Discovery: Computational Screening by Combining Docking and Molecular Dynamics Simulations. PLoS Comput. Biol 2009, 5 (10), e1000528.
- 40. Suenaga A; Okimoto N; Hirano Y; Fukui K An Efficient Computational Method for Calculating Ligand Binding Affinities. PLoS One 2012, 7 (8), e42846.
- 41. Broo A; Nilsson Lill SO Transferable Force Field for Crystal Structure Predictions, Investigation of Performance and Exploration of Different Rescoring Strategies Using DFT-D Methods. Acta Crystallogr B Struct Sci Cryst Eng Mater. 2016, 72 (4), 460–476.
- 42. Jorgensen WL; Chandrasekhar J; Madura JD; Impey RW; Klein ML Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys 1983, 79 (2), 926–935.
- 43. Harder E; Damm W; Maple J; Wu C; Reboul M; Xiang JY; Wang L; Lupyan D; Dahlgren MK; Knight JL; Kaus JW; Cerutti DS; Krilov G; Jorgensen WL; Abel R; Friesner RA OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins. J. Chem. Theory Comput 2016, 12 (1), 281–296.
- 44. Wang L; Berne BJ; Friesner RA On Achieving High Accuracy and Reliability in the Calculation of Relative Protein-Ligand Binding Affinities. Proc. Natl. Acad. Sci. U. S. A 2012, 109 (6), 1937–1942.
- 45. Wang L; Wu Y; Deng Y; Kim B; Pierce L; Krilov G; Lupyan D; Robinson S; Dahlgren MK; Greenwood J; Romero DL; Masse C; Knight JL; Steinbrecher T; Beuming T; Damm W; Harder E; Sherman W; Brewer M; Wester R; Murcko M; Frye L; Farid R; Lin T; Mobley DL; Jorgensen WL; Berne BJ; Friesner RA; Abel R Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J. Am. Chem. Soc 2015, 137 (7), 2695–2703.
- 46. Weininger D SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci 1988, 28 (1), 31–36.
- 47. Repasky MP; Murphy RB; Banks JL; Greenwood JR; Tubert-Brohman I; Bhat S; Friesner RA Docking Performance of the Glide Program as Evaluated on the Astex and DUD Datasets: A Complete Set of Glide SP Results and Selected Results for a New Scoring Function Integrating WaterMap and Glide. J. Comput. Aided Mol. Des 2012, 26 (6), 787–799.
- 48. Liebeschuetz JW; Cole JC; Korb O Pose Prediction and Virtual Screening Performance of GOLD Scoring Functions in a Standardized Test. J. Comput. Aided Mol. Des 2012, 26 (6), 737–748.
- 49. Spitzer R; Jain AN Surflex-Dock: Docking Benchmarks and Real-World Application. J. Comput. Aided Mol. Des 2012, 26 (6), 687–699.
- 50. Korb O; Stützle T; Exner TE Empirical Scoring Functions for Advanced Protein-Ligand Docking with PLANTS. J. Chem. Inf. Model 2009, 49 (1), 84–96.
- 51. Ruiz-Carmona S; Alvarez-Garcia D; Foloppe N; Garmendia-Doval AB; Juhos S; Schmidtke P; Barril X; Hubbard RE; Morley SD rDock: A Fast, Versatile and Open Source Program for Docking Ligands to Proteins and Nucleic Acids. PLoS Comput. Biol 2014, 10 (4), e1003571.
- 52. Rarey M; Kramer B; Lengauer T; Klebe G A Fast Flexible Docking Method Using an Incremental Construction Algorithm. J. Mol. Biol 1996, 261 (3), 470–489.
- 53. Gaudreault F; Najmanovich RJ FlexAID: Revisiting Docking on Non-Native-Complex Structures. J. Chem. Inf. Model 2015, 55 (7), 1323–1336.
- 54. Tanchuk VY; Tanin VO; Vovk AI; Poda G A New, Improved Hybrid Scoring Function for Molecular Docking and Scoring Based on AutoDock and AutoDock Vina. Chem Biol Drug Des. 2016, 87 (4), 618–625.
- 55. Sherman W; Day T; Jacobson MP; Friesner RA; Farid R Novel Procedure for Modeling Ligand/receptor Induced Fit Effects. J. Med. Chem 2006, 49 (2), 534–553.
- 56. Lemkul JA; Huang J; Roux B; MacKerell AD Jr. An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem. Rev 2016, 116 (9), 4983–5013.
- 57. Jakobsen S; Jensen F Systematic Improvement of Potential-Derived Atomic Multipoles and Redundancy of the Electrostatic Parameter Space. J. Chem. Theory Comput 2014, 10 (12), 5493–5504.
- 58. Jain AN Surflex-Dock 2.1: Robust Performance from Ligand Energetic Modeling, Ring Flexibility, and Knowledge-Based Search. J. Comput. Aided Mol. Des 2007, 21 (5), 281–306.
