Abstract
Accurate and rapid estimation of the relative binding affinities of ligand-protein complexes is a requirement for the effective use of computational methods in rational ligand design. Of the approaches commonly used, free energy perturbation (FEP) methods are considered among the most accurate, though they require significant computational resources. Accordingly, alternative methods of similar accuracy but greater computational efficiency are desirable to facilitate ligand design. In the present study, relative free energies of binding are estimated for one or two non-hydrogen atom changes in compounds targeting the proteins ACK1 and p38 MAP kinase using three methods: standard FEP, single-step FEP (SSFEP) and the site-identification by ligand competitive saturation (SILCS) ligand grid free energy (LGFE) approach. Results show the SSFEP and SILCS LGFE methods to be competitive with or better than FEP for the studied systems, with SILCS LGFE giving the best agreement with experiment. This is supported by additional comparisons with published FEP data on p38 MAP kinase inhibitors. While both the SSFEP and SILCS LGFE approaches require a significant upfront computational investment, once those pre-computations are complete they offer a 1000-fold computational savings over FEP for calculating the relative affinities of ligand modifications. An illustrative example of the potential application of these methods to screening large numbers of transformations is presented. Thus, the SSFEP and SILCS LGFE approaches represent viable alternatives for actively driving ligand design during drug discovery and development.
Graphical abstract
Accurate and rapid prediction of ligand potencies by two pre-computed ensemble-based methods, Single Step Free Energy Perturbation (SSFEP) and Site Identification by Ligand Competitive Saturation (SILCS), is evaluated for the ACK1 and p38 MAP kinase targets. Both SILCS and SSFEP are competitive with or better than FEP for the studied systems while being more than 1000-fold faster. A potential application of these methods to screening large numbers of transformations is also illustrated.
Introduction
The discovery and design of ligands based on ligand-protein 3D structural information represents a primary means for the development of new drugs and other agents such as herbicides and pesticides. Significant progress has been made in methods for identifying novel ligands that target a protein, an area of computer-aided drug design (CADD)1,2 that includes database screening3,4 and fragment-based design.5,6 CADD approaches for ligand optimization have also progressed significantly. Advances in screening methods include novel pharmacophore approaches7–10 and more accurate treatment of water, while ligand optimization approaches have seen methodological improvements in sampling and in the estimation of free energy differences,11 such as the Bennett Acceptance Ratio method12,13 in the context of free energy perturbation (FEP) calculations.14,15 While these advances have improved accuracy, there is still significant room for improvement, and computational cost remains a consideration, especially given the vast chemical space available to drug-like molecules.
The utility of FEP methods for ligand design has increased with growing computational power, improved force fields and methodological developments.11,16 Gains in computational power, which are intimately linked to improved software, have come from increased CPU/core availability and, more notably, the ability to use GPUs effectively for molecular simulations. Force field improvements include both accuracy and coverage, as well as the ability to readily generate the topologies and parameters required for a wide range of chemical space.17,18 Important for improved convergence in FEP calculations are enhanced sampling methods, including combined FEP-Hamiltonian replica exchange molecular dynamics (HREMD)19,20 and orthogonal sampling methods,21,22 together with advances in the Bennett Acceptance Ratio (BAR) method12 for calculating free energy differences. Methods aimed at rapidly evaluating larger numbers of modifications via FEP include the one-step perturbation (OSP)23–25 and single-step FEP (SSFEP)26 approaches. These approaches allow free energy differences for large numbers of modifications to be calculated from a single ensemble of conformations. However, the two methods differ significantly in how the pre-computed ensemble is generated: the OSP approach uses a fictitious reference compound in combination with soft-core potentials, while the SSFEP approach simply uses MD simulations of the parent ligand in complex with the protein and in solution. Such approaches allow large numbers of ligand modifications to be evaluated rapidly, though they are limited in the magnitude of the chemical change that can be made in the perturbation. For example, to date the SSFEP approach has only been applied to single non-hydrogen atom modifications, such as converting an aromatic hydrogen into a methyl group.
An alternative pre-computed ensemble approach is the site-identification by ligand competitive saturation (SILCS) method.27–29 In SILCS the target protein is subjected to simulations in an aqueous solution that contains multiple organic solutes that represent different types of functional groups. These can be MD simulations or a combination of oscillating chemical potential (μex) Grand Canonical Monte Carlo/MD simulations that allow for application of the SILCS approach to proteins with deep or occluded pockets.30,31 From these simulations, which include protein flexibility, 3D probability distributions of the different types of functional groups are obtained, normalized with respect to the functional group probabilities in aqueous solution in the absence of protein and subsequently converted to free energies based on a Boltzmann transformation. The resulting grid free energy (GFE) fragment maps (FragMaps) may then be used to quantitatively estimate ligand binding affinities based on an additive atom contribution yielding ligand GFE (LGFE) scores.28 As these are based on pre-computed GFE FragMaps they are rapidly computed, such that they may be used as the energy function in a SILCS-based Monte Carlo conformational sampling approach (MC-SILCS).28
In the present study we compare the ability of three methods, standard FEP, SSFEP and SILCS LGFE, to reproduce the experimental relative binding affinities of known ligands for two proteins, ACK1 and p38 MAP kinase. These two proteins were chosen because high-quality experimental data for closely related analogs spanning a wide range of relative affinities are available for them. For both targets the compound sets include analogs differing by a single heavy atom, making them particularly relevant for this study; such small ligand modifications are well known to produce large variations in activity.32 The bound orientations of the ligand series differ significantly between the two proteins. These proteins are also of therapeutic interest, so the results presented here provide a real-world comparison of the three methods in terms of both accuracy and computational efficiency.
Methods
Proteins
For ACK1, the initial protein coordinates for the FEP and SSFEP simulations were obtained from the PDB33 crystal structures 3EQR34 and 4EHW35 (see the specific ligands in the following paragraph), with 3EQR used to initiate the SILCS simulations. For p38 MAP kinase, 3FLY and 3D7Z36 were used for the FEP and SSFEP simulations, with 3FLY used for the SILCS simulations. Water molecules more than 6 Å away from the binding pocket were removed. Missing loops and residues were modeled using Prime (Schrödinger, LLC).37 Hydrogen atoms were added to all heavy atoms with the Protein Preparation Wizard in Maestro (Schrödinger, LLC).38 The pdb2gmx program of the GROMACS39 suite was then used to prepare the topology and structure of the protein.
Ligands
The initial orientations of the ligands were based on the crystallographic structures. The modified ligands were built in Maestro (Schrödinger, LLC)38 by adding the respective modifications directly to those structures, with the positions of the remainder of each ligand initially maintained. For both the parent and the modified ligands, the appropriate protonation states were assigned using the Epik package40 in the ligand preparation wizard at a physiologically relevant pH of 7.4. This was followed by restrained minimization of the proteins and ligands. For ACK1, the crystal structure 3EQR34 was used to model L1 and L2, and 4EWH35 was used to model L3, L4 and L5 (Fig. 1A). For p38 MAP kinase, the SSFEP and FEP calculations were initiated from the crystal structure 3FLY for L9, L10, L11, L13, L14 and L15, while 3DS641 was used to model L6, 2RG642 to model L7 and L8, and 3D7Z36 to model L12. The SILCS simulations were based on crystal structure 3FLY, with the ligands placed following RMSD alignment of the protein. For the SILCS LGFE simulations performed on the Goldstein et al. ligands,43 the modified ligands were built based on the crystallographic orientation of the ligand in structure 3FLY.
Figure 1.
Structures of studied compounds, including the applied chemical transformations for A) ACK1 and B) p38 MAP kinase.
MD simulation setup
MD simulations were performed using the GROMACS simulation package.39 The CHARMM36 force field44 was used to model the proteins and the CHARMM General Force Field (CGenFF45) the ligands. The automated CGenFF program46,47 (also available via the ParamChem interface at cgenff.paramchem.org) was used to generate the ligand topologies and parameters by analogy to existing small molecules in CGenFF, without any additional optimization. An integration time step of 2 fs was used, and all bonds involving hydrogen atoms were constrained using the LINCS algorithm.48 The standard FEP calculations used a time step of 1 fs for higher accuracy and stability. A long-range dispersion correction was applied to the energy and pressure. The MD simulation setup used for the SILCS and SSFEP methods was identical. Van der Waals (vdW) interactions were smoothly switched off (force-switch) between 5 and 8 Å. Long-range electrostatics were treated using the PME method49 with a real-space cutoff of 10 Å. All systems were minimized using the steepest descent algorithm, followed by equilibration using velocity rescaling. Equilibration was performed in the NVT ensemble and production in the NPT ensemble, with the Parrinello-Rahman50 barostat maintaining the pressure at 1 bar. During production, the Nosé-Hoover thermostat51,52 was used to maintain the temperature at 298 K. As described previously,28–30 the SILCS simulations used weak harmonic restraints on the Cα atoms (0.12 kcal/mol/Å²) to prevent the protein from drifting in the simulation box and to prevent denaturation. The SSFEP MD simulations used restraints of 2.5 kcal/mol/Å² on all non-hydrogen atoms. In the FEP calculations, vdW interactions were smoothly switched off (force-switch) between 10 and 12 Å, and long-range electrostatics were treated using the PME method with a real-space cutoff of 12 Å.
In the FEP calculations, the stochastic dynamics ("sd") integrator of the GROMACS package was used for both equilibration and production; this integrator also maintained the temperature at 298 K.
SILCS simulations
The SILCS method28,29,53 involves simulations of a protein immersed in an aqueous solution of small solute molecules representative of functional groups found in drug-like molecules. During the simulations, the solutes interact reversibly with the protein, allowing the affinity of the protein for different chemical functionalities to be mapped. The SILCS simulation data are analyzed by discretizing the positions of the atoms of the probe fragment molecules onto a 3D grid of cubic voxels with 1 Å sides spanning the simulation volume, yielding 3D occupancy histograms. These histograms, called fragment maps or FragMaps, are then normalized by the bulk concentration of the fragments and converted to discretized free energy values, termed grid free energies (GFE). In this study, benzene, propane, methanol, formamide, imidazole, acetaldehyde, methylammonium and acetate were used as the diverse solute set, each at a concentration of 0.25 M, in an aqueous environment (~55 M water) surrounding the protein.
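The conversion from voxel occupancies to GFE values described above amounts to a Boltzmann inversion of each voxel's occupancy relative to the occupancy expected at the bulk solute concentration. The following is a minimal NumPy sketch under stated assumptions: one tracked atom per probe molecule, a 1 Å grid, 298 K, and a hypothetical `empty_gfe` value for voxels that are never visited (the text does not say how empty voxels are treated). It is an illustration, not the SILCS software.

```python
import numpy as np

KB_KCAL = 0.0019872041    # Boltzmann constant, kcal/(mol K)
AVOGADRO = 6.02214076e23  # 1/mol


def grid_free_energy(coords, n_frames, bulk_conc_molar, origin, shape,
                     spacing=1.0, temperature=298.0, empty_gfe=3.0):
    """Convert pooled probe-atom positions into a GFE FragMap (kcal/mol).

    coords:    (N, 3) positions in Angstrom of one probe-atom type, pooled
               over all analyzed frames (one tracked atom per molecule assumed)
    empty_gfe: hypothetical value assigned to voxels that were never occupied
    """
    edges = [origin[d] + spacing * np.arange(shape[d] + 1) for d in range(3)]
    counts, _ = np.histogramdd(np.asarray(coords, dtype=float), bins=edges)
    # Expected occupancy of one voxel over all frames at the bulk
    # concentration: c (mol/L) * N_A * V_voxel, with 1 A^3 = 1e-27 L.
    expected = bulk_conc_molar * AVOGADRO * spacing**3 * 1e-27 * n_frames
    kt = KB_KCAL * temperature
    gfe = np.full(counts.shape, empty_gfe)
    occupied = counts > 0
    # Boltzmann inversion of the bulk-normalized occupancy
    gfe[occupied] = -kt * np.log(counts[occupied] / expected)
    return gfe
```

A voxel visited exactly as often as in bulk solution gets a GFE of 0; favorable voxels are negative and disfavored ones positive.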
A previously published extension of the SILCS approach that intersperses MD simulations with oscillating μex GCMC sampling to enhance fragment sampling was employed.30,31 This GCMC-MD SILCS protocol exchanges solute molecules more efficiently than MD simulations alone. For each protein system, ten GCMC/MD simulations were run. Each of these 10 simulations consisted of 20 cycles of GCMC-only sampling followed by 100 cycles of combined GCMC and MD, with each cycle involving 100,000 GCMC steps and 0.5 ns of MD, yielding a cumulative 100 million GCMC steps and 500 ns of MD for each protein system. Different random seeds were used for the GCMC sampling in each of the 10 runs.
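The quoted cumulative totals are reproduced by counting the 100 combined GCMC/MD cycles in each of the 10 runs:

$$N_{\mathrm{GCMC}} = 10\ \text{runs} \times 100\ \text{cycles} \times 10^{5}\ \text{steps} = 10^{8}\ \text{steps}$$

$$t_{\mathrm{MD}} = 10\ \text{runs} \times 100\ \text{cycles} \times 0.5\ \text{ns} = 500\ \text{ns}$$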
LGFE scoring using MC-SILCS sampling
To score the ligands as a predictor of relative binding affinity, the ligands were conformationally sampled in the field of the GFE FragMaps using the MC-SILCS approach described previously.28,30 The approach involves Monte Carlo (MC) simulations of the ligand in which the Hamiltonian includes intra-ligand energy terms and the Ligand Grid Free Energy metric. The protein is absent from these simulations but is implicitly present through the FragMaps as well as the Exclusion Maps. In essence, MC sampling is performed in the field of the GFE FragMaps and the Exclusion Maps, where the latter cover the regions of the protein, mapped on a 1 Å grid, in which no solute or water non-hydrogen atoms were sampled during the GCMC/MD simulations; these voxels are assigned an energy of 1000 kcal/mol. In the present study, generic GFE FragMaps were used for MC-SILCS sampling and LGFE scoring. These combine the probability distributions of classes of functional groups as follows: hydrophobic (benzene and propane), hydrogen bond acceptors (methanol O, formamide O, imidazole N and acetaldehyde O), and hydrogen bond donors (methanol O, imidazole N(H) and formamide N). In addition, GFE FragMaps for negative acceptors (acetate) and positive donors (methylammonium) were used in all calculations. The intramolecular energy terms included the Lennard-Jones, electrostatic and dihedral terms of the CGenFF force field, with the intramolecular electrostatics screened using a distance-dependent dielectric, ε = 4R, where R is the interatomic distance. MC sampling involved random translations of up to 2 Å and dihedral rotations of up to 180° for 10,000 steps at 298 K. Dihedral rotations were applied only to acyclic single bonds, and conformations were saved every 1000 steps. The lowest energy conformation was then subjected to MC simulated annealing for 40,000 steps with translations and dihedral rotations of up to 0.2 Å and 9°, respectively.
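The energy terms described above can be illustrated with a minimal sketch of how one ligand pose might be scored. It rests on several assumptions that are ours, not the text's: each ligand heavy atom has already been assigned to one generic FragMap (or to none) by a hypothetical classification step, all maps share one grid, atoms outside the grid contribute nothing, and trial moves are accepted with the standard Metropolis criterion at 298 K (the acceptance rule is not stated). The function names are hypothetical.

```python
import numpy as np

KB_KCAL = 0.0019872041     # Boltzmann constant, kcal/(mol K)
COULOMB = 332.0637         # approx. electrostatic factor, kcal*Angstrom/(mol*e^2)
EXCLUSION_ENERGY = 1000.0  # kcal/mol for a heavy atom in an excluded voxel


def lgfe_and_exclusion(coords, atom_maps, fragmaps, exclusion, origin,
                       spacing=1.0):
    """Return (LGFE, exclusion energy) in kcal/mol for one ligand pose.

    coords:    (N, 3) ligand heavy-atom positions in Angstrom
    atom_maps: length-N list naming each atom's generic FragMap, or None
    fragmaps:  dict of 3D GFE grids (kcal/mol) sharing the exclusion grid
    exclusion: boolean 3D grid, True where no solute or water heavy atom
               was sampled during the GCMC/MD simulations
    """
    idx = np.floor((np.asarray(coords, float) - origin) / spacing).astype(int)
    lgfe, excl = 0.0, 0.0
    for (i, j, k), name in zip(idx, atom_maps):
        if not all(0 <= n < s for n, s in zip((i, j, k), exclusion.shape)):
            continue  # assumption: atoms outside the grid contribute nothing
        if exclusion[i, j, k]:
            excl += EXCLUSION_ENERGY
        if name is not None:
            lgfe += float(fragmaps[name][i, j, k])  # additive atom contribution
    return lgfe, excl


def screened_coulomb(qi, qj, r):
    """Intra-ligand electrostatics with the distance-dependent dielectric 4r."""
    return COULOMB * qi * qj / (4.0 * r * r)


def metropolis_accept(e_old, e_new, rng, temperature=298.0):
    """Standard Metropolis criterion (assumed; not stated in the text)."""
    if e_new <= e_old:
        return True
    return bool(rng.random() < np.exp(-(e_new - e_old) / (KB_KCAL * temperature)))
```

In such a scheme the MC energy of a pose would be the intra-ligand CGenFF terms plus the LGFE plus the exclusion energy, while only the LGFE part is reported as the score.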
This MC sampling was repeated a minimum of 50 times for each docking run. Sampling terminated when the three lowest energy conformations were within 0.5 kcal/mol of each other (the "3 criteria"). If this criterion was not met, an additional 50 MC samplings were performed, repeating until it was met or a maximum of 250 MC sampling procedures had been attempted, after which the lowest energy conformation, based on the LGFE score, was taken. The runs were repeated 5 times for each ligand, with the average and minimum LGFE scores used for analysis. To correlate the LGFE results with the experimental ΔΔG values, the difference between the LGFE values of the two ligands in each pair was taken to obtain ΔΔGLGFE scores. The initial ligand conformations were the same as those used in the SSFEP and FEP calculations.
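The stopping rule above can be sketched as a loop over batches of 50 MC sampling runs capped at 250 runs. Here `run_mc_sampling` is a hypothetical stand-in for one MC sampling plus simulated annealing procedure, and treating "within 0.5 kcal/mol" as the spread between the lowest and third-lowest energies is our reading of the text.

```python
from typing import Callable, List, Tuple


def converge_lgfe(run_mc_sampling: Callable[[], float], batch: int = 50,
                  max_runs: int = 250, window: float = 0.5) -> Tuple[float, int]:
    """Batch MC sampling until the three lowest energies lie within `window`.

    run_mc_sampling is a hypothetical callable returning the lowest
    LGFE-based energy (kcal/mol) of one sampling + annealing procedure.
    Returns (lowest energy found, number of procedures run).
    """
    energies: List[float] = []
    while len(energies) < max_runs:
        energies.extend(run_mc_sampling() for _ in range(batch))
        lowest = sorted(energies)[:3]
        if lowest[2] - lowest[0] <= window:  # the "3 criteria"
            break
    return min(energies), len(energies)
```

With five repeats per ligand, ΔΔGLGFE for a pair is then the LGFE of the modified ligand minus that of the parent, using either the average or the minimum of the repeats.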
Single Step Free Energy Perturbation (SSFEP) method
The SSFEP method26 involves post-processing of MD simulation data of a ligand in a given environment in the canonical ensemble to estimate the alchemical free energy change of chemically modifying the ligand. As discussed previously26, Zwanzig’s free energy perturbation formula54 (Eqn. 1) is used to estimate the free energy change.
$$\Delta G^{\mathrm{env}}_{L1 \to L2} = -k_B T \ln \left\langle \exp\left( -\frac{\Delta E}{k_B T} \right) \right\rangle_{L1,\,\mathrm{env}} \tag{1}$$
where $k_B$ is the Boltzmann constant and $T$ the absolute temperature. The angular brackets indicate an average of the exponential factor over the MD trajectory of ligand L1 in the given environment, env, which can be either the solvated protein environment or water. $\Delta E$ is the energy difference between the two systems involving L1 and L2, which in practice is computed as the difference in the interaction energies of the two ligands with the environment:
$$\Delta E = E^{\mathrm{int}}_{L2,\,\mathrm{env}} - E^{\mathrm{int}}_{L1,\,\mathrm{env}} \tag{2}$$
The environment in each system is defined as all atoms except the ligand atoms. Since the environment is the same for the two ligands, the internal energy of the environment cancels exactly in the computation of $\Delta E$. Second, since L1 and L2 differ by only a very small number of heavy atoms, any differential intra-ligand energy terms are expected to cancel exactly between the solution and protein environments as well. Accordingly, Eqn. 2 contains only the ligand-environment interaction energies. Once $\Delta G^{\mathrm{protein}}_{L1 \to L2}$ and $\Delta G^{\mathrm{water}}_{L1 \to L2}$ have been computed with Eqn. 1, the relative binding free energy is given by
$$\Delta\Delta G_{\mathrm{bind}} = \Delta G^{\mathrm{protein}}_{L1 \to L2} - \Delta G^{\mathrm{water}}_{L1 \to L2} \tag{3}$$
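Eqns. 1 and 3 can be evaluated directly from per-frame ΔE values. The following is a minimal sketch, assuming the ΔE arrays (Eqn. 2, kcal/mol) have already been computed for every saved frame of the complex and free-ligand trajectories. The exponential average is written in log-sum-exp form so that large negative ΔE values cannot overflow. The `mean_and_sd` helper mirrors the averaging over independent runs; using the sample standard deviation (ddof=1) is an assumption, as the estimator is not stated.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def zwanzig_dg(delta_e, temperature=298.0):
    """Eqn. 1: -kT ln < exp(-dE/kT) >_L1 over the trajectory frames.

    delta_e: per-frame E_int(L2, env) - E_int(L1, env) in kcal/mol (Eqn. 2).
    """
    kt = KB_KCAL * temperature
    x = -np.asarray(delta_e, dtype=float) / kt
    x_max = x.max()
    # log of the mean of exp(x), shifted by x_max to avoid overflow
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return -kt * log_mean


def ssfep_ddg(delta_e_protein, delta_e_water, temperature=298.0):
    """Eqn. 3: relative binding free energy from the two environments."""
    return (zwanzig_dg(delta_e_protein, temperature)
            - zwanzig_dg(delta_e_water, temperature))


def mean_and_sd(values):
    """Arithmetic mean and sample standard deviation over independent runs."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1)
```

Because a single parent trajectory is reused, each additional modification only costs one pass over the stored frames to recompute ΔE.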
Unlike the One Step Perturbation (OSP) method,23–25 which applies soft-core interactions at chosen modification sites so that the data from a single simulation can be applied to tens of potential modification sites, SSFEP does not use soft-core interactions. However, this also limits the applicability of SSFEP to small modifications. Accordingly, the method is applied to ligand modifications involving a few non-hydrogen atoms.
The protein-ligand complexes were solvated in rectangular boxes of water, and 5 independent MD simulations of 10 ns each were performed. The MD simulation setup was identical to the SILCS-MD setup described above. The free ligands were simulated in cubic water boxes with 30 Å sides. During post-processing, vdW and short-range electrostatic interactions were both cut off at 14 Å. These longer cutoffs were used in computing ΔE because PME was not employed at this stage. The same cutoffs were used in the analysis of both the protein-ligand and the free-ligand aqueous simulations. The in-house molecular modeling and analysis suite "MolCal" was used to post-process the simulations, calculating the ligand-environment interaction energies (Eqn. 2) and, via Eqn. 1, the free energies in the protein and solvent environments.
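Eqn. 2 requires ligand-environment interaction energies for every frame. The following minimal NumPy sketch computes such a term with the 14 Å cutoff quoted above. It assumes the CHARMM Lennard-Jones functional form and combination rules, well depths given as positive numbers, plain truncation with no switching, and no periodic images. It is not MolCal's implementation.

```python
import numpy as np

COULOMB = 332.0637  # approx. electrostatic factor, kcal*Angstrom/(mol*e^2)


def interaction_energy(lig_xyz, lig_q, lig_eps, lig_rmin_half,
                       env_xyz, env_q, env_eps, env_rmin_half, cutoff=14.0):
    """Ligand-environment vdW + electrostatic energy in kcal/mol.

    Lennard-Jones uses the CHARMM form eps_ij[(Rmin_ij/r)^12 - 2(Rmin_ij/r)^6]
    with eps_ij = sqrt(eps_i * eps_j) and Rmin_ij = Rmin_i/2 + Rmin_j/2.
    Pairs with r >= cutoff are dropped (plain truncation, an assumption).
    """
    lig_xyz = np.asarray(lig_xyz, dtype=float)
    env_xyz = np.asarray(env_xyz, dtype=float)
    r = np.linalg.norm(lig_xyz[:, None, :] - env_xyz[None, :, :], axis=-1)
    keep = r < cutoff
    eps = np.sqrt(np.outer(lig_eps, env_eps))
    rmin = np.add.outer(np.asarray(lig_rmin_half, dtype=float),
                        np.asarray(env_rmin_half, dtype=float))
    sr6 = (rmin / r) ** 6
    e_vdw = eps * (sr6 * sr6 - 2.0 * sr6)
    e_elec = COULOMB * np.outer(lig_q, env_q) / r
    return float(np.sum((e_vdw + e_elec)[keep]))
```

For one frame, ΔE is this function evaluated with the L2 atom set minus the same function evaluated with the L1 atom set, using the identical environment coordinates.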
In order to compute the ΔE value for each snapshot in the complex and the free-ligand simulations, the coordinates of L2 need to be generated, given the coordinates of L1. A utility program in MolCal was developed which recorded the internal coordinates of the additional atoms in L2 prior to initiating the SSFEP post-processing calculations. A set of rules was developed which let the user specify the desired chemical modifications that could be applied to the parent ligand. Prior to recording the internal coordinates, the L2 compound was energy minimized to facilitate the assignment of an optimal geometry for the internal coordinates of the added L2 atoms. The MD simulation trajectories were read and L2 coordinates were constructed using the L1 coordinates and recorded internal coordinates as input without any additional structure optimization. ΔΔG values were computed as arithmetic averages over the 5 independent MD simulations, and the reported errors are the standard deviations computed for the 5 values.
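The reconstruction routine in MolCal is not described beyond its use of recorded internal coordinates. One standard way to place an added atom D from three already-positioned reference atoms A, B and C, given a bond length, bond angle and dihedral, is the Natural Extension Reference Frame (NeRF) construction sketched below. The function name and argument order are illustrative assumptions.

```python
import numpy as np


def place_atom(a, b, c, bond, angle_deg, dihedral_deg):
    """Place atom D from reference atoms A-B-C and internal coordinates.

    bond = |C-D| (Angstrom), angle = B-C-D (degrees),
    dihedral = A-B-C-D torsion (degrees, IUPAC sign convention).
    """
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    theta = np.radians(angle_deg)
    phi = np.radians(dihedral_deg)
    bc = c - b
    bc /= np.linalg.norm(bc)               # unit vector along B->C
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)                 # normal to the A-B-C plane
    m = np.cross(n, bc)                    # completes the local frame
    local = np.array([-bond * np.cos(theta),
                      bond * np.sin(theta) * np.cos(phi),
                      bond * np.sin(theta) * np.sin(phi)])
    return c + local[0] * bc + local[1] * m + local[2] * n
```

Applied frame by frame with the L1 coordinates as references, this rebuilds the added L2 atoms without any further optimization, consistent with the procedure described above.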
Free Energy Perturbation (FEP) method
The FEP calculations were performed using the GROMACS package.39 FEP calculations were set up in solution and in the protein environment to calculate the ΔΔG values for 8 transformations in ACK1 and 15 transformations in p38 MAP kinase. All FEP calculations used harmonic restraints with a force constant of 0.12 kcal/mol/Å² on the Cα atoms, as in the SILCS calculations. A Python library was developed to automatically generate the topology and initial coordinates for the ligand transformations involved in the FEP calculations. A user-friendly set of rules allows the user to specify the chemical modification in the Python script, which then generates the FEP topology implementing the transformation using the single topology approach. The ligand was then solvated in a water box and, separately, combined with the protein and solvated, yielding the two systems for the FEP calculations. Including the end-states, each transformation involved 11 λ-steps for the simultaneous modification of the vdW and bonded terms and 5 λ-steps for the partial atomic charges. The λ-states were run sequentially, with the starting coordinates of each λ-state taken from the end of the previous λ-state production run. For transformations of a small group into a larger one, the vdW and bonded terms were perturbed before the charges, whereas the opposite order was used for transformations of a larger group into a smaller one. Soft-core potentials were employed to avoid singularities at the end-states.55 Each simulation involved 500 steps of steepest descent minimization, 20 ps of equilibration and 0.5 ns of production. The Bennett Acceptance Ratio (BAR) method12 was used to estimate the free energies from the simulation data. The FEP calculations were run in triplicate to obtain error estimates.
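BAR is named above but not spelled out. The following self-contained sketch of Bennett's estimator for one pair of neighboring λ-states assumes that potential energy differences are available in both directions. It solves Bennett's self-consistency condition by bisection. It is an illustration of the technique, not the GROMACS implementation used in the study.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def _fermi(x):
    """1 / (1 + exp(x)), written with tanh so large |x| cannot overflow."""
    return 0.5 * (1.0 - np.tanh(0.5 * x))


def bar_delta_f(du_forward, du_reverse, temperature=298.0, tol=1e-8):
    """BAR estimate of F(lambda_1) - F(lambda_0) in kcal/mol.

    du_forward: U1 - U0 evaluated on configurations sampled at lambda_0
    du_reverse: U0 - U1 evaluated on configurations sampled at lambda_1
    """
    beta = 1.0 / (KB_KCAL * temperature)
    w_f = beta * np.asarray(du_forward, dtype=float)  # reduced work values
    w_r = beta * np.asarray(du_reverse, dtype=float)
    m = np.log(len(w_f) / len(w_r))

    def imbalance(df):
        # Bennett's self-consistency condition; increases monotonically in df
        return _fermi(m + w_f - df).sum() - _fermi(-m + w_r + df).sum()

    # Bracket wide enough that imbalance(lo) < 0 < imbalance(hi)
    lo = min(w_f.min(), -w_r.max()) - 50.0
    hi = max(w_f.max(), -w_r.min()) + 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi) / beta
```

The free energy of a full transformation is then the sum of `bar_delta_f` over all adjacent pairs of λ-states, taken over both the vdW/bonded and the charge stages.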
All FEP values reported in this study were computed as arithmetic averages over the three runs, with the reported errors equal to the standard deviations of the 3 values.
Data analysis
Comparisons of the experimental and calculated changes in binding free energies were performed in several ways. A simple qualitative metric is the number of perturbations for which the sign of the free energy difference is predicted correctly, out of the total number of perturbations. Such a metric is useful to the medicinal chemist, as it indicates whether a substituent will likely improve affinity without quantitative considerations, information that is often the basis of a go/no-go decision. Several quantitative analyses were also performed. Pearson correlation analysis was performed, from which the squared correlation coefficient, R², was extracted, and the predictive index, PI, was calculated. The latter measures the ability of a method to correctly rank order every compound with respect to the other compounds.56 In addition, root-mean-square differences (RMSD) and average absolute differences (AAD) over the ligands for each protein were computed and are presented.
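The following functions sketch the statistics listed above in Python. The predictive index follows the Pearlman and Charifson definition, which is pairwise agreement in rank order weighted by the absolute experimental difference and is assumed here to be the one intended. Applied to the ACK1 FEP column of Table 2, these functions give R² ≈ 0.11, PI ≈ −0.25, RMSD ≈ 2.63 and AAD ≈ 2.08 kcal/mol, matching the reported values, with 4 of 8 signs correct.

```python
import numpy as np


def sign_agreement(exp, calc):
    """Fraction of perturbations whose computed ddG has the experimental sign."""
    exp, calc = np.asarray(exp, float), np.asarray(calc, float)
    return float(np.mean(np.sign(exp) == np.sign(calc)))


def r_squared(exp, calc):
    """Square of the Pearson correlation coefficient."""
    return float(np.corrcoef(exp, calc)[0, 1] ** 2)


def predictive_index(exp, calc):
    """Pairwise rank agreement weighted by |E_j - E_i| (Pearlman-Charifson)."""
    exp, calc = np.asarray(exp, float), np.asarray(calc, float)
    num = den = 0.0
    for i in range(len(exp)):
        for j in range(i + 1, len(exp)):
            w = abs(exp[j] - exp[i])
            # +1 if same ordering, -1 if reversed, 0 if the predictions tie
            c = np.sign((exp[j] - exp[i]) * (calc[j] - calc[i]))
            num += w * c
            den += w
    return float(num / den)


def rmsd(exp, calc):
    d = np.asarray(calc, float) - np.asarray(exp, float)
    return float(np.sqrt(np.mean(d ** 2)))


def aad(exp, calc):
    d = np.asarray(calc, float) - np.asarray(exp, float)
    return float(np.mean(np.abs(d)))
```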
Results
Presented is a comparison of calculated and experimental relative binding free energies for inhibitors of the proteins ACK1 and p38 MAP kinase. The compounds studied were taken from previous drug development studies.34,35,41,42,56–59 The specific compounds selected for the present study, shown in Figure 1, differ by one to two non-hydrogen atoms, consistent with the scope of the original SSFEP methodology. The three methods, standard FEP, SSFEP and SILCS LGFE, were chosen to contrast the mature but computationally demanding FEP approach with the recently developed SSFEP and SILCS LGFE approaches, which use pre-computed ensembles and are therefore highly computationally efficient once those ensembles are available.
Experimental data set
Table 1 presents experimental results for the ACK1 and p38 MAP kinase compounds shown in Figure 1. The modifications studied here were limited to single non-hydrogen atom alterations. The experimental data set contains a range of such transformations, including aromatic hydrogen to methyl, fluoro or chloro moieties, methyl to ethyl, ethyl to isopropyl, chirality switches involving methyl groups, and linker substitutions including -CH2- to -O-, -O- to -NH- and -O- to -S-. In addition, a di-substitution with ACK1 was included, in which two hydrogens on an aromatic ring are converted to a dichloro analog. The affinities of the compounds range from micromolar (e.g., >2500 nM) to subnanomolar (0.3 nM), with experimental free energy differences ranging from −3.8 to 4.0 kcal/mol. Thus, while the number of compounds is not large, the transformations are diverse, providing a challenging test of the free energy difference calculation approaches studied.
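The ΔΔGexp values in Table 1 relate to the Ki values through ΔG = RT ln Ki, so that ΔΔG = RT ln(Ki,modified/Ki,parent). A minimal sketch, assuming T = 298 K (the simulation temperature). Under that assumption it reproduces, for example, the 1.28 kcal/mol listed for L1 1➔2 and the −3.76 kcal/mol for L2 4➔5. Entries reported as >2500 nM are bounds, so the corresponding ΔΔG values are bounds as well.

```python
import math

R_KCAL = 0.0019872041  # gas constant, kcal/(mol K)


def ddg_from_ki(ki_parent_nm, ki_modified_nm, temperature=298.0):
    """Relative binding free energy (kcal/mol) of modified vs parent ligand.

    ddG = RT ln(Ki_modified / Ki_parent); the concentration unit cancels.
    A positive value means the modification weakens binding.
    """
    return R_KCAL * temperature * math.log(ki_modified_nm / ki_parent_nm)
```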
Table 1.
Experimental data for the studied inhibitors of both ACK1 and p38 MAP kinase.42,56,58,59,34,35,41,57 Included are the Ki values for the parent and modified compounds and the corresponding free energy differences, ΔΔG, in kcal/mol.
Compound/Transformation | Ki (nM) | ΔΔG_exp (kcal/mol)
---|---|---
ACK1 inhibitors | |
L1, 1➔2 (H to F) | 3 → 26 | 1.28
L1, 2➔3 (F to Methyl) | 26 → 2500 | 2.71
L1, 1➔3 (H to Methyl) | 3 → 2500 | 3.98
L2, 4➔5 (2H to 2Cl) | 4000 → 7 | −3.76
L3, 6➔7 (O to NH) | 10 → 2 | −0.95
L4, 8➔9 (O to NH) | 13 → 3 | −0.87
L5, 10➔11 (CH2 to O) | 5 → 13 | 0.57
L5, 10➔12 (O to NH) | 5 → 6 | 0.11
p38 MAP kinase inhibitors | |
L6, 14➔15 (O to S) | 0.3 → 2 | 1.12
L6, 14➔16 (O to NH) | 0.3 → 16 | 2.36
L7, 17➔18 (Ethyl to Isopropyl) | 5.5 → 680 | 2.85
L8, 19➔20 (H to Methyl) | 42 → 1.1 | −2.16
L9, 21➔22 (O to CH2) | 10 → 40 | 0.82
L10, 23➔24 (O to CH2) | 15 → 42 | 0.61
L10, 23➔25 (O to S) | 15 → 111 | 1.19
L11, 26➔27 (H to F) | 106 → 14 | −1.20
L12, 28➔29 (H to Methyl) | >2500 → 12 | −3.16
L12, 28➔30 (H to F) | >2500 → 460 | −1.00
L12, 28➔31 (H to Cl) | >2500 → 25 | −2.77
L12, 30➔31 (F to Cl) | 460 → 25 | −1.72
L13, 32➔33 (Methyl to Ethyl) | 260 → 4.25 | −2.44
L14 (Methyl R➔S) | 5 → 50 | 1.36
L15 (Methyl R➔S) | 11 → 2 | −1.01
Standard Free Energy Perturbation
Given the extensive use of FEP methods in the computational chemistry community and their utility as a tool for drug design and development, FEP was first applied to the transformations listed in Table 1. In addition, because FEP calculations provide the most formally rigorous estimate of the free energy differences, the resulting ΔΔG values indicate the underlying influence of the force field on the estimated changes in binding affinity, although limitations in conformational sampling and convergence will also contribute to differences from experiment. The results in Table 2 show relatively poor agreement with experiment for ACK1, whereas agreement is significantly better for the p38 MAP kinase data set (Table 3). For ACK1 the R² value is 0.11 with a negative PI (−0.25), while for p38 MAP kinase an R² of 0.34 and a good PI of 0.63 were obtained. Consistent with the improved correlation, smaller RMSD and AAD values, 1.95 and 1.69 kcal/mol, respectively, were obtained for the p38 MAP kinase compounds.
Table 2.
Comparison of experimental and computed free energy differences (kcal/mol) for the ACK1 inhibitors.
Parent cmpd/Transformation | ΔΔG_exp | ΔΔG_FEP ᵃ | ΔΔG_SSFEP ᵇ | ΔΔG_LGFE ᶜ | ΔΔG_LGFE ᵈ
---|---|---|---|---|---
L1/1➔2 | 1.28 | 1.97±0.40 | 3.00±0.22 | 0.57±0.32 | 1.09
L1/2➔3 | 2.71 | −1.02±0.18 | 4.32±1.20 | 2.17±0.32 | 2.25
L1/1➔3 | 3.98 | 0.96±0.51 | 7.32±1.18 | 2.74±0.53 | 3.33
L2/4➔5 ᵉ | −3.76 | 1.29±1.58 | 4.53±3.21 | 1.25±1.66 | 0.74
L3/6➔7 | −0.95 | 0.93±0.19 | 0.37±0.54 | −0.91±0.11 | −0.75
L4/8➔9 | −0.87 | 0.77±0.29 | 2.00±0.99 | −0.56±0.42 | −0.74
L5/10➔11 | 0.57 | 0.45±0.34 | 0.17±1.47 | 0.55±0.31 | 0.50
L5/10➔12 | 0.11 | 0.60±0.22 | 2.22±0.84 | −0.11±0.26 | −0.03
Statistics, all transformations | | | | |
R² | | 0.11 | 0.18 | 0.33 | 0.52
PI | | −0.25 | 0.27 | 0.52 | 0.60
RMSD | | 2.63 | 3.54 | 1.86 | 1.62
AAD | | 2.08 | 2.71 | 1.01 | 0.79
Statistics, outlier omitted ᵉ | | | | |
R² | | 0.05 | 0.78 | 0.98 | 1.00
PI | | 0.06 | 0.85 | 1.00 | 0.99
RMSD | | 2.07 | 2.11 | 0.59 | 0.33
AAD | | 1.65 | 1.90 | 0.44 | 0.26

ᵃ Average and standard deviation (n=3) from the standard FEP calculations.
ᵇ Average and standard deviation (n=5) from the single-step FEP calculations.
ᶜ Average and standard deviation (n=5) from the MC-SILCS/LGFE calculations.
ᵈ Minimum energy conformation from the MC-SILCS/LGFE calculations.
ᵉ L2/4➔5 (2H to 2Cl) outlier omitted from the second statistical analysis.
Table 3.
Comparison of experimental and computed free energy differences (kcal/mol) for the p38 MAP kinase inhibitors.
Parent cmpd/Transformation | ΔΔG_exp | ΔΔG_FEP ᵃ | ΔΔG_SSFEP ᵇ | ΔΔG_LGFE ᶜ | ΔΔG_LGFE ᵈ
---|---|---|---|---|---
L6/14➔15 | 1.12 | 0.12±0.42 | −0.32±0.07 | 1.46±6.89 | 0.32
L6/14➔16 | 2.36 | 3.33±2.69 | 2.92±0.79 | 4.76±4.19 | 2.67
L7/17➔18 | 2.85 | 1.86±0.36 | 2.90±0.98 | −0.49±0.45 | 0.51
L8/19➔20 ᵉ | −2.16 | −3.17±0.70 | 38.93±11.68 | 0.00±2.29 | −0.16
L9/21➔22 | 0.82 | −1.09±0.20 | 2.38±0.96 | −0.43±0.54 | −1.31
L10/23➔24 | 0.61 | −0.59±0.40 | 2.55±0.67 | −1.15±0.12 | −0.94
L10/23➔25 | 1.19 | 5.54±0.64 | −0.78±0.08 | 2.42±0.15 | 2.47
L11/26➔27 | −1.20 | 0.04±0.23 | 2.34±0.63 | −0.55±0.19 | −0.57
L12/28➔29 | −3.16 | −0.94±0.97 | −1.08±0.33 | −4.13±1.20 | −3.99
L12/28➔30 | −1.00 | 0.72±0.67 | 0.75±0.40 | −1.79±0.76 | −1.77
L12/28➔31 | −2.77 | −0.68±1.11 | −0.52±0.57 | −1.46±1.03 | −1.98
L12/30➔31 | −1.72 | −1.41±1.77 | −1.27±0.70 | 0.34±1.42 | −0.22
L13/32➔33 | −2.44 | −1.30±0.33 | −1.20±0.09 | −1.45±0.27 | −1.39
L14/rMe to sMe | 1.36 | −0.76±2.80 | 0.14±0.15 | −0.71±0.86 | −1.29
L15/rMe to sMe | −1.01 | −4.04±3.37 | 1.41±0.21 | −1.26±0.36 | −1.35
Statistics, all transformations | | | | |
R² | | 0.34 | 0.02 | 0.39 | 0.45
PI | | 0.63 | 0.48 | 0.61 | 0.71
RMSD | | 1.95 | 10.75 | 1.66 | 1.45
AAD | | 1.69 | 4.23 | 1.44 | 1.27
Statistics, outlier omitted ᵉ | | | | |
R² | | 0.30 | 0.36 | 0.44 | 0.52
PI | | 0.60 | 0.68 | 0.72 | 0.79
RMSD | | 2.00 | 1.82 | 1.61 | 1.41
AAD | | 1.73 | 1.61 | 1.39 | 1.21

ᵃ Average and standard deviation (n=3) from the standard FEP calculations.
ᵇ Average and standard deviation (n=5) from the SSFEP calculations.
ᶜ Average and standard deviation (n=5) from the MC-SILCS/LGFE calculations.
ᵈ Minimum energy conformation from the MC-SILCS/LGFE calculations.
ᵉ L8/19➔20 (H to Methyl) outlier omitted from the second statistical analysis.
With ACK1, the L2/4→5 (2H to 2Cl) transformation had a standard deviation significantly larger than those of the other transformations. Omitting this outlier, which was also problematic in the SSFEP and SILCS LGFE calculations (see below), did not significantly improve the correlation with the experimental data, though the RMSD and AAD values improved. With the p38 MAP kinase data set there were no evident outliers in the FEP results, though the SSFEP calculations indicated problems with the L8/19→20 (H to Methyl) transformation (see below). Omitting this compound led to a small degradation in the overall agreement of the FEP results with experiment for the p38 MAP kinase data set.
Single-Step Free Energy Perturbation
SSFEP calculations involved five 10 ns simulations of each parent compound-protein complex and of each parent compound in aqueous solution. The listed transformations were then applied to the parent compounds by making only the local chemical changes in each snapshot of the pre-computed MD trajectories, with the remaining coordinates of the compound, protein and solvent taken from the trajectories; this allows the ΔΔG values to be estimated rapidly, as discussed below. The transformations were performed by adding atoms based on internal coordinates, with the geometries taken from a single energy-minimized conformation, as described in the Methods. For example, for an aromatic H to halogen substitution, the hydrogen is deleted and replaced by the halogen at the appropriate bond length. No changes are made to the geometry of the aromatic ring, though the partial atomic charges of the ring are altered as assigned by the CGenFF program. For substitutions such as an -O- linking two aromatic rings to an -S- (Figure 1B, compound L6), the atom type is changed, as are the charges of the adjacent rings per the CGenFF program. However, the local internal coordinates of the covalent bonds from the perturbed atom to the surrounding rings, along with the associated angles, remain unperturbed. This is an approximation in the SSFEP method that is required so that the remainder of the conformation of the parent compound can remain unchanged.
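For the simplest case described above, an aromatic H to halogen substitution, the new atom can be placed on the original C–H bond vector at the new bond length while every other coordinate of the frame is left unchanged. A minimal sketch follows. The bond length is an input, and the 1.75 Å used in the check below is illustrative rather than the CGenFF value.

```python
import numpy as np


def replace_along_bond(heavy, hydrogen, new_bond_length):
    """Position a substituent atom on the parent's X-H bond vector.

    heavy, hydrogen: coordinates (Angstrom) of the ring atom and of the
    hydrogen being replaced. Only the bond length changes; the ring
    geometry and all other atoms of the snapshot are left untouched.
    """
    heavy = np.asarray(heavy, dtype=float)
    u = np.asarray(hydrogen, dtype=float) - heavy
    return heavy + new_bond_length * u / np.linalg.norm(u)
```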
The SSFEP ΔΔG values were calculated as simple averages over the 5 trajectories from each set of simulations. In addition, results are presented in which the outliers were omitted from both the ACK1 and p38 MAP kinase analyses, as discussed above. The average SSFEP free energy differences for the ACK1 and p38 MAP kinase data sets are included in Tables 2 and 3, respectively, and Figure 2 presents correlation plots for the average results from the two data sets. With ACK1, SSFEP yielded a level of correlation similar to that of the FEP method, with an R2 value of 0.18, but a better PI of 0.27. The RMSD and AAD values were somewhat larger than the FEP results because of large deviations from experiment for selected data points. This is evident in the correlation plot in Figure 2A, where a single data point differs significantly from the generally good agreement between experimental and calculated values. This outlier is the compound with a dichloro substitution. Its value was computed using two series of simulations, starting from two parent compounds with an unsubstituted and a monosubstituted phenyl ring, respectively, and the final value was obtained as the sum of the free energies from the two calculations. Since this compound showed problems with all three methods, the data were reanalyzed omitting the dichloro substitution. This yielded significant improvements in R2, to 0.78, and in PI, to 0.85, which are also evident in the correlation plot in Figure 2C, with concomitant decreases in the RMSD and AAD values. This represents a significant improvement over the FEP results, though the RMSD and AAD values remain marginally larger than the FEP values obtained with the L2 dichloro outlier omitted.
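The single-step estimate for each trajectory can be sketched with the Zwanzig exponential-averaging relation, ΔG = −kT ln⟨exp(−ΔU/kT)⟩, where ΔU is the energy of the perturbed compound minus that of the parent, evaluated on each parent snapshot. This is a minimal sketch under stated assumptions: energies in kcal/mol, a temperature of 298.15 K, and the i-th complex run paired with the i-th solution run before averaging. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def exp_free_energy(delta_u: Sequence[float], temperature: float = 298.15) -> float:
    """Zwanzig one-step estimate: dG = -kT ln < exp(-dU/kT) >_parent.

    delta_u holds U(perturbed) - U(parent) in kcal/mol for each parent
    snapshot. A log-sum-exp shift keeps the exponentials numerically stable.
    """
    kt = K_B * temperature
    x = [-du / kt for du in delta_u]
    x_max = max(x)
    log_mean = x_max + math.log(sum(math.exp(v - x_max) for v in x) / len(x))
    return -kt * log_mean


def ssfep_ddg(complex_runs: Sequence[Sequence[float]],
              solvent_runs: Sequence[Sequence[float]],
              temperature: float = 298.15) -> Tuple[float, float]:
    """Mean and standard deviation of ddG = dG(complex) - dG(solution).

    Pairing complex run i with solution run i is an assumption of this sketch.
    """
    ddgs = [exp_free_energy(c, temperature) - exp_free_energy(s, temperature)
            for c, s in zip(complex_runs, solvent_runs)]
    return statistics.mean(ddgs), statistics.stdev(ddgs)
```

The exponential average is dominated by the lowest-ΔU snapshots. When nearly all parent snapshots are incompatible with the new group, every ΔU is large and positive and so is the estimate, which is consistent with the H-to-methyl outlier discussed below.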
Figure 2.
Experimental versus SSFEP ΔΔG correlation plots (solid circles) for the (A and C) ACK1 and (B and D) p38 MAP kinase (p38MK) systems, for all inhibitors (A and B) and with the outliers omitted (C and D; see text). Regression lines (solid lines) are shown for each data set: A) Y = 0.42X + 2.83 (R2 = 0.18), B) Y = −1.23X + 3.25 (R2 = 0.05), C) Y = −1.2X + 1.61 (R2 = 0.78) and D) Y = 0.31X + 0.68 (R2 = 0.14).
The SSFEP average results for p38 MAP kinase are included in Table 3 and Figure 2B. The correlation R2 was 0.05, although the PI of 0.48 indicates some predictive ability of the model. As shown in Figure 2B, the data contain one outlier, for which the SSFEP ΔΔG value was 38.93±11.68 kcal/mol. This transformation involves a hydrogen to methyl change, and the unrealistically large free energy change indicates a limitation of the SSFEP method. As shown in Figure 3, the ligand fits snugly in the binding pocket and has little conformational freedom. In the SSFEP calculation, this results in a lack of sampled conformations consistent with the addition of a methyl group. As this SSFEP result may be considered an obvious outlier, the statistical analysis was repeated with that data point omitted. This led to significant improvements in R2 and PI, to values of 0.14 and 0.68, respectively, indicating satisfactory predictability of the model (Figure 2D). These values are similar to those obtained in the FEP calculations, and the SSFEP RMSD and AAD values are also better than those from the standard FEP calculations.
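The four agreement measures used throughout (R2, PI, RMSD and AAD), together with the least-squares line shown in the correlation plots, can be computed as sketched below. The sketch assumes that R2 is the squared Pearson coefficient, that AAD is the average absolute deviation between calculated and experimental ΔΔG, and that PI follows the Pearlman–Charifson predictive index. In that index, each compound pair scores +1 or −1 depending on whether the calculated ordering matches experiment, weighted by the experimental gap. These definitions and the function name are assumptions for illustration rather than the paper's own code. Which quantity is plotted on which axis in Figure 2 is also not assumed here.

```python
import math
from itertools import combinations
from typing import Dict, Sequence


def agreement_stats(exp: Sequence[float], calc: Sequence[float]) -> Dict[str, float]:
    """R2, regression line (calc vs exp), RMSD, AAD and predictive index (PI)."""
    n = len(exp)
    if n != len(calc) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(exp) / n, sum(calc) / n
    sxx = sum((x - mx) ** 2 for x in exp)
    syy = sum((y - my) ** 2 for y in calc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(exp, calc))
    r2 = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    slope = sxy / sxx             # least-squares fit: calc = slope * exp + intercept
    intercept = my - slope * mx
    rmsd = math.sqrt(sum((y - x) ** 2 for x, y in zip(exp, calc)) / n)
    aad = sum(abs(y - x) for x, y in zip(exp, calc)) / n
    # Predictive index: +1 / -1 per pair for matching / opposite ordering,
    # 0 if the calculated values tie, weighted by the experimental gap.
    num = den = 0.0
    for i, j in combinations(range(n), 2):
        w = abs(exp[j] - exp[i])
        d_calc = calc[j] - calc[i]
        c = 0.0 if d_calc == 0 else math.copysign(1.0, (exp[j] - exp[i]) * d_calc)
        num += w * c
        den += w
    pi = num / den if den else float("nan")
    return {"R2": r2, "slope": slope, "intercept": intercept,
            "RMSD": rmsd, "AAD": aad, "PI": pi}
```

R2 only measures linear association and is unaffected by a sign flip of the slope. PI instead rewards getting the pairwise ordering right, which is why the two measures can diverge on small data sets like the ones analyzed here.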
Figure 3.
A) Representative ligand orientations from the p38 MAP kinase L8/19→20 (H to Methyl) transformation from the FEP (CPK, atom colored), SSFEP (Licorice, atom colored) and SILCS LGFE (CPK, blue) calculations, with the crystallographic 3FLY protein conformation shown in New Cartoon (cyan) representation. The methylated L8 analog is shown for the FEP and MC SILCS calculations, while the parent L8 structure is shown for the SSFEP calculation. B) The methylated L8 (orange) has a steric clash with the backbone of pocket residues T106, H107 and L108.
To understand the cause of the highly unfavorable ΔΔG for the L8/19→20 (H to Methyl) transformation, ligand conformations from the FEP and SSFEP calculations were analyzed. Figure 3A shows representative conformations of the L8 ligand from the FEP, SSFEP and SILCS LGFE calculations. In all three, the amide moieties overlap. However, shifts are evident in the locations of the phenyl rings and, to a greater extent, of the bicyclic ring system and the subsequent ester moiety. As the FEP, SSFEP and SILCS calculations were initiated from the same ligand-protein conformation, the addition of the methyl evidently leads to a significant change in the conformation and orientation of the ligand in the binding pocket, avoiding a steric clash with the backbone of residues T106, H107 and L108. The conformational changes are largest in the FEP calculation, followed by the SILCS method. In contrast, the restrained nature of the protein in the SSFEP calculations leads to minimal conformational changes during the ligand-protein complex simulations (Figure 3B). Because the SSFEP method directly uses the ensemble of conformations of the parent ligand-protein complex from the MD simulations for the perturbation, the change in orientation seen in the FEP and SILCS calculations cannot occur, leading to the large, unfavorable free energy change.
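The steric clash shown in Figure 3B could be quantified with a simple distance screen between the added methyl carbon and the backbone atoms of the pocket residues. The sketch below is hypothetical. The atom labels, coordinates and the 3.0 Å heavy-atom cutoff are illustrative assumptions, not values used in the analysis above.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Tuple[Dict[str, Vec3], Dict[str, Vec3]]  # (ligand atoms, protein atoms)


def close_contacts(ligand_atoms: Dict[str, Vec3], protein_atoms: Dict[str, Vec3],
                   cutoff: float = 3.0) -> List[Tuple[str, str, float]]:
    """List ligand/protein atom pairs closer than `cutoff` (Angstrom), closest first."""
    hits = []
    for lig_name, lig_xyz in ligand_atoms.items():
        for prot_name, prot_xyz in protein_atoms.items():
            d = math.dist(lig_xyz, prot_xyz)  # Euclidean distance (Python 3.8+)
            if d < cutoff:
                hits.append((lig_name, prot_name, d))
    return sorted(hits, key=lambda hit: hit[2])


def clash_fraction(frames: Sequence[Frame], cutoff: float = 3.0) -> float:
    """Fraction of frames with at least one contact below the cutoff."""
    flagged = sum(1 for lig, prot in frames if close_contacts(lig, prot, cutoff))
    return flagged / len(frames)
```

The screen could be applied to the parent-complex snapshots that SSFEP reuses. If nearly every frame produces a non-empty contact list, the transformation's ensemble is incompatible with the new group, which would flag the result before its ΔΔG is interpreted.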
SILCS Ligand Grid Free Energies
SILCS LGFE represents the second pre-computed ensemble approach. In SILCS LGFE scoring, GFE distributions for functional group types (FragMaps) are calculated from SILCS simulations that contain a mixture of 8 solutes at ~0.25 M along with water and the respective protein. Conformational sampling of the ligands in the field of the GFE FragMaps may then be performed using MC SILCS, from which low free energy conformations based on the LGFE scores are identified for each ligand. The LGFE free energy differences therefore differ from the standard FEP and SSFEP free energy differences presented in the preceding sections: they represent an “end point” calculation, in which the differences are based on the absolute LGFE scores for the parent compound and the individual transformed species. LGFE values were obtained both as the average over 5 sets of MC SILCS simulated annealing runs and as the minimum LGFE from all 5 MC SILCS runs.
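The LGFE bookkeeping described above can be sketched in three steps. First, voxel occupancies for a FragMap type are Boltzmann-transformed relative to bulk. Second, each classified ligand atom contributes the GFE of the voxel it occupies. Third, the end-point ΔΔG is taken as the difference between the transformed and parent LGFE scores, using either the run average or the run minimum. The sketch rests on several simplifying assumptions: the dictionary-based grid, the 1 Å spacing with the grid origin at zero, and a zero contribution from unclassified atoms and unvisited voxels. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)
Voxel = Tuple[int, int, int]
Atom = Tuple[str, Tuple[float, float, float]]  # (FragMap type or "", xyz in Angstrom)


def fragmap_gfe(counts: Dict[Voxel, float], bulk_count: float,
                temperature: float = 298.15) -> Dict[Voxel, float]:
    """Boltzmann-transform voxel occupancies into GFE values (kcal/mol).

    Occupancies are normalized by the bulk occupancy of the same type;
    unvisited voxels (count 0) are left out of the map.
    """
    kt = K_B * temperature
    return {v: -kt * math.log(c / bulk_count) for v, c in counts.items() if c > 0}


def lgfe(atoms: Sequence[Atom], fragmaps: Dict[str, Dict[Voxel, float]],
         spacing: float = 1.0) -> float:
    """Sum the GFE of the voxel occupied by each classified ligand atom."""
    total = 0.0
    for ftype, (x, y, z) in atoms:
        grid = fragmaps.get(ftype)
        if grid is None:  # atom not assigned to any FragMap type
            continue
        voxel = (math.floor(x / spacing), math.floor(y / spacing),
                 math.floor(z / spacing))
        total += grid.get(voxel, 0.0)  # unvisited voxel contributes 0 (simplification)
    return total


def end_point_ddg(parent_runs: List[float],
                  child_runs: List[float]) -> Tuple[float, float]:
    """End-point ddG from per-run LGFE scores: (from run averages, from run minima)."""
    avg = sum(child_runs) / len(child_runs) - sum(parent_runs) / len(parent_runs)
    best = min(child_runs) - min(parent_runs)
    return avg, best
```

Because each atom's contribution is read from a pre-computed map, scoring a pose costs only a grid lookup per atom. This additivity is the assumption that the Discussion later evaluates.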
Results from the SILCS LGFE calculations for ACK1 and p38 MAP kinase are included in Tables 2 and 3, respectively. For both proteins, the LGFE results improve on both the standard FEP and the SSFEP results, as shown by the larger R2 and PI values and the smaller RMSD and AAD values. These improvements persist when the outliers for both ACK1 and p38 MAP kinase are removed from the analysis. In addition, the R2 and PI values based on the minimum LGFE values are systematically better than those based on the average LGFE values. SILCS LGFE still shows problems with the ACK1 L2/4→5 (2H to 2Cl) transformation, for which a large standard deviation is obtained. In contrast, for the p38 MAP kinase L8/19→20 (H to Methyl) transformation, the standard deviation is large but smaller than those obtained for some other transformations. This appears to be due to the conformational sampling of the ligand in the field of the FragMaps, which allows the sampling of low energy conformations that are not accessible in the SSFEP method. Accordingly, omission of the outlier leads to only a modest improvement in the quantitative agreement with experiment.
Qualitative predictability
In the context of a drug design project, it is often desirable simply to know whether a specific transformation improves binding, which allows a go/no-go decision on synthesis and testing. The value of such qualitative information is supported by the RMSD and AAD values for the FEP and SSFEP calculations, which are in the vicinity of 2 kcal/mol (Tables 2 and 3) and indicate the difficulty of obtaining quantitative estimates of ΔΔG values. Even FEP studies combined with enhanced sampling based on Hamiltonian replica exchange (solute tempering) reported RMSD and AAD values of approximately 1 kcal/mol, suggesting a practical limit on the accuracy of quantitative estimates.16 Accordingly, a qualitative analysis was performed based on the number of calculated ΔΔG values with the correct sign, with the results included in Table 4. With ACK1, only 4 of the 8 ΔΔG values had the correct sign with standard FEP, while SSFEP yielded 5 correct and LGFE scoring yielded 6 correct. With p38 MAP kinase, standard FEP yielded 10 correct and SSFEP yielded 9. With LGFE scoring, the average values yield 9 correct, while the minimum LGFE energies yield 12 of 15 correct, which corresponds to an improvement in the RMSD and AAD values. Thus, for the studied data sets, SILCS LGFE yields the highest level of qualitative predictability of the 3 tested methods.
Table 4.
Number of ΔΔG values with the correct sign for the different methods of estimating the free energy difference. The values after the slash are the total number of perturbations.
Protein | ΔΔG FEPᵃ | ΔΔG SSFEPᵇ | ΔΔG LGFE (average)ᶜ | ΔΔG LGFE (minimum)ᵈ |
---|---|---|---|---|
ACK1 (#correct/8) | 4 | 5 | 6 | 6 |
p38 MAP kinase (#correct/15) | 10 | 9 | 9 | 12 |
ᵃ Average and standard deviations (n=3) from the standard FEP calculations.
ᵇ Average and standard deviations (n=5) from the single-step FEP calculations.
ᶜ Average and standard deviations (n=5) from the MC SILCS/LGFE calculations.
ᵈ Minimum energy conformation from the MC SILCS/LGFE calculations.
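The qualitative measure in Table 4 reduces to counting the calculated ΔΔG values whose sign matches experiment. A minimal sketch follows. Treating an exact zero in either series as a mismatch is an assumption, since the handling of ties is not stated, and the function name is hypothetical.

```python
from typing import Sequence


def count_correct_sign(exp_ddg: Sequence[float], calc_ddg: Sequence[float]) -> int:
    """Number of calculated ddG values whose sign matches the experimental one.

    A zero in either series counts as a mismatch (an assumption of this sketch).
    """
    if len(exp_ddg) != len(calc_ddg):
        raise ValueError("series must have equal length")
    return sum(1 for e, c in zip(exp_ddg, calc_ddg) if e * c > 0)
```

Applied per method to the 8 ACK1 and 15 p38 MAP kinase transformations, this count produces the entries in Table 4. Unlike RMSD, the count ignores magnitudes, so a prediction that is 3 kcal/mol off but on the correct side of zero still counts as a success for a go/no-go decision.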
p38 MAP kinase GCMC/MD SILCS LGFE analysis
Given the success of the MC SILCS LGFE method in the tests shown above, we applied the approach to a set of p38 MAP kinase ligands presented by Goldstein et al.43 that were recently subjected to FEP calculations using a Hamiltonian replica exchange solute tempering method to improve sampling (FEP-REST).16 Since these ligands represent larger chemical modifications, multiple FEP pathways had to be run and the errors reduced via a cycle closure analysis.16 The size of the transformations considered in that study therefore offers a rigorous challenge to the SILCS methodology.
Analysis was performed on the 34 kinase inhibitors obtained from Tables 2, 3 and 4 of Goldstein et al., the same ligands that were studied by Wang et al.16 The only ligand of the Goldstein set that was not included was 2cc in Table 4, as it was omitted in the Wang et al. study. Relative energies were calculated with respect to compound 2a, and the data for the analysis of the Wang et al. results were taken from the supporting information of that publication. Results from the analyses are presented in Table 5. Overall, the R2 and PI values are similar. However, the SILCS LGFE predictions yield a greater number of ΔΔG values with the correct sign than the FEP-REST results of Wang et al.: 25 are correct for both LGFE analyses, versus only 20 for the FEP results. In contrast to the results above, the RMSD and AAD values are better for the FEP-REST results than for LGFE scoring. This is potentially due to the enhanced sampling in the FEP-REST calculations and to the larger chemical transformations, for which the approximations associated with the LGFE method have a larger impact (i.e., cancellation of energies associated with the connectivity of the ligand, configurational entropy, and other enthalpic contributions that are present in a rigorous statistical thermodynamic treatment of binding affinity).
Table 5.
Statistical analyses of the predictions from Wang et al. and from SILCS LGFE scoring for the p38 MAP kinase inhibitors reported by Goldstein et al.
Metric | Wang et al. (FEP-REST) | SILCS LGFE (average) | SILCS LGFE (minimum) |
---|---|---|---|
R2 | 0.43 | 0.39 | 0.40 |
PI | 0.66 | 0.67 | 0.68 |
RMSD (kcal/mol) | 1.42 | 2.19 | 2.31 |
AAD (kcal/mol) | 1.09 | 1.85 | 1.74 |
#Correct/33 | 20 | 25 | 25 |
Note: 34 compounds are included in the data set, with relative energies offset to compound 2a, such that 33 compounds are included in the analysis of # correct.
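The note above describes expressing all values relative to compound 2a before counting signs, which is why 34 compounds yield 33 comparisons. A minimal sketch of that re-referencing step follows. The function name is hypothetical, and the example values in the usage sentence are invented for illustration, not data from Goldstein et al. or Wang et al.

```python
from typing import Dict


def relative_to_reference(values: Dict[str, float], reference: str) -> Dict[str, float]:
    """Express per-compound values relative to one reference compound.

    The reference itself is dropped because its relative value is zero by
    construction, so N compounds yield N - 1 relative values.
    """
    ref = values[reference]
    return {name: v - ref for name, v in values.items() if name != reference}
```

For example, `relative_to_reference({"2a": -10.0, "2b": -11.5}, "2a")` returns `{"2b": -1.5}`. Applying the same offset to both the experimental and the calculated values makes the comparison independent of any constant shift in the LGFE or FEP scale.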
Computational requirements
An important consideration in an active drug design project is throughput. Accordingly, the computational resources and real-time turnaround required by the applied methods were analyzed. Table 6 lists the resource information along with the hardware used for the different methods. We note that the timings are specific to the studied systems and will vary with system size, the need for additional convergence and other variables. With the SSFEP and SILCS LGFE methods, a pre-computation is required before the free energy estimates can be made. With SSFEP, this requires 5 individual simulations of 10 ns for both the protein and aqueous systems, with the latter typically being less computationally demanding. On a compute cluster, this will typically take on the order of 24 hours, with the individual simulations run simultaneously on 8 to 16 cores each. With SILCS LGFE, 10 GCMC/MD simulations of 50 ns each are required, such that approximately 7 days are needed for a typical pre-computation on a multicore cluster. However, SSFEP requires a separate set of pre-computations for each parent compound, while the SILCS methods require only a single set.
Table 6.
Resource requirements of the applied computational free energy methodologies for the fully solvated p38 MAP kinase system.
Method | FEP (Gromacs) | SSFEP | SILCS LGFE |
---|---|---|---|
Hardware | 2.3/3.0 GHz AMD Opteron | 2.3/3.0 GHz AMD Opteron | 2.3/3.0 GHz AMD Opteron |
CPU requirements (pre-computations) | NA | 10 × 16 cores × 10 nsᵃ; 0.7 CPU daysᵇ | 10 × 16 cores × 50 nsᵃ; ~6.5 CPU days |
CPU requirements (FE calculations) | 16 cores × 24 hrs; 384 CPU hrs | 1 core × 0.5 hrs; 0.5 CPU hrs | 1 core × 0.5 hrs; 0.5 CPU hrs |
Real time requirements per modification | 2.5 days | 0.02 day | 0.04 day |
CPU requirements for 1000 ligands post pre-computation | 960,000 CPU hrs | 500 CPU hrsᶜ | 1000 CPU hrs |
ᵃ Simulations performed using Gromacs.
ᵇ Assumes a 35,000 atom system.
ᶜ The pre-computation needs to be performed for each different parent ligand in SSFEP, but not with SILCS LGFE.
The computational requirements for the free energy estimates differ significantly between the methods. With the standard FEP and SSFEP methods, the CPU requirements include both the protein and solution perturbations, while the SILCS LGFE timings account for 5 × 50 MC SILCS sampling runs. As expected, the computational demands of the standard FEP method are significantly higher than those of the pre-computed SSFEP and SILCS approaches. Each perturbation requires 60 hrs on 16 cores with standard FEP, versus 0.5 hrs on a single core with the pre-computed methods. Thus, although the SSFEP and SILCS methods carry upfront pre-computation costs, once these are complete the methods require approximately 1000-fold fewer computational resources than standard FEP. While this estimate is an approximation that will be affected by technological developments such as GPUs, the significant speed-up of the pre-computed ensemble methods combined with their level of predictability indicates their potential utility in facilitating ligand design and development.
Using SSFEP and SILCS LGFE for exploratory screening of ligand chemical transformations
The efficiency of the SSFEP and SILCS LGFE approaches allows them to be used to explore compound chemical space. To illustrate this potential, we applied the approaches to the simulation data for the p38 MAP kinase ligand L11 (Fig. 1B). A total of 15 sites were identified for transformations (Fig. 4A), at which hydrogen atoms were substituted with 147 different functional groups (Supporting Information Table S1), with the ΔΔG values calculated using the respective protocols. We note the utility of the CGenFF program combined with the automatic ligand transformation capabilities in MolCal. Together, they allow rapid generation of the necessary topology and parameters, along with ligand energy minimizations, for the 2205 chemical transformations. The resulting data for all the modifications are presented in Figure 4B and C. Many of these functional groups, which are not limited to the single heavy atom substitutions typically used in the SSFEP approach, result in large unfavorable predictions. Large ΔΔG values are also obtained with the SILCS LGFE method, though their number is significantly smaller, consistent with that method's ability to accommodate a range of chemical modifications of varying size and polarity. Assuming that reliable predictions are likely to lie in the range −5 to +5 kcal/mol, we show in Figure 4B and C the ΔΔG values obtained using the SSFEP and SILCS LGFE approaches, respectively.
Figure 4.
SSFEP and SILCS for exploratory screening: A) At each of the 15 sites shown (R1 to R15) of the p38 MAP kinase ligand L11, 147 functional group substitutions were performed, and their effect on the free energy of binding (ΔΔG) was evaluated using B) SSFEP and C) SILCS. Functional groups (1–147) are listed in Table S1 of the Supporting Information.
Of the 2205 possible modifications, 326 failed in the energy minimization of the ligands prior to the MC SILCS docking. These failures were typically associated with larger modifications that cause severe steric overlap with the remainder of the ligand. Of the remaining 1879 modified molecules, 105 fell within the ±5 kcal/mol window with SSFEP, while 1005 were in that range with SILCS LGFE. As expected, the limitation on the size of the substitution that can be made in the SSFEP approach leads to a relatively small number of acceptable ΔΔG values. In contrast, the MC SILCS approach includes protein flexibility and relaxes the ligands in the field of the FragMaps and Exclusion Map, which allows the identification of putative orientations associated with favorable interactions with the protein. With respect to the individual sites on the ligand (Figure 4A), the SSFEP calculations yield only unfavorable ΔΔG values at sites 9, 11, 12, 13 and 14, while SILCS LGFE yields only 8, 3, 20, 37 and 31 favorable ΔΔG values at these sites, respectively. Such results indicate that specific sites, notably 9 and 11, are likely unsuitable for modification. Conversely, at sites where a significant number of modifications are predicted to yield favorable ΔΔG changes, such as site 1, more detailed analysis of the SSFEP or SILCS LGFE results combined with ADMET or other considerations could be used to prioritize compounds for synthesis and testing. Again, we note that both the SSFEP and SILCS LGFE analyses were completed in a matter of hours, a time scale on which they can lead the drug design process.
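The bookkeeping behind these counts (the 15 × 147 site/group grid, failed preparations, values inside the ±5 kcal/mol window, and favorable values per site) can be sketched as follows. The tuple layout and function name are hypothetical, and taking "favorable" to mean ΔΔG < 0 is an assumption consistent with its usage in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

SITES = range(1, 16)    # substitution sites R1..R15 on ligand L11
GROUPS = range(1, 148)  # functional groups 1..147 (Table S1)
N_TRANSFORMATIONS = len(SITES) * len(GROUPS)  # 15 x 147 = 2205


def screen_summary(results: Iterable[Tuple[int, int, Optional[float]]],
                   window: float = 5.0) -> Tuple[int, int, Dict[int, int]]:
    """Summarize (site, group, ddG) records; ddG is None if ligand preparation failed.

    Returns (number failed, number with ddG inside [-window, +window],
    number of favorable (ddG < 0) values per site).
    """
    failed = 0
    in_window = 0
    favorable: Dict[int, int] = defaultdict(int)
    for site, _group, ddg in results:
        if ddg is None:
            failed += 1
            continue
        if -window <= ddg <= window:
            in_window += 1
        if ddg < 0.0:
            favorable[site] += 1
    return failed, in_window, dict(favorable)
```

Keeping failures as explicit `None` entries preserves the 2205 = 326 + 1879 accounting. The per-site favorable counts are the quantity used above to flag sites such as 9 and 11 as poor candidates for modification.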
Discussion
Computational evaluation of the free energy differences associated with two sets of experimentally studied inhibitors of the proteins ACK1 and p38 MAP kinase has been undertaken using multiple methods. Because both targets are protein kinases, the present results are based on similar binding sites and may not necessarily extrapolate to other types of binding sites. In addition, the inhibitor transformations tested were chemically small, involving a small number of non-hydrogen atoms. A standard FEP approach, as implemented in the GROMACS program, was used along with two approaches based on pre-computed conformational ensembles. The SSFEP approach requires five 10 ns MD simulations of the parent compound-protein complex and of the parent compound in solution. The free energy changes are then based on a “single-step” perturbation, in which the Boltzmann-weighted energy differences over the ensembles from the MD simulations are used to estimate the free energy differences. Alternatively, the SILCS approach requires ten 50 ns simulations of the protein alone in an aqueous solution containing multiple organic solutes. From these simulations, normalized 3D probability distributions are obtained and Boltzmann transformed to yield GFE FragMaps. LGFE scores for ligands are then obtained from the overlap of different classes of ligand atoms with the respective GFE FragMaps, and the resulting atom GFE scores are summed to yield the LGFE. Monte Carlo simulated annealing, referred to as MC SILCS, may be used to identify low energy conformations of the ligand in the field of the GFE FragMaps. Thus, the LGFE scores, and the resulting LGFE score differences, represent an approximation to the binding free energy difference, in contrast to the formally correct FEP methods.
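In equation form, the quantities described in this paragraph can be summarized as below. The notation is ours. The FragMap expression assumes the common SILCS convention of normalizing each voxel's probability by the bulk probability of the same functional group type α, with v(r_i) denoting the voxel containing atom i and α(i) its FragMap type. A is the parent compound and B the transformed compound. The SSFEP line is the standard Zwanzig exponential-averaging relation, evaluated separately in the complex and in solution.

```latex
\begin{aligned}
\mathrm{GFE}_{\alpha}(v) &= -k_{B}T \ln\!\left(\frac{P_{\alpha}(v)}{P_{\alpha}^{\mathrm{bulk}}}\right) \\
\mathrm{LGFE} &= \sum_{i \,\in\, \text{classified atoms}} \mathrm{GFE}_{\alpha(i)}\!\big(v(\mathbf{r}_{i})\big) \\
\Delta G_{A\to B}^{\mathrm{env}} &= -k_{B}T \ln \left\langle \exp\!\left[-\frac{U_{B}-U_{A}}{k_{B}T}\right] \right\rangle_{A,\,\mathrm{env}},
  \qquad \mathrm{env} \in \{\mathrm{complex},\ \mathrm{solution}\} \\
\Delta\Delta G_{\mathrm{SSFEP}} &= \Delta G_{A\to B}^{\mathrm{complex}} - \Delta G_{A\to B}^{\mathrm{solution}} \\
\Delta\Delta G_{\mathrm{LGFE}} &= \mathrm{LGFE}_{B} - \mathrm{LGFE}_{A}
\end{aligned}
```

The contrast between the last two lines mirrors the text: ΔΔG from SSFEP is a perturbative average over the parent ensemble, whereas ΔΔG from LGFE is a difference of two independently scored end points.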
Quantitative analysis of the three methods shows some interesting trends. With the ACK1 dataset, which contains only 8 compounds, both the standard FEP and SSFEP approaches give relatively poor correlations based on both the Pearson correlation (R2) and PI values. A single outlier, L2/4→5 (2H to 2Cl), was identified from its standard deviation and its large deviation in the SSFEP results. Omitting it improves the RMSD and AAD values for standard FEP without significantly improving that method's correlation, whereas the correlation for SSFEP improves significantly. In addition, SILCS LGFE, which shows improved correlations and RMSD and AAD values over the two FEP methods, also yields a large standard deviation for this transformation. This suggests that the nature of the transformation, in which the two chlorine atoms are both ortho on the phenyl ring, is problematic for the methodologies, though issues with the experimental data cannot be excluded.
With the larger p38 MAP kinase dataset, both the FEP and LGFE methods give reasonable correlations with the experimental data, while the SSFEP data give a poor correlation, although the PI value indicates some level of predictive ability. However, omission of an outlier from the SSFEP set, L8/19→20 (H to Methyl), leads to substantially improved correlations, which is also observed with the FEP and LGFE rankings. Analysis of the structures for this modification from the three applied methods indicates that the ligand is structurally constrained during the SSFEP pre-computation simulations because certain functional groups are deeply buried in the binding pocket. As a result, the parent ligand-protein ensemble is inconsistent with that of the methyl-substituted compound. Combined, these results emphasize the inherent limitation of the SSFEP approach: it relies on the pre-computed ensemble of conformations of the parent compound-protein complex, in which the protein conformation is significantly restrained. This limitation makes the approach primarily useful for small substitutions, typically of a single non-hydrogen atom. While the predictability of the method is reasonable, this inherent limitation creates the potential for significant failures. In the present case, however, the magnitude of the deviation for the problematic transformations makes them readily identifiable, so that they can be omitted from consideration when interpreting the results.
For the studied small transformations, the SILCS LGFE method performs quite well overall. Its Pearson correlation and PI values are similar to or better than those from the FEP data. In addition, its RMSD and AAD values improve on the FEP results for the ACK1 and p38 MAP kinase small modifications (Table 1). These results indicate that the additive atom-based GFE assumption underlying the SILCS LGFE estimates is acceptable. This is aided by the studied transformations being largely limited to a single non-hydrogen atom, so that the energetic contributions associated with ligand connectivity, interactions with the protein and changes in configurational entropy largely cancel for the present data sets. However, when the SILCS LGFE approach is applied to a more varied set of p38 MAP kinase inhibitors, the results are competitive with published FEP values (Table 5). This indicates the potential utility of the method for larger chemical transformations, as has been reported for Factor Xa,28 thrombin,29 Mcl-1/Bcl-xl,60 ERK2,61 nuclear receptors and the β2 adrenergic and mGluR5 GPCRs.30,62,63 More generally, the results indicate that the range of protein conformational changes and the sampling of solutes in and around the protein in the SILCS simulations are representative of the structural perturbations and functional group distributions associated with full ligands.
Qualitatively, the SILCS LGFE scores perform quite well, with the best agreement with experiment occurring when the lowest (most favorable) LGFE score is used to rank the ligands. We suggest that this outcome arises because the underlying GFE FragMaps include energetic contributions from functional group-protein interactions, desolvation contributions of both the protein binding site and the functional groups, and protein conformational flexibility. In addition, the improvement obtained with the minimum versus the average LGFE scores indicates that identifying the lowest energy conformation in the free energy field of the GFE FragMaps yields a representative energy corresponding to the most highly populated conformation. This avoids the need, inherent in FEP methods, to sample an ensemble of conformations in energy space.
An additional advantage of SILCS LGFE scoring is that the scores are potentially less sensitive to the quality of the ligand parameters. Clearly, the quality of the force field parameters is important for the accuracy of the pre-computed GFE FragMaps. The highly optimized additive CHARMM36 force field, which includes the solutes used in the SILCS simulations, represents the state of the art for this type of calculation. However, when the LGFE scores are calculated, including during MC SILCS simulated annealing, the force field is used only for the intramolecular energy contribution to the Metropolis acceptance criterion, which also includes the LGFE term. The LGFE energy itself is based only on the assignment of the ligand atom types to the corresponding GFE FragMaps. Thus, the details of the partial atomic charges and other force field terms only partially affect the conformational sampling, and the LGFE score depends solely on the atomic overlap with the FragMaps. Given the challenges of extending a highly optimized force field to the chemical space of drug-like molecules, this represents an important advantage of the SILCS approach.
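The division of labor described above can be illustrated with a generic Metropolis simulated-annealing loop. The acceptance energy is the ligand's intramolecular energy plus its LGFE, and the LGFE of the lowest-energy conformation visited is reported. This is a hypothetical sketch, not the MC SILCS implementation. The move generator, energy callables, kT schedule and step counts are all assumptions supplied by the caller.

```python
import math
import random
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a ligand conformation/pose; its representation is up to the caller


def mc_anneal(initial: S,
              propose: Callable[[S, random.Random], S],
              intramolecular_energy: Callable[[S], float],
              lgfe_score: Callable[[S], float],
              kt_schedule: Sequence[float],
              steps_per_kt: int = 100,
              seed: int = 0) -> Tuple[S, float]:
    """Metropolis simulated annealing on E = E_intra + LGFE.

    Returns the lowest-E conformation visited and its LGFE score.
    """
    rng = random.Random(seed)
    state = initial
    energy = intramolecular_energy(state) + lgfe_score(state)
    best, best_energy = state, energy
    for kt in kt_schedule:  # decreasing kT values, kcal/mol
        for _ in range(steps_per_kt):
            trial = propose(state, rng)
            trial_energy = intramolecular_energy(trial) + lgfe_score(trial)
            delta = trial_energy - energy
            # Metropolis criterion: accept downhill moves, uphill with exp(-dE/kT)
            if delta <= 0.0 or rng.random() < math.exp(-delta / kt):
                state, energy = trial, trial_energy
                if energy < best_energy:
                    best, best_energy = state, energy
    return best, lgfe_score(best)
```

In a toy one-dimensional test, take a conformation x with `intramolecular_energy = 0.1 * x**2` and `lgfe_score = (x - 2)**2`. The loop then settles near x ≈ 1.82, the minimum of the combined energy, and reports the LGFE there. The force-field term shapes which conformations are accepted, but the reported score comes only from the FragMap term, which is the point made in the paragraph above.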
An important aspect of a computational method is its ability to produce estimates of relative binding affinity quickly enough to drive the ligand design process. The time requirements in Table 6 may be put into the context of the synthetic capacity of a focused, industrial medicinal chemistry program, which generates on the order of 100 compounds per week. In practical terms, the computational method should therefore be able to rank at least 10 times as many, or 1000 compounds per week, to effectively direct the synthetic chemistry efforts. Based on these estimates, the advantage of the pre-computed methods for leading a medicinal chemistry project is obvious. With access to 100 cores, the pre-computed methods would generate the data in a matter of hours, whereas the standard FEP method used in the present study would require 960,000 CPU hours, or 400 days. The latter estimate clearly exceeds what is acceptable for leading a medicinal chemistry project, even if the resources were increased by a factor of 10. These results indicate that the pre-computed methods offer an effective methodology that, as shown above, is accurate enough to drive a medicinal chemistry project. Finally, we note that the number of lead compound modifications that can be evaluated with the SSFEP approach is limited by its restriction to small perturbations, typically of a single non-hydrogen atom, whereas the SILCS LGFE approach has no such limitation. In addition, the LGFE approach is ligand neutral, so that additional pre-computations are not required for new parent compounds unless significant changes in protein conformation (e.g., the in vs. out states of kinases) occur. However, additional studies on larger data sets, targeting additional proteins and involving larger chemical transformations, are required to evaluate the SILCS LGFE approach more rigorously.
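The throughput comparison can be checked with simple arithmetic on the 1000-ligand row of Table 6, assuming ideal parallel scaling over 100 cores. The dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# CPU hours for 1000 modifications after pre-computation (Table 6).
CPU_HOURS_PER_1000 = {"FEP": 960_000.0, "SSFEP": 500.0, "SILCS LGFE": 1_000.0}


def throughput(cpu_hours: Dict[str, float], cores: int = 100,
               reference: str = "FEP") -> Dict[str, Tuple[float, float]]:
    """Return {method: (ideal wall-clock hours, speed-up over the reference method)}."""
    ref = cpu_hours[reference]
    return {method: (hours / cores, ref / hours) for method, hours in cpu_hours.items()}


if __name__ == "__main__":
    for method, (hours, speedup) in throughput(CPU_HOURS_PER_1000).items():
        print(f"{method}: {hours:.0f} h ({hours / 24:.1f} days), {speedup:.0f}x vs FEP")
```

With these inputs, the ideal wall-clock times are 9600 hours (400 days) for FEP, versus 5 and 10 hours for SSFEP and SILCS LGFE. These correspond to speed-ups of 1920× and 960×, consistent with the approximately 1000-fold figure. The FEP entry also corresponds to 960 CPU hours per modification, i.e., 16 cores × 60 hours, matching the 2.5-day real time per modification in Table 6.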
Supplementary Material
Acknowledgments
We thank all members of the MacKerell group for helpful discussions. This work was supported by NIH SBIR R43GM109635 and Pfizer. The authors acknowledge computer time and resources from the Computer Aided Drug Design (CADD) Center at the University of Maryland, Baltimore. RAD thanks Dr. Ray Unwalla for his assistance in assembling the ACK1 and p38 MAP kinase datasets.
Footnotes
Conflict of Interest: ADM is co-founder and Chief Scientific Officer of SilcsBio LLC, and RAD is an employee and stockholder of Pfizer Inc.
Supporting Information Available
Supporting information contains a table of the functional groups added in Figure 4 of the main text.
References
- 1.Song CM, Lim SJ, Tong JC. Brief Bioinform. 2009;10(5):579–591. doi: 10.1093/bib/bbp023. [DOI] [PubMed] [Google Scholar]
- 2.Cavasotto CN, editor. In Silico Drug Discovery and Design: Theory, Methods, Challenges, and Applications. CRC Press; Boca Raton, Florida: 2016. [Google Scholar]
- 3.Jorgensen WL. Science. 2004;303(5665):1813–1818. doi: 10.1126/science.1096361. [DOI] [PubMed] [Google Scholar]
- 4.Shoichet BK, Kobilka BK. Trends Pharmacol Sci. 2012;33(5):268–272. doi: 10.1016/j.tips.2012.03.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Erlanson DA, McDowell RS, O’Brien T. J Med Chem. 2004;47(14):3463–3482. doi: 10.1021/jm040031v. [DOI] [PubMed] [Google Scholar]
- 6.Sheng C, Zhang W. Med Res Rev. 2013;33(3):554–598. doi: 10.1002/med.21255. [DOI] [PubMed] [Google Scholar]
- 7.Hu B, Lill MA. J Cheminform. 2014;6:14. doi: 10.1186/1758-2946-6-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Yu W, Lakkaraju SK, Raman EP, MacKerell AD., Jr J Comput Aided Mol Des. 2014;28(5):491–507. doi: 10.1007/s10822-014-9728-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Yu W, Lakkaraju SK, Raman EP, Fang L, MacKerell AD., Jr J Chem Inf Model. 2015;55(2):407–420. doi: 10.1021/ci500691p. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Hu B, Lill MA. J Chem Info Model. 2012;52(4):1046–1060. doi: 10.1021/ci200620h. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Jorgensen WL. Acc Chem Res. 2009;42(6):724–733. doi: 10.1021/ar800236t.
- 12. Bennett CH. J Comput Phys. 1976;22:245–268.
- 13. de Ruiter A, Boresch S, Oostenbrink C. J Comput Chem. 2013;34(12):1024–1034. doi: 10.1002/jcc.23229.
- 14. Shirts MR, Pande VS. J Chem Phys. 2005;122(14):144107. doi: 10.1063/1.1873592.
- 15. Aleksandrov A, Thompson D, Simonson T. J Mol Recognit. 2010;23(2):117–127. doi: 10.1002/jmr.980.
- 16. Wang L, Wu Y, Deng Y, Kim B, Pierce L, Krilov G, Lupyan D, Robinson S, Dahlgren MK, Greenwood J, Romero DL, Masse C, Knight JL, Steinbrecher T, Beuming T, Damm W, Harder E, Sherman W, Brewer M, Wester R, Murcko M, Frye L, Farid R, Lin T, Mobley DL, Jorgensen WL, Berne BJ, Friesner RA, Abel R. J Am Chem Soc. 2015;137(7):2695–2703. doi: 10.1021/ja512751q.
- 17. Vanommeslaeghe K, MacKerell AD Jr. Biochim Biophys Acta. 2015;1850(5):861–871. doi: 10.1016/j.bbagen.2014.08.004.
- 18. Pissurlenkar RR, Shaikh MS, Iyer RP, Coutinho EC. Anti Infect Agents Med Chem. 2009;8(2):128–150.
- 19. Jiang W, Roux B. J Chem Theory Comput. 2010;6(9):2559–2565. doi: 10.1021/ct1001768.
- 20. Wang L, Berne B, Friesner RA. Proc Natl Acad Sci U S A. 2012;109(6):1937–1942. doi: 10.1073/pnas.1114017109.
- 21. Zheng L, Chen M, Yang W. Proc Natl Acad Sci U S A. 2008;105(51):20227–20232. doi: 10.1073/pnas.0810631106.
- 22. Zheng L, Chen M, Yang W. J Chem Phys. 2009;130(23):234105. doi: 10.1063/1.3153841.
- 23. Oostenbrink C, van Gunsteren WF. Proc Natl Acad Sci U S A. 2005;102(19):6750–6754. doi: 10.1073/pnas.0407404102.
- 24. Oostenbrink C, van Gunsteren WF. Proteins. 2004;54(2):237–246. doi: 10.1002/prot.10558.
- 25. Liu H, Mark AE, van Gunsteren WF. J Phys Chem. 1996;100:9485–9494.
- 26. Raman EP, Vanommeslaeghe K, MacKerell AD Jr. J Chem Theory Comput. 2012;8(10):3513–3525. doi: 10.1021/ct300088r.
- 27. Guvench O, MacKerell AD Jr. PLoS Comput Biol. 2009;5(7):e1000435. doi: 10.1371/journal.pcbi.1000435.
- 28. Raman EP, Yu W, Lakkaraju SK, MacKerell AD Jr. J Chem Inf Model. 2013;53(12):3384–3398. doi: 10.1021/ci4005628.
- 29. Raman EP, Yu W, Guvench O, MacKerell AD Jr. J Chem Inf Model. 2011;51(4):877–896. doi: 10.1021/ci100462t.
- 30. Lakkaraju SK, Yu W, Raman EP, Hershfeld AV, Fang L, Deshpande DA, MacKerell AD Jr. J Chem Inf Model. 2015;55(3):700–708. doi: 10.1021/ci500729k.
- 31. Lakkaraju SK, Raman EP, Yu W, MacKerell AD Jr. J Chem Theory Comput. 2014;10(6):2281–2290. doi: 10.1021/ct500201y.
- 32. Stumpfe D, Bajorath J. J Med Chem. 2012;55(7):2932–2942. doi: 10.1021/jm201706b.
- 33. Astolfi A, Iraci N, Manfroni G, Barreca ML, Cecchetti V. ChemMedChem. 2015;10(6):957–969. doi: 10.1002/cmdc.201500030.
- 34. Jiao X, Kopecky DJ, Liu J, Liu J, Jaen JC, Cardozo MG, Sharma R, Walker N, Wesche H, Li S. Bioorg Med Chem Lett. 2012;22(19):6212–6217. doi: 10.1016/j.bmcl.2012.08.020.
- 35. Kopecky DJ, Hao X, Chen Y, Fu J, Jiao X, Jaen JC, Cardozo MG, Liu J, Wang Z, Walker NP. Bioorg Med Chem Lett. 2008;18(24):6352–6356. doi: 10.1016/j.bmcl.2008.10.092.
- 36. Angell R, Aston NM, Bamborough P, Buckton JB, Cockerill S, deBoeck SJ, Edwards CD, Holmes DS, Jones KL, Laine DI, Patel S, Smee PA, Smith KJ, Somers DO, Walker AL. Bioorg Med Chem Lett. 2008;18(15):4428–4432. doi: 10.1016/j.bmcl.2008.06.048.
- 37. Schrödinger, LLC. New York, NY; 2009.
- 38. Schrödinger Release 2013. Schrödinger, LLC, New York, NY; 2013.
- 39. Pronk S, Pall S, Schulz R, Larsson P, Bjelkmar P, Apostolov R, Shirts MR, Smith JC, Kasson PM, van der Spoel D, Hess B, Lindahl E. Bioinformatics. 2013;29(7):845–854. doi: 10.1093/bioinformatics/btt055.
- 40. Shelley JC, Cholleti A, Frye LL, Greenwood JR, Timlin MR, Uchimaya M. J Comput Aided Mol Des. 2007;21(12):681–691. doi: 10.1007/s10822-007-9133-z.
- 41. Herberich B, Cao G-Q, Chakrabarti PP, Falsey JR, Pettus L, Rzasa RM, Reed AB, Reichelt A, Sham K, Thaman M. J Med Chem. 2008;51(20):6271–6279. doi: 10.1021/jm8005417.
- 42. Hynes J Jr, Dyckman AJ, Lin S, Wrobleski ST, Wu H, Gillooly KM, Kanner SB, Lonial H, Loo D, McIntyre KW. J Med Chem. 2007;51(1):4–16. doi: 10.1021/jm7009414.
- 43. Goldstein DM, Soth M, Gabriel T, Dewdney N, Kuglstatter A, Arzeno H, Chen J, Bingenheimer W, Dalrymple SA, Dunn J, Farrell R, Frauchiger S, La Fargue J, Ghate M, Graves B, Hill RJ, Li F, Litman R, Loe B, McIntosh J, McWeeney D, Papp E, Park J, Reese HF, Roberts RT, Rotstein D, San Pablo B, Sarma K, Stahl M, Sung ML, Suttman RT, Sjogren EB, Tan Y, Trejo A, Welch M, Weller P, Wong BR, Zecic H. J Med Chem. 2011;54(7):2255–2265. doi: 10.1021/jm101423y.
- 44. Best RB, Zhu X, Shim J, Lopes PE, Mittal J, Feig M, MacKerell AD Jr. J Chem Theory Comput. 2012;8(9):3257–3273. doi: 10.1021/ct300400x.
- 45. Vanommeslaeghe K, Hatcher E, Acharya C, Kundu S, Zhong S, Shim J, Darian E, Guvench O, Lopes P, Vorobyov I. J Comput Chem. 2010;31(4):671–690. doi: 10.1002/jcc.21367.
- 46. Vanommeslaeghe K, MacKerell AD Jr. J Chem Inf Model. 2012;52(12):3144–3154. doi: 10.1021/ci300363c.
- 47. Vanommeslaeghe K, Raman EP, MacKerell AD Jr. J Chem Inf Model. 2012;52(12):3155–3168. doi: 10.1021/ci3003649.
- 48. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM. J Comput Chem. 1997;18(12):1463–1472.
- 49. Darden T, York D, Pedersen L. J Chem Phys. 1993;98(12):10089–10092.
- 50. Parrinello M, Rahman A. J Appl Phys. 1981;52(12):7182–7190.
- 51. Nose S. J Chem Phys. 1984;81(1):511–519.
- 52. Hoover WG. Phys Rev A. 1985;31(3):1695–1697. doi: 10.1103/physreva.31.1695.
- 53. Shi L, Quick M, Zhao Y, Weinstein H, Javitch JA. Mol Cell. 2008;30(6):667–677. doi: 10.1016/j.molcel.2008.05.008.
- 54. Zwanzig RW. J Chem Phys. 1954;22:1420–1426.
- 55. Gapsys V, Seeliger D, de Groot BL. J Chem Theory Comput. 2012;8(7):2373–2382. doi: 10.1021/ct300220p.
- 56. Pearlman DA, Charifson PS. J Med Chem. 2001;44(21):3417–3423. doi: 10.1021/jm0100279.
- 57. Leung CS, Leung SS, Tirado-Rives J, Jorgensen WL. J Med Chem. 2012;55(9):4489–4500. doi: 10.1021/jm3003697.
- 58. Pettus LH, Xu S, Cao G-Q, Chakrabarti PP, Rzasa RM, Sham K, Wurz RP, Zhang D, Middleton S, Henkle B. J Med Chem. 2008;51(20):6280–6292. doi: 10.1021/jm8005405.
- 59. Wurz RP, Pettus LH, Xu S, Henkle B, Sherman L, Plant M, Miner K, McBride H, Wong LM, Saris CJ. Bioorg Med Chem Lett. 2009;19(16):4724–4728. doi: 10.1016/j.bmcl.2009.06.058.
- 60. Cao X, Yap JL, Newell-Rogers MK, Peddaboina C, Jiang W, Papaconstantinou HT, Jupitor D, Rai A, Jung KY, Tubin RP, Yu W, Vanommeslaeghe K, Wilder PT, MacKerell AD Jr, Fletcher S, Smythe RW. Mol Cancer. 2013;12(1):42. doi: 10.1186/1476-4598-12-42.
- 61. Samadani R, Zhang J, Brophy A, Oashi T, Priyakumar UD, Raman EP, St John FJ, Jung KY, Fletcher S, Pozharski E, MacKerell AD, Shapiro PS. Biochem J. 2015. doi: 10.1042/BJ20131571.
- 62. Lakkaraju SK, Mbatia H, Hanscom M, Zhao Z, Wu J, Stoica B, MacKerell AD Jr, Faden AI, Xue F. Bioorg Med Chem Lett. 2015. doi: 10.1016/j.bmcl.2015.04.042. In press.
- 63. He X, Lakkaraju SK, Hanscom M, Zhao Z, Wu J, Stoica B, MacKerell AD Jr, Faden AI, Xue F. Bioorg Med Chem. 2015;23(9):2211–2220. doi: 10.1016/j.bmc.2015.02.054.
Associated Data