J Chem Inf Model. 2023 Jul 18;63(15):4772–4779. doi: 10.1021/acs.jcim.3c00416

CHARMM-GUI-Based Induced Fit Docking Workflow to Generate Reliable Protein–Ligand Binding Modes

Hugo Guterres 1, Wonpil Im 1,*
PMCID: PMC10428204  PMID: 37462607

Abstract

Molecular docking is a preferred method to predict ligand binding modes and their binding energy to target protein receptors, which is critical in early phase structure-based drug discovery. However, there is a persistent challenge in docking that can be attributed to the induced fit effect, as receptor binding sites undergo induced fit conformational changes upon ligand binding to achieve better binding modes. In this work, based on CHARMM-GUI LBS Finder & Refiner and High-Throughput Simulator, we present a straightforward CHARMM-GUI induced fit docking (CGUI-IFD) workflow to generate reliable protein–ligand binding modes. The CGUI-IFD workflow generates an ensemble of receptor binding site conformations through ligand-binding site (LBS) refinement, runs rigid receptor docking, and performs high-throughput molecular dynamics (MD) simulations of protein–ligand complex structures in explicit solvent. The results are evaluated based on the ligand root-mean-square deviation (RMSD)-based binding stability and the molecular mechanics generalized Born surface area binding energy. For a benchmark test, we used 258 cross-docking protein–ligand pairs across 41 target proteins from the Schrodinger IFD-MD data set. Applying CGUI-IFD to this data set gives an 80% success rate (ligand within 2.5 Å RMSD of the experimental structure). We expect that the CGUI-IFD workflow can be useful to generate reliable ligand binding modes for cross-docking cases.

1. Introduction

An accurate protein–ligand complex structure serves as a critical piece of the puzzle in structure-based drug design, providing insight into the specific physicochemical interactions between the ligand and the binding-site residues, the ligand binding mode, and the binding pocket arrangement that fits the ligand. Experimental approaches such as X-ray crystallography and cryo-electron microscopy (cryo-EM) are the gold standard methods used to obtain protein–ligand complex structures.1 However, these methods can be prohibitively expensive and time-consuming when structures must be determined for many new ligands of a target protein. Therefore, a more cost-effective and faster computational approach to generate reliable protein–ligand binding modes can be a useful tool for structure-based drug design.2

Over a century ago, Fischer introduced the lock-and-key model to describe a protein–ligand binding interaction, where a rigid receptor binding pocket serves as a lock and a specific ligand conformation is the complementary key.3 This simple model has been used by many computational molecular docking methods treating a receptor as a rigid entity while calculating various small molecule ligand conformations that fit complementarily to the receptor binding pocket. While fast and effective in predicting protein–ligand complex structures, this approach tends to fail when a receptor undergoes induced fit conformational changes to bind specific ligands.4 The induced fit phenomenon has been widely observed since it was first introduced in 1958 by Koshland.5 Therefore, protein receptor binding pockets are now largely viewed as dynamic structures that may adopt different conformations depending on the ligand structure. This situation further complicates computational approaches to predict the correct protein–ligand binding mode.

A few computational approaches have been carefully devised to tackle this induced fit docking (IFD) problem, including Schrodinger IFD-MD, CIFDock, Fleksy, and tinyIFD.6–9 Recently, the Schrodinger group introduced their IFD-MD (molecular dynamics) method that successfully predicted protein–ligand binding modes in 85% of 258 pairs of cross-docking cases.6 IFD-MD elaborately combines trained ligand-based pharmacophore docking, rigid receptor docking, energy-guided protein structure modeling, and MD simulations. Briefly, IFD-MD starts by generating an ensemble of ligands based on the template ligand to obtain information on clashes between ligands and binding site residues. Subsequently, residues that clash with ligands are refined to produce multiple alternative conformations of the receptor. Then, rigid receptor docking is used to dock the ligand into an ensemble of receptor structures, followed by residue side chain minimization. The next step uses MD simulation and water equilibration, followed by metadynamics simulations to assess the local stability of the ligand binding mode. Finally, a composite scoring function is used to rank the candidate poses. CIFDock is a CHARMM-based IFD workflow that employs self-guided Langevin dynamics simulations to sample relevant ligand conformations, side chain orientations, and water movements. Steps in its procedure include fixing individual components, i.e., the ligand, while sampling the movements of active site residues to account for induced fit conformational changes.7 Fleksy takes a simpler approach to IFD by sampling accessible side chain orientations for the selected flexible active site residues using a backbone-dependent rotamer library. The ligands are then docked into the generated ensemble of receptor structures, and the best poses are minimized using the Yasara2 force field.8 TinyIFD is a high-throughput MD-based workflow for docking pose refinement that uses truncated protein structures containing only the active site residues.9

Despite the availability of these IFD methods, the more commonly used method to account for the induced fit phenomenon is the ensemble MD-docking approach.10–12 Using MD trajectory and cluster analysis, various representative structures of the protein receptor binding pocket are obtained. Rigid docking is then performed on the ensemble of receptor structures to account for the induced fit effects. In 2019, Falcon and colleagues showed that the G-protein coupled receptor ADORA2A needed 102 receptor conformations to identify all of its known active ligands in DUD-E (Database of Useful Decoys: Enhanced).12 However, they noted that clustering methods were not effective in identifying the correct receptor structures for ligand binding.

Although robust and effective, Schrodinger IFD-MD relies on many proprietary technologies that are not widely available to researchers in academia. At the same time, ensemble MD-docking methods have been shown to be ineffective in identifying the true receptor conformations from clustering. In this work, we propose a straightforward CHARMM-GUI-based IFD workflow (CGUI-IFD for short) to generate reliable protein–ligand binding modes. CHARMM-GUI is a widely used web-based cyberinfrastructure for preparation of complex molecular simulation systems.13–15 Our stepwise IFD workflow uses two CHARMM-GUI modules, LBS (ligand-binding-site) Finder & Refiner (LBS-FR) and High-Throughput Simulator (HTS).16,17 Briefly, LBS-FR is a template-based binding site refinement method to generate an ensemble of different binding pocket conformations, into which rigid receptor docking of each ligand can be performed. Subsequently, CHARMM-GUI HTS is used to evaluate the binding stability and interaction energies of the resulting protein–ligand docking models. The best ligand-binding mode is selected based on the binding stability (using ligand root-mean-square deviation, RMSD) and the best interaction energy using MMGBSA (molecular mechanics with the generalized Born surface area continuum solvation). For 258 cross-docking protein–ligand PDB structure pairs from the Miller data set (used in the Schrodinger IFD-MD benchmark test), our workflow successfully predicts ligand-binding modes in 80% of cases (i.e., within 2.5 Å ligand RMSD of the experimental ligand structures), indicating that the CGUI-IFD workflow is a robust and easy-to-use tool to generate reliable protein–ligand binding modes.

2. Results and Discussion

2.1. CGUI-IFD Is Straightforward

The overall CGUI-IFD workflow is shown in Figure 1. To begin, a target receptor structure is uploaded to LBS-FR, a functional module that uses G-LoSA (Graph-based Local Structure Alignment) to align local LBS structures based on geometry and physicochemical features in a sequence-independent manner and ranks the aligned templates using the GA score (G-LoSA alignment score).16,18,19 The GA score is a normalized similarity scoring function that ranges from 0 to 1, with 1 being a perfect alignment. A G-LoSA run involves extensive alignment of all nonredundant LBS template protein receptor structures in our PDB library (45,940 templates, as of February 2023). The library consists of experimentally solved PDB protein–ligand holo-structures with a resolution better than 3.5 Å. Here, an LBS is defined as the protein residues that are within 4.5 Å of the bound ligand.
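The LBS definition above (protein residues within 4.5 Å of the bound ligand) can be illustrated with a minimal sketch. The data layout (tuples of residue label and coordinates), the function name `lbs_residues`, and the toy coordinates are assumptions made for illustration; they are not part of LBS-FR or G-LoSA.

```python
import math

LBS_CUTOFF = 4.5  # Å, the LBS distance cutoff quoted in the text


def lbs_residues(protein_atoms, ligand_coords, cutoff=LBS_CUTOFF):
    """Return sorted residue labels that have at least one atom within
    `cutoff` of any ligand atom.

    protein_atoms: iterable of (residue_label, (x, y, z)) tuples (hypothetical layout)
    ligand_coords: iterable of (x, y, z) tuples
    """
    ligand_coords = list(ligand_coords)
    selected = set()
    for res_label, xyz in protein_atoms:
        if res_label in selected:
            continue  # residue already selected through another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            selected.add(res_label)
    return sorted(selected)


# Toy coordinates, made up for illustration only
protein = [
    ("ASP142", (0.0, 0.0, 0.0)),
    ("ASP142", (1.5, 0.0, 0.0)),
    ("LEU70", (10.0, 0.0, 0.0)),
    ("GLU139", (0.0, 4.0, 0.0)),
]
ligand = [(0.0, 0.0, 3.0), (1.0, 1.0, 3.0)]

print(lbs_residues(protein, ligand))
```

In practice the coordinates would come from a parsed PDB file, and a spatial index would replace the double loop for large systems.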

Figure 1.

CGUI-IFD workflow. Starting with a receptor structure, LBS Finder & Refiner generates multiple binding pocket conformations using different templates. In this example, we consider three template-based refined structures (i.e., s2, s3, and s4) and the original unrefined crystal structure (s1). Then, rigid receptor docking of each ligand is performed on various receptor conformations (s1, s2, s3, and s4) to generate the top 10 binding poses (i.e., p1-p10). All protein–ligand complex structures (40 for each cross-docking case in this example) are submitted simultaneously to High-Throughput Simulator (HTS) to generate MD systems and inputs. Finally, after 50 ns MD simulation of each protein–ligand pose, binding stabilities are evaluated with ligand RMSD and MMGBSA calculations for ranking.

Top ranked LBS templates based on their GA scores are selected for MD-based LBS structure refinement. First, equivalent aligned residues between template and receptor structures are identified using the shortest augmenting path algorithm to solve the linear sum assignment problem.20 Top 3 aligned templates on the receptor binding site are used for LBS structure refinement in this study. Note that these numbers could be changed, which is also elaborated below. For each template, the distance restraint potentials for both Cα atoms and side chain center-of-mass (SC–COM) are derived using eq 1 with one distance matrix (M) between Cα atoms of the template and their equivalent atoms on the query protein and another distance matrix (M) between SC–COM points.
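To make the residue-matching step concrete, the sketch below solves a small linear sum assignment problem by brute-force enumeration. This is a stand-in chosen for readability, not the shortest augmenting path algorithm cited in the text, and the cost used here (Euclidean distance between superimposed Cα positions) is an assumption, since the text does not state the cost function. For realistic sizes, `scipy.optimize.linear_sum_assignment` would be the practical choice.

```python
import math
from itertools import permutations


def assign_equivalent_residues(template_xyz, receptor_xyz):
    """Brute-force linear sum assignment.

    Each template point is matched to a distinct receptor point so that
    the summed distance is minimal. Returns (pairs, total_cost), where
    pairs is a list of (template_index, receptor_index).
    """
    n_t, n_r = len(template_xyz), len(receptor_xyz)
    if n_t > n_r:
        raise ValueError("need at least as many receptor points as template points")
    cost = [[math.dist(t, r) for r in receptor_xyz] for t in template_xyz]
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(n_r), n_t):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return list(enumerate(best_perm)), best_cost


# Toy Cα positions after superposition (made up for illustration)
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
receptor = [(7.5, 0.2, 0.0), (0.1, 0.0, 0.0), (3.9, 0.1, 0.0), (20.0, 0.0, 0.0)]

pairs, total = assign_equivalent_residues(template, receptor)
print(pairs, round(total, 3))
```

The enumeration scales factorially, so it is only suitable for a handful of points; it is shown here because the optimum is easy to verify by eye.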

$$U_{\mathrm{restraint}} = \sum_{(i,j) \in M} k \left( r_{ij} - r_{0,ij} \right)^{2} \tag{1}$$

where k is the force constant, r0,ij represents the distance between the ith and jth atoms in the template, and rij is the distance between the equivalent atoms on the receptor protein. The force constant value is set to 1.5 kcal/(mol·Å²) based on our previous optimization test.21,22 We hypothesize that by refining the receptor LBS structure using templates of holo-structures in the PDB library, we can generate an ensemble of different receptor LBS structures that are conducive to cross-docking for the following two reasons. First, all templates in our library are holo-structures, which have been shown through many benchmark tests to be superior for docking compared to apo-structures.22–26 Second, our nonredundant library allows for the selection of templates with different bound ligands and from nonhomologous proteins.22 In comparison, conventional ensemble MD-docking is stochastic and generates different receptor structures based on conformational variations of an empty binding pocket. Furthermore, Falcon et al. showed that clustering methods were ineffective in identifying relevant binding pocket conformations for ligand binding. Therefore, we believe that our approach is more focused and based on biologically relevant binding pocket conformations that have been observed experimentally.
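As a worked illustration of the distance restraint, the sketch below evaluates a harmonic penalty k(rij − r0,ij)² summed over all pairs of equivalent points. The harmonic form without a factor of 1/2 and the sum over all pairs i < j are assumptions inferred from the definitions of k, r0,ij, and rij; the function name and toy coordinates are hypothetical.

```python
import math
from itertools import combinations

FORCE_CONSTANT = 1.5  # kcal/(mol·Å^2), value quoted in the text


def restraint_energy(template_xyz, receptor_xyz, k=FORCE_CONSTANT):
    """Sum k * (r_ij - r0_ij)**2 over all pairs i < j of equivalent points.

    template_xyz[i] and receptor_xyz[i] must be equivalent points
    (Cα atoms or side chain centers of mass).
    """
    if len(template_xyz) != len(receptor_xyz):
        raise ValueError("template and receptor must list equivalent points")
    energy = 0.0
    for i, j in combinations(range(len(template_xyz)), 2):
        r0_ij = math.dist(template_xyz[i], template_xyz[j])  # template distance
        r_ij = math.dist(receptor_xyz[i], receptor_xyz[j])   # receptor distance
        energy += k * (r_ij - r0_ij) ** 2
    return energy


# Toy coordinates (made up): the receptor differs from the template in one point
template = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
receptor = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 3.0, 0.0)]

print(restraint_energy(template, template))  # identical geometry gives zero penalty
print(restraint_energy(template, receptor))
```

Because the restraint acts on internal distances only, it is invariant to rigid-body motion of the receptor, which is why no prior superposition is needed to evaluate it.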

Following LBS structure refinement, our workflow gathers an ensemble of 4 binding site structures in this study: 1 from the original receptor structure (s1 in Figure 1) and 3 from the template-based refined structures (s2, s3, s4). We include the original receptor structure to account for binding sites that are more rigid and do not change upon binding of different ligands.27 Rigid receptor docking is then used to dock a given ligand into the 4 binding site structures, and the top 10 ligand-binding poses (p1-p10 in Figure 1) are collected for each receptor structure. For each protein–ligand cross-docking case, we obtain 40 protein–ligand binding modes that are then submitted to CHARMM-GUI HTS. HTS is an efficient functional module that accepts and processes multiple protein–ligand complex structures at the same time.17 HTS automatically runs ligand parametrization and prepares the MD simulation systems and input files for all complex structures simultaneously. We have shown previously that HTS was effective in identifying correct ligand binding modes among the predicted top 10 docking structures. In addition, many studies have indicated that the correct ligand-binding modes are present in the docking results, but they are often not the top ranked structure.28–30

Binding stabilities from the 50 ns MD results of the 40 protein–ligand structures for each cross-docking case are quickly evaluated using a ligand RMSD metric: only the protein is aligned to the initial structure, and the ligand RMSD is then calculated relative to the ligand in the initial structure to capture the ligand's movements during the simulation. Stable binding poses with an average ligand RMSD less than 3 Å are selected for the subsequent MMGBSA binding interaction energy calculations. The final structures are ranked based on their MMGBSA binding energy values. All computational details are given in Methods.
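The two-stage selection described above (stability filter, then energy ranking) can be sketched as follows. The dictionary layout, pose names, and all numerical values are made up for illustration; in the actual workflow MMGBSA is computed only for poses that pass the RMSD filter, whereas here every toy pose carries an energy for simplicity.

```python
from statistics import mean

RMSD_CUTOFF = 3.0  # Å, average ligand RMSD threshold quoted in the text


def rank_poses(poses, rmsd_cutoff=RMSD_CUTOFF):
    """Keep poses whose average ligand RMSD is below the cutoff, then
    rank them from most to least favorable MMGBSA binding energy."""
    stable = [p for p in poses if mean(p["ligand_rmsd"]) < rmsd_cutoff]
    return sorted(stable, key=lambda p: p["dg_bind"])


# Toy data: per-frame ligand RMSD (Å) and MMGBSA energy (kcal/mol)
poses = [
    {"name": "s1_p1", "ligand_rmsd": [1.0, 1.4, 1.2], "dg_bind": -18.0},
    {"name": "s2_p3", "ligand_rmsd": [0.8, 1.1, 0.9], "dg_bind": -22.0},
    {"name": "s3_p7", "ligand_rmsd": [2.5, 4.0, 5.5], "dg_bind": -30.0},
]

ranked = rank_poses(poses)
print([p["name"] for p in ranked])
```

Note that the unstable pose `s3_p7` is discarded even though its toy energy is the most favorable, which is the point of applying the stability filter before the energy ranking.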

2.2. CGUI-IFD Successfully Generates Correct Protein–Ligand Binding Modes in 80% of the Test Cases

For a benchmark validation test, we collected a publicly available data set of 258 cross-docking protein–ligand pairs from the Schrodinger IFD-MD paper (Table S1).6 The structures are diverse and span across 41 different protein targets and 199 ligands. The cross-docking ligand pairs have been evaluated using Tanimoto similarity coefficients (Tc), showing a median Tc value of 0.04, which demonstrates that the ligand pairs are noncongeneric.
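For reference, the Tanimoto coefficient between two fingerprints A and B is |A ∩ B| / |A ∪ B|. The sketch below computes it on sets of "on" bits. The fingerprint type used for the benchmark is not stated in this section, so the bit sets here are arbitrary, and returning 0.0 for two empty fingerprints is a convention chosen for this example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as
    collections of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention for two empty fingerprints (assumption)
    return len(a & b) / len(union)


# Toy fingerprints (arbitrary bit indices)
fp_reference = {1, 2, 3, 4}
fp_query = {3, 4, 5, 6, 7, 8}

print(tanimoto(fp_reference, fp_query))  # 2 shared bits out of 8 distinct bits
```

A value near 0, such as the median of 0.04 reported above, means the two ligands of a pair share almost no fingerprint features.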

By general consensus, a protein–ligand binding mode prediction is successful when the ligand heavy-atom RMSD is within 2.5 Å of the native crystal structure after the protein structures are aligned. CGUI-IFD shows an 80% success rate for the 258 cross-docking cases (Figure 2). Rigid receptor docking was used as the control group, showing a 32% success rate. Schrodinger IFD-MD has shown the highest success rate to date with 85% in their first ranked results.6 Note that IFD-MD comprises 7 more sophisticated steps, including pharmacophore analysis, WScore docking, Glide, and Prime, which are proprietary technologies and thus are not freely available to most academic researchers. One apparent advantage of CGUI-IFD over Schrodinger IFD-MD is that our starting structure does not have to be a holo-protein. Schrodinger IFD-MD needs to start with a holo-protein in order to initialize the pharmacophore docking. Our approach does not use a pharmacophore analysis but instead uses template-based binding pocket refinement using holo-structures from the PDB library. Hence, we do not require a holo-protein to start CGUI-IFD; i.e., apo and homology model structures are all acceptable.

Figure 2.

Results for the 258 cross-docking cases using rigid docking with AutoDock Vina, CGUI-IFD, and Schrodinger IFD-MD. Successful cases are those with a ligand heavy-atom RMSD within 2.5 Å of the native crystal structure after the protein structures are aligned. Successful cases are counted from the top-ranked pose of each docking case.

In order to confirm the importance of using LBS-FR to generate additional receptor structures (i.e., s2, s3, and s4 in Figure 1), we calculated the number of successful cases only using the original crystal structure s1. In such cases, the success rate drops from 80% to 38% (97/258) (Table S1), indicating that generating multiple binding pocket conformations using LBS-FR is important to account for the induced fit effects in the majority of these cross-docking test cases. The recently published tinyIFD method showed that Vina was successful in producing the correct cross-docking poses within the top-5 ranked guesses in 39% of 369 cross-docking cases.9 In their test cases, only the original crystal structure (s1) was used. Their overall success rate of 39% is similar to our 38% success rate (within top-10 of s1 binding poses).

A representative cross-docking case of mitogen-activated protein kinase-2 (MK-2) is shown in Figure 3. It is a cross-docking between a receptor structure (PDB entry 2P3G) and ligand B97 from PDB entry 3FYJ, where conventional rigid receptor docking produces an incorrect ligand-binding mode with a ligand RMSD of 8.9 Å to the native crystal structure (Figure 3A). It is clear that residue D142 clashes with the native ligand-binding mode, as the distance between the D142 side chain and the methylpyridine group of the ligand is 1.4 Å. In its holo crystal structure (PDB entry 2P3G), the receptor is bound to a different ligand, F10, with a fluorophenyl group next to D142 at a distance of 4.2 Å (Figure 3A). The side chain conformation of D142 was suitable to interact with ligand F10, but not with ligand B97 in this cross-docking, resulting in the wrong ligand-binding mode predicted from rigid docking. Through binding-site structure refinement, a better receptor conformation is obtained using the template PDB 2R0U (sequence identity of 25% to the receptor protein), and a correct ligand-binding mode is produced with a ligand RMSD of 1.3 Å to the crystal structure (Figure 3B). In this refined structure, the D142 side chain is positioned 4.4 Å away from the ligand, allowing the correct ligand-binding mode to be obtained. As mentioned previously, the CGUI-IFD workflow generates 40 protein–ligand structures from 4 different receptor conformations. These structures were first filtered using their binding stability metric, i.e., their average MD ligand RMSD. In this case, 13 out of 40 had an average MD ligand RMSD less than 3 Å. These 13 structures underwent MMGBSA protein–ligand binding affinity calculations, and the top-scoring structure is shown in Figure 3B with a binding affinity of −25.1 kcal/mol.

Figure 3.

Case study of mitogen-activated protein kinase-2 (MK-2) receptor PDB entry 2P3G (orange) and ligand B97 PDB entry 3FYJ (green). (A) Without refinement of PDB 2P3G, rigid docking (orange) using AutoDock Vina produces a ligand RMSD of 8.9 Å relative to the native crystal structure. The main culprit is residue D142 that clashes with the methylpyridine group of the native ligand. The native ligand F10 in PDB 2P3G is shown in gray. (B) Top ranked refined structure of PDB 2P3G (cyan) yields a ligand RMSD of 1.3 Å relative to the native crystal structure. Residue D142 is refined and away from the native ligand.

2.3. Many Failed Cases Could Be Rescued by Increasing the Number of Templates to Generate More Variations in Receptor Conformations

CGUI-IFD is developed to generate reliable protein–ligand binding modes even for the induced fit cases. In our initial test, we showed that using 3 templates was satisfactory (80% success rate in Figure 2) to produce an ensemble of different receptor conformations. Can we do better for the 20% of failed cases if we increase the number of templates, and thus the receptor structure variations, up to 10? To test this, we selected 10 failed cases from different protein target classes (Table 1). Using the same workflow except with 10 template protein conformations instead of 3 (i.e., N = 10 and M = 10 in Figure 1), 8 out of 10 cases could be rescued and produced the correct ligand-binding mode.

Table 1. Summary of 10 Representative Cases: No Success with 3 Templates but Success in 8 out of 10 with 10 Templates.

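Columns S1–S10 give the best ligand RMSD relative to the crystal structure (Å) for each receptor structure.

| No | Class | Receptor | Ligand | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | afab | 4nns | 3fr5 | 4.44 | 4.86 | 3.26 | 5.39 | 5.58 | 5.54 | 3.09 | 2.19 | 4.37 | 4.93 |
| 2 | ask1 | 4bhn | 4bid | 6.16 | 7.06 | 6.38 | 3.78 | 9.16 | 6.39 | 7.55 | 1.39 | 7.12 | 5.24 |
| 3 | dhodh | 3kvk | 3kvl | 5.75 | 8.21 | 5.61 | 5.91 | 9.12 | 2.21 | 7.27 | 9.05 | 2.57 | 7.79 |
| 4 | dpp4 | 1n1m | 2aj8 | 6.47 | 5.69 | 7.35 | 5.39 | 6.33 | 5.19 | 6.46 | 6.8 | 5.47 | 2.35 |
| 5 | fviia | 2a2q | 1wss | 5.65 | 10.60 | 11.9 | 3.55 | 12.1 | 8.38 | 8.57 | 8.77 | 7.89 | 4.24 |
| 6 | fxa | 1xka | 1mq5 | 9.06 | 8.55 | 5.44 | 7.55 | 10.6 | 2.18 | 7.38 | 8.25 | 7.14 | 9.71 |
| 7 | hivrt | 2b5j | 1s1t | 3.66 | 6.55 | 2.91 | 5.35 | 2.45 | 6.37 | 6.29 | 3.66 | 11.4 | 3.66 |
| 8 | hsp70 | 3fzh | 3m3z | 8.66 | 5.22 | 7.55 | 9.54 | 9.2 | 8.04 | 7.47 | 8.81 | 8.51 | 8.74 |
| 9 | jnk | 2o2u | 1pmv | 4.71 | 4.79 | 2.95 | 4.79 | 2.52 | 5.07 | 10.3 | 1.65 | 3.66 | 4.08 |
| 10 | lfa1 | 2ica | 3bqn | 5.54 | 6.01 | 4.55 | 9.02 | 6.24 | 6.95 | 6.06 | 6.69 | 2.44 | 6.59 |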
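The counts in the Table 1 caption can be reproduced directly from the tabulated values. The sketch below counts a case as successful when the best ligand RMSD over the first n receptor structures is within 2.5 Å; treating the per-structure best RMSD as the success criterion is a simplification for illustration, since the workflow itself selects poses by binding stability and MMGBSA energy.

```python
SUCCESS_CUTOFF = 2.5  # Å, success threshold used throughout the paper

# Best ligand RMSD (Å) for structures S1..S10, copied from Table 1
TABLE1 = {
    "afab":  [4.44, 4.86, 3.26, 5.39, 5.58, 5.54, 3.09, 2.19, 4.37, 4.93],
    "ask1":  [6.16, 7.06, 6.38, 3.78, 9.16, 6.39, 7.55, 1.39, 7.12, 5.24],
    "dhodh": [5.75, 8.21, 5.61, 5.91, 9.12, 2.21, 7.27, 9.05, 2.57, 7.79],
    "dpp4":  [6.47, 5.69, 7.35, 5.39, 6.33, 5.19, 6.46, 6.8, 5.47, 2.35],
    "fviia": [5.65, 10.60, 11.9, 3.55, 12.1, 8.38, 8.57, 8.77, 7.89, 4.24],
    "fxa":   [9.06, 8.55, 5.44, 7.55, 10.6, 2.18, 7.38, 8.25, 7.14, 9.71],
    "hivrt": [3.66, 6.55, 2.91, 5.35, 2.45, 6.37, 6.29, 3.66, 11.4, 3.66],
    "hsp70": [8.66, 5.22, 7.55, 9.54, 9.2, 8.04, 7.47, 8.81, 8.51, 8.74],
    "jnk":   [4.71, 4.79, 2.95, 4.79, 2.52, 5.07, 10.3, 1.65, 3.66, 4.08],
    "lfa1":  [5.54, 6.01, 4.55, 9.02, 6.24, 6.95, 6.06, 6.69, 2.44, 6.59],
}


def successful_cases(table, n_structures, cutoff=SUCCESS_CUTOFF):
    """Names of cases whose best RMSD over the first n structures is within the cutoff."""
    return [name for name, rmsds in table.items()
            if min(rmsds[:n_structures]) <= cutoff]


with_four = successful_cases(TABLE1, 4)    # S1-S4, as in the original 3-template run
with_ten = successful_cases(TABLE1, 10)    # S1-S10
still_failed = sorted(set(TABLE1) - set(with_ten))

print(len(with_four), len(with_ten), still_failed)
```

With S1–S4 none of the ten cases reaches 2.5 Å, while S1–S10 rescues eight, leaving fviia and hsp70 as the two persistent failures.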

Although most cases were rescued, 2 cross-docking cases stubbornly failed to reproduce the native crystal structure conformation. The first failed case is a protease, coagulation factor VIIa (fviia), where ligand 3CB from PDB entry 1WSS is docked into receptor structure 2A2Q. The persistent problem with this cross-docking case arises from the large number of rotatable bonds (18) of ligand 3CB. In many cross-docking attempts with relatively stable binding stability, the solvent-exposed carboxybenzyl and the isoleucyl groups were flipped, resulting in an incorrect ligand-binding mode compared to the crystal structure (Figure 4A). The second failed case is cross-docking of receptor PDB 3FZH (heat shock cognate 71 kDa protein (hsp70)) and ligand 3F5 from PDB 3M3Z. The main problem with this docking is the intricate hydrogen bond network at the active site formed by residues R272-E268 and R342-D366, which together generate a narrow binding pocket in which it is difficult to fit the adenosine group of the ligand 3F5 (Figure 4B). In the top structure from CGUI-IFD, residue R272 clashes with the adenosine group of the ligand 3F5, resulting in an incorrect binding mode with the adenosine facing away from the binding pocket.

Figure 4.

Unsuccessful cross-docking cases. (A) Coagulation factor VIIa (fviia) with ligand 3CB from PDB entry 1WSS (green) docked into receptor 2A2Q (cyan). The solvent-exposed carboxybenzyl and isoleucyl groups are flipped, resulting in an incorrect binding mode predicted by CGUI-IFD. (B) Heat shock protein (hsp70) receptor 3FZH (cyan) and ligand 3F5 from PDB entry 3M3Z (green). The top CGUI-IFD structure shows residue R272 clashing with the adenosine ring of ligand 3F5, resulting in an incorrect structure.

3. Conclusions

Cross-docking has proven to be challenging in reproducing the correct ligand-binding mode due to the induced fit effects. To overcome this challenge, we have developed a straightforward CGUI-IFD workflow that combines two CHARMM-GUI modules (LBS Finder & Refiner and High-Throughput Simulator) to generate various receptor conformations through MD simulations with restraints from templates, perform rigid docking, and perform high-throughput explicit solvent MD simulations to identify the correct candidate pose using ligand RMSD and MMGBSA binding energy. It has been validated using a publicly available large data set of 258 cross-docking cases with an 80% success rate.

To date, there are very few induced fit docking approaches that have been systematically tested, including Schrodinger IFD-MD, tinyIFD, CIFDock, and Fleksy.6–9 Here, CGUI-IFD offers a new induced fit docking workflow that has been successfully tested with the same data set from Schrodinger, providing an alternative method for academic researchers. Note that, although we have used holoprotein structures for our benchmark test, unlike Schrodinger IFD-MD,6 CGUI-IFD is not limited to holoprotein receptors as inputs for docking. CGUI-IFD also works with apo and homology model receptor structures. We have shown previously that LBS Finder & Refiner can generate reliable holo receptor structures from apo and homology model structures.21,22 CGUI-IFD is also different from conventional MD-ensemble docking in that it is a template-based method using an available library of holoprotein structures to generate an ensemble of receptor structures. MD-ensemble docking is stochastic, and correctly identifying the relevant receptor structures for docking from it has been shown to be challenging.12 Another advantage of CGUI-IFD is that researchers are not limited to using a specific MD program for their simulations. CGUI-IFD supports various MD simulation programs, including NAMD, AMBER, GROMACS, OpenMM, and GENESIS.31–35 In addition, CGUI-IFD provides 3 ligand force fields (CGenFF, GAFF2, and OpenFF) for ligand parametrization in high-throughput simulations.36–38

CGUI-IFD is a robust and reliable method to solve the induced fit docking problem. An obvious advantage of CGUI-IFD is that it circumvents the expensive process of experimentally solving multiple protein–ligand complex structures for a given promiscuous receptor that can bind to many different small molecules. We expect that CGUI-IFD will have a significant impact on improving structure-based drug discovery by providing reliable protein–ligand structures for a large number of compounds.

4. Methods

4.1. Computational Details

Validation tests were conducted on 258 publicly available cross-docking protein–ligand pairs from the Schrodinger data set (Table S1).6 The control group result was obtained using rigid receptor docking with AutoDock Vina and selecting only the top scored docking output for each protein–ligand pair. Vina is a freely available and widely used academic program for molecular docking that has been evaluated and shown to perform as well as or better than many commercial programs, including LigandFit, Glide, MOE Dock, and Surflex-Dock.39,40 Therefore, we chose to incorporate Vina into the CGUI-IFD workflow. Each receptor structure contains a ligand that is used to determine the search space for ligand docking. A cubic search box with 22.5 Å edges was used for each docking, based on the optimized search space recommended by Vina.
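A docking box of this kind can be written as an AutoDock Vina configuration file. The sketch below is a minimal illustration under two assumptions: the box is centered on the geometric center of the ligand already present in the receptor structure (the text says only that this ligand determines the search space), and `num_modes` is set to 10 to match the top 10 poses collected per receptor in the workflow. The file names are placeholders.

```python
BOX_EDGE = 22.5  # Å, cubic search box edge quoted in the text


def centroid(coords):
    """Geometric center of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def vina_config(receptor, ligand, center, edge=BOX_EDGE, num_modes=10):
    """Build the text of a Vina config file with a cubic search box."""
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {edge}",
        f"size_y = {edge}",
        f"size_z = {edge}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines)


# Toy coordinates of the co-crystallized ligand (made up for illustration)
bound_ligand_xyz = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
config_text = vina_config("receptor.pdbqt", "ligand.pdbqt", centroid(bound_ligand_xyz))
print(config_text)
```

The resulting text would be saved to a file and passed to Vina with its `--config` option; for the control group only the top-scored pose would be kept.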

OpenMM MD simulations of proteins in solution were conducted using 1 GTX 1080TI GPU (graphics processing unit) and 1 CPU (central processing unit) or 1 RTX 2080TI GPU and 1 CPU.34 The simulation speeds for the smallest and largest systems are listed in Table 2. In CGUI-IFD, each cross-docking case starts with 3 MD simulation runs with restraints for binding-site structure refinement (i.e., s2, s3, and s4 in Figure 1). These can be run in parallel, where each simulation is set to 50 ns and takes 6.05 h (using a GTX 1080TI) or 2.90 h (using an RTX 2080TI) for the smallest systems (Table 2). Following this, 40 MD simulations of 50 ns are conducted in parallel for the top 10 binding poses from each receptor structure s1, s2, s3, and s4, and each one takes 5.35 h (using a GTX 1080TI) or 2.15 h (using an RTX 2080TI) for the smallest systems (Table 2).

The CGUI-IFD workflow started with the generation of an ensemble of receptor structures to account for the induced fit effect. For each target receptor structure, we performed all-atom MD simulations in explicit solvent using the distance restraint potentials from each of 3 LBS templates obtained using LBS-FR.16 The templates were collected from nonhomologous proteins with a sequence identity less than 30% to the receptor proteins. Templates used for each receptor are listed in Table S1. Each receptor was solvated in a cubic box with TIP3P water and neutralized with 0.15 M KCl. The particle-mesh Ewald summation was used to calculate the long-range electrostatic interactions.41 The force-based switching method was used to truncate the van der Waals interactions between 10 and 12 Å.42 Hydrogen atoms were constrained using the SHAKE algorithm.43 The CHARMM36m force field was used for all simulations.44 Each system was minimized using the steepest descent method for 5000 steps followed by a 125 ps equilibration run. The production run was performed for 50 ns, and the final frame was used as the refined receptor structure (i.e., s2, s3, and s4 in Figure 1). The structure s1 is the original unrefined crystal structure, included to account for receptor structures that do not undergo significant conformational changes during cross-docking.

Table 2. OpenMM MD Simulation Speed for the Smallest and Largest Systems in This Study.

| No. | Target PDB | Number of residues | Number of atoms | HPC system GPU | LBS-FR speed (hours/ns)a | HTS speed (hours/ns)a |
|---|---|---|---|---|---|---|
| 1 | 3jzk | 96 | 26,751 | GTX 1080TI | 0.121 | 0.107 |
| 2 | 3w69 | 96 | 26,715 | RTX 2080TI | 0.058 | 0.043 |
| 3 | 1n1m | 728 | 105,264 | GTX 1080TI | 0.415 | 0.405 |
| 4 | 2iiv | 728 | 105,240 | RTX 2080TI | 0.203 | 0.197 |

a Simulation speed in real time, hours per nanosecond. Note that the speed difference between LBS-FR and HTS partly arises from the additional distance restraints used in LBS-FR.
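The wall-clock times quoted in the text follow from the Table 2 speeds multiplied by the 50 ns simulation length. The sketch below reproduces them and also adds up the GPU-hours for one cross-docking case (3 refinement runs plus 40 pose simulations); that total is a derived figure for illustration and is not reported in the paper.

```python
SIM_LENGTH_NS = 50  # ns per simulation, as stated in the text

# (target PDB, GPU) -> (LBS-FR speed, HTS speed) in hours/ns, copied from Table 2
SPEED_HOURS_PER_NS = {
    ("3jzk", "GTX 1080TI"): (0.121, 0.107),
    ("3w69", "RTX 2080TI"): (0.058, 0.043),
    ("1n1m", "GTX 1080TI"): (0.415, 0.405),
    ("2iiv", "RTX 2080TI"): (0.203, 0.197),
}


def wall_clock_hours(speed_hours_per_ns, length_ns=SIM_LENGTH_NS):
    """Wall-clock hours for one simulation of the given length."""
    return speed_hours_per_ns * length_ns


def gpu_hours_per_case(lbs_speed, hts_speed, n_refinements=3, n_poses=40):
    """Total GPU-hours for one cross-docking case (derived, not from the paper)."""
    return (n_refinements * wall_clock_hours(lbs_speed)
            + n_poses * wall_clock_hours(hts_speed))


for (pdb, gpu), (lbs, hts) in SPEED_HOURS_PER_NS.items():
    print(f"{pdb} on {gpu}: refinement {wall_clock_hours(lbs):.2f} h, "
          f"pose MD {wall_clock_hours(hts):.2f} h, "
          f"case total {gpu_hours_per_case(lbs, hts):.2f} GPU-hours")
```

Because the 3 refinement runs and the 40 pose simulations can each be run in parallel, the elapsed time per case on a sufficiently large cluster is closer to one refinement run plus one pose simulation than to the GPU-hour total.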

Rigid receptor docking was then performed for each receptor structure: s1, s2, s3, and s4. Using 22.5 Å cubic boxes at the binding sites, the top 10 ligand binding poses were obtained for each receptor structure (i.e., p1-p10 in Figure 1). For each protein–ligand cross-docking case, a total of 40 protein–ligand structures were submitted to HTS to obtain 40 systems and input files for all-atom MD simulations in explicit solvent. A 50 ns production run was performed for each system in the NPT (constant particle number, pressure, and temperature) ensemble at 303.15 K and 1 bar. CHARMM36m and CGenFF were used for the proteins and ligands, respectively.36,44 All simulations were conducted using the OpenMM package.34 Additionally, the production runs of all simulations used a time step of 4 fs with hydrogen mass repartitioning (HMR).45,46

4.2. RMSD/MMGBSA Calculation Details

Ranking of the 40 protein–ligand structures for each cross-docking case was first determined by calculating the ligand RMSD with respect to the initial ligand structure after all protein heavy atoms were superimposed on the initial protein structure throughout the MD trajectory. Because only the protein is superimposed, this calculation captures the ligand's translation and rotation relative to its initial pose during the MD simulation. The average ligand RMSD value over the 50 ns MD simulation was obtained for each candidate pose. Only candidate poses with good binding stability, i.e., an average ligand RMSD less than 3 Å, were selected for the subsequent MMGBSA ligand binding energy (ΔGbind) calculations using AMBER20.
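The ligand RMSD described here is computed without fitting the ligand itself. A minimal sketch, assuming every frame has already been superimposed on the initial structure using the protein heavy atoms (that superposition step is not shown), and using made-up coordinates:

```python
import math


def ligand_rmsd(reference, current):
    """RMSD between two ligand coordinate sets with no ligand fitting,
    so rigid-body translation and rotation of the ligand contribute."""
    if len(reference) != len(current):
        raise ValueError("coordinate sets must contain the same atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(reference, current))
    return math.sqrt(squared / len(reference))


def average_ligand_rmsd(reference, frames):
    """Average of the per-frame ligand RMSD over a trajectory."""
    return sum(ligand_rmsd(reference, frame) for frame in frames) / len(frames)


# Toy two-atom ligand: the second frame is the initial pose shifted by 2 Å along z
initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]

print(ligand_rmsd(initial, shifted))
print(average_ligand_rmsd(initial, [initial, shifted]))
```

A pure 2 Å translation gives an RMSD of 2 Å, which would be invisible if the ligand were fitted onto itself before the calculation.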

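$$\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left( G_{\mathrm{receptor}} + G_{\mathrm{ligand}} \right) \tag{2}$$

where ΔGbind is decomposed into several energy terms:

$$\Delta G_{\mathrm{bind}} = \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{sol}} - T\Delta S \tag{3}$$

$$\Delta E_{\mathrm{MM}} = \Delta E_{\mathrm{elec}} + \Delta E_{\mathrm{vdw}} \tag{4}$$

$$\Delta G_{\mathrm{sol}} = \Delta G_{\mathrm{GB}} + \Delta G_{\mathrm{Surf}} \tag{5}$$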

ΔEMM, ΔGsol, and −TΔS are changes in the gas-phase molecular mechanics (MM) energy, solvation free energy, and conformational entropy upon binding, respectively. The solvation free energy, ΔGsol, is the sum of the nonpolar energy (ΔGSurf) and polar energy (ΔGGB) terms. The nonpolar energy was estimated using the solvent-accessible surface area (SASA), and the polar contribution was calculated using the GB model. The default solvent dielectric constant of 78.5 and a protein interior dielectric constant of 1 were applied. The OBC GB model with igb = 5 was selected.47,48 The conformational entropy change (−TΔS) was neglected in this calculation. ΔGbind, therefore, was the sum of the electrostatic (ΔEelec), van der Waals (ΔEvdw), nonpolar solvation, and GB polar solvation terms. A total of 100 snapshots extracted from 50 ns simulations were used to estimate ΔGbind. The remaining candidate poses were ranked using their ΔGbind values. MMGBSA analyses were performed using trajectories converted from OpenMM to the AMBER file format using CHARMM-GUI Force Field Converter.49
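Putting the terms together, the estimate reduces to averaging ΔEelec + ΔEvdw + ΔGGB + ΔGSurf over the extracted snapshots, with −TΔS neglected. The sketch below shows that bookkeeping with made-up per-snapshot energies; it is not a substitute for the AMBER calculation, and the even spacing of the 100 snapshots over 50 ns is an assumption.

```python
from statistics import mean

N_SNAPSHOTS = 100   # snapshots per trajectory, as stated in the text
SIM_LENGTH_NS = 50  # ns


def dg_bind_snapshot(e_elec, e_vdw, g_gb, g_surf):
    """Single-snapshot binding energy (kcal/mol); the entropy term is neglected."""
    return e_elec + e_vdw + g_gb + g_surf


def dg_bind(snapshots):
    """Average binding energy over (e_elec, e_vdw, g_gb, g_surf) tuples."""
    return mean(dg_bind_snapshot(*terms) for terms in snapshots)


def snapshot_times(length_ns=SIM_LENGTH_NS, n=N_SNAPSHOTS):
    """Evenly spaced snapshot times in ns (even spacing is an assumption)."""
    stride = length_ns / n
    return [stride * (i + 1) for i in range(n)]


# Toy energies in kcal/mol (made up for illustration)
toy_snapshots = [
    (-30.0, -40.0, 50.0, -5.0),
    (-28.0, -42.0, 52.0, -5.0),
]

print(dg_bind(toy_snapshots))
print(snapshot_times()[:3])
```

The favorable gas-phase terms are largely offset by the polar solvation penalty, which is why the net values are much smaller in magnitude than the individual components.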

Acknowledgments

This work has been supported by NIH GM138472.

Data Availability Statement

All the input files (receptors and ligands) and their topology and parameter files are available on GitHub: https://github.com/hsg218/CGUI-IFD. All the top ranked protein–ligand complex structures from CGUI-IFD are also available in the same GitHub repository. All software used for this study is available through the following links: LBS Finder & Refiner (https://www.charmm-gui.org/?doc=input/lbsfinder), AutoDock Vina (https://Vina.scripps.edu/downloads/), and HTS (https://www.charmm-gui.org/?doc=input/hts).

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c00416.

  • Table S1, PDB IDs of the benchmark set (PDF)

Author Contributions

Hugo Guterres: Data curation; formal analysis; investigation; validation; writing-original draft. Wonpil Im: Conceptualization; funding acquisition; investigation; formal analysis; supervision; writing-review and editing.

The authors declare the following competing financial interest(s): W.I. is the co-founder and CEO of MolCube INC.

Supplementary Material

ci3c00416_si_001.pdf (138.1KB, pdf)

References

  1. Aplin C.; Milano S. K.; Zielinski K. A.; Pollack L.; Cerione R. A. Evolving Experimental Techniques for Structure-Based Drug Design. J. Phys. Chem. B 2022, 126, 6599–6607. 10.1021/acs.jpcb.2c04344.
  2. Willems H.; De Cesco S.; Svensson F. Computational Chemistry on a Budget: Supporting Drug Discovery with Limited Resources. J. Med. Chem. 2020, 63, 10158–10169. 10.1021/acs.jmedchem.9b02126.
  3. Lauria A.; Tutone M.; Almerico A. M. Virtual lock-and-key approach: the in silico revival of Fischer model by means of molecular descriptors. Eur. J. Med. Chem. 2011, 46, 4274–4280. 10.1016/j.ejmech.2011.06.033.
  4. Antunes D. A.; Devaurs D.; Kavraki L. E. Understanding the challenges of protein flexibility in drug design. Expert Opin. Drug Discov. 2015, 10, 1301–1313. 10.1517/17460441.2015.1094458.
  5. Koshland D. E. Application of a Theory of Enzyme Specificity to Protein Synthesis. Proc. Natl. Acad. Sci. U. S. A. 1958, 44, 98–104. 10.1073/pnas.44.2.98.
  6. Miller E. B.; Murphy R. B.; Sindhikara D.; Borrelli K. W.; Grisewood M. J.; Ranalli F.; Dixon S. L.; Jerome S.; Boyles N. A.; Day T.; Ghanakota P.; Mondal S.; Rafi S. B.; Troast D. M.; Abel R.; Friesner R. A. Reliable and Accurate Solution to the Induced Fit Docking Problem for Protein-Ligand Binding. J. Chem. Theory Comput. 2021, 17, 2630–2639. 10.1021/acs.jctc.1c00136.
  7. Vankayala S. L.; Warrensford L. C.; Pittman A. R.; Pollard B. C.; Kearns F. L.; Larkin J. D.; Woodcock H. L. CIFDock: A novel CHARMM-based flexible receptor-flexible ligand docking protocol. J. Comput. Chem. 2022, 43, 84–95. 10.1002/jcc.26759.
  8. Wagener M.; Vlieg J.; Nabuurs S. B. Flexible Protein-Ligand Docking using the Fleksy Protocol. J. Comput. Chem. 2012, 33, 1215–1217. 10.1002/jcc.22948.
  9. Hsu D. J.; Davidson R. B.; Sedova A.; Glaser J. tinyIFD: A High-Throughput Binding Pose Refinement Workflow Through Induced-Fit Ligand Docking. J. Chem. Inf. Model. 2023, 63, 3438. 10.1021/acs.jcim.2c01530.
  10. Lin J. H.; Perryman A. L.; Schames J. R.; McCammon J. A. Computational drug design accommodating receptor flexibility: the relaxed complex scheme. J. Am. Chem. Soc. 2002, 124, 5632–5633. 10.1021/ja0260162.
  11. De Vivo M.; Masetti M.; Bottegoni G.; Cavalli A. Role of Molecular Dynamics and Related Methods in Drug Discovery. J. Med. Chem. 2016, 59, 4035–4061. 10.1021/acs.jmedchem.5b01684.
  12. Evangelista Falcon W.; Ellingson S. R.; Smith J. C.; Baudry J. Ensemble Docking in Drug Discovery: How Many Protein Configurations from Molecular Dynamics Simulations are Needed To Reproduce Known Ligand Binding? J. Phys. Chem. B 2019, 123, 5189–5195. 10.1021/acs.jpcb.8b11491.
  13. Jo S.; Kim T.; Iyer V. G.; Im W. CHARMM-GUI: a web-based graphical user interface for CHARMM. J. Comput. Chem. 2008, 29, 1859–1865. 10.1002/jcc.20945.
  14. Lee J.; Cheng X.; Swails J. M.; Yeom M. S.; Eastman P. K.; Lemkul J. A.; Wei S.; Buckner J.; Jeong J. C.; Qi Y.; Jo S.; Pande V. S.; Case D. A.; Brooks C. L. 3rd; MacKerell A. D. Jr.; Klauda J. B.; Im W. CHARMM-GUI Input Generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM Simulations Using the CHARMM36 Additive Force Field. J. Chem. Theory Comput. 2016, 12, 405–413. 10.1021/acs.jctc.5b00935.
  15. Jo S.; Cheng X.; Lee J.; Kim S.; Park S. J.; Patel D. S.; Beaven A. H.; Lee K. I.; Rui H.; Park S.; Lee H. S.; Roux B.; MacKerell A. D. Jr.; Klauda J. B.; Qi Y.; Im W. CHARMM-GUI 10 years for biomolecular modeling and simulation. J. Comput. Chem. 2017, 38, 1114–1124. 10.1002/jcc.24660.
  16. Guterres H.; Park S. J.; Zhang H.; Im W. CHARMM-GUI LBS Finder & Refiner for Ligand Binding Site Prediction and Refinement. J. Chem. Inf. Model. 2021, 61, 3744–3751. 10.1021/acs.jcim.1c00561.
  17. Guterres H.; Park S.-J.; Zhang H.; Perone T.; Kim J.; Im W. CHARMM-GUI high-throughput simulator for efficient evaluation of protein-ligand interactions with different force fields. Protein Sci. 2022, 31, e4413. 10.1002/pro.4413.
  18. Lee H. S.; Im W. G-LoSA: An efficient computational tool for local structure-centric biological studies and drug design. Protein Sci. 2016, 25, 865–876. 10.1002/pro.2890.
  19. Lee H. S.; Im W. G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures. Methods Mol. Biol. 2017, 1611, 97–108. 10.1007/978-1-4939-7015-5_8.
  20. Derigs U. The shortest augmenting path method for solving assignment problems - motivation and computational experience. Annals of Operations Research 1985, 4, 57–102. 10.1007/BF02022037.
  21. Guterres H.; Lee H. S.; Im W. Ligand-Binding-Site Structure Refinement Using Molecular Dynamics with Restraints Derived from Predicted Binding Site Templates. J. Chem. Theory Comput. 2019, 15, 6524–6535. 10.1021/acs.jctc.9b00751.
  22. Guterres H.; Park S. J.; Jiang W.; Im W. Ligand-Binding-Site Refinement to Generate Reliable Holo Protein Structure Conformations from Apo Structures. J. Chem. Inf. Model. 2021, 61, 535–546. 10.1021/acs.jcim.0c01354.
  23. Zhang J.; Li H.; Zhao X.; Wu Q.; Huang S. Y. Holo Protein Conformation Generation from Apo Structures by Ligand Binding Site Refinement. J. Chem. Inf. Model. 2022, 62, 5806–5820. 10.1021/acs.jcim.2c00895.
  24. Lee H. S.; Lee C. S.; Kim J. S.; Kim D. H.; Choe H. Improving virtual screening performance against conformational variations of receptors by shape matching with ligand binding pocket. J. Chem. Inf. Model. 2009, 49, 2419–2428. 10.1021/ci9002365.
  25. Erickson J. A.; Jalaie M.; Robertson D. H.; Lewis R. A.; Vieth M. Lessons in molecular recognition: The effects of ligand and protein flexibility on molecular docking accuracy. J. Med. Chem. 2004, 47, 45–55. 10.1021/jm030209y.
  26. McGovern S. L.; Shoichet B. K. Information decay in molecular docking screens against holo, apo, and modeled conformations of enzymes. J. Med. Chem. 2003, 46, 2895–2907. 10.1021/jm0300330.
  27. Clark J. J.; Benson M. L.; Smith R. D.; Carlson H. A. Inherent versus induced protein flexibility: Comparisons within and between apo and holo structures. PLoS Comput. Biol. 2019, 15, e1006705. 10.1371/journal.pcbi.1006705.
  28. Rastelli G.; Degliesposti G.; Del Rio A.; Sgobba M. Binding estimation after refinement, a new automated procedure for the refinement and rescoring of docked ligands in virtual screening. Chem. Biol. Drug Des. 2009, 73, 283–286. 10.1111/j.1747-0285.2009.00780.x.
  29. Ramirez D.; Caballero J. Is It Reliable to Take the Molecular Docking Top Scoring Position as the Best Solution without Considering Available Structural Data? Molecules 2018, 23, 1038. 10.3390/molecules23051038.
  30. Wagner J. R.; Churas C. P.; Liu S.; Swift R. V.; Chiu M.; Shao C.; Feher V. A.; Burley S. K.; Gilson M. K.; Amaro R. E. Continuous Evaluation of Ligand Protein Predictions: A Weekly Community Challenge for Drug Docking. Structure 2019, 27, 1326–1335. 10.1016/j.str.2019.05.012.
  31. Phillips J. C.; Hardy D. J.; Maia J. D. C.; Stone J. E.; Ribeiro J. V.; Bernardi R. C.; Buch R.; Fiorin G.; Henin J.; Jiang W.; McGreevy R.; Melo M. C. R.; Radak B. K.; Skeel R. D.; Singharoy A.; Wang Y.; Roux B.; Aksimentiev A.; Luthey-Schulten Z.; Kale L. V.; Schulten K.; Chipot C.; Tajkhorshid E. Scalable molecular dynamics on CPU and GPU architectures with NAMD. J. Chem. Phys. 2020, 153, 044130. 10.1063/5.0014475.
  32. Case D. A.; Cheatham T. E. 3rd; Darden T.; Gohlke H.; Luo R.; Merz K. M. Jr.; Onufriev A.; Simmerling C.; Wang B.; Woods R. J. The Amber biomolecular simulation programs. J. Comput. Chem. 2005, 26, 1668–1688. 10.1002/jcc.20290.
  33. Hess B.; Kutzner C.; van der Spoel D.; Lindahl E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435–447. 10.1021/ct700301q.
  34. Eastman P.; Swails J.; Chodera J. D.; McGibbon R. T.; Zhao Y.; Beauchamp K. A.; Wang L. P.; Simmonett A. C.; Harrigan M. P.; Stern C. D.; Wiewiora R. P.; Brooks B. R.; Pande V. S. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 2017, 13, e1005659. 10.1371/journal.pcbi.1005659.
  35. Jung J.; Mori T.; Kobayashi C.; Matsunaga Y.; Yoda T.; Feig M.; Sugita Y. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2015, 5, 310–323. 10.1002/wcms.1220.
  36. Vanommeslaeghe K.; Hatcher E.; Acharya C.; Kundu S.; Zhong S.; Shim J.; Darian E.; Guvench O.; Lopes P.; Vorobyov I.; Mackerell A. D. Jr. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 2010, 31, 671–690. 10.1002/jcc.21367.
  37. Wang J.; Wolf R. M.; Caldwell J. W.; Kollman P. A.; Case D. A. Development and testing of a general amber force field. J. Comput. Chem. 2004, 25, 1157–1174. 10.1002/jcc.20035.
  38. Qiu Y.; Smith D. G. A.; Boothroyd S.; Jang H.; Hahn D. F.; Wagner J.; Bannan C. C.; Gokey T.; Lim V. T.; Stern C. D.; Rizzi A.; Tjanaka B.; Tresadern G.; Lucas X.; Shirts M. R.; Gilson M. K.; Chodera J. D.; Bayly C. I.; Mobley D. L.; Wang L. P. Development and Benchmarking of Open Force Field v1.0.0-the Parsley Small-Molecule Force Field. J. Chem. Theory Comput. 2021, 17, 6262–6280. 10.1021/acs.jctc.1c00571.
  39. Trott O.; Olson A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461. 10.1002/jcc.21334.
  40. Wang Z.; Sun H.; Yao X.; Li D.; Xu L.; Li Y.; Tian S.; Hou T. Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power. Phys. Chem. Chem. Phys. 2016, 18, 12964–12975. 10.1039/C6CP01555G.
  41. Essmann U.; Perera L.; Berkowitz M. L.; Darden T.; Lee H.; Pedersen L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577–8593. 10.1063/1.470117.
  42. Steinbach P. J.; Brooks B. R. New Spherical-Cutoff Methods for Long-Range Forces in Macromolecular Simulation. J. Comput. Chem. 1994, 15, 667–683. 10.1002/jcc.540150702.
  43. Barth E.; Kuczera K.; Leimkuhler B.; Skeel R. D. Algorithms for Constrained Molecular-Dynamics. J. Comput. Chem. 1995, 16, 1192–1209. 10.1002/jcc.540161003.
  44. Huang J.; Rauscher S.; Nawrocki G.; Ran T.; Feig M.; de Groot B. L.; Grubmuller H.; MacKerell A. D. Jr. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods 2017, 14, 71–73. 10.1038/nmeth.4067.
  45. Hopkins C. W.; Le Grand S.; Walker R. C.; Roitberg A. E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. J. Chem. Theory Comput. 2015, 11, 1864–1874. 10.1021/ct5010406.
  46. Gao Y.; Lee J.; Smith I. P. S.; Lee H.; Kim S.; Qi Y.; Klauda J. B.; Widmalm G.; Khalid S.; Im W. CHARMM-GUI Supports Hydrogen Mass Repartitioning and Different Protonation States of Phosphates in Lipopolysaccharides. J. Chem. Inf. Model. 2021, 61, 831–839. 10.1021/acs.jcim.0c01360.
  47. Su P. C.; Tsai C. C.; Mehboob S.; Hevener K. E.; Johnson M. E. Comparison of radii sets, entropy, QM methods, and sampling on MM-PBSA, MM-GBSA, and QM/MM-GBSA ligand binding energies of F. tularensis enoyl-ACP reductase (FabI). J. Comput. Chem. 2015, 36, 1859–1873. 10.1002/jcc.24011.
  48. Wang K. W.; Lee J.; Zhang H.; Suh D.; Im W. CHARMM-GUI Implicit Solvent Modeler for Various Generalized Born Models in Different Simulation Programs. J. Phys. Chem. B 2022, 126, 7354–7364. 10.1021/acs.jpcb.2c05294.
  49. Lee J.; Hitzenberger M.; Rieger M.; Kern N. R.; Zacharias M.; Im W. CHARMM-GUI supports the Amber force fields. J. Chem. Phys. 2020, 153, 035103. 10.1063/5.0012280.

Articles from Journal of Chemical Information and Modeling are provided here courtesy of American Chemical Society
