An NMR-based scoring function improves the accuracy of binding pose predictions by docking by two orders of magnitude

Julien Orts; Stefan Bartoschek; Christian Griesinger; Peter Monecke; Teresa Carlomagno

doi:10.1007/s10858-011-9590-5

2011 Dec 14;52(1):23–30. doi: 10.1007/s10858-011-9590-5

PMCID: PMC3266494 PMID: 22167466

Abstract

Low-affinity ligands can be efficiently optimized into high-affinity drug leads by structure based drug design when atomic-resolution structural information on the protein/ligand complexes is available. In this work we show that the use of a few, easily obtainable, experimental restraints improves the accuracy of the docking experiments by two orders of magnitude. The experimental data are measured in nuclear magnetic resonance spectra and consist of protein-mediated NOEs between two competitively binding ligands. The methodology can be widely applied as the data are readily obtained for low-affinity ligands in the presence of non-labelled receptor at low concentration. The experimental inter-ligand NOEs are efficiently used to filter and rank complex model structures that have been pre-selected by docking protocols. This approach dramatically reduces the degeneracy and inaccuracy of the chosen model in docking experiments, is robust with respect to inaccuracy of the structural model used to represent the free receptor and is suitable for high-throughput docking campaigns.

Electronic supplementary material

The online version of this article (doi:10.1007/s10858-011-9590-5) contains supplementary material, which is available to authorized users.

Keywords: NMR, INPHARMA, NOE, Docking, Drug design

Introduction

Structure based drug design (SBDD) has evolved over the last decades into a powerful tool for optimizing low molecular weight lead compounds into highly potent drugs (Rees et al. 2004). The principle of SBDD lies in the combination of different chemical moieties with the aim of obtaining a molecule that, while possessing the pharmacological properties necessary for a drug, is complementary in shape to the receptor-binding pocket. This process requires knowledge of the exact structure of the receptor/ligand complex, which is usually obtained by X-ray crystallography.

In the absence of structural information for the complex, SBDD relies on the generation of plausible docking models. However, docking protocols suffer from inaccuracies in the description of the interaction energies between the ligand and the target molecule and often fail to predict the correct interaction mode. This is particularly true when the docking experiments use low-definition or inaccurate target structures. This limitation of the docking approach is serious in view of the increasing gap between the number of newly identified protein sequences and the availability of structural information (The UniProt Consortium 2008; Berman et al. 2009). For proteins sharing more than 30% sequence identity with their homologous templates, computational methods provide models that are typically comparable to low-resolution experimental structures; when the sequence identity drops below 30%, the model accuracy decreases due to alignment errors (Kortagere and Ekins 2010; Katritch et al. 2010; Rai et al. 2010). These problems call for experimental data that could improve the performance of the docking scoring functions without requiring the difficult step of obtaining high-resolution structural information for the target.

In recent years, nuclear magnetic resonance (NMR) spectroscopy has taken an important role in the detection and structural characterization of low-affinity (micromolar to millimolar range) ligands that can be developed into high-affinity leads by SBDD (Pochapsky and Pochapsky 2001; Wyss et al. 2002; Van Dongen et al. 2002; Pellecchia et al. 2002). Transferred-NOEs (Ni and Scheraga 1994) and transferred-cross correlated relaxation rates (Carlomagno et al. 1999, 2003) provide the bioactive conformation of the ligand, while saturation transfer difference (STD) experiments (Mayer and Meyer 2001) reveal the ligand epitope. These approaches have the advantage of observing only the resonances of the ligands and of being applicable to protein targets of any size. Recently, we have developed an NMR-based methodology, INPHARMA (Interligand NOEs for PHARmacophore MApping) (Sanchez-Pedregal et al. 2005; Reese et al. 2007; Orts et al. 2008), which is able to reveal the relative, and in favourable cases even the absolute, binding mode of competitively binding, low-affinity ligands, with the sole requirement of a structural model of the apo-receptor. The relative binding mode of two ligands interacting competitively with a common receptor allows pharmacophore or ligand superimposition. This is an essential step in SBDD, guiding the synthetic combination of smaller ligands into a larger, higher affinity compound. The absolute binding mode of ligands to the receptor represents a higher level of knowledge that allows optimization of receptor/ligand interactions at an atomic level.

To demonstrate the efficacy of INPHARMA, we previously validated the methodology on a system consisting of protein kinase A (PKA) and two inhibitors of catalysis (Fig. S1), for which the binding modes determined by INPHARMA could be compared to existing crystal structures (Orts et al. 2008). This test established the value of INPHARMA and confirmed that the combination of tr-NOEs and INPHARMA NOEs, in the presence of a structural model for the apo-protein, allows discrimination between a few, very diverse docking modes (Orts et al. 2008).

Here, we demonstrate the use of INPHARMA data as a high-throughput scoring function for binding modes predicted by molecular docking. We show that INPHARMA allows a two-order of magnitude increase in accuracy with respect to state-of-the-art docking scoring functions and provides ligand binding modes at high resolution (down to less than 1 Å). In addition, we show that INPHARMA is also applicable to receptors whose apo-form is not a good representative of the holo-form or to receptors without an accurate structural representation. The easy availability of experimental INPHARMA data for target proteins of any size and nature makes INPHARMA a tool of choice to increase the reliability of docking models and to substantially speed up the process of structure-based drug design.

Experimental procedures

INPHARMA data

The INPHARMA data were measured on a mixture of the Chinese hamster Cα catalytic subunit of cyclic adenosine monophosphate (cAMP)-dependent protein kinase A (PKA) (25–30 μM), L₁ (150 μM) and L₂ (450 μM), as described previously (Orts et al. 2008). The values of the measured INPHARMA NOEs used in this work are listed in Table S1. NOESY spectra were recorded at two mixing times (τ_m = 300 and 600 ms) on an 800 MHz spectrometer (Bruker, Karlsruhe). One NOESY spectrum was recorded at a mixing time τ_m = 600 ms on a 900 MHz spectrometer (Bruker, Karlsruhe).

Molecular dynamics simulations

Molecular dynamics (MD) simulations of free PKA were performed with the software NAMD (Phillips et al. 2005) and the CHARMM force field (MacKerell et al. 1988), starting from the crystal structure 3DNE.pdb after removal of L₁. The protein was hydrated with a 5 Å layer of water molecules, and the water sphere was maintained with a spherical harmonic potential. Langevin dynamics was performed with a 2 fs time step using the SHAKE algorithm, without coupling the hydrogens to the thermal bath and with a damping coefficient g of 5 per picosecond. First, 30,000 steps of energy minimization were performed at 0 K using the conjugate gradient and line-search algorithm described in the NAMD manual. To sample a larger portion of the conformational space of the protein, we increased the temperature from an initial value of 0 K to a final value of 1,200 K, raising it by 50 K every 30,000 steps. Each final structure was minimized at a temperature of 0 K in 30,000 steps.
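As a rough tally of the heating protocol described above, the following sketch enumerates the temperature stages and the simulated time per stage. It assumes one stage of 30,000 steps at each temperature from 50 K to 1,200 K; the constant and function names are illustrative only, and the way the 700 frames were drawn from the trajectory is not modelled.

```python
# Assumed reading of the protocol: one 30,000-step stage per 50 K increment.
TIME_STEP_FS = 2          # integration time step (fs), from the text
STEPS_PER_STAGE = 30_000  # steps run before each temperature increase
T_FINAL_K = 1200          # final temperature (K)
T_INCREMENT_K = 50        # temperature increase per stage (K)


def heating_schedule():
    """Return a list of (temperature_K, first_step) tuples, one per stage."""
    temperatures = range(T_INCREMENT_K, T_FINAL_K + 1, T_INCREMENT_K)
    return [(t, i * STEPS_PER_STAGE) for i, t in enumerate(temperatures)]


def stage_length_ps():
    """Simulated time covered by one stage, in picoseconds."""
    return STEPS_PER_STAGE * TIME_STEP_FS / 1000.0


if __name__ == "__main__":
    schedule = heating_schedule()
    print(len(schedule), "stages of", stage_length_ps(), "ps each")
```

Under these assumptions the ramp comprises 24 stages of 60 ps each.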

Docking

PLANTS docking (Korb et al. 2009) was performed using the ChemPLP scoring function and default parameters. The crystal structures of PKA/L₁ and PKA/L₂ were aligned on the protein. A spherical definition of the binding pocket, centered at the center of mass of L₁ and with a radius of 11.2 Å, was used to restrict the sampling space. After pose clustering using a threshold of 0.5 Å, the best-ranked 200 poses were used for further evaluations.

SURFLEX utilizes an idealized binding-site ligand as a target to generate conjectural poses of molecules. The idealized binding-site ligand was calculated for each of the 700 protein structures generated by MD simulation and minimized. Ligands were docked into the protein to optimize the value of the Hammerhead scoring function. For the analysis, we selected the 10 best scoring poses for each protein structure. A similarity filter of 0.5 Å was applied to poses docked into the same protein structure, resulting in a final set of 4,636 and 4,758 unique complex structures for PKA/L₁ and PKA/L₂, respectively.

GLIDE requires preparing each protein target with the “preparation wizard” option. For each prepared protein, we generated a grid around the binding site that was used as the docking target. A simple precision docking run was performed for each of the 700 protein structures generated by MD simulation, producing 5 poses per protein structure per ligand (PKA/L₁ and PKA/L₂). As for the SURFLEX docking, a similarity filter of 0.5 Å (integrated in GLIDE) deleted redundant ligand poses within the same protein structure. This procedure resulted in a final set of 2,697 and 3,069 unique complex structures for PKA/L₁ and PKA/L₂, respectively.
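The 0.5 Å similarity filter used in the docking protocols above can be illustrated with a generic greedy scheme. This is a minimal sketch and not the filter implemented in PLANTS, SURFLEX or GLIDE: it assumes that poses are given best score first, that all poses list their atoms in the same order, and that no symmetry correction is needed. The function names are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """All-atom RMSD between two poses given as lists of (x, y, z) tuples."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def filter_redundant(poses, threshold=0.5):
    """Keep a pose only if it is at least `threshold` Å from every kept pose.

    `poses` is assumed to be sorted from best to worst score, so the
    best-scoring member of each family of similar poses is retained.
    """
    kept = []
    for pose in poses:
        if all(rmsd(pose, other) >= threshold for other in kept):
            kept.append(pose)
    return kept
```

The filter is applied separately to the poses of each protein model, so that similar ligand poses in two different protein models are both retained.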

Calculation of INPHARMA

The theoretical INPHARMA NOEs for the complex pairs generated by docking were calculated with a program written in-house following the theory developed in (Reese et al. 2007; Orts et al. 2009). Protons within 8 Å from any ligand proton were included in the full relaxation matrix calculation. For the docking of Fig. 1, 40,000 complex pairs were calculated; for the docking with SURFLEX in Fig. 2 the dataset comprised 1,095,085, 2,694,315, 1,159,884, 70,448, 234,060 complex pairs for proteins with binding pocket RMSD in the range 0–1, 1–2, 2–3, 3–4, 4–5 Å from the crystal structure, respectively, of which 8,331, 16,770, 3,296, 181 and 412 correspond to the correct relative orientation of the ligands in each protein RMSD range.

Fig. 1 — a Initial pool of docked structures for the complexes PKA/L₁ and PKA/L₂. The receptor model used in the docking is the PKA structure of 3DNE.pdb. The complex pairs in b pass the selection through the INPHARMA data (Pearson correlation coefficient R² between the experimental and the theoretical INPHARMA NOEs > 0.89). All these complex pairs show a very low ligand RMSD from the correct binding mode. c Overlap of L₁ and L₂ in the complex pairs of b, after superimposition of the protein structures. The INPHARMA data define the orientation of L₁ and L₂ correctly to 0.5 and 1 Å resolution, respectively

Fig. 2 — Accuracy of the INPHARMA predictions as a function of the quality of the receptor structure. The x-axis represents the (protein only) binding pocket RMSD of the receptor models used in the docking from the crystallographic structure of PKA in the complex PKA/L₁ (3DNE.pdb). The accuracy on the y axis is defined as the number of complex pairs reproducing the correct ligands superposition (relative binding mode of L₁ and L₂) divided by the total number of pairs selected by INPHARMA. The numbers over each bar in red represent the accuracy before applying the INPHARMA score. In this case the accuracy is the number of the complex pairs showing the correct ligands superposition divided by the total number of complex pairs selected by the energy function of the docking program. The docking for this dataset was performed with SURFLEX

The ranking of the complex pairs was based on the centered Pearson correlation coefficient between the measured and the predicted INPHARMA NOEs. Structures were accepted when the Pearson correlation coefficient R² was higher than 0.89 for the data of Fig. 1 and 0.72 for the data of Fig. 2 and Fig. S4. An additional filter was applied based on the qualitative agreement of very weak INPHARMA NOEs, which were visible only at high field due to the better sensitivity of the instrumentation. For the docking of Fig. 2, 107, 208, 189, 26, 23 complex pairs with proteins in an RMSD range of 0–1, 1–2, 2–3, 3–4, 4–5 Å from the crystal structure, respectively, passed the selection, of which 98, 63, 60, 5 and 7 correspond to the correct relative orientation of the ligands in each protein RMSD range.
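The correlation-based selection described above can be sketched as follows. This is an illustration, not the in-house program: the predicted NOEs would come from the full relaxation matrix calculation, which is not reproduced here, and the additional filter on very weak NOEs is omitted. The function names and the dictionary layout are assumptions. Note that squaring the coefficient discards its sign; whether a positive correlation was additionally required is not stated in the text.

```python
def pearson_r2(experimental, predicted):
    """Squared centered Pearson correlation between two equal-length lists."""
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two lists of equal length with >= 2 values")
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_e == 0 or var_p == 0:
        return 0.0  # a constant data set carries no correlation information
    return cov * cov / (var_e * var_p)


def select_pairs(experimental, predicted_by_pair, threshold):
    """Return (R², pair_id) for pairs with R² above threshold, best first.

    `predicted_by_pair` maps a hypothetical pair identifier to the list of
    NOEs predicted for that pair of complex structures.
    """
    scored = [
        (pearson_r2(experimental, predicted), pair_id)
        for pair_id, predicted in predicted_by_pair.items()
    ]
    return sorted((item for item in scored if item[0] > threshold), reverse=True)
```

With this layout, a call such as `select_pairs(measured, predictions, 0.89)` would correspond to the threshold used for the data of Fig. 1, and 0.72 to that used for Fig. 2 and Fig. S4.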

For the docking with GLIDE in Fig. S4, the dataset comprised 452,760, 908,974, 575,320, 36,864, 73,060 complex pairs for proteins with binding pocket RMSD in the range 0–1, 1–2, 2–3, 3–4, 4–5 Å from the crystal structure, respectively, of which 13,560, 8,694, 4,184, 160 and 166 correspond to the correct relative orientation of the ligands in each protein RMSD range. Moreover, 2, 485, 585, 35, 39 complex pairs with proteins in an RMSD range of 0–1, 1–2, 2–3, 3–4, 4–5 Å from the crystal structure, respectively, passed the selection, of which 2, 114, 103, 19 and 9 correspond to the correct relative orientation of the ligands in each protein RMSD range.

Results and discussion

The INPHARMA method is based on the observation of interligand, spin diffusion mediated, transferred-NOE data between two ligands L₁ and L₂ that bind competitively and weakly to a receptor T (Fig. S2). As the ligands are competitive binders, such NOEs do not originate from a direct transfer of magnetization between the two ligands, but rather from a spin-diffusion process mediated by the protons of the receptor binding pocket and are, therefore, dependent on the specific interactions of each of the two ligands with the protein (Sanchez-Pedregal et al. 2005). In line with common SBDD workflows, the INPHARMA NOEs are used to select among possible complex structures suggested by molecular docking. The bound ligand structures, which can be determined by tr-NOEs, are docked to a structural model of the apo-receptor. A library consisting of pairs of complex structures (receptor/L₁ and receptor/L₂) is generated by combining all docking modes of L₁ to the receptor with all docking modes of L₂ to the receptor. The resulting docking model pairs are ranked on the basis of the agreement between the predicted and the experimental INPHARMA NOEs (Reese et al. 2007).
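The construction of the pair library described above amounts to a Cartesian product of the two sets of docking modes. The sketch below uses hypothetical pose identifiers in place of actual complex structures.

```python
from itertools import product


def build_pair_library(poses_l1, poses_l2):
    """Combine every receptor/L1 docking mode with every receptor/L2 mode."""
    return list(product(poses_l1, poses_l2))


if __name__ == "__main__":
    # Hypothetical identifiers standing in for 200 docking modes per ligand.
    l1_modes = [f"L1_pose_{i}" for i in range(200)]
    l2_modes = [f"L2_pose_{i}" for i in range(200)]
    print(len(build_pair_library(l1_modes, l2_modes)))
```

The library size grows as the product of the two pose counts, which is why 200 docking modes per ligand yield 40,000 pairs.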

Previously, we demonstrated that INPHARMA is able to determine the binding mode of the two ligands L₁ and L₂ to the catalytic subunit of PKA (Orts et al. 2008). In this work we aim at establishing INPHARMA as an effective scoring function for binding modes in high-throughput docking campaigns. First, we evaluate the ability of INPHARMA to provide high-resolution binding modes when ligands are docked to a correctly folded binding pocket; second, we evaluate the efficacy of the methodology as a function of the accuracy of the protein structure used in the docking experiments. We show that the use of experimental INPHARMA data to score binding modes generated in silico provides a considerable improvement in the accuracy of the selection of the correct binding pose, even when using a poor representation of the protein binding pocket. As a test system, we use the two ligands L₁ and L₂ bound to the catalytic subunit of the protein PKA, for which experimental data have been measured in the laboratory as described in the Experimental Section. L₁ and L₂ bind PKA with K_Ds of 6 and 16 μM, respectively, and are therefore suitable to measure both transferred-NOEs and INPHARMA NOEs. The crystal structures of the complexes PKA/L₁ and PKA/L₂ (3DNE.pdb and 3DND.pdb, respectively) serve as a benchmark to evaluate the performance of INPHARMA.

INPHARMA allows the definition of binding modes to 1 Å resolution

The bound structures of L₁ and L₂, which can be determined by transferred-NOEs, are docked into the structure of the catalytic subunit of PKA from 3DNE.pdb after removal of the ligand. The PKA structure of 3DND.pdb could have been used instead, as the protein heavy atom RMSD (root mean square deviation) between the two complexes is only 0.28 Å. For each ligand, 200 docking modes are generated with the program PLANTS (Korb et al. 2009) and combined pair-wise to give 40,000 pairs of complex structures of PKA/L₁ and PKA/L₂. Each pair of this library is represented in Fig. 1 in terms of the RMSD of each ligand from the true binding mode, as observed in the crystal structures of the PKA/L₁ and PKA/L₂ complexes (3DNE.pdb and 3DND.pdb). The initial library of docking modes contains complex structure pairs where both ligands are in the correct orientation (lower left corner), both ligands are in the wrong orientation (upper right corner) or only one ligand is in the correct orientation (lower right and upper left corners). Next we ranked the 40,000 structure pairs with respect to the agreement between the theoretical INPHARMA NOEs predicted for each particular structure pair and the experimentally measured INPHARMA NOEs of Table S1. The purpose of this analysis is to verify whether INPHARMA data can be used to select the correct binding modes of L₁ and L₂ and to determine the maximum achievable resolution of the resulting complex structures. We use the linear correlation coefficient R² to describe the agreement between experimental and theoretical INPHARMA data; pairs of complex structures with R² > 0.89 are accepted. Indeed, the structures selected by INPHARMA (Fig. 1b) are those of the lower left corner of the graph of Fig. 1a, namely close to the correct binding poses for both ligands.
A closer analysis of the INPHARMA-selected structures reveals that they correspond to only one orientation per ligand, with L₁ and L₂ being defined to a precision better than 0.5 and 1 Å, respectively (Fig. 1c). The maximum distance between two INPHARMA-selected structures is between the orange and the yellow binding mode of L₂ (Fig. 1c) and corresponds to a rotation of 21° around the axis perpendicular to the figure plane. This result highlights an impressive performance of INPHARMA, which distinguishes even between closely related binding modes at a high level of resolution (~1 Å). The receptor model used in the docking can be derived either from the structure of the apo-receptor or from the structure of the receptor in complex with a reference ligand L_x. In the absence of conformational rearrangements between the apo- and the holo-receptor, or between the receptor/L_x and the receptor/L₁ (receptor/L₂) complexes, the absolute binding mode of any ligand (L₁ … L_n) can be derived at a high confidence level from INPHARMA data measured for pair-wise combinations of ligands (e.g. L₁ and L₂).

INPHARMA alleviates the need to crystallize the receptor in complex with all chemical lead series of interest, overcoming an important limiting factor in the daily work of the pharmaceutical industry. The binding modes of all chemical series of interest are within reach through the measurement of a few INPHARMA NOESY spectra and the employment of the INPHARMA NOEs as a reliable selection criterion for docking modes. The NMR time necessary to acquire data for a ligand pair amounts to only 2 days, while the calculation time is less than 1 day for 40,000 pairs of docking models.

INPHARMA allows a 100-fold improvement with respect to docking scoring functions

Despite the enormous potential of INPHARMA demonstrated in the previous section, pharmaceutical research often faces more challenging cases, where either the structure of the receptor is not known at a high level of accuracy or the receptor undergoes substantial conformational changes between the apo- and holo-forms (Bartoschek et al. 2010). In this section the performance of INPHARMA as an energy function to rank docking modes generated from an ill-defined protein structure is systematically tested and compared with the performance of state-of-the-art docking scoring functions.

As a test system we use the protein PKA in complex with L₁ and L₂ (Fig. S1). Structures of PKA that differ from the ligand-bound structure were generated by a high-temperature molecular dynamics simulation run starting from the crystal structure of 3DNE.pdb after removal of L₁. A total of 700 frames were sampled during the simulation, resulting in structures that display 0.5–6 Å heavy atom RMSD in the binding pocket from the ligand-bound structure (Fig. S3). In our definition the binding pocket comprises all atoms at a distance of less than 8 Å from any ligand atom in the crystal structure. All frames were subjected to energy minimization in explicit water. This initial library of PKA models contains structures in a wide range of distances from the correct one and is therefore optimally suited to evaluate the performance of INPHARMA as a function of the accuracy of the protein structural model.
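The binding pocket definition and the pocket RMSD used above can be sketched in a few lines. The example assumes that the coordinates are heavy atoms only, that the model has already been superimposed on the crystal structure, and that both structures list their atoms in the same order; the superposition step itself is not shown, and the function names are illustrative.

```python
import math


def pocket_indices(protein_atoms, ligand_atoms, cutoff=8.0):
    """Indices of protein atoms closer than `cutoff` Å to any ligand atom."""
    return [
        i
        for i, atom in enumerate(protein_atoms)
        if any(math.dist(atom, lig) < cutoff for lig in ligand_atoms)
    ]


def pocket_rmsd(reference_atoms, model_atoms, indices):
    """RMSD over the selected atoms between a model and the reference."""
    if not indices:
        raise ValueError("the binding pocket selection is empty")
    squared = sum(
        math.dist(reference_atoms[i], model_atoms[i]) ** 2 for i in indices
    )
    return math.sqrt(squared / len(indices))
```

The pocket is selected once on the crystal structure, and the same atom indices are then used for every MD frame, so that all models are compared over an identical set of atoms.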

Next we docked the protein-bound conformation of L₁ and L₂ to each of the 700 structural models of the protein PKA. We used the rigid docking module of the commercially available software SURFLEX (Jain 2003) and retained the 10 best energy solutions for each docking run. A filter based on similarity was applied to exclude redundant binding modes for the same protein model. Complex structures with the same protein model and with ligand all-atom RMSD < 0.5 Å were represented by one member of the family. Note that similar ligand binding poses in two different protein models are considered non-redundant and are retained. The final set of complexes consists of 4,636 and 4,758 poses for PKA/L₁ and PKA/L₂, respectively.

The 4,636 and 4,758 structures for the PKA/L₁ and PKA/L₂ complexes, respectively, have been selected by the docking scoring function as the lowest energy ones and represent the docking solution to the problem. At this point it is interesting to evaluate which percentage of the docking models predicts the correct relative orientation of the two ligands, in dependence of the accuracy of the receptor model used in the docking. This is highly relevant to SBDD as a correct ligands superposition is the first necessary step to the structure-guided, synthetic combination of lead compounds.

For this purpose, ligand binding modes of L₁ and L₂ were combined pair-wise for all protein models inside a certain range of RMSD (0–1; 1–2; 2–3; 3–4; 4–5 Å) from the correct protein structure (3DNE.pdb). This resulted in 1,095,085, 2,694,315, 1,159,884, 70,448, 234,060 complex pairs in each of the five RMSD ranges, respectively. To evaluate the similarity of the relative orientation of the ligands in each model pair to the correct relative orientation, as observed in the crystal structures of PKA/L₁ and PKA/L₂, we used quaternions, which describe object rotations with respect to a reference frame. The exact procedure is explained in the Supporting Information. The relative binding mode of L₁ and L₂ to PKA was considered correct when the quaternions defining the rotations of L₁ and L₂ in the docking models with respect to the crystallographic structures of PKA/L₁ and PKA/L₂ satisfied the conditions of Eq. S1 and S2, namely the two quaternions were sufficiently similar. The accuracy was defined as the ratio of the number of correct pairs of complex structures (in terms of relative ligand orientation) to the total number of structural pairs.
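The idea of comparing two rotations through their quaternions can be illustrated with the standard angular distance between unit quaternions. This sketch does not reproduce the conditions of Eq. S1 and S2, which are given only in the Supporting Information; it merely shows one common way of quantifying how similar two rotations are. Quaternions are written as (w, x, y, z), and all function names are hypothetical.

```python
import math


def normalize(vector):
    """Scale a tuple of numbers to unit length."""
    norm = math.sqrt(sum(c * c for c in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / norm for c in vector)


def quaternion_from_axis_angle(axis, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation about `axis`."""
    ax = normalize(axis)
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (math.cos(half), ax[0] * s, ax[1] * s, ax[2] * s)


def rotation_angle_between(q1, q2):
    """Angle in degrees of the rotation that takes orientation q1 to q2.

    The absolute value of the dot product is used because q and -q
    describe the same rotation.
    """
    a, b = normalize(q1), normalize(q2)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    return math.degrees(2.0 * math.acos(min(1.0, dot)))
```

If the docked L₁ and the docked L₂ are each rotated with respect to their crystallographic pose by nearly the same rotation, the angle returned for the two quaternions is small and the relative orientation of the ligands is preserved, even when neither absolute pose is correct.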

The red numbers in Fig. 2 summarize the results. The accuracy of predicting the correct superimposition of the ligands by the docking scoring function is rather poor and reaches at best 1% for ligands docked to the correct protein structure (receptor model RMSD < 1 Å). This low number is not surprising, as in general an average accuracy of only 5% is assumed for docking calculations with one ligand (the correct binding pose is found in the 20 lowest energy structures) (Davis and Baker 2009; Davis et al. 2009). To verify that this result does not depend solely on the docking program used, we repeated the docking exercise with GLIDE (Schroedinger 2003) and obtained very similar results (Fig. S4). The poor performance of the docking reflects the difficulty that both SURFLEX and GLIDE have in finding the correct binding pose for L₁ (Fig. S5). In general, buried binding pockets, like the ATP binding pocket in PKA, are unfavorable cases for in silico methods (Davis et al. 2009). The limited success of two of the most popular docking programs in this case calls for an additional, experiment-based scoring function.

Following this reasoning, we evaluated the performance of the experimental INPHARMA data for ranking and filtering the pairs of docking modes. Complex structure pairs for which the correlation coefficient R² between the predicted and the experimental INPHARMA NOEs was lower than 0.72 were rejected. The threshold for R² is set lower than in the previous section to account for the fact that the use of an ill-defined protein model to generate complex structures affects the quality of the fit of the theoretical to the experimental INPHARMA NOEs even for correct binding modes. The accuracy of the ligand superimposition after filtering the complex pairs with respect to the INPHARMA score is given in Fig. 2 and Fig. S4 (for docking performed with SURFLEX and GLIDE, respectively). In contrast to the docking scoring function, the additional filtering through the INPHARMA score achieves an accuracy >90% for reasonably well-defined protein models (PKA RMSD ≤ 1 Å). For less well-defined protein models (PKA RMSD > 1 Å) the accuracy drops to 30% but remains constant for the complete protein RMSD range (up to 5 Å) (Fig. 2). The weak dependence of the accuracy on the protein structure quality in a wide range of RMSD (1–5 Å) indicates that INPHARMA is a solid scoring function for docking modes calculated using a poor representation of the protein target structure, such as a homology model. The gain in accuracy provided by INPHARMA with respect to the energy function of the in silico docking reaches two orders of magnitude throughout the whole range of protein structures, underlining the enormous advantage of using these few, easily accessible experimental data for docking scoring and validation.
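The accuracies before and after the INPHARMA selection, and the resulting gain, can be recomputed from the complex-pair counts reported for the SURFLEX docking in the Experimental procedures. The sketch below only repeats that arithmetic; the variable and function names are illustrative.

```python
# Counts for the SURFLEX docking, per binding pocket RMSD range (Å).
RMSD_BINS = ["0-1", "1-2", "2-3", "3-4", "4-5"]
DOCKED_TOTAL = [1_095_085, 2_694_315, 1_159_884, 70_448, 234_060]
DOCKED_CORRECT = [8_331, 16_770, 3_296, 181, 412]
SELECTED_TOTAL = [107, 208, 189, 26, 23]
SELECTED_CORRECT = [98, 63, 60, 5, 7]


def accuracy_table():
    """Return (bin, accuracy_before, accuracy_after, fold_gain) per RMSD bin."""
    rows = []
    for label, d_all, d_ok, s_all, s_ok in zip(
        RMSD_BINS, DOCKED_TOTAL, DOCKED_CORRECT, SELECTED_TOTAL, SELECTED_CORRECT
    ):
        before = d_ok / d_all  # accuracy of the docking scoring function
        after = s_ok / s_all   # accuracy after the INPHARMA selection
        rows.append((label, before, after, after / before))
    return rows


if __name__ == "__main__":
    for label, before, after, gain in accuracy_table():
        print(f"{label} Å: {before:.2%} -> {after:.1%} ({gain:.0f}-fold)")
```

With these counts the accuracy of the docking scoring function stays below 1% in every RMSD range, and the gain after INPHARMA selection lies between roughly 50-fold and 170-fold depending on the range.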

The criterion based on quaternions (Eq. S1 and S2) to evaluate the correctness of the relative binding mode of L₁ and L₂ is far more restrictive than the common requirements of SBDD workflows. In SBDD, a ligand is considered to be in the correct orientation when its RMSD from the true binding mode is less than 2 Å. A similar criterion can be applied to evaluate the correctness of the ligands superposition in the INPHARMA-selected complex pairs. If the RMSDs between both L₁ and L₂ in the INPHARMA-selected pairs and both L₁ and L₂ in the crystal structures of PKA/L₁ and PKA/L₂, after superimposition of the ligands, are less than 2 Å, the ligands superposition is considered to be correct. With this measure, the accuracy of the INPHARMA selection reaches 75% for all protein structures (Fig. 3).

Fig. 3 — Representation of the complex pairs PKA/L₁ and PKA/L₂ selected by INPHARMA as a function of the RMSD of PKA from the coordinates of the crystal structure 3DNE.pdb used as initial model for the docking (a RMSD < 1 Å; b RMSD < 2 Å; c RMSD < 3 Å; d RMSD < 5 Å). The numbers and the color code of each circle slice represent the ligands RMSD (L₁, L₂) from the coordinates of the ligands in the crystal structures of PKA/L₁ and PKA/L₂, after superimposition of the ligands. When the initial protein model is well-defined (PKA RMSD < 1 Å), INPHARMA selects complex pairs that allow ligands superposition at a resolution better than 1.5 Å. Usually, in structure based drug design, a resolution of 2 Å in the ligand coordinates is considered to be acceptable. Within this limit the accuracy of the ligands superposition in the INPHARMA-selected binding modes is always higher than 75%, even for ill-defined protein models (panels c and d)

Evaluation of the accuracy of INPHARMA in determining ligands superposition when the binding mode of one of the ligands is known

Frequently in SBDD campaigns, the receptor protein can be crystallized with some but not all lead series of interest. In these cases, the process of finding the correct ligands superposition can be aided by the availability of the absolute binding mode of a reference ligand. To test the performance of INPHARMA in this scenario we repeated the analysis of Fig. 2 on complex pairs for which either the PKA/L₁ (Fig. 4a) or the PKA/L₂ (Fig. 4b) structure was fixed to the correct one, as seen in 3DNE.pdb or 3DND.pdb. The correctness of the orientation of the non-fixed ligand was evaluated by applying the criteria of Eq. S2 to the quaternion of this ligand. In this scenario the performance of INPHARMA is excellent: when L₁ is fixed and the orientation of L₂ is unknown, selection of docking modes by INPHARMA reaches an accuracy of 100% in all cases. It is worth noting that for ill-defined protein models, no complex structure passes the INPHARMA selection, indicating that in this case the INPHARMA data also provide information on the protein structure. When L₂ is fixed and the orientation of L₁ is searched, the performance of INPHARMA is slightly worse; nevertheless an improvement in accuracy of more than two orders of magnitude is achieved with respect to the energy function of the in silico docking, which, as discussed above, performs particularly poorly in predicting the binding mode of L₁. Also in this case INPHARMA finds no solution for ill-defined protein models (RMSD > 3 Å), thereby also restricting the protein conformation.

Fig. 4 — Accuracy of the INPHARMA predictions as a function of the quality of the receptor structure when the binding mode of one of the two ligands is known and used as reference. The x-axis represents the (protein only) binding pocket RMSD of the receptor models used in the docking from the crystallographic structure of PKA in the complex PKA/L₁ (3DNE.pdb). The accuracy on the y axis is defined as the number of complex pairs reproducing the correct ligands superposition (absolute binding mode of L₂, using the PKA/L₁ complex as reference, in (a); absolute binding mode of L₁, using the PKA/L₂ complex as reference, in (b)) divided by the total number of pairs selected by INPHARMA. The numbers over each bar in red represent the accuracy before applying the INPHARMA score. In this case the accuracy is the number of the complex pairs showing the correct ligands superposition divided by the total number of complex pairs selected by the energy function of the docking program. The docking for this dataset was performed with SURFLEX

Conclusions

The INPHARMA method allows closing a gap in structure based drug discovery by providing information at atomic resolution on the receptor/ligand interactions for complexes that cannot be crystallized. When an accurate representation of the bound structure of the receptor is used for docking, INPHARMA experimental data allow selection of the correct ligand binding pose to a resolution better than 1 Å. The success rate of INPHARMA decreases for docking models obtained with an inaccurate structure of the receptor; however, independently of the quality of the receptor structure, the performance of the INPHARMA-based ranking exceeds by 100-fold that of the scoring function of state-of-the-art docking programs. INPHARMA data are easy to measure and require no isotope labeling scheme either for the receptor or for the ligands. All these factors encourage the implementation of INPHARMA experimental data as a routine scoring function to select complex models for weakly binding ligands.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10858_2011_9590_MOESM1_ESM.pdf (2.2 MB, pdf)

Supporting Text containing a detailed description of the quaternion-based criteria; Fig. S1 showing the crystal structures of the PKA/L₁ and PKA/L₂ complexes; Fig. S2 describing schematically the INPHARMA methodology; Fig. S3 representing the PKA structures along the MD trajectory; Fig. S4, containing the same information as Fig. 2 but for docking with the program GLIDE; Fig. S5 showing the accuracy of the docking results; Table S1 containing the experimental INPHARMA data. (PDF 2204 kb)

Acknowledgments

This work was supported by the EMBL and by grant I 83-545 of the Volkswagen Stiftung. J. O. thanks Benjamin Stauch and Frank Thommen for support in software installation.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

Bartoschek S, Klabunde T, Defossa E, Dietrich V, Stengelin S, Griesinger C, Carlomagno T, Focken I, Wendt KU. Drug design for G-protein-coupled receptors by a ligand-based NMR method. Angew Chem Int Ed. 2010;49(8):1426–1429. doi: 10.1002/anie.200905102.
Berman HM, Westbrook JD, Gabanyi MJ, Tao W, Shah R, Kouranov A, Schwede T, Arnold K, Kiefer F, Bordoli L, Kopp J, Podvinec M, Adams PD, Carter LG, Minor W, Nair R, La Baer J. The protein structure initiative structural genomics knowledgebase. Nucleic Acids Res. 2009;37:D365–D368. doi: 10.1093/nar/gkn790.
Carlomagno T, Felli I, Czech M, Fischer R, Sprinzl M, Griesinger C. Transferred cross-correlated relaxation: application to the determination of sugar pucker in an aminoacylated tRNA-mimetic weakly bound to EF-Tu. J Am Chem Soc. 1999;121:1945–1948. doi: 10.1021/ja9835887.
Carlomagno T, Sanchez V, Blommers M, Griesinger C. Derivation of dihedral angles from CH–CH dipolar–dipolar cross-correlated relaxation rates: a C–C torsion involving a quaternary carbon atom in epothilone A bound to tubulin. Angew Chem Int Ed Engl. 2003;42:2515–2517. doi: 10.1002/anie.200350950.
Davis IW, Baker D. ROSETTALIGAND docking with full ligand and receptor flexibility. J Mol Biol. 2009;385(2):381–392. doi: 10.1016/j.jmb.2008.11.010.
Davis IW, Raha K, Head MS, Baker D. Blind docking of pharmaceutically relevant compounds using RosettaLigand. Protein Sci. 2009;18:1999–2002. doi: 10.1002/pro.192.
Jain AN. Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine. J Med Chem. 2003;46:499–511. doi: 10.1021/jm020406h.
Katritch V, Rueda M, Lam PC, Yeager M, Abagyan R. GPCR 3D homology models for ligand screening: lessons learned from blind predictions of adenosine A2a receptor complex. Proteins. 2010;78:197–211. doi: 10.1002/prot.22507.
Korb O, Stützle T, Exner TE. Empirical scoring functions for advanced protein-ligand docking with PLANTS. J Chem Inf Model. 2009;49:84–96. doi: 10.1021/ci800298z.
Kortagere S, Ekins S. Troubleshooting computational methods in drug discovery. J Pharmacol Toxicol Methods. 2010;61:67–75. doi: 10.1016/j.vascn.2010.02.005.
MacKerell AD Jr, Bashford D, Bellott M, Dunbrack RL Jr, Evanseck J, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph D, Kuchnir L, Kuczera K, Lau FTK, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE III, Roux B, Schlenkrich M, Smith J, Stote R, Straub J, Watanabe M, Wiorkiewicz-Kuczera J, Yin D, Karplus M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B. 1998;102:3586–3616. doi: 10.1021/jp973084f.
Mayer M, Meyer B. Group epitope mapping by saturation transfer difference NMR to identify segments of a ligand in direct contact with a protein receptor. J Am Chem Soc. 2001;123:6108–6117. doi: 10.1021/ja0100120.
Ni F, Scheraga HA. Use of the transferred nuclear Overhauser effect to determine the conformations of ligands bound to proteins. Acc Chem Res. 1994;27(9):257–264. doi: 10.1021/ar00045a001.
Orts J, Tuma J, Reese M, Grimm SK, Monecke P, Bartoschek S, Schiffer A, Wendt KU, Griesinger C, Carlomagno T. Crystallography-independent determination of ligand binding modes. Angew Chem Int Ed Engl. 2008;47:7736–7740. doi: 10.1002/anie.200801792.
Orts J, Griesinger C, Carlomagno T. The INPHARMA technique for pharmacophore mapping: a theoretical guide to the method. J Magn Reson. 2009;200(1):64–73. doi: 10.1016/j.jmr.2009.06.006.
Pellecchia M, Sem DS, Wüthrich K. NMR in drug discovery. Nat Rev Drug Discov. 2002;1:211–219. doi: 10.1038/nrd748.
Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kalé L, Schulten K. Scalable molecular dynamics with NAMD. J Comput Chem. 2005;26:1781–1802. doi: 10.1002/jcc.20289.
Pochapsky S, Pochapsky T. Nuclear magnetic resonance as a tool in drug discovery, metabolism and disposition. Curr Top Med Chem. 2001;1:427–441. doi: 10.2174/1568026013394967.
Rai B, Tawa G, Katz A, Humblet C. Modeling G protein-coupled receptors for structure-based drug discovery using low-frequency normal modes for refinement of homology models: application to H3 antagonists. Proteins. 2010;78:457–473. doi: 10.1002/prot.22571.
Rees DC, Congreve M, Murray CW, Carr R. Fragment-based lead discovery. Nat Rev Drug Discov. 2004;3(8):660–672. doi: 10.1038/nrd1467.
Reese M, Sanchez-Pedregal VM, Kubicek K, Meiler J, Blommers MJJ, Griesinger C, Carlomagno T. Structural basis of the activity of the microtubule-stabilizing agent epothilone A studied by NMR spectroscopy in solution. Angew Chem Int Ed Engl. 2007;46(11):1864–1868. doi: 10.1002/anie.200604505.
Sanchez-Pedregal VM, Reese M, Meiler J, Blommers MJ, Griesinger C, Carlomagno T. The INPHARMA method: protein-mediated interligand NOEs for pharmacophore mapping. Angew Chem Int Ed Engl. 2005;44(27):4172–4175. doi: 10.1002/anie.200500503.
Schroedinger LLC (2003) The Glide 2.5 calculations used FirstDiscovery, version 2.5021. New York
The UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res. 2008;36:D190–D195.
Van Dongen M, Weigelt J, Uppenberg J, Schultz J, Wikström M. Structure-based screening and design in drug discovery. Drug Discov Today. 2002;7:471–478. doi: 10.1016/S1359-6446(02)02233-X.
Wyss D, McCoy M, Senior M. NMR-based approaches for lead discovery. Curr Opin Drug Discov Dev. 2002;5:630–647.

