Rapid Identification of Inhibitors and Prediction of Ligand Selectivity for Multiple Proteins: Application to Protein Kinases

Zhiwei Ma; Sheng-You Huang; Fei Cheng; Xiaoqin Zou

doi:10.1021/acs.jpcb.1c00016

J Phys Chem B. Author manuscript; available in PMC: 2022 Apr 8.

Published in final edited form as: J Phys Chem B. 2021 Mar 2;125(9):2288–2298. doi: 10.1021/acs.jpcb.1c00016

PMCID: PMC8991440 NIHMSID: NIHMS1792902 PMID: 33651624

Abstract

Rapid identification of inhibitors for a family of proteins and prediction of ligand specificity are highly desirable in structure-based drug design. However, sequentially docking ligands into each protein target with conventional single-target docking methods is too computationally expensive to achieve these two goals, especially when the number of targets is large. In this work, we use an efficient ensemble docking algorithm to dock ligands simultaneously against multiple protein targets. Using protein kinases, a protein family that is highly important in many cellular processes and in rational drug design, as an example, we demonstrate the feasibility of investigating ligand selectivity with this algorithm. Specifically, 14 human protein kinases were selected. First, native docking calculations were performed to test the ability of our energy scoring function to reproduce the experimentally determined structures of the ligand-protein kinase complexes. Next, cross-docking calculations were conducted with our ensemble docking algorithm to study ligand selectivity, based on the assumption that the native target of an inhibitor should receive a more negative (i.e., more favorable) energy score than the non-native targets. Staurosporine and Gleevec were studied as examples of nonselective and selective binding, respectively. Virtual ligand screening was also performed against five protein kinases that each have at least seven known inhibitors. Our quantitative analysis showed that the ensemble algorithm can be effective in screening for inhibitors and in investigating their selectivity across multiple target proteins.


1. Introduction

Molecular docking predicts the likely binding modes of a ligand interacting with a protein of known 3D structure (see refs [1–6] for review). It has been extensively applied in modern drug discovery [7, 8]. However, drug promiscuity, which arises from structural similarity among related proteins and from flexibility in their binding pockets, remains a challenging issue because it causes drug side effects. Protein kinases are a prominent example.

As one of the largest enzyme gene families, protein kinases are crucial in signal transduction cascades, regulating a wide range of cellular processes by modifying protein function through phosphorylation [9]. Kinase mutations are directly involved in many diseases, such as cancer [10–12]. Consequently, tremendous efforts have been made to develop kinase inhibitors. In the past decade alone, dozens of drugs specifically targeting protein kinases have been approved for clinical use [13, 14]. Despite these successes, broad promiscuity, which causes undesirable off-target activity and side effects, remains a major problem for protein kinase inhibitors. Highly conserved active-site residues across the protein kinase family make specific target recognition difficult. Therefore, rapid prediction of ligand selectivity is important for guiding the early stages of drug development.

Molecular docking can predict ligand selectivity by docking a single ligand onto multiple homologous proteins and ranking targets and non-targets by their energy scores. Selectivity also matters in virtual screening, which retrieves potential inhibitors for a given protein structure from a large compound database by ranking their calculated energy scores [15, 16]. Slynko and colleagues performed virtual screening against protein kinase C-related kinase 1 (PRK1) by docking an in-house compound library containing structurally diverse PRK1 inhibitors into an ensemble of PRK1 homology models [17]. Using an ensemble of homology models improved pose prediction and scoring performance. However, docking against the different protein conformations in their ensemble was still performed separately.

Most existing docking programs are designed for a single protein target, meaning that only one receptor can be considered at a time in a docking run. This single-target strategy is commonly used when one is interested in a specific protein target. However, investigating the selectivity of an inhibitor across multiple protein targets calls for a more advanced method [18–21]. For multiple protein targets, docking programs usually dock ligands into the individual proteins one after another, an approach referred to as sequential docking. The results from each single-target docking are then merged and re-ranked. Although feasible, sequential docking is computationally demanding, with computational time proportional to the number of target proteins. Sequential docking also complicates docking setup and result analysis. Therefore, a docking strategy that can address multiple targets simultaneously is highly desirable.

This work presents an efficient ensemble docking algorithm that can dock ligands into multiple target proteins simultaneously without a significant increase in computational time. The algorithm treats the protein structure as an additional parameter in the energy optimization process and thus performs docking against multiple targets in the same way as docking against a single target. After docking, each ligand receives a series of energy scores, one for each target protein, so that its selectivity across targets can be investigated. The binding scores also yield multiple enrichment curves, one per target, from a single virtual screening calculation rather than from multiple calculations. We use protein kinases for validation because, in addition to their therapeutic significance, they share similar binding pockets and therefore provide an excellent test system. Our ensemble docking algorithm performed well on ligand selectivity prediction and virtual screening in these protein kinase tests. We also applied the algorithm to Gleevec and reproduced the specificity reported in the literature.

2. Materials and Methods

2.1. Ensemble docking algorithm to simultaneously address multiple targets

The key idea of our ensemble algorithm is to introduce an additional parameter into docking calculations to describe different protein structures. In traditional single-rigid-receptor docking programs such as DOCK [22–24], the receptor is fixed and the ligand is allowed to adopt different orientations in the binding site. The binding energy score between the ligand and the receptor can be expressed as

$$E^{S} = E(x, y, z, \theta, \phi, \psi) \tag{1}$$

where x, y, and z are the coordinates of the ligand's center of mass, and θ, ϕ, and ψ are the three Euler angles. Together, these six parameters define the orientation (position and rotation) of the ligand, and the binding score is determined only by this orientation. The docking optimization continuously adjusts (x, y, z, θ, ϕ, ψ) until the calculated score converges to a minimum, which corresponds to the predicted binding mode. Consequently, if a ligand is docked into an ensemble of M target proteins with a standard docking approach, the docking procedure has to be performed M times, and the computational time will be M times that of docking against a single target.
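To make the cost argument concrete, the following Python sketch runs a generic local search over the six rigid-body parameters of eq (1) once per target. The quadratic `toy_score` landscape, the greedy random-walk optimizer, and the per-target evaluation budget are hypothetical choices for illustration, not the search used by DOCK or MDock; the only point is that the number of score evaluations grows linearly with M.

```python
import random

def toy_score(pose, center, depth):
    """Hypothetical score: a quadratic well around `center` with minimum value `depth`."""
    return depth + sum((p - c) ** 2 for p, c in zip(pose, center))

def local_search(score, start_pose, budget, step=0.5, seed=0):
    """Greedy random walk over (x, y, z, theta, phi, psi); returns (best score, pose)."""
    rng = random.Random(seed)
    pose = list(start_pose)
    current = score(pose)
    for _ in range(budget - 1):                 # one evaluation already spent on the start pose
        trial = list(pose)
        trial[rng.randrange(6)] += rng.uniform(-step, step)
        energy = score(trial)
        if energy <= current:                   # keep only non-worsening moves
            pose, current = trial, energy
    return current, pose

def sequential_docking(targets, start_pose, budget_per_target):
    """Dock separately against each (center, depth) target, as with eq (1).

    Returns one best score per target and the total number of score evaluations,
    which is M * budget_per_target.
    """
    scores, evaluations = [], 0
    for m, (center, depth) in enumerate(targets):
        best, _ = local_search(lambda p: toy_score(p, center, depth),
                               start_pose, budget_per_target, seed=m)
        scores.append(best)
        evaluations += budget_per_target
    return scores, evaluations
```

Doubling the number of targets doubles `evaluations`, which is the linear scaling described above.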

To improve the efficiency of docking against multiple targets, our ensemble docking algorithm adds the index of the protein target as a seventh dimension for scoring. Specifically, the ensemble algorithm reformulates the score as

$$E^{M} = E(x, y, z, \theta, \phi, \psi, m) \tag{2}$$

where m denotes the m-th protein of the target ensemble and is an integer ranging from 1 to M. That is, in our ensemble docking algorithm, the binding score depends not only on the ligand orientation (determined by x, y, z, θ, ϕ, ψ) but also on the specific target protein (denoted by m). Thus, by adjusting the parameter set (x, y, z, θ, ϕ, ψ, m), the algorithm can simultaneously dock a ligand into a series of target proteins.
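The idea of treating m as a seventh, integer-valued search coordinate can be sketched in Python as one shared search. The greedy random walk, the switching probability `p_switch`, and the quadratic `toy_score` are hypothetical choices for illustration, not MDock's optimizer; they only show that a single evaluation budget can cover all M targets.

```python
import random

def toy_score(pose, m, centers, depths):
    """Hypothetical score of a pose on target m: one quadratic well per target."""
    return depths[m] + sum((p - c) ** 2 for p, c in zip(pose, centers[m]))

def ensemble_search(score, start_pose, n_targets, budget, step=0.5, p_switch=0.2, seed=0):
    """Greedy random walk over the 7-D space (x, y, z, theta, phi, psi, m).

    Returns the best score seen for each visited target and the number of score
    evaluations, which equals `budget` regardless of n_targets.
    """
    rng = random.Random(seed)
    pose, m = list(start_pose), rng.randrange(n_targets)
    current = score(pose, m)
    best = {m: current}
    evaluations = 1
    while evaluations < budget:
        trial_pose, trial_m = list(pose), m
        if rng.random() < p_switch:
            trial_m = rng.randrange(n_targets)                        # step along the integer m axis
        else:
            trial_pose[rng.randrange(6)] += rng.uniform(-step, step)  # rigid-body step
        energy = score(trial_pose, trial_m)
        evaluations += 1
        best[trial_m] = min(energy, best.get(trial_m, energy))
        if energy <= current:                                         # greedy acceptance
            pose, m, current = trial_pose, trial_m, energy
    return best, evaluations
```

Because targets other than the currently favored one are sampled sparsely in such a walk, per-target scores from a search like this are rough; the procedure in Section 2.4 addresses this by re-optimizing the top 20 orientations against each target.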

If there is only one target protein (M = 1), ensemble docking with eq (2) reduces to standard single-target docking with eq (1). We previously applied a similar ensemble algorithm to account for protein structural variations in ligand binding [25, 26] and demonstrated that its computational time is comparable to that of single-target docking.

2.2. Scoring function

The scoring function used in this study is ITScore, an iterative knowledge-based scoring function that we previously developed and systematically tested [27, 28]. The protein-ligand energy score E_{P-L} is obtained by summing over all protein (P)-ligand (L) atom-pair interactions:

$$E_{P\text{-}L} = \sum_{\text{P-L atom pairs}} u_{ij}(r) \tag{3}$$

where u_ij(r) is the ITScore pair potential between a protein atom of type i and a ligand atom of type j separated by distance r. Only intermolecular interactions are considered in ITScore. In this work, the scoring function was derived with an efficient iterative method using 2897 protein-ligand complexes [29] extracted from the Protein Data Bank (PDB) [31]. The efficiency and robustness of ITScore were demonstrated through binding mode identification, binding affinity prediction, and virtual database screening on diverse test sets of up to 200 protein-ligand complexes. Details of ITScore and the related tests were described in our previous studies [27, 28].
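Eq (3) can be sketched as a double loop over atom pairs with a tabulated potential. The distance binning, the 6 Å cutoff, and the layout of the `potentials` dictionary are assumptions for illustration; ITScore's actual atom types, bin widths, cutoffs, and potential values are given in refs [27, 28] and are not reproduced here.

```python
import math

def pair_potential_energy(protein_atoms, ligand_atoms, potentials, bin_width=0.5, cutoff=6.0):
    """Sum tabulated pair potentials u_ij(r) over all protein-ligand atom pairs (eq 3).

    protein_atoms, ligand_atoms: lists of (atom_type, (x, y, z)).
    potentials: dict mapping (protein_type, ligand_type) to a list of u values,
                one per distance bin of width `bin_width` up to `cutoff` (hypothetical layout).
    """
    energy = 0.0
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            r = math.dist(p_xyz, l_xyz)
            if r >= cutoff:
                continue                         # pairs beyond the cutoff contribute nothing
            energy += potentials[(p_type, l_type)][int(r / bin_width)]
    return energy
```

Only intermolecular pairs enter the sum, matching the statement above that ITScore considers no intramolecular terms.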

In this work, proteins were treated as rigid bodies. Ligand flexibility was represented by multiple ligand conformers generated with OMEGA version 2.4.6 (OpenEye Scientific Software, Santa Fe, NM; http://www.eyesopen.com) [32, 33]. We assumed no penalty for ligand strain and no free energy difference among the proteins.

2.3. Construction of the reference protein

Our docking program uses the same matching algorithm as DOCK 4.0 [22, 24]. Therefore, like DOCK 4.0, ensemble docking requires a protein structure to generate initial ligand orientations in the binding site. However, because ensemble docking involves multiple proteins, it is neither possible nor reasonable to decide which target should represent the ensemble. Our docking program therefore automatically constructs a reference protein for generating initial ligand orientations.

The principle for constructing the reference protein is identical to that of our previous work [25]: the binding pocket should be as large as possible, to provide sufficient sampling space for the ligand, while changing the overall shape of the binding site as little as possible. The details are as follows.

First, we generated a set of reference sphere points by collecting and clustering the sphere points of all proteins in the ensemble [25]. We then calculated the distance between the reference sphere points and each residue of every target protein, defined as the minimum distance between any atom of the residue and any reference sphere point. Next, we performed a multiple sequence alignment of the target proteins with the structure-based alignment software T-Coffee [34] and randomly selected one of the aligned sequences as the reference sequence of the reference protein. For each residue of the reference sequence, a residue conformation was then chosen to represent it if it satisfied two conditions: (1) the residue is aligned with that residue of the reference sequence, and (2) it lies at least 3 Å from the reference sphere points. Finally, all selected residue conformations were combined into the reference protein. The constructed reference protein is artificial; this is acceptable because it serves only as a guide for generating initial ligand orientations. The real target proteins, rather than the reference protein, are used for scoring calculations and orientation optimization.
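The residue-selection step can be sketched as follows. The data layout (one alignment column per reference-sequence position, coordinates keyed by protein index and residue id) is hypothetical. The text does not say how to choose among several qualifying conformations or what happens when none qualifies; the sketch keeps the farthest qualifying conformation and returns None otherwise, and both choices are assumptions.

```python
import math

def residue_distance(residue_atoms, sphere_points):
    """Minimum distance between any atom of the residue and any reference sphere point."""
    return min(math.dist(a, s) for a in residue_atoms for s in sphere_points)

def build_reference_protein(alignment, conformations, sphere_points, min_dist=3.0):
    """Pick one residue conformation per reference-sequence position.

    alignment: one dict per reference-sequence position, mapping
               protein index -> aligned residue id (None for a gap).
    conformations: dict (protein index, residue id) -> list of atom coordinates.
    Returns a list of (protein index, residue id), or None where no candidate qualifies.
    """
    reference = []
    for column in alignment:
        candidates = []
        for prot, res_id in column.items():
            if res_id is None:
                continue                                   # gap: not aligned, condition (1) fails
            d = residue_distance(conformations[(prot, res_id)], sphere_points)
            if d >= min_dist:                              # condition (2): at least 3 A away
                candidates.append((d, prot, res_id))
        if candidates:
            # Assumed tie-break: farthest conformation, in the spirit of a large pocket.
            _, prot, res_id = max(candidates, key=lambda c: c[0])
            reference.append((prot, res_id))
        else:
            reference.append(None)                         # case not specified in the text
    return reference
```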

Note that the random selection of the reference sequence has little effect on the ensemble docking results, for two reasons. First, the kinases share similar sequences and therefore similar structures. Second, the reference protein does not affect the final binding scores, because it is used only to guide the generation of initial ligand binding modes; in ensemble docking, ligand binding energy scores are evaluated with the actual proteins in the ensemble, not with the reference protein. Our previous studies also showed that such a reference protein robustly guides the generation of near-native ligand binding modes (within 2.0 Å), demonstrating the suitability of the constructed reference protein structure [25].

2.4. Docking algorithm

We implemented our ensemble docking algorithm [25, 26] using the ligand matching method of DOCK 4.0 [24] and the ITScore scoring function [27, 28]. The program was written in FORTRAN for compatibility with DOCK. We retrained the scoring function using 2897 protein-ligand complexes from the Protein Data Bank [29].

The resulting docking program is called MDock (© University of Missouri-Columbia, 2007) [35, 36], and its source code can be downloaded from http://zoulab.dalton.missouri.edu/mdock.htm. Ligand flexibility was considered by docking multiple ligand conformers generated by OMEGA with default parameters. The maximum number of output conformers per ligand, MAXCONFS, was set to the recommended value of 200.

The initial ligand orientations were generated with the exhaustive matching algorithm of Ewing et al. [37], guided by the reference protein constructed from the multiple targets. About 50 sphere points represented the binding site, and the distance tolerance between ligand atoms and sphere points during matching was 0.5 Å. The ligand orientations that matched the most sphere points underwent further scoring evaluation, with at most 1000 evaluated orientations and at most 200 minimization steps. Ligand energy scores were calculated with eq (2) in our ensemble docking algorithm. DOCK's grid-based energy calculation algorithm [38] was employed to improve computational speed; the grid spacing was 0.3 Å, and trilinear interpolation was used to calculate energy scores. A SIMPLEX algorithm [39] was used to optimize the binding energy of each ligand orientation. Because SIMPLEX is a local optimization method, the top 20 orientations of each ligand were re-optimized against each target protein to reduce the influence of local minima. The ligand binding scores for each target protein were saved for subsequent analysis. In short, each ligand received M sets of energy scores corresponding to the M target proteins in the ensemble.
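Grid-based scoring replaces the pair sum of eq (3) with lookups in precomputed grids. The sketch below shows plain trilinear interpolation at the 0.3 Å spacing quoted above. Keeping one grid per ligand atom type (and, in the ensemble setting, one set of grids per target m) is an assumption about a typical layout, not a description of DOCK's or MDock's internals.

```python
import math

def trilinear(grid, origin, spacing, point):
    """Trilinearly interpolate grid[i][j][k], whose values sit at origin + spacing * (i, j, k).

    The point must lie inside the grid so that indices i+1, j+1, k+1 exist.
    """
    f = [(p - o) / spacing for p, o in zip(point, origin)]   # fractional grid coordinates
    i, j, k = (int(math.floor(c)) for c in f)
    dx, dy, dz = f[0] - i, f[1] - j, f[2] - k
    value = 0.0
    for di, wx in ((0, 1.0 - dx), (1, dx)):                   # weight the 8 surrounding corners
        for dj, wy in ((0, 1.0 - dy), (1, dy)):
            for dk, wz in ((0, 1.0 - dz), (1, dz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value

def grid_score(grids, origin, spacing, ligand_atoms):
    """Score one ligand pose on one target, given a grid per ligand atom type (hypothetical)."""
    return sum(trilinear(grids[t], origin, spacing, xyz) for t, xyz in ligand_atoms)
```

Trilinear interpolation reproduces any function that is linear in x, y, and z exactly, which makes it a convenient sanity check for a grid implementation.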

2.5. Test data sets

Protein kinases from the PDBbind database (2015 version) were selected as the test set because experimental binding affinity data for the complexes are available in this database [40, 41]. We extracted 200 kinase complexes from 2401 hits in PDBbind by applying the following criteria: (1) human protein kinases only; (2) pH from 6.8 to 7.5; (3) no bound ATP, ADP, ANP, or peptide; and (4) X-ray structures rather than NMR models. These 200 protein kinase complexes belong to 14 different protein kinases (see Table 1) and were used to evaluate our docking algorithm and scoring function. The 14 kinases used in this study are highlighted as red dots on the kinase tree in Figure 1, showing the wide diversity of the chosen kinases. The stringent selection was made with specificity in mind, in particular among the closely related kinases P38, ERK2, and JNK3. The 200 ligands served as the active compounds in the virtual screening test.
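The four selection criteria can be expressed as a single predicate. The record fields (`organism`, `ph`, `ligand`, `ligand_is_peptide`, `method`) are hypothetical; PDBbind does not necessarily distribute its data in this form, and treating the pH window as inclusive is an assumption.

```python
EXCLUDED_LIGANDS = {"ATP", "ADP", "ANP"}

def passes_filters(entry):
    """Apply the four selection criteria to one (hypothetical) complex record."""
    return (
        entry["organism"] == "Homo sapiens"             # (1) human kinases only
        and 6.8 <= entry["ph"] <= 7.5                   # (2) pH window, assumed inclusive
        and entry["ligand"] not in EXCLUDED_LIGANDS     # (3) no bound ATP/ADP/ANP...
        and not entry["ligand_is_peptide"]              #     ...and no peptide ligand
        and entry["method"] == "X-RAY DIFFRACTION"      # (4) crystal structures, not NMR
    )
```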

Table 1:

The 14 protein kinases used to test the ensemble docking algorithm. The PDB codes of the corresponding ligand-bound complexes are listed in the right column, with ligand IDs in parentheses. The structure in bold is the representative structure of each protein kinase used in the ensemble docking calculations.

Kinases	Protein-ligand Complexes
P38	1A9U(SB2), 1BL6(SB6), 1BL7(SB4), 1BMK(SB5), 1DI9(MSQ), 1KV1(BMU), 1KV2(B96), 1M7Q(DQO), 1OUK(084), 1OUY(094), 1OVE(358), 1W7H(3IP), 1WBN(L09), 1WBO(2CH), 1WBS(LI2), 1WBT(WBT), 1WBV(LI3), 1WBW(LI4), 3FC1(52P), 3GC7(B45), 3HVC(GG5), 3ITZ(P66), 3NNU(EDB), 3NNV(437), 3NNW(EDD)
CDK2	1H08(BWP), 1H01(FAL), 1H07(MFP), 1H1P(CMG), 1H1Q(2A6), 1H1R(6CP), 1H1S(4SP), 1OGU(ST8), 1OI9(N20), 1OIQ(HDU), 1OIR(HDY), 1OIT(HDT), 1OIU(N76), 1OIY(N41), 1P5E(TBS), 1URW(I1P), 1V1K(3FP), 1VYW(292), 1Y91(CT9), 2BKZ(SBC), 2BPM(529), 2C4G(514), 2C68(CT6), 2C69(CT8), 2C6I(DT1), 2C6K(DT2), 2C6L(DT4), 2C6M(DT5), 2C6O(4SP), 2CLX(F18), 2G9X(NU5), 2IW6(QQ2), 2IW8(4SP), 2IW9(4SP), 2R3F(SC8), 2R3G(SC9), 2R3H(SCE), 2R3I(SCF), 2R3J(SCJ), 2R3K(SCQ), 2R3L(SCW), 2R3M(SCX), 2R3N(SCZ), 2R3O(2SC), 2R3P(3SC), 2UZB(C75), 2UZD(C85), 2UZE(C95), 2UZL(C94), 2UZN(C96), 2UZO(C62), 2VV9(IM9), 2W05(FRT), 2W06(FRV), 2W17(I19), 2XMY(CDK), 2XNB(Y8L), 3DDP(RC8), 3DDQ(RRC), 3DOG(NNN), 3LFN(A27), 3LFS(A07), 3MY5(RFZ), 3PXQ(2AN), 3PXY(JWS), 3PXZ(JWS), 3PY0(SU9), 3PY1(SU9), 3S2P(PMU), 4BCK(T3E), 4BCM(T7Z), 4BCN(T9N), 4BCO(T6Q), 4BCP(T3C), 4RJ3(3QS)
AURKA	3P9J(P9J), 4DEA(NHI), 4DEB(NHJ), 4DED(NHU), 4UZD(QMN)
CHK1	2HOG(710), 2HXL(422), 2HXQ(373), 2HY0(306), 2QHM(7CS), 2QHN(582), 2R0U(M54), 2X8D(X8D), 3TKH(07s), 3TKI(S25)
CK2	3MB6(01I), 3MB7(14I)
JNK3	1PMN(984), 2B1P(AIZ), 2EXC(JNK), 3G90(J72), 3G9L(J67), 3G9N(j88), 3OY1(589), 3V6R(CQQ), 3V6S(0F0), 4U79(3EL), 4WHZ(3NL)
JAK2	3E62(5B1), 3E63(5B2), 3E64(5B3), 3JY9(JZH), 3KCK(3KC), 3KRR(DQX), 3Q32(J2I), 3RVG(17P), 3ZMM(F9J), 4BBE(3O4), 4BBF(O19), 4C61(LMM), 4D0W(VVQ), 4D0X(953)
HGF	1R0P(KSA), 3CE3(1FN), 3CTH(319), 3CTJ(320), 3F82(353), 3Q6W(Q6W), 3QTI(3QT), 3R7O(M61), 4EEV(L1X), 4GG5(0J3), 4GG7(0J8), 4IWD(1JC)
ERK2	1PME(577), 2OJG(19A), 2OJI(33A), 2OJJ(82A), 3I5Z(Z48), 3I60(E86), 4O6E(2SH), 4QTA(38Z)
GSK3β	1Q3D(STU), 1Q3W(ATU), 1Q41(679), 3GB2(G3B), 3I4B(Z48), 3L1S(Z92), 4IQ6(IQ6)
ITK	3T9T(IAQ), 4HCT(18R), 4HCU(13L), 4HCV(13J), 4M0Y(M0Y), 4M0Z(M0Z), 4M14(QWS)
PAK1	4EQC(XR1), 4O0R(X4Z), 4O0T(2OL)
PDK1	1OKY(STU), 1OKZ(UCN), 1UU3(LY4), 1UU7(BI2), 1UU8(BI1), 1UVR(BI8), 1Z5M(LI8), 3QC4(MP7), 3RCJ(3RC), 3RWP(ABQ), 3RWQ(3RW), 3SC1(3S1)
TGFR	1PY5(PY1), 1RW8(580), 2X7O(ZOP), 3FAA(55F), 3KCF(JZO), 4L3P(1UH), 4L52(1UL), 4L53(1UO), 4O91(NG2)


Figure 1: Kinase tree diagram built with KinMap (http://www.kinhub.org/kinmap/). The kinases used in this study are highlighted as red dots.

Each of the 14 protein kinases has multiple complexes in the test set (Table 1). Each protein kinase was represented by one high-resolution structure with no missing residues in its binding site, yielding a total of 14 target proteins representing the 14 protein kinases (see Table 1), which were then used to evaluate our ensemble docking algorithm. Coordinates for all complexes were obtained from the Protein Data Bank [31]. The 14 target proteins were superimposed on their binding sites with the MultiProt software [42]. Water molecules and metal ions were removed from the protein structures. Hydrogen atoms were not considered explicitly, in keeping with the ITScore scoring function [27, 28]. The sequences of the 14 target proteins were aligned with T-Coffee [34] to construct the reference protein. The SYBYL software (Tripos, Inc.) was used to assign atom types to protein and ligand atoms.
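MultiProt [42] performs multiple structural alignment and finds the common core itself. As a much simpler stand-in, the sketch below shows the Kabsch least-squares superposition of two sets of binding-site atoms whose one-to-one correspondence is already known. It is not MultiProt's algorithm and assumes NumPy is available.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Superimpose `mobile` onto `target` (paired N x 3 coordinates); return (aligned, rmsd)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mc, target - tc                 # center both sets
    H = P.T @ Q                                     # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T         # optimal proper rotation
    aligned = P @ R.T + tc
    rmsd = float(np.sqrt(((aligned - target) ** 2).sum(axis=1).mean()))
    return aligned, rmsd
```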

2.6. Database preparation

For a small-scale virtual screening test, 1000 commercially available compounds were randomly selected from the Available Chemicals Directory (ACD, distributed by Molecular Design Ltd., San Leandro, CA) with the FILTER program (OpenEye Scientific Software Inc., Santa Fe, NM, https://www.eyesopen.com/filter) using default parameters. The 200 known kinase ligands in the test set differ substantially from the selected ACD compounds: the molecular similarity of each kinase ligand, defined as its highest Tanimoto coefficient against the 1000 ACD compounds [43], ranges from 0.204 to 0.525 (Figure 2a). In addition, the molecular weights of the 200 kinase ligands are well covered by those of the 1000 ACD compounds (Figure 2b), although the ACD compounds have a higher average molecular weight. These 1000 molecules served as the inactive compounds for virtual database screening. The CONCORD program was used to generate 3D coordinates for the ligand molecules [44].
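The similarity measure used in Figure 2a, a ligand's highest Tanimoto coefficient against the 1000 ACD compounds, can be computed as below. The fingerprint type is not specified in the text, so fingerprints are represented here simply as sets of on-bit indices (an assumption); generating them would require a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0   # 0.0 for two empty prints (assumption)

def max_similarity(query_fp, database_fps):
    """Similarity of one ligand to a database: its highest Tanimoto coefficient."""
    return max(tanimoto(query_fp, fp) for fp in database_fps)
```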

Figure 2: Comparison between the test set of 200 kinase ligands and the database of 1000 ACD compounds. (a) Distribution of the molecular similarities of the 200 kinase ligands to the 1000 ACD compounds, measured as Tanimoto coefficients of fingerprints. (b) Distributions of the molecular weights of the kinase ligands and the ACD compounds.

3. Results

One limitation of traditional single-target docking against multiple targets is its inefficiency in virtual screening, particularly when the number of targets is large. We therefore performed virtual screening against multiple target proteins to evaluate our ensemble docking algorithm, using an ensemble of 14 protein kinases (see Materials and Methods). The results are compared with those of traditional single-target docking below.

3.1. Evaluation of the docking algorithm and scoring function

Before docking against the ensemble of multiple target proteins, we first re-docked the 200 ligands onto their bound protein structures (see Table 1). This so-called native-docking test served as a validation of our docking algorithm and scoring function.

In the present work, docking accuracy was evaluated by the root-mean-square deviation (rmsd) between the docked ligand and the ligand in the crystal structure of the complex. Unless otherwise specified, a binding mode was considered successfully predicted if the rmsd of the top-ranked orientation was < 2.0 Å [45]. We chose the top pose rather than the best pose because identifying the best pose requires prior knowledge of the native structure. The success rates would be even higher if the best poses, or the top 3 or top 5 poses, were considered. Figure 3 shows the success rates of our docking algorithm for the 200 ligands under criteria ranging from rmsd < 1.0 Å to rmsd < 3.0 Å. Our docking algorithm yielded high success rates at all rmsd criteria. Under the strictest criterion (rmsd < 1.0 Å), the success rate was 74%, the lowest of all criteria. The success rate increased as the criterion was relaxed, reaching 79% at the default criterion of rmsd < 2.0 Å and a maximum of 83% at rmsd < 3.0 Å. These high success rates validate our docking algorithm and scoring function.
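The success criterion can be sketched as follows. The sketch assumes the docked and crystal ligands share the same atom ordering and coordinate frame (no superposition, as is usual for docking rmsd) and ignores symmetry-equivalent atoms, which the text does not discuss.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference)) / len(reference))

def success_rates(top_pose_rmsds, cutoffs=(1.0, 1.5, 2.0, 2.5, 3.0)):
    """Percentage of complexes whose top-ranked pose has rmsd below each cutoff."""
    n = len(top_pose_rmsds)
    return {c: 100.0 * sum(r < c for r in top_pose_rmsds) / n for c in cutoffs}
```

By construction the rates can only increase as the cutoff is relaxed, which is the trend reported for Figure 3.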

Figure 3: The success rates of binding mode prediction for the 200 complexes at rmsd criteria of < 1.0, 1.5, 2.0, 2.5, and 3.0 Å, considering the top-ranked ligand conformation.

3.2. The ligand selectivity

An important application of the ensemble docking algorithm is investigating ligand selectivity across protein targets. Ideally, a docking algorithm would give energy scores comparable to the experimental binding energies for the targets of a given inhibitor and worse (i.e., less negative) scores for its non-targets. However, our scoring function was not scaled to reproduce experimental binding affinities, even though it achieved a reasonable correlation between calculated energy scores and experimental binding affinities [27, 28]. We therefore used an alternative, ranking-based approach to investigate ligand selectivity across protein kinases, which examines whether the true targets of an inhibitor receive more favorable rankings than the non-targets in terms of calculated binding scores.

For each inhibitor, the proteins were divided into two classes, targets and non-targets, according to experimental data. Table 2 lists the protein kinases ranked by their binding scores for the seven typical inhibitors shown in Figure 4 and Figure 5A. The true targets of these inhibitors indeed obtained better rankings than the non-targets. Notably, our ensemble docking algorithm identified the targets of some inhibitors, such as I17, MNY, and 460, even though the structures of these inhibitors bound to their targets were not included in our kinase ensemble.

Table 2:

A ranked list of the 14 protein kinases according to the binding energy scores computed by ensemble docking for seven typical inhibitors. The true targets of each inhibitor are colored in gray.

I17		MNY		KSA		TBS		STU		084		460
Kinase	Score	Kinase	Score	Kinase	Score	Kinase	Score	Kinase	Score	Kinase	Score	Kinase	Score
CDK2	−51.6	CK2	−45.3	HGF	−66.8	CK2	−49.2	PDK1	−52.7	P38	−51.4	TGFR	−47.1
HGF	−49.9	JAK2	−43.5	PDK1	−51.5	CDK2	−49.0	CDK2	−48.5	JNK3	−47.3	JNK3	−46.0
CHK1	−47.2	CDK2	−41.9	CK2	−51.3	ERK2	−47.4	CHK1	−45.4	ERK2	−43.2	HGF	−43.7
JNK3	−45.6	HGF	−41.4	CHK1	−46.8	HGF	−45.6	GSK3β	−45.0	AURKA	−43.2	CK2	−41.7
ERK2	−45.2	CHK1	−41.2	TGFR	−44.6	JNK3	−44.9	JNK3	−41.5	TGFR	−41.9	ERK2	−41.5
CK2	−43.9	JNK3	−40.3	JNK3	−44.5	JAK2	−44.2	HGF	−40.8	CK2	−40.4	PDK1	−38.4
AURKA	−40.6	AURKA	−38.8	AURKA	−42.4	PDK1	−43.4	ERK2	−40.7	PAK1	−40.1	AURKA	−38.2
ITK	−39.6	PDK1	−37.7	GSK3β	−41.5	AURKA	−42.7	CK2	−40.1	GSK3β	−40.1	ITK	−38.2
JAK2	−39.1	TGFR	−37.7	ERK2	−40.6	CHK1	−42.2	AURKA	−38.8	PDK1	−40.0	CHK1	−37.0
PDK1	−39.1	ERK2	−36.3	ITK	−38.3	ITK	−41.9	P38	−34.4	CDK2	−40.0	GSK3β	−34.3
P38	−39.1	ITK	−35.8	JAK2	−35.4	P38	−41.6	ITK	−30.7	CHK1	−40.0	P38	−33.4
GSK3β	−39.0	P38	−35.2	PAK1	−35.3	PAK1	−41.6	PAK1	−30.3	HGF	−38.9	CDK2	−33.2
TGFR	−38.1	PAK1	−35.2	P38	−35.2	TGFR	−41.5	TGFR	−29.5	ITK	−38.1	PAK1	−32.5
PAK1	−35.9	GSK3β	−35.2	CDK2	−35.1	GSK3β	−41.1	JAK2	−28.9	JAK2	−34.2	JAK2	−32.5


Figure 4: The 2D structures of the selected protein kinase inhibitors used for ligand selectivity analysis. The ligand ID and its corresponding PDB code are shown below each structure. The structures were drawn with MarvinSketch from ChemAxon Ltd. (http://www.chemaxon.com).

To further examine ligand selectivity prediction by our ensemble algorithm for both non-specific and specific inhibitors, we used staurosporine (STU) and Gleevec (STI) as two examples. Staurosporine, a non-specific inhibitor shown in Figure 5A, represents a well-studied class of broad-spectrum kinase inhibitors [46, 47]. With only a few rotatable bonds, staurosporine is quite rigid. We docked staurosporine onto the protein kinases and displayed the results with Chimera [48] (see Figure 5C), which also shows two motifs important for staurosporine binding, the GXGXXG motif and the DFG motif. The similar conformations of the protein kinases and the rigid structure of staurosporine result in similar binding modes. STU is reported to inhibit at least 4 sub-families of protein kinases (see Table 2), and these 4 targets received better rankings among the 14 kinases in our ensemble docking calculations.

In contrast to the promiscuous staurosporine, Gleevec (also known as Imatinib or STI-571, shown in Figure 5B), a potent and effective drug, has been used as a primary clinical treatment for specific cancers such as chronic myelogenous leukemia (CML) and gastrointestinal stromal tumors (GISTs) [49, 50]. Gleevec specifically and effectively inhibits the c-Kit and Abl protein tyrosine kinases. In these tumor cells, the tyrosine kinases are constitutively active, and Gleevec locks the kinase in an inactive conformation, preventing the phosphorylation that benefits cancerous cells. However, the reason why Gleevec is such a specific and potent inhibitor has been controversial [51–53]. Recently, Agafonov et al. [54] used NMR and fluorescence studies to show that Gleevec specifically inhibits Abl rather than Src because of induced-fit conformational changes. In Figure 5D, we aligned the 4 tyrosine kinase complexes (with the PDB IDs shown in Table 3) and used different colors to show the conformational changes upon Gleevec binding to different kinases. Specifically, Gleevec binds between the αC-helix and the Asp-Phe-Gly (DFG) motif of the activation loop (A-loop), and the phosphate-binding loop (P-loop) stabilizes Gleevec binding [52]. Although these 4 tyrosine kinases are homologous, they exhibit distinct binding-site conformations, which result in the binding specificity of Gleevec.

Table 3:

The binding energy scores of Gleevec with 4 homologous tyrosine kinases and the 14 protein kinases in our test set. The true Gleevec targets are colored in gray.

Kinase	Score	Kinase	Score
c-Kit(1T46)	−76.8	CHK1	−50.3
Abl(1IEP)	−72.8	AURKA	−50.2
Lck(2PL0)	−65.7	GSK3β	−49.0
c-Src(2OIQ)	−64.9	TGFR	−48.9
JNK3	−53.9	ERK2	−47.7
CDK2	−52.0	JAK2	−47.6
ITK	−51.8	P38	−47.3
HGF	−51.6	PDK1	−47.3
CK2	−51.2	PAK1	−46.4


Ensemble docking gave Gleevec binding scores of −76.8 and −72.8 for c-Kit and Abl, respectively, but only −65.7 and −64.9 for Lck and c-Src, respectively (see Table 3). Thus, Gleevec scored more favorably against Abl and c-Kit than against the homologous tyrosine kinases Lck and c-Src, suggesting that our ensemble docking algorithm can predict ligand selectivity.

Finally, to quantitatively evaluate the efficiency of our ensemble docking algorithm in discriminating true targets from non-targets, we defined the following efficiency factor (EF):

$$\mathrm{EF} = 100\% \times \frac{\langle \mathrm{Rank}(\text{nontarget}) \rangle - \langle \mathrm{Rank}(\text{target}) \rangle + N_{\text{proteins}}/2}{N_{\text{proteins}}} \tag{4}$$

Here, N_proteins is the number of proteins in the ensemble, including both targets and non-targets; in this study, N_proteins = 14. 〈Rank(nontarget)〉 and 〈Rank(target)〉 denote the average rankings of the non-target and target proteins, respectively. EF ranges from 0 to 100%. If all targets rank better than all non-targets, EF is 100%, indicating that the prediction completely agrees with the experimental findings. Conversely, if all non-targets rank better than the targets, EF is 0, representing a completely failed prediction. Random ranking distributes the targets uniformly among all proteins on average, giving an expected EF of 50%. The greater the EF, the more efficiently known targets are identified. Figure 6 shows the efficiency factors for identifying the true protein targets of all 200 ligands with the ensemble docking algorithm. Ensemble docking yielded an EF > 50% for most ligands, with an average EF of 74%, indicating that true targets are generally ranked ahead of non-targets. These results suggest that our ensemble algorithm is effective in investigating the selectivity of inhibitors across multiple target proteins.
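The limits of eq (4) follow directly from the rank averages: with T targets all ranked first, 〈Rank(target)〉 = (T + 1)/2 and 〈Rank(nontarget)〉 = (T + 1 + N_proteins)/2, so their difference is N_proteins/2 and EF = 100%; reversing the order gives a difference of −N_proteins/2 and EF = 0. A minimal Python sketch of eq (4) follows; breaking score ties by input order is an assumption, since the text does not address ties.

```python
def efficiency_factor(scores, targets):
    """EF of eq (4). scores: protein -> energy score (more negative = better)."""
    ranking = sorted(scores, key=scores.get)             # rank 1 = most negative score
    rank = {protein: i + 1 for i, protein in enumerate(ranking)}
    target_ranks = [rank[p] for p in scores if p in targets]
    other_ranks = [rank[p] for p in scores if p not in targets]
    if not target_ranks or not other_ranks:
        raise ValueError("need at least one target and one non-target")
    n = len(scores)                                      # N_proteins
    mean_t = sum(target_ranks) / len(target_ranks)
    mean_nt = sum(other_ranks) / len(other_ranks)
    return 100.0 * (mean_nt - mean_t + n / 2) / n
```

For example, with four proteins scored −50, −45, −40, and −30, a single target at −50 gives 100.0, a single target at −30 gives 0.0, and targets at −45 and −40 give 50.0.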

Figure 6: The efficiency factors of the ensemble docking algorithm in discriminating the true targets from the non-targets for the 200 ligands listed in Table 1. The horizontal red dashed line marks random selection (EF = 50%).

3.3. Virtual Screening

The effectiveness of a docking algorithm in virtual screening is important from the perspective of structure-based drug design. A widely used index for assessing virtual screening performance is the enrichment, defined as the cumulative fraction of known inhibitors identified within a given top percentage of the ranked database. As described in the Materials and Methods section, our ensemble docking algorithm calculates the binding energy score of a ligand for each target. In other words, a single run of ensemble docking gives every target protein its own set of binding energy scores for all the docked ligands, so that one enrichment curve can be plotted per target. To assess the accuracy and efficiency of ensemble docking in virtual screening against multiple targets, we compared its enrichment data with those obtained from the traditional sequential docking method. Figure 7 shows these comparisons for five typical target proteins that have at least seven known ligands in Table 1. For each of the five kinase targets, its known ligands were screened together with the 1000 inactive molecules. Because the protein conformation is treated as an additional parameter, the ensemble docking method is about M times faster than traditional sequential docking, where M is the number of protein conformations in the ensemble: the computational time of sequential docking is proportional to M, whereas ensemble docking costs about as much as a single-target run (see Section 3.4). The area under the curve (AUC) was not improved in Figure 7 because both methods use the same pairwise potentials in their scoring functions. As the figure shows, ensemble docking performed comparably to sequential docking for all five target proteins while requiring much less computational time, indicating that our ensemble docking algorithm can efficiently identify known ligands.
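The per-target enrichment curves described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: it assumes a score matrix with one row per ligand and one column per target (the kind of output a single ensemble-docking run provides), ranks more negative scores first, and rounds the inspected fraction to a whole number of ligands. The function name `enrichment_curve` and all numbers in the example are hypothetical.

```python
def enrichment_curve(scores, is_active, fractions):
    """Cumulative fraction of known actives recovered in the top `fraction` of a ranked list.

    scores: docking scores for one target (more negative = better).
    is_active: parallel booleans marking known inhibitors.
    fractions: fractions of the ranked database to inspect, each in [0, 1].
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_active = sum(is_active)
    curve = []
    for fraction in fractions:
        n_top = round(fraction * len(scores))  # number of top-ranked ligands examined
        found = sum(1 for i in order[:n_top] if is_active[i])
        curve.append(found / n_active)
    return curve


# Hypothetical output of one ensemble-docking run: rows = ligands, columns = 2 targets.
score_matrix = [
    [-9.0, -5.0],
    [-8.0, -6.0],
    [-4.0, -9.5],
    [-3.0, -2.0],
    [-2.5, -8.0],
    [-1.0, -1.0],
]
actives = {
    0: [True, True, False, False, False, False],  # hypothetical known ligands of target 0
    1: [False, True, False, True, False, False],  # hypothetical known ligands of target 1
}
fractions = [1 / 6, 1 / 2, 1.0]

# One curve per target, all taken from the same run's score matrix.
curves = {
    target: enrichment_curve([row[target] for row in score_matrix], flags, fractions)
    for target, flags in actives.items()
}
print(curves)
```

For Figure 7 the same per-column computation would apply to each of the five kinase targets, with each target's known ligands mixed into the 1000 inactive molecules.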

Figure 7: Comparisons of the enrichments between ensemble docking and standard single-target docking on five typical target proteins. The legend applies to all the panels.

3.4. Computational Efficiency

In addition to docking accuracy, our ensemble docking algorithm has an advantage in computational efficiency. It has been shown that the run time of ensemble docking is comparable to that of single-structure docking, even though ensemble docking addresses multiple protein structures [25, 26]. In this study, on a PC with an Intel Core i7-4770 CPU (3.4 GHz) and 8.0 GB of RAM, the ensemble docking algorithm took an average of 2.8 s to place and optimize a ligand conformer in the binding site, comparable to the 2.4 s required by standard single-target docking. By contrast, the total run time of sequential docking is proportional to the number of protein structures in the ensemble. For example, P38 has a total of 25 kinase structures (see Table 1). Ensemble docking took about 2.8 s, an increase of only 17% in computational time compared with single-target docking, whereas sequential docking took about 2.4 × 35 s, a 35-fold increase.
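The timing comparison reduces to simple arithmetic, sketched below. The sketch assumes that sequential docking repeats the single-target run once per structure and that the ensemble run time does not depend on the number of structures; both are idealizations. The helper `timing_summary` is hypothetical, and the value 14 is used only because it is the ensemble size in this study.

```python
def timing_summary(t_single, t_ensemble, n_structures):
    """Compare per-conformer run times of ensemble, single-target, and sequential docking.

    Assumes sequential docking repeats the single-target run once per structure
    and that the ensemble run time is independent of the number of structures.
    """
    overhead = t_ensemble / t_single - 1.0   # relative extra cost of one ensemble run
    t_sequential = t_single * n_structures   # sequential docking scales linearly
    speedup = t_sequential / t_ensemble      # = n_structures * t_single / t_ensemble
    return overhead, t_sequential, speedup


overhead, t_seq, speedup = timing_summary(t_single=2.4, t_ensemble=2.8, n_structures=14)
print(f"overhead {overhead:.0%}, sequential {t_seq:.1f} s, speedup {speedup:.1f}x")
```

Because t_single/t_ensemble is about 0.86 here, the speedup comes out slightly below the number of structures, consistent with the statement in Section 3.3 that ensemble docking is about M times faster than sequential docking.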

4. Discussion and Conclusion

Simultaneous docking against multiple target proteins is valuable because compounds often inhibit multiple proteins of the same family, which makes ligand specificity a challenge [18–20, 55–59]. Our ensemble docking algorithm addresses this issue by docking against multiple targets simultaneously, treating the protein structure as an additional parameter in the energy optimization. The tests on 14 protein kinases demonstrate that, for multiple target proteins, our ensemble docking algorithm achieves performance comparable to that of the standard single-target docking method without a substantial increase in computational time.

A similar “in situ cross-docking” approach was proposed by Sotriffer and Dramburg and tested on several ensembles of up to six targets; in it, ligands are docked simultaneously into one united grid comprising the binding sites of all the targets [20]. Despite the similarities, there are several major differences between this approach and ours. In Sotriffer and Dramburg’s approach, the binding sites of the target proteins are separated from each other in the united grid. Therefore, if a ligand scores fairly favorably in more than one binding site, convergence toward the native, global optimum will be slow. This may make their approach less suitable for ensembles of similar targets or of different conformations of the same protein [20]. Moreover, because the grids of the individual targets are joined one after another, the united grid has a much larger volume than the grid of a single target, and the search for the ligand binding mode is considerably less efficient because of the larger search space.

Unlike the “in situ cross-docking” approach, our ensemble docking algorithm aligns the binding sites of multiple target proteins so that the united grid has almost the same volume as that for single-target docking. Accordingly, our ensemble algorithm is computationally efficient with a run time comparable to that of single-target docking. The ensemble algorithm can also be applied to different proteins for identifying potential targets of an inhibitor and to different conformations of the same protein for considering protein flexibility regardless of whether the protein structures are similar or not. In addition, ensemble docking provides ligand binding energy scores for each target, which can be used for studying ligand selectivity.
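The benefit of aligning the binding sites, so that one search box serves all targets, can be illustrated with a toy scoring routine. This is a simplified sketch, not the authors' implementation: it assumes one precomputed energy grid per target (a real knowledge-based scoring function would typically use atom-type-specific grids with interpolation), nearest-grid-point lookup, and pre-aligned binding sites sharing a common origin and spacing. For a fixed pose, treating the target index as an extra optimization variable, which is one reading of the approach described above, amounts to taking the minimum over targets. The function name `score_pose_on_ensemble` and the random grids are hypothetical.

```python
import numpy as np


def score_pose_on_ensemble(grids, origin, spacing, atom_coords):
    """Score one ligand pose against every target with a single coordinate lookup.

    grids: array of shape (M, nx, ny, nz), one energy grid per target, all in the
           same frame because the binding sites were aligned beforehand (assumption).
    origin: Cartesian coordinates (3,) of grid point (0, 0, 0).
    spacing: grid spacing, in the same units as the coordinates.
    atom_coords: array (n_atoms, 3) of ligand atom positions.
    Returns an array (M,) of per-target scores; lower is more favorable.
    """
    idx = np.rint((atom_coords - origin) / spacing).astype(int)  # nearest grid point
    idx = np.clip(idx, 0, np.array(grids.shape[1:]) - 1)        # keep atoms inside the box
    # Fancy indexing gathers an (M, n_atoms) array; summing over atoms gives one score per target.
    return grids[:, idx[:, 0], idx[:, 1], idx[:, 2]].sum(axis=1)


rng = np.random.default_rng(0)
n_targets, box = 3, (10, 10, 10)               # search box the size of one binding site
grids = rng.normal(size=(n_targets, *box))     # hypothetical per-target energy grids
origin, spacing = np.zeros(3), 0.5
pose = np.array([[1.0, 1.0, 1.0], [1.5, 1.0, 1.0], [2.0, 1.5, 1.0]])

per_target = score_pose_on_ensemble(grids, origin, spacing, pose)
ensemble_score = per_target.min()              # value an optimizer would minimize
best_target = int(per_target.argmin())         # target currently favored by this pose
print(per_target, ensemble_score, best_target)
```

In this layout only memory grows with the number of targets; the search box, and hence the pose search space, stays the size of a single binding site, which is the contrast drawn above with the united grid of “in situ cross-docking”. The per-target array is also what provides a score for each target for selectivity analysis.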

In principle, our ensemble docking algorithm can address any number of target proteins simultaneously. However, the energy landscape of the protein ensemble would become more rugged as the number of targets increases, which in turn would lower the search efficiency of the energy optimization method. In this work, we used a local optimization method, SIMPLEX [39], so the ligand may occasionally be trapped in a local minimum. As a result, the incorrect predictions that do occur often originate from the optimization method rather than from the ensemble algorithm itself. One solution is to use a global optimization method, which is far less prone to being trapped in local minima; however, global optimization achieves this robustness at the cost of computational time. Therefore, as the number of proteins increases, it is desirable to develop an optimization method that strikes a good balance between speed and search efficiency.
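The trade-off described above can be illustrated with a one-dimensional toy landscape. In this sketch each hypothetical target contributes a quadratic well, the ensemble energy is the minimum over targets (so more targets means more local minima), and SciPy's Nelder-Mead minimizer stands in for SIMPLEX. The multi-start loop is only one possible compromise between a purely local and a global search, not a method proposed in this paper, and all well positions and depths are made up. The example requires NumPy and SciPy.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical per-target wells: position of the best "pose" and its score.
CENTERS = np.array([-3.0, 0.0, 3.0])
DEPTHS = np.array([-5.0, -8.0, -6.0])  # target 1 holds the global minimum


def per_target_energy(x):
    """Toy energy of a one-dimensional pose x on every target."""
    return (x[0] - CENTERS) ** 2 + DEPTHS


def ensemble_energy(x):
    """Ensemble landscape: best score over targets, hence one local minimum per well."""
    return float(per_target_energy(x).min())


# A single local (Nelder-Mead) run started near the shallower well at x = 3.
single = minimize(ensemble_energy, x0=[2.5], method="Nelder-Mead")

# Multi-start: several local runs from spread-out starting points; keep the best one.
runs = [minimize(ensemble_energy, x0=[s], method="Nelder-Mead") for s in np.linspace(-4.0, 4.0, 5)]
best = min(runs, key=lambda r: r.fun)
total_evals = sum(r.nfev for r in runs)

print("single start:", single.x, single.fun, "target", int(per_target_energy(single.x).argmin()))
print("multi-start :", best.x, best.fun, "target", int(per_target_energy(best.x).argmin()))
print("function evaluations:", single.nfev, "vs", total_evals)
```

The single local run stays in the well of the wrong target, mirroring the occasional incorrect predictions discussed above, whereas the multi-start search finds the deeper well at the price of several times more energy evaluations.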

In summary, we have presented an efficient ensemble docking algorithm for simultaneous docking onto multiple target proteins. The algorithm was validated by virtual screening against 14 human protein kinases. The ensemble docking algorithm performed comparably to standard single-target docking without a significant increase in run time. We expect the algorithm to be useful for the investigation of ligand selectivity and for rapid identification of inhibitors for multiple targets in structure-based drug design.

Acknowledgments

We are thankful to Dr. Xianjin Xu for insightful discussions and helpful assistance. We are also grateful for the support to XZ from OpenEye Scientific Software Inc. (Santa Fe, NM), Tripos, Inc. (St. Louis, MO), and MDL Information Systems, Inc. (San Leandro, CA). XZ is supported by NIH grants R01GM109980 and R35GM136409 (PI: XZ), R01HL126774 (PI: Jianmin Cui), and R01HL142301 (PI: Jonathan R. Silva). The computations were performed on the high-performance computing infrastructure supported by NSF CNS-1429294 (PI: Chi-Ren Shyu) and on the HPC resources supported by the University of Missouri Bioinformatics Consortium (UMBC).

References

[1] Huang S-Y; Zou X Advances and Challenges in Protein-Ligand Docking. Int. J. Mol. Sci 2010, 11, 3016–3034.
[2] Huang S-Y; Grinter SZ; Zou X Scoring Functions and Their Evaluation Methods for Protein-Ligand Docking: Recent Advances and Future Directions. Phys. Chem. Chem. Phys 2010, 12, 12899–12908.
[3] Grinter SZ; Zou X Challenges, Applications, and Recent Advances of Protein-Ligand Docking in Structure-based Drug Design. Molecules 2014, 19, 10150–10176.
[4] Kitchen DB; Decornez H; Furr JR; Bajorath J Docking and Scoring in Virtual Screening for Drug Discovery: Methods and Applications. Nat. Rev. Drug Discov 2004, 3, 935–949.
[5] Brooijmans N; Kuntz ID Molecular recognition and docking algorithms. Annu. Rev. Biophys. Biomol. Struct 2003, 32, 335–373.
[6] Meng X-Y; Zhang H-X; Mezei M; Cui M Molecular Docking: A powerful approach for structure-based drug discovery. Curr. Comput.-Aided Drug Des 2011, 7, 146–157.
[7] Santiago DN; Pevzner Y; Durand AA; Tran M; Scheerer RR; Daniel K; Sung S-S; Woodcock HL; Guida WC; Brooks WH Virtual target screening: validation using kinase inhibitors. J. Chem. Inf. Model 2012, 52, 2192–2203.
[8] Zahler S; Tietze S; Totzke F; Kubbutat M; Meijer L; Vollmar AM; Apostolakis J Inverse in silico screening for identification of kinase inhibitor targets. Chem. Biol 2007, 14, 1207–1214.
[9] Bond JS Proteases: History, discovery, and roles in health and disease. J. Biol. Chem 2019, 294(5), 1643–1651.
[10] Cohen P Protein kinases - the major drug targets of the twenty-first century? Nat. Rev. Drug Discov 2002, 1, 309–315.
[11] Cherry M; Williams DH Recent kinase and kinase inhibitor X-ray structures: Mechanisms of inhibition and selectivity insights. Curr. Med. Chem 2004, 11, 663–673.
[12] Lahiry P; Torkamani A; Schork NJ; Hegele RA Kinase mutations in human disease: interpreting genotype-phenotype relationships. Nat. Rev. Genet 2010, 11, 60–74.
[13] Cohen P; Alessi DR Kinase Drug Discovery - What’s Next in the Field? ACS Chem. Biol 2013, 8, 96–104.
[14] Bhullar KS; Lagarón NO; McGowan EM; Parmar I; Jha A; Hubbard BP; Rupasinghe HPV Kinase-targeted cancer therapies: progress, challenges and future directions. Mol. Cancer 2018, 17, 48.
[15] Macchiarulo A; Nobeli I; Thornton JM Ligand selectivity and competition between enzymes in silico. Nat. Biotechnol 2004, 22, 1039–1045.
[16] Seifert MH; Kraus J; Kramer B Virtual high-throughput screening of molecular databases. Curr. Opin. Drug Discov. Devel 2007, 10, 298–307.
[17] Slynko I; Scharfe M; Rumpf T; Eib J; Metzger E; Schule R; Jung M; Sippl W Virtual screening of PRK1 inhibitors: ensemble docking, rescoring using binding free energy calculation and QSAR model development. J. Chem. Inf. Model 2014, 54, 138–150.
[18] Lamb ML; Burdick KW; Toba S; Young MM; Skillman AG; Zou X; Arnold JR; Kuntz ID Design, docking, and evaluation of multiple libraries against multiple targets. Proteins 2001, 42, 296–318.
[19] Rockey WM; Elcock AH Progress toward virtual screening for drug side effects. Proteins 2002, 48, 664–671.
[20] Sotriffer CA; Dramburg I “In situ cross-docking” to simultaneously address multiple targets. J. Med. Chem 2005, 48, 3122–3125.
[21] Hardman JG; Limbird LE; Gilman AG The Pharmacological Basis of Therapeutics, 10th ed.; McGraw-Hill: New York, 2001.
[22] Kuntz ID; Blaney JM; Oatley SJ; Langridge R; Ferrin TE A geometric approach to macromolecule-ligand interactions. J. Mol. Biol 1982, 161, 269–288.
[23] Allen WJ; Balius TE; Mukherjee S; Brozell SR; Moustakas DT; Lang PT; Case DA; Kuntz ID; Rizzo RC J. Comput. Chem 2015, 36, 1132–1156.
[24] Ewing TJA; Makino S; Skillman AG; Kuntz ID DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des 2001, 15, 411–428.
[25] Huang S-Y; Zou X Ensemble docking of multiple protein structures: Considering protein structural variations in molecular docking. Proteins 2007, 66, 399–421.
[26] Huang S-Y; Zou X Efficient molecular docking of NMR structures: Application to HIV-1 protease. Protein Sci. 2007, 16, 43–51.
[27] Huang S-Y; Zou X An iterative knowledge-based scoring function to predict protein-ligand interactions: I. Derivation of interaction potentials. J. Comput. Chem 2006, 27, 1865–1875.
[28] Huang S-Y; Zou X An iterative knowledge-based scoring function to predict protein-ligand interactions: II. Validation of the scoring function. J. Comput. Chem 2006, 27, 1876–1882.
[29] Yan C; Grinter SZ; Merideth BR; Ma Z; Zou X Iterative Knowledge-Based Scoring Functions Derived from Rigid and Flexible Decoy Structures: Evaluation with the 2013 and 2014 CSAR Benchmarks. J. Chem. Inf. Model 2015, 56, 1013–1021.
[30] Eid S; Turk S; Volkamer A; Rippmann F; Fulle S KinMap: a web-based tool for interactive navigation through human kinome data. BMC Biol. 2017, 18, 16.
[31] Berman HM; Westbrook J; Feng Z; Gilliland G; Bhat TN; Weissig H; Shindyalov IN; Bourne PE The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.
[32] Hawkins PCD; Skillman AG; Warren GL; Ellingson BA; Stahl MT Conformer Generation with OMEGA: Algorithm and Validation Using High Quality Structures from the Protein Databank and the Cambridge Structural Database. J. Chem. Inf. Model 2010, 50, 572–584.
[33] Hawkins PCD; Nicholls A Conformer generation with OMEGA: learning from the data set and the analysis of failures. J. Chem. Inf. Model 2012, 52, 2919–2936.
[34] Notredame C; Higgins DG; Heringa J T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol 2000, 302, 205–217.
[35] Yan C; Zou X Computer-Aided Drug Discovery; Zhang W, Ed.; Humana Press: New York, NY, 2015; pp 153–166.
[36] Ma Z; Zou X Protein-Ligand Interactions and Drug Design; Ballante F, Ed.; Humana Press: New York, NY, 2020; Vol. 2266.
[37] Ewing TJA; Kuntz ID Critical evaluation of search algorithms for automated molecular docking and database screening. J. Comput. Chem 1997, 18, 1175–1189.
[38] Meng EC; Shoichet BK; Kuntz ID Automated docking with grid-based energy evaluation. J. Comput. Chem 1992, 13, 505–524.
[39] Nelder JA; Mead R A simplex method for function minimization. Computer J. 1965, 7, 308–313.
[40] Liu Z; Li Y; Han L; Li J; Liu J; Zhao Z; Nie W; Liu Y; Wang R PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 2015, 31, 405–412.
[41] Wang R; Fang X; Lu Y; Wang S The PDBbind database: Collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J. Med. Chem 2004, 47, 2977–2980.
[42] Shatsky M; Nussinov R; Wolfson HJ A method for simultaneous alignment of multiple protein structures. Proteins 2004, 56, 143–156.
[43] O’Boyle NM; Banck M; James CA; Morley C; Vandermeersch T; Hutchison GR Open Babel: An open chemical toolbox. J. Cheminform 2011, 3, 33.
[44] Rusinko A; Sheridan RP; Nilakantan R; Haraki KS; Bauman N; Venkataraghavan R Using CONCORD to construct a large database of three-dimensional coordinates from connection tables. J. Chem. Inf. Comput. Sci 1989, 29, 251–255.
[45] Cole JC; Murray CW; Nissink JWM; Taylor RD; Taylor R Comparing protein-ligand docking programs is difficult. Proteins 2005, 60, 325–332.
[46] Haupt VJ; Daminelli S; Schroeder M Drug promiscuity in PDB: protein binding site similarity is key. PLoS One 2013, 8, e65894.
[47] Tanramluk D; Schreyer A; Pitt WR; Blundell TL On the origins of enzyme inhibitor selectivity and promiscuity: a case study of protein kinase binding to staurosporine. Chem. Biol. Drug Des. 2009, 74, 16–24.
[48] Pettersen EF; Goddard TD; Huang CC; Couch GS; Greenblatt DM; Meng EC; Ferrin TE J. Comput. Chem 2004, 25, 1605–1612.
[49] Kantarjian H; Sawyers C; Hochhaus A; Guilhot F; Schiffer C; Gambacorti-Passerini C; Niederwieser D; Resta D; Capdeville R; Zoellner U, et al. Hematologic and cytogenetic responses to imatinib mesylate in chronic myelogenous leukemia. N. Engl. J. Med 2002, 346, 645–652.
[50] Demetri GD; von Mehren M; Blanke CD; Van den Abbeele AD; Eisenberg B; Roberts PJ; Heinrich MC; Tuveson DA; Singer S; Janicek M, et al. Efficacy and safety of imatinib mesylate in advanced gastrointestinal stromal tumors. N. Engl. J. Med 2002, 347, 472–480.
[51] Lin Y-L; Meng Y; Jiang W; Roux B Explaining why Gleevec is a specific and potent inhibitor of Abl kinase. Proc. Natl. Acad. Sci. U. S. A 2013, 110, 1664–1669.
[52] Lin Y-L; Roux B Computational analysis of the binding specificity of Gleevec to Abl, c-Kit, Lck, and c-Src tyrosine kinases. J. Am. Chem. Soc 2013, 135, 14741–14753.
[53] Lovera S; Sutto L; Boubeva R; Scapozza L; Dölker N; Gervasio FL The different flexibility of c-Src and c-Abl kinases regulates the accessibility of a druggable inactive conformation. J. Am. Chem. Soc 2012, 134, 2496–2499.
[54] Agafonov RV; Wilson C; Otten R; Buosi V; Kern D Energetic dissection of Gleevec’s selectivity toward human tyrosine kinases. Nat. Struct. Mol. Biol 2014, 21, 848–853.
[55] Bowman AL; Lerner MG; Carlson HA Protein flexibility and species specificity in structure-based drug discovery: Dihydrofolate reductase as a test system. J. Am. Chem. Soc 2007, 129, 3634–3640.
[56] Chen YZ; Zhi DG Ligand-protein inverse docking and its potential use in the computer search of protein targets of a small molecule. Proteins 2001, 43, 217–226.
[57] Paul N; Kellenberger E; Bret G; Muller P; Rognan D Recovering the true targets of specific ligands by virtual screening of the Protein Data Bank. Proteins 2004, 54, 671–680.
[58] Gilson MK Sensitivity analysis and charge-optimization for flexible ligands: Applicability to lead optimization. J. Chem. Theory Comput 2006, 2, 259–270.
[59] Rockey WM; Elcock AH Rapid computational identification of the targets of protein kinase inhibitors. J. Med. Chem 2005, 48, 4138–4152.

