Abstract
Increased availability of bioinformatics resources is creating opportunities for the application of network pharmacology to predict drug effects and toxicity resulting from multi-target interactions. Here we present a high-precision computational prediction approach that combines two elaborately built machine learning systems and multiple molecular docking tools to assess binding potentials of a test compound against proteins involved in a complex molecular network. One of the two machine learning systems is a re-scoring function to evaluate binding modes generated by docking tools. The second is a binding mode selection function to identify the most predictive binding mode. Results from a series of benchmark validations and a case study show that this approach surpasses the prediction reliability of other techniques and that it also identifies either primary or off-targets of kinase inhibitors. Integrating this approach with molecular network maps makes it possible to address drug safety issues by comprehensively investigating network-dependent effects of a drug or drug candidate.
Introduction
Drugs may interact with numerous molecules in the human body. Approximately 35% of known drugs or drug leads present multi-target activity [1]. Even when a drug is claimed to have high selectivity, it probably binds to proteins that are not identified as targets. Such unexpected off-target interactions may result in adverse reactions, which increase therapeutic risks and negatively impact drug development. An example of this is the cardiotoxicity of the tyrosine kinase inhibitor Sunitinib [2]. Concerns surrounding the use of this anti-cancer drug have arisen due to its adverse side effects. Its unanticipated inhibition of members of the ribosomal S6 kinase (RSK) and AMP-activated protein kinase (AMPK) families is at least partly responsible for the drug's cardiotoxicity [3]. Since more than two hundred proteins associated with cardiovascular diseases have been identified [4], treatment with low-selectivity drugs can have many unexpected effects. On the other hand, designing drugs with multi-target therapeutic application is of increasing interest to the drug discovery community. Compared with single-target agents, drugs that regulate multiple proteins have the potential to improve the balance of efficacy and safety [5], although minimizing their toxicity remains challenging. As an example, the treatment of neurodegenerative diseases has progressed toward a multi-target strategy [6]. Though some multi-target drugs prove beneficial, their discovery and the identification of other clinically relevant targets are often accidental, and their final application may differ radically from their original design. Sorafenib, for example, was initially developed as a RAF kinase inhibitor, but its therapeutic contribution in treating renal and hepatocellular cancers was later ascribed to its inhibition of VEGFR2 and PDGFR, and probably other targets as well [7].
To comprehensively assess pharmacological effects, systems pharmacology has been developed [8], [9], in which various bioinformatics resources assessing different structural levels, from molecules to systems, are integrated. A well-curated, comprehensive molecular interaction network is the focal point of the systems pharmacological approach. Such a network can reveal causes and effects of protein interactions over signaling networks, metabolic networks, and other related pathways. With a deeply curated network map that describes signaling cascades and interactions among molecules, one can carry out network-based screening to systematically identify target proteins of a given drug candidate and to assess its impact. Thus, network-based screening appears promising for drug repurposing and safety prediction.
Various bioinformatics resources, including biological databases, signaling network construction tools, and molecular modeling software, have been developed, creating an opportunity to meet the demand for rapid systematic screening. Given the availability of rich data and algorithmic resources on one side, and the urgent need to capture poly-pharmacological effects of drugs and candidates on the other, one obvious challenge is to develop a computational method that can accurately predict a drug's effects across molecular networks. Doing this requires developing high-precision molecular docking simulation systems and applying them over molecular networks to compute the aggregated effects of drugs.
Issues in molecular docking simulation
Molecular virtual docking is an efficient computational method to rapidly calculate the binding potential of a small molecule, such as a drug or candidate, to a target protein. It is widely used in computer-aided drug discovery due to its speed and low cost [10]. This method is mainly used to dock a small molecule to a protein structure (pose generation) and to evaluate its potential complementarity with the defined binding site (scoring). Studies assessing the performance and accuracy of various commonly used molecular docking programs indicate that those packages are able to perform pose generation well, since most of the generated binding modes are conformationally similar to the corresponding co-crystallized ligands, but their scoring functions are still too inaccurate for a reliable prediction [11]–[13]. Plewczynski and colleagues evaluated seven popular docking programs, namely Surflex [14], LigandFit [15], Glide [16], GOLD [17], FlexX [18], eHiTS [19] and AutoDock [20], on the PDBbind database [21]. The best Pearson correlations between predicted binding affinities (or scores) and experimental values were 0.38 or lower [13]. Thus, correctly predicting the binding affinity of a given protein-ligand complex continues to be one of the most challenging issues in docking simulation. Scoring algorithms such as X-Score [22] and RF-Score [23] have attempted to improve accuracy, and the best Pearson correlation value measured was 0.776 [23]. However, this correlation was obtained only in an ideal situation where binding interactions of co-crystallized complexes were directly evaluated, without considering the influence of the pose prediction that is required to re-dock the native ligand to its target protein prior to scoring. When both pose generation and scoring are performed, the correlation might decrease.
In order to overcome these problems in docking simulation, multiple docking tools plus scoring functions can be applied to a given docking study to improve performance (Figure S1). This yields more than one score for each test and the best among them is identified by referring to the corresponding experimental binding affinity. Correlations can be improved from 0.61 to 0.84, depending on the tools used. However, it should be noted that the highly accurate correlation of 0.84 was achieved by manually selecting the best predictions from multiple simulators. For this approach to be practical, the best prediction from multiple simulators has to be selected automatically. To accomplish this, we developed a novel scoring approach employing two machine learning systems, which were embedded as a part of a pipeline implementing a network-based screening approach that integrates curated signaling networks, bioinformatics databases, and molecular docking simulation to comprehensively and rapidly evaluate potential binding affinities of given drugs against proteins involved in a signaling network.
Results
Machine learning systems for improving docking simulation
The first machine learning system we employed (A) was a re-scoring function developed to assess binding modes generated by docking tools and to rank them accordingly. Machine learning system B was a binding mode selection function designed to identify the most predictive binding mode from those originated in the previous step. A test case for these two systems is illustrated in Figure S2. Both systems were built and validated by using the PDBbind version 2007 refined set which contains 1300 protein-ligand complexes and is considered a high-quality standard dataset for theoretical studies on molecular recognition.
1. Machine learning system A: a re-scoring function
Reliability of docking simulations depends upon performance of the scoring function. A recently developed method known as RF-Score [23] allows better predictions compared with other scoring functions. RF-Score applies a non-parametric machine learning algorithm called Random Forest [24] to predict protein-ligand binding affinity by counting the occurrences of various protein-ligand interatomic contacts (Supplementary Table S1) within a specified distance. We adopted and further revised RF-Score in this work, so that we not only considered intermolecular interactions, but also included the quantitative structure–activity relationship (QSAR) in the machine learning model in order to extend its modeling assumptions. Molecular physicochemical properties of test compounds were parameterized as predictors in the modeling exercise (Table S2).
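The contact-count descriptor described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the element lists, the default `cutoff` value, and the function name `contact_counts` are assumptions made for the example, and the actual contact types are those listed in Supplementary Table S1.

```python
from collections import Counter
from math import dist

# Hypothetical element sets for illustration only; the real contact
# types are defined in Supplementary Table S1.
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "S")


def contact_counts(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count protein-ligand atom pairs closer than `cutoff` (angstroms).

    Each atom is an (element, (x, y, z)) tuple. The default cutoff is an
    assumption; the text only specifies "a specified distance".
    """
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            if dist(p_xyz, l_xyz) < cutoff:
                counts[(p_elem, l_elem)] += 1
    # Return a fixed-length feature vector in a stable order.
    return [counts[(p, l)] for p in PROTEIN_ELEMENTS for l in LIGAND_ELEMENTS]
```

A vector of this kind, extended with the physicochemical properties of the ligand, would form one row of predictors for the Random Forest model.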
2. Machine learning system B: a binding mode selection function
Following the re-scoring function, the second machine learning system developed in this work was a binding mode selection function, designed to assess binding modes and to identify the best predictor (Figure S3). This learning system used a multinomial logistic regression method [25], a supervised learning algorithm capable of predicting probabilities of categorical placement among more than two discrete outcomes, based on a set of independent variables. Similar to binary logistic regression, multinomial logistic regression uses maximum likelihood estimation to calculate the probability of categorical outcomes and allows different types of independent variables in building a model. In the present study, independent variables were the same as the predictors used in machine learning system A. The predicted categorical outcomes are the three top-score binding modes generated by the docking tools eHiTS, GOLD and AutoDock VINA [26]. Through the binding mode selection function, molecular interactions of the three binding modes together with molecular properties of the test compound were assessed, and the binding mode predicted to be most reliable for a particular docking study was selected.
3. Validation using PDBbind benchmark
The performance of screening with these two machine learning systems was validated on the demanding PDBbind benchmark. For testing with less bias, the validation was conducted with a re-docking experiment (Materials and Methods). Of the 1,300 complexes, 195 structures (15%) were randomly selected as the test set and the remaining 1,105 structures (85%) were employed as the training set. The training process with the random test/training partition was iterated 25 times in order to comprehensively assess the effects of varying the dataset. Pearson correlations between the predicted scores and corresponding experimental binding affinities were measured.
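The repeated random partition described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the R code used in this work: `fit` and `predict` are hypothetical callables standing in for the scoring model, and the function names and the `seed` parameter are introduced only for the example.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def repeated_holdout(complexes, fit, predict, test_fraction=0.15, repeats=25, seed=0):
    """Average test-set Pearson R over `repeats` random train/test partitions.

    `complexes` is a list of (features, affinity) pairs. `fit` and `predict`
    are hypothetical stand-ins for training and applying the scoring model.
    """
    rng = random.Random(seed)
    n_test = round(len(complexes) * test_fraction)
    scores = []
    for _ in range(repeats):
        shuffled = complexes[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        predicted = [predict(model, features) for features, _ in test]
        observed = [affinity for _, affinity in test]
        scores.append(pearson_r(predicted, observed))
    return mean(scores)
```

With 1,300 complexes and `test_fraction=0.15`, each iteration holds out 195 complexes and trains on the remaining 1,105, matching the partition described in the text.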
Application of external re-scoring functions (i.e. X-Score and RF-Score) improved correlations compared with docking simulations that used only the default scoring functions of the docking tools (Figure 1). Application of machine learning systems A + B was the most effective (R = 0.82, average of 25 tests). When the benchmark data were replaced with the newer PDBbind version 2012 refined set (2,897 complexes) with the same validation procedure, machine learning systems A + B also showed a strong correlation (average R = 0.76). For more extensive testing, we applied machine learning systems built with a training set (1,105 complexes from PDBbind 2007) to predict a larger test set composed of 1,792 new complexes found in PDBbind 2012. Unlike the previous validations with a test/training partition of 15%, the ratio in this test increased to 162% (1,792 vs. 1,105). Not surprisingly, the average correlation (R = 0.65) was not as strong as that observed in the previous validation, but it is still competitive with other methods [13]. These results indicate that the application of multiple docking tools together with the predictive machine learning systems estimates ligand binding strength better than a single docking tool or other scoring functions. Some of the consensus methods [27]–[29] apply multiple docking tools and combine the reported scores through weighting or normalizing methods for the consensus procedure. Unlike those methods, our screening approach reports a score that is the negative logarithm of the experimental dissociation/inhibition constant (pKd/pKi), usually ranging from 0 to 10 (i.e. from weak to strong binding), allowing a straightforward indication of binding strength.
The re-scoring function (i.e. machine learning system A) is essentially a learning model built upon a training set of co-crystallized complexes in PDBbind with a range of experimentally determined binding affinities. The learning model was capable of predicting the strength of binding, but was not necessarily able to recognize a compound's activity (i.e. bound or not bound to a target protein), because information about unbound (inactive) compounds was absent from the training data. In order to introduce information about inactive substances into the learning model, we added a substantial number of dummy entries to the training set. These entries are docking-generated binding modes of a set of test compounds against various proteins (Tables S3 and S4). Because the test compounds were experimentally confirmed as inactive against the target proteins, their binding affinity values (pKd/pKi) were set to 1. Consequently, the machine learning model contained information on both groups of compounds (i.e. active and inactive).
A case study of target identification for kinase inhibitors
Karaman et al. proposed a quantitative analysis of kinase inhibitor selectivity against a substantial number of kinases using an in-vitro competition binding assay [30]. Because that study analyzed globally observed interaction patterns of numerous compounds against diverse kinases, it was of great interest to assess the consistency between bioassay results and our predictive approach. Considering structure availability and binding site certainty, we selectively downloaded a set of co-crystallized kinase structures from the PDB database [31] (Materials and Methods). Finally, we selected 139 different kinases in 8 kinase groups for docking simulations (Table S5). Tested compounds include 33 kinase inhibitors interacting with various primary targets (Table S3). Karaman et al. proposed the calculation of a selectivity score (S) for each test compound, dividing the number of kinases interacting with a dissociation constant <3 µM by the number of kinases tested. A lower selectivity score indicates that a compound only interacts with a small number of target proteins, implying a lower potential for off-target effects. This constant (3 µM) corresponds to a docking score of 5.52 (pKd) on the negative logarithmic scale, so we set 5.52 as our cutoff value and defined the predicted selectivity score (S) as the number of kinases docked with a score >5.52 divided by the total number of kinases tested. For example, if a test compound is docked to 100 different target proteins and 45 have a docking score >5.52, its selectivity score is 0.45.
The majority of the predicted selectivity scores were similar to experimentally measured values (Figure 2), though the screening approach tended to overestimate binding affinity in some cases. Some of the predicted selectivity scores are fairly high compared with the reference bioassay values. The docking scores of each compound against the kinases are tabulated in Table S6. The screening approach succeeded in identifying half of the primary target proteins (50% of 50), and when a tolerance was given (docking score >4.52 as the cutoff value for calculating the selectivity score) it identified 68% of them. Figure 3 shows the performance of the 15 high-selectivity inhibitors in identifying the off-targets. Off-target proteins are proteins other than primary targets that interacted with an inhibitor with a binding affinity <3 µM (Karaman et al. [30]). The screening approach was able to recognize one or more off-targets for most of the inhibitors, and was only clearly unsatisfactory in the cases of compounds SB_431542, PI_103, and CP_690550. For instance, compound GW-2580 was originally designed to interact with CSF1R kinase (Kd = 1.6 nM in bioassay). Through screening, a potential off-target protein known as TRKB was suggested (docking score = 6.03), and its bioassay also indicated a strong binding affinity (Kd = 36 nM). Similarly, the docking simulation revealed that EGFR is also a likely off-target for CP_724714 (docking score = 8.11 and experimental Kd = 42 nM). Overall, the screening approach succeeded in finding more than 32% of off-targets (25 out of 78) of the 15 selective inhibitors, and this rate improved to ∼41% when a tolerance was given.
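The cutoff conversion and the selectivity score defined above can be reproduced with a short calculation. The sketch below is illustrative only; the function names `pkd_from_molar` and `selectivity_score` and the example score list are assumptions introduced for the example.

```python
from math import log10


def pkd_from_molar(kd_molar):
    """Convert a dissociation constant in mol/L to pKd = -log10(Kd)."""
    return -log10(kd_molar)


def selectivity_score(docking_scores, cutoff):
    """Fraction of tested kinases whose docking score exceeds `cutoff`."""
    hits = sum(1 for score in docking_scores if score > cutoff)
    return hits / len(docking_scores)


# 3 µM expressed on the pKd scale, rounded to two decimals as in the text.
CUTOFF = round(pkd_from_molar(3e-6), 2)

# Worked example from the text: 45 of 100 kinases score above the cutoff.
# The individual score values here are made up for illustration.
example_scores = [6.0] * 45 + [4.0] * 55
print(CUTOFF, selectivity_score(example_scores, CUTOFF))
```

Lowering `cutoff` by one unit (to 4.52) corresponds to the tolerance setting used when a looser threshold is applied.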
Integrating into a network-based screening system
The system described in this article is a plug-in to CellDesigner [32], enabling molecular docking simulations to be performed with specified molecules in a network loaded to CellDesigner (Figure 4). CellDesigner is one of the most widely used graphical editors for deep curation [8] and is capable of capturing a large-scale signaling network consisting of more than a thousand molecular species and reactions [33].
For the sake of simplicity, we used a simple EGFR signaling network edited by CellDesigner as an example, although a large-scale comprehensive network would be used for real cases. The EGFR network describes signaling cascades of 14 different proteins with 27 known reactions (Figure S4). Identities of the proteins (i.e. protein names) shown on the network map were first retrieved by the CellDesigner plugin API, to look up the referred protein structures in 3D through a protein identity-to-structure mapping system. Protein identity information stored in the mapping system was mainly obtained from EMBL-EBI [34], and 3D structures deposited were from the PDB database. Through the plug-in GUI interface, users can quickly select structures of proteins listed on the network for large-scale screening. By default, the protein binding site for subsequent docking simulation is automatically defined as that to which the biggest native ligand of the co-crystallized complex binds. If only an orphan protein structure is available or an alternative site is preferred, users can easily specify another site through the plug-in. Users can upload a reasonable number of test compound structures in 1D, 2D, or 3D, and the screening pipeline accepts most common formats, such as MDL SDF [35], Sybyl Mol2 [36] and Daylight SMILES [37]. Alternatively, users can compose a compound structure manually using a built-in chemical editor [38].
Following structure file preparation, docking simulation is carried out using multiple docking tools, including eHiTS, GOLD, and AutoDock VINA. Machine learning systems A + B are then applied to evaluate the binding potential between the test compound and a protein. Screening is iterated over subsequent proteins until all of the pathway proteins have been tested. Subsequently, docking scores, that is, the predicted binding affinities of the test compound against all of the network proteins, are converted into a white-to-red color scale from 0 to 10 (pKd/pKi; see Figure 4). A docking score that exceeds the range is set to 10. Colored results are then projected onto the network map to directly display predicted binding affinities. Proteins in gray indicate cases in which 3D structures were not available or no binding mode was generated by the docking tools. Through the colored network map generated by the system, users can efficiently investigate the potential bioactivity of a test compound against numerous proteins and can also inspect the results in the context of a complex signaling network. For example, suppose a test compound interacts strongly with the MEK protein. As a consequence, not only might MEK be functionally impaired, but the phosphorylation of all downstream target molecules (e.g. ERK and RSK2) would also be inhibited, negatively impacting regulation of the cell cycle. Blocking MEK-ERK-RSK2 signaling results in reduced activation of several transcriptional regulators of proliferative genes in tumor cells (e.g. melanomas and myeloma) and suppresses cell growth [39]. In addition, side effects, such as dermatologic toxicity caused by MEK inhibitors, have been reported because inhibition of MAPK signaling causes decreased cell migration, keratinocyte cell death, and inflammation [40]. Since Figure S4 presents a very simple pathway map, it does not show possible interactions with proteins in other pathways. However, the use of larger, more comprehensive maps for the EGFR pathways [41], Toll-Like Receptor pathways [42] and mTOR pathways [33], or maps developed for specific biological processes related to potential side-effects, would uncover such risks.
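The white-to-red projection of docking scores described above can be illustrated with a small mapping function. This is a sketch under stated assumptions, not the plug-in's code: the function name `score_to_hex`, the linear interpolation, the clamping of scores below 0, and the specific gray value are choices made for the example.

```python
def score_to_hex(score, low=0.0, high=10.0):
    """Map a docking score (pKd/pKi) to a white-to-red hex color.

    Scores outside [low, high] are clamped, so a score above 10 is
    treated as 10. Returns gray when no score is available (None).
    """
    if score is None:
        return "#808080"  # gray: no 3D structure or no binding mode
    clamped = max(low, min(high, score))
    fraction = (clamped - low) / (high - low)
    # White is (255, 255, 255) and pure red is (255, 0, 0), so only the
    # green and blue channels fade as the score increases.
    fade = round(255 * (1.0 - fraction))
    return f"#ff{fade:02x}{fade:02x}"
```

Under these assumptions a score of 0 maps to white, a score of 10 or more maps to pure red, and a protein without a usable structure is drawn in gray.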
Discussion
Docking simulation, a screening method to rapidly assess a test compound's binding activity, is especially helpful in early stage pharmacology studies. We developed a docking method using a machine learning approach, which integrates features of structure-based rational drug design and QSAR into the learning models to enhance performance in molecular recognition. Docking simulation conducted with machine learning systems A + B provides improved reliability in predicting binding potentials and the capability of identifying potential targets. To achieve more accurate prediction, further integration of other computer-aided technology is feasible, such as the application of molecular dynamics (MD) after docking. Together with a curated signaling map, the network-based screening approach is able to comprehensively characterize the underlying mechanism of a drug candidate's activity and also to interpret the cascade effects of modulated targets. Adverse side effects constitute an enormous cost in drug development. By applying network-based screening, drug developers can reduce the possibility of marketing a drug with unfavorable drug-target interactions. On the other hand, it also provides an opportunity to rationally optimize inhibitor polypharmacology for treating complex diseases, such as cancer, neurodegenerative disorders, cardiovascular disease, and metabolic syndromes.
Materials and Methods
Re-docking experiment for model validation
1. Molecular structure files: Protein-ligand complex files for re-docking experiments were obtained from the PDBbind database. To validate predictive models with less bias, native ligands of the co-crystallized complexes were first extracted and converted into 2D using Open Babel [43]. For the following docking simulation, 2D structures were then re-converted to 3D using a 3D structure generator called CORINA version 3.4 [44].
2. Molecular docking simulation packages: Native ligands were docked to their corresponding target proteins using eHiTS, GOLD, and AutoDock VINA (Table S7). These docking tools generate numerous binding modes of the test compound in a defined binding site, and the number of binding modes generated varies with the docking tool. eHiTS was set to output 1000 conformations for each docking study. Considering the computing speed of GOLD, we set its maximum to 300. The number of binding modes output by AutoDock VINA varies within an energy range of 10 kcal/mol.
3. Application of machine learning systems: Binding modes generated by the three docking tools were re-scored by machine learning system A, and only the three top-score candidates in each set were retained. Subsequently, machine learning system B assessed the three top-score candidates and identified the most predictive one. Modeling exercises of the machine learning systems A and B were conducted using the R statistical package. The Random Forest algorithm, as implemented in the “randomForest” package (Breiman and Cutler's random forests for classification and regression), was applied to build machine learning system A. For machine learning system B, the multinomial logistic regression provided by the “nnet” (Feed-forward Neural Networks and Multinomial Log-Linear Models) and “MASS” (Modern Applied Statistics with S, Fourth Edition) packages was used.
4. Re-docking result: The Pearson correlation coefficient between the predicted docking scores and the experimental binding affinities was calculated using R to determine the predictiveness of the screening approach.
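The flow of steps 2 to 3 above, in which each tool's poses are re-scored and one binding mode is finally selected, can be outlined as follows. This is a schematic sketch and not the pipeline's code: it assumes that one top-scoring mode is kept per docking tool (one reading of the text), and `rescore` and `choose_tool` are hypothetical callables standing in for machine learning systems A and B.

```python
def select_binding_mode(poses_by_tool, rescore, choose_tool):
    """Pick one binding mode from the poses of several docking tools.

    poses_by_tool: dict mapping a tool name to its list of poses.
    rescore:       hypothetical stand-in for system A; pose -> predicted pKd.
    choose_tool:   hypothetical stand-in for system B; receives a dict of
                   {tool: (pose, score)} and returns the chosen tool name.
    """
    top_per_tool = {}
    for tool, poses in poses_by_tool.items():
        if not poses:
            continue  # this tool produced no binding mode
        scored = [(pose, rescore(pose)) for pose in poses]
        # Keep only the top-scoring pose of each tool.
        top_per_tool[tool] = max(scored, key=lambda item: item[1])
    if not top_per_tool:
        return None  # no tool produced a pose
    chosen = choose_tool(top_per_tool)
    pose, score = top_per_tool[chosen]
    return chosen, pose, score
```

Passing the two models in as callables keeps the selection logic independent of how the re-scoring and selection models are trained.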
Case study of target identification for kinase inhibitors
1. Protein structure files: protein structures collected from the PDB database complied with the following criteria: 1) X-ray structures with a resolution of 2.5 Å or better, if available; 2) if two or more structures were available, the one with the best resolution was selected; 3) a structure with a ligand bound to its nucleotide binding site was selected; 4) structures with non-modified and non-phosphorylated residues in the binding site were given priority; 5) the organism was human.
2. Test compound files: test compound structure files in 2D format were downloaded from PubChem, and converted into 3D using CORINA version 3.4 for the docking simulation.
3. Molecular docking simulation: the use of the docking tools was the same as mentioned in the re-docking experiment.
Supporting Information
Acknowledgments
The authors would like to thank Yoshiyuki Asai, Takeshi Abe, Yukiko Matsuoka and Steve Aird for critical discussion, and Kazuhiro Fujita for composing the EGFR signaling map.
Funding Statement
This work was supported by the HD-Physiology Project of the Japan Society for the Promotion of Science (JSPS) to the Okinawa Institute of Science and Technology Graduate University (OIST). This work, in part, has been carried out as research collaboration between the United States Food and Drug Administration (FDA) and the Systems Biology Institute (SBI) under Memorandum of Understanding Number 225-12-8000. Funding agencies had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Xie L, Li J, Xie L, Bourne PE (2009) Drug discovery using chemical systems biology: identification of the protein-ligand binding network to explain the side effects of CETP inhibitors. PLoS Comput Biol 5: e1000387.
- 2. Kerkela R, Woulfe KC, Durand JB, Vagnozzi R, Kramer D, et al. (2009) Sunitinib-Induced Cardiotoxicity Is Mediated by Off-Target Inhibition of AMP-Activated Protein Kinase. Clin Transl Sci 2: 15–25.
- 3. Force T, Krause DS, Van Etten RA (2007) Molecular mechanisms of cardiotoxicity of tyrosine kinase inhibition. Nat Rev Cancer 7: 332–344.
- 4. Cases M, Mestres J (2009) A chemogenomic approach to drug discovery: focus on cardiovascular diseases. Drug Discov Today 14: 479–485.
- 5. Csermely P, Agoston V, Pongor S (2005) The efficiency of multi-target drugs: the network approach might help drug design. Trends Pharmacol Sci 26: 178–182.
- 6. Geldenhuys WJ, Van der Schyf CJ (2013) Designing drugs with multi-target activity: the next step in the treatment of neurodegenerative disorders. Expert opinion on drug discovery: 1–15.
- 7. Wilhelm S, Carter C, Lynch M, Lowinger T, Dumas J, et al. (2006) Discovery and development of sorafenib: a multikinase inhibitor for treating cancer. Nat Rev Drug Discovery 5: 835–844.
- 8. Ghosh S, Matsuoka Y, Asai Y, Hsin K-Y, Kitano H (2011) Software for systems biology: from tools to integrated platforms. Nat Rev Genet 12: 821–832.
- 9. Bai JP, Abernethy DR (2013) Systems Pharmacology to Predict Drug Toxicity: Integration Across Levels of Biological Organization. Annu Rev Pharmacol Toxicol 53: 451–473.
- 10. Chen Y, Shoichet BK (2009) Molecular docking and ligand specificity in fragment-based inhibitor discovery. Nat Chem Biol 5: 358–364.
- 11. Cheng T, Li X, Li Y, Liu Z, Wang R (2009) Comparative assessment of scoring functions on a diverse test set. J Chem Inf Model 49: 1079–1093.
- 12. Englebienne P, Moitessier N (2009) Docking ligands into flexible and solvated macromolecules. 4. Are popular scoring functions accurate for this class of proteins? J Chem Inf Model 49: 1568–1580.
- 13. Plewczynski D, Łaźniewski M, Augustyniak R, Ginalski K (2011) Can we trust docking results? Evaluation of seven commonly used programs on PDBbind database. J Comput Chem 32: 742–755.
- 14. Jain AN (2003) Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine. J Med Chem 46: 499–511.
- 15. Venkatachalam C, Jiang X, Oldfield T, Waldman M (2003) LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites. J Mol Graph Model 21: 289–307.
- 16. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, et al. (2004) Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47: 1739–1749.
- 17. Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD (2003) Improved protein–ligand docking using GOLD. Proteins: Struct, Funct, Bioinf 52: 609–623.
- 18. Rarey M, Kramer B, Lengauer T, Klebe G (1996) A fast flexible docking method using an incremental construction algorithm. J Mol Biol 261: 470–489.
- 19. Zsoldos Z, Reid D, Simon A, Sadjad BS, Peter Johnson A (2006) eHiTS: an innovative approach to the docking and scoring function problems. Curr Protein Pept Sci 7: 421–435.
- 20. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, et al. (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19: 1639–1662.
- 21. Wang R, Fang X, Lu Y, Wang S (2004) The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J Med Chem 47: 2977–2980.
- 22. Wang R, Lai L, Wang S (2002) Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J Comput-Aided Mol Des 16: 11–26.
- 23. Ballester PJ, Mitchell JB (2010) A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics 26: 1169–1175.
- 24. Breiman L (2001) Random forests. Machine learning 45: 5–32.
- 25. Böhning D (1992) Multinomial logistic regression algorithm. Annals of the Institute of Statistical Mathematics 44: 197–200.
- 26. Trott O, Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31: 455–461.
- 27. Oda A, Tsuchida K, Takakura T, Yamaotsu N, Hirono S (2006) Comparison of consensus scoring strategies for evaluating computational models of protein-ligand complexes. J Chem Inf Model 46: 380–391.
- 28. Teramoto R, Fukunishi H (2008) Consensus scoring with feature selection for structure-based virtual screening. J Chem Inf Model 48: 288–295.
- 29. Plewczynski D, Łaźniewski M, Grotthuss MV, Rychlewski L, Ginalski K (2011) VoteDock: consensus docking method for prediction of protein–ligand interactions. J Comput Chem 32: 568–581.
- 30. Karaman MW, Herrgard S, Treiber DK, Gallant P, Atteridge CE, et al. (2008) A quantitative analysis of kinase inhibitor selectivity. Nat Biotechnol 26: 127–132.
- 31. Rose PW, Bi C, Bluhm WF, Christie CH, Dimitropoulos D, et al. (2013) The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res 41: D475–D482.
- 32. Kitano H, Funahashi A, Matsuoka Y, Oda K (2005) Using process diagrams for the graphical representation of biological networks. Nat Biotechnol 23: 961–966.
- 33. Caron E, Ghosh S, Matsuoka Y, Ashton-Beaucage D, Therrien M, et al. (2010) A comprehensive map of the mTOR signaling network. Mol Syst Biol 6: 1–14.
- 34. EMBL-EBI (2013) Bioinformatics services - molecular databases. http://www.ebi.ac.uk/services. Accessed 2013 Nov. 20.
- 35. Dalby A, Nourse JG, Hounshell WD, Gushurst AK, Grier DL, et al. (1992) Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. J Chem Inf Comput Sci 32: 244–255.
- 36. Tripos (2009) Tripos Sybyl Mol2 File Format.
- 37. Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28: 31–36.
- 38. Trepalin SV, Yarkov AV, Pletnev IV, Gakh AA (2006) A Java chemical structure editor supporting the modular chemical descriptor language (MCDL). Molecules 11: 219–231.
- 39. Ferguson J, Arozarena I, Ehrhardt M, Wellbrock C (2012) Combination of MEK and SRC inhibition suppresses melanoma cell growth and invasion. Oncogene 32: 86–96.
- 40. Manousaridis I, Mavridou S, Goerdt S, Leverkus M, Utikal J (2013) Cutaneous side effects of inhibitors of the RAS/RAF/MEK/ERK signalling pathway and their management. J Eur Acad Dermatol Venereol 27: 11–18.
- 41. Oda K, Matsuoka Y, Funahashi A, Kitano H (2005) A comprehensive pathway map of epidermal growth factor receptor signaling. Mol Syst Biol 1: 0010.
- 42. Oda K, Kitano H (2006) A comprehensive map of the toll-like receptor signaling network. Mol Syst Biol 2: 0015.
- 43. O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, et al. (2011) Open Babel: An open chemical toolbox. Journal of cheminformatics 3: 1–14.
- 44. Sadowski J, Gasteiger J, Klebe G (1994) Comparison of automatic three-dimensional model builders using 639 X-ray structures. J Chem Inf Comput Sci 34: 1000–1008.