Abstract
Druggable Protein–protein Interaction Assessment System (Dr. PIAS) is a database of druggable protein–protein interactions (PPIs) predicted by our support vector machine (SVM)-based method. Since the first publication of this database, Dr. PIAS has been updated to version 2.0. The number of PPI entries has increased considerably, from 71 500 to 83 324. Four PPIs and 10 tertiary structures have been added as new positive instances in our method. This addition increases the prediction accuracy of our SVM classifier in comparison with the previous classifier, even though the number of added PPIs and structures is small. We have introduced the novel concept of ‘similar positives’ of druggable PPIs, which will help researchers discover small compounds that can inhibit predicted druggable PPIs. Dr. PIAS will aid the effective search for druggable PPIs in the rapidly accumulating mine of interactome data. Dr. PIAS 2.0 is available at http://www.drpias.net.
Database URL: http://www.drpias.net.
Introduction
Modulating protein–protein interactions (PPIs) using small compounds can greatly contribute to the therapeutic intervention of various human diseases, since most proteins in a cell function by interacting with other proteins (1–4). Many proteins such as membrane receptors and enzymes have been intensively studied as drug targets, and a variety of biological information on the target proteins has been stored in public databases, for example the Therapeutic Target Database (5) and SuperTarget (6). In contrast, only a limited number of drug target PPIs have been deposited in databases, namely TIMBAL (7) and 2P2IDB (8), despite the biological importance of PPIs.
To date, we have developed novel methodologies to assess the druggability [also called ‘ligandability’ (9)] of PPIs. Our approach is based on a support vector machine (SVM) that uses the physicochemical properties of the PPI-inhibitor-binding pockets (structural attributes), the number of drugs/chemicals that target the interacting proteins (drug/chemical attributes) and available information on biological function such as diseases, pathways, gene ontologies and gene expression profiles (functional attributes) (10, 11). By applying our methodologies to human PPIs, we have predicted their druggabilities (11). We created a database of predicted druggable PPIs, named Dr. PIAS, with the aim of helping researchers effectively explore druggable PPIs in interactome data (12). Here, we introduce Dr. PIAS version 2.0, which contains novel features not present in the first version of the database.
Novel Features of Dr. PIAS 2.0
PPI data
We have retrieved PPI data from three resources, mainly focusing on human, mouse, rat and human immunodeficiency virus proteins. One of the resources is Entrez Gene (13). It integrates PPI data from the BIND (14), BioGRID (15) and HPRD (16) databases, and thus includes most human PPIs experimentally identified to date. The other two resources are the Genome Network Platform in Japan (http://genomenetwork.nig.ac.jp/index_e.html) and several publications (17–19); many PPIs in these resources have not yet been registered in Entrez Gene. In the current version, Dr. PIAS contains 83 324 PPIs, a considerable increase from the 71 500 PPIs in the previous version. PPIs between human proteins are most abundant (72 130), followed by mouse PPIs (4819).
New positive instances in our SVM-based method
Dr. PIAS assesses the druggability of PPIs using a supervised machine learning method, SVM, as implemented in the program package Libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). In our previous studies, we used 30 well-studied drug target PPIs as the positive instances in our SVM-based method (11). Progress in the research area of drug target PPIs has allowed us to add 4 PPIs (CREBBP/TP53, MDM4/TP53, RAF1/YWHAZ and S100B/TP53) and 10 tertiary structures (PDB entries 2D82, 2OPY, 2W3L, 3HCM, 3INQ, 3JZK, 3LBJ, 3LBK, 3LBL and 3RDH) as new positives. These structures are complexes of a protein from a PPI and a compound inhibiting the PPI. Although the number of added PPIs and structures is small, in the cross-validation tests these additions increase the accuracy and specificity from 80.5% and 80.5% in the previous SVM classifier to 83.2% and 84.7% in the new classifier, respectively. Sensitivity remains constant at 81.6%. Therefore, the new classifier used in Dr. PIAS 2.0 discriminates between druggable and non-druggable PPIs better than the previous one.
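The accuracy, sensitivity and specificity quoted above follow from the counts in a confusion matrix. The following Python sketch shows the standard definitions only; the function name and the example counts are illustrative assumptions and are not the counts behind the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: druggable (positive) PPIs predicted as positive/negative.
    tn/fp: non-druggable (negative) PPIs predicted as negative/positive.
    """
    if (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative instance")
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # fraction of positives recovered
    specificity = tn / (tn + fp)  # fraction of negatives rejected
    return accuracy, sensitivity, specificity


# Hypothetical counts, for illustration only.
acc, sens, spec = classification_metrics(tp=40, fn=10, tn=45, fp=5)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

With these definitions, a gain in specificity at constant sensitivity, as reported for the new classifier, means fewer non-druggable PPIs are misclassified as druggable while the recovery of known positives is unchanged.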
Application of the new SVM classifier to the positives shows that the mean value of druggability scores (using all attributes) of the positives is 0.908 (Supplementary Table S1). Among 72 096 human PPIs (except for the positives) in Dr. PIAS, 41 have a score ≥0.908 and can thus be considered to be ‘highly druggable’ (Supplementary Table S2).
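The ‘highly druggable’ criterion above amounts to using the mean druggability score of the positives as a cut-off. A minimal Python sketch of that filter follows; the function name, the PPI labels and all scores are hypothetical placeholders, not values from Dr. PIAS.

```python
from statistics import mean


def highly_druggable(positive_scores, candidate_scores):
    """Select candidates whose score reaches the mean score of the positives.

    positive_scores: iterable of druggability scores of the positives.
    candidate_scores: dict mapping a PPI label to its druggability score.
    Returns (threshold, dict of selected candidates).
    """
    threshold = mean(positive_scores)
    selected = {ppi: s for ppi, s in candidate_scores.items() if s >= threshold}
    return threshold, selected


# Hypothetical scores, for illustration only.
threshold, hits = highly_druggable(
    positive_scores=[0.95, 0.90, 0.85],
    candidate_scores={"PPI_1": 0.95, "PPI_2": 0.40, "PPI_3": 0.91},
)
print(sorted(hits))
```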
Similar positives of druggable PPIs
In the current version, we introduce the novel concept of ‘similar positives’ of druggable PPIs. When a PPI is judged to be highly druggable by our method, knowing which positive is most similar to that PPI is valuable for further investigation aimed at discovering small inhibitory compounds. As described in our previous papers, we calculated the ‘druggability score’ of a PPI as the number of times the PPI was judged to be positive in 10 000 training-prediction iterations using 10 000 random training data sets (11, 12). A high druggability score indicates that the PPI has a feature vector highly similar to those of the positives. Whenever a PPI (test instance) was judged as positive in an iteration, we calculated the similarity between the feature vector of the test instance and that of each positive. Similarities were measured using a radial basis function kernel, $k(x_{\mathrm{positive}}, x_{\mathrm{test}}) = \exp(-\gamma \lVert x_{\mathrm{positive}} - x_{\mathrm{test}} \rVert^{2})$, where $\gamma > 0$. The positive that gives the largest $k(x_{\mathrm{positive}}, x_{\mathrm{test}})$ among all measurements is identified as the one located closest (most similar) to the test instance in the feature space of the SVM.
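The kernel-based similarity described above can be sketched in a few lines of Python. This is an illustration of the radial basis function kernel and the ‘largest k’ rule only, not the code used in Dr. PIAS; the function names, the two-dimensional feature vectors, the labels and the value of γ are all assumptions made for the example.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2), gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def most_similar_positive(positives, x_test, gamma):
    """Return the label of the positive with the largest kernel value.

    positives: dict mapping a label to its feature vector.
    Also returns the dict of all kernel values for inspection.
    """
    similarities = {
        label: rbf_kernel(vector, x_test, gamma)
        for label, vector in positives.items()
    }
    best = max(similarities, key=similarities.get)
    return best, similarities


# Hypothetical feature vectors, for illustration only.
positives = {"positive_A": [0.0, 0.0], "positive_B": [1.0, 1.0]}
best, sims = most_similar_positive(positives, x_test=[0.9, 0.8], gamma=0.5)
print(best, sims)
```

Because the kernel decreases monotonically with Euclidean distance, the positive with the largest kernel value is also the nearest positive in the feature space, whatever positive value γ takes.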
Two examples are shown in Figure 1. Both IL1B/IL1R1 and XIAP/DIABLO are positives that are also used as test instances in this case. They have high druggability scores of 0.8652 and 0.8305, respectively, when assessed using all attributes. It is reasonable that the feature vector of IL1B/IL1R1 is most similar to itself (Figure 1A). In contrast, Figure 1B indicates that the feature vectors of XIAP/DIABLO are frequently located nearest to each other in the feature space of the SVM. The similarity matrix of the positives based on all attributes (Supplementary Table S3) clearly shows that identical PPIs (which differ only in their structural attributes) and homologous PPIs form clusters in the feature space. This trend is also observed in the similarity matrices based on structural (Supplementary Table S4), drug/chemical (Supplementary Table S5) and functional attributes (Supplementary Table S6). [In these tables, similarity scores range from 0 (dissimilar) to 10 000 (highly similar or identical).] In these matrices, however, similarities are observed not only within clusters but also between clusters. For example, in Supplementary Table S4, all instances in the cluster of EGFR/GRB2 and GRB2/MET show slight similarities (scores of 258–825) to STAT3/STAT3 (see No. 43–60 in the rows and No. 92 in the columns). This implies that the physicochemical properties of the pockets at the interfaces of these PPIs are similar to each other. Indeed, the SH2 domain is common to these PPIs as the target domain for the inhibitors. Any inhibitor of EGFR/GRB2 or GRB2/MET may thus provide a starting point for the discovery or development of a small compound that can interfere with STAT3/STAT3, and vice versa. The information on similar positives may also be useful for avoiding small compounds that could inhibit non-target PPIs in addition to the intended target and thereby cause side effects. In Supplementary Table S4, instances in some clusters, such as IL1B/IL1R1 (No. 62) (also shown in Figure 1A), RAC1/TIAM1 (No. 81–84) and RAC1/TRIO (No. 85), show similarities only to themselves or only within their clusters. Small compound inhibitors of these PPIs may be less likely to bind other PPI interfaces and cause side effects.
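One way to read the 0 to 10 000 similarity scale is as the number of training-prediction iterations in which a given positive was the nearest one to the test instance. That reading is an assumption made here for illustration, as are the function name and the labels below. Under that assumption, the tally could be sketched in Python as follows.

```python
from collections import Counter


def tally_similar_positives(nearest_per_iteration):
    """Count how often each positive was the nearest one to the test instance.

    nearest_per_iteration: one entry per iteration, holding the label of the
    nearest positive, or None when the test instance was not judged positive.
    """
    return Counter(label for label in nearest_per_iteration if label is not None)


# Hypothetical outcome of five iterations, for illustration only.
runs = ["positive_A", "positive_B", None, "positive_A", "positive_A"]
counts = tally_similar_positives(runs)
print(counts.most_common())
```

A test instance whose counts are concentrated on a single positive or a single cluster would correspond to the ‘similar only within clusters’ pattern described above, whereas counts spread over several clusters would indicate between-cluster similarity.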
Discussion
Researchers can currently access a huge amount of information on a few hundred drug target proteins, such as membrane receptors and enzymes. These data have been accumulated in public databases and the literature over a period of decades. In contrast, there is little information on drug target PPIs, despite the tens of thousands of experimentally identified human PPIs. Dr. PIAS will help researchers effectively explore potentially druggable PPIs by mining interactome data, thereby leading to the discovery of promising compounds that inhibit PPIs.
Some databases and webservers that focus on the tertiary structures of druggable ligand-binding sites and PPI interfaces, for example ANCHOR (20) and sc-PDB (21), will help researchers use in silico methods to develop or discover small compounds that inhibit PPIs predicted as druggable in Dr. PIAS. Although Dr. PIAS can assess whether a PPI is druggable, it does not provide users with tools for in silico drug design and is not suitable for more detailed dissection of the tertiary structure of PPI interfaces. As a first step, users can search for druggable PPIs in Dr. PIAS; as a second step, they can use the databases and webservers described above to design a drug targeting those PPIs. The cooperative use of Dr. PIAS and other resources will facilitate the discovery of drugs targeting PPIs.
Supplementary data
Supplementary data are available at Database Online.
Funding
Funding for open access charge: PharmaDesign, Inc.
References
1. Arkin MR, Wells JA. Small-molecule inhibitors of protein–protein interactions: progressing towards the dream. Nat. Rev. Drug Discov. 2004;3:301–317. doi: 10.1038/nrd1343.
2. Pagliaro L, Felding J, Audouze K, et al. Emerging classes of protein–protein interaction inhibitors and new tools for their development. Curr. Opin. Chem. Biol. 2004;8:442–449. doi: 10.1016/j.cbpa.2004.06.006.
3. Zhao L, Chmielewski J. Inhibiting protein–protein interactions using designed molecules. Curr. Opin. Struct. Biol. 2005;15:31–34. doi: 10.1016/j.sbi.2005.01.005.
4. Wells JA, McClendon CL. Reaching for high-hanging fruit in drug discovery at protein–protein interfaces. Nature. 2007;450:1001–1009. doi: 10.1038/nature06526.
5. Zhu F, Shi Z, Qin C, et al. Therapeutic target database update 2012: a resource for facilitating target-oriented drug discovery. Nucleic Acids Res. 2012;40:D1128–D1136. doi: 10.1093/nar/gkr797.
6. Hecker N, Ahmed J, von Eichborn J, et al. SuperTarget goes quantitative: update on drug–target interactions. Nucleic Acids Res. 2012;40:D1113–D1117. doi: 10.1093/nar/gkr912.
7. Higueruelo AP, Schreyer A, Bickerton GR, et al. Atomic interactions and profile of small molecules disrupting protein–protein interfaces: the TIMBAL database. Chem. Biol. Drug Des. 2009;74:457–467. doi: 10.1111/j.1747-0285.2009.00889.x.
8. Bourgeas R, Basse MJ, Morelli X, et al. Atomic analysis of protein–protein interfaces with known inhibitors: the 2P2I database. PLoS One. 2010;5:e9598. doi: 10.1371/journal.pone.0009598.
9. Edfeldt FN, Folmer RH, Breeze AL. Fragment screening to predict druggability (ligandability) and lead discovery success. Drug Discov. Today. 2011;16:284–287. doi: 10.1016/j.drudis.2011.02.002.
10. Sugaya N, Ikeda K, Tashiro T, et al. An integrative in silico approach for discovering candidates for drug-targetable protein–protein interactions in interactome data. BMC Pharmacol. 2007;7:10. doi: 10.1186/1471-2210-7-10.
11. Sugaya N, Ikeda K. Assessing the druggability of protein–protein interactions by a supervised machine-learning method. BMC Bioinformatics. 2009;10:263. doi: 10.1186/1471-2105-10-263.
12. Sugaya N, Furuya T. Dr. PIAS: an integrative system for assessing the druggability of protein–protein interactions. BMC Bioinformatics. 2011;12:50. doi: 10.1186/1471-2105-12-50.
13. Maglott D, Ostell J, Pruitt KD, et al. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2011;39:D52–D57. doi: 10.1093/nar/gkq1237.
14. Bader GD, Betel D, Hogue CW. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 2003;31:248–250. doi: 10.1093/nar/gkg056.
15. Stark C, Breitkreutz BJ, Chatr-Aryamontri A, et al. The BioGRID Interaction Database: 2011 update. Nucleic Acids Res. 2011;39:D698–D704. doi: 10.1093/nar/gkq1116.
16. Keshava Prasad TS, Goel R, Kandasamy K, et al. Human protein reference database—2009 update. Nucleic Acids Res. 2009;37:D767–D772. doi: 10.1093/nar/gkn892.
17. Stelzl U, Worm U, Lalowski M, et al. A human protein–protein interaction network: a resource for annotating the proteome. Cell. 2005;122:957–968. doi: 10.1016/j.cell.2005.08.029.
18. Lim J, Hao T, Shaw C, et al. A protein–protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell. 2006;125:801–814. doi: 10.1016/j.cell.2006.03.032.
19. Ramani AK, Li Z, Hart GT, et al. A map of human protein interactions derived from co-expression of human mRNAs and their orthologs. Mol. Syst. Biol. 2008;4:180. doi: 10.1038/msb.2008.19.
20. Meireles LM, Dömling AS, Camacho CJ. ANCHOR: a web server and database for analysis of protein–protein interaction binding pockets for drug discovery. Nucleic Acids Res. 2010;38:W407–W411. doi: 10.1093/nar/gkq502.
21. Meslamani J, Rognan D, Kellenberger E. sc-PDB: a database for identifying variations and multiplicity of ‘druggable’ binding sites in proteins. Bioinformatics. 2011;27:1324–1326. doi: 10.1093/bioinformatics/btr120.