Cellular & Molecular Biology Letters. 2011 Mar 20;16(2):264–278. doi: 10.2478/s11658-011-0008-x

PPI_SVM: Prediction of protein-protein interactions using machine learning, domain-domain affinities and frequency tables

Piyali Chatterjee 1, Subhadip Basu 2, Mahantapas Kundu 2, Mita Nasipuri 2, Dariusz Plewczynski 3
PMCID: PMC6275787  PMID: 21442443

Abstract

Protein-protein interactions (PPI) control most of the biological processes in a living cell. To fully understand protein functions, knowledge of protein-protein interactions is necessary. Prediction of PPI is challenging, especially when the three-dimensional structure of the interacting partners is not known. Recently, a novel prediction method was proposed that exploits the physical interactions of constituent domains. We propose here a novel knowledge-based prediction method, namely PPI_SVM, which predicts interactions between two protein sequences by exploiting their domain information. We trained a two-class support vector machine on the benchmarking set of pairs of interacting proteins extracted from the Database of Interacting Proteins (DIP). Unlike most existing approaches, the method considers all possible combinations of constituent domains between two protein sequences. Moreover, it deals with both single-domain and multi-domain proteins; therefore, it can be applied to the whole proteome in high-throughput studies. Our machine learning classifier, following a brainstorming approach, achieves an accuracy of 86%, with a specificity of 95% and a sensitivity of 75%, which are better results than those of most previous methods, which sacrifice recall in order to boost the overall precision. Our method has, on average, better sensitivity combined with good selectivity on the benchmarking dataset. The PPI_SVM source code, train/test datasets and supplementary files are freely available in the public domain at: http://code.google.com/p/cmater-bioinfo/.
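The idea of considering all possible combinations of constituent domains between two proteins can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the domain identifiers, the affinity values and the three summary features (maximum, mean and count) are assumptions chosen only for illustration, since the abstract does not specify how the SVM feature vector is built.

```python
from itertools import product


def domain_pairs(domains_a, domains_b):
    """Return every (domain of A, domain of B) combination.

    Each pair is sorted so that (x, y) and (y, x) map to the same key.
    """
    return [tuple(sorted(pair)) for pair in product(domains_a, domains_b)]


def pair_features(domains_a, domains_b, affinity):
    """Summarise affinity scores over all domain combinations.

    `affinity` is a hypothetical lookup table of domain-domain scores;
    pairs missing from the table are scored 0.0 (an assumption).
    """
    scores = [affinity.get(p, 0.0) for p in domain_pairs(domains_a, domains_b)]
    if not scores:
        return [0.0, 0.0, 0.0]
    return [max(scores), sum(scores) / len(scores), float(len(scores))]


# Hypothetical affinity table and domain annotations (illustrative values only).
affinity = {("PF00017", "PF00018"): 0.8, ("PF00018", "PF00069"): 0.3}
single_domain_protein = ["PF00018"]
multi_domain_protein = ["PF00017", "PF00069"]

features = pair_features(single_domain_protein, multi_domain_protein, affinity)
print(features)
```

The same function handles a single-domain and a multi-domain protein, because the Cartesian product simply has fewer terms when one list has length one. A fixed-length vector of this kind could then be passed to any two-class SVM trainer.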

Electronic Supplementary Material

The online version of this article (doi: 10.2478/s11658-011-0008-x) contains supplementary material, which is available to authorized users.

Key words: Protein-protein interaction, Domain-frequency values, Domain-domain interaction affinity value, Proteome, Interactome, Brainstorming, Machine learning, Consensus, DIP, Protein domains, Sequences, Structures, Protein-protein complexes

Full Text

The Full Text of this article is available as a PDF (754.5 KB).

Abbreviations used

AP: appearance probability
BiFC: bimolecular fluorescence complementation
BIND: Biomolecular Interaction Network Database
DIP: Database of Interacting Proteins
DPI: dual polarization interferometry
FN: false negatives
FP: false positives
FPR: false positive rate
FRET: fluorescence resonance energy transfer
HMMs: hidden Markov models
IgG: immunoglobulin G
IntAct: open source molecular interaction database
MINT: Molecular Interactions Database
MIPS: Mammalian Protein-Protein Interaction Database
PID: potentially interacting domain pairs
PPI: protein-protein interactions
RBF: radial basis function
ROC: receiver operating characteristic curve
SVM: support vector machine
TAP: tandem affinity purification
TN: true negatives
TP: true positives
TPR: true positive rate
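The confusion-matrix abbreviations (TP, TN, FP, FN, TPR, FPR) relate to the accuracy, sensitivity and specificity quoted in the abstract through the standard definitions sketched below. The counts in the example are hypothetical and are not the paper's confusion matrix; the function name is likewise an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard two-class metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (TPR), i.e. recall
        "specificity": tn / (tn + fp),  # equals 1 - FPR
        "fpr": fp / (fp + tn),          # false positive rate (FPR)
    }


# Hypothetical counts for a balanced test set of 100 positives and 100 negatives.
metrics = classification_metrics(tp=60, tn=90, fp=10, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A classifier with high specificity but lower sensitivity, as in this example, rarely flags a non-interacting pair as interacting but misses a share of the true interactions.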

Associated Data


Supplementary Materials

Table 1 (XLS, 26.5 KB)
Table 2 (XLS, 28 KB)
Table 3 (XLS, 34.5 KB)

Articles from Cellular & Molecular Biology Letters are provided here courtesy of BMC
