Nucleic Acids Research. 2008 Nov 6;37(Database issue):D651–D656. doi: 10.1093/nar/gkn870

PIPs: human protein–protein interaction prediction database

Mark D McDowall 1, Michelle S Scott 1, Geoffrey J Barton 1,*
PMCID: PMC2686497  PMID: 18988626

Abstract

The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein–protein interactions in human. It contains predictions of >37 000 high probability interactions of which >34 000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction broken down by the contribution from each information source and with easy access to the evidence that supports each prediction. Where data exists in OPHID, HPRD, DIP or BIND for a protein pair this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein–protein interactions in human that is straightforward to browse, or can be exploited completely, for interaction network modelling.

INTRODUCTION

Protein–protein interactions (PPIs) regulate many fundamental cellular processes. As a consequence, a key step in understanding the function of a protein in its cellular context is to identify potential interacting partners. PPIs are typically identified on a small scale by pull-down experiments or similar techniques, but this approach is too slow and expensive to meet the goal of identifying all the PPIs necessary to provide a rich picture of the functional and dynamic properties of the cell (1). High-throughput methods, such as yeast two-hybrid, seek to overcome the time constraints of traditional protein-by-protein methods and have been applied to the study of PPIs in many organisms, including Saccharomyces cerevisiae (2,3), Caenorhabditis elegans (4), Drosophila melanogaster (5,6), Escherichia coli (7) and, more recently, human (8,9). Although high-throughput methods provide data for large numbers of potential interacting pairs, they often have much higher error rates than traditional approaches (10). Computational methods to predict PPIs complement experimental methods: they can efficiently integrate data from numerous sources to predict the likelihood of interaction between two proteins (11).

Several public repositories store PPIs identified by experimental methods. Databases such as the HPRD (12,13), DIP (14), IntAct (15), BioGRID (16) and MINT (17) all provide lists of experimentally determined interactions. Many of these resources contain only interactions that have been observed experimentally, but these data are not yet representative of a complete interactome.

It has been suggested that the human proteome includes around 300 000 PPIs (18) out of a potential >300 000 000. This estimate does not account for the numerous variations in interacting pairs due to post-translational modifications and alternative splicing. However, the number of human PPIs that have been experimentally determined is an order of magnitude lower, as shown in Table 1. The importance of prediction in filling this gap has been recognized by a number of groups and has led to the development of databases such as OPHID (19) and POINT (20), which predict PPIs, as well as STRING, a database of predicted protein–protein associations (direct and indirect PPIs) (21). All three services computationally predict likely PPIs (whether direct or indirect) based on orthology, annotations and/or experimental information and have substantially increased the size of the human interactome. However, neither OPHID nor POINT ranks the predictions in order of likelihood. Furthermore, the breakdown of the evidence for interaction is limited to a summary of correlation scores or a binary indication of co-occurrence. STRING provides an aesthetically pleasing, informative and user-friendly method of accessing its predictions and the primary data, but does not distinguish between direct physical interactions and indirect relationships, which include transcriptional relationships as well as co-pathway membership (21).
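The scale of the gap quoted at the start of the paragraph above can be checked with simple arithmetic. The sketch below counts unordered protein pairs for an assumed proteome size; the figure of 25 000 proteins is an illustrative assumption, not a value taken from this article, and was chosen only because it yields a pair count of the order quoted (>300 000 000).

```python
from math import comb


def possible_pairs(n_proteins: int) -> int:
    """Return the number of unordered pairs of distinct proteins."""
    return comb(n_proteins, 2)


# Assumption for illustration only: about 25 000 human proteins.
n_proteins = 25_000
pairs = possible_pairs(n_proteins)

# Estimate of true human PPIs quoted in the text (18).
estimated_ppis = 300_000

print(pairs)                   # 312487500
print(estimated_ppis / pairs)  # fraction of all pairs expected to interact
```

Under this assumption, roughly one pair in a thousand would be a true interaction, which is why ranking predictions by likelihood matters.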

Table 1.

Number of human PPIs that have been determined experimentally and the results made available via publicly accessible databases

Database | No. of interactions | No. of proteins | Website                                          | Reference
DIP      | 1923                | 1298            | http://dip.doe-mbi.ucla.edu/                     | (14)
HPRD     | 38 167              | 25 661          | http://www.hprd.org/                             | (12,13)
IntAct*  | 24 274              | 8766            | http://www.ebi.ac.uk/intact/                     | (15)
MINT     | 20 832              | 6106            | http://mint.bio.uniroma2.it/mint/Welcome.do      | (17)
MIPS     | 355                 | 423             | http://mips.gsf.de/cgi-bin/proj/ppi/prot2ppi.cgi | (31)

All values were extracted from the respective databases' statistics pages except where identified (*). Values obtained 8 August 2008.

*The number of unique proteins and interactions was calculated by searching for all human binary interactions within IntAct then analysing the downloaded PSI-MI data file.

In this article, a new database—PIPs—of predicted PPIs for human is described. The predictions stored in PIPs are derived by a Bayesian prediction method that combines information on the likelihood of interaction from a variety of sources (11). A novel feature of the method is to use a ‘Transitive’ module that gathers evidence for interaction from examination of predicted common interactors to a pair of proteins. The unique combination of features examined allowed the generation of a set of predictions that are mostly orthogonal to other PPI databases (11). The database and its interface allow the user to see the full evidence trail for each predicted interaction. In this way, PIPs is a resource not only for large-scale modelling of protein interaction networks, but also as an exploratory tool for the cell/molecular biologist who wishes to understand more about the predicted interaction network for the protein they are studying.

THE DATABASE

Overview

The PIPs database is a resource of PPIs in human predicted by a naïve Bayesian model as described in Scott and Barton (11). Briefly, the method (11) combines information from gene co-expression, orthology, co-occurrence of domains, post-translational modifications, co-localization of the proteins within the cell and analysis of the local topology of the predicted PPI network. The different evidence types are programmed as separate modules with each module giving a score of interaction. The individual module scores are combined to give a prediction for the overall likelihood of interaction given the available data.
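As a rough illustration of how a naïve Bayesian combination of module scores can work, the sketch below treats each module score as a likelihood ratio and multiplies them, which is what the naïve independence assumption implies. This is a generic sketch, not the PIPs implementation: the module values, the prior odds and the function names are all hypothetical, and the actual scoring scheme is the one described in (11).

```python
from math import prod
from typing import Mapping


def combined_likelihood_ratio(module_lrs: Mapping[str, float]) -> float:
    """Multiply per-module likelihood ratios (naive independence assumption)."""
    return prod(module_lrs.values())


def posterior_odds(prior_odds: float, module_lrs: Mapping[str, float]) -> float:
    """Posterior odds of interaction = prior odds x product of likelihood ratios."""
    return prior_odds * combined_likelihood_ratio(module_lrs)


# Hypothetical likelihood ratios for one protein pair (invented values).
example_lrs = {
    "expression": 4.0,
    "orthology": 1.0,   # a ratio of 1.0 means the module is uninformative
    "combined": 2.5,
    "transitive": 50.0,
}

# Hypothetical prior odds of interaction for a random pair.
example_prior = 1 / 1000

print(combined_likelihood_ratio(example_lrs))
print(posterior_odds(example_prior, example_lrs))
```

A module with a ratio above 1 raises the odds of interaction and one below 1 lowers them, so the per-module breakdown shows which evidence drives a prediction.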

The full database of predicted interactions includes details about 69 965 human proteins imported from the IPI (22) together with interaction scores for 17 643 506 protein pairs, of which 37 606 are predicted to interact. For each protein pair, the overall score is stored along with a breakdown of the scores provided by each of the modules. Further information is stored that details the evidence that was used in calculating the final score. The evidence includes 5872 S. cerevisiae, 23 195 C. elegans and 27 629 D. melanogaster proteins that were analysed by InParanoid (23) to identify orthologous protein pairs, where each protein was known to be involved in an interaction. Details of the InterPro (24) motifs and domains, the sites of post-translational modifications, and each protein's sub-cellular localization are also stored, as well as the Pearson's correlation coefficients from analysis of expression data. To simplify exploration of the predicted interactions, links are stored to external data sources including RefSeq (25), UniProt (26) and Entrez (27). Comparisons to other publicly available databases of interactions are simplified by the inclusion of links to HPRD (12,13), DIP (14), BIND (28) and OPHID (19) for protein pairs that are represented in those databases.
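The expression evidence mentioned above is stored as Pearson's correlation coefficients. For readers unfamiliar with the measure, the following minimal sketch computes it for two expression profiles. The function name and the example profiles are hypothetical; the article does not describe how the coefficients were computed beyond naming the statistic.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Invented expression profiles across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

print(pearson(gene_a, gene_b))
print(pearson(gene_a, gene_c))
```

Values near +1 indicate co-expression across the sampled conditions, values near -1 indicate opposite behaviour, and values near 0 indicate no linear relationship.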

The PIPs database was constructed on a Linux server running the MySQL database software and Apache/Tomcat for the web server. The front-end uses Java Server Pages (JSP) to provide a dynamic and easy-to-navigate web interface.

The PIPs web interface

The front page of the PIPs interface allows for simple searches with the IPI, UniProt or RefSeq identifier for a protein, or a text search with keywords. The output may be restricted by adjusting the minimum score threshold. The Advanced Search allows the query protein sequence to be compared with the protein sequences stored in the PIPs database by MagicMatch (29) which returns exact matches to the query sequence. If no match is found, a BLAST (30) search may optionally be run to find sequences that are similar to the query. A batch mode is available to allow larger numbers of protein IPI identifiers to be run against the database as a single set.

Figure 1 illustrates the result of searching with IPI00016572 (SNRPG, small nuclear ribonucleoprotein G) via the quick search from the front page and selecting to view the scores from each module. The Interaction Summary Page for SNRPG shows interacting pairs of proteins ranked in descending order by the final interaction score. The output includes the name of the protein and the scores obtained by each of the different modules. For example, for the interaction between SNRPG and LSM8 seen in Figure 1, the orthology and combined modules make a low contribution, while the expression and transitive modules provide the major contribution to the final score. In contrast, for the interaction between SNRPG and SNRPD3, the expression, orthology, combined and transitive modules are all predictive of the interaction. The ‘Evidence’ column provides a link to view the evidence that was used by each of the modules in calculating the final interaction score, while the ‘Database’ column lets the user know if the pair of proteins has been reported as interacting in other databases [currently BIND (28), DIP (14), HPRD (12,13) and OPHID (19)].

Figure 1.

Interaction Summary for the protein IPI00016572 (SNRPG): this page shows the predicted interactors, ordered by the score in descending order from the most probable interactor. The name of the predicted interactor and a breakdown by predictive feature is also shown with links to retrieve the evidence for the predicted interaction.
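The behaviour of the Interaction Summary page, a minimum score threshold followed by ranking in descending order of final score, can be sketched as follows. This is an illustrative reimplementation, not code from PIPs: the `Prediction` type, the partner names and the scores are invented, and the default threshold of 1.0 is an assumption borrowed from the Score ≥1.0 cut-off quoted elsewhere in the article.

```python
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    partner: str   # name of the predicted interactor
    score: float   # final interaction score


def rank_predictions(
    predictions: Iterable[Prediction], min_score: float = 1.0
) -> List[Prediction]:
    """Keep predictions scoring at least min_score, highest score first."""
    kept = [p for p in predictions if p.score >= min_score]
    return sorted(kept, key=lambda p: p.score, reverse=True)


# Invented predictions for a hypothetical query protein.
hits = [
    Prediction("PARTNER_A", 12.0),
    Prediction("PARTNER_B", 3100.0),
    Prediction("PARTNER_C", 0.4),   # falls below the default threshold
]

for p in rank_predictions(hits):
    print(p.partner, p.score)
```

Raising `min_score` narrows the list to the most probable interactors, mirroring the adjustable threshold on the search page.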

Figure 2a–c show the Evidence of Interaction page for the interaction predicted between SNRPG and SNRPD3 that was identified in Figure 1. The page is organized into six sections which provide a break-down of the information on expression, orthology, domains, post-translational modifications, localization and topology (transitive) score.

Figure 2.

(a) Evidence of Interaction Summary page for the interaction between SNRPG and SNRPD3: Sections Gene Expression and Orthology provide details about the predictions based on expression and orthology for the interaction pair. (b) Sections Domains, Post-translational modification and Localization provide the information that was used by the combined module describing the co-occurrence of domains within the protein pair, post-translational modifications and localization of the proteins within the cell. (c) Section Transitive score provides a list of the common interactors with an integrated interaction score >0.025 for the expression, orthology and combined modules. These common interactors are considered by the Transitive module for calculating the likelihood of interaction between SNRPG and SNRPD3. In total, there are 236 predicted common interactors; the figure shows only the top six common interactors.
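The list of common interactors shown in panel (c) can be pictured as a set intersection over a table of pairwise scores. The sketch below finds proteins whose integrated score with both members of a pair exceeds the 0.025 cut-off quoted in the caption. The data structure, function names and scores are hypothetical, and the sketch stops at listing the common interactors: how the Transitive module converts that list into a score is described in (11), not here.

```python
from typing import Dict, FrozenSet, List, Set

ScoreTable = Dict[FrozenSet[str], float]


def pair(p: str, q: str) -> FrozenSet[str]:
    """Unordered key for a pair of distinct proteins."""
    return frozenset((p, q))


def neighbours(scores: ScoreTable, protein: str, threshold: float) -> Set[str]:
    """Proteins whose score with `protein` is strictly above the threshold."""
    result = set()
    for key, score in scores.items():
        if protein in key and score > threshold:
            (other,) = key - {protein}   # the second member of the pair
            result.add(other)
    return result


def common_interactors(
    scores: ScoreTable, a: str, b: str, threshold: float = 0.025
) -> List[str]:
    """Sorted list of proteins predicted to interact with both a and b."""
    shared = neighbours(scores, a, threshold) & neighbours(scores, b, threshold)
    return sorted(shared - {a, b})


# Invented integrated scores for a toy network.
toy_scores: ScoreTable = {
    pair("P1", "P2"): 5.0,
    pair("P1", "C1"): 0.5,
    pair("P2", "C1"): 0.3,
    pair("P1", "C2"): 0.9,
    pair("P2", "C2"): 0.01,   # below the cut-off, so C2 is not shared
    pair("P2", "C3"): 0.8,
}

print(common_interactors(toy_scores, "P1", "P2"))   # ['C1']
```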

For each protein analysed in the prediction, a Protein Summary page is available as a link from the main prediction result page. For example, Figure 3 shows the Protein Summary page for the SNRPG protein. The summary shows the number of predicted interactions above a given threshold (57 predicted interactors with a Score ≥1.0 of which four have a Score ≥2500). The table also provides links to external protein databases including RefSeq (25), HPRD (12,13), UniProt (26) and Entrez (27).

Figure 3.

Protein Summary for the protein SNRPG: information about the selected protein including a breakdown of the number of predicted interactions and the number of interactions within external databases. Links are also provided to obtain further details about the protein from the HPRD, RefSeq, Entrez and UniProt.

Figure 4 illustrates the display of interactions through a new Java applet that can be accessed from the Protein Summary page. Users are able to view the network of the predicted protein interactions out to a path length of two from the query protein. Within the applet the user is able to view the network with and without proteins that have only a single connection. The user can also grow the graph by selecting a protein and clicking on the ‘Grow Network …’ option. Once the network has been created it is possible to save the network as an image or save an adjacency list of the proteins so that they can be represented in an external application, such as Cytoscape (http://cytoscape.org/) or Graphviz (http://www.graphviz.org/).

Figure 4.

Network view of the predicted interactors of SNRPG: Java application to view the local topology of the predicted PPI network. Left: network image of the predicted primary and secondary interactors of the protein SNRPG (blue). Right: network image of the predicted primary and secondary interactors of the protein SNRPG (blue), with all interactors that have only a single predicted interaction removed from the image.
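The two network views, and the adjacency-list export, can be approximated outside the applet with a few lines of graph code. The sketch below collects all proteins within a path length of two of a query protein by breadth-first search, optionally removes proteins with only a single connection, and writes a simple adjacency list. It is an illustration under stated assumptions: the toy network, the function names and the space-separated output format are hypothetical, and the exact file format produced by PIPs is not specified in the article.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Iterable[str]]


def neighbourhood(adj: Graph, start: str, max_depth: int = 2) -> Set[str]:
    """Breadth-first search: nodes within max_depth edges of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue   # do not expand beyond the requested path length
        for nxt in adj.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)


def induced_subgraph(adj: Graph, nodes: Set[str]) -> Dict[str, List[str]]:
    """Restrict the graph to the given nodes, with sorted neighbour lists."""
    return {n: sorted(m for m in adj.get(n, ()) if m in nodes) for n in sorted(nodes)}


def drop_single_connection(sub: Dict[str, List[str]], keep: str) -> Dict[str, List[str]]:
    """Remove, in one pass, nodes with at most one connection (never `keep`)."""
    remove = {n for n, nbrs in sub.items() if len(nbrs) <= 1 and n != keep}
    return {
        n: [m for m in nbrs if m not in remove]
        for n, nbrs in sub.items()
        if n not in remove
    }


def adjacency_lines(sub: Dict[str, List[str]]) -> List[str]:
    """One line per node: the node followed by its neighbours."""
    return [" ".join([n, *nbrs]) for n, nbrs in sub.items()]


# Invented symmetric toy network; Q is the query protein.
toy_net: Graph = {
    "Q": ["A", "B"],
    "A": ["Q", "B", "C"],
    "B": ["Q", "A"],
    "C": ["A", "D"],
    "D": ["C"],   # three steps from Q, so outside the view
}

view = induced_subgraph(toy_net, neighbourhood(toy_net, "Q", max_depth=2))
pruned = drop_single_connection(view, keep="Q")
print(adjacency_lines(view))
print(adjacency_lines(pruned))
```

The first printed list corresponds to the full view and the second to the view with single-connection proteins removed.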

SUMMARY

It has been estimated that only 10% of the human interactome has been identified (18). The PIPs database allows the user to browse and easily access many additional high probability predicted human interactions and to see the evidence that led to each prediction. It also provides a source of information to help improve the design of experiments to investigate further the function of proteins in the human proteome. All predictions are ranked, so the most probable interactions can be investigated first instead of working through a flat list of predicted interactions.

The database is freely available to search/explore at http://www.compbio.dundee.ac.uk/www-pips.

FUNDING

UK Biotechnology and Biological Sciences Research Council (BBSRC to M.D.M.); Canadian Institutes of Health Research (fellowship to M.S.S.). Funding for open access charge: Canadian Institutes of Health Research.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We would like to thank Dr Tom Walsh for assistance with computational issues and all members of the Barton Group for helpful discussions.

REFERENCES

1. Stelzl U, Wanker EE. The value of high quality protein-protein interaction networks for systems biology. Curr. Opin. Chem. Biol. 2006;10:551–558. doi: 10.1016/j.cbpa.2006.10.005.
2. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA. 2001;98:4569–4574. doi: 10.1073/pnas.061034498.
3. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627. doi: 10.1038/35001009.
4. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain P-O, Han J-DJ, Chesneau A, Hao T, et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303:540–543. doi: 10.1126/science.1091403.
5. Formstecher E, Aresta S, Collura V, Hamburger A, Meil A, Trehin A, Reverdy C, Betin V, Maire S, Brun C, et al. Protein interaction mapping: a Drosophila case study. Genome Res. 2005;15:376–384. doi: 10.1101/gr.2659105.
6. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302:1727–1736. doi: 10.1126/science.1090289.
7. Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, Saito R, Ara T, Nakahigashi K, Huang H-C, Hirai A, et al. Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res. 2006;16:686–691. doi: 10.1101/gr.4527806.
8. Rual J-F, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005;437:1173–1178. doi: 10.1038/nature04209.
9. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al. A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005;122:957–968. doi: 10.1016/j.cell.2005.08.029.
10. Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hon G, Myers C, Parsons A, Friesen H, Oughtred R, Tong A, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J. Biol. 2006;5:11. doi: 10.1186/jbiol36.
11. Scott MS, Barton GJ. Probabilistic prediction and ranking of human protein-protein interactions. BMC Bioinformatics. 2007;8:239. doi: 10.1186/1471-2105-8-239.
12. Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM, et al. Human protein reference database–2006 update. Nucleic Acids Res. 2006;34:D411–D414. doi: 10.1093/nar/gkj141.
13. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TKB, Gronborg M, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003;13:2363–2371. doi: 10.1101/gr.1680803.
14. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004;32:D449–D451. doi: 10.1093/nar/gkh086.
15. Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, et al. IntAct–open source resource for molecular interaction data. Nucleic Acids Res. 2007;35:D561–D565. doi: 10.1093/nar/gkl958.
16. Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bahler J, Wood V, et al. The BioGRID interaction database: 2008 update. Nucleic Acids Res. 2008;36:D637–D640. doi: 10.1093/nar/gkm1001.
17. Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G. MINT: the Molecular INTeraction database. Nucleic Acids Res. 2007;35:D572–D574. doi: 10.1093/nar/gkl950.
18. Hart GT, Ramani AK, Marcotte EM. How complete are current yeast and human protein-interaction networks? Genome Biol. 2006;7:120. doi: 10.1186/gb-2006-7-11-120.
19. Brown KR, Jurisica I. Online predicted human interaction database. Bioinformatics. 2005;21:2076–2082. doi: 10.1093/bioinformatics/bti273.
20. Huang T-W, Tien A-C, Huang W-S, Lee Y-CG, Peng C-L, Tseng H-H, Kao C-Y, Huang C-YF. POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics. 2004;20:3273–3276. doi: 10.1093/bioinformatics/bth366.
21. von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Kruger B, Snel B, Bork P. STRING 7 - recent developments in the integration and prediction of protein interactions. Nucleic Acids Res. 2007;35:D358–D362. doi: 10.1093/nar/gkl825.
22. Kersey PJ, Duarte J, Williams A, Karavidopoulou Y, Birney E, Apweiler R. The International Protein Index: an integrated database for proteomics experiments. Proteomics. 2004;4:1985–1988. doi: 10.1002/pmic.200300721.
23. Berglund AC, Sjolund E, Ostlund G, Sonnhammer ELL. InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res. 2008;36:D263–D266. doi: 10.1093/nar/gkm1020.
24. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, et al. New developments in the InterPro database. Nucleic Acids Res. 2007;35:D224–D228. doi: 10.1093/nar/gkl841.
25. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–D65. doi: 10.1093/nar/gkl842.
26. Bairoch A, Bougueleret L, Altairac S, Amendolia V, Auchincloss A, Puy GA, Axelsen K, Baratin D, Blatter MC, Boeckmann B, et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2008;36:D190–D195. doi: 10.1093/nar/gkm895.
27. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008;36:D13–D21. doi: 10.1093/nar/gkm1000.
28. Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E, et al. The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 2005;33:D418–D424. doi: 10.1093/nar/gki051.
29. Smith M, Kunin V, Goldovsky L, Enright AJ, Ouzounis CA. MagicMatch–cross-referencing sequence identifiers across databases. Bioinformatics. 2005;21:3429–3430. doi: 10.1093/bioinformatics/bti548.
30. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
31. Mewes HW, Dietmann S, Frishman D, Gregory R, Mannhaupt G, Mayer KFX, Munsterkotter M, Ruepp A, Spannagl M, Stuempflen V, et al. MIPS: analysis and annotation of genome information in 2007. Nucleic Acids Res. 2008;36:D196–D201. doi: 10.1093/nar/gkm980.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
