HINT: a database of annotated protein-protein interactions and their homologs

Ashwini Patil; Haruki Nakamura

2005 Feb 28;1:21–24. doi: 10.2142/biophysics.1.21

PMCID: PMC5036632 PMID: 27857549

Abstract

Despite the abundance of protein-protein interaction databases currently available online, a source that identifies and lists similar interactions in different species is lacking. The Homologous Interactions (HINT) database is such a collection of protein-protein interactions and their homologs in one or more species. The interactions and their homologs are annotated with Eukaryotic Cluster of Orthologous Groups (KOG) IDs, InterPro domains, Gene Ontology (GO) terminology and Protein Data Bank (PDB) structures. HINT is available as an interactive Web server at http://helix.protein.osaka-u.ac.jp/hint/.

Keywords: homologous interactions, interaction families

Protein-protein interactions in various organisms are increasingly the focus of studies that aim to identify the cellular functions of proteins. For any given interaction, it is of significant interest to find similar interactions in different species. Such a comparative study helps transfer annotations between interactions from better-annotated species to poorly annotated ones. It also aids the identification of likely true interactions in error-prone high-throughput datasets since, intuitively, an interaction found in more than one species is likely to be universal. It has recently been estimated that the total number of interaction types is limited to about 10,000 [1]. Grouping similar interactions on the basis of sequence homology would help classify them into distinct interaction types or families.

There are a number of protein-protein interaction databases available online that give information about experimentally determined interactions. Some of these are the Database of Interacting Proteins (DIP) [2], IntAct [3] and the Biomolecular Interaction Network Database (BIND) [4]. Although these databases provide considerable information about the interaction of interest, they do not provide any information about interactions similar or homologous to it.

With these goals in mind, we present here HINT, a database of homologous interactions with various annotations for the interacting proteins. HINT is available online at http://helix.protein.osaka-u.ac.jp/hint/.

Methods

Two interactions are considered homologous if the interacting proteins of one interaction are homologous to the interacting proteins for the other interaction (Fig. 1). Homologous interactions include, but are not limited to, orthologous interactions (similar interactions found in different species) and paralogous interactions (similar interactions in the same species).

Figure 1. Homologous interactions. Proteins P1, P2 and P3 are the sequence homologs of protein P. Similarly, proteins Q1 and Q2 are the sequence homologs of protein Q. Interactions P1–Q2 and P2–Q1 are homologous to the interaction P–Q.

We use protein-protein interaction data for different model organisms from DIP (July 2004 version) and IntAct (September 2004 version). For each interaction, the sequence homologs of the interacting proteins are determined using PSI-BLAST [5] with 5 iterations and an E-value cutoff of 10⁻⁸. If an interaction is found between any of the homologs of the two interacting proteins, it is deemed homologous to the interaction under consideration. Figure 1 illustrates this concept. We thus generate groups of homologous interactions that have been determined by small-scale or high-throughput experiments. We determine whether any two interactions are orthologous or paralogous by assigning the interacting proteins to clusters from the Eukaryotic Cluster of Orthologous Groups (KOG) database [6]. The interacting proteins of each interaction are also annotated with domain definitions from InterPro [7], Gene Ontology (GO) terms [8] and Protein Data Bank (PDB) structures, where available.
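The grouping step described above can be sketched as follows. This is a minimal illustration, not the HINT implementation: it assumes the homologs of each protein have already been computed (for example from PSI-BLAST hits that pass the E-value cutoff), it requires both partners to be homologs as in Figure 1, and the function name and the toy protein IDs are hypothetical.

```python
from itertools import product


def find_homologous_interactions(query, homologs, known_interactions):
    """Return known interactions whose two partners are homologs of the query's partners.

    query: (protein, protein) tuple for the interaction of interest.
    homologs: dict mapping a protein ID to the set of its sequence homologs
        (assumed to be precomputed; hypothetical input format).
    known_interactions: iterable of (protein, protein) tuples from the source databases.
    """
    p, q = query
    # Interactions are undirected, so store each pair as an unordered set.
    known = {frozenset(pair) for pair in known_interactions}
    query_key = frozenset(query)

    hits = set()
    for hp, hq in product(homologs.get(p, ()), homologs.get(q, ())):
        candidate = frozenset((hp, hq))
        if candidate in known and candidate != query_key:
            hits.add(candidate)
    # Sort for a stable, readable result.
    return sorted(tuple(sorted(pair)) for pair in hits)


# Toy data mirroring Figure 1; "X" is an unrelated protein added for contrast.
homologs = {"P": {"P1", "P2", "P3"}, "Q": {"Q1", "Q2"}}
known_interactions = [("P", "Q"), ("P1", "Q2"), ("Q1", "P2"), ("P3", "X")]

result = find_homologous_interactions(("P", "Q"), homologs, known_interactions)
print(result)  # [('P1', 'Q2'), ('P2', 'Q1')]
```

The interaction P3–X is not reported because X is not a homolog of Q, which matches the definition that both interacting proteins must be homologous.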

The interactions from DIP and IntAct were parsed from files in the Proteomics Standards Initiative Molecular Interaction (PSI-MI) XML format [9]. This allows for easy extension of the database by incorporating protein-protein interaction data from other databases that use this format. HINT is implemented as a relational database hosted on a PostgreSQL server and can be accessed over the Internet through an HTML web interface.
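The schema of HINT is not described in this paper. As a rough illustration of how interactions and their homologs might be laid out relationally, the sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. All table names, column names, protein IDs and species assignments are hypothetical.

```python
import sqlite3

# In-memory database; a stand-in for the PostgreSQL server used by HINT.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE protein (
        protein_id TEXT PRIMARY KEY,
        species    TEXT NOT NULL          -- two-letter code, e.g. 'Sc'
    );
    CREATE TABLE interaction (
        interaction_id INTEGER PRIMARY KEY,
        protein_a TEXT NOT NULL REFERENCES protein(protein_id),
        protein_b TEXT NOT NULL REFERENCES protein(protein_id),
        source    TEXT NOT NULL           -- e.g. 'DIP' or 'IntAct'
    );
    CREATE TABLE homologous_interaction (
        query_id   INTEGER NOT NULL REFERENCES interaction(interaction_id),
        homolog_id INTEGER NOT NULL REFERENCES interaction(interaction_id),
        PRIMARY KEY (query_id, homolog_id)
    );
    """
)

# Toy rows: species and sources are made up for illustration only.
conn.executemany(
    "INSERT INTO protein VALUES (?, ?)",
    [("P", "Sc"), ("Q", "Sc"), ("P1", "Hs"), ("Q2", "Hs"), ("P2", "Dm"), ("Q1", "Dm")],
)
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?, ?, ?)",
    [(1, "P", "Q", "DIP"), (2, "P1", "Q2", "IntAct"), (3, "P2", "Q1", "DIP")],
)
conn.executemany(
    "INSERT INTO homologous_interaction VALUES (?, ?)",
    [(1, 2), (1, 3)],
)


def homologs_of(connection, interaction_id):
    """List the interactions recorded as homologous to the given interaction."""
    rows = connection.execute(
        """
        SELECT i.interaction_id, i.protein_a, i.protein_b, p.species
        FROM homologous_interaction AS h
        JOIN interaction AS i ON i.interaction_id = h.homolog_id
        JOIN protein AS p ON p.protein_id = i.protein_a
        WHERE h.query_id = ?
        ORDER BY i.interaction_id
        """,
        (interaction_id,),
    )
    return rows.fetchall()


print(homologs_of(conn, 1))  # [(2, 'P1', 'Q2', 'Hs'), (3, 'P2', 'Q1', 'Dm')]
```

Storing homology as a link table between interactions keeps the source data from each database untouched, so interactions from further PSI-MI sources could be added without changing existing rows.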

Results

Of the 45840 interactions in HINT, 11103 (24%) have one or more homologs. Table 1 shows the distribution of the homologs across species. The web interface can be used to search for interactions using various identifiers such as SwissProt accession numbers, PIR IDs, GenBank accession numbers, RefSeq IDs or descriptions of the interacting proteins. An interaction of interest can be selected from the search results to obtain detailed information about it. Figure 2 shows a snapshot of the Interactions web page. The homologs of the selected interaction are shown in both graphical and tabular form, sorted by the score of the protein hits with the best hits first. The graphical form helps the user visualize the regions and domains that are common to the proteins of the selected interaction and those of the homologous interactions. The tabular form gives the E-values and percent identities reported by PSI-BLAST. Further details about the usage of the web interface are provided in an online Help document.

Table 1. Species distribution of homologous protein-protein interactions in HINT

Organism	Two-letter code	Interactions	Homologous interactions
D. melanogaster	Dm	20581	3521
S. cerevisiae	Sc	14178	3879
C. elegans	Ce	4553	1165
H. sapiens	Hs	3933	2126
H. pylori	Hp	1409	128
E. coli	Ec	554	150
M. musculus	Mm	483	292
A. thaliana	At	76	65
R. norvegicus	Rn	67	43
S. pombe	Sp	6	4

Total		45840	11103
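
The proportions implied by Table 1 can be recomputed directly from its rows. The short sketch below does so per species and for the reported totals; the variable and function names are illustrative and not part of HINT.

```python
# Rows of Table 1: two-letter code -> (interactions, homologous interactions).
TABLE_1 = {
    "Dm": (20581, 3521),
    "Sc": (14178, 3879),
    "Ce": (4553, 1165),
    "Hs": (3933, 2126),
    "Hp": (1409, 128),
    "Ec": (554, 150),
    "Mm": (483, 292),
    "At": (76, 65),
    "Rn": (67, 43),
    "Sp": (6, 4),
}

# Totals as reported in the last row of Table 1.
TOTAL_INTERACTIONS = 45840
TOTAL_HOMOLOGOUS = 11103


def percent_with_homologs(interactions, homologous):
    """Percentage of interactions that have at least one homologous interaction."""
    return 100.0 * homologous / interactions


overall = percent_with_homologs(TOTAL_INTERACTIONS, TOTAL_HOMOLOGOUS)
print(f"overall: {overall:.0f}%")  # overall: 24%

# Rank species by the share of their interactions that have homologs.
ranked = sorted(
    TABLE_1.items(),
    key=lambda item: percent_with_homologs(*item[1]),
    reverse=True,
)
for code, (interactions, homologous) in ranked:
    print(f"{code}: {percent_with_homologs(interactions, homologous):.1f}%")
```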

Figure 2. Web interface of the Database of Homologous Interactions, showing interaction details and the GO, InterPro and KOG annotations of the interacting proteins. Also shown is a graph of the homologs, with the query interaction in yellow, the InterPro domains of the interacting proteins in blue and the homologs of the proteins colored according to their hit scores, similar to BLAST results [10]. The species in which each homolog is found is given by the two-letter code listed in Table 1. An interaction that is an ortholog or a paralog of the query interaction is marked with an ‘O’ or a ‘P’ alongside the homolog. Clicking on the ‘+’ gives further details of the homolog, such as E-value, percent identity, hit score and KOG, where available. Tool tips are provided in various places.

Discussion

HINT is a database of homologous protein-protein interactions that researchers can use to obtain detailed information about similar interactions in different species. It provides a graphical view of the homologous interactions and the various annotations of the interacting proteins, and can be accessed over the Internet at http://helix.protein.osaka-u.ac.jp/hint/. For a given interaction, HINT provides a list of similar interactions found in the same or in different species. This is of considerable use in comparative genomic analyses. In the future, we plan to use HINT to identify true positives in high-throughput interaction datasets and to form interaction families.

Acknowledgments

This study was supported by a grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, and by a grant-in-aid for Scientific Research on Priority Areas No. 12144206 from the Ministry of Education, Science, Sports and Culture of Japan.

References

1. Aloy P, Russell RB. Ten thousand interactions for the molecular biologist. Nat. Biotechnol. 2004;22:1317–1321. doi: 10.1038/nbt1018.
2. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004;32:D449–D451. doi: 10.1093/nar/gkh086.
3. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R. IntAct: an open source molecular interaction database. Nucleic Acids Res. 2004;32:D452–D455. doi: 10.1093/nar/gkh052.
4. Bader GD, Betel D, Hogue CWV. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 2003;31:248–250. doi: 10.1093/nar/gkg056.
5. Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, Lipman D. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
6. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41. doi: 10.1186/1471-2105-4-41.
7. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJA, Vaughan R, Zdobnov EM. The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 2003;31:315–318. doi: 10.1093/nar/gkg046.
8. Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
9. Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, Roechert B, Poux S, Jung E, Mersch H, Kersey P, Lappe M, Li Y, Zeng R, Rana D, Nikolski M, Husi H, Brun C, Shanker K, Grant SGN, Sander C, Bork P, Zhu W, Pandey A, Brazma A, Jacq B, Vidal M, Sherman D, Legrain P, Cesareni G, Xenarios I, Eisenberg D, Steipe B, Hogue C, Apweiler R. The HUPO PSI’s Molecular Interaction format: a community standard for the representation of protein interaction data. Nat. Biotechnol. 2004;22:177–183. doi: 10.1038/nbt926.
10. http://www.ncbi.nlm.nih.gov/BLAST/.
