Abstract
Proteins are responsible for all biological activities in living organisms. Thanks to genome sequencing projects, large amounts of DNA and protein sequence data are now available, but the biological functions of many proteins remain unannotated. The function of such a non-annotated protein may be inferred from its neighbors in a protein interaction network. In this paper, we propose two new methods to predict protein functions based on network neighborhood properties. FunPred 1.1 combines three simple yet effective scoring techniques: the neighborhood ratio, the protein path connectivity and the relative functional similarity. FunPred 1.2 applies a heuristic approach that uses the edge clustering coefficient to reduce the search space by identifying densely connected neighborhood regions. FunPred 1.2 achieves an overall accuracy of around 87% over 8 functional groups involving hetero-interactions among 650 yeast proteins, which is higher than the accuracy of FunPred 1.1. It is also higher than the accuracy of many of the state-of-the-art protein function prediction methods described in the literature. The test datasets and the complete source code of the developed software are freely available at http://code.google.com/p/cmaterbioinfo/.
Keywords: Protein interaction network, Protein function prediction, Functional groups, Neighborhood analysis, Relative functional similarity, Edge clustering coefficient
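As an illustration of the edge clustering coefficient (ECC) mentioned above, the following minimal sketch computes it for one interaction in a toy network. It assumes a commonly used variant, the number of common neighbors of the two proteins divided by the smaller of their degrees minus one; the abstract does not state the exact formula used in FunPred 1.2, so this is not necessarily the authors' definition. The function names and the protein identifiers (P1 to P4) are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from a list of interactions."""
    graph: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def edge_clustering_coefficient(graph: Dict[str, Set[str]], u: str, v: str) -> float:
    """ECC(u, v) = |N(u) & N(v)| / min(deg(u) - 1, deg(v) - 1).

    Assumed variant: the numerator counts triangles that contain the edge,
    the denominator is the maximum number of such triangles possible.
    Returns 0.0 when no triangle could exist (denominator is zero).
    """
    common = len(graph[u] & graph[v])
    max_triangles = min(len(graph[u]) - 1, len(graph[v]) - 1)
    if max_triangles <= 0:
        return 0.0
    return common / max_triangles


# Toy network: a triangle P1-P2-P3 with one pendant protein P4 attached to P3.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
toy_graph = build_graph(toy_edges)
ecc_dense = edge_clustering_coefficient(toy_graph, "P1", "P2")
ecc_sparse = edge_clustering_coefficient(toy_graph, "P3", "P4")
```

Under this assumed definition, an edge inside the triangle scores higher than the edge to the pendant protein, which is the property that lets a high-ECC filter single out densely connected neighborhood regions.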
Abbreviations used
- BIND: Biomolecular Interaction Network Database
- DIP: Database of Interacting Proteins
- ECC: edge clustering coefficient
- HCS: highly connected subgraphs
- LNPC: Laplacian network partitioning correlations
- MCODE: molecular complex detection
- MIPS: Munich Information Center for Protein Sequences
- NMF: non-negative matrix factorization
- PPI: protein-protein interactions
- RNCS: restricted neighborhood search clustering algorithm
- SVM: support vector machine
