Summary.
The avalanche of newly discovered protein sequences in the post-genomic era has motivated and challenged us to develop an automated method that can rapidly and accurately predict the subcellular localization of an uncharacterized protein, because such knowledge can greatly speed up the process of determining its biological function. However, it is very difficult to establish such a predictor, since the key statistical information is buried in a mass of extremely complicated and highly variable sequences. In this paper, based on the concept of pseudo amino acid composition (Chou, K. C. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246–255), the cellular automata image approach is introduced to cope with this problem. Many important features that are originally hidden in long amino acid sequences can be clearly displayed through their cellular automata images. One remarkable merit of this approach is that many image recognition tools can be applied directly to the prediction problem. High success rates were observed in the self-consistency, jackknife, and independent dataset tests.
Keywords: Cellular automata images – Pseudo amino-acid composition – Protein subcellular location – Complexity – Covariant-discriminant algorithm
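The idea of turning an amino acid sequence into a cellular automata image, as described in the summary, can be illustrated with a minimal Python sketch. It is not the procedure of this paper: the 5-bit residue code (the residue's index in an alphabetical list), the periodic boundary, the default rule number 90, and the function names `encode_sequence`, `step`, and `ca_image` are all assumptions chosen only for illustration.

```python
from typing import List

# 20 standard residues in alphabetical order of their one-letter codes.
# ASSUMPTION: the binary code of a residue is simply its index in this string.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_sequence(seq: str, bits: int = 5) -> List[int]:
    """Turn a protein sequence into a flat list of 0/1 cells (hypothetical coding)."""
    cells: List[int] = []
    for residue in seq.upper():
        index = AMINO_ACIDS.index(residue)  # ValueError for non-standard residues
        # Most significant bit first, fixed width of `bits` cells per residue.
        cells.extend((index >> shift) & 1 for shift in range(bits - 1, -1, -1))
    return cells


def step(cells: List[int], rule: int) -> List[int]:
    """One update of an elementary (two-state, radius-1) cellular automaton.

    Periodic boundaries are used (an assumption); `rule` is the usual
    0-255 Wolfram rule number.
    """
    n = len(cells)
    updated = []
    for i in range(n):
        # cells[i - 1] wraps to the last cell when i == 0.
        neighbourhood = (cells[i - 1] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        updated.append((rule >> neighbourhood) & 1)
    return updated


def ca_image(seq: str, rule: int = 90, steps: int = 16) -> List[List[int]]:
    """Stack the initial row and `steps` evolved rows into a binary image."""
    rows = [encode_sequence(seq)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


if __name__ == "__main__":
    # Print the image of a short, made-up peptide as text.
    for row in ca_image("MKVLAAGIW", rule=90, steps=8):
        print("".join("#" if cell else "." for cell in row))
```

Each row of the result is one time step, so a sequence of length N yields an image of width 5N under this coding. Features for a classifier would then be extracted from such images; that step is not sketched here.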
References
- Boland MV, Markey MK, Murphy RF. Automated recognition of patterns characteristic of subcellular structures in fluorescence microscopy images. Cytometry. 1998;33:366–375. doi: 10.1002/(SICI)1097-0320(19981101)33:3<366::AID-CYTO12>3.0.CO;2-R.
- Cai YD. Is it a paradox or misinterpretation. PROTEINS: Structure, Function, and Genetics. 2001;43:336–338. doi: 10.1002/prot.1045.
- Cai YD, Chou KC. Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition. Biochem Biophys Res Commun. 2003;305:407–411. doi: 10.1016/s0006-291x(03)00775-7.
- Cai YD, Chou KC. Predicting 22 protein localizations in budding yeast. Biochem Biophys Res Commun. 2004a;323:425–428. doi: 10.1016/j.bbrc.2004.08.113.
- Cai YD, Chou KC. Predicting subcellular localization of proteins in a hybridization space. Bioinformatics. 2004b;20:1151–1156. doi: 10.1093/bioinformatics/bth054.
- Cai YD, Liu XJ, Xu XB, Chou KC. Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect. J Cell Biochem. 2002a;84:343–348. doi: 10.1002/jcb.10030.
- Cai YD, Liu XJ, Xu XB, Chou KC. SVM for predicting membrane protein types by incorporating quasi-sequence-order effect. Internet Electronic Journal of Molecular Design. 2002b;1:219–226.
- Cai YD, Zhou GP, Chou KC. Support vector machines for predicting membrane protein types by using functional domain composition. Biophys J. 2003;84:3257–3263. doi: 10.1016/S0006-3495(03)70050-2.
- Cai YD, Zhou GP, Chou KC. Predicting enzyme family classes by hybridizing gene product composition and pseudo-amino acid composition. J Theor Biol. 2005;234:145–149. doi: 10.1016/j.jtbi.2004.11.017.
- Cedano J, Aloy P, Pérez-Pons JA, Querol E. Relation between amino acid composition and cellular location of proteins. J Mol Biol. 1997;266:594–600. doi: 10.1006/jmbi.1996.0804.
- Chou KC. A novel approach to predicting protein structural classes in a (20–1)-D amino acid composition space. PROTEINS: Structure, Function, and Genetics. 1995;21:319–344. doi: 10.1002/prot.340210406.
- Chou KC. Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem Biophys Res Commun. 2000a;278:477–483. doi: 10.1006/bbrc.2000.3815.
- Chou KC. Review: Prediction of protein structural classes and subcellular locations. Curr Protein Pept Sci. 2000b;1:171–208. doi: 10.2174/1389203003381379.
- Chou KC. Prediction of protein cellular attributes using pseudo-amino-acid-composition. PROTEINS: Structure, Function, and Genetics. 2001;43:246–255. doi: 10.1002/prot.1035.
- Chou KC. A new branch of proteomics: prediction of protein cellular attributes. In: Weinrer PW, Lu Q, editors. Gene cloning & expression technologies, Chapter 4. Westborough, MA: Eaton Publishing; 2002. p. 57–70.
- Chou KC. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics. 2005;21:10–19. doi: 10.1093/bioinformatics/bth466.
- Chou KC, Cai YD. Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem. 2002;277:45765–45769. doi: 10.1074/jbc.M204161200.
- Chou KC, Cai YD. Predicting protein quaternary structure by pseudo amino acid composition. PROTEINS: Structure, Function, and Genetics. 2003a;53:282–289. doi: 10.1002/prot.10500.
- Chou KC, Cai YD. Prediction and classification of protein subcellular location: sequence-order effect and pseudo amino acid composition. J Cell Biochem. 2003b;90:1250–1260. doi: 10.1002/jcb.10719.
- Chou KC, Cai YD. Predicting protein structural class by functional domain composition. Biochem Biophys Res Commun. 2004a;321:1007–1009. doi: 10.1016/j.bbrc.2004.07.059.
- Chou KC, Cai YD. Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition. J Cell Biochem. 2004b;91:1197–1203. doi: 10.1002/jcb.10790.
- Chou KC, Cai YD. Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochem Biophys Res Commun. 2004c;320:1236–1239. doi: 10.1016/j.bbrc.2004.06.073.
- Chou KC, Cai YD. Predicting protein localization in budding yeast. Bioinformatics. 2005;21:944–950. doi: 10.1093/bioinformatics/bth466.
- Chou KC, Elrod DW. Protein subcellular location prediction. Protein Engineering. 1999;12:107–118. doi: 10.1093/protein/12.2.107.
- Chou JJ, Zhang CT. A joint prediction of the folding types of 1490 human proteins from their genetic codons. J Theor Biol. 1993;161:251–262. doi: 10.1006/jtbi.1993.1053.
- Chou KC, Zhang CT. Predicting protein folding types by distance functions that make allowances for amino acid interactions. J Biol Chem. 1994;269:22014–22020.
- Chou KC, Zhang CT. Review: Prediction of protein structural classes. Crit Rev Biochem Mol Biol. 1995;30:275–349. doi: 10.3109/10409239509083488.
- Chou KC, Liu W, Maggiora GM, Zhang CT. Prediction and classification of domain structural classes. PROTEINS: Structure, Function, and Genetics. 1998;31:97–103. doi: 10.1002/(SICI)1097-0134(19980401)31:1<97::AID-PROT8>3.0.CO;2-E.
- Gao Y, Shao SH, Xiao X, Ding YS, Huang YS, Huang ZD, Chou KC. Using pseudo amino acid composition to predict protein subcellular location: approached with Lyapunov index, Bessel function, and Chebyshev filter. Amino Acids. 2005;28:373–376. doi: 10.1007/s00726-005-0206-9.
- Gusev VD, Nemytikova LA, Chuzhanova NA. A rapid method for detecting interconnections between functionally and/or evolutionary close biological sequences. Mol Biol (Mosk). 2001;35:1015–1022.
- Haddadnia J, Faez K, Ahmadi M. A neural based human face recognition system using an efficient feature extraction method with pseudo zernike moment. J Circuits, Systems, and Computers. 2002;11:283–304.
- Murphy RF, Boland MV, Velliste M. Towards a systematics for protein subcellular location: quantitative description of protein localization patterns and automated analysis of fluorescence microscope images. Proc Int Conf Intell Syst Mol Biol. 2000;8:251–259.
- Nakai K. Protein sorting signals and prediction of subcellular localization. Adv Protein Chem. 2000;54:277–344. doi: 10.1016/S0065-3233(00)54009-1.
- Pan YX, Zhang ZZ, Guo ZM, Feng GY, Huang ZD, He L. Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach. J Protein Chem. 2003;22:395–402. doi: 10.1023/A:1025350409648.
- Park KJ, Kanehisa M. Prediction of protein subcellular locations by support vector machines using compositions of amino acid and amino acid pairs. Bioinformatics. 2003;19:1656–1663. doi: 10.1093/bioinformatics/btg222.
- Portilla J, Simoncelli EP. A parametric texture model based on joint statistics of complex wavelet coefficients. Int J Comput Vision. 2000;40:49–71. doi: 10.1023/A:1026553619983.
- Wang M, Yang J, Liu GP, Xu ZJ, Chou KC. Weighted-support vector machines for predicting membrane protein types based on pseudo amino acid composition. Protein Eng Des Sel. 2004a;17:509–516. doi: 10.1093/protein/gzh061.
- Wang M, Yang J, Xu ZJ, Chou KC. SLLE for predicting membrane protein types. J Theor Biol. 2004b;232:7–15. doi: 10.1016/j.jtbi.2004.07.023.
- Wang M, Yao JS, Huang ZD, Xu ZJ, Liu GP, Zhao HY, Wang XY, Yang J, Zhu YS, Chou KC. A new nucleotide-composition based fingerprint of SARS-CoV with visualization analysis. Med Chem. 2005;1:39–47. doi: 10.2174/1573406053402505.
- Wolfram S. A new kind of science. Champaign, IL: Wolfram Media Inc.; 2002.
- Xiao X, Shao S, Ding Y, Huang Z, Huang Y, Chou KC. Using complexity measure factor to predict protein subcellular location. Amino Acids. 2005a;28:57–61. doi: 10.1007/s00726-004-0148-7.
- Xiao X, Shao S, Ding Y, Huang Z, Chen X, Chou KC. An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation. J Theor Biol. 2005b;235:555–565. doi: 10.1016/j.jtbi.2005.02.008.
- Xiao X, Shao S, Ding Y, Huang Z, Chen X, Chou KC. Using cellular automata to generate image representation for biological sequences. Amino Acids. 2005c;28:29–35. doi: 10.1007/s00726-004-0154-9.
- Zhou GP. An intriguing controversy over protein structural class prediction. J Protein Chem. 1998;17:729–738. doi: 10.1023/A:1020713915365.
- Zhou GP, Assa-Munt N. Some insights into protein structural class prediction. PROTEINS: Structure, Function, and Genetics. 2001;44:57–59. doi: 10.1002/prot.1071.
- Zhou GP, Doctor K. Subcellular location prediction of apoptosis proteins. PROTEINS: Structure, Function, and Genetics. 2003;50:44–48. doi: 10.1002/prot.10251.
- Zhu SC, Wu Y, Mumford D. Minimax entropy principle and its application to texture modeling. Neural Comput. 1997;9:1627–1660.
- Ziv J, Lempel A. On the complexity of finite sequences. IEEE Trans Inf Theory. 1976;IT-22:75–81.