Protein & Cell. 2011 Dec 19;3(1):38–43. doi: 10.1007/s13238-011-1130-2

SySAP: a system-level predictor of deleterious single amino acid polymorphisms

Tao Huang 1,2, Chuan Wang 1, Guoqing Zhang 1, Lu Xie 2, Yixue Li 1,2
PMCID: PMC4875213  PMID: 22183811

Abstract

Single amino acid polymorphisms (SAPs), also known as non-synonymous single nucleotide polymorphisms (nsSNPs), are responsible for most human genetic diseases. Discriminating deleterious SAPs from neutral ones can help identify disease genes and clarify disease mechanisms. In this work, we established a system-level method for predicting deleterious SAPs. Unlike most existing methods, our method considers not only sequence and structure information but also network information. Integrating network information can improve the performance of deleterious SAP prediction. To make our method publicly available, we developed SySAP (a System-level predictor of deleterious Single Amino acid Polymorphisms), an easy-to-use and highly accurate web server. SySAP is freely available at http://www.biosino.org/SySAP/ and http://lifecenter.sgst.cn/SySAP/.

Keywords: deleterious single amino acid polymorphisms, predictor, web server

Contributor Information

Lu Xie, Email: xielu@scbit.org.

Yixue Li, Email: yxli@sibs.ac.cn.


Articles from Protein & Cell are provided here courtesy of Oxford University Press
