Genomics. 2005 Jul 25;14(4):897–911. doi: 10.1016/S0888-7543(05)80111-9

A knowledge base for predicting protein localization sites in eukaryotic cells

Kenta Nakai, Minoru Kanehisa
PMCID: PMC7134799  PMID: 1478671

Abstract

To examine large amounts of sequence data for biological function automatically, the interpretation of sequences must be computerized on the basis of empirical knowledge of sequence-function relationships. For this purpose, we have been constructing a knowledge base that organizes various experimental and computational observations as a collection of if-then rules. Here we report an expert system that uses this knowledge base to predict the localization sites of proteins solely from their amino acid sequence and source origin. We collected data for 401 eukaryotic proteins with known localization sites (subcellular or extracellular) and divided them into a training set and a testing set. Fourteen localization sites were distinguished for animal cells and 17 for plant cells. Where sorting signals were not well characterized experimentally, sequence features were derived computationally from the training data. The expert system correctly predicted the localization sites of 66% of the training data and 59% of the testing data. This artificial intelligence approach is powerful and flexible enough to be used in genome analyses.
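To make the idea of a rule-based localization predictor concrete, here is a minimal sketch of an if-then rule engine that takes an amino acid sequence and a source origin and returns a predicted site, plus the fraction of labeled examples predicted correctly. It is not the authors' knowledge base: the four rules, their order, the first-match strategy, the 30-residue window, the 0.25 threshold, and the names `Rule`, `predict`, and `accuracy` are all illustrative assumptions. The motifs are loosely modeled on well-known sorting signals (C-terminal KDEL/HDEL ER retention, a C-terminal PTS1-like tripeptide, a basic cluster as a crude nuclear signal), and the origin-dependent rule shows why plant cells can have more possible sites than animal cells.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    """One if-then rule: IF condition(sequence, origin) THEN site."""
    name: str
    condition: Callable[[str, str], bool]
    site: str


def _st_fraction(seq: str, window: int = 30) -> float:
    """Fraction of Ser/Thr residues in the N-terminal window (assumed size)."""
    head = seq[:window]
    return sum(aa in "ST" for aa in head) / len(head) if head else 0.0


# Illustrative rules only; motifs, thresholds and ordering are assumptions.
# Rules are tried in order and the first one whose condition holds wins.
RULES: List[Rule] = [
    Rule("C-terminal KDEL/HDEL (ER retention)",
         lambda seq, origin: seq.endswith(("KDEL", "HDEL")),
         "endoplasmic reticulum"),
    Rule("C-terminal PTS1-like tripeptide",
         lambda seq, origin: re.search(r"[SAC][KRH]L$", seq) is not None,
         "peroxisome"),
    # Only fires for plant proteins: animal cells have no chloroplasts.
    Rule("Ser/Thr-rich N-terminus in a plant protein",
         lambda seq, origin: origin == "plant" and _st_fraction(seq) > 0.25,
         "chloroplast"),
    Rule("cluster of four basic residues",
         lambda seq, origin: re.search(r"[KR]{4}", seq) is not None,
         "nucleus"),
]

DEFAULT_SITE = "cytoplasm"  # assumed fallback when no rule fires


def predict(seq: str, origin: str) -> Tuple[str, str]:
    """Return (predicted site, name of the rule that fired)."""
    seq = seq.upper()
    for rule in RULES:
        if rule.condition(seq, origin):
            return rule.site, rule.name
    return DEFAULT_SITE, "no rule fired (default)"


def accuracy(examples: List[Tuple[str, str, str]]) -> float:
    """Fraction of (sequence, origin, known_site) examples predicted correctly."""
    if not examples:
        return 0.0
    hits = sum(predict(seq, origin)[0] == site for seq, origin, site in examples)
    return hits / len(examples)


print(predict("MAAAASKL", "animal"))
# prints ('peroxisome', 'C-terminal PTS1-like tripeptide')
```

Returning the name of the rule that fired mirrors a common expert-system design choice: each prediction can be traced back to the piece of knowledge that produced it. Evaluating `accuracy` separately on a training list and a held-out testing list gives the kind of training-versus-testing comparison reported above, although the toy rules here would of course not reproduce those figures.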

Articles from Genomics are provided here courtesy of Elsevier
