Protein Science : A Publication of the Protein Society. 1995 Mar;4(3):506–520. doi: 10.1002/pro.5560040317

An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins.

R Sowdhamini, T L Blundell
PMCID: PMC2143076  PMID: 7795532

Abstract

With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for the construction of databases. Domains are considered to be clusters of secondary structure elements. Thus, helices and strands are first clustered using intersecondary structural distances between Cα positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables the automatic identification and classification into disjoint, interacting, and conjoint domains. Application to a database of 83 protein families and 18 unique structures shows that the approach provides an effective delineation of boundaries and identifies those proteins that can be considered as a single domain. A quantitative estimate of the interaction between domains has been proposed. The database of protein domains is a useful tool for understanding protein folding, for recognizing protein folds, and for understanding structure-activity relationships.
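
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the exact distance measure, the linkage rule, or the disjoint factor, so the sketch assumes the mean pairwise Cα distance between two elements and average-linkage merging, and it omits the disjoint factor entirely. The names `sse_distance`, `cluster_sses`, and the toy coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def sse_distance(a, b):
    """Mean pairwise distance between the C-alpha coordinates of two
    secondary structure elements (an assumed measure, not the paper's)."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def cluster_sses(sses):
    """Average-linkage agglomerative clustering of secondary structure elements.

    sses maps an element name to a list of (x, y, z) C-alpha coordinates.
    Returns the merges in order as (members_a, members_b, distance),
    which is a dendrogram in flat form.
    """
    names = list(sses)
    # Precompute element-to-element distances once.
    base = {
        frozenset(pair): sse_distance(sses[pair[0]], sses[pair[1]])
        for pair in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average of all element-to-element distances across two clusters.
        total = sum(base[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters at each step.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy input (invented coordinates): two elements near the origin and
# two elements about 30 angstroms away along x.
toy = {
    "H1": [(0, 0, 0), (1.5, 0, 0), (3, 0, 0)],
    "E1": [(0, 5, 0), (1.5, 5, 0), (3, 5, 0)],
    "H2": [(30, 0, 0), (31.5, 0, 0), (33, 0, 0)],
    "E2": [(30, 5, 0), (31.5, 5, 0), (33, 5, 0)],
}
merges = cluster_sses(toy)
top_a, top_b, top_d = merges[-1]  # last merge = top split of the dendrogram
print(sorted(top_a), sorted(top_b), round(top_d, 1))
```

In this toy case the last merge joins two groups of elements at a much larger distance than any earlier merge, which is the kind of separation a dendrogram-based method could read as two candidate domains. How the paper decides whether such a split is disjoint, interacting, or conjoint depends on its disjoint factor, which is not reproduced here.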



