Protein Science : A Publication of the Protein Society. 1995 May;4(5):872–884. doi: 10.1002/pro.5560040507

Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions.

A S Siddiqui, G J Barton
PMCID: PMC2143117  PMID: 7663343

Abstract

An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located. The algorithm was applied to a nonredundant database of 230 protein structures, and the results were compared with domain definitions obtained from the literature or by inspection of the coordinates on molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions; 18% show minor differences, and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions. The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.
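The idea of locating domains made of one or two continuous chain segments can be illustrated with a small sketch. The abstract does not state the scoring function, so everything below is an assumption made for illustration: the contact-ratio score, the 5.0 distance cutoff, the `min_len` limit, and all function names are hypothetical and should not be read as the authors' implementation. The sketch takes one representative point per residue, tries every single cut point (one segment per domain), then every excised internal segment (leaving a two-segment domain), and keeps the partition with many contacts inside each part and few between them.

```python
from itertools import combinations
from math import dist


def contact_pairs(coords, cutoff=5.0):
    """Return index pairs (i, j), i < j, whose points lie within `cutoff`."""
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if dist(coords[i], coords[j]) <= cutoff]


def split_score(pairs, in_a):
    """Assumed score: (internal A / interface) * (internal B / interface)."""
    int_a = int_b = ext = 0
    for i, j in pairs:
        if in_a[i] and in_a[j]:
            int_a += 1
        elif not in_a[i] and not in_a[j]:
            int_b += 1
        else:
            ext += 1
    ext = max(ext, 1)  # guard against division by zero when parts never touch
    return (int_a / ext) * (int_b / ext)


def best_continuous_split(coords, min_len=3, cutoff=5.0):
    """One cut point k: part A is residues [0, k), part B is [k, n)."""
    n = len(coords)
    pairs = contact_pairs(coords, cutoff)
    best = (0.0, None)
    for k in range(min_len, n - min_len + 1):
        score = split_score(pairs, [i < k for i in range(n)])
        if score > best[0]:
            best = (score, k)
    return best


def best_discontinuous_split(coords, min_len=3, cutoff=5.0):
    """Two cut points: part B is residues [s, e); part A is the two flanks."""
    n = len(coords)
    pairs = contact_pairs(coords, cutoff)
    best = (0.0, None)
    for s in range(min_len, n - 2 * min_len + 1):
        for e in range(s + min_len, n - min_len + 1):
            in_a = [not (s <= i < e) for i in range(n)]
            score = split_score(pairs, in_a)
            if score > best[0]:
                best = (score, (s, e))
    return best


if __name__ == "__main__":
    # Toy data: two well-separated clusters, not real protein coordinates.
    near = [(0.5 * i, 0.0, 0.0) for i in range(8)]
    far = [(100.0 + 0.5 * i, 0.0, 0.0) for i in range(6)]
    print(best_continuous_split(near[:6] + far))
    print(best_discontinuous_split(near[:4] + far + near[4:]))
```

In the toy input, the second call places the far cluster in the middle of the chain, so the near cluster forms a domain built from two separate segments. A real application would read residue coordinates from a structure file and would need a threshold on the score to decide whether a protein should be split at all.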



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
