Abstract
The FSSP database and its new supplement, the Dali Domain Dictionary, present a continuously updated classification of all known 3D protein structures. The classification is derived using an automatic structure alignment program (Dali) for the all-against-all comparison of structures in the Protein Data Bank. From the resulting enumeration of structural neighbours (which form a surprisingly continuous distribution in fold space) we derive a discrete fold classification in three steps: (i) sequence-related families are covered by a representative set of protein chains; (ii) protein chains are decomposed into structural domains based on the recurrence of structural motifs; (iii) folds are defined as tight clusters of domains in fold space. The fold classification, domain definitions and test sets for sequence-structure alignment (threading) are accessible on the web at www.embl-ebi.ac.uk/dali. The web interface provides a rich network of links between neighbours in fold space, between domains and proteins, and between structures and sequences, leading, for example, to a database of explicit multiple alignments of protein families in the twilight zone of sequence similarity. The Dali/FSSP organization of protein structures provides a map of the currently known regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination.
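As an illustration only, steps (i) and (iii) of the classification can be sketched on toy data. This is not the Dali/FSSP implementation: the function names, the `related` predicate, the similarity scores and the cutoff are all hypothetical, and single-linkage grouping on a score threshold stands in for whatever clustering criterion the authors use. Step (ii), domain decomposition, is omitted because the abstract does not describe it in enough detail to sketch.

```python
from itertools import combinations


def pick_representatives(chains, related):
    """Step (i), simplified: greedily keep a chain only if it is not
    sequence-related to any chain already kept."""
    reps = []
    for chain in chains:
        if not any(related(chain, rep) for rep in reps):
            reps.append(chain)
    return reps


def cluster_by_threshold(items, score, cutoff):
    """Step (iii), simplified: single-linkage clustering that joins two
    items whenever their pairwise similarity score reaches the cutoff."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if score(a, b) >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy input: chains A and B belong to the same sequence family.
related_pairs = {frozenset(("A", "B"))}


def related(a, b):
    return frozenset((a, b)) in related_pairs


reps = pick_representatives(["A", "B", "C", "D"], related)

# Hypothetical structural similarity scores between four domains.
toy_scores = {
    frozenset(("d1", "d2")): 9.0,
    frozenset(("d3", "d4")): 8.5,
    frozenset(("d2", "d3")): 1.2,
}


def score(a, b):
    return toy_scores.get(frozenset((a, b)), 0.0)


folds = cluster_by_threshold(["d1", "d2", "d3", "d4"], score, cutoff=2.0)
print(reps)
print(folds)
```

With these toy inputs, chain B is dropped as redundant with A, and the four domains fall into two groups because only the d1/d2 and d3/d4 scores reach the cutoff. A real all-against-all comparison would supply the scores from structure alignment rather than from a hand-written table.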