Abstract
Macromolecules carrying biological information often consist of independent modules containing recurring structural motifs. Detection of a specific structural motif within a protein (or DNA) aids in elucidating the role played by the protein (DNA element) and the mechanism of its operation. The number of crystallographically known structures at high resolution is increasing very rapidly. Yet, comparison of three-dimensional structures is a laborious time-consuming procedure that typically requires a manual phase. To date, there is no fast automated procedure for structural comparisons. We present an efficient O(n³) worst case time complexity algorithm for achieving such a goal (where n is the number of atoms in the examined structure). The method is truly three-dimensional, sequence-order-independent, and thus insensitive to gaps, insertions, or deletions. This algorithm is based on the geometric hashing paradigm, which was originally developed for object recognition problems in computer vision. It introduces an indexing approach based on transformation invariant representations and is especially geared toward efficient recognition of partial structures in rigid objects belonging to large data bases. This algorithm is suitable for quick scanning of structural data bases and will detect a recurring structural motif that is a priori unknown. The algorithm uses protein (or DNA) structures, atomic labels, and their three-dimensional coordinates. Additional information pertaining to the structure speeds the comparisons. The algorithm is straightforwardly parallelizable, and several versions of it for computer vision applications have been implemented on the massively parallel Connection Machine. A prototype version of the algorithm has been implemented and applied to the detection of substructures in proteins.
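The indexing idea described in the abstract can be illustrated with a minimal sketch of generic geometric hashing for rigid three-dimensional point sets. This is not the authors' implementation, and it does not attempt to reproduce the O(n³) bound stated above: for simplicity it enumerates every ordered triplet of model points as a reference frame. The function names (`frame`, `key`, `build_table`, `vote`), the cell size of 0.5, and the toy coordinates are all assumptions made for illustration; atomic labels are ignored.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(dot(v, v))
    return None if n < 1e-9 else (v[0] / n, v[1] / n, v[2] / n)

def frame(p0, p1, p2):
    """Right-handed orthonormal frame built on three points (None if degenerate)."""
    x = unit(sub(p1, p0))
    if x is None:
        return None
    z = unit(cross(x, sub(p2, p0)))
    if z is None:          # the three points are collinear
        return None
    return p0, (x, cross(z, x), z)

def key(point, fr, cell):
    """Quantized coordinates of `point` in frame `fr`; unchanged by rotation and translation."""
    origin, axes = fr
    d = sub(point, origin)
    return tuple(round(dot(d, a) / cell) for a in axes)

def build_table(models, cell=0.5):
    """Preprocessing: index every model point under every ordered basis triplet."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            fr = frame(*(pts[i] for i in basis))
            if fr is None:
                continue
            for m, p in enumerate(pts):
                if m not in basis:
                    table[key(p, fr, cell)].append((name, basis))
    return table

def vote(table, target, basis, cell=0.5):
    """Recognition: express the target in one basis and vote for (model, basis) entries."""
    votes = Counter()
    fr = frame(*(target[i] for i in basis))
    if fr is None:
        return votes
    for m, p in enumerate(target):
        if m not in basis:
            for entry in set(table.get(key(p, fr, cell), ())):
                votes[entry] += 1
    return votes

# Toy data (hypothetical coordinates): a six-point motif.
motif = [(0, 0, 0), (2, 0, 0), (0, 2, 0),
         (1.1, 0.9, 2.2), (3.1, 1.2, -0.9), (-1.2, 2.1, 1.1)]

def move(p):
    """Rigid motion: 90-degree rotation about z followed by a translation."""
    x, y, z = p
    return (-y + 5.0, x - 3.0, z + 1.0)

# The target holds the moved motif in reversed order plus two unrelated points.
target = [(9.0, 9.0, 9.0)] + [move(p) for p in reversed(motif)] + [(-4.0, 7.5, 2.0)]

table = build_table({"motif": motif})
votes = vote(table, target, basis=(6, 5, 4))  # target points matching motif points 0, 1, 2
print(votes[("motif", (0, 1, 2))])
```

Because the key depends only on coordinates expressed in a frame attached to the points themselves, the reversed ordering and the two extra target points do not prevent the three remaining motif points from voting for the correct model basis. A fuller pipeline would typically try several target bases and verify high-scoring candidates by superposition; that step is omitted here.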
Full text
![10495](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/050d/52955/4fe8454a1631/pnas01073-0130.png)
![10496](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/050d/52955/35e13195195a/pnas01073-0131.png)
![10497](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/050d/52955/a7cda3f7249c/pnas01073-0132.png)
![10498](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/050d/52955/e0b443534622/pnas01073-0133.png)
![10499](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/050d/52955/338c5d9b7c69/pnas01073-0134.png)