Abstract
We describe a database of protein structure alignments as well as methods and tools that use this database to improve comparative protein modeling. The current version of the database contains 105 alignments of similar proteins or protein segments. The database comprises 416 entries, 78,495 residues, 1,233 equivalent entry pairs, and 230,396 pairs of equivalent alignment positions. At present, the main application of the database is to improve comparative modeling by satisfaction of spatial restraints implemented in the program MODELLER (Sali A, Blundell TL, 1993, J Mol Biol 234:779-815). To illustrate the usefulness of the database, the restraints on the conformation of a disulfide bridge provided by an equivalent disulfide bridge in a related structure are derived from the alignments; the prediction success of the disulfide dihedral angle classes is increased to approximately 80%, compared to approximately 55% for modeling that relies on the stereochemistry of disulfide bridges alone. The second example of the use of the database is the derivation of the probability density function for comparative modeling of the cis/trans isomerism of the proline residues; the prediction success is increased from 0% to 82.9% for cis-proline and from 93.3% to 96.2% for trans-proline. The database is available via electronic mail.
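The idea of deriving cis/trans proline restraints from equivalent residues in related structures can be illustrated by a simple counting sketch. The function name `conditional_frequencies`, the two-state labels, and the toy counts below are assumptions made for illustration; they are not taken from the paper, whose probability density function may condition on further features. The sketch only shows how relative frequencies of the target state, given the state of the equivalent template residue, could be tabulated from aligned pairs.

```python
from collections import Counter


def conditional_frequencies(pairs):
    """Estimate P(target state | template state) from aligned residue pairs.

    `pairs` holds (template_state, target_state) tuples, one per pair of
    equivalent proline positions taken from a structure alignment.
    """
    pairs = list(pairs)  # allow generators; we iterate twice
    joint = Counter(pairs)
    totals = Counter(template for template, _ in pairs)
    return {
        (template, target): count / totals[template]
        for (template, target), count in joint.items()
    }


# Hypothetical toy counts for illustration only, NOT data from the paper.
toy_pairs = (
    [("cis", "cis")] * 3
    + [("cis", "trans")] * 1
    + [("trans", "trans")] * 9
    + [("trans", "cis")] * 1
)

table = conditional_frequencies(toy_pairs)
for (template, target), p in sorted(table.items()):
    print(f"P(target={target} | template={template}) = {p:.2f}")
```

In a real application the pairs would be read from the alignment database, and sparse counts would probably need smoothing before being used as restraints.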
Selected References
These references are in PubMed. This may not be the complete list of references from this article.
- A protein sequence/structure database. Protein Engineering Club Database Group. Nature. 1988 Oct 20;335(6192):745–746. doi: 10.1038/335745a0.
- Bernstein F. C., Koetzle T. F., Williams G. J., Meyer E. F., Jr, Brice M. D., Rodgers J. R., Kennard O., Shimanouchi T., Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol. 1977 May 25;112(3):535–542. doi: 10.1016/s0022-2836(77)80200-3.
- Bilofsky H. S., Burks C. The GenBank genetic sequence data bank. Nucleic Acids Res. 1988 Mar 11;16(5):1861–1863. doi: 10.1093/nar/16.5.1861.
- Bowie J. U., Lüthy R., Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science. 1991 Jul 12;253(5016):164–170. doi: 10.1126/science.1853201.
- Browne W. J., North A. C., Phillips D. C., Brew K., Vanaman T. C., Hill R. L. A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen's egg-white lysozyme. J Mol Biol. 1969 May 28;42(1):65–86. doi: 10.1016/0022-2836(69)90487-2.
- Bryant S. H. PKB: a program system and data base for analysis of protein structure. Proteins. 1989;5(3):233–247. doi: 10.1002/prot.340050307.
- Chothia C., Lesk A. M. The relation between the divergence of sequence and structure in proteins. EMBO J. 1986 Apr;5(4):823–826. doi: 10.1002/j.1460-2075.1986.tb04288.x.
- Chothia C. Proteins. One thousand families for the molecular biologist. Nature. 1992 Jun 18;357(6379):543–544. doi: 10.1038/357543a0.
- Chou P. Y., Fasman G. D. Prediction of protein conformation. Biochemistry. 1974 Jan 15;13(2):222–245. doi: 10.1021/bi00699a002.
- Dhanaraj V., Dealwis C. G., Frazao C., Badasso M., Sibanda B. L., Tickle I. J., Cooper J. B., Driessen H. P., Newman M., Aguilar C. X-ray analyses of peptide-inhibitor complexes define the structural basis of specificity for human and mouse renins. Nature. 1992 Jun 11;357(6378):466–472. doi: 10.1038/357466a0.
- Dunbrack R. L., Jr, Karplus M. Backbone-dependent rotamer library for proteins. Application to side-chain prediction. J Mol Biol. 1993 Mar 20;230(2):543–574. doi: 10.1006/jmbi.1993.1170.
- Finkelstein A. V., Reva B. A. A search for the most stable folds of protein chains. Nature. 1991 Jun 6;351(6326):497–499. doi: 10.1038/351497a0.
- Flaherty K. M., DeLuca-Flaherty C., McKay D. B. Three-dimensional structure of the ATPase fragment of a 70K heat-shock cognate protein. Nature. 1990 Aug 16;346(6285):623–628. doi: 10.1038/346623a0.
- George D. G., Barker W. C., Hunt L. T. The protein identification resource (PIR). Nucleic Acids Res. 1986 Jan 10;14(1):11–15. doi: 10.1093/nar/14.1.11.
- Godzik A., Kolinski A., Skolnick J. Topology fingerprint approach to the inverse protein folding problem. J Mol Biol. 1992 Sep 5;227(1):227–238. doi: 10.1016/0022-2836(92)90693-e.
- Gribskov M., McLachlan A. D., Eisenberg D. Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci U S A. 1987 Jul;84(13):4355–4358. doi: 10.1073/pnas.84.13.4355.
- Heinz D. W., Baase W. A., Dahlquist F. W., Matthews B. W. How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme. Nature. 1993 Feb 11;361(6412):561–564. doi: 10.1038/361561a0.
- Holm L., Ouzounis C., Sander C., Tuparev G., Vriend G. A database of protein structure families with common folding motifs. Protein Sci. 1992 Dec;1(12):1691–1698. doi: 10.1002/pro.5560011217.
- Hubbard T. J., Blundell T. L. Comparison of solvent-inaccessible cores of homologous proteins: definitions useful for protein modelling. Protein Eng. 1987 Jun;1(3):159–171. doi: 10.1093/protein/1.3.159.
- Huysmans M., Richelle J., Wodak S. J. SESAM: a relational database for structure and sequence of macromolecules. Proteins. 1991;11(1):59–76. doi: 10.1002/prot.340110108.
- Islam S. A., Sternberg M. J. A relational database of protein structures designed for flexible enquiries about conformation. Protein Eng. 1989 Mar;2(6):431–442. doi: 10.1093/protein/2.6.431.
- Janin J., Wodak S. Conformation of amino acid side-chains in proteins. J Mol Biol. 1978 Nov 5;125(3):357–386. doi: 10.1016/0022-2836(78)90408-4.
- Johnson M. S., Overington J. P., Blundell T. L. Alignment and searching for common protein folds using a data bank of structural templates. J Mol Biol. 1993 Jun 5;231(3):735–752. doi: 10.1006/jmbi.1993.1323.
- Jones D. T., Taylor W. R., Thornton J. M. A new approach to protein fold recognition. Nature. 1992 Jul 2;358(6381):86–89. doi: 10.1038/358086a0.
- Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983 Dec;22(12):2577–2637. doi: 10.1002/bip.360221211.
- Lüthy R., Bowie J. U., Eisenberg D. Assessment of protein models with three-dimensional profiles. Nature. 1992 Mar 5;356(6364):83–85. doi: 10.1038/356083a0.
- Lüthy R., McLachlan A. D., Eisenberg D. Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins. 1991;10(3):229–239. doi: 10.1002/prot.340100307.
- MacArthur M. W., Thornton J. M. Influence of proline residues on protein conformation. J Mol Biol. 1991 Mar 20;218(2):397–412. doi: 10.1016/0022-2836(91)90721-h.
- Manavalan P., Ponnuswamy P. K. Hydrophobic character of amino acid residues in globular proteins. Nature. 1978 Oct 19;275(5681):673–674. doi: 10.1038/275673a0.
- Mottonen J., Strand A., Symersky J., Sweet R. M., Danley D. E., Geoghegan K. F., Gerard R. D., Goldsmith E. J. Structural basis of latency in plasminogen activator inhibitor-1. Nature. 1992 Jan 16;355(6357):270–273. doi: 10.1038/355270a0.
- Orengo C. A., Flores T. P., Taylor W. R., Thornton J. M. Identification and classification of protein fold families. Protein Eng. 1993 Jul;6(5):485–500. doi: 10.1093/protein/6.5.485.
- Overington J. P., Zhu Z. Y., Sali A., Johnson M. S., Sowdhamini R., Louie G. V., Blundell T. L. Molecular recognition in protein families: a database of aligned three-dimensional structures of related proteins. Biochem Soc Trans. 1993 Aug;21(3):597–604. doi: 10.1042/bst0210597.
- Overington J., Donnelly D., Johnson M. S., Sali A., Blundell T. L. Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci. 1992 Feb;1(2):216–226. doi: 10.1002/pro.5560010203.
- Pabo C. O., Suchanek E. G. Computer-aided model-building strategies for protein design. Biochemistry. 1986 Oct 7;25(20):5987–5991. doi: 10.1021/bi00368a023.
- Pascarella S., Argos P. A data bank merging related protein structures and sequences. Protein Eng. 1992 Mar;5(2):121–137. doi: 10.1093/protein/5.2.121.
- Ponder J. W., Richards F. M. Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol. 1987 Feb 20;193(4):775–791. doi: 10.1016/0022-2836(87)90358-5.
- Richardson J. S., Richardson D. C. Amino acid preferences for specific locations at the ends of alpha helices. Science. 1988 Jun 17;240(4859):1648–1652. doi: 10.1126/science.3381086.
- Richardson J. S. The anatomy and taxonomy of protein structure. Adv Protein Chem. 1981;34:167–339. doi: 10.1016/s0065-3233(08)60520-3.
- Richmond T. J., Richards F. M. Packing of alpha-helices: geometrical constraints and contact areas. J Mol Biol. 1978 Mar 15;119(4):537–555. doi: 10.1016/0022-2836(78)90201-2.
- Sali A., Blundell T. L. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993 Dec 5;234(3):779–815. doi: 10.1006/jmbi.1993.1626.
- Sali A., Blundell T. L. Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J Mol Biol. 1990 Mar 20;212(2):403–428. doi: 10.1016/0022-2836(90)90134-8.
- Sali A., Overington J. P., Johnson M. S., Blundell T. L. From comparisons of protein sequences and structures to protein modelling and design. Trends Biochem Sci. 1990 Jun;15(6):235–240. doi: 10.1016/0968-0004(90)90036-b.
- Schrauber H., Eisenhaber F., Argos P. Rotamers: to be or not to be? An analysis of amino acid side-chain conformations in globular proteins. J Mol Biol. 1993 Mar 20;230(2):592–612. doi: 10.1006/jmbi.1993.1172.
- Sippl M. J. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol. 1990 Jun 20;213(4):859–883. doi: 10.1016/s0022-2836(05)80269-4.
- Sowdhamini R., Srinivasan N., Shoichet B., Santi D. V., Ramakrishnan C., Balaram P. Stereochemical modeling of disulfide bridges. Criteria for introduction into proteins by site-directed mutagenesis. Protein Eng. 1989 Nov;3(2):95–103. doi: 10.1093/protein/3.2.95.
- Stewart D. E., Sarkar A., Wampler J. E. Occurrence and role of cis peptide bonds in protein structures. J Mol Biol. 1990 Jul 5;214(1):253–260. doi: 10.1016/0022-2836(90)90159-J.
- Sutcliffe M. J., Haneef I., Carney D., Blundell T. L. Knowledge based modelling of homologous proteins, Part I: Three-dimensional frameworks derived from the simultaneous superposition of multiple structures. Protein Eng. 1987 Oct-Nov;1(5):377–384. doi: 10.1093/protein/1.5.377.
- Taylor W. R. Towards protein tertiary fold prediction using distance and motif constraints. Protein Eng. 1991 Dec;4(8):853–870. doi: 10.1093/protein/4.8.853.
- Thornton J. M. Disulphide bridges in globular proteins. J Mol Biol. 1981 Sep 15;151(2):261–287. doi: 10.1016/0022-2836(81)90515-5.
- Topham C. M., McLeod A., Eisenmenger F., Overington J. P., Johnson M. S., Blundell T. L. Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. J Mol Biol. 1993 Jan 5;229(1):194–220. doi: 10.1006/jmbi.1993.1018.
- Wilmot C. M., Thornton J. M. Beta-turns and their distortions: a proposed new nomenclature. Protein Eng. 1990 May;3(6):479–493. doi: 10.1093/protein/3.6.479.
- Zhu Z. Y., Sali A., Blundell T. L. A variable gap penalty function and feature weights for protein 3-D structure comparisons. Protein Eng. 1992 Jan;5(1):43–51. doi: 10.1093/protein/5.1.43.