Abstract
Dihedral angles of amino acids are of considerable importance in protein tertiary structure prediction because they define the backbone of a protein and hence almost define the protein's entire conformation. Most ab initio protein structure prediction methods predict the secondary structure of a protein before predicting the tertiary structure, because the three-dimensional fold consists of repeating units of secondary structure. Hence, both dihedral angles and secondary structures are important in tertiary structure prediction of proteins. Here we describe a database called DASSD (Dihedral Angle and Secondary Structure Database of Short Amino acid Fragments) that contains dihedral angle values and secondary structure details of short amino acid fragments of lengths 1, 3 and 5. Information stored in this database was extracted from a set of 5,227 non-redundant, high-resolution (better than 2 angstroms) protein structures. In total, DASSD stores details for about 733,000 fragments. The database is intended for the development of ab initio protein structure prediction methods that use fragment libraries and fragment assembly techniques. It is also useful in protein secondary structure prediction.
Availability
DASSD can be accessed and downloaded from http://www.cs.rmit.edu.au/dassd/
Keywords: proteins, dihedral angles, secondary structure, tertiary structure, fragments
Background
Proteins are macromolecules that play a central role in the functioning of living organisms. They are built from 20 different types of amino acids joined in different combinations. Proteins are active only in their native, folded state, and predicting protein structure from sequence remains challenging. [1] Each amino acid residue in a protein contributes two backbone dihedral angles, phi and psi. Dihedral angles are of considerable importance in protein structure prediction because they define the backbone of a protein, which together with the side chains defines the entire protein conformation. [2] The three-dimensional tertiary structure of a protein consists of repeating units of secondary structure. The two major types of secondary structure are alpha helices and beta strands. Most ab initio protein structure prediction methods predict secondary structural elements as a starting point for predicting the tertiary structure of proteins. [3-5] Hence, both dihedral angles and secondary structure details of proteins are important in ab initio protein tertiary structure prediction.
Calculating dihedral angle values for a single known structure downloaded from the Protein Data Bank (PDB) is straightforward. However, calculating all occurring dihedral angle values of a short amino acid fragment across all known structures available at the PDB is computationally intensive. A protein structure prediction program that uses fragment assembly techniques would issue hundreds of queries to obtain dihedral angle values for different short amino acid fragments. Thus, a precomputed database of dihedral angles of short fragments is critical.
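As an illustration of the per-structure calculation mentioned above, the following Python sketch computes a signed dihedral angle from four atom positions and applies it to the standard backbone definitions (phi: C of residue i-1, N, CA, C of residue i; psi: N, CA, C of residue i, N of residue i+1). It is not the code used to build DASSD. The function names are hypothetical, and coordinates are assumed to be plain (x, y, z) tuples already parsed from a structure file.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    Uses the atan2 form, which is numerically stable near 0 and 180 degrees.
    Degenerate (collinear) input yields 0.0 rather than an error.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """Phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """Psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

The first residue of a chain has no phi and the last has no psi, so a caller would need to skip or flag those positions.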
Methodology of development
This article describes the dihedral angle and secondary structure database (DASSD) of short amino acid peptide fragments of lengths 1, 3 and 5 derived from the Protein Data Bank (PDB). For each fragment, all occurring dihedral angles of the middle amino acid residue and its STRIDE secondary structure classification [6] were extracted from a set of 5,227 non-redundant, high-resolution (better than 2 angstroms) protein structures. The non-redundant dataset (cullpdb_pc90_res2.0_R0.25) was obtained from the PISCES protein sequence culling server. [8] Although many protein secondary structure assignment programs exist, as described elsewhere [7], we chose the widely used STRIDE for this analysis [7], which has been shown to perform better than DSSP. [6]
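The extraction step described above can be sketched as a sliding window over each chain: for every window of length 1, 3 or 5, the phi, psi and secondary structure code of the middle residue are filed under the fragment's sequence. The Python below is a minimal sketch under stated assumptions, not the authors' pipeline. The function name `index_fragments`, the tuple layout `(residue_name, phi, psi, ss_code)` and the dash-joined key format (modeled on the fragment label GLU-ALA-LEU) are all hypothetical, and chain breaks are assumed to have been handled before this step.

```python
from collections import defaultdict

def index_fragments(chain, lengths=(1, 3, 5)):
    """Map each fragment sequence to the observations of its middle residue.

    chain: list of (residue_name, phi, psi, ss_code) tuples in sequence order.
           phi or psi may be None where undefined (chain termini).
    Returns a dict: "GLU-ALA-LEU" -> [(phi, psi, ss_code), ...]
    """
    index = defaultdict(list)
    for n in lengths:
        half = n // 2
        # Only positions with `half` residues on both sides have a full window.
        for mid in range(half, len(chain) - half):
            window = chain[mid - half: mid + half + 1]
            key = "-".join(res[0] for res in window)
            _, phi, psi, ss = chain[mid]
            if phi is None or psi is None:
                continue  # middle residue lacks a complete (phi, psi) pair
            index[key].append((phi, psi, ss))
    return index

# Hypothetical five-residue chain with made-up angle values.
example_chain = [
    ("MET", None, 150.0, "C"),
    ("GLU", -60.0, -45.0, "H"),
    ("ALA", -62.0, -41.0, "H"),
    ("LEU", -65.0, -40.0, "H"),
    ("LYS", -70.0, None, "C"),
]
example_index = index_fragments(example_chain)
```

Running the same function over every chain in a dataset and merging the results would accumulate all occurrences of each fragment, which is the kind of lookup a fragment assembly program would query repeatedly.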
DASSD contains the dihedral angles (phi and psi), the phi distribution, the psi distribution, the STRIDE secondary structure classification and the Ramachandran plot of the middle residues in about 733,000 different amino acid peptide fragments. Information stored in this database would assist ab initio protein tertiary structure prediction methods that use fragment libraries and fragment assembly techniques. It is also useful in protein secondary structure prediction. Information in DASSD is stored as flat files. PHP is used to extract the information for each query from the database. The extracted results are displayed using a Java applet, as shown in Figure 1.
Figure 1.
Screen shot of DASSD showing the dihedral angles distribution and the STRIDE secondary structure classifications for the amino acid fragment GLU-ALA-LEU
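To make the idea of a phi or psi distribution concrete, the sketch below bins a list of angles (in degrees) into equal-width bins and counts STRIDE codes for a fragment. It is only an illustration: the 30-degree bin width, the function names and the sample values are assumptions and do not describe how DASSD computes or displays its distributions.

```python
from collections import Counter

def angle_histogram(angles, bin_width=30):
    """Count angles (degrees, -180..180) in equal-width bins starting at -180."""
    if 360 % bin_width != 0:
        raise ValueError("bin_width must divide 360 evenly")
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for a in angles:
        # +180 and -180 are the same angle, so 180 wraps into the first bin.
        idx = int((a + 180.0) // bin_width) % n_bins
        counts[idx] += 1
    return counts

def ss_counts(observations):
    """Count secondary structure codes in (phi, psi, ss_code) observations."""
    return Counter(ss for _, _, ss in observations)

# Made-up observations for one fragment.
sample = [(-60.0, -45.0, "H"), (-65.0, -40.0, "H"), (-120.0, 130.0, "E")]
phi_hist = angle_histogram([phi for phi, _, _ in sample])
ss_hist = ss_counts(sample)
```

A narrower bin width gives a finer distribution at the cost of sparser counts for rare fragments.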
Utility to the biological community
Many ab initio structure prediction methods use secondary structure prediction as a starting point for predicting the tertiary structure and then apply fragment assembly techniques, in which a library of fragments is generated and the protein's tertiary structure is built from it. [3-5, 9, 10] The proposed database of dihedral angles and secondary structures of short amino acid fragments would assist such ab initio methods, either as a fragment library itself or in building new fragment libraries. The database can also be used to analyze the dihedral angles and secondary structure properties of an amino acid in relation to its neighboring amino acids. Such analysis would help in assigning structures to amino acids in tertiary structure prediction and in protein loop structure prediction. The secondary structures of fragments in DASSD can also be used in secondary structure prediction.
Future development
The current database contains dihedral angle and secondary structure details for peptide fragments of up to 5 residues. We plan to update and improve the database using a more representative dataset of structures and to extend it to peptide fragments longer than 5 residues. Future developments will also facilitate visual comparison of the dihedral angle distributions and secondary structures of different fragments.
Acknowledgments
We thank the Victorian Partnership for Advanced Computing (VPAC), Melbourne, Australia, for providing access to their parallel computing facility. We also thank Nathan Hall, The Ludwig Institute for Cancer Research, Melbourne, Australia, for his valuable advice.
Footnotes
Citation: Dayalan et al., Bioinformation 1(3): 78-80 (2006)
References
- 1. Ginalski K, et al. Nucleic Acids Res. 2005;33:1874. doi: 10.1093/nar/gki327.
- 2. Esposito L, et al. J Mol Biol. 2005;347:483. doi: 10.1016/j.jmb.2005.01.065.
- 3. Rohl CA, et al. Methods Enzymol. 2004;383:66. doi: 10.1016/S0076-6879(04)83004-0.
- 4. Lee J, et al. Biophys Chem. 2005;115:209. doi: 10.1016/j.bpc.2004.12.046.
- 5. Lee J, et al. Proteins. 2004;56:704. doi: 10.1002/prot.20150.
- 6. Frishman D, Argos P. Proteins. 1995;23:566. doi: 10.1002/prot.340230412.
- 7. Martin J, et al. BMC Struct Biol. 2005;15:17. doi: 10.1186/1472-6807-5-17.
- 8. Wang G, Dunbrack RL. Bioinformatics. 2003;19:1589. doi: 10.1093/bioinformatics/btg224.
- 9. Jones DT, et al. Proteins. 2005 Epub ahead of print.
- 10. Karplus K, et al. Proteins. 2005 Epub ahead of print.