Author manuscript; available in PMC: 2008 Sep 26.
Published in final edited form as: Bioinformatics. 2005 Mar 29;21(11):2787–2788. doi: 10.1093/bioinformatics/bti408

GOR V server for protein secondary structure prediction

Taner Z Sen 1,2, Robert L Jernigan 1,2, Jean Garnier 3, Andrzej Kloczkowski 1
PMCID: PMC2553678  NIHMSID: NIHMS67487  PMID: 15797907

Summary

We have created the GOR V web server for protein secondary structure prediction. The GOR V algorithm combines information theory, Bayesian statistics and evolutionary information. In its fifth version, the GOR method reached (with the full jack-knife procedure) an accuracy of prediction Q3 of 73.5%. Although GOR V has been among the most successful methods, its online unavailability has been a deterrent to its popularity. Here, we remedy this situation by creating the GOR V server.

INTRODUCTION

Structural information can provide insight into protein function, and therefore, high-accuracy prediction of protein structure from its sequence is highly desirable. The availability of structural information may expedite drug design efforts and provide a more detailed understanding of protein–protein interaction networks. Secondary structure prediction methods are also useful for motif detection in globular (Rost, 2001) and membrane proteins (Chen and Rost, 2002; Chen et al., 2002), or for enhancing homology modeling (Schwede et al., 2003).

The protein secondary structure prediction problem has been studied intensively by many research groups for over three decades. The first prediction methods, developed by Chou and Fasman (1974), Lim (1974a,b) and Garnier et al. (1978), reached an accuracy of ~60%. Some of the most successful recent methods based on neural networks, such as PHD (Rost, 2003) and PSIPRED (Jones, 1999), reported accuracies above 76%. Frishman and Argos (1997) reached an accuracy of 74.8% using PREDATOR. Some secondary structure prediction methods have even reached an accuracy of 77% (Levin and Garnier, 1988; Petersen et al., 2000). Support vector machines (Hua and Sun, 2001) and, more recently, fragment databases (Cheng et al., 2005) have also been used successfully for secondary structure prediction. Although the GOR V method is around 5% less accurate (when Q3 values are compared) than the widely used neural-network-based methods PHD and PSIPRED, it may provide complementary information because it is based on different approaches, namely information theory and Bayesian statistics.

Secondary structure predictions are usually compared with DSSP (Kabsch and Sander, 1983) assignments of secondary structure from crystallographically determined coordinates. Although DSSP defines eight different structural elements, these eight states are commonly translated into three secondary structure states: α-helix, β-sheet and coil. This translation is usually performed in the following manner: (1) α-helix in the 3-letter code corresponds to H (α-helix), G (3₁₀ helix) and I (π-helix) from the DSSP 8-letter code; (2) sheet corresponds to B (bridge, a single-residue sheet) and E (extended β-strand) in DSSP nomenclature; and finally, (3) coil in the 3-letter code corresponds to the remaining three DSSP states: T (β-turn), S (bend) and C (coil). In the GOR V output, α-helix is represented by the letter H, β-sheet by E and coil by C.
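The eight-to-three state translation described above can be sketched in a few lines of Python. This is only an illustration of the mapping as stated in the text; the function name `reduce_dssp` is hypothetical, and treating a blank or `-` as coil is an assumption about how coil is written in DSSP output.

```python
# Mapping from the DSSP 8-letter code to the 3-letter code used in the text.
DSSP_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helices -> H
    "E": "E", "B": "E",             # strand and bridge -> E
    "T": "C", "S": "C", "C": "C",   # turn, bend, coil -> C
}

def reduce_dssp(dssp):
    """Translate a DSSP 8-state string into a 3-state (H/E/C) string."""
    reduced = []
    for ch in dssp.upper():
        # Assumption: a blank or '-' stands for the coil state.
        if ch in " -":
            ch = "C"
        # Unknown symbols raise KeyError rather than being silently mapped.
        reduced.append(DSSP_TO_THREE[ch])
    return "".join(reduced)

print(reduce_dssp("HGIEBTSC"))
```

Any symbol outside the eight DSSP states raises an error, which makes malformed assignment strings easy to detect.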

IMPLEMENTATION

The GOR (Garnier–Osguthorpe–Robson) method uses both information theory and Bayesian statistics for predicting the secondary structure of proteins (Garnier et al., 1978). Over the years, the method has been improved by including larger databases and more detailed statistics, which account not only for amino acid composition but also for amino acid pairs and triplets (Garnier and Robson, 1989; Garnier et al., 1996; Gibrat et al., 1987). These changes were gradually integrated into the first four versions of GOR. The fourth version of the GOR algorithm, GOR IV, has been available online for many years at http://abs.cit.nih.gov/gor/. However, since GOR IV does not utilize evolutionary information, its accuracy measured by Q3 is ~65%, similar to that of other single-sequence prediction methods.

In the most recent GOR version, GOR V (Kloczkowski et al., 2002), several additional improvements were incorporated into the prediction methodology. The most crucial change in the algorithm was the inclusion of evolutionary information using PSI-BLAST (Altschul et al., 1997). Multiple alignments are generated using PSI-BLAST after five iterations based on the non-redundant database (Benson et al., 1999). The idea behind incorporating multiple sequence alignments into GOR is to increase the information content for improved discrimination among secondary structures. Note that only the sequence information of these multiple alignments is being used in GOR V, and not the secondary structure information of these aligned sequences. In GOR V, the prediction accuracy Q3 using full jackknifing reached 73.5%. The segment overlap (Zemla et al., 1999), which is a measure of normalized secondary structure segments, was 70.8%.
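The Q3 measure quoted above is the percentage of residues whose predicted three-state assignment matches the reference (DSSP-derived) assignment. A minimal sketch, assuming both assignments are given as equal-length strings over H, E and C (the function name `q3` is hypothetical):

```python
def q3(predicted, observed):
    """Per-residue three-state accuracy, in percent."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(round(q3("HHHECC", "HHHEEC"), 1))
```

Q3 scores each residue independently, so it does not capture how well whole segments are reproduced; that is what the segment overlap measure addresses.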

Although GOR V is one of the more accurate secondary structure prediction methods, it has not been publicly available for users to run on their own sequences. We have therefore set up the GOR V server, which is available to everyone.

Since GOR V is based on completely different principles (such as information theory) than most of the other secondary structure prediction methods, we believe that its inclusion on metaservers for secondary structure prediction would be beneficial, and could improve the overall accuracy of the prediction of metaservers. In our recent work on protein binding site prediction (Sen et al., 2004), we have combined several orthogonal methods, such as support vector machines, threading, conservatism of conservatism and phylogenic trees, and developed a consensus method that has an accuracy of prediction better than each of the individual methods. This shows that consensus predictions benefit from the inclusion of predictions that are not perfect but based on fundamentally different principles.

The GOR V server is based on the database of Cuff and Barton (1999, 2000) of 513 sequentially non-redundant domains, which contains 84 107 residues. To ensure that such a set was representative of available proteins, non-redundancy was defined with stringent tests. Instead of employing a simple percentage of sequence identity between pairs of proteins, a range of sequence alignments and subsequent clusterings were performed. After randomization, only the aligned sequences with a Z-score <5 were considered dissimilar to hinder homology among sequences. Details of this data set can be found in Cuff and Barton (1999, 2000).

The GOR V server works in the following manner. When the user provides an input sequence, the GOR V server, which was trained on 513 proteins, calculates the helix, sheet and coil probabilities at each residue position and makes an initial prediction based on the structural state with the highest probability. After this initial prediction, heuristic rules are applied. These rules include converting helices shorter than five residues and sheets shorter than two residues to coil. For a more detailed discussion of these heuristic rules, please refer to the original GOR V paper (Kloczkowski et al., 2002). As output, the user receives the secondary structure prediction for the input sequence and the probabilities of each secondary structure state at each position. The prediction results are shown in the web browser, which should stay open during the run, and are also sent to the e-mail address previously provided by the user. Any run-time error message will appear in the web browser, and if any problem arises, the user can contact the system administrator via the e-mail address provided on the web page.
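The two post-probability steps described above (choosing the most probable state at each position, then converting short helices and sheets to coil) can be sketched as follows. This is an illustrative sketch only, not the server's Fortran code: the function names are hypothetical, the input is assumed to be a list of (helix, sheet, coil) probability triples, and only the two length rules named in the text are applied in a single pass. The full set of heuristic rules is given in Kloczkowski et al. (2002).

```python
def initial_prediction(probs):
    """Pick the most probable state at each position.

    probs: list of (p_helix, p_sheet, p_coil) tuples (assumed input format).
    Ties are resolved in the order H, E, C (an arbitrary choice).
    """
    states = "HEC"
    return "".join(states[max(range(3), key=lambda k: p[k])] for p in probs)

def apply_length_rules(pred, min_helix=5, min_sheet=2):
    """Convert helices shorter than min_helix and sheets shorter than
    min_sheet residues to coil, scanning the prediction once."""
    out = list(pred)
    i = 0
    while i < len(pred):
        # Find the end of the run of identical states starting at i.
        j = i
        while j < len(pred) and pred[j] == pred[i]:
            j += 1
        run = j - i
        too_short = (pred[i] == "H" and run < min_helix) or \
                    (pred[i] == "E" and run < min_sheet)
        if too_short:
            out[i:j] = ["C"] * run
        i = j
    return "".join(out)

raw = initial_prediction([(0.6, 0.3, 0.1), (0.2, 0.5, 0.3), (0.1, 0.2, 0.7)])
print(raw, apply_length_rules("HHHHCCECCEEHHHHH"))
```

In the example string, the four-residue helix and the isolated strand residue become coil, while the two-residue sheet and the five-residue helix are kept.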

For a sequence of 100 amino acids, the secondary structure prediction takes ~1 min. The most time-consuming steps, however, are the PSI-BLAST alignments, which in some cases (many hits and slowly converging iterations) may take considerable time. We have also successfully tested the GOR V server on sequences of up to 300 amino acids. Currently, the server is a Linux box running RedHat Enterprise 3.0 with 4.5 GB RAM and 140 GB memory. The program code is compiled using the Intel Fortran Compiler 8.0.034, and the web interface is a CGI script written in HTML and Perl. In the future, we will enhance the GOR V server in both hardware and software for improved performance, especially if user demand necessitates it.

Acknowledgments

T.Z.S., R.L.J. and A.K. were supported by NIH grants R01GM072014 and R21GM066387.

References

  1. Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  2. Benson DA, et al. GenBank. Nucleic Acids Res. 1999;27:12–17. doi: 10.1093/nar/27.1.12.
  3. Chen CP, Rost B. State-of-the-art in membrane protein prediction. Appl Bioinformatics. 2002;1:21–35.
  4. Chen CP, et al. Transmembrane helix predictions revisited. Protein Sci. 2002;11:2774–2791. doi: 10.1110/ps.0214502.
  5. Cheng H, et al. Prediction of protein secondary structure by mining fragments database. Polymer. 2005; in press. doi: 10.1016/j.polymer.2005.02.040.
  6. Chou PY, Fasman GD. Prediction of protein conformation. Biochemistry. 1974;13:222–245. doi: 10.1021/bi00699a002.
  7. Cuff JA, Barton GJ. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins. 1999;34:508–519. doi: 10.1002/(sici)1097-0134(19990301)34:4<508::aid-prot10>3.0.co;2-4.
  8. Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins. 2000;40:502–511. doi: 10.1002/1097-0134(20000815)40:3<502::aid-prot170>3.0.co;2-q.
  9. Frishman D, Argos P. Seventy-five percent accuracy in protein secondary structure prediction. Proteins. 1997;27:329–335. doi: 10.1002/(sici)1097-0134(199703)27:3<329::aid-prot1>3.0.co;2-8.
  10. Garnier J, Robson B. In: Prediction of protein structure and the principles of protein conformation. Fasman GD, editor. Plenum Press; New York: 1989. pp. 417–465.
  11. Garnier J, et al. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol. 1978;120:97–120. doi: 10.1016/0022-2836(78)90297-8.
  12. Garnier J, et al. GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol. 1996;266:540–553. doi: 10.1016/s0076-6879(96)66034-0.
  13. Gibrat JF, et al. Further developments of protein secondary structure prediction using information theory: new parameters and consideration of residue pairs. J Mol Biol. 1987;198:425–443. doi: 10.1016/0022-2836(87)90292-0.
  14. Hua S, Sun Z. A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. J Mol Biol. 2001;308:397–407. doi: 10.1006/jmbi.2001.4580.
  15. Jones DT. Protein secondary structure prediction based on position specific matrices. J Mol Biol. 1999;292:195–202. doi: 10.1006/jmbi.1999.3091.
  16. Kabsch W, Sander C. A dictionary of secondary structure. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
  17. Kloczkowski A, et al. Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins. 2002;49:154–166. doi: 10.1002/prot.10181.
  18. Levin JM, Garnier J. Improvements in a secondary structure prediction method based on a search for local sequence homologies and its use as a model building tool. Biochim Biophys Acta. 1988;955:283–295. doi: 10.1016/0167-4838(88)90206-3.
  19. Lim V. Structural principles of the globular organization of protein chains: a stereochemical theory of globular protein secondary structure. J Mol Biol. 1974a;88:857–872. doi: 10.1016/0022-2836(74)90404-5.
  20. Lim V. Algorithm for prediction of α-helical and β-structural regions in globular proteins. J Mol Biol. 1974b;88:873–894. doi: 10.1016/0022-2836(74)90405-7.
  21. Petersen TN, et al. Prediction of protein secondary structure at 80% accuracy. Proteins. 2000;41:17–20.
  22. Rost B. Review: protein secondary structure prediction continues to rise. J Struct Biol. 2001;134:204–218. doi: 10.1006/jsbi.2001.4336.
  23. Rost B. Prediction in 1D: secondary structure, membrane helices, and accessibility. Methods Biochem Anal. 2003;44:559–587.
  24. Schwede T, et al. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 2003;31:3381–3385. doi: 10.1093/nar/gkg520.
  25. Sen TZ, et al. Predicting binding sites of hydrolase–inhibitor complexes by combining several methods. BMC Bioinformatics. 2004;5:205. doi: 10.1186/1471-2105-5-205.
  26. Zemla A, et al. Processing and analysis of CASP3 protein structure predictions. Proteins. 1999;(Suppl 3):22–29. doi: 10.1002/(sici)1097-0134(1999)37:3+<22::aid-prot5>3.3.co;2-n.
