Nucleic Acids Research. 2013 May 21;41(Web Server issue):W384–W388. doi: 10.1093/nar/gkt458

GalaxyRefine: protein structure refinement driven by side-chain repacking

Lim Heo 1, Hahnbeom Park 1, Chaok Seok 1,*
PMCID: PMC3692086  PMID: 23737448

Abstract

The quality of model structures generated by contemporary protein structure prediction methods strongly depends on the degree of similarity between the target and available template structures. Therefore, the importance of improving template-based model structures beyond the accuracy available from template information has been emphasized in the structure prediction community. The GalaxyRefine web server, freely available at http://galaxy.seoklab.org/refine, is based on a refinement method that has been successfully tested in CASP10. The method first rebuilds side chains and then performs side-chain repacking followed by overall structure relaxation by molecular dynamics simulation. According to the CASP10 assessment, this method showed the best performance in improving the local structure quality. On average, the method can improve both global and local structure quality when used for refining the models generated by state-of-the-art protein structure prediction servers.

INTRODUCTION

The structure of a protein can be predicted accurately from its sequence by template-based modeling when the sequence identity is sufficiently high (e.g. >30%) (1,2). However, even at a high sequence identity, side-chain structure may be less accurate than the backbone structure, whereas at a lower sequence identity, predicted structures may have significant errors in both side-chain and backbone structures. Although ab initio protein structure predictions from sequences are notoriously difficult (3,4), ab initio refinement starting from a reasonable initial model structure is expected to be less difficult. Successful refinement can increase the applicability range of template-based models by providing more precise structures for functional study, molecular design or experimental structure determination (5,6).

Since 2008, various refinement methods have been tested in the refinement category of the community-wide protein structure prediction experiment, Critical Assessment of techniques for protein Structure Prediction (CASP) (5,6). Several methods were shown to improve the initial model structures (7–12). Achieving consistent improvement in such refinement experiments is more difficult than in typical refinement tests performed on lower quality initial structures, because the initial structures are selected from the best models submitted by CASP predictors and have already been refined by other prediction methods (6).

In this article, we present a new model structure refinement web server called GalaxyRefine that showed consistent improvement in CASP10, the most recent CASP, held in 2012. GalaxyRefine first rebuilds all side-chain conformations and then repeatedly relaxes the structure by short molecular dynamics simulations after side-chain repacking perturbations. Interestingly, this method improves both global and local structure quality: global structure accuracy, local structure accuracy and physical correctness improved in 59, 67 and 79% of the CASP10 refinement category targets when measured by GDT-HA (13), GDC-SC (14) and MolProbity score (15), respectively. This method was assessed to be more successful in refining local structure and side-chain quality than any other method tested in CASP10. GalaxyRefine also provides four additional models generated by relaxation simulations after larger perturbations on secondary structure elements and loops, resulting in larger changes from the initial model structure. GalaxyRefine can improve the models generated by state-of-the-art structure prediction servers such as I-TASSER (16) and ROSETTA (17) when tested on the server models submitted in CASP10.

THE GALAXYREFINE METHOD

GalaxyRefine first rebuilds all side chains by placing the highest-probability rotamers (18), starting from the core and then extending to the surface layer by layer. When steric clashes are detected, rotamers of the next highest probabilities are attached instead. After all side chains have been attached, the number of neighboring Cβ atoms is counted around each side chain, and the initial side-chain conformation is restored if the number deviates from the canonical distribution for the amino acid under the same degree of surface exposure.
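
The core-to-surface placement with clash fallback described above can be illustrated with a small greedy loop. This is only a toy sketch under strong assumptions, not the GALAXY implementation: each rotamer is reduced to a single pseudo-atom position, burial is a made-up neighbor count, the 3.0 Å clash cutoff is an illustrative value, and the names `place_rotamers` and `clashes` are hypothetical. The Cβ-count check that restores the initial conformation is omitted.

```python
import math

CLASH_CUTOFF = 3.0  # Å; illustrative threshold, not taken from the paper


def clashes(pos, placed):
    """True if pos is closer than CLASH_CUTOFF to any already placed pseudo-atom."""
    return any(math.dist(pos, other) < CLASH_CUTOFF for other in placed)


def place_rotamers(residues):
    """Greedy core-to-surface rotamer placement (toy model).

    residues: list of dicts with
      'name'     : residue label
      'burial'   : neighbor count (higher means closer to the core)
      'rotamers' : list of (probability, (x, y, z)) pseudo-atom positions
    Returns {name: rank of the chosen rotamer, 0 = most probable}.
    """
    placed = []
    choice = {}
    # Most buried residues first, then outward toward the surface.
    for res in sorted(residues, key=lambda r: r["burial"], reverse=True):
        ranked = sorted(res["rotamers"], key=lambda r: r[0], reverse=True)
        picked = 0  # keep the most probable rotamer if every candidate clashes
        for rank, (_, pos) in enumerate(ranked):
            if not clashes(pos, placed):
                picked = rank
                break
        placed.append(ranked[picked][1])
        choice[res["name"]] = picked
    return choice


toy = [
    {"name": "LEU10", "burial": 12,
     "rotamers": [(0.6, (0.0, 0.0, 0.0)), (0.4, (5.0, 0.0, 0.0))]},
    {"name": "PHE25", "burial": 9,
     "rotamers": [(0.7, (1.0, 0.0, 0.0)), (0.3, (0.0, 6.0, 0.0))]},
    {"name": "LYS40", "burial": 3,
     "rotamers": [(0.5, (10.0, 0.0, 0.0)), (0.5, (0.0, 10.0, 0.0))]},
]
chosen = place_rotamers(toy)
print(chosen)
```

In this toy input, the most probable PHE25 rotamer sits 1 Å from the already placed LEU10 pseudo-atom, so the second-ranked rotamer is selected for it.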

The model with the rebuilt side chains is then refined by two relaxation methods, a mild relaxation and an aggressive one. The lowest energy model of 32 models generated by the mild relaxation is returned as model 1, and four additional models closest to the centers of the four largest clusters of 32 models generated by the aggressive relaxation are returned as models 2–5. Both methods are based on repetitive relaxations (22 and 17 for mild and aggressive relaxations, respectively) by short molecular dynamics simulations (0.6 and 0.8 ps for mild and aggressive relaxations, respectively) with a 4 fs time step after structure perturbations. Structure perturbations are applied only to clusters of side chains in the mild refinement, whereas more forceful perturbations to secondary structure elements and loops are applied in the aggressive refinement. The triaxial loop closure method (19–21) is used to avoid breaks in model structures caused by perturbations to internal torsion angles.
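
The simulation lengths quoted above translate into a small integration budget per trajectory. The arithmetic below is a sketch that assumes each of the 22 (mild) or 17 (aggressive) relaxations is one molecular dynamics run of the stated length; the function name `md_budget` is hypothetical, and times are kept in femtoseconds so that the step counts are exact integers.

```python
TIME_STEP_FS = 4  # 4 fs time step, from the text

PROTOCOLS = {
    # name: (number of relaxations, length of each MD run in fs)
    "mild": (22, 600),        # 0.6 ps per run
    "aggressive": (17, 800),  # 0.8 ps per run
}


def md_budget(n_relax, run_fs, step_fs=TIME_STEP_FS):
    """Return (steps per run, total steps, total simulated time in ps)."""
    steps_per_run = run_fs // step_fs
    total_steps = n_relax * steps_per_run
    total_ps = n_relax * run_fs / 1000
    return steps_per_run, total_steps, total_ps


budget = {name: md_budget(*params) for name, params in PROTOCOLS.items()}
for name, (per_run, total, ps) in budget.items():
    print(f"{name}: {per_run} steps/run, {total} steps, {ps} ps per trajectory")
```

Under this reading, one mild trajectory amounts to 13.2 ps (3300 steps) and one aggressive trajectory to 13.6 ps (3400 steps), before multiplying by the 32 models generated by each protocol.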

The energy functions used for the two relaxation methods are linear combinations of a physics-based energy function complemented by database-derived terms and a harmonic restraint energy derived from the given initial model structure. The relative weight of the restraint energy to the physics-based energy for the mild relaxation is five times larger than that for the aggressive relaxation. The physics-based energy function contains CHARMM22-based molecular-mechanics bonded energy terms (22), Lennard–Jones interaction energy, Coulomb potential energy, FACTS solvation free energy (23) and solvent accessible surface area energy, whereas the database-derived energy function contains hydrogen bond energy (24), dipolar-DFIRE potential energy (25) and side-chain and backbone torsion angle energy (26).
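
How the restraint weight shifts the balance between the two protocols can be seen in a minimal sketch of the combined energy. Only the factor of five between the mild and aggressive restraint weights comes from the text; the absolute weights, the force constant `k`, the per-atom form of the restraint and the function names are assumptions made for illustration, and the physics-based and database-derived terms are passed in as precomputed numbers.

```python
import math

AGGRESSIVE_RESTRAINT_WEIGHT = 1.0                          # placeholder value
MILD_RESTRAINT_WEIGHT = 5.0 * AGGRESSIVE_RESTRAINT_WEIGHT  # 5x ratio from the text


def harmonic_restraint(coords, ref_coords, k=1.0):
    """Sum of k * d**2 over atoms, d being the displacement from the initial model."""
    return sum(k * math.dist(a, b) ** 2 for a, b in zip(coords, ref_coords))


def total_energy(e_physics, e_database, coords, ref_coords, w_restraint):
    """Linear combination of the energy terms and the weighted restraint."""
    return e_physics + e_database + w_restraint * harmonic_restraint(coords, ref_coords)


initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 2.0, 0.0)]  # atoms displaced by 1 Å and 2 Å

e_mild = total_energy(-100.0, -20.0, moved, initial, MILD_RESTRAINT_WEIGHT)
e_aggr = total_energy(-100.0, -20.0, moved, initial, AGGRESSIVE_RESTRAINT_WEIGHT)
print(e_mild, e_aggr)
```

For the same displacement, the mild protocol pays five times the restraint penalty, which is why it stays closer to the initial model than the aggressive protocol.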

Performance of the method

The GalaxyRefine method has been extensively tested on (i) the refinement category targets of CASP8 (5), CASP9 (6) and CASP10 (53 proteins in total), (ii) Zhang-server (I-TASSER) models (84 proteins) (11) and (iii) ROSETTA server models (69 proteins) (17), both for CASP10 template-based modeling targets, and (iv) FG-MD benchmark set targets (147 proteins) (8). Table 1 summarizes the improvement of model 1 (and of the best refined model out of models 1–5) over the initial input models in terms of backbone structure accuracy measured by GDT-HA (13), side-chain structure accuracy measured by GDC-SC (14) and physical correctness measured by MolProbity score (15). The GalaxyRefine server shows average improvement in all test cases except for the MolProbity score of ROSETTA models, which have exceptionally good MolProbity scores to begin with. Although GalaxyRefine can improve GDT-HA and GDC-SC for all test sets, the average improvements are small (<1 and <3%, respectively), suggesting the necessity for further improvement in this field. The improvement in MolProbity score is relatively larger, with an average improvement of 0.6 (from 2.58 to 1.96). Typical MolProbity scores for experimental structures are in the range of 1–2. A successful refinement example is illustrated in Figure 1.
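
For readers unfamiliar with GDT-HA, the score averages the percentage of Cα atoms that lie within 0.5, 1, 2 and 4 Å of their positions in the experimental structure. The sketch below is a simplification: it assumes the model has already been superimposed on the experimental structure and evaluates a single superposition, whereas the LGA program (13) searches over many superpositions for each cutoff. The function name and the toy coordinates are hypothetical.

```python
import math

GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)  # Å


def gdt_ha_fixed_superposition(model_ca, native_ca):
    """GDT-HA-like score (0-100) for pre-superimposed, equal-length CA traces."""
    if not model_ca or len(model_ca) != len(native_ca):
        raise ValueError("CA traces must be non-empty and of equal length")
    dists = [math.dist(m, n) for m, n in zip(model_ca, native_ca)]
    fractions = [sum(d <= cut for d in dists) / len(dists) for cut in GDT_HA_CUTOFFS]
    return 100.0 * sum(fractions) / len(GDT_HA_CUTOFFS)


# Toy example: four residues deviating by 0.3, 0.8, 1.5 and 5.0 Å.
native = [(3.8 * i, 0.0, 0.0) for i in range(4)]
model = [(3.8 * i, dev, 0.0) for i, dev in enumerate((0.3, 0.8, 1.5, 5.0))]
score = gdt_ha_fixed_superposition(model, native)
print(score)
```

The improvements reported in this section are differences between such scores for the refined and the initial model; for target TR681 in Figure 1, for instance, GDT-HA rises from 57.6 to 64.1, an improvement of 6.5.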

Table 1.

GalaxyRefine test results for model 1 (and the best model out of models 1–5 in parentheses)

Each entry gives mean improvement/median improvement/percentage of improved targets.

| Test set | Subset | Number of targets | GDT-HA | GDC-SC | MolProbity score |
| --- | --- | --- | --- | --- | --- |
| CASP refinement category targets | CASP8 | 12 | 0.57/0.26/50 (1.45/0.63/67) | 3.43/3.02/83 (4.07/3.07/83) | 0.99/1.14/100 (a) (1.25/1.27/100 (a)) |
| | CASP9 | 14 | 0.78/0.72/64 (2.19/1.22/93) | 0.62/−0.05/43 (1.09/0.87/57) | 0.62/0.44/71 (0.84/0.71/71) |
| | CASP10 | 27 | 0.08/0.63/59 (1.06/1.52/67) | 1.10/1.36/67 (1.96/2.67/67) | 0.70/0.80/79 (1.50/1.47/96) |
| | All | 53 | 0.38/0.63/59 (1.45/1.19/74) | 1.50/0.95/64 (2.21/2.36/68) | 0.74/0.86/82 (1.26/1.37/90) |
| CASP10 server models | I-TASSER (b) | 84 (c) | 0.41/0.44/66 (1.40/1.13/76) | 2.52/2.22/87 (3.42/3.08/92) | 0.69/0.73/98 (1.01/1.06/99) |
| | ROSETTA (d) | 69 (c) | 0.45/0.49/64 (1.33/0.93/75) | 0.67/0.59/64 (1.47/1.45/73) | −0.03/−0.14/26 (−0.01/−0.05/44) |
| FG-MD benchmark set | | 147 (c) | 0.61/0.81/65 (1.80/1.69/80) | 1.74/1.24/75 (2.78/2.47/87) | 0.89/0.92/100 (1.18/1.16/100) |

(a) Initial structure for the target TR476 has no side-chain coordinates; therefore, it is excluded in the MolProbity analysis.

(b) Zhang-server models submitted for the CASP10 TS category targets.

(c) Non-oligomeric targets with TM-score (27) >0.5 and no severe crystallographic contacts.

(d) ROSETTA-BAKER server models submitted for the CASP10 TS category targets.
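
Each triple in Table 1 can be computed from per-target scores before and after refinement. The sketch below assumes the sign convention that positive values mean improvement, so the difference is refined minus initial for GDT-HA and GDC-SC (higher is better) and initial minus refined for MolProbity score (lower is better). The function name `summarize` and the per-target numbers are made up for illustration and are not data from the paper.

```python
from statistics import mean, median


def summarize(initial, refined, higher_is_better=True):
    """Return (mean improvement, median improvement, % of improved targets)."""
    deltas = [(r - i) if higher_is_better else (i - r)
              for i, r in zip(initial, refined)]
    improved = sum(d > 0 for d in deltas)
    return mean(deltas), median(deltas), 100.0 * improved / len(deltas)


# Hypothetical per-target scores.
gdt_initial = [50.0, 60.0, 70.0, 55.0]
gdt_refined = [51.0, 59.5, 72.0, 55.5]
gdt_summary = summarize(gdt_initial, gdt_refined)

mp_initial = [2.5, 3.0]
mp_refined = [2.0, 3.25]
mp_summary = summarize(mp_initial, mp_refined, higher_is_better=False)

print("GDT-HA     %.2f/%.2f/%.0f" % gdt_summary)
print("MolProbity %.2f/%.2f/%.0f" % mp_summary)
```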

Figure 1.

Refinement results for a CASP10 target TR681. (A) The initial structure (pink, GDT-HA = 57.6) and (B) the refined structure (cyan, GDT-HA = 64.1) are shown superimposed on the experimental structure (brown). Multi-criterion kinemage of (C) the initial structure (MolProbity score = 2.90) and (D) the refined structure (MolProbity score = 2.06). MolProbity highlights steric clashes as pink spikes, poor rotamers as gold side chains and Ramachandran outliers as green lines.

THE GALAXYREFINE SERVER

Hardware and software

The GalaxyRefine server runs on a cluster of four Linux servers with 2.33 GHz Intel Xeon 8-core processors. The web application uses Python and the MySQL database. The refinement method implemented in the GALAXY program package (28–31) is written in Fortran 90. The Java viewer Jmol (http://www.jmol.org) is used for visualization of predicted structures.

Input and output

The only required input is a single-chain protein structure without internal gaps in the PDB format. The expected run time is generally 1–2 h. Five refined models can be viewed and downloaded from the website (Figure 2). Information on structural changes obtained by the refinement of the input structure is provided in terms of GDT-HA, RMSD and MolProbity score in a separate table.
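
Before submitting a model, the two input requirements (a single chain, no internal gaps) can be checked locally. The sketch below is a heuristic pre-check and not the server's own validation: it reads the Cα ATOM records using the fixed PDB columns (atom name in columns 13–16, chain ID in column 22, residue number in columns 23–26) and treats any jump in residue numbering as a possible gap. Alternate locations and insertion codes are not handled, and the names `check_refine_input` and `ca_line` are hypothetical.

```python
def ca_line(serial, resname, chain, resseq, x):
    """Build one fixed-width CA ATOM record (helper for the demo only)."""
    return "ATOM  %5d  CA  %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, resname, chain, resseq, x, 0.0, 0.0)


def check_refine_input(pdb_text):
    """Return a list of problems found; an empty list means the checks passed."""
    chains = []
    resnums = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            if chain not in chains:
                chains.append(chain)
            resnums.append(int(line[22:26]))
    problems = []
    if not resnums:
        problems.append("no CA ATOM records found")
    if len(chains) > 1:
        problems.append("multiple chains: " + ", ".join(chains))
    for prev, cur in zip(resnums, resnums[1:]):
        if cur - prev != 1:
            problems.append("numbering jump between residues %d and %d" % (prev, cur))
    return problems


good = "\n".join(ca_line(i, "ALA", "A", i, 3.8 * i) for i in (1, 2, 3))
gapped = "\n".join(ca_line(i, "ALA", "A", i, 3.8 * i) for i in (1, 2, 5))
two_chains = "\n".join([ca_line(1, "ALA", "A", 1, 0.0),
                        ca_line(2, "ALA", "A", 2, 3.8),
                        ca_line(3, "ALA", "B", 3, 7.6)])

print(check_refine_input(good))
print(check_refine_input(gapped))
print(check_refine_input(two_chains))
```

A numbering jump does not always correspond to a physical chain break, so a stricter check could additionally compare consecutive Cα–Cα distances.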

Figure 2.

GalaxyRefine output page. The five top-ranking models are shown as static images, and they can also be viewed using the Jmol structure viewer. The structure changes relative to the initial model in terms of GDT-HA, RMSD and MolProbity score are presented in a separate table. Three components of the MolProbity score, namely, the number of atomic clashes per 1000 atoms and the percentages of rotamer outliers and of Ramachandran-favored backbone torsion angles, are also reported in the table.
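
The three components listed in the output table are combined into the single MolProbity score by a log-weighted sum. The sketch below uses the commonly quoted form of that combination from the MolProbity work (15); the coefficients are reproduced from memory and should be checked against the MolProbity documentation before being relied on, and the function name is hypothetical.

```python
import math


def molprobity_score(clashscore, rotamer_outlier_pct, rama_favored_pct):
    """Combine the three components into one score (lower is better).

    clashscore          : atomic clashes per 1000 atoms
    rotamer_outlier_pct : percentage of rotamer outliers
    rama_favored_pct    : percentage of Ramachandran-favored residues
    """
    return (0.426 * math.log(1 + clashscore)
            + 0.33 * math.log(1 + max(0.0, rotamer_outlier_pct - 1))
            + 0.25 * math.log(1 + max(0.0, (100 - rama_favored_pct) - 2))
            + 0.5)


ideal = molprobity_score(0, 0, 100)
poor = molprobity_score(40, 8, 85)
better = molprobity_score(10, 2, 95)
print(ideal, poor, better)
```

Because each component enters through a logarithm, removing the worst clashes and outliers in a poor model lowers the score more than the same absolute reduction in an already good model.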

CONCLUSIONS

GalaxyRefine is a web server for protein model structure refinement that is particularly successful in improving local structure quality as demonstrated by the tests on CASP refinement category targets and CASP10 server models. On average, it shows moderate improvement in backbone structure quality. The server may be used to refine model structures obtained from available structure prediction methods, including the current best template-based modeling servers.

FUNDING

National Research Foundation of Korea funded by the Ministry of Education, Science and Technology [2012-0001641, 2011-0012456 and 2012M3C1A6035362]. Funding for open access charge: Seoul National University.

Conflict of interest statement. None declared.

REFERENCES

1. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 2000;29:291–325. doi: 10.1146/annurev.biophys.29.1.291.
2. Kryshtafovych A, Fidelis K, Moult J. CASP9 results compared to those of previous CASP experiments. Proteins. 2011;79(Suppl. 10):196–207. doi: 10.1002/prot.23182.
3. Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y. Assessment of CASP8 structure predictions for template free targets. Proteins. 2009;77(Suppl. 9):50–65. doi: 10.1002/prot.22591.
4. Kinch L, Yong Shi S, Cong Q, Cheng H, Liao Y, Grishin NV. CASP9 assessment of free modeling target predictions. Proteins. 2011;79(Suppl. 10):59–73. doi: 10.1002/prot.23181.
5. MacCallum JL, Hua L, Schnieders MJ, Pande VS, Jacobson MP, Dill KA. Assessment of the protein-structure refinement category in CASP8. Proteins. 2009;77(Suppl. 9):66–80. doi: 10.1002/prot.22538.
6. MacCallum JL, Perez A, Schnieders MJ, Hua L, Jacobson MP, Dill KA. Assessment of protein structure refinement in CASP9. Proteins. 2011;79(Suppl. 10):74–90. doi: 10.1002/prot.23131.
7. Bhattacharya D, Cheng J. 3Drefine: consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization. Proteins. 2013;81:119–131. doi: 10.1002/prot.24167.
8. Zhang J, Liang Y, Zhang Y. Atomic-level protein structure refinement using fragment-guided molecular dynamics conformation sampling. Structure. 2011;19:1784–1795. doi: 10.1016/j.str.2011.09.022.
9. Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O, et al. Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins. 2009;77(Suppl. 9):89–99. doi: 10.1002/prot.22540.
10. Rodrigues JP, Levitt M, Chopra G. KoBaMIN: a knowledge-based minimization web server for protein structure refinement. Nucleic Acids Res. 2012;40:W323–W328. doi: 10.1093/nar/gks376.
11. Xu D, Zhang J, Roy A, Zhang Y. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement. Proteins. 2011;79(Suppl. 10):147–160. doi: 10.1002/prot.23111.
12. Park H, Seok C. Refinement of unreliable local regions in template-based protein models. Proteins. 2012;80:1974–1986. doi: 10.1002/prot.24086.
13. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003;31:3370–3374. doi: 10.1093/nar/gkg571.
14. Keedy DA, Williams CJ, Headd JJ, Arendall WB, III, Chen VB, Kapral GJ, Gillespie RA, Block JN, Zemla A, Richardson DC, et al. The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models. Proteins. 2009;77(Suppl. 9):29–49. doi: 10.1002/prot.22551.
15. Chen VB, Arendall WB, III, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 2010;66:12–21. doi: 10.1107/S0907444909042073.
16. Roy A, Kucukural A, Zhang Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 2010;5:725–738. doi: 10.1038/nprot.2010.5.
17. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, Kaufman K, Renfrew PD, Smith CA, Sheffler W, et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487:545–574. doi: 10.1016/B978-0-12-381270-4.00019-6.
18. Dunbrack RL, Jr. Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 2002;12:431–440. doi: 10.1016/s0959-440x(02)00344-5.
19. Ko J, Lee D, Park H, Coutsias EA, Lee J, Seok C. The FALC-Loop web server for protein loop modeling. Nucleic Acids Res. 2011;39:W210–W214. doi: 10.1093/nar/gkr352.
20. Lee J, Lee D, Park H, Coutsias EA, Seok C. Protein loop modeling by using fragment assembly and analytical loop closure. Proteins. 2010;78:3428–3436. doi: 10.1002/prot.22849.
21. Coutsias EA, Seok C, Jacobson MP, Dill KA. A kinematic view of loop closure. J. Comput. Chem. 2004;25:510–528. doi: 10.1002/jcc.10416.
22. MacKerell AD, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B. 1998;102:3586–3616. doi: 10.1021/jp973084f.
23. Haberthur U, Caflisch A. FACTS: fast analytical continuum treatment of solvation. J. Comput. Chem. 2008;29:701–715. doi: 10.1002/jcc.20832.
24. Kortemme T, Morozov AV, Baker D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J. Mol. Biol. 2003;326:1239–1259. doi: 10.1016/s0022-2836(03)00021-4.
25. Yang Y, Zhou Y. Specific interactions for ab initio folding of protein terminal regions with secondary structures. Proteins. 2008;72:793–803. doi: 10.1002/prot.21968.
26. Canutescu AA, Shelenkov AA, Dunbrack RL, Jr. A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci. 2003;12:2001–2014. doi: 10.1110/ps.03154503.
27. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins. 2004;57:702–710. doi: 10.1002/prot.20264.
28. Ko J, Park H, Heo L, Seok C. GalaxyWEB server for protein structure prediction and refinement. Nucleic Acids Res. 2012;40:W294–W297. doi: 10.1093/nar/gks493.
29. Ko J, Park H, Seok C. GalaxyTBM: template-based modeling by building a reliable core and refining unreliable local regions. BMC Bioinformatics. 2012;13:198. doi: 10.1186/1471-2105-13-198.
30. Lee H, Park H, Ko J, Seok C. GalaxyGemini: a web server for protein homo-oligomer structure prediction based on similarity. Bioinformatics. 2013;29:1078–1080. doi: 10.1093/bioinformatics/btt079.
31. Shin WH, Seok C. GalaxyDock: protein-ligand docking with flexible protein side-chains. J. Chem. Inf. Model. 2012;52:3225–3232. doi: 10.1021/ci300342z.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
