Author manuscript; available in PMC: 2009 Oct 13.
Published in final edited form as: Proteins. 2008 Sep;72(4):1171–1188. doi: 10.1002/prot.22005

Refining Homology Models by Combining Replica-Exchange Molecular Dynamics and Statistical Potentials

Jiang Zhu †,1, Hao Fan ‡,$,1, Xavier Periole, Barry Honig †,*, Alan E Mark ‡,§,*
PMCID: PMC2761145  NIHMSID: NIHMS92362  PMID: 18338384

Abstract

A protocol is presented for the global refinement of homology models of proteins. It combines the advantages of temperature-based replica-exchange molecular dynamics (REMD) for conformational sampling with the use of statistical potentials for model selection. The protocol was tested using 21 models. Of these, 14 were models of 10 small proteins for which high-resolution crystal structures were available; the remainder were targets of the recent Continuous CASP Model Refinement Experiment (CASPR). REMD in combination with currently available force fields was found to sample near-native conformational states when starting from high-quality homology models. Conformations in which the backbone RMSD of the secondary structure elements (SSE-RMSD) was at least 0.5 Å lower than the starting value were found for 15 of the 21 cases (average improvement 0.82 Å). Furthermore, when a simple scoring function consisting of two statistical potentials was used to rank the structures, one or more structures with an SSE-RMSD at least 0.2 Å lower than the starting value were found among the 5 best-ranked structures in 11 of the 21 cases. The average improvement in SSE-RMSD for the best models was 0.42 Å. However, none of the scoring functions tested identified the structures with the lowest SSE-RMSD as the best models, although all identified the native conformation as the one with the lowest energy. This suggests that, while the proposed protocol proved effective for the refinement of high-quality models of small proteins, scoring functions remain one of the major limiting factors in structure refinement. This and other aspects in which the methodology could be further improved are discussed.

Keywords: homology modeling, protein structure prediction, replica-exchange molecular dynamics, statistical potential, structure refinement

INTRODUCTION

To understand the function of a protein at the atomic level, an accurate three-dimensional model is essential. Where an experimentally determined structure is unavailable, structure prediction techniques can often provide sufficient information for many purposes. Despite the substantial progress in ab initio structure prediction,1,2 homology modeling is still the most reliable and widely used approach to obtain high-quality structural models for a given target protein, provided that a suitable template structure can be identified by sequence similarity.3–7 However, the extent to which current homology models can be used with confidence is unclear.6 This is in part due to alignment errors, but primarily due to the lack of effective methods to refine the structural models obtained.8–10

Alignment quality has improved significantly over recent years with the introduction of the PSI-BLAST11 technique and of methods that combine profile-profile sequence comparisons12–19 and structural information.20–35 Nevertheless, problems associated with insertions, deletions and misalignments remain common, especially when there is only a remote relationship between the query and the template sequence. These in turn lead to errors in the homology model, including errors in the packing of side chains, poorly defined loop conformations, and distortions or shifts in secondary structure elements (SSEs). A number of recent studies have begun to address alignment errors, either by exploring alternative alignments36–38 or by searching for reasonable conformations during the model building step.39 In addition to errors stemming from the alignment procedure, there are also unavoidable errors in any homology model because the query and template sequences are, by definition, different. In principle, this means that all homology models must be refined. It appears optimal to divide the structure refinement process into two stages depending on the type of error to be addressed. First, local structural errors involving side chains, loops and SSEs are detected and removed while the overall structure of the backbone is kept fixed. Second, the protein backbone, which is normally taken directly from the template, is adjusted in an effort to improve the global fold.

Global structural refinement requires both an efficient means to sample conformational space and a means to accurately identify near-native structures.40 While energy minimization (EM) has been used in a number of studies to optimize the initial model,41–43 molecular dynamics (MD) simulation is the most commonly used sampling technique in refinement studies. In contrast, a wide variety of approaches have been used to select near-native conformations. For example, Lee et al44 were able to select a near-native structure (1.8 Å Cα RMSD) from an MD-generated ensemble, starting from a model with an RMSD of 2.8 Å with respect to the experimental structure, using an MM-PBSA free energy function. Lu and Skolnick45 used repeated cycles of short MD simulations followed by scoring with a statistical potential to refine low-resolution ab initio models of 30 small proteins. Fan and Mark46,47 studied the utility of extended MD simulations in water, and of a combination of MD with a heuristic chaperon approach, for the refinement of model structures generated using ROSETTA. Flohil et al48 used a simple knowledge-based algorithm to select structures from a series of restrained MD simulations in an attempt to refine three homology models from CASP3. Krieger et al49 optimized an all-atom force field that was used to refine 25 models, which on average moved 0.1 Å closer to their native structures.

Despite occasional successes, the main conclusions that can be drawn from these studies are that 1) only conformations close to the initial structure are sampled using standard MD techniques (unless long time scales are used) and 2) the ability of currently available scoring functions to distinguish near-native from compact non-native structures must be improved.47 Most recently, Chen and Brooks50 applied the replica-exchange molecular dynamics (REMD) technique with a generalized Born (GB) solvent model51,52 to the refinement of 5 models from CASPR (the Continuous CASP Model Refinement Experiment). Alternative refinement procedures that do not depend on MD as the search engine have also been developed, primarily in the context of ab initio structure prediction. Baker and co-workers2, for example, reported encouraging results for 5 of 16 small proteins using a refinement protocol in which multiple rounds of random torsion-angle perturbation and Monte Carlo (MC) relaxation were performed on low-resolution models built from a set of sequence homologues of the target protein using the standard fragment insertion approach of ROSETTA.53 Another method based on fragment assembly and MC simulation is TASSER,54 which has been applied to the refinement of NMR structures55 and dimeric structural models.56 Recently, these ab initio approaches have also been applied to the refinement of homology models. Misura et al57 attempted to refine a series of homology models using a version of ROSETTA in combination with evolutionarily derived distance constraints. They found that in 22 out of 39 cases a model closer to the native structure than the template over the aligned regions could be found among the 10 lowest-energy models. However, this method was computationally very intensive, with the refinement of a single model requiring 90 CPU days.
In addition to the approaches outlined above, a number of methods that employ statistical potentials or empirical scoring functions to select the near-native models from an ensemble of homology models have been developed.37,38,43,58,59

In this article, we investigate the utility of a refinement protocol that combines replica-exchange molecular dynamics (REMD) as the primary sampling technique with a series of different statistical potentials to select the best models. Temperature-based REMD has been shown to enhance sampling relative to standard MD simulation techniques and has been used extensively in peptide folding simulations as well as for the refinement of NMR models.60–64 In particular, we have shown that the efficiency of temperature-based REMD results from a combination of enhanced sampling and conformational sorting across the temperature range.65 Statistical potentials are widely used in applications such as decoy discrimination,66–69 model evaluation70–74 and loop prediction.75,76 In a previous study, we applied a recently developed statistical potential (DFIRE)68 to the refinement of protein segments with considerable success.77 The refinement protocol proposed in this study has been tested using models generated for 10 small proteins. In total, 14 models were generated, with two models being generated for 4 of the 10 proteins. The backbone RMSD of the SSEs (SSE-RMSD) for these models ranged from 1.76 to 2.73 Å. Another 7 models from the Continuous CASP Model Refinement Experiment (CASPR) were also included in the analysis, extending the range of SSE-RMSD values to 1.33–4.14 Å. Overall, we find that REMD was the most effective of the sampling strategies investigated and that a simple combination of two statistical potentials appeared to be as accurate as two ROSETTA-based scoring functions.53

MATERIALS AND METHODS

Test sets

Two sets of models have been used to test the refinement protocol. The first set was generated specifically for this study. For these models it was required that a high-resolution crystal structure be available for the target, that a suitable template structure be available, that the proteins be of manageable size, and that the native structure be stable in the force field used. The models were generated using the following five-step procedure. First, 68 proteins were selected from a database of 974 high-resolution (< 1.60 Å) crystal structures.77 Proteins were selected if they had between 70 and 100 residues and there were no gaps or missing atoms in the structures. Second, a list of templates was identified for each protein and sequence alignments were generated using HMAP,32 a fold-recognition method that combines sequence and secondary structure profiles. Third, a structural model was built for each template using Nest,78 a model building program that combines rigid-body assembly and torsion-space optimization. Fourth, the models were compared to the corresponding experimental structures and only models for which the SSE-RMSD was in the range 1.0 to 3.0 Å were retained. After this stage, models remained for only 17 of the original 68 proteins. Finally, a 5 ns MD simulation at 300 K starting from the experimentally determined (native) structure was performed for each of these 17 proteins. The GROMACS package79–81 was used to perform the MD simulations; the protocol is described in detail below in the section on “Conformational sampling”. Only models of proteins whose secondary structure elements were very stable in these simulations (SSE-RMSD < 1.5 Å) were retained, leaving 10 proteins (see Table I). For these 10 proteins the SSE-RMSD of the models ranged from 1.5 to 3.0 Å with respect to the native structure.
For most proteins only one model was obtained; however, for four proteins (1cy5, 1mfg, 1r6j and 1wm3) two models were obtained, giving 14 models in total. The second set of models against which the protocol was tested consisted of 7 CASPR targets obtained from http://predictioncenter.org/caspR. These proteins range in size from 70 to 138 residues. The coordinates of side chains not present in the crystal structures of the CASPR targets were generated using SCAP82 and gaps in the structures were modelled using LOOPY.83 Together, the two sets provided a total of 21 models for 17 proteins (Table I). As can be seen from Figure 1, these proteins have different topologies, secondary structure compositions and shapes. Only one protein, 1k5n, contains a disulfide bond; the disulfide bond present in the crystal structure was also satisfied in the predicted model.

Table I.

Properties of the 17 Proteins Used to Test the Global Refinement Protocol

| PDB | Chain | Description | Method | Resol. (Å) | Nres | SCOP class | Net charge |
|---|---|---|---|---|---|---|---|
| a. Small, globular protein data set | | | | | | | |
| 1cy5ᵃ | A | Apaf-1 caspase recruitment domain | X-ray | 1.30 | 92 | All α | −3 |
| 1fm0 | D | Molybdopterin synthase subunit MoaD | X-ray | 1.45 | 81 | α + β | −7 |
| 1gxu | A | Hydrogenase maturation factor HypF acylphosphatase-like domain | X-ray | 1.27 | 88 | α + β | −2 |
| 1k5n | B | beta-2-microglobulin, light chain | X-ray | 1.09 | 100 | All β | −1 |
| 1mfgᵃ | A | Erb-B2 interacting protein | X-ray | 1.25 | 95 | All β | −1 |
| 1opd | A | Histidine-containing protein | X-ray | 1.50 | 85 | α + β | −2 |
| 1r6jᵃ | A | PDZ2 domain of syntenin | X-ray | 0.73 | 82 | All β | 0 |
| 1urr | A | Drosophila melanogaster acylphosphatase | X-ray | 1.50 | 97 | α + β | 6 |
| 1wm3ᵃ | A | Human SUMO-2 protein | X-ray | 1.20 | 72 | α + β | −1 |
| 1xmt | A | Putative acetyltransferase | X-ray | 1.15 | 95 | α + β | 1 |
| b. CASPR data set | | | | | | | |
| 1xe1 | A | Hypothetical protein from Pyrococcus furiosus | X-ray | 2.00 | 91 | All β | 3 |
| 1vm0 | A | Hypothetical protein from Arabidopsis thaliana | X-ray | 1.80 | 103 | α + β | 3 |
| 1vla | A | Hydroperoxide resistance protein OsmC | X-ray | 1.80 | 138 | α + β | 0 |
| 1whz | A | Hypothetical protein from Thermus thermophilus | X-ray | 1.52 | 70 | α + β | 4 |
| 1tvg | A | Human PP25 gene product, HSPC034 | X-ray | 1.60 | 137 | All β | −13 |
| 1xg8 | A | Hypothetical protein from Staphylococcus aureus | X-ray | 2.10 | 102 | α/β | −7 |
| 1o13 | A | Hypothetical NifB protein in FeMo-Co biosynthesis | X-ray | 1.83 | 107 | α/β | 0 |

The items listed include the PDB entry name, chain identifier, description of the biological function and source of the protein, structure determination method, resolution (Å), total number of residues, SCOP secondary structure class and net charge at pH 7.0. For the CASPR target proteins, gaps in the crystal structures were completed using LOOPY and the coordinates of missing side chains were generated using SCAP. The leader sequence was removed and selenomethionine residues (MSE) were changed to methionine (MET).

ᵃ For these proteins multiple starting models were tested in the refinement (see Table II).

Figure 1.


A cartoon representation of the crystal structures of the 17 proteins used to test the global refinement protocol. PDB entry names 1a. 1cy5; 1b. 1fm0; 1c. 1gxu; 1d. 1k5n; 1e. 1mfg; 1f. 1opd; 1g. 1r6j; 1h. 1urr; 1i. 1wm3; 1j. 1xmt; 1k. 1xe1; 1l. 1vm0; 1m. 1vla; 1n. 1whz; 1o. 1tvg; 1p. 1xg8; and 1q. 1o13.

Refinement Protocol

The refinement protocol consisted of three phases: 1) the identification and correction of local structural errors, 2) the sampling of conformational space around the original model using REMD (or other MD procedures), and 3) the selection of native-like structures using a variety of statistical potentials and scoring functions.

Local structural evaluation and correction

The purpose of this step was to identify and correct local structural errors in side chains, loops and SSEs. First, the local quality of the model was assessed using two statistical potentials and an empirical energy function, each of which provides a normalized quality score per residue: the DFIRE potential,68 an inverse Born radius (IBR)-based environmental potential (Zhu and Honig, in preparation) and a tabulated soft-core van der Waals potential.84 The per-residue quality scores were plotted against residue number, and the structural details of residues with scores above 2.0 were inspected visually. Side chains, loops and segments deemed to be problematic were remodelled without reference to the native structure. Side chains were repacked using SCAP,82 loops were remodelled using LOOPY83 and alternative SSE conformations were generated using SegSam.77 The local quality of the resulting structures was then re-evaluated to ensure that the quality of the model had improved. Finally, the entire model structure was energy minimized (EM) using minimize.x within the TINKER package.85 For this minimization an all-atom OPLS force field was used together with an implicit solvation model that combines the mAGB model86,87 with a surface term.88
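The residue-flagging step can be illustrated with a short sketch. It is not the authors' code: it assumes the three normalized per-residue scores (DFIRE, IBR and soft-core van der Waals) have already been computed and are supplied as equal-length arrays, and it treats a residue as problematic if any one of the three scores exceeds 2.0, matching the definition of Perc in Table II. The helper names and the toy scores are hypothetical.

```python
import numpy as np

CUTOFF = 2.0  # per-residue quality-score threshold quoted in the text


def flag_problematic(dfire, ibr, vdw, cutoff=CUTOFF):
    """Boolean mask: True where ANY of the three normalized per-residue
    scores exceeds the cutoff (hypothetical helper)."""
    scores = np.vstack([dfire, ibr, vdw])  # shape (3, n_residues)
    return (scores > cutoff).any(axis=0)


def percent_problematic(mask):
    """Percentage of flagged residues, analogous to Perc in Table II."""
    mask = np.asarray(mask, dtype=bool)
    return 100.0 * mask.sum() / mask.size


def flagged_segments(mask, first_resnum=1):
    """Group consecutive flagged residues into (first, last) residue-number
    ranges, e.g. to list candidate loops or segments for visual inspection."""
    segments, start = [], None
    for i, bad in enumerate(mask):
        if bad and start is None:
            start = i
        elif not bad and start is not None:
            segments.append((start + first_resnum, i - 1 + first_resnum))
            start = None
    if start is not None:
        segments.append((start + first_resnum, len(mask) - 1 + first_resnum))
    return segments


# Toy scores for a 4-residue stretch (invented numbers, for illustration only)
mask = flag_problematic([0.5, 2.3, 1.0, 0.2],
                        [0.1, 0.4, 2.5, 0.3],
                        [1.9, 0.0, 0.0, 0.0])
print(mask.tolist(), percent_problematic(mask), flagged_segments(mask))
```

Grouping flagged residues into contiguous ranges mirrors how problems were handled in practice: an isolated residue suggests a side-chain repack, while a longer run points to a loop or segment to be remodelled.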

Conformational sampling

After local refinement, the 21 models were simulated using REMD in explicit water with a modified version of the GROMACS package.79–81 In the REMD scheme,61 a number of independent simulations (replicas) are performed simultaneously at temperatures ranging from T1 to TM, where M is the number of replicas. At regular intervals, the temperatures of neighbouring replicas i and j are exchanged according to the following Metropolis criterion:

$$\mathrm{acc}(i \leftrightarrow j) = \min\left(1,\ \exp\left[(\beta_i - \beta_j)(E_i - E_j)\right]\right) \qquad (1)$$

where $\beta_i = 1/(k_B T_i)$ is the reciprocal temperature of replica $i$, $k_B$ is the Boltzmann constant, $T_i$ is the temperature (K) and $E_i$ is the potential energy of the system in replica $i$. By allowing replicas to move through a range of temperatures, REMD enables the system to cross energy barriers and access regions of conformational space that would rarely be sampled at standard temperatures.
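A minimal Python sketch of the exchange step in eq. 1 is given below. It is illustrative only and is not the modified GROMACS code used in this work. Energies are assumed to be in kJ/mol, the temperatures are kept fixed in ascending order while configurations move between them, and neighbouring pairs are attempted on alternating even/odd steps, which is a common scheme but one the text does not specify. All function names are hypothetical.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant, kJ mol^-1 K^-1 (GROMACS energy units)


def exchange_probability(T_i, T_j, E_i, E_j, k_b=K_B):
    """Acceptance of eq. (1): min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    beta_i = 1.0 / (k_b * T_i)
    beta_j = 1.0 / (k_b * T_j)
    delta = (beta_i - beta_j) * (E_i - E_j)
    # exp(delta) >= 1 whenever delta >= 0, so skip the exponential (no overflow)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(temps, energies, configs, k, rng):
    """Try to swap the configurations held at temperature slots k and k + 1.

    temps is sorted ascending and never changes; energies[k] and configs[k]
    belong to the configuration currently simulated at temps[k].
    """
    p = exchange_probability(temps[k], temps[k + 1], energies[k], energies[k + 1])
    if rng.random() < p:
        energies[k], energies[k + 1] = energies[k + 1], energies[k]
        configs[k], configs[k + 1] = configs[k + 1], configs[k]
        return True
    return False


def exchange_sweep(temps, energies, configs, step, rng):
    """Attempt exchanges on even pairs (0-1, 2-3, ...) or odd pairs (1-2, ...)."""
    first = step % 2
    return [attempt_exchange(temps, energies, configs, k, rng)
            for k in range(first, len(temps) - 1, 2)]


# Toy usage with invented energies (kJ/mol) for three temperature slots
rng = random.Random(0)
temps = [280.0, 283.0, 286.0]
energies = [-1000.0, -1020.0, -990.0]
configs = ["A", "B", "C"]
accepted = exchange_sweep(temps, energies, configs, step=0, rng=rng)
print(accepted, configs)  # prints [True] ['B', 'A', 'C']
```

In the toy example the lower-energy configuration sits at the higher temperature, so $(\beta_i - \beta_j)(E_i - E_j) > 0$ and the swap is accepted with probability one. The reverse move would be accepted only with probability below one.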

The GROMOS96 43a1 force field89,90 was used in all simulations. The protonation states of the ionizable amino acids were set as appropriate for pH 7.0, assuming standard pKa values. No counter-ions were added to neutralize the system. Each model was solvated in a rhombic dodecahedron box using the SPC water model.91 The minimum distance between the solute and the wall of the unit cell was 10 Å. A twin-range method was used to calculate the non-bonded interactions: interactions within the short-range cutoff of 9 Å were evaluated every step, while interactions between 9 Å and the long-range cutoff of 14 Å were evaluated every 5 steps, together with the pairlist update. A reaction-field correction92 with a dielectric constant for water of 78 was applied to electrostatic interactions beyond 14 Å. Covalent bonds in the proteins were constrained using the LINCS algorithm93 and the geometry of the water molecules was constrained using the SETTLE algorithm.94 A time step of 2 fs was used. The protein-water system was first minimized using the steepest descent method and then equilibrated by performing a 100 ps MD simulation with positional restraints on the heavy atoms of the protein. The restrained MD simulations were performed at a constant temperature of 300 K and a constant pressure of 1 bar by coupling to an external heat bath and an isotropic pressure bath.95 In the REMD simulations, the target temperatures of the replicas were chosen to span 280 to 320 K using the method proposed by Garcia and Sanbonmatsu.62 For each protein model, the solvated system was equilibrated at five temperatures, T = 275, 287, 300, 312 and 325 K, and the average energies E from the five simulations were fitted by a polynomial in T. Eq. 1 was then solved iteratively between 280 and 320 K for a target exchange probability P(exchange) ≈ 0.20. The 21 protein models, each with its own number of replicas (Table III), were subjected to 5 ns of REMD at constant (N, V, T), with exchanges attempted every 1 ps.
Snapshots were stored every 2.5 ps, which led to a total of 2000 conformations per replica (temperature). The conformations corresponding to the 5 lowest temperatures (e.g. for 1fm0 T = 276.4, 279.5, 282.6, 285.7, 288.9; for 1vm0 T = 276.4, 278.0, 279.5, 281.1, 282.7) were subjected to further analysis.
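The temperature-ladder construction described above can be sketched as follows, under stated assumptions. The polynomial order used for the fit is not given in the text, so a quadratic is assumed. The fitted average energies are inserted into eq. 1, and each successive temperature is found by bisection so that the estimated exchange probability equals 0.20. The equilibration energies in the example are invented (they correspond to a constant heat capacity of 150 kJ mol⁻¹ K⁻¹), so the resulting ladder is purely illustrative and the function names are hypothetical.

```python
import numpy as np

K_B = 0.0083144626  # kJ mol^-1 K^-1


def fit_mean_energy(temps, energies, degree=2):
    """Polynomial fit of the average potential energy <E>(T); returns a callable."""
    return np.poly1d(np.polyfit(temps, energies, degree))


def estimated_acceptance(T1, T2, mean_energy, k_b=K_B):
    """Eq. (1) evaluated with fitted average energies instead of instantaneous ones."""
    delta = (1.0 / (k_b * T1) - 1.0 / (k_b * T2)) * (mean_energy(T1) - mean_energy(T2))
    return min(1.0, float(np.exp(delta)))


def next_temperature(T, mean_energy, p_target=0.20, max_gap=50.0, tol=1e-6):
    """Bisection for T' > T such that estimated_acceptance(T, T') == p_target.
    Assumes <E>(T) increases with T, so the acceptance falls as T' rises."""
    lo, hi = T, T + max_gap
    if estimated_acceptance(T, hi, mean_energy) > p_target:
        raise ValueError("max_gap too small for the requested acceptance")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if estimated_acceptance(T, mid, mean_energy) > p_target:
            lo = mid  # gap still too small: acceptance above target
        else:
            hi = mid
    return 0.5 * (lo + hi)


def temperature_ladder(T_min, T_max, mean_energy, p_target=0.20):
    """Add replicas from T_min until the ladder reaches (or just passes) T_max."""
    ladder = [T_min]
    while ladder[-1] < T_max:
        ladder.append(next_temperature(ladder[-1], mean_energy, p_target))
    return ladder


# Invented equilibration data at the five temperatures used in the text
T_eq = [275.0, 287.0, 300.0, 312.0, 325.0]
E_eq = [-4.0e5 + 150.0 * (T - 300.0) for T in T_eq]
mean_E = fit_mean_energy(T_eq, E_eq)
ladder = temperature_ladder(280.0, 320.0, mean_E)
print(len(ladder), [round(T, 1) for T in ladder])
```

Because the energy gap between neighbouring temperatures grows with the size of the solvated system, larger systems need smaller temperature spacings, and hence more replicas, for the same target probability. This presumably underlies the different numbers of replicas listed in Table III (e.g. 14 for 1fm0 and 26 for 1vm0).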

Table III.

RMSD analysis of structures sampled by REMD simulation.ᵃ

| Model | No. of replicas | RMSDinit (Å) | RMSDminᵇ (Å) | RMSDmax (Å) | RMSDave ± std (Å) | Perc. (%) |
|---|---|---|---|---|---|---|
| a. Small, globular protein data set | | | | | | |
| 1cy5_1 | 16 | 2.77 | **1.77** | 7.40 | 3.33 ± 0.97 | 34.1 |
| 1cy5_2 | 14 | 1.90 | **1.31** | 5.84 | 2.14 ± 0.33 | 17.7 |
| 1fm0 | 14 | 2.23 | **1.12** | 3.32 | 2.09 ± 0.38 | 63.8 |
| 1gxu | 16 | 1.81 | **1.09** | 3.19 | 1.96 ± 0.32 | 28.9 |
| 1k5n | 20 | 2.05 | 1.69 | 2.66 | 2.15 ± 0.15 | 29.7 |
| 1mfg_1 | 16 | 2.41 | 2.19 | 3.67 | 2.93 ± 0.25 | 3.0 |
| 1mfg_2 | 16 | 1.73 | **1.16** | 4.04 | 2.09 ± 0.48 | 20.6 |
| 1opd | 16 | 3.22 | **2.32** | 5.32 | 3.43 ± 0.53 | 39.3 |
| 1r6j_1 | 14 | 2.15 | **1.07** | 4.51 | 2.68 ± 0.78 | 33.3 |
| 1r6j_2 | 14 | 1.77 | 1.32 | 3.40 | 2.17 ± 0.33 | 12.3 |
| 1urr | 18 | 2.01 | **0.96** | 3.99 | 2.61 ± 0.69 | 21.0 |
| 1wm3_1 | 14 | 2.38 | **1.85** | 3.61 | 2.52 ± 0.22 | 24.6 |
| 1wm3_2 | 14 | 1.63 | **1.05** | 3.90 | 2.33 ± 0.48 | 7.5 |
| 1xmt | 20 | 2.69 | **1.94** | 4.16 | 2.98 ± 0.38 | 22.1 |
| b. CASPR data set | | | | | | |
| 1xe1 | 16 | 1.26 | 0.94 | 3.04 | 1.56 ± 0.27 | 13.7 |
| 1vm0 | 26 | 3.00 | **2.50** | 9.20 | 4.03 ± 1.01 | 9.5 |
| 1vla | 18 | 2.26 | 2.20 | 7.21 | 4.01 ± 0.82 | 0.0 |
| 1whz | 16 | 1.69 | **0.75** | 3.82 | 1.98 ± 0.72 | 41.8 |
| 1tvg | 20 | 4.29 | **3.15** | 6.27 | 4.48 ± 0.58 | 33.0 |
| 1xg8 | 18 | 4.68 | **3.88** | 8.65 | 6.03 ± 0.91 | 6.8 |
| 1o13 | 18 | 1.90 | 1.51 | 3.49 | 2.48 ± 0.38 | 5.3 |

The items listed include the model name, number of replicas used in REMD simulations, initial RMSD in the REMD simulation, minimum and maximum RMSD that can be found in the REMD simulation, average RMSD (and standard deviation), and the percentage of conformations that have lower RMSD than the starting structure.

ᵃ RMSD here denotes the SSE-RMSD, the backbone RMSD of secondary structure elements. Only conformations sampled in the 5 REMD simulations with the lowest temperatures were used in the calculation of items listed in columns 3 to 6.

ᵇ The RMSD values that are at least 0.5 Å lower than the initial values are in bold.
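The per-model statistics in Table III can be computed from the pooled SSE-RMSD values of the five lowest-temperature replicas (5 × 2000 snapshots per model). The sketch below is an illustration under assumptions, not the authors' analysis script: the per-snapshot SSE-RMSD values are taken as given, one array per temperature is assumed as the data layout, the standard deviation uses numpy's default population formula (the paper does not say which was used), and the toy values are invented.

```python
import numpy as np


def rmsd_summary(rmsd_values, rmsd_init):
    """Table III-style statistics for pooled per-snapshot SSE-RMSD values (Å)."""
    r = np.asarray(rmsd_values, dtype=float)
    return {
        "min": float(r.min()),
        "max": float(r.max()),
        "mean": float(r.mean()),
        # population standard deviation (ddof=0); the paper does not specify
        "std": float(r.std()),
        # percentage of snapshots closer to the native than the starting model
        "perc_lower": 100.0 * float(np.mean(r < rmsd_init)),
        # bold criterion of Table III: RMSDmin at least 0.5 Å below RMSDinit
        "min_at_least_0p5_lower": bool(r.min() <= rmsd_init - 0.5),
    }


def pool_replicas(per_temperature_rmsd, temps, n_lowest=5):
    """Concatenate the RMSD series of the n_lowest temperatures
    (hypothetical layout: one array per temperature, same order as temps)."""
    order = np.argsort(temps)[:n_lowest]
    return np.concatenate([np.asarray(per_temperature_rmsd[i], dtype=float)
                           for i in order])


# Invented toy values: three temperatures, two snapshots each; pool the lowest two
pooled = pool_replicas([[2.0, 1.5], [2.5, 1.0], [3.0, 3.2]],
                       [280.0, 283.0, 286.0], n_lowest=2)
summary = rmsd_summary(pooled, rmsd_init=2.0)
print(summary)
```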

Two additional sampling protocols based on conventional MD were tested as controls: a single long simulation of 50 ns, and a series of ten short independent simulations of 5 ns each (10 × 5 ns). The control simulations were performed at 300 K using the same parameter settings as the REMD simulations, but with different initial velocities generated for each simulation.

Model selection using statistical potentials

The main scoring function used for model selection, RAPDF/HB, combines two statistical potentials: a modified version of the RAPDF potential that uses the conditional probability reference state proposed by Samudrala et al66 together with the distance binning procedure of DFIRE by Zhou et al,68 and a modified version of the orientation-dependent hydrogen bonding potential of Kortemme et al.96 Both RAPDF and DFIRE are atom-based, distance-dependent pairwise potentials that can be used to evaluate atom-atom interactions within a given protein structure. In our implementation of the hydrogen bonding potential, 13 hydrogen bonding patterns were explicitly defined according to the secondary structure type and location (backbone or side chain) of the donor and acceptor. A potential-of-mean-force (PMF) table was then derived for each of these patterns from the statistics of a database of high-resolution crystal structures. Note that these statistical potentials do not distinguish between free (protonated) cysteines and those that form disulfide bonds, whereas the physical energy functions used in this study do. The two scores were combined using a scaling factor of 1.0 for the RAPDF score and 5.0 for the hydrogen bonding score. These weights were chosen so that the two terms made comparable contributions to the final score; they were not optimized to give the best performance on this dataset.
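The construction of a distance-dependent pair potential with a conditional probability reference state, and the weighted combination used in RAPDF/HB, can be sketched as below. This is a simplified illustration rather than the implementation used here. Atom typing, the DFIRE-style distance binning and the 13-pattern hydrogen-bonding potential are not reproduced. The count array, the small pseudocount (added only to avoid log(0)) and the convention that lower scores are better are assumptions; only the weights 1.0 and 5.0 come from the text. Function names are hypothetical.

```python
import numpy as np


def rapdf_table(counts, pseudocount=1e-6):
    """Pair potential with a conditional-probability reference state.

    counts[a, b, k] = number of atom pairs of types (a, b) observed in distance
    bin k in a database of native structures. Returns
    E[a, b, k] = -ln[ P(k | a, b) / P(k) ], where P(k) pools all type pairs.
    """
    c = np.asarray(counts, dtype=float) + pseudocount
    p_k_given_ab = c / c.sum(axis=2, keepdims=True)
    p_k = c.sum(axis=(0, 1)) / c.sum()
    return -np.log(p_k_given_ab / p_k)


def rapdf_score(table, pairs):
    """Sum the table over the (type_a, type_b, bin) triples of one structure."""
    return float(sum(table[a, b, k] for a, b, k in pairs))


def rapdf_hb_score(rapdf, hbond, w_rapdf=1.0, w_hb=5.0):
    """Weighted RAPDF/HB combination with the weights quoted in the text."""
    return w_rapdf * np.asarray(rapdf, float) + w_hb * np.asarray(hbond, float)


def top_ranked(scores, n=5):
    """Indices of the n best (lowest) scores."""
    return np.argsort(scores)[:n]


# Toy counts for two atom types and two distance bins (invented numbers)
counts = np.zeros((2, 2, 2))
counts[0, 0] = [30, 10]
counts[1, 1] = [10, 30]
counts[0, 1] = counts[1, 0] = [20, 20]
table = rapdf_table(counts)

# Rank three hypothetical conformations by the combined score
combined = rapdf_hb_score([-10.0, -12.0, -8.0], [-2.0, -1.0, -3.0])
print(combined.tolist(), top_ranked(combined, n=2).tolist())
```

With the 5:1 weighting, the third conformation ranks first even though its RAPDF term alone is the least favourable, which illustrates how strongly the hydrogen-bonding term can influence the ranking.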

Two alternative scoring functions were taken from the ROSETTA package and tested for comparison (http://depts.washington.edu/ventures/UW_Technology/Express_Licenses/Rosetta). The all-atom form of the ROSETTA scoring function53 contains seven terms: rama (Ramachandran torsion preferences), LJ (Lennard-Jones interactions), hb (hydrogen bonding), solv (solvation), pair (residue pair interactions such as electrostatics and disulfides), dun (rotamer self-energy) and ref (unfolded-state reference energy). The first function tested was the default ROSETTA score containing all the energy terms as described by Kuhlman et al.97 (ROSETTA_soft). In this function, the softened Lennard-Jones (LJ) potential option was used to compensate for the differences between the atom radii used in the MD simulations (GROMOS96 43a1) and those used in ROSETTA (CHARMM22).98 As an alternative, a subscore of the ROSETTA scoring function (“bk_tot” in the ROSETTA output) was used to evaluate the models (ROSETTA_sub). This function contains the same energy terms as ROSETTA_soft except for the rama and ref terms, and uses the standard rather than the softened LJ potential. The standard weighting factors were used for all terms in this function.

For model selection, only conformations from the 5 lowest-temperature replicas (starting from ~280 K) of the REMD simulation were scored with RAPDF/HB, ROSETTA_soft and ROSETTA_sub. The SSE-RMSD was used to evaluate the relative effectiveness of the three scores in selecting the most appropriate models. Note that the secondary structure elements were defined as in the corresponding crystal structure.
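The evaluation criterion can be made concrete with a sketch of the SSE-RMSD and of the check used to judge model selection (the lowest SSE-RMSD among the best-ranked structures). This is a generic Kabsch superposition, not the authors' analysis script. It assumes that model and native backbone atoms are supplied in matching order, that superposition is performed on the SSE backbone atoms themselves, and that a lower score means a better model; all names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal rigid-body
    superposition (Kabsch algorithm, reflections excluded)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                   # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T       # proper rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def sse_rmsd(model_bb, native_bb, in_sse):
    """Backbone RMSD restricted to atoms flagged True in in_sse.

    model_bb and native_bb list the same backbone atoms in the same order;
    in_sse marks atoms of residues assigned to SSEs in the crystal structure.
    """
    mask = np.asarray(in_sse, dtype=bool)
    return kabsch_rmsd(np.asarray(model_bb)[mask], np.asarray(native_bb)[mask])


def best_of_top(scores, rmsds, n=5):
    """Lowest SSE-RMSD among the n best-scored (lowest-score) conformations."""
    order = np.argsort(scores)[:n]
    return float(np.min(np.asarray(rmsds, dtype=float)[order]))
```

Applied to each scoring function in turn, `best_of_top(scores, rmsds, n=5)` gives the quantity compared with the starting SSE-RMSD when asking whether a better structure appears among the 5 best-ranked conformations.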

RESULTS

Local structural evaluation and correction

The SSE-RMSD values and the percentage of problematic residues before and after local structure correction are listed in Table II. In general, local structure correction did not have a major effect on the SSE-RMSD (± 0.1 Å) but significantly reduced the percentage of residues with a quality score greater than 2.0. In cases where a high percentage of residues (> 30%) were problematic, such as 1opd, 1xmt, 1tvg and 1xg8, the SSE-RMSD of the model increased after local structure correction and energy minimization. This reflects the low quality of these models and the difficulty of performing manual adjustments in such cases. For the models of 1gxu and 1whz, which had ~30% problematic residues, the SSE-RMSD improved after model correction and minimization. These two models also showed marked improvement after conformational sampling using REMD and model selection. In most cases the secondary structure elements were held fixed during local structural correction. However, for the models of 1opd and 1xg8, in which 61.18% and 83.33% of the residues, respectively, were considered problematic, SegSam77 was used to generate alternative conformations for those secondary structure elements that contained a high percentage of residues with quality scores > 2.0. Conformations with improved quality scores were selected but, as indicated by the SSE-RMSD values in Table II, these adjustments caused the model to deteriorate in a global sense.

Table II.

Structural properties of the 21 models used to test the global refinement protocol and the corresponding modeling information.

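| Model | Template(s) | Exp. | Seq. ID (%) | RMSD0ᵃ (Å) | Perc0ᵇ (%) | PercSSE0ᶜ (%) | Local refinementᵈ | RMSD1ᵃ (Å) | Perc1ᵇ (%) |
|---|---|---|---|---|---|---|---|---|---|
| a. Small, globular protein data set | | | | | | | | | |
| 1cy5_1 | 1c15 | NMR | 100 | 2.73 | 2.2 | 1.1 | EM | 2.77 | 0.0 |
| 1cy5_2 | 3ygs | X-ray | 20 | 1.89 | 8.6 | 6.5 | L, SC, EM | 1.90 | 1.1 |
| 1fm0 | 1v8c | X-ray | 26 | 2.19 | 29.6 | 18.5 | L, SC, EM | 2.23 | 3.7 |
| 1gxu | 1ulr | X-ray | 37 | 2.11 | 30.7 | 21.6 | L, SC, EM | 1.81 | 6.8 |
| 1k5n | 1je6 | X-ray | 16 | 2.11 | 13.0 | 3.0 | L, SC, EM | 2.05 | 1.0 |
| 1mfg_1 | 1qav | X-ray | 28 | 2.44 | 13.7 | 0.0 | L, SC, EM | 2.41 | 2.1 |
| 1mfg_2 | 1qlc | NMR | 36 | 1.77 | 35.8 | 5.3 | L, SC, EM | 1.73 | 2.1 |
| 1opd | 1k1c | NMR | 42 | 2.57 | 61.2 | 43.5 | L, SC, SG, EM | 3.22 | 3.5 |
| 1r6j_1 | 1ry4 | NMR | 28 | 2.15 | 7.3 | 2.4 | SC, EM | 2.15 | 1.2 |
| 1r6j_2 | 1d5g | NMR | 17 | 1.76 | 17.1 | 7.3 | EM | 1.77 | 1.2 |
| 1urr | 1y9o | NMR | 29 | 2.00 | 15.5 | 8.2 | SC, EM | 2.01 | 1.0 |
| 1wm3_1 | 1c3t | NMR | 16 | 2.55 | 6.9 | 5.6 | L, SC, EM | 2.38 | 0.0 |
| 1wm3_2 | 1l7y | NMR | 13 | 1.79 | 18.1 | 9.7 | EM | 1.63 | 0.0 |
| 1xmt | 1r57 | NMR | 26 | 2.40 | 31.6 | 23.2 | L, SC, EM | 2.69 | 4.2 |
| b. CASPR data set | | | | | | | | | |
| 1xe1 | – | – | – | 1.33 | 11.0 | 7.7 | EM | 1.26 | 5.5 |
| 1vm0 | 1h0x | X-ray | – | 3.05 | 13.6 | 8.7 | SC, EM | 3.00 | 0.0 |
| 1vla | 1ml8 | X-ray | – | 2.25 | 3.6 | 1.4 | EM | 2.26 | 5.8 |
| 1whz | – | – | – | 1.75 | 31.4 | 10.0 | SC, EM | 1.69 | 2.9 |
| 1tvgᵉ | 1jhj, 1eut, 1czs, 1kex, 1k12 | X-ray | – | 4.14 | 33.6 | 21.9 | SC, EM | 4.29 | 7.3 |
| 1xg8ᵉ | 1h75, 1b4q, 1ego, 1eej | X-ray, NMR | – | 3.03 | 83.3 | 55.9 | L, SC, SG, EM | 4.68 | 10.8 |
| 1o13 | 1eo1 | NMR | – | 1.84 | 18.7 | 8.4 | SC, EM | 1.90 | 1.9 |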

The items listed include the model name, PDB entry name of template(s), structure determination method for the template(s), sequence identity, SSE-RMSD and percentage of problematic residues before the local refinement, percentage of problematic SSE residues before the local refinement, modeling operations taken in the local refinement, SSE-RMSD and percentage of problematic residues after the local refinement and energy minimization. ‘–’ denotes that no information is available.

ᵃ RMSD denotes SSE-RMSD, the backbone RMSD of the secondary structure elements.

ᵇ Perc is the percentage of problematic residues with a score higher than 2.0 from any of the three scoring functions.

ᶜ PercSSE0 is the percentage of problematic residues within SSEs with a score higher than 2.0 before the local refinement.

ᵈ The modeling operations include L, loop prediction; SC, side chain prediction; SG, generation of alternative conformations for the structural segments containing secondary structure elements; EM, energy minimization of the entire model structure.

ᵉ For these two CASPR targets the model was built using fragments of multiple templates.

To illustrate how local structural errors were identified and corrected, the potential scores for the model of 1urr before and after local structure correction are plotted in Figure 2. Two spatially related regions, residues 22 to 24 and residues 47 to 49, were considered problematic. As can be seen in Figure 3a, residue R23 (charged) is buried within the protein in contact with two hydrophobic residues, F22 and V47. After repacking (Figure 3b), R23′ is exposed to water and is in roughly the same position and orientation as in the native conformation. In addition, a small rotation of the phenyl ring of F22 makes the orientations of both F22 and V47 more native-like. Another region, residues 67 to 70, is marked by high van der Waals scores but is not flagged by the other two potentials, suggesting a simple violation of local geometry. Therefore, no remodelling was performed in this region; energy minimization alone resulted in a large improvement in the van der Waals scores.

Figure 2.


The per-residue local quality score of the model for protein 1urr (PDB entry name) is plotted as a function of residue number. Three normalized, residue-based scoring functions are used in the local quality assessment including the DFIRE potential, inverse Born radius (IBR)-based environmental potential and tabulated soft-core van der Waals potential. The local quality scores before and after side chain repacking are shown in magenta and black, respectively. The dotted line in blue denotes the cutoff used for the local quality score, a value of 2.0.

Figure 3.


Superposition of the native structure of protein 1urr (PDB entry name) and the model structure before and after side chain repacking, shown in 3a and 3b, respectively. Residues 22 to 24 and 47 to 49 are shown as sticks, while the rest of the protein is shown as a shaded cartoon. In the stick models, carbon atoms of the native structure are yellow and those of the model structure are cyan. In the cartoon, the native structure is gray and the model structure is green. Residue names and numbers are labeled for the native structure only, except for R23, for which the corresponding residue in the model structure is labeled R23′.

Conformational sampling with REMD

The main criterion used to evaluate sampling efficiency was the SSE-RMSD with respect to the native structure (see Table III). For 15 of the 21 cases investigated, the lowest SSE-RMSD (RMSDmin) was at least 0.5 Å lower than that of the starting structure (RMSDinit); for these 15 cases the average improvement in SSE-RMSD was 0.82 Å. This suggests that REMD is an effective method for obtaining near-native conformations. Conformations that were more than 0.2 Å closer to the native structure than the starting structure were found for all models except 1vla. The structure 1vla corresponds to the hydroperoxide resistance protein OsmC. OsmC is a domain-swapped dimer in which the two monomers are arranged head-to-head.99–101 In the absence of the second monomer, there is a significant change in the orientation of the N-terminal domain with respect to the large C-terminal domain.
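The summary numbers in this paragraph can be recomputed directly from the RMSDinit and RMSDmin columns of Table III. The short script below uses values transcribed from that table, with differences rounded to its two-decimal precision. It counts the models whose RMSDmin is at least 0.5 Å below RMSDinit, averages their improvement, and lists the models improved by less than 0.2 Å.

```python
# (RMSDinit, RMSDmin) in Å, transcribed from Table III
TABLE_III = {
    "1cy5_1": (2.77, 1.77), "1cy5_2": (1.90, 1.31), "1fm0": (2.23, 1.12),
    "1gxu": (1.81, 1.09), "1k5n": (2.05, 1.69), "1mfg_1": (2.41, 2.19),
    "1mfg_2": (1.73, 1.16), "1opd": (3.22, 2.32), "1r6j_1": (2.15, 1.07),
    "1r6j_2": (1.77, 1.32), "1urr": (2.01, 0.96), "1wm3_1": (2.38, 1.85),
    "1wm3_2": (1.63, 1.05), "1xmt": (2.69, 1.94), "1xe1": (1.26, 0.94),
    "1vm0": (3.00, 2.50), "1vla": (2.26, 2.20), "1whz": (1.69, 0.75),
    "1tvg": (4.29, 3.15), "1xg8": (4.68, 3.88), "1o13": (1.90, 1.51),
}

# Improvement RMSDinit - RMSDmin, rounded to the table's precision
gains = {m: round(init - rmin, 2) for m, (init, rmin) in TABLE_III.items()}

large = {m: g for m, g in gains.items() if g >= 0.5}
mean_large = sum(large.values()) / len(large)
below_0_2 = sorted(m for m, g in gains.items() if g < 0.2)

print(len(large), round(mean_large, 2), below_0_2)  # prints 15 0.82 ['1vla']
```

Averaged over all 21 models rather than these 15, the improvement in RMSDmin is smaller, so the 0.82 Å figure applies specifically to the cases with at least a 0.5 Å gain.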

The spread of SSE-RMSD values, indicated by the difference between RMSDmin and RMSDmax, shows that a wide variety of conformational states was sampled in the REMD simulations. The percentage of conformations closer to the native state than the initial model varied greatly between models. In one case (1fm0) the majority of the conformations sampled (~64%) were closer to the native structure than the starting structure. On average, however, only 22.3% of the conformations sampled had SSE-RMSD values lower than that of the starting structure, and for 5 of the 21 cases fewer than 10% of the conformations sampled were closer to the native state than the original model.
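The enrichment used throughout this section, the percentage of sampled conformations whose SSE-RMSD is below that of the starting model, is straightforward to compute from a per-frame SSE-RMSD series. The sketch below assumes such a series is available as a plain list (the function name is illustrative) and also reproduces the 22.3% average from the REMD column of Table IV.

```python
def enrichment(rmsd_trajectory, rmsd_init):
    """Percentage of sampled conformations with SSE-RMSD strictly below the starting value."""
    if not rmsd_trajectory:
        raise ValueError("empty trajectory")
    below = sum(1 for r in rmsd_trajectory if r < rmsd_init)
    return 100.0 * below / len(rmsd_trajectory)

# REMD "Perc. (%)" column of Table IV (21 models, in table order).
REMD_PERC = [34.1, 17.7, 63.8, 28.9, 29.7, 3.0, 20.6, 39.3, 33.3, 12.3, 21.0,
             24.6, 7.5, 22.1, 13.7, 9.5, 0.0, 41.8, 33.0, 6.8, 5.3]

print(round(sum(REMD_PERC) / len(REMD_PERC), 1))  # 22.3
print(max(REMD_PERC))                              # 63.8 (1fm0)

# Toy trajectory (hypothetical SSE-RMSD values in Å) against a 2.0 Å starting model.
print(enrichment([2.3, 1.9, 2.1, 1.7], 2.0))       # 50.0
```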

Factors affecting REMD sampling

The model 1mfg_1 was used to illustrate possible effects of the simulation protocol on sampling efficiency. This model was selected because of its poor enrichment in lower SSE-RMSD conformations (3.0%). Three REMD protocols were tested. First, to determine the reproducibility of the results, the simulations were rerun with different initial random atomic velocities, using the same temperature series as in the original REMD simulation. In the second and third protocols, four additional replicas were used either to extend the temperature range or to reduce the temperature gap between replicas. The temperature series for these two protocols were derived using the same rules as for the standard protocol. The lowest SSE-RMSD values found with the three protocols were 2.13, 2.16 and 2.03 Å, and the enrichment of lower SSE-RMSD conformations was 4.4, 7.1 and 8.0%, respectively, compared with 2.19 Å and 3.0% for the original simulation (Table IV). Increasing the number of replicas thus appears to have had a greater effect on sampling than changing the initial velocities.
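This section does not restate the rule used to build the temperature series. A common choice, assumed here purely for illustration, is geometric spacing, T_i = T_min (T_max/T_min)^(i/(n-1)), which keeps the ratio between neighbouring temperatures constant. The sketch shows how four extra replicas can either extend the range (same ratio, higher top temperature) or shrink the gaps (same range, more replicas). The 300 K to 450 K range and the 16-replica baseline are hypothetical values, not the settings used in the paper.

```python
def geometric_ladder(t_min, t_max, n):
    """n temperatures spaced geometrically between t_min and t_max (inclusive)."""
    if n < 2:
        raise ValueError("need at least two replicas")
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** i for i in range(n)]

def extend_ladder(ladder, extra):
    """Append `extra` replicas above the top temperature, keeping the same ratio."""
    ratio = ladder[1] / ladder[0]
    return ladder + [ladder[-1] * ratio ** (k + 1) for k in range(extra)]

# Hypothetical baseline: 16 replicas spanning 300-450 K.
base = geometric_ladder(300.0, 450.0, 16)
wider = extend_ladder(base, 4)               # same ratio, higher maximum temperature
denser = geometric_ladder(300.0, 450.0, 20)  # same range, smaller gaps between replicas

print([round(t, 1) for t in wider[-4:]])
print(round(base[1] - base[0], 2), round(denser[1] - denser[0], 2))
```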

To illustrate possible effects of the initial conformation and the secondary structure composition of the protein on the extent of sampling in a REMD simulation, a series of simulations was performed starting from the native structures of three proteins, 1cy5, 1urr and 1k5n, using the same protocol as for the refinement of the models. The three proteins have distinct secondary structure compositions: as can be seen from Table I and Figure 1, 1cy5 is an all-α protein, 1urr is an α+β protein and 1k5n is an all-β protein. In addition, 1k5n has a disulfide bond that holds its two planar β-sheets tightly against each other. The mean SSE-RMSD values were 1.47, 0.94 and 0.84 Å, and the standard deviations were 0.69, 0.24 and 0.14 Å, for 1cy5, 1urr and 1k5n, respectively. For 1cy5 and 1urr the range of conformations sampled is much smaller when starting from the native conformation than when starting from the model (see Table III). In the case of 1k5n, sampling is clearly affected by the disulfide bond in both the native structure and the model. The effect of secondary structure composition, if any, was small. The results suggest that, on the time scale investigated, the conformations sampled using REMD are largely determined by the initial conformation of the system.
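All SSE-RMSD values in this section are RMSDs after optimal superposition. The exact atom selection (which backbone atoms are used and how SSE residues are assigned) is defined elsewhere in the paper, so the sketch below assumes the caller passes matched (N, 3) coordinate arrays plus a boolean mask marking the SSE backbone atoms. Superposition uses the Kabsch algorithm (SVD in NumPy) on the same masked atoms; fitting and measuring on the same subset is a common choice but an assumption here.

```python
import numpy as np

def superposed_rmsd(model, reference):
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition (Kabsch)."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    P = P - P.mean(axis=0)              # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)   # SVD of the 3x3 covariance matrix
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q                  # rotate the model onto the reference
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def sse_rmsd(model, reference, sse_mask):
    """Superposed RMSD restricted to the atoms flagged in `sse_mask` (assumed SSE backbone atoms)."""
    mask = np.asarray(sse_mask, dtype=bool)
    return superposed_rmsd(np.asarray(model)[mask], np.asarray(reference)[mask])
```

A rigidly rotated and translated copy of a structure gives an SSE-RMSD of essentially zero, which is a useful sanity check before applying the function to trajectory frames.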

Comparison of REMD and other MD-based protocols

The efficiency of two other MD sampling protocols was also investigated: a single long simulation and multiple short simulations. Table IV compares the results of these approaches with those of REMD on two measures: the lowest SSE-RMSD and the enrichment of conformations with SSE-RMSD values lower than that of the original model. The single 50 ns simulation performed poorly. In most cases less than 2% of the conformations sampled had lower SSE-RMSD values than the initial model. Only in five cases (1gxu, 1mfg_2, 1r6j_2, 1wm3_2 and 1whz) was a significant number of conformations with SSE-RMSD values lower than the original model sampled. Interestingly, in all five of these cases the starting SSE-RMSD was below 2.0 Å. Multiple short MD simulations performed significantly better than a single long trajectory. In 14 of the 21 cases, the lowest SSE-RMSD found in the multiple simulations was more than 0.5 Å lower than the starting value, and on average 17.1% of the conformations sampled had a SSE-RMSD lower than the starting model. A two-way analysis of variance (ANOVA) was used to test whether the differences between multiple MD and REMD were statistically significant. For the lowest SSE-RMSD sampled, the difference was not significant (P=0.65). For the enrichment, however, the difference was borderline significant (P=0.05), suggesting that REMD was still the more effective approach.
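The design of the two-way ANOVA is not spelled out here. One natural reading, assumed in this sketch, treats protocol and model as the two factors with a single observation per cell (no replication), so that models act as blocks. The sketch computes the F statistic for the protocol factor from the RMSDmin columns of Table IV; with only two protocols this F equals the square of the paired t statistic, which the helper `paired_t` lets you check. It does not attempt to reproduce the reported P values.

```python
from math import sqrt
from statistics import mean, stdev

def two_way_anova_no_replication(rows):
    """Two-way ANOVA without replication.

    rows[i][j] is the single value for block i (here: a model) under
    treatment j (here: a sampling protocol).
    Returns (F for the treatment factor, treatment df, error df).
    """
    n, k = len(rows), len(rows[0])
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]
    ss_treat = n * sum((m - grand) ** 2 for m in col_means)
    ss_block = k * sum((m - grand) ** 2 for m in row_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_treat - ss_block  # interaction serves as the residual here
    df_treat, df_error = k - 1, (k - 1) * (n - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return f_treat, df_treat, df_error

def paired_t(rows):
    """Paired t statistic of column 0 minus column 1 (F == t**2 when k == 2)."""
    d = [row[0] - row[1] for row in rows]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Lowest SSE-RMSD (Å) per model from Table IV as (REMD, 10 x 5 ns MD), in table order.
RMSD_MIN = [
    (1.77, 1.66), (1.31, 1.60), (1.12, 1.26), (1.09, 1.14), (1.69, 1.58),
    (2.19, 2.15), (1.16, 0.95), (2.32, 2.61), (1.07, 1.48), (1.32, 1.03),
    (0.96, 1.18), (1.85, 1.47), (1.05, 1.00), (1.94, 1.87), (0.94, 0.96),
    (2.50, 2.64), (2.20, 2.25), (0.75, 0.65), (3.15, 3.38), (3.88, 3.87),
    (1.51, 1.42),
]

f_stat, df1, df2 = two_way_anova_no_replication(RMSD_MIN)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
# The P value is the upper tail of F(df1, df2), e.g. scipy.stats.f.sf(f_stat, df1, df2)
# if SciPy is available.
```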

Table IV.

Comparison of three MD-based sampling protocols.^a

Models  REMD RMSDmin^b  REMD Perc. (%)  50 ns MD RMSDmin^b  50 ns MD Perc. (%)  10×5 ns MD RMSDmin^b  10×5 ns MD Perc. (%)
a. Small, globular protein data set
1cy5_1 1.77 34.1 2.59 0.0 1.66 23.6
1cy5_2 1.31 17.7 1.90 0.0 1.60 3.8
1fm0 1.12 63.8 1.87 0.3 1.26 43.9
1gxu 1.09 28.9 1.10 49.6 1.14 26.2
1k5n 1.69 29.7 1.85 1.9 1.58 41.6
1mfg_1 2.19 3.0 2.29 0.1 2.15 2.0
1mfg_2 1.16 20.6 0.87 93.4 0.95 37.7
1opd 2.32 39.3 2.91 1.9 2.61 12.0
1r6j_1 1.07 33.3 2.15 0.0 1.48 11.5
1r6j_2 1.32 12.3 1.23 70.9 1.03 21.7
1urr 0.96 21.0 1.91 0.0 1.18 8.8
1wm3_1 1.85 24.6 2.00 18.0 1.47 20.8
1wm3_2 1.05 7.5 1.22 1.1 1.00 10.7
1xmt 1.94 22.1 2.48 0.0 1.87 9.9

b. CASPR data set
1xe1 0.94 13.7 1.05 0.7 0.96 11.8
1vm0 2.50 9.5 2.70 0.4 2.64 2.19
1vla 2.20 0.0 2.26 0.0 2.25 0.0
1whz 0.75 41.8 1.16 17.3 0.65 24.8
1tvg 3.15 33.0 3.83 1.5 3.38 21.1
1xg8 3.88 6.8 4.60 0.0 3.87 10.6
1o13 1.51 5.3 1.80 0.0 1.42 11.8

The columns list the model name, the minimum RMSD found in the simulation and the percentage of conformations with lower RMSD than the starting structure for three MD-based sampling protocols: REMD, a single 50 ns MD simulation and ten 5 ns MD simulations with different initial atomic velocities.

a

RMSD here denotes SSE-RMSD as in Table III. For the REMD protocol, only conformations sampled in the replicas at the 5 lowest temperatures were used to compute the values in columns 2 and 3. For the other two MD-based protocols, all conformations sampled in the simulation(s) were used to compute the values in columns 4 to 7.

b

The RMSD values that are at least 0.5 Å lower than the initial values are in bold.

Figure 4 plots the SSE-RMSD as a function of simulation time for the three MD-based sampling protocols. The models 1opd and 1whz were selected as illustrative examples because they differ markedly in the quality of the initial model: 1opd is of relatively low quality, with a SSE-RMSD of 3.22 Å, while 1whz is a near-native model with a SSE-RMSD of 1.69 Å. As can be seen from Figure 4, the three protocols exhibit very distinct patterns. The apparent discontinuities in the RMSD of the REMD trajectory (lowest temperature) correspond to exchanges of the replica (conformation) with the one at the next higher temperature. In the case of 1opd, low-RMSD conformations (<2.5 Å) are sampled only between 1 and 2 ns, whereas for 1whz near-native conformations are sampled throughout the simulation. In addition to the frequent exchange of conformations, a slow increase in SSE-RMSD was observed for 1mfg_1, 1wm3_2, 1vla and 1xg8, for which the percentage of lower-RMSD conformations is below 10% (results not shown). In the single long MD simulation, conformations tended to drift away from near-native states after a few to tens of nanoseconds. This was observed for all models except 1mfg_2, for which the SSE-RMSD decreased significantly from 1.75 to 0.87 Å during the last 10 ns of the simulation. With multiple short MD simulations, the range of SSE-RMSD values explored was comparable to that of the REMD simulations. However, ANOVA suggested that REMD was still better than multiple short MD simulations with respect to enrichment in lower-RMSD conformations.

Figure 4.


A plot of the SSE-RMSD as a function of simulation time for the three MD-based sampling protocols and two models, 1opd and 1whz. The REMD trajectory at the lowest temperature is plotted in 4a and 4d, respectively. The single 50 ns MD trajectory is plotted in 4b and 4e, respectively. The first five of the ten 5 ns MD trajectories are plotted in 4c and 4f, respectively.

Selection of the best models

Table V shows the SSE-RMSD of “the best model” ranked by different scoring functions. The best model was selected as follows: first, the best-scoring model in each of the 5 lowest-temperature REMD trajectories was determined; then, of these five models, the one with the lowest SSE-RMSD was selected as the best model. Using the combination of two statistical potentials (RAPDF/HB), 11 models were selected with a SSE-RMSD at least 0.2 Å lower than the starting value, and a smaller improvement (< 0.2 Å) was observed for 6 further models. Using the ROSETTA_soft score, 7 models were selected with SSE-RMSD decreased by at least 0.2 Å and 6 models with a smaller decrease. The ROSETTA_sub score yielded better results than ROSETTA_soft: 9 models were improved by at least 0.2 Å and 7 models by less than 0.2 Å. On average, the improvement in SSE-RMSD with respect to the starting value was 0.21, 0.05 and 0.24 Å for RAPDF/HB, ROSETTA_soft and ROSETTA_sub, respectively. A two-way ANOVA test shows that the results from the three scores do not differ significantly (P=0.12). This suggests that a simple combination of two statistical potentials performs comparably to the ROSETTA energy function for model selection.
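The two-step selection used for Table V can be written down directly. In this sketch each snapshot is a hypothetical dict with a `score` (lower is better, as for the RAPDF/HB energies in Table VI) and a precomputed `sse_rmsd`; the toy values are made up. Note that step 2 uses the SSE-RMSD to the native structure, so the procedure measures whether a good model is among five score-selected candidates.

```python
def select_best_model(replicas, score_key="score", rmsd_key="sse_rmsd"):
    """Two-step selection: best-scoring snapshot per replica, then lowest SSE-RMSD.

    replicas: list of snapshot lists, one per low-temperature replica; each
    snapshot is a dict holding a score (lower = better) and its SSE-RMSD.
    """
    # Step 1: best-scoring snapshot in each (non-empty) replica.
    picks = [min(snaps, key=lambda s: s[score_key]) for snaps in replicas if snaps]
    # Step 2: among those candidates, the one closest to the native structure.
    return min(picks, key=lambda s: s[rmsd_key])

# Toy data (hypothetical): three replicas instead of five, for brevity.
replicas = [
    [{"score": -2100, "sse_rmsd": 1.9}, {"score": -2200, "sse_rmsd": 1.7}],
    [{"score": -2150, "sse_rmsd": 1.4}, {"score": -2050, "sse_rmsd": 1.2}],
    [{"score": -2300, "sse_rmsd": 2.1}],
]
print(select_best_model(replicas))  # {'score': -2150, 'sse_rmsd': 1.4}
```

The 1.2 Å snapshot in the second replica is never selected because it is not the best-scoring snapshot of its replica, which is exactly the failure mode discussed for the scoring functions.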

Table V.

RMSD of models selected by different scoring functions.^a

Models RMSDinit RMSDmin RAPDF/HB^b ROSETTA_soft^b ROSETTA_sub^b RAPDF/HBEM^c
a. Small, globular protein data set
1cy5_1 2.77 1.77 2.34 (5) 2.29 2.14 2.17
1cy5_2 1.90 1.31 1.51 (4) 1.39 1.64 1.44
1fm0 2.23 1.12 1.33 (5) 1.46 1.35 1.25
1gxu 1.81 1.09 1.31 (2) 1.59 1.45 1.29
1k5n 2.05 1.69 2.00 (1) 2.29 2.06 1.82
1mfg_1 2.41 2.19 2.90 (0) 2.67 2.31 2.53
1mfg_2 1.73 1.16 1.21 (5) 1.61 1.53 1.42
1opd 3.22 2.32 2.61 (2) 2.90 3.41 3.27
1r6j_1 2.15 1.07 1.83 (2) 2.08 1.24 1.33
1r6j_2 1.77 1.32 2.13 (0) 1.79 1.62 1.81
1urr 2.01 0.96 1.59 (1) 2.35 1.03 1.55
1wm3_1 2.38 1.85 2.20 (3) 2.26 2.26 2.30
1wm3_2 1.63 1.05 1.35 (1) 1.73 1.71 2.15
1xmt 2.69 1.94 3.20 (0) 3.18 2.52 2.91

b. CASPR data set
1xe1 1.26 0.94 1.05 (2) 1.17 1.17 0.87
1vm0 3.00 2.50 2.98 (1) 3.00 2.64 2.84
1vla 2.26 2.20 2.26 (0) 2.26 2.29 2.35
1whz 1.69 0.75 0.91 (5) 1.00 1.00 0.79
1tvg 4.29 3.15 4.38 (0) 3.48 4.21 4.09
1xg8 4.68 3.88 4.50 (2) 5.36 5.36 4.97
1o13 1.90 1.51 1.78 (2) 2.96 1.88 2.14

The columns list the model name, the initial RMSD in the REMD simulation, the minimum RMSD found in the REMD simulation, and the RMSD of the best model ranked by the RAPDF/HB score, by the two ROSETTA scores and by the RAPDF/HB score after energy minimization.

a

RMSD here denotes SSE-RMSD as in Table III. The RMSD values that are at least 0.2 Å lower than the initial values are in bold.

b

For these three scoring functions, the best model was selected using the following procedure. First, the best scoring model in each of the 5 REMD trajectories at the lowest temperatures was determined by a given scoring function. Then taking these five models the one with the lowest SSE-RMSD was selected as the best model. Note that for the RAPDF/HB scoring function the number of models with lower SSE-RMSD than the initial values is listed in parentheses.

c

In the RAPDF/HBEM scoring scheme, the 100 top-scoring structures ranked by the RAPDF/HB function in each of the 5 replica simulations used (a total of 500 structures) were subjected to energy minimization. The RAPDF/HB scores were then calculated for the minimized structures. The SSE-RMSD of the best model selected by the same procedure as described above is listed in the last column.

Based on the initial ranking generated using the RAPDF/HB function, the 100 top-scoring REMD snapshots for each of the five replicas used (500 structures in total) were selected and subjected to 1000 steps of energy minimization using the L-BFGS truncated-Newton optimization algorithm in conjunction with the OPLS all-atom force field and the mAGB/SA solvation model.86,87 The minimized structures were re-ranked by the RAPDF/HB function, and the SSE-RMSD of the best model, selected by the same procedure as described above, is listed for each model in Table V. With the RAPDF/HB score used in the final selection, the SSE-RMSD of the selected model improved relative to the pre-minimization selection for 14 of the 21 models. However, the number of models with a SSE-RMSD at least 0.2 Å below the starting value remained the same (11) after minimization.
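The RAPDF/HBEM variant inserts a minimization step between the initial ranking and the final selection. The sketch keeps the force-field minimization abstract: `minimize` is a caller-supplied placeholder standing in for the OPLS-AA/mAGB/SA minimization, not a real API, and `score` and `sse_rmsd` are likewise hypothetical callables. The toy data show how the top-k cut can exclude a structure that would have scored best after minimization.

```python
import heapq

def rerank_after_minimization(replicas, score, minimize, sse_rmsd, top_k=100):
    """Sketch of the RAPDF/HBEM scheme.

    replicas : list of structure lists, one per low-temperature replica
    score    : callable, structure -> score (lower is better)
    minimize : callable, structure -> minimized structure (placeholder)
    sse_rmsd : callable, structure -> SSE-RMSD to the native structure
    """
    picks = []
    for snaps in replicas:
        if not snaps:
            continue
        top = heapq.nsmallest(top_k, snaps, key=score)  # initial ranking, top k
        minimized = [minimize(s) for s in top]          # e.g. 1000 minimization steps
        picks.append(min(minimized, key=score))         # re-ranked best per replica
    return min(picks, key=sse_rmsd)                     # same final step as before

# Toy structures: hypothetical tuples (score_before, score_after_min, sse_rmsd).
replicas = [[(-10.0, -12.0, 2.0), (-9.0, -15.0, 1.5), (-1.0, -30.0, 1.0)]]
pick = rerank_after_minimization(
    replicas,
    score=lambda s: s[0],
    minimize=lambda s: (s[1], s[1], s[2]),
    sse_rmsd=lambda s: s[2],
    top_k=2,
)
print(pick)  # (-15.0, -15.0, 1.5)
```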

Correlation between scoring functions and SSE-RMSD

Figure 5 shows the RAPDF/HB, ROSETTA_soft and ROSETTA_sub scores plotted against the SSE-RMSD for the conformations taken from the REMD simulation at the lowest temperature investigated. Four models are presented to illustrate the types of correlation observed. Although the different scoring functions appear similarly effective for model selection, RAPDF/HB correlates better with SSE-RMSD than do the two ROSETTA functions, and ROSETTA_soft appears to correlate better than ROSETTA_sub. For 1opd (a low-quality model) the correlation is relatively poor for all scoring functions except RAPDF/HB, for which an energy gap separates the two conformations with the lowest RMSD from the remainder. However, after energy minimization a number of high-RMSD conformations became energetically more favorable, which led to the selection of a structure with a SSE-RMSD of 3.27 Å (RAPDF/HBEM in Table V). For 1mfg_1, 1r6j_1, 1r6j_2, 1xmt and 1tvg, negative correlations between energy and RMSD are observed; the case of 1r6j_2 is shown in Figure 5. Although it is unclear what causes these negative correlations, they clearly affected the results of model selection (Table V). The case of 1vla deserves special mention, as this protein was refined as a monomer but in fact forms a domain-swapped dimer. Table IV shows that with each of the three MD sampling protocols tested, 0.0% of the conformations sampled had a lower SSE-RMSD than the starting model. Nevertheless, Figure 5 shows a very weak correlation between the RAPDF/HB score and the SSE-RMSD (correlation coefficient 0.28), whereas no correlation was found using the ROSETTA functions. The final example, 1whz, shows the highest correlation coefficient (0.63) of all the models tested, and Table V shows that energy minimization further improved its SSE-RMSD from 0.91 to 0.79 Å.
Positive correlations were observed for the majority of the models using the RAPDF/HB function. After 1whz, the best correlation between energy and SSE-RMSD was observed for 1fm0, which also yielded the best sampling result in terms of the percentage of low-RMSD conformations.
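The text quotes correlation coefficients (0.28 for 1vla, 0.63 for 1whz) without naming the coefficient; assuming Pearson's r, it can be computed as below. Because the scores are energies (lower is better), a positive r between score and SSE-RMSD is the desirable case, and the negative correlations reported for 1mfg_1, 1r6j_1, 1r6j_2, 1xmt and 1tvg are the problematic one. The function name is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Toy example (hypothetical): lower scores going with lower SSE-RMSD gives r > 0.
scores = [-2300.0, -2250.0, -2200.0, -2150.0]
rmsds = [1.2, 1.5, 1.4, 1.9]
print(round(pearson_r(scores, rmsds), 2))
```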

Figure 5.


The correlation between scoring-function score and SSE-RMSD for three scoring functions and four models, 1opd, 1r6j_2, 1vla and 1whz. The correlation of RAPDF/HB score and SSE-RMSD is plotted in 5a, 5d, 5g and 5j, respectively. The correlation of ROSETTA_soft score and SSE-RMSD is plotted in 5b, 5e, 5h and 5k, respectively. The correlation of ROSETTA_sub score and SSE-RMSD is plotted in 5c, 5f, 5i and 5l, respectively.

Scoring of the native conformations

As shown in Table III, for 15 of the 21 models the lowest SSE-RMSD sampled by REMD is on average 0.82 Å lower than that of the starting structure. However, as shown in Table V, the scoring functions investigated here had only a very limited ability to identify the conformations with the lowest SSE-RMSD values. This raises the question of whether these scoring functions can recognize the native structure. To address this, the RAPDF/HB score of the native structure of each of the 17 proteins was calculated and compared with the scores of the REMD conformation with the lowest SSE-RMSD and of the REMD conformation with the lowest RAPDF/HB score (Table VI). In all cases the native structure scored substantially lower than any structure sampled in the REMD simulations (by 303 to 1130 units relative to the best-scoring REMD conformation). This demonstrates that the RAPDF/HB score could in principle be used to identify near-native structures. However, the structures with the lowest SSE-RMSD values were not ranked as the most native-like by the potential. A similar analysis using the ROSETTA functions yielded similar results (data not shown). Statistical potentials such as RAPDF/HB and ROSETTA are derived from high-resolution crystal structures, which are often solved from data collected at cryogenic temperatures. In addition, the side chain rotameric states, bond angles and dihedral angles in these structures are usually constrained to ideal or equilibrium values. As a consequence, these potentials are weighted toward fine details of side chain packing and hydrogen bond geometry and perform less well when scoring conformations from molecular dynamics simulations, which contain thermal noise and should satisfy the experimental data only as an average over a representative ensemble. It should also be noted that in this work the analysis is based on the SSE-RMSD, which considers only the secondary structure elements, whereas the scoring of the structures was based on the entire molecule.
Thus, while the secondary structure elements in the selected conformations were native-like, other regions of the protein may have transiently adopted less favorable conformations. Another possibility is that the native well on the conformational free energy landscape is not directly accessible from these particular near-native models. In that case, alternative sampling methods such as soft-core van der Waals potentials during the REMD simulations or Hamiltonian REMD102–104 may be more effective than temperature REMD alone.

Table VI.

RAPDF/HB scores of native structure and conformations sampled by REMD.

Models Enative Emin Elrms
a. Small, globular protein data set
1cy5_1 −4054 −3124 −3055
1cy5_2 −4054 −3268 −2989
1fm0 −2848 −2307 −2130
1gxu −3297 −2542 −2314
1k5n −3084 −2319 −1982
1mfg_1 −2633 −2330 −1901
1mfg_2 −2633 −2294 −2002
1opd −2850 −2216 −1970
1r6j_1 −2539 −2052 −1938
1r6j_2 −2539 −1902 −1666
1urr −3156 −2529 −2074
1wm3_1 −2588 −2066 −1568
1wm3_2 −2588 −1963 −1746
1xmt −3357 −2227 −1656

b. CASPR data set
1xe1 −2458 −1934 −1794
1vm0 −3642 −2874 −2514
1vla −4843 −3930 −3639
1whz −2894 −2333 −2056
1tvg −3703 −2923 −2253
1xg8 −3122 −2084 −1670
1o13 −3740 −2771 −2264

The columns list the model name, the RAPDF/HB score of the native structure (Enative), the lowest RAPDF/HB score among the REMD-generated structures (Emin) and the RAPDF/HB score of the lowest SSE-RMSD structure sampled in the REMD simulation (Elrms).
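The comparison in Table VI reduces to checking, per row, that the native score is below the best score found by REMD. The sketch below transcribes Table VI (the dictionary name is ours), computes that gap for every model, and confirms that the 21 models come from 17 distinct proteins.

```python
# (Enative, Emin, Elrms) RAPDF/HB scores transcribed from Table VI.
TABLE_VI = {
    "1cy5_1": (-4054, -3124, -3055), "1cy5_2": (-4054, -3268, -2989),
    "1fm0": (-2848, -2307, -2130), "1gxu": (-3297, -2542, -2314),
    "1k5n": (-3084, -2319, -1982), "1mfg_1": (-2633, -2330, -1901),
    "1mfg_2": (-2633, -2294, -2002), "1opd": (-2850, -2216, -1970),
    "1r6j_1": (-2539, -2052, -1938), "1r6j_2": (-2539, -1902, -1666),
    "1urr": (-3156, -2529, -2074), "1wm3_1": (-2588, -2066, -1568),
    "1wm3_2": (-2588, -1963, -1746), "1xmt": (-3357, -2227, -1656),
    "1xe1": (-2458, -1934, -1794), "1vm0": (-3642, -2874, -2514),
    "1vla": (-4843, -3930, -3639), "1whz": (-2894, -2333, -2056),
    "1tvg": (-3703, -2923, -2253), "1xg8": (-3122, -2084, -1670),
    "1o13": (-3740, -2771, -2264),
}

# Gap between the best REMD score and the native score (positive: native scores lower).
gaps = {m: e_min - e_nat for m, (e_nat, e_min, _) in TABLE_VI.items()}
proteins = {m.split("_")[0] for m in TABLE_VI}

print(all(g > 0 for g in gaps.values()))            # True
print(min(gaps, key=gaps.get), min(gaps.values()))  # 1mfg_1 303
print(max(gaps.values()))                           # 1130
print(len(proteins))                                # 17
```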

DISCUSSION

The questions that can be addressed using protein structure prediction techniques depend on the quality of the predicted models.6 Although there has been significant progress in structure prediction over the last decade, two crucial problems, structure refinement and model assessment, remain unsolved. As mentioned above, it seems reasonable to partition structure refinement into two phases: local refinement, which primarily involves correcting errors in side chain packing or within loops and secondary structure elements (SSE), and global refinement, which aims to resolve differences in the overall fold of the molecule. These two phases require different strategies and techniques and are normally performed sequentially.105

The problem of detecting local structural errors was addressed more than a decade ago with statistical potentials such as Verify3D70,71 and Prosa.72,73 Since then, a number of statistical potentials have been developed to evaluate protein structures in atomic detail.66–69,74 Recently, machine-learning techniques have been used to combine features such as structural properties106 and statistical potentials107 as input to predict local model quality. In the current work, the local quality of the model was first assessed using three normalized, residue-based scoring functions, two of which are based on recently derived statistical potentials68 (Zhu and Honig, in preparation). Used in combination, these three scoring functions appear sufficient to identify and address most of the local errors in the models tested. The local structural errors identified were associated with the packing of side chains, loops and SSEs. The first two types of error could be readily corrected using existing methods.82,83 Errors within SSEs were more difficult to address, as changes in the secondary structure elements directly affect the global fold. In this work, a combination of local sampling77 and manual adjustment was used in an attempt to correct apparent errors in SSEs. In general, these changes only lowered the overall quality of the model. This work, together with other recent studies on SSE refinement,77,108,109 suggests that an automated procedure is preferable to manual adjustment during local structural refinement.

The global refinement of structural models of proteins remains a major challenge. Although progress has been marginal, a number of recent studies have attempted to address this problem using different approaches.46,47,57,110 There are two fundamental challenges in global refinement: 1) efficiently sampling the available conformational space and 2) selecting near-native conformations. In this study, temperature-based REMD60,61 was the primary method used to sample the conformational space surrounding the initial model. REMD allows exchanges between systems simulated at a range of temperatures. In principle, this enables the system to cross energy barriers that could not be crossed at lower temperatures. Of equal importance on short time scales, REMD acts to sort a range of independent simulations, giving increased weight to low-energy conformations.64 The sampling efficiency of REMD was compared with that of a single extended simulation and of a series of short simulations of equivalent length. Although REMD performed best of the three strategies investigated, it was only marginally more efficient than multiple short MD simulations. This suggests that, on the time scale simulated, the potential of REMD to enhance barrier crossing was not realized to a significant extent. One way to improve sampling efficiency might be to include more high-temperature replicas; in this case, however, structural restraints are also needed to avoid complete unfolding, as proposed by Chen and Brooks.50 In their work, REMD simulations over a broader temperature range (270–600 K) were used together with dihedral and distance restraints to maintain the secondary structure elements and overall topology during refinement. Another possibility may be to use models based on alternative sequence alignments generated by various procedures as replicas in the REMD simulation.
REMD provides a general framework through which a range of alternative sampling approaches can be incorporated into refinement calculations;111,112 however, further studies are required to demonstrate that REMD truly enhances the sampling of native-like conformations given the limitations of the available force fields.113,114
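The exchange step that lets replicas move between temperatures follows the standard Metropolis criterion for temperature REMD: a swap between replicas i and j is accepted with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The sketch assumes potential energies in kJ/mol; the function names are illustrative, and how often exchanges were attempted in the original simulations is not stated in this section.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1

def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for swapping the configurations of
    replicas at temperatures t_i and t_j (K) with potential energies e_i and e_j (kJ/mol)."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(e_i, t_i, e_j, t_j, rng=random):
    """Return True if the proposed swap is accepted."""
    return rng.random() < exchange_probability(e_i, t_i, e_j, t_j)

# A colder replica holding a higher-energy conformation always swaps downhill in energy.
print(exchange_probability(-1000.0, 300.0, -1100.0, 310.0))  # 1.0
```

The criterion is symmetric in the pair (swapping the roles of i and j gives the same probability), which is what preserves the canonical distribution at every temperature.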

Even if near-native conformations can be sampled efficiently, the question of how to identify them among a large ensemble of low-energy alternatives remains. Scoring functions must be both fast and accurate. While energy functions based on molecular mechanics force fields combined with an implicit description of solvation might in principle be accurate enough to discriminate native from non-native conformations,115 such approaches require extensive optimization of protein structures prior to scoring, which is computationally expensive. Statistical potentials, in contrast, are fast and simple to implement but can be sensitive to slight deviations from ideal geometries. In this study we compared a combination of two statistical potentials (RAPDF/HB)66,96 with alternative functions from ROSETTA. RAPDF was chosen as the primary scoring function because it was observed to be more effective at discriminating native from non-native structures obtained from MD simulations, while DFIRE was more effective at discriminating between models generated in the course of structure prediction and modeling.77,107,116 We believe this difference stems from their different reference states, that is, from how a random distribution is defined in the derivation of each statistical potential. Our results also suggest that the selection function must be compatible with the sampling protocol to achieve the best performance. Incorporating the hydrogen bonding potential was highly effective in improving the RMSD results. Together, the RAPDF and hydrogen bonding potentials constituted a simple but effective scoring function that could serve as the basis for a more sophisticated one.
The results from the two ROSETTA functions examined were mixed: ROSETTA_soft showed the better correlation between energy and RMSD but was less effective in selecting the conformations with the lowest RMSD. In this function, the rama term, which accounts for backbone dihedral preferences, and the softened LJ term, which accounts for atomic clashes in modeling, appeared to yield inappropriate rankings when applied to conformations generated using MD simulations.
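Statistical potentials of the kind discussed here rest on the inverse Boltzmann relation, E = −kT ln(f_obs/f_ref), applied per distance bin for each atom-type pair. The sketch below is a generic version, not RAPDF or DFIRE themselves: the reference distribution is left as an input precisely because the choice of reference state is what distinguishes such potentials. The function name and the pseudocount for empty bins are assumptions.

```python
import math

def pair_potential(observed_counts, reference_counts, pseudocount=1.0):
    """Knowledge-based potential per distance bin (in units of kT) via the
    inverse Boltzmann relation E(bin) = -ln(f_obs(bin) / f_ref(bin)).

    observed_counts:  counts of one atom-type pair per distance bin in a structure set
    reference_counts: counts per bin under the chosen reference state
    """
    n_obs = sum(observed_counts) + pseudocount * len(observed_counts)
    n_ref = sum(reference_counts) + pseudocount * len(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        f_obs = (obs + pseudocount) / n_obs
        f_ref = (ref + pseudocount) / n_ref
        energies.append(-math.log(f_obs / f_ref))  # enriched bins get negative energy
    return energies

# Toy counts (hypothetical): the middle bin is enriched relative to the reference.
print([round(e, 2) for e in pair_potential([10, 30, 10], [20, 10, 20])])
```

Changing only `reference_counts` while keeping the observed statistics fixed changes every energy, which illustrates why potentials built from the same database can rank the same conformations differently.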

Although each of the scoring functions could discriminate the native structure from alternative models, none could reliably identify the near-native structures sampled during the REMD simulations. One explanation could be the sensitivity of these potentials to the thermal noise inherent in MD-generated structures. Another contributing factor could be that the scoring functions were applied to the whole structure, whereas the structural comparisons were based on the SSE-RMSD, which considers only a subset of backbone atoms. Intuitively, a global quality measure that takes into account both backbone and side chain information117 may lead to better correlations, while coarse-grained statistical potentials that simplify side chains118 may be an alternative approach. Independent of the scoring function, it may also be possible to improve model selection by using a structure-clustering algorithm, which has been found useful in a number of studies in ab initio structure prediction119,120 and local structural refinement.77 In our study, energy minimization (EM) improved the RMSD results for only 14 of the 21 models (67%). This might be due to the inconsistency of using a physical energy function to optimize the structures but a statistical potential to rank the minimized structures. The physical energy function used also contains an electrostatic solvation term86,87 that affects the packing of charged and polar residues during minimization, whereas none of the statistical potentials used include such solvation effects explicitly.
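The text leaves the clustering algorithm open. One widely used option, sketched here as an assumption rather than a method used in this work, is the Daura et al. (GROMOS-style) scheme: repeatedly take the structure with the most neighbours within an RMSD cutoff as a cluster centre, then remove it together with its neighbours. The centre of the most populated cluster is then a natural candidate model. The distance matrix and cutoff in the example are hypothetical.

```python
def _neighbours(i, pool, dist, cutoff):
    """Indices in `pool` (other than i) within `cutoff` of structure i."""
    return {j for j in pool if j != i and dist[i][j] <= cutoff}

def gromos_clusters(dist, cutoff):
    """Daura-style clustering on a symmetric pairwise distance matrix (e.g. RMSD in Å).

    Returns a list of (centre_index, member_indices) in order of extraction;
    the first cluster is the most populated one.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        # Centre = structure with the most neighbours (lowest index wins ties).
        centre = max(sorted(remaining),
                     key=lambda i: len(_neighbours(i, remaining, dist, cutoff)))
        members = _neighbours(centre, remaining, dist, cutoff) | {centre}
        clusters.append((centre, sorted(members)))
        remaining -= members
    return clusters

# Toy example: five structures placed on a line, distances = |x_a - x_b|.
pos = [0.0, 0.5, 1.0, 5.0, 5.4]
dist = [[abs(a - b) for b in pos] for a in pos]
print(gromos_clusters(dist, 0.6))  # [(1, [0, 1, 2]), (3, [3, 4])]
```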

In addition to conformational sampling and model selection, other factors may also play a role in structure refinement. One factor is whether the protein forms part of a larger complex, in which case the model should be refined in the multimeric state. However, this can be extremely challenging and has yet to be properly addressed in structure prediction. Recently, Grimm et al. reported a benchmark study of dimeric threading and structure refinement,56 in which three model dimers were refined under optimal conditions. In each case the two monomers were connected by a 30-glycine linker so that methodology developed for single proteins (TASSER54) could be applied. In our study, the refinement of the isolated monomer of 1vla, which in reality forms a domain-swapped dimer, was the only case in which conformations with lower SSE-RMSD values than the initial model were not sampled during the REMD simulations. Similarly, the scoring functions, which have been developed primarily for single isolated proteins, may have to be re-evaluated for use with proteins removed from multimeric complexes. Another factor that affects refinement, in particular the selection step, is the overall quality of the model. In our study, the REMD protocol was equally effective at sampling both sets of models. In contrast, the scoring functions appeared to perform better on the non-CASPR models, whose structures were of higher quality. Overall, the refinement of small, globular, near-native models was more successful than the refinement of irregularly shaped, larger models of relatively low quality. This was particularly evident when the RAPDF/HB function was used for selection. The only two successful examples in the CASPR data set were 1xe1 and 1whz, both of which are small, globular structures with low starting RMSD values.

In summary, we have presented a simple but effective strategy for refining homology models. A number of fundamental issues in structure refinement, such as conformational sampling and model selection, have been investigated by testing alternative methods on a data set of 21 models of varying quality. We believe that the experience gained from this study will greatly facilitate the design of more effective strategies for global refinement.

Acknowledgments

The authors gratefully thank Brian Kuhlman, Glenn Butterfoss and Jim Havranek for valuable suggestions in regard to the use of the ROSETTA package. This work was supported by the NIH grant GM30518.

References

  • 1.Bonneau R, Baker D. Ab initio protein structure prediction: Progress and prospects. Annual Review of Biophysics and Biomolecular Structure. 2001;30:173–189. doi: 10.1146/annurev.biophys.30.1.173. [DOI] [PubMed] [Google Scholar]
  • 2.Bradley P, Misura KMS, Baker D. Toward high-resolution de novo structure prediction for small proteins. Science. 2005;309(5742):1868–1871. doi: 10.1126/science.1113801. [DOI] [PubMed] [Google Scholar]
  • 3.Sanchez R, Sali A. Advances in comparative protein-structure modelling. Current Opinion in Structural Biology. 1997;7(2):206–214. doi: 10.1016/s0959-440x(97)80027-9. [DOI] [PubMed] [Google Scholar]
  • 4.Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. Annual Review of Biophysics and Biomolecular Structure. 2000;29:291–325. doi: 10.1146/annurev.biophys.29.1.291. [DOI] [PubMed] [Google Scholar]
  • 5.Al-Lazikani B, Jung J, Xiang ZX, Honig B. Protein structure prediction. Current Opinion in Chemical Biology. 2001;5(1):51–56. doi: 10.1016/s1367-5931(00)00164-2. [DOI] [PubMed] [Google Scholar]
  • 6.Petrey D, Honig B. Protein structure prediction: Inroads to biology. Molecular Cell. 2005;20(6):811–819. doi: 10.1016/j.molcel.2005.12.005. [DOI] [PubMed] [Google Scholar]
  • 7.Ginalski K. Comparative modeling for protein structure prediction. Current Opinion in Structural Biology. 2006;16(2):172–177. doi: 10.1016/j.sbi.2006.02.003. [DOI] [PubMed] [Google Scholar]
  • 8.Moult J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Current Opinion in Structural Biology. 2005;15(3):285–289. doi: 10.1016/j.sbi.2005.05.011. [DOI] [PubMed] [Google Scholar]
  • 9.Valencia A. Protein refinement: A new challenge for CASP in its 10th anniversary. Bioinformatics. 2005;21(3):277–277. doi: 10.1093/bioinformatics/bti249. [DOI] [PubMed] [Google Scholar]
  • 10. Tress M, Ezkurdia L, Grana O, Lopez G, Valencia A. Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins-Structure Function and Bioinformatics. 2005;61:27–45. doi: 10.1002/prot.20720.
  • 11. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997;25(17):3389–3402. doi: 10.1093/nar/25.17.3389.
  • 12. Heger A, Holm L. Picasso: generating a covering set of protein family profiles. Bioinformatics. 2001;17(3):272–279. doi: 10.1093/bioinformatics/17.3.272.
  • 13. Pietrokovski S. Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Research. 1996;24(19):3836–3845. doi: 10.1093/nar/24.19.3836.
  • 14. Jaroszewski L, Rychlewski L, Godzik A. Improving the quality of twilight-zone alignments. Protein Science. 2000;9(8):1487–1496. doi: 10.1110/ps.9.8.1487.
  • 15. Rychlewski L, Jaroszewski L, Li WZ, Godzik A. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science. 2000;9(2):232–241. doi: 10.1110/ps.9.2.232.
  • 16. Yona G, Levitt M. Within the twilight zone: A sensitive profile-profile comparison tool based on information theory. Journal of Molecular Biology. 2002;315(5):1257–1275. doi: 10.1006/jmbi.2001.5293.
  • 17. Sadreyev R, Grishin N. COMPASS: A tool for comparison of multiple protein alignments with assessment of statistical significance. Journal of Molecular Biology. 2003;326(1):317–336. doi: 10.1016/s0022-2836(02)01371-2.
  • 18. Edgar RC, Sjolander K. COACH: profile-profile alignment of protein families using hidden Markov models. Bioinformatics. 2004;20(8):1309–1318. doi: 10.1093/bioinformatics/bth091.
  • 19. Marti-Renom MA, Madhusudhan MS, Sali A. Alignment of protein sequences by their profiles. Protein Science. 2004;13(4):1071–1087. doi: 10.1110/ps.03379804.
  • 20. Fischer D, Eisenberg D. Protein fold recognition using sequence-derived predictions. Protein Science. 1996;5(5):947–955. doi: 10.1002/pro.5560050516.
  • 21. Rost B, Schneider R, Sander C. Protein fold recognition by prediction-based threading. Journal of Molecular Biology. 1997;270(3):471–480. doi: 10.1006/jmbi.1997.1101.
  • 22. Jaroszewski L, Rychlewski L, Zhang BH, Godzik A. Fold prediction by a hierarchy of sequence, threading, and modeling methods. Protein Science. 1998;7(6):1431–1440. doi: 10.1002/pro.5560070620.
  • 23. Jones DT. GenTHREADER: An efficient and reliable protein fold recognition method for genomic sequences. Journal of Molecular Biology. 1999;287(4):797–815. doi: 10.1006/jmbi.1999.2583.
  • 24. Panchenko AR, Marchler-Bauer A, Bryant SH. Combination of threading potentials and sequence profiles improves fold recognition. Journal of Molecular Biology. 2000;296(5):1319–1331. doi: 10.1006/jmbi.2000.3541.
  • 25. Kim D, Xu D, Guo JT, Ellrott K, Xu Y. PROSPECT II: protein structure prediction program for genome-scale applications. Protein Engineering. 2003;16(9):641–650. doi: 10.1093/protein/gzg081.
  • 26. Karchin R, Cline M, Mandel-Gutfreund Y, Karplus K. Hidden Markov models that use predicted local structure for fold recognition: Alphabets of backbone geometry. Proteins-Structure Function and Genetics. 2003;51(4):504–514. doi: 10.1002/prot.10369.
  • 27. Skolnick J, Kihara D, Zhang Y. Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm. Proteins-Structure Function and Bioinformatics. 2004;56(3):502–518. doi: 10.1002/prot.20106.
  • 28. Reinhardt A, Eisenberg D. DPANN: Improved sequence to structure alignments following fold recognition. Proteins-Structure Function and Bioinformatics. 2004;56(3):528–538. doi: 10.1002/prot.20144.
  • 29. Scheeff ED, Bourne PE. Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction. BMC Bioinformatics. 2006;7:410. doi: 10.1186/1471-2105-7-410.
  • 30. Kelley LA, MacCallum RM, Sternberg MJE. Enhanced genome annotation using structural profiles in the program 3D-PSSM. Journal of Molecular Biology. 2000;299(2):499–520. doi: 10.1006/jmbi.2000.3741.
  • 31. Shi JY, Blundell TL, Mizuguchi K. FUGUE: Sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. Journal of Molecular Biology. 2001;310(1):243–257. doi: 10.1006/jmbi.2001.4762.
  • 32. Tang CL, Xie L, Koh IYY, Posy S, Alexov E, Honig B. On the role of structural information in remote homology detection and sequence alignment: New methods using hybrid sequence profiles. Journal of Molecular Biology. 2003;334(5):1043–1062. doi: 10.1016/j.jmb.2003.10.025.
  • 33. Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L. ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Research. 2003;31(13):3804–3807. doi: 10.1093/nar/gkg504.
  • 34. Zhou HY, Zhou YQ. Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins-Structure Function and Bioinformatics. 2004;55(4):1005–1013. doi: 10.1002/prot.20007.
  • 35. Zhou HY, Zhou YQ. Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins-Structure Function and Bioinformatics. 2005;58(2):321–328. doi: 10.1002/prot.20308.
  • 36. Jaroszewski L, Li WZ, Godzik A. In search for more accurate alignments in the twilight zone. Protein Science. 2002;11(7):1702–1713. doi: 10.1110/ps.4820102.
  • 37. Chivian D, Baker D. Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Research. 2006;34(17). doi: 10.1093/nar/gkl480.
  • 38. John B, Sali A. Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Research. 2003;31(14):3982–3992. doi: 10.1093/nar/gkg460.
  • 39. Kolinski A, Gront D. Comparative modeling without implicit sequence alignments. Bioinformatics. 2007. doi: 10.1093/bioinformatics/btm380. Published online.
  • 40. Lazaridis T, Karplus M. Effective energy functions for protein structure prediction. Current Opinion in Structural Biology. 2000;10(2):139–145. doi: 10.1016/s0959-440x(00)00063-4.
  • 41. Summa CM, Levitt M. Near-native structure refinement using in vacuo energy minimization. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(9):3177–3182. doi: 10.1073/pnas.0611593104.
  • 42. Kmiecik S, Gront D, Kolinski A. Towards the high-resolution protein structure prediction. Fast refinement of reduced models with all-atom force field. BMC Structural Biology. 2007;7:43. doi: 10.1186/1472-6807-7-43.
  • 43. Kosinski J, Cymerman IA, Feder M, Kurowski MA, Sasin JM, Bujnicki JM. A “Frankenstein’s monster” approach to comparative modeling: Merging the finest fragments of fold-recognition models and iterative model refinement aided by 3D structure evaluation. Proteins-Structure Function and Genetics. 2003;53(6):369–379. doi: 10.1002/prot.10545.
  • 44. Lee MR, Tsai J, Baker D, Kollman PA. Molecular dynamics in the endgame of protein structure prediction. Journal of Molecular Biology. 2001;313(2):417–430. doi: 10.1006/jmbi.2001.5032.
  • 45. Lu H, Skolnick J. Application of statistical potentials to protein structure refinement from low resolution Ab initio models. Biopolymers. 2003;70(4):575–584. doi: 10.1002/bip.10537.
  • 46. Fan H, Mark AE. Mimicking the action of folding chaperones in molecular dynamics simulations: Application to the refinement of homology-based protein structures. Protein Science. 2004;13(4):992–999. doi: 10.1110/ps.03449904.
  • 47. Fan H, Mark AE. Refinement of homology-based protein structures by molecular dynamics simulation techniques. Protein Science. 2004;13(1):211–220. doi: 10.1110/ps.03381404.
  • 48. Flohil JA, Vriend G, Berendsen HJC. Completion and refinement of 3-D homology models with restricted molecular dynamics: Application to targets 47, 58, and 111 in the CASP modeling competition and posterior analysis. Proteins-Structure Function and Genetics. 2002;48(4):593–604. doi: 10.1002/prot.10105.
  • 49. Krieger E, Koraimann G, Vriend G. Increasing the precision of comparative models with YASARA NOVA - a self-parameterizing force field. Proteins-Structure Function and Genetics. 2002;47(3):393–402. doi: 10.1002/prot.10104.
  • 50. Chen JH, Brooks CL. Can molecular dynamics simulations provide high-resolution refinement of protein structure? Proteins-Structure Function and Bioinformatics. 2007;67(4):922–930. doi: 10.1002/prot.21345.
  • 51. Im WP, Lee MS, Brooks CL. Generalized born model with a simple smoothing function. Journal of Computational Chemistry. 2003;24(14):1691–1702. doi: 10.1002/jcc.10321.
  • 52. Chen JH, Im WP, Brooks CL. Balancing solvation and intramolecular interactions: Toward a consistent generalized born force field. Journal of the American Chemical Society. 2006;128(11):3728–3736. doi: 10.1021/ja057216r.
  • 53. Rohl CA, Strauss CEM, Misura KMS, Baker D. Protein structure prediction using rosetta. Numerical Computer Methods, Pt D. Methods in Enzymology. 2004;383:66–93. doi: 10.1016/S0076-6879(04)83004-0.
  • 54. Zhang Y, Arakaki AK, Skolnick JR. TASSER: An automated method for the prediction of protein tertiary structures in CASP6. Proteins-Structure Function and Bioinformatics. 2005;61:91–98. doi: 10.1002/prot.20724.
  • 55. Lee SY, Zhang Y, Skolnick J. TASSER-based refinement of NMR structures. Proteins-Structure Function and Bioinformatics. 2006;63(3):451–456. doi: 10.1002/prot.20902.
  • 56. Grimm V, Zhang Y, Skolnick J. Benchmarking of dimeric threading and structure refinement. Proteins-Structure Function and Bioinformatics. 2006;63(3):457–465. doi: 10.1002/prot.20878.
  • 57. Misura KMS, Chivian D, Rohl CA, Kim DE, Baker D. Physically realistic homology models built with ROSETTA can be more accurate than their templates. Proceedings of the National Academy of Sciences of the United States of America. 2006;103(14):5361–5366. doi: 10.1073/pnas.0509355103.
  • 58. Petrey D, Honig B. Free energy determinants of tertiary structure and the evaluation of protein models. Protein Science. 2000;9(11):2181–2191. doi: 10.1110/ps.9.11.2181.
  • 59. Rai BK, Fiser A. Multiple mapping method: A novel approach to the sequence-to-structure alignment problem in comparative protein structure modeling. Proteins-Structure Function and Bioinformatics. 2006;63(3):644–661. doi: 10.1002/prot.20835.
  • 60. Mitsutake A, Sugita Y, Okamoto Y. Generalized-ensemble algorithms for molecular simulations of biopolymers. Biopolymers. 2001;60(2):96–123. doi: 10.1002/1097-0282(2001)60:2<96::AID-BIP1007>3.0.CO;2-F.
  • 61. Sugita Y, Okamoto Y. Replica-exchange molecular dynamics method for protein folding. Chemical Physics Letters. 1999;314(1–2):141–151.
  • 62. Garcia AE, Sanbonmatsu KY. Exploring the energy landscape of a beta hairpin in explicit solvent. Proteins-Structure Function and Genetics. 2001;42(3):345–354. doi: 10.1002/1097-0134(20010215)42:3<345::aid-prot50>3.0.co;2-h.
  • 63. Gnanakaran S, Nymeyer H, Portman J, Sanbonmatsu KY, Garcia AE. Peptide folding simulations. Current Opinion in Structural Biology. 2003;13(2):168–174. doi: 10.1016/s0959-440x(03)00040-x.
  • 64. Nymeyer H, Gnanakaran S, Garcia AE. Atomic simulations of protein folding, using the replica exchange algorithm. Numerical Computer Methods, Pt D. Methods in Enzymology. 2004;383:119–149. doi: 10.1016/S0076-6879(04)83006-4.
  • 65. Periole X, Mark AE. Convergence and sampling efficiency in replica exchange simulations of peptide folding in explicit solvent. Journal of Chemical Physics. 2007;126(1). doi: 10.1063/1.2404954.
  • 66. Samudrala R, Moult J. An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. Journal of Molecular Biology. 1998;275(5):895–916. doi: 10.1006/jmbi.1997.1479.
  • 67. Lu H, Skolnick J. A distance-dependent atomic knowledge-based potential for improved protein structure selection. Proteins-Structure Function and Genetics. 2001;44(3):223–232. doi: 10.1002/prot.1087.
  • 68. Zhou HY, Zhou YQ. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Science. 2002;11(11):2714–2726. doi: 10.1110/ps.0217002.
  • 69. Melo F, Feytmans E. Novel knowledge-based mean force potential at atomic level. Journal of Molecular Biology. 1997;267(1):207–222. doi: 10.1006/jmbi.1996.0868.
  • 70. Eisenberg D, Luthy R, Bowie JU. VERIFY3D: Assessment of protein models with three-dimensional profiles. Macromolecular Crystallography, Pt B. Methods in Enzymology. 1997;277:396–404. doi: 10.1016/s0076-6879(97)77022-8.
  • 71. Luthy R, Bowie JU, Eisenberg D. Assessment of Protein Models with 3-Dimensional Profiles. Nature. 1992;356(6364):83–85. doi: 10.1038/356083a0.
  • 72. Sippl MJ. Recognition of Errors in 3-Dimensional Structures of Proteins. Proteins-Structure Function and Genetics. 1993;17(4):355–362. doi: 10.1002/prot.340170404.
  • 73. Sippl MJ. Calculation of Conformational Ensembles from Potentials of Mean Force - an Approach to the Knowledge-Based Prediction of Local Structures in Globular Proteins. Journal of Molecular Biology. 1990;213(4):859–883. doi: 10.1016/s0022-2836(05)80269-4.
  • 74. Melo F, Feytmans E. Assessing protein structures with a non-local atomic interaction energy. Journal of Molecular Biology. 1998;277(5):1141–1152. doi: 10.1006/jmbi.1998.1665.
  • 75. de Bakker PIW, DePristo MA, Burke DF, Blundell TL. Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the generalized born solvation model. Proteins-Structure Function and Genetics. 2003;51(1):21–40. doi: 10.1002/prot.10235.
  • 76. Zhang C, Liu S, Zhou YQ. Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential. Protein Science. 2004;13(2):391–399. doi: 10.1110/ps.03411904.
  • 77. Zhu J, Xie L, Honig B. Structural refinement of protein segments containing secondary structure elements: Local sampling, knowledge-based potentials, and clustering. Proteins-Structure Function and Bioinformatics. 2006;65(2):463–479. doi: 10.1002/prot.21085.
  • 78. Petrey D, Xiang ZX, Tang CL, Xie L, Gimpelev M, Mitros T, Soto CS, Goldsmith-Fischman S, Kernytsky A, Schlessinger A, Koh IYY, Alexov E, Honig B. Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling. Proteins-Structure Function and Genetics. 2003;53(6):430–435. doi: 10.1002/prot.10550.
  • 79. Berendsen HJC, van der Spoel D, van Drunen R. Gromacs - a Message-Passing Parallel Molecular-Dynamics Implementation. Computer Physics Communications. 1995;91(1–3):43–56.
  • 80. van der Spoel D, van Buuren AR, Apol E, Meulenhoff PJ, Tieleman DP, Sijbers ALTM, Hess B, Feenstra KA, Lindahl E, van Drunen R, Berendsen HJC. Gromacs user manual version 3.0. Nijenborgh 4, 9747 AG Groningen; the Netherlands: 2001.
  • 81. Lindahl E, Hess B, van der Spoel D. GROMACS 3.0: a package for molecular simulation and trajectory analysis. Journal of Molecular Modeling. 2001;7(8):306–317.
  • 82. Xiang ZX, Honig B. Extending the accuracy limits of prediction for side-chain conformations. Journal of Molecular Biology. 2001;311(2):421–430. doi: 10.1006/jmbi.2001.4865.
  • 83. Xiang ZX, Soto CS, Honig B. Evaluating conformational free energies: The colony energy and its application to the problem of loop prediction. Proceedings of the National Academy of Sciences of the United States of America. 2002;99(11):7432–7437. doi: 10.1073/pnas.102179699.
  • 84. Taylor RD, Jewsbury PJ, Essex JW. FDS: Flexible ligand and receptor docking with a continuum solvent model and soft-core energy function. Journal of Computational Chemistry. 2003;24(13):1637–1656. doi: 10.1002/jcc.10295.
  • 85. Ponder JW. TINKER-software tools for molecular design, version 3.7. Washington University; St. Louis, MO: 1999.
  • 86. Zhu J, Alexov E, Honig B. Comparative study of generalized Born models: Born radii and peptide folding. Journal of Physical Chemistry B. 2005;109(7):3008–3022. doi: 10.1021/jp046307s.
  • 87. Fan H, Mark AE, Zhu J, Honig B. Comparative study of generalized Born models: Protein dynamics. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(19):6760–6764. doi: 10.1073/pnas.0408857102.
  • 88. Richmond TJ. Solvent Accessible Surface-Area and Excluded Volume in Proteins - Analytical Equations for Overlapping Spheres and Implications for the Hydrophobic Effect. Journal of Molecular Biology. 1984;178(1):63–89. doi: 10.1016/0022-2836(84)90231-6.
  • 89. van Gunsteren WF, Daura X, Mark AE. Encyclopedia of computational chemistry. Vol. 2. John Wiley & Sons; 1998. The GROMOS force field; pp. 1211–1216.
  • 90. van Gunsteren WF, Billeter SR, Eising AA, Hunenberger PH, Kruger P, Mark AE, Scott WRP, Tironi IG. Groningen Molecular Simulation (GROMOS) System. University of Groningen; the Netherlands, ETH Zurich; Switzerland: 1996.
  • 91. Berendsen HJC, Postma JPM, van Gunsteren WF, Hermans J. Interaction models for water in relation to protein hydration. In: Pullman B, editor. Intermolecular Forces. Dordrecht: Reidel; 1981. pp. 331–342.
  • 92. Tironi IG, Sperb R, Smith PE, van Gunsteren WF. A Generalized Reaction Field Method for Molecular-Dynamics Simulations. Journal of Chemical Physics. 1995;102(13):5451–5459.
  • 93. Hess B, Bekker H, Berendsen HJC, Fraaije J. LINCS: A linear constraint solver for molecular simulations. Journal of Computational Chemistry. 1997;18(12):1463–1472.
  • 94. Miyamoto S, Kollman PA. Settle - an Analytical Version of the Shake and Rattle Algorithm for Rigid Water Models. Journal of Computational Chemistry. 1992;13(8):952–962.
  • 95. Berendsen HJC, Postma JPM, van Gunsteren WF, Dinola A, Haak JR. Molecular-Dynamics with Coupling to an External Bath. Journal of Chemical Physics. 1984;81(8):3684–3690.
  • 96. Kortemme T, Morozov AV, Baker D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. Journal of Molecular Biology. 2003;326(4):1239–1259. doi: 10.1016/s0022-2836(03)00021-4.
  • 97. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003;302(5649):1364–1368. doi: 10.1126/science.1089427.
  • 98. MacKerell AD, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph-McCarthy D, Kuchnir L, Kuczera K, Lau FTK, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiorkiewicz-Kuczera J, Yin D, Karplus M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B. 1998;102(18):3586–3616. doi: 10.1021/jp973084f.
  • 99. Rehse PH, Ohshima N, Nodake Y, Tahirov TH. Crystallographic structure and biochemical analysis of the Thermus thermophilus osmotically inducible protein C. Journal of Molecular Biology. 2004;338(5):959–968. doi: 10.1016/j.jmb.2004.03.050.
  • 100. Meunier-Jamin C, Kapp U, Leonard GA, McSweeney S. The structure of the organic hydroperoxide resistance protein from Deinococcus radiodurans. Do conformational changes facilitate recycling of the redox disulfide? Journal of Biological Chemistry. 2004;279(24):25830–25837. doi: 10.1074/jbc.M312983200.
  • 101. Lesniak J, Barton WA, Nikolov DB. Structural and functional features of the Escherichia coli hydroperoxide resistance protein OsmC. Protein Science. 2003;12(12):2838–2843. doi: 10.1110/ps.03375603.
  • 102. Fukunishi H, Watanabe O, Takada S. On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: Application to protein structure prediction. Journal of Chemical Physics. 2002;116(20):9058–9067.
  • 103. Faraldo-Gomez JD, Roux B. Characterization of conformational equilibria through Hamiltonian and temperature replica-exchange simulations: Assessing entropic and environmental effects. Journal of Computational Chemistry. 2007;28(10):1634–1647. doi: 10.1002/jcc.20652.
  • 104. Affentranger R, Tavernelli I, Di Iorio EE. A novel Hamiltonian replica exchange MD protocol to enhance protein conformational space sampling. Journal of Chemical Theory and Computation. 2006;2(2):217–228. doi: 10.1021/ct050250b.
  • 105. Baker D, Sali A. Protein structure prediction and structural genomics. Science. 2001;294(5540):93–96. doi: 10.1126/science.1065659.
  • 106. Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Science. 2006;15(4):900–913. doi: 10.1110/ps.051799606.
  • 107. Fasnacht M, Zhu J, Honig B. Local quality assessment in homology models using statistical potentials and support vector machines. Protein Science. doi: 10.1110/ps.072856307. In press.
  • 108. Li X, Jacobson MP, Friesner RA. High-resolution prediction of protein helix positions and orientations. Proteins-Structure Function and Bioinformatics. 2004;55(2):368–382. doi: 10.1002/prot.20014.
  • 109. Rohl CA, Strauss CEM, Chivian D, Baker D. Modeling structurally variable regions in homologous proteins with rosetta. Proteins-Structure Function and Bioinformatics. 2004;55(3):656–677. doi: 10.1002/prot.10629.
  • 110. Misura KMS, Baker D. Progress and challenges in high-resolution refinement of protein structure models. Proteins-Structure Function and Bioinformatics. 2005;59(1):15–29. doi: 10.1002/prot.20376.
  • 111. Wang J, Gu Y, Liu HY. Determination of conformational free energies of peptides by multidimensional adaptive umbrella sampling. Journal of Chemical Physics. 2006;125(9). doi: 10.1063/1.2346681.
  • 112. Lyman E, Ytreberg FM, Zuckerman DM. Resolution exchange simulation. Physical Review Letters. 2006;96(2). doi: 10.1103/PhysRevLett.96.028105.
  • 113. Mackerell AD, Feig M, Brooks CL. Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. Journal of Computational Chemistry. 2004;25(11):1400–1415. doi: 10.1002/jcc.20065.
  • 114. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C. Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins-Structure Function and Bioinformatics. 2006;65(3):712–725. doi: 10.1002/prot.21123.
  • 115. Feig M, Brooks CL. Evaluating CASP4 predictions with physical energy functions. Proteins-Structure Function and Genetics. 2002;49(2):232–245. doi: 10.1002/prot.10217.
  • 116. Soto CS, Fasnacht M, Zhu J, Forrest L, Honig B. Loop modeling: Sampling, filtering and scoring. Proteins-Structure Function and Bioinformatics. doi: 10.1002/prot.21612. In press.
  • 117. Abagyan RA, Totrov MM. Contact area difference (CAD): A robust measure to evaluate accuracy of protein models. Journal of Molecular Biology. 1997;268(3):678–685. doi: 10.1006/jmbi.1997.0994.
  • 118. Zhang C, Liu S, Zhou HY, Zhou YQ. An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Science. 2004;13(2):400–411. doi: 10.1110/ps.03348304.
  • 119. Shortle D, Simons KT, Baker D. Clustering of low-energy conformations near the native structures of small proteins. Proceedings of the National Academy of Sciences of the United States of America. 1998;95(19):11158–11162. doi: 10.1073/pnas.95.19.11158.
  • 120. Zhang Y, Skolnick J. SPICKER: A clustering approach to identify near-native protein folds. Journal of Computational Chemistry. 2004;25(6):865–871. doi: 10.1002/jcc.20011.
