Author manuscript; available in PMC: 2007 Aug 13.
Published in final edited form as: J Phys Chem B. 2006 Feb 16;110(6):2869–2880. doi: 10.1021/jp055771+

Optimization of the GB/SA Solvation Model for Predicting the Structure of Surface Loops in Proteins

Agnieszka Szarecka 1, Hagai Meirovitch 1,*
PMCID: PMC1945207  NIHMSID: NIHMS11162  PMID: 16471897

Abstract

Implicit solvation models are commonly optimized with respect to experimental data or Poisson Boltzmann (PB) results obtained for small molecules, where the force field sometimes is not considered. In previous studies we have developed an optimization procedure for cyclic peptides and surface loops in proteins based on the entire system studied and the specific force field used. Thus, the loop has been modeled by the simplified solvation function Etot = EFF(ɛ=2r) + ∑i σiAi, where EFF(ɛ=nr) is the AMBER force field energy with a distance dependent dielectric function, ɛ=nr, Ai is the solvent accessible surface area of atom i, and σi is its atomic solvation parameter. During the optimization process the loop is free to move while the protein template is held fixed in its X-ray structure. To improve on the results of this model, in the present work we apply our optimization procedure to a physically more rigorous solvation model, the generalized Born with surface area (GB/SA) model (together with the all-atom AMBER force field), as suggested by Still and coworkers (J. Phys. Chem. A 1997, 101, 3005). The six parameters of the GB/SA model, namely P1–P5 and the surface area parameter σ (as implemented in the program TINKER), are re-optimized for a “training” group of nine loops, and from the individual sets of optimized parameters a best-fit set is defined. The best-fit set and Still’s original set of parameters (where Lys, Arg, His, Glu, and Asp are charged or neutralized) were applied to the training group as well as to a “test” group of seven loops, and the energy gaps and the corresponding RMSD values were calculated. These GB/SA results based on the three sets of parameters have been found to be comparable; surprisingly, however, they are somewhat inferior (e.g., they show larger energy gaps) to those obtained previously from the simplified model described above. We discuss recent results for loops obtained by other solvation models and potential directions for future studies.

Introduction

The interest in surface loops and the difficulty in predicting their structure

A surface loop in a protein is a chain segment connecting two secondary structure elements, which generally protrudes into the solvent and thus is expected to be relatively flexible, as indeed has been found by multidimensional nuclear magnetic resonance (NMR) experiments. In many cases this flexibility is also reflected in X-ray crystallography data in terms of large B-factors1 or a complete disorder. Surface loops take part in protein-protein and protein-ligand interactions, where their flexibility in many cases is essential for these recognition processes. For example, the conformational change between a free and a bound antibody demonstrates the flexibility of the antibody combining site, which typically includes hypervariable loops; this provides an example of induced fit as a mechanism for antibody-antigen recognition (e.g., see Refs. 2 and 3). Alternatively, the selected-fit mechanism has been suggested, where the free loop interconverts among different states, and one of them is selected upon binding.4 Dynamic NMR experiments5 and molecular dynamics (MD) simulations6 of HIV protease have found a strong correlation between the flexibility of certain segments of the protein and the movement of the flaps (that cover the active site) upon ligation.7 Loops are known to form “lids” over active sites of proteins and mutagenesis experiments show that residues within these loops are crucial for substrate binding or enzymatic catalysis; again, these loops are typically flexible (see review by Fetrow8).

Prediction of loop structures by computational methods is important in homology modeling, where a framework of unconnected homologous segments is initially created and the structure of the loops connecting these segments has to be subsequently determined. For long loops this is an unsolved problem to date.9–12 Prediction of loop structures also constitutes a challenge in protein engineering, where a loop undergoes mutations, insertions, or deletions of amino acids. Studying the flexibility of loops by experimental methods is not straightforward and theoretical analysis by molecular modeling techniques is expected to clarify the picture.

The interest in surface loops has yielded extensive theoretical work, where one avenue of research has been the classification of loop structures.13–21 However, to understand various recognition mechanisms like those mentioned above, it is mandatory to be able to predict the structure (or structures) of a loop by theoretical/computational procedures, which is not a trivial task due to the irregular structures of loops, their flexibility, and their exposure to the solvent. Loop structures are commonly predicted by either a comparative modeling approach based on known loop conformations from the Protein Data Bank (PDB),22,23 or an energetic approach; also, methods exist that are hybrids of these two approaches. Due to the lack of sufficiently large databases, only short loops (up to five residues) could be treated effectively by comparative modeling,24–29 while hybrid methods are effective up to nine residues.24,26,30–33 With the energetic approach, loop structures are generated by conformational search methods (simulated annealing, bond relaxation algorithm, and others) subject to the spatial restrictions imposed by the known 3D structure of the rest of the protein (the template). The quality of the prediction depends on the quality of the loop-loop and loop-template interaction energy, the modeling of the solvent, and the extent of conformational search applied.34–44 An extensive discussion, references, and background material on loops appear in our previous work, denoted here as papers I45 and II.46

With the energetic approach, modeling of the solvent is of special importance. In some of the earlier studies the solvation problem was not addressed at all, while others only used a distance dependent dielectric function (ɛ = r). Better treatments of solvation were applied by Moult and James35 and Mas et al.47 A systematic comparison of solvation models was first carried out by Smith and Honig,48 who tested the ɛ = r model against results obtained by the finite difference Poisson Boltzmann (FDPB) calculation including a hydrophobic term; the implicit solvation model of Wesson and Eisenberg49 with ɛ = r was also studied by them. Later, the generalized Born surface area (GB/SA) model50 was applied to loops of ribonuclease A (RNase A)51 and has been found by Blundell’s group to discriminate better than other models between the native loop structures and close to native “decoy” structures.52,53 Very recently an extensive study of loops was carried out by Jacobson et al.,54 who used the surface GB55 and a nonpolar solvation model56 (SGB-NP) with the OPLS force field.57 Zhang et al.58 have tested their knowledge-based statistical potential, DFIRE (distance-scaled, finite ideal gas reference state), by applying it to the loop sets studied in Refs. 52, 53, and 54 (see the Results and discussion section). Another interesting loop prediction algorithm has been suggested by Xiang et al.59 Finally, we mention our loop studies in papers I and II, which will be discussed in detail later. However, more work is needed to compare the quality of the various models for loops and other systems.

Statistical mechanics methodology for treating flexibility

The foregoing discussion indicates that, to date, the energetic approach is the best way for predicting the structure of large loops in homology modeling and protein engineering. It also constitutes the only alternative for studying intermediate flexibility, where a loop populates several microstates in equilibrium (see below). Recently, we have developed a statistical mechanics methodology for treating intermediate flexibility (most suitable for implicit solvation models), which was applied initially to peptides,60–65 and in papers I45 and II46 also to surface loops.66–68 The first step is to carry out an extensive conformational search using our local torsional deformation (LTD) method,45,46,60,69,70 from which the global energy minimum (GEM) loop structure and low energy minimized structures within 2–3 kcal/mol above the GEM are identified; a subgroup of significantly different structures is then selected, and each becomes a “seed” for a local Monte Carlo (MC) or MD simulation that spans its vicinity (this local region is called a microstate). Finally, the free energies of the most stable microstates are obtained (with the local states method71,72 or the hypothetical scanning MC method73,74), which lead to the populations and to weighted averages of physical quantities that are compared with the experiment.61,64,65 Developing a reliable solvation energy function is mandatory and thus is the aim of this paper (as has been the aim of papers I45 and II46).

Previous optimization of a simplified solvation model

Because explicit solvent, the most accurate model, is computationally expensive, we have chosen to study initially a relatively simple implicit solvation model defined by eq 1, which was applied to cyclic peptides in DMSO, and in papers I45 and II46 also to loops in water,

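$$E_{\mathrm{tot}} = E_{\mathrm{FF}}(\varepsilon = nr) + E_{\mathrm{solv}} = E_{\mathrm{FF}}(\varepsilon = nr) + \sum_i \sigma_i A_i \tag{1}$$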

EFF is the force field energy, Ai is the structure dependent solvent accessible surface area of atom i, and σi is the atomic solvation parameter (ASP); ɛ = nr is a distance dependent dielectric function, where n is a parameter. Even with such a simplified model, treatment of loops is feasible only for a relatively small template that typically consists of those atoms that are located within 10 Å from any loop atom in a specific loop structure; the template atoms are fixed in their known X-ray structure, whereas the loop is free to move. Etot includes the loop-loop and loop-template energy, while the template-template interactions are ignored. With this model, the conformational search, the identification of the most stable microstates, and the calculation of their free energy are considerably easier than with explicit solvent. Therefore, most of the loop studies in the literature are based on implicit solvation models, with a relatively small number of exceptions where explicit models were used (e.g., Refs. 27, 75, and 76).
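As an illustration of eq 1, the following minimal Python sketch adds the surface-area solvation term to a force field energy that is assumed to have been computed elsewhere. The function names, the argument layout, and the numerical values in the usage note are hypothetical; in the actual work both terms are evaluated inside TINKER.

```python
from typing import Sequence


def sasa_solvation_energy(areas: Sequence[float], asps: Sequence[float]) -> float:
    """Return sum_i sigma_i * A_i.

    areas: per-atom solvent accessible surface areas A_i (Å^2)
    asps:  per-atom atomic solvation parameters sigma_i (kcal/mol/Å^2)
    """
    if len(areas) != len(asps):
        raise ValueError("areas and asps must have the same length")
    return sum(sigma * area for sigma, area in zip(asps, areas))


def total_energy(e_ff: float, areas: Sequence[float], asps: Sequence[float]) -> float:
    """E_tot = E_FF(eps = n r) + sum_i sigma_i * A_i (eq 1).

    e_ff is the force field energy (kcal/mol), assumed to be supplied by an
    external molecular mechanics program with the chosen dielectric function.
    """
    return e_ff + sasa_solvation_energy(areas, asps)
```

For example, `total_energy(-100.0, [10.0, 20.0], [-0.1, 0.05])` combines a force field energy of -100 kcal/mol with one hydrophilic (negative ASP) and one hydrophobic (positive ASP) atom; the numbers are arbitrary and serve only to show the call.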

Eq 1 is not new and has been used in many previous studies, where the ASPs for a protein have been commonly determined from the free energy of transfer of small molecules from the gas phase to water.49,77 However, it is not clear to what extent ASPs derived for small molecules are suited for the protein environment. Also, these sets of ASPs were used with various force fields, in most cases without further calibration (see discussions in Refs. 60 and 63, and in references cited therein). Recent studies based on various solvation potentials, Esolv, including our results in papers I45 and II,46 support these reservations.48,51 This problem was first recognized by Schiffer et al.,78 and then by Fraternali and van Gunsteren.79 Optimization of solvation models with respect to a force field has now become a common practice.

We have developed a procedure for optimizing parameters of implicit solvation models that to a large extent is free of the limitations discussed above. This procedure was applied first to cyclic peptides and recently to loops modeled by eq 1; in an attempt to further improve the latter results our main objective in this paper is to apply this procedure to the GB/SA model of Still and coworkers,50 which relies on stronger theoretical grounds than eq 1. We shall compare results for loops obtained in papers I and II (using eq 1) to the GB/SA results, and will study eq 1 again, where ɛ = nr is replaced by more complex dielectric functions. Because the general features of the optimization method apply to any model, we discuss them with respect to eq 1.

Thus, for a given loop the optimized ASPs and n are those for which the known X-ray loop structure becomes the GEM structure. This definition, however, turns out to be too strict and in papers I and II we argue that it can be relaxed; thus, an energy difference (the energy gap) of up to 2–3 kcal/mol is allowed between the GEM and the energy of the native optimized structure (NOS), which is obtained by local energy minimization of the known X-ray loop structure using the optimized parameters (a more precise definition of NOS will be given later). EFF (eq 1) is defined by the all-atom AMBER80 force field that for loops has been found to perform better than other force fields (see paper I45). The optimization is based on an extensive conformational search using LTD, which has been implemented within the molecular mechanics/molecular dynamics program TINKER.81 For the optimized sets of ASPs (denoted σi*) and the optimal n=2, the energy gap, ΔEtotm(n, σi*), is defined by

$$\Delta E_{\mathrm{tot}}^{m}(n, \sigma_i^*) = E_{\mathrm{tot}}^{\mathrm{NOS}}(n, \sigma_i^*) - E_{\mathrm{tot}}^{m}(n, \sigma_i^*) \tag{2}$$

where Etot m(n, σi*) is the lowest minimized energy obtained, which is assumed to be the GEM. EtotNOS(n, σi*) is the minimized energy of NOS based on the optimal parameters. Thus, unlike the conventional parametrization of eq 1 that relies on free energy of transfer data of small molecules, our derivation of the ASPs depends on the force field used and is based on the energy of the entire loop in the protein environment.
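The energy gap of eq 2 and the relaxed 2–3 kcal/mol criterion can be sketched as follows. This is a minimal illustration in which the lowest energy found is taken over the sampled minimized structures together with the NOS itself (an assumption that keeps the gap non-negative); the function names and the default tolerance argument are hypothetical.

```python
from typing import Iterable


def energy_gap(e_nos: float, minimized_energies: Iterable[float]) -> float:
    """Energy gap (kcal/mol) between the NOS and the lowest minimized energy.

    e_nos: minimized energy of the native optimized structure
    minimized_energies: minimized energies from the conformational search
    The lowest value found is assumed to be the GEM; the NOS is included in
    the pool, so the gap is never negative.
    """
    e_min = min(list(minimized_energies) + [e_nos])
    return e_nos - e_min


def gap_is_acceptable(gap: float, tolerance: float = 3.0) -> bool:
    """Relaxed criterion: a gap of up to `tolerance` kcal/mol is allowed."""
    return gap <= tolerance
```

A gap of zero means that the NOS itself is the lowest-energy structure found, which is the strict form of the optimization criterion.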

Our aim is to derive ASPs for the solution environment, where the side chains of a surface loop, and to a lesser extent also the backbone, typically exhibit intermediate flexibility.82,83 It should be noted, however, that our optimization is carried out with respect to a single X-ray crystal structure, where some aspects of its flexibility are only expressed by elevated B-factors. This problem may be alleviated as high-resolution X-ray structures become available, which enable one to extract information about side chain rotamers and their populations.84,85 Notice also that the derivation of the ASPs is based on the minimized energies, thus ignoring the flexibility (i.e., entropy) of the microstates. The first step to eliminate this limitation was taken in paper II, where differences in the free energy for three loops were calculated. Etot is a free energy function that depends on the temperature (through the σi) but will be referred to as energy. It should also be emphasized that the ASPs are derived only for surface loops that protrude into the solvent due to strong hydrophilic interactions. Indeed, the individual sets of ASPs optimized in papers I and II are mostly negative (hydrophilic), even those of carbon, in contrast to the positive ASP obtained by Wesson and Eisenberg49 (see discussion in paper II46).

From initial studies in paper II it became evident that for highly charged loops the Coulombic interactions are too strong, leading to large energy gaps (in some cases of ~20 kcal/mol); therefore, in all calculations the charges of Arg, Lys, His, Asp, and Glu were neutralized. Individual sets of ASPs were optimized for a diverse (“training”) group of 12 surface loops of 5–12 residues from different proteins. The extent of similarity among the optimized individual sets enabled defining a reasonable best-fit set of ASPs, which was tested on the training group as well as on an additional (“test”) group of eight loops. The results for eq 1 were found to be much better than those obtained with the force field [EFF(ɛ = 2r)] alone. The root mean square deviations (RMSD) of the GEM structures from the corresponding NOS were found in most cases to be better than those obtained by other methods. However, the energy gaps in many cases were above 3 kcal/mol, due to strong electrostatic interactions; this has motivated us to study the GB/SA model that treats these interactions in a more rigorous way than eq 1.

Theory and methods

In this section we describe the GB/SA model and the LTD method, and provide specific details about the methodology and the calculations.

The GB/SA solvation model

Several versions of the GB/SA model are currently available, whose parameters are commonly optimized against properties of small molecules, namely experimentally determined solvation energies or free energies obtained by the Poisson Boltzmann (PB) equation; in general, the more complex models show better agreement with PB at the expense of an increase in computer time.50,51,55,56,86–96 With the model of Still and coworkers50,86 (implemented in TINKER) the solvation energy Esol consists of an electrostatic polarization energy term, Epol, and a non-polar (hydrophobic) energy component, Ehyd = ∑Aiσi (compare with eq 1), thus,

$$E_{\mathrm{sol}} = E_{\mathrm{pol}} + \sum_i A_i \sigma_i \tag{3}$$

where Esol is a free energy term, which as before, in most cases will be referred to as energy. The total electrostatic energy, Ees of the system (in kcal/mol) is

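$$E_{\mathrm{es}} = 332\sum_{i<j}\frac{q_i q_j}{\varepsilon_{\mathrm{in}} r_{ij}} + E_{\mathrm{pol}} = 332\sum_{i<j}\frac{q_i q_j}{\varepsilon_{\mathrm{in}} r_{ij}} - 166\left(\frac{1}{\varepsilon_{\mathrm{in}}} - \frac{1}{\varepsilon_{\mathrm{w}}}\right)\sum_{i,j}\frac{q_i q_j}{f_{\mathrm{GB}}} \tag{4}$$

where

$$f_{\mathrm{GB}} = \left[r_{ij}^2 + \alpha_i \alpha_j \exp\left(-\frac{r_{ij}^2}{k\,\alpha_i \alpha_j}\right)\right]^{1/2},$$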

and qi is the charge of atom i, rij is the distance (Å) between atoms i and j, αi is the Born radius of atom i, and k is a factor that is taken as 4 in Ref. 50. Epol is the electrostatic component of the free energy of transfer of a molecule with an interior dielectric constant, ɛin from vacuum to a continuum medium (water) of dielectric constant ɛw. The total energy, Etot is

$$E_{\mathrm{tot}} = E_{\mathrm{FF}} + E_{\mathrm{pol}} + E_{\mathrm{hyd}} \tag{5}$$

where EFF is the energy of the all-atom AMBER94 force field,80 which includes the first term of Ees (eq 4); AMBER94 is chosen to be consistent with eq 1 studied in papers I45 and II.46 Notice that in TINKER Ehyd is defined as a product of a single parameter σ and the total surface area of the solute calculated with a spherical solvent molecule (water) of radius 1.4 Å.
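The polarization term of eq 4 can be illustrated with a short Python sketch, assuming that the Born radii αi are already known and that the double sum runs over all ordered pairs including i = j (so that the self terms reduce to the Born energy of each atom). The function names and the default dielectric constants are placeholders chosen for the example and are not values taken from this work.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def f_gb(r_ij: float, alpha_i: float, alpha_j: float, k: float = 4.0) -> float:
    """Generalized Born function; reduces to alpha_i when i == j (r_ij = 0)."""
    aa = alpha_i * alpha_j
    return math.sqrt(r_ij * r_ij + aa * math.exp(-(r_ij * r_ij) / (k * aa)))


def polarization_energy(
    charges: Sequence[float],
    coords: Sequence[Point],
    born_radii: Sequence[float],
    eps_in: float = 1.0,   # placeholder interior dielectric constant
    eps_w: float = 80.0,   # placeholder dielectric constant for water
    k: float = 4.0,
) -> float:
    """E_pol = -166 (1/eps_in - 1/eps_w) * sum_{i,j} q_i q_j / f_GB  (kcal/mol)."""
    prefactor = -166.0 * (1.0 / eps_in - 1.0 / eps_w)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):  # full double sum, self terms included
            r_ij = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r_ij, born_radii[i], born_radii[j], k)
    return prefactor * total
```

For a single ion the double sum contains only the self term, so the result is the Born solvation energy −166(1/ɛin − 1/ɛw)q²/α, which is a convenient sanity check on any implementation.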

The heart of the GB/SA model is the calculation of the αi‘s, which in the work of Still and coworkers are defined by a function depending on five parameters, P1-P5 (see Ref. 50); thus,

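$$\alpha_i = -166/G_{\mathrm{pol},i} \tag{6}$$

where

$$G_{\mathrm{pol},i} = \frac{-166}{R_{\mathrm{vdW}\text{-}i} + \varphi + P_1} + \sum_{\mathrm{stretch}}\frac{P_2 V_j}{r_{ij}^4} + \sum_{\mathrm{bend}}\frac{P_3 V_j}{r_{ij}^4} + \sum_{\mathrm{nonbonded}}\frac{P_4 V_j\,\mathrm{CCF}}{r_{ij}^4} \tag{7}$$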

and φ = −0.09 Å is a dielectric offset. rij = distance between atoms i and j (Å), Vj = volume of atom j (Å³), RvdW-i = van der Waals radius of atom i (Å), P1 = single atom scaling factor, P2 = 1,2 scaling factor, P3 = 1,3 scaling factor, P4 = 1,≥4 scaling factor, P5 = soft cutoff parameter, and CCF = close contact function for 1,≥4 interactions, where

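$$\mathrm{CCF} = 1.0 \quad \text{if} \quad \left(\frac{r_{ij}}{R_{\mathrm{vdW}\text{-}i} + R_{\mathrm{vdW}\text{-}j}}\right)^2 > \frac{1}{P_5} \tag{8}$$

otherwise

$$\mathrm{CCF} = \left\{0.5\left[1.0 - \cos\left\{\left(\frac{r_{ij}}{R_{\mathrm{vdW}\text{-}i} + R_{\mathrm{vdW}\text{-}j}}\right)^2 P_5\,\pi\right\}\right]\right\}^2$$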

We optimize the parameters P1-P5 and σ.
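To make the role of P1–P5 concrete, the sketch below evaluates the close contact function and a single Born radius from eqs 6–8. It assumes the sign convention in which Gpol,i is negative and αi = −166/Gpol,i, and it takes the lists of bonded (1,2), angle (1,3), and nonbonded (1,≥4) neighbors as precomputed inputs. The function names and data layout are hypothetical and do not reproduce the TINKER implementation.

```python
import math
from typing import Sequence, Tuple


def close_contact(r_ij: float, rvdw_i: float, rvdw_j: float, p5: float) -> float:
    """Close contact function CCF for 1,>=4 interactions (eq 8)."""
    ratio_sq = (r_ij / (rvdw_i + rvdw_j)) ** 2
    if ratio_sq > 1.0 / p5:
        return 1.0
    return (0.5 * (1.0 - math.cos(ratio_sq * p5 * math.pi))) ** 2


def born_radius(
    rvdw_i: float,
    stretch: Sequence[Tuple[float, float]],           # (V_j, r_ij) for 1,2 neighbors
    bend: Sequence[Tuple[float, float]],              # (V_j, r_ij) for 1,3 neighbors
    nonbonded: Sequence[Tuple[float, float, float]],  # (V_j, r_ij, RvdW_j) for 1,>=4
    p1: float, p2: float, p3: float, p4: float, p5: float,
    phi: float = -0.09,                               # dielectric offset (Å)
) -> float:
    """Born radius alpha_i = -166 / G_pol,i (eqs 6 and 7)."""
    g_pol = -166.0 / (rvdw_i + phi + p1)
    g_pol += sum(p2 * v_j / r_ij ** 4 for v_j, r_ij in stretch)
    g_pol += sum(p3 * v_j / r_ij ** 4 for v_j, r_ij in bend)
    g_pol += sum(
        p4 * v_j * close_contact(r_ij, rvdw_i, rvdw_j, p5) / r_ij ** 4
        for v_j, r_ij, rvdw_j in nonbonded
    )
    return -166.0 / g_pol
```

For an isolated atom the three sums vanish and the Born radius reduces to RvdW-i + φ + P1; each neighboring atom makes Gpol,i less negative and therefore increases αi, which is the expected behavior for a partially buried atom.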

The LTD method

The local torsional deformation (LTD)60,69 method has been described in detail before. Here we only discuss its main features. This is a conformational search procedure for cyclic molecules and protein loops modeled by a force field with flexible bond lengths and angles. An LTD simulation starts from an arbitrary energy minimized loop structure, i, with energy Ei0; i is then distorted by a single or several local torsional rotations along the chain followed by energy minimization. The resulting conformation j (with minimized energy Ej0) is accepted according to the Metropolis transition probability, pij,

$$p_{ij} = \min\left(1, \exp\left[-(E_j^0 - E_i^0)/k_{\mathrm{B}}T^*\right]\right) \tag{9}$$

where the accepted structure is deformed again and the process continues. This Monte Carlo minimization procedure97 is a “selection procedure” that efficiently directs the search towards the low energy region in conformational space. Notice that T* is not a usual temperature but a parameter that affects the efficiency of the process.98 In most of our runs T* was changed every 50 Monte Carlo (LTD) steps by 10 K from 200 K to 1000 K and vice versa. The coordinates and energies of all the energy minimized structures (including those which were rejected through eq 9) were stored in a file for further analysis.60
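The acceptance rule of eq 9 and the T* schedule described above can be sketched in Python as follows. The perturbation and minimization step is represented by a user-supplied callable, since the local torsional deformations themselves are not reproduced here; the triangular reading of the schedule ("from 200 K to 1000 K and vice versa"), the function names, and the value of the Boltzmann constant in kcal/(mol·K) are assumptions of this sketch.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def accept(e_old: float, e_new: float, t_star: float, rng=random) -> bool:
    """Metropolis test on minimized energies: p = min(1, exp(-(E_new - E_old)/(k_B T*)))."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / (K_B * t_star))


def t_star_schedule(step: int, t_min: float = 200.0, t_max: float = 1000.0,
                    dt: float = 10.0, block: int = 50) -> float:
    """T* changes by `dt` every `block` steps, rising from t_min to t_max and back."""
    n_levels = int(round((t_max - t_min) / dt))
    idx = (step // block) % (2 * n_levels)
    if idx > n_levels:
        idx = 2 * n_levels - idx
    return t_min + dt * idx


def mc_minimization(start, perturb_and_minimize, n_steps: int, seed: int = 0):
    """Monte Carlo minimization loop.

    start: (structure, minimized_energy)
    perturb_and_minimize: callable(structure, rng) -> (new_structure, minimized_energy)
    Returns every minimized structure generated, including rejected ones.
    """
    rng = random.Random(seed)
    current, e_current = start
    visited = [(current, e_current)]
    for step in range(n_steps):
        candidate, e_candidate = perturb_and_minimize(current, rng)
        visited.append((candidate, e_candidate))
        if accept(e_current, e_candidate, t_star_schedule(step), rng):
            current, e_current = candidate, e_candidate
    return visited
```

Because the Metropolis test is applied to minimized energies rather than to raw conformations, T* only controls how readily the walk climbs from one local minimum to a higher one, which is why it acts as an efficiency parameter rather than a physical temperature.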

The local backbone rotations are described elsewhere.60,69 Typically, in each LTD step several independent but significant such rotations (determined randomly) are carried out along the chain, and therefore energy barriers are crossed efficiently. These local conformational changes are especially important in a dense protein environment to reduce the chance for creating undesired loop-template entanglements. Notice that together with the backbone angles, side-chain dihedrals are randomly selected as well, and they are changed at random (but not locally). Thus, the whole loop is treated at once, in contrast to procedures used by others and discussed in papers I and II. The present implementation of LTD is exactly the same as that applied to the cyclic hexapeptide described in detail in Ref. 60. LTD has been found to be significantly more efficient than simulated annealing.69

It should be pointed out that while Monte Carlo Minimization (thus LTD) is a stochastic procedure, the chance of finding the GEM is higher if the search starts from a conformation that is similar to the GEM structure than from a distant conformation. Therefore, we start all the LTD runs from the native loop structures (NOS), which are not expected to differ significantly from the corresponding GEMs. This choice would lead to the expected increase in the search efficiency only if the loop does not get trapped in the starting microstate, which was verified by the relatively large RMSD values (up to ~6 Å) obtained for the trajectories of the generated loops (meaning that a significant part of conformational space was sampled) and the fact that in many cases the energy was decreased significantly. Finally, the energy is minimized by the L-BFGS procedure,99 which (as the LTD program) has been incorporated in TINKER.

The Loops Studied and Modeling Issues

It should first be pointed out that the backbone structure of a stretched loop will be predicted correctly by all conformational search methods (see discussion in paper II). Therefore, as in papers I and II, we obtained for each loop the ratio R = [length of a completely stretched (extended) loop/distance between its ends], where these lengths are calculated between the Cα atoms of the first and last residues of the loop. The length (in Å) of the extended structure is calculated using the expressions 6.046(n/2−1)+3.46 and 6.046(n−1)/2 for an even and odd number, n, of residues, respectively; the factors 6.046 and 3.46 Å are taken from Flory100 (Chapter VII, p. 251). To a large extent, R reflects the conformational freedom of the loop backbone and partially also of the side chains: the larger R is, the higher the flexibility (which is also determined by the surrounding template and the sequence of residues).
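The ratio R follows directly from the two expressions quoted above. The short sketch below implements them; the function names are hypothetical and the end-to-end Cα distance is assumed to be measured from the X-ray structure.

```python
def extended_length(n_residues: int) -> float:
    """Cα(first)-Cα(last) distance (Å) of a fully extended loop of n residues."""
    if n_residues < 2:
        raise ValueError("a loop needs at least two residues")
    if n_residues % 2 == 0:
        return 6.046 * (n_residues / 2 - 1) + 3.46
    return 6.046 * (n_residues - 1) / 2


def stretch_ratio(n_residues: int, end_to_end: float) -> float:
    """R = extended length / observed Cα-Cα distance between the loop ends."""
    if end_to_end <= 0.0:
        raise ValueError("end_to_end distance must be positive")
    return extended_length(n_residues) / end_to_end
```

For a nine-residue loop the extended length is 6.046 × 4 = 24.184 Å, so a loop whose ends are about 8.6 Å apart would have R ≈ 2.8, whereas R close to 1 indicates an almost fully stretched loop.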

To be able to compare the performances of GB/SA and eq 1, we have chosen the same training group of loops studied in paper II, except for the two loops of BPTI [(6–12) and (18–24)] and the loop (119–125) of myoglobin, which are extremely stretched (R=1, 1, and 1.1, respectively). We added to this group the loop (64–71) of RNase A (loop 1). As in paper II, an individual set of parameters was optimized for each of the nine loops of this group; the extent of similarity among these sets enabled us to define a reasonable best-fit set of parameters, which was tested on the training group as well as on an additional test group of seven loops that were also studied in paper II; these groups of loops, the related proteins, and template sizes appear in Table 1.

Table 1.

The proteins and the corresponding loops and templates (a)

| Protein | Loop, residue range (no. of residues) | Sequence | R | # atoms (loop) | # atoms (template) | Template radius (Å) |
|---|---|---|---|---|---|---|
| Training Group | | | | | | |
| RNase A (1rat) | Loop 3, 89–97 (9) | SSKYPNCAY | 2.8 | 133 | 726 | 10 |
| RNase A (1rat) | Loop 1, 64–71 (8) | ACKNGQTN | 3.2 | 107 | 745 | 10 |
| Acidic fibroblast (FGF) (2afg) | 90–94 (5) | EENHY | 2.3 | 84 | 700 | 10 |
| Adenylate kinase (AK) (4ake) | 73–80 (8) | AQEDCRNG | 2.1 | 112 | 856 | 10 |
| Peptidase (5cpa) | 205–213 (9) | PYGYTTQSI | 3.5 | 138 | 1109 | 10 |
| Antibody, McPC603 (1mcp) | Loop 1, L26–L37 (12) | SQSLLNSGNQKN | 2.5 | 175 | 893 | 9 |
| Antibody, McPC603 (1mcp) | Loop 2, H102–H109 (8) | YYGSTWYF | 3.7 | 139 | 1492 | 9 |
| Penicillopepsin (3app) | 129–137 (9) | INTVQPQSQ | 2.7 | 139 | 999 | 9 |
| Proteinase (2apr) | 202–210 (9) | ATVGTSTVA | 4.8 | 112 | 804 | 9 |
| Test Group | | | | | | |
| Ser-Proteinase (2ptn) | 143–151 (9) | NTKSSGTSY | 4.9 | 117 | 809 | 9 |
| Proteinase (2apr) | 188–196 (9) | IDNSRGWWG | 4.5 | 143 | 1270 | 9 |
| Proteinase (2apr) | 128–137 (10) | DTITTVRGVK | 4.3 | 158 | 1145 | 9 |
| Peptidase (5cpa) | 244–250 (7) | ITTIYQA | 2.7 | 114 | 1010 | 9 |
| RNase H (2rn2) | 57–63 (7) | EALKEHC | 1.6 | 110 | 929 | 9 |
| Antibody (1mcp) | 56L–62L (7) | GASTRES | 1.3 | 93 | 1007 | 9 |
| Antibacterial protein (1noa) | 25–30 (6) | GLQAGT | 1.3 | 74 | 536 | 9 |

(a) R is the ratio between the length of the stretched (extended) loop and the distance between the Cα of the first and last residues of the loop. The charged residues are bold-faced.

The 3D structures of the proteins of the training group (taken from the PDB) were all determined with 2 Å resolution or less, except for that of the antibody McPC603 that was obtained with 2.7 Å resolution. These loops range in size from five to twelve amino acid residues, and all of them are predominantly hydrophilic, i.e., polar or charged. It should be pointed out that the coordinates of the side chain atoms of the highly charged loops of acidic FGF (2 charged residues) and AK (3 charged residues) were obtained with elevated B-factors, 47–88 for AK, and 50–100 for chain B of acidic FGF (see detailed discussion in paper II). These large B-factors suggest that the side chains might populate several rotamers, but no analysis of such populations is available [Müller and Schulz do not determine dihedral angles if the B-factors of the involved atoms are 60 and above101 while others adopt even a smaller value of 40 (J. Rosenberg, private communication)]. Obviously, this uncertainty in the coordinates of the loops will be reflected in the reliability of the corresponding optimized sets of parameters. The optimized parameters might also be affected by the existence of more than one molecule in the unit cell as is the case for AK and acidic FGF, which have two and four molecules in the unit cell, respectively. Indeed, in paper II we have found that for FGF the B-factors and energy gaps of loop 90–94 in molecules B and C are different due to different environments. In the present study we have taken into consideration molecule B only. The optimized parameters might also be affected by close molecules in neighbor cells. However, we have not investigated this point.

The number of atoms (including hydrogens) of the training group ranges from 84 (acidic FGF) to 175 (the 12-residue loop of the antibody; see Table 1). The template is defined by the following procedure. First, hydrogen atoms are added to the PDB X-ray structure by the program TINKER. In the second step, to remove possible atomic overlaps, the energy of the protein is minimized using the AMBER potential [EFF(ɛ=1), eq 1] with an additional harmonic restraint of 5 kcal/mol/Å² applied to each atomic position. This minimized structure is the native optimized structure (NOS) mentioned earlier, which can deviate from the PDB structure by an all-heavy-atom RMSD of no more than ~0.15 Å. Most templates include any non-loop atom with a distance smaller than 10 Å from at least one loop atom (in NOS) together with all the other atoms belonging to the same residue. However, for some of the larger proteins distances smaller than 10 Å were used to keep the template size manageable. The smaller cutoff distance is justified in light of our finding (paper II) that decreasing the distance from 10 to 7 Å changed the energy only slightly (≤ 1 kcal/mol), suggesting that the effect on energy differences between two structures would be small. The template sizes in Table 1 range from 700 (acidic FGF) to 1492 (antibody, loop 2), which are larger than their counterparts in paper II due to larger radii.
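The distance-based template definition described above can be sketched as a two-step selection: first find every residue with at least one atom within the cutoff of a loop atom, then keep all atoms of those residues. The data layout (residue identifiers paired with coordinates) and the function names are assumptions of this example; a production version would normally use a spatial grid instead of the brute-force distance loop.

```python
import math
from typing import Hashable, Iterable, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[Hashable, Point]  # (residue identifier, coordinates in Å)


def template_residues(loop_coords: Sequence[Point],
                      nonloop_atoms: Iterable[Atom],
                      cutoff: float = 10.0) -> Set[Hashable]:
    """Residues with at least one atom closer than `cutoff` to any loop atom."""
    selected: Set[Hashable] = set()
    for res_id, xyz in nonloop_atoms:
        if res_id in selected:
            continue  # residue already included through another atom
        if any(math.dist(xyz, loop_xyz) < cutoff for loop_xyz in loop_coords):
            selected.add(res_id)
    return selected


def template_atoms(loop_coords: Sequence[Point],
                   nonloop_atoms: Sequence[Atom],
                   cutoff: float = 10.0) -> List[Atom]:
    """All atoms of the selected residues, including those beyond the cutoff."""
    keep = template_residues(loop_coords, nonloop_atoms, cutoff)
    return [atom for atom in nonloop_atoms if atom[0] in keep]
```

Including whole residues rather than individual atoms avoids cutting side chains at the template boundary, at the cost of a somewhat larger template.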

The test group (see Table 1) includes seven of the eight loops studied in paper II, where loop 1 of RNase A was transferred to the training set. All of them are un-stretched solvent-exposed surface loops with B-factors smaller than 40, except for the loop of ser-proteinase, where all the coordinates are given but seven outer atoms of side chains have zero electron density. For all these loops the templates have been defined with a radius of 9 Å.

More details about the optimization procedure

TINKER assigns the hydrogen atoms to the PDB structure by a prescription that does not optimize their positions with respect to the energy; therefore, in paper I it was found necessary to optimize the orientations of the OH and NH vectors of NOS and the template. This is carried out by a Monte Carlo minimization procedure, where the polar vectors are rotated by LTD while each non-rotatable atom is restrained to its NOS position by a harmonic potential of 0.15–0.40 kcal/mol/Å² (see Appendix C of paper I). These optimizations of the polar hydrogen networks [using EFF(ɛ=10)], carried out in paper II46 and here, lead to NOS structures that deviate by RMSD ~0.2 Å from the PDB loop structures; these structures, denoted NOS1 (to be distinguished from NOS2 defined later), are considered to be the correct (experimental) ones against which the RMSD of structures is calculated.

As for the ASPs, in the GB/SA optimizations the charges of Arg, Lys, His, Asp, and Glu, and the end groups of the protein are neutralized to decrease the effect of the electrostatic interactions (see details in paper II); notice, however, that these interactions are still significant due to large dipole moments. Also, for all the loops we carry out LTD runs based on Still’s original (standard) parameters with neutralized as well as charged Arg, Lys, His, Asp, and Glu.

The optimization of the parameters is based on a multi-stage search for low energy minimized structures carried out with LTD, as described in detail in Appendix B of paper I. In short, for each loop the first stage is a conformational search run of ~3000 energy minimizations based on Still’s original set of parameters (denoted P1–P5). From this sample we define a subgroup of 500–800 significantly different structures (according to the variance criterion that at least one dihedral angle differs by 60° or more) with minimized energies within a ~7 kcal/mol range above the GEM (assumed here to correspond to the lowest minimized energy structure generated). NOS1 is added to this group as well. At this stage the parameter P1 is optimized (P2–P5 are kept fixed) by changing its value and minimizing the energy of the above group of structures to find the value (P1′) that leads to the smallest energy gap between GEM and the minimized NOS1 (eq 2). P1′ is a temporary optimized value which is kept constant when P2 is optimized in the same way. However, the subgroup of structures might not remain of low energy for the set P1′, P2′, P3–P5. Therefore, a new LTD run based on the latter values is performed and a new subgroup is determined, which is used in the optimization of P3, etc. After optimizing P5 a new round of optimizations based on P1′–P5′ is started until convergence of the parameter values is attained. The entire optimization typically requires 20,000–30,000 LTD minimizations.
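Two ingredients of the procedure above lend themselves to a compact sketch: the selection of a subgroup of significantly different low energy structures, and the one-parameter-at-a-time search repeated until the parameter values stop changing. The Python code below is only a schematic outline under stated assumptions. The greedy selection order, the candidate grids, and the `gap_fn` callable (which stands in for re-minimizing the subgroup, rerunning LTD, and evaluating eq 2) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Structure = Tuple[float, Sequence[float]]  # (minimized energy, dihedral angles in degrees)


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, accounting for periodicity."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def differs(d1: Sequence[float], d2: Sequence[float], threshold: float = 60.0) -> bool:
    """True if at least one dihedral angle differs by `threshold` degrees or more."""
    return any(angle_diff(a, b) >= threshold for a, b in zip(d1, d2))


def select_subgroup(structures: Sequence[Structure], window: float = 7.0,
                    threshold: float = 60.0) -> List[Structure]:
    """Greedy pick of mutually different structures within `window` kcal/mol of the minimum."""
    ordered = sorted(structures, key=lambda s: s[0])
    e_min = ordered[0][0]
    kept: List[Structure] = []
    for energy, dihedrals in ordered:
        if energy - e_min > window:
            break
        if all(differs(dihedrals, k[1], threshold) for k in kept):
            kept.append((energy, dihedrals))
    return kept


def coordinate_search(params: Dict[str, float], grids: Dict[str, Sequence[float]],
                      gap_fn: Callable[[Dict[str, float]], float],
                      max_rounds: int = 10, tol: float = 1e-6) -> Dict[str, float]:
    """Optimize one parameter at a time (others fixed) until the values converge."""
    params = dict(params)
    for _ in range(max_rounds):
        previous = dict(params)
        for name, candidates in grids.items():
            best = min(candidates, key=lambda v: gap_fn({**params, name: v}))
            params[name] = best
        if all(abs(params[k] - previous[k]) <= tol for k in params):
            break
    return params
```

In the real procedure each evaluation of the gap is expensive, because the subgroup must be re-minimized and periodically regenerated by a new LTD run; the sketch hides that cost behind `gap_fn` and shows only the control flow of the alternating search.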

After completing the optimization, an LTD run consisting of at least ~3000 minimized structures (with the optimal set of parameters) is carried out (in some cases longer runs up to 9000 structures were generated). These simulations always start from NOS, which is not a limitation as discussed earlier. The computer time required for the two components of the optimization procedure (i.e., LTD and minimizations of the partial group) depends on the size of the loop and the template. For example, an LTD run of 3000 minimizations of the (shortest) loop of acidic FGF (5 residues) and loop 2 of the antibody (8 residues and a large template) require ~70 and ~354 h CPU on an AMD Athlon 2.6 GHz processor, respectively. It should be pointed out that NOS1 undergoes further optimization during this procedure which might lead to a conformational change; this optimized NOS1 is denoted NOS2. Thus, NOS2 is used in the calculation of the final energy gaps, while the RMSD is calculated with respect to NOS1. It is important to verify that NOS2 does not differ significantly from NOS1.

Results and discussion

Optimization of the GB/SA parameters and the energy gaps of the training group

GB/SA is expected to model the electrostatic interactions better than eq 1; therefore, it was not clear a priori whether in the GB/SA parameter optimization the charges of Arg, Lys, His, Asp, and Glu should be neutralized as in paper II. To answer this question we first applied Still’s standard parameters (P1–P5 and σ) with charged and neutralized residues to the training group, i.e., for each loop we carried out an LTD run of ~3000 minimizations [using Etot (eqs 3–5), where EFF is defined by the all-atom AMBER force field]. The corresponding energy gaps appear in Table 2 under “Still’s set”, where for each loop the results in the upper and lower rows are for the neutralized and charged residues, respectively. The table shows that overall the two sets of results are comparable, with average gaps that are equal within the statistical errors. However, because for five out of the nine loops the neutralized set exhibits the lower energy gap, we decided to optimize the GB/SA parameters with neutralized charges on the loop and template. Notice also that according to our criterion both sets of gaps are too large, as they exceed the 3 kcal/mol value, except for peptidase (neutralized). However, overall Still’s results should be considered better than those obtained in paper II for EFF(ɛ=2r) (eq 1), which are provided as well. The EFF gaps from paper II are smaller than Still’s neutralized and charged gaps only for three and two loops, respectively. Again, the average gap value obtained by EFF(ɛ=2r) does not provide a reliable measure of performance (even though it is slightly larger than those of Still’s set) because of its large error bars, which reflect the strong scatter of the individual results. For most loops the RMSD between NOS1 and NOS2 is small (less than 0.5 Å), except for proteinase and AK, where the RMSD is 1.6 and 0.98 Å, respectively (for both the charged and neutralized loops). Therefore, the results for these loops should be evaluated with caution.

Table 2.

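Optimized GB/SA parameters for the training group of loops and the corresponding energy gaps^a

P1–P5 and σ are the GB/SA parameters; all other columns are energy gaps (kcal/mol).

| Protein/Loop | P1 | P2 | P3 | P4 | P5 | σ | Still’s set^b | P1–P5^c | P1–P5, σ^d | Best-fit Still | EFF(ɛ=2r), eq 1 | Optimal ASPs, eq 1 | Best-fit ASPs, eq 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Still’s set | 0.073 | 0.921 | 6.211 | 15.236 | 1.254 | 0.0049 | | | | | | | |
| Best-fit set | −0.08 | 0.02 | 5.30 | 13.90 | 1.10 | 0.003 | | | | | | | |
| RNase A, Loop 3, 89–97 (9) | 0.05 | −0.10 | 0.50 | 15.40 | 0.90 | 0.003 | 7.3 | 2.9 | 2.8 | 3.7 | 5.5 | 1.8 | 1.9 |
| (charged) | | | | | | | 4.4 | | | | | | |
| RNase A, Loop 1, 64–71 (8) | −0.10 | 0.01 | 8.20 | 15.236 | 1.00 | 0.001 | 5.1 | 1.7 | 1.4 | 1.8 | 0.6 | | 0.4 |
| (charged) | | | | | | | 7.1 | | | | | | |
| Acidic FGF, 90–94 (5) | 0.001 | 0.02 | 0.10 | 10.00 | 1.70 | 0.003 | 12.7 | 7.4 | 6.7 | 8.1 | 12.3 | 4.6 | 8.5 |
| (charged) | | | | | | | 4.9 | | | | | | |
| Adenylate kinase (AK), 73–80 (8) | −0.01 | 0.01 | −14.00 | −5.00 | 0.75 | 0.0045 | 11.1 | 7.4 | 7.2 | 9.2 | 11.8 | 6.0 | 13.5 |
| (charged) | | | | | | | 14.3 | | | | | | |
| Peptidase, 205–213 (9) | 0.005 | 0.005 | 0.50 | 2.00 | 1.50 | −0.025 | 3.0 | 1.2 | 0.7 | 1.3 | 4.1 | 0.5 | 6.0 |
| (charged) | | | | | | | 4.6 | | | | | | |
| Antibody, Loop 1, L26–37 (12) | −0.20 | 0.07 | 13.00 | 14.00 | 1.05 | −0.001 | 13.0 | 8.8 | 4.3 | 8.4 | 14.6 | 4.9 | 4.8 |
| (charged) | | | | | | | 10.9 | | | | | | |
| Antibody, Loop 2, H102–109 (8) | −0.22 | 0.921 | 6.211 | 17.00 | 0.10 | 0.011 | 5.4 | 1.1 | 0.8 | 1.7 | 5.5 | 1.6 | 1.9 |
| (charged) | | | | | | | 4.2 | | | | | | |
| Penicillopepsin, 129–137 (9) | −0.25 | −0.48 | 8.50 | 22.00 | 1.04 | 0.0005 | 6.1 | 3.0 | 1.6 | 2.6 | 10.3 | 1.8 | 4.1 |
| (charged) | | | | | | | 6.5 | | | | | | |
| Proteinase, 202–210 (9) | 0.001 | −3.00 | 5.00 | 15.236 | 1.00 | 0.0034 | 3.2 | 2.7 | 2.6 | 5.0 | 4.9 | 0.5 | 3.4 |
| (charged) | | | | | | | 3.4 | | | | | | |
| Averages | | | | | | | 7.4 ± 1.3 | 4.0 ± 1.0 | 3.1 ± 0.8 | 4.6 ± 3.2 | 7.7 ± 4.7 | 2.7 ± 0.7 | 4.9 ± 1.3 |
| (charged) | | | | | | | 6.7 ± 1.2 | | | | | | |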
a

Still’s parameters P1–P5 and σ are defined in eqs 3–8. Energy gaps between NOS2 and GEM were obtained by at least 3000 LTD minimizations. The energy gaps denoted EFF, optimal ASPs, and best-fit ASPs are taken from paper II. Energy gaps smaller than 3 kcal/mol are bold-faced. The errors in the averages are one standard deviation divided by n½ where n=9. For the optimal ASPs the result for loop 1 of RNase A (from paper II) is unavailable.

b

Energy gaps obtained with Still’s standard set of parameters, where the charge of Arg, Lys, Asp, Glu, and His is neutralized (upper row) and kept intact (lower row).

c

Energy gaps for optimized P1-P5.

d

Energy gaps for optimized P1-P5 and σ.
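As an illustration of how the averages and their errors in Table 2 are formed (one standard deviation divided by n½, footnote a), the following minimal Python sketch recomputes the entry for Still’s standard set with neutralized residues from the nine individual gaps. The use of the sample standard deviation (n−1 in the denominator) is an assumption; the table does not state which estimator was used, but this choice reproduces the tabulated value.

```python
import math
import statistics


def mean_and_error(values):
    """Return the mean and the standard deviation divided by sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    error = statistics.stdev(values) / math.sqrt(n)
    return mean, error


# Energy gaps (kcal/mol) for Still's standard set, neutralized residues,
# read from the upper rows of Table 2 for the nine training loops.
still_neutralized = [7.3, 5.1, 12.7, 11.1, 3.0, 13.0, 5.4, 6.1, 3.2]

mean, error = mean_and_error(still_neutralized)
print(f"{mean:.1f} ± {error:.1f} kcal/mol")  # prints "7.4 ± 1.3 kcal/mol"
```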

The table reveals that the optimized P1–P5 for the individual loops lead to a significant decrease in the energy gaps as compared to those obtained with Still’s standard parameters and neutralized charge, and that when both P1–P5 and σ are optimized these values decrease further. Thus, for six of the loops, the gaps (bold-faced in the table) are smaller than 3 kcal/mol; correspondingly, the average gaps decrease significantly. However, for AK and proteinase the RMSD values between NOS1 and NOS2 are relatively large, 1.3 and 1.6 Å, respectively (for both optimal sets). The energy gaps obtained with the optimized P1–P5 and with the optimized P1–P5 plus σ are comparable to those obtained with the optimized ASPs in paper II (see Table 2), which is also reflected by the average gap values. To reduce the gaps further, we attempted for several loops to optimize the parameter k (=4) of eq 3 and the parameters ɛin and ɛw of eq 4; however, we could not find parameter values that would lead to lower gaps.

The individual sets of optimized P1- P5 and σ that appear in Table 2 constitute the basis for calculating the best-fit (bf) set. While no definite prescription exists for such a derivation, a guiding principle would be to average the individual values, excluding parameters that deviate strongly from the others or reducing their absolute values. Thus, the best-fit P1 and P5 are exact and approximate averages over all the nine individual values, respectively. Best-fit P3 and P4 are averages over the individual values of eight loops, ignoring the strongly deviating values, −14.0 and −5.0 of AK, respectively. In the averages defining best-fit P2 and σ the moderately deviating values, −3.0 of proteinase and −0.025 of peptidase were increased to −0.26 and −0.0002, respectively. Overall, the bf parameters are systematically lower than the corresponding Still’s original values, where smaller P1 leads to smaller αi while smaller P2 –P4 lead to larger αi (eq 7).
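The averaging recipe described above can be checked numerically. The following Python sketch applies it to the individual optimized parameters listed in Table 2; the helper names are hypothetical, and the rounding to the published best-fit values is assumed to have been done by hand in the original work.

```python
from statistics import mean

# Loop order as in Table 2.
LOOPS = ["RNaseA-L3", "RNaseA-L1", "FGF", "AK", "Peptidase",
         "Ab-L1", "Ab-L2", "Penicillopepsin", "Proteinase"]

# Individually optimized parameters (Table 2), one value per loop.
individual = {
    "P1": [0.05, -0.10, 0.001, -0.01, 0.005, -0.20, -0.22, -0.25, 0.001],
    "P2": [-0.10, 0.01, 0.02, 0.01, 0.005, 0.07, 0.921, -0.48, -3.00],
    "P3": [0.50, 8.20, 0.10, -14.00, 0.50, 13.00, 6.211, 8.50, 5.00],
    "P4": [15.40, 15.236, 10.00, -5.00, 2.00, 14.00, 17.00, 22.00, 15.236],
    "P5": [0.90, 1.00, 1.70, 0.75, 1.50, 1.05, 0.10, 1.04, 1.00],
    "sigma": [0.003, 0.001, 0.003, 0.0045, -0.025, -0.001, 0.011, 0.0005, 0.0034],
}


def average_excluding(values, loop):
    """Average over all loops except the named one."""
    skip = LOOPS.index(loop)
    return mean([v for i, v in enumerate(values) if i != skip])


def average_with_replacement(values, loop, new_value):
    """Average after replacing the value of one loop by a reduced value."""
    adjusted = list(values)
    adjusted[LOOPS.index(loop)] = new_value
    return mean(adjusted)


best_fit = {
    "P1": mean(individual["P1"]),                                  # all nine loops
    "P2": average_with_replacement(individual["P2"], "Proteinase", -0.26),
    "P3": average_excluding(individual["P3"], "AK"),               # drop -14.0
    "P4": average_excluding(individual["P4"], "AK"),               # drop -5.0
    "P5": mean(individual["P5"]),                                  # approximate
    "sigma": average_with_replacement(individual["sigma"], "Peptidase", -0.0002),
}

for name, value in best_fit.items():
    print(f"{name}: {value:.4f}")
```

The computed averages fall close to the best-fit set of Table 2 (−0.08, 0.02, 5.30, 13.90, 1.10, 0.003); the largest relative difference is for P5, which the text describes as an approximate average.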

It should be noted that for the bf parameters the RMSD values between NOS1 and NOS2 do not exceed 0.85 Å (the value obtained for loop 1 of the antibody). The table shows that the energy gaps obtained with the Still(bf) parameters are significantly better (lower) than the corresponding values based on Still’s standard set for both neutralized and charged residues. There are two exceptions: proteinase, where the bf and standard (neutralized) values are 5.0 vs. 3.2 kcal/mol, and acidic FGF, where the bf and standard (charged) values are 8.1 vs. 4.9 kcal/mol. One must note, however, that the reliability of the results obtained for proteinase with Still’s standard parameters is somewhat questionable due to the large RMSD between NOS1 and NOS2 mentioned above. Also, the energy gaps for Still(bf) are slightly better than those obtained with ASPs(bf), where four and three gaps are smaller than 3 kcal/mol, respectively (the average gaps are comparable).

RMSD for the training group

The RMSD between the GEM structure and NOS1 is calculated with respect to the heavy atoms and without superposition on NOS1 (the same applies to RMSD between NOS1 and NOS2 discussed earlier). An accepted criterion for a successful prediction of the loop backbone (BB) structure is that the RMSD from the correct structure is not larger than 1 Å;34,35 notice, however, that RMSD values smaller than 0.4 Å are actually insignificant because the two structures belong to the same microstate.
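A minimal Python sketch of an RMSD computed without superposition is given here for concreteness. It assumes that both structures are already expressed in the same coordinate frame (plausible here because the template is held fixed) and that the heavy atoms are listed in the same order; the function names and the toy coordinates are hypothetical.

```python
import math


def rmsd_no_superposition(coords_a, coords_b):
    """RMSD (in the units of the coordinates) between two atom lists.

    No rotation or translation is applied: the coordinates are compared
    as they are, atom by atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def classify_backbone_rmsd(rmsd):
    """Apply the two thresholds quoted in the text (1 Å and 0.4 Å)."""
    if rmsd < 0.4:
        return "same microstate (difference insignificant)"
    if rmsd <= 1.0:
        return "successful prediction"
    return "unsuccessful prediction"


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]

value = rmsd_no_superposition(reference, predicted)
print(value, classify_backbone_rmsd(value))
```

Because no superposition is applied, a rigid displacement of the whole loop relative to the template contributes fully to the RMSD, which makes this measure stricter than a superposed RMSD.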

RMSD results (between NOS1 and GEM) for the training set of loops are summarized in Table 3, which is structured similarly to Table 2. In particular, two sets of results are presented in the column “Still’s set”, where for each loop the first and second rows contain the results obtained with neutralized and charged residues, respectively. The RMSD values are given for the backbone (BB), the side chains (SC), and the total loop (TOT). The general observation is that for all methods and optimizations the BB results are quite satisfactory. Thus, for each of Still’s standard sets (i.e., charged and neutralized), only three RMSD values (bold-faced in the table) are larger than 1 Å, and they do not exceed 1.4 Å. The same tendency with minor changes characterizes all of Still’s results, where the largest RMSD(BB) values occur for proteinase, with 1.4 Å for all approximations, and for loop 1 of the antibody and AK, with maximal values of 1.8 Å (P1–P5) and 2.8 Å (bf), respectively. It is evident that the Still(bf) results are slightly inferior to the other sets of Still(BB) values, but they are comparable to the results based on the force field alone [EFF(ɛ=2r)], where four deviations larger than 1 Å also occur. On the other hand, the RMSD(BB) values for the optimized ASPs and ASPs(bf) are all within the range of 1 Å and thus are better than any of Still’s sets; these trends are also reflected by the averages of the optimized ASPs and ASPs(bf), which are slightly lower than the other averages.

Table 3.

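Results for the RMSD between NOS1 and the GEM structure for the training group of loops^a

All values are RMSD in Å.

| Protein/loop | Still’s set^b BB | Still’s set SC | Still’s set TOT | P1–P5 BB | P1–P5 SC | P1–P5 TOT | P1–P5, σ BB | P1–P5, σ SC | P1–P5, σ TOT | Best-fit Still BB | Best-fit Still SC | Best-fit Still TOT | EFF(ɛ=2r), eq 1 BB | EFF SC | EFF TOT | Optimized ASPs, eq 1 BB | Optimized ASPs SC | Optimized ASPs TOT | Best-fit ASPs, eq 1 BB | Best-fit ASPs SC | Best-fit ASPs TOT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RNase A, Loop 3, 89–97 (9) | 0.5 | 0.5 | 0.5 | 0.6 | 1.7 | 1.3 | 0.5 | 1.3 | 1.0 | 0.8 | 2.1 | 1.6 | 0.5 | 1.5 | 1.1 | 0.3 | 1.2 | 0.9 | 0.2 | 1.4 | 1.0 |
| (charged) | 0.6 | 2.0 | 1.4 | | | | | | | | | | | | | | | | | | |
| RNase A, Loop 1, 64–71 (8) | 0.7 | 1.9 | 1.4 | 0.5 | 2.1 | 1.4 | 0.5 | 2.2 | 1.5 | 0.5 | 2.0 | 1.3 | 0.4 | | 0.9 | | | | 0.4 | | 0.9 |
| (charged) | 1.4 | 3.1 | 2.3 | | | | | | | | | | | | | | | | | | |
| Acidic FGF, 90–94 (5) | 0.6 | 2.0 | 1.5 | 0.7 | 1.8 | 1.4 | 0.4 | 1.8 | 1.4 | 1.1 | 2.7 | 2.2 | 0.6 | 3.1 | 2.4 | 0.2 | 1.5 | 1.2 | 0.5 | 1.7 | 1.3 |
| (charged) | 0.9 | 2.4 | 1.9 | | | | | | | | | | | | | | | | | | |
| Adenylate kinase (AK), 73–80 (8) | 1.2 | 4.1 | 2.9 | 1.1 | 3.5 | 2.5 | 1.1 | 3.8 | 2.7 | 2.8 | 7.2 | 5.3 | 1.1 | 3.5 | 2.5 | 1.0 | 3.2 | 2.3 | 0.9 | 2.9 | 2.1 |
| (charged) | 0.5 | 4.0 | 2.8 | | | | | | | | | | | | | | | | | | |
| Peptidase, 205–213 (9) | 0.1 | 1.0 | 0.7 | 0.1 | 1.5 | 1.1 | 0.4 | 1.7 | 1.3 | 0.1 | 1.3 | 0.9 | 0.8 | 1.3 | 1.0 | 0.2 | 0.9 | 0.6 | 0.2 | 1.2 | 0.9 |
| (charged) | 0.2 | 1.1 | 0.8 | | | | | | | | | | | | | | | | | | |
| Antibody, Loop 1, L26–37 (12) | 1.1 | 3.3 | 2.4 | 1.8 | 2.9 | 2.4 | 1.3 | 2.7 | 2.1 | 1.6 | 4.8 | 3.5 | 1.1 | 2.3 | 1.8 | 1.0 | 1.7 | 1.4 | 0.9 | 2.2 | 1.7 |
| (charged) | 1.0 | 1.5 | 1.2 | | | | | | | | | | | | | | | | | | |
| Antibody, Loop 2, H102–109 (8) | 1.0 | 3.2 | 2.5 | 0.3 | 0.6 | 0.5 | 0.4 | 1.4 | 1.1 | 1.0 | 3.1 | 2.5 | 1.2 | 0.9 | 1.0 | 0.7 | 0.7 | 0.7 | 0.6 | 0.9 | 0.8 |
| (charged) | 1.1 | 1.3 | 1.2 | | | | | | | | | | | | | | | | | | |
| Penicillopepsin, 129–137 (9) | 0.4 | 1.9 | 1.3 | 0.2 | 0.8 | 0.6 | 0.2 | 1.1 | 0.8 | 0.3 | 0.8 | 0.6 | 0.4 | 1.9 | 1.4 | 0.1 | 1.2 | 0.9 | 0.1 | 1.5 | 1.0 |
| (charged) | 0.5 | 1.9 | 1.4 | | | | | | | | | | | | | | | | | | |
| Proteinase, 202–210 (9) | 1.4 | 1.7 | 1.5 | 1.4 | 1.7 | 1.5 | 1.4 | 1.7 | 1.5 | 1.4 | 1.7 | 1.5 | 1.9 | 3.1 | 2.4 | 0.3 | 1.2 | 0.7 | 0.4 | 1.3 | 0.8 |
| (charged) | 1.4 | 1.7 | 1.5 | | | | | | | | | | | | | | | | | | |
| Averages | 0.8 | 2.2 | 1.6 | 0.7 | 1.8 | 1.4 | 0.7 | 2.0 | 1.5 | 1.1 | 2.9 | 2.2 | 0.9 | 2.2 | 1.6 | 0.5 | 1.5 | 1.1 | 0.5 | 1.6 | 1.2 |
| (charged) | 0.8 | 2.1 | 1.6 | | | | | | | | | | | | | | | | | | |
| SD/n½ | 0.1 | 0.4 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.3 | 0.2 | 0.3 | 0.7 | 0.5 | 0.2 | 0.3 | 0.2 | 0.1 | 0.3 | 0.2 | 0.1 | 0.2 | 0.2 |
| (charged) | 0.1 | 0.3 | 0.2 | | | | | | | | | | | | | | | | | | |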
a

BB, SC, and TOT denote RMSD results for the backbone, side chains, and the total loop, respectively. The different columns are defined in the caption of Table 2. Backbone RMSD values larger than 1 Å are bold-faced. The corresponding errors in the averages appear at the bottom; SD is the standard deviation and n=9. Some of the results for loop 1 of RNase A from paper II are unavailable.

b

RMSD results obtained with Still’s standard set of parameters, where the charge of Arg, Lys, Asp, Glu, and His is neutralized (upper row) and kept intact (lower row).

Most of the RMSD(SC) results are larger than 1 Å, and for standard Still the charged and neutral results are almost comparable (for four out of seven loops the neutral RMSD(SC) results are smaller than the charged values while the averages are actually identical). The RMSD(SC) results for the optimized P1-P5 and optimized P1- P5 plus σ are comparable and are slightly better (for five out of eight loops) than the standard Still values (neutral and charged). As is shown clearly in the table, Still(bf) results for RMSD(SC) are inferior to those of the other Still’s approximations and even to those obtained by the force field [EFF(ɛ=2r)]; this is also reflected by the relatively high average of 2.9 Å for Still(bf). The best results are obtained for the optimized ASPs and ASPs(bf), where the average RMSD(SC) values are 1.5 and 1.6 Å, respectively; however, notice that within the error bars these values are equal to those obtained for Still’s set, with optimized P1-P5, and optimized P1- P5 plus σ.

Energy gaps for the test group

The energy gaps obtained by various methods for a test group of seven loops are summarized in Table 4. As in Tables 2 and 3, for each loop the results presented in the upper and lower rows of the second column were calculated with Still’s standard parameters with neutralized and charged amino acids, respectively; we start by discussing these results. It should first be noted that the R-values of the last four loops are relatively small (1.3–2.7; see Table 1), suggesting that these loops are only moderately flexible. This is probably reflected in the comparable energy gaps obtained for each pair, even though the loops of RNase H and the antibody contain a relatively large number of charged amino acid residues, i.e., 3 and 2, respectively (as pointed out earlier, even after charge neutralization these residues still have significant dipole moments). Notice also that for the last loop (of the antibacterial protein) the gap obtained with standard Still (neutralized) is zero, meaning that the GEM structure coincides with NOS2, whereas for Still (charged) this gap is small, 1.2 kcal/mol. All these results are reliable in the sense that for each loop the RMSD between NOS1 and NOS2 does not exceed 0.56 Å, the value obtained for RNase H.

Table 4.

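Energy gaps for the test group of loops^a

All values are energy gaps in kcal/mol.

| Protein, Loop | Standard Still^b | Best-fit Still | EFF(ɛ=2r), eq 1 | Best-fit ASPs, eq 1 |
|---|---|---|---|---|
| Ser-proteinase, 143–151 (9) | 17.6 | 13.7 | 6.9 | 3.9 |
| (charged) | 12.0 | | | |
| Proteinase, 188–196 (9) | 7.4 | 9.3 | 10.0 | 4.7 |
| (charged) | 1.8 | | | |
| Proteinase, 128–137 (10) | 25.4 | 13.4 | 14.8 | 3.3 |
| (charged) | 10.6 | | | |
| Peptidase, 244–250 (7) | 9.9 | 6.1 | 9.0 | 3.4 |
| (charged) | 9.7 | | | |
| RNase H, 57–63 (7) | 11.9 | 7.5 | 14.0 | 9.4 |
| (charged) | 11.7 | | | |
| Antibody, 56L–62L (7) | 8.0 | 6.6 | 9.8 | 8.8 |
| (charged) | 5.6 | | | |
| Antibacterial protein, 25–30 (6) | 0.0 | 0.0 | 0.5 | 1.7 |
| (charged) | 1.2 | | | |
| Averages | 11.5 ± 3.1 | 8.1 ± 1.8 | 9.3 ± 1.8 | 5.0 ± 1.1 |
| (charged) | 7.5 ± 1.7 | | | |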
a

Energy gaps between NOS2 and the GEM were obtained by at least 3000 LTD minimizations. The energy gaps denoted EFF, and best-fit ASPs are taken from paper II. The errors in the averages are one standard deviation divided by n½ where n=7.

b

Energy gaps obtained with Still’s standard set of parameters, where the charge of Arg, Lys, Asp, Glu, and His is neutralized (upper row) and kept intact (lower row).

C

Energy gaps for optimized P1-P5.

The first three loops in Table 4 are the longest (9, 9, and 10 residues), are characterized by relatively large R-values (4.9, 4.5, and 4.3; see Table 1), and contain one, two, and three charged residues, respectively. The energy gaps obtained for these loops with Still’s standard parameters and charged residues are always significantly smaller than those obtained with the neutralized charge. While such large differences are not unexpected for these potentially flexible loops, part of these results might not be reliable due to large RMSD values between NOS1 and NOS2. For ser-proteinase these RMSD values are small, 0.27 and 0.23 Å for the neutralized and charged residues, respectively; however, they are large for loop 188–196 of proteinase (1.51 and 0.86 Å, respectively) and very large (2.39 Å) for loop 128–137 of proteinase (charged). In this respect, Still’s bf gaps are more reliable because the RMSD values between NOS1 and NOS2 are smaller than 0.66 Å. As expected, the bf energy gaps are smaller than their counterparts obtained with Still’s standard parameters and neutralized charges, except for loop 188–196 of proteinase, where the reliability of the 7.4 kcal/mol obtained with Still’s standard parameters is questionable, as discussed above.

The gaps obtained with Still’s best-fit parameters are also smaller than those obtained by the force field alone [EFF(ɛ=2r)] in paper II, which are also presented in the table; the only exception occurs for ser-proteinase. On the other hand, for the first four loops the gaps obtained with ASPs(bf) in paper II are significantly smaller than those obtained with Still(bf), while for the last three loops the Still(bf) gaps are slightly smaller. This again demonstrates that the simplified model (eq 1) is better than the more sophisticated GB/SA model. This is also demonstrated by the average value for ASPs(bf), 5.0 ± 1.1 kcal/mol, which is significantly smaller than most of the other averages in the table; it is equal (within the error bars) only to the 7.5 ± 1.7 kcal/mol obtained for Still’s standard parameters (charged).

RMSD for the test group

RMSD results for the test group appear in Table 5, and as for the training group, we discuss them first for the backbone [RMSD(BB)]. For Still’s standard parameters most of the RMSD values are smaller than 1 Å, except for the 1.5 Å obtained for loop 128–137 of proteinase (charged residues). A relatively large value, 1.5 Å, is also obtained for RNase H (neutralized); this value decreases to 0.8 Å for Still(bf), while the other RMSD(BB) results remain the same for Still(standard) and Still(bf). The RMSD(BB) values for the force field alone [EFF(ɛ=2r)] are larger than those of Still(bf) for ser-proteinase (2.1 vs. 0.2 Å) and for loop 128–137 of proteinase (1.3 vs. 1.1 Å); for the rest of the loops the force field results are predominantly the lowest and they are smaller than 1 Å. However, the lowest set of RMSD(BB) values is again that of ASPs(bf), where all values are smaller than 1 Å. Nevertheless, all the averages are below 1 Å and they are equal within the error bars.

Table 5.

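Results for the RMSD between NOS1 and the GEM structure for the test group of loops^a

All values are RMSD in Å.

| Protein/loop | Standard Still^b BB | Standard Still SC | Standard Still TOT | Best-fit Still BB | Best-fit Still SC | Best-fit Still TOT | EFF(ɛ=2r), eq 1 BB | EFF TOT | Best-fit ASPs, eq 1 BB | Best-fit ASPs TOT |
|---|---|---|---|---|---|---|---|---|---|---|
| Ser-proteinase, 143–151 (9) | 0.2 | 1.0 | 0.7 | 0.2 | 1.7 | 1.1 | 2.1 | 2.4 | 0.6 | 0.6 |
| (charged) | 0.2 | 0.9 | 0.6 | | | | | | | |
| Proteinase, 188–196 (9) | 0.7 | 2.3 | 1.7 | 0.7 | 2.3 | 1.8 | 0.3 | 1.5 | 0.2 | 0.9 |
| (charged) | 0.3 | 1.9 | 1.4 | | | | | | | |
| Proteinase, 128–137 (10) | 1.1 | 2.8 | 2.1 | 1.1 | 4.8 | 3.4 | 1.3 | 2.3 | 0.8 | 1.0 |
| (charged) | 1.5 | 2.9 | 2.4 | | | | | | | |
| Peptidase, 244–250 (7) | 0.7 | 1.5 | 1.2 | 0.7 | 1.6 | 1.3 | 0.7 | 1.3 | 0.6 | 2.2 |
| (charged) | 0.8 | 2.0 | 1.5 | | | | | | | |
| RNase H, 57–63 (7) | 1.5 | 3.7 | 2.8 | 0.8 | 2.8 | 2.1 | 0.2 | 1.5 | 0.2 | 1.9 |
| (charged) | 0.9 | 1.9 | 1.5 | | | | | | | |
| Antibody, 56L–62L (7) | 0.8 | 1.4 | 1.1 | 0.8 | 1.3 | 1.0 | 0.1 | 0.7 | 0.7 | 1.1 |
| (charged) | 0.8 | 0.9 | 0.8 | | | | | | | |
| Antibacterial protein, 25–30 (6) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.6 | 0.1 | 0.7 |
| (charged) | 0.2 | 0.2 | 0.2 | | | | | | | |
| Averages | 0.7 | 1.8 | 1.4 | 0.6 | 2.1 | 1.5 | 0.7 | 1.5 | 0.5 | 1.2 |
| (charged) | 0.7 | 1.5 | 1.2 | | | | | | | |
| SD/n½ | 0.2 | 0.5 | 0.4 | 0.1 | 0.6 | 0.4 | 0.3 | 0.3 | 0.1 | 0.2 |
| (charged) | 0.2 | 0.3 | 0.3 | | | | | | | |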
a

BB, SC, and TOT denote RMSD results for the backbone, side chains, and the total loop, respectively. The different columns are defined in the caption of Table 2. Backbone RMSD values larger than 1 Å are bold-faced. The corresponding errors in the averages appear in the bottom; SD is the standard deviation and n=7. In paper II results for SC are not provided.

b

RMSD results obtained with Still’s standard set of parameters, where the charge of Arg, Lys, Asp, Glu, and His is neutralized (upper row) and kept intact (lower row).

The RMSD(SC) results obtained for Still’s standard parameters, as expected, are larger than the corresponding RMSD(BB) values and in most cases are larger than 1 Å. However, these values (for the neutral residues) are not worse (and in three cases they are actually better) than the corresponding values obtained for Still(bf); the same applies to the total RMSD values [RMSD(TOT)]. In paper II results were presented for RMSD(TOT) but not for RMSD(SC), which therefore do not appear in Table 5. The table reveals that in five out of seven cases the RMSD(TOT) values obtained with the force field alone or with ASPs(bf) are equal or smaller (better) than those of Still(bf). For Still(bf) the largest RMSD(TOT) is 3.4 Å (proteinase, 128–137), where the largest values obtained with the force field and ASPs(bf) are smaller, 2.4 Å (ser-proteinase), and 2.2 Å (peptidase), respectively. Notice that for five loops the ASPs(bf) TOT values are not larger than 1.1 Å! The averages of RMSD(TOT) follow the above trends but statistically they are all equal.

Overall evaluation of the different models

The above discussion of results already demonstrates some advantage of eq 1 over the GB/SA model. To evaluate these models further, we present in Table 6 averages calculated over the entire group of 16 loops for the energy gaps and the RMSD values, as well as their standard deviations (divided by 16½ = 4). As expected, among the three Still models the lowest average energy gap (6.15 kcal/mol) is obtained with the bf parameters; this value is significantly smaller (i.e., beyond the statistical errors) than the 9.75 kcal/mol obtained with Still’s original parameters and neutralized residues and the 8.4 kcal/mol obtained with the force field alone [EFF(ɛ=2r), eq 1]. However, 6.15 is equal within the statistical errors to the slightly larger gap, 7.06 kcal/mol, obtained with Still’s original parameters and charged residues. The lowest gap, 5.0 kcal/mol (with the lowest statistical error), is observed for ASPs(bf); however, within the error bars this value should be considered equal to 6.15. Correspondingly, the backbone RMSD of ASPs(bf), 0.46 Å, is significantly lower than the values obtained with the other models, where the latter results are equal within the error bars. Also, the RMSD(TOT) result for ASPs(bf), 1.18 Å, is the lowest; however, its error bar overlaps those of Still(standard).
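The phrase "equal within the statistical errors" can be made concrete with a small Python sketch. The criterion used here, that two averages are treated as equal when their difference does not exceed the sum of their error bars, is an assumption chosen because it reproduces the comparisons quoted in the text; the original work does not spell out the exact test. The numbers are the average energy gaps of Table 6.

```python
def equal_within_errors(a, err_a, b, err_b):
    """True if the error bars of a ± err_a and b ± err_b overlap.

    Assumed criterion: |a - b| <= err_a + err_b.
    """
    return abs(a - b) <= err_a + err_b


# Average energy gaps (kcal/mol) and their errors for the 16 loops (Table 6).
gaps = {
    "Standard Still (neutralized)": (9.75, 1.73),
    "Standard Still (charged)": (7.06, 1.08),
    "Best-fit Still": (6.15, 1.04),
    "EFF(eps=2r)": (8.40, 1.15),
    "Best-fit ASPs": (5.00, 0.87),
}

reference = "Best-fit Still"
ref_value, ref_error = gaps[reference]
for name, (value, error) in gaps.items():
    if name == reference:
        continue
    verdict = "equal within errors" if equal_within_errors(
        ref_value, ref_error, value, error) else "significantly different"
    print(f"{reference} vs {name}: {verdict}")
```

With this criterion the best-fit Still gap differs significantly from the neutralized standard set and from EFF(ɛ=2r), and is equal within errors to the charged standard set and to the best-fit ASPs, in agreement with the statements above.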

Table 6.

Average energy gaps and RMSD values for the 16 loops of the training and test groups^a

| Quantity | Standard Still (neutralized)^b | Standard Still (charged)^c | Best-fit Still | EFF(ɛ=2r), eq 1 | Best-fit ASPs |
|---|---|---|---|---|---|
| Average energy gap (kcal/mol) | 9.75 ± 1.73 | 7.06 ± 1.08 | 6.15 ± 1.04 | 8.40 ± 1.15 | 5.00 ± 0.87 |
| Average RMSD (Å), BB^d | 0.75 ± 0.11 | 0.77 ± 0.11 | 0.87 ± 0.17 | 0.80 ± 0.15 | 0.46 ± 0.07 |
| Average RMSD (Å), SC | 2.02 ± 0.30 | 1.86 ± 0.23 | 2.51 ± 0.45 | | |
| Average RMSD (Å), TOT | 1.51 ± 0.21 | 1.43 ± 0.17 | 1.88 ± 0.33 | 1.55 ± 0.17 | 1.18 ± 0.13 |
a

The errors in the averages are one standard deviation divided by n½ where n=16.

b

Calculated with Still’s standard parameters with neutralized charge for Arg, Lys, Glu, Asp, and His.

c

Calculated with Still’s standard parameters with charged residues.

d

BB=backbone; SC=side chains; TOT=total.

Thus, while the advantage of eq 1 with ASPs(bf) over Still’s results is in most cases statistically significant, distinguishing between the performance of Still’s models would require results from a larger sample of loops. However, the trend shown in the table is that Still(bf) provides the lowest average energy gap (among Still’s models) while its RMSD values are somewhat inferior to those of the other models. In retrospect, the fact that comparable results were obtained for Still’s models is perhaps not surprising, because the standard and best-fit sets of parameters are in most cases not very different; the four best examples are P3, P4, P5, and σ, which are 6.211 vs. 5.30, 15.236 vs. 13.90, 1.254 vs. 1.10, and 0.0049 vs. 0.0030 for Still(standard) and Still(bf), respectively (see Table 2). This should be compared to the more drastic changes that occurred in the optimization of the ASPs in paper II, where the optimized (and bf) value of carbon (which is the most frequent atom) has been found to be negative (hydrophilic) versus its positive value (hydrophobic) in the sets of Wesson and Eisenberg,49 and Ooi et al.,77 for example. This may suggest that the original (standard) optimization of Still’s parameters against PB results for small molecules is reasonable, a fact that could not have been known a priori. However, to our surprise (and disappointment), our hope that GB/SA would provide better results than the theoretically inferior eq 1 has not materialized; the reason for this unexpected behavior remains unclear. Still, it is possible that other GB/SA versions would provide better results for loops than the present model.

Attempts to improve eq 1

In view of the above discussion, it would be of interest to check whether eq 1 can still be improved. As has already been pointed out and discussed in more detail in paper II, the dielectric function, ɛ=nr with n=2 used for optimizing the ASPs does not provide the necessary screening of the Coulombic interactions for a loop consisting of several charged residues (even if neutralized), while increasing the screening to ɛ=3r made eq 1 insensitive to conformational changes and thus did not allow optimization of the ASPs. To overcome this problem we decided to replace the ɛ=nr function by more complex dielectric functions and study their performance. The first function, used by Mehler and collaborators is,102,103

$$\varepsilon(r) = \frac{\varepsilon_w + 1}{1 + k\exp[-\lambda(\varepsilon_w - 1)r]} - 1 \qquad (10)$$

where ɛw=80, and k and λ are parameters to be optimized. The second function, proposed by Warshel is,104

$$\varepsilon(r) = \begin{cases} 16.55, & r < 3\ \text{Å} \\ 1 + 60\,[1 - \exp(-0.1r)], & r \ge 3\ \text{Å} \end{cases} \qquad (11)$$

where both functions have been implemented within TINKER. Eq 10 was applied to loop 3 of RNase A and the loop of acidic FGF, where both ɛ0 and λ, and the ASPs, were optimized. Eq 11 was applied to loop 3 of RNase A and the second loop of proteinase (of the test group). Here no adjustable parameters exist and thus only the ASPs were optimized. However, in both cases we could not obtain better energy gaps than those obtained with ɛ=2r.
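For orientation, the piecewise dielectric function of eq 11 (a constant 16.55 below 3 Å and 1 + 60[1 − exp(−0.1r)] at 3 Å and beyond) can be written as a short Python sketch and compared with ɛ=2r. This is not the TINKER implementation; the function names are hypothetical, and the conversion factor 332.06 kcal·Å/(mol·e²) is the commonly used Coulomb constant, assumed here for illustration.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2); assumed conversion factor


def warshel_dielectric(r):
    """Distance-dependent dielectric of eq 11 (r in Å)."""
    if r < 3.0:
        return 16.55
    return 1.0 + 60.0 * (1.0 - math.exp(-0.1 * r))


def linear_dielectric(r, n=2.0):
    """The simple eps = n*r function used with eq 1."""
    return n * r


def screened_coulomb(q_i, q_j, r, dielectric):
    """Coulomb energy (kcal/mol) of two point charges with screening eps(r)."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric(r) * r)


# The two branches of eq 11 meet (to within rounding) at r = 3 Å.
print(round(warshel_dielectric(3.0), 2))

# Compare the screening of a unit-charge pair at a few distances.
for r in (3.0, 5.0, 10.0):
    e_warshel = screened_coulomb(1.0, -1.0, r, warshel_dielectric)
    e_linear = screened_coulomb(1.0, -1.0, r, linear_dielectric)
    print(f"r = {r:4.1f} Å  eq 11: {e_warshel:7.3f}  eps=2r: {e_linear:7.3f}")
```

The function of eq 11 screens more strongly than ɛ=2r at short and intermediate distances and saturates at a value of 61 for large r, whereas ɛ=2r keeps growing linearly.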

Other recent studies of loops

Still’s GB/SA model with the AMBER force field has been applied recently to loops by de Bakker et al.52,53 who treated 385 loop targets (length 2 to 12) collected previously by Fiser et al.44 For each target a set of 1000 decoy structures was generated using the RAPPER and SCWRL search procedures for the backbone and side chains, respectively. The energies of these decoys were then minimized with the GB/SA/AMBER function and, for comparison, also with the AMBER force field alone (with ɛ=1), using the program TINKER. As in our studies, they found in general a better performance with GB/SA/AMBER than with AMBER alone. Later, an extensive study of loops was carried out by Jacobson et al.54 who used the Surface Generalized Born and a nonpolar solvation model (SGB-NP)56 with the latest version of the OPLS force field.102 They treated a full set of 788 target loops (length 4 to 12) and a filtered set of 514 loops, where for each loop 200–1400 decoys were generated by an elaborate conformational search procedure. Very recently Zhang et al.58 tested their knowledge-based statistical potential, DFIRE (distance-scaled, finite ideal gas reference state), by applying it to these three loop sets and comparing its performance to those of GB/SA/AMBER and SGB-NP/OPLS. From these results one can obtain some information about the relative performance of the above models.

Thus, in the section “Minimized” of Table S2 of the supplemental material provided by Zhang et al.58 the average RMSD results obtained by GB/SA/AMBER and DFIRE for different loop lengths are presented. Dividing the provided standard deviation values by n½, where n is the number of loops of a certain length studied, shows that only for three loop sizes (3, 4, and 6) the values of GB/SA/AMBER are smaller than those of DFIRE, while in all other cases the corresponding results are equal within the error bars. On the other hand, in the section “Full” of Table S4, OPLS/SGB-NP leads to smaller RMSD values than DFIRE for six loop lengths (from 4 to 9), whereas for the longer loops (10–12) the results are equal within the statistical errors. For the filtered set, OPLS/SGB-NP leads to the smallest RMSD values for five loop lengths (4 to 8), whereas for the longer loops (9–12) the results are equal within the statistical errors.

Thus, OPLS/SGB-NP performs better with respect to DFIRE than does AMBER/GB/SA, suggesting that OPLS/SGB-NP is the more reliable model among the two at least for loops. Clearly, this conclusion should be taken with some caution because the RAPPER set is smaller and different from Jacobson’s sets, and from our experience, the number of decoys used in these studies is insufficient. In our studies, for example, 3000–9000 conformations are generated for each loop in a search process (LTD) that directs the loop towards its GEM structure. Also, it is not clear what is the relative contribution of the force fields to the performance of these models. In paper I we have found AMBER to be better than OPLS for loops but the torsional potentials of OPLS have been recently improved105 and used in the OPLS/SGB-NP study.

This discussion is closely related to recent performance studies of GB/SA solvation models. It has been found that some combinations of force fields and GB/SA models are better than others and can lead to results that are close to those obtained in the experiment or by explicit solvation models. A well-studied example is the (capped) C-terminal polypeptide from the B1 domain of protein G, a 16-residue peptide that has been found experimentally to fold into a β-hairpin in aqueous solution.106–108 Folding simulations based on different explicit water models (TIP3P, SPC) and force fields have all found the β-hairpin state to be the most populated.109–112 On the other hand, simulations of Zhou and Berne,113 Zhou,112 and Levy’s group114 have shown that only a few of the implicit models studied predict the β-hairpin state to be the most stable.

Conclusions

All of the solvation models studied here [including EFF(ɛ = 2r)] are considerably better than using the force field with ɛ = 1 [EFF(ɛ = 1)], as has been discussed in papers I and II. Based on results for 16 loops, we have not found significant differences in performance among the three GB/SA models studied. All of them, however, have been shown to be somewhat inferior to eq 1, which itself is unsatisfactory, leading to energy gaps of ~5 kcal/mol that are too high. We have also drawn (indirect) conclusions about differences in the performance of DFIRE58 and the models of de Bakker et al.52,53 and Jacobson et al. However, these differences (based on the average behavior) are not very large either, and for certain individual loops they are reversed. It should be pointed out that for loops shorter than 8 residues the RMSD(BB) obtained by all these models is satisfactory.

Implicit solvation models are very convenient for studying loops due to their relative simplicity and the fact that they are amenable to efficient conformational search techniques. The question is whether they can be improved significantly further. In this context it should be emphasized again that most of the loop studies (excluding DFIRE) are based on minimized energy structures, where RMSD differences of 0.1–0.5 Å are insignificant because the corresponding structures belong to the same microstate. Neglecting the conformational entropy also hampers the search for a correlation between the RMSD and the free energy gap. Preliminary calculations in paper II have shown, however, that including the entropy led to an insufficient decrease in the free energy gaps, i.e., by only ~0.6 kcal/mol. Entropic effects have been included successfully in the colony free energy.59,115 Better agreement with the experimental data can be expected by taking into account the crystal environment and the effect of ions, and by selecting loops with low B-factors.54,116

An important factor that affects the quality of loop modeling is an optimal match between a given implicit solvation model and the force field used. To be consistent with papers I45 and II46 we have applied here GB/SA with AMBER94; however, extensive studies of the C-terminal polypeptide from the B1 domain of protein G by Zhou using AMBERx/GBSA,109 where x=94, 96, and 99, discovered that only AMBER96 (ref 117) with GB/SA gave a reasonable free energy profile (but with one erroneous salt bridge); therefore, optimizing eq 1 with AMBER96 or other newly optimized force fields might improve this model further. One might perhaps choose GB models that maximally mimic the Poisson Boltzmann (PB) equation; however, Lee and coworkers118,119 have argued recently that PB itself has its limitations and that one has to resort to explicit-implicit hybrid models. Thus, developing the optimal implicit solvation model in general, and for loops in particular, still remains an open problem.120

Acknowledgments

This work was supported by NIH grants R01GM61916 and R01GM66090 and by National Science Foundation Large Information Technology Research Grant NSF0225636.

References

  • 1. Karplus PA, Schulz GE. Naturwissenschaften. 1985;72:212.
  • 2. Getzoff ED, Geysen HM, Rodda SJ, Alexander H, Tainer JA, Lerner RA. Science. 1987;235:1191. doi: 10.1126/science.3823879.
  • 3. Rini JM, Schulze-Gahmen U, Wilson IA. Science. 1992;255:959. doi: 10.1126/science.1546293.
  • 4. Constantine KL, Friedrichs MS, Wittekind M, Jamil H, Chu CH, Parker RA, Goldfarb V, Mueller L, Farmer BT. Biochemistry. 1998;37:7965. doi: 10.1021/bi980203o.
  • 5. Nicholson LK, Yamazaki T, Torchia DA, Grzesiek S, Bax A, Stahl SJ, Kaufman JD, Wingfield PT, Lam PYS, Jadhav PK, Hodge CN, Domaille PJ, Chang C-H. Nat Struct Biol. 1995;2:274. doi: 10.1038/nsb0495-274.
  • 6. Collins JR, Burt SK, Erickson JW. Nat Struct Biol. 1995;2:334. doi: 10.1038/nsb0495-334.
  • 7. Wagner G. Nat Struct Biol. 1995;2:255.
  • 8. Fetrow JS. FASEB J. 1995;9:708.
  • 9. Bates PA, Sternberg MJ. Proteins. 1999;(Suppl 3):47. doi: 10.1002/(sici)1097-0134(1999)37:3+<47::aid-prot7>3.3.co;2-6.
  • 10. Mosimann S, Meleshko R, James MN. Proteins. 1995;23:301. doi: 10.1002/prot.340230305.
  • 11. Sali A. Curr Opin Biotechnol. 1995;6:437.
  • 12. Petrey D, Xiang Z, Tang CL, Xie L, Gimpelev M, Mitros T, Soto CS, Goldsmith-Fischman S, Kernytsky A, Schlessinger A, Koh IYY, Alexov E, Honig B. Proteins. 2003;53:430. doi: 10.1002/prot.10550.
  • 13. Crasto CJ, Feng J. Proteins. 2001;42:399. doi: 10.1002/1097-0134(20010215)42:3<399::aid-prot100>3.0.co;2-e.
  • 14. Leszczynski JF, Rose GD. Science. 1986;234:849. doi: 10.1126/science.3775366.
  • 15. Donate LE, Rufino SD, Canard LH, Blundell TL. Protein Sci. 1996;5:2600. doi: 10.1002/pro.5560051223.
  • 16. Fechteler T, Dengler U, Schomburg D. J Mol Biol. 1995;253:114. doi: 10.1006/jmbi.1995.0540.
  • 17. Kwasigroch JM, Chomilier J, Mornon JP. J Mol Biol. 1996;259:855. doi: 10.1006/jmbi.1996.0363.
  • 18. Martin AC, Toda K, Stirk HJ, Thornton JM. Protein Eng. 1995;8:1093. doi: 10.1093/protein/8.11.1093.
  • 19. Oliva B, Bates PA, Querol E, Aviles FX, Sternberg MJ. J Mol Biol. 1997;266:814. doi: 10.1006/jmbi.1996.0819.
  • 20. Ring CS, Kneller DG, Langridge R, Cohen FE. J Mol Biol. 1992;224:685. doi: 10.1016/0022-2836(92)90553-v.
  • 21. Pal M, Dasgupta S. Proteins. 2003;51:591. doi: 10.1002/prot.10376.
  • 22. Chothia C, Lesk AM. J Mol Biol. 1987;196:901. doi: 10.1016/0022-2836(87)90412-8.
  • 23. Chothia C, Lesk AM, Tramontano A, Levitt M, Smith-Gill SJ, Air G, Sheriff S, Padlan EA, Davies D, Tulip WR. Nature. 1989;342:877. doi: 10.1038/342877a0.
  • 24. Fidelis K, Stern PS, Bacon D, Moult J. Protein Eng. 1994;7:953. doi: 10.1093/protein/7.8.953.
  • 25. Espadaler J, Fernandez-Fuentes N, Hermoso A, Querol E, Aviles FX, Sternberg MJE, Oliva B. Nucleic Acids Res. 2004;32:D185. doi: 10.1093/nar/gkh002.
  • 26. Summers NL, Karplus M. J Mol Biol. 1990;216:991. doi: 10.1016/S0022-2836(99)80016-3.
  • 27. Tappura K. Proteins. 2001;44:167. doi: 10.1002/prot.1082.
  • 28. Wohlfahrt G, Hangoc V, Schomburg D. Proteins. 2002;47:370. doi: 10.1002/prot.10098.
  • 29. Deane CM, Blundell TL. Proteins. 2000;40:135.
  • 30. van Vlijmen HW, Karplus M. J Mol Biol. 1997;267:975. doi: 10.1006/jmbi.1996.0857.
  • 31. Wojcik J, Mornon JP, Chomilier J. J Mol Biol. 1999;289:1469. doi: 10.1006/jmbi.1999.2826.
  • 32. Samudrala R, Moult J. J Mol Biol. 1998;275:895. doi: 10.1006/jmbi.1997.1479.
  • 33. Sudarsanam S, DuBose RF, March CJ, Srinivasan S. Protein Sci. 1995;4:1412. doi: 10.1002/pro.5560040715.
  • 34. Bruccoleri RE, Karplus M. Biopolymers. 1987;26:137. doi: 10.1002/bip.360260114.
  • 35. Moult J, James MN. Proteins. 1986;1:146. doi: 10.1002/prot.340010207.
  • 36. Fine RM, Wang H, Shenkin PS, Yarmush DL, Levinthal C. Proteins. 1986;1:342. doi: 10.1002/prot.340010408.
  • 37. Higo J, Collura V, Garnier J. Biopolymers. 1992;32:33. doi: 10.1002/bip.360320106.
  • 38. Rosenfeld R, Zheng Q, Vajda S, DeLisi C. J Mol Biol. 1993;234:515. doi: 10.1006/jmbi.1993.1607.
  • 39. Shenkin PS, Yarmush DL, Fine RM, Wang HJ, Levinthal C. Biopolymers. 1987;26:2053. doi: 10.1002/bip.360261207.
  • 40. Carlacci L, Englander SW. J Comput Chem. 1996;17:1002.
  • 41. Dudek MJ, Scheraga HA. J Comput Chem. 1990;11:121.
  • 42. Gō N, Scheraga HA. Macromolecules. 1970;3:178.
  • 43. Zheng Q, Rosenfeld R, Vajda S, DeLisi C. J Comput Chem. 1993;14:556.
  • 44. Fiser A, Do RKG, Šali A. Protein Sci. 2000;9:1753. doi: 10.1110/ps.9.9.1753.
  • 45. Das B, Meirovitch H. Proteins. 2001;43:303. doi: 10.1002/prot.1041.
  • 46. Das B, Meirovitch H. Proteins. 2003;51:470. doi: 10.1002/prot.10356.
  • 47. Mas MT, Smith KC, Yarmush DL, Aisaka K, Fine RM. Proteins. 1992;14:483. doi: 10.1002/prot.340140409.
  • 48. Smith KC, Honig B. Proteins. 1994;18:119. doi: 10.1002/prot.340180205.
  • 49. Wesson L, Eisenberg D. Protein Sci. 1992;1:227. doi: 10.1002/pro.5560010204.
  • 50. Qiu D, Shenkin PS, Hollinger FP, Still WC. J Phys Chem A. 1997;101:3005.
  • 51. Rapp CS, Friesner RA. Proteins. 1999;35:173.
  • 52. de Bakker PIW, DePristo MA, Burke DF, Blundell TL. Proteins. 2003;51:21. doi: 10.1002/prot.10235.
  • 53. DePristo MA, de Bakker PIW, Lovell SC, Blundell TL. Proteins. 2003;51:41. doi: 10.1002/prot.10285.
  • 54. Jacobson MP, Pincus DL, Rapp CS, Day TJF, Honig B, Shaw DE, Friesner RA. Proteins. 2004;55:351. doi: 10.1002/prot.10613.
  • 55. Ghosh A, Sendrovic, Rapp C, Friesner R. J Phys Chem. 1998;102:10983.
  • 56. Gallicchio E, Zhang LY, Levy RM. J Comput Chem. 2002;23:517. doi: 10.1002/jcc.10045.
  • 57. Jorgensen WL, Maxwell DS, Tirado-Rives J. J Am Chem Soc. 1996;118:11225.
  • 58. Zhang C, Liu S, Zhou Y. Protein Sci. 2004;13:391. doi: 10.1110/ps.03411904.
  • 59. Xiang Z, Soto CS, Honig B. Proc Natl Acad Sci USA. 2002;99:7432. doi: 10.1073/pnas.102179699.
  • 60. Baysal C, Meirovitch H. J Am Chem Soc. 1998;120:800.
  • 61. Baysal C, Meirovitch H. Biopolymers. 1999;50:329. doi: 10.1002/(SICI)1097-0282(199909)50:3<329::AID-BIP8>3.0.CO;2-4.
  • 62. Meirovitch H, Meirovitch E, Lee J. J Phys Chem. 1995;99:4847.
  • 63. Meirovitch H, Meirovitch E. J Phys Chem. 1996;100:5123.
  • 64. Baysal C, Meirovitch H. Biopolymers. 2000;54:416. doi: 10.1002/1097-0282(200011)54:6<416::AID-BIP60>3.0.CO;2-2.
  • 65. Baysal C, Meirovitch H. Biopolymers. 2000;53:423. doi: 10.1002/(SICI)1097-0282(20000415)53:5<423::AID-BIP6>3.0.CO;2-C.
  • 66. Maiorov VN, Crippen GM. J Mol Biol. 1992;227:876. doi: 10.1016/0022-2836(92)90228-c.
  • 67. Mirny LA, Shakhnovich EI. J Mol Biol. 1996;264:1164. doi: 10.1006/jmbi.1996.0704.
  • 68. Seok C, Rosen JB, Chodera JD, Dill KA. J Comput Chem. 2003;24:89. doi: 10.1002/jcc.10124.
  • 69. Baysal C, Meirovitch H. J Phys Chem. 1997;101:2185.
  • 70. Baysal C, Meirovitch H. J Comput Chem. 1999;20:1659.
  • 71. Meirovitch H. Chem Phys Lett. 1977;45:389.
  • 72. Meirovitch H, Koerber SC, Rivier JE, Hagler AT. Biopolymers. 1994;34:815. doi: 10.1002/bip.360340703.
  • 73. White RP, Meirovitch H. Proc Natl Acad Sci USA. 2004;101:9235. doi: 10.1073/pnas.0308197101.
  • 74. Cheluvaraja S, Meirovitch H. Proc Natl Acad Sci USA. 2004;101:9241. doi: 10.1073/pnas.0308201101.
  • 75. Tanner JJ, Nell LJ, McCammon JA. Biopolymers. 1992;32:23. doi: 10.1002/bip.360320105.
  • 76. Lins RD, Briggs JM, Straatsma TP, Carlson HA, Greenwald J, Choe S, McCammon JA. Biophys J. 1999;76:2999. doi: 10.1016/s0006-3495(99)77453-9.
  • 77. Ooi T, Oobatake M, Nemethy G, Scheraga HA. Proc Natl Acad Sci USA. 1987;84:3086. doi: 10.1073/pnas.84.10.3086.
  • 78. Schiffer CA, Caldwell JW, Kollman PA, Stroud RM. Mol Simul. 1993;10:121.
  • 79. Fraternali F, Van Gunsteren WF. J Mol Biol. 1996;256:939. doi: 10.1006/jmbi.1996.0139.
  • 80. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA. J Am Chem Soc. 1995;117:5179.
  • 81. Ponder JW. TINKER: Software Tools for Molecular Design, Version 3.9. St. Louis: Washington University; 2001.
  • 82. Najmanovich R, Kuttner J, Sobolev V, Edelman M. Proteins. 2000;39:261. doi: 10.1002/(sici)1097-0134(20000515)39:3<261::aid-prot90>3.0.co;2-4.
  • 83. Zhao S, Goodsell DS, Olson AJ. Proteins. 2001;43:271. doi: 10.1002/prot.1038.
  • 84. Wilson MA, Brunger AT. J Mol Biol. 2000;301:1237. doi: 10.1006/jmbi.2000.4029.
  • 85. Esposito L, Vitagliano L, Sica F, Sorrentino G, Zagari A, Mazzarella L. J Mol Biol. 2000;297:713. doi: 10.1006/jmbi.2000.3597.
  • 86. Still WC, Tempczyk A, Hawley RC, Hendrickson T. J Am Chem Soc. 1990;112:6127.
  • 87. Hawkins GD, Liotard DA, Cramer CJ, Truhlar DG. J Org Chem. 1998;63:4305.
  • 88. Schaefer M, Karplus M. J Phys Chem. 1996;100:1578.
  • 89. Jayaram B, Liu Y, Beveridge DL. J Chem Phys. 1998;109:1465.
  • 90. Dominy BN, Brooks CL III. J Phys Chem. 1999;103:3765.
  • 91. Onufriev A, Bashford D, Case DA. J Phys Chem. 2000;104:3712.
  • 92. Lee MS, Salsbury FR Jr, Brooks CL III. J Chem Phys. 2002;116:10606.
  • 93. Lee MS, Feig M, Salsbury FR Jr, Brooks CL III. J Comput Chem. 2003;24:1348. doi: 10.1002/jcc.10272.
  • 94. Zhang W, Hou T, Qiao X, Xu X. J Phys Chem B. 2003;107:9071.
  • 95. Jayaram B, Sprous D, Liu Y, Beveridge DL. J Phys Chem B. 1998;102:9571.
  • 96. Morozov AV, Kortemme T, Baker D. J Phys Chem B. 2003;107:2075.
  • 97. Li Z, Scheraga HA. Proc Natl Acad Sci USA. 1987;84:6611. doi: 10.1073/pnas.84.19.6611.
  • 98. von Freyberg B, Braun W. J Comput Chem. 1991;12:1065.
  • 99. Liu DC, Nocedal J. Technical Report NAM03. Evanston, IL: Department of Electrical Engineering and Computer Science, Northwestern University; 1988.
  • 100. Flory PJ. Statistical Mechanics of Chain Molecules. Hanser; 1988.
  • 101. Muller CW, Schulz GE. J Mol Biol. 1992;224:159. doi: 10.1016/0022-2836(92)90582-5.
  • 102. Hassan SA, Guarnieri F, Mehler EL. J Phys Chem B. 2000;104:6478.
  • 103. Hassan SA, Mehler EL. Proteins. 2002;47:45. doi: 10.1002/prot.10059.
  • 104. Warshel A, Russell ST. Q Rev Biophys. 1984;17:283. doi: 10.1017/s0033583500005333.
  • 105. Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. J Phys Chem B. 2001;105:6476.
  • 106. Blanco FJ, Rivas G, Serrano L. Nat Struct Biol. 1994;1:584. doi: 10.1038/nsb0994-584.
  • 107. Munoz V, Thompson PA, Hofrichter J, Eaton WA. Nature. 1997;390:196. doi: 10.1038/36626.
  • 108. Munoz V, Henry ER, Hofrichter J, Eaton WA. Proc Natl Acad Sci USA. 1998;95:5872. doi: 10.1073/pnas.95.11.5872.
  • 109. Pande VS, Rokhsar DS. Proc Natl Acad Sci USA. 1999;96:9062. doi: 10.1073/pnas.96.16.9062.
  • 110. Garcia AE, Sanbonmatsu KY. Proteins. 2001;42:345. doi: 10.1002/1097-0134(20010215)42:3<345::aid-prot50>3.0.co;2-h.
  • 111. Zhou R, Berne BJ, Germain R. Proc Natl Acad Sci USA. 2001;98:14931. doi: 10.1073/pnas.201543998.
  • 112. Zhou R. Proteins. 2003;53:148. doi: 10.1002/prot.10483.
  • 113. Zhou R, Berne BJ. Proc Natl Acad Sci USA. 2002;99:12777. doi: 10.1073/pnas.142430099.
  • 114. Felts AK, Harano Y, Gallicchio E, Levy RM. Proteins. 2004;56:310. doi: 10.1002/prot.20104.
  • 115. Fogolari F, Tosatto SCE. Protein Sci. 2005;14:889. doi: 10.1110/ps.041004105.
  • 116. Rapp SR, Pollack RM. Proteins. 2005;60:103. doi: 10.1002/prot.20492.
  • 117. Kollman P, Dixon R, Cornell W, Fox T, Chipot C, Pohorille A. The development/application of a ‘minimalist’ organic/biochemical molecular mechanic force field using a combination of ab initio calculations and experimental data. In: van Gunsteren WF, Weiner PK, Wilkinson AJ, editors. Computer Simulation of Biomolecular Systems. 1997;3:83.
  • 118. Lee MS, Salsbury FR Jr, Olson MA. J Comput Chem. 2004;25:1967. doi: 10.1002/jcc.20119.
  • 119. Lee MS, Olson MA. J Phys Chem B. 2005;109:5223. doi: 10.1021/jp046377z.
  • 120. Fan H, Mark AE, Zhu J, Honig B. Proc Natl Acad Sci USA. 2005;102:6760. doi: 10.1073/pnas.0408857102.
