Abstract
Loop flexibility is often crucial to protein biological function in solution. We report a new Monte Carlo method for generating conformational ensembles for protein loops and cyclic peptides. The approach incorporates the triaxial loop closure method which addresses the inverse kinematic problem for generating backbone move sets that do not break the loop. Sidechains are sampled together with the backbone in a hierarchical way, making it possible to make large moves that cross energy barriers. As an initial application, we apply the method to the flexible loop in triosephosphate isomerase that caps the active site, and demonstrate that the resulting loop ensembles agree well with key observations from previous structural studies. We also demonstrate, with 3 other test cases, the ability to distinguish relatively flexible and rigid loops within the same protein.
1 Introduction
A great deal of effort has been directed towards the development of computational methods for predicting the conformations of protein loops, which is a critical task in comparative protein modeling and in computational protein design1–4. The success of these methods has been evaluated primarily by comparing the results of the loop predictions with the loop conformations observed in crystal structures. That is, the focus is predicting the structure of the loop—a specific conformation—rather than the ensemble of conformations populated at biologically relevant conditions. Although these loop prediction methods can be used to identify multiple low-energy conformations, it is challenging to determine populations of the conformations, i.e., to relate energies of individual conformations to free energies of micro or macrostates in the ensemble, although significant progress in this regard has been made by Meirovitch and coworkers5–7.
The flexibility of loops, i.e., the ability to adopt multiple conformations at relevant temperatures, is often critical to biological function, by playing an important role in molecular recognition. For example, the active site loop of the triosephosphate isomerase (TIM barrel) changes its conformation from an open to a closed state after binding of ligands8, 9. In kinases, two critical loops near the active site are flexible, with important implications for drug discovery: the glycine-rich loop (also called the P-loop) and the activation loop, including the DFG motif, which can adopt at least 2 major conformations in some kinases, referred to as ‘out’ and ‘in’. For example, while c-Src generally adopts the DFG-in conformation, the unfavorable DFG-out conformation can be induced by binding small molecules10. Loop flexibility can also play an important role in antibody-antigen recognition. The H3 loop in the complementarity-determining region of antibodies, which has the most diversity in sequence and is the most critical loop for antigen affinity and specificity, frequently demonstrates evidence of conformational flexibility11–13.
More broadly, there are many cases where loops adopt different conformations in different crystal structures, e.g., holo vs. apo, or even different crystal unit cells for the same protein14. Although the B-factors in crystal structures provide some information about conformational flexibility, each structure is best viewed as a snapshot from the equilibrium ensemble. NMR experiments can provide some direct information about conformational equilibria, but generally cannot provide complete information about the ensemble of interconverting structures.
Molecular dynamics (MD) has been widely used to study protein flexibility, including loop dynamics15, 16. The main liability of MD is that the timescales for interconverting between loop conformations can be long relative to the femtosecond time steps used, such as the millisecond timescale for the TIM capping loop to interconvert between the open and closed states17. Although such timescales may soon become accessible by MD simulation, they will remain extremely computationally expensive. Methods like replica exchange MD can be used to accelerate convergence but are likewise computationally expensive.
Here we describe a Monte Carlo method for generating ensembles of loop conformations and cyclic peptides. It is related to classes of loop prediction methods that use torsion-angle sampling of backbone and side chain degrees of freedom (DoF), which makes it possible to make large conformational moves that cross energy barriers. Specifically, it builds on loop prediction methods that exploit “inverse kinematics” methods for creating move sets that do not ‘break’ the loop18–24. The new contribution here is implementing these moves in a Monte Carlo scheme that also samples side chain DoF25. We apply the method to a number of proteins with flexible loops, including the well-known case of TIM. We also evaluate our ability to distinguish between (relatively) rigid and flexible loops within the same protein.
2 The Move Set: Torsional Perturbations via Inverse Kinematics
2.1 Torsions and Sterics
It is widely accepted that the essential dynamics of a protein backbone can be captured by moves involving only the torsions ϕ, ψ with the other internal variables (bond lengths, bond angles and ω torsions) being kept close to their canonical values, although not necessarily rigid19, 23, 24.
Compared to the high energy associated with ω angle deformation, ϕ and ψ angles are relatively free to rotate but their range is restricted by steric interactions. Ramachandran regions in the (ϕ, ψ) coordinates for each peptide ensure intra-peptide steric avoidance, and additional restrictions are imposed by more distant clashes. Clashes involving backbone atoms (or atoms bonded to them) are completely determined from the backbone angles. On the other hand, atoms further along sidechains (from the γ position out) are not completely determined from the backbone, although their placement may be restricted by it. Significantly, sidechains may interact with other sidechains so that their placement must be accomplished as a whole. Given a backbone conformation, a separate search is required to determine sterically acceptable or otherwise energetically viable sidechain conformations. Reciprocally, backbone moves may be restricted by fixed sidechain geometry.
2.2 MC move and state variables
To design a Monte Carlo move for reversibly exploring the torsion space, we must therefore consider the state space as the set of all torsions, {ti; χj} where the ti are backbone torsions and χj are sidechain torsions, with the indices running respectively over all the backbone and sidechain DoF. A chain of {N, Cα, C} triplets (a standard backbone) is one possibility but chains through e.g. cysteine bridges, or other macromolecules, such as nucleic acids, could also be considered. In the following, we will assume the standard case (protein backbone loops) exclusively. For the case of a loop of N residues bridging two fixed ends the essential backbone DoF would be M = 2N − 6. Here, 6 backbone DoF are involved in placing the end of the loop in a fixed rotation/translation relationship to the beginning. We call these DoF, labeled arbitrarily as ti, (i = 1, …, 6) the compensators. The remaining M DoF, labeled as ti, (i = 7, …, 2N) are the controls. This separation in controls and compensators is arbitrary and may change from one move to the next. We could assume that the end residues 0 and N + 1 act as hinges, i.e. the ϕ0 and ψN+1 torsions are fixed, but ψ0, ϕN+1 are free, adding two DoF to the backbone. The treatment is essentially the same, replacing M by M + 2 and redefining some indices. We will only discuss the first case (no hinge mobility). It will be assumed that there are K sidechain DoF in the set 𝖲 of sidechains interacting with the loop; we may only wish to include in 𝖲 those sidechains on the loop and hinges. The placement for those depends on the loop conformation. We may also include sidechains on residues in some sphere of influence about the loop. Or we may simply include all the sidechains in the protein. We make no distinction at this stage.
Then, to design a reversible MC move that involves only the loop backbone DoF as well as the selected group 𝖲 of sidechains coupled to the loop we must establish the Metropolis criterion for acceptance of a move of the form
(1) |
The shape space geometry accessible via our formulation characterizes our moves: assume that the L (= 2N) torsions for a loop kinematic chain are divided into the L − 6 controls and 6 compensators. The method used here employs the ϕ, ψ pairs of 3 amino acids (the pivots). These can be chosen at arbitrary locations along the loop, breaking it into three subfragments for kinematic purposes. To each value of the L − 6 controls there correspond up to 16 distinct conformations satisfying the closure conditions, each characterized by a unique set of values of the compensators. As discussed in our earlier work26, the 16 alternative solutions represent different orientations of the 3 subfragments between successive pivots in a reference frame attached to the 3 pivot Cα atoms about the 3 axes joining each pair of pivots. Thus we refer to the method as Triaxial Loop Closure (TLC). The basic idea in the TLC method (discussed in more detail in the next section) is to construct a loop with arbitrary internal degrees of freedom, taking advantage of the fact that the inverse kinematic problem can be solved by determining appropriate values of six torsions. Thus any variation in the remaining DoF’s (other torsions, including omegas, bond angles and even bond lengths) can be considered, if so desired. Here we treated only ϕ − ψ variations as these are the most “flexible” DoF’s, but we could have included all other DoF’s in the MC scheme in any combination desired. The conformational variability of the constitutive pieces for loop closure, i.e. the three subfragments, is of course an important factor for solving the closure problem. We see that this variability can be decomposed into two types: the end-to-end variability of the individual fragments, and the inherent variability of the loop closure problem, i.e. relative locations and orientations of the ends of the loop as well as the environment in the loop vicinity.
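The bookkeeping of controls and compensators can be sketched in a few lines of Python. This is only an illustration of the counting argument (2N torsions, 6 compensators on the three pivots, 2N − 6 controls); the function name `partition_torsions` and the `(residue, "phi"/"psi")` labels are assumptions made here, not part of the method's implementation.

```python
def partition_torsions(n_residues, pivots):
    """Split the 2N backbone torsions of an N-residue loop into
    6 compensators (phi/psi of the 3 pivots) and 2N - 6 controls.

    Residues are numbered 1..N; torsions are labeled (residue, name).
    The labeling scheme is hypothetical and used for illustration only.
    """
    if n_residues < 3:
        raise ValueError("a loop needs at least 3 residues to host 3 pivots")
    if len(set(pivots)) != 3 or not all(1 <= p <= n_residues for p in pivots):
        raise ValueError("need 3 distinct pivot residues in 1..N")

    all_torsions = [(res, name)
                    for res in range(1, n_residues + 1)
                    for name in ("phi", "psi")]
    pivot_set = set(pivots)
    compensators = [t for t in all_torsions if t[0] in pivot_set]
    controls = [t for t in all_torsions if t[0] not in pivot_set]
    return compensators, controls


compensators, controls = partition_torsions(10, (2, 5, 9))
print(len(compensators), len(controls))  # 6 compensators, 2N - 6 = 14 controls
```

Because the pivots are passed in as an argument, the same loop can be repartitioned from one move to the next, which is the freedom the text describes.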
The first is a direct problem: compute the fragment (in practice we do not check that the fragment is indeed sterically feasible until the assembly is successful). The individual fragment assembly, being subject to no end constraints, is only limited by the Ramachandran and other steric restrictions. However, for purposes of assembling the three subfragments into a self-consistent loop, each individual fragment of length Li residues with i = 1, 2, 3, is encoded by 4 variables: the overall geometric length of the virtual bond joining first and last atoms, di; the angles θi, ξi made by the two end bonds to the virtual bond; and the torsion of the two end bonds about the virtual bond, δi. The variability of the closure problem is governed by these twelve parameters (di, θi, ξi, δi), i = 1, 2, 3. The equations expressing closure depend on these parameters smoothly; small changes usually cause small changes in the number and disposition of solutions except that, for certain arrangements, solutions could spontaneously appear or disappear (pairs of polynomial roots may join and become complex, or the converse, see the discussion of the Inverse Kinematic problem below).
We now search the nearby conformation space by perturbing one of the control torsions. This will result in perturbing the overall structure of one of the chains, leading to a perturbed set of solutions. These changes may lead to overall large motions, see e.g. Ref.27 for a discussion of the end conditions and their constraining of various inner DoF. However, a reasonable acceptance ratio for the method can be more or less guaranteed by varying the controls and restricting the stepsize. Below we discuss a two stage scheme, splitting the move into a pure backbone and a pure sidechain stage.
2.3 Solving the Inverse Kinematic problem
Many methods for finding solutions that satisfy the closure conditions have been proposed, both exact18, 22, 26, 28–32 and approximate6, 21, 33–37. Exact methods address the Inverse Kinematic problem by searching for the values of a certain torsion, say τ, in terms of which all other torsions can be determined. Go and Scheraga18 pursued a direct solution in the original angle variables. This involves finding the zeros of a certain transcendental expression, a process that may require substantial computation to adequately resolve the entire domain. Subsequent works employ standard techniques from the robotics literature to convert to a more tractable polynomial form in the variable u = tan(τ/2). All the real roots of this 16th degree polynomial can be found efficiently and stably by the use of the method of Sturm chains38. All other torsions can be recovered readily and therefore such methods are capable of finding all backbone solutions for any given combination of control torsion values. On the other hand, approximate methods typically use an iterative procedure to find a solution. As a result they are not guaranteed to find all solutions consistent with a given set of control values, and the same is true for the approach in Ref.18, which is also followed in Refs.20, 23, 24, although for this class of methods the issues are mainly related to the computational sensitivity of multiple roots.
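To make the role of Sturm chains concrete, the following is a minimal sketch of real-root counting for a polynomial in u, written with NumPy. It is not the TLC solver: the functions `sturm_chain` and `count_real_roots` and the tolerance `tol` are illustrative assumptions, and a production solver for the degree-16 closure polynomial would need more careful numerical safeguards than this plain floating-point version.

```python
import numpy as np


def sturm_chain(coeffs, tol=1e-10):
    """Build the Sturm sequence of a polynomial (coefficients highest degree first)."""
    p0 = np.trim_zeros(np.asarray(coeffs, dtype=float), "f")
    if len(p0) < 2:
        raise ValueError("polynomial must have degree >= 1")
    chain = [p0, np.polyder(p0)]
    while len(chain[-1]) > 1:
        # Next term is the negated remainder of the previous two terms.
        _, rem = np.polydiv(chain[-2], chain[-1])
        rem = -rem
        scale = max(1.0, float(np.max(np.abs(chain[-2]))))
        nonzero = np.flatnonzero(np.abs(rem) > tol * scale)
        if nonzero.size == 0:
            break  # zero remainder: the polynomial has repeated roots
        chain.append(rem[nonzero[0]:])
    return chain


def sign_changes(chain, x):
    """Count sign changes of the chain evaluated at x, ignoring exact zeros."""
    signs = [np.sign(v) for v in (np.polyval(p, x) for p in chain) if v != 0.0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def count_real_roots(coeffs, a, b):
    """Number of distinct real roots in the interval (a, b] by Sturm's theorem."""
    chain = sturm_chain(coeffs)
    return sign_changes(chain, a) - sign_changes(chain, b)


# (u - 1)(u - 2)(u - 3) = u^3 - 6u^2 + 11u - 6
cubic = [1.0, -6.0, 11.0, -6.0]
print(count_real_roots(cubic, 0.0, 4.0))   # three real roots in (0, 4]
print(count_real_roots(cubic, 1.5, 2.5))   # one real root in (1.5, 2.5]
```

Counting roots per interval is what allows bisection to isolate every real root, which is why such methods can enumerate all closure solutions rather than converge to a single one.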
In previous applications the conrot algorithm has been used20. It places the rotatable bonds on 6 consecutive bonds plus a driver. A generalization by Wu and Deem22 uses one driver on either end. A weakness of the conrot approach is that a change on either side of the short compensator segment may make the closure problem unsolvable24. A generalization from robotics removes that restriction29. Our own method for solving the tripeptide closure problem, explained in detail in Ref.26, has the advantage of mathematical simplicity, speed and robustness. It also allows for a straightforward generalization for longer chains of arbitrary geometry. Its simplicity comes from taking advantage of the natural pairing up of rotatable bonds in amino acids to reduce the closure problem to three rotations, and we refer to this as the TLC method26. Referring to Fig. 1(b) we note that each Cα, C, N, Cα unit is identified by four variables: the overall geometric length of the virtual bond joining first and last atoms, di; the angles θi, ξi made by the two end bonds to the virtual bond; and the torsion of the two end bonds about the virtual bond, δi (actually, the formulation uses the angles αi of the triangle formed with edges di). These definitions remain unchanged even if arbitrary structure exists between the two end pairs (Fig. 1(a)). We may produce multiple conformations for a long closed chain by partitioning into 3 subsegments and mapping each to a simple kinematic generalization of the tetrad Cα, C, N, Cα (Fig. 1(a)(b)).
In brief, three Cα atoms are selected (the pivots). The chain between any two of these, containing L atoms including the endpoints is determined to within a rotation/translation (i.e. in its own body frame) by its own internal coordinates: L − 3 torsions, L − 2 angles, L − 1 lengths. With fixed (to any prescribed value) bond-lengths and bond-angles, each chain can be completely described by its L − 3 internal torsions. Below, we will index the residues of the three pivots as 1, 2 and 3 and we will index their backbone atoms as Ni, Cαi and Ci, i = 1, 2, 3 accordingly. Below we use the atom names interchangeably with their cartesian coordinates, e.g. N1 can be thought of as equivalent to the vector R1, etc (see eq. 5).
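The internal-coordinate counting above (L − 1 lengths, L − 2 angles, L − 3 torsions, for 3L − 6 internal DoF in total) and the definition of a torsion from four atom positions can be illustrated as follows. This is a generic sketch using the standard atan2 formula for a dihedral angle; the function names are assumptions introduced here.

```python
import numpy as np


def internal_coordinate_counts(n_atoms):
    """Return (bond lengths, bond angles, torsions) for a linear chain of n_atoms."""
    if n_atoms < 4:
        raise ValueError("need at least 4 atoms to define a torsion")
    return n_atoms - 1, n_atoms - 2, n_atoms - 3


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) about the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the two end bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.arctan2(y, x))


lengths, angles, torsions = internal_coordinate_counts(7)
print(lengths + angles + torsions)  # equals 3 * 7 - 6 internal degrees of freedom
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]))  # a quarter turn
```

With bond lengths and bond angles held at prescribed values, only the L − 3 torsions returned by such a routine remain as free descriptors of each chain.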
As is explained in Ref.26 and somewhat more at length in Ref.39 (see also the supplementary material discussion in Ref.40), the three fragments, respectively between pivots 1–2, 2–3 and 3−1, form a triangle with edges di, i = 1, 2, 3. The parameters necessary for setting up and solving the TLC equations can be extracted from knowledge of only the first two and last two atoms of each chain (Fig. 2). Once the three 4-atom fragments have been assembled into a triangle, the relative rotation of each fragment about the triangle must place the end-atoms relative to those on each neighboring fragment so that the angles (NiCαiCi), i = 1, 2, 3, assume prescribed values (Fig. 1). In this way, loop closure is accomplished when an appropriate rotation for each piece has been found. It turns out that the problem reduces to the solution of a 16th degree polynomial, so that to each real root there corresponds a possible backbone loop geometry (subject, of course, to overall steric viability), for a total of at most 16 solutions for a given collection of state variables, the 2N − 6 control torsions.
2.4 Jacobian
Since fixing the end of the chain (the Closure Conditions) implies relationships among the torsions, we seek solution of these relationships such that specifying M torsions along the loop leads to complete determination of all 2N torsions and unambiguous Cartesian coordinates for all loop backbone atoms that are sterically self-consistent. In general, for any feasible value of the controls there may exist multiple sets of compensators that allow the loop to close. They are functions of the controls and their values solve the loop closure problem.
As a result, the element of volume in torsion space, initially uniform in these variables
will need to be modified by
leading to the well known expression (e.g. see formula 23 in Ref.23) for the inverse of the above Jacobian:
Since
(2) |
this Jacobian can assume the simpler, 5 × 5 form
(3) |
Here
(4) |
and ei, i = 1, 2, 3 are the usual unit vectors along axes x, y, z of an arbitrary reference frame (the Lab frame). The atoms associated with closure are
(5) |
We note that the term Γ5 × R56 = 0 and was omitted. In the general case, the three pivot residues are indexed by 1 ≤ n1 < n2 < n3 ≤ N, and this reindexing will be implied where appropriate.
It is well known22 that the Jacobian in the form first proposed by Dodd et al.20 is incomplete, and lacks frame invariance. In a rigorous derivation of the Jacobian from the configuration integral, Wu and Deem22 show that the correct, frame invariant form is
(6) |
However, since the acceptance criterion involves ratios of Jacobians computed at the same frame, the additional factors cancel and the relative probabilities remain unchanged.
Although the latter form (6) is indeed invariant if all vectors are changed by an arbitrary affine transformation, it has the undesirable feature that it involves a projection to an arbitrary frame. Consequently, the factor Γ6 · e3 may accidentally vanish (in which case Ji will also vanish) necessitating a random reorientation of the frame to break the degeneracy. Thus it is desirable to eliminate this superfluous dependence and derive a form that depends only on intrinsic (body frame) coordinates, for which invariance is easily seen. This can be accomplished by carrying out an expansion of this determinant in complementary minors; indeed, the top 3 rows are expressed in terms of intrinsic coordinates, while the last two involve projections to the space frame. We thus expand the determinant as
(7) |
where the indices (i, j, k, l) are a cyclic permutation of (1, 2, 3, 4).
Applying the well known identity (e.g.,41, Eq.(25), p.76)
(8) |
to the first of the 2 × 2 minors in Eq.(7) we have:
(9) |
The remaining 2 × 2 minors result in analogous expressions. Substituting these into Eq.(7) we have
(10) |
(as above, the indices (i, j, k, l) are a cyclic permutation of (1, 2, 3, 4)), which can be recombined to give the expression for the inverse Jacobian
(11) |
where we took advantage of the fact that Γi × Ri6 = Γi × Ri+1,6 with i = 1, 3 because the axes Γi, Γi+1, i = 1 or 3 are coterminal. Fig. 1(a) shows all quantities that enter in the Jacobian.
This 4 × 4 determinant is the frame invariant form of the inverse Jacobian for the TLC method. It has the advantage that it is expressed entirely in terms of body coordinates, and thus it is free from degeneracies and can be evaluated without projecting to an ad hoc coordinate system. It is numerically equivalent to the Wu and Deem form (6), when the latter is defined. The Jacobian (11) can be easily expressed in terms of the intrinsic parameters (di, θi, ξi, δi), i = 1, 2, 3 entering in the TLC algorithm42, a feature that it shares with reduced Jacobians derived by other authors22, 43. However such expressions lack the simplicity and geometrical appeal of (11).
2.5 Backbone Perturbation Procedure
The loop closure algorithm described in the previous section, while perfectly general, is currently implemented as a strategy for perturbing only the backbone coordinates. The sidechain coordinates perturbation procedure, as well as the strategy for combining these perturbations in a way such that detailed balance is maintained, will be outlined in the next two sections. An important design feature of this approach is that the backbone and sidechain perturbations are generated independently.
An important feature of both the backbone selection probability and the sidechain selection probability is that they are reversible, or
(12) |
where t′ = t + δt is the trial move starting from the torsion state t and δt is the perturbation vector to the loop of interest. For the purposes of this work, we require the selection probability to be uniform to enforce equation 12. For this to be true, we need to establish the procedure which ensures that a uniform distribution of torsions over the entire loop can be generated.
The procedure for generating a trial move δt closely follows that of Ref.20, 22, 23, 29. Since the algorithm currently solves for 2N − 6 torsions, and we wish to have a procedure that is valid for loops of arbitrary length, we must select a subset of 2N − 6 torsions. There is some flexibility in how this could be done, but the present implementation is as follows (see Fig. 2):
From the designated loop torsions, a single torsion angle i is selected uniformly and identified as a driver angle coordinate (the yellow circle in Fig. 2), as has been described in previous work26.
For torsion ti, a random variate δti is generated, with a maximum value of up to π.
A randomly constructed triaxial closure is generated by randomly selecting 3 α carbons as pivots from the loop (excluding the α carbon on which the driver angle resides), and assigning the ϕ/ψ angles as the torsions (the grey triangle in Fig. 2).
A set of torsions for the stationary solution tk, k ∈ [1, K] is generated, resulting in up to K = 16 solutions. For this case, only the alternative sets of pivot coordinates are considered, with the driver angle held at ti. For each solution, a Jacobian term J(tk) is computed.
A set of torsions for the perturbed solution tl, l ∈ [1, L] is similarly generated, with associated J(tl) terms.
A trial solution t′ is selected from the solutions (tk, tl) with the following probability:
(13) |
To show that this procedure generates a uniform distribution, the ϕ/ψ angles of an 11-residue polypeptide are sampled with no potential. Half of the time, the loop closure procedure is applied as described above, and the other half of the time, only a driver angle is perturbed uniformly, with the remaining Cartesians updated accordingly (with no closure condition enforced). The second procedure is required so that the full space of dihedral angles is accessible. Every move is accepted, with no potential applied or steric exclusion. This procedure generates a uniform distribution of torsions, as is shown in Fig. 3, which shows the torsion distribution of an 11-residue peptide sampled with the loop closure procedure described above. Only backbone DoF are sampled, and no forcefield is applied. The endpoints are constrained to fixed positions. This control closely follows previous work20, 23. Fig. 3(a) shows the distribution of angles with no Jacobian selection term applied, and Fig. 3(b) shows the distribution with the reweighting term applied. The Jacobian term clearly improves the uniformity of the sampling.
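The selection step in the procedure above can be sketched as a weighted draw over the pooled stationary and perturbed solutions. The sketch assumes that each solution is chosen with probability proportional to its Jacobian term, which is the usual convention in concerted-rotation schemes; the exact expression of Eq. (13) should be taken from the original publication, and the function name `select_solution` is hypothetical.

```python
import random


def select_solution(jacobians, rng=random):
    """Pick an index with probability J_i / sum_j J_j.

    `jacobians` pools the weights J(t_k) of the stationary solutions and
    J(t_l) of the perturbed solutions (assumed proportional weighting).
    """
    total = sum(jacobians)
    if total <= 0.0 or any(j < 0.0 for j in jacobians):
        raise ValueError("Jacobian weights must be non-negative with a positive sum")
    threshold = rng.random() * total
    running = 0.0
    for index, weight in enumerate(jacobians):
        running += weight
        if threshold < running:
            return index
    return len(jacobians) - 1  # guard against round-off at the upper edge


rng = random.Random(0)
weights = [1.0, 3.0]  # made-up Jacobian values for two solutions
draws = [select_solution(weights, rng) for _ in range(20000)]
print(draws.count(1) / len(draws))  # close to 3 / (1 + 3)
```

Pooling both solution sets in one draw means the move may return to a stationary solution, which is consistent with the reversibility requirement stated for the selection probability.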
2.6 Sidechains
The efficient sampling of sidechains22 is important since sidechain conformations often determine the biological function of proteins. In the current work, the sidechain χ angles are not taken from the rotamer library due to their nonuniform distribution. Instead, to generate the sidechain trial moves, a single sidechain is randomly selected and each χ angle is perturbed by a value which is randomly and uniformly distributed in a defined domain [−d/2, d/2]25, 44. The polar hydrogens for the selected residue are sampled as well over the domain [−π, π].
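A sidechain trial move of this kind can be sketched as below: every χ angle of the chosen sidechain receives an independent uniform displacement in [−d/2, d/2] and is wrapped back to a single period. The helper names and the choice of radians are assumptions for illustration; selecting which sidechain to perturb is left to the caller.

```python
import math
import random


def wrap_angle(angle):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def perturb_chi(chis, d, rng=random):
    """Return new chi angles, each shifted by a uniform variate in [-d/2, d/2]."""
    return [wrap_angle(chi + rng.uniform(-d / 2.0, d / 2.0)) for chi in chis]


rng = random.Random(42)
old = [math.radians(-60.0), math.radians(180.0)]
new = perturb_chi(old, d=math.radians(40.0), rng=rng)
print([round(math.degrees(c), 1) for c in new])
```

Because the displacement distribution is symmetric about zero, the forward and reverse proposals have equal probability, which is what a uniform selection probability requires.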
To improve the sampling efficiency, no energy is computed for the states with steric clashes, which are defined based on the distances between heavy atoms. Specifically, a steric clash is defined when pairs of heavy atoms are closer than 0.7 times the sum of their Lennard-Jones radii. Rapid identification of steric clashes (using neighbor lists) avoids computationally expensive energy evaluations, for conformations that will result in very high energies and negligible acceptance probabilities.
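The clash criterion can be written down directly. The sketch below checks all heavy-atom pairs with NumPy rather than using neighbor lists, so it shows the test itself and not the fast implementation; the `excluded` argument for bonded pairs, like the function name, is an assumption added here because covalently bonded atoms always sit well inside 0.7 times the sum of their radii.

```python
import numpy as np


def has_steric_clash(coords, radii, scale=0.7, excluded=()):
    """True if any non-excluded atom pair is closer than scale * (r_i + r_j).

    coords: (n, 3) heavy-atom positions; radii: (n,) Lennard-Jones radii.
    excluded: index pairs (e.g. bonded atoms) to skip; hypothetical argument.
    """
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)          # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    clash = dist < scale * (radii[i] + radii[j])
    if excluded:
        skip = {frozenset((int(a), int(b))) for a, b in excluded}
        keep = np.array([frozenset((int(a), int(b))) not in skip
                         for a, b in zip(i, j)])
        clash = clash & keep
    return bool(np.any(clash))


coords = [[0.0, 0.0, 0.0], [2.5, 0.0, 0.0]]
radii = [2.0, 2.0]                      # made-up radii, cutoff = 0.7 * 4.0 = 2.8
print(has_steric_clash(coords, radii))  # 2.5 < 2.8, so this pair clashes
```

A trial conformation flagged by such a test can be rejected immediately, skipping the energy evaluation altogether.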
The most expensive term in energy evaluation is the solvation energy in which the time consuming step is the computation of Born radii. Since the Born radii and the long range energy terms generally vary slowly for relatively small, local conformational changes, less frequent evaluation of these terms will contribute more to the sampling performance. For this purpose the Multiple Time-Step Monte Carlo sampling (MTSMC) procedure45 is incorporated in the present method, in a scheme based on that in Ref.44. The Born radii and the long range interactions are held fixed at the latent state of the original coordinates during the inner loop sampling, and only updated in every outer loop calculation. The final configuration from the inner loop is then taken to be a trial move and subjected to the MTSMC acceptance criterion (see Eq. 20 in Ref.25).
2.7 The POSH Monte Carlo Method
Both the TLC method for determining the backbone moves of loop residues and the sidechain sampling via perturbation have been incorporated in the POSH (port out, starboard home) Monte Carlo method introduced in a previous work25. The application of this method on small peptide systems has shown reasonable agreement with experiments25. In the present work, we are interested in its performance in more complicated protein systems with flexible loops.
Briefly, the move sets in this approach consist of two steps: an initial trial (1 → 2) move with large perturbation followed by a series of annealing moves consisting of smaller perturbation within the inner loop of length NI (2 → 3). The generalized Metropolis acceptance probability for this series of moves is given by:
(14) |
where p1 and p3 are the probabilities of being in the original and final trial state, respectively. T41 and T23 are transition probabilities. T23 is the normal forward transition probability, as would be given in the usual derivation of detailed balance, but T41 is a reverse transition probability that is constructed using an alternative reverse path through configuration space that is constructed by taking the final state (state 3) and subtracting the perturbation (1 → 2) from state 3 to arrive at state 4. Further details are given in Ref.25.
The trial moves are generated by a perturbation that uniformly varies over some domain [−d/2, d/2] with a different magnitude for the initial and annealing steps. In this work, for both types of trial moves, either backbone or sidechain is allowed to be perturbed with equal probability. For backbone perturbations, the ϕ or ψ dihedral angle can vary over the domain of [−2π, 2π] for initial steps and [−π/4, π/4] for annealing steps. For sidechain χ angles, the domain is [−π, π] and [−π/9, π/9], respectively, for the initial and inner step trial moves. The number of inner steps NI is set to 20, which was reported as the upper bound on the number of inner steps for generating a precise distribution. For all protein systems studied in this work, a mixture of 50% POSH and 50% standard MC sampling, followed by the MTSMC procedure, is used due to its better performance as studied in the previous work25.
3 Simulations
We applied the loop Monte Carlo method described above to several proteins with flexible loops. The first is the enzyme triosephosphate isomerase (TIM) which has been used as a model system for studying loop flexibility, primarily by NMR. This enzyme catalyzes the reversible isomerization of dihydroxy-acetone phosphate (DHAP) to D-glyceraldehyde 3-phosphate (GAP). The active site loop 6 (residues 167–176) undergoes conformational changes upon ligand binding, and is believed to be flexible in the absence of ligand binding, transitioning between ‘open’ and ‘closed’ states. To assess the capability of our method to capture the dynamical properties of this flexible loop, three sets of simulations were performed. The first one started from the apo yeast TIM (PDB ID 1YPI) with open loop conformation (we call this SIM1), and the second started from the 2-phosphoglycolate (PGA)-bound TIM (PDB ID 2YPI) with closed loop conformation (SIM2), and the third is the same as the second except that the ligand PGA was removed from the initial structure (SIM3).
The titratable residues in the starting structures were predicted according to the experimental conditions. Specifically, in all simulations, His95 was treated as neutral, and protonated on the Nε2. Glu165 was protonated in SIM2 in order to maintain the strong interaction with ligand PGA9, but was unprotonated in the other simulations. Residues within 8 Å of the active site loop were included for the side chain sampling and the flexible loop was extended to include residues 165–178 in the simulations for both the backbone and side chain sampling. The force field OPLS-AA46, 47 was used for the protein TIM and ligand PGA except that the partial charges for the phosphate group of PGA were adjusted based on the previous work by Wong et al.48 The Surface Generalized Born (SGB)49, 50 model was used for implicit solvent with the treatment of nonpolar term50. To prevent the sampling from being trapped in local minima, all simulations were performed at a temperature of 600 K. Each simulation has a length of No = 2 × 10⁵ up to 5 × 10⁵ outer steps. Data analyses were performed over the equilibrium simulations (roughly after 10⁵ outer steps) during which the potential energy is relatively stable.
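The effect of the elevated temperature on barrier crossing can be illustrated with the plain Metropolis acceptance probability. This is the standard criterion, not the generalized POSH or MTSMC acceptance rules used in this work, and the 5 kcal/mol uphill step is a made-up value chosen only for the comparison.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def metropolis_acceptance(delta_e, temperature):
    """Standard Metropolis acceptance probability min(1, exp(-dE / kT)).

    delta_e in kcal/mol, temperature in K.
    """
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / (K_B * temperature))


uphill = 5.0  # hypothetical energy increase of a trial move, kcal/mol
for temperature in (300.0, 600.0):
    print(temperature, metropolis_acceptance(uphill, temperature))
```

Doubling the temperature halves the exponent, so the acceptance probability at 600 K is the square root of its value at 300 K, which for uphill moves of a few kcal/mol is an increase of one to two orders of magnitude.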
The same protocol was also applied to other protein systems which have been studied by NMR experiments, specifically those with PDB ID 1H2O, 1XWE, and 1Q9P. By choosing NMR structures, we eliminate any concerns about crystal packing influencing the loop conformation or flexibility. These specific proteins were chosen because each has two loops consisting of 5–8 residues, one of which has multiple conformations with large variation among the various NMR models (flexible loop) and the other has a narrow range of loop conformations among the NMR models (rigid loop). Both the flexible and rigid loops were simulated using the same sampling protocol and the same parameter settings in order to compare with the experimental data since both loops within the same protein were measured in the same experimental conditions. The titratable residues in the starting structures were protonated at the experimental pH = 7.0 for 1H2O, 6.0 for 1XWE and 5.8 for 1Q9P. The flexible loops consist of residues 59–64 for 1H2O, 1609–1616 for 1XWE and 48–53 for 1Q9P; the residues in the rigid loops are 46–51 for 1H2O, 1536–1540 for 1XWE and 78–82 for 1Q9P.
4 Results and Discussion
As an initial illustration of the utility of our loop MC method for sampling the conformational space of protein loops, we applied it to the well-studied enzyme triosephosphate isomerase (TIM). The active site loop undergoes large-scale motions interconverting between open and closed conformations. This conformational transition occurs on the time scale of milliseconds17, which has made it a challenge for molecular dynamics simulations in previous studies51, 52.
In the current work, multiple transitions between the open and closed loop conformations of yeast TIM were observed in the simulation of the apo protein, but only at 600 K (vide infra). Figure 4 (a) and (c), which start from the open and closed state, respectively, show sampled loop conformations from the equilibrium ensemble, spanning both the open and closed forms. In the simulation with the ligand PGA bound, the active site loop stays in the closed conformation, as can be seen in Fig. 4(b). These results agree qualitatively with NMR experiments, which found that the loop samples open and closed conformations whether or not a ligand is bound, but that ligand binding shifts the equilibrium strongly towards the closed conformation17, 53. Upon PGA binding, the carboxylate of the ligand protonates residue Glu165, making it hydrogen bonded with PGA instead of with Ser96 as in the apo structure, such that the closed loop conformation is preferred in the presence of ligand.
It is known that the active site loop of TIM moves largely as a rigid unit51, 54. Figure 5 shows that the backbone dihedral angles of the flexible loop in the X-ray structure of apo TIM are very similar to those in the structure of ligand-bound TIM. The ensembles generated by the loop MC method largely agree with the experimental data in this regard. We calculated the backbone ϕ and ψ angles and averaged them over the equilibrium ensemble for each of the three simulations. For the holo simulation, the ensemble-averaged ϕ and ψ angles agree well with those measured in the X-ray structures, as shown in Fig. 5 (a) and (b) (blue lines). Similar agreement was also found for the apo simulations started from both the open and closed conformations, except that residues 170–173 show relatively large deviations and fluctuations, which is consistent with the findings of previous simulation studies17, 52 (red and green lines in Fig. 5 (a) and (b)).
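Because dihedral angles are periodic, an ensemble average of ϕ or ψ is safest when computed as a circular mean. The text does not state how periodicity was handled in the averages above, so the following is only a sketch of one common choice; the function name and the array layout (conformations along the first axis) are assumptions.

```python
import numpy as np


def circular_mean_deg(angles_deg):
    """Circular mean of angles given in degrees, returned in [-180, 180].

    For a 2D input of shape (n_conformations, n_residues), the mean is
    taken over conformations, giving one value per residue.
    """
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Average the unit vectors, then convert the mean direction back to an angle.
    mean_sin = np.sin(radians).mean(axis=0)
    mean_cos = np.cos(radians).mean(axis=0)
    return np.rad2deg(np.arctan2(mean_sin, mean_cos))
```

For example, two conformations with angles of 170° and -170° have an arithmetic mean of 0°, whereas the circular mean lies at the ±180° boundary, which is the physically sensible answer.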
NMR spectroscopy can provide information on both the structure and dynamics of proteins in physiologically relevant environments55. The chemical shift, the variation in the resonance frequency of a given kind of nucleus caused by variations in the electron distribution, is the most ubiquitous NMR parameter. To compare directly with the experimental data, ensemble-averaged chemical shifts were calculated for each equilibrium ensemble by using SHIFTX56 to calculate chemical shifts for the residues of the flexible loop in each conformation and then averaging over all conformations in the ensemble. For the apo simulations, starting from either the open or closed structure, the ensemble-averaged chemical shifts were compared with NMR measurements of apo yeast TIM57. For the simulation of the ligand-bound, closed structure, NMR data measured for G3P-bound yeast TIM57 were used. (The chemical shifts for the closed loop of the enzyme bound to G3P and to PGA are very similar; Yimin Xu, personal communication.) A strong linear correlation was found between the ensemble-averaged and experimentally measured chemical shifts for Cα (Fig. 6 (a)) and Cβ (Fig. 6 (b)) atoms, with a correlation coefficient r of 0.98 or higher in all cases. For the carbonyl C and amide N atoms of the flexible loop, although fewer experimental chemical shifts are available, the calculated ensemble averages deviate only slightly from the experimental values (Fig. 6 (c) and (d)). The agreement with the NMR chemical shifts provides additional evidence that the ensembles generated by the loop MC sampling are reasonable.
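The comparison described above amounts to averaging per-conformation predictions and correlating the averages with experiment. The sketch below assumes that the per-conformation shifts have already been produced by a predictor such as SHIFTX and loaded into an array; it does not call SHIFTX itself, and the function names and the use of NaN to mark missing experimental values are illustrative assumptions.

```python
import numpy as np


def ensemble_average_shifts(per_conformation_shifts):
    """Average predicted shifts over an ensemble.

    per_conformation_shifts has shape (n_conformations, n_atoms).
    Returns one averaged shift per atom.
    """
    shifts = np.asarray(per_conformation_shifts, dtype=float)
    return shifts.mean(axis=0)


def pearson_r(calculated, experimental):
    """Pearson correlation coefficient, skipping atoms with missing (NaN) values."""
    calc = np.asarray(calculated, dtype=float)
    expt = np.asarray(experimental, dtype=float)
    valid = ~(np.isnan(calc) | np.isnan(expt))
    if valid.sum() < 2:
        raise ValueError("need at least two atoms with both values present")
    return float(np.corrcoef(calc[valid], expt[valid])[0, 1])
```

A typical use would be `pearson_r(ensemble_average_shifts(predicted), measured)`, evaluated separately for each nucleus type (Cα, Cβ, C, N), since the shift ranges of different nuclei are not comparable.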
We note that the experimental chemical shifts were measured at 300 K, while our simulations were performed at 600 K. This is because at 300 K it is difficult to observe the conformational transitions between the open and closed states. We suspect, but cannot prove, that this is due in part to 1) the well-known tendency of generalized Born implicit solvent models to over-stabilize salt bridges, 2) the effect of constraining the omega angles, as well as the bond angles and lengths, in addition to the loop closure condition, and 3) sampling only the loop and not the remainder of the protein. Using a higher temperature overcomes all of these effects, and reasonable ensembles are generated that agree with the NMR chemical shifts. Because the Monte Carlo sampling scheme does not perturb degrees of freedom outside the loop, the overall structure is preserved, and a higher-temperature sampling protocol can still provide physical insights. The efficiency gained by sampling a lower-dimensional space, while still obtaining a reasonable estimate of ensemble properties, motivates the use of this set of approximations.
As a second application, we applied our sampling method to other protein structures, solved by NMR, that have loops with differing flexibility, in order to evaluate our ability to distinguish flexible and rigid loops within the same protein. The conformational ensembles from equilibrium simulations for both the flexible and rigid loops are shown in Fig. 7 for three proteins with PDB ID 1H2O (a), 1XWE (b), and 1Q9P (c), sampled at 600 K (left) and 300 K (right). These results clearly show that the loop residues that are flexible in the experimentally derived structures are consistently more flexible in the sampled ensemble, at either temperature, than the loop residues that are relatively rigid in the same NMR structures. To further quantify these results, the root mean square fluctuation (RMSF) of the heavy atoms in both loops was calculated for the sampled and NMR models, as shown in Table 1. We recognize that the set of NMR models for each protein cannot be viewed as a true ensemble, but the qualitative agreement is nonetheless encouraging. Thus, for studying protein loop flexibility, our method is a viable alternative to molecular dynamics simulations, which have also been used successfully to obtain ensembles in quantitative agreement with NMR data. In the cases examined here, the differences in rigidity appear to be related simply to the level of solvent exposure, i.e., flexible loops are more solvent exposed and have less interaction with their neighbors. Across the simulations of all studied protein systems (the three NMR targets and TIM), the average acceptance ratio is about 14%.
Table 1.
Heavy atom RMSF | 1H2O, NMR | 1H2O, POSH 600 K | 1H2O, POSH 300 K | 1Q9P, NMR | 1Q9P, POSH 600 K | 1Q9P, POSH 300 K | 1XWE, NMR | 1XWE, POSH 600 K | 1XWE, POSH 300 K
---|---|---|---|---|---|---|---|---|---
Flexible loop | 2.75 | 1.64 | 0.75 | 3.51 | 1.27 | 1.06 | 4.83 | 2.50 | 1.10
Rigid loop | 0.49 | 0.40 | 0.15 | 1.25 | 0.38 | 0.18 | 1.03 | 0.50 | 0.31
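A heavy-atom RMSF of the kind reported in Table 1 can be computed from a set of superimposed models roughly as follows. This is a sketch under stated assumptions: the coordinates are taken to be already aligned on a common reference frame, the single per-loop value is taken as the root mean square over atoms of the per-atom fluctuations, and the function names are hypothetical. The text does not specify which of the possible conventions was used.

```python
import numpy as np


def per_atom_rmsf(coords):
    """Per-atom RMSF from coordinates of shape (n_models, n_atoms, 3).

    The models are assumed to be superimposed beforehand.
    """
    xyz = np.asarray(coords, dtype=float)
    mean_positions = xyz.mean(axis=0)                 # (n_atoms, 3)
    sq_dev = ((xyz - mean_positions) ** 2).sum(axis=2)  # (n_models, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))               # (n_atoms,)


def loop_rmsf(coords):
    """Single RMSF value for a loop: RMS of the per-atom RMSF values (an assumed convention)."""
    return float(np.sqrt((per_atom_rmsf(coords) ** 2).mean()))
```

Applying `loop_rmsf` separately to the heavy atoms of the flexible and rigid loops, for both the NMR models and the sampled ensemble, would give numbers directly comparable in the way Table 1 compares them.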
Our current approach varies only the ϕ and ψ angles, as they are the most flexible degrees of freedom, but it is possible to include any other degrees of freedom in the MC scheme in any desired combination. We are working on a further version of the algorithm that allows the omega angles, as well as bond lengths and angles, to fluctuate more freely, which may permit lower-temperature sampling of systems of this type. Although in the present study we have considered solvation effects only implicitly, including water molecules explicitly in the simulation is possible in principle. However, water molecules in the immediate vicinity of a loop would lead to steric clashes whenever a large backbone move is attempted, which would reduce the efficiency of the present approach.
Acknowledgments
This work was supported in part by grants from NIH-NIGMS, GM081710 (to MPJ and EAC), GM086602 (to MPJ), and R01-GM090205 (to EAC). MPJ is a consultant to Schrödinger LLC.
References
- 1. Jones D. Curr Opin Struct Biol. 1997;7:377. doi: 10.1016/s0959-440x(97)80055-3.
- 2. Fiser A, Do R, Sali A. Protein Sci. 2000;9:1753. doi: 10.1110/ps.9.9.1753.
- 3. Al-Lazikani B, Jung J, Xiang Z, Honig B. Curr Opin Struct Biol. 2001;5:51. doi: 10.1016/s1367-5931(00)00164-2.
- 4. Jacobson MP, Pincus D, Rapp C, Day T, Honig B, Shaw D, Friesner R. Proteins. 2004;55:351. doi: 10.1002/prot.10613.
- 5. Meirovich H. Chem Phys Lett. 1977;45:389.
- 6. Baysal C, Meirovich H. J. Phys. Chem. A. 1997;101:2185.
- 7. Mihailescu M, Meirovitch H. J. Phys. Chem. B. 2009;113:7950. doi: 10.1021/jp900308y.
- 8. Lolis E, Abler T, Davenport R, Rose D, Hartman F, Petsko G. Biochemistry. 1990;29:6609. doi: 10.1021/bi00480a009.
- 9. Lolis E, Petsko G. Biochemistry. 1990;29:6619. doi: 10.1021/bi00480a010.
- 10. Dar A, Lopez M, Shokat K. Chemistry and Biology. 2008;15:1015. doi: 10.1016/j.chembiol.2008.09.007.
- 11. Padlan E. Adv. Protein Chem. 1996;49:57. doi: 10.1016/s0065-3233(08)60488-x.
- 12. Xu J, Davis M. Immunity. 2000;13:37. doi: 10.1016/s1074-7613(00)00006-6.
- 13. Wong S, Jacobson MP. Proteins. 2010 in press.
- 14. Rapp C, Pollack R. Proteins. 2005;60:103. doi: 10.1002/prot.20492.
- 15. Wong S, Jacobson MP. Proteins. 2008;71:153. doi: 10.1002/prot.21666.
- 16. Yi M, Tjong H, Zhou H. Proc. Natl. Acad. Sci. USA. 2008;105:8280. doi: 10.1073/pnas.0710530105.
- 17. Massi F, Wang C, Palmer AG. Biochemistry. 2006;45:10787. doi: 10.1021/bi060764c.
- 18. Go N, Scheraga H. Macromolecules. 1970;3:178.
- 19. Bruccoleri RE, Karplus M. Macromolecules. 1985;18:2767.
- 20. Dodd LR, Boone TD, Theodorou DN. Mol Phys. 1993;78:961.
- 21. Deem M, Bader J. Mol.Phys. 1996;87:1245.
- 22. Wu MG, Deem MW. Mol. Phys. 1999;97:559.
- 23. Dinner A. J Comput Chem. 2000;21:1132.
- 24. Ulmschneider JP, Jorgensen WL. J Chem Phys. 2003;118:4261.
- 25. Nilmeier J, Jacobson MP. J. Chem. Theory Comput. 2009;5:1968. doi: 10.1021/ct8005166.
- 26. Coutsias EA, Seok CL, Jacobson MP, Dill KA. J Comput Chem. 2004;25:510. doi: 10.1002/jcc.10416.
- 27. Hayward S, Kitao A. Biophysical Journal. 2010;98:1976. doi: 10.1016/j.bpj.2010.01.017.
- 28. Wedemeyer WJ, Scheraga HA. J Comp Chem. 1999;20:819. doi: 10.1002/(SICI)1096-987X(199906)20:8<819::AID-JCC8>3.0.CO;2-Y.
- 29. Wu MG, Deem MW. J. Chem. Phys. 1999;111:6625.
- 30. Cortes J, Simeon T, Remaud-Simeon M, Tran V. J Comput Chem. 2004;25:956. doi: 10.1002/jcc.20021.
- 31. Noonan K, O’Brien D, Snoeyink J. Int J Robotics Res. 2005;24:971.
- 32. Milgram R, Liu G, Latombe J. J Comput Chem. 2008;29:50. doi: 10.1002/jcc.20755.
- 33. Favrin G, Irbäck A, Sjunnesson F. J Chem Phys. 2001;114:8154.
- 34. Wang L-CT, Chen CC. IEEE TRANS ROBOT. AUTOM. 1991;7:489.
- 35. Cahill S, Cahill M, Cahill K. J Comp Chem. 2003;24:1364. doi: 10.1002/jcc.10245.
- 36. Canutescu A, Dunbrack R. Protein Sci. 2003;12:963. doi: 10.1110/ps.0242703.
- 37. Lee A, Streinu I, Brock O. Phys Biol. 2005;2:108. doi: 10.1088/1478-3975/2/4/S05.
- 38. Stoer J, Bulirsch R. Numerical Analysis. Second ed. Berlin: Springer; 1991.
- 39. Coutsias EA, Seok C, Wester MJ, Dill KA. Int J. Quant. Comp. 2006;106:176.
- 40. Mandell DJ, Coutsias EA, Kortemme T. Nature Methods. 2009;6:551. doi: 10.1038/nmeth0809-551.
- 41. Gibbs JW, Wilson EB. Vector Analysis. First ed. New Haven: Yale University Press; 1901.
- 42. Pollock SN, Coutsias EA. Numerical Analysis of Inverse Kinematic Algorithms. 2011 preprint.
- 43. Hoffman D, Knapp E-W. European Biophysical Journal. 1996;24:387.
- 44. Nilmeier J, Jacobson MP. J Chem Theory Comput. 2008;4:835. doi: 10.1021/ct700334a.
- 45. Hetenyi B, Bernacki K, Berne B. J. Chem. Phys. 2002;117:8203. doi: 10.1063/1.1755195.
- 46. Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. J. Phys. Chem. B. 2001;105:6474.
- 47. Jorgensen W, Maxwell D, Tirado-Rives J. J. Am. Chem. Soc. 1996;118:11225.
- 48. Wong S, Bernacki K, Jacobson MP. J. Phys. Chem. B. 2005;109:5249. doi: 10.1021/jp046333q.
- 49. Ghosh A, Rapp C, Friesner R. J Phys Chem B. 1998;102:10983.
- 50. Gallicchio E, Zhang L, Levy R. J Comput Chem. 2002;23:517. doi: 10.1002/jcc.10045.
- 51. Joseph D, Petsko G, Karplus M. Science. 1990;249:1425. doi: 10.1126/science.2402636.
- 52. Derreumaux P, Schlick T. Biophys. J. 1998;74:72. doi: 10.1016/S0006-3495(98)77768-9.
- 53. Williams JC, McDermott AE. Biochemistry. 1995;34:8309. doi: 10.1021/bi00026a012.
- 54. Davenport R, Bash P, Seaton B, Karplus M, Petsko G, Ringe D. Biochemistry. 1991;30:5821. doi: 10.1021/bi00238a002.
- 55. Teng Q. Protein Structure Determination from NMR Data. In: Lee W, editor. Structural Biology: Practical NMR Applications. First ed. Berlin: Springer; 2005.
- 56. Neal S, Nip A, Zhang H, Wishart D. J Biomol NMR. 2003;3:215. doi: 10.1023/a:1023812930288.
- 57. Xu Y, Lorieau J, McDermott AE. J. Mol. Biol. 2010;397:233. doi: 10.1016/j.jmb.2009.10.043.