Abstract
Self-guided Langevin dynamics (SGLD) is a method to accelerate conformational searching. This method is unique in that it selectively enhances and suppresses molecular motions based on their frequency to accelerate conformational searching without modifying energy surfaces or raising temperatures. It has been applied to studies of many long time scale events, such as protein folding. Recent progress in the understanding of the conformational distribution in SGLD simulations also makes SGLD an accurate method for quantitative studies. The SGLD partition function provides a way to convert the SGLD conformational distribution to the canonical ensemble distribution and to calculate ensemble average properties through reweighting. Based on the SGLD partition function, this work presents a force-momentum-based self-guided Langevin dynamics (SGLDfp) simulation method to directly sample the canonical ensemble. This method includes interaction forces in its guiding force to compensate for the perturbation caused by the momentum-based guiding force so that it can approximately sample the canonical ensemble. Using several example systems, we demonstrate that SGLDfp simulations can approximately maintain the canonical ensemble distribution and significantly accelerate conformational searching. With optimal parameters, SGLDfp and SGLD simulations can cross energy barriers of more than 15 kT and 20 kT, respectively, at rates similar to those at which LD simulations cross energy barriers of 10 kT. The SGLDfp method is size extensive and works well for large systems. For studies where preserving accessible conformational space is critical, such as free energy calculations and protein folding studies, SGLDfp is an efficient approach to search and sample the conformational space.
INTRODUCTION
The self-guided molecular dynamics (SGMD) (Refs. 1 and 2) and the self-guided Langevin dynamics (SGLD) (Ref. 3) simulation methods were developed for efficient conformational searching and have found many applications for the study of rare events such as protein folding,4, 5, 6, 7, 8, 9, 10, 11, 12 ligand binding,13 docking,14 conformational transition,15, 16, 17 and surface absorption.18, 19, 20, 21 While SGMD/SGLD can accelerate slow events to an affordable time scale, the perturbation in conformational distribution due to the self-guiding force was not quantitatively understood until recently.22 A common practice for SGMD or SGLD simulations is to limit the guiding factor to a small range so that the effect on conformational distribution is very small and can be neglected.3
For quantitative studies, it is important to obtain correct conformational distributions while accelerating conformational searching. Andricioaei et al. proposed a Monte Carlo procedure called the momentum-enhanced hybrid Monte Carlo method to include the benefit of the guiding force while preserving the ensemble average properties.23 In molecular dynamics simulations, the difficulty in characterizing the effect of the guiding force on ensemble distributions is mainly due to the lack of quantitative definition for the low-frequency motion enhanced in SGLD simulations. Recently, a quantitative understanding of the guiding effect on the conformational distribution has been achieved.22 The low-frequency properties are defined with the local averaging mechanism used for the guiding force calculation. The partition function of the SGLD ensemble is derived based on the separation of the low-frequency properties and the high-frequency properties. Based on the SGLD partition function, the conformational distribution obtained in SGLD simulations can be converted to the canonical ensemble distribution, and ensemble average properties can be calculated from the SGLD simulations through reweighting.
Based on the SGLD partition function, we find that the perturbation due to the momentum-based guiding force can be corrected with the force-based guiding force previously used in the SGMD simulation method.1, 2 Therefore, by combining the force-based guiding force with the momentum-based guiding force, this work develops a simulation method that achieves rapid conformational sampling while approximately maintaining the canonical ensemble. For convenience of reference, we designate this simulation method as the force-momentum-based self-guided Langevin dynamics, abbreviated as SGLDfp. The details of the derivation are described in the Theory and Method section. Some demonstrations and applications of this method are provided in the Results and Discussions section.
THEORY AND METHOD
Most simulation methods improve conformational searching and sampling through either modifying energy surfaces or raising temperatures so that energy barriers can be crossed easily. The SGLD method is unique in that it selectively enhances and suppresses molecular motions based on their frequency to accelerate conformational searching without modifying energy surfaces or raising temperatures. The concept of the low-frequency and high-frequency properties plays a central role in understanding the SGLD method. Even though the purpose of this work is to present the SGLDfp method, we feel it is necessary to provide a brief description of this concept and how it leads to the SGLD partition function.22
The low-frequency and high-frequency properties
Thermal motion in a molecular system makes all dynamic behaviors possible, such as diffusion, protein folding, and signal transduction. Thermal motion has a distribution of frequencies, from high-frequency motions such as chemical bond vibrations to low-frequency motions such as protein folding. High-frequency motions can repeat in a short time scale and are often easy to study in molecular simulations. Low-frequency motions are important for many macroscopic behaviors such as protein folding and ligand binding, but are often beyond the time scale accessible by molecular simulations with current computing resources.
Corresponding to the low-frequency motion and the high-frequency motion there are the low-frequency conformational space, Ωlf, and the high-frequency conformational space, Ωhf. The total conformational space is a combination of the two: Ω = Ωlf · Ωhf. A simple example is that in a box of water, water molecules diffuse around the box while their bonds stretch and bend quickly. The diffusion motion occurs in the box space while the bond vibrations cover the bond length and bond angle ranges. It is reasonable to separately consider the slow diffusion and the fast vibrations and in many practices, the fast vibrations are completely removed with constraint methods such as SHAKE (Ref. 24) or the semi-flexible constraint dynamics.25
Assume the potential energy surfaces in the low- and high-frequency conformational spaces are Elf and Ehf, respectively. If we assume the temperatures of the low- and high-frequency motions are Tlf and Thf, which normally equal the system temperature, T, we can write the partition functions in the low- and high-frequency conformational spaces separately:
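$$\Theta_{lf} = \sum_{\Omega_{lf}} \exp\left(-\frac{E_{lf}}{kT_{lf}}\right) \tag{1}$$
$$\Theta_{hf} = \sum_{\Omega_{hf}} \exp\left(-\frac{E_{hf}}{kT_{hf}}\right) \tag{2}$$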
Here, k is the Boltzmann constant. By neglecting the coupling between the two conformational distributions, the total partition function can be expressed as
$$\Theta = \sum_{\Omega} \exp\left(-\frac{E_{lf}}{kT_{lf}} - \frac{E_{hf}}{kT_{hf}}\right) \tag{3}$$
In the canonical ensemble, temperatures in all conformational spaces are the same, Tlf = Thf = T, and the total potential energy of the system is a sum of the two, Ep = Elf + Ehf. Therefore, the partition function from Eq. 3 becomes
$$\Theta = \sum_{\Omega} \exp\left(-\frac{E_{p}}{kT}\right) \tag{4}$$
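The factorization described above can be checked numerically on a toy system. The following is a minimal sketch, assuming a small set of hypothetical low- and high-frequency state energies (the lists `E_LF` and `E_HF` and all function names are illustrative and not from the original work). It shows that, when coupling is neglected, the product of the two partition functions equals the direct sum over all combined states, and that with equal temperatures this is the canonical sum over the total energy Ep = Elf + Ehf.

```python
import itertools
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def partition_function(energies, temperature):
    """Sum of Boltzmann factors over a list of state energies (kcal/mol)."""
    return sum(math.exp(-e / (K_B * temperature)) for e in energies)


def combined_partition_function(e_lf, e_hf, t_lf, t_hf):
    """Direct sum over every (low, high) state pair, with no coupling term."""
    return sum(
        math.exp(-a / (K_B * t_lf) - b / (K_B * t_hf))
        for a, b in itertools.product(e_lf, e_hf)
    )


# Hypothetical toy energies (kcal/mol), chosen only for illustration.
E_LF = [0.0, 0.5, 1.2]
E_HF = [0.0, 0.1, 0.3, 0.4]

# Product of the separate partition functions at a common temperature.
z_product = partition_function(E_LF, 300.0) * partition_function(E_HF, 300.0)
# Direct sum over the combined conformational space.
z_direct = combined_partition_function(E_LF, E_HF, 300.0, 300.0)
# Canonical sum over the total energy of each combined state.
z_total = partition_function([a + b for a in E_LF for b in E_HF], 300.0)
```

The same product identity holds when the two temperatures differ, which is the situation the SGMD/SGLD methods create on purpose.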
Slow conformational searching in molecular simulation is often related to weak low-frequency motion. The central idea behind the SGMD/SGLD simulation method is to accelerate conformational searching by enhancing the low-frequency motion. In other words, in SGMD/SGLD simulations, the temperatures in the low-frequency and high-frequency conformational spaces are no longer the same. Unlike high temperature simulations, which accelerate motions at all frequencies, SGMD/SGLD simulations increase the temperature of the low-frequency motion, Tlf, and reduce the temperature of the high-frequency motion, Thf, to achieve efficient conformational searching and sampling. The overall temperature in SGMD/SGLD simulations remains unchanged.
To quantitatively describe the conformational distribution of a SGLD simulation, we proposed previously to define a low-frequency property by a method we designated as the local averaging.22 Let us consider a system of N particles described by their positions, qi, and momenta, pi. For any conformation, Ω = {qi}, we define L as its local conformational space where all conformations are within a certain distance from Ω. For a canonical ensemble at temperature T, a local average is an ensemble average over the local conformational space
(5) |
Here, P represents any conformational property, and Ep is the potential energy at conformation Ω. The normalizing factor of this average is called the local partition function. As shown in Eq. 5, for computational feasibility, this local ensemble average is approximately calculated as a time average over a local average time, tL, which is further approximately calculated as an evolving average. This evolving average is denoted with a “∼” cap: $\tilde{P}$. Because all local averages in this work are calculated as evolving averages, we also use “〈P〉L” to represent evolving averages when the cap “∼” is not easy to print.
Through the local averaging, high-frequency components are suppressed and low-frequency components remain. Therefore, $\tilde{P}$ represents the low-frequency portion of property P. Correspondingly, the high-frequency property is the deviation from the low-frequency property, $P - \tilde{P}$.
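The separation into low- and high-frequency portions can be illustrated with a short sketch. It assumes the evolving average is updated as an exponential running average with weight δt/tL per step, which is a common form for this kind of average but is an assumption here, since the exact update rule is not reproduced in this text. The function names and the choice to seed the average with the first sample are illustrative.

```python
def evolving_average(values, dt, t_local):
    """Running average with memory time t_local (assumed update rule).

    Each step mixes the previous average with the new sample using the
    weight dt / t_local, so fast fluctuations are smoothed out.
    """
    w = dt / t_local
    avg = values[0]  # seed with the first sample (an illustrative choice)
    out = []
    for v in values:
        avg = (1.0 - w) * avg + w * v
        out.append(avg)
    return out


def split_frequencies(values, dt, t_local):
    """Return (low, high) where high is the deviation from the low part."""
    low = evolving_average(values, dt, t_local)
    high = [v - l for v, l in zip(values, low)]
    return low, high
```

For example, with a 1 fs time step and tL = 0.2 ps the weight is 0.005, so a signal that flips sign every step is almost entirely assigned to the high-frequency part, while a constant signal is assigned entirely to the low-frequency part.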
With the evolving averaging, many low-frequency properties can be obtained efficiently in a molecular simulation, for example, low-frequency forces
(6) |
low-frequency momenta
(7) |
and low-frequency potential energies
(8) |
We can calculate some derived low-frequency quantities from these low-frequency properties, such as the low-frequency temperature,
(9) |
Here, NDF is the number of degrees of freedom in all conformational spaces, mi is the mass of particle i, and the summation runs over all atoms in the system. The low-frequency temperature, $\tilde{T}_{lf}$, measures the kinetic energy of the low-frequency motion in the temperature unit, K. Similarly, the high-frequency temperature, $\tilde{T}_{hf}$, measures the kinetic energy of the high-frequency motion. Please note that $\tilde{T}_{lf}$ is not Tlf: $\tilde{T}_{lf}$ is calculated with the total number of degrees of freedom, NDF, while Tlf should be calculated with only the number of degrees of freedom in the low-frequency space. The ratio of $\tilde{T}_{lf}$ to T reflects the ratio of the kinetic energy of the low-frequency motion to the total kinetic energy.
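As a rough illustration of how such temperature-like measures can be evaluated, the sketch below assumes the usual kinetic form, the sum of |p|²/m over particles divided by NDF·k, applied to the low-frequency momenta and to the high-frequency deviations. This form is an assumption based on the description above, the function names are hypothetical, and the sketch evaluates a single snapshot rather than the averaged quantity used in a simulation. Momenta are taken in units such that |p|²/m is in kcal/mol.

```python
K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def kinetic_temperature(momenta, masses, n_df):
    """Temperature-like measure: sum_i |p_i|^2 / m_i over (N_DF * k)."""
    twice_kinetic = sum(
        sum(c * c for c in p) / m for p, m in zip(momenta, masses)
    )
    return twice_kinetic / (n_df * K_B)


def frequency_temperatures(momenta, low_momenta, masses, n_df):
    """Return (low, high) temperatures, both using the total N_DF."""
    high_momenta = [
        tuple(c - lc for c, lc in zip(p, lp))
        for p, lp in zip(momenta, low_momenta)
    ]
    return (
        kinetic_temperature(low_momenta, masses, n_df),
        kinetic_temperature(high_momenta, masses, n_df),
    )
```

Because both values are normalized by the total NDF, neither is the physical temperature of its own subspace, which matches the distinction drawn in the text between the low-frequency temperature and Tlf.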
The self-guided Langevin dynamics
The equation of motion for the self-guided Langevin dynamics (Ref. 3) has the following form:
$$\dot{p}_i = f_i + g_i - \gamma_i p_i + R_i \tag{10}$$
where $\dot{p}_i$ and fi are the time derivative of momentum and the interaction force of particle i, respectively. Ri is a random force, which is related to the mass, mi, the collision frequency, γi, and the simulation temperature, T, by the following equation:
(11) |
where . The random force is independent in each direction.
In Eq. 10, gi is called the guiding force. When gi = 0, Eq. 10 reduces to the equation of motion for the Langevin dynamics. In SGLD simulations, gi is calculated from the momentum
(12) |
Here, the momentum-based guiding force carries its own label to distinguish it from the force-based guiding force, which will be introduced later. The momentum guiding factor is written with a subscript “i” to indicate that it can be set differently for different atoms. The parameter, ξp, is an energy conservation factor used to cancel any energy input from the guiding force
(13) |
Here, the summation runs over all particles in a simulation system. From Eq. 13, we have
(14) |
The guiding forces defined by Eqs. 12, 13 produce no net work on a simulation system. They act in a direction orthogonal to the velocities. The term containing the low-frequency momentum accelerates the low-frequency motion, and the term containing ξp damps the high-frequency motion. Overall, the thermal temperature is maintained.
Conformational distribution in SGLD
In SGLD simulations, the guiding force has two effects in both the low-frequency conformational space and the high-frequency conformational space. First, it may correlate with the energy surface to have certain bias effects. The effective energy surfaces can be expressed as Elf = λlf〈Ep〉L and Ehf = λhf(Ep − 〈Ep〉L). The factors, λlf and λhf, are called the low-frequency energy factor and the high-frequency energy factor, respectively. Second, the guiding force enhances motions in the low-frequency conformational space and reduces motions in the high-frequency conformational space. These effects cause changes in the effective temperatures in the two conformational spaces: Tlf = T/χlf and Thf = T/χhf. The factors, χlf and χhf, are called the low-frequency collision factor and the high-frequency collision factor, respectively. Therefore, according to Eq. 3, the configurational partition function of an SGLD simulation has the following form:
$$\Theta_{SGLD} = \sum_{\Omega} \exp\left(-\frac{\lambda_{lf}\chi_{lf}\langle E_p\rangle_L}{kT} - \frac{\lambda_{hf}\chi_{hf}\left(E_p - \langle E_p\rangle_L\right)}{kT}\right) \tag{15}$$
Here, the summation is over all microscopic states. For computational feasibility, the local average potential energy is approximated with the evolving average calculated with Eq. 8. This approximation makes it computationally feasible to estimate conformational distributions in the SGLD simulations.
The factors, λlf, λhf, χlf, and χhf, are related to the guiding force and the simulation systems. Their direct relations with simulation conditions are difficult to derive. Instead, based on their definitions, we can calculate them from quantities observable in the SGLD simulations. Through the local averaging, the equation of motion (Eq. 10) can be separated into that in the low-frequency conformational space,
(16) |
and that in the high-frequency conformational space,
(17) |
These two equations correspond to motions on energy surfaces Elf and Ehf, respectively. Based on the force-energy relation, we have
(18) |
(19) |
To achieve numerical stability, we multiply both sides of Eq. 18 by the low-frequency interaction force and both sides of Eq. 19 by the high-frequency interaction force. After summing over all particles and all simulation conformations, we obtain the following expressions:
(20) |
(21) |
From Eqs. 20, 21, we can see that λlf represents the average projections of the total low-frequency force onto the low-frequency interaction forces and λhf represents the average projections of the total high-frequency force onto the high-frequency interaction forces.
The factors, χlf and χhf, describe the temperature change in low- and high-frequency conformational spaces
(22) |
(23) |
Here, we assume the temperatures in the low-frequency and high-frequency conformational spaces are proportional to the low-frequency and high-frequency temperatures. The reference low-frequency temperature is the low-frequency temperature obtained when the momentum guiding factor is zero, i.e., in an LD simulation. To calculate χlf and χhf with the above equations, one needs to know the reference low-frequency temperature from a separate SGLD simulation with a zero guiding factor, which is actually an LD simulation. By definition, the reference low-frequency temperature depends on the simulation conditions and the local average time, tL.
Alternatively, we can calculate χlf and χhf directly from a SGLD simulation. From Eq. 11, we know that the temperature is inversely proportional to the collision frequency, if the random forces remain the same. Therefore, a change in temperature indicates a change in the effective collision frequency. For the low-frequency motion, a change in temperature from T to Tlf = T/χlf corresponds to a change in the collision frequency from γi to χlfγi. Similarly, for the high-frequency motion, a change in temperature from T to Thf = T/χhf corresponds to a change in the collision frequency from γi to χhfγi. The guiding forces in the equations of motion (Eqs. 16, 17) behave like a modification in the collision frequency
(24) |
(25) |
To achieve numerical stability, we multiply both sides of Eq. 24 by and both sides of Eq. 25 by . After summing over all particles and all simulation conformations, we obtain the following expressions:
(26) |
(27) |
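It is more accurate to estimate χlf and χhf from the low-frequency and high-frequency temperatures using Eqs. 22, 23, if the reference low-frequency temperature is available from a separate simulation with a zero guiding factor. Otherwise, they can be estimated from Eqs. 26, 27 without knowing the reference value.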
From the SGLD partition function (Eq. 15), we can calculate the weighting factor, wSGLD, for each conformation
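$$w_{SGLD} = \exp\left(\frac{\left(\lambda_{lf}\chi_{lf} - 1\right)\langle E_p\rangle_L}{kT} + \frac{\left(\lambda_{hf}\chi_{hf} - 1\right)\left(E_p - \langle E_p\rangle_L\right)}{kT}\right) \tag{28}$$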
Any canonical ensemble average, 〈P〉, can be calculated in a SGLD simulation as a weighted average
$$\langle P \rangle = \frac{\sum w_{SGLD}\, P}{\sum w_{SGLD}} \tag{29}$$
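The reweighting step can be sketched in a few lines. The weight below is the ratio of the canonical Boltzmann factor to the SGLD factor implied by the effective energies and temperatures stated for the SGLD partition function (Elf = λlf〈Ep〉L, Ehf = λhf(Ep − 〈Ep〉L), Tlf = T/χlf, Thf = T/χhf). The function and argument names are hypothetical, and the sketch is illustrative rather than a reproduction of any particular implementation.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def sgld_weight(e_p, e_p_low, lam_lf, chi_lf, lam_hf, chi_hf, temperature):
    """Canonical-over-SGLD probability ratio for one conformation.

    e_p is the potential energy and e_p_low its local (evolving) average,
    both in kcal/mol.
    """
    kt = K_B * temperature
    return math.exp(
        (lam_lf * chi_lf - 1.0) * e_p_low / kt
        + (lam_hf * chi_hf - 1.0) * (e_p - e_p_low) / kt
    )


def reweighted_average(values, weights):
    """Weighted average of a property over sampled conformations."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

When λlfχlf = 1 and λhfχhf = 1, every weight equals 1 and the weighted average reduces to a plain average, which is the condition the SGLDfp method aims to satisfy.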
In SGLD simulations, the thermal motions in the low-frequency and high-frequency conformational spaces are scaled by χlf and χhf, respectively. The conformational searching ability increases with the thermal temperature in the low-frequency conformational space. Because SGLD simulations do not change the simulation temperature, the thermal motion in the high-frequency conformational space decreases as that in the low-frequency conformational space increases. Therefore, the conformational searching ability can approximately be expressed as inversely proportional to χlf and directly proportional to χhf. To describe the conformational searching ability of an SGLD simulation, we define the self-guiding temperature in the following form:
(30) |
The self-guiding temperature, Tsg, provides a qualitative measure of the conformational searching ability in the unit of temperature. An SGLD simulation with a self-guiding temperature of Tsg has a conformational searching ability comparable to that of a high temperature simulation at T = Tsg. As can be seen from Eq. 30, for an LD simulation, the low-frequency temperature equals its reference value by definition, so Tsg = T. For an SGLD simulation with λ > 0, the low-frequency motion is enhanced, so the low-frequency temperature exceeds its reference value; therefore, Tsg > T. With λ < 0, the low-frequency temperature falls below its reference value; thus, Tsg < T. Tsg can be used as a guide for choosing the value of λ, which lacks a direct physical meaning. For example, it is reasonable to choose a value of λ to reach Tsg = 2T. However, when λ is large and Tsg is too large compared with T, it is difficult to obtain an accurate canonical ensemble through the reweighting with Eqs. 28, 29. Therefore, λ should be chosen to balance the acceleration in conformational searching and the accuracy in conformational reweighting.
Force-momentum-based self-guided Langevin dynamics
The SGMD method utilizes the local average forces, while the SGLD method uses the local average momentum to calculate the guiding force to achieve accelerated conformational searching. These two types of guiding forces have opposite bias effects on the low-frequency energy surface. The low-frequency force favors low states, just as normal forces do, while the low-frequency momentum favors high states, just as temperature does. These two types of low-frequency properties can be combined in such a way that the bias effects cancel each other. We call this kind of SGLD simulation SGLDfp.
In SGLDfp, the guiding force, gi, is a combination of forces and momenta in the following form:
(31) |
Here, λf is defined as the force guiding factor and ξf is defined as the force damping factor. The energy conservation factor, ξp, is calculated according to Eq. 13 with the guiding force defined by Eq. 31
(32) |
According to Eqs. 20, 21, the parameters λlf and λhf have the following relation to λf and ξf:
(33) |
(34) |
Here, , , and . These quantities can be estimated from simulations.
We can adjust the values of λf and ξf to make
$$\lambda_{lf}\,\chi_{lf} = 1 \tag{35}$$
and
$$\lambda_{hf}\,\chi_{hf} = 1 \tag{36}$$
Equation 35 will result in a canonical-like conformational distribution in the low-frequency space, and Eq. 36 will result in a canonical-like conformational distribution in the high-frequency space. When both Eqs. 35, 36 are satisfied, the partition function reduces to that of the canonical ensemble
$$\Theta_{SGLDfp} = \sum_{\Omega} \exp\left(-\frac{E_p}{kT}\right) \tag{37}$$
From Eqs. 33, 34, we can solve
(38) |
(39) |
This solution is approximate because , , χlf, and χhf also depend on λf and ξf. In many cases, Eqs. 38, 39 can further be simplified with approximations κlf ≈ 1 and κhf ≈ 0.
By using λf and ξf calculated from Eqs. 38, 39, we can obtain a conformational distribution similar to that of the canonical ensemble, as shown by Eq. 37. Therefore, ensemble average properties can be directly estimated from an SGLDfp simulation.
Four parameters, the momentum guiding factor, ξp, λf, and ξf, are used to define the guiding force in Eq. 31. To maintain a canonical ensemble, only one parameter, e.g., the momentum guiding factor, is independent while the others depend on it. If we are interested in maintaining a canonical distribution only in the low-frequency conformational space, in addition to the momentum guiding factor, we can adjust either λf or ξf to satisfy Eq. 35 and at the same time to achieve an optimal efficiency in conformational searching. In this work, we only focus on the case of maintaining a full canonical ensemble. A leap-frog Verlet simulation algorithm of the SGLDfp method is described in the Appendix. The force guiding parameters λf and ξf, together with χlf, χhf, and the other quantities they depend on, are estimated during simulation according to Eqs. 26, 27, 33, 34, 35, 36, 37, 38, 39, with the ensemble average properties in these equations replaced by long evolving averages calculated in the following way:
(40) |
Here, P(t) represents an instantaneous value of any property, and represents its estimated average. The estimation time, test, determines the estimation accuracy. Typically, we set test = 10tL.
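To run an SGLDfp simulation, one can either set the momentum guiding factor and let λf and ξf be calculated from Eqs. 37, 38, or set a target self-guiding temperature (Ref. 22) and adjust the momentum guiding factor so that Tsg reaches the target. Because an increase in the momentum guiding factor always enhances the low-frequency motion, and so raises Tsg, the momentum guiding factor can be adjusted in the following way to make Tsg approach its target: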
(41) |
Then, λf and ξf will be calculated according to Eqs. 37, 38 in the same way as when is set. Because Tsg is a derived quantity, its value range is limited by the simulation temperature, system size, and other SGLDfp parameters. The value of must be set within its Tsg limits to produce a converging, . For example, one may set for a SGLDfp simulation. For systems of unknown Tsg limits, it is suggested to set instead of .
SIMULATION DETAILS
To demonstrate the application of the SGLDfp simulation method, we report the results for several simple systems. A leap-frog Verlet algorithm for the SGLDfp simulation, shown in the Appendix, has been implemented in CHARMM (Refs. 26 and 27) version 36. Because an SGLDfp simulation involves extra calculation only in the propagation of the equation of motion as compared to a normal LD simulation, the cost of an SGLDfp simulation is almost identical to that of an LD simulation for the same number of time steps. SGLDfp simulations, as well as SGLD simulations, do use additional memory due to the need to store the guiding forces, as well as some arrays for the weighting factor calculation. In all SGLD and SGLDfp simulations reported here, we set the momentum guiding factor instead of a target self-guiding temperature for an easy comparison. Because only the momentum guiding factor is set and it is the same for all particles, in the following description the guiding factor, λ, denotes the momentum guiding factor.
RESULTS AND DISCUSSIONS
The SGLDfp method includes forces in its guiding force to compensate for the perturbation on the conformational distribution caused by the momentum-based guiding force. As described in the Theory and Method section, a series of approximations is introduced to make an SGLDfp simulation possible. To validate these approximations and to demonstrate the conformational searching efficiency of this method, we present the simulation results of three model systems.
The skewed double well system
A skewed double well system represents the simplest system with an energy barrier to cross. This system has only one particle, and the particle moves on a skewed double well energy surface of the following function:
(42) |
In this work, we chose a = 500 kcal/(mol·Å²), b = 1 kcal/(mol·Å⁴), c = 0.25 kcal/(mol·Å), and w = 2 Å. Figure 1 shows the energy surface of this double well potential. This energy surface is designed in such a way that it restricts the particle to move near the y axis with two energy minima of different depths, −0.0038 kcal/mol and 0.4960 kcal/mol, along the y axis at (0, −0.0299 Å, 0) and (0, 1.9672 Å, 0), respectively. The potential is symmetric about the y axis with a strong dependence on the distance, rxz, from the y axis, where $r_{xz} = \sqrt{x^2 + z^2}$. The minimum transition energy from one well to the other well is 1.2578 kcal/mol at (0, 1.0627 Å, 0) between the two wells. Such a design forces the particle to have a high-frequency motion in the x–z direction and a low-frequency motion in the y direction. An argon atom was used to represent the particle. The simulations were carried out with a local average time, tL = 0.2 ps. A time step of 1 fs was used and the simulation length was 1000 ns for each simulation. The collision frequency was 10/ps unless otherwise noted.
This system was simulated at 80 K with different guiding factors to examine the ensemble distributions from the SGLD and SGLDfp simulations. At 80 K, the average y-energy of the system is 0.107 kcal/mol. The energy barrier height from the average y-energy to the transition energy is 7.24 kT and the energy difference between the two wells is 3.14 kT. Figure 2 compares the potential energy distributions in the SGLD simulations and the SGLDfp simulations. To illustrate the deviations from the LD result, we show the SGLD and SGLDfp results with λ up to 2, far beyond the recommended range of λ < 1. In the SGLD simulations, as can be seen in the top panel of Fig. 2, as λ increases, the distribution decreases in the low energy region and increases in the high energy region. The middle panel of Fig. 2 shows the reweighted energy distributions.22 Clearly, all curves converge fairly well to the one with λ = 0, except for the case when the guiding factor is very large, λ = 2, validating the SGLD reweighting mechanism (Eqs. 28, 29) and in turn, validating the SGLD partition function (Eq. 15). The bottom panel of Fig. 2 shows the results from the SGLDfp simulations. The densities with different guiding factors converge together, even with λ = 2, proving that the SGLDfp simulations can preserve the energy distribution to a reasonable degree of accuracy.
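The barrier height and well difference quoted in units of kT follow directly from the energies given above. The short check below is a sketch that uses the Boltzmann constant in kcal/(mol K); the constant's value and the variable names are supplied here for illustration, while the energies are those stated in the text.

```python
K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def in_kt(energy_difference, temperature):
    """Express an energy difference (kcal/mol) in units of kT."""
    return energy_difference / (K_B * temperature)


# Energies quoted in the text (kcal/mol).
E_MIN_LOW = -0.0038   # deeper well
E_MIN_HIGH = 0.4960   # shallower well
E_BARRIER = 1.2578    # minimum transition energy
E_AVG_80K = 0.107     # average y-energy at 80 K

barrier_80k = in_kt(E_BARRIER - E_AVG_80K, 80.0)     # about 7.24
well_gap_80k = in_kt(E_MIN_HIGH - E_MIN_LOW, 80.0)   # about 3.14
```

The same function applies at any other simulation temperature, given the average y-energy at that temperature.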
To further demonstrate the preservation of conformational distributions in SGLDfp simulations, we plot the conformational density as a function of the y coordinates in Fig. 3 at different guiding factors and in Fig. 4 at different local averaging times. The top panel of Fig. 3 shows the SGLD distributions at different guiding factors. There are two peaks with different heights, corresponding to the two skewed double wells. As λ increases, the left peak (the higher peak) decreases, while the right peak (the lower peak) grows. The middle panel of Fig. 3 shows the reweighted SGLD distributions. All distributions converge fairly well to the one with λ = 0, except for the one with λ = 2, validating the reweighting mechanism. The SGLDfp distributions are shown in the bottom panel of Fig. 3. The densities at different guiding factors almost overlap with each other, except for the one with λ = 2, proving that the SGLDfp simulation preserves the conformational distribution fairly well with λ < 1. When the guiding factor is too large, here, λ = 2, the approximation error becomes significant. Therefore, λ < 1 is recommended for the SGLD reweighting or the SGLDfp simulation.
In addition to the guiding factor, λ, the local average time, tL, is another important parameter for SGLD and SGLDfp simulations. The local average time tL determines the separation of the low-frequency and high-frequency properties. Figure 4 examines the distributions at various tL values. The guiding factors are all set to λ = 1, except for the LD simulation which corresponds to a SGLD simulation with λ = 0. Again, as compared to the SGLD results (top panel), the reweighted SGLD results (middle panel) and the SGLDfp results (bottom panel) have much improved agreements with the LD result.
One should keep in mind that there are certain approximations in the SGLD reweighting and the SGLDfp method. The major approximations are the use of evolving averages (Eqs. 5, 40) to replace local averages in partition functions (Eqs. 15, 37), and to replace ensemble averages in the parameter calculations (Eqs. 26, 27, 33, 34, 38, 39). These approximations break down and cause significant errors when the guiding factor exceeds a certain range. To quantitatively compare the LD, SGLD, and SGLDfp results, we plot their root-mean-square deviations (RMSDs) in Fig. 5. The upper panel and the lower panel of Fig. 5 show the RMSDs of the energy distributions, δρE, and the RMSDs of the y distributions, δρy, respectively. The RMSDs of the SGLD results are much larger than those of the reweighted SGLD results and the SGLDfp results. The deviations of the SGLD and SGLDfp results increase with the guiding factor. These results confirm that λ < 1 is a recommended range for a reasonable accuracy to calculate ensemble average properties.
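A comparison of this kind can be sketched as follows. The exact binning and normalization behind the reported RMSDs are not given in this text, so the sketch assumes equal-width bins, densities normalized to unit area, and a plain root-mean-square over bins; the function names are hypothetical.

```python
import math


def normalized_histogram(samples, lo, hi, n_bins):
    """Density histogram on [lo, hi) with equal-width bins (unit area)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if lo <= s < hi:
            # Clamp the index to guard against floating-point round-up.
            index = min(int((s - lo) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the histogram range")
    return [c / (total * width) for c in counts]


def rms_deviation(density, reference):
    """Root-mean-square deviation between two densities on the same bins."""
    if len(density) != len(reference):
        raise ValueError("densities must use the same number of bins")
    squared = sum((a - b) ** 2 for a, b in zip(density, reference))
    return math.sqrt(squared / len(density))
```

In use, the reference density would come from the LD trajectory and the compared density from an SGLD, reweighted SGLD, or SGLDfp trajectory binned over the same range.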
While the SGLDfp simulation preserves the ensemble distribution, it is interesting to see by how much it enhances conformational searching. Figure 6 shows the trajectories of the particle in the LD, SGLD, and SGLDfp simulations. Both the SGLD and SGLDfp simulations were run with a guiding factor, λ = 1. Clearly, both the SGLD and SGLDfp simulations increased transition rates as compared with the LD simulation. However, the SGLDfp simulation shows fewer transitions than the SGLD simulation.
The collision frequency, γ, plays an important role in Langevin dynamics. It also plays an important role in SGLD and SGLDfp because it directly determines the strength of the guiding force (see Eqs. 12, 31). Through this skewed double well system, we can examine its effect on SGLD and SGLDfp simulations.
We performed a series of SGLD and SGLDfp simulations with λ = 1 at various γ and T, and the transition rates are shown in Fig. 7. For the convenience of plotting in a logarithm scale, the transition count starts with 1. A transition of 1 means that the particle has never crossed the energy barrier. The collision frequency controls the diffusion and the temperature controls the relative energy barrier heights. At T = 100 K, 60 K, and 40 K, the average y-energies are 0.152, 0.0793, and 0.0561 kcal/mol, respectively. The energy differences between the global minimum and the transition barrier are 6.35 kT, 10.58 kT, and 15.87 kT, and the relative barrier heights from the average y-energies to the transition barrier are 5.56 kT, 9.89 kT, and 15.1 kT at T = 100 K, 60 K, and 40 K, respectively.
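One way to count transitions from a saved y trajectory is sketched below. It assumes a simple two-state assignment with a buffer region around the barrier so that recrossings near the barrier top are not counted; the bounds `left_bound` and `right_bound` and the function name are hypothetical choices, not the procedure used for Fig. 7. Following the text, the count starts at 1 so that it can be plotted on a logarithmic scale.

```python
def count_transitions(y_values, left_bound, right_bound):
    """Count well-to-well transitions along a y trajectory.

    A frame is in the left well if y < left_bound and in the right well
    if y > right_bound; frames in between keep the previous assignment.
    The count starts at 1, so 1 means the barrier was never crossed.
    """
    count = 1
    state = None
    for y in y_values:
        if y < left_bound:
            new_state = "left"
        elif y > right_bound:
            new_state = "right"
        else:
            continue  # barrier region: no change of state
        if state is not None and new_state != state:
            count += 1
        state = new_state
    return count
```

For the skewed double well, bounds on either side of the barrier position at y = 1.0627 Å, for example 0.5 Å and 1.5 Å, would be a natural illustrative choice.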
In Fig. 7, we can see that the transition rates of the LD simulations generally decrease with γ at all temperatures. At 40 K, only a few transitions or no transitions at all are observed in the LD simulations. Higher γ reduces diffusion and slows down all events in LD simulations, regardless of their energy barriers. The transition rates of both the SGLD and SGLDfp simulations are higher than those in the LD simulations, except at T = 40 K and γ ≤ 10/ps, where fewer than three transitions are observed for LD and SGLDfp and it is hard to tell the difference between LD and SGLDfp simulations. The accelerations increase with γ because the guiding force is proportional to γ, as shown in Eqs. 12, 31. Comparing the SGLD and the SGLDfp results, we can see that the SGLDfp simulations have far fewer transitions than the SGLD simulations. These results show that the SGLDfp method sacrifices some of the enhancement in conformational searching to maintain correct conformational distributions.
In Fig. 7, there are maxima in the transition-versus-γ curves from the SGLD and SGLDfp simulations. These maxima are due to the competition of two effects that control the transition: the energy barrier effect and the diffusion effect. An increase in γ will increase the guiding force, as well as the friction force. At low γ, the energy barrier controls the transition, and an increase in the guiding force helps energy barrier crossing more than the increase in the friction force slows it down. Therefore, an increase in γ results in more transitions. At high γ, the diffusion effect controls the transition and an increase in γ will result in fewer transitions. At γmax, the energy barrier effect is balanced by the diffusion effect.
Alanine dipeptide
Alanine dipeptide is the simplest molecule that is relevant to proteins. The conformation of this molecule is mainly characterized by two dihedral angles, ϕ: C–N–Cα–C and ψ: N–Cα–C–N. The CHARMM all-atom force field28 was used to describe the interactions. To simplify the example, a distance-dependent dielectric constant of 4r was used to represent the solvent screening effect. The cutoff distance was set to 100 Å to avoid any cutoff effect in the non-bonded interaction calculations within this small molecule.
All simulations were performed with a time step of 2 fs, and the SHAKE algorithm24 was employed to fix bond lengths. Each simulation was 200 ns in length, and conformations were saved every 2 ps for post-analysis. The SGLD and SGLDfp simulations were performed with a local average time of tL = 0.2 ps and a temperature of 300 K. A collision frequency of 10/ps was used for all these simulations.
Figure 8 compares the ϕ–ψ dihedral angle distributions of the alanine dipeptide in the LD, SGLD, and SGLDfp simulations. To show the details in low-density areas, the contours are drawn on an exponential scale. For this small molecule under the simulation conditions, LD samples the conformational space well. There is a major peak at (−90°, 170°) and a secondary peak near (−90°, −70°). There is also a trace region around (70°, −120°). Comparing the LD and SGLD distributions, we can see that the major peak from the SGLD simulation is significantly lower, while the trace region is much higher. The SGLD distribution has a broader baseline, and the valley around ψ = 30° is less deep. After reweighting, all characteristics of the LD distribution are recovered: the major peak is elevated, the trace region is reduced, and the baseline shrinks, which again demonstrates that the SGLD distribution can be converted to the LD distribution through reweighting. The ϕ–ψ distributions from the SGLDfp simulation and the LD simulation agree with each other fairly well. The root-mean-square differences from the LD distribution are 1.08 and 0.574 for the SGLD distributions before and after reweighting, respectively, and 0.380 for the SGLDfp distribution.
To demonstrate the conformational searching ability, we compare the SGLD and SGLDfp simulations with high temperature LD simulations. To quantitatively compare the conformational searching ability, we calculated the transition rate for the dihedral angles, (ϕ, ψ), to transfer from one local minimum at (−90°,−70°) to another local minimum at (−90°, 170°). One transfer is counted when (ϕ, ψ) changes from within 40° of one local minimum to within 40° of the other local minimum.
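The transition-counting rule above can be sketched as follows. This is only an illustration under stated assumptions: "within 40°" is interpreted as a Euclidean distance in the (ϕ, ψ) plane with 360° periodic wrapping, only transfers from the first minimum to the second are counted, and the function names `angular_distance` and `count_transitions` are hypothetical.

```python
import math

def angular_distance(phi, psi, center):
    """Distance in degrees between (phi, psi) and a center, wrapping each angle to [-180, 180)."""
    d_phi = (phi - center[0] + 180.0) % 360.0 - 180.0
    d_psi = (psi - center[1] + 180.0) % 360.0 - 180.0
    return math.hypot(d_phi, d_psi)

def count_transitions(phis, psis, start=(-90.0, -70.0), end=(-90.0, 170.0), radius=40.0):
    """Count transfers from the neighborhood of `start` to the neighborhood of `end`."""
    count = 0
    last_state = None  # most recently visited minimum: "start", "end", or None
    for phi, psi in zip(phis, psis):
        if angular_distance(phi, psi, start) <= radius:
            state = "start"
        elif angular_distance(phi, psi, end) <= radius:
            state = "end"
        else:
            continue  # frames outside both neighborhoods do not change the state
        if last_state == "start" and state == "end":
            count += 1
        last_state = state
    return count

# Toy trajectory (degrees): start -> intermediate -> end -> end -> start -> end
phis = [-90.0] * 6
psis = [-70.0, 0.0, 170.0, 170.0, -70.0, 170.0]
print(count_transitions(phis, psis))
```

Keeping the last visited minimum as the state, rather than comparing consecutive frames, ensures that a trajectory lingering in the intermediate region is still counted once it arrives at the other minimum.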
Figure 9 shows the average potential energies as a function of the transition rate in the high temperature LD simulations as well as in the SGLD and SGLDfp simulations. The average potential energy reflects the conformational distribution to a certain degree. A change in the average energy indicates a change in the conformational distribution. As can be seen from Fig. 9, the high-temperature simulation not only increases the transition rate, but also significantly increases the average potential energy. In the SGLD and SGLDfp simulations, the average potential energy experiences little change except for the SGLD simulations with λ = 1. The transition rate increases significantly with λ in both the SGLD and SGLDfp simulations, even though the SGLDfp simulations have fewer transitions as compared to the SGLD simulations with the same λ. It is also clear from Fig. 9 that the SGLDfp simulation preserves the average energy better.
Folding of a pentamer peptide
Protein folding is a major challenge for conformational searching. Because proteins have many degrees of freedom, the conformational space of a protein is extremely large and an exhaustive conformational search is impossible. We believe that the accessible conformational space of a protein is limited and that the protein can find its folded state quickly by moving through this accessible conformational space. Many methods, such as high-temperature simulations, not only accelerate conformational searching but also increase the accessible conformational space. An increase in the accessible conformational space not only makes the conformational searching problem worse, but may also alter the global minimum and make it harder to identify the folded state. The ability to preserve the conformational distribution makes SGLDfp a suitable means to study problems where the preservation of the conformational distribution is critical.
To demonstrate the application of the SGLDfp method in a protein folding study, we performed folding simulations of a pentamer peptide,9, 29 which forms a type II turn according to experimental observation. The sequence of the pentamer peptide is Tyr-Pro-Gly-Asp-Val. To simplify the demonstration, all simulation conditions were the same as those for the alanine dipeptide simulations described above. A temperature of 300 K and a collision frequency of 1/ps were set for all the simulations. The guiding factor was λ = 0.5 for the SGLD simulation and λ = 1 for the SGLDfp simulation, so that both simulations have similar conformational searching ability. All simulations were started from an extended conformation and were 200 ns in length.
Because a large number of conformations were visited during these simulations, we simplified the description by clustering the conformations into six major clusters using the local maximum clustering method.30 The distance between two conformations is calculated as the sum of the squared differences of the backbone dihedral angles. Figure 10 shows the representative structures of these six major clusters. Clusters 1 and 4 have a broad turn involving Pro-Gly-Asp wherein the proline carbonyl oxygen points up and down, respectively. Clusters 2 and 3 have a tight turn involving Pro-Gly wherein the proline carbonyl oxygen points up and down, respectively. Clusters 5 and 6 form a helical coil wherein the C-terminus points up and down, respectively.
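The conformational distance used for clustering can be sketched in a few lines. The text specifies only a sum of squared dihedral differences; wrapping each difference into [−180°, 180°) is an assumption added here so that angles near ±180° are treated as close, and the function name `dihedral_distance` is hypothetical.

```python
def dihedral_distance(angles_a, angles_b):
    """Sum of squared differences (degrees squared) between two sets of backbone dihedrals."""
    total = 0.0
    for a, b in zip(angles_a, angles_b):
        # Assumption: wrap the difference into [-180, 180) to respect periodicity.
        diff = (a - b + 180.0) % 360.0 - 180.0
        total += diff * diff
    return total

# Two toy conformations described by two dihedral angles each (degrees)
conf_1 = [170.0, -60.0]
conf_2 = [-170.0, -50.0]
print(dihedral_distance(conf_1, conf_2))
```

Without the periodic wrapping, the first pair of angles would contribute 340² instead of 20², which would separate conformations that are structurally almost identical.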
Figure 11 compares the conformational distributions obtained from the LD, SGLD, and SGLDfp simulations. The conformational distributions are shown in two-dimensional contour plots with the distances to the center conformations of clusters 1 and 2 as x coordinates and y coordinates, respectively. Even though the peptide has only five residues, the conformational space is extremely large and the LD simulation of 200 ns may not necessarily properly sample the whole conformational space. All six major clusters can clearly be identified in these simulations, even though the SGLD and SGLDfp simulation results have some trace amounts of other clusters. The density from the SGLD simulation shows broader peaks than those in the LD and SGLDfp results. After reweighting, the SGLD result has peaks as sharp as the LD result. The SGLDfp result resembles the LD result fairly well, again demonstrating that the SGLDfp method can eliminate the perturbation of the SGLD method. Due to the approximations in the SGLD reweighting and in the SGLDfp method, certain deviations from the LD result still remain, especially when the guiding factor is exceptionally large, e.g., larger than 1. The RMSDs from the LD result are 1.44, 1.59, and 0.81 for the SGLD results before and after reweighting and for the SGLDfp result, respectively. The large RMSD for the reweighted SGLD result is caused by a significant fluctuation produced by the reweighting process.
Figure 12 plots the cluster transitions during the first 2000 ps of the simulations. As can be seen, the LD simulation did not reach cluster 1 during the first 2000 ps, and the transitions between clusters were not as frequent as those in the SGLD and SGLDfp simulations. The most frequent transitions occurred between cluster 2 and cluster 5. This agrees with Fig. 11, which shows that clusters 2 and 5 are two high-density clusters that are near each other. There are also a significant number of transitions between cluster 2 and cluster 3, but not between cluster 2 and cluster 4, in agreement with Fig. 11, where clusters 2 and 4 are separated by clusters 3, 5, and 6. This example further demonstrates that the SGLDfp method is a suitable approach for protein folding studies in terms of its ability to accelerate conformational searching while maintaining the conformational distribution.
CONCLUSIONS
Based on the understanding of the conformational distribution in SGLD simulations,22 we developed a force-momentum-based self-guided Langevin dynamics simulation method, abbreviated as SGLDfp, that approximately maintains the canonical ensemble distribution while accelerating conformational searching. The method is general and can be applied to any study where an LD simulation can be applied. It does not require important degrees of freedom to be predetermined to achieve accelerated conformational searching. Even though SGLDfp accelerates conformational searching to a lesser degree than SGLD, it is more convenient than SGLD because it does not require a post-processing step to reweight the visited conformations. In addition, SGLDfp is size extensive, i.e., it can be applied regardless of system size. By contrast, the SGLD reweighting is difficult to converge for large systems or with large guiding factors. Because SGLDfp has no such size limitation, it has a wider range of applicability. Since the guiding force is proportional to the collision frequency, the enhancement in conformational searching increases with the collision frequency. For systems with large energy barriers, SGLDfp, like SGLD, performs better with a large collision frequency. In practical terms, with optimal parameters, SGLD can routinely cross barriers of 20 kT, and SGLDfp barriers of 15 kT, at the rate at which LD crosses barriers of 10 kT. Simulation results with a skewed double well system indicate that a guiding factor of 1 or less is recommended to keep deviations from the canonical ensemble acceptable. More details on this will be the subject of a subsequent report. Because SGLDfp can produce approximately the canonical ensemble distribution directly without reweighting, it is more convenient for quantitative simulation studies.
For studies requiring a preserved conformational distribution, such as protein folding simulations and free energy calculations, SGLDfp is a suitable choice to accelerate conformational searching, especially for larger systems where other methods fail. The SGLDfp method can be combined with many other free energy calculation methods, such as the adaptive biasing force method by Darve et al.31 and the orthogonal-space-random-walk method by Zheng et al.,32, 33, 34 to achieve increased accuracy and efficiency.
ACKNOWLEDGMENTS
This research was supported by the Intramural Research Program of the NIH, NHLBI. We thank the reviewers for their valuable comments and suggestions, which led to a significant improvement of the manuscript. We thank Terry Brooks for proofreading the manuscript.
APPENDIX: SGLDFP SIMULATION ALGORITHM
To help readers understand and implement the SGLDfp method, we describe a leap-frog Verlet SGLDfp simulation algorithm below. Given the momentum guiding factor, λp, as input, an SGLDfp simulation is performed in the following steps:
(i) Initialize the force guiding factors, λf(0) = 0 and ξf(0) = 0, and the low-frequency variables.
(ii) At time step t, calculate the interaction forces, fi(t), the random forces, Ri(t), and the partial guiding forces. The interaction forces, fi(t), must include any constraint forces, as described later in Eq. A20. The random forces, Ri(t), are generated from a Gaussian distribution with zero mean. The low-frequency momentum is calculated using the momentum at the previous half step, pi(t − δt/2), and the low-frequency force is calculated using the force at the current step, fi(t) (Eqs. A1 and A2).
(iii) Calculate the energy conservation factor, ξp. First express the half-step velocity and calculate the friction-free velocity at the half step (Eqs. A3 and A4). Combining Eqs. A3 and A4 with the energy conservation relation (Eq. 13), and neglecting the higher-power term of ξp, gives the energy conservation factor and the actual guiding force (Eqs. A5–A8).
(iv) Update the low-frequency variables and the accumulators used to calculate the force guiding factors: the low-frequency potential energy and the low-frequency guiding forces. From the low-frequency momenta, calculate the low-frequency temperature. Update the averages used to calculate the collision factors and energy factors, and then calculate the collision and energy factors from these averages. When the required reference quantity is available from a previous SGLD simulation with λ = 0, the estimate based on it is recommended; otherwise, χlf = 1 − GPLF/PPLF has to be used. From χlf, the high-frequency collision factor can be calculated (Eqs. A9–A14).
(v) Advance the velocities to the next half time step, using the scaling parameter χi, and then advance the positions to the next time step. If internal coordinates need to be constrained, apply a constraining algorithm, such as SHAKE (Ref. 24) or semi-flexible constraint dynamics,25 to obtain the constrained positions from ri(t + δt). The constraint forces must be included in the low-frequency force calculation and are given by Eq. A20 (Eqs. A17–A20).
(vi) Continue to step (ii) with t = t + δt until the end of the simulation.
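Step (ii) of the algorithm relies on low-frequency momenta and forces. A minimal sketch of such an update is given below, assuming the exponentially weighted local average commonly used in SGLD, in which the running average is blended with the current value using the weight δt/tL. This form is an assumption of the sketch and is not a transcription of Eqs. A1 and A2; the function name `update_local_average` is hypothetical, and the time step and local average time are taken from the alanine dipeptide setup (δt = 2 fs, tL = 0.2 ps).

```python
def update_local_average(low_freq, current, dt, t_local):
    """Blend the running local average with the current value using weight dt / t_local."""
    weight = dt / t_local
    return [(1.0 - weight) * lf + weight * cur for lf, cur in zip(low_freq, current)]

dt = 0.002     # time step in ps (2 fs)
t_local = 0.2  # local average time in ps

momentum = [1.0, -2.0, 0.5]       # toy constant momentum components
low_freq_momentum = [0.0, 0.0, 0.0]

# With a constant input, the local average relaxes toward that input
# on a time scale of roughly t_local.
for _ in range(1000):
    low_freq_momentum = update_local_average(low_freq_momentum, momentum, dt, t_local)

print(low_freq_momentum)
```

With these parameters the weight is 0.01, so rapidly fluctuating components of the input are strongly damped, while motions that persist for longer than about tL survive in the average.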
References
- Wu X. and Wang S., J. Chem. Phys. 110(19), 9401 (1999). doi:10.1063/1.478948
- Wu X. and Wang S., J. Phys. Chem. B 102(37), 7238 (1998). doi:10.1021/jp9817372
- Wu X. and Brooks B. R., Chem. Phys. Lett. 381(3–4), 512 (2003). doi:10.1016/j.cplett.2003.10.013
- Lee M. S. and Olson M. A., J. Chem. Theory Comput. 6(8), 2477 (2010). doi:10.1021/ct100062b
- Lee C. I. and Chang N. Y., Biophys. Chem. 151(1–2), 86 (2010). doi:10.1016/j.bpc.2010.05.002
- Wu X. and Brooks B. R., Biophys. J. 86(4), 1946 (2004). doi:10.1016/S0006-3495(04)74258-7
- Wu X., Wang S., and Brooks B. R., J. Am. Chem. Soc. 124(19), 5282 (2002). doi:10.1021/ja0257321
- Wu X. and Wang S., J. Phys. Chem. B 105(11), 2227 (2001). doi:10.1021/jp004048a
- Wu X. and Wang S., J. Phys. Chem. B 104(33), 8023 (2000). doi:10.1021/jp000529i
- Wu X.-W. and Sung S.-S., Proteins: Struct., Funct., Genet. 34(3), 295 (1999).
- Wen E. Z. and Luo R., J. Chem. Phys. 121(5), 2412 (2004). doi:10.1063/1.1768151
- Wen E. Z., Hsieh M. J., Kollman P. A., and Luo R., J. Mol. Graphics Modell. 22(5), 415 (2004). doi:10.1016/j.jmgm.2003.12.008
- Varady J., Wu X., and Wang S., J. Phys. Chem. B 106(18), 4863 (2002). doi:10.1021/jp0131469
- Chandrasekaran V., Lee C. J., Lin P., Duke R. E., and Pedersen L. G., J. Mol. Model. 15(8), 897 (2009). doi:10.1007/s00894-008-0444-3
- Pendse P. Y., Brooks B. R., and Klauda J. B., J. Mol. Biol. 404(3), 506 (2010). doi:10.1016/j.jmb.2010.09.045
- Damjanovic A., Garcia-Moreno E. B., and Brooks B. R., Proteins: Struct., Funct., Bioinf. 76(4), 1007 (2009). doi:10.1002/prot.22439
- Damjanovic A., Wu X., Garcia-Moreno E. B., and Brooks B. R., Biophys. J. 95(9), 4091 (2008). doi:10.1529/biophysj.108.130906
- Tsuru T., Abe Y., Kaji Y., Tsukada T., and Jitsukawa S., Zairyo/J. Soc. Mater. Sci. Jpn. 59(8), 583 (2010). doi:10.2472/jsms.59.583
- Sheng Y., Wang W., and Chen P., Protein Sci. 19(9), 1639 (2010). doi:10.1002/pro.444
- Sheng Y., Wang W., and Chen P., J. Phys. Chem. C 114(1), 454 (2010). doi:10.1021/jp908629g
- Abe Y. and Jitsukawa S., Philos. Mag. Lett. 89(9), 535 (2009). doi:10.1080/09500830903140735
- Wu X. and Brooks B. R., J. Chem. Phys. 134, 134108 (2011). doi:10.1063/1.3574397
- Andricioaei I., Dinner A. R., and Karplus M., J. Chem. Phys. 118(3), 1074 (2003). doi:10.1063/1.1528893
- Ryckaert J. P., Ciccotti G., and Berendsen H. J. C., J. Comput. Phys. 23, 327 (1977). doi:10.1016/0021-9991(77)90098-5
- Wu X. W. and Sung S. S., J. Comput. Chem. 19(14), 1555 (1998).
- Brooks B. R., Brooks C. L. III, MacKerell A. D. Jr., Nilsson L., Petrella R. J., Roux B., Won Y., Archontis G., Bartels C., Boresch S., Caflisch A., Caves L., Cui Q., Dinner A. R., Feig M., Fischer S., Gao J., Hodoscek M., Im W., Kuczera K., Lazaridis T., Ma J., Ovchinnikov V., Paci E., Pastor R. W., Post C. B., Pu J. Z., Schaefer M., Tidor B., Venable R. M., Woodcock H. L., Wu X., Yang W., York D. M., and Karplus M., J. Comput. Chem. 30(10), 1545 (2009). doi:10.1002/jcc.21287
- Brooks B. R., Bruccoleri R. E., Olafson B. D., States D. J., Swaminathan S., Jaun B., and Karplus M., J. Comput. Chem. 4, 187 (1983). doi:10.1002/jcc.540040211
- MacKerell A. D. Jr., Bashford D., Bellott M., Dunbrack R. L. Jr., Evanseck J. D., Field M. J., Fisher S., Gao J., Guo H., Ha S., Joseph-McCarthy D., Kuchnir L., Kuczera K., Lau F. T. K., Mattos C., Michnick S., Ngo T., Nguyen D. T., Prodhom B., Reiher W. E. III, Roux B., Schlenkrich M., Smith J. C., Stote R., Straub J., Watanabe M., Wiorkiewicz-Kuczera J., Yin D., and Karplus M., J. Phys. Chem. B 102, 3586 (1998). doi:10.1021/jp973084f
- Dyson H. J., Rance M., Houghten R. A., Wright P. E., and Lerner R. A., J. Mol. Biol. 201(1), 201 (1988). doi:10.1016/0022-2836(88)90447-0
- Wu X., Chen Y., Brooks B. R., and Su Y. A., EURASIP J. Appl. Signal Process. 2004(1), 53 (2004). doi:10.1155/S1110865704309145
- Darve E., Rodriguez-Gomez D., and Pohorille A., J. Chem. Phys. 128(14), 144120 (2008). doi:10.1063/1.2829861
- Zheng L., Carbone I. O., Lugovskoy A., Berg B. A., and Yang W., J. Chem. Phys. 129(3), 034105 (2008). doi:10.1063/1.2953321
- Zheng L., Chen M., and Yang W., Proc. Natl. Acad. Sci. U.S.A. 105(51), 20227 (2008). doi:10.1073/pnas.0810631106
- Zheng L., Chen M., and Yang W., J. Chem. Phys. 130(23), 234105 (2009). doi:10.1063/1.3153841