The Journal of Chemical Physics. 2008 Sep 15;129(11):114101. doi: 10.1063/1.2976440

An experimentally guided umbrella sampling protocol for biomolecules

Maria Mills, Ioan Andricioaei
PMCID: PMC2736582  PMID: 19044944

Abstract

We present a simple method for using experimental data to improve the efficiency of numerical calculations of free energy profiles from molecular dynamics simulations. The method uses umbrella sampling simulations with restraining potentials based on a known approximate free energy profile derived solely from experimental data. Using the experimental data yields optimal restraining potentials, guides the simulation along relevant pathways, and decreases the overall computational time. The method is demonstrated on two systems. First, guided umbrella sampling, unguided (regular) umbrella sampling, and exhaustive sampling simulations are compared in the calculation of the free energy profile for the distance between the ends of a pentapeptide. The guided simulations use restraints based on a simulated “experimental” potential of mean force of the end-to-end distance, of the kind that would be measured by fluorescence resonance energy transfer (here obtained from exhaustive sampling). Statistical analysis shows a dramatic improvement in efficiency for 5-window guided umbrella sampling over 5- and 17-window unguided umbrella sampling. Moreover, as it approaches convergence, the potential of mean force from the guided simulations evolves through the same milestones as that from the extensive simulations, but exponentially faster. Second, the method is further validated by reproducing the forced unfolding pathway of the titin I27 domain using guiding umbrella sampling potentials determined from actual single molecule pulling data. Comparison with unguided umbrella sampling shows that guided sampling makes unfolding simulations converge faster to a forced unfolding pathway that agrees with previous results and produces a more accurate potential of mean force.

INTRODUCTION

The potential of mean force (PMF),1 i.e., the free energy $W(\xi) = -k_BT \ln\langle p(\xi)\rangle$ with respect to a chosen coordinate $\xi$, is fundamentally related to that coordinate’s distribution function $\langle p(\xi)\rangle$, averaged over all the other degrees of freedom. As such, it is a central concept in the statistical mechanical representation of molecular systems and has been employed in a number of computational applications,2, 3 describing processes ranging, for example, from hydrophobic interactions4 or organic reactions in water,5 to proton transfer6, 7, 8 or ionic permeation through membrane channels,9 to peptide10, 11 and protein12 equilibria, to nucleic acid base flipping13, 14 or more complex conformational changes15 in DNA and RNA. The PMF clearly represents the equilibrium distribution of one or a few relevant conformational variables (angles, distances, etc.), with an accuracy that increases as longer sampling and improved force fields become available. It can also be used to describe nonequilibrium processes through the solution of a reduced Fokker–Planck equation for the time dependence of the probability distribution of the variable $\xi$ diffusing in the potential of mean force $W(\xi)$ (describing, e.g., the transfer of an ion or a conformational transition of a biomolecule). Therefore, if the chosen coordinate is a good reaction coordinate (i.e., if it is much slower than any other degree of freedom), dynamical propagation on the PMF can simulate the kinetics of the reaction of interest. Moreover, the derivatives of the PMF along a selected set of holonomically constrained degrees of freedom16, 17 [whose thermodynamic integration yields the PMF (Ref. 18)] can be used to locate free energy minima19 or to sample optimal reaction paths.20

Alongside the increasing accuracy with which computer simulations calculate equilibrium distributions, experimental advances have also made it possible to determine distribution functions of measurable coordinates relatively accurately. For example, measurements in molecular optical spectroscopy,21 traditionally performed in the bulk, can now be recorded at the single molecule level, thereby resolving differences in behavior among individual molecules that ensemble averages hide. For skewed probability distributions, this can in turn reveal differences between the most probable and the average values of an observable,22 and, more fundamentally, can reveal new regimes for processes at the nanoscale.23, 24 These experimental distribution functions can be used, in principle, to obtain free energy landscapes (PMFs) for the conformational dynamics of proteins and other biomolecules.25, 26, 27 For example, a study by Deniz et al.25 using single molecule Förster resonance energy transfer (FRET) between fluorescently labeled sites in a biomolecule characterized protein folding distributions by measuring the probability density of the distance between the N terminus and the loop of chymotrypsin inhibitor 2 at various denaturant concentrations. FRET can also probe conformational changes in RNA molecules, which have been shown to require large equilibrium dynamical transitions as a prerequisite for their function.28

Another source of experimental distribution functions is the mechanical extension of biomolecules one at a time. Single molecule pulling techniques have been used to unfold biomolecules with atomic force microscopy (AFM),29 optical30 or magnetic trapping,31 generating extension trajectories that, when binned and processed appropriately,32, 33, 34, 35, 36 can produce a map of the potential of mean force, and possibly of its entropic and enthalpic components,37 along the pulling coordinate, either in the absence of the pulling force or for various values of the force.38 The latter aspect, a kinetic one, is important for accurately assessing the dependence of the unfolding rate constant on force,39 which has important consequences in vivo when proteins are unfolded as part of their cellular function.40, 41 Alternatively, under equilibrium conditions, constant-force reversible folding-unfolding measurements can also be used to map the folding energy landscape, as shown in recent work on DNA hairpins.42

However, experiments such as those described above, and others (including bulk experiments), can resolve conformational changes along only a small set of degrees of freedom [for regular FRET, one spatial separation, or up to three with the emergence of three-color FRET (Refs. 43, 44) (but see also Ref. 45)], and they often yield only on-off signals. Theoretical work on the applicability of FRET has explored the conditions that prevent FRET from revealing the true PMF, mainly poor time resolution (relative to the actual timescale of distance fluctuations) and FRET efficiency distributions that are dominated by shot noise rather than by the actual distance fluctuations.46, 47 Similarly, in single molecule pulling, the measurable extension is poorly resolved at the point of transition (e.g., when a protein unfolds). Molecular dynamics simulations,48 on the other hand, are useful for computing dynamic and equilibrium properties of large molecules in atomic detail, for any degree of freedom and with femtosecond time resolution, but processes that involve large free energy barriers or long timescales are too computationally demanding to be simulated directly. As a consequence, such simulations are effectively nonergodic,49 in the sense that the computed time averages do not equal the corresponding ensemble averages.

For this reason, it is often difficult to compute the averages required for a free energy profile from a direct molecular dynamics simulation, i.e., by directly counting how often each value of the reaction coordinate occurs during the simulation run. The most widely used method for generating a PMF efficiently from molecular dynamics or Monte Carlo simulations is the venerable umbrella sampling method.50 In this method, a reaction coordinate is chosen, and restraining potentials acting on it are applied in a series of windows that together span the range of the reaction coordinate. The restraining potentials push the system to sample regions of conformational space that would otherwise be inaccessible to direct sampling. The result is a series of histograms, each containing the biased distribution of the reaction coordinate in one window. These histograms are then unbiased and combined,51, 52 usually with the weighted histogram analysis method (WHAM).53 Typically, the restraining potentials are chosen by trial and error, so a great deal of preliminary data is discarded while the potentials are optimized. We propose here a simple method that uses umbrella sampling potentials based on an experimentally determined (but not necessarily accurate) PMF, obtained from measured equilibrium distributions, to improve the efficiency of molecular dynamics simulations.

For uniform sampling of the reaction coordinate, the ideal “restraining” potential would be the negative of the exact potential of mean force (if the PMF were known a priori). Using the negative of the PMF flattens the energy landscape, circumventing the problem of trapping by large energy barriers.3, 53 Most techniques for improving the efficiency of umbrella sampling, such as adaptive umbrella sampling methods54, 55, 56, 57, 58, 59, 60 and other techniques that generate flat distributions,61, 62, 63, 64 focus on this approach. In adaptive umbrella sampling, a continuously updated umbrella potential, which is a function of one or a few important degrees of freedom, is added iteratively to the unperturbed Hamiltonian of the system. The umbrella potential for each iteration is chosen to be an estimate of the negative of the PMF determined from the previous iteration. As a result, one obtains a uniform sampling distribution along the important degree(s) of freedom. In the multicanonical sampling method61, 62 or its twin, the entropy sampling method,65 and in Wang–Landau sampling64 and related methods,66, 67, 68 a uniform energy probability distribution is obtained by assigning weights that are inversely proportional to a precalculated (or dynamically updated) density of states.
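The flattening argument can be illustrated with a toy model. The sketch below assumes a hypothetical one-dimensional double-well PMF (not data from this work) and $k_BT \approx 0.596$ kcal/mol. Adding the bias $w = -W$ to the energy makes the Boltzmann distribution along $\xi$ exactly uniform. In a real many-body system, the cancellation holds only as well as the estimate of $W$ is accurate.

```python
import numpy as np

KT = 0.596  # assumed k_B T in kcal/mol (about 300 K)

def boltzmann_density(energy, dx):
    """Normalized density proportional to exp(-E/kT) on a uniform grid of spacing dx."""
    weights = np.exp(-(energy - energy.min()) / KT)
    return weights / (weights.sum() * dx)

# Hypothetical double-well PMF along xi (kcal/mol), for illustration only.
xi = np.linspace(-1.5, 1.5, 301)
dx = xi[1] - xi[0]
pmf = 3.0 * (xi ** 2 - 1.0) ** 2

p_unbiased = boltzmann_density(pmf, dx)     # concentrated in the two wells
bias = -pmf                                 # ideal "restraint": the negative PMF
p_flat = boltzmann_density(pmf + bias, dx)  # total energy is zero everywhere

print(p_unbiased.max() / p_unbiased.min())  # large: barrier and edges are rarely visited
print(p_flat.max() / p_flat.min())          # 1.0: uniform along xi
```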

Flattening the conformational or energy distributions to promote uniform sampling comes, however, at the substantial computational cost of predetermining the adaptive umbrella potentials or the density of states, respectively. Moreover, because most biological molecules do not uniformly sample all of conformational space, there would be no need to flatten energy barriers if their locations were known beforehand. Instead, we propose adding to the unperturbed Hamiltonian a set of experimentally derived harmonic potentials of different curvatures, fitted to the (positive) experimental PMF. These serve as a “guide” for the reaction coordinate along the free energy path. It has been suggested previously that adding harmonic umbrella potentials minus the PMF would improve convergence for multiwindow umbrella sampling (as opposed to adaptive umbrella sampling, in which the negative of the PMF alone is used in a single, constantly updated window).3, 69 As long as the initial harmonic potentials are well chosen, subtracting the PMF from the individual harmonic potentials yields potentials similar to those of our guiding technique, as depicted in the cartoon in Fig. 1 (see also the discussion in Sec. 3). In other words, experimentally guided umbrella sampling should produce potentials similar to the best case of this convergence method. Using a known experimental PMF should allow one to choose optimal initial restraining potentials, eliminating guesswork and wasted data.

Figure 1.


Schematic examples of restraining potentials for the pedagogical case of a double well potential, showing the effect of subtracting the PMF from the restraints. (a) The original double well potential. (b) A set of uneducated-guess (poor) starting potentials (dashed lines) with the negative of the PMF (solid line) and (c) the results of subtracting the PMF from these harmonic potentials (dashed lines) overlaid with the original PMF. (d) A nearly optimal set of restraining potentials and (e) the results of subtracting the PMF from these optimal harmonic potentials, overlaid with the original PMF. The potentials in (e) resulting from the subtraction (dotted lines) produce biasing restraints similar to the type we suggest to use in our guided umbrella sampling protocol.

In both the straightforward harmonic-wells-minus-PMF method and the adaptive umbrella sampling approaches, a good guess of the approximate negative of the PMF is crucial to their rapid convergence. Here we suggest how this initial guess can be provided not by preliminary simulations but directly from experimental input. This approach is similar in spirit to recent approaches that use experimental input in the form of, say, order parameter nuclear magnetic resonance (NMR) data70 or hydrogen exchange71 to improve the description of conformational equilibrium of proteins.

To showcase our method, we report here an initial feasibility study using two systems. First, we study a test case: an unstructured pentapeptide with sequence Cys-Ala-Gly-Gln-Trp, with the end-to-end distance as the reaction coordinate. We selected this peptide as a test system because it is small enough to allow exhaustive sampling by direct methods, and because both experimental72 and computational73 studies have used this pentapeptide as a model for contact formation in protein folding. Because no experimental potential of mean force for our chosen reaction coordinate was available, we instead generated the PMF from extensive sampling simulations. This PMF was subsequently used as the “experimental” guiding PMF. Guided umbrella sampling was run using five windows with restraining potentials of varying curvature based on the guiding PMF. Regular, unguided umbrella sampling simulations were also run in five and seventeen windows for comparison with the guided umbrella sampling simulations. Second, the method is applied to a more complex protein system, the I27 domain of the muscle protein titin, for which an experimental PMF has been determined from the forced unfolding of the I27 domain using AFM.74 After presenting background on umbrella sampling (in the next section) and introducing our experimentally guided umbrella sampling method (in Sec. 3), we present, in the Results sections, the outcome and the comparison of the various unguided and guided simulations for the two systems. We end with a few concluding remarks.

BACKGROUND ON UMBRELLA SAMPLING

In umbrella sampling,50 a biasing potential is added to the Hamiltonian to direct the simulations toward a certain goal, for example, to induce a certain conformational transition. The biasing potential is usually a harmonic restraint that keeps the value of a relevant reaction coordinate fluctuating around successive positions along that coordinate. The result of this “stratification” is a series of biased histograms. To obtain accurate information about the free energy of the system, the raw data from the individual simulations must be unbiased and recombined. Adding the harmonic restraining potential $w_i$ is equivalent to multiplying the Boltzmann factor by the weight $e^{-w_i(\xi)/k_BT}$. Thus, the biased distribution function for each window is

$$p_i^{\mathrm{biased}}(\xi) = e^{-w_i(\xi)/k_BT}\, p(\xi)\, \left\langle e^{-w_i(\xi)/k_BT} \right\rangle^{-1} \qquad (1)$$

and the unbiased PMF in the ith window is

$$W_i(\xi) = -k_BT \ln p_i^{\mathrm{biased}}(\xi) - w_i(\xi) + F_i, \qquad (2)$$

where the unknown free energy constant Fi is defined by the equation

$$e^{-F_i/k_BT} = \left\langle e^{-w_i/k_BT} \right\rangle. \qquad (3)$$
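As a numerical sanity check of Eqs. 1–3, the sketch below builds the biased density on a grid from a hypothetical PMF and a single harmonic restraint (written in the $\tfrac12 k$ form of Eq. 9; all parameters are illustrative). It then recovers the PMF with Eq. 2. In a real simulation, $p_i^{\mathrm{biased}}$ comes from a histogram and $F_i$ is unknown, which is exactly what WHAM has to determine.

```python
import numpy as np

KT = 0.596  # assumed k_B T in kcal/mol (about 300 K)

# Hypothetical PMF on a grid of xi values (Å); not data from this study.
xi = np.linspace(2.0, 20.0, 361)
dx = xi[1] - xi[0]
W_true = 0.05 * (xi - 8.0) ** 2 + 0.3 * np.cos(xi)
p = np.exp(-W_true / KT)
p /= p.sum() * dx                          # normalized unbiased density p(xi)

# One harmonic restraint window (illustrative parameters).
k_i, xi_i = 1.0, 10.0
w_i = 0.5 * k_i * (xi - xi_i) ** 2

boltz_w = np.exp(-w_i / KT)
avg_boltz = (boltz_w * p).sum() * dx       # <exp(-w_i/kT)> over the unbiased ensemble
p_biased = boltz_w * p / avg_boltz         # Eq. 1
F_i = -KT * np.log(avg_boltz)              # Eq. 3
W_i = -KT * np.log(p_biased) - w_i + F_i   # Eq. 2

# W_i reproduces -kT ln p(xi) up to floating-point error.
print(np.max(np.abs(W_i - (-KT * np.log(p)))))
```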

In the weighted histogram analysis method (WHAM),53 the free energy constants $F_i$ are determined iteratively. An initial guess for the $F_i$ is used to estimate the unbiased probability distribution,

$$p(\xi) = \sum_i n_i\, p_i^{\mathrm{biased}}(\xi) \left[\sum_j n_j\, e^{-(w_j(\xi) - F_j)/k_BT}\right]^{-1}. \qquad (4)$$

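Here $n_i$ is the number of configurations sampled in window $i$, and $p_i^{\mathrm{biased}}(\xi)$ is the normalized biased histogram from that window. The resulting probability is then used in the equation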

$$e^{-F_i/k_BT} = \int d\xi\, e^{-w_i(\xi)/k_BT}\, p(\xi) \qquad (5)$$

to determine a new set of $F_i$ values. The last two equations are solved iteratively until they are self-consistent.
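A minimal sketch of this self-consistent iteration on a shared set of $\xi$ bins follows. It is a plain fixed-point loop written for illustration, not the Grossfield implementation used for the results below. The function name, the convergence test on $F$, and the gauge choice $F_0 = 0$ are assumptions. The usage example feeds in noise-free "histograms" generated from a hypothetical PMF with 17 windows spaced as in set (b)(i), so WHAM should return the input distribution.

```python
import numpy as np

def wham(counts, bias, kT, tol=1e-10, max_iter=100_000):
    """Iterate Eqs. 4 and 5 on a common set of xi bins.

    counts: (n_windows, n_bins) histogram counts from each biased window
    bias:   (n_windows, n_bins) restraint energy w_j evaluated at each bin center
    Returns p (probability per bin, summing to 1) and F (free energy constants,
    defined only up to a common additive constant, fixed here by F[0] = 0).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)            # n_i: number of samples in window i
    numerator = counts.sum(axis=0)    # sum_i n_i p_i(xi) = pooled counts per bin
    boltz = np.exp(-bias / kT)        # exp(-w_j(xi)/kT)
    F = np.zeros(len(n))
    for _ in range(max_iter):
        # Eq. 4: denominator sum_j n_j exp(-(w_j(xi) - F_j)/kT)
        denom = (n[:, None] * boltz * np.exp(F[:, None] / kT)).sum(axis=0)
        p = numerator / denom
        p /= p.sum()
        # Eq. 5 (discretized): exp(-F_i/kT) = sum over bins of exp(-w_i/kT) p
        F_new = -kT * np.log((boltz * p).sum(axis=1))
        F_new -= F_new[0]
        converged = np.max(np.abs(F_new - F)) < tol
        F = F_new
        if converged:
            break
    return p, F

# Synthetic check with noise-free histograms from a hypothetical PMF.
kT = 0.596
centers = np.linspace(3.0, 19.0, 161)
W = 0.02 * (centers - 9.0) ** 2 + 0.5 * np.sin(centers)
p_true = np.exp(-W / kT)
p_true /= p_true.sum()
xi0 = np.arange(3.0, 20.0, 1.0)                       # 17 window centers, 3 to 19 Å
bias = 1.2 * (centers[None, :] - xi0[:, None]) ** 2   # w_i = k (xi - xi_i)^2
biased = p_true * np.exp(-bias / kT)
biased /= biased.sum(axis=1, keepdims=True)
p_est, F_est = wham(10_000 * biased, bias, kT)
W_est = -kT * np.log(p_est)
```

With real data, the counts are noisy and bins that no window visits receive $p = 0$, which corresponds to an undefined PMF there.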

For regular implementations of WHAM with multiple-window umbrella sampling, the strength of the harmonic biasing potentials and the number of simulation windows must be chosen to give sufficient overlap for the data to be recombined while still encouraging rapid sampling of the reaction coordinate. Simply increasing the number of windows does not necessarily improve convergence.3, 69 Furthermore, to obtain accurate information about the other degrees of freedom, the chosen reaction coordinate must be the slowest one.69 Determining these parameters by trial and error wastes a great deal of CPU time. Adaptive umbrella sampling overcomes this to some extent by updating the estimate of the PMF and using its negative to flatten barriers, but in regions that have not been sampled in previous simulations, the shape of the restraining potential must be set arbitrarily or extrapolated.54, 55, 56, 57, 58 Another issue is that the error in the PMF calculated with the WHAM equations is sensitive to the size of the bins used to construct the histograms. With insufficient sampling, many bins hold too few configurations, resulting in statistical error, while using too few bins can distort the probability density.75

THE EXPERIMENTALLY GUIDED UMBRELLA SAMPLING METHOD

As discussed in the Introduction, knowing the PMF from experiment makes the choice of restraining potentials optimal, but it is important that the simulations are carried out in a way that yields information about the system of interest as a whole, rather than just the chosen reaction coordinate. This means that, in order for the experimentally guided biasing method to be practical, it is important to use experimental guiding potentials that are reported along the slow degrees of freedom of the system, and that the other degrees of freedom are sufficiently sampled so that they converge in the simulation (but see discussion later in the text regarding single molecule nonergodicity, for which departure from this convergence is worthwhile exploring as well).

The experimental PMF used as input is fitted with harmonic restraining potentials of varying curvature. Assume that an experimental guiding PMF, $W_{\exp}$, is available, say from the probability distribution $p_{\exp}(\xi)$ of a distance $\xi$ measured in a FRET or micromanipulation experiment, i.e., up to a constant,

$$W_{\exp}(\xi) = -k_BT \ln p_{\exp}(\xi). \qquad (6)$$
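In practice, Eq. 6 is applied to a histogram of measured values of $\xi$. The sketch below assumes the measurements are available as raw samples, drops empty bins (whose PMF would be infinite), and fixes the arbitrary constant by setting the minimum to zero. The function name and the Gaussian test data are hypothetical. For a Gaussian with standard deviation $\sigma = 1$ Å, the exact PMF is harmonic, $W = k_BT(\xi - 10)^2/2$ in kcal/mol with $\xi$ in Å, which gives a simple check.

```python
import numpy as np

def pmf_from_samples(samples, bins, kT=0.596):
    """Eq. 6: W_exp(xi) = -kT ln p_exp(xi), from a histogram of measured xi values.

    Empty bins are dropped; W is shifted so that its minimum is 0,
    because Eq. 6 defines W_exp only up to a constant.
    """
    density, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    occupied = density > 0
    W = -kT * np.log(density[occupied])
    return centers[occupied], W - W.min()

# Hypothetical "measured" distances: Gaussian with mean 10 Å and std 1 Å.
rng = np.random.default_rng(0)
samples = rng.normal(loc=10.0, scale=1.0, size=1_000_000)
xi_c, W_exp = pmf_from_samples(samples, bins=np.linspace(7.0, 13.0, 61))
```

The bin width matters here for the reasons given in Sec. 2: narrower bins resolve more detail but leave fewer counts per bin.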

This guiding potential could be used in one of two ways (see panels (b)–(c) and (d)–(e), respectively, in Fig. 1). The first possibility would be to choose a number of restraining potentials $w_i$, each centered around a corresponding value $\xi_i$, and to run the $i$th simulation using the effective potential energy

$$V(\mathbf{r}) + w_i(\xi(\mathbf{r})) - W_{\exp}(\xi(\mathbf{r})), \qquad (7)$$

where $V$ is the interatomic potential energy function describing the physical interactions in the system and $\mathbf{r}$ is the 3N-dimensional conformational vector. On one hand, if the simulation in the $i$th window were run with only the first two terms of Eq. 7, i.e., with $V(\mathbf{r}) + w_i(\xi(\mathbf{r}))$, this would be regular umbrella sampling. On the other hand, in one-dimensional cases $V \approx W_{\exp}$, so subtracting $W_{\exp}$ in Eq. 7 flattens the energy landscape along $\xi$, while the restraints $w_i$ focus the simulation on a particular domain of $\xi$. In regular umbrella sampling, where nothing is known a priori about the PMF, the $w_i$ have the same form (the same force constant) in every window. For a simulation run with the potential in Eq. 7, subtracting $W_{\exp}$ from the sum of the first two terms is equivalent to changing the effective restraining potential from window to window. In general, the $w_i$ are to be optimized [see Figs. 1d, 1e] such that guiding restraining potentials $w_i^{(g)}$ replace $w_i(\xi(\mathbf{r})) - W_{\exp}(\xi(\mathbf{r}))$, i.e., the combination of the negative PMF and an equal-force-constant restraint is recast as a window-dependent restraint. This is then equivalent to running the simulation in the $i$th window on the effective potential

$$V(\mathbf{r}) + w_i^{(g)}(\xi(\mathbf{r})), \qquad (8)$$

where this time the guiding potentials,

$$w_i^{(g)}(\xi) = \tfrac{1}{2} k_i^{(g)} \left(\xi - \xi_i^{(g)}\right)^2 \qquad (9)$$

have optimal positions $\xi_i^{(g)}$ of their minima and optimal force constants $k_i^{(g)}$, obtained from a piecewise fit to the PMF $W_{\exp}$. This second way of performing umbrella sampling with experimental guiding data is the essence of the method we propose.
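A short worked example makes the recasting of Eq. 7 into Eq. 8 explicit. It assumes, for this illustration only, that the PMF is locally harmonic near window $i$, $W_{\exp}(\xi) \approx \tfrac12\kappa(\xi - a)^2 + c$, and that the equal-force-constant restraint has the form $w_i(\xi) = \tfrac12 k(\xi - \xi_i)^2$ (the convention of Eq. 9). Completing the square gives

$$
\begin{aligned}
w_i(\xi) - W_{\exp}(\xi) &\approx \tfrac12 k(\xi-\xi_i)^2 - \tfrac12\kappa(\xi-a)^2 - c \\
&= \tfrac12 (k-\kappa)\left(\xi - \frac{k\xi_i - \kappa a}{k-\kappa}\right)^2 + \text{const}, \qquad k \neq \kappa .
\end{aligned}
$$

The combination is therefore a single harmonic restraint with force constant $k-\kappa$ and center $(k\xi_i-\kappa a)/(k-\kappa)$. Both depend on the local shape of $W_{\exp}$ and so change from window to window. The restraint is softer than $k$ inside a well ($\kappa>0$) and stiffer on a barrier top ($\kappa<0$), and it stays confining only while $k>\kappa$.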

For the pentapeptide test case shown here, the guiding potentials were chosen such that the center of each harmonic well lay either at the minimum of a well or at the central position of a shoulder. In other words, the piecewise fit of the guiding potentials to the experimental PMF was based on the plot of the second derivative $d^2W_{\exp}/d\xi^2$, i.e., on the long-scale change of the curvature of the “experimentally” available PMF (see Fig. 2 for details). The value of each force constant $k_i^{(g)}$ was chosen such that the harmonic well approximated the curvature of the PMF around the minimum. Choosing a harmonic potential that approximates the shape of the well is natural: a potential wider than the well would serve no purpose, and a narrower one would require more windows to sample the excluded space. Where this does not provide sufficient overlap, another potential may be added, centered between the surrounding potentials, with its $k_i^{(g)}$ value also chosen to approximate the curvature of the PMF.
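The curvature analysis can be sketched as follows, assuming the PMF is tabulated on a uniform grid. A finite-difference second derivative is smoothed with a running average whose 2.5 Å width is taken from the Fig. 2 caption. The force constant at a chosen center is then read off as the magnitude of the smoothed curvature, in the $\tfrac12 k$ convention of Eq. 9. Identifying the flat regions themselves is left to inspection of the curvature plot, as in Fig. 2b. The function names are hypothetical.

```python
import numpy as np

def smoothed_curvature(xi, W, filter_width=2.5):
    """Finite-difference d2W/dxi2 followed by a running average of width filter_width (Å),
    which suppresses curvature changes on length scales shorter than the windows.
    Values within about half the filter width of either end are unreliable."""
    dx = xi[1] - xi[0]
    d2W = np.gradient(np.gradient(W, dx), dx)
    n = max(1, int(round(filter_width / dx)))
    return np.convolve(d2W, np.ones(n) / n, mode="same")

def guiding_force_constant(xi, curvature, center):
    """k_i^(g) for w = 0.5 k (xi - center)^2 (Eq. 9): magnitude of the smoothed curvature
    at the chosen center. The magnitude follows the text's use of the (negative)
    local curvature when a window sits on a barrier top."""
    return float(abs(np.interp(center, xi, curvature)))

# Hypothetical check: a purely harmonic PMF with curvature 0.3 kcal/(mol Å^2).
xi = np.linspace(0.0, 20.0, 401)
W = 0.5 * 0.3 * (xi - 10.0) ** 2
curv = smoothed_curvature(xi, W)
k_center = guiding_force_constant(xi, curv, 10.0)
```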

Figure 2.

Figure 2

(a) “Experimental” potential of mean force for the end-to-end distance of pentapeptide with fitted biasing potentials. (b) Second derivative of the PMF calculated using a finite-difference scheme and by eliminating, with a running average filter of width 2.5 Å, changes in the curvature occurring on length scales smaller than the thermally accessible range for the harmonic potentials. The plot has five flat regions; correspondingly, five guiding umbrella potentials are chosen with the locations of their minima within the five regions placed according to a root-mean-square best fit to the experimental PMF.

The PMF for the pentapeptide test system had no barriers. If the experimental PMF does have barriers (see Fig. 1), a potential centered at the peak of a barrier would have its $k_i^{(g)}$ value chosen to approximate the (negative) local curvature at the transition point. For the pentapeptide studied here, the choice of five windows was obvious, since the curvature plot in Fig. 2 has five regions. For the unfolding of the titin I27 domain, potentials were fitted to the curvature of the experimental PMF, and their positions were chosen to optimize the overlap between umbrella sampling windows. In general applications, an iterative optimization that minimizes a piecewise cost function, the root-mean-square difference between the experimental PMF and the fitted windows, can be carried out over the relevant set of parameters, i.e., the number of windows, the positions of the restraints $\xi_i^{(g)}$, and the force constants $k_i^{(g)}$.
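The optimizer is not specified, so the sketch below shows one possible choice, under the assumption that each window's region along $\xi$ has already been chosen (e.g., from the curvature plot). For each region, it grid-searches candidate centers and obtains the offset and force constant at each center by linear least squares. The summed RMS residual serves as the piecewise cost. The function names are hypothetical, and the outer loop over the number of windows is not shown.

```python
import numpy as np

def fit_guiding_window(xi, W_exp, lo, hi, candidate_centers):
    """Fit W_exp on [lo, hi] with c + 0.5 k (xi - xi0)^2 (Eq. 9 form plus an offset).

    For each trial center xi0, (c, k) come from linear least squares; the center with
    the smallest root-mean-square residual is kept. Returns (rms, xi0, k).
    """
    in_region = (xi >= lo) & (xi <= hi)
    x, y = xi[in_region], W_exp[in_region]
    best = None
    for xi0 in candidate_centers:
        A = np.column_stack([np.ones_like(x), 0.5 * (x - xi0) ** 2])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rms = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
        if best is None or rms < best[0]:
            best = (rms, float(xi0), float(coef[1]))
    return best

def piecewise_cost(xi, W_exp, regions, candidate_centers):
    """Total cost: sum of per-window RMS residuals over the chosen (lo, hi) regions."""
    fits = [fit_guiding_window(xi, W_exp, lo, hi, candidate_centers) for lo, hi in regions]
    return sum(f[0] for f in fits), fits

# Hypothetical PMF that is exactly harmonic on [5, 10] Å (k = 0.2, minimum at 7.7 Å).
xi = np.linspace(3.0, 19.0, 321)
W_exp = 1.0 + 0.5 * 0.2 * (xi - 7.7) ** 2
rms, center, k = fit_guiding_window(xi, W_exp, 5.0, 10.0, np.linspace(5.0, 10.0, 51))
total, fits = piecewise_cost(xi, W_exp, [(5.0, 10.0)], np.linspace(5.0, 10.0, 51))
```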

In any case, however they are chosen, the bias introduced by adding the guiding potentials $w_i^{(g)}$ of Eq. 9 to the system’s unperturbed Hamiltonian can be removed with the regular weighted histogram analysis method presented in Sec. 2.

RESULTS

Pentapeptide test case

Molecular dynamics simulations were performed on a five amino acid peptide with sequence Cys-Ala-Gly-Gln-Trp with the polar-hydrogen parameter set 19 of the CHARMM force field,76 using the analytical continuum electrostatics (ACE II) implicit solvent model,77 which is based on an analytical approximation to the generalized Born solvation equation.

The starting structure of the peptide was built using typical internal coordinate values. All simulations were minimized with 1000 steps of steepest descent followed by 2000 steps of the adopted-basis Newton-Raphson procedure and heated up to 300 K in 100 ps. All simulations used Langevin dynamics at 300 K with a (water) frictional coefficient set to 91 ps−1. The reaction coordinate ξ was defined as the distance in Angstroms between the sulfur of cysteine and the center of mass of the tryptophan aromatic ring. The various sets of simulations we ran were as follows:

  • (a)

    Exhaustive sampling simulations totaling 0.5 μs were run using constant-temperature molecular dynamics without any restraining potential. We generated 25 initial conformations from a 10 ns equilibration run and started from them 25 equilibrium trajectories, each 20 ns long, with different canonically distributed initial conditions. The conformational snapshots thus obtained over the 0.5 μs cumulative time are representative conformations drawn from the canonical ensemble; they were binned by the value of the reaction coordinate ξ to generate the “experimental” potential of mean force. In other words, in the absence of a real experimental PMF, this is the guiding PMF to which the guiding potentials are fitted.

  • (b)
    Unguided (regular) equal-restraint umbrella sampling simulations run in
    • (i)
      seventeen windows, $w_i = k_i(\xi - \xi_i)^2$, $i = 1, \ldots, 17$, all with the same force constant $k_i = 1.2$ kcal/(mol Å²), run for 15 ns each for a total of 255 ns. The windows were centered at positions $\xi_i$ from 3 to 19 Å in 1 Å increments. The value of the reaction coordinate was recorded every 10 fs.
    • (ii)
      five windows, $w_i = k_i(\xi - \xi_i)^2$, $i = 1, \ldots, 5$; this second set of umbrella sampling simulations covered the same range with fewer windows and wider restraining potentials, with force constant $k_i = 0.4$ kcal/(mol Å²), centered at 3, 7, 11, 15, and 19 Å, and run for 15 ns each for a total of 75 ns.
  • (c)

    Experimentally guided umbrella sampling with five guiding potentials $w_i^{(g)} = k_i^{(g)}(\xi - \xi_i^{(g)})^2$ fitted to the simulated experimental PMF obtained from the snapshots recorded during the 0.5 μs of exhaustive sampling (Fig. 2). The guiding force constants were $k_1^{(g)} = 1.1$, $k_2^{(g)} = 0.11$, $k_3^{(g)} = 0.07$, $k_4^{(g)} = 0.12$, and $k_5^{(g)} = 0.3$ kcal/(mol Å²), centered at $\xi_1^{(g)} = 5.2$ Å, $\xi_2^{(g)} = 7.7$ Å, $\xi_3^{(g)} = 10.6$ Å, $\xi_4^{(g)} = 13.9$ Å, and $\xi_5^{(g)} = 16.5$ Å. Each window was run for 15 ns, for a total of 75 ns.
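The three window sets can be compared before running anything by a back-of-the-envelope estimate of overlap. The sketch below uses the $w = k(\xi - \xi_i)^2$ form written in the list above (under the $\tfrac12 k$ convention of Eq. 9, every width would be larger by $\sqrt{2}$). It ignores the PMF entirely, so it gives only the restraint-only width of each window. The ratio of neighbor spacing to the summed widths is a hypothetical rule of thumb introduced for this illustration, not a criterion used in the study.

```python
import math

KB = 0.0019872        # Boltzmann constant in kcal/(mol K)
KT = KB * 300.0       # about 0.596 kcal/mol at 300 K

def restraint_width(k):
    """Std. dev. of xi under w = k (xi - xi0)^2 alone (no PMF):
    exp(-k x^2 / kT) is a Gaussian with sigma = sqrt(kT / (2 k))."""
    return math.sqrt(KT / (2.0 * k))

def spacing_ratios(centers, ks):
    """(xi_{i+1} - xi_i) / (sigma_i + sigma_{i+1}) for adjacent windows.
    Values near 1 or below suggest the restraint-only distributions overlap."""
    widths = [restraint_width(k) for k in ks]
    return [(centers[i + 1] - centers[i]) / (widths[i] + widths[i + 1])
            for i in range(len(centers) - 1)]

guided = spacing_ratios([5.2, 7.7, 10.6, 13.9, 16.5], [1.1, 0.11, 0.07, 0.12, 0.3])
unguided_5 = spacing_ratios([3, 7, 11, 15, 19], [0.4] * 5)
unguided_17 = spacing_ratios(list(range(3, 20)), [1.2] * 17)

for name, ratios in [("guided", guided), ("unguided 5", unguided_5), ("unguided 17", unguided_17)]:
    print(name, [round(r, 2) for r in ratios])
```

With these parameters, the guided set gives ratios of roughly 0.8 to 1.15 and the 17-window set about 1.0, whereas the 5-window unguided set gives about 2.3. This is in line with the poor overlap of the 5-window unguided histograms discussed below, though the real sampled widths also depend on the PMF.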

All umbrella sampling distributions were unbiased using a memory efficient implementation of the WHAM [Eqs. 4, 5] by Grossfield.78 PMFs were also generated for the exhaustive and guided umbrella sampling simulations at various times during the simulations in order to compare their evolution in time.

Comparison of the guided umbrella sampling simulations with unguided, regular, equal-restraint umbrella sampling simulations run in 17 and 5 windows showed that the guided method is more effective and requires fewer windows than the unguided one. Comparing the PMFs calculated with the umbrella sampling methods at various simulation times with the PMF from extensive sampling on the unperturbed Hamiltonian shows that only the guided method produces a PMF whose evolution mimics the convergence of the unbiased simulations.

Importantly, however, the PMF obtained from guided umbrella sampling continued to evolve beyond its input, i.e., beyond the PMF obtained from the “extensive” simulations, to a slightly different PMF (differing by up to 0.69 kcal/mol) before converging. This suggests that, even if the experimental PMF is not accurate, a guided simulation based on it can evolve toward a more accurate final PMF.

Comparison of the probability distributions and PMFs. An efficient umbrella sampling simulation typically uses as few restraining windows as possible, with as much overlap between neighboring windows as possible. The histograms for the three umbrella sampling simulations are shown in Fig. 3. While the overall shape looks similar for all three, the histograms of the five-window unguided umbrella sampling overlap very little. This could be improved by adding more windows or altering the restraining potentials, but either change would require more computational time.
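Overlap between neighboring windows can be quantified in several ways. The sketch below uses the overlap coefficient of two normalized histograms on shared bins, $\sum_k \min(P_a[k], P_b[k])$, which is 1 for identical histograms and 0 when they share no occupied bin. This metric is an illustrative assumption; the analysis here relies on visual inspection of the histograms in Fig. 3. The function names and sample values are hypothetical.

```python
import numpy as np

def histogram_overlap(samples_a, samples_b, bins):
    """Overlap coefficient sum_k min(P_a[k], P_b[k]) of two normalized histograms
    on shared bins: 0 when no bin is occupied by both, 1 for identical histograms."""
    counts_a, _ = np.histogram(samples_a, bins=bins)
    counts_b, _ = np.histogram(samples_b, bins=bins)
    pa = counts_a / counts_a.sum()
    pb = counts_b / counts_b.sum()
    return float(np.minimum(pa, pb).sum())

def neighbor_overlaps(window_samples, bins):
    """Overlap between each pair of adjacent windows, ordered along xi."""
    return [histogram_overlap(a, b, bins) for a, b in zip(window_samples, window_samples[1:])]

# Hypothetical samples of xi (Å): identical windows, then fully separated windows.
bins = np.linspace(0.0, 20.0, 81)
a = np.array([4.1, 4.3, 4.6, 5.0])
same = histogram_overlap(a, a, bins)
apart = histogram_overlap(a, a + 10.0, bins)
chain = neighbor_overlaps([a, a, a + 10.0], bins)
```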

Figure 3.


Biased histograms for the pentapeptide in the case of (a) the unguided 17 window umbrella sampling, (b) 5 window unguided umbrella sampling, and (c) 5 window guided umbrella sampling after 10 ns per window for each. Note the poor overlap for the histograms in panel (b) relative to (a), and the recovery of the overlap in (c).

The resulting unbiased probability distributions of the reaction coordinate [i.e., the result of unbiasing the histograms in Fig. 3 with the WHAM equations, Eqs. 4 and 5] were compared quantitatively with the distribution from extensive sampling using a Kolmogorov–Smirnov (K-S) test. The K-S statistic79 is defined as

$$D = \max_{-\infty < \xi < \infty} \left| N_1(\xi) - N_2(\xi) \right|, \qquad (10)$$

where N1(ξ) and N2(ξ) are the cumulative probability distributions for the two sets to be compared; it has been used previously80 as a stringent test for accurately gauging conformational sampling efficiency.
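For two distributions tabulated on the same $\xi$ grid, such as WHAM output, Eq. 10 reduces to the largest absolute difference between the normalized cumulative sums. The helper below is a minimal sketch of that reduction; its name and the toy inputs are hypothetical. For raw samples rather than tabulated distributions, SciPy’s `scipy.stats.ks_2samp` computes the two-sample statistic directly.

```python
import numpy as np

def ks_statistic(p1, p2):
    """Eq. 10 for two distributions tabulated on the same xi grid:
    D = max |N1(xi) - N2(xi)|, with N the normalized cumulative sums."""
    n1 = np.cumsum(p1) / np.sum(p1)
    n2 = np.cumsum(p2) / np.sum(p2)
    return float(np.max(np.abs(n1 - n2)))

same = ks_statistic([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])     # 0.0: identical distributions
shifted = ks_statistic([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])  # 0.5: mass shifted by one bin
```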

The K-S statistics are plotted as a function of simulation time in Fig. 4. A smaller K-S value indicates a higher probability that the two compared data sets are drawn from the same underlying distribution, with a value of 0 indicating identical distributions. The results suggest that guided umbrella sampling approaches the exhaustive sampling distribution much more quickly than 17-window unguided umbrella sampling, while 5-window unguided umbrella sampling never comes as close to the extensive sampling result as the other methods, even after 75 ns. At about 30 ns, guided umbrella sampling reaches a minimum K-S value of 0.025 relative to extensive sampling, after which the K-S value starts to increase. However, when guided umbrella sampling is compared with 17-window unguided umbrella sampling after 170 ns (shown in purple in Fig. 4), its similarity to the unguided result begins to increase at the same time as its similarity to exhaustive sampling begins to decrease. The K-S statistic between the final distributions of the guided and 17-window unguided simulations is 0.018. Visual comparison of the probability curves and PMFs shows that guided umbrella sampling actually resembles exhaustive sampling most closely at about 10 ns. Figure 5a shows the PMF from extensive sampling after the full simulation time, together with the PMFs from the guided and 17-window unguided simulations at the points at which they most closely resembled the guiding extensive sampling, 10 and 68 ns, respectively. The guided umbrella sampling result is the closest to the extensive sampling one. The 5-window unguided umbrella sampling result, also shown after the full 75 ns, never closely resembles the extensive one. The final PMFs of the converged umbrella sampling techniques are compared with extensive sampling in Fig. 5b. At this point the umbrella sampling techniques yield PMFs that are more similar to each other than to the one obtained from extensive sampling.

Figure 4.

K-S test of similarity for the pentapeptide model. The cumulative probability distribution functions for the distance between the sulfur of cysteine and the center of mass of the aromatic ring of tryptophan of the pentapeptide, obtained with the three umbrella sampling methods at different simulation times, were compared to that of the exhaustive simulations at 0.5 μs. The K-S values for the 17 window umbrella sampling, the 5 window umbrella sampling, and the 5 window guided umbrella sampling are shown. The K-S statistics for the guided umbrella sampling at various times compared to the 17 window umbrella sampling at 170 ns are also shown. The lines are meant as guides to the eye. A lower K-S value indicates a higher likelihood that the two distributions are drawn from the same underlying distribution.

Figure 5.

(a) Potentials of mean force for the end-to-end distance of the pentapeptide for the guiding extensive sampling at 0.5 μs and for the umbrella sampling methods at the time they most resemble the extensive run: i.e., at 68 ns for the 17 window unguided run and at 10 ns for the guided run. The 5 window unguided umbrella sampling never fully resembles the extensive simulation, not even after 75 ns. (b) PMFs for the 17 window unguided umbrella sampling after the full 255 ns and the guided umbrella sampling after the full 75 ns compared to 0.5 μs of extensive sampling.

In other words, after reaching a profile very close to that of the extensive sampling, the probability distributions of the guided and 17 window unguided umbrella sampling techniques begin to shift toward the right and eventually resemble each other more than the extensive one. These relationships are shown qualitatively in Fig. 6, which shows the probability distribution of the extensive sampling together with those of the guided umbrella sampling at 10 ns and the 17 window unguided sampling at 68 ns (a), and of the guided sampling at 30 ns and the unguided sampling at 102 ns (b). Figure 6c shows the final 17 window unguided umbrella sampling probability distribution together with those of the guided sampling at 50 and 75 ns. A possible explanation for this drift away from the exhaustive sampling distribution is that there is a transition which is not seen within the 0.5 μs of extensive sampling, i.e., that the latter did not run long enough to converge the correct statistical weights of the ξ values. This possibility seems most likely given the long timescales involved in the relaxation dynamics of many biomolecules of similar size and the fact that the umbrella sampling distributions start out looking similar to the exhaustive one and then gradually shift to a different distribution.

Figure 6.

Comparison of probability curves of pentapeptide end-to-end distance at various simulation times. (a) Exhaustive sampling at 0.5 μs, guided umbrella sampling at 10 ns (2 ns per window), and the 17 window unguided umbrella sampling at 68 ns (4 ns per window). (b) Exhaustive sampling at 0.5 μs, guided umbrella sampling at 30 ns (6 ns per window), and 17 window unguided umbrella sampling at 102 ns (6 ns per window). It can be seen that the umbrella sampling probabilities are beginning to shift to the right. (c) Probability distribution for the unguided 17 window umbrella sampling is shown at 255 ns (15 ns per window) with the guided umbrella sampling at 50 and 75 ns. There is very little difference in the guided umbrella sampling between 50 and 75 ns.

Time scaling of the sampling efficiency: A timing comparison. To test the possibility that there is such a transition, the PMFs for the guided umbrella sampling were compared with those of the guiding extensive simulations at various simulation times using percent similarities and visual comparison. The PMFs appear to line up at several points. Figure 7a shows a plot of the extensive simulation times versus the guided umbrella sampling simulation times that give similar PMFs. An exponential fit to this plot gave $t_{\rm ex} = 1.707\,\exp(0.485\,t_{\rm gu})$, where $t_{\rm ex}$ and $t_{\rm gu}$ are the total extensive sampling time and the total (cumulative) guided umbrella sampling time, respectively; this relation was used to transform the time axis of the guided umbrella sampling simulation. A plot of the guided umbrella sampling PMF as a function of cumulative simulation time is shown in Fig. 7b. The PMF begins to shift after about 12 ns. Figure 7c shows the extensive PMF as a function of cumulative simulation time (purple) overlaid with the first 10 ns of the guided umbrella sampling PMF transformed with the exponential time relationship (red). The two plots line up extremely well, with the exception of a slight decrease in the exhaustive PMF around 12 Å from 50 to 200 ns. The plots do not match before 25 ns because the guided umbrella sampling simulations were started with the peptide in a conformation near the median reaction coordinate for each potential. There is no such time relationship between the PMFs of the 17 window unguided umbrella sampling simulations and those of the extensive simulations.
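The exponential time relation can be applied in both directions: forward, to place a guided PMF on the extensive-sampling time axis (as in Fig. 7c), and inverse, to ask how much guided time corresponds to a given extensive run. The sketch below assumes that both times are in ns and that the fit is a linear regression of ln t_ex on t_gu; the text does not say how the fit was performed, so a nonlinear least-squares fit (e.g., SciPy's `curve_fit`) could yield slightly different constants on real data. All function names are hypothetical.

```python
import numpy as np

A_FIT, B_FIT = 1.707, 0.485  # constants of the exponential fit quoted in the text


def fit_exponential_time_map(t_gu, t_ex):
    """Fit t_ex = a * exp(b * t_gu) by a straight-line fit of ln(t_ex) versus t_gu."""
    slope, intercept = np.polyfit(np.asarray(t_gu, float), np.log(np.asarray(t_ex, float)), 1)
    return float(np.exp(intercept)), float(slope)


def guided_to_extensive(t_gu, a=A_FIT, b=B_FIT):
    """Extensive-sampling time whose PMF matches cumulative guided time t_gu."""
    return a * np.exp(b * np.asarray(t_gu, float))


def extensive_to_guided(t_ex, a=A_FIT, b=B_FIT):
    """Inverse map: cumulative guided time whose PMF matches extensive time t_ex."""
    return np.log(np.asarray(t_ex, float) / a) / b


# Synthetic, noise-free points generated from the quoted fit (not measured data)
t_gu = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
a, b = fit_exponential_time_map(t_gu, guided_to_extensive(t_gu))
print(round(a, 3), round(b, 3))          # recovers 1.707 0.485
print(float(guided_to_extensive(10.0)))  # roughly 218
print(float(extensive_to_guided(500.0)))  # roughly 11.7
```

Under these assumptions, the map equates about 11.7 ns of guided sampling with the full 0.5 μs extensive run, which appears consistent with the guided PMF beginning to shift after about 12 ns.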

Figure 7.

Comparison of the exhaustive and guided umbrella sampling PMFs for the pentapeptide end-to-end distance as a function of simulation time. (a) Plot of exhaustive sampling times vs the guided umbrella sampling times with a similar PMF; exponential fit yields $t_{\rm ex} = 1.707\,\exp(0.485\,t_{\rm gu})$. (b) Guided umbrella sampling PMF as a function of simulation time. The time evolution of the potential of mean force of the reaction coordinate shows that the free energy well begins to narrow after about 25 ns and does not converge until about 45 ns. (c) Overlay of the PMFs from the exhaustive sampling (purple) and the guided umbrella sampling (red). The time of the guided umbrella sampling has been transformed using the exponential from panel (a) in order to make it line up with the exhaustive run results. The two simulations follow the same relative time path with respect to the reaction coordinate.

Sampling of the “perpendicular” coordinates: Dihedral angle distributions. To determine if this method, given a chosen reaction coordinate to restrain, would yield accurate information for degrees of freedom to which no restraining potential was explicitly applied, we also compared the probability distributions of the peptide backbone dihedral angles Φ and Ψ.

Dihedral angle probability curves for all simulations were calculated using a modification of the WHAM equation which addressed the need to unbias the probabilities of reaction coordinates on which the restraining potential entering the perturbed Hamiltonian did not depend explicitly. The unrestrained reaction coordinate, say, a dihedral angle Φ, was unbiased using the equation

$$p(\Phi) = \sum_{\xi} \sum_i n_i\, p_i(\Phi) \left[ \sum_j n_j\, e^{-\left( w_j(\xi) - F_j \right)/k_B T} \right]^{-1}, \qquad (11)$$

where ξ is the restrained reaction coordinate and Φ is the unrestrained coordinate. The probabilities $p$ and the window free energy constants $F_j$ were calculated iteratively using Eqs. (4) and (5).56, 81, 82
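The unbiasing in Eq. (11) can be sketched as a two-step calculation: a standard WHAM iteration along ξ to obtain the constants F_j, followed by reweighting of the Φ histograms bin by bin in ξ. This is a minimal sketch under several assumptions not stated in this section: the restraints are harmonic, w_j(ξ) = ½ k_j (ξ − ξ_j)²; Eqs. (4) and (5), which are not reproduced here, are taken to be the standard WHAM equations; and n_i p_i(Φ) inside the sum over ξ is read as the joint histogram of window i in the (Φ, ξ) bin. Energies must share the units of kT. All function names are hypothetical.

```python
import numpy as np


def harmonic_bias(xi_bins, centers, k):
    """w_j(xi) = 0.5 * k_j * (xi - xi_j)**2, shape (n_windows, n_xi_bins)."""
    centers = np.asarray(centers, float)[:, None]
    k = np.asarray(k, float)[:, None]
    return 0.5 * k * (np.asarray(xi_bins, float)[None, :] - centers) ** 2


def wham_free_energies(counts_xi, bias, kT, max_iter=10000, tol=1e-10):
    """Self-consistent WHAM iteration for the window constants F_j.

    counts_xi[i, l]: samples of window i in xi bin l (i.e. n_i p_i(xi_l)).
    """
    counts_xi = np.asarray(counts_xi, float)
    n = counts_xi.sum(axis=1)          # n_j, samples per window
    numer = counts_xi.sum(axis=0)      # sum_i n_i p_i(xi)
    F = np.zeros(len(n))
    for _ in range(max_iter):
        denom = (n[:, None] * np.exp(-(bias - F[:, None]) / kT)).sum(axis=0)
        p = np.divide(numer, denom, out=np.zeros_like(numer), where=numer > 0)
        p /= p.sum()
        F_new = -kT * np.log((p[None, :] * np.exp(-bias / kT)).sum(axis=1))
        F_new -= F_new[0]              # F_j are defined only up to a common constant
        converged = np.max(np.abs(F_new - F)) < tol
        F = F_new
        if converged:
            break
    return F


def unbias_perpendicular(counts_joint, bias, F, kT):
    """Eq. (11): p(Phi) = sum_xi sum_i n_i p_i(Phi, xi) / sum_j n_j exp(-(w_j(xi) - F_j)/kT).

    counts_joint[i, m, l]: samples of window i in Phi bin m and xi bin l.
    """
    counts_joint = np.asarray(counts_joint, float)
    n = counts_joint.sum(axis=(1, 2))
    denom = (n[:, None] * np.exp(-(bias - F[:, None]) / kT)).sum(axis=0)  # one value per xi bin
    p_phi = (counts_joint.sum(axis=0) / denom[None, :]).sum(axis=1)
    return p_phi / p_phi.sum()
```

A useful self-check is to feed these routines exact biased histograms generated from a known joint distribution of (Φ, ξ): the unbiased p(Φ) should then reproduce that distribution's marginal, even when Φ and ξ are coupled.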

Since the peptide chosen contains only five amino acids, it was feasible to calculate the unbiased PMFs of all the Φ and Ψ dihedral angles. The individual angles from the 0.5 μs long extensive sampling simulation were compared with those from the umbrella sampling methods over 68 ns for the 17 window unguided and 10 ns for the guided umbrella sampling protocols (Fig. 8). The five window unguided umbrella sampling simulation was not compared as it failed to produce a valid distribution for the reaction coordinate.

Figure 8.

Free energy profiles for “perpendicular” degrees of freedom, i.e., the pentapeptide dihedral angles. Exhaustive sampling distribution at 0.5 μs is shown in green, 17 window umbrella sampling (68 ns) in blue, and 5 window guided umbrella sampling (10 ns) in red.

In the particular case of backbone dihedrals for this system, a good amount of similarity is observed for the (unrestrained) dihedral distributions when comparing exhaustive sampling with umbrella sampling. However, there were exceptions, as detailed below. In general, the quality of the match will depend on the amount of coupling between the restrained and unrestrained coordinates. When it can be calculated to convergence, the mutual entropy has been shown to provide a suitable measure of the degree of relationship between two coordinates,82 with a high value indicating that the coordinates are correlated. Such correlation is unlikely for the present pair of coordinate manifolds (end-to-end distance and backbone dihedrals) because of the pliability of protein backbones. This means that guiding along the end-to-end distance will not significantly enhance the sampling of relevant backbone dihedral values. As described below, very rare events in the perpendicular degrees of freedom that are not coupled to the reaction coordinate may still not be seen with this method, and if properties of the perpendicular manifold are desired, then multidimensional umbrella sampling (i.e., sampling along two or more reaction coordinates) should be employed.83 In this regard, it would also be interesting to apply multidimensional umbrella sampling using guiding data from three-color FRET experiments,43, 44 which can report up to three interfluorophore distances.
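The mutual entropy between two coordinates, I(x; y) = Σ p(x, y) ln[p(x, y)/(p(x) p(y))], vanishes for independent coordinates and grows with their coupling. The sketch below is a plain histogram (plug-in) estimator, assumed here for illustration; the estimator of Ref. 82 may differ, and plug-in estimates are biased upward for finite samples, which is one reason convergence matters. For a dihedral angle one would pass samples in radians and, ideally, bins spanning −π to π; periodicity is otherwise ignored. The function name is hypothetical.

```python
import numpy as np


def mutual_entropy(x, y, bins=30):
    """Histogram estimate of the mutual entropy I(x; y), in nats.

    I = sum_ab p(a, b) ln[p(a, b) / (p(a) p(b))]; it is 0 for independent coordinates.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    outer = p_x * p_y                      # product of the marginal distributions
    nz = p_xy > 0                          # empty bins contribute 0 (0 ln 0 = 0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / outer[nz])))


# Toy check: a coordinate paired with itself gives I = H(x), close to ln(10) here
x = np.linspace(0.0, 1.0, 3000)
print(mutual_entropy(x, x, bins=10))  # close to 2.3026
```

Applied to the end-to-end distance and one backbone dihedral from the same trajectory, a value near zero would be consistent with the weak coupling argued above.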

Despite the general agreement, the sampled distributions of some angles differed significantly among the three numerical experiments: the Φ and Ψ angles of alanine, the Ψ angle of glycine, and the Ψ angle of glutamine. The differences in the Ψ angle of alanine among the three techniques are relatively small and may be due in part to the larger range of flexibility of this angle resulting from the small size of the side chain. The same is true of the glycine Ψ angle, although in this case the guided umbrella sampling yielded results closer to those from extensive sampling than the unguided one did. The most significant differences are in the Φ dihedral of alanine and the Ψ dihedral of glutamine. The extensive simulation gives a small well in the PMF of alanine Φ that is not seen with the other sampling methods. This well is the result of a single event during the 0.5 μs of simulation. The other difference that appears to be significant is in the Ψ angle of glutamine (see Fig. 8). The well at about −π∕4 is overpopulated in the 17 window unguided umbrella sampling and not seen in the guided umbrella sampling. In the guided umbrella sampling, this dihedral appears to have partially crossed the barrier but then returned to the more favorable angle. This well also represents occurrences of a rare event, which took place only four times during the extensive simulations. The fact that the nonconvergence of the alanine Φ and glutamine Ψ dihedrals did not affect the convergence of the distribution of ξ further emphasizes that neither of these dihedrals is expected to be coupled to the reaction coordinate (i.e., that integration over the Boltzmann factor of the dihedral energy term contributes only a constant to the expression for the PMF).

Ideally, an efficient umbrella sampling simulation would visit every possible configuration in the directions perpendicular to the reaction coordinate. Unfortunately, given the limited simulation time, this may not always be possible. For example, as seen here, for singular events such as those that produce the differences in the PMFs for the alanine Φ and glutamine Ψ dihedrals, it may not be possible to obtain a completely accurate profile without enhancing sampling in those directions. However, when the coupling between the perpendicular coordinates and the reaction coordinate is weak, this lack of accuracy might not significantly affect the quality of the PMF along the reaction coordinate. A related example concerns the calculation of order parameters that report on the motion of bond vectors in proteins, as measured by solution NMR relaxation experiments.84 Computer simulations of these experiments have shown good agreement with experiment for backbone order parameters (which can be calculated well with a simple harmonic vibrational approximation85, 86, 87 or by an even simpler contact model88). By contrast, for order parameters of side chains (which are the “perpendicular coordinates” in the language of our previous discussion), agreement with experiment is relatively poorer,89, 90 and a more thorough dynamical sampling is needed.91 The greater difficulty in converging side-chain versus backbone parameters arises from the more complex motion of the side chains, but the lack of convergence for side chains does not preclude the calculation of accurate backbone parameters, which indicates relatively limited coupling between the backbone and the side chains, at least as far as the order parameter estimator is concerned.
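For concreteness, the order parameter mentioned here is commonly estimated from a trajectory as the long-time plateau of the internal reorientational correlation function of a bond vector, which for unit vectors u reduces to S² = (3/2) Σ_ab ⟨u_a u_b⟩² − 1/2. The sketch below implements that standard relation; it is not a method from this paper, and it assumes overall molecular tumbling has already been removed (e.g., by superimposing frames). The function name is hypothetical.

```python
import numpy as np


def order_parameter_s2(bond_vectors):
    """Generalized order parameter S^2 of one bond vector from a trajectory.

    bond_vectors: (n_frames, 3) array, e.g. N-H vectors after removing overall
    rotation. Uses S^2 = (3/2) * sum_ab <u_a u_b>^2 - 1/2 for unit vectors u.
    """
    u = np.asarray(bond_vectors, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    second_moment = u.T @ u / len(u)        # 3x3 matrix of averages <u_a u_b>
    return float(1.5 * np.sum(second_moment ** 2) - 0.5)


rigid = np.tile([0.0, 0.0, 1.0], (100, 1))
isotropic = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]])
print(order_parameter_s2(rigid))      # approximately 1 for a rigid bond
print(order_parameter_s2(isotropic))  # approximately 0 for isotropic reorientation
```

Backbone N-H vectors typically sample a narrow cone and converge quickly in such an estimator, whereas side-chain bond vectors can hop between rotameric states, which is the sampling difficulty discussed above.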

Application to titin protein: Unfolding free energy profile

To further test our method on a more complex system, we applied it to the computation of the free energy profile for the forced unfolding of the I27 domain of the muscle protein titin. Because the “force field” used by nature differs from the empirical force field used in the simulations, it was important to test our method with a guiding PMF obtained from a source other than the simulation used to compute the final PMF (unlike the simulated “experimental” PMF of the pentapeptide in the previous section, which was obtained from simulations with the same force field). To this end, we used an actual experimental PMF: the free energy profile for forced unfolding of the I27 domain of titin has recently been determined from single molecule pulling experiments74 by using the Jarzynski equality.92
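The Jarzynski equality relates the work W done in repeated nonequilibrium pulls to an equilibrium free energy difference, ΔF = −kT ln⟨exp(−W/kT)⟩. The sketch below shows only this basic estimator for the free energy difference between the end states of the pulls; reconstructing a full profile along the extension, as in Ref. 74, involves additional reweighting steps that are not shown. The work values in the usage line are hypothetical, and the function name is illustrative.

```python
import numpy as np


def jarzynski_delta_f(work, kT):
    """Free energy difference from repeated nonequilibrium pulls (Jarzynski equality).

    Delta F = -kT ln < exp(-W / kT) >, with W and kT in the same energy units.
    """
    x = -np.asarray(work, dtype=float) / kT
    shift = np.max(x)                       # log-mean-exp trick for numerical stability
    return float(-kT * (shift + np.log(np.mean(np.exp(x - shift)))))


work = [12.0, 13.5, 15.0]              # hypothetical work values, kcal/mol
print(jarzynski_delta_f(work, 0.596))  # about 12.6, below the mean work of 13.5
```

Because the exponential average is dominated by the rare low-work pulls, the estimate always lies at or below the mean work and converges slowly when the work distribution is broad.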

The I27 domain of titin was chosen as a test system for our experimentally guided umbrella sampling method because, in addition to the recently determined PMF, its forced unfolding pathway had previously been studied extensively by both experimental74, 93, 94, 95 and numerical96, 97, 98, 99, 100 methods. Taken together, AFM, protein engineering (by ϕ value mutational analysis), molecular dynamics pulling simulations, and NMR structural considerations present a detailed picture of the unfolding pathway of this domain, which involves breaking a cluster of hydrogen bonds between parallel beta sheets experiencing shear.

The experimentally derived PMF (Ref. 74) revealed a free energy change of 11.4 kcal∕mol between the equilibrium state and a transition state, the latter assumed to lie at a previously determined extension of 6 Å.95 Using this PMF, we ran experimentally guided umbrella sampling simulations on the titin I27 domain using the CHARMM parameters and the implicit solvent model described in the previous section. We compared the final PMF thus obtained with that obtained directly from regular, unguided umbrella sampling simulations. Each umbrella sampling window was equilibrated for 500 ps, followed by a production run of 5 ns. The reaction coordinate r was the deviation of the distance between the N and C termini from its equilibrium value: the equilibrium distance of 46 Å is designated r=0, 47 Å is r=1 Å, and so on up to r=8.5 Å. For guided umbrella sampling, potentials were assigned every 1 or 0.5 Å, with force constants based on fitting to the experimental PMF; the spacing between adjacent windows was determined based on the overlap between the restraining potentials. For unguided umbrella sampling, potentials were assigned every 0.5 Å with a force constant of 5 kcal∕(mol Å²). The total simulation time was 75 ns for guided umbrella sampling and 90 ns for unguided umbrella sampling. The starting configuration for each window was taken from the previous window.

Figure 9a shows the experimentally derived PMF with a sample of the restraining potentials we used. To ensure adequate sampling, potentials were fit to the experimental curve every 1 Å initially and every 0.5 Å where 1 Å did not provide sufficient overlap between the reaction coordinate distributions of adjacent windows. For the fitting, we assumed that the equilibrium native state of titin I27 is 6 Å away from the transition state,95 the value also used by Harris et al.74 to estimate the height of the barrier relative to the folded state. As can be seen from the figure, the force constant of the umbrella potential is low at the low-energy equilibrium state. As the free energy increases, so do the force constants. This makes intuitive sense, as stronger potentials are needed to drive the protein away from its native state. The region of the curve immediately after the transition state cannot be determined directly from experiment, due to snapping of the cantilever after the domain begins to unfold. In this region a constant k value of 5 kcal∕(mol Å²) was used.
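The window-spacing rule described here (start at 1 Å, add a window at 0.5 Å where adjacent distributions overlap too little) can be sketched as follows. The text does not specify the overlap criterion, so the overlap coefficient Σ min(p_a, p_b) and the 0.1 threshold below are hypothetical choices for illustration, as are the function names.

```python
import numpy as np


def histogram_overlap(samples_a, samples_b, bins=50):
    """Overlap coefficient sum_x min(p_a(x), p_b(x)) of two sampled distributions (0 to 1)."""
    lo = min(np.min(samples_a), np.min(samples_b))
    hi = max(np.max(samples_a), np.max(samples_b))
    edges = np.linspace(lo, hi, bins + 1)
    h_a, _ = np.histogram(samples_a, bins=edges)
    h_b, _ = np.histogram(samples_b, bins=edges)
    return float(np.minimum(h_a / h_a.sum(), h_b / h_b.sum()).sum())


def refine_window_centers(centers, window_samples, min_overlap=0.1):
    """Insert a midpoint window wherever adjacent windows overlap too little.

    centers: restraint centers along r (e.g. every 1 A), in increasing order.
    window_samples[i]: sampled r values from the window centered at centers[i].
    min_overlap: hypothetical threshold on the overlap coefficient.
    """
    refined = [centers[0]]
    for i in range(len(centers) - 1):
        if histogram_overlap(window_samples[i], window_samples[i + 1]) < min_overlap:
            refined.append(0.5 * (centers[i] + centers[i + 1]))  # 1 A spacing becomes 0.5 A
        refined.append(centers[i + 1])
    return refined
```

In practice the newly inserted windows would then be simulated and the overlap check repeated, since a stiff restraint in a steep region of the PMF can leave gaps even at 0.5 Å spacing.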

Figure 9.

(A) Experimental PMF and a selection of guiding umbrella sampling potentials used in the experimentally guided umbrella sampling simulations of unfolding of titin I27. As the simulations progress toward higher energy states, the force constant increases. (B) PMF for guided and unguided umbrella sampling simulations. The two methods produce substantially different curves; in contrast to the unguided runs, the guided sampling converges significantly faster, and to a profile consistent with experimental data (see text).

Figure 9b shows the PMFs calculated from the guided umbrella sampling simulation and from the unguided umbrella sampling with a constant k value of 5 kcal∕(mol Å²) every 0.5 Å. The guided umbrella sampling PMF shows an inflection point at 6 Å. While the guided umbrella sampling generates the correct intermediate structure reported to exist at 6 Å, the free energy value at r=6 Å is 26.8 kcal∕mol, considerably larger than the experimental value of 11.4 kcal∕mol.74 This is not entirely surprising, since our unfolding simulations use an implicit solvent model optimized for folded states, and similar differences between free energies calculated by simulation and experiment have been reported previously.101, 102 The lack of explicit-water hydrogen bonding might additionally limit the accuracy of the free energy estimate in the particular case of I27, since water molecules have been shown to bridge the mechanical unfolding transition state of this protein.103

In contrast with the guided PMF, the PMF of the unguided umbrella sampling has a significantly different shape, with an inflection point at 4 Å. In the unguided PMF calculation, the protein had not yet unfolded by 6 Å, even though the computing time spent (90 ns) exceeded that for the guided simulations (75 ns). To check that the PMF obtained by guiding is indeed the converged free energy profile, three additional sets of simulations were run. First, the simulation time of the original 75 ns guided umbrella sampling windows was doubled: no noticeable change in the PMF was observed after a total of 150 ns. Second, we increased the simulation time for the unguided sampling and included in the unguided windows starting structures from the original 75 ns guided sampling: in the absence of any guiding potential, the unguided PMF started to converge toward the 75 ns guided PMF after about 165 ns of total simulation time [see Fig. 9b]. Third, doubling the simulation time of the unguided umbrella sampling with no input from the guided sampling had no significant convergence effect toward the experimental PMF (data not shown).

The equilibrium structure and the snapshots for the two sets of umbrella sampling simulations [Figs. 10a, 10b, 10c, respectively] reveal the structural basis for the differences in the PMFs. In the guided umbrella sampling simulations at r=6 Å, the N-terminal strand of the domain is pulled away from the C terminus. This is in agreement with the unfolding pathway as determined by experimental ϕ values and forced unfolding simulations,97, 99 in which titin I27 initially unfolds to an intermediate state with the N-terminal beta sheet partially separated from the rest of the domain before reaching the transition state, in which the N- and C-terminal strands are completely separated from each other. The unguided umbrella sampling simulations, on the other hand, did not replicate this pathway. At 6 Å, the N-terminal beta strand was stretched but not separated from the domain [see Fig. 10c]. Only at 13 Å (results not shown) had the N-terminal strand begun to pull away, and even then the overall structure looked more like the guided umbrella sampling intermediate at 6 Å than a more extended transition state. Moreover, an H-bond analysis of the purely unguided simulation showed that the unguided restraining windows by themselves did not follow the unfolding pathway indicated by the experimental and forced unfolding studies described above. In contrast, even though the guiding PMF was incomplete beyond the transition state, using the experimental PMF as a guide for simulations improved the accuracy of umbrella sampling in the region of 0–6 Å, including the confirmation of the 6 Å transition intermediate and of the putative structural mechanism underlying unfolding.

Figure 10.

(A) Equilibrated structure of the titin I27 domain. (B) Guided umbrella sampling titin structure for r=6 Å. (C) Unguided umbrella sampling structure for r=6 Å. The dotted lines mark the distance between Cα carbons for residues 3 to 26 and 5 to 24, which are hydrogen bonded in the equilibrium structure. The dissociation of the A strand (in red), residues 3 to 7, is the first step in the forced unfolding pathway, which is correctly reproduced by the guided sampling, but not by the unguided one, even for twice the simulation time (see text).

The application of our guided PMF calculation to I27 unfolding served to show that the method can be applied with experimentally derived data to more complex systems. While a full description of the entire folding free energy surface of the molecule is beyond the scope of the present methodology-oriented paper, the application did reveal that using the experimentally guided PMF yields data on other properties, not “fitted” in the PMF, that are consistent with what is known about the molecule: (i) an inflection point with a free energy change of 2.4 kcal∕mol at 1.5 Å, exactly where the experimental studies of Williams et al. locate a pretransition state from the native state to an on-path intermediate (cf. Fig. 2 in Ref. 95), and (ii) structural data on the transition state consistent with NMR, mutational (ϕ-value analysis), and single molecule experiments. Further simulations of the system of extensive duration are needed to compute the conformational entropy component of the free energy for the unfolded chain beyond the transition point. In addition, an accurate explicit solvent representation will probably be important to account for the role of water in bridging and stabilizing the intermediates identified, consistent with solvent substitution studies.103

CONCLUDING DISCUSSION

We have presented a theoretical analysis of the feasibility of an experimentally guided umbrella sampling calculation protocol. Using guiding restraining potentials based on an experimental free energy profile derived from measured equilibrium distributions of a conformational variable, we suggest a method that is able to “guide” umbrella sampling simulations to a converged potential of mean force along that variable. For the distribution of the end-to-end distance of a pentapeptide model system, the guided umbrella sampling was more efficient than regular, unguided umbrella sampling, requiring fewer windows and giving the same results in less than one-sixth of the total simulation time. This comparison does not count the simulation time of discarded unguided umbrella sampling runs with ineffective restraining potentials. The new method also outperformed an extensive sampling done with regular molecular dynamics of 0.5 μs total time. Our guided umbrella sampling also provided potentials of mean force for unbiased reaction coordinates, the dihedral angles of the peptide, which were similar to those of the extensive sampling, with a few notable exceptions. Given that they did not affect the convergence of the PMF along the restrained coordinate, the exceptions were attributed to negligible coupling between the reaction coordinate values and the respective dihedral angle values.

Our results for the pentapeptide test case also indicated that, with the guided umbrella sampling procedure, there was a transition of the converged free energy profile which went beyond that obtained from the 0.5 μs of direct sampling, but which is seen with regular umbrella sampling. This is consistent with explicit water simulations of the same peptide and others,73 which needed microsecond scale simulations to converge the results of contact formation observables in peptide loops such that they matched the corresponding experimental values.72 Interestingly, although the guided umbrella simulations on the peptide were based on the PMF from the extensive simulations, the distribution of the reaction coordinate began to look more like the 17 window regular umbrella sampling simulations after about 30 ns cumulative time. An implication is that the experimentally guided umbrella sampling protocol we propose will be effective even when the experimental PMF used to determine the restraining potentials is not an exactly accurate PMF. This holds true also for the titin test case, in which experimentally guided umbrella sampling produced better results than the unguided version, despite errors inherent in both the experimental and simulation methods.

The comparison of the guided umbrella sampling and direct exhaustive molecular dynamics of the pentapeptide as a function of simulation time showed that the guided umbrella sampling simulations roughly follow the same time course as the exhaustive simulation result, but did so exponentially faster. The time plots also showed how the guided PMF evolves in time past the profile reached by exhaustive sampling. Similarly, for the forced unfolding of titin I27, guided umbrella sampling simulations followed an unfolding pathway in agreement with previous simulations and experiment,97, 99 whereas umbrella sampling with arbitrarily chosen force constants did not.

Although the usefulness of our proposed technique obviously depends on the availability of reliable experimental data, we believe that the potential applications are promising. We intend to extend this method from its use for equilibrium processes to using it for kinetic processes, i.e., for the calculation of time correlation functions. We are currently applying our guiding method to study the mechanism of folding of CI2, for which the PMF for the distance between two residues in various concentrations of denaturant has been determined using single molecule FRET by Weiss and co-workers.25, 26 While errors in the FRET experiments can lead to inaccuracies in the resulting PMFs,46, 47 our work shows that umbrella sampling is improved even when the guiding potential is not the exact PMF (as was the case for the pentapeptide), or incomplete (as in the case of titin I27).

Having an experimental guiding potential for large, complex biomolecules might prove useful because it can guide sampling into regions of configuration space that would not even be visited by other methods (because of long-time sampling problems). As such, properties of perpendicular degrees of freedom might be revealed that could not have been gauged before. For example, FRET experiments can report on the distance between two residues of CI2 in different folded states, but cannot directly reveal the sequence of folding in the manifold perpendicular to the interfluorophore distance.25, 26 In addition, in comparison to techniques such as steered,104 biased,105 or targeted molecular dynamics,106 which use information about the final state of a system to direct simulations that reveal information about a pathway, this technique can use experimental information about a reaction coordinate to direct simulations in a way that can reveal more detailed information about a system for which details of the final state may not be known. As studies on several proteins, including titin,107 indicate, this may be important for discerning between unfolding by pulling at one end through a pore and unfolding by pulling at both ends,41 or for assessing the dependence on the direction of pulling.108

The insights derived from the PMF curve, an inherent equilibrium calculation, can be further enhanced by advanced dynamical sampling techniques such as transition path sampling.109, 110, 111 This powerful method for calculating kinetics—whose use in biomolecular applications has been reported112, 113, 114—can be applied with transition points from the structures generated from the guided simulations to obtain dynamical trajectories that span the length between the initial and final states of the PMF. These trajectories can probe if the dynamics progresses as expected from the equilibration assumption inherent to the PMF calculations (i.e., to probe whether the choice of the reaction coordinate as the slowest order parameter was appropriate,115 or if coupling to other degrees of freedom is significant116, 117). Because transition path sampling is a method that does not require the definition of a reaction coordinate, the ensembles of dynamical transition trajectories generated by it (or by other long-time dynamical methods118, 119, 120, 121, 122, 123) can be grouped together and further analyzed to obtain information about distinct reaction pathways.

In the same vein, there may be exciting applications related to heterogeneity and the apparent nonergodicity124 observed in single molecule studies of complex molecules. For example, in the case of the hairpin ribozyme, single molecule FRET revealed complex structural dynamics that would have been difficult to detect in ensemble measurements. This ribozyme, which undergoes a conformational change from an undocked to a docked conformation, has several distinct docked∕undocked substates with different rates of undocking, and the ribozyme exhibited a memory effect, in which switching between different docked substates was rarely seen.125 This apparent non-Markovian memory effect in the hairpin ribozyme is presumably due to different conformations of two loops within the molecule.125 These different conformations cannot be directly detected experimentally, but could be probed in a molecular dynamics simulation that is guided along a distinct experimental (single molecule) PMF.

ACKNOWLEDGMENTS

M.M. was partially supported by an NIH Molecular Biophysics Predoctoral Research Training Program. I.A. acknowledges support from the NSF CAREER award program (CHE-0548047) and the donors of the ACS Petroleum Research Foundation.

References

  1. Kirkwood J. G., J. Chem. Phys. 3, 300 (1935). doi:10.1063/1.1749657
  2. Roux B. and Karplus M., Biophys. J. 59, 961 (1991).
  3. Roux B., Comput. Phys. Commun. 91, 275 (1995). doi:10.1016/0010-4655(95)00053-I
  4. Pratt L. R. and Chandler D., J. Chem. Phys. 67, 3683 (1977). doi:10.1063/1.435308
  5. Gao J., J. Am. Chem. Soc. 113, 7796 (1991). doi:10.1021/ja00020a070
  6. Hinsen K. and Roux B., J. Chem. Inf. Comput. Sci. 37, 1018 (1997). doi:10.1021/ci9702398
  7. Smondyrev A. and Voth G., Biophys. J. 82, 1460 (2002).
  8. Chen H., Wu Y., and Voth G., Biophys. J. 93, 3470 (2007). doi:10.1529/biophysj.107.105742
  9. Roux B., Allen T., Berneche S., and Im W., Q. Rev. Biophys. 37, 15 (2004). doi:10.1017/S0033583504003968
  10. Pettitt B. M. and Karplus M., Chem. Phys. Lett. 121, 194 (1985). doi:10.1016/0009-2614(85)85509-3
  11. Tobias D. J. and Brooks C. L. III, J. Phys. Chem. 96, 3864 (1992). doi:10.1021/j100188a054
  12. Boczko E. and Brooks C., Science 269, 393 (1995). doi:10.1126/science.7618103
  13. Huang N., Banavali N., and MacKerell A. D., Jr., Proc. Natl. Acad. Sci. U.S.A. 100, 68 (2003). doi:10.1073/pnas.0135427100
  14. Hart K., Nystrom B., Ohman M., and Nilsson L., RNA 11, 609 (2005). doi:10.1261/rna.7147805
  15. Wereszczynski J. and Andricioaei I., Proc. Natl. Acad. Sci. U.S.A. 103, 16200 (2006). doi:10.1073/pnas.0603850103
  16. Carter E., Ciccotti G., Hynes J., and Kapral R., Chem. Phys. Lett. 156, 472 (1989). doi:10.1016/S0009-2614(89)87314-2
  17. Sprik M. and Ciccotti G., J. Chem. Phys. 109, 7737 (1998). doi:10.1063/1.477419
  18. Kuczera K., J. Comput. Chem. 17, 1726 (1996).
  19. Wang Y. and Kuczera K., J. Phys. Chem. B 101, 5205 (1997). doi:10.1021/jp964027+
  20. Li G. and Cui Q., J. Mol. Graphics Modell. 24, 82 (2005). doi:10.1016/j.jmgm.2005.06.001
  21. Andrews D. and Demidov A., Resonance Energy Transfer (Wiley, New York, 1999).
  22. Cooper A., Proc. Natl. Acad. Sci. U.S.A. 73, 2740 (1976). doi:10.1073/pnas.73.8.2740
  23. Hill T. L., Thermodynamics of Small Systems (Dover, New York, 1994).
  24. Bustamante C., Liphardt J., and Ritort F., Phys. Today 58, 43 (2005).
  25. Deniz A. A., Laurence T. A., Beligere G. S., Dahan M., Martin A. B., Chemla D. S., Dawson P. E., Schultz P. G., and Weiss S., Proc. Natl. Acad. Sci. U.S.A. 97, 5179 (2000). doi:10.1073/pnas.090104997
  26. Weiss S., Nat. Struct. Biol. 7, 724 (2000). doi:10.1038/78941
  27. Schuler B. and Eaton W., Curr. Opin. Struct. Biol. 18, 16 (2008). doi:10.1016/j.sbi.2007.12.003
  28. Walter N. G., Harris D. A., Pereira M. J. B., and Rueda D., Biopolymers 61, 224 (2001). doi:10.1002/bip.10144
  29. Carrion-Vazquez M., Oberhauser A. F., Fisher T. E., Marszalek P. E., Li H., and Fernandez J. M., Prog. Biophys. Mol. Biol. 74, 63 (2000). doi:10.1016/S0079-6107(00)00017-1
  30. Bustamante C., Macosko J. C., and Wuite G. J. L., Nat. Rev. Mol. Cell Biol. 1, 130 (2000).
  31. Strick T. R., Dessinges M. N., Charvin G., Dekker N. H., Allemand J. F., Bensimon D., and Croquette V., Rep. Prog. Phys. 66, 1 (2003). doi:10.1088/0034-4885/66/1/201
  32. Hummer G. and Szabo A., Proc. Natl. Acad. Sci. U.S.A. 98, 3659 (2001). doi:10.1073/pnas.071034098
  33. MacFadyen J. and Andricioaei I., J. Chem. Phys. 123, 074107 (2005). doi:10.1063/1.2000242
  34. Schlierf M. and Rief M., Biophys. J. 90, L33 (2006). doi:10.1529/biophysj.105.077982
  35. Dudko O. K., Hummer G., and Szabo A., Phys. Rev. Lett. 96, 108101 (2006). doi:10.1103/PhysRevLett.96.108101
  36. West D., Olmsted P., and Paci E., J. Chem. Phys. 125, 204910 (2006). doi:10.1063/1.2393232
  37. Nummela J., Yassin F., and Andricioaei I., J. Chem. Phys. 128, 024104 (2008). doi:10.1063/1.2817332
  38. Nummela J. and Andricioaei I., Biophys. J. 93, 3373 (2007). doi:10.1529/biophysj.107.111658
  39. Kirmizialtin S., Huang L., and Makarov D. E., J. Chem. Phys. 122, 234915 (2005). doi:10.1063/1.1931659
  40. Matouschek A., Curr. Opin. Struct. Biol. 13, 98 (2003). doi:10.1016/S0959-440X(03)00010-1
  41. Tian P. and Andricioaei I., J. Mol. Biol. 350, 1017 (2005). doi:10.1016/j.jmb.2005.05.035
  42. Woodside M. T., Anthony P. C., Behnke-Parks W. M., Larizadeh K., Herschlag D., and Block S. M., Science 314, 1001 (2006). doi:10.1126/science.1133601
  43. Hohng S., Joo C., and Ha T., Biophys. J. 87, 1328 (2004). doi:10.1529/biophysj.104.043935
  44. Clamme J. P. and Deniz A., ChemPhysChem 6, 74 (2005). doi:10.1002/cphc.200400261
  45. Ting C. L. and Makarov D. E., J. Chem. Phys. 128, 115102 (2008). doi:10.1063/1.2835611
  46. Schuler B., Lipman E., and Eaton W., Nature (London) 419, 743 (2002). doi:10.1038/nature01060
  47. Gopich I. and Szabo A., J. Phys. Chem. B 107, 16111 (2003). doi:10.1021/jp027481o
  48. Karplus M. and McCammon J. A., Nat. Struct. Biol. 9, 646 (2002). doi:10.1038/nsb0902-646
  49. Palmer R., Adv. Phys. 31, 669 (1982). doi:10.1080/00018738200101438
  50. Torrie G. M. and Valleau J. P., J. Comput. Phys. 23, 187 (1977). doi:10.1016/0021-9991(77)90121-8
  51. Ferrenberg A. M. and Swendsen R. H., Phys. Rev. Lett. 63, 1195 (1989). doi:10.1103/PhysRevLett.63.1195
  52. Ferrenberg A. M., Phys. Rev. Lett. 63, 1658 (1989).
  53. Kumar S., Bouzida D., Swendsen R. H., Kollman P. A., and Rosenberg J. M., J. Comput. Chem. 13, 1011 (1992). doi:10.1002/jcc.540130812
  54. Mezei M., J. Comput. Phys. 68, 237 (1987). doi:10.1016/0021-9991(87)90054-4
  55. Hooft R. W. W., van Eijck B. P., and Kroon J., J. Chem. Phys. 97, 6690 (1992). doi:10.1063/1.463947
  56. Bartels C. and Karplus M., J. Comput. Chem. 18, 1450 (1997).
  57. Bartels C. and Karplus M., J. Phys. Chem. B 102, 865 (1998). doi:10.1021/jp972280j
  58. Schaefer M., Bartels C., and Karplus M., J. Mol. Biol. 284, 835 (1998). doi:10.1006/jmbi.1998.2172
  59. Darve E., Wilson M. A., and Pohorille A., Mol. Simul. 28, 113 (2002). doi:10.1080/08927020211975
  60. Laio A. and Parrinello M., Proc. Natl. Acad. Sci. U.S.A. 99, 12562 (2002). doi:10.1073/pnas.202427399
  61. Berg B. A. and Neuhaus T., Phys. Lett. B 267, 249 (1991). doi:10.1016/0370-2693(91)91256-U
  62. Berg B. A. and Neuhaus T., Phys. Rev. Lett. 68, 9 (1992). doi:10.1103/PhysRevLett.68.9
  63. Hansmann U. H. E. and Okamoto Y., J. Comput. Chem. 14, 1333 (1993). doi:10.1002/jcc.540141110
  64. Wang F. and Landau D., Phys. Rev. Lett. 86, 2050 (2001). doi:10.1103/PhysRevLett.86.2050
  65. Lee J., Phys. Rev. Lett. 71, 211 (1993). doi:10.1103/PhysRevLett.71.211
  66. Rathore N., Knotts T., and de Pablo J., J. Chem. Phys. 118, 4285 (2003). doi:10.1063/1.1542598
  67. Shell M., Debenedetti P., and Panagiotopoulos A., J. Chem. Phys. 119, 9406 (2003). doi:10.1063/1.1615966
  68. Kim J., Straub J., and Keyes T., Phys. Rev. Lett. 97, 050601 (2006). doi:10.1103/PhysRevLett.97.050601
  69. Beutler T. C. and van Gunsteren W. F., J. Chem. Phys. 100, 1492 (1994). doi:10.1063/1.466628
  70. Best R. B. and Vendruscolo M., J. Am. Chem. Soc. 126, 8090 (2004). doi:10.1021/ja0396955
  71. Vendruscolo M., Paci E., Dobson C. M., and Karplus M., J. Am. Chem. Soc. 125, 15686 (2003). doi:10.1021/ja036523z
  72. Lapidus L. J., Eaton W. A., and Hofrichter J., Proc. Natl. Acad. Sci. U.S.A. 97, 7220 (2000). doi:10.1073/pnas.97.13.7220
  73. Yeh I. C. and Hummer G., J. Am. Chem. Soc. 124, 6563 (2002). doi:10.1021/ja025789n
  74. Harris N. C., Song Y., and Kiang C.-H., Phys. Rev. Lett. 99, 068101 (2007). doi:10.1103/PhysRevLett.99.068101
  75. Kobrak M. N., J. Comput. Chem. 24, 1437 (2003). doi:10.1002/jcc.10313
  76. Brooks B. R., Bruccoleri R. E., Olafson B. D., States D. J., Swaminathan S., and Karplus M., J. Comput. Chem. 4, 187 (1983). doi:10.1002/jcc.540040211
  77. Schaefer M. and Karplus M., J. Phys. Chem. 100, 1578 (1996). doi:10.1021/jp9521621
  78. Grossfield A., source code from http://membrane.urmc.rochester.edu/index.html.
  79. Press W., Flannery B., Teukolsky S. A., and Vetterling W. T., Numerical Recipes in FORTRAN 77: The Art of Scientific Computing (Cambridge University Press, Cambridge, 1992).
  80. Andricioaei I. and Straub J. E., J. Chem. Phys. 107, 9117 (1997). doi:10.1063/1.475203
  81. Gallicchio E., Andrec M., Felts A. K., and Levy R. M., J. Phys. Chem. B 109, 6722 (2005). doi:10.1021/jp045294f
  82. Minh D. D. L., J. Phys. Chem. B 111, 4137 (2007). doi:10.1021/jp068656n
  83. Kumar S., Rosenberg J. M., Bouzida D., Swendsen R. H., and Kollman P. A., J. Comput. Chem. 16, 1339 (1995). doi:10.1002/jcc.540161104
  84. Lipari G., Szabo A., and Levy R. M., Nature (London) 300, 197 (1982). doi:10.1038/300197a0
  85. Henry E. R. and Szabo A., J. Chem. Phys. 82, 4753 (1985). doi:10.1063/1.448692
  86. Brüschweiler R. and Case D. A., Phys. Rev. Lett. 72, 940 (1994). doi:10.1103/PhysRevLett.72.940
  87. Sunada S., Go N., and Koehl P., J. Chem. Phys. 104, 4768 (1996). doi:10.1063/1.471170
  88. Zhang F. and Brüschweiler R., J. Am. Chem. Soc. 124, 12654 (2002). doi:10.1021/ja027847a
  89. Smith L. J., Mark A. E., Dobson C. M., and van Gunsteren W. F., Biochemistry 34, 10918 (1995). doi:10.1021/bi00034a026
  90. Ming D. and Brüschweiler R., J. Biomol. NMR 29, 363 (2004). doi:10.1023/B:JNMR.0000032612.70767.35
  91. Best R., Clarke J., and Karplus M., J. Mol. Biol. 349, 185 (2005). doi:10.1016/j.jmb.2005.03.001
  92. Jarzynski C., Phys. Rev. Lett. 78, 2690 (1997). doi:10.1103/PhysRevLett.78.2690
  93. Marszalek P. E., Lu H., Li H. B., Carrion-Vazquez M., Oberhauser A. F., Schulten K., and Fernandez J. M., Nature (London) 402, 100 (1999). doi:10.1038/47083
  94. Li H., Oberhauser A. F., Fowler S. B., Clarke J., and Fernandez J. M., Proc. Natl. Acad. Sci. U.S.A. 97, 6527 (2000). doi:10.1073/pnas.120048697
  95. Williams P. M., Fowler S. B., Best R. B., Toca-Herrera J. L., Scott K. A., Steward A., and Clarke J., Nature (London) 422, 446 (2003). doi:10.1038/nature01517
  96. Lu H. and Schulten K., Biophys. J. 79, 51 (2000).
  97. Fowler S. B., Best R. B., Herrera J. L. T., Rutherford T. J., Steward A., Paci E., Karplus M., and Clarke J., J. Mol. Biol. 322, 841 (2002). doi:10.1016/S0022-2836(02)00805-7
  98. Li P. C. and Makarov D. E., J. Chem. Phys. 119, 9260 (2003). doi:10.1063/1.1615233
  99. Best R. B., Fowler S. B., Herrera J. L. T., Steward A., Paci E., and Clarke J., J. Mol. Biol. 330, 867 (2003). doi:10.1016/S0022-2836(03)00618-1
  100. Hummer G. and Szabo A., Biophys. J. 85, 5 (2003).
  101. Zhou R. H. and Berne B. J., Proc. Natl. Acad. Sci. U.S.A. 99, 12777 (2002). doi:10.1073/pnas.142430099
  102. Nymeyer H. and Garcia A. E., Proc. Natl. Acad. Sci. U.S.A. 100, 13934 (2003). doi:10.1073/pnas.2232868100
  103. Dougan L., Feng G., Lu H., and Fernandez J., Proc. Natl. Acad. Sci. U.S.A. 105, 3185 (2008). doi:10.1073/pnas.0706075105 (Retracted.)
  104. Isralewitz B., Gao M., and Schulten K., Curr. Opin. Struct. Biol. 11, 224 (2001). doi:10.1016/S0959-440X(00)00194-9
  105. Paci E. and Karplus M., J. Mol. Biol. 288, 441 (1999). doi:10.1006/jmbi.1999.2670
  106. Schlitter J., Engels M., and Kruger P., J. Mol. Graphics 12, 84 (1994). doi:10.1016/0263-7855(94)80072-3
  107. Sato T., Esaki M., Fernandez J. M., and Endo T., Proc. Natl. Acad. Sci. U.S.A. 102, 17999 (2005). doi:10.1073/pnas.0504495102
  108. West D. K., Brockwell D. J., and Paci E., Biophys. J. 91, L51 (2006). doi:10.1529/biophysj.106.089490
  109. Dellago C., Bolhuis P. G., Csajka F. S., and Chandler D., J. Chem. Phys. 108, 1964 (1998). doi:10.1063/1.475562
  110. Dellago C., Bolhuis P. G., and Chandler D., J. Chem. Phys. 110, 6617 (1999). doi:10.1063/1.478569
  111. Dellago C., Bolhuis P. G., and Geissler P. L., Adv. Chem. Phys. 123, 1 (2002). doi:10.1002/0471231509.ch1
  112. Bolhuis P. G., Proc. Natl. Acad. Sci. U.S.A. 100, 12129 (2003). doi:10.1073/pnas.1534924100
  113. Marti J. and Csajka F., Phys. Rev. E 69, 061918 (2004). doi:10.1103/PhysRevE.69.061918
  114. Radhakrishnan R. and Schlick T., Proc. Natl. Acad. Sci. U.S.A. 101, 5970 (2004). doi:10.1073/pnas.0308585101
  115. Bolhuis P. G., Dellago C., and Chandler D., Proc. Natl. Acad. Sci. U.S.A. 97, 5877 (2000). doi:10.1073/pnas.100127697
  116. Ma A., Nag A., and Dinner A. R., J. Chem. Phys. 124, 144911 (2006). doi:10.1063/1.2183768
  117. Best R., Paci E., Hummer G., and Dudko O., J. Phys. Chem. B 112, 5968 (2008). doi:10.1021/jp075955j
  118. Olender R. and Elber R., J. Chem. Phys. 105, 9299 (1996). doi:10.1063/1.472727
  119. Elber R., Meller J., and Olender R., J. Phys. Chem. B 103, 899 (1999). doi:10.1021/jp983774z
  120. Faradjian A. K. and Elber R., J. Chem. Phys. 120, 10880 (2004). doi:10.1063/1.1738640
  121. Moroni D., Bolhuis P. G., and van Erp T. S., J. Chem. Phys. 120, 4055 (2004). doi:10.1063/1.1644537
  122. Voter A. F., Phys. Rev. B 57, R13985 (1998). doi:10.1103/PhysRevB.57.R13985
  123. Shirts M. R. and Pande V. S., Phys. Rev. Lett. 86, 4983 (2001). doi:10.1103/PhysRevLett.86.4983
  124. Xie Z., Srividya N., Sosnick T. R., Pan T., and Scherer N. F., Proc. Natl. Acad. Sci. U.S.A. 101, 534 (2004). doi:10.1073/pnas.2636333100
  125. Zhuang X. W., Kim H., Pereira M. J. B., Babcock H. P., Walter N. G., and Chu S., Science 296, 1473 (2002). doi:10.1126/science.1069013

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
