The Journal of Chemical Physics. 2021 Mar 15;154(11):115101. doi: 10.1063/5.0040949

Estimating transition path times and shapes from single-molecule photon trajectories: A simulation analysis

Grace H Taumoefolau, Robert B Best
PMCID: PMC7963489  PMID: 33752373

Abstract

In a two-state molecular system, transition paths comprise the portions of trajectories during which the system transits from one stable state to the other. Because transition paths are so sparsely populated, it is essentially impossible to obtain information on them from experiments on a large sample of molecules. However, single-molecule experiments such as laser optical tweezers or Förster resonance energy transfer (FRET) spectroscopy have allowed transition-path durations to be estimated. Here, we use molecular simulations to test the maximum likelihood methodology used to obtain information on transition paths from single-molecule FRET, by generating photon trajectories from the distance trajectories obtained in the simulations. Encouragingly, we find that this maximum likelihood analysis yields transition-path times within a factor of 2–4 of the values estimated using a good reaction coordinate for folding, but it tends to underestimate them systematically. The underestimation can be attributed partly to the fact that most of the change in the end–end distance occurs early in a folding trajectory. However, even if the transfer efficiency were a good reaction coordinate for folding, the assumption that the transition-path shape is a step function would still lead to an underestimation of the transition-path time as defined here. We find that allowing more flexibility in the form of the transition-path model yields more accurate transition-path times and points the way toward further improvements in methods for estimating transition-path times and shapes.

I. INTRODUCTION

One of the strengths of single-molecule experiments is their ability to directly resolve subpopulations of molecules in different states, which might be lost in an ensemble-averaged signal.1 This allows a much less ambiguous interpretation than might be possible in an ensemble experiment, but it is hardly a unique feature of single-molecule measurements: many types of spectroscopy performed on a macroscopic sample can yield a similar breakdown into subspecies. The real power of single-molecule techniques comes from their ability to monitor the properties [e.g., transfer efficiency in single-molecule Förster resonance energy transfer (FRET) experiments or molecular length in single-molecule force experiments] of a single molecule over a period of time.2–8 This permits the elucidation of mechanism (e.g., on-pathway vs off-pathway intermediates) but, more generally, allows molecular trajectories to be monitored directly, something otherwise possible only with molecular simulation.

Many biomolecules are known to fold cooperatively or to undergo conformational changes, such as those between folded and unfolded or between active and inactive states. Thus, molecular trajectories often appear to spend most of their time fluctuating within one of these stable states, with only occasional transitions between them.9,10 Such transition paths (TPs) arise from crossings of the free energy barriers between states and are usually so short that they are challenging to resolve, even with the best instrumentation. However, these fleeting events ultimately contain all the information about the mechanism by which a transition occurs11,12 and have therefore been the subject of intense theoretical interest.13–20 The first property of a transition path that can be characterized is its duration, or "transition-path time." Because transition paths are so brief that they challenge the time resolution of experiments, characterizing transition-path times has been the main focus of experimental FRET studies to date (pulling experiments have been used to characterize other properties, such as transition-path velocities and shapes21,22).

The most extensive experimental work on transition paths has been performed with optical tweezers by Woodside and co-workers5,6,21,23,24 and with single-molecule FRET methods pioneered by Chung, Eaton, and Schuler.2–4,7,8,25 The tweezer experiments have focused mainly on transitions in nucleic acids. There, the main experimental challenge is to make the instrument responsive enough that it does not interfere with the measurement while keeping noise at a manageable level.26 In FRET experiments, the challenge arises because the number of photons that can be detected per unit time is limited by the need to avoid the increased photodamage and emission saturation associated with high laser intensities.27 Because this has so far limited the typical interphoton time to 10–100 µs, determining FRET transfer efficiencies from the ratio of acceptor photons to total photons detected within discrete time bins is not feasible: it would limit the time resolution to the millisecond time scale, much longer than typical transition paths for protein folding2 (although transition paths for nucleic acid folding tend to be longer6,25). Instead, detecting transition paths in smFRET experiments has required the development of maximum likelihood methods that estimate the most likely transition-path time given a string of detected photons, or photon trajectory.2 Such models must make simplifications to limit the number of parameters; however, it is unclear how closely the estimated transition-path times correspond to those that would be obtained if complete coordinate information were available.
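As a rough illustration of why binning fails (our sketch, not part of the original analysis), the binomial standard error of E from the photons in one time bin can be computed directly. It assumes Poisson photon arrivals, no background or crosstalk, and ideal detection; `binned_efficiency_error` is a hypothetical helper. At a 10 µs interphoton time, pinning E = 0.5 down to about ±0.05 takes roughly 100 photons, i.e., a bin of about 1 ms.

```python
import math


def binned_efficiency_error(efficiency: float, interphoton_time_us: float,
                            bin_width_us: float) -> float:
    """Binomial standard error of E estimated from the photons in one time bin.

    Assumes each detected photon is independently an acceptor photon with
    probability E (no background, crosstalk, or detection-efficiency corrections).
    """
    mean_photons = bin_width_us / interphoton_time_us
    return math.sqrt(efficiency * (1.0 - efficiency) / mean_photons)


E = 0.5  # mid-range efficiency, where the binomial error is largest
for tau_photon in (10.0, 100.0):            # interphoton times quoted in the text (us)
    for width in (100.0, 1000.0, 10000.0):  # candidate bin widths (us)
        err = binned_efficiency_error(E, tau_photon, width)
        print(f"interphoton {tau_photon:5.0f} us, bin {width:7.0f} us: "
              f"{width / tau_photon:6.0f} photons, SE(E) = {err:.3f}")
```

The standard error falls only as the inverse square root of the photon count, so each tenfold gain in precision costs a hundredfold longer bin.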

Here, we set out to evaluate the approximations involved in estimating transition-path times from photon trajectories using a molecular simulation approach. Folding trajectories are generated with a structure-based model for three proteins; these trajectories constitute the "true" events we are trying to describe. Next, stochastic photon trajectories are generated from the molecular trajectories using the fluctuating transfer efficiency and a photon flux typical of experiments. We initially analyze these photon trajectories using the simplest, "step" model of the transition path. Encouragingly, the extracted maximum likelihood transition-path times are within a factor of 2–4 of the transition-path time measured by projecting the trajectories onto a good reaction coordinate. However, the transition-path times estimated from the photon trajectories are systematically shorter than those estimated directly from the reaction coordinate projections. While this is partly because the FRET efficiency is not an ideal reaction coordinate for folding, we find that an equally important reason is the deviation of the transition-path "shape" in the simulations from the idealized step function assumed in the analysis. We investigate ways in which this analysis might be improved, especially if the photon flux in experiments could be increased.

II. METHODS

A. Coarse-grained simulations

To simulate the dynamics of the proteins α3D (PDB ID: 2A3D), CspTm (PDB ID: 1G6P), and GB1 (PDB ID: 1PGB), we construct coarse-grained Cα Gō models from their respective solution nuclear magnetic resonance (NMR) and crystal structures. Each residue is represented by a single bead centered at its α-carbon and connected to its neighbors by bonds whose lengths are taken from the native structure. Residue pairs in contact via hydrogen bonds or hydrophobic interactions in the native state interact through an attractive potential, while all other residue pairs experience a repulsive potential.28 We employ this minimalist model so that we can simulate trajectories over biologically relevant time scales (∼50 ms), from which protein folding and unfolding rates can be computed directly. Simulations are performed with the Langevin dynamics integrator in Gromacs (version 4.0.5)29 near the transition temperature of each protein in the model (270, 310, and 300 K for α3D, CspTm, and GB1, respectively). An integration time step of 10 fs and a friction coefficient of 0.2 ps⁻¹ are used, and frames are written out every 10 000 steps.

For each frame, the fraction of native contacts (Q) and instantaneous FRET efficiency E are calculated. The fraction of native contacts is defined as

$$Q = \frac{1}{N} \sum_{(i,j) \in \text{contacts}} \frac{1}{1 + e^{\beta\,(r_{ij} - \lambda r_{ij}^{(0)})}},$$

where the sum runs over the N native contact pairs (i, j), β = 50 nm⁻¹ controls the steepness of the switching function, r_ij and r_ij^(0) are the distances between residues i and j in the conformation of interest and in the native state, respectively, and λ = 1.2 allows for distance fluctuations within the native state.
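A minimal NumPy/SciPy sketch of the Q definition for a single frame follows. The function name, array layout, and the toy four-bead example are our assumptions; in practice the native contact list comes from the Gō model. `scipy.special.expit` is used only to evaluate 1/(1 + e^x) without overflow for widely separated pairs.

```python
import numpy as np
from scipy.special import expit


def fraction_native_contacts(coords, native_coords, contacts, beta=50.0, lam=1.2):
    """Fraction of native contacts Q for one frame.

    coords, native_coords: (n_residues, 3) arrays of bead positions in nm.
    contacts: (N, 2) integer array of native contact pairs (i, j).
    beta (nm^-1) and lam default to the values quoted in the text.
    """
    contacts = np.asarray(contacts)
    i, j = contacts[:, 0], contacts[:, 1]
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    r0 = np.linalg.norm(native_coords[i] - native_coords[j], axis=1)
    # 1 / (1 + exp(beta*(r - lam*r0))) written as expit(-x) to avoid overflow
    return float(np.mean(expit(-beta * (r - lam * r0))))


# Toy example (hypothetical): four beads on a line, 0.5 nm apart, two "native" contacts.
native = np.array([[0.0, 0, 0], [0.5, 0, 0], [1.0, 0, 0], [1.5, 0, 0]])
pairs = np.array([[0, 1], [2, 3]])
q_native = fraction_native_contacts(native, native, pairs)        # about 0.993
q_stretched = fraction_native_contacts(3 * native, native, pairs)  # essentially 0
```

Note that Q is slightly below 1 even in the native structure, because the switching function evaluated at r = r0 is 1/(1 + e^{−β(λ−1)r0}).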

The FRET efficiency is defined as

$$E(r) = \frac{1}{1 + (r/R_0)^6},$$

where r is the distance between the donor and acceptor dyes and R0 is the Förster distance. In our analysis, R0 = 5.4 nm, the Förster distance for the Alexa 488/Alexa 594 dye pair. To mimic the donor and acceptor dyes conjugated to each protein, we attach two sets of five beads, each with a negatively charged end, to the residues corresponding to the conjugation sites.
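The efficiency formula is easy to evaluate directly. The sketch below (hypothetical helper name `fret_efficiency`) uses R0 = 5.4 nm and shows how steeply E varies around R0: between R0/2 and 2R0 it falls from about 0.98 to about 0.015.

```python
import numpy as np

R0_NM = 5.4  # Förster distance quoted in the text for Alexa 488/594 (nm)


def fret_efficiency(r_nm, r0_nm=R0_NM):
    """Instantaneous transfer efficiency E(r) = 1 / (1 + (r/R0)^6).

    Orientation effects are ignored (kappa^2 = 2/3 is implicit in R0).
    Accepts a scalar or an array of donor-acceptor distances in nm.
    """
    r = np.asarray(r_nm, dtype=float)
    return 1.0 / (1.0 + (r / r0_nm) ** 6)


distances = np.array([2.7, 5.4, 10.8])   # R0/2, R0, 2*R0
vals = fret_efficiency(distances)        # approx. [0.985, 0.5, 0.015]
print(vals)
```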

We determine folding and unfolding rates from each protein's dwell times in the corresponding equilibrium states. To correct for the faster dynamics typical of coarse-grained simulations, we rescale simulation time so that the folding/unfolding rates at the midpoint calculated from simulations match those from experiments. Finally, we directly calculate the "true" transition path time, defined as the time the protein takes to travel from a low-Q threshold to a high-Q threshold, or vice versa. These thresholds are placed on either side of the free energy barrier, near the local minima of the equilibrium distribution: 0.31 and 0.78 for α3D, 0.10 and 0.80 for CspTm, and 0.15 and 0.85 for GB1. For GB1 under a constant pulling force of 10 pN between residues 1 and 56 at 280 K (described below), the low-Q and high-Q thresholds were 0.1 and 0.85, respectively. Because these simulations start from the native structure, we largely sample unfolding transition paths when a pulling force is applied.
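The two-threshold definition of a transition path can be sketched as a single pass over a Q trajectory. This is our illustration, not the authors' code: the function name and the convention that a path runs from the last frame in the starting basin to the first frame in the other basin are assumptions, and `TIME_SCALE` is a hypothetical placeholder for the protein-specific rescaling factor, which the text does not give.

```python
import numpy as np


def transition_paths(q, low, high):
    """(start, end) frame indices of transition paths in a trajectory q(t).

    A path starts at the last frame at or beyond one threshold (q <= low or
    q >= high) and ends at the first frame that reaches the other threshold,
    with no return to the starting side in between. Folding and unfolding
    paths are both returned.
    """
    paths = []
    side = None        # last basin visited: "U" (q <= low) or "F" (q >= high)
    last_frame = None  # most recent frame spent in that basin
    for t, value in enumerate(q):
        if value <= low:
            if side == "F":
                paths.append((last_frame, t))   # unfolding transition path
            side, last_frame = "U", t
        elif value >= high:
            if side == "U":
                paths.append((last_frame, t))   # folding transition path
            side, last_frame = "F", t
    return paths


# Hypothetical Q trajectory using the GB1 thresholds from the text (0.15, 0.85).
q = np.array([0.05, 0.10, 0.30, 0.50, 0.90, 0.88, 0.60, 0.92, 0.40, 0.08])
tps = transition_paths(q, low=0.15, high=0.85)   # [(1, 4), (7, 9)]

# Frames are written every 10 000 steps of 10 fs, i.e., 100 ps of unscaled time.
FRAME_PS = 10_000 * 0.010
TIME_SCALE = 1.0  # placeholder, not a value from the paper
durations_ps = [(end - start) * FRAME_PS * TIME_SCALE for start, end in tps]
```

Excursions into the barrier region that return to the same basin (frames 5 to 7 above) are correctly not counted as transition paths.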

B. Brownian dynamics simulations

We run Brownian dynamics simulations on a bi-stable, symmetric potential to model diffusive behavior. The quartic potential, with barrier height ϵkBT, is

$$U_0(x) = \epsilon\,(x^2 - 1)^2.$$

The position of the particle is propagated according to the following expression:

$$x(t + dt) = x(t) - \beta D \frac{dU_0}{dx}\,dt + g\sqrt{2D\,dt},$$

in which D is the diffusion coefficient and g is a random variable drawn from a Gaussian distribution with zero mean and unit variance. A time step of dt = 0.0001 was used, with frames saved every 100 steps. We calculate the transition path shape by directly sampling transitions between the two basins over a long trajectory.
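The propagation rule above is an Euler-Maruyama step for overdamped Langevin dynamics. A minimal sketch follows, assuming energies in units of k_BT (so β = 1 and ε is the barrier height in k_BT) and illustrative values of ε and D, which the text does not specify; only dt = 0.0001 and the saving interval of 100 steps come from the text. The function name `simulate_bd` is hypothetical.

```python
import numpy as np


def simulate_bd(n_steps, eps=5.0, D=1.0, dt=1e-4, save_every=100, x0=-1.0, seed=0):
    """Euler-Maruyama integration of overdamped Langevin dynamics on
    U0(x) = eps * (x**2 - 1)**2, with energies in units of kB*T (beta = 1),
    so eps is the barrier height in kB*T. Returns positions saved every
    `save_every` steps.
    """
    rng = np.random.default_rng(seed)
    noise = np.sqrt(2.0 * D * dt) * rng.standard_normal(n_steps)  # g * sqrt(2 D dt)
    saved = np.empty(n_steps // save_every)
    x = x0
    for step in range(n_steps):
        force = -4.0 * eps * x * (x * x - 1.0)  # -dU0/dx
        x += D * force * dt + noise[step]        # drift term uses beta = 1
        if (step + 1) % save_every == 0:
            saved[(step + 1) // save_every - 1] = x
    return saved


traj = simulate_bd(200_000, eps=3.0)   # 2000 saved frames
```

Transition paths between the basins at x = ±1 can then be sampled from `traj` with two thresholds, exactly as for Q.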

C. Generating synthetic photon trajectories

We produce photon trajectories from our molecular dynamics simulations so that we can analyze them with the maximum likelihood method. Photon arrival times are drawn from an exponential distribution of interphoton times (as expected for Poissonian photon statistics), with a mean emission rate equal to the target photon flux. The instantaneous FRET efficiency in the simulation at the time of emission gives the probability that the photon comes from the acceptor rather than the donor, and the photon color is drawn accordingly.
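The photon-generation procedure can be sketched as follows, assuming E is constant within each saved frame and times are in µs; the function name and arguments are hypothetical. Arrival times are cumulative sums of exponential gaps, and each photon's color is a Bernoulli draw with acceptor probability E at its arrival time.

```python
import numpy as np


def photon_trajectory(e_traj, frame_dt_us, flux_per_ms, seed=0):
    """Synthetic photon arrival times and colors from an efficiency trajectory E(t).

    Arrivals form a Poisson process (exponential interphoton times) with mean
    rate flux_per_ms; E is treated as constant within each frame, and each
    photon is an acceptor photon with probability E(t) at its arrival time.
    Returns (times_us, is_acceptor).
    """
    rng = np.random.default_rng(seed)
    e_traj = np.asarray(e_traj, dtype=float)
    duration_us = len(e_traj) * frame_dt_us
    mean_gap_us = 1000.0 / flux_per_ms
    times = np.empty(0)
    last = 0.0
    while last < duration_us:                # draw gaps until past the end
        n = int(duration_us / mean_gap_us) + 10
        new = last + np.cumsum(rng.exponential(mean_gap_us, size=n))
        times = np.concatenate([times, new])
        last = times[-1]
    times = times[times < duration_us]
    frames = np.minimum((times // frame_dt_us).astype(int), len(e_traj) - 1)
    is_acceptor = rng.random(times.size) < e_traj[frames]
    return times, is_acceptor


# Usage: 1 ms of a hypothetical trajectory with 1 us frames, unfolded (E = 0.2)
# for the first half and folded (E = 0.9) for the second, at 500 photons/ms.
e = np.where(np.arange(1000) < 500, 0.2, 0.9)
t, acc = photon_trajectory(e, frame_dt_us=1.0, flux_per_ms=500.0)
```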

D. Maximum likelihood analysis

We implement the maximum likelihood method developed by Gopich and Szabo30 to estimate the average transition path time of our proteins. Specifically, we calculate the difference in log-likelihoods between the two- and three-state kinetic models. Both models assume step-like trajectories; in the three-state model, the transition path is represented as a virtual intermediate state whose FRET efficiency lies between the high and low values corresponding to the folded and unfolded states. The likelihood function for the jth transition path, comprising photons i with colors c_i and interphoton times τ_i, has the form

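$$L_j = \mathbf{v}_{\mathrm{fin}}^{T} \left[\prod_{i=2}^{N_j} \mathbf{F}(c_i)\,\exp(\mathbf{K}\tau_i)\right] \mathbf{F}(c_1)\,\mathbf{v}_{\mathrm{ini}}.$$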

In this expression, K is the rate matrix of the respective model and F is the photon color matrix, determined by the photon color: F(acceptor) = E and F(donor) = I − E, where E is the diagonal matrix of the FRET efficiencies of the folded, unfolded, and (in the three-state model) intermediate states and I is the identity matrix. For a two-state transition (instantaneous transition paths),

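$$\mathbf{E} = \begin{pmatrix} E_f & 0 \\ 0 & E_u \end{pmatrix}, \qquad \mathbf{K} = \begin{pmatrix} -k_u & k_f \\ k_u & -k_f \end{pmatrix},$$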

where Ef and Eu are the efficiencies of the folded and unfolded states, respectively, and kf and ku are, respectively, the folding and unfolding rates. A transition path of finite duration is modeled by the three-state model,

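$$\mathbf{E} = \begin{pmatrix} E_f & 0 & 0 \\ 0 & E_t & 0 \\ 0 & 0 & E_u \end{pmatrix}, \qquad \mathbf{K} = \begin{pmatrix} -k_u & k_t & 0 \\ k_u & -2k_t & k_f \\ 0 & k_t & -k_f \end{pmatrix},$$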

where Et is the efficiency during the transition path and kt is the rate of conversion from the transition-path state to either the folded or the unfolded state. Because the transition-path state is left at a total rate 2kt, its mean lifetime is 1/(2kt), so kt is related to the transition-path time τTP by kt = 1/(2τTP). Unless otherwise described, Et = (Eu + Ef)/2 is assumed; in some cases, Et was also optimized explicitly.

Individual likelihood values are calculated by multiplying through all photons within 50 µs of the transition interval, with the folding and unfolding rates and the transfer efficiencies held fixed. For each trial transition-path time, we sum the log-likelihood values of the individual transition paths to obtain the difference in log likelihoods between the three-state and two-state models as a function of τTP. The maximum of this aggregate curve gives the average transition path time for the protein.
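Below is a compact sketch of the two- and three-state likelihoods and the τTP scan, written by us under stated assumptions rather than taken from the Gopich–Szabo implementation used in the paper; all function names are hypothetical. Times are in µs and rates in µs⁻¹, and the matrices follow the equations above (K[i, j] is the rate from state j to state i). The text does not define v_ini and v_fin, so we assume indicator vectors for a window that starts in the unfolded state and ends in the folded state. exp(Kτ) is evaluated from an eigendecomposition of K; these chain-like (birth-death) rate matrices satisfy detailed balance and are therefore diagonalizable with real eigenvalues. The running product is renormalized photon by photon to avoid underflow.

```python
import numpy as np


def two_state(ef, eu, kf, ku):
    """Efficiencies and rate matrix of the two-state model, state order (f, u).

    K[i, j] is the rate from state j to state i, so dp/dt = K p.
    """
    return np.array([ef, eu]), np.array([[-ku, kf],
                                         [ku, -kf]])


def three_state(ef, eu, kf, ku, tau_tp, et=None):
    """Three-state step model, state order (f, t, u), with k_t = 1/(2 tau_TP)."""
    et = 0.5 * (ef + eu) if et is None else et
    kt = 1.0 / (2.0 * tau_tp)
    return np.array([ef, et, eu]), np.array([[-ku, kt, 0.0],
                                             [ku, -2.0 * kt, kf],
                                             [0.0, kt, -kf]])


def log_likelihood(times, is_acceptor, eff, K, v_ini, v_fin):
    """ln[v_fin^T F(c_N) e^{K tau_N} ... F(c_2) e^{K tau_2} F(c_1) v_ini]."""
    w, V = np.linalg.eig(K)
    V_inv = np.linalg.inv(V)
    color = {True: eff, False: 1.0 - eff}   # diagonals of F(acceptor), F(donor)
    p = color[bool(is_acceptor[0])] * v_ini
    log_l = np.log(p.sum())
    p = p / p.sum()
    for tau, acc in zip(np.diff(times), is_acceptor[1:]):
        p = ((V * np.exp(w * tau)) @ (V_inv @ p)).real   # e^{K tau} p
        p = color[bool(acc)] * p                        # F(c_i) is diagonal
        norm = p.sum()
        log_l += np.log(norm)
        p = p / norm
    return log_l + np.log(v_fin @ p)


def delta_log_likelihood(paths, ef, eu, kf, ku, tau_grid):
    """Sum over transition paths of ln L3(tau_TP) - ln L2, for each trial tau_TP.

    paths: list of (times, is_acceptor) arrays, each assumed to start in the
    unfolded state and end in the folded state (a folding transition).
    """
    eff2, K2 = two_state(ef, eu, kf, ku)
    v2_ini, v2_fin = np.array([0.0, 1.0]), np.array([1.0, 0.0])
    base = sum(log_likelihood(t, c, eff2, K2, v2_ini, v2_fin) for t, c in paths)
    v3_ini, v3_fin = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
    curve = []
    for tau in tau_grid:
        eff3, K3 = three_state(ef, eu, kf, ku, tau)
        total = sum(log_likelihood(t, c, eff3, K3, v3_ini, v3_fin) for t, c in paths)
        curve.append(total - base)
    return np.array(curve)
```

For a set of transition windows, `tau_grid[np.argmax(curve)]` gives the maximum likelihood τTP, and a positive maximum indicates that the three-state model is favored over the two-state model.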

We also investigated a five-state model in which the efficiencies El of the three intermediates are equally spaced between Eu and Ef, i.e., El = Eu + (Ef − Eu)l/4, with l = 1, 2, 3,

$$\mathbf{E} = \operatorname{diag}(E_f,\, E_1,\, E_2,\, E_3,\, E_u).$$

The lifetimes of the intermediate states, τl = 1/(2kl), however, were allowed to vary to capture features of the transition-path shape,

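$$\mathbf{K} = \begin{pmatrix} -k_u & k_1 & 0 & 0 & 0 \\ k_u & -2k_1 & k_2 & 0 & 0 \\ 0 & k_1 & -2k_2 & k_3 & 0 \\ 0 & 0 & k_2 & -2k_3 & k_f \\ 0 & 0 & 0 & k_3 & -k_f \end{pmatrix}.$$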

Because transition paths can move both forward and backward among the intermediates, the total transition path time is not simply the sum of the average lifetimes of the individual states. It can be shown31 that the mean transition path time ⟨τTP⟩ of a general sequential model with N intermediates (N = 3 for our five-state model) is given by

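$$\langle \tau_{TP} \rangle = \frac{\sum_n \psi_n^R(N)\,\psi_n^L(1)\,\lambda_n^{-2}}{\sum_n \psi_n^R(N)\,\psi_n^L(1)\,|\lambda_n|^{-1}}, \tag{1}$$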

where ψnR, ψnL, and λn are, respectively, the nth right eigenvector, left eigenvector, and eigenvalue of the matrix K′ obtained by deleting from K the rows and columns of the two end states (folded and unfolded); for the five-state model,

$$\mathbf{K}' = \begin{pmatrix} -2k_1 & k_2 & 0 \\ k_1 & -2k_2 & k_3 \\ 0 & k_2 & -2k_3 \end{pmatrix}.$$
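Equation (1) can be evaluated numerically from an eigendecomposition of K′. In this sketch (helper names hypothetical), the right eigenvectors are the columns of the eigenvector matrix and the left eigenvectors are the rows of its inverse, so ψ_n^R(N) ψ_n^L(1) is the weight of mode n in the propagator from intermediate 1 to intermediate N. With a single intermediate it returns 1/(2kt), and for three identical intermediates with lifetimes of 0.5 it gives 2.5 rather than the 1.5 obtained by summing the lifetimes, illustrating the effect of back-and-forth hopping.

```python
import numpy as np


def mean_tp_time(k_prime):
    """Mean transition-path time of a sequential model from Eq. (1).

    k_prime: rate matrix restricted to the intermediates (K'), with K'[i, j]
    the rate from j to i; paths enter at intermediate 1 (index 0) and leave
    from intermediate N (index -1).
    """
    lam, psi_r = np.linalg.eig(k_prime)  # columns of psi_r: right eigenvectors
    psi_l = np.linalg.inv(psi_r)          # rows of psi_l: matching left eigenvectors
    weights = psi_r[-1, :] * psi_l[:, 0]  # psi_n^R(N) * psi_n^L(1)
    num = np.sum(weights / lam**2)
    den = np.sum(weights / np.abs(lam))
    return float(np.real(num / den))


def five_state_k_prime(k1, k2, k3):
    """K' of the five-state model: K with the folded and unfolded rows/columns removed."""
    return np.array([[-2.0 * k1, k2, 0.0],
                     [k1, -2.0 * k2, k3],
                     [0.0, k2, -2.0 * k3]])


# One intermediate (the three-state model): recovers tau_TP = 1/(2 k_t).
single = mean_tp_time(np.array([[-2.0 * 0.01]]))          # 50.0
# Three identical intermediates, each with lifetime 1/(2k) = 0.5:
chain = mean_tp_time(five_state_k_prime(1.0, 1.0, 1.0))   # 2.5, not 3 * 0.5
```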

E. Transition path shape

Folding and unfolding transition paths, defined by the transition region along the fraction of native contacts Q, were aggregated to obtain the average transition path shape along two coordinates: Q and the instantaneous FRET efficiency E. The average transition path shape is defined here as ⟨t(Q)⟩ or ⟨t(E)⟩. Setting the beginning of each transition path to t = 0, we interpolate t(Q) or t(E) for each transition path on a grid along the coordinate with a bin size of 0.01, so that every transition path has the same number of points in the transition region. We then average the times in each bin over all transition paths. Unfolding transition paths were reversed in time before interpolation.
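One way to implement this averaging is sketched below. Because individual paths need not be monotonic in Q or E, we assume (this is not stated in the text) that t(x) is the first time the path reaches each grid value, found by linear interpolation between frames; the helper names are hypothetical.

```python
import numpy as np


def first_reach_times(x, t, levels):
    """Time at which a path x(t) first reaches each level, by linear interpolation
    between frames. Assumes x[0] <= levels[0] and x[-1] >= levels[-1], as for a
    folding transition path (unfolding paths should be time-reversed first).
    """
    out = np.empty(len(levels))
    for n, level in enumerate(levels):
        k = int(np.argmax(x >= level))   # first frame at or above this level
        if k == 0:                        # path already starts at or above the level
            out[n] = t[0]
            continue
        frac = (level - x[k - 1]) / (x[k] - x[k - 1])
        out[n] = t[k - 1] + frac * (t[k] - t[k - 1])
    return out


def average_tp_shape(paths, frame_dt, lo, hi, bin_size=0.01):
    """Average shape <t(x)> over transition paths on a grid of spacing bin_size."""
    n_levels = int(round((hi - lo) / bin_size)) + 1
    levels = np.linspace(lo, hi, n_levels)
    curves = [first_reach_times(np.asarray(p, float), frame_dt * np.arange(len(p)), levels)
              for p in paths]
    return levels, np.mean(curves, axis=0)


# Usage: a path that rises linearly from 0 to 1 over 10 frames gives t(x) = 10 x.
levels, shape = average_tp_shape([np.linspace(0.0, 1.0, 11)], frame_dt=1.0,
                                 lo=0.1, hi=0.9, bin_size=0.1)
```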

III. RESULTS AND DISCUSSION

We perform coarse-grained molecular dynamics simulations with Gō models to monitor folding kinetics over long time scales. By simulating equilibrium trajectories near the transition temperatures, we sample as many transition paths as possible at low computational cost. We choose three small proteins from three basic protein structural classes:32 α3D (all-α), CspTm (all-β), and GB1 (α/β) (Fig. 1). For each of the three proteins, we include additional beads representing fluorophores covalently attached to the protein at the positions used in single-molecule FRET experiments on these proteins (note, however, that transition-path times have been estimated from FRET only for α3D and GB1).2–4,33

FIG. 1.

Structures of proteins studied in the all-atom (upper) and coarse-grained (lower) representation: (a) α3D, (b) CspTm, and (c) protein G. Colored beads indicate the attached fluorophores.

From the simulation trajectories of the three proteins, we calculate two properties for each time point: the instantaneous FRET efficiency E and the fraction of native contacts Q. We chose E because of its close relation to experimental observations, and Q because it has previously been found to be a good reaction coordinate for folding,34–36 in contrast to the end–end distance probed by E.37–39 In the case of α3D (Fig. 2), the instantaneous FRET efficiency is high whenever the fraction of native contacts is also high, both indicating that the protein occupies the folded state. However, where Q is low, indicating the unfolded state, E is highly variable. Thus, the state of the protein cannot be inferred from an instantaneous value of E, although it is clear from its context in the trajectory. Note that the FRET efficiency in experiments is derived from photon statistics, which must be accumulated over a long period of time, whereas the instantaneous FRET efficiency discussed here is a theoretical quantity, based solely on the resonance energy transfer rate expected from the inter-dye distance in individual frames (we do not include the effects of chromophore orientation on transfer efficiency in this work, i.e., κ² = 2/3 is assumed). Because the maximum likelihood model for analyzing transition paths considers the joint likelihood of a series of photons over an extended period of time, it can detect the slower fluctuations associated with folding even though the instantaneous E fluctuates rapidly. The fluctuations in instantaneous E reflect the large fluctuations in inter-dye distance as the unfolded protein samples conformational space. Since E does not even separate the folded and unfolded states (unlike Q), it is clearly not a good reaction coordinate.
Calculating the potential of mean force (PMF) along Q and along E confirms this: the PMF along Q shows two basins of attraction corresponding to the equilibrium states, whereas along E the unfolded state appears as a flat region with no apparent barrier separating it from the folded state. In the absence of a clear local minimum for the unfolded state along E, we define our low-FRET state by the average FRET efficiency in the unfolded state, matching the definition used in experiments.
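For reference, a PMF along any coordinate follows from its equilibrium histogram as F(x) = −k_BT ln P(x) + const. A minimal NumPy sketch follows; the helper name `pmf`, the bin count, and the treatment of empty bins (returned as NaN) are our choices.

```python
import numpy as np


def pmf(samples, bins=50, kT=1.0):
    """Potential of mean force F(x) = -kT ln P(x) from equilibrium samples of x.

    Bins with no samples are returned as NaN; F is shifted so that min F = 0.
    """
    density, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        F = -kT * np.log(density)
    F[np.isinf(F)] = np.nan
    return centers, F - np.nanmin(F)
```

Passing kT in kJ/mol gives F in kJ/mol; with the default kT = 1, F is in units of k_BT.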

FIG. 2.

Molecular simulation of α3D. Equilibrium trajectories can be projected onto (a) the fraction of native contacts Q or (b) the FRET transfer efficiency E. The corresponding potentials of mean force are shown at the right. (c)–(e) show examples of specific transition paths extracted from the trajectories. Corresponding results for CspTm and GB1 are shown in Figs. S1 and S2, respectively.

To further assess Q and E as coordinates for the three coarse-grained protein models used here, we have applied a Bayesian criterion, namely, the probability of being on a transition path (TP) given a particular value of Q or E, i.e., P(TP|Q) or P(TP|E).10,40 For diffusive dynamics, if all of the protein configurations with a committor (probability of folding before unfolding) of 1/2, commonly identified as "transition states," lie at a single value of the coordinate, then P(TP|Q) will be 1/2 at this position. We find that P(TP|Q) approaches 0.4 for α3D and GB1, close to the best values found for other folding coordinates, and still shows a significant peak for CspTm (Fig. S3). By contrast, the maximum value of P(TP|E) is less than 0.1 for all three proteins. Although Q is accessible only in simulations, it is useful for monitoring folding because, when appropriately defined, it captures the dynamics and kinetics of the system quite accurately.41,42
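P(TP|x) can be estimated by histogramming: the fraction of frames in each bin of x that lie on a transition path. The sketch below is our illustration; the helper names are hypothetical, and we assume that the frames strictly between a path's start and end frames count as being on the path.

```python
import numpy as np


def p_tp_given_x(x, on_tp, bins):
    """Estimate P(TP | x): fraction of frames in each bin of x that lie on a
    transition path. on_tp is a boolean mask over frames; bins are bin edges.
    Empty bins give NaN.
    """
    x = np.asarray(x, dtype=float)
    total, edges = np.histogram(x, bins=bins)
    on_path, _ = np.histogram(x[np.asarray(on_tp, dtype=bool)], bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore", divide="ignore"):
        return centers, on_path / total


def tp_mask(n_frames, paths):
    """Boolean mask that is True on frames strictly inside each (start, end) path."""
    mask = np.zeros(n_frames, dtype=bool)
    for start, end in paths:
        mask[start + 1:end] = True
    return mask
```

Applied to Q and to E with the same mask, the peak heights of the two curves give the comparison described above.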

Sampling transition paths from simulation trajectories requires that we define the points where the protein begins and ends its transit across the free energy barrier. We set cutoffs on both sides of the barrier along Q and record the time the protein spends in transit between these values without returning to the state it started from. We use Q rather than the instantaneous E to define transition paths (even though E is more closely related to experiment) because of the ambiguity in identifying the state of the protein from E. How should the transition path boundaries be chosen? We base our choice on consistency with the folding kinetics: once a set of transition paths between the folded and unfolded states has been defined, it implies a set of first passage times for folding τF and unfolding τU, from which the folding and unfolding rates follow as kF = 1/⟨τF⟩ and kU = 1/⟨τU⟩. We choose our boundaries so that these rates are consistent with the autocorrelation time τC of the reaction coordinate, assuming a two-state system, i.e., 1/τC = kF + kU. This requires the boundaries to be sufficiently far from the barrier; in practice, we find that placing them roughly at the centers of the equilibrium distributions of the unfolded and folded states is a reasonable choice (see supplementary material, Fig. S4, for CspTm). Since no cutoffs applied to the instantaneous E yield the correct kinetics, we did not use E to define transition path boundaries.
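The kinetic consistency check can be sketched as follows: estimate τC from the normalized autocorrelation function (ACF) of the reaction coordinate and compare 1/τC with kF + kU from the mean first passage times. This is our illustration, not the authors' code; integrating the ACF up to its first non-positive value is a heuristic choice (one could instead fit an exponential), and the helper names are hypothetical.

```python
import numpy as np


def autocorrelation_time(x, dt):
    """Integrated autocorrelation time tau_C of a time series x(t).

    The normalized ACF is computed by zero-padded FFT and integrated with the
    trapezoid rule (end correction omitted) up to its first non-positive value.
    """
    y = np.asarray(x, dtype=float) - np.mean(x)
    n = len(y)
    f = np.fft.rfft(y, 2 * n)                  # zero padding avoids circular wrap-around
    acf = np.fft.irfft(f * np.conj(f), 2 * n)[:n]
    acf /= acf[0]
    nonpos = np.flatnonzero(acf <= 0.0)
    cut = nonpos[0] if nonpos.size else n
    return dt * (np.sum(acf[:cut]) - 0.5 * acf[0])


def two_state_rates(folding_fpts, unfolding_fpts):
    """kF = 1/<tau_F> and kU = 1/<tau_U> from first passage times."""
    return 1.0 / np.mean(folding_fpts), 1.0 / np.mean(unfolding_fpts)
```

For an ideal two-state signal, such as a random telegraph process with mean dwell times 1/kF in U and 1/kU in F, the ACF decays as exp[−(kF + kU)t], so the two estimates should agree.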

Aggregating all transit times gives the average transition path time of each protein (histograms of transition-path times are given in Fig. S5). The coefficients of variation (ratio of standard deviation to mean) of the transition path times, computed from these distributions, are 0.85, 0.73, and 0.99 for α3D, CspTm, and GB1, respectively. That these are all less than unity is consistent with a one-dimensional description of folding being sufficient.20 Note, however, that these values are larger than would be obtained for simple one-dimensional diffusion over a barrier with a position-independent diffusion coefficient, indicating that the situation may be more complex.

After determining the transition intervals from Q, we generate photon-by-photon trajectories from the E trajectories. Like the raw data collected in smFRET experiments, these trajectories encode photon colors and arrival times. For our FRET analysis of transition-path times, we add 50 µs of context on either side of the true transition path and leave the maximum likelihood algorithm to separate the transition path from the context (increasing the length of the context from 50 to 150 µs had only a small effect on the estimated transition-path time, Fig. S6). The Gopich–Szabo algorithm works by comparing an idealized instantaneous jump in FRET efficiency with an idealized step trajectory in which the transition path is represented as a single step (usually halfway) between the folded and unfolded FRET efficiencies. If the likelihood of the three-state model exceeds that of the two-state model, the data are sufficient to distinguish a transition path; the transition-path time is then estimated as the value at which the likelihood of the three-state model is maximal. In Fig. 3, we show the difference in log-likelihoods for each of the three proteins as a function of the transition-path time. In each case, there is a peak in the log-likelihood difference, identifying the most likely three-state (step) model for the data. The peak is less pronounced for GB1, for reasons discussed below. Because we know the microscopic dynamics of the system in more detail than experiments permit, we can compare this estimate with that derived from the Q trajectories and thereby test how well the maximum likelihood method estimates the transition path time.

FIG. 3.

Maximum likelihood analysis of the transition-path duration: (a) α3D, (b) CspTm, and (c) GB1. Red solid curves show the difference in log-likelihoods between the three-state and two-state models, red vertical broken lines show the transition-path time estimated by maximum likelihood, and blue vertical broken lines show the transition-path time estimated directly from the molecular trajectories. Curves are summed over all transition paths from the simulation.

In all three cases, the maximum likelihood method yields a transition path time shorter than that estimated from the simulations using the reaction coordinate Q. The average transition path time for protein G, in particular, is underestimated by a factor of 3.5. While this result is encouraging in the sense that the transition-path time estimates are well within an order of magnitude of the true transition-path time, higher quantitative accuracy is clearly desirable. We therefore seek to understand why the algorithm systematically underestimates transition-path times.

We first check that the maximum likelihood algorithm estimates the transition path time correctly under ideal circumstances, by feeding it photon-by-photon trajectories that fulfill the underlying assumptions of the likelihood function, namely, a model step function (Fig. 4). We generate 100 synthetic photon-by-photon trajectories from the same model step-wise trajectory and compare the transition path time estimated from these 100 trajectories with the true value. The likelihood method gives an estimate very close to the known value [Fig. 4(d)]. At 500 photons per millisecond, approximating a realistic experimental photon flux, the maximum of the global log-likelihood curve is at 44.83 µs, very close to the ideal value of 50 µs. As expected, increasing the photon flux brings the estimated transition path time closer to its true value, and the estimate improves further when the average interphoton time is an order of magnitude smaller.

FIG. 4.

Testing maximum likelihood algorithm on the FRET data generated from a model step trajectory. (a) 50 µs step FRET “trajectory” used for analysis. (b) A single realization of a photon trajectory corresponding to the step in (a). (c) Log-likelihood difference for three-state vs two-state model for step. (d) “True” transition-path time (50 µs) compared with the estimate from maximum likelihood analysis as a function of average photon flux. The gray vertical line shows typical flux used in experiments. Blue data are for step function in (a) and orange data are for a sloped transition path shape (Fig. S7).

Since the maximum likelihood algorithm gives accurate estimates for step-like transition paths, we next ask whether the quality of E as a reaction coordinate may be at fault. To test this, we consider how well the likelihood method estimates the transition path time when the underlying dynamics along E are one-dimensional, so that no projection is required. We run Brownian dynamics simulations to obtain transition paths for a particle crossing a one-dimensional bistable potential with two-fold symmetry (Fig. S8). After generating photon-by-photon trajectories with this one-dimensional coordinate taken as E, we obtain an estimated average transition path time of 26.37 µs for a barrier height of 14 kJ/mol, much shorter than the actual value of 64.47 µs. Thus, even when E is a perfect reaction coordinate, barrier crossing times are still underestimated.

An alternative explanation for the underestimate is that the assumed step-like form of the efficiency during barrier crossing is a poor approximation of the true trajectory E(t). The average "shape" of the transition path, that is, the average trajectory E(t), is shown in Fig. S8A and still deviates noticeably from the step-wise trajectory assumed by the likelihood method. Since E was by construction the only coordinate in this test, deviation of the transition-path shape from an ideal step appears to be the most likely reason for the underestimation of the transition-path time.

Given that transition paths of proteins are unlikely to follow an idealized step function in FRET efficiency, it is plausible that the transition path shape along E is the underlying reason for the poor estimates of transition path time. Indeed, synthetic photon-by-photon trajectories generated from a sloped transition path, rather than the step used above, also lead to an underestimate of the average transition path time when analyzed with the step-like model (Fig. S7). In that case, increasing the photon flux does not improve the estimate, suggesting that the underlying shape of the trajectory E(t) limits the accuracy with which the transition path time can be determined.

We next examine the average transition path shapes for the three proteins. In each case, the transition path shapes along Q resemble that of the one-dimensional system. Along E, however, all three proteins switch to high efficiency early in the transition. E rises most slowly to the folded value for CspTm and fastest for GB1, which may explain why the maximum likelihood estimates of transition-path time are most accurate for CspTm and least accurate for GB1. We note that applying a tensile force to protein G did not improve the estimate of the transition-path time. Although such a pulling force is known to improve the quality of the end–end distance, and therefore of E, as a reaction coordinate,37–39,43,44 it does not change the fact that the transition-path shape is poorly approximated by a step, and therefore it does not improve the maximum likelihood estimates of the transition path time (Fig. S9).

While the likelihood method consistently underestimates the average transition path time owing to limitations imposed by the transition path shape along E, we show that allowing more flexibility in the transition path model can improve the estimates. First, we relax the default assumption of the likelihood method that the intermediate FRET efficiency is the midpoint between the high and low values of E, since the most appropriate intermediate FRET efficiency may depend on the protein, as Fig. 5 suggests. To do so, we perform a two-dimensional scan to maximize the likelihood with respect to two parameters: the intermediate E and the transition path time. Applying this model to the idealized one-dimensional example in Fig. S8 yields a transition-path efficiency Et halfway between the unfolded and folded states, as expected (Fig. S8C). For the three proteins, the two-dimensional scan shows maxima at intermediate FRET efficiencies that are not exactly midway, and the estimates of the average transition path time improve (Fig. 6). After optimizing the intermediate FRET efficiency, the average transition path time is within a factor of two of the true value for both α3D and GB1. Notably, the log-likelihood difference increases considerably for GB1, suggesting that the low peak in Fig. 3 arises because the FRET efficiency of the transition path is much higher than the value assumed in the standard likelihood model. In Fig. 5, we show the optimized transition-path efficiency, which generally coincides with the region of the transition path where E(t) changes most slowly.

FIG. 5.

Average transition path shape of (a) and (b) α3D, (c) and (d) CspTm, and (e) and (f) GB1. Transition-path shapes along E are on the left and along Q on the right. Gray horizontal broken lines in (a), (c), and (e) show the intermediate E derived from two-dimensional optimization (Fig. 6). Gray horizontal broken lines in (b), (d), and (f) show the barrier top of the PMF along Q.

FIG. 6.

2D maximum likelihood analysis. Log-likelihood difference of three-state vs two-state models as a function of both the FRET efficiency of the intermediate state and the transition-path time for (a) α3D, (b) CspTm, and (c) GB1.

As a second step toward a more general model of the transition-path shape, we also investigate a five-state model, i.e., one with three intermediate states. To limit the parameter search, the efficiencies of the three intermediates are evenly spaced between the unfolded and folded values, and only their lifetimes are allowed to vary. We use the simplex algorithm to find the lifetimes of the intermediate states that maximize the log-likelihood of the transition paths. This model also improves the estimated transition-path times; for CspTm, the estimate is in fact slightly larger than the true transition-path time (Fig. 7).
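
A five-state version can be sketched as a sequential scheme U → I1 → I2 → I3 → F, with the three intermediate efficiencies fixed at evenly spaced values and the three lifetimes optimized with SciPy's Nelder-Mead simplex method (scipy.optimize.minimize with method="Nelder-Mead"), which is one possible choice of simplex algorithm. This is a hypothetical sketch: the likelihood assumes equal count rates in all states and conditions on ending in F, the lifetimes are optimized in log space and clipped to keep them positive and bounded, and taking the sum of the three lifetimes as the transition-path time estimate is our own assumption. All function names and numbers are illustrative.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize


def photon_log_likelihood(times, colors, K, eff, p0, final_mask):
    # Gopich-Szabo-style likelihood assuming equal count rates in all states;
    # K[i, j] is the rate from state j to state i.
    eff = np.asarray(eff, dtype=float)
    p = np.asarray(p0, dtype=float)
    log_l, prev = 0.0, times[0]
    for t, c in zip(times, colors):
        p = expm(K * (t - prev)) @ p          # evolve populations between photons
        p = (eff if c else 1.0 - eff) * p     # weight by photon color
        s = p.sum()
        log_l += np.log(s)
        p, prev = p / s, t                    # renormalize to avoid underflow
    return log_l + np.log(np.dot(final_mask, p))


def chain_K(k, taus):
    """Sequential scheme U -> I1 -> ... -> Im -> F; taus are intermediate lifetimes."""
    rates = [k] + [1.0 / tau for tau in taus]     # exit rates of U, I1, ..., Im
    n = len(rates) + 1
    K = np.zeros((n, n))
    for j, r in enumerate(rates):
        K[j, j] -= r
        K[j + 1, j] += r
    return K


def fit_five_state(times, colors, e_u, e_f, k, tau0=10.0):
    effs = e_u + (e_f - e_u) * np.linspace(0.0, 1.0, 5)  # U, I1, I2, I3, F evenly spaced
    p0, mask = np.eye(5)[0], np.eye(5)[-1]               # start in U, end in F

    def neg_log_l(log_taus):
        taus = np.exp(np.clip(log_taus, -5.0, 8.0))      # positive, bounded lifetimes
        return -photon_log_likelihood(times, colors, chain_K(k, taus), effs, p0, mask)

    res = minimize(neg_log_l, np.full(3, np.log(tau0)), method="Nelder-Mead")
    return np.exp(np.clip(res.x, -5.0, 8.0)), -res.fun


# Synthetic segment with a linear ramp in E between t = 480 and 520 us (hypothetical).
rng = np.random.default_rng(0)
n = rng.poisson(150)
times = np.sort(rng.uniform(0.0, 1000.0, n))
colors = (rng.random(n) < np.interp(times, [480.0, 520.0], [0.2, 0.85])).astype(int)
taus, ll = fit_five_state(times, colors, 0.2, 0.85, k=1.0 / 500.0)
print("lifetimes (us):", np.round(taus, 1), "sum:", round(float(taus.sum()), 1))
```

Fixing the intermediate efficiencies keeps the search three-dimensional, which is small enough for a derivative-free simplex method; freeing the efficiencies as well would roughly double the number of parameters.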

FIG. 7.

Comparison of transition-path times estimated by different methods, relative to the true transition path time defined by Q.

In summary, we have shown that a step-like model for the transition-path shape, given its simplicity, provides a good first approximation for estimating transition-path times, although it systematically underestimates barrier-crossing times. The most accurate results appear to be obtained when the transition-path shape most closely resembles a step function, and the least accurate when it deviates most from one. Allowing slightly more flexibility in the transition-path model permits more accurate estimates of the transition-path time. Incorporating transition-path shape effects into future models for interpreting experiments on transition paths therefore appears desirable. Although this may seem to imply extracting a very large number of parameters from limited photon trajectory data, this need not be the case. For example, using a parameterized one-dimensional energy landscape45 for the free energy, together with a one-dimensional diffusion model for the dynamics,41 would allow transition-path shapes to be inferred with a minimum of parameters. In the future, experimental developments such as the use of zero-mode waveguides to enhance fluorophore brightness46 may also make it feasible to use more complex models for transition-path dynamics.
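
To illustrate how a one-dimensional diffusion model yields transition-path properties from only a free-energy profile F(x) and a diffusion coefficient D(x), the sketch below evaluates the standard expression for the mean transition-path time of one-dimensional diffusion between two boundaries (see, e.g., ref. 40): ⟨t_TP⟩ = ∫ e^(−βF) φ(1 − φ) dx × ∫ e^(βF)/D dx, where φ(x) is the committor. The parabolic barrier is a hypothetical stand-in for a parameterized landscape such as that of ref. 45, and mean_tp_time is our own helper, not code from this work. For a flat profile the expression reduces to L²/(6D), which serves as a check.

```python
import numpy as np


def _trapezoid(y, x):
    # Trapezoidal rule, written out to avoid NumPy version differences
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def mean_tp_time(x, beta_f, diff):
    """Mean transition-path time for 1D diffusion between x[0] and x[-1].

    beta_f : free energy F(x) in units of kT on the grid x
    diff   : diffusion coefficient D(x) on the same grid
    <t_TP> = int e^{-F} phi (1 - phi) dx * int e^{F} / D dx, with the committor
    phi(x) = int_{x[0]}^{x} e^{F}/D dx' / int_{x[0]}^{x[-1]} e^{F}/D dx'.
    """
    w = np.exp(beta_f) / diff
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))))
    phi = cum / cum[-1]                       # committor toward x[-1]
    return _trapezoid(np.exp(-beta_f) * phi * (1.0 - phi), x) * cum[-1]


# Hypothetical profiles between boundaries at x = 0 and x = 1, with D = 1:
x = np.linspace(0.0, 1.0, 2001)
d = np.ones_like(x)
flat = mean_tp_time(x, np.zeros_like(x), d)                        # free diffusion
barrier = mean_tp_time(x, 5.0 * (1.0 - (2.0 * x - 1.0) ** 2), d)   # 5 kT parabolic barrier
print(f"flat: {flat:.4f} (analytic 1/6 = {1 / 6:.4f}); 5 kT barrier: {barrier:.4f}")
```

Inferring the full transition-path shape, rather than only its mean duration, would require additional machinery, for example along the lines of the mean-shape results of ref. 13.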

We have focused here on the specific pairs of residues that were labeled with chromophores in experiments on the respective proteins, but other pairs of residues could, of course, have been chosen. In principle, for a system of experimental interest, simulation trajectories could be generated as done here, and the labeling scheme could be guided by which labeled pairs of residues the simulations predict will give the closest approximation to the true transition path time. We also note that we have focused on transitions that can be approximated as crossing a single barrier without populated intermediates. This is the most challenging case, where a better model for the transition-path shape is highly desirable. On the other hand, there are transitions in which the transition path is dominated by one or more long-lived intermediate states. Such transitions would be expected to be best approximated by step-like changes in the FRET efficiency, although the efficiencies of the intermediates and their kinetics of formation and interconversion would, in general, need to be optimized independently.
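
Screening candidate label pairs in this way requires converting each simulated inter-residue distance trajectory into a photon trajectory. A minimal sketch is shown below, assuming the Förster relation E = 1/(1 + (r/R0)^6), Poisson photon arrivals at a constant total count rate, and no background, crosstalk, or dye photophysics. The Förster radius, count rate, toy distance trajectory, and function names are hypothetical, and the procedure is not necessarily the one used in this work.

```python
import numpy as np


def forster_efficiency(r, r0):
    """FRET efficiency from the Förster relation E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (np.asarray(r, dtype=float) / r0) ** 6)


def photons_from_distances(t, r, r0, count_rate, rng):
    """Photon arrival times and colors (1 = acceptor, 0 = donor) from r(t).

    Assumes Poisson arrivals at a constant total count rate and no background;
    each photon is an acceptor photon with probability E(r) of the most recent frame.
    """
    n = rng.poisson(count_rate * (t[-1] - t[0]))
    times = np.sort(rng.uniform(t[0], t[-1], n))
    frame = np.searchsorted(t, times, side="right") - 1   # last frame at or before each photon
    eff = forster_efficiency(r[frame], r0)
    colors = (rng.random(n) < eff).astype(int)
    return times, colors


# Toy distance trajectory (hypothetical): unfolded (~7 nm), then folded (~3 nm).
rng = np.random.default_rng(1)
t = np.arange(0.0, 1000.0, 0.1)                            # time in microseconds
r = np.where(t < 500.0, 7.0, 3.0) + 0.3 * rng.standard_normal(t.size)
times, colors = photons_from_distances(t, r, r0=5.4, count_rate=0.2, rng=rng)
print(len(times), colors[times < 500.0].mean(), colors[times >= 500.0].mean())
```

Running the likelihood analysis on photon trajectories generated this way for each candidate pair, and comparing the results with the transition-path times defined by Q, would be one way to implement the screening described above.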

We have identified deviations from an ideal step-like shape as a cause of underestimation of the transition-path time when analyzed using the standard methodology, even for diffusion on a one-dimensional energy surface. However, protein folding is clearly a highly multidimensional process, and it is only by virtue of the funneled nature of the folding energy landscape47 that folding dynamics can in many cases be approximated as diffusion along a one-dimensional coordinate.41,42 This is nonetheless an approximation, and residual multidimensionality (e.g., distinct folding pathways) may further complicate the interpretation of experimentally derived transition times, as recently described.20

SUPPLEMENTARY MATERIAL

See the supplementary material for supporting figures showing trajectories for CspTm and GB1, the Bayesian assessment of reaction coordinates Q and E, the dependence of the number of transitions and of the rate on the cutoffs used to define transition paths, histograms of transition path times for each protein, the sensitivity of the maximum likelihood method to the amount of “context” included, the analysis of a model sloped transition path, the transition path shape for the one-dimensional coordinate, and the effect of the applied force on the maximum likelihood transition path time.

ACKNOWLEDGMENTS

We thank Ben Schuler and Hoi Sung Chung for helpful comments on the manuscript. This work was supported by the Intramural Research Program of the National Institute on Alcohol Abuse and Alcoholism and the National Institute of Diabetes and Digestive and Kidney Diseases of the NIH. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).

DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

REFERENCES

  • 1. Schuler B., “Perspective: Chain dynamics of unfolded and intrinsically disordered proteins from nanosecond fluorescence correlation spectroscopy combined with single-molecule FRET,” J. Chem. Phys. 149, 010910 (2018). 10.1063/1.5037683
  • 2. Chung H. S., Louis J. M., and Eaton W. A., “Experimental determination of upper bound for transition path times in protein folding from single-molecule photon-by-photon trajectories,” Proc. Natl. Acad. Sci. U. S. A. 106, 11837–11844 (2009). 10.1073/pnas.0901178106
  • 3. Chung H. S. and Eaton W. A., “Single molecule fluorescence probes dynamics of barrier crossing,” Nature 502, 685–688 (2013). 10.1038/nature12649
  • 4. Chung H. S., Piana-Agostinetti S., Shaw D. E., and Eaton W. A., “Structural origin of slow diffusion in protein folding,” Science 349, 1504–1510 (2015). 10.1126/science.aab1369
  • 5. Yu H., Gupta A. N., Liu X., Neupane K., Brigley A. M., Sosova I., and Woodside M. T., “Energy landscape analysis of native folding of the prion protein yields the diffusion constant, transition path time, and rates,” Proc. Natl. Acad. Sci. U. S. A. 109, 14452–14457 (2012). 10.1073/pnas.1206190109
  • 6. Neupane K., Manuel A. P., Lambert J., and Woodside M. T., “Transition-path probability as a test of reaction-coordinate quality reveals DNA hairpin folding is a one-dimensional diffusive process,” J. Phys. Chem. Lett. 6, 1005–1010 (2015). 10.1021/acs.jpclett.5b00176
  • 7. Kim J.-Y. and Chung H. S., “Disordered proteins follow diverse transition paths as they fold and bind to a partner,” Science 368, 1253–1257 (2020). 10.1126/science.aba3854
  • 8. Sturzenegger F., Zosel F., Holmstrom E. D., Buholzer K. J., Makarov D. E., Nettels D., and Schuler B., “Transition path times of coupled folding and binding reveal the formation of an encounter complex,” Nat. Commun. 9, 4708 (2018). 10.1038/s41467-018-07043-x
  • 9. Bolhuis P. G., Chandler D., Dellago C., and Geissler P. L., “Transition path sampling: Throwing ropes over rough mountain passes, in the dark,” Annu. Rev. Phys. Chem. 53, 291–318 (2002). 10.1146/annurev.physchem.53.082301.113146
  • 10. Best R. B. and Hummer G., “Reaction coordinates and rates from transition paths,” Proc. Natl. Acad. Sci. U. S. A. 102, 6732–6737 (2005). 10.1073/pnas.0408098102
  • 11. Best R. B., Hummer G., and Eaton W. A., “Native contacts determine protein folding mechanisms in atomistic simulations,” Proc. Natl. Acad. Sci. U. S. A. 110, 17874–17879 (2013). 10.1073/pnas.1311599110
  • 12. Best R. B. and Hummer G., “Microscopic interpretation of folding ϕ-values using the transition path ensemble,” Proc. Natl. Acad. Sci. U. S. A. 113, 3263–3268 (2016). 10.1073/pnas.1520864113
  • 13. Kim W. K. and Netz R. R., “The mean shape of transition and first-passage paths,” J. Chem. Phys. 143, 224108 (2015). 10.1063/1.4936408
  • 14. Makarov D. E., “Shapes of dominant transition paths from single-molecule force spectroscopy,” J. Chem. Phys. 143, 194103 (2015). 10.1063/1.4935706
  • 15. Daldrop J. O., Kim W. K., and Netz R. R., “Transition paths are hot,” Europhys. Lett. 113, 18004 (2016). 10.1209/0295-5075/113/18004
  • 16. Laleman M., Carlon E., and Orland H., “Transition path time distributions,” J. Chem. Phys. 147, 214103 (2017). 10.1063/1.5000423
  • 17. Berezhkovskii A. M. and Makarov D. E., “Communication: Transition-path velocity as an experimental measure of barrier crossing dynamics,” J. Chem. Phys. 148, 201102 (2018). 10.1063/1.5030427
  • 18. Carlon E., Orland H., Sakaue T., and Vanderzande C., “Effect of memory and active forces on transition path time distributions,” J. Phys. Chem. B 122, 11186–11194 (2018). 10.1021/acs.jpcb.8b06379
  • 19. Medina E., Satija R., and Makarov D. E., “Transition path times in non-Markovian activated rate processes,” J. Phys. Chem. B 122, 11400–11413 (2018). 10.1021/acs.jpcb.8b07361
  • 20. Satija R., Berezhkovskii A. M., and Makarov D. E., “Broad distributions of transition-path times are fingerprints of multidimensionality of the underlying free energy landscapes,” Proc. Natl. Acad. Sci. U. S. A. 117, 27116–27123 (2020). 10.1073/pnas.2008307117
  • 21. Neupane K., Hoffer N. Q., and Woodside M. T., “Testing kinetic identities involving transition-path properties using single-molecule folding trajectories,” J. Phys. Chem. B 122, 11095–11099 (2018). 10.1021/acs.jpcb.8b05355
  • 22. Hoffer N. Q., Neupane K., Pyo A. G. T., and Woodside M. T., “Measuring the average shape of transition paths during the folding of a single biological molecule,” Proc. Natl. Acad. Sci. U. S. A. 116, 8125–8130 (2019). 10.1073/pnas.1816602116
  • 23. Neupane K., Ritchie D. B., Yu H., Foster D. A. N., Wang F., and Woodside M. T., “Transition-path times for nucleic acid folding determined from energy landscape analysis of single-molecule trajectories,” Phys. Rev. Lett. 109, 068102 (2012). 10.1103/physrevlett.109.068102
  • 24. Neupane K., Wang F., and Woodside M. T., “Direct measurement of sequence-dependent transition-path times and conformational diffusion in DNA duplex formation,” Proc. Natl. Acad. Sci. U. S. A. 114, 1329–1334 (2017). 10.1073/pnas.1611602114
  • 25. Truex K., Chung H. S., Louis J. M., and Eaton W. A., “Testing landscape theory for biomolecular processes with single molecule fluorescence spectroscopy,” Phys. Rev. Lett. 115, 018101 (2015). 10.1103/physrevlett.115.018101
  • 26. Cossio P., Hummer G., and Szabo A., “On artifacts in single-molecule force spectroscopy,” Proc. Natl. Acad. Sci. U. S. A. 112, 14248–14253 (2015). 10.1073/pnas.1519633112
  • 27. Roy R., Hohng S., and Ha T., “A practical guide to single-molecule FRET,” Nat. Methods 5, 507–516 (2008). 10.1038/nmeth.1208
  • 28. Karanicolas J. and Brooks C. L. III, “The structural basis for biphasic kinetics in the folding of the WW domain from a formin-binding protein: Lessons for protein design?,” Proc. Natl. Acad. Sci. U. S. A. 100, 3954–3959 (2003). 10.1073/pnas.0731771100
  • 29. Hess B., Kutzner C., van der Spoel D., and Lindahl E., “GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation,” J. Chem. Theory Comput. 4, 435–447 (2008). 10.1021/ct700301q
  • 30. Gopich I. V. and Szabo A., “Decoding the pattern of photon colours in single-molecule FRET,” J. Phys. Chem. B 113, 10965–10973 (2009). 10.1021/jp903671p
  • 31. Berezhkovskii A. M. and Makarov D. E., “On the forward/backward symmetry of transition-path time distributions in nonequilibrium systems,” J. Chem. Phys. 151, 065102 (2019). 10.1063/1.5109293
  • 32. Murzin A. G., Brenner S. E., Hubbard T., and Chothia C., “SCOP: A structural classification of proteins database for the investigation of sequences and structures,” J. Mol. Biol. 247, 536–540 (1995). 10.1016/s0022-2836(05)80134-2
  • 33. Schuler B., Lipman E. A., and Eaton W. A., “Probing the free-energy surface for protein folding with single-molecule fluorescence spectroscopy,” Nature 419, 743–747 (2002). 10.1038/nature01060
  • 34. Shakhnovich E., Farztdinov G., Gutin A. M., and Karplus M., “Protein folding bottlenecks: A lattice Monte Carlo simulation,” Phys. Rev. Lett. 67, 1665–1668 (1991). 10.1103/PhysRevLett.67.1665
  • 35. Best R. B. and Hummer G., “Coordinate-dependent diffusion in protein folding,” Proc. Natl. Acad. Sci. U. S. A. 107, 1088–1093 (2010). 10.1073/pnas.0910390107
  • 36. Best R. B., “Folding and binding: When the force is against you,” Biophys. J. 105, 2611–2612 (2013). 10.1016/j.bpj.2013.10.035
  • 37. Socci N. D., Onuchic J. N., and Wolynes P. G., “Stretching lattice models of protein folding,” Proc. Natl. Acad. Sci. U. S. A. 96, 2031–2035 (1999). 10.1073/pnas.96.5.2031
  • 38. Kirmizialtin S., Huang L., and Makarov D. E., “Topography of the free-energy landscape probed via mechanical unfolding of proteins,” J. Chem. Phys. 122, 234915 (2005). 10.1063/1.1931659
  • 39. Graham T. G. W. and Best R. B., “Force-induced change in protein unfolding mechanism: Discrete or continuous switch?,” J. Phys. Chem. B 115, 1546–1561 (2011). 10.1021/jp110738m
  • 40. Hummer G., “From transition paths to transition states and rate coefficients,” J. Chem. Phys. 120, 516–523 (2004). 10.1063/1.1630572
  • 41. Best R. B. and Hummer G., “Diffusion models of protein folding,” Phys. Chem. Chem. Phys. 13, 16902–16911 (2011). 10.1039/c1cp21541h
  • 42. Zheng W. and Best R. B., “Reduction of all-atom folding dynamics to one-dimensional diffusion,” J. Phys. Chem. B 119, 15247–15255 (2015). 10.1021/acs.jpcb.5b09741
  • 43. Dudko O. K., Graham T. G. W., and Best R. B., “Locating the folding barrier for single molecules under an external force,” Phys. Rev. Lett. 107, 208301 (2011). 10.1103/physrevlett.107.208301
  • 44. Morrison G., Hyeon C., Hinczewski M., and Thirumalai D., “Compaction and tensile forces determine the accuracy of folding landscape parameters from single molecule pulling experiments,” Phys. Rev. Lett. 106, 138102 (2011). 10.1103/physrevlett.106.138102
  • 45. Muñoz V. and Sanchez-Ruiz J. M., “Exploring protein-folding ensembles: A variable-barrier model for the analysis of equilibrium unfolding experiments,” Proc. Natl. Acad. Sci. U. S. A. 101, 17646–17651 (2004). 10.1073/pnas.0405829101
  • 46. de Torres J., Ghenuche P., Moparthi S. B., Grigoriev V., and Wenger J., “FRET enhancement in aluminum zero-mode waveguides,” ChemPhysChem 16, 782–788 (2015). 10.1002/cphc.201402651
  • 47. Wolynes P. G., Onuchic J. N., and Thirumalai D., “Navigating the folding routes,” Science 267, 1619–1620 (1995). 10.1126/science.7886447

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
