Markov State Model Reveals Folding and Functional Dynamics in Ultra-Long MD Trajectories

Thomas J Lane; Gregory R Bowman; Kyle Beauchamp; Vincent A Voelz; Vijay S Pande

doi:10.1021/ja207470h

Author manuscript; available in PMC: 2012 Nov 16.

Published in final edited form as: J Am Chem Soc. 2011 Oct 26;133(45):18413–18419. doi: 10.1021/ja207470h

PMCID: PMC3227799 NIHMSID: NIHMS332943 PMID: 21988563

Abstract

Two strategies have been recently employed to push molecular simulation to long, biologically relevant timescales: projection-based analysis of results from specialized hardware producing a small number of ultra-long trajectories and the statistical interpretation of massively parallel sampling performed with Markov state models (MSMs). Here, we assess the MSM as an analysis method by constructing a Markov model from ultra-long trajectories, specifically two previously reported 100 μs trajectories of the FiP35 WW domain (Shaw et al. (2010) Science, 330: 341–346). We find that the MSM approach yields novel insights. It discovers new statistically significant folding pathways, in which either beta-hairpin of the WW domain can form first. The rates of this process approach experimental values in a direct quantitative comparison (timescales of 5.0 μs and 100 ns), within a factor of ~2. Finally, the hub-like topology of the MSM and identification of a holo conformation predict how WW domains may function through a conformational selection mechanism.

Introduction

A detailed understanding of protein dynamics is necessary for a complete picture of biology at a molecular scale. Atomistic Molecular Dynamics (MD) simulations have the potential to yield a detailed kinetic description of these dynamics, but remain extremely computationally challenging. Moreover, even if the computational challenges can be surmounted, a new obstacle emerges: the efficient and accurate interpretation of massive MD data sets. Ideally, one could employ a method allowing for facile interpretation of the data, including comparison to experiment for theoretical validation.

Recently, advances have been made in both generation and analysis of MD simulations. To overcome the issue of computational intractability, our group and others have turned to the statistical approach of massively parallel simulation.¹ To unify the information acquired from independent parallel simulations, Markov state models (MSMs) have been employed for analysis. MSMs are discrete-time models based on kinetic exchange between states, which represent a partitioning of phase space into metastable regions.^2–4 This approach has been largely successful, documented in recent reports of the kinetics of moderately sized proteins (80+ residues) at long timescales (microseconds to milliseconds).^5–7

Another approach is to design specialized hardware capable of producing single, long MD trajectories. Recently, this concept has been implemented by D. E. Shaw et al in the form of Anton, a powerful and innovative supercomputer fine-tuned for the calculation of MD simulations.⁸ Anton is unique in its ability to calculate single, long trajectories, the first of which have been recently reported. During these simulations, multiple folding events of a WW domain, FiP35, were observed, the analysis of which led to the conclusion of a single dominant folding pathway.⁹ This conclusion was particularly noteworthy since it was in disagreement with previous simulation work on WW domains demonstrating multiple distinct significant folding mechanisms,^10,11 including Ref. 10, which was a pioneering study in the application of MSM theory that showed heterogeneous pathways in the folding of a WW domain. This conclusion also contradicted the general view that protein folding occurs through heterogeneous pathways, which has prevailed for over a decade.^12–14

Here, we re-analyze these ultra-long FiP35 trajectories, and demonstrate that MSMs can accurately capture long-timescale dynamics and provide a powerful foundation for analysis. A Markov model with short memory (15 ns Markov lag time) built from two 100 μs trajectories of the FiP35 WW domain recapitulates the dynamics of the original trajectories, up to the ~10 μs-scale folding time, the slowest process observed. The MSM, which identifies transitions through statistical inference, reveals many parallel folding pathways with heterogeneous molecular mechanisms. While a majority of the folding flux flows through the path reported by Shaw et al, we find other, distinct routes to the global free energy minimum, suggesting that the folding mechanism of WW domains is heterogeneous, a result previously found by Noé et al., who employed MSM theory.¹⁰ The timescales of this process match experiment to within a factor of two, recapturing the rates and amplitudes of the bi-exponential process seen in T-jump fluorescence experiments.^15,16

In addition to these folding dynamics, we report on the functional dynamics of the folded protein. That is, once a protein reaches the native state, how do we expect it to carry out its biological function? As in other systems, we have found that the free energy landscape near the native state is a kinetic hub,¹⁷ with many rapidly interconverting native-like states. One native-like state is especially interesting, as it corresponds to holo conformers of the Pin1 WW domain, the wild-type parent of FiP35 that binds proline-rich peptides.

Finally, we explicitly show how the MSM approach, a powerful sampling technique,⁵ is also a powerful analysis tool, representing an advance over traditional techniques in three major ways. First, states are defined based on physical criteria, rather than intuition. Therefore, one can objectively assess how many relevant folding pathways are present rather than assuming a single dominant mechanistic pathway a priori. Second, the MSM allows one to simulate experiments and calculate interesting properties such as committor values without running additional costly simulations. Finally, the MSM provides an intuitive framework for understanding of dynamics; for instance, key features, such as the native and holo states, are easy to identify. Each of these aspects has been leveraged to yield new insight into the dynamics of FiP35.

Methods

200 μs of atomistic MD simulation (Amber ff99SB-ILDN force field,¹⁸ TIP3P solvent¹⁹) was previously performed by Shaw et al⁹ using the Anton supercomputer. A Markov State Model (MSM) was constructed by clustering all 10⁶ saved snapshots of this simulation (200 ps intervals). A 26104-state model was generated with a k-centers algorithm, ensuring that no cluster had a radius larger than 4.5 Å. This radius cutoff ensured that equilibrium properties were preserved while maximizing the resolution of the model (Fig. S1). Ten rounds of local k-medoids were then used to improve this model. The model was assessed for Markovian behavior based on the criterion of implied timescale invariance (Fig. S1). A 200-state macrostate model was constructed from a PCCA lumping²⁰ of a second, 10,000-microstate k-medoids model. This second model was used to generate the macrostate model because k-medoids provides a good partitioning of native dynamics, allowing us to distinguish the detailed native kinetics and identify the holo conformer. All clustering and data analysis was performed using the MSMBuilder2 software package (http://simtk.org/home/msmbuilder), documented in Ref. 21. This reference and the references therein contain detailed information on the model construction practices employed here.
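
The k-centers step described above can be illustrated with a minimal pure-Python sketch. This is not the MSMBuilder2 implementation: the function name `k_centers`, the Euclidean `distance` (a stand-in for the RMSD metric), and the toy two-dimensional points are all assumptions made for illustration. The sketch keeps adding the farthest point as a new center until no point lies farther than the cutoff from its assigned center.

```python
import math

def distance(a, b):
    # Euclidean distance; a stand-in for RMSD between real conformations.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_centers(points, cutoff):
    """Greedy k-centers: promote the farthest point to a new center until
    every point is within `cutoff` of its assigned center."""
    centers = [0]                                    # arbitrary first center
    dist = [distance(p, points[0]) for p in points]  # distance to nearest center
    assign = [0] * len(points)                       # index into `centers`
    while max(dist) > cutoff:
        new = dist.index(max(dist))                  # farthest point so far
        centers.append(new)
        for i, p in enumerate(points):
            d = distance(p, points[new])
            if d < dist[i]:                          # reassign if the new center is closer
                dist[i] = d
                assign[i] = len(centers) - 1
    return centers, assign, dist

# Toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers, assign, dist = k_centers(points, cutoff=1.0)
```

In this toy case the cutoff plays the role of the 4.5 Å radius: the number of clusters is not chosen in advance but follows from the cutoff.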

Results

Models

Markov state models consist of a decomposition of phase space into discrete metastable states, and a master equation describing the kinetics between those states represented by a transition matrix estimated from simulation. From the raw trajectories, two models were created, a microstate model (26104 states) and macrostate model (200 states).²² The microstate model is used for quantitative calculation, while the macrostate model has been employed in visualization and qualitative analysis. See Methods for details on model construction.
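
To make the idea of a transition matrix estimated from simulation concrete, the following sketch counts state-to-state transitions at a fixed lag in a discretized trajectory and normalizes each row. It is a simplified illustration under stated assumptions: the function name `transition_matrix` and the toy trajectory are hypothetical, and the sketch omits refinements (such as enforcing detailed balance) that a production estimator may apply.

```python
def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts observed at a fixed lag (in frames)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1.0
    T = []
    for row in counts:
        total = sum(row)
        # States with no observed outgoing transitions are left as zero rows here.
        T.append([c / total if total > 0 else 0.0 for c in row])
    return T

# Toy discretized trajectory: each entry is the state index of one snapshot.
dtraj = [0, 0, 1, 1, 0, 2, 2, 1, 0, 0]
T = transition_matrix(dtraj, n_states=3, lag=1)
```

Each row of `T` gives the estimated probabilities of moving from one state to every other state within one lag time.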

Recapitulation of the Raw Data

It is essential that a Markov model parameterized from MD simulation recovers the kinetics of the simulation. Ultra-long trajectories give a unique opportunity to test if MSMs can recapitulate long-timescale dynamics. Autocorrelation functions provide a direct way to determine dynamics, and have been used here to compare the raw data and MSM. The autocorrelations of RMSD to the native state, Trp8 solvent-accessible surface area (SASA), and native contacts show that the MSM recapitulates the raw data (Fig. 1). Interestingly, there is still significant deviation between the two trajectories themselves, indicating that additional sampling (beyond two 100 μs trajectories) is necessary to draw statistically robust estimates of the kinetic properties of the FiP35 system.

Figure 1. Autocorrelation functions of (A) RMSD to the native state, (B) Trp8 SASA, (C) number of native contacts. Traces are overlaid for the two raw trajectories (blue, solid) and the Markov chain relaxation (red, dashed). Each of these trajectories was initiated in the same state and propagated for 100 μs. Each autocorrelation was fit to a single exponential; the relaxation constants of this fit are indicated on the appropriate panel.
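
For readers unfamiliar with the quantity being compared, a normalized autocorrelation function of a scalar time series (for example RMSD sampled along a trajectory) can be sketched as follows. The function name `autocorrelation` and the alternating toy signal are assumptions for illustration only; the sketch requires a non-constant signal and `max_lag` smaller than the series length.

```python
def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(k) = <dx(t) dx(t+k)> / <dx^2>, k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dx = [v - mean for v in x]                 # fluctuations about the mean
    var = sum(d * d for d in dx) / n
    acf = []
    for k in range(max_lag + 1):
        cov = sum(dx[t] * dx[t + k] for t in range(n - k)) / (n - k)
        acf.append(cov / var)
    return acf

# Toy signal that flips every frame, so it is perfectly anti-correlated at lag 1.
signal = [0.0, 1.0] * 50
acf = autocorrelation(signal, max_lag=2)
```

Applied to both a raw trajectory and a trajectory generated from the Markov chain, curves of this kind are what Fig. 1 overlays.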

Folding Time

Given that the model recapitulates the simulation, we now turn to the question of whether the simulation is an accurate representation of reality. Based on their definitions of the unfolded and folded states, Shaw et al reported a folding time of 10 ± 3 μs. However, the rate calculation is highly dependent on human-defined states (Fig. S2), a common problem for prediction of rates directly from MD simulation with traditional (non-MSM) methods.¹³ The MSM, based on states defined from a kinetic perspective in an algorithmic manner, provides an objective solution to this problem. In particular, the MSM allows one to move away from relying on intuition for state boundaries and determine a physical state decomposition.

We take advantage of this to mimic the temperature-jump experiment used to determine the folding time of FiP35^15,16 without performing additional MD simulation. This is accomplished by stochastically perturbing the equilibrium populations of the microstates and watching the Markov chain relax back to equilibrium over time, mimicking a relaxation to equilibrium after perturbation by a T-jump. The T-jump perturbation was mimicked by choosing two states at random and shifting 25% of the population of one state to the other, repeating this process a number of times equal to the number of states in the model. 100 separate T-jumps were performed and averaged to obtain the final result. The timescales observed in the T-jump were robust to the precise perturbation used: the same two timescales were observed regardless of the perturbation.

Projecting the population at each time onto an observable, here the Trp8 solvent-accessible surface area (SASA, a proxy for fluorescence intensity), we generate a time trace approximately proportional to measured fluorescence over time. This projection procedure amounts to calculating the average Trp8 SASA for the entire ensemble of states as the system relaxes from some random distribution of microstate populations to equilibrium. A double exponential with timescales of 5.0 μs and 100 ns accounts for all variation in the data (Fig. 2).
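
The perturb-and-relax procedure described above can be sketched on a toy model. Everything specific here is an assumption for illustration: the three-state transition matrix `T`, its equilibrium populations, the per-state observable values `sasa`, and the function names are hypothetical and are not taken from the FiP35 model. The sketch shifts 25% of one random state's population to another (once per state), then propagates the populations and records the ensemble-averaged observable.

```python
import random

def perturb(pop, rng, fraction=0.25):
    """Shift `fraction` of one random state's population to another, once per state."""
    p = list(pop)
    n = len(p)
    for _ in range(n):
        i, j = rng.sample(range(n), 2)   # two distinct states chosen at random
        moved = fraction * p[i]
        p[i] -= moved
        p[j] += moved
    return p

def propagate(p, T):
    """One lag-time step of the Markov chain: p(t + tau) = p(t) T."""
    n = len(p)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

def relaxation_trace(p0, T, observable, n_steps):
    """Ensemble average of the observable as the populations relax."""
    trace, p = [], list(p0)
    for _ in range(n_steps + 1):
        trace.append(sum(pi * oi for pi, oi in zip(p, observable)))
        p = propagate(p, T)
    return trace

# Hypothetical 3-state model whose equilibrium populations are (0.25, 0.5, 0.25).
T = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.00, 0.10, 0.90]]
equilibrium = [0.25, 0.50, 0.25]
sasa = [1.0, 0.5, 0.0]                   # hypothetical per-state observable values
rng = random.Random(0)
p0 = perturb(equilibrium, rng)
trace = relaxation_trace(p0, T, sasa, n_steps=500)
```

Averaging many such traces over independent perturbations, and fitting the average to a sum of exponentials, corresponds to the analysis summarized in Fig. 2.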

Figure 2. Simulation of a T-jump experiment. Ensemble averages of the Trp8 SASA are shown in blue, with a double-exponential fit to the data shown in red. The fit results in two timescales of 5.0 μs and 100 ns, respectively. Error bars show one standard deviation around the mean, calculated from an averaging of 100 stochastically generated initial population vectors. The signal intensity has been normalized to span [0,1].

Experimentally, two timescales were found and fit in T-jump experiments for FiP35, and reported as activated and molecular rates.¹⁶ Our double-exponential fit can be interpreted as corresponding to these rates, which were reported to be about 11 μs and 150 ns at 381 K, the highest temperature measured.^15,16 Additionally, the relative amplitudes of the two processes match in both theory and experiment, at ~0.8 and ~0.2, for the slow and fast processes, respectively. This correspondence is evidence that the underlying dynamics observed in the model are the same as those in experiment, though definitive proof of this cannot be obtained from a single observable.

By analyzing the eigenvectors and eigenvalues of the MSM transition matrix, one can obtain information about the dynamics of the system. The eigenvectors correspond to transitions between the states, and the eigenvalues give the timescales of those processes.^3,4 By analyzing the four slowest eigenprocesses, we find that the 5 μs transition corresponds to folding (Fig. 3). The next eigenvectors show quick dynamics within the non-native state, including dynamics between states where Trp8 is either highly buried or very exposed, but no productive folded structure is present. We found no one eigenprocess directly accounts for the second timescale observed in the T-Jump, which has contributions from non-native state dynamical processes that occur over an order of magnitude in timescales (Fig. S3). We conclude that the faster timescale measured in the T-Jump is in fact due to rearrangements between unfolded states, rather than some fast process occurring along the folding pathway.
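
The link between eigenvalues and timescales is the standard relation t = -τ / ln λ, where τ is the lag time and λ an eigenvalue of the transition matrix. The sketch below applies it to a hypothetical two-state matrix, for which the non-stationary eigenvalue is known in closed form; the rates `a` and `b` are illustrative assumptions chosen only so that the result lands near the microsecond scale discussed in the text.

```python
import math

def implied_timescale(eigenvalue, lag_time):
    """Relaxation timescale t = -tau / ln(lambda) for an eigenvalue 0 < lambda < 1."""
    if not 0.0 < eigenvalue < 1.0:
        raise ValueError("eigenvalue must lie strictly between 0 and 1")
    return -lag_time / math.log(eigenvalue)

def two_state_eigenvalues(a, b):
    """Eigenvalues of [[1 - a, a], [b, 1 - b]]: the stationary one (1) and 1 - a - b."""
    return 1.0, 1.0 - a - b

lag_ns = 15.0                                    # lag time in ns
_, lam2 = two_state_eigenvalues(a=0.001, b=0.002)
t2_ns = implied_timescale(lam2, lag_ns)          # slow relaxation time in ns
```

An eigenvalue very close to 1 therefore corresponds to a process much slower than the lag time, which is why a short-lag model can still describe microsecond folding.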

Figure 3. Eigenprocesses occurring in the transition matrix. The eigenvectors of the MSM transition matrix give the dynamical processes occurring in that matrix, while the eigenvalues give the timescales on which those processes occur. (A) Plot of the eigenvectors projected onto RMSD to native. Blue dots are individual states, red lines are weighted histograms showing the binned sum of the individual states’ contributions to the eigenvector. (B) Illustration of six example states from the 3rd and 4th eigenprocesses, which are dominated by interconversions between unfolded states. Many of these unfolded states have large deviations in their Trp8 exposed surface area. Trp8 shown in sticks.

Folding Pathways

We find the folding of FiP35 to be complex, exhibiting a diverse set of folding mechanisms (Fig. 4). The macrostate MSM reveals many parallel pathways to the native state, which is characterized by two beta-hairpins and a hydrophobic Pro5-Pro33 core.

Figure 4. Folding of FiP35, with committor (P_fold) values determined from transition path theory. (Left) The top 12 pathways of the folding probability flux, showing heterogeneous, parallel folding pathways for FiP35. Arrow width is proportional to flux, node size is proportional to the free energy of the state, and the *apo* and *holo* states are highlighted in red and blue, respectively. (Right) Overlaid structures with the indicated P_fold, showing an alternate view of folding. The exemplar structure is highlighted with a temperature factor proportional to the structure of each residue, as measured by RMSD. Red indicates an unstructured residue, blue indicates a structured residue, and white is intermediate between the two.

Looking for a pathway directly in MD simulations is difficult. Since systems typically exhibit rapid fluctuations and spend a great deal of time exploring off-pathway (orthogonal) motions, reductionist analysis methods are necessary. Additionally, pathways are only traveled a finite number of times during the simulation, making general extrapolation difficult. The MSM addresses these problems by projecting out fast degrees of freedom, keeping relevant motions we care about, and providing a statistical basis of observation. To elucidate the most important transitions, we employ transition path theory (TPT),^23,24 which represents a kinetic process as reactive flux along pathways of interest.

TPT analysis of FiP35 reveals many parallel, heterogeneous folding pathways traveled with high probability. Despite this heterogeneity, some generalizations can be made. During folding, backbone secondary structure forms first: either of the two beta hairpins collapses, followed by the other, forming a molten globule. Analysis of the folding flux in the microstate MSM gives the probabilities that either hairpin forms first. By calculating the total flux along all folding pathways, defined as routes from states with no hairpins formed to those with both formed (by DSSP²⁵), we determined the relative probabilities for either hairpin to form first. Formation of the N-terminal (first) beta hairpin first was most common, occurring in 35% of folding events, while the second hairpin formed first in 16% of pathways. Due to the discrete nature of MSMs, and the lag time employed in this particular model, in 49% of transitions both hairpins form concurrently in a single state-to-state transition. These transitions are therefore concerted, with both hairpins forming within one model lag time (15 ns).
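
A minimal sketch of the committor and reactive-flux calculation used in this kind of analysis is given below. It assumes a reversible model, so that the backward committor is one minus the forward committor. The four-state matrix (an unfolded state, two single-hairpin intermediates, and a native state), its uniform equilibrium populations, and the function names are hypothetical and are not the FiP35 model.

```python
def committor(T, source, sink, n_iter=10000, tol=1e-12):
    """Forward committor: q = 0 on source, 1 on sink, q_i = sum_j T_ij q_j elsewhere.
    Solved by Gauss-Seidel iteration; assumes T_ii < 1 for intermediate states."""
    n = len(T)
    q = [1.0 if i in sink else 0.0 for i in range(n)]
    for _ in range(n_iter):
        change = 0.0
        for i in range(n):
            if i in source or i in sink:
                continue
            rest = sum(T[i][j] * q[j] for j in range(n) if j != i)
            new = rest / (1.0 - T[i][i])
            change = max(change, abs(new - q[i]))
            q[i] = new
        if change < tol:
            break
    return q

def net_flux(T, pi, q):
    """Net reactive flux f+_ij = max(0, f_ij - f_ji), with f_ij = pi_i T_ij (1 - q_i) q_j."""
    n = len(T)
    net = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                f_ij = pi[i] * T[i][j] * (1.0 - q[i]) * q[j]
                f_ji = pi[j] * T[j][i] * (1.0 - q[j]) * q[i]
                net[i][j] = max(0.0, f_ij - f_ji)
    return net

# Hypothetical symmetric model: 0 = unfolded, 1 = hairpin 1 only,
# 2 = hairpin 2 only, 3 = native. Symmetry makes the equilibrium uniform.
T = [[0.85, 0.10, 0.05, 0.00],
     [0.10, 0.80, 0.00, 0.10],
     [0.05, 0.00, 0.90, 0.05],
     [0.00, 0.10, 0.05, 0.85]]
pi = [0.25, 0.25, 0.25, 0.25]
q = committor(T, source={0}, sink={3})
flux = net_flux(T, pi, q)
# Fraction of folding flux leaving the unfolded state through hairpin 1.
fraction_hairpin1_first = flux[0][1] / (flux[0][1] + flux[0][2])
```

In this toy model two thirds of the flux passes through the first intermediate, illustrating how relative pathway probabilities are read off from net fluxes rather than from individual trajectories.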

Corroborating the heterogeneous paths found by the MSM, direct inspection of the trajectories shows instances of both pathways. A direct analysis predicts approximately a 4:1 ratio in hairpin formation, based on secondary structure assignments of folding events. Additionally, anecdotes of each hairpin’s formation preceding folding have been found in the raw simulation (Fig. S4). Our results show the folding of FiP35 does not always occur along a single pathway, but is heterogeneous, a finding previously reported by this group and others.^10,11,26

Hub-like Nature of Native Structures

Our analysis of the folding pathway reveals that folding flux is channeled through many parallel hubs on the way to a native conformation. In the macrostate MSM, all the states with native structure are highly connected and interconvert rapidly, on the order of 100s of ns. These internal transitions occur two orders of magnitude faster than those between non-native states (10s of μs), as measured by mean first-passage times (MFPTs), which provide a qualitative measure of the distance between states. Additionally, the average MFPTs from non-native to native macrostates show transitions to the native state are rapid (on the order of 1 μs, not weighted by equilibrium population). The disparity between the 1 μs MFPT and the 5 μs folding time found in the T-jump (Fig. 2, Fig. 3) is a result of the macrostate model’s artificially fast dynamics. These in turn arise because macrostates are too diverse to ensure kinetic similarity at a macroscopic level, an intrinsic drawback of current methods to generate macrostate models. The qualitative features described here are valid, however, and are retained in the quantitatively accurate microstate model.
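
Mean first-passage times to a target set satisfy m_i = 0 on the target and m_i = τ + Σ_j T_ij m_j elsewhere. The sketch below solves these equations by simple iteration on a hypothetical three-state chain; the matrix, the function name `mfpt`, and the choice of an iterative solver are assumptions for illustration, and the target is assumed to be reachable from every state.

```python
def mfpt(T, target, lag_time=1.0, n_iter=100000, tol=1e-12):
    """Mean first-passage time to `target` from every state, in units of lag_time.
    Solves m_i = lag_time + sum_j T_ij m_j (m = 0 on target) by Gauss-Seidel."""
    n = len(T)
    m = [0.0] * n
    for _ in range(n_iter):
        change = 0.0
        for i in range(n):
            if i in target:
                continue
            rest = sum(T[i][j] * m[j] for j in range(n) if j != i)
            new = (lag_time + rest) / (1.0 - T[i][i])
            change = max(change, abs(new - m[i]))
            m[i] = new
        if change < tol:
            break
    return m

# Hypothetical 3-state chain; state 2 plays the role of the native state.
T = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.00, 0.10, 0.90]]
m = mfpt(T, target={2})
```

For this chain the exact answer is 40 lag times from state 0 and 30 from state 1, which the iteration reproduces.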

Quick transitions to the native conformation lead us to characterize our MSM as a network with the native states as a hub at the center (Fig. 5). Any state within the model is close to a native-like state, and transitions from any one non-native state to any other non-native state are likely to pass through one of the native macrostates. This hub-like nature has been previously noted in MSMs, and appears to be a recurrent theme.¹⁷ Any state can reach any other state in 4 moves or fewer, and 94% of states are separated by at most one intermediate state. This high degree of connectivity, along with the centrality of the native states, is indicative of a hub-like kinetic network.
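
Statements about how many moves separate two states can be checked with a breadth-first search over the graph of allowed transitions. The sketch below is illustrative only: the small hub-and-spoke adjacency list and the function names are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque

def hop_counts(adjacency, start):
    """Breadth-first search: minimum number of transitions from `start` to each state."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops

def diameter(adjacency):
    """Largest minimum hop count between any pair of states (connected graph assumed)."""
    return max(max(hop_counts(adjacency, s).values()) for s in adjacency)

# Hypothetical network: state 0 is a hub connected to every other state.
adjacency = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0, 4], 4: [0, 3]}
largest_separation = diameter(adjacency)
```

In a hub-like network the diameter stays small even as the number of peripheral states grows, because most shortest paths pass through the hub.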

Figure 5. The hub-like nature of the FiP35 MSM. A subset of the macrostate MSM was represented as a graph, with centrality automatically determined by number of connections (OmniGraffle® software). We see that the native macrostates, in green, are centrally located, surrounded by molten globule states, in orange. The non-native states (uncolored) sit on the periphery of the graph, but can reach native states in few transitions, through hubs. The *holo* state is highlighted (double ring), and is centrally located in the hub-like network. Transitions to this state are relevant for biological function.

Identification of the Functionally Relevant Holo State

The rapid interconversions within the native network have been found to be important for function,²⁷ allowing a protein to reach a functionally relevant holo state. We have identified macrostates that correspond to the apo and holo conformers of a related mutant, the human Pin1 WW domain (hPin1 WW, Fig. 5).²⁸

NMR solution structures^28,29 of Pin1 WW correspond to the highlighted states in Fig. 4 and Fig. 5. The apo structure is structurally similar to all structures in the native state, within 3.0 Å RMSD of each state. We have highlighted the macrostate with the lowest RMSD to the NMR structure as an apo exemplar for comparison purposes (Fig. 4, Fig. 5). Direct analysis of the key interactions in the apo state indicates that these features are common to all of the states in the native state besides the holo state, with slight differences due to mutation.

More interesting is the holo state itself. Two available ligand-bound NMR structures of hPin1 WW correspond closely to a single macrostate (avg. Cα RMSD of 1.8 Å and 1.9 Å, respectively). This holo macrostate looks qualitatively different from the rest of the observed native states, with Arg11 rotated outward, in a presentation very similar to that of the corresponding residue in ligand-bound hPin1 WW. The holo state is connected to all of the states in the native state, and has a high equilibrium population of 5.5%. This indicates the holo state is part of the functional dynamics of the native basin, and is populated even in the absence of the ligand.

WW domains bind proline-rich peptide sequences, and in the wild-type hPin1 WW, the dominant interaction occurs when Arg12, in the loop between sheets 1 and 2, rotates outward, binding the peptide. Simultaneously, Trp29 stabilizes this interaction (Fig. 6).²⁹ The correspondence of hPin1 WW to FiP35 is not perfect, due to mutations made in FiP35 to speed folding. The holo topology is the same in both mutants, however, with the appropriate key twist in the hairpin between sheets 1 and 2, which in FiP35 rotates Arg11 outward. FiP35 lacks a second Trp residue due to mutation; such mutations have been demonstrated to destroy any ligand-binding activity.³⁰ The native state dynamics of Pin1 WW have been previously studied using MSMs, which showed native state dynamics similar to what is seen here, with some differences due to mutation.³¹ These studies suggest that WW domains may bind ligands with a conformational selection mechanism, though definitive conclusions cannot be made since these studies have spanned only two mutants and have not explicitly simulated ligand binding.

Figure 6. Structural overlays of the (A) *apo* (PDB: 1I6C) and (B) *holo* states (PDB: 1I8G, 1I8H). The NMR holo states here have ligands bound in the structure, a distinct ligand for each PDB entry, but ligands are not shown for clarity. NMR structures are shown in orange, MSM state exemplars are shown in teal. Key residues are highlighted: Arg12 (Arg11 in FiP35), along with Trp29, are responsible for ligand binding.

Discussion

Previously, MSMs combined with parallel simulation have effectively overcome the intractability of simulating phenomena involving millisecond or longer timescales in systems with tens of thousands of atoms. With Anton, however, a new opportunity to directly produce long, serial trajectories has arisen. Here, we show an MSM built from serial data can yield insight not possible with traditional methods. Our results indicate that there appears to be little difference between Markovian models parameterized from few long or many short trajectories.

Moreover, the MSM’s probabilistic interpretation is shown to significantly aid analysis of long trajectories, statistically leveraging the data to find transitions missed by other analysis methods. In addition to extracting all possible information from existing data, the MSM provides a powerful, physical foundation for comprehending simulation. In the study of FiP35, an MSM allows us to analyze folding as a probability flux through multiple pathways, illuminating the hub-like structure of the equilibrium distribution.

Finally, in addition to capturing long timescale folding dynamics, the MSM allows for facile analysis of the protein’s functional dynamics. FiP35 is shown to present itself in a holo conformation, even in the absence of any ligand. This presentation is of the sort predicted by the conformational selection model of ligand binding, as opposed to an induced fit model,²⁷ though this simulation represents only one anecdotal observation, and should not be over-generalized.

This work indicates that MSMs provide a powerful analysis tool not only for many short simulations, but also for long trajectories. Given this, future work will focus on the most efficient manner of generating MD simulations to construct Markov models. We suspect that clever combinations of long and short trajectories will be synergistic; shorter simulations are considerably more amenable to parallelization, but due to their short nature individually only explore limited regions of phase space and require methods such as adaptive sampling⁵ to collectively sample the required space. Long trajectories, while expensive to generate, do not suffer from this limitation. Therefore, it is possible to imagine that a combination of the two would lead to models that are both well sampled and cover phase space completely. Future work is needed to refine and test this hypothesis.

Supplementary Material

Supplementary file 1_si_001: NIHMS332943-supplement-1_si_001.pdf (538.4 KB, PDF)

Acknowledgments

The authors kindly thank Stefano Piana-Agostinetti, Kresten Lindorff-Larsen and their coworkers at DE Shaw Research for providing the two trajectories and a critical reading of the manuscript. Thanks also to Jesus Izaguirre and Christian Schwantes for many comments, and to Liz Kellogg for spotting an error prior to publication. Computational resources were funded by the NSF (MRI-R2-0960306). VSP acknowledges support from NIH R01-GM062868, NSF-DMS-0900700, NSF-MCB-0954714. VSP and VAV acknowledge support from NSF EF-0623664. TJL was supported by an NSF GRF. GRB was funded by the Berry Foundation.

Footnotes

Supplemental Information

Four supplemental figures are included, depicting (1) the implied timescales and model error as a function of lag time, (2) the lack of robustness in the folding time as a function of state definition, (3) the projection of the transition matrix eigenvectors onto the Trp8 SASA observable, and (4) raw data depicting heterogeneous pathways. This information is available free of charge via the Internet at http://pubs.acs.org/.

References

1. Pande VS, Baker I, Chapman J, Elmer SP, Khaliq S, Larson SM, Rhee YM, Shirts MR, Snow CD, Sorin EJ, Zagrovic B. Biopolymers. 2003;68:91–109. doi: 10.1002/bip.10219.
2. Bowman GR, Beauchamp KA, Boxer G, Pande VS. J Chem Phys. 2009;131:124101. doi: 10.1063/1.3216567.
3. Schütte C, Fischer A, Huisinga W, Deuflhard P. J Comput Phys. 1999;151:146–168.
4. (a) Noé F, Fischer S. Curr Op Struct Biol. 2008;18:154–162. doi: 10.1016/j.sbi.2008.01.008. (b) Prinz JH, Keller B, Noé F. Phys Chem Chem Phys. 2011 doi: 10.1039/c1cp21258c. Accepted.
5. Bowman GR, Voelz VA, Pande VS. J Am Chem Soc. 2010;133:664–667. doi: 10.1021/ja106936n.
6. Voelz VA, Bowman GR, Beauchamp K, Pande VS. J Am Chem Soc. 2010;132:1526–8. doi: 10.1021/ja9090353.
7. Bowman GR, Ensign DL, Pande VS. J Chem Theory Comput. 2010;6:787–794. doi: 10.1021/ct900620b.
8. Shaw DE, et al. ACM SIGARCH Computer Architecture News. 2007;35:1.
9. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W. Science. 2010;330:341–346. doi: 10.1126/science.1187409.
10. Noé F, Schütte C, Vanden-Eijnden E, Reich L, Weikl TR. Proc Nat Acad Sci USA. 2009;106:19011–6. doi: 10.1073/pnas.0905466106.
11. Xu J, Huang L, Shakhnovich EI. Proteins. 2011;79:1704–1714. doi: 10.1002/prot.22993.
12. Onuchic JN, Wolynes PG. Curr Op Struct Biol. 2004;14:70–5. doi: 10.1016/j.sbi.2004.01.009.
13. Ensign DL, Kasson PM, Pande VS. J Mol Biol. 2007;374:806–16. doi: 10.1016/j.jmb.2007.09.069.
14. Dill KA, Ozkan SB, Shell MS, Weikl TR. Ann Rev Biophys. 2008;37:289–316. doi: 10.1146/annurev.biophys.37.092707.153558.
15. Liu F, Du S, Fuller AA, Davoren JE, Wipf P, Kelly JW, Gruebele M. Proc Nat Acad Sci USA. 2008;105:2369–74. doi: 10.1073/pnas.0711908105.
16. Liu F, Nakaema M, Gruebele M. J Chem Phys. 2009;131:195101. doi: 10.1063/1.3262489.
17. Bowman GR, Pande VS. Proc Nat Acad Sci USA. 2010;107:10890–5. doi: 10.1073/pnas.1003962107.
18. Lindorff-Larsen K, Piana S, Palmo K, Maragakis P, Klepeis JL, Dror RO, Shaw DE. Proteins. 2010;78(8):1950–1958. doi: 10.1002/prot.22711.
19. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. J Chem Phys. 1983;79:926.
20. Deuflhard P, Huisinga W, Fischer A, Schütte C. Lin Alg Appl. 2000;315:39–59.
21. Beauchamp KA, Bowman GR, Lane TJ, Maibaum L, Haque I, Pande VS. J Chem Theory Comput. 2011 doi: 10.1021/ct200463m. Accepted.
22. Bowman GR, Huang X, Pande VS. Methods. 2009;49:197–201. doi: 10.1016/j.ymeth.2009.04.013.
23. Metzner P, Schütte C, Vanden-Eijnden E. Multiscale Modeling & Simulation. 2009;7:1192.
24. Berezhkovskii A, Hummer G, Szabo A. J Chem Phys. 2009;130:205102. doi: 10.1063/1.3139063.
25. Kabsch W, Sander C. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
26. Ensign DL, Pande VS. Biophys J. 2009;96:53–5. doi: 10.1016/j.bpj.2009.01.024.
27. Boehr DD, Nussinov R, Wright PE. Nat Chem Biol. 2009;5:789–96. doi: 10.1038/nchembio.232.
28. Wintjens R, Wieruszeski J-M, Drobecq H, Rousselot-Pailley P, Buée L, Lippens G, Landrieu I. J Biol Chem. 2001;276:25150–6. doi: 10.1074/jbc.M010327200.
29. Peng T, Zintsmaster JS, Namanja AT, Peng JW. Nat Struct Biol. 2007;14:325–31. doi: 10.1038/nsmb1207.
30. Jäger M, Zhang Y, Bieschke J, Nguyen H, Dendle M, Bowman ME, Noel JP, Gruebele M, Kelly JW. Proc Nat Acad Sci USA. 2006;103:10648–53. doi: 10.1073/pnas.0600511103.
31. Morcos F, Chatterjee S, McClendon CL, Brenner PR, López-Rendón R, Zintsmaster J, Ercsey-Ravasz M, Sweet CR, Jacobson MP, Peng JW, Izaguirre JA. Modeling Conformational Ensembles of Slow Functional Motions in Pin1-WW. PLoS Comp Bio. 2010;6:e1001015. doi: 10.1371/journal.pcbi.1001015.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

1_si_001: NIHMS332943-supplement-1_si_001.pdf (538.4 KB, PDF)
