Abstract
Developing an understanding of protein misfolding processes presents a crucial challenge for unlocking the mysteries of human disease. In this article, we present our observations of β-sheet-rich misfolded states on a number of protein dynamical landscapes investigated through molecular dynamics simulation and Markov state models. We employ a nonequilibrium statistical mechanical theory to identify the glassy states in a protein’s dynamics, and we discuss the nonnative, β-sheet-rich states that play a distinct role in the slowest dynamics within seven protein folding systems. We highlight the fundamental similarity between these states and the amyloid structures responsible for many neurodegenerative diseases, and we discuss potential consequences for mechanisms of protein aggregation and intermolecular amyloid formation.
Introduction
The question of how a protein folds into its functional, three-dimensional native state is intimately related to how this process can fail, i.e., when a protein misfolds into a nonnative but metastable state (1). Such misfolded states can take a pathological role both through inducing protein dysfunction and promoting further aggregation (often into β-sheet-rich amyloid structures) that can disrupt normal cellular and systemic processes. The pathways by which protein aggregates and amyloid structures contribute to neurodegenerative disease in humans are intense subjects of study (2–4). Despite this dedication to studying protein aggregation-related disease, however, the microscopic mechanisms for the formation of misfolded structures remain poorly understood. Descriptions of interactions between misfolded structures and their surroundings in vivo are similarly difficult to access (1–4). To grasp the protein misfolding process, one needs an acute knowledge of the kinetics specific to the system of interest, and one can rely little on information derived from a protein’s crystal structure or native state. As such, few standard experimental, computational, or theoretical methods are well suited to studying the nanoscale physics of protein misfolding and aggregation.
The frustrated kinetics of protein folding, linked to the possibility that disordered (misfolded) states can compete with the native state during the folding process, exhibit natural connections to the theory of glassy physical systems (5–7). In particular, ideas of metastable states pervade discussions of both protein and glassy dynamics. The free energy of a glassy system features many local minima, and the rare transitions between such minima lead to exceptionally slow relaxation kinetics on the configurational landscape. In glass-forming liquids, cooperative molecular rearrangements that occur on the O(100 s) timescale are conventionally connected to glassy behavior. In this work, we use the term “glassy” in a more general sense to describe systems in which rare transitions occur among metastable states.
The free-energy landscape for a glass-forming liquid containing O(10²³) molecules has so many minima (and saddle points) that analysis of its metastable states presents an enormous challenge (8). In particular, one expects the number of relevant metastable states to be of order exp(σN), where σ resembles an entropy per particle (σN is sometimes known as “the complexity”) (9). Clearly, analysis of such minima is only practical in small systems (8). For a single protein molecule, however, relevant metastable basins are likely to be smaller in number and more accessible, offering hope for characterizing ensembles of glasslike states and their possible role in protein misfolding processes.
In this article, we exploit Markov state models (MSMs) of protein folding dynamics to identify and analyze metastable states in several protein systems, and we discuss the possible relationship of these states to protein misfolding, aggregation, and amyloid formation. Constructed from extensive molecular dynamics (MD) data sets, the MSMs analyzed here provide a statistically rigorous representation of protein dynamics that allows for quantitative comparison with theory, simulation, and experiment (10–14). Applying methods from the nonequilibrium physics of glassy systems, we study the slow dynamics and associated metastable states that emerge in protein folding MSMs. Intriguingly, many of the glassy states that appear bear a resemblance to amyloid structures, suggesting routes for protein aggregation via β-sheet-rich monomeric states. Further MD simulations of these glasslike structures reveal that entire ensembles of nonnative, β-rich protein conformations exist on protein folding landscapes—an observation that has gone understated with conventional MD data analysis.
Models and Methods
MSMs and the s-ensemble
Markov state models (MSMs) are obtained by partitioning a protein’s configurational space into a set of discrete states and estimating the rates (or probabilities) of transitions between these states. Having identified N states, and given a fixed lag time τlag, one considers the probability pij that a system initially in state i will evolve into state j, over the time τlag (14). These probabilities are collected into an N × N transition matrix T. The partitioning of states and the choice of lag time are made so that transitions between states are, as far as possible, independent of the system’s history: this memoryless property is key to the evaluation of the model. Given the Markovian assumption, the pij values define a stochastic model that operates in discrete time, with each time step corresponding to a physical time interval of length τlag.
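As a minimal sketch of how the probabilities pij can be estimated, the example below counts transitions at a fixed lag in discretized trajectories and row-normalizes the counts. The function name, the toy trajectories, and the plain count estimator are illustrative assumptions; production MSM estimators typically also enforce detailed balance, which this sketch does not.

```python
import numpy as np

def estimate_transition_matrix(dtrajs, n_states, lag=1):
    """Row-normalized count estimate of p_ij at a lag of `lag` frames.

    Hypothetical helper for illustration; it does not enforce detailed balance.
    """
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj)
        # Pair each frame with the frame `lag` steps later and count the pair.
        np.add.at(counts, (traj[:-lag], traj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("at least one state has no outgoing counts")
    return counts / row_sums

# Toy discretized trajectories over three states (made-up data).
dtrajs = [[0, 0, 1, 1, 2, 0, 0, 1],
          [2, 2, 1, 0, 0, 0, 1, 2]]
T = estimate_transition_matrix(dtrajs, n_states=3, lag=1)
print(T)
```

Each row of the resulting matrix sums to one, so row i can be read as the distribution over destinations after one lag time when starting in state i.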
In practice, our protein folding MSMs are constructed from large MD simulation data sets obtained from the Folding@home distributed computing project (Stanford University, Stanford, CA) and the specialized ANTON supercomputer (D. E. Shaw, New York City, New York). These advanced computing platforms allow for increasingly large protein systems to be simulated at or near the millisecond timescale (10–13). MSMs built from these simulations leverage rigorous statistical mechanical theory to synthesize vast quantities of MD data, forming a crucial bridge for interpreting and understanding long timescale protein dynamics (14). To date, MSMs have been used to characterize the folding processes of dozens of protein systems (10–12,15,16).
MSMs are composed of thousands of states and many thousands of connections among these states. Accordingly, one needs to carry out further quantitative analysis on MSMs to elucidate the physically relevant processes such models describe. The equilibrium distribution over protein Markov states allows one to estimate free energy differences between conformational ensembles, and probable folding pathways can be identified through transition path theory analysis (11). Mean first passage times (MFPTs) between states provide information on the connectivity of the protein folding network (10); similar information can also be extracted from the eigenvalues and eigenvectors of the matrix T.
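The quantities mentioned above can be illustrated on a toy model. The sketch below assumes a hypothetical three-state transition matrix (not taken from any of the protein MSMs) and computes its equilibrium distribution, its implied relaxation timescales from the eigenvalues, and MFPTs into a chosen target state by solving a linear system.

```python
import numpy as np

# Hypothetical 3-state transition matrix; state 2 is a long-lived trap.
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.02, 0.98]])
tau_lag = 1.0  # time unit: one lag time

def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.abs(evecs[:, np.argmax(evals.real)].real)
    return pi / pi.sum()

def implied_timescales(T, tau_lag):
    """Relaxation timescales -tau_lag / ln(lambda_k) for the non-unit eigenvalues."""
    evals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -tau_lag / np.log(evals[1:])

def mfpt_to(T, target, tau_lag):
    """MFPT from every state into `target`: solve (I - T') m = tau_lag * 1,
    where T' is T restricted to the non-target states."""
    n = T.shape[0]
    others = [i for i in range(n) if i != target]
    A = np.eye(n - 1) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.full(n - 1, tau_lag))
    return m

pi = stationary_distribution(T)
timescales = implied_timescales(T, tau_lag)
mfpt = mfpt_to(T, target=2, tau_lag=tau_lag)
print(pi, timescales, mfpt)
```

Free energy differences between states follow from the equilibrium distribution as −ln(πi/πj) in units of kBT.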
Here, we analyze protein folding MSMs using the s-ensemble, a method that has recently been applied to glassy systems (17–20). This approach focuses on dynamical trajectories within an MSM, that is, sequences of states that a protein visits over a given observation time, tobs. In this case, we take tobs to be an integer multiple of the lag time τlag, and we concentrate on trajectories with tobs ≫ τlag. We characterize a trajectory x(t) by its dynamical activity, K[x(t)], which measures the number of interstate transitions that occur in the trajectory’s dynamics. Clearly, the property K ≤ tobs/τlag holds, as our models allow for at most one transition per lag time. If the system remains in a single state throughout an entire trajectory, then the activity corresponding to that trajectory is zero.
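The activity of a discrete trajectory can be computed by counting the steps at which the state changes. The short sketch below uses a hypothetical function name and made-up state sequences.

```python
def activity(traj):
    """Number of interstate transitions K in a discrete-state trajectory."""
    return sum(1 for a, b in zip(traj, traj[1:]) if a != b)

active = [0, 0, 1, 1, 2, 0]   # three changes of state
stuck = [3, 3, 3, 3]          # never leaves state 3

k_active = activity(active)
k_stuck = activity(stuck)
print(k_active, k_stuck)
```

A trajectory of M + 1 frames contains M steps, so K can never exceed M = tobs/τlag.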
In the s-ensemble approach, a nonequilibrium control parameter s is used to bias these activities of dynamical trajectories (17–20). Denoting the equilibrium probability of a trajectory x(t) as P0[x(t)], we assign it a new probability within the s-ensemble,
$$P_s[x(t)] = \frac{P_0[x(t)]\, e^{-s K[x(t)]}}{Z(t_{\mathrm{obs}}, s)} \tag{1}$$
where Z(tobs,s) is a normalization constant called the “dynamical partition function” (19,20).
Equation 1 resembles a probability distribution over configurations within the canonical ensemble, with K playing the part of an energy, and s the inverse temperature. For s > 0, we see that trajectories with higher activity K are suppressed; for s < 0, activities are enhanced. We expect the extensive activity K to be proportional to the observation time, tobs. If tobs is large, the exponential dependence of the trajectory probability on K implies that the distribution in Eq. 1 will be dominated by trajectories that are very different from typical equilibrium trajectories. In general, probability distributions of this form are the subject of large deviation theory (21). Based on the analogy with the canonical ensemble, it can be shown that for long trajectories, the bias s plays the same role as a constraint on the activity K. That is, one may think of the s-field as a way of selecting trajectories with below-average (or above-average) activity, but without any other bias or constraint on the particle dynamics. We refer to the probability distribution (1) as the s-ensemble.
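For a very small model, the reweighting of trajectories by exp(−sK) can be checked by brute force. The sketch below assumes a hypothetical two-state MSM, enumerates every trajectory of a fixed number of steps, and computes the dynamical partition function and the biased mean activity directly from the trajectory weights.

```python
import itertools
import math

# Hypothetical two-state MSM and its stationary distribution.
T = [[0.9, 0.1],
     [0.2, 0.8]]
p0 = [2.0 / 3.0, 1.0 / 3.0]

def s_ensemble_mean_activity(T, p0, n_steps, s):
    """Return (<K>_s, Z) by summing over all trajectories of n_steps steps."""
    n = len(T)
    Z = 0.0
    K_weighted = 0.0
    for path in itertools.product(range(n), repeat=n_steps + 1):
        prob = p0[path[0]]            # equilibrium weight of the first state
        K = 0
        for a, b in zip(path, path[1:]):
            prob *= T[a][b]           # unbiased trajectory probability P0
            K += (a != b)             # count interstate transitions
        weight = prob * math.exp(-s * K)
        Z += weight
        K_weighted += K * weight
    return K_weighted / Z, Z

k_eq, z_eq = s_ensemble_mean_activity(T, p0, n_steps=8, s=0.0)
k_pos, _ = s_ensemble_mean_activity(T, p0, n_steps=8, s=1.0)
k_neg, _ = s_ensemble_mean_activity(T, p0, n_steps=8, s=-1.0)
print(k_neg, k_eq, k_pos)
```

Enumeration scales exponentially with trajectory length, so this is useful only as a consistency check on tiny models.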
In several glassy systems, it has been shown that a positive value of the field s drives the system into very long-lived metastable states (17,19,20). The transition between equilibrium behavior and these metastable states happens at a well-defined value s = s∗ and (in the limit of large tobs) is analogous to a first-order phase transition in the canonical ensemble. We have previously observed (22) such dynamical phase transitions in protein folding MSMs by studying trajectories with tobs ≈ tfold, the protein folding time. Within a set of 16 proteins, we found strong correlations between the microscopic dynamical activity (as illustrated by K at equilibrium) and folding rates. A similar method was also recently applied to a lattice model protein, for large times tobs, where a sharp transition from native behavior to a metastable state was also found (23). In this article, we analyze MSMs to investigate how protein structural ensembles change as the biasing field s is varied. All results are calculated in the long-time limit (tobs → ∞), at which relationships between the s-ensemble and metastable states are clearest (19,24) and the tenets of large deviation theory are applicable (21).
Methods for analyzing the s-ensemble
Properties of trajectories within the s-ensemble can be calculated from the transition matrix, T. We define a tilted transition matrix, T(s), within the s-ensemble as
$$\mathbf{T}(s) = e^{-s}\, \mathbf{T}_{\mathrm{off}} + \mathbf{T}_{\mathrm{diag}} \tag{2}$$
where Toff and Tdiag are matrices containing the off-diagonal and diagonal elements of T, respectively (17,21).
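A minimal sketch of the tilted matrix construction: the off-diagonal elements of the transition matrix (each corresponding to one interstate transition) are multiplied by exp(−s), and the diagonal elements are left unchanged. The function name and the toy matrix are assumptions for illustration.

```python
import numpy as np

def tilted_matrix(T, s):
    """Scale off-diagonal elements of T by exp(-s); keep the diagonal as is."""
    T = np.asarray(T, dtype=float)
    T_diag = np.diag(np.diag(T))   # diagonal part (no transition)
    T_off = T - T_diag             # off-diagonal part (one transition)
    return np.exp(-s) * T_off + T_diag

# Hypothetical 3-state transition matrix.
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.02, 0.98]])
Ts = tilted_matrix(T, s=1.0)
print(Ts)
```

For s ≠ 0 the rows of the tilted matrix no longer sum to one, which is why its largest eigenvalue departs from unity.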
In the limit of large tobs, the relevant properties of the s-ensemble can be obtained from the largest eigenvalue of T(s) and its associated eigenvectors (25,26). Measuring tobs in units of τlag, one can write the probability of a trajectory X0,X1,X2,…,Xtobs within the s-ensemble as
$$P_s[X_0, X_1, \ldots, X_{t_{\mathrm{obs}}}] = \frac{1}{Z(t_{\mathrm{obs}}, s)}\, p_0(X_0) \prod_{k=1}^{t_{\mathrm{obs}}} [\mathbf{T}(s)]_{X_{k-1}, X_k} \tag{3}$$
where [T(s)]x,x′ is a matrix element of T(s), and p0(X) is the equilibrium probability that the system occupies an MSM state X. The partition function is obtained by summing such (unnormalized) weights over all states Xk, which leads to a matrix product structure analogous to a transfer matrix (see the Supporting Material for full details). Letting the largest eigenvalue of T(s) be designated as λl,s, one can show that, for large tobs, the dynamical partition function scales as (17)
$$Z(t_{\mathrm{obs}}, s) \sim (\lambda_{l,s})^{t_{\mathrm{obs}}} \tag{4}$$
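The transfer-matrix step can be sketched as follows. This is an outline under stated assumptions, not a reproduction of the Supporting Material: M denotes tobs/τlag, T(s) is the tilted matrix, p0 is the equilibrium distribution, 1 is a vector of ones, and the eigenvalues λa of T(s) are assumed to have left and right eigenvectors φa and ψa normalized so that φa·ψa = 1, with a nondegenerate largest eigenvalue.

```latex
\begin{align}
Z(t_{\mathrm{obs}}, s)
  &= \sum_{X_0, \ldots, X_M} p_0(X_0) \prod_{k=1}^{M} [\mathbf{T}(s)]_{X_{k-1}, X_k}
   = \mathbf{p}_0^{\mathsf{T}}\, \mathbf{T}(s)^{M}\, \mathbf{1} \\
  &= \sum_{a} \lambda_a^{M}\, (\mathbf{p}_0 \cdot \boldsymbol{\psi}_a)(\boldsymbol{\phi}_a \cdot \mathbf{1})
   \;\approx\; (\mathbf{p}_0 \cdot \boldsymbol{\psi}_{l,s})(\boldsymbol{\phi}_{l,s} \cdot \mathbf{1})\, (\lambda_{l,s})^{M}
   \quad (M \to \infty).
\end{align}
```

The prefactor is independent of M, so ln Z grows linearly in M with slope ln λl,s, which is the scaling stated in Eq. 4.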
It follows that the mean activity per unit time (here, per time step of length τlag), K(s) = 〈K〉/tobs, is given by
$$K(s) = -\frac{d}{ds} \ln \lambda_{l,s} \tag{5}$$
where the average 〈K〉 is taken with respect to the s-ensemble distribution from Eq. 1 (17).
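The activity curve can be evaluated numerically from the largest eigenvalue of the tilted matrix. The sketch below assumes a hypothetical three-state MSM with one long-lived state and uses a central finite difference for the derivative of the log eigenvalue; the function names and the step size are illustrative choices.

```python
import numpy as np

# Hypothetical 3-state transition matrix; state 2 is a long-lived trap.
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.02, 0.98]])

def tilted_matrix(T, s):
    """Off-diagonal elements scaled by exp(-s); diagonal unchanged."""
    T_diag = np.diag(np.diag(T))
    return np.exp(-s) * (T - T_diag) + T_diag

def log_largest_eigenvalue(T, s):
    return np.log(np.max(np.linalg.eigvals(tilted_matrix(T, s)).real))

def mean_activity(T, s, ds=1e-5):
    """K(s) = -d ln(lambda)/ds, per time step, by central finite difference."""
    upper = log_largest_eigenvalue(T, s + ds)
    lower = log_largest_eigenvalue(T, s - ds)
    return -(upper - lower) / (2.0 * ds)

k_equilibrium = mean_activity(T, 0.0)
k_inactive = mean_activity(T, 5.0)
k_active = mean_activity(T, -5.0)
print(k_active, k_equilibrium, k_inactive)
```

At s = 0 the result reduces to the equilibrium transition frequency, the sum over states of πi(1 − pii); at large positive s the activity of this toy model falls toward zero as the trap state takes over.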
The function K(s) describes the bias s required to change the mean activity from its equilibrium value to some other regime of interest. As such, it provides a type of fingerprint for a protein that describes the nature of activity fluctuations within the system. These fingerprints have been used to identify different classes of behavior in protein MSMs (22) and to identify dynamical transitions between native and misfolded states in lattice models (23). Here, we concentrate on how protein structural ensembles change as the bias s is introduced to the MSM. To this end, let ϕl,s and ψl,s be the left and right eigenvectors of T(s) that correspond to the eigenvalue λl,s. Then, the probability of observing Markov state X within the s-ensemble is
$$\pi_s(X) = \phi_{l,s}(X)\, \psi_{l,s}(X) \tag{6}$$
The eigenvectors are normalized as
$$\sum_X \phi_{l,s}(X)\, \psi_{l,s}(X) = 1$$
which ensures that this distribution is also normalized.
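A minimal sketch of the state probabilities within the s-ensemble, computed as the elementwise product of the leading left and right eigenvectors of the tilted matrix. The three-state matrix and the function names are hypothetical; taking absolute values relies on the leading eigenvectors of an irreducible nonnegative matrix having components of a single sign.

```python
import numpy as np

# Hypothetical 3-state transition matrix; state 2 is a long-lived trap.
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.02, 0.98]])

def tilted_matrix(T, s):
    """Off-diagonal elements scaled by exp(-s); diagonal unchanged."""
    T_diag = np.diag(np.diag(T))
    return np.exp(-s) * (T - T_diag) + T_diag

def s_ensemble_state_probabilities(T, s):
    """pi_s(X) = phi(X) * psi(X), normalized so the products sum to 1."""
    Ts = tilted_matrix(T, s)
    evals_r, right = np.linalg.eig(Ts)      # right eigenvectors (columns)
    evals_l, left = np.linalg.eig(Ts.T)     # left eigenvectors of Ts
    psi = np.abs(right[:, np.argmax(evals_r.real)].real)
    phi = np.abs(left[:, np.argmax(evals_l.real)].real)
    pi_s = phi * psi
    return pi_s / pi_s.sum()

pi_equilibrium = s_ensemble_state_probabilities(T, 0.0)
pi_biased = s_ensemble_state_probabilities(T, 2.0)
print(pi_equilibrium, pi_biased)
```

At s = 0 the left eigenvector is the equilibrium distribution and the right eigenvector is constant, so the equilibrium probabilities are recovered; at positive s the weight of this toy model concentrates on the trap state.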
Results and Discussion
Ribosome protein domain NTL9(1–39)
To illustrate the behavior of protein folding MSMs in the s-ensemble, we consider a model for the 39-residue protein NTL9(1–39), which forms part of the ribosomal protein L9 (11). The activity curve K(s) for this MSM, shown in Fig. 1, demonstrates how the s-field controls active and glasslike conformational dynamics. We identify four distinct regimes in the system’s response to s, and representative configurations are shown for each regime. The configurations shown are the centroids of the Markov states for which πs(A) is maximal.
Figure 1.

Structural features of NTL9 within the s-ensemble, as a function of the activity-restricting s-field. Mean activities are presented in terms of the average number of conformational changes per time step. (Colored, shaded bars over the activity curve) Values of s over which the corresponding conformations dominate the biased dynamics. The structures shown represent the centroids of the most probable states in each dynamical phase. Dynamics in the three more active phases explore a diverse set of states, but trajectories become stuck in the state shown for the inactive phase. (Inset) Near-equilibrium activities under the s-ensemble. To see this figure in color, go online.
At large negative values of s, trajectories of the system exhibit high activities, and the protein occupies unfolded, extended states. As s becomes smaller in magnitude, we observe a first sharp jump in the activity that relates to a shift from extended conformations to compact, largely unstructured states. Then, in the immediate vicinity of s = 0 (where unbiased equilibrium dynamics are recovered), a second jump in activity indicates where NTL9’s native state comes to dominate the structural ensemble.
At very small positive values of s, another transition occurs: the thermodynamic native state loses its stability, and a nonnative, β-sheet-rich state starts to dominate the s-ensemble. In this regime, where the activity K(s) ≈ 0, the system is effectively stuck in the nonnative state. This crossover resembles the dynamical phase transitions that have been found (17,19,22,23) in both glass-forming systems and proteins. We emphasize that the function K(s) is everywhere continuous in these protein systems—discontinuities associated with dynamical phase transitions occur only in systems where the number of states N is taken to infinity, as part of a thermodynamic limit. Nevertheless, a strong similarity exists between the transition observed at small positive s in Fig. 1 and the glass transitions observed in other studies (17,20): the equilibrium system suddenly collapses into an inactive glass-type state.
How can these properties of s-ensemble dynamics be connected to those of the physical system at equilibrium? Within the s-ensemble, positive values of s generically favor glassy metastable states that have low activity and long lifetimes (17,19). Roughly speaking, one can relate the transition point, s∗, to the equilibration timescale for the inactive state, τI (which often corresponds to the slowest relaxation timescale in the equilibrium system, τI ≈ −τlag/logλ2), with λ2 the largest non-unit eigenvalue of T(0). After computing the activity difference between the inactive (s > 0) regime and the equilibrium state as ΔK ≈ K(s = 0) − K(s > s∗), a standard perturbative argument yields the result s∗ ≈ τlag /(ΔKτI) (19).
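The estimate of the transition point can be worked through numerically. The sketch below combines τI ≈ −τlag/log λ2 with s∗ ≈ τlag/(ΔKτI), so that τlag cancels and s∗ ≈ −log(λ2)/ΔK with ΔK measured per time step. The input values are hypothetical and are not taken from any of the protein MSMs.

```python
import math

def estimate_s_star(lambda_2, delta_K):
    """Rough transition point s* = tau_lag / (delta_K * tau_I).

    lambda_2: largest non-unit eigenvalue of the unbiased transition matrix.
    delta_K:  activity difference per time step between equilibrium and the
              inactive regime.
    """
    tau_I_in_lag_times = -1.0 / math.log(lambda_2)   # tau_I / tau_lag
    return 1.0 / (delta_K * tau_I_in_lag_times)

# Hypothetical inputs for illustration only.
lambda_2 = 0.954
delta_K = 0.05
s_star = estimate_s_star(lambda_2, delta_K)
print(round(s_star, 3))
```

Slower relaxation (λ2 closer to 1) or a larger activity gap both push the estimated transition point toward s = 0.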
For proteins, inactive states are those for which conformational fluctuations are rare and restricted. States with long lifetimes, by contrast, are those from which folding is anomalously slow. The β-sheet-rich state shown for NTL9 in Fig. 1 exhibits both of these properties: we find that the s-ensemble in the glassy regime is dominated by a single Markov state with a self-transition probability pii that is close to 1, indicating that conformational fluctuations around this state are small. This glasslike state is also quite inaccessible from the rest of the folding landscape: within the MSM, transit to this conformation from near-native states takes ∼20 ms. The rate of refolding from the glass state to the native state is also low (10–20 per ms). This combination of kinetic inaccessibility and metastability reflects the properties of the inactive states found in glassy systems (17,19). Further information about MFPTs into the glass state is given in the Supporting Material. To illustrate the extent to which this single state dominates the glassy (s > 0) landscape, we can calculate the Shannon entropy, Σ, over states i in the s-ensemble probability distribution:
$$\Sigma = -\sum_i \pi_s(i) \ln \pi_s(i)$$
The entropy of the equilibrium distribution in NTL9 is ∼4.6, meaning that on the order of 100 states (exp(Σ) ≈ 100) contribute significantly to the probability density. By contrast, the entropy at s = 0.1 is ∼1 × 10⁻⁵, implying that effectively only one state is relevant in this regime.
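The Shannon entropy of a state distribution, and the effective number of states exp(Σ) it implies, can be computed as in the sketch below. The two example distributions are made up: one is uniform over 100 states, and the other puts nearly all weight on a single state.

```python
import math

def shannon_entropy(p):
    """Sigma = -sum_i p_i ln p_i, skipping zero-probability states."""
    return -sum(x * math.log(x) for x in p if x > 0)

uniform = [1.0 / 100] * 100                    # 100 equally likely states
trapped = [1.0 - 1e-6] + [1e-6 / 99] * 99      # one dominant state

sigma_uniform = shannon_entropy(uniform)
sigma_trapped = shannon_entropy(trapped)
print(sigma_uniform, math.exp(sigma_uniform))
print(sigma_trapped, math.exp(sigma_trapped))
```

A uniform distribution over n states gives Σ = ln n, so an entropy near 4.6 corresponds to an effective count close to 100, whereas an entropy near zero corresponds to an effective count close to 1.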
One should note that the inactivity of these glassy states does not speak directly to their thermodynamic stability (i.e., equilibrium probability), but rather demonstrates the slow, frustrated kinetics associated with them. It is for this very reason that we designate this inactive state, identified via the s-ensemble, as glassy. As we now discuss, this behavior is mirrored in other proteins as well; the structures of β-rich glassy states seen in these proteins have potential implications for protein misfolding and the formation of amyloid-like states (12,27).
Beta-rich inactive states in six other proteins
Armed with these insights from NTL9, we now extend our search for glassy states to other protein folding MSMs. We have analyzed the 16 model proteins that were considered in a previous study (22), which span a large range of folding timescales and consist of 20–80 residues in length. The states that dominate at large positive s are always quite distinct from the native state in character. To quantify this point, consider the probability pii that a system remains within a single state of the MSM throughout a lag time (of ∼1–10 ns). For native states, pii is typically ∼0.1, meaning that native fluctuations are frequent and the native conformation is relatively fluid. This observation reinforces the contention that the native state exploits its high connectivity to other states to maintain its stability (10,22). At positive s, on the other hand, one often finds extremely inactive states with pii > 0.99—these states are poorly connected on the dynamical landscape and derive their stability from a more glasslike rigidity.
In lieu of applying the formal s-ensemble method, some aspects of this glasslike behavior could be deduced by alternative means. One might start with a simple inspection of the Markov states with the highest pii; one could further search for dominant features in the slow eigenvectors of the transition matrix. These methods can yield similar results to those shown in this work. However, the s-ensemble method has already proven effective in a range of glassy systems, even when other methods fail or become intractable. For example, in systems of interacting particles or spins, the number of states N scales exponentially with the system size, meaning single states or eigenvectors of T are impossible to consider (except in extremely small systems). Nevertheless, approaches based on s-ensembles are still possible, using either analytic or numerical methods (17,19,20,23). As a trajectory-based method, the s-ensemble also provides a clear avenue for identifying abstract metastable regions in stochastic models (19,24). It may be that the most important metastable states in a protein model do not correspond with single Markov states in the MSM. In such cases, the Markov states with largest pii need not be the most relevant metastable states in the model, but one would expect the s-ensemble still to capture the dominant metastable regions on the landscape.
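The simple inspection of self-transition probabilities can be sketched as follows, assuming a hypothetical three-state transition matrix. States are ranked by pii, and the mean residence time of each state is estimated as τlag/(1 − pii), which holds for a discrete-time Markov chain.

```python
import numpy as np

# Hypothetical 3-state transition matrix; state 2 is a long-lived trap.
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.02, 0.98]])
tau_lag = 1.0  # time unit: one lag time

self_prob = np.diag(T)                        # p_ii for each state
ranking = np.argsort(self_prob)[::-1]         # most inactive state first
residence = tau_lag / (1.0 - self_prob)       # mean dwell time per visit

for i in ranking:
    print(f"state {i}: p_ii = {self_prob[i]:.2f}, dwell = {residence[i]:.0f} lag times")
```

This ranking looks only at single states, so it can miss metastable regions that span several Markov states.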
Strikingly, for seven of the protein folding systems that we have considered, the glassy (positive-s) states contained either all β-sheet structure or some degree of nonnative β-sheet content. The three slowest-folding proteins studied exhibited this behavior. Conformations representing the resulting glasslike states are illustrated in Fig. 2, along with their corresponding native state structures. The β-content of native and glassy states, as assigned by the DSSP algorithm, is shown in the central bar plots (28). The seven systems comprise four native helix-bundles (α3D, λ-repressor, BBL, and Protein B), two proteins (NTL9 and Protein G) with mixed helix-sheet structure in their native states, and one protein with β-rich native structure (WW domain) (11,13,15). In the cases of NTL9 and Protein G, the β-structure shown in Fig. 2 is largely nonnative: the termini in NTL9’s misfolded state are on opposite sides of the protein, and a parallel β-strand structure appears in Protein G’s unfolded helix region. The nonnative β-content in α3D, BBL, and Protein B is less extensive than in the four other systems, consistent with its more rapid formation.
Figure 2.

Illustration of protein folding systems’ nonnative, amyloid-like glassy states (at left for each system), juxtaposed with their respective native state structures (at right). Timescales related to the formation of glassy β-structure, shown on a log scale (at left), are determined from MFPTs within the respective protein folding MSMs. To suggest the nature of structural fluctuations within the glassy and native states, the colored bars between the structures (on a scale from 0 to 100%) illustrate the mean percentage of β-content in each Markov state, as assigned by the algorithm DSSP, framed by lines that represent ±1 SD in percentage. Centroids of these states are presented pictorially. To see this figure in color, go online.
Given its β-rich native structure, the glasslike behavior in the Fip35 WW domain is particularly interesting. Fig. 3 compares the native state of WW with its inactive states, as identified by probabilities in the s > 0 regime. Although the differences appear subtle at first, one sees that the β-structure in the inactive states is substantially rearranged from its native configuration. In the most prominent inactive state, the first β-strand is tucked between the second and third strands, and strand-pair interactions are either register-shifted or entirely new. Although the native strand configuration is preserved in the second inactive structure, both strand-pair interactions are register-shifted. The two inactive states are each separated from the native state by ∼300 μs in MFPT. Notably, the glassy rearrangements observed would likely restrict the WW domain from reaching its functionally analogous holo state on short timescales (29).
Figure 3.

Comparison of native and inactive states in the Fip35 WW domain. Strands are labeled such that the native β-strand configuration is written 3←2←1; the color scale helps depict differences in register among states. (a) The most probable state at s > 0 features a reversed strand configuration 3←1←2, as well as a two-residue shift in register between strands 1 and 2 and a new interaction between strands 1 and 3. This reconfiguration renders the functionally analogous holo state inaccessible at short timescales. (b) The second most probable state at s > 0 retains the native strand configuration, but the registers between strands 1 and 2 and between strands 2 and 3 are each shifted by one residue. The MFPTs from the native state to the two inactive states are similar (≈300 μs).
The glassy structures in all seven systems are accessed over a range of timescales. Fig. 2 indicates that the typical times needed to reach the metastable structures from the native states range from 10 to 100 μs for the smaller helix bundles to many milliseconds for the larger proteins (further information about these timescales is given in the Supporting Material). These rare transitions, and the corresponding refolding transitions to the native state, are among the slowest processes observed in these models. Although some degree of misfolded β-structure has been noted for λ-repressor in the past, such states and their glasslike characteristics have not previously been observed over this broad range of systems (12). One might be concerned that these glassy states are unphysical artifacts of force-field inaccuracies; however, we emphasize that the data were collected in a variety of different force fields, some of which have been shown to anomalously favor α-helical structures (11–13).
The nine models that did not exhibit β-rich inactive states serve as somewhat of a control for this work: the inactive states in all such models were collapsed and contained sparse (if any) secondary structure. We have thus seen β-rich inactive states in nearly 50% of globular proteins we have analyzed. The actual β-rich percentage could be somewhat higher, however, owing to possible sampling deficiencies (discussed below). Shannon entropies at s > 0 were <0.1 in all 16 models, and all inactive states identified were relatively far from their native states in terms of MFPTs. These observations suggest that, regardless of structure, dominant, slowly forming inactive states are present in all systems considered.
Nonnative β-sheet content and amyloid-like state formation
The nonnative, β-rich conformations shown in Fig. 2 are interesting for a number of reasons. To begin with, we can now offer several additional concrete examples of intramolecular amyloid structures—conformations in single molecules that resemble intermolecular amyloid aggregates—observed through MD studies (12,27). Formation of such amyloid-like misfolded structures might be more general than previously thought, particularly in larger proteins with less-active folding kinetics (22). The fact that these states are predicted to be highly metastable offers greater hope for their experimental detection, provided that rapid aggregation could be avoided.
Based on the amyloid-like states shown in Fig. 2, one might indeed speculate that these states are amyloidogenic or aggregation-prone: such β-rich conformations could very well serve as nuclei for intermolecular amyloid structure formation. Existing theories differ on how aggregates and amyloids are nucleated, and supporting experimental data for any specific theory are sparse. Some theories suggest that aggregation-prone states form along protein-folding pathways, throughout which hydrophobic patches can become exposed to solvent and interprotein interactions (2). The extent to which the nucleation of particular structures (like amyloids) occurs within protein oligomers remains a subject of doubt, especially when intrinsically disordered proteins come into play (30,31). Theory related to the intrinsically disordered protein polyglutamine suggests that its monomer dynamics are slow and glassy, and some experimental studies indicate that β-rich hairpin conformations in such monomers contribute to amyloid nucleation (32,33). Studies of aggregation-prone states in the globular β2-microglobulin suggest that amyloid nuclei can form on the single molecule level via substantial rearrangement to the protein’s hydrophobic core (34).
The structures presented here are distant from their native states in both structure and passage timescale; in fact, all seven states in Fig. 2 exhibit the absolute slowest interconversion rates with the native states in their respective models. The idea that such misfolded states can still form on biologically relevant timescales (microseconds to milliseconds) is intriguing; their presence provides evidence that aggregate/amyloid nuclei could be composed of far-off-pathway, β-rich conformations of protein monomers. The importance of such structures, of course, would depend on the balance between their formation rates and cytosolic protein concentrations. If protein-protein encounters occur at a much higher rate than β-rich states are visited, more generic aggregation mechanisms are likely to prevail.
From a sampling point of view, it is interesting to note that β-sheet-rich glassy conformations were not detected in the data for NTL9 that was collected on ANTON by Shaw Research (13). A similar condition holds for the λ-repressor protein data. Data collected on the ANTON supercomputer (corresponding to slightly different mutant proteins) comes in the form of several ultralong trajectories started from unfolded states. By contrast, trajectories collected on Folding@home were much shorter, individually; however, these sets of trajectories were also actively driven toward underexplored regions of phase space. Although ultralong trajectories have proven useful for studying protein folding, the adaptive sampling methods employed on Folding@home may be well suited for characterizing full protein dynamical landscapes that describe folding, misfolding, and aggregation-related processes (35,36). One wonders if additional data on these systems would resolve β-rich states and associated long relaxation timescales similar to those seen in Folding@home data.
Stability of glassy states in NTL9
Just how stable are the amyloid-like glassy states shown in Fig. 2? To provide an estimate, we have carried out simulations on two metastable, β-rich conformations of NTL9 using the Folding@home distributed computing project. These states are illustrated in Fig. 4. One conformation is taken from the glasslike state shown in Fig. 1, which is found at positive s; the other was identified by analyzing the low-lying eigenvectors of T, which also give insight into metastable states. These two NTL9 conformations were solvated in cubic boxes containing ∼6000 water molecules each, leading to system sizes comprising just under 25,000 particles. Simulations were carried out in the NPT ensemble using the AMBER99SB-ILDN force field (GROMACS, http://www.gromacs.org/Documentation/Terminology/Force_Fields/AMBER) along with the TIP4P-EW water model. Temperature was controlled by a Langevin thermostat, set to 298 K under a collisional frequency of 1 ps⁻¹; pressure was mediated with a Monte Carlo barostat set to 1 bar.
Figure 4.

Structures taken from two metastable, β-rich states of NTL9, as identified from the MSM, for use in seeding Folding@home simulations. (Left) Glassy state from Fig. 1; (right) a second β-rich structure associated with a slow eigenvector of T. To see this figure in color, go online.
Approximately 1000 trajectories were initiated from each starting structure, and calculations were allowed to run without further intervention. At the time of analysis, individual trajectory lengths ranged from 200 ns to 1 μs, and ≈300 μs of aggregate data were available corresponding to each seed conformation. It should be noted that each of these states was directly connected to the native state in the original MSM, meaning the minimum path time to folding should be on the order of the lag time (12 ns). A large ensemble of trajectories 20–100 times this length should thus be capable of characterizing rare folding events.
However, few conformations taken from these ≈600 μs of data deviated by more than 5 Å from their starting configurations. Histograms illustrating the range of states sampled are shown in Fig. 5. These data suggest that ensembles of amyloid-like structures exist on NTL9’s dynamical landscape; such structures are reasonably stable, rich in β-structure, and poorly connected to the native and near-native states. NTL9’s MSM contained only limited data for these β-rich states, but the new results presented here allow for their stability to be analyzed more fully. In particular, it is clear that these ensembles are stable over the course of 600 μs. This value should be contrasted with the prediction of the original MSM: the MSM MFPT from glassy to native states is estimated to be ∼40 μs. This process is associated with an eigenvector of the transition matrix whose eigenvalue is consistent with this estimated timescale. In any case, it seems the MSM underestimates the stability of the glassy state. Presumably, residence times within β-rich ensembles are not fully captured in the original data set, due to limited sampling. The Folding@home simulations described here extend those used to construct the MSM, yielding a direct (and therefore more accurate) estimate of the lifetimes of these states.
Figure 5.

Data indicating the degree of fluctuations within the amyloid-like ensembles investigated through Folding@home simulations. The histograms show the deviations in structure from the starting configurations illustrated in Fig. 4. Such fluctuations are typically <5 Å in root-mean-square deviation (RMSD), a radius that is comparable in size to that of a typical MSM state. Each histogram includes data from ∼300 μs of aggregate simulation time, indicating that the amyloid-like states are both stable and inactive. The snapshots show representative configurations for various values of the RMSD; nonnative β-structure was largely conserved throughout the course of these simulations. To see this figure in color, go online.
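Histograms of this kind can be built by superposing each sampled conformation on its seed structure and binning the resulting RMSD. The sketch below is an illustration only: it uses the standard Kabsch superposition on synthetic coordinates, and the function names, bin width, and noise level are our assumptions rather than the analysis settings used for Fig. 5.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Minimum RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                     # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # rotation mapping P onto Q
    P_rot = P @ R.T
    return np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1)))

def rmsd_histogram(frames, reference, bin_width=0.5, max_rmsd=10.0):
    """RMSD of each frame to `reference`, plus histogram counts and bin edges."""
    values = np.array([kabsch_rmsd(f, reference) for f in frames])
    edges = np.arange(0.0, max_rmsd + bin_width, bin_width)
    counts, edges = np.histogram(values, bins=edges)
    return values, counts, edges

# Synthetic stand-in data: a 39-site reference and noisy copies of it (units: Å).
rng = np.random.default_rng(0)
reference = rng.normal(scale=5.0, size=(39, 3))
frames = [reference + rng.normal(scale=1.0, size=reference.shape) for _ in range(200)]

values, counts, edges = rmsd_histogram(frames, reference)
print("Fraction of frames within 5 Å:", np.mean(values < 5.0))
```

With real trajectories, `frames` would hold the selected atom coordinates of each snapshot and `reference` the seed conformation from Fig. 4.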
We infer that the amyloid-like ensembles being sampled in these simulations are stable on timescales reaching at least one-half of a millisecond. The free energy difference between amyloid and native-state ensembles may be estimated very roughly from
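$$\Delta F = -k_{\mathrm{B}}T \ln\frac{p_{\mathrm{amyloid}}}{p_{\mathrm{native}}} = -k_{\mathrm{B}}T \ln\frac{w_{\mathrm{native}\to\mathrm{amyloid}}}{w_{\mathrm{amyloid}\to\mathrm{native}}} \tag{7}$$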
where the second equality follows from detailed balance, in which $w_{\mathrm{amyloid}\to\mathrm{native}}$ is the transition rate from the amyloid to the native state and $w_{\mathrm{native}\to\mathrm{amyloid}}$ is the transition rate from native to amyloid. Estimating timescales as $(w_{\mathrm{native}\to\mathrm{amyloid}})^{-1} \approx 24$ ms, from the MFPT analysis of the MSM, and $(w_{\mathrm{amyloid}\to\mathrm{native}})^{-1} \approx 500$ μs, from the Folding@home simulations, we arrive at $\Delta F \approx 4k_{\mathrm{B}}T$. Due to the large numerical uncertainties associated with these very long timescales, this value should be regarded only as an order-of-magnitude estimate.
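The arithmetic behind this estimate can be checked directly. The sketch below assumes the sign convention ΔF = F(amyloid) − F(native), so that a positive value means the amyloid-like ensemble is less stable than the native one, and it assumes the detailed-balance relation that the population ratio equals the ratio of the two transition rates; the function name is ours.

```python
import math

def free_energy_difference_kT(tau_native_to_amyloid, tau_amyloid_to_native):
    """Free energy of amyloid relative to native, in units of k_B*T.

    Assumes detailed balance: p_amyloid / p_native = w_(native->amyloid) / w_(amyloid->native),
    with each rate taken as the inverse of the corresponding timescale.
    """
    w_native_to_amyloid = 1.0 / tau_native_to_amyloid
    w_amyloid_to_native = 1.0 / tau_amyloid_to_native
    return -math.log(w_native_to_amyloid / w_amyloid_to_native)

# Timescales quoted in the text, both in seconds.
TAU_NATIVE_TO_AMYLOID = 24e-3    # ~24 ms, from the MSM MFPT analysis
TAU_AMYLOID_TO_NATIVE = 500e-6   # ~500 microseconds, from the Folding@home simulations

delta_f = free_energy_difference_kT(TAU_NATIVE_TO_AMYLOID, TAU_AMYLOID_TO_NATIVE)
print(f"Delta F = {delta_f:.2f} k_B T")  # ln(48), about 3.87
```

Because the result depends only logarithmically on the rate ratio, even a factor-of-two error in either timescale shifts the estimate by less than one unit of k_BT, which is consistent with treating it as an order-of-magnitude figure.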
Nevertheless, this calculation serves to indicate that, to our knowledge, the amyloid-like ensemble has the potential to compete with the native ensemble in terms of stability. More subtle considerations, like accessibility of the amyloid-like ensemble from common starting configurations in vivo and in vitro, could serve to temper this statement. We would certainly argue that all of the amyloid-like states observed have the capacity to be aggregation-prone; their persistence over hundreds of microseconds provides hope that such β-rich ensembles might be detected in temperature-jump or other experiments. If such states truly promote aggregation, these β-rich conformations might be very difficult to observe by standard means. Given the intriguing structures observed for NTL9, however, attempts at computational and experimental investigations of the amyloid-like states shown in Fig. 2 are likely warranted.
Conclusion
Applying the nonequilibrium s-ensemble method to protein folding MSMs, we have identified long-lived metastable states within several systems of interest. Given the presence of nonnative β-sheet structures in many such systems, we propose that striking glasslike conformations could be related to generic amyloid-like misfolded structures on protein folding landscapes. Based on a combination of MD simulations, MSMs, and advanced statistical mechanical methods, we have been able to uncover interesting inactive states and study their properties at an atomistic level of detail. This synergy between different classes of theoretical tools illustrates the power of the MSM methodology, both for analyzing MD data and for guiding MD simulations toward processes and structures of interest.
In this instance, the metastable states we have found exhibit rigid β-rich structures of great interest. One might imagine that corresponding amyloid-like ensembles of states, given enough breadth and stability, could make a system particularly prone to aggregation. Our results provide evidence that the nuclei for many protein aggregates may, in fact, already contain a substantial degree of β-sheet structure in single molecular form.
Acknowledgments
This work was funded in part by the NIH (R01-GM062868) and the NSF (MCB-0954714). J.K.W. was supported by the Fannie and John Hertz Foundation on the endowed Yaser S. Abu-Mostafa Fellowship. R.L.J. was supported by the EPSRC through grant EP/I003797/1.
Supporting Material
References
- 1. Dill K.A., Ozkan S.B., Weikl T.R. The protein folding problem. Annu. Rev. Biophys. 2008;37:289–316. doi: 10.1146/annurev.biophys.37.092707.153558.
- 2. Dobson C.M. Principles of protein folding, misfolding and aggregation. Semin. Cell Dev. Biol. 2004;15:3–16. doi: 10.1016/j.semcdb.2003.12.008.
- 3. Braselmann E., Chaney J.L., Clark P.L. Folding the proteome. Trends Biochem. Sci. 2013;38:337–344. doi: 10.1016/j.tibs.2013.05.001.
- 4. Chiti F., Dobson C.M. Amyloid formation by globular proteins under native conditions. Nat. Chem. Biol. 2009;5:15–22. doi: 10.1038/nchembio.131.
- 5. Bryngelson J.D., Wolynes P.G. Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. USA. 1987;84:7524–7528. doi: 10.1073/pnas.84.21.7524.
- 6. Shakhnovich E.I., Gutin A.M. Formation of unique structure in polypeptide chains. Theoretical investigation with the aid of a replica approach. Biophys. Chem. 1989;34:187–199. doi: 10.1016/0301-4622(89)80058-4.
- 7. Gutin A., Sali A., Shakhnovich E.I. Temperature dependence of the folding rate in a simple protein model: search for a “glass” transition. J. Chem. Phys. 1998;108:6466–6483.
- 8. Heuer A. Exploring the potential energy landscape of glass-forming systems: from inherent structures via metabasins to macroscopic transport. J. Phys. Condens. Matter. 2008;20:373101. doi: 10.1088/0953-8984/20/37/373101.
- 9. Cavagna A. Supercooled liquids for pedestrians. Phys. Rep. 2009;476:51–124.
- 10. Bowman G.R., Pande V.S. Protein folded states are kinetic hubs. Proc. Natl. Acad. Sci. USA. 2010;107:10890–10895. doi: 10.1073/pnas.1003962107.
- 11. Voelz V.A., Bowman G.R., Pande V.S. Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1-39). J. Am. Chem. Soc. 2010;132:1526–1528. doi: 10.1021/ja9090353.
- 12. Bowman G.R., Voelz V.A., Pande V.S. Atomistic folding simulations of the five-helix bundle protein λ(6−85). J. Am. Chem. Soc. 2011;133:664–667. doi: 10.1021/ja106936n.
- 13. Lindorff-Larsen K., Piana S., Shaw D.E. How fast-folding proteins fold. Science. 2011;334:517–520. doi: 10.1126/science.1208351.
- 14. Pande V.S., Beauchamp K., Bowman G.R. Everything you wanted to know about Markov state models but were afraid to ask. Methods. 2010;52:99–105. doi: 10.1016/j.ymeth.2010.06.002.
- 15. Beauchamp K.A., McGibbon R., Pande V.S. Simple few-state models reveal hidden complexity in protein folding. Proc. Natl. Acad. Sci. USA. 2012;109:17807–17813. doi: 10.1073/pnas.1201810109.
- 16. Piana S., Lindorff-Larsen K., Shaw D.E. Atomistic description of the folding of a dimeric protein. J. Phys. Chem. B. 2013;117:12935–12942. doi: 10.1021/jp4020993.
- 17. Garrahan J.P., Jack R.L., van Wijland F. Dynamical first-order phase transition in kinetically constrained models of glasses. Phys. Rev. Lett. 2007;98:195702. doi: 10.1103/PhysRevLett.98.195702.
- 18. Merolle M., Garrahan J.P., Chandler D. Space-time thermodynamics of the glass transition. Proc. Natl. Acad. Sci. USA. 2005;102:10837–10840. doi: 10.1073/pnas.0504820102.
- 19. Jack R.L., Hedges L.O., Chandler D. Preparation and relaxation of very stable glassy states of a simulated liquid. Phys. Rev. Lett. 2011;107:275702. doi: 10.1103/PhysRevLett.107.275702.
- 20. Hedges L.O., Jack R.L., Chandler D. Dynamic order-disorder in atomistic models of structural glass formers. Science. 2009;323:1309–1313. doi: 10.1126/science.1166665.
- 21. Touchette H. The large deviation approach to statistical mechanics. Phys. Rep. 2009;478:1–69.
- 22. Weber J.K., Jack R.L., Pande V.S. Emergence of glass-like behavior in Markov state models of protein folding dynamics. J. Am. Chem. Soc. 2013;135:5501–5504. doi: 10.1021/ja4002663.
- 23. Mey A.S.J.S., Geissler P.L., Garrahan J.P. Rare-event trajectory ensemble analysis reveals metastable dynamical phases in lattice proteins. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2014;89:032109. doi: 10.1103/PhysRevE.89.032109.
- 24. Jack R.L., Garrahan J.P. Metastable states and space-time phase transitions in a spin-glass model. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2010;81:011111. doi: 10.1103/PhysRevE.81.011111.
- 25. Jack R.L., Sollich P. Large deviations and ensembles of trajectories in stochastic models. Prog. Theor. Phys. Supp. 2010;184(Supplement 1):304–317.
- 26. Lecomte V., Appert-Rolland C., van Wijland F. Thermodynamic formalism for systems with Markov dynamics. J. Stat. Phys. 2007;127:51–106.
- 27. Prigozhin M.B., Gruebele M. The fast and the slow: folding and trapping of λ6–85. J. Am. Chem. Soc. 2011;133:19338–19341. doi: 10.1021/ja209073z.
- 28. Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
- 29. Lane T.J., Bowman G.R., Pande V.S. Markov state model reveals folding and functional dynamics in ultra-long MD trajectories. J. Am. Chem. Soc. 2011;133:18413–18419. doi: 10.1021/ja207470h.
- 30. Teplow D.B. Structural and kinetic features of amyloid β-protein fibrillogenesis. Amyloid. 1998;5:121–142. doi: 10.3109/13506129808995290.
- 31. Lührs T., Ritter C., Riek R. 3D structure of Alzheimer’s amyloid-β(1-42) fibrils. Proc. Natl. Acad. Sci. USA. 2005;102:17342–17347. doi: 10.1073/pnas.0506723102.
- 32. Vitalis A., Wang X., Pappu R.V. Quantitative characterization of intrinsic disorder in polyglutamine: insights from analysis based on polymer theories. Biophys. J. 2007;93:1923–1937. doi: 10.1529/biophysj.107.110080.
- 33. Wetzel R. Physical chemistry of polyglutamine: intriguing tales of a monotonous sequence. J. Mol. Biol. 2012;421:466–490. doi: 10.1016/j.jmb.2012.01.030.
- 34. Eichner T., Kalverda A.P., Radford S.E. Conformational conversion during amyloid formation at atomic resolution. Mol. Cell. 2011;41:161–172. doi: 10.1016/j.molcel.2010.11.028.
- 35. Bowman G.R., Ensign D.L., Pande V.S. Enhanced modeling via network theory: adaptive sampling of Markov state models. J. Chem. Theory Comput. 2010;6:787–794. doi: 10.1021/ct900620b.
- 36. Weber J.K., Pande V.S. Characterization and rapid sampling of protein folding Markov state model topologies. J. Chem. Theory Comput. 2011;7:3405–3411. doi: 10.1021/ct2004484.