Abstract
A molecular understanding of how protein function is related to protein structure requires an ability to understand large conformational changes between multiple states. Unfortunately, these states are often separated by high free energy barriers within a complex energy landscape. This makes it very difficult to reliably connect the states, their energies, and the pathways between them, for example by all-atom molecular dynamics calculations. A major requirement for improved sampling of the intermediate states is an order parameter (a reduced descriptor for the major subset of degrees of freedom) that can be used to aid sampling of the large conformational change. We present a method that combines information from molecular dynamics using non-linear time-series analysis and dimensionality reduction to quantitatively determine an order parameter connecting two large-scale, conformationally distinct protein states. This new method suggests an implementation for molecular dynamics calculations that may be used to enhance sampling of intermediate states.
INTRODUCTION
Proteins are complex dynamical systems with multiple stable states. Sampling protein motions is complicated by the multiple time scale problem,1 with many different wells and barrier heights. Despite considerable effort, there is currently no rapid way to determine the range of stable states from a single structure or from sequence alone. Because biological function is intrinsically linked to large-scale conformational change in proteins, an improved understanding of how conformational change is determined within the complex energy landscape of a protein will provide important insights into both biological and physical questions.
When the change of state is described by an obvious low-dimensional reaction coordinate, specialized sampling methods have been developed that can reliably enhance the collection of intermediate states and the characterization of the relative free energy change.2, 3, 4, 5, 6 Examples in this category include the passage of an ion through a channel, simple alchemical changes, and certain types of conformational change in which the movement is mainly hinge-like or otherwise obvious on inspection.7, 8 For many biological problems, however, the low-dimensional reaction coordinate that optimally predicts functional behavior is not at all obvious from the structure. As the set of solved x-ray structures has continued to grow, alternative structures have been determined for the same protein in an increasing number of cases.9 Ideally, these alternative structures would also suggest the reaction coordinate that connects one conformation to the other. However, despite many outstanding efforts to design sampling methods, a strong limitation remains: an order parameter to enable sampling has been difficult to determine from either a single static x-ray structure or a pair of them.
Some groups have suggested that, in this situation, harmonic analysis based on diagonalization of the second-derivative matrix (the Hessian) would be sufficient to find the most important collective modes.10, 11, 12, 13, 14, 15, 16 The findings from the coarse-grained modeling community, in particular, have suggested that this approach can reveal important details about how protein structure is connected to large conformational change.17 Other groups have cautioned that the harmonic model is inadequate and have instead suggested determining effective collective modes from the covariance matrix of fluctuations (quasi-harmonic analysis, essential dynamics, or principal component analysis).18, 19, 20, 21, 22 These calculations have demonstrated that the low-frequency collective motions inferred from the covariance matrix differ from those of the harmonic analysis. Recently, there has been considerable effort to improve on the linear assumptions of principal component analysis, using tools such as kurtosis and quasi-anharmonic analysis to characterize the deviations from Gaussian behavior found in molecular dynamics (MD) trajectories.23, 24, 25, 26
Many methods have been proposed to directly sample large-scale conformational change.27, 28, 29, 30, 31, 32, 33, 34, 35, 36 The best-known contemporary method is transition path sampling, which uses a small number of conformations along a candidate transition pathway together with a Monte Carlo move set to anneal an optimal prediction of the intermediates in a conformationally changing system.37, 38 Alternative methods have used the root mean square (RMS) difference between states as an order parameter to control the change,39, 40 directly adding a force that biases motion along the RMS gradient. We developed a method called dynamic importance sampling (DIMS)41, 42, 43, 44, 45, 46, 47, 48, 49, 50 that uses concepts from stochastic differential equations51 to create a family of independent transitions that, with sufficient sampling, together define the likelihood of different pathways and the kinetics of the transition. However, like the RMS-based methods, DIMS requires a progress variable, both to drive the computed transitions and to construct the bias and its correction, which together yield unbiased estimates of pathways, kinetics, and states.
In this paper, we describe the use of effective transfer entropy to determine a reduced set of degrees of freedom that can be used to define order parameters for large-scale conformational change. Our approach combines insights from the physics of non-linear time-series analysis, dimensionality reduction, and the chemical physics of protein motions on a complex energy surface, allowing the dynamics of the complex system itself to define a candidate order parameter. This improves on other methods for determining order parameters, in which the candidate order parameter was inferred from empirical analysis of the static structure or simply assumed to correlate with the RMS difference between two states. In the calculations that follow, we mainly use the receiver domain of nitrogen regulatory protein C (NtrC) (Fig. 1); in addition, we have performed steered molecular dynamics and checks of the implementation on the glucose-galactose binding protein (GGBP).
PRINCIPAL COMPONENT ANALYSIS
Principal component analysis (PCA) has been used to analyze protein motions for many years.18, 22, 52, 53, 54, 55, 56, 57 This approach determines a set of effective collective modes that describe the complex motions observed in the dynamics.52 While the initial excitement over the method as a way to sample longer time scales seems to have faded, considerable effort is still directed at using it as a tool for analyzing conformational change. One caveat is the suggestion that PCA may lead to significant systematic error when there are multiple stable states separated by a large barrier.53
To compute the PCA modes, an MD trajectory is used to determine the average fluctuations over the simulation. From the MD trajectory of a protein with N atoms, the covariance matrix σ is built as follows:
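$$\sigma_{ij} = \left\langle \left(x_i - \langle x_i \rangle\right)\left(x_j - \langle x_j \rangle\right) \right\rangle, \qquad i, j = 1, \ldots, 3N, \tag{1}$$
where x_1, …, x_{3N} are the Cartesian coordinates of the N atoms and the brackets (⟨…⟩) denote time averages. The orthonormal basis vectors η_α (principal components, PCs) are determined by the eigenvalue problem σ η_α = λ_α η_α.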
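As an illustration, the PCA step can be sketched in a few lines of NumPy. This is a minimal sketch, not the code used in this work: it assumes the trajectory has already been superposed onto a reference (so that only internal fluctuations remain), is stored as an array of shape (frames, 3N), and uses no mass weighting.

```python
import numpy as np

def pca_modes(traj):
    """Principal components of a superposed trajectory of shape (n_frames, 3N).

    Returns the eigenvalues in descending order and the matching orthonormal
    eigenvectors eta_alpha as the columns of a (3N, 3N) array.
    """
    traj = np.asarray(traj, dtype=float)
    dev = traj - traj.mean(axis=0)          # x_i - <x_i> for every frame
    sigma = dev.T @ dev / traj.shape[0]     # Eq. 1: time-averaged covariance
    evals, evecs = np.linalg.eigh(sigma)    # symmetric solver, ascending order
    order = np.argsort(evals)[::-1]         # largest variance (lowest frequency) first
    return evals[order], evecs[:, order]
```

The first column of the returned eigenvector array then corresponds to the lowest-frequency mode, i.e., the kind of mode drawn as a porcupine plot.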
The lowest-frequency PCA modes (those with the largest eigenvalues) are normally associated with slow, collective motions and have been used to try to predict intermediate states.20 Figure 1 depicts the lowest-frequency mode obtained by applying Eq. 1 and solving the eigenvalue problem for our 600 ns trajectory of NtrC. In this plot, the porcupine spines are located at the Cα atoms, and their magnitude and direction show the type of motion involved in the mode.
To connect the PCA modes with conformational transitions between two structures, we use the involvement coefficient. For a given mode α, the involvement coefficient (IC) is
$$I_\alpha = \frac{\left|\eta_\alpha \cdot \left(\mathbf{r}^{A} - \mathbf{r}^{I}\right)\right|}{\left\|\mathbf{r}^{A} - \mathbf{r}^{I}\right\|}, \tag{2}$$
where r^A and r^I denote the coordinates of the active-state and inactive-state conformations, respectively, and the displacement between them is normalized to unit length. The ICs therefore measure the overlap between a principal component and the direction defined by the displacement vector between the structures. For hinge-bending motions, PCA yields higher ICs than for more complex motions. For instance, in the case of adenylate kinase (AdK), the ICs for the first two modes are 0.49 and 0.63, respectively,58, 59 so most of the transition can be characterized using these two modes alone. In a previous study, we exploited the fact that the structural difference between the apo and holo states of AdK is almost completely captured by linear correlations within our DIMS framework in order to elucidate ensembles of candidate pathways;41 similarly, another study60 obtained intermediate states of the AdK transition by computing the normal modes of an elastic network model during short simulations (≈101 ps).
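The involvement coefficients of Eq. 2 are simple projections once the modes are available. The sketch below is illustrative only; it assumes the end-state coordinates have been superposed on the same reference frame as the trajectory and flattened to length 3N.

```python
import numpy as np

def involvement_coefficients(modes, r_active, r_inactive):
    """I_alpha = |eta_alpha . (r_A - r_I)| / |r_A - r_I| for every column of `modes` (Eq. 2).

    modes: (3N, M) array whose columns are orthonormal PCA eigenvectors.
    r_active, r_inactive: flattened (3N,) coordinates, assumed superposed on the
    same reference frame as the trajectory used to build `modes`.
    """
    d = np.asarray(r_active, dtype=float) - np.asarray(r_inactive, dtype=float)
    d = d / np.linalg.norm(d)               # unit displacement between the end states
    return np.abs(np.asarray(modes, dtype=float).T @ d)
```

Because the modes are orthonormal, the squared ICs over a complete set of modes sum to one. For example, a displacement of (3, 4, 0) projected onto the three Cartesian axes gives ICs of 0.6, 0.8, and 0.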
In the case of NtrC, the ICs are much lower (Fig. 4); consequently, the first PCs computed in either stable state do not point directly towards the other end state, and the transition is not characterized by linear correlations. Moreover, another study obtained higher ICs by using a set of order parameters based on observations of both stable structures.59 These order parameters involve only localized regions of the system and are proposed to act in an ordered sequence of events; that is, no single order parameter can characterize the whole transition between the two states.
One idea behind our search for an order parameter is that a few degrees of freedom dominate part or all of the transition, while the rest of the system follows. Finding an order parameter is therefore equivalent to locating these leading modes. In this paper, we use an information-theoretic approach to identify the leading modes by measuring the transfer entropy between pairs of residues. The most dominant residues are those that transfer the largest amount of entropy to the rest of the system.
INFORMATION FLOW IN PROTEINS
The networks of interactions between atoms and residues define the web of dependencies and patterns of dynamic coupling between domains in a protein, characterized by a directed flow of information spanning multiple spatial and temporal scales. The asymmetry of information transfer was first applied to protein molecular motions in a study of DNA-binding proteins using transfer entropy.61 Let X be the time series of the center of mass of the ith residue and p(X) its probability distribution. The average number of bits needed to optimally encode independent draws from X is then given by the Shannon entropy H_X = −∑_x p(x) log₂ p(x),62, 63 where the sum extends over all the states that X can reach.
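For a discretized time series, the Shannon entropy can be estimated from symbol frequencies. This minimal sketch assumes the center-of-mass series has already been mapped onto a finite set of symbols (discretization is discussed under time series treatment); with base 2 the result is in bits.

```python
import numpy as np

def shannon_entropy(symbols, base=2.0):
    """H_X = -sum_x p(x) log p(x), with p(x) estimated from symbol frequencies.

    `symbols` must already be discrete (e.g., binned centre-of-mass positions);
    base=2 gives the entropy in bits.
    """
    _, counts = np.unique(np.asarray(symbols), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))
```

A series that alternates evenly between two states, such as [0, 1, 0, 1], needs 1 bit per draw; four equally likely states need 2 bits; a constant series needs none.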
Transfer entropies
For a residue j ≠ i with center-of-mass time series Y and probability distribution p(Y), its trajectory can be said to be independent of that of residue i if
$$p(y_{n+1} \mid y_n) = p(y_{n+1} \mid y_n, x_n), \tag{3}$$
where p(y_{n+1}|y_n) is the conditional probability of finding residue j in state y_{n+1} given its past y_n, …, y_1, and p(y_{n+1}|y_n, x_n) is the conditional probability of finding residue j in state y_{n+1} given the past of both i and j. If there is no flow of information from X to Y, Eq. 3 holds. If, on the other hand, information flows from X to Y, the deviation from Eq. 3 can be quantified by the Kullback-Leibler entropy,64 which defines the transfer entropy,65
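$$T_{X \to Y} = \sum p(y_{n+1}, y_n, x_n) \log \frac{p(y_{n+1} \mid y_n, x_n)}{p(y_{n+1} \mid y_n)}. \tag{4}$$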
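A direct plug-in estimate of the transfer entropy replaces each probability with the relative frequency of the corresponding tuple of symbols. The sketch below is a simplified illustration under stated assumptions: the series are already discrete, and the "past" is a single time step (history length 1), whereas the text allows longer histories.

```python
import math
from collections import Counter

def transfer_entropy(x, y, base=2.0):
    """Plug-in estimate of T_{X->Y} (Eq. 4) for discrete series with history length 1.

    Probabilities are replaced by relative frequencies of the observed tuples
    (y_{n+1}, y_n, x_n); longer histories would use tuples of past values instead.
    """
    x, y = list(x), list(y)
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_{n+1}, y_n, x_n)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_n, x_n)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_{n+1}, y_n)
    past_y = Counter(y[:-1])                        # y_n
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_full = c / pairs_yx[(y0, x0)]             # p(y_{n+1} | y_n, x_n)
        p_self = pairs_yy[(y1, y0)] / past_y[y0]    # p(y_{n+1} | y_n)
        te += (c / n) * math.log(p_full / p_self)
    return te / math.log(base)
```

If Y is a one-step-delayed copy of a random binary X, the estimate is close to 1 bit from X to Y and close to 0 in the reverse direction, which illustrates the asymmetry that the method relies on.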
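The transfer entropy from i to j is minimal, and equal to zero, when the two residues are independent, and it is maximal, and equal to the entropy rate of Y,
$$h_Y = -\sum p(y_{n+1}, y_n) \log p(y_{n+1} \mid y_n), \tag{5}$$
when the residues are completely coupled. To minimize artifacts arising from finite, noisy time series, we use the normalized effective transfer entropy given by66, 67
$$T^{E}_{X \to Y} = \frac{1}{h_Y}\left[T_{X \to Y} - \frac{1}{N_{\text{trials}}} \sum_{k=1}^{N_{\text{trials}}} T_{X^{(k)}_{s} \to Y}\right], \tag{6}$$
where the subtracted term is the average transfer entropy to Y from N_trials surrogate (randomly shuffled) samples X_s^{(k)} of X.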
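The normalized effective transfer entropy can be sketched by writing the transfer entropy as a difference of conditional entropies, T_{X→Y} = h_Y − H(y_{n+1} | y_n, x_n), which is algebraically equivalent to Eq. 4. This is a minimal sketch under explicit assumptions: history length 1, surrogates generated by randomly permuting X (which keeps p(X) but destroys its timing relative to Y), and normalization by the entropy rate h_Y. The function names and the default N_trials = 20 are illustrative choices, not values taken from this work.

```python
import math
import random
from collections import Counter

def _H(*cols):
    """Joint Shannon entropy (bits) of aligned discrete columns."""
    counts = Counter(zip(*cols))
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def entropy_rate(y):
    """h_Y = H(y_{n+1} | y_n) (Eq. 5), history length 1."""
    return _H(y[1:], y[:-1]) - _H(y[:-1])

def transfer_entropy(x, y):
    """T_{X->Y} = H(y_{n+1} | y_n) - H(y_{n+1} | y_n, x_n), equivalent to Eq. 4."""
    return entropy_rate(y) - (_H(y[1:], y[:-1], x[:-1]) - _H(y[:-1], x[:-1]))

def effective_te(x, y, n_trials=20, seed=0):
    """Subtract the mean TE of shuffled copies of X, then divide by h_Y."""
    rng = random.Random(seed)
    x, y = list(x), list(y)
    surrogate_mean = 0.0
    for _ in range(n_trials):
        xs = x[:]
        rng.shuffle(xs)                     # keeps p(X), destroys X->Y timing
        surrogate_mean += transfer_entropy(xs, y) / n_trials
    h = entropy_rate(y)
    return (transfer_entropy(x, y) - surrogate_mean) / h if h > 0 else 0.0
```

Subtracting the surrogate average removes most of the positive bias that plug-in estimates show even for independent series, so uncoupled pairs end up near zero rather than at a small spurious value.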
The set Γ of most dominant residues
The net flow of information between two residues with time series X and Y is
$$D_{X \to Y} = T^{E}_{X \to Y} - T^{E}_{Y \to X}. \tag{7}$$
Residues are selected according to the following rules: residue i is selected if D_{X→Y} > 0, residue j is selected if D_{X→Y} < 0, and no residue is selected if D_{X→Y} = 0. The set Γ of most dominant residues is then defined as the set of residues that satisfy these rules and also lie above a fixed cutoff, |D_{X→Y}| ⩾ D_cutoff.
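Given a matrix of pairwise effective transfer entropies, the net flow of Eq. 7 and the selection rules for Γ reduce to a few lines. In this illustrative sketch (not the code used in this work), the hypothetical input ete[i, j] holds the effective transfer entropy from residue i to residue j.

```python
import numpy as np

def dominant_set(ete, d_cutoff):
    """Return the sorted indices of the set Gamma.

    ete: (R, R) array, ete[i, j] = effective transfer entropy from residue i to j.
    d_cutoff: threshold on |D| below which a pair contributes no residue.
    """
    ete = np.asarray(ete, dtype=float)
    D = ete - ete.T                          # Eq. 7: D[i, j] = T^E_{i->j} - T^E_{j->i}
    gamma = set()
    for i in range(D.shape[0]):
        for j in range(i + 1, D.shape[0]):
            if D[i, j] == 0.0 or abs(D[i, j]) < d_cutoff:
                continue                     # no net flow, or below the cutoff
            gamma.add(i if D[i, j] > 0 else j)   # keep the residue that drives the pair
    return sorted(gamma)
```

Raising d_cutoff can only remove residues from Γ, which is the behavior explored when the cutoff is varied.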
EXPERIMENTS WITH GGBP
To verify that our implementation was correct, we analyzed coupled chaotic Ulam maps, Hénon maps, and autoregressive processes. As a more challenging test case, we used the glucose-galactose binding protein (GGBP).68 The two domains of GGBP undergo a 0.5 rad hinge-opening motion from one state to the other. The structure of the unbound, open state of GGBP was solved crystallographically by Borrok et al. (PDB ID: 2FW0) (Ref. 68) at 1.55 Å resolution. For testing, we used both DIMS transitions and simulations in which a constant pulling force was applied along the line defined by residues Phe:142 and Leu:144 (highlighted in green in Figure 2), creating a system with a known directional change. The force was very small, so small that unsteered and steered simulations would look identical on inspection in visual molecular dynamics (VMD).90 Thus, the applied force was meant only to let us simulate a situation with a clear set of degrees of freedom that lead and others that should lag, rather than to drive the simulation artificially far from equilibrium.
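A check of this kind can be sketched with two coupled autoregressive processes, one of the test systems mentioned above, in which X drives Y with a lag of one step and not the reverse. The coefficients (0.5 and 0.8), the series length, the four equal-frequency bins, and the history length of 1 are illustrative assumptions; the estimated transfer entropy should come out clearly larger from X to Y than from Y to X.

```python
import numpy as np

def te_binned(x, y, bins=4):
    """Plug-in T_{X->Y} in bits (history length 1) after equal-frequency binning."""
    def symbolize(s):
        edges = np.quantile(s, np.linspace(0.0, 1.0, bins + 1)[1:-1])
        return np.searchsorted(edges, s)            # integer symbols 0..bins-1

    def H(*cols):                                   # joint entropy of aligned columns
        _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    xs, ys = symbolize(x), symbolize(y)
    y1, y0, x0 = ys[1:], ys[:-1], xs[:-1]
    return H(y1, y0) - H(y0) - H(y1, y0, x0) + H(y0, x0)

rng = np.random.default_rng(1)
n = 20000
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()                     # autonomous driver
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()    # driven by x, lag 1
te_xy, te_yx = te_binned(x, y), te_binned(y, x)
```

The same comparison can be repeated with the coupling coefficient set to zero, in which case both directions should fall to the level of the estimation bias.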
For comparison, we performed PCA on the trajectory generated by the same pulling along residues Phe:142 and Leu:144. Because of the transient nature of the pulling, PCA is unable to detect the pulling direction (Figure 2). We now describe the data treatment and some results from our initial testing of the proposed transfer entropy analysis.
Time series treatment
The time series from MD describing the atomic motions of proteins generally consist of double-precision, real-valued entries. Previous work on time-series analysis has shown that determining the joint probability densities in Eq. 4 from real-valued data is not only computationally expensive but also unnecessary. For example, the amplitudes of collective excitations, which represent correlated global motions in the protein, have been shown to sample multicentered distributions,20 so a coarse discretization that distinguishes between these centers should retain much of the relevant information. Therefore, although single- or double-precision arithmetic is necessary for the stability and accuracy of the simulations themselves, the analysis does not require the same level of precision. Coarse-graining the data in this way greatly simplifies the determination of the probability distributions while reducing noise and increasing computational efficiency. We optimize our implementation by incorporating high-performance computing techniques (massively parallel calculations over thousands of cores) and by applying the dimensionality reduction and data mining techniques briefly described in the following sections. In other applications of transfer entropies,61, 66, 67, 69, 70 the data are discretized mainly by symbolization techniques. In some cases the discretization maps the data to a single-bit time series (spikes), for example where this analysis has been applied to neurophysiological recordings from epilepsy patients.
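One simple symbolization scheme is sketched below: equal-frequency (quantile) binning into a small number of integer symbols stored as 8-bit integers instead of 64-bit floats. This is an assumed scheme for illustration; the text does not specify the binning used in this work.

```python
import numpy as np

def symbolize(series, n_symbols):
    """Map a real-valued series onto integer symbols 0..n_symbols-1 by equal-frequency bins."""
    if not 1 <= n_symbols <= 256:
        raise ValueError("n_symbols must fit in an unsigned 8-bit integer")
    series = np.asarray(series, dtype=float)
    inner = np.quantile(series, np.linspace(0.0, 1.0, n_symbols + 1)[1:-1])
    return np.searchsorted(inner, series).astype(np.uint8)   # 1 byte per frame instead of 8
```

For a multicentered signal the coarse symbols still separate the centers: a series that spends equal time near -3 and +3 with noise of width 0.5 is assigned almost frame-for-frame to the correct center by symbolize(q, 2).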
Piecewise aggregate approximation (PAA)
A time series q = (q_1, …, q_n) of length n can be represented by a second time series q̄ = (q̄_1, …, q̄_w) of length w < n, where each element is computed according to71
$$\bar{q}_k = \frac{1}{\Delta t} \sum_{t=(k-1)\Delta t + 1}^{k \Delta t} q_t, \qquad k = 1, \ldots, w, \tag{8}$$
where Δt = n/w. In other words, each element q̄_k is simply the average of the original time series q over a window of length Δt. When Δt is constant, PAA approximates the original time series by a piecewise-constant function with w segments. Other variants of PAA use an adaptive mechanism to adjust Δt according to certain rules, e.g., defining a threshold such that σ(t = T) < ⟨q(t) − ⟨q(t)⟩_{t=1…T}⟩_{t=1…T}. For all calculations we set the time range Δt = 0.1 ns.
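Constant-window PAA is a reshape followed by a mean. This sketch assumes a one-dimensional series and, when n is not an exact multiple of w, simply drops the trailing frames (one of several reasonable conventions).

```python
import numpy as np

def paa(q, w):
    """Piecewise aggregate approximation (Eq. 8) with w windows of n // w frames each."""
    q = np.asarray(q, dtype=float)
    if not 0 < w <= len(q):
        raise ValueError("need 0 < w <= len(q)")
    dt = len(q) // w                             # frames per window (Delta t = n / w)
    return q[: dt * w].reshape(w, dt).mean(axis=1)
```

For example, if frames were saved every 2 ps (a hypothetical value), Δt = 0.1 ns would correspond to 50 frames per window, i.e., w = n // 50; each coordinate of a multidimensional series would be reduced in the same way.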
Transfer entropies from DIMS trajectories
In previous work, we generated a set of transitions for GGBP;44 the simulations were carried out with the CHARMM27 force field with the crossterm map (CMAP) correction91 (Ref. 72), using our implementation of DIMS and an implicit solvent model (ACE2).73 Rotational and translational degrees of freedom were removed by RMS fitting the target structure to the evolving system; the atoms used for alignment were selected from the N-terminal domain (residues 111 to 252 and 293 to 305). By applying our transfer entropy analysis, we were able to identify the key residues in the DIMS transition (Figure 3). The results show that the leading residues for the transition are located in the three-segment hinge that connects the N- and C-terminal domains.
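Removing rigid-body motion by RMS fitting is commonly done with the Kabsch (SVD) algorithm; the sketch below assumes that choice, which the text does not specify. The rotation is computed from a subset of alignment atoms (the hypothetical argument fit_idx, analogous to the domain selection above) and then applied to all atoms.

```python
import numpy as np

def kabsch_fit(mobile, reference, fit_idx=None):
    """Superpose `mobile` (N, 3) onto `reference` (N, 3) by least-squares rotation.

    The optimal rotation and translation are computed from the atoms in
    `fit_idx` (all atoms if None) and then applied to every atom of `mobile`.
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    idx = np.arange(len(mobile)) if fit_idx is None else np.asarray(fit_idx)
    mc, rc = mobile[idx].mean(axis=0), reference[idx].mean(axis=0)
    P, Q = mobile[idx] - mc, reference[idx] - rc       # remove translation
    U, _, Vt = np.linalg.svd(P.T @ Q)                  # SVD of the 3x3 covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # avoid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T            # optimal rotation
    return (mobile - mc) @ R.T + rc
```

Fitting on a domain subset, rather than on all atoms, keeps the chosen domain fixed so that the relative motion of the other domain remains visible in the aligned coordinates.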
FINDING THE LEADING MODES ON NTRC
The structures of the inactive-state and active-state conformations of NtrC have been solved by NMR.74, 75, 76 At room temperature, NtrC samples both conformational states; after phosphorylation, however, the active state dominates the population. Recent studies suggest that the transition pathway between the two conformations can be decomposed into a series of segmented progress variables (order parameters).59 For this study, both states were solvated in a box of dimensions 20 Å × 20 Å × 20 Å with TIP3 water molecules and equilibrated for 15 ns; the total number of atoms, including solvent and ions, is 12 168 and 13 688 for the active and inactive states, respectively. Production runs of 600 ns were performed with NAMD 2.7 (Ref. 77) on NICS-Kraken. The trajectories were analyzed with our own code on NCSA-Abe/Lincoln.
Computing the modes
A key insight is that the atoms with the strongest leading effective transfer entropy can be used as a subset of degrees of freedom to define collective modes that serve as new candidate order parameters. Once a cutoff and a time length for interrogating the dynamics have been defined, accomplishing this is straightforward. The modes are determined by the fluctuations of the leading effective transfer entropy components and together describe a set of collective motions.
For the residues in the set Γ, we compute the covariance matrix as in Eq. 1 over the full trajectory and obtain a set of modes η_α. The involvement coefficients (Eq. 2) for different values of the cutoff D_cutoff are presented in Figure 4. As the cutoff increases, fewer residues are selected as dominant; however, the involvement coefficients clearly increase. This suggests that the most dominant modes point towards the end structure. Since these modes transfer entropy to the entire system, biasing along them would be expected to produce a collective bias of the entire system.
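Restricting PCA to Γ changes only which coordinates enter Eq. 1 and which components of the end-state displacement enter Eq. 2. In this sketch the residues are represented by their centers of mass, an assumption made for brevity (the covariance could equally be built from all atoms of the Γ residues), and all coordinates are assumed to be superposed already.

```python
import numpy as np

def gamma_mode_ics(com_traj, r_active, r_inactive, gamma):
    """ICs (Eq. 2) of PCA modes (Eq. 1) built only from residues in Gamma.

    com_traj: (n_frames, R, 3) residue centre-of-mass trajectory (assumed superposed).
    r_active, r_inactive: (R, 3) end-state centres of mass (assumed superposed).
    gamma: indices of the dominant residues obtained for some D_cutoff.
    """
    arr = np.asarray(com_traj, dtype=float)
    sub = arr[:, gamma, :].reshape(arr.shape[0], -1)    # frames x 3|Gamma|
    dev = sub - sub.mean(axis=0)
    evals, evecs = np.linalg.eigh(dev.T @ dev / sub.shape[0])
    evecs = evecs[:, np.argsort(evals)[::-1]]           # largest-variance modes first
    d = (np.asarray(r_active, float)[gamma] - np.asarray(r_inactive, float)[gamma]).ravel()
    return np.abs(evecs.T @ (d / np.linalg.norm(d)))
```

Evaluating this function for the Γ obtained at several values of D_cutoff gives IC-versus-cutoff curves of the type discussed for Figure 4.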
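Since the η_α form an orthonormal basis, we can define the cumulative involvement coefficient μ_α of the first α principal components as
$$\mu_\alpha = \left(\sum_{k=1}^{\alpha} I_k^{2}\right)^{1/2}, \tag{9}$$
which tends to 1 as all modes are included, and use it to measure how much of the overall difference between the two structures is accounted for by the first α modes.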
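The cumulative involvement coefficient is a running sum of squared ICs. This sketch assumes the square-root form; the plain sum of squares, an equally common variant, is simply μ_α². The helper modes_needed is a hypothetical convenience for reading off how many modes reach a chosen level.

```python
import numpy as np

def cumulative_involvement(ics):
    """mu_alpha = sqrt(I_1^2 + ... + I_alpha^2) for alpha = 1, ..., M."""
    return np.sqrt(np.cumsum(np.asarray(ics, dtype=float) ** 2))

def modes_needed(ics, level):
    """Smallest number of leading modes whose cumulative involvement reaches `level`, or None."""
    mu = cumulative_involvement(ics)
    hits = np.nonzero(mu >= level)[0]
    return int(hits[0]) + 1 if hits.size else None
```

With ICs of 0.6, 0.8, and 0, two modes already account for the full displacement, whereas ICs that remain small never reach a level of 0.9.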
Figure 5 suggests that relatively short molecular dynamics simulations converge onto the important degrees of freedom determined by the effective transfer entropy analysis. It also suggests that an algorithm using the effective transfer entropy modes could readily be implemented in CHARMM or other simulation codes. In that algorithm, the lowest-frequency modes would define the direction of the bias applied through DIMS or another approach (e.g., transition path sampling or targeted MD). The modes would be defined by a relatively short unbiased simulation, followed by biased sampling for a similar amount of time. For example, Fig. 5 suggests that 5 ns of sampling to determine the effective modes, followed by 5 ns of sampling along those modes, could be used to improve confidence that the most important intermediate states are being reached. The procedure would then be repeated, using unbiased sampling with light restraints on the backbone atoms to define a new set of effective transfer entropy modes. Continuing this process until the end state is reached would define a transition pathway. If the process is then repeated for multiple starting points with various sampling windows and different random number seeds, together with a random selection of cutoffs and modes, a good sampling of the intermediate space should be obtained.
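The iterative procedure just described can be summarized as a small driver loop. This is a schematic sketch, not an implementation in CHARMM or any other package: all four callables are hypothetical hooks that, in practice, would wrap unbiased and biased MD runs and the effective transfer entropy analysis. Repeating the driver for several starting points, seeds, cutoffs, and window lengths would give the ensemble of pathways mentioned above.

```python
from typing import Any, Callable, List, Optional

def ete_guided_path(start: Any,
                    reached: Callable[[Any], bool],
                    sample_unbiased: Callable[[Any], Any],
                    leading_modes: Callable[[Any], Any],
                    sample_biased: Callable[[Any, Any], Any],
                    max_cycles: int = 100) -> Optional[List[Any]]:
    """Alternate unbiased windows (to define ETE modes) with biased windows along them.

    Hypothetical hooks:
      sample_unbiased(state)       -> short unbiased trajectory (light backbone restraints)
      leading_modes(trajectory)    -> leading ETE modes (ETE -> Gamma -> PCA on Gamma)
      sample_biased(state, modes)  -> end state of a run biased along `modes` (e.g., DIMS)
    Returns the visited states, or None if the end state is not reached within max_cycles.
    """
    state, path = start, [start]
    for _ in range(max_cycles):
        if reached(state):
            return path
        trajectory = sample_unbiased(state)
        modes = leading_modes(trajectory)
        state = sample_biased(state, modes)
        path.append(state)
    return path if reached(state) else None
```

With toy hooks that move a scalar "state" two units per cycle towards a target of 10, the driver returns the visited states 0, 2, 4, 6, 8, 10, and returns None if max_cycles is too small to reach the target.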
COMPARISON TO OTHER METHODS AND ASSUMPTIONS
We emphasize that the proposed method requires dynamic information. The time-series comparisons that underlie the method may be sensitive to the choice of system and to the total amount of information captured. Our results suggest that the total simulation time needed to capture the leading and lagging degrees of freedom is much shorter than we initially assumed. However, the appropriate amount of time for collecting dynamic information before computing the effective transfer entropies is still an open question. Note that no pulling forces or other biasing should be applied if the suggested order parameter is to correctly reflect motions from the unbiased state towards other states. Similarly, although we have assumed that two conformations are available in order to assess the utility of the approach for conformational change, the calculation of the effective transfer entropy itself does not require two or more conformations. Instead, the method suggests that an effective order parameter leading out of a particular conformational state may be defined without requiring that the order parameter, by itself, lead towards a specific endpoint. In that regard, the approach may also help sample multiple intermediate states that connect different larger conformational states. We have yet to test this idea fully; the directions in which the order parameter leads could be coupled, as outlined here, to computational efforts to understand conformational change between known states, or used to enhance sampling away from one state and towards other, as yet unexplored, states.
In addition, the method should be contrasted with other approaches that have attempted to determine subsets of states from long molecular dynamics data and, by extension, to define intermediates and their connections to those states.78 For example, the Head-Gordon group has suggested using instantaneous normal modes to define changes in the AdK system.60 This relates to efforts that use modes from essential dynamics analysis to sample conformational change in the same system,79 to work with Monte Carlo methods and collective modes,80, 81 and to efforts using amplified collective modes.82 Our work on AdK suggests that its conformational change is much simpler than that of NtrC, and that the optimal order parameter may be easier to identify.41 Other groups have emphasized that cracking of secondary structural elements may be important for conformational change in AdK and should be taken into account.83 The current approach makes no assumptions about the role of secondary structure in the domain motions needed for the conformational change.
Relatedly, research from the Pande group and others has attempted to define Markov models from long molecular dynamics simulations. In principle, such Markov models should also define the reduced descriptors needed for transitions between the Markov-defined states. In practice, the approach outlined here may help improve sampling between the states defined by the Markov models, since the intermediates may well be undersampled relative to the states themselves.84
Work in the Thorpe group has suggested that conformational change can be considered in terms of the pebble game and the degrees of freedom available from a static structure.85 In that regard, the current contribution may be thought of as finding the most important subsets of degrees of freedom that lead the change, as opposed to defining only the available subset. It may be fruitful to determine more fully which types of correlated motions are most likely to yield order parameters and which are less likely to do so. This would be another interesting extension of this work and would complement work on static structural analysis. Somewhat similarly, studies of simpler chemical systems have suggested that algorithms can be designed to follow peaks and valleys on adiabatic surfaces from a single structure in order to locate transition states.86, 87 Others have suggested that a better understanding of the connection between temperature and transitions would aid the understanding of intermediates.88
Finally, a growing body of work addresses the limitations of principal component analysis, and this work suggests that there may be connections between the effective transfer entropy and the improved resolution of non-Gaussian analysis of long molecular dynamics trajectories.26 While the nature of these connections remains to be understood, it suggests that the non-Gaussian components of motion may be the most important determinants of change away from a given state. This could also tie into work from the Clementi group on finding dimensionally reduced representations of dynamic conformation space.89
CONCLUSIONS
A molecular understanding of how protein function is related to protein structure will require an ability to understand large conformational changes between multiple states. Unfortunately, these states are often separated by high free energy barriers within a complex energy landscape. This makes it very difficult to reliably connect the states, their energies, and the pathways between them, for example by all-atom molecular dynamics calculations. A major requirement for improved sampling of the intermediate states is an order parameter (a reduced descriptor for the major subset of degrees of freedom) that can be used to aid sampling of the large conformational change. In this paper, we present a way to combine information from molecular dynamics using non-linear time-series analysis and dimensionality reduction to quantitatively determine an order parameter connecting two large-scale, conformationally distinct protein states. The results presented show that the leading modes can be computed from short simulations. This new method suggests an implementation for molecular dynamics calculations that may dramatically enhance sampling of intermediate states.
References
- Berne B. J., Molecular Dynamics and Monte Carlo Simulations of Rare Events (Academic, 1985).
- Kollman P. A., “Free-energy calculations: Applications to chemical and biochemical phenomena,” Chem. Rev. 93, 2395–2417 (1993). 10.1021/cr00023a004
- Bartels C. and Karplus M., “Probability distributions for complex systems: Adaptive umbrella sampling of the potential energy,” J. Phys. Chem. B 102, 865–880 (1998). 10.1021/jp972280j
- Chodera J. D., Swope W. C., Pitera J. W., Seok C., and Dill K. A., “Use of the weighted histogram analysis method for the analysis of simulated and parallel tempering simulations,” J. Chem. Theory Comput. 3, 26–41 (2007). 10.1021/ct0502864
- Kumar S., Rosenberg J. M., Bouzida D., Swendsen R. H., and Kollman P. A., “The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method,” J. Comput. Chem. 13, 1011–1021 (1992). 10.1002/jcc.540130812
- Shirts M. R. and Chodera J. D., “Statistically optimal analysis of samples from multiple equilibrium states,” J. Chem. Phys. 129, 124105 (2008). 10.1063/1.2978177
- Kong X. and Brooks C. L., “Lambda-dynamics: A new approach to free energy calculations,” J. Chem. Phys. 105, 2414 (1996). 10.1063/1.472109
- Roux B., Allen T., Berneche S., and Im W., “Theoretical and computational models of biological ion channels,” Q. Rev. Biophys. 37, 15–103 (2004). 10.1017/S0033583504003968
- Echols N., Milburn D., and Gerstein M., “MolMovDB: Analysis and visualization of conformational change and structural flexibility,” Nucleic Acids Res. 31, 478–482 (2003). 10.1093/nar/gkg104
- Alexandrov V., Lehnert U., Echols N., Milburn D., Engelman D., and Gerstein M., “Normal modes for predicting protein motions: A comprehensive database assessment and associated web tool,” Protein Sci. 14, 633–643 (2005). 10.1110/ps.04882105
- Brooks B. and Karplus M., “Harmonic dynamics of proteins: Normal modes and fluctuations in bovine pancreatic trypsin inhibitor,” Proc. Natl. Acad. Sci. U.S.A. 80, 6571–6575 (1983). 10.1073/pnas.80.21.6571
- Tama F. and Sanejouand Y.-H., “Conformational change of proteins arising from normal mode calculations,” Protein Eng. 14, 1–6 (2001). 10.1093/protein/14.1.1
- Go N., Noguti T., and Nishikawa T., “Dynamics of a small globular protein in terms of low-frequency vibrational modes,” Proc. Natl. Acad. Sci. U.S.A. 80, 3696–3700 (1983). 10.1073/pnas.80.12.3696
- Skjaerven L., Hollup S. M., and Reuter N., “Normal mode analysis for proteins,” J. Mol. Struct.: THEOCHEM 898, 42–48 (2009). 10.1016/j.theochem.2008.09.024
- Wako H., Kato M., and Endo S., “Promode: A database of normal mode analyses on protein molecules with a full-atom model,” Bioinformatics 20, 2035–2043 (2004). 10.1093/bioinformatics/bth197
- Zheng W. and Doniach S., “A comparative study of motor-protein motions by using a simple elastic-network model,” Proc. Natl. Acad. Sci. U.S.A. 100(23), 13253–13258 (2003). 10.1073/pnas.2235686100
- Chennubhotla C. and Bahar I., “Signal propagation in proteins and relation to equilibrium fluctuations,” PLOS Comput. Biol. 3, 1716–1726 (2007).
- Amadei A., Linssen A. B. M., and Berendsen H. J. C., “Essential dynamics of proteins,” Proteins: Struct., Funct., and Bioinf. 17, 412–425 (1993). 10.1002/prot.340170408
- de Groot B. L., van Aalten D. M. F., Amadei A., and Berendsen H. J. C., “The consistency of large concerted motions in proteins in molecular dynamics simulations,” Biophys. J. 71, 1707–1713 (1996). 10.1016/S0006-3495(96)79372-4
- Garcia A. E., “Large-amplitude nonlinear motions in proteins,” Phys. Rev. Lett. 68(17), 2696–2699 (1992). 10.1103/PhysRevLett.68.2696
- Hayward S., Kitao A., and Go N., “Harmonicity and anharmonicity in protein dynamics: A normal mode analysis and principal component analysis,” Proteins: Struct., Funct., and Genet. 23, 177–186 (1995). 10.1002/prot.340230207
- van Aalten D. M. F., Amadei A., Linssen A. B. M., Eijsink V. G. H., Vriend G., and Berendsen H. J. C., “The essential dynamics of thermolysin: Confirmation of the hinge-bending motion and comparison of simulations in vacuum and water,” Proteins: Struct., Funct., and Bioinf. 22, 45 (1995). 10.1002/prot.340220107
- Hub J. S. and de Groot B. L., “Detection of functional modes in protein dynamics,” PLOS Comput. Biol. 5(8), e1000480 (2009). 10.1371/journal.pcbi.1000480
- Lange O. F. and Grubmüller H., “Full correlation analysis of conformational protein dynamics,” Proteins: Struct., Funct., and Bioinf. 70(4), 1294–1312 (2008). 10.1002/prot.21618
- Ramanathan A., Savol A. J., Langmead C. J., Agarwal P. K., and Chennubhotla C. S., “Discovering conformational sub-states relevant to protein function,” PLoS ONE 6(1), e15827 (2011). 10.1371/journal.pone.0015827
- Savol A. J., Burger V. M., Agarwal P. K., Ramanathan A., and Chennubhotla C. S., “QAARM: quasi-anharmonic autoregressive model reveals molecular recognition pathways in ubiquitin,” Bioinformatics 27(13), i52–i60 (2011). 10.1093/bioinformatics/btr248
- Crehuet R. and Field M. J., “A temperature-dependent nudged-elastic-band algorithm,” J. Chem. Phys. 118(21), 9563–9571 (2003). 10.1063/1.1571817
- Eastman P., Jensen N. G., and Doniach S., “Simulation of protein folding by reaction path annealing,” J. Chem. Phys. 114(8), 3823–3841 (2001). 10.1063/1.1342162
- Huang H., Ozkirimli E., and Post C. B., “Comparison of three perturbation molecular dynamics methods for modeling conformational transitions,” J. Chem. Theory Comput. 5(5), 1304–1314 (2009). 10.1021/ct9000153
- Huo S. and Straub J. E., “The MaxFlux algorithm for calculating variationally optimized reaction paths for conformational transitions in many body systems at finite temperature,” J. Chem. Phys. 107(13), 5000–5006 (1997). 10.1063/1.474863
- Kim M. K., Chirikjian G. S., and Jernigan R. L., “Elastic models of conformational transitions in macromolecules,” J. Mol. Graphics Modell. 21(2), 151–160 (2002). 10.1016/S1093-3263(02)00143-2
- Pratt L. R., “A statistical method for identifying transition states in high dimensional problems,” J. Chem. Phys. 85(9), 5045–5048 (1986). 10.1063/1.451695
- Ren W., Vanden-Eijnden E., Maragakis P., and E W., “Transition pathways in complex systems: Application of the finite-temperature string method to the alanine dipeptide,” J. Chem. Phys. 123(13), 134109 (2005). 10.1063/1.2013256
- Zhang B. W., Jasnow D., and Zuckerman D. M., “Efficient and verified simulation of a path ensemble for conformational change in a united-residue model of calmodulin,” Proc. Natl. Acad. Sci. U.S.A. 104(46), 18043–18048 (2007). 10.1073/pnas.0706349104
- Zheng W., Brooks B. R., and Hummer G., “Protein conformational transitions explored by mixed elastic network models,” Proteins 69(1), 43–57 (2007). 10.1002/prot.21465
- Zheng W. and Brooks B. R., “Normal-modes-based prediction of protein conformational changes guided by distance constraints,” Biophys. J. 88(5), 3109–3117 (2005). 10.1529/biophysj.104.058453
- Bolhuis P. G., Dellago C., and Chandler D., “Reaction coordinates of biomolecular isomerization,” Proc. Natl. Acad. Sci. U.S.A. 97(11), 5877–5882 (2000). 10.1073/pnas.100127697
- Dellago C., Bolhuis P. G., Csajka F. S., and Chandler D., “Transition path sampling and the calculation of rate constants,” J. Chem. Phys. 108(5), 1964–1977 (1998). 10.1063/1.475562
- Maragakis P. and Karplus M., “Large amplitude conformational change in proteins explored with a plastic network model: Adenylate kinase,” J. Mol. Biol. 352(4), 807–822 (2005). 10.1016/j.jmb.2005.07.031
- van der Vaart A. and Karplus M., “Simulation of conformational transitions by the restricted perturbation–targeted molecular dynamics method,” J. Chem. Phys. 122(11), 114903 (2005). 10.1063/1.1861885
- Beckstein O., Denning E. J., Perilla J. R., and Woolf T. B., “Zipping and unzipping of adenylate kinase: atomistic insights into the ensemble of open↔closed transitions,” J. Mol. Biol. 394(1), 160–176 (2009). 10.1016/j.jmb.2009.09.009
- Denning E. J. and Woolf T. B., “Cooperative nature of gating transitions in K+ channels as seen from dynamic importance sampling calculations,” Proteins: Struct., Funct., and Bioinf. 78(5), 1105–1119 (2010). 10.1002/prot.22632
- Jang H. and Woolf T. B., “Multiple pathways in conformational transitions of the alanine dipeptide: An application of dynamic importance sampling,” J. Comput. Chem. 27(11), 1136–1141 (2006). 10.1002/jcc.20444
- Perilla J. R., Beckstein O., Denning E. J., and Woolf T. B., “Computing ensembles of transitions from stable states: Dynamic importance sampling,” J. Comput. Chem. 32(2), 196–209 (2011). 10.1002/jcc.21564
- Shimamura T., Weyand S., Beckstein O., Rutherford N. G., Hadden J. M., Sharples D., Sansom M. S. P., Iwata S., Henderson P. J. F., and Cameron A. D., “Molecular basis of alternating access membrane transport by the sodium-hydantoin transporter Mhp1,” Science 328(5977), 470–473 (2010). 10.1126/science.1186303
- Stansfeld P. J. and Sansom M. S. P., “Molecular simulation approaches to membrane proteins,” Structure (London) 19(11), 1562–1572 (2011). 10.1016/j.str.2011.10.002
- Woolf T., “Path corrected functionals of stochastic trajectories: towards relative free energy and reaction coordinate calculations,” Chem. Phys. Lett. 289(5-6), 433–441 (1998). 10.1016/S0009-2614(98)00427-8
- Zuckerman D. M. and Woolf T. B., “Dynamic reaction paths and rates through importance-sampled stochastic dynamics,” J. Chem. Phys. 111(21), 9475–9484 (1999). 10.1063/1.480278
- Zuckerman D. M. and Woolf T. B., “Efficient dynamic importance sampling of rare events in one dimension,” Phys. Rev. E 63(1), 016702 (2000). 10.1103/PhysRevE.63.016702
- Zuckerman D. M. and Woolf T. B., “Transition events in butane simulations: Similarities across models,” J. Chem. Phys. 116(6), 2586–2591 (2002). 10.1063/1.1433501
- Wagner W., “Unbiased Monte Carlo evaluation of certain functional integrals,” J. Comput. Phys. 71(1), 21–33 (1987). 10.1016/0021-9991(87)90017-9
- Andricioaei I. and Karplus M., “On the calculation of entropy from covariance matrices of the atomic fluctuations,” J. Chem. Phys. 115, 6289–6292 (2001). 10.1063/1.1401821
- Balsera M. A., Wriggers W., Oono Y., and Schulten K., “Principal component analysis and long time protein dynamics,” J. Phys. Chem. 100(7), 2567–2572 (1996). 10.1021/jp9536920
- Horiuchi T. and Go N., “Projection of Monte Carlo and molecular dynamics trajectories onto the normal mode axes: Human lysozyme,” Proteins: Struct., Funct., and Bioinf. 10(2), 106–116 (1991). 10.1002/prot.340100204
- Ichiye T. and Karplus M., “Collective motions in proteins: A covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations,” Proteins: Struct., Funct., and Bioinf. 11, 205–217 (1991). 10.1002/prot.340110305
- Karplus M. and Kushick J., “Method for estimating the configurational entropy of macromolecules,” Macromolecules 14, 325–332 (1981). 10.1021/ma50003a019
- Schlitter J., “Estimation of absolute and relative entropies of macromolecules using the covariance matrix,” Chem. Phys. Lett. 215(6), 617–621 (1993). 10.1016/0009-2614(93)89366-P
- Henzler-Wildman K. and Kern D., “Dynamic personalities of proteins,” Nature (London) 450(7172), 964–972 (2007). 10.1038/nature06522
- Lei M., Velos J., Gardino A., Kivenson A., Karplus M., and Kern D., “Segmented transition pathway of the signaling protein nitrogen regulatory protein C,” J. Mol. Biol. 392(3), 823–836 (2009). 10.1016/j.jmb.2009.06.065
- Peng C., Zhang L., and Head-Gordon T., “Instantaneous normal modes as an unforced reaction coordinate for protein conformational transitions,” Biophys. J. 98(10), 2356–2364 (2010). 10.1016/j.bpj.2010.01.044
- Kamberaj H. and van der Vaart A., “Extracting the causality of correlated motions from molecular dynamics simulations,” Biophys. J. 97(6), 1747–1755 (2009). 10.1016/j.bpj.2009.07.019
- Reza F. M., An Introduction to Information Theory (Dover, 1994).
- Shannon C. E., “A mathematical theory of communication,” Bell Syst. Tech. J. 27, 379–423 and 623–656 (1948).
- Kullback S. and Leibler R. A., “On information and sufficiency,” Ann. Math. Stat. 22(1), 79–86 (1951). 10.1214/aoms/1177729694
- Schreiber T., “Measuring information transfer,” Phys. Rev. Lett. 85(2), 461–464 (2000). 10.1103/PhysRevLett.85.461
- Gourévitch B. and Eggermont J. J., “Evaluating information transfer between auditory cortical neurons,” J. Neurophysiol. 97(3), 2533–2543 (2007). 10.1152/jn.01106.2006
- Marschinski R. and Kantz H., “Analysing the information flow between financial time series. An improved estimator for transfer entropy,” Eur. Phys. J. B 30(2), 275–281 (2002). 10.1140/epjb/e2002-00379-2
- Borrok J. J., Kiessling L. L., and Forest K. T., “Conformational changes of glucose/galactose-binding protein illuminated by open, unliganded, and ultra-high-resolution ligand-bound structures,” Protein Sci. 16(6), 1032–1041 (2007). 10.1110/ps.062707807
- Lungarella M., Pitti A., and Kuniyoshi Y., “Information transfer at multiple scales,” Phys. Rev. E 76(5), 056117 (2007). 10.1103/PhysRevE.76.056117
- Staniek M. and Lehnertz K., “Symbolic transfer entropy,” Phys. Rev. Lett. 100(15), 158101 (2008). 10.1103/PhysRevLett.100.158101
- Lin J., Keogh E., Lonardi S., and Chiu B., “A symbolic representation of time series, with implications for streaming algorithms,” in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery DMKD 03 (ACM, 2003), p. 2.
- Brooks B. R., Brooks C. L., MacKerell A. D., Nilsson L., Petrella R. J., Roux B., Won Y., Archontis G., Bartels C., Boresch S., Caflisch A., Caves L., Cui Q., Dinner A. R., Feig M., Fischer S., Gao J., Hodoscek M., Im W., Kuczera K., Lazaridis T., Ma J., Ovchinnikov V., Paci E., Pastor R. W., Post C. B., Pu J. Z., Schaefer M., Tidor B., Venable R. M., Woodcock H. L., Wu X., Yang W., York D. M., and Karplus M., “CHARMM: The biomolecular simulation program,” J. Comput. Chem. 30(10), 1545–1614 (2009). 10.1002/jcc.21287
- Schaefer M., Bartels C., and Karplus M., “Solution conformations and thermodynamics of structured peptides: molecular dynamics simulation with an implicit solvation model,” J. Mol. Biol. 284(3), 835–848 (1998). 10.1006/jmbi.1998.2172
- Hastings C. A., Lee S.-Y., Cho H. S., Yan D., Kustu S., and Wemmer D. E., “High-resolution solution structure of the beryllofluoride-activated NtrC receiver domain,” Biochemistry 42(30), 9081–9090 (2003). 10.1021/bi0273866
- Kern D., Volkman B. F., Luginbühl P., Nohaile M. J., Kustu S., and Wemmer D. E., “Structure of a transiently phosphorylated switch in bacterial signal transduction,” Nature (London) 402(6764), 894–898 (1999). 10.1038/47273
- Volkman B. F., Lipson D., Wemmer D. E., and Kern D., “Two-state allosteric behavior in a single-domain signaling protein,” Science 291(5512), 2429–2433 (2001). 10.1126/science.291.5512.2429
- Phillips J. C., Braun R., Wang W., Gumbart J., Tajkhorshid E., Villa E., Chipot C., Skeel R. D., Kale L., and Schulten K., “Scalable molecular dynamics with NAMD,” J. Comput. Chem. 26(16), 1781–1802 (2005). 10.1002/jcc.20289
- Jain A., Hegger R., and Stock G., “Hidden complexity of protein free-energy landscapes revealed by principal component analysis by parts,” J. Phys. Chem. Lett. 1, 2769–2773 (2010). 10.1021/jz101069e
- Snow C., Qi G., and Hayward S., “Essential dynamics sampling study of adenylate kinase: Comparison to citrate synthase and implications for the hinge and shear mechanisms of domain motions,” Proteins: Struct., Funct., and Bioinf. 67, 325–337 (2007). 10.1002/prot.21280
- Kantarci-Carsibasi N., Haliloglu T., and Doruker P., “Conformational transition pathways explored by Monte Carlo simulation integrated with collective modes,” Biophys. J. 95, 5862–5873 (2008). 10.1529/biophysj.107.128447
- Miloshevsky G. and Jordan P., “Open-state conformation of the KcsA K+ channel: Monte Carlo normal mode following simulations,” Structure 15(12), 1654–1662 (2007). 10.1016/j.str.2007.09.022
- Zhang Z., Shi Y., and Liu H., “Molecular dynamics simulations of peptides and proteins with amplified collective motions,” Biophys. J. 84, 3583–3593 (2003). 10.1016/S0006-3495(03)75090-5
- Miyashita O., Onuchic J. N., and Wolynes P. G., “Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins,” Proc. Natl. Acad. Sci. U.S.A. 100, 12570–12575 (2003). 10.1073/pnas.2135471100
- Bowman G. R. and Pande V. S., “Protein folded states are kinetic hubs,” Proc. Natl. Acad. Sci. U.S.A. 107, 10890–10895 (2010). 10.1073/pnas.1003962107
- Lei M., Zavodsky M. I., Kuhn L. A., and Thorpe M. F., “Sampling protein conformations and pathways,” J. Comput. Chem. 25, 1133–1148 (2004). 10.1002/jcc.20041
- Bofill J. M. and Anglada J. M., “Finding transition states using reduced potential-energy surfaces,” Theor. Chem. Acc. 105(6), 463–472 (2001). 10.1007/s002140000252
- Cerjan C. J. and Miller W. H., “On finding transition states,” J. Chem. Phys. 75(6), 2800–2806 (1981). 10.1063/1.442352
- Elber R. and Shalloway D., “Temperature dependent reaction coordinates,” J. Chem. Phys. 112(13), 5539–5545 (2000). 10.1063/1.481131
- Rohrdanz M. A., Zheng W., Maggioni M., and Clementi C., “Determination of reaction coordinates via locally scaled diffusion map,” J. Chem. Phys. 134, 124116 (2011). 10.1063/1.3569857
- Humphrey W., Dalke A., and Schulten K., “VMD: Visual molecular dynamics,” J. Mol. Graphics 14, 33–38 (1996). 10.1016/0263-7855(96)00018-5
- MacKerell A. D., Jr., Feig M., and Brooks C. L., III, “Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations,” J. Comput. Chem. 25, 1400–1415 (2004). 10.1002/jcc.20065