Abstract

The potential of molecular simulations is limited by their computational cost, so enhanced sampling methods are often needed to accelerate them. Metadynamics applies a history-dependent bias potential that disfavors previously visited states. To apply metadynamics, it is necessary to select a few properties of the system, known as collective variables (CVs), that are used to define the bias potential. Over the past few years, machine learning and, in particular, artificial neural networks have opened new opportunities in this domain. Here, we used a specific unsupervised machine learning method, parametric time-lagged t-distributed stochastic neighbor embedding (ptltSNE), to design CVs. The approach was tested on a trajectory of the Trp-cage (tryptophan cage) miniprotein from the literature. The trajectory was used to generate a map of conformations, distinguish fast conformational changes from slow ones, and design CVs. Then, metadynamics simulations were performed. To accelerate the formation of the α-helix, we added the α-RMSD collective variable. This simulation led to one folding event in a 350 ns metadynamics simulation. To accelerate degrees of freedom not addressed by CVs, we performed parallel tempering metadynamics. This simulation led to 10 folding events in a 200 ns simulation with 32 replicas.
Introduction
In the past decades, numerous studies have demonstrated that the conformational dynamics of biomolecules is as important as their 3D structures. Computational modeling of biomolecular dynamics, in particular the method of molecular dynamics (MD) simulation, has become an important alternative and complement to experimental methods.1,2
The impact of MD simulations has always been limited by their high computational cost, which is caused by the small integration step and the huge number of interatomic interactions evaluated in every step. For this reason, nanosecond to microsecond time scales can be simulated routinely, but longer time scales are still not routine because they require either high computational power in terms of numbers of CPUs and GPUs or special-purpose hardware.
Nanosecond to microsecond timescales often do not allow for sampling of all important states of the studied system. For this reason, numerous extensions of MD have been developed to enhance sampling so that important states can be sampled and their densities can be predicted by short simulations. In this work, we use metadynamics.3 This method enhances sampling by “flooding” local energy minima with an artificial history-dependent bias potential.
The bias potential in metadynamics and many other enhanced sampling methods is a function of a few predefined descriptors of the state of the system known as collective variables (CVs). Choice of CVs is critical to the efficiency of sampling enhancement, especially in complex biomolecular systems.
Any set of CVs reduces the dimensionality of the studied system because it describes a system with high-dimensional Cartesian coordinates x using low-dimensional CVs s. Therefore, general dimensionality reduction techniques (x → s) are predisposed to work as generally applicable methods to design CVs. Indeed, numerous linear and nonlinear dimensionality reduction techniques have been tested as CVs to map high-dimensional conformations into low-dimensional maps and to drive sampling of this map by biased simulations.4−11
The disadvantage of linear dimensionality reduction methods is that the motions of atoms in molecules are nonlinear. Therefore, nonlinear dimensionality reduction methods are likely to provide a better description of the system compared to linear methods with the same number of CVs. tSNE is a successful nonlinear dimensionality reduction method popular in bioinformatics, image analysis, and other fields.12,13 It has also been applied successfully to analyze molecular simulation trajectories.14−16
One of the reasons behind the success of tSNE in different fields of science is its focus on proximity (distance transformed by the Gaussian function) rather than distance of high-dimensional points. Many linear and nonlinear dimensionality reduction methods have been designed so that they accurately reproduce the distances between high-dimensional points x in a low-dimensional space. However, the development of dimensionality reduction methods has shown that the proximity of the points is often more important than their distance. It is more important to cluster together similar points (e.g., gene expression profiles of patients with the same diagnosis), instead of reproducing distances of distant points (gene expression profiles of patients with completely different diagnoses). For this reason, tSNE works with proximities of points in a low-dimensional space rather than distances.
tSNE starts with a set of points x in a high-dimensional space. The number of points is N and their dimension is M. In the molecular world, this could be a trajectory with N time frames and M Cartesian coordinates (M = number of atoms × 3). The method first computes the proximities pij
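\[ p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)} \tag{1} \]
\[ p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N} \tag{2} \]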
The variable σi is the bandwidth of the Gaussian kernels. It is controlled by the perplexity parameter (see ref (13) for details). Next, points s in the low-dimensional space are initially estimated by linear dimensionality reduction of x. The proximity in the low-dimensional space qij is calculated as
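\[ q_{ij} = \frac{\left(1 + \lVert s_i - s_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert s_k - s_l \rVert^2\right)^{-1}} \tag{3} \]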
Finally, the values of s are optimized to reach the best agreement of pij and qij, expressed as the Kullback–Leibler divergence
\[ \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \tag{4} \]
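The quantities described above (Gaussian proximities in the high-dimensional space, Student-t proximities in the low-dimensional space, and their Kullback–Leibler divergence) can be illustrated by a minimal NumPy sketch. It is not the implementation used in this work: the bandwidths `sigma` are passed in directly instead of being tuned to a target perplexity, and the function names and random test data are assumptions made for illustration.

```python
import numpy as np

def high_dim_proximities(x, sigma):
    """Symmetrized Gaussian proximities p_ij for points x of shape (N, M).

    sigma holds one bandwidth per point; in tSNE it would be tuned to match
    the perplexity, which is skipped in this sketch.
    """
    n = x.shape[0]
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)  # squared distances
    logits = -d2 / (2.0 * sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p_cond = np.exp(logits)
    p_cond /= p_cond.sum(axis=1, keepdims=True)   # conditional p_{j|i}, rows sum to 1
    return (p_cond + p_cond.T) / (2.0 * n)        # symmetric p_ij, matrix sums to 1

def low_dim_proximities(s):
    """Student-t proximities q_ij for embedded points s of shape (N, 2)."""
    d2 = np.sum((s[:, None, :] - s[None, :, :]) ** 2, axis=-1)
    w = 1.0 / (1.0 + d2)
    np.fill_diagonal(w, 0.0)
    return w / w.sum()

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence between proximity matrices p and q."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

# Illustrative data: 50 random points in 6 dimensions, embedded by dropping
# all but the first two coordinates (a stand-in for an initial linear embedding).
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 6))
s = x[:, :2].copy()
p = high_dim_proximities(x, sigma=np.ones(50))
q = low_dim_proximities(s)
kl = kl_divergence(p, q)
```

In tSNE the embedding `s` would then be moved to lower `kl`; in the parametric variant the same loss is minimized with respect to neural network parameters instead.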
Unfortunately, the main disadvantage of most nonlinear dimensionality reduction methods, including tSNE, is that they can analyze N high-dimensional structures and provide their low-dimensional embeddings, but they cannot calculate the low-dimensional embedding of a new out-of-sample ((N + 1)-th) structure. Furthermore, applying bias forces in the direction of a CV requires the first derivative of the CV with respect to the Cartesian coordinates of the atoms. This is also not possible for most nonlinear dimensionality reduction methods, including tSNE.
This problem can be solved by an application of a variant of tSNE called parametric tSNE.17 The original tSNE finds low-dimensional embeddings s of data x to get the best agreement (Kullback–Leibler divergence) between similarities of x (defined as pij) and similarities of s (defined as qij). Instead of direct optimization of the values of s, parametric tSNE uses a neural network to calculate s from x and optimizes the parameters (weights and biases) of this neural network. The trained neural network can be used to calculate a low-dimensional embedding for any out-of-sample structure. The analytical derivatives of s with respect to x can also be easily calculated because neural networks are trained by the backpropagation algorithm, which is itself based on differentiating the network by the chain rule.
An important limitation of general dimensionality reduction methods is the fact that they highlight the most intensive motions in the system (modes with the highest variance) rather than the slowest ones. However, the best performance of the enhanced sampling methods is achieved when slow motions are accelerated. Conventional methods of dimensionality reduction may lead to acceleration of intensive but fast motions such as motions of protein loops or flexible N-terminal or C-terminal tails. It is possible to emphasize slow motions and weaken intensive but fast motions using a time lag. This is applied in time-lagged independent component analysis (TICA)18,19 or time-lagged tSNE.16
In this work, we combined the advantages of several approaches presented above, namely, metadynamics, tSNE, neural network, and time lag, to calculate the low-dimensional embedding s from coordinates x. We applied a parametric time-lagged tSNE to calculate CVs and to accelerate the folding and unfolding of the tryptophan cage (Trp-cage) miniprotein.
Methods
The code for the parametric time-lagged tSNE is available online at https://github.com/spiwokv/ptltsne, based on Keras,20 TensorFlow,21 and PyTorch.22 Lag time is introduced as described in the article introducing a time-lagged tSNE.16 Briefly, it uses the algorithm inspired by AMUSE23 to obtain a time-lagged high-dimensional representation, followed by a dimensionality reduction by tSNE. In detail, the M × M covariance matrix is calculated from centered coordinates fitted to a reference structure. The coordinates are transformed onto eigenvectors of the matrix normalized by the roots of its eigenvalues. The resulting flattened normalized projections are used to calculate the time-lagged covariance matrix. This matrix is symmetrized by calculating means of above- and below-diagonal terms (CYsym = 1/2(CY+(CY)T)). Eigenvectors of this matrix are expanded by the roots of its eigenvalues. Finally, the trajectory is projected onto these eigenvectors. The resulting projections are analyzed by tSNE.
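The time-lagged transformation described above can be sketched in NumPy as follows. This is an illustrative reading of the text, not the code from the ptltsne repository: the function name, the synthetic test trajectory, and the treatment of negative eigenvalues of the symmetrized time-lagged covariance matrix (clipped to zero before taking the root) are assumptions.

```python
import numpy as np

def time_lagged_projection(traj, lag, n_components):
    """AMUSE-like transformation of a trajectory of shape (N, M).

    traj is assumed to be already fitted to a reference structure.
    Returns the time-lagged projections and the eigenvalues of the
    symmetrized time-lagged covariance matrix (sorted, largest first).
    """
    x = traj - traj.mean(axis=0)                    # center the coordinates
    cov = x.T @ x / (x.shape[0] - 1)                # M x M covariance matrix
    eva, eve = np.linalg.eigh(cov)
    order = np.argsort(eva)[::-1][:n_components]    # keep the largest modes
    eva, eve = eva[order], eve[:, order]
    y = (x @ eve) / np.sqrt(eva)                    # normalized (whitened) projections

    # Time-lagged covariance of the normalized projections, then symmetrization
    c_lag = y[:-lag].T @ y[lag:] / (y.shape[0] - lag - 1)
    c_sym = 0.5 * (c_lag + c_lag.T)
    eva2, eve2 = np.linalg.eigh(c_sym)
    order2 = np.argsort(eva2)[::-1]
    eva2, eve2 = eva2[order2], eve2[:, order2]

    # Assumption of this sketch: negative eigenvalues are clipped to zero
    scale = np.sqrt(np.clip(eva2, 0.0, None))
    return (y @ eve2) * scale, eva2

# Synthetic test: one slow (autocorrelated) source mixed with three fast ones
rng = np.random.default_rng(1)
n_frames = 5000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
fast = rng.normal(scale=10.0, size=(n_frames, 3))
traj = np.column_stack([slow, fast]) @ rng.normal(size=(4, 4))
proj, autocorr = time_lagged_projection(traj, lag=5, n_components=4)
```

In this toy case the first output coordinate follows the slow source even though the fast sources have a larger variance, which is the behavior the time lag is meant to provide. The resulting projections would then be passed to tSNE.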
The 208 μs trajectory of the folding and unfolding of the Trp-cage, used as training data, was provided kindly by D.E. Shaw Research.24
Metadynamics simulations were performed in Gromacs-2021.4 (ref (25)) patched with Plumed-2.8.0 (refs (26) and (27)). Parallel tempering metadynamics was performed in Gromacs-2021 patched by Plumed-2.9.0 with the PyTorch module (https://github.com/kurecka/plumed2/tree/uvt_extensions, different implementation than in ref (28)). This code makes it possible to use an artificial neural network in PyTorch22 in Plumed (see Supporting Information for details).
The structure of the Trp-cage (sequence DAYAQWLKDGGPSSGRPPPS)29 was obtained from the Protein Databank (PDB-ID 2JOF, the first model).30 It was modeled using the Amber99SB-ILDN force field31 for the protein and SPC-E model for water32 (2540 water molecules in metadynamics and 1616 for parallel tempering metadynamics). A smaller box size was used for parallel tempering to increase the relative fluctuation of potential energy, thus increasing replica exchange probability. One chloride was added as a counterion to each system.
The energy of the systems was minimized by the steepest-descent method. Next, they were equilibrated by a 100 ps simulation in an NVT (constant number of particles, volume, and temperature) ensemble and a 100 ps simulation in an NPT (constant number of particles, pressure, and temperature) ensemble with harmonic restraints applied on non-hydrogen atoms of the protein. The simulation step was set to 2 fs. Bonds involving hydrogen atoms were constrained by the LINCS algorithm.33 Temperature and pressure were controlled by the Parrinello–Bussi34 and Parrinello–Rahman35 algorithms, respectively. Temperature was set to 300 K in metadynamics without parallel tempering. The pressure was set to 1 bar.
Well-tempered36 1.5 μs metadynamics was performed with two ptltSNE CVs or two ptltSNE CVs with α-RMSD CV. The bias potential was formed by the sum of Gaussian hills added every 1 ps. The width of a hill was 1 in the direction of both ptltSNE CVs and 0.2 in the direction of α-RMSD. The height was set at 0.5 kJ/mol. The bias factor was set to 8. These settings (widths relative to CV ranges) follow recommendations for metadynamics37 and well-tempered metadynamics.36
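The effect of the bias factor on hill deposition can be illustrated by a minimal one-dimensional sketch. It uses the hill height, bias factor, and temperature quoted above, but the grid, the single CV, the hill position, and the function name are assumptions made for illustration; the simulations in this work were run with Plumed, not with this code.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def deposit_hill(bias, grid, s, height0, sigma, temperature, bias_factor):
    """Add one well-tempered Gaussian hill centered at CV value s.

    The hill height is scaled by exp(-V(s) / (kB * dT)) with
    dT = (bias_factor - 1) * T, so hills shrink where bias has accumulated.
    """
    v_here = np.interp(s, grid, bias)
    kb_delta_t = KB * temperature * (bias_factor - 1.0)
    height = height0 * np.exp(-v_here / kb_delta_t)
    bias = bias + height * np.exp(-0.5 * ((grid - s) / sigma) ** 2)
    return bias, height

# Deposit 100 hills at the same CV value to show how their heights decay
grid = np.linspace(-5.0, 5.0, 201)
bias = np.zeros_like(grid)
heights = []
for _ in range(100):
    bias, h = deposit_hill(bias, grid, s=0.0, height0=0.5, sigma=1.0,
                           temperature=300.0, bias_factor=8.0)
    heights.append(h)

# Well-tempered estimate of the free energy from the accumulated bias
free_energy = -(8.0 / (8.0 - 1.0)) * bias
```

In a real simulation the hill centers follow the trajectory of the system, so the decay of hill heights differs between frequently and rarely visited regions.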
Parallel tempering metadynamics38 was performed with the ptltSNE CVs (without α-RMSD) in the NVT ensemble (200 ns per replica). We selected 32 temperatures of replicas, namely 278, 285, 292, 299, 307, 314, 322, 330, 338, 347, 355, 364, 373, 382, 392, 402, 412, 422, 432, 443, 454, 465, 477, 489, 501, 514, 526, 539, 553, 566, 581, and 595 K. These values were selected to cover biological as well as elevated temperatures. They are exponentially distanced, which usually provides efficient replica exchange at all temperatures. Replica exchange attempts were made every 500 steps (1 ps).
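The statement that the temperatures are exponentially spaced can be checked with a few lines of Python. The sketch below builds a geometric ladder between the lowest and highest temperature and compares it with the values listed above; the function name is an assumption, and the comparison only shows that the listed values agree with a geometric progression to within 1 K.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures with a constant ratio between neighboring replicas."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]

# Temperatures (in K) as listed in the text
reported = [278, 285, 292, 299, 307, 314, 322, 330, 338, 347, 355, 364,
            373, 382, 392, 402, 412, 422, 432, 443, 454, 465, 477, 489,
            501, 514, 526, 539, 553, 566, 581, 595]

ladder = geometric_ladder(278.0, 595.0, len(reported))
max_deviation = max(abs(a - b) for a, b in zip(ladder, reported))
```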
The visualization of metadynamics results was performed using metadynminer and metadynminer3d packages39 or ad hoc scripts.
Raw data (input files and scripts, trajectories without water, and other files) are available online at Zenodo40 (DOI: 10.5281/zenodo.8246334, DOI: 10.5281/zenodo.8246298), GitHub (https://github.com/spiwokv/ptltsne-visualizations) and Plumed Nest27 (https://www.plumed-nest.org/eggs/23/032/).
Results
Parametric Time-Lagged tSNE
The trajectory of Trp-cage folding and unfolding was analyzed using ptltSNE. Analysis was performed on non-hydrogen atoms on 10,439 samples of the 208 μs trajectory. There are multiple hyperparameters for ptltSNE. It is possible to set the lag time, the number and size of the layers of the artificial neural network, the activation functions, the batch size, the shuffle time, and the number of training epochs. We tested different combinations of hyperparameters using a trial and error approach, and we selected the settings described below. It is possible to choose hyperparameters with some modern hyperparameter tuning tools; however, the trial and error approach gave us the opportunity to test the functionality of the software for a wide range of hyperparameters. Nevertheless, an experimental application of a tuning tool confirmed the trial and error selection (data not shown).
The Cartesian coordinates were centered on the reference structure and scaled to be in the range of 0–1. Next, they were transformed to introduce the lag time, as defined above. A lag time of 2 (in the number of frames, i.e., 40 ns) was used. The transformed coordinates were used as input. These signals were processed by a feedforward neural network with three hidden layers, each with 256 neurons with the tanh activation function, without biases. In the end, there was a two-dimensional linear output with biases. Perplexity was set to 30.
The training was carried out by 10,000 backpropagation epochs using the ADAM algorithm with batches of size 1024 (shuffle period 10).
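The network architecture described above, and the derivatives of its outputs with respect to its inputs that are needed to apply bias forces, can be sketched in NumPy. The weights here are random and untrained, the input and hidden layer sizes in the example are reduced for brevity, and the initialization scheme and function names are assumptions; the sketch only shows how the chain rule gives the Jacobian ds/dx for a tanh network without hidden-layer biases and with a linear output layer with biases.

```python
import numpy as np

def init_network(n_in, n_hidden=256, n_layers=3, n_out=2, seed=0):
    """Random (untrained) weights; hidden layers have no biases."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_layers
    hidden = [rng.normal(scale=1.0 / np.sqrt(a), size=(a, b))
              for a, b in zip(sizes[:-1], sizes[1:])]
    w_out = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
    b_out = np.zeros(n_out)
    return hidden, w_out, b_out

def forward_with_jacobian(x, hidden, w_out, b_out):
    """Return the CVs s (n_out,) and the Jacobian ds/dx (n_out, n_in)."""
    h = x
    jac = np.eye(x.size)                  # d(h)/d(x), identity at the input
    for w in hidden:
        h = np.tanh(h @ w)
        # chain rule with d tanh(z)/dz = 1 - tanh(z)^2
        jac = (1.0 - h ** 2)[:, None] * (w.T @ jac)
    s = h @ w_out + b_out                 # linear output layer with biases
    return s, w_out.T @ jac

# Example with 12 inputs scaled to the range 0-1 and a small hidden size
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=12)
params = init_network(n_in=12, n_hidden=16)
s0, jac0 = forward_with_jacobian(x0, *params)

# Force on the inputs for a given bias gradient dV/ds (illustrative values)
dv_ds = np.array([1.0, -0.5])
force_on_inputs = -dv_ds @ jac0
```

The cost of this Jacobian grows with the number and size of the layers, which is one reason to keep the network small when it is evaluated in every simulation step.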
The results of ptltSNE are shown in Figure 1. Analogous figures colored by time are available in the Supporting Information (Figures S1–S11). Each structure sampled in the 208 μs trajectory is represented as a point. Their distribution shows some characteristics of tSNE, time-lagged tSNE, and parametric tSNE. That is, points are clustered into multiple clusters, and within each cluster, they are evenly distributed. This is typical for tSNE, where an even distribution of points is ensured by the perplexity. The central cluster (F) represents unfolded structures. These structures are highly variable (i.e., different in RMSD values). This is the result of time lag, which causes short-lived structures to cluster together. A similar pattern was observed for time-lagged tSNE.16 Finally, the separation of clusters in ptltSNE is not as distinct as that in tSNE or time-lagged tSNE. This can be explained by the fact that a relatively small feedforward neural network was used to transform high-dimensional to low-dimensional data. It was necessary to make a trade-off between the separation of clusters and the performance of the neural network. Transformations of forces from CV space to the space of Cartesian coordinates must happen in every step of metadynamics, so a large neural network would decrease the simulation performance, even with an efficient neural network implementation.
Figure 1.
Dimensionality reduction of the Trp-cage trajectory by ptltSNE. Each point represents one snapshot of the training trajectory colored by RMSD. The simulation started from the folded state. The parameters were the following: 10,000 epochs, lag time 2, 3 hidden layers, each with 256 neurons, tanh activation function, batch size 1024, and shuffle interval 10. The plot shows a low-dimensional representation of the data from the Trp-cage trajectory, and the axes are the two CVs obtained by ptltSNE. The data points are colored based on calculated RMSD from the native structure. Representative structures of the Trp-cage for several parts of the plot are also presented (A–F).
Figure 1 shows the structural representation of the selected clusters. The cluster (A) corresponds to the folded structure. The cluster (B) corresponds to a partially folded structure. The figure follows the pattern observed in time-lagged tSNE analysis16 with the unfolded structure (F) in the center and the folded (A) and other long-lived structures (C–E) located around the central cluster. These long-lived structures (“kinetic traps”) are stabilized by hydrogen bonds. The unfolded structure (F) does not show hydrogen bonds.
Metadynamics
The set of ptltSNE coordinates was applied as CVs in metadynamics. Well-tempered36 1.5 μs metadynamics was performed with two ptltSNE CVs.
The results are depicted in Figure 2. The RMSD profile shows that after unfolding, the system did not fold back to the native structure. There were multiple visits of structures close to the native state, with RMSD from the native structure of approximately 0.4 nm, but missing the key helix. This was reflected in the absence of a minimum at the point corresponding to the folded state on the free energy surface (Figure 3).
Figure 2.
Progress of metadynamics simulation of the Trp-cage with CVs obtained by ptltSNE. The plot shows changes of RMSD from the native structure over time. Several representative structures are shown.
Figure 3.

Free energy surface from metadynamics simulation of Trp-cage with CVs obtained by ptltSNE. Due to the absence of the folding event, this free energy surface does not reflect folding equilibrium.
A possible explanation for the absence of folding in the metadynamics simulation is that the CVs tested cannot efficiently accelerate the formation of α helices. To address this issue, we performed a 1.3 μs metadynamics simulation with three CVs, namely with two ptltSNE CVs (the same as in the previous simulation) and α-RMSD.41 This CV has been developed to accelerate the formation of α helices.
In the metadynamics simulation with the ptltSNE and α-RMSD CVs, we observed folding after approximately 350 ns (Figure 4). The free energy surface is shown in Figure 5. Because we used three CVs, the free energy surface is three-dimensional. The isosurface shows that there are many nonhelical states of the protein at the bottom of the 3D plot (low α-RMSD) and a single native minimum with helix (high α-RMSD). The native structure is visible as a vertical lobe in Figure 5A. Figure 5B shows that there are also multiple high-energy states containing α helices. Isosurfaces at other levels are provided as interactive visualizations (https://github.com/spiwokv/ptltsne-visualizations).
Figure 4.
Progress of metadynamics simulation of Trp-cage with CVs obtained by ptltSNE and α-RMSD. The plot shows changes in RMSD from the native structure over time (the folding event is highlighted by the green frame). Several representative structures are shown.
Figure 5.

Free energy surfaces calculated by metadynamics simulation of Trp-cage with CVs obtained by ptltSNE and α-RMSD. Isosurfaces at +35 (A) and +43 (B) kJ/mol (relative to the global minimum) are presented. A lobe with high values of α-RMSD, corresponding to the folded state, is visible in panel (A).
It is clear that a free energy surface calculated from a simulation with a single folding event cannot be accurate because of the lack of data. For this reason, we tested the ptltSNE CVs (without α-RMSD) in 200 ns parallel tempering metadynamics.38 This method combines parallel tempering42 with metadynamics to accelerate slow motions that are not controlled by CVs. We selected 32 temperatures for replicas from 278 to 595 K.
We observed 10 folding events (Figure 6A) in demultiplexed (demuxed) trajectories of parallel tempering metadynamics (200 ns for each replica). In comparison, there were two folding events in a standard parallel tempering simulation of the same length (results not shown; see the Supporting Information sets).
Figure 6.
Parallel tempering metadynamics. Root-mean-square deviation from the native structure in demultiplexed trajectories is depicted by the color (A). Ten folding events can be seen in the figure (folded states are highlighted by magenta frames). One-dimensional free energy surfaces at different temperatures are depicted by different colors (B). Free energy surface at 292 K was calculated from the bias potential (C). Free energy surface at 290 K calculated from the training 208 μs trajectory24 (D).
For a better comparison of free energy surfaces at different temperatures, we converted 2D surfaces to 1D. This was done by conversion of the 2D free energy surface to the 2D probability profile, followed by integration with respect to one CV and conversion back to free energy. These 1D free energy profiles are shown for all temperatures in Figure 6B. Clearly, free energy surfaces at low temperatures show a deep minimum corresponding to the folded state (ptltSNE1 at 6–7), whereas free energy surfaces at high temperatures are broad with a single minimum at the center.
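The conversion from a 2D to a 1D free energy surface described above can be sketched as follows. The grid, the test surface, and the function name are assumptions for illustration; the integration over the second CV is done with a log-sum-exp for numerical stability, and the resulting profile is shifted so that its minimum is zero.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def marginalize_fes(fes_2d, ds2, temperature):
    """Integrate out the second CV of a free energy surface.

    fes_2d[i, j] is F(s1_i, s2_j) in kJ/mol on a regular grid with spacing
    ds2 along the second CV. Returns F(s1_i) with its minimum set to zero.
    """
    kt = KB * temperature
    a = -fes_2d / kt                               # log of unnormalized probability
    a_max = a.max(axis=1, keepdims=True)
    log_p = a_max[:, 0] + np.log(np.sum(np.exp(a - a_max), axis=1) * ds2)
    fes_1d = -kt * log_p                           # back to free energy
    return fes_1d - fes_1d.min()

# Test surface: a sum of two independent harmonic terms
s1 = np.linspace(-3.0, 3.0, 61)
s2 = np.linspace(-2.0, 2.0, 41)
f_s1 = 5.0 * s1 ** 2
f_s2 = 3.0 * s2 ** 2
fes_2d = f_s1[:, None] + f_s2[None, :]
fes_1d = marginalize_fes(fes_2d, ds2=s2[1] - s2[0], temperature=292.0)
```

For a surface that is a sum of independent terms, the integration over the second CV only adds a constant, so the 1D profile reproduces the first term.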
The free energy surface at 292 K (Figure 6C) was compared with the free energy surface obtained from the training trajectory (Figure 6D, 290 K). These free energy surfaces were obtained by different methods (scaled negative image of the metadynamics bias potential vs conversion of a 2D histogram to the free energy surface); therefore, they greatly differ in smoothness. However, both free energy surfaces are very flat (except for the minimum in the folded state in Figure 6C). Overall, despite different methods, force fields, lengths of simulations, and slightly different temperatures (292 vs 290 K), they are in very good agreement.
A common misconception in molecular modeling is the estimation of relative populations of states solely by comparing the depths of the free energy minima. This approach neglects the effect of the widths of the free energy minima. To avoid this, it is necessary to account for different widths of the free energy basins.43 This is especially important for tSNE (as well as the parametric and time-lagged variants) because of its tendency to uniformly distribute data points in low-dimensional space. The uniform distribution of points in the low-dimensional space makes the resulting free energy surface flat. This makes the effect of the widths of the free energy minima important.
Structural interpretation of ptltSNE CVs is not intuitive; therefore, we recalculated the free energy surface with a new CV, namely, RMSD from the native structure. The new free energy profile can be obtained by a reweighting procedure, i.e., by estimating unbiased probabilities from biased simulations. The results of parallel tempering metadynamics were reweighted by umbrella sampling reweighting44 with a Tiwary–Parrinello correction for the time dependence of the metadynamic bias potential.45 RMSD less than 0.35 nm was considered as the folded state. The first 1 ns was discarded as an equilibration. The results expressed as the free energy surface in the space of RMSD and the fraction of the folded state are shown in Figure 7. The free energy surface calculated from the training trajectory is provided in Figure 7A (note that these simulations differ in force field and slightly in temperature). The error bars in Figure 7B represent the standard error of the mean calculated in ten equally sized time windows.
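The reweighting step can be illustrated by a minimal sketch. It assumes that the RMSD, the bias potential acting on each frame, and the time-dependent offset c(t) of the Tiwary–Parrinello correction are already available as arrays (in practice they would come from the Plumed output); the function name and the synthetic data are assumptions. Each frame receives a weight proportional to exp((V − c(t))/kT), and the standard error is estimated from ten time windows.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def folded_fraction(rmsd, bias, c_t, temperature, cutoff=0.35, n_windows=10):
    """Reweighted fraction of frames with RMSD below cutoff (nm) and its SEM."""
    kt = KB * temperature
    log_w = (bias - c_t) / kt
    w = np.exp(log_w - log_w.max())               # shift avoids overflow
    folded = (rmsd < cutoff).astype(float)
    total = np.sum(w * folded) / np.sum(w)
    # Weighted folded fraction in each of the (nearly) equally sized windows
    per_window = np.array([
        np.sum(wi * fi) / np.sum(wi)
        for wi, fi in zip(np.array_split(w, n_windows),
                          np.array_split(folded, n_windows))
    ])
    sem = per_window.std(ddof=1) / np.sqrt(n_windows)
    return total, sem

# Synthetic data: random RMSD values and two choices of the bias
rng = np.random.default_rng(0)
rmsd = rng.uniform(0.1, 1.0, size=1000)
zeros = np.zeros_like(rmsd)
frac_unbiased, sem_unbiased = folded_fraction(rmsd, zeros, zeros, 292.0)
bias_on_folded = np.where(rmsd < 0.35, 5.0, 0.0)  # kJ/mol, illustrative only
frac_biased, sem_biased = folded_fraction(rmsd, bias_on_folded, zeros, 292.0)
```

With zero bias the result reduces to the plain fraction of folded frames; frames that were sampled under a higher bias receive a larger weight.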
Figure 7.
Reweighting of parallel tempering simulation. The free energy surface at 292 K (blue) is compared to the free energy calculated from the reference 208 μs trajectory (red, 290 K). Both free energy surfaces were calculated with RMSD from the native structure as the collective variable (A). Fraction of folded state as a function of temperature (mean ± SEM) predicted by reweighting the results of parallel tempering metadynamics (B).
The folded state accounts for tens of percent at low temperatures. The fraction of the folded state decreases with the temperature, as expected, except for noise. Its population is negligible at temperatures higher than 400 K.
CVs and enhanced sampling methods can be evaluated based on exploration of conformational space. For this purpose, we took the first 10 ns from each simulation and calculated the cumulative number of conformational clusters sampled in the simulation (Figure 8). An unbiased MD simulation was performed for comparison. These trajectories were analyzed using the command gmx cluster with the Gromos clustering method46 and the RMSD cutoff set to 0.1 nm. As expected, the unbiased simulation explored the lowest number of unique conformations (clusters), namely 91. Metadynamics with ptltSNE and with the combination of ptltSNE and α-RMSD explored 351 and 494 clusters, respectively. Parallel tempering and parallel tempering metadynamics explored 900 and 1215 clusters, respectively. Clustering was performed on nondemultiplexed trajectories; therefore, replicas at all temperatures contributed to sampling. In conclusion, metadynamics with ptltSNE CVs significantly enhances the exploration of the conformational space.
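The cumulative cluster count can be sketched with a small Gromos-style clustering on a precomputed distance matrix. This is not the gmx cluster implementation: the function names are assumptions, the toy data are one-dimensional points instead of pairwise RMSD values, and the inclusive cutoff is a choice of this sketch.

```python
import numpy as np

def gromos_clusters(dist, cutoff):
    """Gromos-style clustering on an (N, N) distance matrix.

    Repeatedly takes the unassigned frame with the most unassigned
    neighbors within the cutoff as a cluster center and removes it
    together with those neighbors.
    """
    n = dist.shape[0]
    neighbors = dist <= cutoff
    unassigned = np.ones(n, dtype=bool)
    labels = np.full(n, -1)
    cluster_id = 0
    while unassigned.any():
        counts = (neighbors & unassigned[None, :]).sum(axis=1)
        counts[~unassigned] = -1                  # assigned frames cannot be centers
        center = int(np.argmax(counts))
        members = neighbors[center] & unassigned
        labels[members] = cluster_id
        unassigned &= ~members
        cluster_id += 1
    return labels

def cumulative_cluster_count(labels):
    """Number of distinct clusters visited up to each frame."""
    seen = set()
    counts = []
    for lab in labels:
        seen.add(int(lab))
        counts.append(len(seen))
    return counts

# Toy trajectory: seven frames described by a single coordinate
points = np.array([0.0, 0.01, 0.02, 1.0, 1.01, 0.03, 2.0])
dist = np.abs(points[:, None] - points[None, :])
labels = gromos_clusters(dist, cutoff=0.1)
explored = cumulative_cluster_count(labels)
```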
Figure 8.

Evolution of the total number of clusters explored in the first 10 ns of simulations by different methods and CVs. The number of clusters was calculated on nondemultiplexed trajectories from parallel tempering and parallel tempering metadynamics.
Discussion
Parametric time-lagged tSNE has its advantages compared with other dimensionality reduction methods. Similarly to tSNE, it provides a very good dimensionality reduction, likely better than principal component analysis (PCA) or TICA. It is difficult to compare the performance of different linear and nonlinear dimensionality reduction methods. For this purpose, we proposed a simple method. We calculated the top 20 nearest neighbors of each snapshot of the training trajectory (10,439 frames from the 208.8 μs trajectory) in terms of RMSD values (i.e., in high-dimensional space). Then we calculated the top 20 nearest neighbors in the low-dimensional space obtained by each method. Finally, for each method, we calculated the number of common top 20 neighbors in high- and low-dimensional space. The plot (Figure S12) presents mean values for each method. Clearly, tSNE provides the best dimensionality reduction. The performance of the time-lagged tSNE is significantly lower. This is the price paid for the focus on the kinetic aspect. The performance of parametric time-lagged tSNE was comparable to that of time-lagged tSNE. The performance of PCA was lower than that of nonlinear methods. There was a drop in performance from PCA to TICA, analogous to the drop in performance from tSNE to time-lagged tSNE. In conclusion, ptltSNE provides very good dimensionality reduction while focusing on slow processes.
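The neighbor-overlap measure described above can be sketched in a few lines. Euclidean distances between random points stand in for pairwise RMSD values, and the "embedding" simply keeps the first two coordinates; both, as well as the function names, are assumptions for illustration.

```python
import numpy as np

def knn_indices(dist, k):
    """Indices of the k nearest neighbors of every point (self excluded)."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def mean_neighbor_overlap(dist_high, dist_low, k=20):
    """Mean number of shared top-k neighbors in high- and low-dimensional space."""
    high = knn_indices(dist_high, k)
    low = knn_indices(dist_low, k)
    overlaps = [len(set(h.tolist()) & set(l.tolist())) for h, l in zip(high, low)]
    return float(np.mean(overlaps))

def pairwise_distances(points):
    """Euclidean distance matrix, used here as a stand-in for RMSD."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

rng = np.random.default_rng(0)
x_high = rng.normal(size=(200, 10))
dist_high = pairwise_distances(x_high)
dist_low = pairwise_distances(x_high[:, :2])      # crude two-dimensional projection
overlap_self = mean_neighbor_overlap(dist_high, dist_high)
overlap_proj = mean_neighbor_overlap(dist_high, dist_low)
```

A perfect embedding would give a value equal to k; lower values indicate that more neighborhoods are lost in the low-dimensional space.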
In this work, we selected lag time by visual inspection of ptltSNE plots constructed with different lag time values (see the Supporting Information, Figures S13–S19). The lag time should be short enough so that the analysis captures the processes of interest, but long enough to provide (close to) Markovian dynamics in the low-dimensional space. The lag time of 2 or 3 frames (40 or 60 ns) provides a good resolution of multiple long-lived states from the unfolded state. Higher lag times (80, 100, and 500 ns) provide better separation between the folded and unfolded states, but other long-lived states are poorly separated. We can speculate that simulations with a longer lag time would be better at folding the Trp-cage but worse at exploring its possible conformations.
The common practice for choosing a lag time for construction of Markov state models is by plotting implied time scales as a function of lag time.47 The choice can be verified using the Chapman–Kolmogorov test.48 Here, we cannot directly apply the same method because we dimensionally reduce trajectories but do not cluster their frames into separate microstates. Our choice of lag time (40 ns) is close to the lag time selected for the analysis of the same trajectory by two recent Markov state models (20 and 100 ns).49,50 Since ptltSNE is based on the same foundations as TICA, we believe that lag time was properly chosen in our study.
Another important tSNE parameter is the perplexity. Again, perplexity was selected by a trial and error approach. The results of ptltSNE with different perplexity values are presented in Supporting Information (Figures S20–S26). Low perplexity values (2–10) provide TICA-like plots with low unfolding of the high-dimensional manifold. Higher values (20–100) provide a good resolution of the analyzed structures.
Visual comparison of plots from the tSNE and time-lagged tSNE (see ref (16)) with Figure 1 indicates that the use of the neural network to map high-dimensional structures to low-dimensional space does not come for free. The price paid for this is a worse separation of the individual states. Potentially, this can be addressed using a neural network with a higher number of hidden layers and neurons in each layer.
Parametric time-lagged tSNE provides a low-dimensional representation pattern similar to time-lagged tSNE.16 The unfolded state represents the central cluster of points (F in Figure 1). These states are characterized by RMSD values different from those of the native structures (different colors in the plot). This is clearly the effect of the time lag. The conformational changes in the unfolded state are rapid. Therefore, structures distant in RMSD are not distant when analyzed by kinetic methods such as TICA, time-lagged tSNE, or parametric time-lagged tSNE. The native structure (A), the close-to-native structure (B), and other kinetic traps (C–E) surround the unfolded state. In contrast, low-dimensional embeddings from tSNE without time lag show a relatively strong correlation with RMSD from the native structure, including for the unfolded state.16
As explained in the Introduction, tSNE does not attempt to reconstruct high-dimensional distances in the low-dimensional space. Instead, it reconstructs proximities. Points close in the high-dimensional space are close in the tSNE low-dimensional space. Analogously, time-lagged tSNE and ptltSNE do not reconstruct “kinetic distances.” Instead, simulation snapshots that are “kinetically close” (i.e., states with fast mutual transitions) are close in the ptltSNE low-dimensional space. For example, this is the reason behind the proximities of very unstable snapshots of the unfolded state (F in Figure 1). These snapshots are highly variable in terms of RMSD from the native structure (see the different colors of the points in cluster F in Figure 1). Because of their low stability and fast mutual transitions, they are clustered by ptltSNE.
Reconstruction of kinetic distances is provided, for example, by unsupervised machine learning in the recently introduced spectral map method.51 Other approaches, such as SGOOP-d52 or VAMP nets53 can also estimate the kinetic distances from simulation trajectories.
No folding was observed in the metadynamics simulation with parametric time-lagged tSNE CVs; however, we observed folding with the combination of parametric time-lagged tSNE CVs with α-RMSD. On average, there was approximately one folding every 17 μs in the published unbiased simulation,24 which was used as a training data set. We observed one folding in 1.3 μs, which corresponds to 1 order of magnitude acceleration by metadynamics compared to an unbiased MD simulation. In parallel tempering metadynamics (with parametric time-lagged tSNE CVs, without α-RMSD), we observed ten folding events in 200 ns simulation (in total 6.4 μs in all replicas). This corresponds to an acceleration of 1–2 orders of magnitude by parallel tempering metadynamics compared with an unbiased MD simulation. Furthermore, our enhanced sampling simulations explored a wider range of conformations than unbiased simulations (Figure 8).
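The acceleration estimates above follow from simple arithmetic on the numbers quoted in the text, as the following short calculation shows. The variable names are chosen for illustration only.

```python
import math

# Simulated time per folding event, in microseconds
unbiased_time_per_fold = 17.0            # reference unbiased trajectory
metad_time_per_fold = 1.3 / 1            # one folding event in 1.3 us
ptmetad_total_time = 32 * 0.2            # 32 replicas x 200 ns
ptmetad_time_per_fold = ptmetad_total_time / 10   # ten folding events

speedup_metad = unbiased_time_per_fold / metad_time_per_fold      # about 13
speedup_ptmetad = unbiased_time_per_fold / ptmetad_time_per_fold  # about 27

# Expressed as orders of magnitude
orders_metad = math.log10(speedup_metad)
orders_ptmetad = math.log10(speedup_ptmetad)
```

Both estimates rest on very few folding events, so they indicate the order of magnitude rather than a precise speedup.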
The predicted population of the folded state (Figure 7) agrees well with the reference long simulations,24 where the folded state represented 25% of 208 μs at 290 K.
Folding of the Trp-cage was also studied by Meshkin and Zhu using umbrella sampling simulations, with the fraction of native contacts as the collective variable. Using approximately 100 μs of simulation, it was possible to reconstruct the folding free energy surface in great detail.54
Our approach uses training data from a long unbiased simulation. Therefore, this study suffers from the “chicken-and-egg” problem, that is, it uses a well-sampled trajectory as training data to enhance sampling. In our opinion, it is important to use training data that cover the relevant conformational space of the studied miniprotein but are not necessarily sampled with an accurate distribution. Metadynamics and parallel tempering metadynamics can correct for inaccuracies in the distribution of training data. Our laboratory develops and tests methods for the generation of such training data without the need to run long molecular simulations.
It is possible to address the “chicken-and-egg” problem by biased simulations, namely, by a coarse biased simulation, analysis of the trajectory to obtain unbiased CVs, and recalculation of the free energy surface by a biased simulation with the new CVs. A similar approach was recently introduced by Rydzewski and Valsson55 in their multiscale reweighted stochastic embedding (MRSE) method. The method is based on parametric (not time-lagged) tSNE. It integrates the generation of training data, construction of a Gaussian mixture probability model, and reweighting to obtain unbiased CVs. The method was tested on a model energy profile, an alanine dipeptide, and an alanine tetrapeptide. An approach based on biased sampling and CV reweighting with anisotropic diffusion maps as the dimensionality reduction method has been tested by Rydzewski and co-workers.56 In general, CVs obtained by unsupervised machine learning from data obtained by biased simulations are biased in terms of the density of states and geometry.11,55−57 This must be kept in mind when addressing the “chicken-and-egg” problem.
A related problem concerns the temperature at which the training trajectory was obtained. If the simulation used to obtain the CVs and the subsequent biased simulation are performed at different temperatures, the effect may be similar to that of biasing the training simulation. To the best of our knowledge, this issue has not been systematically studied. We used a trajectory sampled at 290 K. The resulting CVs were applied in metadynamics at 300 K and in parallel tempering metadynamics at 278–595 K. This fact must be taken into account when the results are interpreted.
It is difficult to identify the reason why no folding was observed in metadynamics with two ptltSNE CVs. Potential factors include the reduced size of the neural network (because of computational constraints), the small number of ptltSNE CVs (it is possible that helix formation can be efficiently described by three CVs), or other factors.
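Reweighting in this context means assigning each snapshot from a biased simulation a weight proportional to exp(V/kT), where V is the bias potential acting on that snapshot. The following minimal sketch assumes a static (time-independent) bias; the time-dependent bias of metadynamics requires a dedicated estimator, such as that of ref 45. The function name `reweighted_histogram` and the toy data are hypothetical.

```python
import numpy as np

KB_KJ_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def reweighted_histogram(cv, bias, temperature, bins):
    """Unbiased probability density of a CV from statically biased samples.

    cv           CV value of each snapshot
    bias         bias potential acting on each snapshot, in kJ/mol
    temperature  temperature in kelvin
    bins         bin edges passed to numpy.histogram
    """
    cv = np.asarray(cv, dtype=float)
    bias = np.asarray(bias, dtype=float)
    beta = 1.0 / (KB_KJ_MOL_K * temperature)
    # Subtract the maximum bias before exponentiation for numerical stability;
    # the constant cancels on normalization.
    weights = np.exp(beta * (bias - bias.max()))
    density, edges = np.histogram(cv, bins=bins, weights=weights, density=True)
    return density, edges


# Toy example: the biased run visits states 0 and 1 equally often, but
# state 1 was penalized by kT ln 3, so it is three times more populated
# in the unbiased ensemble.
temperature = 300.0
kt_ln3 = KB_KJ_MOL_K * temperature * np.log(3.0)
toy_cv = np.array([0.0, 0.0, 1.0, 1.0])
toy_bias = np.array([0.0, 0.0, kt_ln3, kt_ln3])

density, edges = reweighted_histogram(toy_cv, toy_bias, temperature,
                                      bins=[-0.5, 0.5, 1.5])
print(density)
```

The same weights can be supplied to the training of a dimensionality reduction method, which is the idea behind the reweighted embeddings cited above.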
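To illustrate how the temperatures of a parallel tempering simulation relate to the 290 K training temperature, the sketch below builds a geometric temperature ladder over the range quoted above. The geometric spacing is an assumption made here for illustration (it is a common choice but is not stated in this section); the number of replicas (32) is taken from the Abstract, and the function name `geometric_ladder` is hypothetical.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures with a constant ratio between neighboring replicas."""
    if n_replicas < 2:
        raise ValueError("at least two replicas are required")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


# ASSUMPTION: geometric spacing between 278 K and 595 K with 32 replicas.
ladder = geometric_ladder(278.0, 595.0, 32)

# Replica temperature closest to the 290 K training trajectory.
training_temperature = 290.0
closest_to_training = min(ladder, key=lambda t: abs(t - training_temperature))

print(", ".join(f"{t:.1f}" for t in ladder))
print(f"closest to training temperature: {closest_to_training:.1f} K")
```

Under this assumption, only the few lowest replicas are near the training temperature, and most replicas sample at temperatures far above it.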
Addition of α-RMSD made it possible to observe one folding event in the metadynamics simulation. Figure S27 (Figure 1 colored by α-RMSD) shows that the folded state is well separated in α-RMSD as well as in the ptltSNE CVs. Therefore, it is difficult to understand why metadynamics without α-RMSD did not lead to folding. It must be kept in mind that the separation of key states is not the only requirement for an efficient collective variable. It is possible that α-RMSD captures the transition states of helix formation better than the ptltSNE CVs. On the other hand, α-RMSD is likely to accelerate only the folding and unfolding of α-helices.
In this study, we used a general measure to calculate the distance between structures (∥xi – xj∥2 in eq 1), namely, mean square deviation. It is possible to use more sophisticated measures, such as smooth overlap of atomic positions (SOAP).58 This would be necessary for processes in which the permutation of atoms matters, for example, in simulations of phase transitions. It is also possible to use other structural data, such as interatomic distances or dihedral angles (with proper treatment of periodicity).
In this study, we demonstrated that our method can accelerate the folding of miniproteins. Because fast-folding miniproteins are well studied by unbiased simulations and experimental methods, and their native structures can be easily predicted by AlphaFold,59 we see the main potential of our CVs in conformational free energy modeling of important parts (of the size of miniproteins) of regularly sized proteins. There are many unexplored mobile elements, for example, activation loops, that control the biological activity of many proteins. The approach can also help to refine some parts of proteins or to find cryptic pockets.
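The remark on periodicity can be made concrete. A common way to use dihedral angles as input features is to replace each angle by its cosine and sine, so that a squared Euclidean distance treats angles near +180° and −180° as close. The sketch below is a generic illustration, not the featurization used in this study; the function names are hypothetical.

```python
import numpy as np


def dihedral_features(angles_rad):
    """Map dihedral angles (radians) to periodic features [cos, sin]."""
    angles = np.asarray(angles_rad, dtype=float)
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=-1)


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))


# Two conformations that differ by only 2 degrees across the periodic boundary.
conf_a = np.deg2rad([179.0])
conf_b = np.deg2rad([-179.0])

naive = squared_distance(conf_a, conf_b)
periodic = squared_distance(dihedral_features(conf_a),
                            dihedral_features(conf_b))

print(f"naive squared distance:    {naive:.3f}")
print(f"periodic squared distance: {periodic:.6f}")
```

For a single angle, the periodic squared distance equals 2 − 2 cos(Δ), where Δ is the angular difference, so it vanishes smoothly as the two angles approach each other from either side of the boundary.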
Acknowledgments
The authors thank D.E. Shaw Research for providing us with the trajectories. The work was supported by the Czech Science Foundation (22-29667S). Computational resources were provided by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) and the ELIXIR-CZ project (LM2018131) supported by the Ministry of Education, Youth, and Sports of the Czech Republic.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpcb.3c05669.
Comparison of PyTorch modules, dimensionality reduction of Trp-cage trajectory by ptltSNE colored by time, comparison of performance of different methods, ptltSNE with different values of lag time or perplexity, and ptltSNE colored by α-RMSD (PDF)
The authors declare no competing financial interest.
Special Issue
Published as part of The Journal of Physical Chemistry B virtual special issue “Machine Learning in Physical Chemistry Volume 2”.
References
- Bock L. V.; Gabrielli S.; Kolář M. H.; Grubmüller H. Simulation of Complex Biomolecular Systems: The Ribosome Challenge. Annu. Rev. Biophys. 2023, 52, 361–390. 10.1146/annurev-biophys-111622-091147.
- Šponer J.; Bussi G.; Krepl M.; Banáš P.; Bottaro S.; Cunha R. A.; Gil-Ley A.; Pinamonti G.; Poblete S.; Jurečka P.; et al. RNA Structural Dynamics As Captured by Molecular Simulations: A Comprehensive Overview. Chem. Rev. 2018, 118, 4177–4338. 10.1021/acs.chemrev.7b00427.
- Laio A.; Parrinello M. Escaping free-energy minima. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12562–12566. 10.1073/pnas.202427399.
- Amadei A.; Linssen A. B. M.; Berendsen H. J. C. Essential dynamics of proteins. Proteins 1993, 17, 412–425. 10.1002/prot.340170408.
- Spiwok V.; Lipovová P.; Králová B. Metadynamics in Essential Coordinates: Free Energy Simulation of Conformational Changes. J. Phys. Chem. B 2007, 111, 3073–3076. 10.1021/jp068587c.
- Tribello G. A.; Ceriotti M.; Parrinello M. Using sketch-map coordinates to analyze and bias molecular dynamics simulations. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 5196–5201. 10.1073/pnas.1201152109.
- Spiwok V.; Králová B. Metadynamics in the conformational space nonlinearly dimensionally reduced by Isomap. J. Chem. Phys. 2011, 135, 224504. 10.1063/1.3660208.
- Sultan M.; Pande V. S. tICA-Metadynamics: Accelerating Metadynamics by Using Kinetically Selected Collective Variables. J. Chem. Theory Comput. 2017, 13, 2440–2447. 10.1021/acs.jctc.7b00182.
- Bonati L.; Piccini G.; Parrinello M. Deep learning the slow modes for rare events sampling. Proc. Natl. Acad. Sci. U.S.A. 2021, 118, e2113533118. 10.1073/pnas.2113533118.
- Glielmo A.; Husic B. E.; Rodriguez A.; Clementi C.; Noé F.; Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem. Rev. 2021, 121, 9722–9758. 10.1021/acs.chemrev.0c01195.
- Rydzewski J.; Chen M.; Valsson O. Manifold Learning in Atomistic Simulations: A Conceptual Review. Mach. Learn.: Sci. Technol. 2023, 4, 031001. 10.1088/2632-2153/ace81a.
- Hinton G. E.; Roweis S. In Advances in Neural Information Processing Systems; Becker S., Thrun S., Obermayer K., Eds.; MIT Press, 2002; Vol. 15; Chapter Stochastic Neighbor Embedding, pp 857–864.
- van der Maaten L.; Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Zhou H.; Wang F.; Tao P. t-Distributed Stochastic Neighbor Embedding Method with the Least Information Loss for Macromolecular Simulations. J. Chem. Theory Comput. 2018, 14, 5499–5510. 10.1021/acs.jctc.8b00652.
- Tribello G. A.; Gasparotto P. Using Dimensionality Reduction to Analyze Protein Trajectories. Front. Mol. Biosci. 2019, 6, 46. 10.3389/fmolb.2019.00046.
- Spiwok V.; Kříž P. Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE) of Molecular Simulation Trajectories. Front. Mol. Biosci. 2020, 7, 132. 10.3389/fmolb.2020.00132.
- van der Maaten L. In Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics; van Dyk D., Welling M., Eds.; PMLR: Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 2009; Vol. 5; Chapter Learning a Parametric Embedding by Preserving Local Structure, pp 384–391.
- Molgedey L.; Schuster H. G. Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett. 1994, 72, 3634–3637. 10.1103/PhysRevLett.72.3634.
- Naritomi Y.; Fuchigami S. Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis: The case of domain motions. J. Chem. Phys. 2011, 134, 065101. 10.1063/1.3554380.
- Chollet F. Keras, 2015. https://keras.io (accessed May 25 2019).
- Abadi M.; Agarwal A.; Barham P.; Brevdo E.; Chen Z.; Citro C.; Corrado G. S.; Davis A.; Dean J.; Devin M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. https://www.tensorflow.org/ (accessed May 25 2019).
- Paszke A.; Gross S.; Massa F.; Lerer A.; Bradbury J.; Chanan G.; Killeen T.; Lin Z.; Gimelshein N.; Antiga L.; et al. Advances in Neural Information Processing Systems 32; Curran Associates, Inc., 2019, pp 8024–8035.
- Hyvärinen A.; Karhunen J.; Oja E. Independent Component Analysis; John Wiley & Sons, Ltd, 2001; Chapter 7, pp 145–164.
- Lindorff-Larsen K.; Piana S.; Dror R. O.; Shaw D. E. How fast-folding proteins fold. Science 2011, 334, 517–520. 10.1126/science.1208351.
- Abraham M. J.; Murtola T.; Schulz R.; Páll S.; Smith J. C.; Hess B.; Lindahl E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1–2, 19–25. 10.1016/j.softx.2015.06.001.
- Tribello G. A.; Bonomi M.; Branduardi D.; Camilloni C.; Bussi G. PLUMED 2: New feathers for an old bird. Comput. Phys. Commun. 2014, 185, 604–613. 10.1016/j.cpc.2013.09.018.
- Promoting transparency and reproducibility in enhanced molecular simulations. Nat. Methods 2019, 16, 670–673. 10.1038/s41592-019-0506-8.
- Bonati L.; Trizio E.; Rizzi A.; Parrinello M. A unified framework for machine learning collective variables for enhanced sampling simulations: mlcolvar. J. Chem. Phys. 2023, 159, 014801. 10.1063/5.0156343.
- Neidigh J. W.; Fesinmeyer R. M.; Andersen N. H. Designing a 20-residue protein. Nat. Struct. Mol. Biol. 2002, 9, 425–430. 10.1038/nsb798.
- Barua B.; Lin J. C.; Williams V. D.; Kummler P.; Neidigh J. W.; Andersen N. H. The Trp-cage: optimizing the stability of a globular miniprotein. Protein Eng., Des. Sel. 2008, 21, 171–185. 10.1093/protein/gzm082.
- Lindorff-Larsen K.; Piana S.; Palmo K.; Maragakis P.; Klepeis J. L.; Dror R. O.; Shaw D. E. Improved Side-Chain Torsion Potentials for the Amber ff99SB Protein Force Field. Proteins 2010, 78, 1950–1958. 10.1002/prot.22711.
- Chatterjee S.; Debenedetti P. G.; Stillinger F. H.; Lynden-Bell R. M. A computational investigation of thermodynamics, structure, dynamics and solvation behavior in modified water models. J. Chem. Phys. 2008, 128, 124511. 10.1063/1.2841127.
- Hess B.; Bekker H.; Berendsen H. J. C.; Fraaije J. G. E. M. LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem. 1997, 18, 1463–1472. 10.1002/(SICI)1096-987X(199709)18:12<1463::AID-JCC4>3.0.CO;2-H.
- Bussi G.; Donadio D.; Parrinello M. Canonical Sampling Through Velocity Rescaling. J. Chem. Phys. 2007, 126, 014101. 10.1063/1.2408420.
- Parrinello M.; Rahman A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 1981, 52, 7182–7190. 10.1063/1.328693.
- Barducci A.; Bussi G.; Parrinello M. Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method. Phys. Rev. Lett. 2008, 100, 020603. 10.1103/PhysRevLett.100.020603.
- Laio A.; Rodriguez-Fortea A.; Gervasio F. L.; Ceccarelli M.; Parrinello M. Assessing the Accuracy of Metadynamics. J. Phys. Chem. B 2005, 109, 6714–6721. 10.1021/jp045424k.
- Bussi G.; Gervasio F. L.; Laio A.; Parrinello M. Free-Energy Landscape for β Hairpin Folding from Combined Parallel Tempering and Metadynamics. J. Am. Chem. Soc. 2006, 128, 13435–13441. 10.1021/ja062463w.
- Trapl D.; Spiwok V. Analysis of the Results of Metadynamics Simulations by metadynminer and metadynminer3d. The R Journal 2022, 14, 46–58. 10.32614/RJ-2022-057.
- European Organization For Nuclear Research OpenAIRE, Zenodo. 2013; https://www.zenodo.org/ (accessed Aug 14 2023).
- Pietrucci F.; Laio A. A Collective Variable for the Efficient Exploration of Protein Beta-Sheet Structures: Application to SH3 and GB1. J. Chem. Theory Comput. 2009, 5, 2197–2201. 10.1021/ct900202f.
- Sugita Y.; Okamoto Y. Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 1999, 314, 141–151. 10.1016/S0009-2614(99)01123-9.
- Dietschreit J. C. B.; Diestler D. J.; Ochsenfeld C. How to obtain reaction free energies from free-energy profiles. J. Chem. Phys. 2022, 156, 114105. 10.1063/5.0083423.
- Torrie G.; Valleau J. Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comput. Phys. 1977, 23, 187–199. 10.1016/0021-9991(77)90121-8.
- Tiwary P.; Parrinello M. A Time-Independent Free Energy Estimator for Metadynamics. J. Phys. Chem. B 2015, 119, 736–742. 10.1021/jp504920s.
- Daura X.; Gademann K.; Jaun B.; Seebach D.; van Gunsteren W. F.; Mark A. E. Peptide Folding: When Simulation Meets Experiment. Angew. Chem., Int. Ed. 1999, 38, 236–240. 10.1002/(SICI)1521-3773(19990115)38:1/2<236::AID-ANIE236>3.0.CO;2-M.
- Swope W. C.; Pitera J. W.; Suits F. Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory. J. Phys. Chem. B 2004, 108, 6571–6581. 10.1021/jp037421y.
- Noé F.; Schütte C.; Vanden-Eijnden E.; Reich L.; Weikl T. R. Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations. Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 19011–19016. 10.1073/pnas.0905466106.
- Sidky H.; Chen W.; Ferguson A. L. High-Resolution Markov State Models for the Dynamics of Trp-Cage Miniprotein Constructed Over Slow Folding Modes Identified by State-Free Reversible VAMPnets. J. Phys. Chem. B 2019, 123, 7999–8009. 10.1021/acs.jpcb.9b05578.
- Suárez E.; Wiewiora R. P.; Wehmeyer C.; Noé F.; Chodera J. D.; Zuckerman D. M. What Markov State Models Can and Cannot Do: Correlation versus Path-Based Observables in Protein-Folding Models. J. Chem. Theory Comput. 2021, 17, 3119–3133. 10.1021/acs.jctc.0c01154.
- Rydzewski J. Spectral Map: Embedding Slow Kinetics in Collective Variables. J. Phys. Chem. Lett. 2023, 14, 5216–5220. 10.1021/acs.jpclett.3c01101.
- Tsai S.-T.; Smith Z.; Tiwary P. SGOOP-d: Estimating Kinetic Distances and Reaction Coordinate Dimensionality for Rare Event Systems from Biased/Unbiased Simulations. J. Chem. Theory Comput. 2021, 17, 6757–6765. 10.1021/acs.jctc.1c00431.
- Mardt A.; Pasquali L.; Wu H.; Noé F. VAMPnets for deep learning of molecular kinetics. Nat. Commun. 2018, 9, 5. 10.1038/s41467-017-02388-1.
- Meshkin H.; Zhu F. Thermodynamics of Protein Folding Studied by Umbrella Sampling along a Reaction Coordinate of Native Contacts. J. Chem. Theory Comput. 2017, 13, 2086–2097. 10.1021/acs.jctc.6b01171.
- Rydzewski J.; Valsson O. Multiscale Reweighted Stochastic Embedding: Deep Learning of Collective Variables for Enhanced Sampling. J. Phys. Chem. A 2021, 125, 6286–6302. 10.1021/acs.jpca.1c02869.
- Rydzewski J.; Chen M.; Ghosh T. K.; Valsson O. Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations. J. Chem. Theory Comput. 2022, 18, 7179–7192. 10.1021/acs.jctc.2c00873.
- Rydzewski J. Selecting High-Dimensional Representations of Physical Systems by Reweighted Diffusion Maps. J. Phys. Chem. Lett. 2023, 14, 2778–2783. 10.1021/acs.jpclett.3c00265.
- De S.; Bartók A. P.; Csányi G.; Ceriotti M. Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys. 2016, 18, 13754–13769. 10.1039/C6CP00415F.
- Jumper J.; Evans R.; Pritzel A.; Green T.; Figurnov M.; Ronneberger O.; Tunyasuvunakool K.; Bates R.; Žídek A.; Potapenko A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. 10.1038/s41586-021-03819-2.