Proceedings of the National Academy of Sciences of the United States of America. 2010 Jul 19;107(31):13597–13602. doi: 10.1073/pnas.1003293107

Systematic determination of order parameters for chain dynamics using diffusion maps

Andrew L Ferguson, Athanassios Z Panagiotopoulos, Pablo G Debenedetti, Ioannis G Kevrekidis
PMCID: PMC2922286  PMID: 20643962

Abstract

We employ the diffusion map approach as a nonlinear dimensionality reduction technique to extract a dynamically relevant, low-dimensional description of n-alkane chains in the ideal-gas phase and in aqueous solution. In the case of C8 we find the dynamics to be governed by torsional motions. For C16 and C24 we extract three global order parameters with which we characterize the fundamental dynamics, and determine that the low free-energy pathway of globular collapse proceeds by a “kink and slide” mechanism, whereby a bend near the end of the linear chain migrates toward the middle to form a hairpin and, ultimately, a coiled helix. The low-dimensional representation is subtly perturbed in the solvated phase relative to the ideal gas, and its geometric structure is conserved between C16 and C24. The methodology is directly extensible to biomolecular self-assembly processes, such as protein folding.

Keywords: hydrophobicity, data mining, solvation, hydrocarbon, folding pathways


It has long been suspected that cooperative couplings between degrees of freedom render the effective dimensionality of biophysical systems far less than the 3R-dimensional coordinate space of the R constituent atoms (1–5). This has been framed in the projection operator formalism (6) as a separation of time scales in which the important dynamics reside in a “slow subspace” (7) and is associated with a smooth underlying free energy surface (8). For example, two-dimensional descriptions have been formulated for dialanine (9) and a coarse-grained model of the src homology 3 domain (5).

Calculation of the effective dimensionality of a dynamical system, and identification of order parameters describing the low-dimensional “intrinsic manifold” to which the system dynamics are effectively restrained, is a long-standing problem in fields as seemingly disparate as data visualization (10), speech recognition (11), semisupervised learning (12), and spectral clustering (13). The fraction of native contacts (Q) (8, 14) and the folding probability (Pfold) (8, 15) have been used as reaction coordinates for protein folding, but such coarse variables may lump together structurally and kinetically disparate conformations and can prove inadequate for larger proteins with frustrated folding funnels (5, 8). Empirical order parameters also tend to perform poorly on landscapes exhibiting multiple local free-energy (FE) minima or lacking well-defined unfolded and folded basins. Principal components analysis (PCA) is a popular linear dimensionality reduction technique applied extensively to biophysical systems (14, 16), which seeks to describe the “essential subspace” (2) of the dynamics by a set of orthogonal vectors oriented along the directions of largest variance in the data. For the highly nonlinear intrinsic manifolds expected for complex molecular systems (5), the linearity of this technique makes it appropriate in local regions but yields a poor characterization of global features (5, 17). As a result, PCA estimates of the effective dimensionality (17) can far exceed the dimensionality of the phase space dynamics determined by Lyapunov analysis (3).

A number of nonlinear dimensionality reduction techniques have emerged in recent years, such as Isomap (17), locally linear embedding (LLE) (18), and diffusion maps (19–21), which seek to reconstruct the intrinsic manifold by integrating local structural information dictated by the data geometry into a unified global description. Isomap (5, 22) and LLE (23) have been successfully applied to peptide systems, and, although diffusion maps have been used to study phenomena as diverse as chemical reaction networks (24) and defect mobility at an interface (25), they have not previously been applied to systems of biophysical significance.

Path-based techniques such as the finite temperature string (26), nudged elastic band (27), and transition path sampling (28) aim to determine minimum (free) energy routes between metastable states of biophysical systems, from which order parameters and mechanisms may be inferred. Diffusion maps may complement such approaches by furnishing order parameters with which to better characterize metastable basins or by providing a low-dimensional description of the pathway ensemble. Although we are not aware of studies comparing the computational cost of these approaches, applying the diffusion map to the simulation trajectories in this work was inexpensive, requiring less than 20 h on a single 2.66 GHz processor. Because the diffusion map requires dense sampling of phase space, path-based techniques would be more appropriate for systems exhibiting high free-energy barriers.

n-Alkanes are relatively simple molecules whose dynamic and structural behavior nevertheless remains rich and far from well understood, with recent work demonstrating the exotic chain conformations adopted by these molecules (29). Despite the absence of specific interactions or a unique native structure, the behavior of single n-alkane chains in water has long been of interest for understanding the role of hydrophobicity in protein folding (30–35). In this work, we have conducted long molecular dynamics simulations of n-octane (C8), n-hexadecane (C16), and n-tetracosane (C24) in explicit water, and corresponding ideal-gas phase Monte Carlo simulations in which an isolated chain interacts only with itself. By applying diffusion maps to the simulation data, we extracted the intrinsic manifolds, which we determined to be approximately three-dimensional and well conserved between the ideal gas and solvated phases. The simulations serve only as a means to sample a canonical distribution of system configurations; the underlying dynamics of the algorithms do not play a role in the diffusion map approach. The physical interpretation of order parameters identified by the diffusion map is unknown a priori, but can be facilitated by correlation with “intermediary” variables, in this case the principal moments of the n-alkane gyration tensor. Structural details were resolved by visualizing system configurations at representative data points. We determined three global order parameters for C16 and C24 describing the degree of collapse, location of the bend, and the handedness of the chain helicity, and determined the low-FE pathway for globular collapse to proceed by a kink and slide mechanism, whereby a bend near the end of the extended chain migrates to the center to form a symmetric hairpin, which subsequently collapses into a helical coil.

Diffusion Map

A molecular simulation trajectory recording the coordinates of all R constituent atoms consists of a set of 3R-dimensional snapshots. Using the diffusion map approach, we seek to arrange the trajectory in a low-dimensional space such that snapshots that are dynamically proximate (i.e., the system may evolve from one to the other on a short time scale) are situated near one another. In the following description of the method, we strive to present a physically motivated summary, reserving a more mathematical treatment for the SI Text.

The first step is to establish the dynamic proximity among all snapshot pairs, where, in the absence of an easily computable dynamic measure, we employ a structural similarity metric, with small values implying close dynamic proximity. We choose the rotationally and translationally minimized rmsd between the coordinates of the n-alkane united atom centers (36), denoting the distance between snapshots $\mathrm{snap}_i$ and $\mathrm{snap}_j$ as $\mathrm{rmsd}_{ij}$. Although this measure ostensibly discards all solvent degrees of freedom, the solvent influences the simulation trajectory and its effect is “encoded” in the n-alkane configurations sampled.
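The rotationally and translationally minimized rmsd can be computed with the Kabsch algorithm. The NumPy sketch below assumes each snapshot is an (n_sites, 3) array of united-atom centers listed in a fixed head-to-tail order, with all sites weighted equally; the function names are illustrative, and this is not necessarily the procedure of ref. 36.

```python
import numpy as np


def kabsch_rmsd(x: np.ndarray, y: np.ndarray) -> float:
    """RMSD between two (n_sites, 3) coordinate sets after optimal superposition.

    Translation is removed by centering both sets; the optimal rotation is
    found with the Kabsch (SVD) algorithm. Sites are matched by index, so the
    head-to-tail ordering of the chain must be the same in both snapshots.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition
    u, _, vt = np.linalg.svd(xc.T @ yc)
    # Flip the last axis if needed so that r is a proper rotation (det = +1)
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = xc @ r.T - yc
    return float(np.sqrt((diff ** 2).sum() / len(x)))


def pairwise_rmsd(frames) -> np.ndarray:
    """Symmetric N-by-N matrix of minimized RMSDs between all snapshot pairs."""
    n = len(frames)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = kabsch_rmsd(frames[i], frames[j])
    return out
```

The double loop is quadratic in the number of snapshots (about 4.5 × 10⁸ pairs for 30,001 snapshots), so a production version would vectorize it or move it to compiled code.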

The pairwise distances are now combined with a Gaussian kernel of bandwidth ϵ. For N snapshots, this transformation yields the N-by-N matrix A with elements

$$A_{ij} = \exp\left(-\frac{\mathrm{rmsd}_{ij}^{2}}{2\epsilon}\right) \qquad [1]$$

This step provides a smooth threshold for the pairwise distances, discarding large and retaining small $\mathrm{rmsd}_{ij}$ values, where $\sqrt{\epsilon}$ is a characteristic value below which we consider the similarity metric a meaningful measure of dynamic proximity. Following Grassberger and Procaccia’s use of the correlation dimension as a measure of fractal dimensionality (37), Coifman et al. demonstrated that twice the slope of the linear region of a log–log plot of $\sum_{i,j} A_{ij}$ versus ϵ provides an estimate of the effective dimensionality of the system dynamics and delineates the range of suitable ϵ values (38). An example calculation is provided for solvated C16 in Fig. S1.
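A sketch of the dimensionality estimate, assuming the Gaussian kernel $A_{ij} = \exp(-\mathrm{rmsd}_{ij}^2/2\epsilon)$ and a precomputed distance matrix. It returns twice the local slope of $\log \sum_{i,j} A_{ij}$ against $\log \epsilon$; the linear region still has to be identified by inspection, because the slope falls toward zero both at small ϵ (only the diagonal survives, so the sum tends to N) and at large ϵ (every entry tends to 1, so the sum tends to N²). The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(dist: np.ndarray, eps: float) -> np.ndarray:
    """Kernel matrix, assuming the form A_ij = exp(-rmsd_ij**2 / (2 * eps))."""
    return np.exp(-dist ** 2 / (2.0 * eps))


def local_dimension(dist: np.ndarray, eps_values) -> np.ndarray:
    """Twice the local slope of log(sum_ij A_ij) versus log(eps).

    Inside the linear region of the log-log plot this approximates the
    effective dimensionality; outside it the slope falls toward zero.
    """
    log_eps = np.log(np.asarray(eps_values, dtype=float))
    log_sum = np.array([np.log(gaussian_kernel(dist, e).sum()) for e in eps_values])
    # np.gradient accepts the (possibly non-uniform) log(eps) coordinates
    return 2.0 * np.gradient(log_sum, log_eps)
```

Applied to points scattered uniformly over a flat square embedded in three dimensions, the values in the middle of the ϵ range come out close to, though slightly below, 2, because edge effects and the diagonal terms depress the slope.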

A diagonal matrix D is constructed from the row sums of A and used to construct the N-by-N matrix M,

$$D_{ii} = \sum_{j=1}^{N} A_{ij} \qquad [2]$$
$$M = D^{-1}A \qquad [3]$$

The eigenvectors of M arranged in decreasing eigenvalue order, $\vec{\psi}_1, \vec{\psi}_2, \ldots, \vec{\psi}_N$, may be efficiently computed as described in Materials and Methods, with $\vec{\psi}_1$ the trivial all-ones vector. The k-dimensional diffusion map is the mapping of the ith simulation snapshot into the ith component of each of the top k nontrivial eigenvectors (19, 21, 39)

$$\mathrm{snap}_i \mapsto \left(\psi_2(i), \psi_3(i), \ldots, \psi_{k+1}(i)\right) \qquad [4]$$

For brevity, this mapping will henceforth be referred to simply as the “embedding in the top k eigenvectors.” Dimensionality reduction is achieved by mapping the 3R-dimensional simulation snapshots into a k-dimensional embedding. Determination of an appropriate k is system dependent, and is addressed in Results and Discussion.
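The construction of M from the row sums of A and the embedding in the top k nontrivial eigenvectors can be sketched as follows, assuming the Gaussian kernel $A_{ij} = \exp(-\mathrm{rmsd}_{ij}^2/2\epsilon)$ and a data set small enough for a dense eigensolver. Rather than diagonalizing the nonsymmetric M directly, the sketch diagonalizes the similar symmetric matrix $D^{-1/2} A D^{-1/2}$ and maps back; this is a common numerical convenience, not necessarily the route taken in the paper, and `diffusion_map` is a hypothetical name.

```python
import numpy as np


def diffusion_map(dist: np.ndarray, eps: float, k: int):
    """Embed snapshots in the top k nontrivial eigenvectors of M = D^-1 A.

    Returns (evals, emb): all eigenvalues in decreasing order, and an
    (N, k) array whose ith row is (psi_2(i), ..., psi_{k+1}(i)).
    """
    a = np.exp(-dist ** 2 / (2.0 * eps))   # assumed Gaussian kernel form
    d = a.sum(axis=1)                      # row sums (diagonal of D)
    # M = D^-1 A is similar to the symmetric S = D^-1/2 A D^-1/2, so a
    # symmetric eigensolver can be used and the result mapped back.
    s = a / np.sqrt(np.outer(d, d))
    evals, v = np.linalg.eigh(s)
    order = np.argsort(evals)[::-1]        # decreasing eigenvalue order
    evals, v = evals[order], v[:, order]
    # Right eigenvectors of M, scaled so that the trivial psi_1 is all ones
    # (up to sign). Signs of the nontrivial eigenvectors are arbitrary, so
    # e.g. which helix handedness lands at positive values must be checked
    # by inspecting structures.
    psi = v / np.sqrt(d)[:, None] * np.sqrt(d.sum())
    return evals, psi[:, 1:k + 1]
```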

Free energy surfaces (FES) may be computed over the diffusion map embeddings using the relation $\beta G(\vec{\psi}) = -\ln \tilde{p}(\vec{\psi})$, where $\beta = 1/k_BT$, G is the Gibbs free energy, $\tilde{p}(\vec{\psi})$ is a histogram approximation to the density of snapshots at $\vec{\psi}$, and $\vec{\psi}$ is a k-dimensional vector of the eigenvector components.
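A minimal sketch of the FES construction, assuming $\tilde{p}$ is taken as the fraction of snapshots falling in each bin of a regular grid (the number of bins is an arbitrary choice here). Under that assumption a bin holding a single snapshot out of N = 30,001 has βG = ln 30,001 ≈ 10.3, which appears consistent with the upper end of the βG ranges quoted in the captions of Figs. 2 and 4. Bins that are never visited come out infinite, i.e. a barrier too high to resolve at the available sampling. The function name is hypothetical.

```python
import numpy as np


def free_energy_surface(coords: np.ndarray, bins=30):
    """beta*G = -ln(p) on a histogram grid over a k-dimensional embedding.

    p is the fraction of snapshots falling in each bin; empty bins are
    assigned +inf.
    """
    coords = np.asarray(coords, dtype=float)
    counts, edges = np.histogramdd(coords, bins=bins)
    frac = counts / len(coords)
    with np.errstate(divide="ignore"):     # log(0) -> -inf, silenced
        beta_g = -np.log(frac)
    return beta_g, edges
```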

As discussed in more depth in the SI Text, if the system dynamics can be well modeled as a diffusion process, and the structural similarity metric is a good descriptor of microscopic diffusive motions, then the diffusion map embedding possesses two important features. (i) The Euclidean distance between two snapshots in the diffusion map embedding corresponds to their diffusion distance, which quantifies the ease with which the system can evolve from one snapshot to the other, small distances corresponding to facile transitions (19, 39). Snapshots connected by a large number of short pathways have a small diffusion distance and will be embedded close together. (ii) The diffusion map embedding captures the slow dynamical motions of the system, thereby recovering the intrinsic or “slow” manifold. Together, these properties make the diffusion map embeddings of the simulation trajectory dynamically meaningful, because paths traced out over this embedding describe the evolution of the system in its fundamental dynamical motions. Although these assumptions are expected to hold for biophysical systems such as those studied here, in the event that they do not, the identified order parameters remain good variables with which to parametrize the evolution of the system from one state to another.
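Property (i) can be checked numerically. The sketch below assumes the Gaussian kernel $A_{ij} = \exp(-\mathrm{rmsd}_{ij}^2/2\epsilon)$, $M = D^{-1}A$, and a dense eigensolver. It builds coordinates $\lambda_l^t \psi_l(i)$ from all nontrivial eigenvectors, normalized so that $\vec{\psi}_1$ is the all-ones vector, and compares their Euclidean distances with the diffusion distance at time t, defined as the distance between rows i and j of $M^t$ weighted by the inverse stationary distribution π. Note the eigenvalue weighting: the embedding of Eq. 4 uses unweighted components, so its distances match diffusion distances only up to this per-coordinate rescaling and the truncation to k eigenvectors. Both function names are hypothetical.

```python
import numpy as np


def diffusion_coordinates(dist: np.ndarray, eps: float, t: int = 1) -> np.ndarray:
    """Rows are snapshots; Euclidean distances between rows are diffusion distances at time t.

    Uses all N - 1 nontrivial eigenvectors; keeping only the top k gives the
    truncated, approximate diffusion map.
    """
    a = np.exp(-dist ** 2 / (2.0 * eps))            # assumed Gaussian kernel
    d = a.sum(axis=1)
    evals, v = np.linalg.eigh(a / np.sqrt(np.outer(d, d)))
    order = np.argsort(evals)[::-1]
    evals, v = evals[order], v[:, order]
    # Right eigenvectors of M = D^-1 A, scaled so psi_1 is the all-ones vector
    # (equivalently, sum_k pi_k * psi_l(k)**2 = 1 for every l)
    psi = v / np.sqrt(d)[:, None] * np.sqrt(d.sum())
    # Drop the trivial eigenvector and weight the rest by lambda_l**t
    return psi[:, 1:] * evals[1:] ** t


def diffusion_distance_direct(dist: np.ndarray, eps: float, t: int, i: int, j: int) -> float:
    """Reference definition: pi-weighted distance between rows i and j of M^t."""
    a = np.exp(-dist ** 2 / (2.0 * eps))
    d = a.sum(axis=1)
    m_t = np.linalg.matrix_power(a / d[:, None], t)
    pi = d / d.sum()                                 # stationary distribution of M
    return float(np.sqrt(((m_t[i] - m_t[j]) ** 2 / pi).sum()))
```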

Results and Discussion

n-Octane.

The (fractal) effective dimensionalities of the C8 system in the ideal gas and solvated phases were estimated as 3.0 and 2.7, respectively, suggesting that embeddings may be constructed in the top three eigenvectors. The ordered principal moments of the C8 gyration tensor (ξ1, ξ2, ξ3) describe the characteristic length of the chain along each of its three principal axes, providing a convenient measure of elongation or globularity (40) and serving as useful “intermediaries” in the nontrivial task of assigning physical meaning to the order parameters furnished by the diffusion map (Fig. S2 A and B). They do not, however, map bijectively onto these order parameters and so are not in themselves the “right” variables.

Two-dimensional projections of the three-dimensional embedding of the solvated C8 system into eigenvectors 2, 3, and 4 are presented in Fig. 1. In Fig. 1A, the data points are colored according to ξ1, which correlates well with eigenvector 2 (evec2). For chain molecules such as n-alkanes, ξ1 is typically much greater than the other principal moments, and is the dominant contribution to the radius of gyration (Rg), which is related to the principal moments by $R_g^2 = \xi_1^2 + \xi_2^2 + \xi_3^2$. This relationship results in an approximately bijective mapping between ξ1 and Rg and, accordingly, evec2 also shows good correlation with Rg (Fig. S2 C and D). In Fig. 1B, the points are colored according to ξ3, which permits the identification of nine distinct low-ξ3 locales, of which seven are visible as dark blue regions, with the remaining two buried in the point cloud. These regions correspond to local FE minima, all of which are visible as low-lying isosurfaces of the FES in Fig. 2. Visualization of structures (41) residing in each FE well reveals that the basins correspond to various combinations of gauche defects in the chain. The fact that these structures are approximately planar permits their identification with low values of ξ3.
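The principal moments and Rg can be computed from the gyration tensor as below. This sketch rests on two assumptions not fixed by the text: all united-atom sites are weighted equally, and ξ1 ≥ ξ2 ≥ ξ3 are taken as the square roots of the gyration tensor eigenvalues, so that they carry units of length and their squares sum to Rg². The function name is hypothetical.

```python
import numpy as np


def gyration_moments(coords: np.ndarray):
    """Return (xi, rg) for an (n_sites, 3) conformation.

    xi holds xi_1 >= xi_2 >= xi_3, assumed here to be the square roots of
    the gyration tensor eigenvalues, with equal weight on every site.
    """
    x = np.asarray(coords, dtype=float)
    x = x - x.mean(axis=0)
    gyr = x.T @ x / len(x)                   # 3x3 gyration tensor
    lam = np.linalg.eigvalsh(gyr)[::-1]      # eigenvalues, descending
    xi = np.sqrt(np.clip(lam, 0.0, None))    # clip tiny negative round-off
    rg = float(np.sqrt(lam.sum()))           # Rg^2 = trace of the tensor
    return xi, rg
```

For a perfectly straight chain, ξ2 = ξ3 = 0 and Rg = ξ1; for a perfectly planar conformation, ξ3 = 0, which is why the nearly planar gauche-defect structures show up at low ξ3.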

Fig. 1.

Two-dimensional elevations of the three-dimensional embedding of the solvated phase C8 system into evec2, evec3, and evec4. Data points are colored according to the (A) first and (B) third principal moments of the C8 gyration tensor.

Fig. 2.

FES of the solvated phase C8 system embedded in evec2, evec3, and evec4 with representative chain conformations. As in Figs. 3, 4, and 5A, molecules are oriented such that the head is farther from the reader, and solvent has been removed for clarity. The range of βG (where G is the Gibbs free energy, and β = 1/kBT) is 3.6–10.3, with isosurfaces plotted at βG = 5, 6, 7, and 8. The ninth low-ξ3 region midway between structures 2 and 3 has not been associated with a distinct structure, because it describes transitory conformations between 2 and 3 containing gauche defects in both the head and tail.

Figs. S3 and S4 present analogous plots to Figs. 1 and 2 for C8 in the ideal-gas phase. That the ideal-gas phase intrinsic manifold and FES are remarkably similar in structure to those in the solvated phase reinforces the assertion that the solvent plays little role in the conformations of short chain n-alkanes (34, 42). Rg has previously been suggested by ourselves and others as a good order parameter for n-alkane systems (29, 35, 42), and its correlation with the components of the top nontrivial eigenvector (evec2) justifies its use as a good one-dimensional descriptor for C8. However, the diffusion map approach furnishes a more informative three-dimensional embedding in which local minima are separated on the basis of gauche defects, and transitions between them correspond to torsional motions of the chain.

n-Hexadecane and n-Tetracosane.

The effective dimensionality of ideal gas C16 was estimated to lie in the range 3.1–6.2, in agreement with the solvated phase estimate of 3.9–5.6. In the case of C24, the ideal-gas and solvated phase estimates were 3.0–5.4 and 3.2–5.4, respectively. The spread arises from the sensitivity of the estimate to the precise location at which the slope is computed on the log–log plot of $\sum_{i,j} A_{ij}$ vs. ϵ (Fig. S1). This ambiguity motivated us to develop an independent measure of the fractal dimensionality of the intrinsic manifold: we computed the correlation dimension (37) of the embeddings in successively more eigenvectors and took the value at which this function levels off, known as the plateau dimension (43). Embeddings of up to 12 dimensions yielded plateau dimensions of 2.8–2.9 for the C16 systems and 2.9–3.0 for the C24 systems, suggesting that embeddings be constructed in the top three eigenvectors.
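The plateau-dimension procedure can be sketched with the Grassberger–Procaccia correlation dimension: the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r. The fitting window below is set by quantiles of the pair distances, an assumed and adjustable choice that keeps a comparable small-scale window as more eigenvectors are added. Pair distances are held in memory, so this is only practical on a subsample of snapshots; the function names are hypothetical.

```python
import numpy as np


def correlation_dimension(points: np.ndarray, q_range=(0.01, 0.1), n_r: int = 10) -> float:
    """Grassberger-Procaccia estimate: slope of log C(r) versus log r."""
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]
    diff = pts[:, None, :] - pts[None, :, :]
    # Distances for each unordered pair (upper triangle, diagonal excluded)
    pair_d = np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(len(pts), k=1)]
    # Radii spanning the chosen quantile window of the pair distances
    r = np.quantile(pair_d, np.geomspace(q_range[0], q_range[1], n_r))
    c = np.array([(pair_d < ri).mean() for ri in r])
    slope, _ = np.polyfit(np.log(r), np.log(c), 1)
    return float(slope)


def dimension_vs_embedding(evecs: np.ndarray, k_max: int) -> list:
    """Correlation dimension of embeddings in the first 1..k_max columns.

    Columns are assumed ordered psi_2, psi_3, ...; the value at which the
    sequence levels off is the plateau dimension.
    """
    return [correlation_dimension(evecs[:, :k]) for k in range(1, k_max + 1)]
```

In the check below a two-dimensional surface is written into three columns, the third being a function of the first (mimicking a functionally dependent eigenvector); the estimate rises from about 1 to about 2 and then levels off rather than continuing to grow.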

The eigenvectors returned by the diffusion map are mutually orthogonal, but embeddings into their components (Eq. 4) may exhibit functional dependencies. Conceptually, the diffusion map may return multiple order parameters characterizing the same dynamic pathway. To draw an analogy with multivariate Fourier series, sin(x) and sin(2x) are independent Fourier components both oriented in the same spatial direction. In the case of solvated C16, such a dependency collapses an embedding into the components of evec2 and evec3 onto an effectively one-dimensional curve (Fig. S5A). By fitting two piecewise continuous quartic functions, this curve was parametrized by its scalar valued arclength (Fig. S5B), permitting an embedding in [evec2/3 arclength, evec4, evec6], where evec5 exhibited a functional dependency on arclength, and was omitted in favor of evec6. Similarly, ideal gas C16 was embedded in [evec2/3 arclength, evec5, evec6], and C24 in [evec2/4 arclength, evec3, evec5] in both the ideal-gas and solvated phase.
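The arclength reparametrization can be sketched as follows. For brevity the sketch assumes the collapsed curve is single-valued in the first coordinate, fits one quartic rather than the two piecewise continuous quartics used in the text, and assigns each snapshot the arclength at its own first-coordinate value instead of at its nearest point on the curve. The direction in which arclength increases is an arbitrary choice, and the function name and grid size are hypothetical.

```python
import numpy as np


def arclength_coordinate(u, v, degree: int = 4, n_grid: int = 2001) -> np.ndarray:
    """Arclength along a polynomial fit v = f(u), evaluated at each point's u.

    Assumes the (u, v) cloud collapses onto a curve that is single-valued in u.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    coeffs = np.polyfit(u, v, degree)              # least-squares polynomial fit
    grid = np.linspace(u.min(), u.max(), n_grid)
    # Arclength element ds = sqrt(1 + f'(u)^2) du on a fine grid
    speed = np.sqrt(1.0 + np.polyval(np.polyder(coeffs), grid) ** 2)
    # Cumulative trapezoidal integral, starting from 0 at the smallest u
    steps = 0.5 * (speed[1:] + speed[:-1]) * np.diff(grid)
    s_grid = np.concatenate(([0.0], np.cumsum(steps)))
    return np.interp(u, grid, s_grid)
```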

Fig. 3 presents elevations of the three-dimensional embedding of the solvated C24 system, with data points colored according to the principal moments of the C24 gyration tensor to assist in the physical interpretation of the eigenvectors and arclength. Fig. 3A demonstrates that ξ1, which describes the extent of the molecule along its longest axis, is anticorrelated with arclength, permitting arclength to be interpreted as the degree of molecular collapse. Structural details are resolved by visualizing representative chain conformations at increasing values of arclength while holding evec3 = evec5 = 0. The progression of structures 1 → 2 → 3 → 6 in Fig. 3 tracks the collapse from an extended conformation via a bend in the middle of the chain to a tight, symmetric hairpin.

Fig. 3.

Two-dimensional elevations of the three-dimensional embedding of the solvated phase C24 system in evec2/4 arclength, evec3, and evec5. Data points are colored according to the (A) first, (B) second, and (C) third principal moments of the C24 gyration tensor.

Fig. 3B presents the same projection of the manifold as Fig. 3A, but with the data points colored according to ξ2, describing the extent of the chain along its second longest axis. For values of arclength between 0.04 and 0.07, evec5 values around zero correspond to high values of ξ2, whereas values of evec5 away from zero are associated with lower values of ξ2. Visualization of representative chain conformations reveals the details of transitions described by evec5. Structure 3 has an evec5 value of around zero, and is a loose hairpin in which the bend is located approximately at the center of the chain, with the looseness reflected in a large value of ξ2. As evec5 is increased to approach structure 4, the kink migrates toward the tail and is accompanied by a tightening of the hairpin and a reduction in ξ2. Similarly, as evec5 is reduced from zero to approach structure 5, the hairpin tightens, but the kink now migrates toward the head of the chain. No large ξ2 values are observed at values of arclength below 0.04 and above 0.07, because structures in the first case correspond to linearly extended conformations (structure 1), and in the second to tight hairpins (structure 6) and coils (structures 7 and 8).

The data points in the elevation of the manifold presented in Fig. 3C are colored according to ξ3, describing the extent of the n-alkane chain along its shortest principal axis. Structure 6 has a value of evec3 near zero, and a small value of ξ3 due to the planarity of this tight, symmetric hairpin. Moving toward positive (negative) values of evec3 corresponds to a transition to a left-handed (right-handed) helical coil depicted by structure 8 (structure 7), which is accompanied by an increase in ξ3 due to a transition from a planar to a more globular form. Therefore, evec3 may be interpreted as describing the deviations from planarity of the molecule, distinguishing the handedness of such deviations toward right- or left-handed helical coiled structures akin to those observed by Chakrabarty and Bagchi (29).

Pairwise rmsd distances were computed using a consistent definition of the head-tail directionality of the n-alkane chains in order to yield a dynamically meaningful similarity metric. Because the chemical structure of n-alkanes is, however, identical irrespective of which end is defined as the head, the head-tail symmetry emerged naturally in order parameters extracted by the diffusion map, and is apparent in the approximate planes of symmetry in the structure and coloration of the manifold elevations in Fig. 3 B and C. For instance, structures containing a kink near the head (structure 5) are related by a head-tail inversion to those with a kink near the tail (structure 4).

The intrinsic manifold of solvated C16 (Fig. S6) is remarkably similar to that of solvated C24 (Fig. 3), and the intrinsic manifolds of ideal gas C16 (Fig. S7) and C24 (Fig. S8) exhibit striking similarity to those of the corresponding solvated systems. The similarity of the fundamental structure of the intrinsic manifold in all four systems suggests that the chain conformations explored by C16 and C24 in the ideal-gas and solvated phases are largely the same (34, 42), with the slow dynamics restrained to similar low-dimensional attractors.

We now turn from a consideration of the structure to an analysis of the conformational population distribution by constructing FES over the manifolds. Fig. 4 A and B present FES for C24 in the ideal-gas and solvated phases, respectively. Fig. 4A illustrates the presence of a low-FE “doughnut” encircling a higher FE region, and linking extended and collapsed conformations by two distinct routes. The pathway indicated by the upper arrow illustrates the progression from low to high arclength via positive values of evec5, whereas the lower arrow indicates a route via negative evec5 values. The representative structures projected onto the manifold demonstrate that the transitions proceed by a kink and slide mechanism, where a kink developing near the head (evec5 < 0) or tail (evec5 > 0) of an extended chain migrates toward the middle to form a tight, symmetric hairpin. These pathways follow an FE isosurface and are therefore essentially barrierless. The direct route from low to high arclength with evec5 = 0 traverses the “doughnut hole,” as indicated by the dashed arrow, and corresponds to a symmetric collapse pathway whereby a loose hairpin closes symmetrically into a tight hairpin. This route is less favored than the asymmetric collapse pathways, containing an extended 1–2 kT FE barrier. Populations of helical coils at large positive and negative values of evec3 are too low to be resolved on the FES.

Fig. 4.

FES for the C24 chain in the (A) ideal-gas and (B) solvated phase. In both cases the embedding is constructed in evec2/4 arclength, evec3, and evec5. The range of βG is 2.4–10.3 in the ideal-gas phase and 1.6–10.3 in the solvated phase, with isosurfaces plotted at βG = 5, 6, 7, 8, and 9 in each case. The essentially one-dimensional low-arclength tip apparent in Fig. 3C and Fig. S8C constitutes the global FE minimum, but is not resolved in the three-dimensional plot. Solid arrows indicate collapse pathways by the kink and slide mechanism, and the dashed arrow in A indicates the symmetric hairpin collapse route. The * in B marks the low-FE incursion mentioned in the text, and the positive evec3 wing is not visible due to the perspective of the plot.

Fig. 4B demonstrates that the kink and slide mechanism remains the low-FE pathway for chain collapse in the solvated phase, as indicated by the arrows. The symmetric collapse pathway was so rarely sampled throughout the 30 ns simulation that we compute an infinitely high barrier at the resolution of our FES. Accordingly, the low-FE incursion on the low-arclength side of the doughnut hole (indicated by *) corresponds to loose, symmetric hairpins and represents a dead end to chain collapse. The helical coils are significantly more stable in the solvated phase relative to the ideal gas, with similar free energies as the tight, symmetric hairpins. That such structures exist in both the ideal-gas and solvated phases suggests that these morphologies are inherent to the n-alkane chain rather than a product of the aqueous environment (29), but are significantly stabilized by the solvent interaction, presumably by the hydrophobic effect (44, 45).

The major features of the C24 ideal-gas and solvated FES are conserved in the case of C16 and are presented in Fig. S9. The low-FE pathway to collapse for solvated C16 also proceeds by a kink and slide mechanism, followed by further collapse into a helical coil. The symmetric collapse route is as favorable as the asymmetric pathways in the ideal-gas phase, but whereas the asymmetric routes remain barrierless in the solvated phase, the symmetric pathway contains a ≫kT barrier, with precise determination of the height frustrated by inadequate sampling of this region. Depopulation of the symmetric collapse pathway in the solvated phase relative to the ideal gas appears to be the root of our previously reported solvent-induced FE barrier between extended and collapsed n-alkane conformations (42). As for C8, the diffusion map approach has identified ξ1, or equivalently Rg, as a good one-dimensional order parameter, although this representation is inferior to a three-dimensional description in terms of the degree of chain collapse, position of the bend in the chain, and handedness of the helical coil.

Although an objective validation of the dynamical relevance of the low-dimensional description would require evaluation of the committor probabilities along the collapse pathway (46), the identified variables are preserved when considering partial simulation trajectories (compare Fig. 5A and Fig. S6A), and appropriately characterize collapse events observed in the trajectories (Movie S1).

Fig. 5.

Solvent analysis. (A) Two-dimensional elevation of the three-dimensional embedding of the solvated phase C16 system constructed from a 3 ns portion of the full 30 ns trajectory. The embedding is constructed in evec2/3 arclength, evec4, and evec5, after a small rotation of the manifold to present a view consistent with that of Fig. 3. Data points are colored according to the solvent-excluded cavity volume occupied by the C16 chain. (B) Three-dimensional view of the solvated phase C24 system embedded in evec2/4 arclength, evec3, and evec5. Data points are colored according to the first principal moment of the C24 gyration tensor.

Solvent Analysis.

The role of the solvent was probed by calculating the solvent-excluded cavity volume occupied by the C16 chain using a test probe insertion procedure detailed in Materials and Methods. Due to the computational expense of the procedure, a contiguous 3 ns portion of the full 30 ns solvated trajectory was considered. Application of the diffusion map to the partial trajectory proved robust, resulting in a three-dimensional embedding of the intrinsic manifold (Fig. 5A) very similar in structure to those for the complete C16 and C24 trajectories (Fig. S6A and Fig. 3). The data points in Fig. 5A are colored according to the cavity volume, illustrating that chain collapse from low-arclength, extended conformations to high-arclength, tight, symmetric hairpins and helical coils is accompanied by an increase in the solvent-excluded cavity volume. Considering arclength = 0.12 as a cutoff, the mean cavity volume of the extended conformations is 16 ± 11 Å³, compared to 39 ± 20 Å³ for the collapsed structures. Large variances are expected for a dynamic void volume that depends on the collective motion of many solvent molecules.

Fig. 5B provides a three-dimensional view of the solvated C24 intrinsic manifold presented in Fig. 3, together with representative snapshots of the n-alkane chain and surrounding water molecules, to illustrate the details of the solvent as the chain collapses via the kink and slide mechanism. The interior of the low-arclength, loose hairpin is hydrated (snapshot A in Fig. 5B), as is apparent by the presence of water molecules between the arms of the chain. Solvent is excluded from within a bend in the tail of the chain (B), which subsequently slides down to form a tight, symmetric hairpin with a dry interior (C), and collapses further into a helical coil with a dry core (D) (Movie S1). The expulsion of solvent from the chain interior is captured by the increase in the solvent-excluded cavity with increasing arclength and, in this sense, solvent effects are contained within the extracted order parameters without explicit consideration of solvent degrees of freedom.

The depopulation of the symmetric hairpin collapse pathway in the solvated phase relative to the ideal gas (Fig. 4) is apparently due to a wetting/dewetting FE barrier similar to that observed by Lum et al. for bundles of hydrophobic cylinders (47), with the kink and slide mechanism providing a less expensive route to collapse, avoiding the collective expulsion of interior solvent molecules. A study of a hydrophobic 12-mer by Miller et al. (46) determined the low-FE collapse pathway to proceed via a drying transition at a bend in the middle of the chain. Although the sensitivity of hydrophobic hydration mechanics to subtle differences in the force field (29) may underlie these differences, the large monomers of Miller et al. may permit this model to be interpreted as a coarse-grained representation of ∼C60 (35), suggesting an alternative collapse mechanism for long chains.

Conclusions

We have demonstrated an application of diffusion maps to systematically recover a small number of “good” order parameters from simulations of C8, C16, and C24 n-alkane chains in the ideal-gas and solvated phases. The intrinsic manifolds upon which the dynamics of each system effectively lie were reconstructed by embedding the simulation trajectories into these order parameters, and a physical interpretation of the parameters was facilitated by correlating them with the principal moments of the n-alkane gyration tensors. FES constructed on the manifold are dynamically meaningful, with the low-FE pathways providing mechanistic insight. In the case of C8, the local FE minima were separated on the basis of the dihedral angles of the chain, with transitions between minima corresponding to torsional chain dynamics. The ideal-gas and solvated phase FES were strikingly similar in both structure and depth of the local minima, indicating relatively little effect of the solvent interaction on the conformations of the chain.

For the C16 and C24 systems, the diffusion map approach identified three global order parameters describing the degree of collapse, location of the bend in the chain, and the handedness of the chain helicity. Although the overall structure of the FES was conserved between the ideal-gas and solvated phases, helical coil conformations were stabilized in the solvated phase relative to the ideal gas, whereas the collapse pathway corresponding to the tightening of a loose symmetric hairpin was destabilized. The low-FE pathway for the collapse of both chains in solvent was observed to proceed first by a kink and slide mechanism, whereby a kink near the end of the chain migrates to the middle to form a symmetric hairpin with a dry interior, followed by further collapse into a helical coil. These results suggest that the FES and underlying dynamical motions of n-alkanes of lengths between C16 and C24 are well conserved, although the extent of this range and the manner in which the short chain behavior merges into this regime remains to be determined.

Materials and Methods

Molecular Simulations.

Solvated phase molecular dynamics simulations were conducted with the GROMACS 4.0.2 simulation suite (48) employing the Transferable Potentials for Phase Equilibria potential for the n-alkane chains (49) and the simple point charge model of water (50). PRODRG2 (51) assisted in the building of n-alkane topologies. Lennard–Jones interactions were smoothly switched to zero at 14 Å, whereas real-space electrostatic interactions were truncated at 15 Å and the reciprocal space treated with Particle Mesh Ewald (52). Systems were subjected to energy minimization, 5 ps of position restrained dynamics, and 1 ns of equilibration, before conducting 30 ns production runs at 298 K and 1 bar maintained by a Nosé–Hoover thermostat (53, 54) and Parrinello–Rahman (55) barostat. Snapshots were saved every 1 ps. Conformationally biased (56), ideal-gas Monte Carlo simulations were conducted for 150,000 steps, saving snapshots every fifth step.
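The ideal-gas sampling uses configurational-bias Monte Carlo as described in ref. 56. In this scheme a chain is regrown segment by segment, and each segment is picked from several trial positions with Boltzmann-weighted probability. The paper gives no implementation details. What follows is only a minimal sketch of the textbook selection step and acceptance rule. The trial energies, the number of trials and the function names are all hypothetical.

```python
import math
import random


def rosenbluth_select(trial_energies, beta, rng):
    """Choose one trial position with probability exp(-beta*u_j) / W.

    Returns (index, W), where W = sum_j exp(-beta*u_j) is the Rosenbluth
    weight of this growth step. Energies are shifted by their minimum
    before exponentiating to avoid overflow; the shift is undone for W.
    """
    u_min = min(trial_energies)
    boltz = [math.exp(-beta * (u - u_min)) for u in trial_energies]
    total = sum(boltz)
    r = rng.random() * total
    acc = 0.0
    for j, b in enumerate(boltz):
        acc += b
        if r < acc:
            return j, total * math.exp(-beta * u_min)
    return len(boltz) - 1, total * math.exp(-beta * u_min)


def accept_regrowth(w_new, w_old, rng):
    """CBMC acceptance test with probability min(1, W_new / W_old).

    w_new and w_old are products of per-segment Rosenbluth weights for the
    newly grown chain and for the retraced old chain (whose trial set
    includes the actual old segment position).
    """
    return rng.random() < min(1.0, w_new / w_old)


rng = random.Random(0)
idx, w = rosenbluth_select([0.0, 0.0, 1000.0], beta=1.0, rng=rng)
```

The per-segment weights are multiplied along the chain before they are passed to `accept_regrowth`. Using the same number of trials for the new chain and the old chain means constant normalization factors cancel in the ratio.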

Solvent-Excluded Cavity Volumes.

A 0.2-Å cubic mesh was placed over the simulation box and the solvation cavity defined as those cells for which the insertion of a 3.75-Å spherical probe into the center of the cell did not result in overlap with any water O atom centers (42). The probe radius was selected so as to result in a probability of zero-overlap insertions in bulk water of less than 10⁻⁷ (57).
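The cavity definition above maps directly onto a grid calculation. Below is a minimal NumPy sketch built on these assumptions:

- the periodic box is orthorhombic;
- distances use a brute-force minimum-image check, processed in chunks to limit memory;
- a cell counts as cavity when no O center lies within the probe radius of the cell center.

The function and argument names are illustrative.

```python
import numpy as np


def cavity_volume(o_positions, box, spacing=0.2, probe_radius=3.75, chunk=512):
    """Solvent-excluded cavity volume (cubic Å) estimated on a cubic mesh.

    o_positions: (N, 3) water O coordinates in Å.
    box: (3,) orthorhombic periodic box lengths in Å.
    A cell is cavity if no O center lies within probe_radius of its center.
    """
    o = np.asarray(o_positions, dtype=float).reshape(-1, 3)
    box = np.asarray(box, dtype=float)
    n_cells = np.floor(box / spacing + 1e-9).astype(int)
    if len(o) == 0:
        return float(np.prod(n_cells)) * spacing ** 3
    # Cell centers of the mesh, flattened to an (M, 3) array.
    axes = [(np.arange(k) + 0.5) * spacing for k in n_cells]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    r2 = probe_radius ** 2
    n_empty = 0
    for start in range(0, len(centers), chunk):
        d = centers[start:start + chunk, None, :] - o[None, :, :]
        d -= box * np.round(d / box)          # minimum-image convention
        min_d2 = np.min(np.sum(d * d, axis=-1), axis=1)
        n_empty += int(np.count_nonzero(min_d2 > r2))
    return n_empty * spacing ** 3
```

On snapshots of pure bulk water, the fraction of mesh cells that come back empty estimates the zero-overlap insertion probability used to choose the probe radius. For realistic box sizes, a spatial tree would be much faster than the brute-force loop. One option is `scipy.spatial.cKDTree`, which accepts a `boxsize` argument for periodic data.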

Eigenvector Computation.

The top 20 eigenvectors and eigenvalues of 30,001 × 30,001 matrices were computed by the Implicitly Restarted Arnoldi Method implemented in the Parallel ARPACK libraries (58). Matrix storage scales as the square of the number of snapshots, but is independent of ostensible system dimensionality.
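For small data sets the same kind of eigenproblem can be solved with a dense solver. The sketch below rests on three assumptions, since the paper defines its own construction elsewhere:

- a Gaussian kernel exp(-d²/(2ε)) on Euclidean distances between per-snapshot feature vectors;
- the row-normalized Markov matrix M = D⁻¹K;
- an illustrative bandwidth `eps`.

Because M is similar to the symmetric matrix D^(-1/2) K D^(-1/2), the sketch can use a symmetric eigensolver and then map the eigenvectors back.

```python
import numpy as np


def diffusion_map(X, eps, n_evecs=20):
    """Leading eigenpairs of a diffusion-map Markov matrix (dense solver).

    X: (N, D) array, one feature vector per snapshot.
    eps: Gaussian kernel bandwidth (same squared units as the distances).
    Returns (eigenvalues, psi) sorted by decreasing eigenvalue; column 0
    of psi is the trivial constant eigenvector with eigenvalue 1.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * eps))           # N x N kernel: storage grows as N^2
    s = 1.0 / np.sqrt(K.sum(axis=1))        # diagonal of D^(-1/2)
    S = s[:, None] * K * s[None, :]         # symmetric conjugate of M = D^-1 K
    w, v = np.linalg.eigh(S)                # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:n_evecs]
    psi = s[:, None] * v[:, order]          # right eigenvectors of M
    return w[order], psi
```

The feature dimension D only enters the distance computation; the matrices are N × N whatever D is. At the scale quoted in the text a dense solver is impractical. A 30,001 × 30,001 double-precision matrix alone occupies about 7.2 GB (30,001² × 8 bytes). This is why only the leading eigenpairs are extracted, using an iterative ARPACK solver that needs only matrix-vector products. In SciPy, `scipy.sparse.linalg.eigsh` wraps ARPACK's Lanczos variant for symmetric matrices and could stand in for `np.linalg.eigh` here, called with `k=n_evecs, which='LA'`. The snapshot count of 30,001 is consistent with 30 ns of dynamics saved every 1 ps, counting the initial frame.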

Supplementary Material

Supporting Information

Acknowledgments.

Computations were performed at the Terascale Infrastructure for Groundbreaking Research in Engineering and Science facility at Princeton University. P.G.D. acknowledges the National Science Foundation (Collaborative Research in Chemistry Grants CHE-0404699 and CHE-0908265). A.Z.P. acknowledges the Department of Energy (Grant DE-SC-0002128) and the Princeton Center for Complex Materials (Materials Research Science and Engineering Center Grant DMR-0819860). I.G.K. acknowledges the US Department of Energy (Grant DE-PS02-08ER08-13).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1003293107/-/DCSupplemental.

References

1. García AE. Large-amplitude nonlinear motions in proteins. Phys Rev Lett. 1992;68:2696–2699. doi: 10.1103/PhysRevLett.68.2696.
2. Amadei A, Linssen ABM, Berendsen HJC. Essential dynamics of proteins. Proteins. 1993;17:412–425. doi: 10.1002/prot.340170408.
3. Hegger R, Altis A, Nguyen PH, Stock G. How complex is the dynamics of peptide folding? Phys Rev Lett. 2007;98:028102–4. doi: 10.1103/PhysRevLett.98.028102.
4. Zhuravlev PI, Materese CK, Papoian GA. Deconstructing the native state: Energy landscapes, function, and dynamics of globular proteins. J Phys Chem B. 2009;113:8800–8812. doi: 10.1021/jp810659u.
5. Das P, Moll M, Stamati H, Kavraki LE, Clementi C. Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction. Proc Natl Acad Sci USA. 2006;103:9885–9890. doi: 10.1073/pnas.0603553103.
6. Zwanzig R. Nonequilibrium Statistical Mechanics. New York: Oxford Univ Press; 2001. pp. 143–168.
7. Hummer G, Kevrekidis IG. Coarse molecular dynamics of a peptide fragment: Free energy, kinetics, and long-time dynamics computations. J Chem Phys. 2003;118:10762–10773.
8. Cho SS, Levy Y, Wolynes PG. P versus Q: Structural reaction coordinates capture protein folding on smooth landscapes. Proc Natl Acad Sci USA. 2006;103:586–591. doi: 10.1073/pnas.0509768103.
9. Bolhuis PG, Dellago C, Chandler D. Reaction coordinates of biomolecular isomerization. Proc Natl Acad Sci USA. 2000;97:5877–5882. doi: 10.1073/pnas.100127697.
10. Buja A, et al. Data visualization with multidimensional scaling. J Comput Graph Stat. 2008;17:444–472.
11. Elman JL, Zipser D. Learning the hidden structure of speech. J Acoust Soc Am. 1988;83:1615–1626. doi: 10.1121/1.395916.
12. Zhu X, Ghahramani Z, Lafferty J. In: Fawcett T, Mishra N, editors. Proceedings of the 20th International Conference on Machine Learning. Washington: AAAI Press; 2003. pp. 912–919.
13. Zelnik-Manor L, Perona P. In: Advances in Neural Information Processing Systems 17. Saul LK, Weiss Y, Bottou L, editors. Cambridge, MA: MIT Press; 2004. pp. 1601–1608.
14. Best RB, Hummer G. Reaction coordinates and rates from transition paths. Proc Natl Acad Sci USA. 2005;102:6732–6737. doi: 10.1073/pnas.0408098102.
15. Du R, Pande VS, Grosberg AY, Tanaka T, Shakhnovich ES. On the transition coordinate for protein folding. J Chem Phys. 1998;108:334–350.
16. Sanbonmatsu KY, García AE. Structure of met-enkephalin in explicit aqueous solution using replica exchange molecular dynamics. Proteins. 2002;46:225–234. doi: 10.1002/prot.1167.
17. Tenenbaum JB, de Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science. 2000;290:2319–2323. doi: 10.1126/science.290.5500.2319.
18. Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290:2323–2326. doi: 10.1126/science.290.5500.2323.
19. Coifman RR, et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc Natl Acad Sci USA. 2005;102:7426–7431. doi: 10.1073/pnas.0500334102.
20. Coifman RR, Lafon S. Diffusion maps. Appl Comput Harmon Anal. 2006;21:5–30.
21. Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003;15:1373–1396.
22. Plaku E, Stamati H, Clementi C, Kavraki LE. Fast and reliable analysis of molecular motion using proximity relations and dimensionality reduction. Proteins. 2007;67:897–907. doi: 10.1002/prot.21337.
23. Kentsis A, Gindin T, Mezei M, Osman R. Calculation of the free energy and cooperativity of protein folding. PLoS One. 2007;2:e446. doi: 10.1371/journal.pone.0000446.
24. Singer A, Erban R, Kevrekidis IG, Coifman RR. Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps. Proc Natl Acad Sci USA. 2009;106:16090–16095. doi: 10.1073/pnas.0905547106.
25. Sonday BE, Haataja M, Kevrekidis IG. Coarse-graining the dynamics of a driven interface in the presence of mobile impurities: Effective description via diffusion maps. Phys Rev E. 2009;80:031102–031111. doi: 10.1103/PhysRevE.80.031102.
26. Maragliano L, Fischer A, Vanden-Eijnden E, Ciccotti G. String method in collective variables: Minimum free energy paths and isocommittor surfaces. J Chem Phys. 2006;125:024106–024115. doi: 10.1063/1.2212942.
27. Jónsson H, Mills G, Jacobsen KW. In: Classical and Quantum Dynamics in Condensed Phase Simulations. Berne BJ, Ciccotti G, Coker DF, editors. Singapore: World Scientific; 1998. pp. 385–404.
28. Bolhuis PG, Chandler D, Dellago C, Geissler PL. Transition path sampling: Throwing ropes over rough mountain passes, in the dark. Annu Rev Phys Chem. 2002;53:291–318. doi: 10.1146/annurev.physchem.53.082301.113146.
29. Chakrabarty S, Bagchi B. Self-organization of n-alkane chains in water: Length dependent crossover from helix and toroid to molten globule. J Phys Chem B. 2009;113:8446–8448. doi: 10.1021/jp9034387.
30. Tanford C. Hydrophobic free energy, micelle formation and the association of proteins with amphiphiles. J Mol Biol. 1972;67:59–74. doi: 10.1016/0022-2836(72)90386-5.
31. Kauzmann W. Some factors in the interpretation of protein denaturation. Adv Protein Chem. 1959;14:1–63. doi: 10.1016/s0065-3233(08)60608-7.
32. Mountain RD, Thirumalai D. Molecular dynamics simulations of end-to-end contact formation in hydrocarbon chains in water and aqueous urea solution. J Am Chem Soc. 2003;125:1950–1957. doi: 10.1021/ja020496f.
33. Huang DM, Chandler D. The hydrophobic effect and the influence of solute-solvent attractions. J Phys Chem B. 2002;106:2047–2053.
34. Sun L, Siepmann JI, Schure MR. Conformation and solvation structure for an isolated n-octadecane chain in water, methanol, and their mixtures. J Phys Chem B. 2006;110:10519–10525. doi: 10.1021/jp0602631.
35. Athawale MV, Goel G, Ghosh T, Truskett TM, Garde S. Effects of lengthscales and attractions on the collapse of hydrophobic polymers in water. Proc Natl Acad Sci USA. 2007;104:733–738. doi: 10.1073/pnas.0605139104.
36. Maiorov VN, Crippen GM. Size-independent comparison of protein three-dimensional structures. Proteins. 1995;22:273–283. doi: 10.1002/prot.340220308.
37. Grassberger P, Procaccia I. Measuring the strangeness of strange attractors. Physica D. 1983;9:189–208.
38. Coifman RR, Shkolnisky Y, Sigworth FJ, Singer A. Graph laplacian tomography from unknown random projections. IEEE T Image Process. 2008;17:1891–1899. doi: 10.1109/TIP.2008.2002305.
39. Nadler B, Lafon S, Coifman RR, Kevrekidis I. In: Advances in Neural Information Processing Systems 18. Weiss Y, Schölkopf B, Platt J, editors. Cambridge, MA: MIT Press; 2006. pp. 955–962.
40. Theodorou DN, Suter UW. Shape of unperturbed linear polymers: Polypropylene. Macromolecules. 1985;18:1206–1214.
41. Humphrey W, Dalke A, Schulten K. VMD: Visual molecular dynamics. J Mol Graphics. 1996;14:33–38. doi: 10.1016/0263-7855(96)00018-5.
42. Ferguson AL, Debenedetti PG, Panagiotopoulos AZ. Solubility and molecular conformations of n-alkane chains in water. J Phys Chem B. 2009;113:6405–6414. doi: 10.1021/jp811229q.
43. Sauer T, Yorke JA, Casdagli M. Embedology. J Stat Phys. 1991;65:579–616.
44. Widom B, Bhimalapuram P, Koga K. The hydrophobic effect. Phys Chem Chem Phys. 2003;5:3085–3093.
45. Chandler D. Interfaces and the driving force of hydrophobic assembly. Nature. 2005;437:640–647. doi: 10.1038/nature04162.
46. Miller TF, Vanden-Eijnden E, Chandler D. Solvent coarse-graining and the string method applied to the hydrophobic collapse of a hydrated chain. Proc Natl Acad Sci USA. 2007;104:14559–14564. doi: 10.1073/pnas.0705830104.
47. Lum K, Chandler D, Weeks JD. Hydrophobicity at small and large length scales. J Phys Chem B. 1999;103:4570–4577.
48. van der Spoel D, et al. Gromacs: Fast, flexible, and free. J Comput Chem. 2005;26:1701–1718. doi: 10.1002/jcc.20291.
49. Martin MG, Siepmann JI. Transferable potentials for phase equilibria. 1. United-atom description of n-alkanes. J Phys Chem B. 1998;102:2569–2577.
50. Berendsen HJC, Postma JPM, van Gunsteren WF, Hermans J. In: Pullman B, editor. Intermolecular Forces. Dordrecht, The Netherlands: Reidel; 1981. pp. 331–342.
51. Schüttelkopf A, van Aalten D. PRODRG: A tool for high throughput crystallography of protein-ligand complexes. Acta Crystallogr D. 2004;60:1355–1363. doi: 10.1107/S0907444904011679.
52. Essmann U, et al. A smooth particle mesh Ewald method. J Chem Phys. 1995;103:8577–8593.
53. Nosé S. A unified formulation of the constant temperature molecular dynamics methods. J Chem Phys. 1984;81:511–519.
54. Hoover WG. Canonical dynamics: Equilibrium phase-space distributions. Phys Rev A. 1985;31:1695–1697. doi: 10.1103/physreva.31.1695.
55. Parrinello M, Rahman A. Polymorphic transitions in single crystals: A new molecular dynamics method. J Appl Phys. 1981;52:7182–7190.
56. Frenkel D, Smit B. Understanding Molecular Simulation: From Algorithms to Applications. 2nd Ed. San Diego: Academic; 2002. pp. 331–353.
57. Hummer G, Garde S, Paulaitis M, Pratt L. Hydrophobic effects on a molecular scale. J Phys Chem B. 1998;102:10469–10482.
58. Maschhoff KJ, Sorensen DC. In: Proceedings of the Third International Workshop on Applied Parallel Computing, Industrial Computation and Optimization. Wasniewski J, Dongarra J, Madsen K, Olesen D, editors. London: Springer; 1996. pp. 478–486.
