2014 May 28;35(20):1481–1490. doi: 10.1002/jcc.23643

Visualizing energy landscapes with metric disconnectivity graphs

Lewis C Smeeton, Mark T Oakley, Roy L Johnston
PMCID: PMC4285870  PMID: 24866379

Abstract

The visualization of multidimensional energy landscapes is important, providing insight into the kinetics and thermodynamics of a system, as well as the range of structures a system can adopt. It is, however, highly nontrivial, because the number of dimensions required for a faithful reproduction of the landscape is far higher than can be represented in two or three dimensions. Metric disconnectivity graphs provide a possible solution, combining the landscape connectivity information present in disconnectivity graphs with structural information in the form of a metric. In this study, we present a new software package, PyConnect, which can produce both disconnectivity graphs and metric disconnectivity graphs in two or three dimensions. As a test case, we analyze the 69-bead BLN coarse-grained model protein and show that, by choosing appropriate order parameters, metric disconnectivity graphs can resolve correlations between structural features on the energy landscape and the landscape's energetic and kinetic properties.

Keywords: collective variables, protein, coarse-grained models, software, Python

Introduction

The potential energy surface, U(r), of an N-atom chemical system represents the potential energy as a function of the 3N atomic coordinates. The topography of U(r), or energy landscape, determines the structure, kinetics, and thermodynamics of the system,1,2 and its analysis has proved useful in studying a range of physical systems and phenomena, including glasses,3 biomolecules,4–6 and clusters.7–9 For all but the simplest cases, U(r) has far more degrees of freedom than can be visualized conventionally, making it impossible to assess the surface topography directly. One solution to the visualization problem is to partition the landscape into discrete regions and then hierarchically cluster these regions according to some similarity measure. This clustering can then be represented as a tree graph in either two or three dimensions (2D or 3D). A number of hierarchical clustering methods exist in the literature, based broadly on geometry, energetic barriers, or local ergodicity.

In geometrical clustering, regions are clustered according to their structural similarity, which is usually defined by the root-mean-square deviation (RMSd) between them. In this context, regions can correspond either to minima on U(r),10 or to points along a molecular dynamics trajectory.11,12 The structures are clustered either iteratively, by joining each structure to its nearest neighbor until only a single cluster remains,11 or by grouping structures that lie within a critical distance of one another.10

When clustering according to energetic barriers, the landscape is partitioned into basins of attraction, whereby each point on the landscape U(r) is mapped onto a local minimum α, with coordinates rα, by a steepest-descent path.3,13 Alternatively, the landscape can be partitioned using a lumping approach,14 in which energy thresholds are used to group connected regions below the threshold. The similarity measure used for hierarchical clustering is the energy of the barrier that separates any two regions. Starting from the energy of the global minimum, U0, regions are clustered together if they are separated by a barrier with an energy lying between Ui and Ui+1, where Ui+1 = Ui + ΔU and ΔU is the width of the interval. This clustering is repeated until a particular energy threshold, Ut, is reached, or until all the minima are clustered together. Such graphs have come to be referred to as disconnectivity graphs,13,15 and have been used in a number of studies to visualize energy landscapes.13,15–17 Disconnectivity graphs retain both the energies of the minima on U(r) and the barriers that separate them, making them a useful diagnostic for visually assessing the thermodynamic and kinetic behavior of a system.18,19 They can also be used to represent free-energy surfaces by estimating the vibrational entropy of minima and transition states on the landscape from the harmonic superposition approximation.20–22

Clustering landscapes by local ergodicity involves partitioning the landscape into basins about local minima. Equilibration between basins is determined by comparing forward and backward transition rates between states23 or the time-dependent probability distributions of connected basins.24
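The threshold-based lumping used to build a disconnectivity graph can be sketched with a union–find structure: at each threshold, minima joined by transition states lying below it are merged into one superbasin, and each superbasin becomes a node of the graph at that energy. This is a minimal sketch; the function name and the data layout (lists of energies and (energy, min_a, min_b) tuples) are hypothetical and are not the PATHSAMPLE or PyConnect formats, and thresholds are assumed to be given in increasing order.

```python
from collections import defaultdict


def superbasins(min_energies, transition_states, thresholds):
    """Group minima into superbasins at a series of energy thresholds.

    min_energies: energy of each minimum, indexed by minimum id.
    transition_states: (energy, min_a, min_b) tuples, one per transition state.
    thresholds: threshold energies in increasing order.
    Returns {threshold: sorted list of superbasins}, each superbasin being a
    tuple of the ids of minima below that threshold that are connected by
    transition states also lying below it.
    """
    parent = list(range(len(min_energies)))

    def find(m):
        # Union-find root lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    barriers = sorted(transition_states)  # lowest barriers are merged first
    k = 0
    result = {}
    for u in thresholds:
        # Merge the minima joined by every transition state below threshold u.
        while k < len(barriers) and barriers[k][0] < u:
            _, a, b = barriers[k]
            parent[find(a)] = find(b)
            k += 1
        groups = defaultdict(list)
        for m, e in enumerate(min_energies):
            if e < u:
                groups[find(m)].append(m)
        result[u] = sorted(tuple(g) for g in groups.values())
    return result


# Four minima: pairs (0, 1) and (2, 3) merge before all four join.
tree = superbasins([0.0, 0.5, 0.2, 1.0],
                   [(0.8, 0, 1), (1.5, 1, 2), (1.2, 2, 3)],
                   [0.6, 1.0, 1.3, 2.0])
# tree[1.3] -> [(0, 1), (2, 3)]
```

Processing the barriers in sorted order means each transition state is visited once across all thresholds, so the cost is dominated by the initial sort.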

A weakness of the disconnectivity graph method is that it retains no structural information about the minima and thus neglects a large portion of the information contained in the energy landscape. Metric disconnectivity graphs capture some of this structure by defining a metric and then calculating an order parameter from the metric for each minimum of interest on the landscape. The minima can then be plotted along a metric axis perpendicular to the energy axis. Metric information can also be included in other ways, such as by changing the color or thickness of the nodes and edges.25,26 In this article, we use the term metric disconnectivity graph only for graphs whose nodes are organized along a metric axis. A judicious choice of metric captures overall structural trends in the system while ignoring noisy or irrelevant information.

Here, we demonstrate the use of metric disconnectivity graphs, using several metrics, to visualize the energy landscapes of coarse-grained proteins. These disconnectivity graphs are plotted with our new energy landscape visualization package, PyConnect.27

Methodology

BLN model

Metric disconnectivity graph analysis was performed on a database of stationary points for a BLN model protein. This database was generated with discrete path sampling8,28 as implemented in PATHSAMPLE.29 The BLN model30,31 is a coarse-grained, off-lattice protein model in which each protein residue is represented by one of three types of bead: hydrophoBic, hydrophiLic, or Neutral. Here, we use a version of the BLN potential in which the interresidue distances and angles are restrained with stiff springs.32 The beads interact with each other according to

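$$U = \frac{1}{2}K_r\sum_{i=1}^{N-1}\left(R_{i,i+1}-R_e\right)^2 + \frac{1}{2}K_\theta\sum_{i=1}^{N-2}\left(\theta_i-\theta_e\right)^2 + \epsilon\sum_{i=1}^{N-3}\left[A_i\left(1+\cos\phi_i\right)+B_i\left(1+\cos 3\phi_i\right)\right] + 4\epsilon\sum_{i=1}^{N-3}\sum_{j=i+3}^{N}C_{ij}\left[\left(\frac{\sigma}{R_{ij}}\right)^{12}-D_{ij}\left(\frac{\sigma}{R_{ij}}\right)^{6}\right] \tag{1}$$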

where Rij is the distance between beads i and j, θi is the bond angle defined by beads i, i + 1, and i + 2, and φi is the torsional angle defined by beads i to i + 3. The first term is a harmonic bond restraint with Kr = 231.2 ϵσ−2 and Re = σ. The second term is a harmonic angle restraint with Kθ = 20 rad−2 and θe = 1.8326 rad. The third term accounts for the torsional angles along the chain, each defined by four consecutive beads: if two or more of the four beads are N, then A = 0 and B = 0.2; otherwise A = B = 1.2. The fourth term represents long-range, water-mediated hydrophobic interactions between nonbonded pairs. If both beads are B, then C = D = 1. If one bead is L and the other is L or B, then C = 2/3 and D = −1. If either bead is N, then C = 1 and D = 0.32
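To make the four terms concrete, here is a minimal Python sketch of the energy in reduced units (ε = σ = 1), using the parameters quoted above. Several details are assumptions rather than statements from the text: the torsion term is taken as A(1 + cos φ) + B(1 + cos 3φ), the nonbonded sum runs over pairs at least three bonds apart, and C = 2/3 for L–L and L–B pairs is taken from the standard BLN parameterization. The function names are hypothetical, and this is not the implementation used to generate the stationary-point database.

```python
import numpy as np

# Parameters quoted in the text, in reduced units (epsilon = sigma = 1).
K_R, R_E = 231.2, 1.0
K_THETA, THETA_E = 20.0, 1.8326


def torsion_params(beads):
    """(A, B) for the torsion defined by four consecutive bead types."""
    return (0.0, 0.2) if beads.count("N") >= 2 else (1.2, 1.2)


def pair_params(a, b):
    """(C, D) for a nonbonded pair of bead types a and b."""
    if "N" in (a, b):
        return 1.0, 0.0
    if a == b == "B":
        return 1.0, 1.0
    return 2.0 / 3.0, -1.0  # one L and the other L or B (C = 2/3 assumed)


def dihedral(p0, p1, p2, p3):
    """Torsional angle (radians) defined by four consecutive bead positions."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)
    m1 = np.cross(n1, b1 / np.linalg.norm(b1))
    return np.arctan2(np.dot(m1, n2), np.dot(n1, n2))


def bln_energy(x, seq):
    """Energy of a chain with coordinates x (shape (N, 3)) and bead types seq."""
    x = np.asarray(x, dtype=float)
    n = len(seq)
    e = 0.0
    for i in range(n - 1):  # harmonic bond restraints
        e += 0.5 * K_R * (np.linalg.norm(x[i + 1] - x[i]) - R_E) ** 2
    for i in range(n - 2):  # harmonic angle restraints
        u, v = x[i] - x[i + 1], x[i + 2] - x[i + 1]
        cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        e += 0.5 * K_THETA * (np.arccos(np.clip(cos_t, -1.0, 1.0)) - THETA_E) ** 2
    for i in range(n - 3):  # torsions over four consecutive beads
        a, b = torsion_params(seq[i:i + 4])
        phi = dihedral(*x[i:i + 4])
        e += a * (1 + np.cos(phi)) + b * (1 + np.cos(3 * phi))
    for i in range(n - 3):  # nonbonded pairs at least three bonds apart (assumed)
        for j in range(i + 3, n):
            c, d = pair_params(seq[i], seq[j])
            sr6 = 1.0 / np.linalg.norm(x[j] - x[i]) ** 6
            e += 4.0 * c * (sr6 ** 2 - d * sr6)
    return e
```

Because cos φ and cos 3φ are even functions, the sign convention chosen for the dihedral does not affect the torsional energy.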

Though other sequences exist and have been studied, we consider here BLN-69, which consists of 69 beads with the sequence33 B9N3(LB)4N3B9N3(LB)4N3B9N3(LB)5L. BLN-69 has been designed to exhibit a frustrated energy landscape, with a 6-strand β-barrel structure as its global minimum. The model has been shown to have a number of low-energy β-barrel-like structures, which differ from the global minimum by a chain slip along the length of the barrel,4 but are separated by large barriers. Such frustration is absent when considering the “Gō” version of the model (Gō-69), where attractive interactions between pairs of residues that are not in contact in the native state (i.e., the global minimum) are neglected.34,35
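The compact sequence notation can be expanded bead by bead with a few lines of Python, which also confirms that BLN-69 has 69 beads. `expand_bln_sequence` is a hypothetical helper that handles only a single (non-nested) level of parentheses.

```python
import re


def expand_bln_sequence(code):
    """Expand notation such as 'B9N3(LB)4L' into one letter per bead."""
    # Expand parenthesised (non-nested) groups followed by a repeat count.
    code = re.sub(r"\(([BLN]+)\)(\d+)", lambda m: m.group(1) * int(m.group(2)), code)
    if "(" in code or ")" in code:
        raise ValueError("nested or malformed group in sequence")
    # Expand runs such as 'B9'; a letter without a count is a single bead.
    return re.sub(r"([BLN])(\d+)", lambda m: m.group(1) * int(m.group(2)), code)


seq = expand_bln_sequence("B9N3(LB)4N3B9N3(LB)4N3B9N3(LB)5L")
print(len(seq), seq.count("B"), seq.count("L"), seq.count("N"))  # 69 40 14 15
```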

Metric disconnectivity graphs

Disconnectivity graphs and metric disconnectivity graphs are plotted using PyConnect.27 The PyConnect package comprises two components: PCA, which calculates the principal components of molecular systems from PATHSAMPLE29 databases, and PyConnect, which constructs and displays metric disconnectivity graphs. Both programs are written in Python. The graphs are rendered with Matplotlib,36 and users can create disconnectivity graphs and metric disconnectivity graphs in 2D or 3D. PyConnect also provides some cosmetic features, including the ability to label minima and to color minima according to an order parameter or according to the basin in which they reside. Graphs can also be modified interactively from the IPython37 interactive shell. In the disconnectivity graphs produced by PyConnect, the positions of nodes and minima along the x axis are determined by algorithms similar to those used in DISCONNECT,38 another program for producing disconnectivity graphs from databases of minima and transition states. Full details of the algorithms used can be found on the PyConnect website.27 In 2D metric disconnectivity graphs, the position of each minimum on the x axis is defined by a metric; in 3D metric disconnectivity graphs, two metrics are used. The position of each node is defined as the mean of the metric (or metrics) over all minima connected to that node.
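The node-placement rule (each node sits at the mean metric of the minima connected to it) can be sketched as follows. The data layout is an assumption: each node is stored with the indices of its minima, for example the superbasins found at one energy threshold. PyConnect's internal data structures may differ.

```python
import numpy as np


def node_positions(nodes, metrics):
    """Metric-axis position of each node: the mean metric over its minima.

    nodes: mapping from a node id to the indices of the minima below it.
    metrics: shape (n_minima,) for a 2D graph, or (n_minima, 2) for a 3D
             graph that uses two order parameters.
    """
    metrics = np.asarray(metrics, dtype=float)
    return {node: metrics[list(members)].mean(axis=0)
            for node, members in nodes.items()}


pos = node_positions({"root": [0, 1, 2, 3], "left": [0, 1]}, [0.0, 2.0, 4.0, 6.0])
# pos["root"] -> 3.0, pos["left"] -> 1.0
```

Passing a two-column metric array gives each node an (x, y) position, which is how the same rule extends to 3D graphs.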

Native Contact Metric

The native contact metric evaluates, for each minimum α, the ratio Nα/NNC, where NNC is the number of native contact pairs and Nα is the number of contact pairs in minimum α that are also present in the native conformation. Here, a contact is defined as a pair of beads within 1.167σ of each other, excluding pairs that are within three beads of each other in the peptide sequence.4

Hydrogen bonding is important in protein folding, and native contact analysis provides a useful analogue for coarse-grained protein models, which lack explicit hydrogen bonds. Nα/NNC is commonly used as a progress variable in computational studies of protein folding to distinguish between different degrees of partial folding.39
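A minimal sketch of the native contact metric, assuming bead coordinates in units of σ stored as NumPy arrays and reading "within three beads of each other" as |i − j| ≤ 3; the function names are illustrative.

```python
import numpy as np


def contact_set(x, cutoff=1.167, min_sep=3):
    """Pairs (i, j), i < j, closer than cutoff and more than min_sep beads apart."""
    x = np.asarray(x, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    i, j = np.nonzero(dist <= cutoff)
    return {(a, b) for a, b in zip(i.tolist(), j.tolist()) if b - a > min_sep}


def native_contact_fraction(x, x_native, cutoff=1.167, min_sep=3):
    """N_alpha / N_NC: fraction of native contacts that are also present in x."""
    native = contact_set(x_native, cutoff, min_sep)
    return len(contact_set(x, cutoff, min_sep) & native) / len(native)
```

The native contact set is computed once per call here for clarity; for a large database it would be cached, and a native structure with no contacts would need to be handled separately to avoid division by zero.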

RMSd Metric

The RMSd, dαβ, measures the distance between the conformations of minima α and β, rα and rβ, respectively, according to

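$$d_{\alpha\beta} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left|\mathbf{r}_{\alpha,i}-\mathbf{r}_{\beta,i}\right|^{2}} \tag{2}$$

where rα,i is the position of bead i in minimum α.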

Invariance under global translations and rotations is implicit if structures are represented in internal coordinates. When working in Cartesian coordinates, the Kabsch algorithm40 was used to align structures to minimize dαβ. In the RMSd metric, dαβ is calculated between the conformation of each minimum and the conformation of the global minimum, rGM.

Principal Component Metric

The principal component metric is based on principal component analysis (PCA), a statistical procedure for analyzing large, high-dimensional data sets that is commonly used for dimensionality reduction and when the relevant degrees of freedom in a data set are not clear.41 PCA re-expresses a data set in terms of a new basis set, the principal components, which are a linear transformation of the data set's original basis set. The principal components lie along the axes of greatest sample variance: the first principal component, PC1, lies along the axis of greatest variance, the second principal component, PC2, along the axis of second-greatest variance orthogonal to the first, and so on.42

We performed PCA on the set of Nsp stable configurations {rα}, where {rα} are all the local minima connected to the global minimum below a threshold energy Ut. The initial basis sets employed were the 3N-dimensional external Cartesian basis set, {ei}, and an internal basis set of dihedral angles, {ψi}. To remove the periodicity of {ψi}, we used the sines and cosines of the internal dihedrals, {cos ψi, sin ψi}.43 Rotational and translational invariance of {rα} was enforced by implementing McLachlan's best-fit procedure:44

  1. Calculate a reference configuration, the ensemble average 〈r〉 of {rα}, where {rα} is the set of Nsp minima of interest and each configuration in {rα} has its centroid at the origin.

  2. Define a new set {r′α} by rotating each configuration about its origin to be as close as possible to 〈r〉 using the Kabsch algorithm, thus minimizing
    $$\sum_{i=1}^{N}\left|\mathbf{r}'_{\alpha,i}-\langle\mathbf{r}\rangle_{i}\right|^{2} \tag{3}$$
  3. Replace {rα} with {r′α}.

  4. Repeat steps 1–3 until the ensemble average converges to some threshold criterion.

In our study, we used the threshold criterion defined by Komatsuzaki et al.,25 s ≤ 10−8.
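The iterative superposition in steps 1–4 can be sketched as follows. The convergence measure is an assumption: here s is taken as the mean squared change in the ensemble average between iterations, which may differ from the definition of Komatsuzaki et al.; `iterative_superposition` is a hypothetical name.

```python
import numpy as np


def kabsch_rotation(x, ref):
    """Rotation matrix that best superimposes centred x onto centred ref (rows are beads)."""
    u, _, vt = np.linalg.svd(x.T @ ref)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # exclude reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def iterative_superposition(configs, tol=1e-8, max_iter=1000):
    """Align configurations, shape (n_sp, N, 3), to their own ensemble average."""
    r = np.asarray(configs, dtype=float)
    r = r - r.mean(axis=1, keepdims=True)  # centre every centroid on the origin
    avg = r.mean(axis=0)  # step 1: reference configuration <r>
    for _ in range(max_iter):
        # Steps 2-3: rotate each configuration onto <r> and replace the set.
        r = np.stack([c @ kabsch_rotation(c, avg).T for c in r])
        new_avg = r.mean(axis=0)  # step 1 again, for the rotated set
        s = np.mean((new_avg - avg) ** 2)  # assumed convergence measure
        avg = new_avg
        if s <= tol:  # step 4: stop once <r> has converged
            break
    return r, avg
```

Aligning to the ensemble average rather than to a single structure avoids biasing the subsequent PCA toward one arbitrarily chosen reference.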

Hereafter, whether discussing Cartesian or internal coordinates, we define {rα} as the translation-free, rotation-free set of configurations, and {qi} as the basis set, where i is the coordinate index.
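For the internal basis, the periodicity of each dihedral is removed by mapping it to its cosine and sine. A minimal sketch, assuming the dihedrals are stored as an (Nsp × n_dihedrals) array in radians; `dihedral_features` is a hypothetical name.

```python
import numpy as np


def dihedral_features(psi):
    """Map dihedral angles (radians), shape (n_sp, n_dihedrals), to [cos psi, sin psi]."""
    psi = np.asarray(psi, dtype=float)
    return np.concatenate([np.cos(psi), np.sin(psi)], axis=1)


# Two nearly identical conformations whose dihedral straddles the +/-pi branch cut.
f = dihedral_features([[np.pi - 0.01], [-np.pi + 0.01]])
print(np.linalg.norm(f[0] - f[1]))  # about 0.02, although the raw angles differ by about 6.26
```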

To perform PCA, we begin with defining the 3N × Nsp mean-centered configuration matrix, R

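$$\mathbf{R} = \left[\mathbf{r}_{1}-\langle\mathbf{r}\rangle,\ \mathbf{r}_{2}-\langle\mathbf{r}\rangle,\ \ldots,\ \mathbf{r}_{N_{sp}}-\langle\mathbf{r}\rangle\right] \tag{4}$$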

where each column of R is a 3N-dimensional vector corresponding to a stable configuration in the set {rα}. The PCs are the eigenvectors of the 3N × 3N covariance matrix, C,

$$\mathbf{C} = \frac{1}{N_{sp}-1}\,\mathbf{R}\mathbf{R}^{\mathsf{T}} \tag{5}$$

and thus form the basis set {Qi}, where i is the coordinate index, in which C is diagonal. The PCs are calculated by singular value decomposition (SVD), in which the 3N × Nsp configuration matrix R is written as the product

$$\mathbf{R} = \mathbf{W}\mathbf{S}\mathbf{V}^{\mathsf{T}} \tag{6}$$

where W is the 3N × 3N matrix

$$\mathbf{W} = \left[\mathbf{Q}_{1},\ \mathbf{Q}_{2},\ \ldots,\ \mathbf{Q}_{3N}\right] \tag{7}$$

whose columns are the PCs of C, S is a 3N × Nsp matrix with diagonal elements Sii, and V is an Nsp × Nsp orthogonal matrix. Dropping the double index for clarity, Si² is proportional to the variance associated with Qi. The PCs are ordered so that Q1 has the greatest variance, Q2 the second greatest, and so on. The ith principal component metric is calculated by transforming each member of {rα} into the basis set {Qi} and using the value of the ith PC for each minimum as the order parameter.
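The SVD route from R to the principal components, their variances, and the per-minimum order parameters can be sketched in NumPy. The sketch assumes the configurations have already been aligned (for example by the iterative superposition described above) and uses the 1/(Nsp − 1) covariance normalization; it computes the reduced SVD, which yields only the min(3N, Nsp) components that can carry nonzero variance. `pca_svd` is a hypothetical name.

```python
import numpy as np


def pca_svd(configs):
    """PCA of aligned configurations, shape (n_sp, N, 3), via the SVD of R.

    Returns the principal components (columns of W), the fraction of the
    total variance captured by each PC, and the coordinates of every
    configuration in the PC basis (one row per configuration).
    """
    x = np.asarray(configs, dtype=float).reshape(len(configs), -1).T  # 3N x n_sp
    r = x - x.mean(axis=1, keepdims=True)  # mean-centred configuration matrix R
    w, s, _ = np.linalg.svd(r, full_matrices=False)  # R = W S V^T
    variance = s ** 2 / (r.shape[1] - 1)  # eigenvalues of C = R R^T / (n_sp - 1)
    projections = r.T @ w  # value of each PC for each configuration
    return w, variance / variance.sum(), projections
```

The ith principal component metric for minimum α is then `projections[alpha, i - 1]`.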

One can visualize the PCs of a given set {rα} by choosing a reference structure, rref, and adding the Qi of interest to it:

$$\mathbf{r}(\lambda) = \mathbf{r}_{\mathrm{ref}} + \lambda\,\mathbf{Q}_{i} \tag{8}$$

where λ is a progress variable.
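Eq. (8) translates directly into code for generating the frames of a PC animation, such as those in Figures 7 and 8. A minimal sketch; `pc_frames` is a hypothetical helper, and r_ref must be expressed in the same aligned frame as the PCA.

```python
import numpy as np


def pc_frames(r_ref, q_i, lambdas):
    """Structures r(lambda) = r_ref + lambda * Q_i, each reshaped to (N, 3).

    r_ref: reference configuration, shape (N, 3), e.g. the aligned global minimum.
    q_i: one principal component as a flat 3N vector.
    """
    r_ref = np.asarray(r_ref, dtype=float)
    q_i = np.asarray(q_i, dtype=float).reshape(r_ref.shape)
    return [r_ref + lam * q_i for lam in lambdas]


# Two-bead toy example: move both beads along z by lambda.
frames = pc_frames([[0, 0, 0], [1, 0, 0]], [0, 0, 1, 0, 0, 1], [0.0, 2.0])
```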

Isomap Metric

The Isomap metric is based on the Isomap algorithm,45,46 a nonparametric, nonlinear dimensionality reduction technique. Isomap seeks a low-dimensional embedding that preserves, as accurately as possible, the geodesic distances between all pairs of points in the data cloud. The geodesic distance between a pair of points on a manifold is the length of the shortest path between them that lies along that manifold. Isomap assumes that such a low-dimensional manifold exists and that its shape can be estimated from the distribution of points in the data cloud. It approximates the geodesic distance between a pair of points by the length of the shortest path between them that steps only from each point to one of its neighbors.

We applied Isomap to the set of Nsp stable configurations {rα} using the Isomap implementation in the scikit-learn machine learning package.47 As with the principal component metric, rotational and translational invariance of {rα} was enforced by implementing McLachlan's best-fit procedure.

The Isomap algorithm works in three steps:

  1. A weighted graph, G, of {rα} is built, in which each conformation is a node and the k nearest neighbors of each conformation α are joined to it by edges with weight dαβ. Isomap has been shown to be fairly robust to the choice of k;46 in this study we took k = 15.

  2. The shortest path between each pair of conformations through the graph G is determined, and a distance matrix, D, is computed by initializing Dαβ to the edge weight dαβ (or ∞ if α and β are not joined) and then updating Dαβ = min{Dαβ, Dαγ + Dγβ} for γ = 1, …, Nsp. The elements Dαβ are the approximate geodesic distances between conformations α and β.

  3. Classical multidimensional scaling is applied to the matrix D, producing a low-dimensional embedding of the conformational coordinates that best preserves geodesic distances on the manifold.

The ith Isomap metric corresponds to the ith embedded dimension of the low-dimensional manifold of {rα}.
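The three steps can be sketched directly in NumPy. The authors used the scikit-learn implementation (`sklearn.manifold.Isomap`, whose `n_neighbors` parameter corresponds to k); the version below is only an illustrative, dense O(Nsp³) re-implementation. It assumes the edge weights dαβ are Euclidean distances between already-aligned, flattened configurations and that no two configurations are identical.

```python
import numpy as np


def isomap_embed(x, k=15, n_components=2):
    """Isomap embedding of points x, shape (n_sp, n_features)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Step 1: weighted k-nearest-neighbour graph G; non-edges start at infinity.
    geo = np.full((n, n), np.inf)
    np.fill_diagonal(geo, 0.0)
    for a in range(n):
        nbrs = np.argsort(dist[a])[1:k + 1]  # position 0 is a itself
        geo[a, nbrs] = dist[a, nbrs]
        geo[nbrs, a] = dist[a, nbrs]  # keep the graph symmetric
    # Step 2: Floyd-Warshall, D_ab = min(D_ab, D_ag + D_gb) for each gamma.
    for g in range(n):
        geo = np.minimum(geo, geo[:, g:g + 1] + geo[g:g + 1, :])
    if np.isinf(geo).any():
        raise ValueError("neighbour graph is disconnected; try a larger k")
    # Step 3: classical MDS on the squared geodesic distances.
    centre = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centre @ (geo ** 2) @ centre
    evals, evecs = np.linalg.eigh(b)
    top = np.argsort(evals)[::-1][:n_components]
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))


# Points on a semicircle: the 1D embedding "unrolls" the arc.
t = np.linspace(0.0, np.pi, 30)
emb = isomap_embed(np.column_stack([np.cos(t), np.sin(t)]), k=2, n_components=1)
```

On the semicircle, the single embedded coordinate is (up to sign) close to linear in the arc parameter t, whereas a linear projection such as PCA would fold the two ends of the arc together.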

Results and Discussion

BLN-69

For BLN-69, a database containing 141,835 minima and 173,692 transition states was used. The variance captured by the first three Cartesian and dihedral principal components for the sets {rα} connected to the global minimum below energy Ut, where −95.0ε ≥ Ut ≥ −98.0ε, is shown in Table 1.

Table 1.

The variance captured by the first three principal components in the Cartesian, {ei}, and dihedral, {cos ψi, sin ψi}, bases for the Nsp structures in the sublevel sets of minima below threshold energy Ut for BLN-69.

| Ut (ε) | Nsp | Cartesian PC1 | Cartesian PC2 | Cartesian PC3 | Dihedral PC1 | Dihedral PC2 | Dihedral PC3 |
|---|---|---|---|---|---|---|---|
| −95.0 | 6891 | 25.0 | 9.2 | 8.0 | 12.3 | 8.5 | 8.2 |
| −95.5 | 5973 | 25.7 | 8.9 | 8.1 | 12.5 | 8.6 | 8.4 |
| −96.0 | 5135 | 26.0 | 8.7 | 8.2 | 12.3 | 8.9 | 8.6 |
| −96.5 | 4353 | 27.4 | 8.5 | 8.2 | 12.7 | 9.2 | 9.8 |
| −97.0 | 1611 | 37.0 | 10.0 | 7.9 | 12.8 | 10.6 | 9.7 |
| −97.5 | 561 | 21.2 | 19.0 | 10.8 | 15.1 | 13.7 | 11.9 |
| −98.0 | 409 | 25.8 | 16.6 | 10.8 | 15.6 | 14.1 | 12.0 |

For the Cartesian PCs, PC1 captures more of the variance than PC2 for all data sets considered, and substantially more for all but Ut = −97.5ε. PC1 for the sublevel set of minima connected to the global minimum below Ut = −97.0ε has the largest fractional variance, and this threshold was therefore selected for all BLN-69 disconnectivity graphs. The dihedral PCs have a more uniform variance distribution than the Cartesian PCs: the first dihedral PC captures at most 15.6% of the variance for any set {rα} considered in BLN-69 and at most 14.9% in Gō-69. The dihedral PCs are therefore not appropriate metrics for these systems and have not been used to create metric disconnectivity graphs. The set of minima with Ut = −97.0ε is represented as a disconnectivity graph in Figure 1. Figure 2 shows the low-energy minima labeled a–e in Figure 1. Minima b–e are all structurally similar to one another: each adopts a compact β-barrel geometry and differs from the global minimum, a, by a chain-slip, chain-reptation, or twist in the turn regions, with further details given in Table 2.
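The threshold selection rule (choose the sublevel set whose first Cartesian PC has the largest fractional variance) amounts to a simple arg-max over Table 1:

```python
# Variance captured by Cartesian PC1 for BLN-69, copied from Table 1 (Ut in units of epsilon).
pc1_variance = {-95.0: 25.0, -95.5: 25.7, -96.0: 26.0, -96.5: 27.4,
                -97.0: 37.0, -97.5: 21.2, -98.0: 25.8}
best_threshold = max(pc1_variance, key=pc1_variance.get)
print(best_threshold)  # -97.0
```

The rule is a guide rather than an automatic choice: for Gō-69 the raw arg-max (Ut = −52.0ε) was not used, because the largest variances there come from a few unbound, high-energy minima.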

Figure 1.

Disconnectivity graph of BLN-69, Ut = −97.0ε, Nsp = 1611. The color scheme distinguishes between energetic funnels. Labeled minima correspond to the global minimum and to low-energy minima separated from the global minimum and from one another by large kinetic barriers; their structures are shown in Figure 2.

Figure 2.

Structures of the minima labeled in Figure 1: the global minimum (Figure 2a) and low-energy minima separated from the global minimum and from one another by large kinetic barriers (Figures 2b–2e). Energetic and structural details are provided in Table 2. The beads are colored from red to blue (N-terminus to C-terminus).

Table 2.

Energy above the global minimum, ΔU, fraction of native contacts, Nα/NNC, RMSd from the global minimum, and differences in PC1 and PC2 from the global minimum, ΔQ1 and ΔQ2, respectively, for minima b–e.

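| Minimum | ΔU | Nα/NNC | RMSd/σ | ΔQ1 | ΔQ2 | Defect |
|---|---|---|---|---|---|---|
| b | 0.26 | 0.85 | 0.41 | −5.32 | −1.25 | Chain-slip |
| c | 0.38 | 0.87 | 0.21 | −0.02 | −0.76 | Reptation |
| d | 0.67 | 0.86 | 0.40 | −2.73 | −3.51 | Double chain-slip |
| e | 0.92 | 0.77 | 0.54 | −6.54 | −2.61 | Twist |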

The native contact metric (Fig. 3) splits the two largest funnels, each having a distinct fraction of native contacts (the mean fractions of native contacts for the funnels containing minima a and b are 0.91 and 0.81, respectively). Although the native contact metric differentiates between kinetically separated minima, it does not differentiate between minima according to their energies: a number of unstable, high-energy minima have energetically unfavorable turns in the flexible N-bead regions but otherwise satisfy almost all native contacts. Minimum a by definition satisfies all native contacts. Minima b–d are all very similar according to this metric, each satisfying ≈85% of possible native contacts, in spite of their geometries being relatively dissimilar.

Figure 3.

Metric disconnectivity graph of BLN-69, Ut = −97.0ε, Nsp = 1611, with the fraction of native contacts used as an order parameter. The color scheme and labels are as used in Figure 1. [Color figure can be viewed in the online issue, which is available at http://wileyonlinelibrary.com.]

The RMSd metric (Fig. 4) can distinguish between the different major funnels on the surface, with each having its own mean value of the metric (mean RMSd for the funnels containing minima a, c, and b 0.91, and 0.81, respectively). There is also some correlation with the energy of the minima in the green and red funnels, where lower energy corresponds to RMSd values closer to 0. The RMSd metric separates the basin minima into four groups, with minimum c being most similar to the global minimum, as expected from a visual inspection of the structures.

Figure 4.

Metric disconnectivity graph of BLN-69, Ut = −97.0ε, Nsp = 1611, with the RMSd of each structure from the global minimum used as an order parameter, in units of σ. The color scheme and labels are as used in Figure 1.

The PC1 metric (Fig. 5) separates the blue funnel (mean 3.29σ) from the red and green funnels, which overlap (means 1.59σ and 1.60σ, respectively). The purple funnel lies at the boundary of the two, with a mean of −0.86σ. Minima a and c have almost identical values of Q1, while minima d, b, and e have increasingly dissimilar values. PC1 corresponds to a chain-slip between the C and N termini, and minima d, b, and e have chains that have shifted relative to the global minimum in the same direction; this agreement gives confidence that PCA can identify structural features of the energy landscape.

Figure 5.

Metric disconnectivity graph of BLN-69, Ut = −97.0ε, Nsp = 1611, with PC1 of {rα} used as an order parameter, in units of σ. The color scheme and labels are as used in Figure 1.

The PC2 metric (Fig. 6) reveals no obvious correlation between structure and energetics or kinetics: it makes no distinction between the funnels, and the points are fairly evenly distributed along the order parameter. This PC therefore corresponds to variations within all of the funnels rather than to structural differences between them.

Figure 6.

Metric disconnectivity graph of BLN-69, Ut = −97.0ε, Nsp = 1611, with PC2 of {rα} used as an order parameter, in units of σ. The color scheme and labels are as used in Figure 1.

The progression of PC1 of {rα} from λ = −5.2σ to λ = 4.7σ (Fig. 7) corresponds to a chain-slip between the C and N termini. The progression of PC2 of {rα} from λ = −3.7σ to λ = −3.4σ (Fig. 8) corresponds to a twisting of the internal chain sequences.

Figure 7.

Different values of Q1 for Ut = −97.0ε projected onto the structure of the global minimum of BLN-69. For the global minimum, Q1 = 1.96σ. The beads are colored from red to blue (N-terminus to C-terminus). An animated version of this projection is available as Supporting Information.

Figure 8.

Different values of Q2 for Ut = −97.0ε projected onto the structure of the global minimum of BLN-69. For the global minimum, Q2 = −5.5σ. The beads are colored from red to blue (N-terminus to C-terminus). An animated version of this projection is available as Supporting Information.

The first embedded dimension of the Isomap metric (Fig. 9) clearly differentiates between all the colored funnels on the landscape. The mean values for the blue, purple, red, and green funnels are −10.23σ, −2.96σ, 4.12σ, and 8.80σ, respectively. The structure of the graph is similar to that of the PC1 graph (Fig. 5), with the order of the colored funnels and labeled low-energy minima along the metric axis matching. The agreement between these two metrics suggests that the first embedded dimension of the Isomap metric is approximately linear and that, as with the PC1 metric, it corresponds to a chain-slip between the C and N termini.

Figure 9.

Metric disconnectivity graph of BLN-69, Ut = −97.0ε, Nsp = 1611, with the first embedded dimension of {rα} from Isomap analysis used as an order parameter, in units of σ. The color scheme and labels are as used in Figure 1.

As with the PC2 metric, the disconnectivity graph for the second embedded dimension of the Isomap metric (Fig. 10) is difficult to interpret. The overlap of the colored funnels suggests that the second embedded dimension corresponds to some structural variation common to every funnel.

Figure 10.

Metric disconnectivity graph of BLN-69, Ut = −97.0ε, Nsp = 1611, with the second embedded dimension of {rα} from Isomap analysis used as an order parameter, in units of σ. The color scheme and labels are as used in Figure 1.

The information in Figures 5 and 6 is combined in a single 3D metric disconnectivity graph of {rα} projected onto the plane of maximal variance in Figure 11, which plots {rα} for BLN-69 against its first two principal components. Minima a–e are clearly separated in this 3D metric disconnectivity graph.

Figure 11.

3D metric disconnectivity graph of BLN-69, Ut = −97.0ε, Nsp = 1611, plotted with the first two principal components of {rα}, Q1 and Q2, used as order parameters, in units of σ. The color scheme and labels are as used in Figure 1.

Gō-69

For Gō-69, a database containing 75,666 minima and 113,101 transition states was used. The variance captured by the first three Cartesian and dihedral principal components for the sets {rα} connected to the global minimum below energy Ut, where −52.0ε ≥ Ut ≥ −58.0ε, is shown in Table 3.

Table 3.

The variance captured by the first three principal components in the Cartesian, {ei}, and dihedral, {cos ψi, sin ψi}, bases for the Nsp structures in the sublevel sets of minima below threshold energy Ut for Gō-69.

| Ut (ε) | Nsp | Cartesian PC1 | Cartesian PC2 | Cartesian PC3 | Dihedral PC1 | Dihedral PC2 | Dihedral PC3 |
|---|---|---|---|---|---|---|---|
| −52.0 | 5529 | 38.6 | 14.3 | 12.6 | 14.4 | 10.6 | 7.3 |
| −53.0 | 4364 | 38.0 | 13.7 | 12.0 | 13.7 | 11.1 | 7.7 |
| −54.0 | 3188 | 24.5 | 16.0 | 8.0 | 13.3 | 11.6 | 8.2 |
| −55.0 | 2386 | 24.4 | 16.0 | 8.1 | 13.3 | 11.3 | 8.7 |
| −56.0 | 1691 | 23.7 | 15.7 | 7.6 | 14.0 | 12.1 | 9.4 |
| −57.0 | 1185 | 21.6 | 16.3 | 7.7 | 13.9 | 12.7 | 10.4 |
| −58.0 | 739 | 21.5 | 15.3 | 7.7 | 14.9 | 12.9 | 11.4 |

As with the Cartesian PCs of BLN-69, PC1 captures significantly more of the variance than PC2 for all data sets considered. PC1 for the sublevel sets of minima connected to the global minimum below Ut = −52.0ε and Ut = −53.0ε has the largest fractional variance, but these large variances are due to a comparatively small number of unstable, high-energy minima in which one end of the chain has peeled away from the barrel and become unbound. For these sets, PC1 is no longer representative of the distribution of minima on U(r). For this reason, we consider the sublevel set of minima connected to the global minimum below Ut = −54.0ε, for which all the minima have densely packed geometries. This set is represented as a disconnectivity graph in Figure 12. The results for the Isomap metric were fairly ambiguous for Gō-69, with no obvious correlation between the embedded dimensions and the kinetic or energetic structure of the graph, so they have not been included in this work.

Figure 12.

Disconnectivity graph of Gō-69, Ut = −54.0ε, Nsp = 3189.

As there is only a single funnel on the Gō-69 landscape, there are no large kinetic barriers for any of the metrics to differentiate between. The native contact metric (Fig. 13) can partially distinguish between the structures of high- and low-energy minima. As with BLN-69, minima across the whole energy range examined satisfy nearly all native contacts, including unstable, high-energy minima with energetically unfavorable turns in the flexible N-bead regions. The converse is not true, however: all low-energy minima have a high fraction of native contacts, and low fractions of native contacts are found only for high-energy minima.

Figure 13.

Metric disconnectivity graph of Gō-69, Ut = −54.0ε, Nsp = 3189, with the fraction of native contacts used as an order parameter.

For the RMSd metric (Fig. 14), the behavior is similar to that of BLN-69, albeit with a single funnel, with the RMSd from the global minimum increasing with energy.

Figure 14.

Metric disconnectivity graph of Gō-69, Ut = −54.0ε, Nsp = 3189, with the RMSd of each structure from the global minimum used as an order parameter, in units of σ.

The metric disconnectivity graphs in Figure 15 use PC1 and PC2 as order parameters. In the PC1 graph, the majority of minima are centered about the global minimum, with a smaller number of high-energy minima extending to higher values of PC1. PC1 and the fraction of native contacts are well-correlated, as can be seen in the 3D metric disconnectivity graph (Fig. 16). The PC2 metric orders all but a few minima tightly in a rough column about Q2 ≈ 0.1. Those unstable, higher energy minima that are not in that column are structures with a partly unbound C-terminus chain-portion.

Figure 15.

Metric disconnectivity graphs of Gō-69, Ut = −54.0ε, Nsp = 3189, with PC1, Q1 (left), and PC2, Q2 (right), of {rα} used as order parameters, in units of σ.

Figure 16.

Metric disconnectivity graph of Gō-69, Ut = −54.0ε, Nsp = 3189, with PC1 of {rα} and the fraction of native contacts used as order parameters, and colored according to the RMSd of each structure from the global minimum. Q1 and RMSd are in units of σ.

The use of color allows an additional metric to be included on a metric disconnectivity graph. For example, Figure 16 shows the PC1, native contact, and RMSd metrics for Gō-69.

Figure 17 shows the progression of PC1 of {rα} from λ = −2.6σ to λ = 6.1σ, viewed along the axis of the barrel; it corresponds to a sweeping action of the red chain-portion across the face of the white chain-portion.

Figure 17.

Different values of Q1 for Ut = −54.0ε projected onto the structure of the global minimum of Gō-69. For the global minimum, Q1 = −0.6σ. The beads are colored from red to blue (N-terminus to C-terminus). An animated version of this projection is available as Supporting Information.

Figure 18 shows the progression of PC2 of {rα} from λ = −7.6σ to λ = 8.1σ, which corresponds to a “can-can”-like sweeping motion of the free C-terminal end of the chain.

Figure 18.

Different values of Q2 for Ut = −54.0ε projected onto the structure of the global minimum of Gō-69. For the global minimum, Q2 = −0.9σ. The beads are colored from red to blue (N-terminus to C-terminus). An animated version of this projection is available as Supporting Information.

Conclusions

In this study, we have demonstrated how an appropriate order parameter can elucidate the connection between features of the energy landscapes of BLN-69 and Gō-69, such as funnels, and particular structural motifs of the protein, including chain slips and twists in the turn regions. However, the metrics proposed still have shortcomings. The fraction of native contacts and RMSd metrics rely on prior knowledge of the system, namely a reference structure. PCA provides a means to study systems without resorting to chemical intuition, but it still assumes that the point cloud is approximately linear, and it cannot be applied directly to angular coordinates because of their periodicity (the arithmetic mean of 179° and −179° is 0° rather than 180°). PCA also treats all structures as equally important, regardless of energy, which leads to situations such as that for Gō-69, where all the variance in structure was provided by a small number of high-energy, unstable minima.

Isomap allows one to discern low-dimensional, nonlinear manifolds in the data, and does not make the same linearity assumption as PCA. It proved a successful strategy here, distinguishing between the different kinetic structures on the landscape. A useful feature of PCA is the ease with which the principal components can be projected back into the original space, making it possible to visualize what these directions correspond to. In principle, it should be possible to do the same with Isomap, projecting the approximate geodesics of the manifold back into the original space, though we have not implemented this in the work presented here. Other nonlinear dimensionality reduction methods exist in the literature, such as sketch-map,48 locally scaled diffusion map,46,49 and spectral methods,50 which are good candidate metrics for further study.
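
Isomap45 replaces straight-line distances between structures with shortest-path distances through a nearest-neighbor graph, which approximate geodesic distances on the underlying manifold. The sketch below covers only that step, plus recovery of the chain of structures along one shortest path, which is one possible discrete reading of "projecting the approximate geodesics back into the original space". The function names and the choice of k are assumptions; full Isomap would follow this step with classical multidimensional scaling, as implemented for instance in scikit-learn's sklearn.manifold.Isomap.47

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            adj[i][j] = d
            adj[j][i] = d  # symmetrize the neighborhood graph
    return adj


def geodesic_path(adj, source, target):
    """Dijkstra shortest path: approximate geodesic distance and the points visited."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return math.inf, []  # the graph is disconnected for this choice of k
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

Because the returned path is a sequence of actual minima (indices into points), it could be written out frame by frame, much as the PCA projections are; on a curved point cloud its length is noticeably longer than the straight-line distance between its end points.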

Equally, although the data produced by PyConnect are of high quality, their analysis is still fairly qualitative, and further efforts are being made to quantify the observations, for example by using graph-theoretic techniques to analyze and compare tree graphs.51
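
Any graph-theoretic comparison needs the tree itself in machine-readable form. In a disconnectivity graph, the tree is obtained by cutting the landscape at a series of energy levels and, at each level, grouping into one superbasin all minima connected through transition states lying below that level. A minimal union-find sketch of that grouping follows; it assumes minima are given as a list of energies and transition states as (energy, minimum_a, minimum_b) tuples, and the function name is hypothetical rather than part of PyConnect.

```python
def basins_at_levels(min_energies, transition_states, levels):
    """Group minima into superbasins at each energy level (the nodes of the tree).

    min_energies      : list of minimum energies, indexed by minimum id
    transition_states : list of (ts_energy, min_a, min_b)
    levels            : energy levels at which the landscape is cut
    Returns {level: sorted list of basins}, each basin a sorted list of minimum ids.
    """
    result = {}
    for level in sorted(levels):
        parent = list(range(len(min_energies)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        # Merge minima joined by any transition state lying below this level
        for e_ts, a, b in transition_states:
            if e_ts < level:
                parent[find(a)] = find(b)
        groups = {}
        for m, e in enumerate(min_energies):
            if e < level:  # minima above the level do not appear at this level
                groups.setdefault(find(m), []).append(m)
        result[level] = sorted(sorted(g) for g in groups.values())
    return result
```

Each level's list of basins is one row of nodes in the tree, and a basin at one level is the parent of the basins it contains at the next level down; tree-comparison measures would operate on this nested structure.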

Future work should also focus on investigating more realistic small protein systems, such as G-protein52 or cyclic peptides.5,22

Acknowledgments

The computations described in this paper were performed using the University of Birmingham's BlueBEAR HPC service, which provides a High-Performance Computing service to the University's research community. See http://www.birmingham.ac.uk/bear for more details. The authors thank Prof. David Wales for helpful discussions, and Dr. Victor Ruhle and Dr. Jacob Stevenson for advice about implementation of our Python code.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

  1. Video file (1.3 MB, mpg)
  2. Video file (1.3 MB, mpg)
  3. Video file (1.6 MB, mpg)
  4. Video file (617.6 KB, mpg)

References

  1. Wales DJ. Energy Landscapes: Applications to Clusters, Biomolecules and Glasses. Cambridge, UK: Cambridge Molecular Science: Cambridge University Press; 2003.
  2. Schön JC, Jansen M. Zeitschrift für Krist. 2001;216:307.
  3. Stillinger FH, Weber TA. Phys. Rev. A. 1982;25:978.
  4. Oakley MT, Wales DJ, Johnston RL. J. Phys. Chem. B. 2011;115:11525. doi: 10.1021/jp207246m.
  5. Oakley MT, Johnston RL. J. Chem. Theory Comput. 2013;9:650. doi: 10.1021/ct3005084.
  6. Strodel B, Wales DJ. Chem. Phys. Lett. 2008;466:105.
  7. Johnston RL. Dalton Trans. 2003:4193.
  8. Wales DJ. Mol. Phys. 2004;102:891.
  9. Cox G, Berry RS, Johnston RL. J. Phys. Chem. A. 2006;110:11543. doi: 10.1021/jp0630572.
  10. Becker OM. Proteins. 1997;27:213. doi: 10.1002/(sici)1097-0134(199702)27:2<213::aid-prot8>3.0.co;2-g.
  11. García A, Blumenfeld R. Phys. D. 1997;107:225.
  12. Troyer JM, Cohen FE. Proteins. 1995;23:97. doi: 10.1002/prot.340230111.
  13. Becker OM, Karplus M. J. Chem. Phys. 1997;106:1495.
  14. Hoffmann KH, Sibani P. Phys. Rev. A. 1988;38:4261. doi: 10.1103/physreva.38.4261.
  15. Wales DJ, Miller MA, Walsh TR. Nature. 1998;394:758.
  16. Czerminski R, Elber R. J. Chem. Phys. 1990;92:5580.
  17. Middleton T, Hernández-Rojas J, Mortenson P, Wales D. Phys. Rev. B. 2001;64:184201.
  18. Wales DJ. Curr. Opin. Struct. Biol. 2010;20:3. doi: 10.1016/j.sbi.2009.12.011.
  19. Wales DJ, Bogdan TV. J. Phys. Chem. B. 2006;110:20765. doi: 10.1021/jp0680544.
  20. Evans DA, Wales DJ. J. Chem. Phys. 2003a;118:3891.
  21. Evans DA, Wales DJ. J. Chem. Phys. 2003b;119:9947.
  22. Oakley MT, Oheix E, Peacock AFA, Johnston RL. J. Phys. Chem. B. 2013;117:8122. doi: 10.1021/jp4043039.
  23. Lempesis N, Boulougouris GC, Theodorou DN. J. Chem. Phys. 2013;138:12A545. doi: 10.1063/1.4792363.
  24. Sibani P, Schön JC. Euro. Phys. Lett. 1993;22:479.
  25. Komatsuzaki T, Hoshino K, Matsunaga Y, Rylance GJ, Johnston RL, Wales DJ. J. Chem. Phys. 2005;122:84714. doi: 10.1063/1.1854123.
  26. Rylance GJ, Johnston RL, Matsunaga Y, Li C-B, Baba A, Komatsuzaki T. Proc. Natl. Acad. Sci. USA. 2006;103:18551. doi: 10.1073/pnas.0608517103.
  27. Smeeton LC, Oakley MT, Johnston RL. PyConnect. 2014. available at: https://github.com/lsmeeton/pyconnect. Accessed on 17 March 2014.
  28. Wales DJ. Mol. Phys. 2002;100:3285.
  29. Wales DJ. PATHSAMPLE: A Program for Refining and Analyzing Kinetic Transition Networks. 2013. available at: http://www-wales.ch.cam.ac.uk/PATHSAMPLE/. Accessed on 29 July 2011.
  30. Honeycutt JD, Thirumalai D. Proc. Natl. Acad. Sci. USA. 1990;87:3526. doi: 10.1073/pnas.87.9.3526.
  31. Thirumalai D, Guo Z. Biopolymers. 1995;35:137.
  32. Berry RS, Elmaci N, Rose JP, Vekhter B. Proc. Natl. Acad. Sci. USA. 1997;94:9520. doi: 10.1073/pnas.94.18.9520.
  33. Kim S-Y. J. Chem. Phys. 2010;133:135102. doi: 10.1063/1.3494038.
  34. Ueda Y, Taketomi H, Gō N. Biopolymers. 1978;17:1531.
  35. Kim J, Keyes T. J. Phys. Chem. B. 2008;112:954. doi: 10.1021/jp072872u.
  36. Hunter JD. Comput. Sci. Eng. 2007;9:90.
  37. Pérez F, Granger BE. Comput. Sci. Eng. 2007;9:21.
  38. Miller MA. DISCONNECT: A Program for Producing Disconnectivity Graphs. 2013. available at: http://www-wales.ch.cam.ac.uk/DISCONNECT/. Accessed on 29 July 2011.
  39. Wang J, Oliveira RJ, Chu X, Whitford PC, Chahine J, Han W, Wang E, Onuchic JN, Leite VBP. Proc. Natl. Acad. Sci. USA. 2012;109:15763. doi: 10.1073/pnas.1212842109.
  40. Kabsch W. Acta Crystallogr. A. 1978;34:827.
  41. Shlens J. Systems Neurobiology Laboratory, Salk Institute for Biological Studies. 2005. available at: http://rieke-server.physiol.washington.edu/People/Fred/Classes/545/shlens-pca.pdf. Accessed on 23 March 2012.
  42. Riccardi L, Nguyen PH, Stock G. J. Chem. Theory Comput. 2012;8:1471. doi: 10.1021/ct200911w.
  43. Altis A, Nguyen PH, Hegger R, Stock G. J. Chem. Phys. 2007;126:244111. doi: 10.1063/1.2746330.
  44. McLachlan A. Biopolymers. 1984;23:1325. doi: 10.1002/bip.360230716.
  45. Tenenbaum JB, de Silva V, Langford JC. Science. 2000;290:2319. doi: 10.1126/science.290.5500.2319.
  46. Das P, Moll M, Stamati H, Kavraki LE, Clementi C. Proc. Natl. Acad. Sci. USA. 2006;103:9885. doi: 10.1073/pnas.0603553103.
  47. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. J. Mach. Learn. Res. 2011;12:2825.
  48. Ceriotti M, Tribello GA, Parrinello M. J. Chem. Theory Comput. 2013;9:1521. doi: 10.1021/ct3010563.
  49. Rohrdanz MA, Zheng W, Maggioni M, Clementi C. J. Chem. Phys. 2011;134:124116. doi: 10.1063/1.3569857.
  50. Cazals F, Chazal F, Giesen J. In: Nonlinear Computational Geometry. Emiris IZ, Sottile F, Theobald T, editors. New York: Springer; 2010.
  51. Dorogovtsev SN. Lectures on Complex Networks. New York: Oxford University Press; 2010.
  52. Wales DJ, Head-Gordon T. J. Phys. Chem. B. 2012;116:8394. doi: 10.1021/jp211806z.

Articles from Journal of Computational Chemistry are provided here courtesy of Wiley
