Proc Natl Acad Sci U S A. 2001 May 8;98(11):6098–6103. doi: 10.1073/pnas.101030498

Top-down free-energy minimization on protein potential energy landscapes

Bruce W Church 1, David Shalloway 1,*
PMCID: PMC33428  PMID: 11344256

Abstract

The hierarchical properties of potential energy landscapes have been used to gain insight into thermodynamic and kinetic properties of protein ensembles. It also may be possible to use them to direct computational searches for thermodynamically stable macroscopic states, i.e., computational protein folding. To this end, we have developed a top-down search procedure in which conformation space is recursively dissected according to the intrinsic hierarchical structure of a landscape's effective-energy barriers. This procedure generates an inverted tree similar to the disconnectivity graphs generated by local minima-clustering methods, but it fundamentally differs in the manner in which the portion of the tree that is to be computationally explored is selected. A key ingredient is a branch-selection algorithm that takes advantage of statistically predictive properties of the landscape to guide searches down the tree branches that are most likely to lead to the physically relevant macroscopic states. Using the computational folding of a β-hairpin-forming peptide as an example, we show that such predictive properties indeed exist and can be used for structure prediction by free-energy global minimization.


New methods have been developed in recent years for using the global properties of protein potential energy landscapes to analyze overall thermodynamic and kinetic properties of protein ensembles. The method pioneered by Bryngelson and Wolynes (1) uses order parameters to characterize global dynamic properties such as funneling (2–8). Other methods generate hierarchical inverted trees or disconnectivity graphs, whose topologies reflect selected aspects of landscape structure, by finding and hierarchically grouping local minima according to metrics such as Euclidean distance in conformation space (9–13), potential energy barrier height (14–23), or effective-energy barrier (peak of a potential-of-mean-force) height (24–28). Different methods for finding local minima have been used for this purpose, such as steepest descent quenching from molecular dynamics simulations (10, 11, 14, 15), and the threshold method, which progressively extends a search of the basin surrounding a known local minimum until a new minimum of lower energy is found (21–23). By grouping the catchment regions [steepest-descent minimization starting from any point in a catchment region leads to its local minimum (29)] of the local minima, a disconnectivity graph defines a variable-scale decomposition of conformation space: The “root” (the top) is the complete region containing all conformation space, the “leaves” (the bottom) are the local minima catchment regions, and each branch at a given level parameter corresponds to an extended region that is connected by virtue of satisfying a level inequality defined by the graph's metric.

The branches of trees constructed with the effective-energy metric (24–28), which are parameterized by thermal energy or temperature T (in units where kB is unity), correspond to the macroscopic thermodynamic states (macrostates) of the system, i.e., to conformation space regions that kinetically confine the system at T (28, 30, 31). (If entropic effects are not too important, the macrostates also can be approximated as branches of trees constructed by using the potential barrier metric.) The top is reached when T exceeds all effective-energy barriers, the leaves (local minima) are at T = 0, and the level of experimental relevance is (for biological problems) T = Tphys ∼ 310 K ∼ 0.6 kcal/mol. The top macrostate contains the entire space, and macrostates decrease in size and increase in number with decreasing T until they correspond to local minima catchment regions at low T.
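As a quick check of the energy units used for temperature throughout, the conversion of the physiological temperature can be written out explicitly. The numerical value of the gas constant in kcal mol⁻¹ K⁻¹ is supplied here and does not appear in the text:

```latex
T_{\mathrm{phys}} = k_B \times 310\ \mathrm{K}
  \approx \left(1.987\times 10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)\left(310\ \mathrm{K}\right)
  \approx 0.62\ \mathrm{kcal/mol}
```

This agrees with the rounded value of 0.6 kcal/mol quoted for Tphys.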

It also may be possible to use such hierarchical trees for another purpose—to efficiently guide searches for the macrostates of lowest free-energy (the physically relevant subset of macrostates, PRSM) at Tphys; i.e. for global free-energy minimization. Our focus here is the possibility of developing a top-down tree-search method for global minimization that progressively subdivides conformation space and explores only selected branches as T is lowered toward Tphys. Although this type of search implicitly relies on the existence of an underlying tree structure, most of the tree is never computed: the utility of a top-down method lies in its ability to find a global minimum while computing very little of the tree. This requires a branch selection algorithm that can choose, during the annealing process itself, the branches that are most likely to lead to low free-energy macrostates at Tphys. Branches that are not selected are not explored, resulting in computational efficiency, but at the risk of excluding important regions from subsequent search.

Local minima-based methods are not top-down in this context because they all start from local minima, progress by finding more local minima, and build the higher (and larger) branches of a disconnectivity graph by aggregating local minima (see ref. 21 for an example of a global minimization method that moves from local minima to local minima using the “threshold” algorithm and ref. 20 for a method of building a disconnectivity graph starting with knowledge of the global minimum). These methods do not attempt to select the most promising branches by using a branch-selection algorithm. In contrast, a top-down search to nonzero Tphys does not seek any local minima. (Even if a top-down search were used to find local minima by continuing the search down to T = 0, only a few local minima would be computed at the very end of the search process.) Instead, it subdivides branches by using distance-geometry inequalities. Most importantly, this procedure provides a context in which a branch selection algorithm can select the most promising descendent branches for further search.

A top-down search requires: (i) an algorithm for recursive subdivision of macrostates/branches, and (ii) an effective branch selection algorithm. We previously have described a method for recursive subdivision based on the identification of effective-energy barriers during a computational cooling process that starts from the top macrostate (25, 26). Stochastic sampling of the landscape is algorithmically monitored during cooling for the appearance of effective-energy barriers that would trap sampling in subregions (i.e., break computational ergodicity), and thus reduce sampling efficiency, if temperature were further lowered (see ref. 32 for an interesting discussion of this problem). Before T is lowered to a point where trapping occurs, the parent macrostate is subdivided into child macrostates that are separated by the effective-energy barrier. An independent search process that does not need to cross the barrier can then be spawned for each child (e.g., on a coarse-grained parallel computer). This maintains computational ergodicity and efficient sampling at all T at the expense of an increased number of separate search processes.

The second task, effective branch selection, will only be possible if there are inheritance properties of the macrostates at T > Tphys that can be used to partially predict that their descendents will be PRSM members. This is not guaranteed, and it is easy to construct landscapes whose hierarchies have no predictive power (33). Yet it seems plausible that predictive properties exist and can be used to hierarchically solve the global minimization problem with reduced computational cost. Here we empirically explore this key question: Using a β-hairpin-forming octapeptide as an example, we show that this hypothesis is true and compare the utility of different branch-selection algorithms. We conclude that hierarchical top-down searching can be a valuable tool in computational structure prediction.

Methods

We fix bond lengths and angles and sample conformation space by using the protein backbone and side-chain torsion angles, denoted Ω, as variables.

Recursive Computation of Window Functions in Distance Variables.

The fundamentals already have been described (24–26, 30); we summarize here: Each macrostate α is specified by a macrostate window function wα(T; Ω), which is ∼1 within the macrostate and ∼0 outside. Here it is adequate to use “hard” window functions that are either 1 or 0, and to assume that the window functions only change discontinuously (i.e., by subdivision) and are otherwise constant between bifurcation temperatures. The top macrostate, 0, includes all conformation space, so w0 = 1. At lower T there are multiple macrostates {α}T, whose window functions satisfy ∑α wα(T; Ω) = 1, ∀Ω. That is, they completely dissect conformation space.

As T is lowered through a descending sequence of temperatures {Ti}, each of the macrostates of interest is separately tested for bifurcation (see below). When a macrostate α divides into children β and γ, wα is divided into wβ and wγ:

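$$w_\beta(T;\Omega) = w_\alpha(T;\Omega)\,\Theta_{\beta\alpha}(\Omega) \tag{1}$$
$$w_\gamma(T;\Omega) = w_\alpha(T;\Omega)\,\Theta_{\gamma\alpha}(\Omega) \tag{2}$$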

where Θβα(Ω) and Θγα(Ω) equal 0 or 1, and Θβα(Ω) + Θγα(Ω) = 1. Recursive application of Eqs. 1 and 2 yields window functions of the form

$$w_\delta(\Omega) = \Theta_{\delta\delta_1}(\Omega)\,\Theta_{\delta_1\delta_2}(\Omega)\cdots\Theta_{\delta_N 0}(\Omega) \tag{3}$$

where δ1 is the parent of δ, δ2 is the parent of δ1, and so on up to δN, which is a child of the top macrostate.
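The recursive product of step functions can be sketched in a few lines. This is a minimal illustration only: the `Cut` record, the atom-pair keys, and the numerical node values are hypothetical and are not taken from the paper; the sketch assumes that each subdivision is a single distance inequality, as described in the following subsection.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Pair = Tuple[int, int]


@dataclass(frozen=True)
class Cut:
    """One hard subdivision along a single interatomic distance (hypothetical record)."""
    pair: Pair      # atom pair (i, j) whose distance is tested
    node: float     # dividing distance for this bifurcation
    below: bool     # True: keep r < node; False: keep r >= node

    def theta(self, distances: Dict[Pair, float]) -> int:
        """Step function: 1 if the conformation lies on this child's side, else 0."""
        r = distances[self.pair]
        return int(r < self.node) if self.below else int(r >= self.node)


def window(lineage: Sequence[Cut], distances: Dict[Pair, float]) -> int:
    """Hard window function: product of the step functions along the lineage."""
    w = 1  # the top macrostate contains all of conformation space
    for cut in lineage:
        w *= cut.theta(distances)
    return w
```

Two siblings share every cut except the last, where they take opposite sides, so their windows sum to the parent's window for any conformation.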

Detecting Bifurcations.

The equilibrium probability distribution within macrostate α is

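$$p_\alpha(T;\Omega) = \frac{w_\alpha(T;\Omega)\, e^{-V[R(\Omega)]/T}}{\int w_\alpha(T;\Omega')\, e^{-V[R(\Omega')]/T}\, d\Omega'}$$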

where V(R) is the potential in Cartesian coordinates R. [The Jacobian of the change of variables from R to Ω is ignored because it is independent of Ω and factors out of all conformation space integrations (34).] To detect bifurcations of wα we first approximate pα(T; Ω) as a sum of localized distributions whose number and character (i.e., zero, first and second moments) are determined by characteristic packet equations as described in refs. 24 and 30. It is then simple to define window functions that separate these regions (24, 30). However, applying the packet equations in multidimensional form can be costly. Instead we apply them to one-dimensional effective-energy reaction coordinates that are derived by separately projecting pα(T; Ω) onto each of the interatomic distance variables rij(Ω) ≡ |r⃗i(Ω) − r⃗j(Ω)|, where r⃗i is the 3-vector Cartesian coordinate of atom i. Fig. 1 displays one such projected probability distribution, $p_\alpha^{ij}(T; r)$, at two different temperatures. At T = 1.1 kcal/mol the packet equations have only a single solution, so there is only one macrostate, which contains the entire region. When T is lowered to ∼0.6 kcal/mol, $p_\alpha^{ij}(T; r)$ becomes sufficiently bimodal so that two independent child solutions appear corresponding to the two separate concentrations of probability. This signals the bifurcation of the macrostate into two children. The window functions are then defined by Eqs. 1 and 2 with the approximation

$$\Theta_{\beta\alpha}(\Omega) = \theta\big(r_{ij}^{*} - r_{ij}(\Omega)\big) \tag{4}$$
$$\Theta_{\gamma\alpha}(\Omega) = \theta\big(r_{ij}(\Omega) - r_{ij}^{*}\big) \tag{5}$$

where $r_{ij}^{*}$ is the value of rij corresponding to the node of $p_\alpha^{ij}$ (see Fig. 1) and θ is the Heaviside step function.
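To make the idea of locating a node in a projected histogram concrete, the sketch below uses a deliberately simple stand-in criterion: it looks for the deepest valley between the two tallest peaks and accepts it only if the valley is shallow relative to both peaks. This is not the characteristic packet equations of refs. 24 and 30, which are not reproduced here; the function name and the `max_valley_ratio` threshold are assumptions made for illustration.

```python
import numpy as np


def find_node(hist, edges, max_valley_ratio=0.1):
    """Return the dividing distance if the histogram is clearly bimodal, else None.

    Stand-in heuristic (not the packet equations): take the two tallest local
    maxima and require the lowest bin between them to hold at most
    max_valley_ratio of the smaller peak's height.
    """
    p = np.asarray(hist, dtype=float)
    edges = np.asarray(edges, dtype=float)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Pad with -inf so that the first and last bins can also count as peaks.
    padded = np.concatenate(([-np.inf], p, [-np.inf]))
    peaks = [i for i in range(len(p))
             if padded[i + 1] > padded[i] and padded[i + 1] >= padded[i + 2]]
    if len(peaks) < 2:
        return None

    # Two tallest peaks, ordered by position along the distance axis.
    lo, hi = sorted(sorted(peaks, key=lambda i: p[i])[-2:])
    smaller_peak = min(p[lo], p[hi])
    if smaller_peak <= 0.0:
        return None

    valley = lo + int(np.argmin(p[lo:hi + 1]))
    if p[valley] / smaller_peak > max_valley_ratio:
        return None  # dip too shallow to confine sampling
    return float(centers[valley])
```

A conformation with distance below the returned value would be assigned to one child and the rest to the other, mirroring the two step functions above. The heuristic can be fooled by noisy histograms, which is one reason a principled criterion is preferable in practice.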

Figure 1. Probability histograms for BH8 at high and low T. The mean structures of the two children after bifurcation are shown; the double-headed arrow identifies the bifurcating distance.

Because the distance variables are highly redundant, we expect that this procedure will identify each macrostate as an isolated concentration of probability in at least one projected representation. In effect, the distance variables provide a large set of possible reaction coordinates that can be examined for confining effective-energy barriers. No artifactual barriers will be introduced by this approximation, although it is possible (though unlikely) that an effective-energy barrier could be missed. The danger of missing an effective-energy barrier is that computational ergodicity may be broken within the (spuriously undivided) macrostate. We have not yet encountered any case in which this has occurred.

The use of the rij as reaction coordinates for analyzing metastability implicitly assumes that probability equilibrates rapidly in the transverse directions. This will not be true for all rij, but should be true when there is an effective-energy barrier in the coordinate. Because this is the only case in which the ij projection will be used, the assumption is self-consistent. To maintain dynamical significance, the projection must account for the fact that the rij are nonlinear functions of Ω. This is described in ref. 26.

We sampled pα(T; Ω) within each macrostate α by using the Metropolis Monte Carlo method with an anisotropic multivariate wrapped Gaussian transition function (35, 36) and the J-walking algorithm (37). Distance-variable probability histograms describing $p_\alpha^{ij}(T; r)$ were computed (typically with 20 bins) for all the interatomic pairs using every tenth sample point. The fractional energy-fluctuation autocorrelation between sample points decayed to ∼0.5 after 50 steps. Computing these histograms added little cost compared with the cost of evaluating V[R(Ω)], because the interatomic distances were already required to compute the potential. The characteristic packet equations were solved by using trapezoidal integration over the histogram bins.
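A single Metropolis move restricted to one macrostate can be sketched as follows. This is a simplified illustration under stated assumptions: the proposal is an isotropic wrapped Gaussian rather than the anisotropic multivariate form used in the paper, J-walking is omitted, and `energy`, `in_window`, and `sigma` are hypothetical placeholders for the potential, the hard window function, and the step width.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def metropolis_step(omega, energy, T, sigma, in_window, rng):
    """One Metropolis move in torsion-angle space, confined to a hard window.

    omega     : list of torsion angles (radians)
    energy    : callable returning the potential for a list of angles
    T         : temperature in energy units (k_B = 1)
    sigma     : width of the isotropic Gaussian proposal (assumption)
    in_window : callable returning True if a conformation is in the macrostate
    rng       : random.Random instance
    """
    trial = [wrap(w + rng.gauss(0.0, sigma)) for w in omega]
    if not in_window(trial):
        return omega  # moves that leave the macrostate are rejected
    dE = energy(trial) - energy(omega)
    if dE <= 0.0 or rng.random() < math.exp(-dE / T):
        return trial
    return omega
```

Because the proposal is symmetric, rejecting moves that leave the window samples the Boltzmann distribution restricted to that macrostate, provided the starting conformation lies inside it.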

Computing Macrostate Thermodynamic Properties.

Intensive properties, such as mean energy,

$$E_\alpha(T) \equiv \langle V \rangle_\alpha = \int V[R(\Omega)]\, p_\alpha(T;\Omega)\, d\Omega \tag{6}$$

were computed from the Metropolis Monte Carlo sampling at each T = Ti.

Macrostate entropy Sα(T) is extensive and cannot be computed in this manner. Instead, we computed it and macrostate free energy Fα(T) as follows: We fixed the (classical) arbitrary entropy scale by setting F0(Thi) = 0 (i.e., for the top macrostate). When it bifurcated at temperature Tβγ the free energies of its children, β and γ, were calculated from their probability ratio pβ/pγ. In accord with Eqs. 4 and 5, this is

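$$\frac{p_\beta}{p_\gamma} = \frac{\int_0^{r_{ij}^{*}} p_\alpha^{ij}(T_{\beta\gamma}; r)\, dr}{\int_{r_{ij}^{*}}^{\infty} p_\alpha^{ij}(T_{\beta\gamma}; r)\, dr} \tag{7}$$

where rij is the distance variable in which the bifurcation occurs, $p_\alpha^{ij}$ is the parental distribution projected onto it, and $r_{ij}^{*}$ is the node that separates the two children. Then, Fβ and Fγ at Tβγ were calculated by using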

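$$\frac{p_\beta}{p_\gamma} = \exp\!\left[-\frac{F_\beta(T_{\beta\gamma}) - F_\gamma(T_{\beta\gamma})}{T_{\beta\gamma}}\right] \tag{8}$$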

and the conservation of probability relationship

$$e^{-F_\alpha/T_{\beta\gamma}} = e^{-F_\beta/T_{\beta\gamma}} + e^{-F_\gamma/T_{\beta\gamma}} \tag{9}$$

Sβ and Sγ were then calculated by using the thermodynamic relationship

$$F_\alpha(T) = E_\alpha(T) - T\,S_\alpha(T) \tag{10}$$
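The bookkeeping at a bifurcation can be sketched numerically. The snippet assumes the standard relations implied by the text: the children's probability ratio equals the Boltzmann factor of their free-energy difference, their Boltzmann weights sum to the parent's, and F = E − TS with kB = 1. Function and argument names are illustrative.

```python
import math


def child_free_energies(F_parent, T, p_beta, p_gamma):
    """Split a parent's free energy between two children at temperature T.

    p_beta and p_gamma are the (possibly unnormalized) probabilities found on
    either side of the dividing distance. Conservation of probability gives
    exp(-F_child/T) = exp(-F_parent/T) * (p_child / (p_beta + p_gamma)).
    """
    total = p_beta + p_gamma
    F_beta = F_parent - T * math.log(p_beta / total)
    F_gamma = F_parent - T * math.log(p_gamma / total)
    return F_beta, F_gamma


def entropy(E, F, T):
    """Entropy from the thermodynamic relation F = E - T*S (k_B = 1)."""
    return (E - F) / T
```

For example, a parent with F = 0 at T = 0.6 kcal/mol whose probability splits 3:1 yields children whose free energies differ by T ln 3, with the more probable child having the lower free energy.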

As T decreased, the mean energies were updated by using Eq. 6. Entropies were updated by (discretely) integrating

$$\frac{dS_\alpha}{dT} = \frac{1}{T}\,\frac{dE_\alpha}{dT} \tag{11}$$

The derivative of Eα in Eq. 11 was calculated by finite difference. While the derivative could be calculated from the potential energy variance, it is more accurate to use the finite difference because of its faster convergence. Moreover, when the finite difference is used, computational errors in Eα at different T largely cancel when Eq. 11 is integrated.
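The discrete entropy update can be sketched as below, assuming the relation dS = dE/T along the cooling schedule. The midpoint rule used for 1/T is one reasonable choice and is an assumption; the paper does not state which discretization was used.

```python
def integrate_entropy(temps, energies, S_start):
    """Propagate entropy along a cooling schedule using dS = dE / T.

    temps    : temperatures T_0, T_1, ... in the order visited
    energies : mean energies E(T_i) sampled at those temperatures
    S_start  : entropy at temps[0]
    Returns the list of entropies at every temperature.
    """
    S = [S_start]
    for i in range(1, len(temps)):
        dE = energies[i] - energies[i - 1]             # finite difference in E
        T_mid = 0.5 * (temps[i] + temps[i - 1])        # midpoint temperature
        S.append(S[-1] + dE / T_mid)
    return S
```

For a system with constant heat capacity c, where E = cT, the exact entropy change between two temperatures is c ln(T_final/T_initial), which gives a convenient check on the discretization.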

Annealing and Branch Selection.

The algorithm is summarized in Fig. 2. The cooling schedule was empirically chosen to be slow enough so that each macrostate bifurcated before sampling ergodicity was broken:

graphic file with name M18.gif 12

Each member of the set of macrostates being tracked, ℳ(T), was sampled and tested for bifurcations at each Ti as described above by using 128,000 sample points, a value that gave ΔEα(T)/T ∼ 0.05 for all macrostates (where ΔEα is the standard error of the mean). To handle multifurcations, after each bifurcation, the children were resampled and tested for additional bifurcations at the same temperature. The thermodynamic parameters of the children were computed by partitioning the parental sample points. The parent was replaced in ℳ(T) by its children. The branch-selection algorithm then was applied to reduce the number of macrostates in ℳ(T) according to the specified criterion.

Figure 2. Algorithm. ℳ(T) is the set of macrostates that are being tracked at temperature T; α refers to a macrostate and β and γ to its children. ℳ(T) is first expanded by identifying the subset of macrostates that undergo bifurcation, ℳ′, and recursively bifurcating them and their children until no bifurcations remain. Bifurcations expand ℳ(T); the branch-selection algorithm is used to prune macrostates ζ before T is lowered according to the cooling schedule.
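The expand-then-prune loop can be sketched as follows. This is a structural outline only: `split` stands in for sampling plus bifurcation detection, `score` for whichever branch-selection criterion is used (lower is better), and `max_tracked` for the number of branches retained; all three names are hypothetical.

```python
def anneal(root, temps, split, score, max_tracked):
    """Top-down search: expand macrostates by bifurcation, then prune.

    root        : the top macrostate
    temps       : descending cooling schedule
    split       : callable (macrostate, T) -> () or a tuple of children
    score       : callable (macrostate, T) -> float, lower is better
    max_tracked : number of macrostates kept after pruning at each T
    """
    tracked = [root]
    for T in temps:
        # Expansion: keep bifurcating at this T until nothing splits,
        # which also handles multifurcations.
        queue, expanded = list(tracked), []
        while queue:
            macrostate = queue.pop()
            children = split(macrostate, T)
            if children:
                queue.extend(children)       # children are re-tested at the same T
            else:
                expanded.append(macrostate)
        # Pruning: the branch-selection step discards the worst-scoring branches.
        expanded.sort(key=lambda m: score(m, T))
        tracked = expanded[:max_tracked]
    return tracked
```

Each retained macrostate is independent of the others once it has been created, so in a parallel setting the calls to `split` for different macrostates could be dispatched to separate workers.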

Number of Contacts.

We defined the number of nonlocal contacts NC,α(T) for macrostate α as the number of rij for pairs separated by more than one torsion angle that had

graphic file with name M19.gif 13

where the reference distance is the van der Waals contact distance for pair ij. The probability-weighted number of contacts

$$\langle N_C \rangle(T) = \sum_{\alpha} p_\alpha(T)\, N_{C,\alpha}(T) \tag{14}$$

is a T-dependent estimator of overall compactness.

Estimated Number of Macrostates.

Not all macrostates were computed, but their T-dependent total number NM(T) was estimated (assuming that the rate, in T, of bifurcation is similar for the unobserved and observed branches) by calculating the observed average rate of bifurcation, $g(T) \equiv d \log N_M^{\mathrm{obs}}/dT$, and integrating

$$\frac{d \log N_M(T)}{dT} = g(T) \tag{15}$$

using the boundary condition NM(Thi) = 1.
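One way to carry out this estimate on a discrete cooling schedule is sketched below. It assumes that the observed bifurcation rate is summarized, at each temperature step, by the number of tracked macrostates before and after expansion, and that the logarithm of the total count grows by the logarithm of that ratio. This discretization and the argument names are assumptions, not the paper's stated procedure.

```python
import math


def estimate_total_macrostates(n_before, n_after):
    """Estimate the total number of macrostates along a cooling schedule.

    n_before[i] : tracked macrostates entering temperature step i
    n_after[i]  : macrostates after all bifurcations at step i (before pruning)
    Starts from a single macrostate at the highest temperature and assumes
    unobserved branches bifurcate at the same average rate as observed ones.
    """
    log_n = 0.0  # log of 1: a single top macrostate
    estimates = []
    for before, after in zip(n_before, n_after):
        log_n += math.log(after / before)  # observed growth in log-count
        estimates.append(math.exp(log_n))
    return estimates
```

Because only ratios enter, pruning between steps does not bias the estimate as long as the retained branches are representative, which is the assumption stated in the text.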

Computation.

Tree analysis was performed with coarse-grained parallelization on a cluster of Pentium processors using a master-slave configuration. Slaves computed individual macrostate branches independently as scheduled by the master according to the branch-selection algorithm. Therefore, interprocessor communication was minimal, and parallelization efficiency was essentially 100%.

Results and Discussion

Top-Down Discovery of the BH8 Macrostate Tree.

To provide a model for examining different branch-selection algorithms, an extensive macrostate tree was generated by using the top-down method for the BH8 octapeptide (ITVNGKTY), a peptide designed to fold into a β-hairpin (38). We used the ECEPP/3 potential (39), an all-atom potential with fixed bond lengths and bond angles, in torsion-angle coordinates, Ω, augmented by empirical solvation based on solvent-accessible surface area (40). As a benchmark for subsequent analysis, a large number of macrostates (all those having equilibrium probability ≥ 10⁻³) were computed, even though most would not be computed in an actual free-energy global minimization run guided by a branch-selection algorithm.

The Gibbs–Boltzmann distribution was sampled by using a modified Metropolis algorithm (see Methods), starting at a temperature (Thi = 35 kcal/mol) well above the maximum ECEPP/3 barriers to individual torsion-angle rotation. Macrostate thermodynamic properties were computed as T was gradually lowered (see Eq. 12). In addition, the effective-energy functions that resulted from projecting the Gibbs–Boltzmann probability distribution onto each of the interatomic-distance variables were algorithmically monitored for signals of macrostate bifurcation: When a bifurcation temperature was reached at which an effective-energy barrier appeared that satisfied the bifurcation conditions (which imply macrostate metastability), the macrostate was subdivided by a distance-variable inequality (see Methods for details). By this means, large effective-energy barriers were detected during sampling at high T, and small barriers were detected at lower T.

For example, Fig. 1 shows the projected probability histograms for rThr2Cγ−Thr7Cγ, the distance between the Cγ atoms of Thr-2 and Thr-7 for one macrostate at two different temperatures. The histogram is effectively unimodal at T = 1.1 kcal/mol (i.e., the small dip in probability between the modes does not restrict the transition rate), but is sufficiently bimodal at its bifurcation temperature, 0.6 kcal/mol, to satisfy the bifurcation conditions. Thus, at this temperature the parent macrostate α was subdivided, using rThr2Cγ−Thr7Cγ as a reaction coordinate, into children β and γ centered around rThr2Cγ−Thr7Cγ ∼4.5 Å and rThr2Cγ−Thr7Cγ ∼8 Å.

Because of the partial redundancy between different distance variables [resulting from the fact that there are O(N²) distance variables but only O(N) degrees of freedom, where N is the number of atoms], a single effective-energy barrier (i.e., having a unique location in the internal coordinate or Cartesian space) often will manifest as probability gaps in multiple distance variables. When this happens, the bifurcating reaction coordinate will be the one that first satisfies the bifurcation conditions. But this choice is somewhat arbitrary (and could be influenced by numerical details). A different choice would result in a slightly different boundary definition. However, because the boundaries are only used for coarse-graining, this is not a problem: small differences will only affect the negligible probabilities located in the transition regions (30, 41) between macrostates and will not affect thermodynamic properties. For example, the bifurcation illustrated in Fig. 1 happened to use the Thr-2 Cγ–Thr-7 Cγ distance, but there would not have been a significant difference in the macrostate boundary (when projected onto the internal coordinate space) if the Thr-2 Cβ–Thr-7 Cβ distance had been used instead. Because the fluctuations in rThr2Cγ−Thr7Cγ and rThr2Cβ−Thr7Cβ are correlated, the effective-energy barriers in both distance variables are simultaneously removed when the parental macrostate is subdivided by using either distance variable.

Multiple recursive dissections as T was lowered yielded macrostate specifications that were products of inequality constraints involving multiple distance variables (see Eq. 3). Computational sampling of the macrostates converged rapidly at all T because the bifurcation procedure ensured that there were never any significant barriers to sampling within a single macrostate.

Properties of the Macrostate Probability Tree.

We first analyzed the hierarchical organization of the macrostates by plotting the macrostate probabilities pα(T) as a function of T (Fig. 3). Each continuous line segment is a macrostate branch, and each macrostate bifurcation corresponds to a fork. This tree provides significant insight into the underlying structure of the potential energy landscape and illustrates some features that will be common in all cases: (i) At high T (here, >25 kcal/mol) a single macrostate contains all the ensemble probability. (ii) The number of macrostates, NM(T), computed by integrating Eq. 15, increases geometrically with decreasing T (see Inset, Fig. 3), and probability is distributed between many macrostates in the temperature midrange. (iii) NM(T) continues to increase with decreasing T until, as T → 0, each steepest-descent catchment region corresponds to a macrostate. (iv) As T → 0, the macrostate that contains the energy global minimum (whose trajectory is the black line in Fig. 3) captures all the probability. This will not necessarily correspond to the macrostate that contains the most probability at Tphys (i.e., the folded state, if there is one). (v) The PRSM is not huge for peptides and proteins that assume a folded state. For example, 93% of the BH8 probability at Tphys is contained in only four macrostates.

Figure 3. Probability macrostate tree for BH8. The trajectories leading to PRSM members are highlighted. The peptide figures show the average macrostate conformations at three different temperatures on the trajectory that leads to the most probable macrostate (i.e., the native state). The PRSM trajectories are darkened. The inset plots the T-dependent number of macrostates NM(T) (Eq. 15) and the probability-weighted mean number of nonlocal atom pairs in contact 〈NC〉(T) (Eq. 14).

The nonuniform variation of NM with T (Fig. 3 Inset) suggests that the structure of the BH8 potential energy landscape can be qualitatively classified into four different temperature regimes. Analyzing the bifurcations in these regimes indicates that: (i) The burst of bifurcations at T ∼ 25 kcal/mol is associated with energy barriers imposed by the rigid covalent geometry used by ECEPP/3. These only affect the local structure of the molecule. (ii) NM increases only slowly until T ∼ 4 kcal/mol where attractive forces become strong enough to initiate collapse with a consequent increase in the mean number of atom pairs in contact, 〈NC〉 (see Eq. 14). Below this “transition temperature” the van der Waals attraction becomes significant and probability can get trapped in an increasing number of dynamic catchment regions. (iii) The rate of increase decreases for T ∼ 1.5 kcal/mol, probably because most van der Waals contacts have already been made.

Branch-Selection Algorithms.

All branches having pα ≥ 10⁻³ are plotted in Fig. 3 for reference. But as discussed above, our goal is to determine whether the PRSM can be found while computing only a small fraction of the branches, selected during the annealing process. The BH8 macrostate tree provides an illustrative example: Because the PRSM members have the lowest Fα(Tphys), it was plausible a priori that PRSM ancestors might be identified at high T as the states of lowest Fα(T). But Fig. 3 shows that, for 1.4 < T < 2 kcal/mol, some of the PRSM trajectories pass through temperature regions where they have very small probability (i.e., high Fα). Therefore, they would not be followed by a low-Fα branch-selection algorithm (i.e., one that kept only a fixed number of the lowest Fα macrostates at each T) unless a large number of trajectories were followed.

We examined two additional branch selection strategies, branch selection based on: (i) low Eα (low energy), which ignores entropy in making predictions, and (ii) low Ξα (low exergy, Ξα ≡ Eα − TphysSα; ref. 42), which gives some weight to entropy, but not as much as does the low-Fα strategy. To compare the predictive power of these three strategies, we determined the trajectory subsets that emerged when only 30 trajectories were followed by using either Fα, Eα, or Ξα as the branch-selection criterion (dark trajectories in Fig. 4 Left, Middle, and Right, respectively). In this test, success is measured by the number of high-probability (at Tphys) macrostates that are tracked by each branch-selection method. Clearly, the exergetic selection has a much stronger propensity to track macrostates that have low free energy at Tphys; thus it provides a superior branch-selection algorithm to both energy and free energy. Similar analysis of the pentapeptide Met-enkephalin showed that Ξα is a good predictor for this case as well (data not shown).
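The three selection criteria differ only in how much weight the entropy receives, which the following sketch makes explicit. It assumes that exergy is the mean energy minus Tphys times the entropy, with both evaluated at the current annealing temperature; the `Macrostate` record and the example numbers are invented for illustration.

```python
from dataclasses import dataclass

T_PHYS = 0.6  # kcal/mol, the physiological temperature in energy units


@dataclass
class Macrostate:
    name: str
    E: float  # mean energy at the current temperature
    S: float  # entropy at the current temperature


def free_energy(m, T):
    """F = E - T*S, evaluated at the current annealing temperature T."""
    return m.E - T * m.S


def exergy(m, T_phys=T_PHYS):
    """Exergy: E - T_phys*S, weighting entropy by the target temperature."""
    return m.E - T_phys * m.S


def select(macrostates, key, n_keep):
    """Keep the n_keep macrostates with the lowest value of the criterion."""
    return sorted(macrostates, key=key)[:n_keep]
```

At a high annealing temperature a diffuse, high-entropy state can have the lowest free energy even though a compact, low-energy state will dominate at Tphys. Ranking by exergy discounts the entropy to its Tphys weight and so can favor the compact state earlier in the search.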

Figure 4. Branch-selection strategies. The BH8 macrostate tree shown in Fig. 3 is replotted in gray. The subsets of 30 trajectories that were followed by a trajectory-selection algorithm using free energy Fα(T) (a), mean energy Eα(T) (b), and exergy Ξα(T) (c) are darkened. The PRSM corresponds to the four trajectories with lowest Ξα (equivalently, lowest Fα) states at Tphys.

Next Steps.

This study provides proof of principle for top-down free-energy minimization, but efficiency will have to be increased for much larger problems. Many improvements are possible. For example, instead of monitoring effective-energies for each of the distance variables, because distance variable redundancy grows with the number of atoms, for large systems it should be sufficient to consider only a representative subset of distance variables that includes an appropriately selected mix of small and large distances. For example, including all atom pairs that were separated by 1, 2, 4, … covalent bonds would result in a representative subset having only O(N log N) distance variables, where N is the number of atoms. It also may be possible to use smoothing methods (33) to more rapidly approximate branches at higher temperatures, and to combine these with principal-component and -coordinate methods (13, 17) to eliminate inessential degrees of freedom. And the accuracy (and cost) of numerical integrations at T > Tphys can be adaptively relaxed consistent only with the need to detect bifurcations and apply the branch-selection criteria.
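The suggested subset of distance variables can be illustrated on a molecular graph. The sketch below keeps atom pairs whose covalent-bond separation is a power of two; the adjacency-dictionary representation and function names are assumptions, and the breadth-first search used here to find separations is itself quadratic, so it only demonstrates the size of the resulting subset.

```python
from collections import deque


def bond_separations(adjacency, source):
    """Breadth-first search: number of covalent bonds from source to each atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def power_of_two_pairs(adjacency):
    """Atom pairs (i, j), i < j, separated by 1, 2, 4, 8, ... covalent bonds."""
    pairs = []
    for i in adjacency:
        for j, d in bond_separations(adjacency, i).items():
            # For d >= 1, d is a power of two exactly when (d & (d - 1)) == 0.
            if i < j and (d & (d - 1)) == 0:
                pairs.append((i, j))
    return pairs
```

For an unbranched chain of 16 atoms this keeps 15 + 14 + 12 + 8 = 49 of the 120 possible pairs, and the fraction retained shrinks as the chain grows.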

The most significant efficiency increases probably will come from improved branch-selection strategies: The ones tested here simply compare macrostate properties at the same T, pick those of highest rank, and discard the others. But this all-or-nothing approach tends to concentrate excessively on closely related macrostates and does not allow for reexamination of previously discarded macrostates. It should be possible for a more sophisticated algorithm to probabilistically allocate computational effort and to dynamically adjust the balance between depth and breadth of search while simultaneously searching macrostates to different depths in T. In addition, thermodynamic branch-selection parameters can be augmented with database-derived empirical parameters such as Ramachandran and secondary structure propensities that might be more powerful at high T. It also will be interesting to explore the possibility of making the potentials themselves T-dependent to improve branch selection without affecting the Tphys behavior.

Potentials with empirical solvation, such as used here, have been successful in predicting the structures of peptides containing up to about 60 amino acids (43). And they are currently useful for perturbative folding problems such as refining experimentally-determined structures or homology modeling predictions. Although we have used an ab initio problem as an example, the top-down approach also can be used for perturbative folding. As with all potential energy-based methods, the accuracy of top-down searches ultimately will depend on the development of improved potentials. The hierarchical approach can assist this development in two ways. In addition to helping perform the global minimization required to determine the predictions of a potential function, macrostate trees may help by providing a meaningful measure of the distance between the experimental and potential-predicted macrostates: The temperature at which ancestors first diverge (i.e., analogous to the time of evolutionary divergence) may be more valuable than conventional rms deviation measures for systematically improving potential energy performance.
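The divergence-temperature measure can be sketched for two macrostates whose ancestry is known. The lineage representation below, a list of (label, bifurcation temperature) pairs ordered from the top of the tree downward, is a hypothetical data layout chosen for illustration, as are the labels and temperatures in the usage example.

```python
def divergence_temperature(lineage_a, lineage_b):
    """Return the temperature at which two macrostate lineages first separate.

    Each lineage lists (label, T_birth) pairs starting with a child of the top
    macrostate; T_birth is the bifurcation temperature at which that macrostate
    was created. Siblings share the same T_birth, so the first differing label
    marks the divergence. Returns None if the lineages never differ over their
    common length (identical, or one is an ancestor of the other).
    """
    for (label_a, t_birth), (label_b, _) in zip(lineage_a, lineage_b):
        if label_a != label_b:
            return t_birth
    return None
```

A predicted macrostate that shares ancestry with the experimental one down to a low temperature would count as close, while one that separated near the top of the tree would count as distant, regardless of their rms deviation.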

Although not the main focus here, we note that, like hierarchical methods based on grouping local minima, macrostate analysis provides the information needed for a thermodynamic analysis of ensembles and for an approximate master equation description of the dynamics of folding and conformational change. It has the advantage that branch selection can be used to restrict computational effort to the branches of physical relevance (i.e., significant probability at Tphys). Moreover, in the top-down approach there is no need to separately search for saddle points and reaction coordinates between all pairs of catchment regions, because the bifurcation temperatures estimate the barrier activation free energies. If more accurate results are needed, the bifurcating distance variables and (the negative logarithms of) their probability histograms provide reaction coordinates and effective-potential functions, respectively, for computing isothermal folding at Tphys. By projecting the kinetic description in the macrostate basis onto a reduced representation by using order parameters such as density and number of native contacts, it should be possible to examine the dynamics predicted by the macrostate tree for funneling and related properties (3, 8).

It is possible that the hierarchical inheritance property of exergy that makes it a useful branch-selection parameter is a general statistical consequence of the fact that the peptide potential energy function is a sum of a large number of semi-independent terms; or this may depend on other specific properties. Nymeyer et al. (44), comparing three lattice model systems, recently showed that the number of native contacts was a useful order parameter for dynamical folding-rate calculations only when the potential energy landscape possessed funneling properties defined by the relationship between the glass and folding temperatures. It seems plausible that funneling properties would favor accurate branch-selection, though it is not evident that they are required for it. Hierarchical analysis of many systems will be needed to study this and to determine whether there are prerequisites, beyond semiseparability of the potential, that are required for effective branch selection.

In summary, top-down macrostate tree analysis provides a potentially advantageous alternative to local minimum-based approaches for analyzing the hierarchical structure of protein energy landscapes. It naturally includes entropic as well as energetic effects and can identify and exploit hidden hierarchical properties in new types of global search procedures. Developing further improved branch-selection algorithms and understanding how they perform as protein size increases are important future tasks.

Acknowledgments

We thank J. Gans for countless valuable discussions, S. Berry for a critical review of the manuscript, R. Elber, J. Straub, D. Thirumalai, and P. Wolynes for comments on the manuscript, and National Science Foundation Grant CCR-9988519 and the Intel Corp. for financial support.

Abbreviation

PRSM, physically relevant subset of macrostates

Footnotes

This paper was submitted directly (Track II) to the PNAS office.

Within the subclass of methods that use an energetic or temperature metric, three different measures have been used: (i) T = ΔE/kB, where ΔE is the transition state energy barrier (14–23), (ii) the T at which the transition rate equals a specified value (28), and (iii) the T at which the transition rate equals a specified fraction of the macrostate relaxation rate (24–26). They all generate roughly similar trees but differ in detail; each has advantages depending on the intended application. Any one could be used for global minimization, but we find the third to be most convenient. In contrast, the best hierarchy for analyzing dynamics by a master equation would be one that used inter-macrostate transition rates at Tphys as a metric. Because of the complicated nonlinear relationship between transition rates and temperature [when the temperature dependence of ΔE(T) and ΔS(T) is considered], this hierarchy may differ in detail from the hierarchies studied to date.
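As an illustration of how the three temperature measures relate, the block below writes them out under one simplifying assumption that is not made in the text: an Arrhenius rate with a temperature-independent prefactor A. The symbols A, k*, f, and λ_relax are introduced here only for this sketch.

```latex
\begin{align}
  % (i) barrier height expressed as a temperature
  T_{\mathrm{(i)}} &= \frac{\Delta E}{k_B} \\
  % (ii) rate equals a specified value k^*, assuming k(T) = A e^{-\Delta E/(k_B T)} with A > k^*
  A\,e^{-\Delta E/(k_B T_{\mathrm{(ii)}})} = k^{*}
  \;&\Longrightarrow\;
  T_{\mathrm{(ii)}} = \frac{\Delta E}{k_B \ln\!\left(A/k^{*}\right)} \\
  % (iii) rate equals a specified fraction f of the macrostate relaxation rate
  k\!\left(T_{\mathrm{(iii)}}\right) &= f\,\lambda_{\mathrm{relax}}\!\left(T_{\mathrm{(iii)}}\right)
\end{align}
```

Under this assumption, measure (ii) is measure (i) rescaled by the constant factor 1/ln(A/k*), so the two rank barriers in the same order whenever A is the same for every barrier. Differences in detail would then be expected when the prefactor (for example, through ΔS) varies from barrier to barrier, or, for measure (iii), when the relaxation rate itself depends on temperature.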

References

  • 1. Bryngelson J D, Wolynes P G. Proc Natl Acad Sci USA. 1987;84:7524–7528. doi: 10.1073/pnas.84.21.7524.
  • 2. Boczko E M, Brooks C L. Science. 1995;269:393–396. doi: 10.1126/science.7618103.
  • 3. Wolynes P G, Onuchic J N, Thirumalai D. Science. 1995;267:1619–1620. doi: 10.1126/science.7886447.
  • 4. Shakhnovich E I. Curr Opin Struct Biol. 1997;7:29–40. doi: 10.1016/s0959-440x(97)80005-x.
  • 5. Plotkin S S, Wang J, Wolynes P G. Phys Rev E. 1996;53:6271–6296. doi: 10.1103/physreve.53.6271.
  • 6. Plotkin S S, Wang J, Wolynes P G. J Chem Phys. 1997;106:2932–2948.
  • 7. Plotkin S S, Wang J, Wolynes P G. Physica D. 1997;107:322–325.
  • 8. Onuchic J N, Luthey-Schulten Z, Wolynes P G. Annu Rev Phys Chem. 1997;48:545–600. doi: 10.1146/annurev.physchem.48.1.545.
  • 9. Stillinger F H, Weber T A. Science. 1984;225:983–989. doi: 10.1126/science.225.4666.983.
  • 10. Noguti T, Gō N. Proteins. 1989;5:97–103. doi: 10.1002/prot.340050203.
  • 11. Troyer J M, Cohen F E. Proteins. 1995;23:97–110. doi: 10.1002/prot.340230111.
  • 12. Daura X, van Gunsteren W F, Mark A. Proteins Struct Funct Genet. 1999;34:269–280. doi: 10.1002/(sici)1097-0134(19990215)34:3<269::aid-prot1>3.0.co;2-3.
  • 13. Elmaci N, Berry R S. J Chem Phys. 1999;110:10606–10622.
  • 14. Czerminski R, Elber R. J Chem Phys. 1990;92:5580–5601.
  • 15. Becker O M, Karplus M. J Chem Phys. 1997;106:1495–1517.
  • 16. Becker O M. Proteins Struct Funct Genet. 1997;27:213–226. doi: 10.1002/(sici)1097-0134(199702)27:2<213::aid-prot8>3.0.co;2-g.
  • 17. Becker O M. J Comp Chem. 1998;19:1255–1267.
  • 18. Wales D J, Miller M A, Walsh T R. Nature (London) 1998;394:758–760.
  • 19. Doye J P, Miller M A, Wales D J. J Chem Phys. 1999;110:6896–6906.
  • 20. Miller M A, Wales D J. J Chem Phys. 1999;111:6610–6616.
  • 21. Schön J, Putz H, Jansen M. J Phys Condens Matter. 1996;8:143–156.
  • 22. Schön J. Ber Bunsenges. 1996;100:1388–1391.
  • 23. Schön J, Sibani P. Europhys Lett. 2000;49:196–202.
  • 24. Orešič M, Shalloway D. J Chem Phys. 1994;101:9844–9857.
  • 25. Church B W, Orešič M, Shalloway D. In: Tracking Metastable States to Free-Energy Global Minima, DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Pardalos P, Shalloway D, Xue G, editors. Vol. 23. Providence, RI: Am. Math. Soc.; 1996. pp. 41–64.
  • 26. Church B W, Ulitsky A, Shalloway D. Adv Chem Phys. 1999;105:273–310.
  • 27. Ball K D, Berry R S. J Chem Phys. 1998;109:8541–8556.
  • 28. Ball K D, Berry R S. J Chem Phys. 1998;109:8557–8572.
  • 29. Stillinger F H, Weber T A. Phys Rev A. 1982;25:978–989.
  • 30. Shalloway D. J Chem Phys. 1996;105:9986–10007.
  • 31. Sherrington D. Physica D. 1997;107:117–121.
  • 32. Straub J E, Thirumalai D. Proc Natl Acad Sci USA. 1993;90:809–813. doi: 10.1073/pnas.90.3.809.
  • 33. Shalloway D. In: Variable-Scale Coarse-Graining in Macromolecular Global Optimization. Biegler L, Coleman T, Conn A R, Santosa F, editors. New York: Springer; 1997. pp. 135–161.
  • 34. Gō N, Scheraga H A. Macromolecules. 1976;9:535–542.
  • 35. Vanderbilt D, Louie S G. J Comp Phys. 1984;56:259–271.
  • 36. Church B W, Shalloway D. Polymer. 1996;37:1805–1813.
  • 37. Frantz D D, Freeman D L, Doll J D. J Chem Phys. 1990;93:2769–2784.
  • 38. Ramirez-Alvarado M, Blanco F J, Serrano L. Nat Struct Biol. 1996;3:604–611. doi: 10.1038/nsb0796-604.
  • 39. Némethy G, Gibson K D, Palmer K A, Yoon C N, Paterlini G, Zagari A, Rumsey S, Scheraga H A. J Phys Chem. 1992;96:6472–6484.
  • 40. Abagyan R. In: Simulation of Biomolecular Systems: Theoretical and Experimental Applications. van Gunsteren W F, Weiner P K, Wilkinson A J, editors. Leiden, The Netherlands: ESCOM; 1997. pp. 363–394.
  • 41. Ulitsky A, Shalloway D. J Chem Phys. 1998;109:1670–1686.
  • 42. Bejan A. Advanced Engineering Thermodynamics. New York: Wiley; 1988. pp. 111–145.
  • 43. Pillardy J, Czaplewski C, Liwo A, Lee J, Ripoll D R, Kazmierkiewicz R, Oldziej S, Wedemeyer W J, Gibson K D, Arnautova Y A, et al. Proc Natl Acad Sci USA. 2001;98:2329–2333. doi: 10.1073/pnas.041609598. (First Published February 20, 2001, 10.1073/pnas.041609598)
  • 44. Nymeyer H, Socci N D, Onuchic J N. Proc Natl Acad Sci USA. 2000;97:634–639. doi: 10.1073/pnas.97.2.634.

