Author manuscript; available in PMC: 2011 Feb 9.
Published in final edited form as: Methods Enzymol. 2011;487:99–132. doi: 10.1016/B978-0-12-381270-4.00004-4

Modeling Loop Entropy

Gregory S Chirikjian 1
PMCID: PMC3035855  NIHMSID: NIHMS263214  PMID: 21187223

Abstract

Proteins fold from a highly disordered state into a highly ordered one. Traditionally, the folding problem has been stated as one of predicting ‘the’ tertiary structure from sequential information. However, new evidence suggests that the ensemble of unfolded forms may not be as disordered as once believed, and that the native form of many proteins may not be described by a single conformation, but rather an ensemble of its own. Quantifying the relative disorder in the folded and unfolded ensembles as an entropy difference may therefore shed light on the folding process. One issue that clouds discussions of ‘entropy’ is that many different kinds of entropy can be defined: entropy associated with overall translational and rotational Brownian motion, configurational entropy, vibrational entropy, conformational entropy computed in internal or Cartesian coordinates (which can even be different from each other), conformational entropy computed on a lattice; each of the above with different solvation and solvent models; thermodynamic entropy measured experimentally, etc. The focus of this work is the conformational entropy of coil/loop regions in proteins. New mathematical modeling tools for the approximation of changes in conformational entropy during transition from unfolded to folded ensembles are introduced. In particular, models for computing lower and upper bounds on entropy for polymer models of polypeptide coils both with and without end constraints are presented. The methods reviewed here include kinematics (the mathematics of rigid-body motions), classical statistical mechanics and information theory.

Keywords: Protein folding, Entropy, Conformation, Ensemble, Convolution, Rigid-body Motion, Probability Density Function, Polymer, Information Theory

1 Introduction

In classic work, Anfinsen observed the spontaneous and repeatable folding of a protein from a highly disordered state into a highly ordered one [3]. From this result and others that followed, it has been inferred over the years that similar processes work for wide classes of proteins. But exactly how unstructured is the unfolded/denatured state? And how structured is the native state?

New evidence suggests that the ensemble of unfolded forms may not be as disordered as once believed and that the native form may not be as rigid as one might expect. In this light, protein folding is a transformation of a high-conformational-entropy ensemble into a lower one. But how high is high, and how low is low? In order to answer such questions, some new mathematical and computer models will be helpful. Therefore, new mathematical tools for the approximation of conformational entropy in the unfolded and folded ensembles are introduced here. A number of related tools already exist in other fields. These are reviewed, adapted and developed further. In particular, lower and upper bounds on entropy are derived for polymer models of polypeptide chains, both with and without constraints on the positions and orientations of the ends. The methods reviewed here include kinematics (the mathematics of rigid-body motions as studied in the field of Robotics), information theory, and functional analysis on Lie groups (which, in part, considers how probability density functions of group-valued argument combine and propagate). In particular, we attach reference frames to polypeptide chains as shown in Figure 1, where the origin of the ith frame is located at the ith Cα atom with a unique orientation defined by the Cα-C’ bond and the plane defined by the Cα-C’=O atoms as in [49]. We use the distributions of relative motion between consecutive residues to characterize backbone conformational entropy. Side-chain motions are computed relative to these reference frames and we show how to compute the associated side-chain entropy. These new and powerful methods make it possible to approximate changes in entropy between relatively ordered and disordered states without using traditional sampling techniques.

Figure 1. Reference Frames Attached to a Polypeptide Chain: (left) Dihedral angle definitions; (right) Attaching frames to Cα atoms in a canonical way.

To summarize, the main contributions of this work are:

  • A method for generating the distribution of relative positions and orientations of polymer-like polypeptide coils is presented, building on prior work in the Robotics literature.

  • The distribution of end-constrained loop conformations is obtained from this information by applying Bayes’ rule.

  • Quantitative bounds on the associated loop entropy are derived, and the change in loop entropy resulting from constraining one end relative to the other is computed.

The computational complexity of this approach is low enough that it can be implemented on a single-processor personal computer running standardized software such as Matlab.

The remainder of this work is structured as follows. A comprehensive review of the literature is provided in Subsection 1.1. This is followed by a review of the necessary concepts from Statistical Mechanics in Subsection 1.2, and of background mathematics in Subsection 1.3. Section 2 applies these techniques to compute lower and upper bounds on the entropy of the unfolded ensemble of polypeptide conformations. Section 3 develops bounds on the entropy of the folded ensemble. Section 4 demonstrates the methodology with some closed-form examples. Finally, Section 5 summarizes the results and maps out future directions.

1.1 Literature Review

Protein folding is often viewed graphically as a funnel from the polymer-like ensemble of unfolded states to the native state [10]. Changes in backbone entropy between unfolded and native states have been measured experimentally [22]. And NMR has been shown to be a useful tool for experimentally observing conformational fluctuations in proteins in general [52, 60, 83]. A growing body of literature suggests that ‘the native state’ of certain proteins may not be as ordered as once believed [28, 9, 78, 63, 29]. On the other hand, recent studies suggest that the unfolded ensemble is not as disordered as once believed [70], and that sequential interactions and sterics provide strong constraints on possible folding pathways [32, 36, 31, 61, 4]. Furthermore, conformational entropy of the native ensemble is believed to play an important role in binding [7, 34]. For these reasons, the development of analytical and computational models of entropy in protein loops with and without end constraints provides a way to compare the relative amount of disorder in the folded and unfolded cases.

Many statistical mechanical treatments of protein folding have been performed, e.g., [21, 80, 20, 26]. In some studies, full chemical detail is used in molecular dynamics [51, 40], yet it appears that this level of detail may not be required for successful prediction of folding [65]. Furthermore, when computing statistical quantities such as entropy, sufficient data must be obtained in high dimensional configuration or phase spaces in order to obtain robust results. This is almost impossible to do at a fully detailed level. Therefore, simplified statistical models such as those presented in this work may be useful. Furthermore, while the emphasis here is loop entropy in proteins, the methodology presented here can in principle be applied to RNA structures. Models of loop entropy in nucleic acids have been presented in [13, 53, 84].

The author’s original field is the kinematic geometry of snakelike (or ‘hyper-redundant’) robot arms with many degrees of freedom [15, 16]. A tool which is useful for the analysis of all positions and orientations reachable by the ‘gripper’ at the distal end of this kind of arm is noncommutative harmonic analysis [18]. This mathematical tool combines ideas from group theory and Fourier analysis [71, 77, 58, 35, 72], and can be used to compute convolutions and diffusions of functions on Lie groups, such as the rotation group or rigid-body motion group [82, 18]. This is a particularly useful tool in the quantitative analysis of the distribution of all possible reachable gripper positions and orientations. Such quantities are quite similar to those encountered in polymer statistical mechanics. In polymer theory, distributions of relative end-to-end distance and orientation of backbone points and their tangents play central roles, as described in [6, 8, 24, 25, 27, 33, 37, 56, 68]. With the tool of noncommutative harmonic analysis, distributions in all six dimensions of rigid-body motion (three translational and three rotational) can be obtained, and marginals of these distributions can be taken to yield those which are commonly of interest in polymer physics (such as the distribution of end-to-end distances or relative orientations) [14]. This approach has been taken by the author in a series of papers, particularly concerned with semi-flexible polymers in which there is internal bending and torsional stiffness [17, 19]. The case of statistical distributions when semi-flexible polymers have internal joints and rigid bends has also been addressed using these methods [87, 88]. And it has been shown that this method can be applied to more general polymers including unfolded polypeptide chains [44]. 
Similar tools can be used to analyze large amounts of geometric data in the protein data bank [5] such as statistics of helix-helix crossing angle [50] and the relative pose (position and orientation) between alpha carbons in proteins [14, 49].

Of course, the author is not the only (and not even the first) member of the robotics community to attempt to transfer theoretical and computational tools from that field to study structural biology and biophysical phenomena. Lozano-Perez and coworkers have applied methods from robot motion planning and artificial intelligence to a number of problems in structural biology and rational drug design [66, 79]. Latombe and his students have applied methods from robot motion planning to explore configuration spaces and do energy minimization in the context of protein structures [54, 47, 38]. These build on the method of probabilistic roadmaps [41]. Amato and coworkers [75, 73, 1, 2] and Kavraki [74, 85, 23, 69] have been leaders in the application of robotics techniques in computational biology. The cyclic descent algorithm for robot kinematics has been applied to protein loops [11], as have other methods from kinematics [55, 42, 43]. It has been fashionable recently in engineering to consider proteins as examples of molecular machines [57, 45].

1.2 Statistical Mechanics

In classical equilibrium statistical mechanics, the Boltzmann distribution is defined as

$$f(p,q) = \frac{1}{Z}\exp(-\beta H(p,q)) \tag{1}$$

where the partition function is defined as

$$Z = \int_q \int_p \exp(-\beta H(p,q))\, dp\, dq. \tag{2}$$

Here $\beta = 1/(k_B T)$ ($k_B$ is the Boltzmann constant and $T$ is temperature measured in degrees Kelvin), $p_i = p \cdot e_i$ is the momentum conjugate to the $i$th generalized coordinate $q_i = q \cdot e_i$, $H$ is the Hamiltonian for the system, and $dp\, dq = dp_1 \cdots dp_N\, dq_1 \cdots dq_N$ for a system with $N$ degrees of freedom. The range of integration is over all possible states of the system. The Boltzmann distribution describes the probability density of all states of a system at equilibrium.
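As an illustration of Equations (1) and (2), the following minimal sketch evaluates the partition function numerically for a one-dimensional harmonic oscillator, a toy system chosen here only because its partition function is known in closed form, Z = 2π/(βω) with ω = sqrt(k/m). The mass m, stiffness k, grid limits and function names are assumptions made for this example and do not come from the text.

```python
import math

def hamiltonian(p, q, m=1.0, k=1.0):
    """H(p, q) = p^2 / (2m) + k q^2 / 2 for a 1-D harmonic oscillator."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

def partition_function(beta, m=1.0, k=1.0, half_width=10.0, n=400):
    """Midpoint-rule approximation of Z = int int exp(-beta H) dp dq."""
    h = 2.0 * half_width / n
    pts = [-half_width + (i + 0.5) * h for i in range(n)]
    total = sum(math.exp(-beta * hamiltonian(p, q, m, k))
                for p in pts for q in pts)
    return total * h * h

def boltzmann_density(p, q, beta, Z, m=1.0, k=1.0):
    """f(p, q) = exp(-beta H(p, q)) / Z."""
    return math.exp(-beta * hamiltonian(p, q, m, k)) / Z

beta = 1.0
Z = partition_function(beta)
Z_exact = 2.0 * math.pi / beta   # closed form for m = k = 1
```

The brute-force grid used here already needs 160,000 function evaluations for a single degree of freedom, which hints at why direct quadrature is not an option for macromolecules.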

The full set of generalized coordinates, {q}, describes the configuration of the system, which includes overall rigid-body motion, and the intrinsic structural degrees of freedom. These intrinsic degrees of freedom can be further broken down into ‘hard’ degrees of freedom such as bond angles and bond lengths which do not vary substantially from referential values, and ‘soft’ degrees of freedom, such as torsion angles, that can vary widely. The hard degrees of freedom describe vibrational states and the soft degrees of freedom describe conformational changes, i.e., motions due to rotations around covalent chemical bonds. While the words ‘configuration’ and ‘conformation’ are often used interchangeably in the literature, the distinction between them as defined above is important in this work.

For any classical mechanical system the Hamiltonian is of the form

$$H(p,q) = \frac{1}{2}\, p^T \{M^{-1}(q)\}\, p + V(q) \tag{3}$$

where V(q) is the potential energy and M(q) is the mass matrix (also called the mass metric tensor) [62].

The Gibbs formula for entropy of an ensemble described by f(p, q) is

$$S = -k_B \int_p \int_q f(p,q) \log f(p,q)\, dp\, dq. \tag{4}$$

Mathematically, ‘continuous entropy’ as defined above can take on negative values (and the entropy in the limiting case of a Dirac delta function goes to negative infinity). As explained in [12], this is very different than discrete entropy. Physically, continuum theory and classical mechanics break down at very small scales in phase space. By definition, a discretization of phase space is chosen such that S = 0 corresponds to the most ordered system that is physically possible, which is when all the states in an ensemble are contained in the same smallest possible element of discretized phase space. This is not the same as discretizing conformational space on a coarse lattice, as is often done in polymer simulations. The effects of discretization of continuous entropy are discussed in [12].
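The sign behavior of continuous entropy can be seen in a small example. The sketch below uses the standard closed form for the differential entropy of a one-dimensional Gaussian, (1/2) log(2πeσ²), expressed in units of k_B, and contrasts it with the discrete Shannon entropy, which is never negative. The function names are illustrative only.

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy (in units of k_B) of a 1-D Gaussian with std sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def discrete_entropy(probs):
    """Shannon entropy (in units of k_B) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

wide = gaussian_entropy(1.0)      # positive
narrow = gaussian_entropy(0.1)    # negative: the density exceeds 1 near its mode
tiny = gaussian_entropy(1e-6)     # heads toward minus infinity as sigma -> 0
certain = discrete_entropy([1.0]) # most ordered discrete case: zero
```

As sigma shrinks the Gaussian approaches a Dirac delta and its continuous entropy decreases without bound, whereas the most ordered discrete distribution has entropy exactly zero.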

As a practical matter, there are several limitations in using (4) as a computational tool. First, there is some debate about what molecular potentials to use. On the one hand, the accuracy of ab initio potentials derived from first principles for small molecules and then applied to macromolecular simulations can be questioned. On the other hand, the accuracy of statistical potentials derived from structural data is limited by the richness of the databases from which they are extracted. For different perspectives on this debate see [30, 39, 46, 48, 59, 76]. Second, the number of degrees of freedom in macromolecules is so high (many thousands for a protein in continuum solvent, and perhaps millions when including explicit solvent degrees of freedom), that it is not possible to approximate $f(p, q)$ with any degree of fidelity. (If the number of sample values required to accurately estimate a probability density function in one degree of freedom is $K$, then one would expect to need $K^{2N}$ samples to approximate a probability density function (pdf) in a $2N$-dimensional phase space.) If $K$ is on the order of 10 to 100 and $N$ ranges from thousands to millions, this is clearly intractable. One way to circumvent this problem is to compute only marginals of the full Boltzmann distribution, which as explained below, allows one to establish bounds on the true value of entropy.
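To put numbers on the sampling estimate above, the short sketch below computes the base-10 logarithm of K^(2N), since the count itself overflows ordinary floating point. The helper name is hypothetical.

```python
import math

def log10_samples(K, N):
    """Base-10 logarithm of K**(2N), the estimated number of samples needed."""
    return 2 * N * math.log10(K)

small_case = log10_samples(10, 1000)        # K = 10, N = 1000
large_case = log10_samples(100, 1_000_000)  # K = 100, N = one million
```

Even the most favorable combination in the quoted ranges (K = 10, N = 1000) calls for about 10^2000 samples.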

Due to the structure of the Hamiltonian (3), it is easy to see that in general the Boltzmann distribution can not be separated into a product of configurational and momentum distributions, so $f(p, q) \neq f_p(p) f_q(q)$ (due to the dependence of the mass matrix on configuration), and so the thermodynamic entropy is bounded by the entropies of each marginal as¹ $S \le S_p + S_q$ where the configurational entropy is

$$S_q = -k_B \int_q f_q(q) \log f_q(q)\, dq. \tag{5}$$

and it is often assumed that $S_p$ is constant. In fact, when the generalized coordinates are the Cartesian coordinates of the positions of all atoms in a macromolecule so that $q$ becomes the $3n$-dimensional vector of all such positions, denoted here as $x = [x_1^T, \ldots, x_n^T]^T$, then $f(p, x) = f_p(p) f_x(x)$ and $S = S_p + S_x$. Furthermore, in this special case, the mass matrix is diagonal and constant, $M(x) = M_0$, and

$$f_p(p) = \frac{1}{Z_p} \exp\left(-\frac{1}{2}\, \frac{p^T M_0^{-1} p}{k_B T}\right)$$

and so

$$S_p = k_B \log\left\{(2\pi e\, k_B T)^{3n/2}\, |M_0|^{1/2}\right\} \tag{6}$$

is in fact constant at constant temperature, without having to assume anything. It follows that ΔS = ΔSx, which is not necessarily true for general choices of coordinates, including dihedral angles.
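The bound on the joint entropy by the sum of the marginal entropies, used earlier in this subsection, can be checked on a toy discrete joint distribution. The 2 × 2 table below is an arbitrary assumption made for illustration, with rows standing in for momentum states and columns for configuration states; entropies are in units of k_B.

```python
import math

def entropy(probs):
    """Shannon entropy (in units of k_B) of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def marginals(joint):
    """Row and column marginals of a joint probability table."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return rows, cols

# Correlated joint distribution: it does not factor into its marginals.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
rows, cols = marginals(joint)
S_joint = entropy([p for row in joint for p in row])
S_p, S_q = entropy(rows), entropy(cols)

# Product of the marginals: the separable case, where equality holds.
independent = [[pr * pc for pc in cols] for pr in rows]
S_independent = entropy([p for row in independent for p in row])
```

For the correlated table the joint entropy is strictly smaller than the sum of the marginal entropies, and the two agree once the table is replaced by the product of its marginals.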

Under a change of coordinates $x = x(q)$ it is generally the case that $S_x \neq S_q$ because the computation of

$$S_x = -k_B \int_x f_x(x) \log f_x(x)\, dx \tag{7}$$

in an alternative coordinate system (such as dihedral angles) becomes

$$S_x = -k_B \int_q f_x(x(q)) \log f_x(x(q))\, |\det J(q)|\, dq$$

which is not generally equal to $S_q$ unless $|\det J(q)| = 1$. Therefore, when referring to configurational entropy, it is important to distinguish between Cartesian configurational entropy and dihedral configurational entropy unless $|\det J(q)| = 1$.
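A one-dimensional example makes the coordinate dependence concrete. In the sketch below, q is assumed to be uniformly distributed on [1, 2] and the change of coordinates is x = q², so the Jacobian is 2q. Both choices are arbitrary and serve only to illustrate the formulas above. Entropies are in units of k_B, and the integrals are approximated with a midpoint rule.

```python
import math

def midpoint(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

def f_q(q):
    return 1.0                          # uniform density on [1, 2]

def x_of_q(q):
    return q * q                        # change of coordinates x = q^2

def jac(q):
    return 2.0 * q                      # dx/dq

def f_x(x):
    return 1.0 / (2.0 * math.sqrt(x))   # induced density on [1, 4]

# Entropy computed in the q coordinate.
S_q = -midpoint(lambda q: f_q(q) * math.log(f_q(q)), 1.0, 2.0)

# Entropy computed in the x coordinate, directly and via the integral over q.
S_x_direct = -midpoint(lambda x: f_x(x) * math.log(f_x(x)), 1.0, 4.0)
S_x_via_q = -midpoint(
    lambda q: f_x(x_of_q(q)) * math.log(f_x(x_of_q(q))) * abs(jac(q)),
    1.0, 2.0)

S_x_exact = 3.0 * math.log(2.0) - 1.0   # closed form for this example
```

The two ways of computing the entropy in x agree with each other and with the closed form, while the entropy in q is zero, because the Jacobian differs from 1.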

In some scenarios it is convenient to subdivide the configurational degrees of freedom into the categories: rigid-body, hard and soft, so that $q = (q_{rb}, q_{hard}, q_{soft})$. It can be shown that the determinants of the mass and Jacobian matrices for chain structures can be written as functions proportional to the form $w_1(q_{rb}) \cdot w_2(q_{hard})$. Similarly, it is a common modeling assumption that for a system not subjected to an external force field, and with sufficiently hard degrees of freedom,

$$V(q_{rb}, q_{hard}, q_{soft}) = V_1(q_{hard}) + V_2(q_{soft}).$$

Assumptions such as these lead to the separability of the partition function into a product, and the separability of entropy into a sum of terms:

$$S = S_{rb} + S_{hard} + S_{soft}.$$

Since the rigid-body term is the same for all ensembles of a given system in the same volume and temperature,

$$\Delta S = \Delta S_{hard} + \Delta S_{soft}.$$

We will focus on methods for computing Cartesian conformational entropy, ΔSx, using the concept of convolution on the rigid-body-motion group. When the hard degrees of freedom are treated as rigid, ΔShard → 0, and ΔSx → ΔSsoft. In the remainder of this work we will examine Sx for: (a) polymer-like ensembles with rotatable bonds and free ends; and (b) polymer-like loop regions with end constraints.

1.3 Mathematics Review

When considering models of polypeptide chains, it often will be convenient to treat parts of the chain as rigid. For example, the plane of the peptide bond can be considered rigid, as can a cluster of side-chain atoms such as a methyl group. At a coarser level, one might consider an alpha helix to be a rigid object. At a coarser level still, a whole domain might be approximated as a rigid body.

Therefore, it is clear that at various levels of detail, when characterizing the conformational entropy of a protein, it is conceivable that attaching reference frames to the rigid elements and recording the set of all possible rigid-body motions between these elements is a way to describe the conformational part of the Boltzmann distribution, and therefore get at the conformational entropy via Gibbs’ formula.

In this section, a coordinate-free review of rigid-body motions is presented. More detailed reviews and comparisons of various parameterizations such as Euler angles, Cayley parameters, etc. can be found in [18].

1.3.1 Mathematics of Rigid-Body Motion

The group of rigid-body motions, which is also called the Special Euclidean group and is denoted SE(3), is the semidirect product of $(\mathbb{R}^3, +)$ (three-dimensional Euclidean space endowed with the operation of vector addition) with the special orthogonal group, SO(3), which consists of 3 × 3 rotation matrices together with the operation of matrix multiplication. In both instances, the word ‘special’ means that reflections are excluded and only physically allowable isometries of three-dimensional space are allowed.

We denote elements of SE(3) as $g = (a, A) \in SE(3)$ where $A \in SO(3)$ and $a \in \mathbb{R}^3$. For any $g = (a, A)$ and $h = (r, R) \in SE(3)$, the group law is written as $g \circ h = (a + Ar, AR)$, and $g^{-1} = (-A^T a, A^T)$. Alternately, one may represent any element of SE(3) as a 4 × 4 homogeneous transformation matrix of the form

$$g = \begin{pmatrix} A & a \\ 0^T & 1 \end{pmatrix},$$

in which case the group law is matrix multiplication. The bottom row in these matrices, which consists of three zeros (i.e., $0^T$ is the transposed, or row, vector corresponding to the column vector of zeros, $0$) and the number one, is a placeholder which ensures that the matrix multiplication reproduces the correct group operation. In the above matrix, $A \in SO(3)$ denotes rotations and $a \in \mathbb{R}^3$ denotes translations of a reference frame which when attached to a rigid body represent the motion of that body from the reference position and orientation defined by the identity element $e = (0, I)$.
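The two descriptions of the group law given above can be compared numerically. The following sketch stores a motion as a pair (a, A), with a a list of three numbers and A a list of three rows, and checks the pair form of composition and inversion against 4 × 4 homogeneous matrices. All function names and the sample motions g and h are assumptions made for this illustration.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def compose(g, h):
    """Group law g o h = (a + A r, A R)."""
    (a, A), (r, R) = g, h
    Ar = matvec(A, r)
    return ([ai + ari for ai, ari in zip(a, Ar)], matmul(A, R))

def inverse(g):
    """Inverse g^{-1} = (-A^T a, A^T)."""
    a, A = g
    At = transpose(A)
    return ([-c for c in matvec(At, a)], At)

def to_homogeneous(g):
    """4 x 4 homogeneous matrix with A in the upper left and a in the last column."""
    a, A = g
    return [A[i] + [a[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def flat(M):
    """Flatten a matrix into a single list for easy comparison."""
    return [v for row in M for v in row]

IDENTITY = ([0.0, 0.0, 0.0], rot_z(0.0))
g = ([1.0, 0.0, 0.0], rot_z(math.pi / 2))   # quarter turn about z, then shift along x
h = ([0.0, 2.0, 0.0], rot_z(0.0))           # pure translation along y

gh = compose(g, h)
hg = compose(h, g)
```

Here the translation part of g ∘ h is approximately (-1, 0, 0) while that of h ∘ g is (1, 2, 0), so the order of composition matters.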

In Lie theory², the exponential mapping from the Lie algebra to a corresponding Lie group plays an important role [18]. In the current context, the Lie group of interest is SE(3), and the corresponding Lie algebra is se(3), which consists of all matrices formed by linear combinations of the following basis elements:

$$E_1 = \begin{pmatrix} 0&0&0&0\\ 0&0&-1&0\\ 0&1&0&0\\ 0&0&0&0 \end{pmatrix};\quad
E_2 = \begin{pmatrix} 0&0&1&0\\ 0&0&0&0\\ -1&0&0&0\\ 0&0&0&0 \end{pmatrix};\quad
E_3 = \begin{pmatrix} 0&-1&0&0\\ 1&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix};$$

$$E_4 = \begin{pmatrix} 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix};\quad
E_5 = \begin{pmatrix} 0&0&0&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix};\quad
E_6 = \begin{pmatrix} 0&0&0&0\\ 0&0&0&0\\ 0&0&0&1\\ 0&0&0&0 \end{pmatrix}.$$

For small (infinitesimal) motions around the identity (null motion), $g \approx I + X \in SE(3)$ where $X \in se(3)$. However, for larger motions this is not true. For those unfamiliar with this terminology, definitions and properties important to our formulation have been provided in the book [18]. The essential thing to know is that elements of se(3) and SE(3) can both be viewed as 4 × 4 matrices; however, while it makes sense to add elements of se(3) (i.e., velocities add), it only makes sense to multiply elements of SE(3). Furthermore, by the matrix exponential mapping, it is possible to produce elements of SE(3) from those in se(3), and vice versa using the matrix logarithm:

$$\exp : se(3) \to SE(3) \quad \text{and} \quad \log : SE(3) \to se(3).$$

Figure 2(left) illustrates that the composition of rigid-body motions is not a commutative operation. Figure 2(right) shows the relationship between the Lie algebra se(3) consisting of infinitesimal motions (which form a linear vector space), and SE(3) consisting of large motions (which form a curved manifold, which is a Lie group).

Figure 2. (left) Rigid-body Transformations between Reference Frames form a Noncommutative Lie Group ($g_1 \circ g_2 \neq g_2 \circ g_1$); (right) The Exponential Map

For small translational (rotational) displacements from the identity along (about) the ith coordinate axis, the homogeneous transforms representing infinitesimal motions look like

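$$\exp(\epsilon E_i) \approx I + \epsilon E_i \tag{8}$$

where $I$ is the 4 × 4 identity matrix, $|\epsilon| \ll 1$, and $\exp(X) = I + X + X^2/2 + \cdots$ is the matrix exponential defined by the Taylor series of the usual exponential function evaluated with a matrix rather than a scalar. For example,

$$\exp(\theta E_3) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0\\ \sin\theta & \cos\theta & 0 & 0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix} \quad \text{and} \quad \exp(y E_5) = \begin{pmatrix} 1&0&0&0\\ 0&1&0&y\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix},$$

and for small values of $\theta$, using $\sin\theta \approx \theta$ and $\cos\theta \approx 1$, it is easy to see that (8) holds for the example on the left. For the example on the right, (8) holds even for large values of $y$.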
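The two closed-form exponentials above can be reproduced by summing the Taylor series directly. The sketch below is a plain truncated-series implementation intended only for small illustrative matrices. The function names, the number of terms and the test values of θ and y are assumptions.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def scaled(c, X):
    return [[c * v for v in row] for row in X]

def expm(X, terms=30):
    """Truncated Taylor series I + X + X^2/2! + ... for a small square matrix."""
    result = identity(len(X))
    term = identity(len(X))
    for k in range(1, terms):
        # term now holds X^k / k!
        term = [[v / k for v in row] for row in matmul(term, X)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Basis elements of se(3): E3 generates rotation about z, E5 translation along y.
E3 = [[0.0, -1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
E5 = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0],
      [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

theta, y = 0.7, 5.0
R = expm(scaled(theta, E3))   # rotation by theta about the third axis
T = expm(scaled(y, E5))       # translation by y along the second axis
```

Because E5 multiplied by itself is the zero matrix, the series for exp(y E5) terminates after the linear term, which is why (8) is exact for translations of any size.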

The ‘exponential parametrization’

$$g = g(\chi_1, \chi_2, \ldots, \chi_6) = \exp\left(\sum_{i=1}^{6} \chi_i E_i\right) \tag{9}$$

is a useful way to describe relatively small rigid-body motions because, unlike the Euler angles, it does not have singularities near the identity.

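One defines the ‘vee’ operator, $\vee$, such that for any

$$X = \sum_{i=1}^{6} \chi_i E_i, \qquad X^{\vee} = \begin{pmatrix} \chi_1 \\ \chi_2 \\ \vdots \\ \chi_6 \end{pmatrix}.$$

The 6 × 6 adjoint matrix, $Ad_g$, is defined by the expression

$$Ad_g\, X^{\vee} = (g X g^{-1})^{\vee},$$

and explicitly if $g = (a, A)$ then

$$Ad_g = \begin{pmatrix} A & 0 \\ a \times A & A \end{pmatrix},$$

where $a \times A$ denotes the matrix resulting from the cross product of $a$ with each column of $A$.

The vector of exponential parameters, $\chi \in \mathbb{R}^6$, can be obtained from $g \in G$ with the formula

$$\chi = (\log g)^{\vee}. \tag{10}$$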
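The defining relation for the adjoint matrix can be verified numerically against its explicit block form. In the sketch below, `hat` builds the 4 × 4 matrix X from the six parameters (the first three rotational, the last three translational, following the ordering of the basis elements E1 to E6), `vee` undoes it, and `adjoint` assembles the block matrix with the cross-product term written as a skew-symmetric matrix times A. These helper names, and the sample motion and parameter vector, are assumptions for illustration.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def hat(chi):
    """X = sum_i chi_i E_i as a 4 x 4 matrix."""
    w1, w2, w3, v1, v2, v3 = chi
    return [[0.0, -w3, w2, v1], [w3, 0.0, -w1, v2],
            [-w2, w1, 0.0, v3], [0.0, 0.0, 0.0, 0.0]]

def vee(X):
    """Inverse of hat: read the six parameters back out of X."""
    return [X[2][1], X[0][2], X[1][0], X[0][3], X[1][3], X[2][3]]

def skew(a):
    """Matrix such that skew(a) applied to b equals the cross product a x b."""
    return [[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]]

def homogeneous(a, A):
    return [A[i] + [a[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def adjoint(a, A):
    """Block matrix [[A, 0], [a x A, A]]."""
    aA = matmul(skew(a), A)
    return ([A[i] + [0.0, 0.0, 0.0] for i in range(3)] +
            [aA[i] + A[i] for i in range(3)])

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

a = [0.3, -1.2, 2.0]
A = matmul(rot_z(0.4), rot_x(-1.1))
chi = [0.5, -0.2, 0.1, 1.0, 2.0, -3.0]

g = homogeneous(a, A)
At = transpose(A)
g_inv = homogeneous([-c for c in matvec(At, a)], At)

lhs = matvec(adjoint(a, A), chi)                    # Ad_g applied to X^vee
rhs = vee(matmul(matmul(g, hat(chi)), g_inv))       # (g X g^{-1})^vee
```

The two vectors `lhs` and `rhs` agree to rounding error for this (arbitrary) choice of g and χ.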

The action of an element of the motion group, $g = (a, A)$, on a vector $x$ in three-dimensional space is defined as $g \cdot x = Ax + a$. In contrast, given a function $f(x)$, we can translate and rotate the function by $g$ as $f(g^{-1} \cdot x) = f(A^T(x - a))$. The fact that the inverse of the transformation applies under the function (rather than the transformation itself) in order to implement the desired motion is directly analogous to the case of translation on the real line. For example, given a function on the real line, $f(x)$, with its mode at $x = 0$, if we want to translate the whole function in the positive $x$ direction by amount $\xi$ so that the mode is at $x = \xi$, we compute $f(x - \xi)$ (not $f(x + \xi)$). This is a very important point to understand in order for the rest of this work to make sense. Figure 3(left) illustrates the shifting of a function under rigid-body motion geometrically.
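The role of the inverse can be checked with a planar analog, used here only to keep the example short: a motion is assumed to be a pair (a, θ) acting on points of the plane, and the test function is a bump with its mode at (1, 0). Moving the function by g should carry its mode to g · (1, 0). All names are illustrative.

```python
import math

def act(g, x):
    """g . x = A x + a for a planar motion g = (a, theta)."""
    (ax, ay), theta = g
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1] + ax, s * x[0] + c * x[1] + ay)

def inverse(g):
    """g^{-1} = (-A^T a, A^T); for planar rotations A^T is rotation by -theta."""
    (ax, ay), theta = g
    c, s = math.cos(theta), math.sin(theta)
    return ((-(c * ax + s * ay), -(-s * ax + c * ay)), -theta)

def moved(f, g):
    """The function x -> f(g^{-1} . x), i.e. f carried along by the motion g."""
    g_inv = inverse(g)
    return lambda x: f(act(g_inv, x))

def f(x):
    """A bump whose mode (maximum value 1) sits at the point (1, 0)."""
    return math.exp(-((x[0] - 1.0) ** 2 + x[1] ** 2))

g = ((0.0, 2.0), math.pi / 2)   # quarter turn followed by a shift of 2 along y
x0 = (1.0, 0.0)                 # mode of the original function
new_mode = act(g, x0)           # approximately (0, 3)

value_at_new_mode = moved(f, g)(new_mode)
value_at_old_mode = moved(f, g)(x0)
```

The moved function attains its maximum value of 1 at g · x0, whereas at the old mode it has dropped to nearly zero.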

Figure 3. (left) Action of a Motion on a Function; (right) Convolution of Functions of Rigid-Body Motion

1.3.2 Manipulations of Functions of Rigid-Body Motion

Suppose that three rigid bodies labeled 0, 1 and 2 are given, with reference frames attached to each, and assume that only sequentially adjacent bodies interact. Suppose also that body 0 is fixed in space and the ensemble of all possible motions of body 1 with respect to 0 is recorded, and motions of 2 with respect to 1 are also recorded. Then we have two functions of motion, $f_{0,1}(g)$ and $f_{1,2}(g)$, which together describe the conformational variability of this simple system. If we are interested in knowing the probability distribution describing the ensemble of all possible ways that body 2 can move relative to body 0, how is this obtained? In fact, it is computed via the convolution on SE(3) [18]:

$$f_{0,2}(g) = (f_{0,1} * f_{1,2})(g) = \int_G f_{0,1}(h)\, f_{1,2}(h^{-1} \circ g)\, dh \tag{11}$$

What this says is that the distribution $f_{1,2}(g)$ is shifted through all possible rigid-body motions, $h$, weighted by the frequency of occurrence of these motions, $f_{0,1}(h)$, and integrated over all values of $h \in G$ ($G$ is just short for ‘Group’, which throughout this work is the group of rigid-body motions, SE(3)). Figure 3(right) illustrates this geometrically.
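The structure of the group convolution is easiest to see on a finite group, where the integral becomes a sum. The sketch below uses the six permutations of three objects as a small noncommutative stand-in for SE(3). This substitution is an assumption made purely for illustration and is not a discretization of SE(3).

```python
from itertools import permutations

G = list(permutations(range(3)))   # the six elements of the permutation group S3

def compose(g, h):
    """(g o h)(i) = g(h(i)): apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    inv = [0, 0, 0]
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def convolve(f1, f2):
    """(f1 * f2)(g) = sum over h of f1(h) f2(h^{-1} o g)."""
    return {g: sum(f1[h] * f2[compose(inverse(h), g)] for h in G) for g in G}

def delta(at):
    """Probability mass concentrated on a single group element."""
    return {g: 1.0 if g == at else 0.0 for g in G}

a, b = (1, 0, 2), (0, 2, 1)        # two transpositions that do not commute
f01, f12 = delta(a), delta(b)

f02 = convolve(f01, f12)           # mass ends up on a o b
f20 = convolve(f12, f01)           # mass ends up on b o a

# Two spread-out distributions, to check that total probability is preserved.
p = {g: (0.5 if g == a else 0.1) for g in G}
q = {g: (0.5 if g == b else 0.1) for g in G}
pq = convolve(p, q)
```

When both inputs are concentrated on single elements a and b, the convolution is concentrated on the composition a ∘ b, and reversing the order of the factors gives a different result.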

Explicitly, what is meant by this integral? Let us assume for the moment that rotations are parameterized using Euler angles. The range of the Euler angles is $0 \le \alpha, \gamma \le 2\pi$ and $0 \le \beta \le \pi$. In this parametrization the volume element for $G$ is given by

$$dg = \frac{1}{8\pi^2} \sin\beta\, d\alpha\, d\beta\, d\gamma\, dr_1\, dr_2\, dr_3,$$

which is the product of the volume elements for $\mathbb{R}^3$ ($dr = dr_1\, dr_2\, dr_3$), and for SO(3) ($dR = \frac{1}{8\pi^2} \sin\beta\, d\alpha\, d\beta\, d\gamma$). The normalization factor in the definition of $dR$ is so that $\int_{SO(3)} dR = 1$. The volume element for SE(3) can also be expressed in the exponential coordinates described in the previous subsection, in which case

$$dg = |J(\chi)|\, d\chi_1 \cdots d\chi_6$$

where $|J(\chi)|$ is a Jacobian determinant for this parametrization. The Jacobian can be computed using the formula

$$J(\chi) = \left[ \left(g^{-1} \frac{\partial g}{\partial \chi_1}\right)^{\vee}, \ldots, \left(g^{-1} \frac{\partial g}{\partial \chi_6}\right)^{\vee} \right]$$

and it can be shown that |J(0)| = 1 and so close to the identity the Jacobian factor in this parametrization can be ignored (which is not true for many other parameterizations, including the Euler angles).
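The normalization of the rotational volume element in Euler angles, quoted above, can be confirmed with a one-line calculation over the stated ranges of the three angles:

```latex
\int_{SO(3)} dR
  = \frac{1}{8\pi^2} \int_0^{2\pi} \int_0^{\pi} \int_0^{2\pi}
      \sin\beta \, d\alpha \, d\beta \, d\gamma
  = \frac{1}{8\pi^2} \, (2\pi) \left( \int_0^{\pi} \sin\beta \, d\beta \right) (2\pi)
  = \frac{1}{8\pi^2} \, (2\pi)(2)(2\pi)
  = 1 .
```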

The fact that the volume element is invariant to right and left translations, i.e.,

$$dg = d(h \circ g) = d(g \circ h),$$

is well known in certain communities (See e.g. [77], [71]).

A convolution integral of the form in (11) can be written in the following equivalent ways:

$$(f_{0,1} * f_{1,2})(g) = \int_G f_{0,1}(z^{-1})\, f_{1,2}(z \circ g)\, dz = \int_G f_{0,1}(g \circ k^{-1})\, f_{1,2}(k)\, dk \tag{12}$$

where the substitutions $z = h^{-1}$ and $k = h^{-1} \circ g$ have been made, and the invariance of integration under shifts and inversions is used.

The concept of convolution on SE(3) will be central in the formulation that follows.

One can define a Gaussian distribution on the six-dimensional Lie group SE(3) much in the same way as is done on $\mathbb{R}^6$ provided that: (1) the covariances are small; and (2) the mean is located at the identity. The reason for these conditions is that near the identity, SE(3) resembles $\mathbb{R}^6$, which means that $dg \approx d\chi_1 \cdots d\chi_6$ and we can define the Gaussian in the exponential parameters as

$$f(g(\chi)) = \frac{1}{(2\pi)^3\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} \chi^T \Sigma^{-1} \chi\right) \tag{13}$$

Given two such distributions that are shifted as $f_{i,i+1}(g_{i,i+1}^{-1} \circ g)$, each with 6 × 6 covariance $\Sigma_{i,i+1}$, then it can be shown that the mean and covariance of the convolution $f_{0,1}(g_{0,1}^{-1} \circ g) * f_{1,2}(g_{1,2}^{-1} \circ g)$ respectively will be of the form $g_{0,2} = g_{0,1} \circ g_{1,2}$ and [81]:

$$\Sigma_{0,2} = Ad_{g_{1,2}}^{-1}\, \Sigma_{0,1}\, Ad_{g_{1,2}}^{-T} + \Sigma_{1,2}. \tag{14}$$

This provides a method for computing covariances of two concatenated segments, and this formula can be iterated to compute covariances of chains without having to compute convolutions directly. This is demonstrated numerically in the context of robotic arms in [81].
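The covariance propagation formula (14) and its iteration along a chain can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation of [81]: poses are assumed to be pairs (a, A), the adjoint is built in the block form given in Section 1.3.1, and the inverse adjoint is obtained as the adjoint of the inverse pose. All helper names and sample values are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def eye(n, s=1.0):
    return [[s if i == j else 0.0 for j in range(n)] for i in range(n)]

def skew(a):
    return [[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]]

def adjoint(a, A):
    """6 x 6 block matrix [[A, 0], [a x A, A]]."""
    aA = matmul(skew(a), A)
    return ([A[i] + [0.0, 0.0, 0.0] for i in range(3)] +
            [aA[i] + A[i] for i in range(3)])

def inverse_pose(a, A):
    At = transpose(A)
    return [-c for c in matvec(At, a)], At

def propagate(Sigma01, Sigma12, g12):
    """Sigma02 = Ad^{-1} Sigma01 Ad^{-T} + Sigma12, with Ad taken at g12."""
    Ad_inv = adjoint(*inverse_pose(*g12))   # inverse adjoint = adjoint of g12^{-1}
    moved = matmul(matmul(Ad_inv, Sigma01), transpose(Ad_inv))
    return [[m + s for m, s in zip(mr, sr)] for mr, sr in zip(moved, Sigma12)]

def chain_covariance(segments):
    """Iterate (14) over a list of (g_{i,i+1}, Sigma_{i,i+1}) pairs."""
    Sigma = segments[0][1]
    for g, S in segments[1:]:
        Sigma = propagate(Sigma, S, g)
    return Sigma

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

g_id = ([0.0, 0.0, 0.0], eye(3))
g_bend = ([0.5, -1.0, 2.0], rot_z(0.3))
Sigma_a, Sigma_b = eye(6, 0.01), eye(6, 0.02)

Sigma_trivial = propagate(Sigma_a, Sigma_b, g_id)    # covariances simply add
Sigma_bent = propagate(Sigma_a, Sigma_b, g_bend)
Sigma_chain = chain_covariance([(g_bend, Sigma_a)] * 4)
```

When the relative pose is the identity the covariances simply add, as in the Euclidean case; otherwise the translation part of the pose couples rotational uncertainty into positional uncertainty. As stated in the text, the formula presumes small covariances.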

2 Computing Bounds on the Entropy of the Unfolded Ensemble

2.1 End-to-End Position and Orientation Distributions and the Cartesian Conformational Entropy of Serial Polymer Chains

Consider a polymer consisting of a serial chain of $n + 1$ essentially rigid monomer units numbered from 0 to $n$. Attach a frame of reference to the $i$th such unit. Let $g_i$ denote the rigid-body motion from the reference frame of the zeroth unit to that attached to the $i$th. Let $g_{k,k+1}$ denote the relative motion from body $k$ to body $k + 1$. Then $g_i = g_{0,i} = g_{0,1} \circ g_{1,2} \circ \cdots \circ g_{i-1,i}$ will be the cumulative motion from body 0 to body $i$. The relationship between these reference frames is described in Figure 4.

Figure 4. (left) Relative and Absolute Reference Frames attached to the Chain; (right) The relative positions of mass points within body i.

In a purely pairwise energy model, only the interactions between adjacent units are important. In this simplest model, the probability of the relative pose $g_{i,i+1} = g_i^{-1} \circ g_{i+1}$ taking a particular value is given by

$$f_{i,i+1}(g_{i,i+1}) = \frac{1}{Z_{i,i+1}} \exp(-\beta V(g_{i,i+1})).$$

Then, the conformational distribution described in terms of rigid-body poses is

$$f(g_1, g_2, \ldots, g_n) = \prod_{i=0}^{n-1} f_{i,i+1}(g_i^{-1} \circ g_{i+1}) \tag{15}$$

where $g_0 = e$, the identity. This is related to the end-to-end position and orientation distribution

$$f_{0,n}(g_n) = (f_{0,1} * f_{1,2} * \cdots * f_{n-1,n})(g_n), \tag{16}$$

which is an $n$-fold convolution of the form in Section 1.3.2, by marginalization of (15) as

$$f_{0,n}(g_n) = \int_G \cdots \int_G f(g_1, g_2, \ldots, g_n)\, dg_1 \cdots dg_{n-1}.$$

This is illustrated in Figure 5.

Figure 5. Kinematic Covariance Propagation: (left) In the absence of other constraints, distributions describing the allowable rigid-body motions between consecutive residues ‘add’ by convolution, resulting in a spreading out of probability density in position and orientation, $f_{0,i}(g_i)$, as $i$ increases; (right) a zoomed in view of the probabilistic relationship between reference frames $i$ and $i + 1$ embodied by the functions $f_{i,i+1}(g_{i,i+1})$.
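The spreading of probability density under repeated convolution can be illustrated on a finite group, where entropy is easy to compute exactly. As a stand-in for SE(3), the sketch below again uses the six permutations of three objects. The step distribution, which plays the role of a single residue-to-residue distribution, and all function names are assumptions chosen for this example.

```python
import math
from itertools import permutations

G = list(permutations(range(3)))   # finite noncommutative stand-in for SE(3)

def compose(g, h):
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    inv = [0, 0, 0]
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def convolve(f1, f2):
    """(f1 * f2)(g) = sum over h of f1(h) f2(h^{-1} o g)."""
    return {g: sum(f1[h] * f2[compose(inverse(h), g)] for h in G) for g in G}

def entropy(f):
    return -sum(p * math.log(p) for p in f.values() if p > 0.0)

def n_fold(step, n):
    """Return the n-fold convolution of step and the entropy after each fold."""
    f = dict(step)
    history = [entropy(f)]
    for _ in range(n - 1):
        f = convolve(f, step)
        history.append(entropy(f))
    return f, history

# A narrow distribution of relative motions: mostly 'no motion'.
step = {g: 0.0 for g in G}
step[(0, 1, 2)] = 0.8
step[(1, 0, 2)] = 0.1
step[(0, 2, 1)] = 0.1

f0n, entropies = n_fold(step, 20)
max_entropy = math.log(len(G))     # entropy of the uniform distribution
```

In this toy setting the entropy of the end-to-end distribution never decreases as more links are added and is bounded above by log 6, the entropy of the uniform distribution on the group.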

Equation 15 represents a generalization of the classical polymer models in which only pairwise interactions are considered. If the frames of reference gi and gi+1 are attached at the Cα atoms of residues i and i + 1 in a polypeptide, then the function fi,i+1(gi,i+1) would be the six-dimensional generalization of a Ramachandran map that could include small bond angle bending, warping of the peptide plane and even bond stretching. If one chooses not to model these effects, then the classical Ramachandran map [64] can be reflected by appropriately defining fi,i+1(gi,i+1), as has been done in [44]. This is consistent with the Flory isolated pair model [33], which has been challenged in recent years [61]. However, as an upper bound on conformational entropy, it may still be useful in some contexts.

Note that since $g_n = (r, R)$ describes both the end-to-end position and orientation of the distal end of the chain relative to the proximal end, we can marginalize further to obtain quantities such as the end-to-end distance distribution, or end-to-end orientational distribution. These quantities (or several of their moments) can be obtained directly from a variety of experimental measurements.

In order to convert these probabilities into a form that is directly useful for computing Cartesian conformational entropy, we must know the positions of all atoms in each of the i rigid monomer units.

Given $f(g_1, g_2, \ldots, g_n)$ in (15) and given the family of probability density functions $\{\Delta_i(x_{i_1}, \ldots, x_{i_k})\}$, each of which describes the distribution of motions of the $i_k - i_1 + 1$ atoms within body $i$, it is possible to compute the full Cartesian conformational distribution as:

$$\rho(x_1, \ldots, x_N) = \int_G \cdots \int_G f(g_1, g_2, \ldots, g_n) \prod_{i=1}^{n} \Delta_i(g_i^{-1} \cdot x_{i_1}, \ldots, g_i^{-1} \cdot x_{i_k})\, dg_1 \cdots dg_n \tag{17}$$

where $N = i_n$ is the total number of atoms in the chain and $x_i = [x_{i_1}^T, \ldots, x_{i_k}^T]^T$ is the composite vector of Cartesian coordinates of all positions in the $i$th body.

$\Delta_i(x_{i_1}, \ldots, x_{i_k})$ is a probability density on $3(i_k - i_1 + 1)$-dimensional Euclidean space. In other words,

$$\int_{\mathbb{R}^3} \cdots \int_{\mathbb{R}^3} \Delta_i(x_{i_1}, \ldots, x_{i_k})\, dx_{i_1} \cdots dx_{i_k} = 1.$$

As an example, when the $i$th body is modeled as being perfectly rigid,

$$\Delta_i(x_{i_1}, \ldots, x_{i_k}) = \prod_{j=i_1}^{i_k} \delta(x_j - x_j^0)$$

where $x_j^0$ is the fixed position of atom $j$ as seen in the frame of reference $g_i$ affixed to rigid body $i$. In contrast, if body $i$ is an articulated side chain, averaging over all of its conformational states would result in a $\Delta_i$ which is not a product of Dirac delta functions.

In some cases it may be useful to compute the full pose entropy of the chain:

$$S_g=-\int_G\cdots\int_G f(g_1,g_2,\ldots,g_n)\log f(g_1,g_2,\ldots,g_n)\,dg_1\cdots dg_n. \tag{18}$$

2.2 Modeling Excluded Volume Effects

The phantom polymer chain model, in which the effects of excluded volume are ignored, is clearly not realistic, but it can be used as a baseline onto which self-avoidance can be built. In a polypeptide, residue i interacts substantially with residues i + 1, …, i + 4, as well as with more sequentially distant residues. These interactions are not only responsible for the formation of secondary structures, but also substantially winnow down the available conformational space [32]. Clearly this has implications for the entropy. More specifically, polymer models can be used to compute upper bounds on the conformational entropy in polypeptides, and these bounds can be made tighter by incorporating the effects of steric clash into modified versions of the conformational probability distributions.

To begin, we compute the density of body i. This can be done either directly, for example by averaging body i over all possible side-chain conformations, or by first computing each marginal of the density function Δi as:

$$d_{i_j}(x_{i_j})=\int_{x_{i_1}\in\mathbb{R}^3}\cdots\int_{x_{i_{j-1}}\in\mathbb{R}^3}\int_{x_{i_{j+1}}\in\mathbb{R}^3}\cdots\int_{x_{i_k}\in\mathbb{R}^3}\Delta_i(x_{i_1},\ldots,x_{i_k})\,dx_{i_1}\cdots dx_{i_{j-1}}\,dx_{i_{j+1}}\cdots dx_{i_k}.$$

Then the average density of body i (normalized to be a probability density) is

$$d_i(x)=\frac{1}{i_k-i_1+1}\sum_{i_j=i_1}^{i_k}d_{i_j}(x).$$

The overlap of bodies in the chain is illustrated in Figure 6.

Figure 6. Conformations to be removed from the Phantom Chain Ensemble: (left) Local Overlaps; (right) Nonlocal Overlaps.

Therefore, if body i is moved by rigid body motion gi, and likewise for body j, we can compute an estimate of their overlap (averaged over all deformations of the bodies) as

$$w_{ij}(g_i,g_j)=\int_{\mathbb{R}^3}d_i(g_i^{-1}\cdot x)\,d_j(g_j^{-1}\cdot x)\,dx.$$

A general property of integration over all of three-dimensional space is that it is invariant under rigid-body motions. Therefore, if we make the change of variables $y=g_i^{-1}\cdot x$, then we find that

$$w_{ij}(g_i,g_j)=w_{ij}(e,g_i^{-1}\circ g_j)=w_{ij}(g_j^{-1}\circ g_i,e).$$

Clearly, when the two bodies do not overlap, $w_{ij}=0$; otherwise $w_{ij}$ takes some positive value. One can imagine evaluating $w_{ij}$ as the argument of a ‘sigmoid function’, which sharply ramps up from zero to one, where it then plateaus at higher values. The resulting $W_{ij}(g_i^{-1}\circ g_j)=1-\exp\left(-\frac{(w_{ij}(g_i,g_j))^2}{2\sigma^2}\right)$ (for some small value of σ) is close to one for all values of the rigid-body motions gi and gj that contribute to nonphysical overlaps, so that the factor $1-W_{ij}$ effectively windows them out. Then the original f(g1, g2, …, gn) in (15) could be replaced with one of the form

$$f_{ex}(g_1,g_2,\ldots,g_n)=C\,f(g_1,g_2,\ldots,g_n)\prod_{i<j}^{n}\left(1-W_{ij}(g_i^{-1}\circ g_j)\right) \tag{19}$$

where C is the normalization required to make fex a probability density function. Note that the product in this expression is not only over sequentially local pairs of bodies, but rather all bodies, where the ‘i < j’ simply avoids double counting. In this way, a phantom polymer model that generates f(g1, g2, …, gn) can be viewed as the starting point for a more realistic model that includes steric constraints.
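The windowing in (19) can be sketched numerically under a strong simplifying assumption that is not part of the model above: each averaged body density di is taken to be an isotropic three-dimensional Gaussian, so that the overlap integral wij has a closed form. The function names, the density width `s` and the window width `sigma` are hypothetical choices made only for illustration.

```python
import math

def gaussian_overlap(center_i, center_j, s_i, s_j):
    """Overlap integral w_ij of two normalized isotropic 3D Gaussian densities.

    Assumption (illustrative only): d_i and d_j are Gaussians with standard
    deviations s_i and s_j centered at the given world-frame points.
    """
    var = s_i**2 + s_j**2
    dist2 = sum((a - b)**2 for a, b in zip(center_i, center_j))
    return (2.0 * math.pi * var) ** -1.5 * math.exp(-dist2 / (2.0 * var))

def steric_window(w, sigma):
    """W_ij = 1 - exp(-w^2 / (2 sigma^2)); close to 1 for overlapping bodies."""
    return 1.0 - math.exp(-w**2 / (2.0 * sigma**2))

def excluded_volume_weight(centers, s, sigma):
    """Unnormalized factor prod_{i<j} (1 - W_ij) that multiplies f in (19)."""
    weight = 1.0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            w = gaussian_overlap(centers[i], centers[j], s, s)
            weight *= 1.0 - steric_window(w, sigma)
    return weight

# Hypothetical three-body chains: one extended, one folded back onto itself.
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
clashing = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.2, 0.0, 0.0)]
w_ext = excluded_volume_weight(extended, s=0.5, sigma=1e-3)
w_clash = excluded_volume_weight(clashing, s=0.5, sigma=1e-3)
print(w_ext, w_clash)
```

In this sketch the extended chain keeps a weight close to one, while the self-overlapping chain is suppressed; the normalization constant C would still have to be computed over the whole ensemble.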

2.3 Bounding Cartesian Conformational Entropy

Practically speaking, computing such high-dimensional integrals as in (17) or (19) can pose a computational problem, except when simple closed-form expressions such as Gaussians are used. If we seek an upper bound on Cartesian conformational entropy, marginals can be computed and information-theoretic bounds can be employed. Performing such marginalization, one finds

$$\rho_i(x_i)=\int_G f_{0,i}(g_i)\,\Delta_i(g_i^{-1}\cdot x_{i_1},\ldots,g_i^{-1}\cdot x_{i_k})\,dg_i \tag{20}$$

In the case when one representative point is chosen per residue (for example, the Cα atom, which is where the reference frame for the residue is usually attached), we have k = 1. Then $i=i_1$, and since $x_i^0=0$ due to the way the reference frame is attached, we can write

$$\rho_i(x_i)=\int_G f_{0,i}(g)\,\delta(g^{-1}\cdot x_i)\,dg.$$

If g = (r, R), then $\delta(g^{-1}\cdot x_i)=\delta(R^T(x_i-r))=\delta(x_i-r)$, and so we can get the positional distribution of the ith Cα atom by marginalizing the full pose distribution over orientations as

$$\rho_i(x_i)=\int_{SO(3)}\int_{\mathbb{R}^3}f_{0,i}(r,R)\,\delta(x_i-r)\,dr\,dR=\int_{SO(3)}f_{0,i}(x_i,R)\,dR. \tag{21}$$

The conformational entropy of the backbone represented by Cα atoms is then bounded from below by the entropy of individual marginals (with the tightest lower bound resulting from the maximum of these). The loop entropy will be bounded from above by the sum of entropies from all of the marginals. Therefore,

$$\max_i S_i\le S_x\le\sum_{i=1}^{n}S_i\quad\text{where}\quad S_i=-\int_{\mathbb{R}^3}\rho_i(x_i)\log\rho_i(x_i)\,dx_i \tag{22}$$

and $x=[x_1^T,x_2^T,\ldots,x_n^T]^T$.

3 Approximating Entropy of the Loops in the Folded Ensemble

The native ensemble of a protein is characterized by a relatively high degree of order. However, the native form is not completely rigid. In particular, loop/coil regions connecting secondary structures can exhibit large motions. Here we model the ends of these loops as being fixed at specific positions and orientations, as illustrated in Figure 7. Bounds on the contribution of loop motions to overall entropy are discussed here.

Figure 7. Using Density Information to Determine Probabilities of Conformations that Obey End Constraints

If f(g1, …, gn) is the conformational distribution function describing the positions and orientations of all bodies in the system with respect to the proximal end of the chain, then if we fix the distal end at a specific pose, gend, the resulting distribution will be the conditional density

$$f^{fix}(g_1,g_2,\ldots,g_{n-1};g_{end})=f(g_1,\ldots,g_n\mid g_n=g_{end})=\frac{f(g_1,\ldots,g_{n-1},g_{end})}{f_{0,n}(g_{end})}. \tag{23}$$

The entropy of this distribution can in some cases be computed directly; alternatively, each of the marginals can be computed as

$$f_{0,i}^{fix}(g_i;g_{end})=\frac{f_{0,i}(g_i)\,f_{i,n}(g_i^{-1}\circ g_{end})}{f_{0,n}(g_{end})}. \tag{24}$$

The reason for this is that in the definition of f(g1, …, gn), the variable gi appears in only the two multiplied terms $f_{i-1,i}(g_{i-1}^{-1}\circ g_i)\,f_{i,i+1}(g_i^{-1}\circ g_{i+1})$. Marginalizing over g1 through gi–1 results in f0,i(gi). If gi were the identity element, then marginalizing over gi+1 through gn–1 would yield fi,n(gend). However, since in general $g_i\neq e$, this result is shifted by gi to yield $f_{i,n}(g_i^{-1}\circ g_{end})$. Division by f0,n(gend) is the normalization required to make the result a pdf (since integration of the numerator in (24) over gi is a convolution that evaluates to f0,n(gend)). This denominator is carried along from (23), which is a statement of Bayes’ rule.

Intuitively, the entropy of a chain with fixed ends must be smaller than that of a chain with freely moving ends. This can be quantified when using (23) and (24), as will be shown in the examples in the next section.
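As a minimal numerical sketch of (24), consider a one-dimensional Gaussian random walk as a stand-in for the pose distributions (an assumption made only for illustration). In that commutative setting the shift $g_i^{-1}\circ g_{end}$ reduces to the difference `x_end - x`. The function names, grid parameters and step variance below are hypothetical.

```python
import math

def gauss(x, var):
    """Zero-mean 1D Gaussian density with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fixed_end_marginal(x, i, n, x_end, step_var=1.0):
    """Eq. (24) for a 1D Gaussian walk: f_{0,i}(x) f_{i,n}(x_end - x) / f_{0,n}(x_end)."""
    return (gauss(x, i * step_var) * gauss(x_end - x, (n - i) * step_var)
            / gauss(x_end, n * step_var))

def moments(i, n, x_end, half_width=40.0, m=8001):
    """Mass, mean and variance of the conditioned marginal by the trapezoid rule."""
    h = 2.0 * half_width / (m - 1)
    xs = [-half_width + k * h for k in range(m)]
    ps = [fixed_end_marginal(x, i, n, x_end) for x in xs]
    ws = [h] * m
    ws[0] = ws[-1] = h / 2.0
    mass = sum(w * p for w, p in zip(ws, ps))
    mean = sum(w * p * x for w, p, x in zip(ws, ps, xs)) / mass
    var = sum(w * p * (x - mean) ** 2 for w, p, x in zip(ws, ps, xs)) / mass
    return mass, mean, var

mass, mean, var = moments(i=3, n=10, x_end=2.0)
print(mass, mean, var)
```

For this toy walk the conditioned marginal integrates to one, its mean is pulled toward the fixed end, and its variance is smaller than the free-end value `i * step_var`, which is the one-dimensional analogue of the entropy reduction described above.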

4 Examples

In this section examples are used to illustrate the formulation presented earlier in this work. In both of these examples, a piece of flexible loop/coil connects relatively rigid structures. In the first example, the loop is considered to be a long phantom chain, whereas in the second it is considered to be a semi-flexible polymer. The reduction in conformational entropy associated with constraining the ends in both cases is examined.

4.1 Model 1: Long Loops Modeled as Gaussian Chains

Perhaps the most common model for the end-to-end vector distribution in polymer theory is the Gaussian distribution:

$$f(g)=W(\mathbf{r})=\left(\frac{3}{2\pi\langle r^2\rangle}\right)^{3/2}\exp\left[-\frac{3r^2}{2\langle r^2\rangle}\right]=\frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}\mathbf{r}^T\Sigma^{-1}\mathbf{r}\right) \tag{25}$$

where $g=(\mathbf{r},R)\in SE(3)$ and the chain is so flexible that the orientational part of the distribution is constant, and $\Sigma=(\langle r^2\rangle/3)I$, where I is the 3 × 3 identity matrix. This distribution is spherically symmetric (and hence depends only on $r=\|\mathbf{r}\|$). It is normalized so that it is a probability density function,

$$\int_{\mathbb{R}^3}W(\mathbf{r})\,dV=4\pi\int_0^{\infty}W(r)\,r^2\,dr=1,$$

satisfying

$$\int_{\mathbb{R}^3}W(\mathbf{r})\,r^2\,dV=4\pi\int_0^{\infty}W(r)\,r^4\,dr=\langle r^2\rangle.$$

The probability density function for a freely-jointed chain with n links, each of length l, can be approximated as a Gaussian random walk with

$$\langle r^2\rangle=nl^2.$$

If we denote Wn(r) to be the function for n links, then it is clear that in this simple model $W_{n_1}*W_{n_2}=W_{n_1+n_2}$. Therefore, the conformational entropy is bounded as

$$\frac{3}{2}\log\left(\frac{2\pi e\,nl^2}{3}\right)\le S_r\le\frac{3}{2}\sum_{k=1}^{n}\log\left(\frac{2\pi e\,kl^2}{3}\right)$$

using (22) where r takes the place of x.
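The two bounds are straightforward to evaluate. The following minimal sketch does so for a hypothetical chain with n = 10 links of length l = 3.8 (a value assumed here in ångström-like units; it is not taken from the text). The function names are illustrative.

```python
import math

def gaussian_entropy_3d(mean_sq_r):
    """Entropy of an isotropic 3D Gaussian with <r^2> = mean_sq_r,
    i.e. covariance (<r^2>/3) I."""
    return 1.5 * math.log(2.0 * math.pi * math.e * mean_sq_r / 3.0)

def gaussian_chain_bounds(n, l):
    """Lower and upper bounds on S_r from (22) for an n-link Gaussian chain.

    The k-th positional marginal is a Gaussian with <r^2> = k l^2.
    """
    marginals = [gaussian_entropy_3d(k * l * l) for k in range(1, n + 1)]
    return max(marginals), sum(marginals)

lower, upper = gaussian_chain_bounds(n=10, l=3.8)
print(lower, upper)
```

Because the marginal entropies grow with k, the maximum is attained by the terminal position, which reproduces the left-hand side of the inequality. These are differential entropies, so their numerical values depend on the length unit chosen for l.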

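The conformational entropy of the phantom chain that gives rise to this distribution is bounded from below by the entropy of the probability density function of the location of the terminal end, and from above by the sum of the entropies of the positional distributions of each link from base to end, since these are marginals of the total conformational distribution f(g1, …, gn), which for this case is a function $W(\mathbf{r}_1,\ldots,\mathbf{r}_n)=\prod_{i=0}^{n-1}W_1(\mathbf{r}_{i+1}-\mathbf{r}_i)$ with $\mathbf{r}_0=\mathbf{0}$, where W1 is the effective one-bond Gaussian distribution with covariance matrix $\Sigma=(l^2/3)I$. Therefore, it is possible to compute the Cartesian conformational entropy in this model exactly in closed form as

$$S_r=-\int_{\mathbf{r}_1}\cdots\int_{\mathbf{r}_n}W(\mathbf{r}_1,\ldots,\mathbf{r}_n)\log W(\mathbf{r}_1,\ldots,\mathbf{r}_n)\,d\mathbf{r}_1\cdots d\mathbf{r}_n.$$

Since the chain is assumed to be uniform and the product of Gaussians is a Gaussian, we can use the fact that there is a closed-form formula for the entropy of a Gaussian in terms of its covariance matrix, together with the fact that in this example

$$\Sigma^{-1}=\frac{3}{l^2}\begin{pmatrix}
2I & -I & 0 & \cdots & 0 & 0\\
-I & 2I & -I & \cdots & 0 & 0\\
0 & -I & \ddots & \ddots & \vdots & \vdots\\
\vdots & \vdots & \ddots & \ddots & -I & 0\\
0 & 0 & \cdots & -I & 2I & -I\\
0 & 0 & \cdots & 0 & -I & I
\end{pmatrix}. \tag{26}$$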
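As a hedged illustration of how (26) feeds into the Gaussian entropy formula (30), the sketch below builds the inverse covariance with NumPy and compares the resulting entropy with the two bounds of (22). The values n = 10 and l = 3.8 and all function names are assumptions made for the example.

```python
import numpy as np

def gaussian_chain_precision(n, l):
    """Inverse covariance (26): (3/l^2) * kron(T, I_3), with T tridiagonal."""
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    T[-1, -1] = 1.0  # the terminal position has only one neighbour
    return (3.0 / l**2) * np.kron(T, np.eye(3))

def gaussian_entropy_from_precision(P):
    """S = log{(2 pi e)^{d/2} |Sigma|^{1/2}} with |Sigma| = 1/|P|, as in (30)."""
    d = P.shape[0]
    sign, logdet = np.linalg.slogdet(P)
    assert sign > 0, "precision matrix must be positive definite"
    return 0.5 * d * np.log(2.0 * np.pi * np.e) - 0.5 * logdet

n, l = 10, 3.8
S_exact = gaussian_entropy_from_precision(gaussian_chain_precision(n, l))
S_lower = 1.5 * np.log(2.0 * np.pi * np.e * n * l**2 / 3.0)
S_upper = sum(1.5 * np.log(2.0 * np.pi * np.e * k * l**2 / 3.0)
              for k in range(1, n + 1))
print(S_lower, S_exact, S_upper)
```

For this model the determinant of the scalar tridiagonal factor is one, so the entropy reduces to n times the entropy of a single bond. The ordering of the three printed values is checked here only for this particular choice of l and its units.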

In principle, the entropy of a Gaussian chain with end position constraints can be computed using (23) and (24). In practice, some details must first be addressed; this is done in Section 4.3.

Whereas here a Gaussian chain in which orientations diffuse rapidly was considered, the opposite extreme of a stiff chain is considered in the following subsection.

4.2 Model 2: Short Loops Modeled as Semi-flexible Polymers

Suppose that we have a semi-flexible loop, i.e., one that has local resistance to bending and twisting that reflects sequentially local steric constraints. Then each gi will deviate only a relatively small amount from a constant reference pose, hi, and so we write gi = hi ○ exp χi where ∥χi∥ < 1. This sort of assumption is consistent with findings in the literature. For example, [86] validated the use of semi-flexible polymer models to describe protein loop motions. The relative motion between adjacent reference frames will be

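$$g_i^{-1}\circ g_{i+1}=\exp(-\chi_i)\circ h_i^{-1}\circ h_{i+1}\circ\exp\chi_{i+1}.$$

If the probability density fi,i+1 is a Gaussian with mean at $h_i^{-1}\circ h_{i+1}$, then it will be of the form

$$f_{i,i+1}(g)=F_{\Sigma_{i,i+1}}\left((h_i^{-1}\circ h_{i+1})^{-1}\circ g\right)$$

where

$$F_{\Sigma_{i,i+1}}(\exp\chi)=\frac{1}{(2\pi)^3|\Sigma_{i,i+1}|^{1/2}}\exp\left(-\frac{1}{2}\chi^T\Sigma_{i,i+1}^{-1}\chi\right).$$

Therefore,

$$f_{i,i+1}(g_i^{-1}\circ g_{i+1})=F_{\Sigma_{i,i+1}}\left((h_i^{-1}\circ h_{i+1})^{-1}\circ\exp(-\chi_i)\circ h_i^{-1}\circ h_{i+1}\circ\exp\chi_{i+1}\right).$$

For small motions between adjacent bodies, the approximation

$$\left[\log\left((h_i^{-1}\circ h_{i+1})^{-1}\circ\exp(-\chi_i)\circ h_i^{-1}\circ h_{i+1}\circ\exp\chi_{i+1}\right)\right]^{\vee}\approx\chi_{i+1}-Ad_{h_i^{-1}\circ h_{i+1}}^{-1}\chi_i$$

has been proven to be accurate [81]. If as shorthand we define $A_{i,i+1}=Ad_{h_i^{-1}\circ h_{i+1}}$, then

$$f_{i,i+1}(g_i^{-1}\circ g_{i+1})=\frac{1}{(2\pi)^3|\Sigma_{i,i+1}|^{1/2}}\exp\left(-\frac{1}{2}\left[\chi_i^T,\chi_{i+1}^T\right]\begin{pmatrix}A_{i,i+1}^{-T}\Sigma_{i,i+1}^{-1}A_{i,i+1}^{-1} & -A_{i,i+1}^{-T}\Sigma_{i,i+1}^{-1}\\ -\Sigma_{i,i+1}^{-1}A_{i,i+1}^{-1} & \Sigma_{i,i+1}^{-1}\end{pmatrix}\begin{bmatrix}\chi_i\\ \chi_{i+1}\end{bmatrix}\right). \tag{27}$$

This, together with the product in (15), leads f(g1, g2, …, gn) to be a Gaussian distribution in the variable $\chi=[\chi_1^T,\ldots,\chi_n^T]^T$ with an inverse covariance of the form

$$\Sigma^{-1}=\begin{pmatrix}
\Sigma_{0,1}^{-1}+{\Sigma'}_{1,2}^{-1} & -A_{1,2}^{-T}\Sigma_{1,2}^{-1} & 0 & \cdots & 0 & 0\\
-\Sigma_{1,2}^{-1}A_{1,2}^{-1} & \Sigma_{1,2}^{-1}+{\Sigma'}_{2,3}^{-1} & -A_{2,3}^{-T}\Sigma_{2,3}^{-1} & \cdots & 0 & 0\\
0 & -\Sigma_{2,3}^{-1}A_{2,3}^{-1} & \ddots & \ddots & \vdots & \vdots\\
\vdots & \ddots & -\Sigma_{n-3,n-2}^{-1}A_{n-3,n-2}^{-1} & \Sigma_{n-3,n-2}^{-1}+{\Sigma'}_{n-2,n-1}^{-1} & -A_{n-2,n-1}^{-T}\Sigma_{n-2,n-1}^{-1} & 0\\
0 & \cdots & 0 & -\Sigma_{n-2,n-1}^{-1}A_{n-2,n-1}^{-1} & \Sigma_{n-2,n-1}^{-1}+{\Sigma'}_{n-1,n}^{-1} & -A_{n-1,n}^{-T}\Sigma_{n-1,n}^{-1}\\
0 & \cdots & 0 & 0 & -\Sigma_{n-1,n}^{-1}A_{n-1,n}^{-1} & \Sigma_{n-1,n}^{-1}
\end{pmatrix} \tag{28}$$

where ${\Sigma'}_{i,i+1}=A_{i,i+1}\Sigma_{i,i+1}A_{i,i+1}^T$.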

The entropy Sg for the case with free ends is then given by the formula (18), which is relatively efficient to compute due to the block tri-diagonal form of Σ−1.
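One way to assemble the block-tridiagonal matrix in (28) is to accumulate the pairwise quadratic forms of (27) one factor at a time. The sketch below does this with NumPy. The matrices `A[i]` are random placeholders standing in for the adjoint matrices $A_{i,i+1}$ (they are not true SE(3) adjoints), the covariances are arbitrary, and the function name is hypothetical.

```python
import numpy as np

def assemble_precision(A, Sigma):
    """Block-tridiagonal inverse covariance of (28) for chi = [chi_1, ..., chi_n].

    Sigma[0] is Sigma_{0,1}; for i >= 1, Sigma[i] is Sigma_{i,i+1} and A[i]
    is A_{i,i+1} (A[0] is unused). All blocks are 6 x 6.
    """
    n = len(Sigma)
    P = np.zeros((6 * n, 6 * n))
    P[:6, :6] += np.linalg.inv(Sigma[0])       # contribution of f_{0,1}(g_1)
    for i in range(1, n):                      # contribution of f_{i,i+1}, see (27)
        Sinv = np.linalg.inv(Sigma[i])
        Ainv = np.linalg.inv(A[i])
        a, b = 6 * (i - 1), 6 * i              # offsets of chi_i and chi_{i+1}
        P[a:a + 6, a:a + 6] += Ainv.T @ Sinv @ Ainv
        P[a:a + 6, b:b + 6] += -Ainv.T @ Sinv
        P[b:b + 6, a:a + 6] += -Sinv @ Ainv
        P[b:b + 6, b:b + 6] += Sinv
    return P

rng = np.random.default_rng(0)
n = 5
Sigma = [0.01 * np.eye(6) for _ in range(n)]   # arbitrary small covariances
A = [np.eye(6) + 0.1 * rng.standard_normal((6, 6)) for _ in range(n)]  # placeholders
P = assemble_precision(A, Sigma)
sign, logdet = np.linalg.slogdet(P)
S_g = 0.5 * 6 * n * np.log(2.0 * np.pi * np.e) - 0.5 * logdet  # Gaussian entropy, as in (30)
print(S_g)
```

In this construction the map from χ to the residuals $\chi_{i+1}-A_{i,i+1}^{-1}\chi_i$ is block lower triangular with identity blocks on the diagonal, so the log-determinant of the assembled matrix equals the sum of the log-determinants of the individual $\Sigma_{i,i+1}^{-1}$, which provides a convenient consistency check.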

Obtaining the entropy for the case when both ends are fixed in position and orientation is also possible within this model. Using the fact that the adjoint is a homomorphism, i.e., Ad(g1g2) = Ad(g1)Ad(g2) and Ad(g−1) = Ad−1(g), this generalizes to the concatenation of n reference frames that vary around values hi as

$$\Sigma_{0*1*\cdots*n}=\sum_{k=0}^{n}Ad_{h_{k,n}}^{-1}\,\Sigma_k\,Ad_{h_{k,n}}^{-T} \tag{29}$$

In order to compute Sg for the case when the distal end of the chain is fixed at gn = gend, we would use (23) with the covariance of f0,n(g) being given by (29). Conditioning of Gaussians by Gaussians yields Gaussians, the entropy of which can be computed in closed form in principle. However, there are some subtle issues that need to be addressed, as discussed below.

4.3 From Covariance Matrices to Entropy

It is one thing to have bounds such as (22). It is another to have a closed-form expression for the actual quantity of interest. Here (26) and (28) are used to compute entropy. This follows from the fact that for a d(n)-dimensional Gaussian distribution with covariance Σ, the entropy is given as [67, 12]

$$S=\log\left\{(2\pi e)^{d(n)/2}|\Sigma|^{1/2}\right\}. \tag{30}$$

Here we use the notation d(n) to denote the dimension of the covariance matrix, which is d(n) = 3n for positional Gaussian, and d(n) = 6n for the semi-flexible case. In other words, we can write d(n) = d0 · n.

We will consider the case when entropy change is due to fixing the ends. Consider a chain (either Gaussian or semi-flexible), and let x1,…,xn denote the variables describing the kinematic state of the n segments (i.e., $x_i=\mathbf{r}_i\in\mathbb{R}^3$ for the Gaussian chain and $x_i=\chi_i\in\mathbb{R}^6$ for the semi-flexible chain). Let us denote $x=[x_1^T,\ldots,x_{n-1}^T]^T$, $y=x_n\in\mathbb{R}^{d_0}$ and $z=[x^T,y^T]^T$. Then f(x1, …, xn) can be written as

$$f(z)=\frac{1}{(2\pi)^{d_0n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}z^T\Sigma^{-1}z\right)$$

The entropy of this distribution is computed simply as (30). However, the end-constrained case is somewhat more involved.

The conditional probability density describing the ensemble of end-constrained conformations is of the form

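$$f(x\mid y)=\frac{f(x,y)}{f(y)}$$

where f(x, y) = f(z) (with y held fixed rather than being a variable) and the marginal distribution f(y) is given by

$$f(y)=\frac{1}{(2\pi)^{d_0/2}|\Sigma_{yy}|^{1/2}}\exp\left(-\frac{1}{2}y^T\Sigma_{yy}^{-1}y\right)$$

where

$$\Sigma=\begin{bmatrix}\Sigma_{xx} & \Sigma_{xy}\\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}.$$

The conditional distribution will then be

$$f(x\mid y)=\frac{1}{(2\pi)^{d_0(n-1)/2}|\Lambda|^{1/2}}\exp\left(-\frac{1}{2}[x-x_0]^T\Lambda^{-1}[x-x_0]\right) \tag{31}$$

where

$$\Lambda=\Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}=\left[(\Sigma^{-1})_{xx}\right]^{-1}$$

and

$$x_0=\Sigma_{xy}\Sigma_{yy}^{-1}y.$$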
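The identity $\Lambda=[(\Sigma^{-1})_{xx}]^{-1}$ can be checked numerically. The following sketch conditions an arbitrary symmetric positive-definite covariance (generated at random purely as test data) on its last d0 coordinates. The function name and the chosen sizes are illustrative assumptions.

```python
import numpy as np

def condition_on_end(Sigma, d0, y):
    """Mean x0 and covariance Lambda of x given y, for z = [x, y] ~ N(0, Sigma)."""
    Sxx, Sxy = Sigma[:-d0, :-d0], Sigma[:-d0, -d0:]
    Syx, Syy = Sigma[-d0:, :-d0], Sigma[-d0:, -d0:]
    K = Sxy @ np.linalg.inv(Syy)
    return K @ y, Sxx - K @ Syx

rng = np.random.default_rng(1)
d0, n = 3, 4
M = rng.standard_normal((d0 * n, d0 * n))
Sigma = M @ M.T + d0 * n * np.eye(d0 * n)   # arbitrary SPD test covariance
y = np.array([1.0, -2.0, 0.5])              # hypothetical fixed end value

x0, Lam = condition_on_end(Sigma, d0, y)
P = np.linalg.inv(Sigma)                    # full inverse covariance
P_xx, P_xy = P[:-d0, :-d0], P[:-d0, -d0:]
print(np.allclose(Lam, np.linalg.inv(P_xx)))
```

The practical consequence is that when the inverse covariance is available directly, as in (26) and (28), the conditional covariance never needs to be formed from Σ itself: deleting the rows and columns of Σ⁻¹ that belong to y is enough.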

In principle we now have everything we need to compute entropy differences. However, in practice, there is an implicit assumption about polymer distribution functions that must be addressed. Namely, even though the chain length is L, and hence the distal end cannot reach outside a ball of radius L centered at the proximal end, for the sake of convenience we will accept distributions with infinitely long tails. Another way to say this is that, as long as the pdf decays rapidly enough, all integrals over a ball of radius L centered at the origin can be replaced by integrals over an infinitely large ball with negligible error. Such calculations include computing probabilities and entropies from probability densities. In other words, we have simplifications such as

$$\int_{-L}^{L}e^{-x^2}\,dx\approx\int_{-\infty}^{\infty}e^{-x^2}\,dx.$$

While this is perfectly reasonable when the Gaussians are centered at the origin, it will no longer be the case when we shift them by significant amounts. In other words, even though the value of an integral over an infinite range is invariant under shifts, this is not the case for integration over finite intervals:

$$\int_{-L}^{L}e^{-(x-\frac{L}{2})^2}\,dx\neq\int_{-L}^{L}e^{-x^2}\,dx.$$

This is important in the context of the current discussion because the conditional pdf in (31) is shifted from the origin by a vector x0. In other words, if we fix the distal end of the chain at an arbitrary y, then this distribution of interest in $d_0\cdot(n-1)$-dimensional space will not be centered at the origin, and the infinite integral used to approximate integration over a ball of radius L = nl centered at the origin, which resulted in the normalization constant $[(2\pi)^{d_0(n-1)/2}|\Lambda|^{1/2}]^{-1}$, will no longer be a valid approximation. The computation of this constant, and the computation of entropy, then become a problem when ∥y∥ (and hence ∥x0∥) is not very small relative to the total chain length, L = nl. However, when it is very small, the integral can still be approximated as being over all of space, because the overwhelming majority of the mass under the pdf will still be contained in the finite ball of radius L.
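The effect of shifting a Gaussian inside a finite integration window can be illustrated in one dimension with the error function from the Python standard library. The function name and the particular values of L and of the shift are chosen here only for illustration.

```python
import math

def truncated_gaussian_integral(L, shift=0.0):
    """Integral of exp(-(x - shift)^2) over [-L, L], via the error function."""
    return 0.5 * math.sqrt(math.pi) * (math.erf(L - shift) + math.erf(L + shift))

full = math.sqrt(math.pi)  # integral of exp(-x^2) over the whole real line

for L in (1.0, 2.0, 4.0):
    centered = truncated_gaussian_integral(L) / full
    half_shift = truncated_gaussian_integral(L, shift=0.5 * L) / full
    edge_shift = truncated_gaussian_integral(L, shift=L) / full
    print(L, centered, half_shift, edge_shift)
```

The printed ratios give the fraction of the total mass that is captured by the finite window. The centered Gaussian is captured almost completely once L exceeds a few standard deviations, whereas a Gaussian shifted all the way to the edge of the window retains only about half of its mass no matter how large L is.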

4.3.1 Entropy for the Gaussian Chain

For the Gaussian chain, the end constraint rn = 0 effectively means that the chain forms a closed loop, because the vectors {ri} in this case are the absolute positions of the residues with respect to the proximal end.

The entropy difference between two ensembles described by Gaussians with dimensions d(n) and d(n – 1) in the unconstrained and end-constrained states, respectively, will be

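$$\Delta S=S_2-S_1=\log\left\{(2\pi e)^{d(n)/2}|\Sigma_2|^{1/2}\right\}-\log\left\{(2\pi e)^{d(n-1)/2}|\Sigma_1|^{1/2}\right\}=\log\left[(2\pi e)^{d_0/2}\frac{|\Sigma_2|^{1/2}}{|\Sigma_1|^{1/2}}\right]=\frac{1}{2}\log\left[(2\pi e)^{d_0}\frac{|\Sigma_1^{-1}|}{|\Sigma_2^{-1}|}\right]. \tag{32}$$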

The last equality means that there is no need to invert the matrices in (26) and (28) when computing entropy differences between the ensembles with free and fixed ends. This is useful, because in practice one usually is interested only in entropy differences, and the determinants of block-tridiagonal matrices can be computed very efficiently (in O(n) computations for a chain of length n), whereas computing their inverses followed by taking the determinant can be an O(n³) operation.
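For the Gaussian chain, the blocks in (26) are multiples of the identity, so the determinants in (32) reduce to those of scalar tridiagonal matrices. The sketch below evaluates them with an O(n) pivot recurrence. The function names and the values n = 10 and l = 3.8 are assumptions for illustration, and the end-constrained inverse covariance is taken to be the x-x block of (26), consistent with the expression for Λ given earlier.

```python
import math

def tridiag_logdet(diag, off):
    """Log-determinant of a symmetric positive-definite tridiagonal matrix in O(n).

    diag has n entries and off has n - 1 entries; the recurrence computes the
    pivots of an LDL^T factorization.
    """
    pivot = diag[0]
    logdet = math.log(pivot)
    for k in range(1, len(diag)):
        pivot = diag[k] - off[k - 1] ** 2 / pivot
        logdet += math.log(pivot)
    return logdet

def gaussian_chain_delta_S(n, l):
    """Entropy difference (32) between free and end-fixed Gaussian chains (d0 = 3, n >= 2)."""
    c = 3.0 / l**2
    # Scalar factors of (26); each counts three times because of the 3 x 3 identity blocks.
    free = tridiag_logdet([2 * c] * (n - 1) + [c], [-c] * (n - 1))
    fixed = tridiag_logdet([2 * c] * (n - 1), [-c] * (n - 2))
    return 0.5 * (3 * math.log(2 * math.pi * math.e) + 3 * (fixed - free))

delta_S = gaussian_chain_delta_S(n=10, l=3.8)
print(delta_S)
```

In this simple model the result coincides with the entropy of the end-to-end distribution, $\frac{3}{2}\log(2\pi e\,nl^2/3)$, as expected from the chain rule for entropy.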

4.3.2 Entropy of a Semi-flexible Chain

The entropy being considered is that defined in (18). For the semi-flexible chain the set {χi} describes the relative small rigid-body displacements of the ith residue with respect to a referential configuration. Therefore in this case the same tools developed in this work can be used to describe the entropy differences between the free and end-constrained cases for a somewhat different scenario than the Gaussian chain model. Namely, we can compute multiple reference conformations and consider small deviations around each. The reduction in entropy due to constraining both ends of the chain is then due to eliminating motions around multiple reference conformations (each with free ends) and only allowing motions around the one reference conformation that satisfies the required end conditions. This discussion is quantified below.

Imagine sampling the relative poses between adjacent amino acids in a loop at their K most populated isolated peaks. For example, K might be equal to 3 if we sample at the centers of the α, β and ω regions of the φ-ψ plane. If each of these peaks is isolated, and the distributions around them are modeled as SE(3) Gaussian distributions with small covariances and essentially non-overlapping tails, then the entropy associated with each of these conformational ensembles will be given by (30) where Σ is defined by (28). If the loop has n residues, then each reference conformation for $i=1,\ldots,K^n$ will have its own entropy, Si, defined by these equations. If the relative weights of each of these reference conformations are given by $w_1,\ldots,w_{K^n}$, and if each conformation is disjoint, and there is minimal overlap between the associated conformational distributions around each, then the total entropy in the case of free ends can be approximated as

$$S_{free}\approx-\sum_{i=1}^{K^n}w_i\log w_i+\sum_{i=1}^{K^n}w_iS_i.$$

The entropy for the case when the distal end is fixed can be approximated by adding contributions from the subset of baseline conformations that approximately satisfy the end constraints.
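The mixture approximation can be sketched in a few lines. In the example below the weights, the per-conformation entropies and the indices of the conformations that satisfy the end constraint are all hypothetical placeholders. In an actual calculation the Si would come from (30) with (28), and the retained subset would be found by testing each reference conformation against the end constraint.

```python
import math

def mixture_entropy(weights, component_entropies):
    """S ~ -sum_i w_i log w_i + sum_i w_i S_i for well-separated components."""
    total = sum(weights)
    w = [x / total for x in weights]          # renormalize the weights
    mixing = -sum(x * math.log(x) for x in w if x > 0.0)
    return mixing + sum(x * s for x, s in zip(w, component_entropies))

K, n = 3, 4
num_states = K ** n
weights = [1.0] * num_states                  # hypothetical: equally populated states
entropies = [-5.0] * num_states               # hypothetical per-state entropies S_i
S_free = mixture_entropy(weights, entropies)

closing = [0, 7, 42]                          # hypothetical states meeting the end constraint
S_fixed = mixture_entropy([weights[i] for i in closing],
                          [entropies[i] for i in closing])
print(S_free, S_fixed, S_free - S_fixed)
```

With equal weights and equal component entropies, the difference between the two cases is just the logarithm of the ratio of the numbers of admissible reference conformations, which makes the combinatorial origin of the entropy loss explicit.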

5 Conclusions

This work reviewed and built on techniques from the fields of robotics, information theory and theoretical polymer science and applied these to model conformational entropy in protein loops. At the core of this presentation was the mathematics of rigid-body motion and associated statistical computations, as well as the use of inequalities from information theory for developing a rigorous mathematical treatment of the entropy of unfolded, partially folded and fully folded proteins. Models of conformational statistics in these three kinds of ensembles were reviewed and developed. These models were then applied to compute entropy differences. The various concepts of entropy in statistical mechanics, computational polymer science and information theory were reviewed. The distinction between conformational entropy computed in internal and Cartesian coordinates was made. Inequalities to bound the value of entropy from below and above were presented in cases when exact computations were judged to be intractable.

Acknowledgements

This work was performed with support from the National Institutes of Health under grants R01GM075310 and R01GM075310-04S1. The author would like to thank Profs. G. Rose and E. Lattman for their valuable comments, Dr. W. Park for proofreading, and Dr. S. Lee and Ms. Y. Wang for creating some of the figures.

Footnotes

1

Using results from information theory [67, 12].

2

Named after Norwegian mathematician Marius Sophus Lie (1842 - 1899).

References

  • [1].Amato NM, Song G. Using motion planning to study protein folding pathways. Journal of Computational Biology. 2002;9(2):149–168. doi: 10.1089/10665270252935395. [DOI] [PubMed] [Google Scholar]
  • [2].Amato NM, Dill KA, Song G. Using motion planning to map protein folding landscapes and analyze folding kinetics of known native structures. Journal of Computational Biology. 2003;10(3-4):239–255. doi: 10.1089/10665270360688002. [DOI] [PubMed] [Google Scholar]
  • [3].Anfinsen CB. Principles that govern folding of protein chains. Science. 1973 July;181(4096):223–230. doi: 10.1126/science.181.4096.223. [DOI] [PubMed] [Google Scholar]
  • [4].Baldwin RL, Rose GD. Is protein folding hierarchic? I. Local structure and peptide folding. Trends in Biochemical Sciences. 1999 Jan;24(1):26–33. doi: 10.1016/s0968-0004(98)01346-2. [DOI] [PubMed] [Google Scholar]
  • [5].Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Research. 2000 Jan 1;28(1):235–242. doi: 10.1093/nar/28.1.235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Birshtein TM, Ptitsyn OB. Conformations of Macromolecules. Interscience; New York: 1966. [Google Scholar]
  • [7].Boehr DD, Nussinov R, Wright PE. The role of dynamic conformational ensembles in biomolecular recognition. Nature Chemical Biology. 2009;5:789–796. doi: 10.1038/nchembio.232. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Boyd RH, Phillips PJ. The Science of Polymer Molecules. Cambridge University Press; Cambridge: 1993. (Cambridge Solid State Science Series). [Google Scholar]
  • [9].Bracken C, Iakoucheva LM, Rorner PR, Dunker AK. Combining prediction, computation and experiment for the characterization of protein disorder. Current Opinion in Structural Biology. 2004 Oct;14(5):570–576. doi: 10.1016/j.sbi.2004.08.003. [DOI] [PubMed] [Google Scholar]
  • [10].Bryngelson JD, Onuchic JH, Socci ND, Wolynes PG. Funnels, Pathways, and the Energy Landscape of Protein-Folding - A Synthesis. Proteins-Structure,Function and Genetics. 1995 March;21(3):167–195. doi: 10.1002/prot.340210302. [DOI] [PubMed] [Google Scholar]
  • [11].Canutescu AA, Dunbrack RL. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Protein Science. 2003 May;12(5):963–972. doi: 10.1110/ps.0242703. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Chirikjian GS. Stochastic Models, Information Theory, and Lie Groups. Birkhäuser; 2009. [Google Scholar]
  • [13].Chirikjian GS. Group Theory and Biomolecular Conformation, I.: Mathematical and computational models. J. Phys.: Condens. Matter. 2010;22:323103. doi: 10.1088/0953-8984/22/32/323103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Chirikjian GS. Conformational Statistics of Macromolecules Using Generalized Convolution. Computational and Theoretical Polymer Science. 2001 Feb;11:143–153. [Google Scholar]
  • [15].Chirikjian GS, Burdick JW. A Modal Approach to Hyper-Redundant Manipulator Kinematics. IEEE Transactions on Robotics and Automation. 1994;10:343–354. [Google Scholar]
  • [16].Chirikjian GS, Burdick JW. A Geometric Approach to Hyper-Redundant Manipulator Obstacle Avoidance. ASME Journal of Mechanical Design. 1992 December;114:580–585. [Google Scholar]
  • [17].Chirikjian GS, Kyatkin AB. An Operational Calculus for the Euclidean Motion Group with Applications in Robotics and Polymer Science. J. Fourier Analysis and Applications. 2000 Dec;6(6):583–606. [Google Scholar]
  • [18].Chirikjian GS, Kyatkin AB. Engineering Applications of Noncommutative Harmonic Analysis. CRC Press; Boca Raton, FL: 2001. [Google Scholar]
  • [19].Chirikjian GS, Wang Y. Conformational Statistics of Stiff Macromolecules as Solutions to PDEs on the Rotation and Motion Groups. Physical Review E. 2000 July;62(1):880–892. doi: 10.1103/physreve.62.880. [DOI] [PubMed] [Google Scholar]
  • [20].Crippen GM. A Gaussian statistical mechanical model for the equilibrium thermodynamics of barnase folding. Journal of Molecular Biology. 2001 Feb 23;306(3):565–573. doi: 10.1006/jmbi.2000.4401. [DOI] [PubMed] [Google Scholar]
  • [21].Crippen GM. Statistical mechanics of protein folding by cluster distance geometry. Biopolymers. 2004 Oct 15;75(3):278–289. doi: 10.1002/bip.20118. [DOI] [PubMed] [Google Scholar]
  • [22].D’Aquino JA, Gomez J, Hilser VJ, Lee KH, Amzel LM, Fieire E. The Magnitude of the Backbone Conformational Entropy Change in Protein Folding. PROTEINS: Structure, Function, and Genetics. 1996;25:143–156. doi: 10.1002/(SICI)1097-0134(199606)25:2<143::AID-PROT1>3.0.CO;2-J. [DOI] [PubMed] [Google Scholar]
  • [23].Das P, Moll M, Stamati H, Kavraki LE, Clementi C. Low-dimensional Free-energy Landscapes of Protein-Folding Reactions by Nonlinear Dimensionality Reduction. PNAS. 2006 June 17;103(26):9885–9890. doi: 10.1073/pnas.0603553103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].de Gennes PG. Scaling Concepts in Polymer Physics. Cornell University Press; 1979. [Google Scholar]
  • [25].des Cloizeaux J, Jannink G. Polymers in Solution: Their Modelling and Structure. Clarendon Press; Oxford: 1990. [Google Scholar]
  • [26].Dill KA, Fiebig KM, Chan HS. Cooperativity in Protein-Folding Kinetics. PNAS. 1993 March 1;90(5):1942–1946. doi: 10.1073/pnas.90.5.1942. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Doi M, Edwards SF. The Theory of Polymer Dynamics. Clarendon Press; Oxford: 1986. [Google Scholar]
  • [28].Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. Flexible nets - The roles of intrinsic disorder in protein interaction networks. FEBS Journal. 2005 Oct;272(20):5129–5148. doi: 10.1111/j.1742-4658.2005.04948.x. [DOI] [PubMed] [Google Scholar]
  • [29].Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield CJ, Campen AM, Ratliff CR, Hipps KW, Ausio J, Nissen MS, Reeves R, Kang CH, Kissinger CR, Bailey RW, Griswold MD, Chiu M, Garner EC, Obradovic Z. Intrinsically disordered protein. Journal of Molecular Graphics and Modelling. 2001;19(1):26–59. doi: 10.1016/s1093-3263(00)00138-8. [DOI] [PubMed] [Google Scholar]
  • [30].Fang QJ, Shortle D. A consistent set of statistical potentials for quantifying local side-chain and backbone interactions. Proteins-Structure, Function and Bioinformatics. 2005 July 1;60(1):90–96. doi: 10.1002/prot.20482. [DOI] [PubMed] [Google Scholar]
  • [31].Fitzkee NC, Rose GD. Reassessing random-coil statistics in unfolded proteins. PNAS. 2004 August 24;101(34):12497–12502. doi: 10.1073/pnas.0404236101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].Fitzkee NC, Rose GD. Sterics and solvation winnow accessible conformational space for unfolded proteins. Journal of Molecular Biology. 2005 Nov 4;353(4):873–887. doi: 10.1016/j.jmb.2005.08.062. [DOI] [PubMed] [Google Scholar]
  • [33].Flory PJ. Statistical Mechanics of Chain Molecules. John Wiley & Sons; 1969. (reprinted Hanser Publishers, Munich 1989) [Google Scholar]
  • [34].Frederick KK, Marlow MS, Valentine KG, Wand AJ. Conformational entropy in molecular recognition by proteins. Nature. 2007 July 19;448:325–330. doi: 10.1038/nature05959. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [35].Gel’fand IM, Minlos RA, Shapiro Z, Ya. Representations of the Rotation and Lorentz Groups and Their Applications. Pergamon Press; New York: 1963. [Google Scholar]
  • [36].Gong HP, Rose GD. Does secondary structure determine tertiary structure in proteins? Proteins-Structure,Function and Bioinformatics. 2005 Nov 1;61(2):338–343. doi: 10.1002/prot.20622. [DOI] [PubMed] [Google Scholar]
  • [37].Grosberg A, Yu., Khokhlov AR. Statistical Physics of Macromolecules. American Institute of Physics; New York: 1994. [Google Scholar]
  • [38].Hsu D, Latombe JC, Motwani R. Path planning in expansive configuration spaces. International Journal of Computational Geometry and Applications. 1999 Aug-Oct;9(4-5):495–512. [Google Scholar]
  • [39].Jernigan RL, Bahar I. Structure-derived potentials and protein simulations. Current Opinion in Structural Biology. 1996 April;6(2):195–209. doi: 10.1016/s0959-440x(96)80075-3. [DOI] [PubMed] [Google Scholar]
  • [40].Karplus M, Weaver DL. Protein-Folding Dynamics. Nature. 1976;260(5550):404–406. doi: 10.1038/260404a0. [DOI] [PubMed] [Google Scholar]
  • [41].Kavraki LE, Svestka P, Latombe JC, Overmars MH. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation. 1996 Aug;12(4):566–580. [Google Scholar]
  • [42].Kazerounian K, Latif K, Rodriguez K, Alvarado C. Nano-kinematics for analysis of protein molecules. Journal of Mechanical Design. 2005 July;127(4):699–711. [Google Scholar]
  • [43].Kazerounian K. From mechanisms and robotics to protein conformation and drug design. Journal of Mechanical Design. 2004 Jan;126(1):40–45. [Google Scholar]
  • [44].Kim JS, Chirikjian GS. A unified approach to conformational statistics of classical polymer and polypeptide models. Polymer. 2005 Nov 28;46(25):11904–11917. doi: 10.1016/j.polymer.2005.09.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [45].Kim MK, Jernigan RL, Chirikjian GS. Rigid-cluster models of conformational transitions in macromolecular machines and assemblies. Biophysical Journal. 2005 July;89(1):43–55. doi: 10.1529/biophysj.104.044347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Kortemme T, Morozov AV, Baker D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. Journal of Molecular Biology. 2003 Feb 28;326(4):1239–1259. doi: 10.1016/s0022-2836(03)00021-4. [DOI] [PubMed] [Google Scholar]
  • [47].Lavalle SM, Finn PW, Kavraki LE, Latombe JC. A randomized kinematics-based approach to pharmacophore-constrained conformational search and database screening. Journal of Computational Chemistry. 2000 July 15;21(9):731–747. [Google Scholar]
  • [48].Lazaridis T, Karplus M. Effective energy function for proteins in solution. Proteins-Structure, Function and Genetics. 1999 May 1;35(2):133–152. doi: 10.1002/(sici)1097-0134(19990501)35:2<133::aid-prot1>3.0.co;2-n. [DOI] [PubMed] [Google Scholar]
  • [49].Lee S, Chirikjian GS. Pose analysis of alpha-carbons in proteins. International Journal of Robotics Research. 2005 Feb-March;24(2-3):183–210. [Google Scholar]
  • [50].Lee S, Chirikjian GS. Inter-Helical Angle and Distance Preferences in Globular Proteins. Biophysical Journal. 2004 Feb;86:1105–1117. doi: 10.1016/S0006-3495(04)74185-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [51].Levitt M. Protein Folding by Restrained Energy Minimization and Molecular-Dynamics. Journal of Molecular Biology. 1983;170(3):723–764. doi: 10.1016/s0022-2836(83)80129-6. [DOI] [PubMed] [Google Scholar]
  • [52].Li Z, Raychaudhuri S, Wand J. Insights into the local residual entropy of proteins provided by NMR relaxation. Protein Science. 1996;5:2647–2650. doi: 10.1002/pro.5560051228. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [53].Liu L, Chen S-J. Computing the conformational entropy for RNA folds. J. Chem. Phys. 2010;132:235104. doi: 10.1063/1.3447385. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [54].Lotan I, Schwarzer F, Halperin D, Latombe JC. Algorithm and data structures for efficient energy maintenance during Monte Carlo simulation of proteins. Journal of Computational Biology. 2004 Oct;11(5):902–932. doi: 10.1089/cmb.2004.11.902. [DOI] [PubMed] [Google Scholar]
  • [55].Manocha D, Zhu YS, Wright W. Conformational-Analysis of Molecular Chains Using Nano-Kinematics. Computer Applications in the Biosciences. 1995 Feb;11(1):71–86. doi: 10.1093/bioinformatics/11.1.71. [DOI] [PubMed] [Google Scholar]
  • [56].Mattice WL, Suter UW. Conformational Theory of Large Molecules, The Rotational Isomeric State Model in Macromolecular Systems. Wiley; New York: 1994. [Google Scholar]
  • [57].Mavroidis C, Dubey A, Yarmush ML. Molecular machines. Annual Review of Biomedical Engineering. 2004;6:363–395. doi: 10.1146/annurev.bioeng.6.040803.140143. [DOI] [PubMed] [Google Scholar]
  • [58].Miller W., Jr. Lie Theory and Special Functions. Academic Press; New York: 1968. Also see Miller W., Jr. Some Applications of the Representation Theory of the Euclidean Group in Three-Space. Commun. Pure Appl. Math. 1964;17:527–540.
  • [59].Moult J. Comparison of database potentials and molecular mechanics force fields. Current Opinion in Structural Biology. 1997;7(2):194–199. doi: 10.1016/s0959-440x(97)80025-5.
  • [60].Palmer AG., III Probing molecular motions by NMR. Current Opinion in Structural Biology. 1997;7:732–737. doi: 10.1016/s0959-440x(97)80085-1.
  • [61].Pappu RV, Srinivasan R, Rose GD. The Flory isolated-pair hypothesis is not valid for polypeptide chains: Implications for protein folding. PNAS. 2000 Nov 7;97(23):12565–12570. doi: 10.1073/pnas.97.23.12565.
  • [62].Patriciu A, Chirikjian GS, Pappu RV. Analysis of the conformational dependence of mass-metric tensor determinants in serial polymers with constraints. Journal of Chemical Physics. 2004 Dec 22;121(24):12708–12720. doi: 10.1063/1.1821492.
  • [63].Radivojac P, Obradovic Z, Smith DK, Zhu G, Vucetic S, Brown CJ, Lawson JD, Dunker AK. Protein flexibility and intrinsic disorder. Protein Science. 2004 Jan;13(1):71–80. doi: 10.1110/ps.03128904.
  • [64].Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of Polypeptide Chain Configurations. Journal of Molecular Biology. 1963;7(1):95–99. doi: 10.1016/s0022-2836(63)80023-6.
  • [65].Rhee YM, Pande VS. On the role of chemical detail in simulating protein folding kinetics. Chemical Physics. 2006;323:66–77.
  • [66].Rienstra CM, Tucker-Kellogg L, Jaroniec CP, Hohwy M, Reif B, McMahon MT, Tidor B, Lozano-Perez T, Griffin RG. De novo determination of peptide structure with solid-state magic-angle spinning NMR spectroscopy. PNAS. 2002 Aug 6;99(16):10260–10265. doi: 10.1073/pnas.152346599.
  • [67].Shannon CE. A mathematical theory of communication. Bell System Technical Journal. 1948 July and October;27:379–423, 623–656.
  • [68].Skliros A, Chirikjian GS. Positional and Orientational Distributions for Locally Self-Avoiding Random Walks with Obstacles. Polymer. 2008 March;49(6):1701–1715. doi: 10.1016/j.polymer.2008.01.056.
  • [69].Shehu A, Clementi C, Kavraki LE. Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations. Proteins: Structure, Function, and Bioinformatics. 2006;65(1):164–179. doi: 10.1002/prot.21060.
  • [70].Shortle D, Ackerman MS. Persistence of native-like topology in a denatured protein in 8 M urea. Science. 2001 July 20;293(5529):487–489. doi: 10.1126/science.1060438.
  • [71].Sugiura M. Unitary Representations and Harmonic Analysis. 2nd edition. Elsevier Science Publisher; The Netherlands: 1990.
  • [72].Talman J. Special Functions. W. A. Benjamin, Inc.; Amsterdam: 1968.
  • [73].Tang XY, Kirkpatrick B, Thomas S, Song G, Amato NM. Using motion planning to study RNA folding kinetics. Journal of Computational Biology. 2005 July;12(6):862–881. doi: 10.1089/cmb.2005.12.862.
  • [74].Teodoro M, Phillips GN, Jr., Kavraki LE. Molecular Docking: A Problem with Thousands of Degrees of Freedom. Proc. of the 2001 IEEE International Conference on Robotics and Automation (ICRA 2001); May 2001; Seoul, Korea: IEEE Press; pp. 960–966.
  • [75].Thomas S, Song G, Amato NM. Protein folding by motion planning. Physical Biology. 2005 Dec;2(4):S148–S155. doi: 10.1088/1478-3975/2/4/S09.
  • [76].Vajda S, Sippl M, Novotny J. Empirical potentials and functions for protein folding and binding. Current Opinion in Structural Biology. 1997;7(2):222–228. doi: 10.1016/s0959-440x(97)80029-2.
  • [77].Vilenkin NJ, Klimyk AU. Representation of Lie Groups and Special Functions. Vol. 1-3. Kluwer Academic Publishers; The Netherlands: 1991.
  • [78].Vucetic S, Obradovic Z, Vacic V, Radivojac P, Peng K, Iakoucheva LM, Cortese MS, Lawson JD, Brown CJ, Sikes JG, Newton CD, Dunker AK. DisProt: a database of protein disorder. Bioinformatics. 2005 Jan 1;21(1):137–140. doi: 10.1093/bioinformatics/bth476.
  • [79].Wang CSE, Lozano-Perez T, Tidor B. AmbiPack: A systematic algorithm for packing of macromolecular structures with ambiguous distance constraints. Proteins: Structure, Function, and Genetics. 1998 July 1;32(1):26–42. doi: 10.1002/(sici)1097-0134(19980701)32:1<26::aid-prot5>3.0.co;2-c.
  • [80].Wang JY, Crippen GM. Statistical mechanics of protein folding with separable energy functions. Biopolymers. 2004 June 15;74(3):214–220. doi: 10.1002/bip.20077.
  • [81].Wang Y, Chirikjian GS. Nonparametric Second-Order Theory of Error Propagation on the Euclidean Group. International Journal of Robotics Research. 2008 November/December;27(11-12):1258–1273. doi: 10.1177/0278364908097583.
  • [82].Wang Y, Chirikjian GS. Workspace Generation of Hyper-Redundant Manipulators as a Diffusion Process on SE(N). IEEE Transactions on Robotics and Automation. 2004 June;20(3):399–408.
  • [83].Yang D, Kay LE. Contributions to Conformational Entropy Arising from Bond Vector Fluctuations Measured from NMR-Derived Order Parameters: Application to Protein Folding. Journal of Molecular Biology. 1996;263:369–382. doi: 10.1006/jmbi.1996.0581.
  • [84].Zhang J, Lin M, Chen R, Wang W, Liang J. Discrete state model and accurate estimation of loop entropy of RNA secondary structures. J. Chem. Phys. 2008;128:125107. doi: 10.1063/1.2895050.
  • [85].Zhang M, White RA, Wang L, Goldman R, Kavraki LE, Hassett B. Improving Conformational Searches by Geometric Screening. Bioinformatics. 2005;21(5):624–630. doi: 10.1093/bioinformatics/bti055.
  • [86].Zhou H-X. Loops in Proteins Can Be Modeled as Worm-Like Chains. J. Phys. Chem. B. 2001;105:6763–6766.
  • [87].Zhou Y, Chirikjian GS. Conformational Statistics of Semi-Flexible Macromolecular Chains with Internal Joints. Macromolecules. 2006;39(5):1950–1960. doi: 10.1021/ma0512556.
  • [88].Zhou Y, Chirikjian GS. Conformational statistics of bent semiflexible polymers. Journal of Chemical Physics. 2003 Sept 1;119(9):4962–4970.
