Proceedings of the National Academy of Sciences of the United States of America. 2007 Sep 11;104(38):14899–14904. doi: 10.1073/pnas.0607833104

Generalization of distance to higher dimensional objects

Steven S Plotkin
PMCID: PMC1986585  PMID: 17848528

Abstract

The measurement of distance between two objects is generalized to the case where the objects are no longer points but are one-dimensional. Additional concepts such as nonextensibility, curvature constraints, and noncrossing become central to the notion of distance. Analytical and numerical results are given for some specific examples, and applications to biopolymers are discussed.


The distance, as conventionally defined between two zero-dimensional objects (points) A and B at positions rA and rB, is the minimal arclength travelled in the transformation from A to B. A transformation r(t) between A and B is a vector function that may be parametrized by a scalar variable t: 0 ≤ t ≤ T, r(0) = rA, r(T) = rB, and the distance travelled is a functional of r(t). The (minimal) transformation r*(t) is an object of dimension one higher than A or B; i.e., it yields a distance that is one-dimensional. The distance 𝒟* is found through the variation of the functional (1):

\[
\mathcal{D}[\mathbf{r}] = \int_0^T dt\,\sqrt{\dot{\mathbf{r}}^2} \tag{1a}
\]
\[
\mathcal{D}[\mathbf{r}] = \int_0^T dt\,\sqrt{g_{\mu\nu}\,\dot{x}^{\mu}\dot{x}^{\nu}} \tag{1b}
\]

Here, ẋ = dx/dt, ṙ = dr/dt, and we have let gμν = δμν (Euclidean metric). The boundary conditions mentioned above are present at the end points of the integral. The Einstein summation convention will be used where convenient (e.g., Eq. 1b); however, all the analysis here deals with spatial coordinates, ν = 1, 2, 3, on a Euclidean metric. Generalizations to dimension higher than 3, as well as non-Euclidean metrics, are straightforward to incorporate into the formalism.

On a Euclidean metric, the minimal distance becomes the diagonal of a hyper-cube. However, formulated as above, the solutions minimizing 𝒟 are infinitely degenerate, because particles moving at various speeds but tracing the same trajectory over the total time T all give the same distance. To circumvent this problem, what is typically done is to let one of the space variables (e.g., x) become the independent variable. However, for higher dimensional objects or zero dimensional objects on a manifold with nontrivial topology, there is no guarantee that the dependent variables (y, z) constitute single valued functions of x. Alternatively, one can study the “time” trajectory of the parametric curve defined above, but under a gauge that fixes the speed to a constant, vo, for example. One can either fix the gauge from the outset with Lagrange multipliers, or choose a gauge that may simplify the problem after finding the extremum equations. The latter is often simpler in practice.

To be specific, the effective Lagrangian ℒ appearing in the above problem is ℒ = √(ṙ²), and the Euler–Lagrange (EL) equations are

\[
\frac{d}{dt}\left(\frac{\dot{\mathbf{r}}}{|\dot{\mathbf{r}}|}\right) \equiv \dot{\hat{\mathbf{v}}} = 0 \tag{2}
\]

with v̂ the unit vector in the direction of the velocity. The boundary conditions are

\[
\mathbf{r}(0) = \mathbf{r}_A, \qquad \mathbf{r}(T) = \mathbf{r}_B \tag{3}
\]

Because the derivative of a unit vector is always orthogonal to that vector, Eq. 2 says that the direction of the velocity cannot change, and therefore straight line motion results. Applying the boundary conditions gives v̂ = (rB − rA)/|rB − rA|. However, any function v(t) = |vo(t)| satisfying the boundary conditions is a solution, so long as ∫_0^T dt |vo(t)| = |rB − rA|. This is the infinite degeneracy of solutions mentioned above. Then r*(t) = rA + (rB − rA)/|rB − rA| ∫_0^t dt′ |vo(t′)|, and 𝒟* = ∫_0^T dt √(ṙ²) = ∫_0^T dt |vo(t)| = |rB − rA|. At this point we could fix the parameterization by choosing |vo(t)| = |rB − rA|/T (constant speed), for example.
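The degeneracy with respect to the speed profile can be checked numerically. The following is a minimal sketch, not part of the original analysis: the end points `r_a`, `r_b` and the three progress functions are arbitrary choices made here for illustration. Each path traces the same straight line at a different speed, and each returns the same arclength |rB − rA| = 3.

```python
import math

def path_length(r_of_t, T=1.0, steps=20000):
    """Approximate the integral of |dr/dt| dt by summing chord lengths."""
    total = 0.0
    prev = r_of_t(0.0)
    for k in range(1, steps + 1):
        cur = r_of_t(T * k / steps)
        total += math.dist(prev, cur)
        prev = cur
    return total

# Illustrative end points (assumed values); |r_b - r_a| = 3.
r_a = (0.0, 0.0, 0.0)
r_b = (1.0, 2.0, 2.0)

def straight_line(progress):
    """r(t) = r_a + (r_b - r_a) * progress(t), with progress(0) = 0 and progress(1) = 1."""
    return lambda t: tuple(a + (b - a) * progress(t) for a, b in zip(r_a, r_b))

constant_speed = straight_line(lambda t: t)
accelerating = straight_line(lambda t: t * t)
ease_in_out = straight_line(lambda t: 0.5 * (1.0 - math.cos(math.pi * t)))

# All three speed profiles give the same distance travelled.
lengths = [path_length(f) for f in (constant_speed, accelerating, ease_in_out)]
print(lengths)
```

Only the traced curve matters, which is why a gauge (for example constant speed) can be imposed after the extremum equations have been solved.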

The extremum is a minimum, as can be shown by analyzing the eigenvalues of the matrix ∂²𝒟/∂xν(t)∂xμ(t′) = −δμν δ″(t − t′). Diagonalizing by Fourier transform gives positive elements +ωn² δμν δ(ωn − ω′n) for the stability matrix and thus positive eigenvalues.
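As a rough numerical illustration of this stability argument, the operator −d²/dt² can be discretized on a grid with the variation held at zero at both end points. This is a sketch under assumptions made here (a uniform grid of `M` interior points and the standard three-point stencil); it is not the Fourier calculation of the text. Sine modes are eigenvectors of the discretized operator, and their eigenvalues are positive and close to ωn² = (nπ/T)².

```python
import math

def apply_neg_second_derivative(x, h):
    """Apply the three-point finite-difference form of -d^2/dt^2, with x = 0 at both ends."""
    m = len(x)
    out = []
    for j in range(m):
        left = x[j - 1] if j > 0 else 0.0
        right = x[j + 1] if j < m - 1 else 0.0
        out.append((2.0 * x[j] - left - right) / (h * h))
    return out

T = 1.0
M = 200                     # number of interior grid points (arbitrary choice)
h = T / (M + 1)

eigenvalues = []
for n in range(1, 6):
    mode = [math.sin(n * math.pi * (j + 1) / (M + 1)) for j in range(M)]
    image = apply_neg_second_derivative(mode, h)
    # Rayleigh quotient; for an exact eigenvector it equals the eigenvalue.
    lam = sum(a * b for a, b in zip(mode, image)) / sum(a * a for a in mode)
    # Largest component of (operator * mode - lam * mode); near zero for an eigenvector.
    residual = max(abs(a - lam * b) for a, b in zip(image, mode))
    eigenvalues.append((n, lam, (n * math.pi / T) ** 2, residual))

for n, lam, omega_sq, residual in eigenvalues:
    print(n, lam, omega_sq, residual)
```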

In what follows, we generalize the notion of distance to higher dimensional objects, specifically space curves. We will see many of the above themes reiterated, as well as some fundamentally new features that emerge when one treats the space curves as nonextensible, having some persistence length or curvature constraint, and noncrossing or unable to pass through themselves. We provide analytical and numerical results for some prototypical examples for nonextensible chains, and we lay the foundations for treating curvature and noncrossing constraints.

Distance Metric for One-Dimensional Objects

The distance 𝒟* between two one-dimensional objects (which we refer to as space curves or strings) A and B having configurations rA(s) and rB(s), 0 ≤ s ≤ L, is obtained from the transformation from A to B that minimizes the integrated distance travelled. By integrated distance we mean the cumulative arclength all elements of the string had to move in the transformation from A to B. For the transformation to exist, strings A and B must have the same length (although this condition may be relaxed by allowing specific extensions or contractions). For the distance to be finite, open space curves must be finite in length. For closed non-crossing space curves, A and B must be in the same topological class for the transformation to exist. Describing the transformation r(s, t) requires two scalar parameters, one for arc length s along the string and another measuring progress as in the zero-dimensional case, say t: 0 ≤ t ≤ T, so that r(s, 0) = rA(s) and r(s, T) = rB(s). The distance travelled is a functional of the vector function r(s, t). The minimal transformation r*(s, t) is an object of dimension one higher than A or B, i.e., it yields a distance that is two-dimensional. The problem does not map to a simple soap film, since there are many configuration pairs that have zero area between them but nonzero distance travelled, e.g., a straight line displaced along its own axis, or that in Fig. 1C. The analogue to a higher-dimensional surface of minimal area when the “time” t is included is closer but inexact (see footnote below).

Fig. 1.

Three representative pairs of curves. (A) Straight line curve rotated by π/2. (B) One string has a finite radius of curvature, the other is straight. (C) A canonical example where noncrossing is important; the curves are displaced for easy visualization but should be imagined to be superimposed.

We can construct the effective Lagrangian along the same lines as the zero-dimensional case. Using the shorthand r ≡ r(s, t), ṙ ≡ ∂r/∂t, r′ ≡ ∂r/∂s, the distance travelled is

\[
\mathcal{D}[\mathbf{r}] = \int_0^L ds \int_0^T dt\,\sqrt{\dot{\mathbf{r}}^2} \tag{4}
\]

However, to meaningfully represent the distance a string must move to reconfigure itself from conformation A to B, the transformation must be subject to several auxiliary conditions.

The first of these is nonextensibility. Points along the space curve cannot move independently of one another but are constrained to integrate to fixed length, so the curve cannot stretch or contract. Thus there is a Lagrange multiplier λ(s, t) weighting the (nonholonomic) constraint:

\[
\sqrt{\mathbf{r}'^2} = 1 \tag{5}
\]

This constraint ensures a parameterization of the string with unit tangent vector t̂ = r′, so that the total length of the string is L = ∫_0^L ds √(r′²) = ∫_0^L ds. In the language of differential geometry, the space curve is a unit-speed curve.

If the constraint (Eq. 5) were not present in Eq. 4, each point along the space curve could follow a straight line path from A to B and the problem of minimizing the distance would be trivial. Equivalently, setting λ = 0 should reduce the problem to a sum of straight lines analogously to the zero-dimensional case above.

As in the case of distance between points, one can fix the t-parameterization from the outset by introducing a Lagrange multiplier α(t) that fixes the total distance covered per time ∫_0^L ds √(ṙ²) to a known function f(t). Although this approach removes the infinite degeneracy mentioned above, as a global isoperimetric condition it reduces the symmetry of the problem. For example, there would then be no conservation law that could be written to capture the invariance of the effective Lagrangian with respect to the independent variable t. For these reasons, we choose to leave the answer unparameterized with respect to t, analogous to the point-distance case above.

Ideal Chains.

There are many examples of transformations between two strings A and B where chain noncrossing is unimportant (e.g., Fig. 1 A and B). Here we derive the EL equations for this case.

From Eqs. 4 and 5, the extrema of the distance D are found from

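\[
\delta\mathcal{D} = \delta \int_0^L ds \int_0^T dt\,\mathcal{L}(\dot{\mathbf{r}}, \mathbf{r}') = \delta \int_0^L ds \int_0^T dt \left[\sqrt{\dot{\mathbf{r}}^2} - \lambda\,\sqrt{\mathbf{r}'^2}\right] = 0 \tag{6}
\]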

Performing the variation gives

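\[
\delta\mathcal{D} = \int_0^L ds\,\Big[\mathbf{p}_t\cdot\delta\mathbf{r}\Big]_{t=0}^{T} + \int_0^T dt\,\Big[\mathbf{p}_s\cdot\delta\mathbf{r}\Big]_{s=0}^{L} - \int_0^L ds \int_0^T dt \left(\frac{d\mathbf{p}_t}{dt} + \frac{d\mathbf{p}_s}{ds}\right)\cdot\delta\mathbf{r} = 0 \tag{7}
\]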

where the generalized momenta pt and ps are given by:

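\[
\mathbf{p}_t = \frac{\partial\mathcal{L}}{\partial\dot{\mathbf{r}}} = \hat{\mathbf{v}}, \qquad \mathbf{p}_s = \frac{\partial\mathcal{L}}{\partial\mathbf{r}'} = -\lambda\,\hat{\mathbf{t}} \tag{8}
\]

where v̂ is again the unit velocity vector, and t̂ is the unit tangent to the curve.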

The EL equation follows from the last term in Eq. 7, and yields a partial differential equation for the minimal transformation r*(s, t):

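\[
\frac{\partial}{\partial t}\left(\frac{\dot{\mathbf{r}}}{|\dot{\mathbf{r}}|}\right) = \lambda\,\mathbf{r}'' + \lambda'\,\mathbf{r}' \tag{9}
\]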

where we have used the facts that |r′| = 1 and r′ · r″ ≡ t̂ · κ = 0, since the tangent is always orthogonal to the curvature at any given point along a space curve.

Eq. 9 can be written in terms easier to understand intuitively by using the unit velocity vector v̂, tangent t̂, and curvature κ:

\[
\dot{\hat{\mathbf{v}}} = \lambda\,\boldsymbol{\kappa} + \lambda'\,\hat{\mathbf{t}} \tag{10}
\]

Comparing Eqs. 10 and 2, we confirm that setting the Lagrange multiplier λ corresponding to the nonextensibility condition to zero results in straight line solutions for all points along the space curve. Conversely, the condition that the space curve form a contiguous object results generally in nonzero deviation from straight line motion. So in comparing various extremal solutions to Eq. 10, the minimal solution tends to minimize |λ| everywhere.

The boundary conditions are obtained from the first two terms in Eq. 7. Because the initial and final configurations are specified, the variation δr vanishes at t = 0, T, and the corresponding boundary conditions, or initial and final conditions, are:

\[
\mathbf{r}(s, 0) = \mathbf{r}_A(s), \qquad \mathbf{r}(s, T) = \mathbf{r}_B(s) \tag{11}
\]

Because the end points of the string are free during the transformation, δr ≠ 0 at s = 0, L, and so the conjugate momenta must vanish: ps(0, t) = ps(L, t) = 0. This means that λt̂ = 0 at the end points. However, because t̂ cannot be zero, the only way this can occur is for λ(0, t) = λ(L, t) = 0. The Lagrange multiplier, which represents the conjugate force or tension to ensure an inextensible chain, must vanish at the end points of the string. If λ = 0, the EL Eq. 10 gives dv̂/dt = λ′t̂ at the end points. However, because v̂ is a unit vector, dv̂/dt is orthogonal to v̂ (or v), and we have finally the boundary conditions at the end points of the string:

\[
\lambda'\,\mathbf{v}\cdot\hat{\mathbf{t}} = 0 \qquad (s = 0,\ L) \tag{12}
\]

Eq. 12 has three possible solutions. One is that v · t̂ = 0 or equivalently ṙ · r′ = 0, which corresponds to pure rotation of the end points. It is worth mentioning that the end points of the classical relativistic string also move transversely to the string. Moreover, because of the Minkowski metric the end points must also move at the speed of light. Here, however, because Lorentz invariance is not at issue, additional solutions are possible. The end points of our string can be at rest, v = 0, and satisfy Eq. 12. The last solution of Eq. 12 is for λ′ = 0. Because λ also vanishes at the end points, Eq. 10 gives dv̂/dt = 0, or straight line motion. In summary, the three possible boundary conditions for the string end points are:

\[
\mathbf{v}\cdot\hat{\mathbf{t}} = 0 \quad \text{(pure rotation)} \tag{13a}
\]
\[
\mathbf{v} = 0 \quad \text{(at rest)} \tag{13b}
\]
\[
\dot{\hat{\mathbf{v}}} = 0 \quad \text{(straight-line motion)} \tag{13c}
\]

Whether an extremal transformation is a minimum can be determined by examining the second variation of the functional (Eq. 6):

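\[
\delta^2\mathcal{D} = \frac{1}{2}\int_0^L ds \int_0^T dt \left[\delta\dot{\mathbf{r}}\cdot\mathbf{I}\cdot\delta\dot{\mathbf{r}} + \delta\mathbf{r}'\cdot\boldsymbol{\Lambda}\cdot\delta\mathbf{r}'\right] \tag{14}
\]

where Iij = (ṙ² δij − ẋi ẋj)/|ṙ|³ and Λij = −λ(s, t) δij, and δr′ and δṙ are the s and t derivatives of the variation δr from the extremal path.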

We now apply these concepts to some specific examples.

Examples.

Translations.

If two space curves differ by a translation, rB(s) = rA(s) + d with d a constant vector. The appropriate boundary condition for the end points is Eq. 13c. The points along the string can all satisfy Eq. 10 with dv̂/dt = 0 and λ = 0 everywhere (because t̂, κ ≠ 0), and straight line motion results: r*(s, t) = rA(s) + (rB(s) − rA(s))t/T. The distance 𝒟* = L|d|. This is the one-dimensional analogue to Eqs. 2 and 3.

Piece-wise linear space curves.

Suppose initially the curvature of some section of the string is zero. Then, taking the dot product of v with Eq. 10, we see that Eq. 12 holds for all points along the string. So the string either rotates or translates (or remains at rest if that segment has completed the transformation). If both rA and rB are straight lines as in Fig. 1A, Eq. 12 holds for both. It is then reasonable to seek solutions r* of the EL equation such that Eq. 12 holds for all (s, t).

Consider the two space curves shown in Fig. 1A with rA(s) = s x̂ and rB(s) = s ŷ, both with curvature κ = 0. We first investigate rotation from A to B. This transformation satisfies the EL equation so appears to be extremal: r = s r̂ = s(cos ωt x̂ + sin ωt ŷ). The velocity ṙ = sω θ̂, so the distance 𝒟[rROT(s, t)] = πL²/4. Taking the dot product of t̂ with Eq. 10 gives λ′ = t̂ · dv̂/dt = −ω or λ(s, t) = λo − ωs. For the transformation to be extremal, the conjugate momenta must also vanish at the string end points, or λ(0, t) = λ(L, t) = 0. This is impossible to achieve with this functional form, so the transformation is not extremal, unless we include the subsidiary condition here that rA(0, t) = rB(0, t). Then the end point of the string at s = 0 is determined, and the variations δr(0, t) must vanish. Now only λ(L, t) = 0, and so λ(s, t) = ω(L − s). The transformation is extremal.
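The slope of the multiplier for the rotation can be checked by finite differences. The sketch below is illustrative only; the step sizes `h` and the sample points are arbitrary choices made here. It estimates t̂ · dv̂/dt for the rigid rotation from the x axis to the y axis and compares it with −ω, then writes down the tension profile λ(s) = ω(L − s) that has this slope and vanishes at the free end s = L.

```python
import math

L, T = 1.0, 1.0
omega = 0.5 * math.pi / T       # rotation by pi/2 over the time T

def position(s, t):
    """Rigid rotation: r = s (cos(omega t), sin(omega t))."""
    return (s * math.cos(omega * t), s * math.sin(omega * t))

def unit(v):
    norm = math.hypot(*v)
    return tuple(c / norm for c in v)

def unit_velocity(s, t, h=1e-5):
    """Central-difference estimate of v-hat = (dr/dt) / |dr/dt|."""
    a, b = position(s, t - h), position(s, t + h)
    return unit(tuple((q - p) / (2 * h) for p, q in zip(a, b)))

def tangent(s, t, h=1e-5):
    """Central-difference estimate of t-hat = dr/ds."""
    a, b = position(s - h, t), position(s + h, t)
    return unit(tuple((q - p) / (2 * h) for p, q in zip(a, b)))

def lambda_prime(s, t, h=1e-4):
    """t-hat . d(v-hat)/dt, which equals lambda' when the curvature is zero."""
    v_minus, v_plus = unit_velocity(s, t - h), unit_velocity(s, t + h)
    dv_hat = tuple((q - p) / (2 * h) for p, q in zip(v_minus, v_plus))
    return sum(a * b for a, b in zip(tangent(s, t), dv_hat))

samples = [lambda_prime(s, t) for s in (0.25, 0.5, 0.9) for t in (0.2, 0.5, 0.8)]
print(samples, -omega)

def tension(s):
    """lambda(s) = omega (L - s): slope -omega, zero at the free end s = L."""
    return omega * (L - s)
```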

Whether it is a minimum can be determined by examining the second variation (Eq. 14). For the transformation rROT(s, t), the matrix I in Eq. 14 is nonnegative definite, a necessary condition for a local minimum (1); however, Λ is negative definite, so the character of the extremum is determined by the interplay of the two terms in Eq. 14. Variations δr that preserve r′² = 1 (i.e., 2r′ · δr′ = 0) take the form δr = f(s, t)θ̂ in this example, where f(s, t) must satisfy the boundary conditions δr(0, t) = δr(s, 0) = δr(s, T) = 0. We thus let the variations have the functional form δr = ε sin(ks) sin(nπt/T)θ̂, where θ̂ = −sin ωt x̂ + cos ωt ŷ, n is a positive integer, and k is unrestricted. Inserting this functional form for the variations into Eq. 14 gives δ²𝒟 = (ε²π/8) ℱ(kL), where ℱ(x) is a nonpositive, monotonically decreasing function, with a maximum of zero at kL = 0. In fact, to lowest order δ²𝒟 ≈ −(πε²/2160)(kL)⁶. The extremum corresponding to pure rotation of curve rA into rB is a maximum!

The only other solution to Eqs. 10 and 12 for all (s, t) is for each point s on rA(s) to be connected to a corresponding point on rB(s) by a straight line, corresponding to Eq. 13c. Eq. 12 holds everywhere because λ′(s, t) = 0. Because λ is zero at the boundaries, it is thus zero everywhere.

An intermediate configuration then has the shape of a piecewise linear curve with a right angle “kink” at s*(t) (see Fig. 2). As t progresses, the kink propagates along curve rB, and the “free” part of the chain follows straight line diagonal motion, shrinking as its left end is overlaid onto curve rB. The solution for the velocity at all (s, t) is given by v(s, t) = vo(t) Θ(s − s*(t)) êv, where s*(t) is the position of the tangent discontinuity in Fig. 2, which goes from s*(0) = 0 to s*(T) = L as t goes from 0 to T. êv is a unit vector along the direction of the velocity, êv = (ŷ − x̂)/√2, and vo(t) is a speed which can be taken constant. By simple geometry, vo = √2 ṡ*. Because s*(T) = L, vo = √2 L/T and s*(t) = Lt/T. The total distance travelled from Eq. 4 is then 𝒟* = L²/√2.
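The two candidate transformations between the straight lines of Fig. 1A can be compared by evaluating the distance functional of Eq. 4 numerically. The sketch below is an illustration under assumptions made here: the helper `integrated_distance`, its grid sizes, and the explicit form of the `kink` path (the part s ≤ s* already on the y axis, the remainder still parallel to x) are constructions for this example. The rotation gives approximately πL²/4 and the kink propagation approximately L²/√2, which is smaller.

```python
import math

def integrated_distance(r, L=1.0, T=1.0, n_s=200, n_t=400):
    """Approximate D = int ds int dt |dr/dt| by summing chord lengths of each
    material point's trajectory (midpoint rule in s)."""
    ds = L / n_s
    total = 0.0
    for i in range(n_s):
        s = (i + 0.5) * ds
        prev = r(s, 0.0)
        for k in range(1, n_t + 1):
            cur = r(s, T * k / n_t)
            total += math.dist(prev, cur) * ds
            prev = cur
    return total

L, T = 1.0, 1.0

def rotation(s, t):
    """Rigid rotation of the straight line from the x axis to the y axis."""
    angle = 0.5 * math.pi * t / T
    return (s * math.cos(angle), s * math.sin(angle))

def kink(s, t):
    """Kink at s* = L t / T: the part s <= s* lies on the y axis; the rest is
    still parallel to x and moves along a straight diagonal."""
    s_star = L * t / T
    if s <= s_star:
        return (0.0, s)
    return (s - s_star, s_star)

d_rot = integrated_distance(rotation)
d_kink = integrated_distance(kink)
print(d_rot, math.pi * L**2 / 4)       # rotation
print(d_kink, L**2 / math.sqrt(2))     # kink propagation
```

In the kink path every point travels on a straight segment from its position on A to its position on B, so no transformation can give a shorter total.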

Fig. 2.

The minimal transformation from A to B in Fig. 1A involves the propagation of a kink along curve B. The end point of the curve at intermediate states satisfies x + y = L, the equation for a straight line. A similar linear equation holds for any point on the curve; thus, no solution with shorter distance can exist. An intermediate configuration is shown in red. Alternative transformations are possible with kinks along A as well as multiple kinks (see text).

Because the transformation involves straight line motion, it is minimal. This can be seen from the second variation Eq. 14. The shape of the curve at all times is given by

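\[
\mathbf{r}^*(s,t) =
\begin{cases}
s\,\hat{\mathbf{y}}, & s \le s^*(t) \\
s^*(t)\,\hat{\mathbf{y}} + \left(s - s^*(t)\right)\hat{\mathbf{x}}, & s > s^*(t)
\end{cases}
\qquad s^*(t) = Lt/T \tag{15}
\]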

Taking variations from the extremal path as before, let δr = ε sin k(s − Lt/T) sin(nπt/T) Θ(s − Lt/T) ŷ. These variations only act on the “free” part of the string and preserve a unit tangent to first order. The matrix Λ in Eq. 14 is zero for straight line transformations where λ = 0. The quadratic form δṙ · I · δṙ is nonnegative, and results in a second variation δ²𝒟 = ε²(32√2)⁻¹[(kL)² + (nπ)²(1 − sinc(kL))], which is nonnegative, monotonically increasing in kL, and quadratic to lowest order, with a minimum of zero at kL = 0. The transformation is indeed minimal.

Likewise, the minimal distance to fold a string of length L upon itself starting from a straight line (to form a hairpin) is 𝒟* = L²/4.

Solution degeneracy.

In the above example, one can piece together various rotations and translations for parts or all of the chain while still satisfying the EL equations. This infinity of extrema renders the solution of Eq. 9 by direct numerical integration very difficult. For these reasons we apply a method based on analytic geometry to obtain numerical solutions, described in more detail below.

There is also an infinite degeneracy of solutions having the minimal distance in the above example. To see a second minimal transformation, imagine running the above solution backward in time, so the kink propagates from s = L to s = 0 along rB. However, this solution should hold forward in time for the original problem if we permute rB and rA. Now, intermediate states r* first run along x̂, then ŷ. But then we can introduce multiple right-angle kinks in various places, without causing the trajectories in the transformation to deviate from straight lines, so that intermediate states look like staircases. Because there are an infinite number of possible staircases in the continuum limit, there is an infinite degeneracy. This can lead to a tangent vector r′ whose magnitude is length-scale dependent, and less than unity until s → 0. For example, an intermediate configuration can be drawn in Fig. 2 that appears as a straight diagonal line from r*(0, t) to r*(L, t), until s → 0 when an infinite number of step discontinuities are revealed. This problem is resolved in practice through finite-size effects involving different critical angles of rotation described below. In the continuum limit, it is resolved by introducing curvature constraints.

Curvature constraints.

In applications to polymer physics, chains have a stiffness characterized by bending potential in the analysis that is proportional to the square of the local curvature. Here, we may choose to characterize stiffness by introducing a constraint on the configurations of the space curve, so that the curvature simply cannot exceed a given number:

[Eq. 16: equation image not recovered. It defines the constraint term Vκ(r″) that prevents the curvature from exceeding a maximum value κc.]

This term lifts the infinite degeneracy mentioned above, as each near-kink (with curvature κ now = κc) would result in slight deviations from linear motion in the above example, and thus an additional cost in the effective action. Other functional forms for Vκ are also possible. For some applications, a more conventional stiffness potential of the form Vκ(r″) = (1/2) Aκ r″² may be more appropriate. However, the action would no longer consist of a true distance functional, and its minimization would involve the detailed interplay of the parameter Aκ favoring globally minimal curvature with other factors affecting distance in the problem.

Discrete chains.

Strings with a finite number of elements (chains) provide a more accurate representation of real-world systems such as biopolymers. Discretization is also essential for numerical solutions in these more realistic cases. Monomers on a discretized chain travel along a curved metric (3), and Lagrange multipliers explicitly account for this fact here.

We start by discretizing the string into a chain of N links each with length ds = L/N, so that Eq. 4 becomes 𝒟 = ds Σ_{i=1}^{N+1} ∫_0^T dt √(ṙi²), with each ri(t) a function of t only. The total distance is the accumulated distance of all the points joining the links, plus that of the end points, all times ds. This approach is essentially the method of lines for solving Eq. 10: the PDE becomes a set of N + 1 coupled ODEs.

Eq. 5 becomes N constraint equations added to the Lagrangian: Σ_{i=1}^{N} λi,i+1 √((ri+1 − ri)²). We rewrite this strictly for convenience as Σ (1/2) λi,i+1 ri+1/i², where ri+1/i ≡ ri+1 − ri, and |ri+1/i| = L/N.
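The discretized functional can be evaluated directly for any sampled transformation. The sketch below is a minimal illustration, not the numerical method used for the results in this paper: the function names, the representation of a transformation as a list of `frames` (each frame a list of N + 1 node positions), and the rigid-rotation test case are all assumptions made here. It accumulates ds times the path length of every node and reports how well the link-length constraint |ri+1/i| = ds is satisfied.

```python
import math

def chain_distance(frames, ds):
    """ds * sum over nodes of the polyline length each node traces through the frames."""
    total = 0.0
    for before, after in zip(frames, frames[1:]):
        total += sum(math.dist(p, q) for p, q in zip(before, after))
    return ds * total

def max_link_error(frames, ds):
    """Largest deviation of any link length |r_{i+1} - r_i| from ds over all frames."""
    return max(
        abs(math.dist(p, q) - ds)
        for frame in frames
        for p, q in zip(frame, frame[1:])
    )

N, L = 10, 1.0
ds = L / N
steps = 2000

def rotated_chain(angle):
    """Straight chain of N links (N + 1 nodes) pivoting rigidly about node 0."""
    return [(i * ds * math.cos(angle), i * ds * math.sin(angle)) for i in range(N + 1)]

frames = [rotated_chain(0.5 * math.pi * k / steps) for k in range(steps + 1)]
distance = chain_distance(frames, ds)
# Node i sweeps a quarter circle of radius i * ds.
expected = ds * sum(i * ds for i in range(N + 1)) * 0.5 * math.pi
print(distance, expected, max_link_error(frames, ds))
```

For N = 10 the rigid rotation gives about 0.864 L², above the continuum value πL²/4 because the sum over the N + 1 nodes replaces the integral over s.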

The PDE in Eq. 10 then becomes N + 1 coupled (vector) ODEs, each of the form

\[
\dot{\hat{\mathbf{v}}}_i = \lambda_{i,i+1}\,\mathbf{r}_{i+1/i} - \lambda_{i-1,i}\,\mathbf{r}_{i/i-1}, \qquad i = 1, \dots, N+1 \tag{17}
\]

with λ0,1 = λN+1,N+2 = 0. Eq. 17 is consistent with Eq. 10 after suitable definitions; for example, the curvature at point i after discretization is given by (ri+1/i − ri/i−1)/ds².
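The discrete curvature can be checked on a simple case. In this sketch (an illustration with assumed values: nodes placed on a quarter circle of radius R = 2/π with N = 10 equal links, and a straight chain for comparison), the magnitude of the difference of consecutive link vectors divided by ds² reproduces 1/R on the arc and vanishes on the straight line.

```python
import math

def discrete_curvature(nodes, i):
    """(r_{i+1/i} - r_{i/i-1}) / ds^2 at interior node i, with ds the first link length."""
    ds = math.dist(nodes[0], nodes[1])
    forward = [b - a for a, b in zip(nodes[i], nodes[i + 1])]
    backward = [b - a for a, b in zip(nodes[i - 1], nodes[i])]
    return [(f - g) / ds**2 for f, g in zip(forward, backward)]

R = 2.0 / math.pi
N = 10
phi = 0.5 * math.pi / N          # angle subtended by each link
arc_nodes = [(R * math.sin(i * phi), R * math.cos(i * phi)) for i in range(N + 1)]
line_nodes = [(0.1 * i, R) for i in range(N + 1)]

kappa_arc = math.hypot(*discrete_curvature(arc_nodes, 5))
kappa_line = math.hypot(*discrete_curvature(line_nodes, 5))
print(kappa_arc, 1.0 / R, kappa_line)
```

Because the nodes sit on the circle, each link is a chord, so the chain length here is slightly less than the arc length; the curvature estimate itself is exact for equally spaced nodes on a circle.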

One link.

We turn to the simplest problem of one link with end points A and B (see Fig. 3), for which the action reads 𝒟 = ds ∫_0^T dt [√(ṙA²) + √(ṙB²) − (λ/2) rB/A²]. Points A and B have boundary conditions rA(0) = A, rB(0) = B, rA(T) = A′, rB(T) = B′. The link in our problem is taken to have a direction, so point A cannot transform to point B′. The EL equations become:

\[
\lambda\,\mathbf{v}_A\cdot\mathbf{r}_{B/A} = 0, \qquad \lambda\,\mathbf{v}_B\cdot\mathbf{r}_{B/A} = 0 \tag{18}
\]

where the orthogonality of v and dv̂/dt has been used.

Fig. 3.

Transformations between two rigid rods. A undergoes simultaneous translation and rotation and so is not extremal. B is extremal and minimal. The rod cannot rotate any less given that it translates first. However, this transformation is a weak or local minimum. C–E are extremal in bulk but not minimal because they violate corner conditions (A. Mohazab and S.S.P., unpublished data). F is the global minimum. It rotates the minimal amount, and both A and B move monotonically toward A′, B′. A purely straight-line transformation exists but involves moving point A away from A′ before moving toward it (similar to D), thus covering a larger distance than the minimal transformation.

Reminiscent of Eq. 12, Eqs. 18 each have three solutions. For point A, these are: (i) vA · rB/A = 0, or pure rotation of A about B, (ii) vA = 0 or point A is stationary, or (iii) λ = 0 and thus dv̂A/dt = 0 from the EL equations, indicating straight-line motion. Moreover, i implies vB = 0, or both points rotate about a common center, ii implies vB · rB/A = 0 or B rotates, and iii implies dv̂B/dt = 0 as well, so that both points move in straight lines. An extremal transformation thus involves either straight line motion, or rotations of one point about the other at rest (or common center), see Fig. 3 B–F.

The Lagrange multiplier may be found from the dot product of the EL equation for B with rB/A, which gives −ds²λ = rB/A · dv̂B/dt. Thus, when B moves in a straight line λ = 0. When B rotates about A, its acceleration aB follows from rigid body kinematics as aA + α × rB/A − ω²rB/A, where ω and α are the angular velocity and acceleration, respectively, and aA = 0. Thus λ = 1/L.

The minimal solution is the one that involves the minimal amount of rotation (and monotonic approach to A′B′). This may be obtained from analytic geometry: for the example configurations in Fig. 3F, point B rotates about point A until B″, where the straight line B″B′ is tangent to the circle of radius ds = L about A. The distance (over ds) is AA′ + Lθc + B″B′, where sin θc = L/(L + AA′) and B″B′ = √((AA′)² + 2L(AA′)), so, for example, if AA′ = 2L, 𝒟 ≈ 5.168 L².
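The single-link value can be reproduced from the quoted expressions. The sketch below simply evaluates sin θc = L/(L + AA′), the tangent length √((AA′)² + 2L·AA′), and the sum AA′ + Lθc + B″B′ multiplied by ds = L; the function name and argument names are choices made here, and the geometry of Fig. 3F is taken as given rather than rederived.

```python
import math

def single_link_distance(link_length, aa_prime):
    """Return (theta_c, tangent length B''B', total distance) for one rigid link."""
    theta_c = math.asin(link_length / (link_length + aa_prime))
    tangent = math.sqrt(aa_prime**2 + 2.0 * link_length * aa_prime)
    per_ds = aa_prime + link_length * theta_c + tangent
    return theta_c, tangent, link_length * per_ds   # ds = L for a single link

theta_c, tangent, distance = single_link_distance(1.0, 2.0)   # AA' = 2L, with L = 1
print(theta_c, tangent, distance)
```

With L = 1 and AA′ = 2 the rotation contributes only about 0.34 of the total 5.168, the rest being straight-line motion of the two end points.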

Chains with curvature.

We can now investigate the transformation shown in Fig. 1B with the above methods. This is the canonical example when at least one of the space curves has nonzero curvature κ. Let rA = R sin(πs/2L) x̂ + R cos(πs/2L) ŷ and rB = s x̂ + R ŷ, with 0 ≤ s ≤ L and R = 2L/π. We then discretize the chain into N segments. According to Eq. 17, the end point velocities ṙ1, ṙN+1 obey EL equations of the same form as Eqs. 18, and thus either rotate or translate. The situation for these links is analogous to Fig. 3 B and F, in that the angle the link must rotate depends on the order of translation and rotation. The geometry in Fig. 1B is analogous to transformations A′B′ → AB in Fig. 3 B and F, in that the critical angle θc the link must rotate is smaller if translation occurs first.

Fig. 4 shows the two minimal solutions thus obtained. The transformation in Fig. 4A undergoes translation away from curve rA, and rotation at rB. It is the global minimum. The transformation in Fig. 4B rotates from rA through a larger critical angle (see Fig. 4B Inset), and then translates to rB. Both solutions have a soliton-like kink that propagates across either space-curve rB or rA.

Fig. 4.

Two minimal transformations between the curves shown in Fig. 1B, for N = 10 links. (A) The global minimal transformation r*(s, t), with 𝒟* ≈ 0.330 L². (B) A local minimum with 𝒟 ≈ 0.335 L². In A, links with one end touching curve rB rotate; the others translate first from rA, rotating only when one end of a link has touched rB. In B, they rotate first from rA and then translate into rB. Dashed lines in A show the paths travelled for each bead. (A Inset) The total distance travelled as a function of the number of links N, with various N plotted as filled circles to indicate the rapid decrease and asymptotic limit to 𝒟 ≈ 0.251 L². (B Inset) The minimal angle each link must rotate during the transformation; it is less for the transformation in A. Animations of these transformations are provided as supporting information (SI) Movies 1–4.

The minimal transformation follows these steps: (i) Link r2/1 rotates about r1, v1 = 0, v2 · r2/1 = 0, and the Lagrange multiplier representing the conjugate “force” λ12 ≠ 0. During this rotation, nodes 3, 4, … move in straight lines formed by their initial values rA3, rA4, … and the tangent points to circles of radius ds centered at rB2, rB3, …. The corresponding Lagrange constraint forces λ23, λ34, … are all zero. Links r3/2, r4/3, … all adjust their orientation to ensure straight-line motion of their end points (dashed lines in Fig. 4A), except for r2, which follows a curved path. (ii) When link r2/1 completes its rotation, it coincides with curve rB, and the process starts again with link r3/2, which begins its rotation about r2, whereas nodes 4, 5, … move in straight lines. This process continues until the final link rN+1/N rotates into place on rB. The transformation in Fig. 4B is essentially the time-reverse of the above, but starting at curve rB and ending on rA.

For ideal chains without curvature constraints, the distances obtained from the two transformations in Fig. 4 A and B differ nonextensively as the number of links N → ∞. Moreover, the distance for each transformation itself differs nonextensively from the mean root square distance MRSD = N⁻¹ Σ_{i=1}^{N} √((rAi − rBi)²) as N → ∞. Specifically, the distance travelled by straight line motion scales as ds N L ∼ L², whereas the distance travelled by rotational motion scales as ds (N θ̄c ds) ∼ L²/N.

On the other hand, curvature constraints as in Eq. 16 become more severe on consecutive links as N → ∞, and can yield extensive corrections to the distance. Specifically, the increase in distance Δ𝒟 due to curvature constraints scales like the radius of curvature R times N, because every node is affected by the rounded kink as it propagates. So Δ𝒟 ∼ ds N R ∼ LR. The importance of this effect then depends on how R compares to L (the ratio of the persistence length to the total length). It does not vanish as N → ∞. Non-crossing constraints described below also yield extensive corrections to the distance travelled.

Noncrossing Space Curves.

The minimal transformation may be qualitatively different when chain crossing is explicitly disallowed. Fig. 1C illustrates a pair of curves that differ only by the order of chain crossing. They are displaced in Fig. 1C for easier visualization but should be imagined to overlap so the quantity ∫_0^L ds |rA − rB| ≈ 0, i.e., if they were ghost chains their distance would be nearly zero, and most existing metrics give zero distance between these curve pairs (see Table 1).

Table 1.

Values of the distance for various examples considered here, compared to other metrics

Curve pair                              𝒟* (L²)    rmsd† (L)   (1 − Q)‡   χ§
Trivial translation                     |d|/L      |d|/L       0          0
“L-curves”, Fig. 1A                     1/√2       √(2/3)      ¶          0
Straight line to hairpin                1/4        1/√6        1          1/2
“C-curve” - straight line, Fig. 4A      0.330      0.371       ¶          0.417
“C-curve” - straight line, Fig. 1A‖     0.251      0.334       ¶          1
“Over/under” curves, Fig. 1C            (ℓ/L)²     ≈0          0††        0
Single link, Fig. 3F‡‡                  5.168      √7§§        ¶¶         ¶¶

†rmsd ≡ √(N⁻¹ Σi (rAi − rBi)²).

‡Fraction of shared contacts A has with B; see refs. 7 and 8 for definitions.

§Structural overlap function equal to 1 minus the fraction of residue pairs with similar distances in structures A and B. The formula in ref. 9 is used.

¶0/0 or undefined.

‖In the continuum limit.

††Assuming a contact is made at the junction.

‡‡For AA′ = 2 × link length.

§§𝒟 > rmsd here because rmsd contains a factor of 2, whereas 𝒟 did not. An “effective distance” for the rod could divide by 2.

¶¶Undefined for a single link.
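Several continuum entries of Table 1 can be checked by quadrature. The sketch below is an independent illustration, not the procedure used to generate the table: the helper names and the midpoint rule are choices made here, and L = 1 is assumed. For each curve pair it computes the rmsd and the mean straight-line displacement of corresponding points, which is the distance per unit length that ghost chains moving on straight lines would accumulate.

```python
import math

def quadrature(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def compare(r_a, r_b, L=1.0):
    """Return (rmsd, mean straight-line displacement) between two curves of length L."""
    sep = lambda s: math.dist(r_a(s), r_b(s))
    rmsd = math.sqrt(quadrature(lambda s: sep(s) ** 2, 0.0, L) / L)
    mean = quadrature(sep, 0.0, L) / L
    return rmsd, mean

L = 1.0
R = 2.0 * L / math.pi

pairs = {
    "L-curves": (lambda s: (s, 0.0), lambda s: (0.0, s)),
    "hairpin": (lambda s: (s, 0.0), lambda s: (min(s, L - s), 0.0)),
    "C-curve": (lambda s: (R * math.sin(s / R), R * math.cos(s / R)),
                lambda s: (s, R)),
}
results = {name: compare(a, b, L) for name, (a, b) in pairs.items()}
for name, (rmsd, mean) in results.items():
    print(name, round(rmsd, 3), round(mean * L, 3))
```

For these three pairs the straight-line value times L comes out close to 0.707, 0.250, and 0.251, matching the distance column for the L-curves, the hairpin, and the continuum C-curve entry, while the rmsd column is reproduced as about 0.816, 0.408, and 0.334.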

Analogous to the construction of Alexander polynomials for knots, if we form the orthogonal projection of these space curves onto a plane, there will be double points indicating one part of the curve crossing over or under another. If we trace the curve in an arbitrary but fixed direction, each double point occurs twice, once as underpass and once as an overpass. We may call the part of the curve between two consecutive passes a bridge. If the bridge ends in an overpass we assign it +1, if the bridge ends in an underpass we assign it −1, so traversing from the left in Fig. 1C, curve rB has (+1) sense, and curve rA (−1). For transformations obeying noncrossing, a bridge can undergo change in sense ±2 to zero by moving from under or over the chain, whereas bridges in ghost chains undergo changes of sense by crossing from ±1 to ∓1 directly.

The non-crossing condition means that the Lagrangian for the minimal transformation now depends on the position r(s, t) of the space curve, which may be accounted for using an Edwards potential: VNC[r(s, t)] = ∫_0^L ds1 ∫_0^L ds2 δ(r(s1, t) − r(s2, t)). In practice, a Gaussian may be used to approximate the delta function, with a variance that may be adjusted to account for the thickness or volume of the chain.
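A Gaussian-smeared version of this potential is straightforward to evaluate on a discretized chain. The following is a hedged sketch: the function name, the width `sigma`, and in particular the `min_separation` cutoff that skips bonded near neighbors are assumptions introduced here for illustration and are not specified in the text. The two test chains are rough constructions (the folded one is a straight chain doubled back with a small offset, so its turn link is only approximately of length ds).

```python
import math

def noncrossing_potential(nodes, ds, sigma, min_separation=3):
    """Discretized Edwards term: ds^2 * sum over node pairs of a normalized 3D Gaussian.

    Pairs closer than min_separation along the chain are skipped (an assumption
    made here so that bonded neighbors do not dominate the sum).
    """
    norm = (2.0 * math.pi * sigma**2) ** -1.5
    total = 0.0
    for i, p in enumerate(nodes):
        for j, q in enumerate(nodes):
            if abs(i - j) >= min_separation:
                total += norm * math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma**2))
    return ds * ds * total

ds, sigma = 0.05, 0.02
straight = [(i * ds, 0.0, 0.0) for i in range(21)]
folded = [(min(i, 20 - i) * ds, 0.0 if i <= 10 else 0.01, 0.0) for i in range(21)]

v_straight = noncrossing_potential(straight, ds, sigma)
v_folded = noncrossing_potential(folded, ds, sigma)
print(v_straight, v_folded)
```

The extended chain has a negligible potential, whereas the folded chain, whose two arms nearly overlap, is strongly penalized; shrinking `sigma` sharpens the penalty toward the delta-function limit.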

The EL equation now becomes

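\[
\mathcal{L}_{\mathbf{r}} - \left(\mathcal{L}_{\dot{\mathbf{r}}}\right)_t - \left(\mathcal{L}_{\mathbf{r}'}\right)_s + \left(\mathcal{L}_{\mathbf{r}''}\right)_{ss} = 0 \tag{19}
\]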

where the curvature potential in Eq. 16 has been included, and the notation (ℒr′)s ≡ (d/ds)(∂ℒ/∂r′) has been used. Eq. 10 is now modified to

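[Eq. 20: equation image not recovered. It is Eq. 10 with additional terms arising from the curvature potential Vκ and the non-crossing potential VNC.]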

To access various conformations, the minimal transformation must now abide by the nontrivial geometrical constraints that are induced by non-crossing. In general, this renders the problem difficult; however, the example in Fig. 1C is simple enough to propose a mechanism for the minimal transformation consistent with the developments above, without explicitly solving the EL equations in this case. In analogy with the hairpin transformation described below Eq. 15, the transformation here involves essentially forming and then unforming a hairpin. rA(N) (the blue end of curve rA in Fig. 1C) propagates back along its own length until it reaches the junction, where it then rotates over it to become the overpass (this “rotation” takes essentially zero distance in the continuum limit). The curve then doubles back, following its path in reverse to its starting point. This transformation is fully consistent with the allowed extremal rotations and translations of the discretized chain. The distance in the continuum limit is 𝒟 = ∫_0^ℓ ds (2s) = ℓ², where ℓ is the length of the shorter arm extending from the junction in Fig. 1C.

Discussion

The distance between finite objects of any dimension d is a variational problem, and may be calculated by minimizing a vector functional of d + 1 independent variables. Here we formulated the problem for space curves, where the function r*(s, t) defining the transformation from curve rA to curve rB gives the minimal distance $\mathcal{D}$.

We provided a general recipe for the solution to the problem through the calculus of variations. For simple cases, the solution is analytically tractable. Direct numerical methods are difficult due to multiple extrema. We employed a method that interpreted the discretized EL equations geometrically to obtain minimal solutions. The various solutions obtained here are summarized in Table 1 and compared with other similarity measures currently used.

The distance metric may be generalized to higher dimensional manifolds; for example, a two-dimensional surface needs three independent parameters to describe the transformation. The distance becomes $\mathcal{D} = \int du \, dv \, dt \, |\dot{\mathbf{r}}|$ and the constant unit area condition becomes $|\partial \mathbf{r}/\partial u \times \partial \mathbf{r}/\partial v| = 1$.

The question of a distance metric between configurations of a biopolymer has occupied the minds of many in the protein folding community for some time (cf., for example, refs. 4–8). Such a metric is of interest for comparison between folded structures, as well as to quantify how close an unfolded or partly folded structure is to the native. Chan and Dill (5) investigated the minimum number of moves necessary to transform one lattice structure to another, in particular while breaking the smallest number of hydrogen bonds. Leopold et al. (4) investigated the minimum number of monomers that had to be moved to transform one compact conformation to another. Falicov and Cohen (6) investigated structural comparison by rotation and translation until the minimal area surface by triangulation was obtained between two protein structures.

The present theoretical framework allows computation of a minimal distance between proteins of the same length by rotating and translating until $\mathcal{D}$ is minimized, as done in the calculation of rmsd. Comparison between proteins of different lengths would involve further optimization with respect to insertions or deletions.

It is interesting to ask which folded structures have the largest, or smallest, average distance $\langle \mathcal{D} \rangle$ from an ensemble of random coil structures, and also whether the accessibility of these structures in terms of $\mathcal{D}$ translates to their folding rates. It can also be determined whether the distance to a structure correlates with kinetic proximity in terms of its probability $p_F$ to fold before unfolding (7), by calculating $\langle \mathcal{D} \, p_F \rangle$. The question of the most accessible or least accessible structure may be formulated variationally as a free-boundary or variable end-point problem.

It is an important future question to address whether the entropy of paths to a particular structure is as important as the minimal distance. In this sense, it may be the finite “temperature” (β < ∞) partition function $Z(\beta) = \int d[\mathbf{r}(s,t)] \, \exp(-\beta \mathcal{D}[\mathbf{r}(s,t)])$, i.e., the sum over paths weighted by their “actions,” which is the most important quantity in determining the accessibility between structures. This has an analogue to the quantum string: we investigated only Z(∞) here. We hope that this work proves useful in laying the foundations for unambiguously defining distance between biomolecular structures in particular and high-dimensional objects in general.
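The relation between Z(β) and the minimal distance can be illustrated with a toy model in which the functional integral is replaced by a sum over a handful of candidate paths with known distances. This is only a sketch: the list of path distances below is invented for illustration, and the function name is hypothetical.

```python
import math

def effective_distance(path_distances, beta):
    """Return -(1/beta) * ln(sum_i exp(-beta * D_i)).

    The smallest D_i is factored out first (a log-sum-exp shift) so that
    large beta does not underflow.
    """
    d_min = min(path_distances)
    shifted = sum(math.exp(-beta * (d - d_min)) for d in path_distances)
    log_z = -beta * d_min + math.log(shifted)
    return -log_z / beta

# Hypothetical distances for four candidate paths; two are degenerate.
toy_paths = [1.0, 1.2, 1.2, 3.0]
cold = effective_distance(toy_paths, beta=1000.0)  # dominated by the minimum
warm = effective_distance(toy_paths, beta=1.0)     # path multiplicity matters
```

In this toy model the large-β value approaches the minimal distance, which corresponds to the Z(∞) limit studied in the text. At finite β the result falls below the minimum because the number of available paths contributes, which is the entropic effect raised in the paragraph above.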

Supplementary Material

Supporting Movies

Acknowledgments

We are grateful to Ali Mohazab, Moshe Schecter, Matt Choptuik, and Bill Unruh for insightful discussions. This work was supported by the Natural Sciences and Engineering Research Council and the A. P. Sloan Foundation.

Abbreviations

EL, Euler–Lagrange; MRSD, mean root square distance.

Footnotes

The author declares no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0607833104/DC1.

The distance-metric action in Eq. 4 bears a strong resemblance to the Nambu–Goto action for a classical relativistic string (2): $S_{NG}[r(s,t)] = \int d\sigma \, d\tau \sqrt{(\dot{r} \cdot r')^2 - (\dot{r})^2 (r')^2}$, where r in $S_{NG}$ is now a four-vector and the dot product is the relativistic dot product. This action is physically interpreted as the (Lorentz invariant) world-sheet area of the string. If Eq. 4 could be mapped by suitable choice of gauge to the minimization of the Nambu–Goto action, one could exploit here the same reparameterization invariance that results in wave equation solutions to the equations of motion for the classical relativistic string, by choosing a parameterization such that $\dot{r} \cdot r' = 0$ (for the purely geometrical problem, the discriminant under the square root in the action has opposite sign). Unfortunately, however, because the velocity in the distance-metric action is a 3-velocity rather than a 4-velocity, our action only accumulates area when parts of the string move in 3-space, in contrast to the Nambu–Goto action that accumulates area even for a static string. The distance-metric action Eq. 4 has a lower symmetry than that for the classical relativistic string. $\mathcal{D}^*$ cannot depend on the time the transformation took, whereas the world sheet area does. Conversely, if we take, for example, configuration A at t = 0 to be a straight line of length L, and configurations B at t = T to be the same straight line but displaced along its own axis by varying amounts d, the geometrical area for all transformations would be LT, whereas the distances $\mathcal{D}^*_{AB}$ for each transformation would be Ld.
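The straight-line example can be made concrete with a small numerical sketch. It assumes that the distance functional for a curve has the form of the arclength travelled by each point, summed over the chain (an integral of |ṙ| over s and t), and that the line slides along its axis at uniform speed; the function names and discretization parameters are hypothetical.

```python
def sliding_line_distance(length, displacement, total_time,
                          n_beads=100, n_steps=50):
    """Sum over beads and time steps of |r(t + dt) - r(t)| * ds for a
    straight line translated along its own axis by `displacement`."""
    ds = length / n_beads
    total = 0.0
    for i in range(n_beads):
        s = (i + 0.5) * ds  # arclength label of this bead
        for k in range(n_steps):
            # Position along the axis at the start and end of step k.
            x0 = s + displacement * k / n_steps
            x1 = s + displacement * (k + 1) / n_steps
            total += abs(x1 - x0) * ds
    return total

def world_sheet_area(length, total_time):
    """Geometrical area swept in the (s, t) plane, independent of d."""
    return length * total_time

fast = sliding_line_distance(2.0, 0.5, total_time=1.0)
slow = sliding_line_distance(2.0, 0.5, total_time=10.0)
```

Under these assumptions both `fast` and `slow` come out close to L·d = 1.0 regardless of the duration. By contrast, `world_sheet_area` grows with T and is unaffected by d, which is the contrast drawn in the footnote.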

§

The invariance of the Lagrangian to (s, t) leads to conservation laws by Noether's theorem (1), which here take the form of divergence conditions. However, these generally contain no new information beyond the EL equations and can be obtained by dotting Eq. 10 with either r′ to give λ′ = Inline graphic · or to give v · (λ)′ = 0.

The MRSD is always less than or equal to the rmsd between structures, as can be shown by applying Hölder's inequality.
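A quick numerical check of this inequality is sketched below. It assumes that the MRSD is the mean over residues of the Euclidean distance between corresponding points, and that the rmsd is the square root of the mean squared distance, both without any superposition step. The coordinates are made up for illustration.

```python
import math

def mrsd(a, b):
    """Mean of per-point Euclidean distances (assumed definition of MRSD)."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def rmsd(a, b):
    """Square root of the mean squared per-point distance."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

# Hypothetical three-point structures with displacements 0, 1 and 2.
struct_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
struct_b = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0)]
m = mrsd(struct_a, struct_b)
r = rmsd(struct_a, struct_b)
```

Here `m` is 1.0 while `r` is the square root of 5/3, so the inequality holds strictly. The two measures coincide only when every point is displaced by the same amount, for example under a rigid translation.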

References

  • 1. Gelfand IM, Fomin SV. Calculus of Variations. New York: Dover; 2000.
  • 2. Zwiebach B. A First Course in String Theory. New York: Cambridge Univ Press; 2004.
  • 3. Grosberg AY. In: Computational Soft Matter: From Synthetic Polymers to Proteins. Attig N, Binder K, Grubmüller H, Kremer K, editors. Vol 23. Bonn: John von Neumann Institut für Computing; 2004. pp. 375–399. NIC series.
  • 4. Leopold PE, Montal M, Onuchic JN. Proc Natl Acad Sci USA. 1992;89:8721–8725. doi: 10.1073/pnas.89.18.8721.
  • 5. Chan HS, Dill KA. J Chem Phys. 1994;100:9238–9257.
  • 6. Falicov A, Cohen FE. J Mol Biol. 1996;258:871–892. doi: 10.1006/jmbi.1996.0294.
  • 7. Du R, Pande VS, Grosberg AY, Tanaka T, Shakhnovich ES. J Chem Phys. 1998;108:334–350.
  • 8. Cho SS, Levy Y, Wolynes PG. Proc Natl Acad Sci USA. 2006;103:586–591. doi: 10.1073/pnas.0509768103.
  • 9. Veitshans T, Klimov D, Thirumalai D. Folding Des. 1996;2:1–22. doi: 10.1016/S1359-0278(97)00002-3.
