Abstract
The measurement of distance between two objects is generalized to the case where the objects are no longer points but are one-dimensional. Additional concepts such as nonextensibility, curvature constraints, and noncrossing become central to the notion of distance. Analytical and numerical results are given for some specific examples, and applications to biopolymers are discussed.
The distance, as conventionally defined between two zero-dimensional objects (points) A and B at positions rA and rB, is the minimal arclength travelled in the transformation from A to B. A transformation r(t) between A and B is a vector function that may be parametrized by a scalar variable t: 0 ≤ t ≤ T, r(0) = rA, r(T) = rB, and the distance travelled is a functional of r(t). The (minimal) transformation r*(t) is an object of dimension one higher than A or B; i.e., it yields a distance that is one-dimensional. The distance
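𝒟* is found through the variation of the functional
$$\mathcal{D}[\mathbf{r}] = \int_0^T dt\,\sqrt{\dot{\mathbf{r}}^2} \tag{1a}$$
$$\mathcal{D}[\mathbf{r}] = \int_0^T dt\,\sqrt{g_{\mu\nu}\,\dot{x}^{\mu}\dot{x}^{\nu}} \tag{1b}$$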
Here, ẋ = dx/dt, ṙ = dr/dt, and we have let gμv = δμv (Euclidean metric). The boundary conditions mentioned above are present at the end points of the integral. The Einstein summation convention will be used where convenient (e.g., eq. 1b); however, all the analysis here deals with spatial coordinates, ν = 1, 2, 3, on a Euclidean metric. Generalizations to dimension higher than 3, as well as non-Euclidean metrics, are straightforward to incorporate into the formalism.
On a Euclidean metric, the minimal distance becomes the diagonal of a hyper-cube. However, formulated as above, the solutions minimizing 𝒟
are infinitely degenerate, because particles moving at various speeds but tracing the same trajectory over the total time T all give the same distance. To circumvent this problem, what is typically done is to let one of the space variables (e.g., x) become the independent variable. However, for higher dimensional objects or zero dimensional objects on a manifold with nontrivial topology, there is no guarantee that the dependent variables (y, z) constitute single valued functions of x. Alternatively, one can study the “time” trajectory of the parametric curve defined above, but under a gauge that fixes the speed to a constant, vo, for example. One can either fix the gauge from the outset with Lagrange multipliers, or choose a gauge that may simplify the problem after finding the extremum equations. The latter is often simpler in practice.
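To be specific, the effective Lagrangian ℒ appearing in the above problem is ℒ = √(ṙ²), and the Euler–Lagrange (EL) equations are
$$\frac{d}{dt}\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{r}}} = \frac{d\hat{\mathbf{v}}}{dt} = 0 \tag{2}$$
with v̂ the unit vector in the direction of the velocity. The boundary conditions are
$$\mathbf{r}(0) = \mathbf{r}_A, \qquad \mathbf{r}(T) = \mathbf{r}_B \tag{3}$$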
Because the derivative of a unit vector is always orthogonal to that vector, Eq. 2 says that the direction of the velocity cannot change, and therefore straight line motion results. Applying the boundary conditions gives v̂ = (rB − rA)/|rB − rA|. However, any function v(t) = |vo(t)|v̂ satisfying the boundary conditions is a solution, so long as ∫0Tdt |vo(t)| = |rB − rA|. This is the infinite degeneracy of solutions mentioned above. Then r*(t) = rA + (rB−rA)/|rB−rA| ∫0t dt |vo(t)|, and
𝒟* = ∫_0^T dt √(ṙ*²) = ∫_0^T dt |vo(t)| = |rB − rA|. At this point we could fix the parameterization by choosing |vo(t)| = |rB − rA|/T (constant speed), for example.
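The degeneracy with respect to the speed profile can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function `arclength` and the three progress profiles are illustrative assumptions. Each profile traces the same straight segment from rA to rB at a different speed, and the summed arclength should be the same in every case.

```python
import math

def arclength(progress, r_a, r_b, n=2000):
    """Polyline length of r(t) = r_a + progress(t/T) * (r_b - r_a), sampled at n steps."""
    total, prev = 0.0, r_a
    for k in range(1, n + 1):
        u = progress(k / n)  # fraction of the way from A to B at time t = k T / n
        cur = tuple(a + u * (b - a) for a, b in zip(r_a, r_b))
        total += math.dist(prev, cur)
        prev = cur
    return total

# Hypothetical end points, chosen so that |r_b - r_a| = 5.
r_a, r_b = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)

# Three monotonic speed profiles (assumptions for illustration only).
profiles = {
    "constant speed": lambda x: x,
    "accelerating": lambda x: x * x,
    "decelerating": lambda x: math.sin(0.5 * math.pi * x),
}

lengths = {name: arclength(f, r_a, r_b) for name, f in profiles.items()}
for name, d in lengths.items():
    print(f"{name}: {d:.6f}")
```

All three profiles return the same length |rB − rA|, which is why a gauge such as constant speed can be imposed afterwards without changing the distance.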
The extremum is a minimum, as can be shown by analyzing the eigenvalues of the matrix ∂²𝒟/∂xν(t)∂xμ(t′) = −δμν δ″(t − t′). Diagonalizing by Fourier transform gives positive elements +ωn² δμν δ(ωn − ω′n) for the stability matrix and thus positive eigenvalues.
In what follows, we generalize the notion of distance to higher dimensional objects, specifically space curves. We will see many of the above themes reiterated, as well as some fundamentally new features that emerge when one treats the space curves as nonextensible, having some persistence length or curvature constraint, and noncrossing or unable to pass through themselves. We provide analytical and numerical results for some prototypical examples for nonextensible chains, and we lay the foundations for treating curvature and noncrossing constraints.
Distance Metric for One-Dimensional Objects
The distance 𝒟* between two one-dimensional objects (which we refer to as space curves or strings) A and B having configurations rA(s) and rB(s), 0 ≤ s ≤ L, is obtained from the transformation from A to B that minimizes the integrated distance travelled. By integrated distance we mean the cumulative arclength all elements of the string had to move in the transformation from A to B. For the transformation to exist, strings A and B must have the same length (although this condition may be relaxed by allowing specific extensions or contractions). For the distance to be finite, open space curves must be finite in length. For closed non-crossing space curves, A and B must be in the same topological class for the transformation to exist. Describing the transformation r(s, t) requires two scalar parameters, one for arc length s along the string and another measuring progress as in the zero-dimensional case, say t: 0 ≤ t ≤ T, so that r(s, 0) = rA(s) and r(s, T) = rB(s). The distance travelled is a functional of the vector function r(s, t). The minimal transformation r*(s, t) is an object of dimension one higher than A or B, i.e., it yields a distance that is two-dimensional. The problem does not map to a simple soap film, since there are many configuration pairs that have zero area between them but nonzero distance travelled, e.g., a straight line displaced along its own axis, or that in Fig. 1C. The analogue to a higher-dimensional surface of minimal area when the “time” t is included is closer but inexact (see footnote below).
Fig. 1.
Three representative pairs of curves. (A) Straight line curve rotated by π/2. (B) One string has a finite radius of curvature, the other is straight. (C) A canonical example where noncrossing is important; the curves are displaced for easy visualization but should be imagined to be superimposed.
We can construct the effective Lagrangian along the same lines as the zero-dimensional case. Using the shorthand r ≡ r(s, t), ṙ ≡ ∂r/∂t, r′ ≡ ∂r/∂s, the distance travelled is‡
$$\mathcal{D}[\mathbf{r}] = \int_0^L ds \int_0^T dt\,\sqrt{\dot{\mathbf{r}}^2} \tag{4}$$
However, to meaningfully represent the distance a string must move to reconfigure itself from conformation A to B, the transformation must be subject to several auxiliary conditions.
The first of these is nonextensibility. Points along the space curve cannot move independently of one another but are constrained to integrate to fixed length, so the curve cannot stretch or contract. Thus there is a Lagrange multiplier λ(s, t) weighting the (nonholonomic) constraint:
$$\sqrt{\mathbf{r}'^2} = 1 \tag{5}$$
This constraint ensures a parameterization of the string with unit tangent vector t̂ = r′, so that the total length of the string is L = ∫_0^L ds √(r′²) = ∫_0^L ds. In the language of differential geometry, the space curve is a unit-speed curve.
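The unit-speed condition can be verified numerically for a concrete curve. The sketch below is an illustration only: it uses the quarter-circle curve rA = R sin(πs/2L)x̂ + R cos(πs/2L)ŷ with R = 2L/π that appears in the text's later example, and the helper names `r_arc` and `tangent_norm` are assumptions. A central finite difference estimates |r′| at several values of s.

```python
import math

L = 1.0
R = 2 * L / math.pi  # radius for which a quarter circle has arclength L

def r_arc(s):
    """Quarter-circle curve parameterized by arclength s in [0, L]."""
    return (R * math.sin(math.pi * s / (2 * L)),
            R * math.cos(math.pi * s / (2 * L)))

def tangent_norm(curve, s, h=1e-4):
    """Central finite-difference estimate of |dr/ds| at s."""
    return math.dist(curve(s + h), curve(s - h)) / (2 * h)

samples = [0.1 * L * k for k in range(1, 10)]
worst = max(abs(tangent_norm(r_arc, s) - 1.0) for s in samples)
print(f"largest deviation of |r'| from 1: {worst:.2e}")
```

The deviation is at the level of the finite-difference error, consistent with |r′| = 1 along the whole curve.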
If the constraint (Eq. 5) were not present in Eq. 4, each point along the space curve could follow a straight line path from A to B and the problem of minimizing the distance would be trivial. Equivalently, setting λ = 0 should reduce the problem to a sum of straight lines analogously to the zero-dimensional case above.
As in the case of distance between points, one can fix the t-parameterization from the outset by introducing a Lagrange multiplier α(t) that fixes the total distance covered per time ∫_0^L ds √(ṙ²) to a known function f(t). Although this approach removes the infinite degeneracy mentioned above, as a global isoperimetric condition it reduces the symmetry of the problem. For example, there would then be no conservation law that could be written to capture the invariance of the effective Lagrangian with respect to the independent variable t. For these reasons, we choose to leave the answer unparameterized with respect to t, analogous to the point-distance case above.
Ideal Chains.
There are many examples of transformations between two strings A and B where chain noncrossing is unimportant (e.g., Fig. 1 A and B). Here we derive the EL equations for this case.
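From Eqs. 4 and 5, the extrema of the distance 𝒟 are found from
$$\delta\mathcal{D} = \delta\int_0^L ds\int_0^T dt\,\mathcal{L}(\dot{\mathbf{r}},\mathbf{r}') = 0, \qquad \mathcal{L} = \sqrt{\dot{\mathbf{r}}^2} - \lambda(s,t)\left(\sqrt{\mathbf{r}'^2} - 1\right) \tag{6}$$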
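Performing the variation gives
$$\delta\mathcal{D} = \int_0^L ds\,\Big[\mathbf{p}_t\cdot\delta\mathbf{r}\Big]_{t=0}^{T} + \int_0^T dt\,\Big[\mathbf{p}_s\cdot\delta\mathbf{r}\Big]_{s=0}^{L} - \int_0^L ds\int_0^T dt\,\left(\frac{d\mathbf{p}_t}{dt} + \frac{d\mathbf{p}_s}{ds}\right)\cdot\delta\mathbf{r} = 0 \tag{7}$$
where the generalized momenta pt and ps are given by:
$$\mathbf{p}_t = \frac{\partial\mathcal{L}}{\partial\dot{\mathbf{r}}} = \frac{\dot{\mathbf{r}}}{|\dot{\mathbf{r}}|} = \hat{\mathbf{v}}, \qquad \mathbf{p}_s = \frac{\partial\mathcal{L}}{\partial\mathbf{r}'} = -\lambda\,\frac{\mathbf{r}'}{|\mathbf{r}'|} = -\lambda\,\hat{\mathbf{t}} \tag{8}$$
where v̂ is again the unit velocity vector, and t̂ is the unit tangent to the curve.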
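The EL equation follows from the last term in Eq. 7, and yields a partial differential equation for the minimal transformation r*(s, t):
$$\frac{\ddot{\mathbf{r}}}{|\dot{\mathbf{r}}|} - \frac{\dot{\mathbf{r}}\,(\dot{\mathbf{r}}\cdot\ddot{\mathbf{r}})}{|\dot{\mathbf{r}}|^{3}} = \lambda\,\mathbf{r}'' + \lambda'\,\mathbf{r}' \tag{9}$$
where we have used the facts that |r′| = 1 and r′ · r″ ≡ t̂ · κ = 0, since the tangent is always orthogonal to the curvature at any given point along a space curve.
Eq. 9 can be written in terms easier to understand intuitively by using the unit velocity vector v̂, tangent t̂, and curvature κ:§
$$\frac{d\hat{\mathbf{v}}}{dt} = \lambda\,\boldsymbol{\kappa} + \lambda'\,\hat{\mathbf{t}} \tag{10}$$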
Comparing Eqs. 10 and 2, we confirm that setting the Lagrange multiplier λ corresponding to the nonextensibility condition to zero results in straight line solutions for all points along the space curve. Conversely, the condition that the space curve form a contiguous object results generally in nonzero deviation from straight line motion. So in comparing various extremal solutions to Eq. 10, the minimal solution tends to minimize |λ| everywhere.
The boundary conditions are obtained from the first two terms in Eq. 7. Because the initial and final configurations are specified, the variation δr vanishes at t = 0, T, and the corresponding boundary conditions, or initial and final conditions, are:
$$\mathbf{r}(s,0) = \mathbf{r}_A(s), \qquad \mathbf{r}(s,T) = \mathbf{r}_B(s) \tag{11}$$
Because the end points of the string are free during the transformation, δr ≠ 0 at s = 0, L, and so the conjugate momenta must vanish: ps(0, t) = ps(L, t) = 0. This means that λt̂ = 0 at the end points. However, because t̂ cannot be zero, the only way this can occur is for λ(0, t) = λ(L, t) = 0. The Lagrange multiplier, which represents the conjugate force or tension to ensure an inextensible chain, must vanish at the end points of the string. If λ = 0, the EL Eq. 10 gives dv̂/dt = λ′t̂ at the end points. However, because v̂ is a unit vector, dv̂/dt is orthogonal to v̂ (or v), and we have finally the boundary conditions at the end points of the string:
$$\lambda'\,\mathbf{v}\cdot\hat{\mathbf{t}} = 0 \tag{12}$$
Eq. 12 has three possible solutions. One is that v · t̂ = 0 or equivalently ṙ · r′ = 0, which corresponds to pure rotation of the end points. It is worth mentioning that the end points of the classical relativistic string also move transversely to the string. Moreover, because of the Minkowski metric, the end points must also move at the speed of light. Here, however, because Lorentz invariance is not at issue, additional solutions are possible. The end points of our string can be at rest, v = 0, and satisfy Eq. 12. The last solution of Eq. 12 is for λ′ = 0. Because λ also vanishes at the end points, Eq. 10 gives dv̂/dt = 0, or straight line motion. In summary, the three possible boundary conditions for the string end points are:
$$\hat{\mathbf{v}}\cdot\hat{\mathbf{t}} = 0 \quad \text{(pure rotation)} \tag{13a}$$
$$\mathbf{v} = 0 \quad \text{(at rest)} \tag{13b}$$
$$\frac{d\hat{\mathbf{v}}}{dt} = 0 \quad \text{(straight line motion)} \tag{13c}$$
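Whether an extremal transformation is a minimum can be determined by examining the second variation of the functional (Eq. 6):
$$\delta^2\mathcal{D} = \frac{1}{2}\int_0^L ds\int_0^T dt\,\left[\delta\dot{\mathbf{r}}\cdot\mathbf{I}\cdot\delta\dot{\mathbf{r}} + \delta\mathbf{r}'\cdot\boldsymbol{\Lambda}\cdot\delta\mathbf{r}'\right] \tag{14}$$
where Iij = (ṙ² δij − ẋi ẋj)/|ṙ|³ and Λij = −λ(s, t) δij, and δr′ and δṙ are the s and t derivatives of the variation δr from the extremal path.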
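The structure of the matrix I can be made concrete with a short calculation. The sketch below is an illustration under the definition Iij = (ṙ² δij − ẋi ẋj)/|ṙ|³ quoted above; the function name `kinetic_form` and the sample vectors are assumptions. It evaluates the quadratic form δṙ · I · δṙ, which reduces to (|ṙ|² |δṙ|² − (ṙ · δṙ)²)/|ṙ|³.

```python
import math

def kinetic_form(v, dv):
    """Evaluate dv . I . dv with I_ij = (|v|^2 delta_ij - v_i v_j) / |v|^3."""
    v2 = sum(x * x for x in v)
    dv2 = sum(x * x for x in dv)
    v_dot_dv = sum(a * b for a, b in zip(v, dv))
    return (v2 * dv2 - v_dot_dv ** 2) / math.sqrt(v2) ** 3

v = (2.0, 0.0, 0.0)                              # hypothetical velocity along x
parallel = kinetic_form(v, (3.0, 0.0, 0.0))      # variation along the velocity
transverse = kinetic_form(v, (0.0, 3.0, 0.0))    # variation normal to the velocity
oblique = kinetic_form(v, (1.0, -2.0, 0.5))

print(parallel, transverse, oblique)
```

By the Cauchy-Schwarz inequality the form is never negative. It vanishes for variations along the velocity, which only reparameterize the path, and equals |δṙ|²/|ṙ| for transverse variations.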
We now apply these concepts to some specific examples.
Examples.
Translations.
If two space curves differ by a translation, then rB(s) = rA(s) + d with d a constant vector. The appropriate boundary condition for the end points is Eq. 13c. The points along the string can all satisfy Eq. 10 with dv̂/dt = 0 and λ = 0 everywhere (because t̂, κ ≠ 0), and straight line motion results: r*(s, t) = rA(s) + (rB(s) − rA(s))t/T. The distance 𝒟* = L|d|. This is the one-dimensional analogue to Eqs. 2 and 3.
Piece-wise linear space curves.
Suppose initially the curvature of some section of the string is zero. Then, taking the dot product of v with Eq. 10, we see that Eq. 12 holds for all points along the string. So the string either rotates or translates (or remains at rest if that segment has completed the transformation). If both rA and rB are straight lines as in Fig. 1A, Eq. 12 holds for both. It is then reasonable to seek solutions r* of the EL equation such that Eq. 12 holds for all (s, t).
Consider the two space curves shown in Fig. 1A with rA(s) = s x̂ and rB(s) = s ŷ, both with curvature κ = 0. We first investigate rotation from A to B. This transformation satisfies the EL equation so appears to be extremal: r = s r̂ = s(cos ωt x̂ + sin ωt ŷ). The velocity ṙ = sω θ̂, so the distance 𝒟[rROT(s, t)] = πL²/4. Taking the dot product of t̂ with Eq. 10 gives λ′ = t̂ · dv̂/dt = −ω or λ(s, t) = λo − ωs. For the transformation to be extremal, the conjugate momenta must also vanish at the string end points, or λ(0, t) = λ(L, t) = 0. This is impossible to achieve with this functional form, so the transformation is not extremal, unless we include the subsidiary condition here that rA(0, t) = rB(0, t). Then the end point of the string at s = 0 is determined, and the variations δr(0, t) must vanish. Now only λ(L, t) = 0, and so λ(s, t) = ω(L − s). The transformation is extremal.
Whether it is a minimum can be determined by examining the second variation (Eq. 14). For the transformation rROT(s, t), the matrix I in Eq. 14 is nonnegative definite, a necessary condition for a local minimum (1); however, Λ is negative definite, so the character of the extremum is determined by the interplay of the two terms in Eq. 14. Variations δr that preserve r′² = 1 or 2t̂ · δr′ = 0 are satisfied in this example by δr = f(s, t)θ̂, where f(s, t) must satisfy the boundary conditions δr(0, t) = δr(s, 0) = δr(s, T) = 0. We thus let the variations have the functional form: δr = ε sin(ks) sin(nπt/T)θ̂, where θ̂ = −sin ωt x̂ + cos ωt ŷ, n is a positive integer, and k is unrestricted. Inserting this functional form for the variations into Eq. 14 gives δ²𝒟 = (ε²π/8) F(kL), where F(x) is a nonpositive, monotonically decreasing function, with a maximum of zero at kL = 0. In fact to lowest order F(kL) ≈ −(πε²/2160) (kL)⁶. The extremum corresponding to pure rotation of curve rA into rB is a maximum!
The only other solution to Eqs. 10 and 12 for all (s, t) is for each point s on rA(s) to be connected to a corresponding point on rB(s) by a straight line, corresponding to Eq. 13c. Eq. 12 holds everywhere because λ′(s, t) = 0. Because λ is zero at the boundaries, it is thus zero everywhere.
An intermediate configuration then has the shape of a piecewise linear curve with a right angle “kink” at s*(t) (see Fig. 2). As t progresses, the kink propagates along curve rB, and the “free” part of the chain follows straight line diagonal motion, shrinking as its left end is overlaid onto curve rB. The solution for the velocity at all (s, t) is given by v(s, t) = vo(t) Θ(s − s*(t)) êv, where s*(t) is the position of the tangent discontinuity in Fig. 2, which goes from s*(0) = 0 to s*(T) = L as t goes from 0 to T. êv is a unit vector along the direction of the velocity, êv = (ŷ − x̂)/√2, and vo(t) is a speed which can be taken constant. By simple geometry, ds*/dt = vo/√2. Because s*(T) = L, vo = √2 L/T and s*(t) = Lt/T. The total distance travelled from Eq. 4 is then 𝒟* = ∫_0^T dt vo (L − s*(t)) = L²/√2.
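The two candidate transformations between the curves of Fig. 1A can be compared by evaluating the double integral of Eq. 4 numerically. The following is a rough sketch under stated assumptions: L = T = 1, the function names `rotation`, `kink`, and `distance` are illustrative, and the integral is approximated by summing chord lengths on a uniform grid in s and t.

```python
import math

L, T = 1.0, 1.0

def rotation(s, t):
    """Rigid rotation of the rod about s = 0 through pi/2."""
    w = math.pi / (2 * T)
    return (s * math.cos(w * t), s * math.sin(w * t))

def kink(s, t):
    """Kink at s* = L t / T: laid-down part along y, free part still along x."""
    s_star = L * t / T
    if s <= s_star:
        return (0.0, s)
    return (s - s_star, s_star)

def distance(path, ns=200, nt=200):
    """Approximate the integral over s and t of |dr/dt| by summed chord lengths."""
    ds, dt = L / ns, T / nt
    total = 0.0
    for j in range(ns):
        s = (j + 0.5) * ds  # midpoint rule in s
        for k in range(nt):
            total += ds * math.dist(path(s, k * dt), path(s, (k + 1) * dt))
    return total

d_rot = distance(rotation)
d_kink = distance(kink)
print(f"rotation: {d_rot:.4f} L^2, kink propagation: {d_kink:.4f} L^2")
```

With these settings the rotation gives a value close to π/4 ≈ 0.785 and the kink propagation a value close to 1/√2 ≈ 0.707 (in units of L²), so the straight-line transformation is the shorter of the two.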
Fig. 2.
The minimal transformation from A to B in Fig. 1A involves the propagation of a kink along curve B. The end point of the curve at intermediate states satisfies x + y = L, the equation for a straight line. A similar linear equation holds for any point on the curve; thus, no solution with shorter distance can exist. An intermediate configuration is shown in red. Alternative transformations are possible with kinks along A as well as multiple kinks (see text).
Because the transformation involves straight line motion, it is minimal. This can be seen from the second variation Eq. 14. The shape of the curve at all times is given by
$$\mathbf{r}^*(s,t) = \begin{cases} s\,\hat{\mathbf{y}}, & s \le Lt/T \\ (s - Lt/T)\,\hat{\mathbf{x}} + (Lt/T)\,\hat{\mathbf{y}}, & s > Lt/T \end{cases} \tag{15}$$
Taking variations from the extremal path as before, let δr = ε sin k(s − Lt/T) sin(nπt/T) Θ(s − Lt/T) ŷ. These variations only act on the “free” part of the string and preserve a unit tangent to first order. The matrix Λ in Eq. 14 is zero for straight line transformations where λ = 0. The quadratic form δṙ · I · δṙ is nonnegative, and results in a second variation δ²𝒟 that is nonnegative, monotonically increasing in kL, and quadratic to lowest order, with a minimum of zero at kL = 0. The transformation is indeed minimal.
Likewise, the minimal distance to fold a string of length L upon itself starting from a straight line (to form a hairpin) is 𝒟* = L²/4.
Solution degeneracy.
In the above example, one can piece together various rotations and translations for parts or all of the chain while still satisfying the EL equations. This infinity of extrema renders the solution of Eq. 9 by direct numerical integration very difficult. For these reasons we apply a method based on analytic geometry to obtain numerical solutions, described in more detail below.
There is also an infinite degeneracy of solutions having the minimal distance in the above example. To see a second minimal transformation, imagine running the above solution backward in time, so the kink propagates from s = L to s = 0 along rB. However, this solution should hold forward in time for the original problem if we permute rB and rA. Now, intermediate states r* first run along x̂, then ŷ. But then we can introduce multiple right–angle kinks in various places, without causing the trajectories in the transformation to deviate from straight lines, so that intermediate states look like staircases. Because there are an infinite number of possible staircases in the continuum limit, there is an infinite degeneracy. This can lead to a tangent vector r′ whose magnitude is length-scale dependent, and less than unity until s → 0. For example, an intermediate configuration can be drawn in Fig. 2 that appears as a straight diagonal line from r*(0, t) to r*(L, t), until s → 0 when an infinite number of step discontinuities are revealed. This problem is resolved in practice through finite-size effects involving different critical angles of rotation described below. In the continuum limit, it is resolved by introducing curvature constraints.
Curvature constraints.
In applications to polymer physics, chains have a stiffness characterized by bending potential in the analysis that is proportional to the square of the local curvature. Here, we may choose to characterize stiffness by introducing a constraint on the configurations of the space curve, so that the curvature simply cannot exceed a given number:
This term lifts the infinite degeneracy mentioned above, as each near-kink (with curvature κ now = κc) would result in slight deviations from linear motion in the above example, and thus an additional cost in the effective action. Other functional forms for Vκ are also possible. For some applications, a more conventional stiffness potential of the form Vκ(r″) = 1/2 Aκr″2 may be more appropriate. However, the action would no longer consist of a true distance functional, and its minimization would involve the detailed interplay of the parameter Aκ favoring globally minimal curvature with other factors affecting distance in the problem.
Discrete chains.
Strings with a finite number of elements (chains) provide a more accurate representation of real-world systems such as biopolymers. Discretization is also essential for numerical solutions in these more realistic cases. Monomers on a discretized chain travel along a curved metric (3), and Lagrange multipliers explicitly account for this fact here.
We start by discretizing the string into a chain of N links each with length ds = L/N, so that Eq. 4 becomes
$$\mathcal{D} = ds\sum_{i=1}^{N+1}\int_0^T dt\,\sqrt{\dot{\mathbf{r}}_i^2}$$
with each ri(t) a function of t only. The total distance is the accumulated distance of all the points joining the links, plus that of the end points, all times ds. This approach is essentially the method of lines for solving Eq. 10: the PDE becomes a set of N + 1 coupled ODEs.
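The discretized distance can be evaluated directly from a sequence of bead positions. The sketch below is an illustration, not the numerical method used for the results in the text: the function `chain_distance`, the frame layout (a list of bead-coordinate lists, one per time step), and the translation example are assumptions.

```python
import math

def chain_distance(frames, ds):
    """ds times the summed path length of every bead across consecutive frames."""
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        for p, q in zip(prev, cur):
            total += math.dist(p, q)
    return ds * total

# Hypothetical example: a straight chain of N = 4 links translated by d = (0, 0.5).
N, L = 4, 1.0
ds = L / N
d = (0.0, 0.5)
steps = 10
frames = [
    [(i * ds + d[0] * k / steps, d[1] * k / steps) for i in range(N + 1)]
    for k in range(steps + 1)
]

dist = chain_distance(frames, ds)
print(f"discrete distance: {dist:.4f}")
```

For a pure translation every one of the N + 1 beads moves |d|, so the discrete sum is (N + 1) ds |d| = L|d|(N + 1)/N, which approaches the continuum value L|d| as N grows.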
Eq. 5 becomes N constraint equations, one per link, added to the Lagrangian. We rewrite these strictly for convenience as Σi (1/2) λi,i+1 r²i+1/i, where ri+1/i ≡ ri+1 − ri, and |ri+1/i| = L/N.
The PDE in Eq. 10 then becomes N + 1 coupled (vector) ODEs, each of the form
with λ0,1 = λN+1,N+2 = 0. Eq. 17 is consistent with Eq. 10 after suitable definitions, for example the curvature at point i after discretization is given by (ri+1/i − ri/i−1)/ds2.
One link.
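We turn to the simplest problem of one link with end points A and B (see Fig. 3), for which the action is the discretized distance ds ∫_0^T dt (|ṙA| + |ṙB|), supplemented by a single Lagrange multiplier λ that enforces the fixed link length |rB/A| = ds. Points A and B have boundary conditions rA(0) = A, rB(0) = B, rA(T) = A′, rB(T) = B′. The link in our problem is taken to have a direction, so point A cannot transform to point B′. The EL equations become:
$$\lambda\,\mathbf{v}_A\cdot\mathbf{r}_{B/A} = 0, \qquad \lambda\,\mathbf{v}_B\cdot\mathbf{r}_{B/A} = 0 \tag{18}$$
where the orthogonality of v and dv̂/dt has been used.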
Fig. 3.
Transformations between two rigid rods. A undergoes simultaneous translation and rotation and so is not extremal. B is extremal and minimal. The rod cannot rotate any less given that it translates first. However, this transformation is a weak or local minimum. C–E are extremal in bulk but not minimal because they violate corner conditions (A. Mohazab and S.S.P., unpublished data). F is the global minimum. It rotates the minimal amount, and both A and B move monotonically toward A′, B′. A purely straight-line transformation exists but involves moving point A away from A′ before moving toward it (similar to D), thus covering a larger distance than the minimal transformation.
Reminiscent of Eq. 12, Eqs. 18 each have three solutions. For point A, these are: (i) vA · rB/A = 0, or pure rotation of A about B, (ii) vA = 0 or point A is stationary, or (iii) λ = 0 and thus dv̂A/dt = 0 from the EL equations, indicating straight-line motion. Moreover, (i) implies vB = 0, or both points rotate about a common center, (ii) implies vB · rB/A = 0 or B rotates, and (iii) implies dv̂B/dt = 0 as well, so that both points move in straight lines. An extremal transformation thus involves either straight line motion, or rotations of one point about the other at rest (or common center), see Fig. 3 B–F.
The Lagrange multiplier may be found from the dot product of the EL equation for B with rB/A, which gives −ds²λ = rB/A · dv̂B/dt. Thus, when B moves in a straight line λ = 0. When B rotates about A, its acceleration aB follows from rigid body kinematics as aA + α × rB/A − ω²rB/A, where ω and α are the angular velocity and acceleration, respectively, and aA = 0. Thus λ = 1/L.
The minimal solution is the one that involves the minimal amount of rotation (and monotonic approach to A′B′). This may be obtained from analytic geometry: for the example configurations in Fig. 3F, point B rotates about point A until B″, where the straight line B″B′ is tangent to the circle of radius ds = L about A. The distance (over ds) is AA′ + Lθc + B″B′, where sin θc = L/(L + AA′) and B″B′ = √((L + AA′)² − L²), so, for example, if AA′ = 2L, 𝒟 ≈ 5.168 L².
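The single-link value can be reproduced from the geometry just described. This is a minimal sketch with one explicit assumption: the length of the straight segment from B″ to B′ is taken to be the tangent length √((L + AA′)² − L²) from a point at distance L + AA′ to the circle of radius L about A. The function name `single_link_distance` is illustrative.

```python
import math

def single_link_distance(L, AA):
    """Distance for one rigid link of length L (so ds = L) with A displaced by AA.

    Assumes B'' B' is the tangent from B' (at distance L + AA from A)
    to the circle of radius L centered on A.
    """
    theta_c = math.asin(L / (L + AA))              # critical rotation angle
    tangent = math.sqrt((L + AA) ** 2 - L ** 2)    # assumed length of B'' B'
    return L * (AA + L * theta_c + tangent)        # overall factor ds = L

value = single_link_distance(1.0, 2.0)
print(f"D = {value:.3f} L^2")
```

For AA′ = 2L the three contributions are 2L, about 0.340 L, and about 2.828 L, which together reproduce the quoted 5.168 L².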
Chains with curvature.
We can now investigate the transformation shown in Fig. 1B with the above methods. This is the canonical example when at least one of the space curves has nonzero curvature κ. Let rA = R sin(πs/2L)x̂ + R cos(πs/2L)ŷ and rB = sx̂ + Rŷ, with 0 ≤ s ≤ L and R = 2L/π. We then discretize the chain into N segments. According to Eq. 17, the end point velocities v̂1, v̂N+1 obey EL equations of the same form as Eqs. 18, and thus either rotate or translate. The situation for these links is analogous to Fig. 3 B and F, in that the angle the link must rotate depends on the order of translation and rotation. The geometry in Fig. 1B is analogous to transformations A′B′ → AB in Fig. 3 B and F, in that the critical angle θc the link must rotate is smaller if translation occurs first.
Fig. 4 shows the two minimal solutions thus obtained. The transformation in Fig. 4A undergoes translation away from curve rA, and rotation at rB. It is the global minimum. The transformation in Fig. 4B rotates from rA through a larger critical angle (see Fig. 4B Inset), and then translates to rB. Both solutions have a soliton-like kink that propagates across either space-curve rB or rA.
Fig. 4.
Two minimal transformations between the curves shown in Fig. 1B, for N = 10 links. (A) The global minimal transformation r*(s, t), with 𝒟* ≈ 0.330 L². (B) A local minimum with 𝒟 ≈ 0.335 L². In A, links with one end touching curve rB rotate; the others translate first from rA, rotating only when one end of a link has touched rB. In B, they rotate first from rA and then translate into rB. Dashed lines in A show the paths travelled for each bead. (A Inset) The total distance travelled as a function of the number of links N, with various N plotted as filled circles to indicate the rapid decrease and asymptotic limit to 𝒟∞ ≈ 0.251 L². (B Inset) The minimal angle each link must rotate during the transformation; it is less for the transformation in A. Animations of these transformations are provided as supporting information (SI) Movies 1–4.
The minimal transformation follows these steps: (i) Link r2/1 rotates about r1, v1 = 0, v2 · r2/1 = 0, and the Lagrange multiplier representing the conjugate “force” λ12 ≠ 0. During this rotation, nodes 3, 4, … move in straight lines formed by their initial values rA3, rA4, … and the tangent points to circles of radius ds centered at rB2, rB3, …. The corresponding Lagrange constraint forces λ23, λ34, … are all zero. Links r3/2, r4/3, … all adjust their orientation to ensure straight-line motion of their end points (dashed lines in Fig. 4A), except for r2, which follows a curved path. (ii) When link r2/1 completes its rotation, it coincides with curve rB, and the process starts again with link r3/2, which begins its rotation about r2, whereas nodes 4, 5, … move in straight lines. This process continues until the final link rN+1/N rotates into place on rB. The transformation in Fig. 4B is essentially the time-reverse of the above, but starting at curve rB and ending on rA.
For ideal chains without curvature constraints, the distances obtained from the two transformations in Fig. 4 A and B differ nonextensively as the number of links N → ∞. Moreover, the distance for each transformation itself differs nonextensively from the mean root square distance as N → ∞.¶ Specifically, the distance travelled by straight line motion scales as ds NL ∼ L2, whereas the distance travelled by rotational motion scales as ds (Nθ̄cds) ∼ L2/N.
On the other hand, curvature constraints as in Eq. 16 become more severe on consecutive links as N → ∞, and can yield extensive corrections to the distance. Specifically, the increase in distance Δ𝒟 due to curvature constraints scales like the radius of curvature R times N, because every node is affected by the rounded kink as it propagates. So Δ𝒟 ∼ ds N R ∼ LR. The importance of this effect then depends on how R compares to L (the ratio of the persistence length to the total length). It does not vanish as N → ∞. Non-crossing constraints described below also yield extensive corrections to the distance travelled.
Noncrossing Space Curves.
The minimal transformation may be qualitatively different when chain crossing is explicitly disallowed. Fig. 1C illustrates a pair of curves that differ only by the order of chain crossing. They are displaced in Fig. 1C for easier visualization but should be imagined to overlap so the quantity ∫0L|rA − rB| ≈ 0, i.e., if they were ghost chains their distance would be nearly zero, and most existing metrics give zero distance between these curve pairs (see Table 1).
Table 1.
Values of the distance for various examples considered here, compared to other metrics
| Curve pair | 𝒟* (L²) | rmsd† (L) | (1 − Q)‡ | χ§ |
|---|---|---|---|---|
| Trivial translation | \|d\|/L | \|d\|/L | 0 | 0 |
| “L-curves”, Fig. 1A | 1/√2 | | —¶ | 0 |
| Straight line to hairpin | 1/4 | | 1 | 1/2 |
| “C-curve” - straight line, Fig. 4A | 0.330 | 0.371 | —¶ | 0.417 |
| “C-curve” - straight line, Fig. 1B‖ | 0.251 | 0.334 | —¶ | 1 |
| “Over/under” curves, Fig. 1C | (ℓ/L)² | ≈0 | 0†† | 0 |
| Single link, Fig. 3F‡‡ | 5.168 | §§ | —¶¶ | —¶¶ |
†.
§Structural overlap function equal to 1 minus the fraction of residue pairs with similar distances in structures A and B. The formula in ref. 9 is used.
¶0/0 or undefined.
‖In the continuum limit.
††Assuming a contact is made at the junction.
‡‡For AA′ = 2 × link length.
§§𝒟 > rmsd here because rmsd contains a factor of 2, whereas 𝒟 did not. An “effective distance” for the rod could divide by 2.
¶¶Undefined for a single link.
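Two of the continuum-limit entries for the “C-curve” and straight line pair can be checked by direct quadrature. The sketch below makes two assumptions that are not spelled out in the table: rmsd is taken without any prior superposition, as the square root of the mean squared separation of corresponding points, and the straight-line value is taken as the integral over s of |rA(s) − rB(s)|. The names `r_a`, `r_b`, and `compare` are illustrative.

```python
import math

L = 1.0
R = 2 * L / math.pi

def r_a(s):
    """Quarter-circle "C-curve" of length L."""
    return (R * math.sin(math.pi * s / (2 * L)),
            R * math.cos(math.pi * s / (2 * L)))

def r_b(s):
    """Straight line of length L at height R."""
    return (s, R)

def compare(n=2000):
    """Midpoint-rule estimates of rmsd and of the integral of |r_a - r_b| over s."""
    ds = L / n
    gaps = [math.dist(r_a((j + 0.5) * ds), r_b((j + 0.5) * ds)) for j in range(n)]
    rmsd = math.sqrt(sum(g * g for g in gaps) * ds / L)
    straight = sum(gaps) * ds
    return rmsd, straight

rmsd, straight = compare()
print(f"rmsd = {rmsd:.3f} L, straight-line integral = {straight:.3f} L^2")
```

Under these assumptions the quadrature gives approximately 0.334 L for rmsd and 0.251 L² for the straight-line integral. The second value matches the continuum-limit distance in the table, in line with the statement that the rotational contribution scales as L²/N and vanishes as N → ∞.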
Analogous to the construction of Alexander polynomials for knots, if we form the orthogonal projection of these space curves onto a plane, there will be double points indicating one part of the curve crossing over or under another. If we trace the curve in an arbitrary but fixed direction, each double point occurs twice, once as underpass and once as an overpass. We may call the part of the curve between two consecutive passes a bridge. If the bridge ends in an overpass we assign it +1, if the bridge ends in an underpass we assign it −1, so traversing from the left in Fig. 1C, curve rB has (+1) sense, and curve rA (−1). For transformations obeying noncrossing, a bridge can undergo change in sense ±2 to zero by moving from under or over the chain, whereas bridges in ghost chains undergo changes of sense by crossing from ±1 to ∓1 directly.
The non-crossing condition means that the Lagrangian for the minimal transformation now depends on the position r(s, t) of the space curve, which may be accounted for using an Edwards potential: VNC([r(s, t)]) = ∫0Lds1∫0Lds2 δ(r(s1, t) − r(s2, t)) In practice, a Gaussian may be used to approximate the delta function, with a variance that may be adjusted to account for the thickness or volume of the chain.
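A discretized version of the Gaussian-regularized potential can be sketched as follows. This is an illustration only: the function `edwards_energy`, the width `sigma`, the neighbor exclusion `skip` (which drops bead pairs that are adjacent along the chain and would otherwise always overlap), and the idealized hairpin geometry are all assumptions rather than choices made in the text.

```python
import math

def edwards_energy(points, ds, sigma, skip=2):
    """Double sum of a normalized 3D Gaussian over bead pairs with |i - j| > skip."""
    norm = (2.0 * math.pi * sigma ** 2) ** -1.5
    total = 0.0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if abs(i - j) <= skip:
                continue  # exclude trivially close neighbors along the chain
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            total += norm * math.exp(-d2 / (2.0 * sigma ** 2)) * ds * ds
    return total

def hairpin(n_arm, ds, gap):
    """Two parallel arms separated by `gap`; the turn itself is idealized as one step."""
    out = [(i * ds, 0.0, 0.0) for i in range(n_arm + 1)]
    back = [(i * ds, gap, 0.0) for i in range(n_arm, -1, -1)]
    return out + back

ds, sigma = 0.05, 0.02
straight = [(i * ds, 0.0, 0.0) for i in range(41)]

v_straight = edwards_energy(straight, ds, sigma)
v_close = edwards_energy(hairpin(20, ds, gap=0.02), ds, sigma)
v_far = edwards_energy(hairpin(20, ds, gap=0.20), ds, sigma)
print(v_straight, v_close, v_far)
```

The potential is negligible for the straight chain and for well-separated arms, and becomes large once the arms approach to within a distance of order `sigma`, which is the sense in which the variance sets an effective chain thickness.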
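The EL equation now becomes
$$\mathcal{L}_{\mathbf{r}} - (\mathcal{L}_{\dot{\mathbf{r}}})_t - (\mathcal{L}_{\mathbf{r}'})_s + (\mathcal{L}_{\mathbf{r}''})_{ss} = 0 \tag{19}$$
where the curvature potential in Eq. 16 has been included, and the notation (ℒr′)s ≡ (d/ds)(∂ℒ/∂r′) has been used. Eq. 10 is modified accordingly, acquiring additional terms from the noncrossing and curvature potentials.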
To access various conformations, the minimal transformation must now abide by the nontrivial geometrical constraints that are induced by non-crossing. In general, this renders the problem difficult; however, the example in Fig. 1C is simple enough to propose a mechanism for the minimal transformation consistent with the developments above, without explicitly solving the EL equations in this case. In analogy with the hairpin transformation described below Eq. 15, the transformation here involves essentially forming and then unforming a hairpin. rA(N) (the blue end of curve rA in Fig. 1C) propagates back along its own length until it reaches the junction, where it then rotates over it to become the overpass (this “rotation” takes essentially zero distance in the continuum limit). The curve then doubles back, following its path in reverse to its starting point. This transformation is fully consistent with the allowed extremal rotations and translations of the discretized chain. The distance in the continuum limit is 𝒟 = ∫_0^ℓ ds (2s) = ℓ², where ℓ is the length of the shorter arm extending from the junction in Fig. 1C.
Discussion
The distance between finite objects of any dimension d is a variational problem, and may be calculated by minimizing a vector functional of d + 1 independent variables. Here we formulated the problem for space curves, where the function r*(s, t) defining the transformation from curve rA to curve rB gives the minimal distance 𝒟*.
We provided a general recipe for the solution to the problem through the calculus of variations. For simple cases, the solution is analytically tractable. Direct numerical methods are difficult due to multiple extrema. We employed a method that interpreted the discretized EL equations geometrically to obtain minimal solutions. The various solutions obtained here are summarized in Table 1 and compared with other similarity measures currently used.
The distance metric may be generalized to higher dimensional manifolds; for example, a two-dimensional surface needs three independent parameters to describe the transformation. The distance becomes 𝒟 = ∫du ∫dv ∫dt |ṙ| and the constant unit area condition becomes |∂r/∂u × ∂r/∂v| = 1.
The question of a distance metric between configurations of a biopolymer has occupied the minds of many in the protein folding community for some time (c.f., for example, refs. 4–8). Such a metric is of interest for comparison between folded structures, as well as to quantify how close an unfolded or partly folded structure is to the native. Chan and Dill (5) investigated the minimum number of moves necessary to transform one lattice structure to another, in particular while breaking the smallest number of hydrogen bonds. Leopold et al. (4) investigated the minimum number of monomers that had to be moved to transform one compact conformation to another. Falicov and Cohen (6) investigated structural comparison by rotation and translation until the minimal area surface by triangulation was obtained between two protein structures.
The present theoretical framework allows computation of a minimal distance between proteins of the same length by rotating and translating until 𝒟 is minimized, as done in the calculation of rmsd. Comparing proteins of different lengths would involve further optimization with respect to insertions or deletions.
It is interesting to ask which folded structures have the largest, or smallest, average distance 〈𝒟〉 from an ensemble of random coil structures, and also whether the accessibility of these structures in terms of 𝒟 translates to their folding rates. It can also be determined whether the distance to a structure correlates with kinetic proximity in terms of its probability pF to fold before unfolding (7), by calculating 〈𝒟 pF〉. The question of the most accessible or least accessible structure may be formulated variationally as a free-boundary or variable end-point problem.
It is an important future question to address whether the entropy of paths to a particular structure is as important as the minimal distance. In this sense, it may be the finite “temperature” (β < ∞) partition function Z(β) = ∫d[r(s, t)] exp(−β𝒟[r(s, t)]), i.e., the sum over paths weighted by their “actions,” which is the most important quantity in determining the accessibility between structures. This has an analogue to the quantum string: we investigated only Z(∞) here. We hope that this work proves useful in laying the foundations for unambiguously defining distance between biomolecular structures in particular and high-dimensional objects in general.
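To illustrate how the β → ∞ limit singles out the minimal distance while finite β also counts the number of nearby paths, the sketch below replaces the functional integral by a sum over a small set of path “actions.” The list `toy_actions` holds made-up numbers (not results from this work), and the function names are hypothetical; this is a toy stand-in for Z(β) under those assumptions rather than an evaluation of the actual path integral.

```python
import math


def partition_function(actions, beta):
    """Discrete stand-in for Z(beta): sum of exp(-beta * D) over sampled paths."""
    return sum(math.exp(-beta * d) for d in actions)


def effective_distance(actions, beta):
    """Return -ln Z(beta) / beta, which tends to min(actions) as beta grows."""
    d_min = min(actions)
    # Factor out exp(-beta * d_min) so large beta does not underflow.
    shifted = sum(math.exp(-beta * (d - d_min)) for d in actions)
    return d_min - math.log(shifted) / beta


toy_actions = [4.0, 4.5, 4.5, 6.0, 9.0]  # hypothetical path distances
for beta in (0.1, 1.0, 10.0, 100.0):
    print(beta, effective_distance(toy_actions, beta))
```

At small β the effective distance lies below the minimum because every path contributes, which is the entropic effect raised in the text; at large β only the shortest path survives.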
Acknowledgments
We are grateful to Ali Mohazab, Moshe Schecter, Matt Choptuik, and Bill Unruh for insightful discussions. This work was supported by the Natural Sciences and Engineering Research Council and the A. P. Sloan Foundation.
Abbreviations
- EL: Euler–Lagrange
- MRSD: mean root square distance
Footnotes
The author declares no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/cgi/content/full/0607833104/DC1.
The distance-metric action in Eq. 4 bears a strong resemblance to the Nambu–Goto action for a classical relativistic string (2): S_NG ∝ ∫dt ∫ds √[(ṙ · r′)² − ṙ² r′²], where r in S_NG is now a four-vector and the dot product is the relativistic dot product. This action is physically interpreted as the (Lorentz invariant) world-sheet area of the string. If Eq. 4 could be mapped by suitable choice of gauge to the minimization of the Nambu–Goto action, one could exploit here the same reparameterization invariance that results in wave equation solutions to the equations of motion for the classical relativistic string, by choosing a parameterization such that ṙ · r′ = 0 (for the purely geometrical problem, the discriminant under the square root in the action has opposite sign). Unfortunately, however, because the velocity in the distance-metric action is a 3-velocity rather than a 4-velocity, our action only accumulates area when parts of the string move in 3-space, in contrast to the Nambu–Goto action that accumulates area even for a static string. The distance-metric action Eq. 4 has a lower symmetry than that for the classical relativistic string. 𝒟* cannot depend on the time the transformation took, whereas the world-sheet area does. Conversely, if we take, for example, configuration A at t = 0 to be a straight line of length L, and configurations B at t = T to be the same straight line but displaced along its own axis by varying amounts d, the geometrical area for all transformations would be LT, whereas the distances 𝒟*_AB for each transformation would be Ld.
The invariance of the Lagrangian to (s, t) leads to conservation laws by Noether's theorem (1), which here take the form of divergence conditions. However, these generally contain no new information beyond the EL equations and can be obtained by dotting Eq. 10 with either r′ to give λ′ = (dv̂/dt) · t̂ or ṙ to give v · (λt̂)′ = 0.
The MRSD is always less than or equal to the rmsd between structures, as can be shown by applying Hölder's inequality.
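The inequality can be checked numerically with a short Python sketch. It assumes, based only on the expanded abbreviation, that MRSD is the mean over residues of the Euclidean distance between corresponding points, and that rmsd is the square root of the mean squared distance; the random coordinates are synthetic test data, and no superposition step is performed.

```python
import math
import random


def pair_distances(a, b):
    """Euclidean distance between corresponding points of two structures."""
    return [math.dist(p, q) for p, q in zip(a, b)]


def mrsd(a, b):
    """Mean of the pairwise distances (assumed definition of MRSD)."""
    d = pair_distances(a, b)
    return sum(d) / len(d)


def rmsd(a, b):
    """Square root of the mean squared pairwise distance."""
    d = pair_distances(a, b)
    return math.sqrt(sum(x * x for x in d) / len(d))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 50
struct_a = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(n)]
struct_b = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(n)]
print(mrsd(struct_a, struct_b), rmsd(struct_a, struct_b))
```

Under these definitions the two measures coincide only when every point is displaced by the same amount, for example under a rigid translation.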
References
- 1. Gelfand IM, Fomin SV. Calculus of Variations. New York: Dover; 2000.
- 2. Zwiebach B. A First Course in String Theory. New York: Cambridge Univ Press; 2004.
- 3. Grosberg AY. In: Computational Soft Matter: From Synthetic Polymers to Proteins. Attig N, Binder K, Grubmüller H, Kremer K, editors. Vol 23. Bonn: John von Neumann Institut für Computing; 2004. pp. 375–399. NIC series.
- 4. Leopold PE, Montal M, Onuchic JN. Proc Natl Acad Sci USA. 1992;89:8721–8725. doi: 10.1073/pnas.89.18.8721.
- 5. Chan HS, Dill KA. J Chem Phys. 1994;100:9238–9257.
- 6. Falicov A, Cohen FE. J Mol Biol. 1996;258:871–892. doi: 10.1006/jmbi.1996.0294.
- 7. Du R, Pande VS, Grosberg AY, Tanaka T, Shakhnovich ES. J Chem Phys. 1998;108:334–350.
- 8. Cho SS, Levy Y, Wolynes PG. Proc Natl Acad Sci USA. 2006;103:586–591. doi: 10.1073/pnas.0509768103.
- 9. Veitshans T, Klimov D, Thirumalai D. Folding Des. 1996;2:1–22. doi: 10.1016/S1359-0278(97)00002-3.