Abstract
We present an identity for an unbiased estimate of a general statistical distribution. The identity computes the distribution density by dividing a histogram sum over a local window by a correction factor derived from a mean-force integral, where the mean force can be evaluated as a configuration average. We show that the optimal window size is roughly the inverse of the local mean-force fluctuation. The new identity offers a more robust and precise estimate than a previous one by Adib and Jarzynski [J. Chem. Phys. 122, 014114 (2005); doi:10.1063/1.1829631]. It also allows a straightforward generalization to an arbitrary ensemble and to a joint distribution of multiple variables. In particular, we derive a mean-force-enhanced version of the weighted histogram analysis method. The method can be used to improve distributions computed from molecular simulations. We illustrate its use by computing a potential energy distribution, a volume distribution in a constant-pressure ensemble, a radial distribution function, and a joint distribution of amino acid backbone dihedral angles.
INTRODUCTION
We present a method for estimating a general statistical distribution from data collected in a molecular simulation. The method is based on an identity and is superior to the common approach of using a normalized histogram, which suffers from either large noise when the bin size is too small or a systematic bias when it is too large.
Our identity is akin to a previous one derived by Adib and Jarzynski1 (henceforth the AJ identity), whereby the distribution density ρ(x) at a point x is estimated from a weighted number of visits to a window surrounding x, plus a correction from integrating the derivative of ρ(x). The AJ identity improves over the histogram-based approach not only by eliminating the systematic bias from binning but also by smoothing the resulting distribution, as the window contains many more data points than a single bin does. However, the identity is slightly inconvenient, as it neither guarantees a positive output nor determines its optimal parameters.
Here we present a new identity in which we construct a proper correction factor from integrating the “mean force” (the logarithmic derivative of the distribution density), and use it to divide the number of visits to a local window to reach an unbiased estimate, as schematically illustrated in Fig. 1a. The new strategy not only guarantees a nonnegative distribution, but also offers a simple estimate of the optimal window size, as it separates the error contributions from both the histogram and the mean force. The new identity also allows straightforward extensions to an arbitrary ensemble and to a joint distribution of multiple variables.
Figure 1.
(a) The key of the fractional identity is to convert the ratio of a histogram sum (shaded area) to the distribution density ρ(x*) into an integral of the mean force (log ρ)′(x). We then measure the mean force and use its integral to divide the observed histogram sum, obtaining an unbiased estimate of ρ(x*). As both the histogram sum and the ratio involve data from a window instead of a single bin, the resulting distribution is smoother due to the reduced uncertainty. (b) The auxiliary function ϕ(x) (solid) and its derivative ϕ′(x) (dashed, δ-function at x = x* excluded) employed by the Adib-Jarzynski identity (Appendix A).
We describe the new identity in Sec. 2, present a few numerical applications in Sec. 3, and conclude the article in Sec. 4 with a brief discussion.
METHODS
Integral identity
We wish to find an expression for the distribution density ρ(x) at x = x*. We first approximate ρ(x*) by a histogram sum over a local window (x−, x+) enclosing x = x*, and then apply a correction factor. Formally,
ρ(x*) = ∫_{x−}^{x+} ρ(x) dx / ∫_{x−}^{x+} [ρ(x)/ρ(x*)] dx,
where the numerator counts the fraction of x falling into the window (x−, x+), and can thus be measured from the histogram sum over the window.
We then express ρ(x)/ρ(x*) in the denominator through an integral of (log ρ)′(y) as
ρ(x)/ρ(x*) = exp[ ∫_{x*}^{x} (log ρ)′(y) dy ].
So
ρ(x*) = ⟨θ(x− < x < x+)⟩ / ∫_{x−}^{x+} exp[ ∫_{x*}^{x} (log ρ)′(y) dy ] dx, | (1) |
where θ(x− < x < x+) is 1 if x lies in the window and 0 otherwise, so that its average is the fraction of data points falling in the window.
We refer to Eq. 1 as the fractional identity in the following. Unlike in the AJ identity,1 the correction here is applied as a divisor instead of additively. Nevertheless, it can be derived as a near-optimal modification of the AJ identity as shown in Appendix A.
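To make the construction concrete, the following is a minimal numerical sketch of the fractional identity, Eq. 1, on a toy problem where the mean force is known analytically [(log ρ)′(y) = −y for a standard normal]; the function name, window, and step size are illustrative choices of this sketch, not from the original implementation.

```python
import numpy as np

# Minimal sketch of the fractional identity, Eq. (1), on a toy problem.
# For the standard normal, the mean force is known: (log rho)'(y) = -y.
rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)

def rho_fractional(x_star, x_lo, x_hi, data, dx=0.001):
    """Estimate rho(x_star) = <theta(x_lo < x < x_hi)> / D, with
    D = int_{x_lo}^{x_hi} exp( int_{x_star}^{x} (log rho)'(y) dy ) dx."""
    frac = np.mean((data > x_lo) & (data < x_hi))   # histogram sum (numerator)
    xs = np.arange(x_lo, x_hi + dx, dx)
    v = (x_star**2 - xs**2) / 2.0                   # exact inner integral of -y
    f = np.exp(v)
    D = np.sum(0.5 * (f[1:] + f[:-1])) * dx         # trapezoidal rule (denominator)
    return frac / D

est = rho_fractional(0.5, -0.5, 1.5, samples)
exact = np.exp(-0.5**2 / 2) / np.sqrt(2 * np.pi)
```

Note that the window (−0.5, 1.5) contains far more points than any single bin would, yet the mean-force correction removes the binning bias, so `est` agrees with the exact density to within the sampling noise of the numerator.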
Mean force from configuration average
The identity Eq. 1 requires a mean force (log ρ)′(x) in addition to a histogram. In the following, we show how to construct a conjugate force fx = fx(rN, s), as a function of molecular coordinates rN and other ensemble variables s, such that its ensemble average is equal to the required mean force (log ρ)′(x). Thus, in a molecular simulation, we can compute fx for each trajectory frame and use its average as the mean force. The expressions for the mean force are summarized in Eq. 5 and Eqs. 4 or 4'.
We first express x as a function x = X(rN, s) of both the molecular coordinates rN and (optionally) some variables s of the simulation ensemble, e.g., s can be the volume in an isobaric ensemble, or the temperature in a tempering simulation.2, 3 Note that X(rN, s) denotes a function of rN and s, while x denotes its value as a number.
The distribution density ρ(x) can now be written as
ρ(x) = ⟨δ(X(rN, s) − x)⟩ = ∫ δ(X(rN, s) − x) w(rN, s) drN ds / ∫ w(rN, s) drN ds, | (2) |
where δ(⋯) is the Dirac δ-function, and w(rN, s) is the weight for a configuration rN and parameters s, e.g., w(rN, s) ∝ exp[−βU(rN)] in a canonical ensemble [with U(rN) being the potential energy and β the reciprocal temperature]. Equation 2 is properly normalized, since
∫ ρ(x) dx = ∫ w(rN, s) drN ds / ∫ w(rN, s) drN ds = 1.
Similarly, the average ⟨A(rN, s)⟩x at x for any quantity A(rN, s) can be defined as
⟨A(rN, s)⟩x = ∫ A(rN, s) δ(X(rN, s) − x) w(rN, s) drN ds / ∫ δ(X(rN, s) − x) w(rN, s) drN ds, | (3) |
where we have again used the δ-function to collect configurations with X(rN, s) being x, and the denominator is equal to ρ(x). Our objective is to find an expression fx(rN, s) such that ⟨fx(rN, s)⟩x = (log ρ)′(x) = ρ′(x)/ρ(x).
We now evaluate the derivative of ρ(x) as
ρ′(x) = ∂/∂x [ ∫ δ(X − x) w(rN, s) drN ds / ∫ w(rN, s) drN ds ] = −∫ [∂δ(X − x)/∂X] w(rN, s) drN ds / ∫ w(rN, s) drN ds,
where we have used ∂δ(X − x)/∂x = −∂δ(X − x)/∂X.
We proceed by introducing a vector field v(rN, s) such that v · ∇X = 1, e.g.,
v = ∇X/(∇X · ∇X). | (4) |
More generally, it can be constructed from an arbitrary vector field Y (∇X · Y ≠ 0) as
v = Y/(∇X · Y). | (4') |
Note that ∇ is defined on the joint vector space of both rN and s, so ∇ = (∂/∂rN, ∂/∂s).
We now insert 1 = v · ∇X into the integrand and recall that δ(X(rN, s) − x) depends on rN and s only through X. So ∇X[∂δ(X − x)/∂X] → ∇δ(X − x), and
ρ′(x) = −∫ v · ∇δ(X − x) w drN ds / ∫ w drN ds = ∫ δ(X − x) ∇ · (v w) drN ds / ∫ w drN ds = ∫ δ(X − x) fx w drN ds / ∫ w drN ds,
where we have integrated by parts to shift the ∇ to the rest of the integrand, and defined a conjugate force fx ≡ ∇ · v + v · ∇log w. The last step follows from
∇ · (v w) = (∇ · v + v · ∇log w) w = fx w.
Comparing with Eq. 3, we see that the mean force (log ρ)′(x) = ρ′(x)/ρ(x) is equal to the average of fx(rN, s) under a fixed x,
(log ρ)′(x) = ⟨fx(rN, s)⟩x = ⟨∇ · v + v · ∇log w⟩x. | (5) |
For a canonical ensemble, the second term v · ∇log w is reduced to −βv · ∇U, i.e., the projection of the molecular force to the gradient of X [assuming Eq. 4], which is in accordance with the name “mean force” of ⟨fx⟩.
The above derivation is analogous to that of the dynamic temperature by Rugh.4 In fact, the dynamic temperature can be derived as a special case of Eq. 5. Consider the canonical ensemble at infinite temperature, β → 0. The distribution of the total energy E = H(rN) (rN now represents a point in the phase space of coordinates and momenta) is proportional to the density of states g(E), and the mean force ∂log g(E)/∂E is the intrinsic temperature β(E). According to Eq. 5, it equals ⟨∇ · v⟩E with v = ∇H/(∇H · ∇H); we thus recover Rugh's dynamic temperature.
Optimal window size
We now determine the two window boundaries x− and x+ in Eq. 1 such that they minimize the statistical error in ρ(x*).
We first note that the histogram and mean-force data contribute independently to the numerator and denominator, respectively. How much the two contribute is, however, controlled by the window size. The output is dominated by the histogram contribution (numerator) with a narrow window, but by the mean-force contribution (denominator) with a wide window. For a narrow window, the denominator is reduced to the window width, and thus the identity is approximately a histogram average. At the other extreme, if the window covers the entire domain of x, the numerator becomes a constant, and the distribution is determined entirely by the mean-force integral on the denominator, i.e., ρ(x*) ∝ exp[ ∫^{x*} (log ρ)′(y) dy ] (the lower bound of the integral is to be determined by the normalization).
As the window size increases, the relative error of the numerator decreases as more data points reduce the uncertainty, but that of the denominator increases as the error in the mean-force integral accumulates. The sum reaches a minimum at the optimal window.
Quantitatively, the relative error of the numerator is ε(N)/N ≈ 1/√N, where N = N(x−, x+) is the number of independent data points included in the window.
The relative error of the denominator D is harder to compute exactly and thus is estimated from an upper bound. First, since D is an integral of exp(Δlog ρ), the relative error of D is no larger than the maximal relative error of exp(Δlog ρ), or equivalently the maximal absolute error of Δlog ρ. Next, since Δlog ρ itself is an integral of the mean force, i.e., Δlog ρ(x) = ∫_{x*}^{x} (log ρ)′(y) dy, the maximum is likely to occur at either window boundary x = x±. In the discrete version, the integral becomes a sum over bins, Σ_i (log ρ)′(x_i) δx, with δx being the bin size. Its error [ε(Δlog ρ)]², assuming no correlation among the mean forces at different bins, is Σ_i σf²(x_i) δx²/δn(x_i), where σf²(x_i) is the variance of the conjugate force at x_i, and δn(x_i) is the number of independent data points in the bin at x_i.
On reaching the optimal window, including one more bin at either edge would keep the combined error [ε(N)/N]² + [ε(D)/D]² constant. So
δn/N² ≈ σf²(x±) δx²/δn,
where the first term is the decrease of error due to the increased sample size, while the second is the increase due to the mean-force integration. Thus N ≈ δn/[σf(x±) δx].
For a relatively narrow window, we have N ≈ δn w/δx, with w being the window width. If we further replace σf(x±) by a local mean σ̄f, then
w = x+ − x− ≈ γ/σ̄f, | (6) |
where γ is a heuristic factor which should ideally be 1.0. However, as we overestimate the error of the denominator, Eq. 6 somewhat underestimates the optimal window size. In practice, we found the optimal γ to be 1 ∼ 2.
In using Eq. 6, we emphasize that σf is the mean-force fluctuation at a fixed x, i.e., σf²(x) = ⟨fx²⟩x − ⟨fx⟩x², and thus is evaluated at the bin containing x. However, after the intra-bin calculation, the quantity can then be averaged over a local or global window for a more precise σ̄f.
If the mean-force fluctuation is very small, Eq. 6 suggests abandoning the histogram data and switching to a pure mean-force integration. On the other hand, if the mean force has a very large variance, one should stick to the histogram. Thus the method is effective only if the mean-force fluctuation is small. An attempt to reduce the mean-force fluctuation is described in Appendix B, where we improve the mean force itself by using data of the second-order derivatives of the distribution.
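As a concrete illustration of Eq. 6, the sketch below estimates the in-bin conjugate-force fluctuation on synthetic data and converts it into a window size. The standard-normal samples, the conjugate force −x plus Gaussian noise of scale 2.0, and the bin layout are all assumptions of this toy setup.

```python
import numpy as np

# Sketch: per-bin conjugate-force fluctuation and the Eq. (6) window size.
# Synthetic data: x ~ N(0,1); conjugate force f = -x + noise (known scale 2.0),
# so the fixed-x fluctuation sigma_f should come out close to 2.0.
rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
f = -x + rng.normal(scale=2.0, size=x.size)

dx = 0.1
edges = np.arange(-3.0, 3.0 + dx, dx)
mask = (x >= edges[0]) & (x < edges[-1])
idx = np.digitize(x[mask], edges) - 1          # bin index of each point
fm = f[mask]

# variance of the conjugate force within each bin (fixed-x fluctuation)
var_f = np.array([np.var(fm[idx == i]) for i in range(len(edges) - 1)])

gamma = 1.0
sigma_bar = np.sqrt(np.mean(var_f))            # averaged over a window of bins
window = gamma / sigma_bar                     # Eq. (6): w ~ gamma / sigma_f
```

With σ̄f ≈ 2, the suggested window is w ≈ 0.5, i.e., five bins here; a noisier conjugate force would shrink the window toward a plain histogram, as discussed above.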
Extension to weighted histogram analysis method
We now extend Eq. 1 to a composite distribution, i.e., a superposition of several distributions under different conditions, which can result from independent simulations or an extended ensemble simulation, such as a tempering simulation (simulated2 and parallel3 tempering). For concreteness, we assume that the individual distributions are canonical ones ρ(x, βi) at different temperatures βi (with i being its label).
The aim is to estimate the distribution ρ(x, β) at some β, which need not be one of the βi's. Although the multiple histogram method, also known as the weighted histogram analysis method (WHAM),5 usually serves as the standard routine, we shall derive a mean-force-improved version here.
We first generalize Eq. 1 to
ρ(x*, β) = Σ_i Ni ⟨θ(x− < x < x+)⟩_{βi} / Σ_i Ni ∫_{x−}^{x+} [ρ(x, βi)/ρ(x*, β)] dx,
where Ni is the total number of independent data points from the simulation at the temperature βi, and the sum is carried over the different temperatures βi. To proceed, we simultaneously multiply and divide by ρ(x*, βi) in the denominator,
ρ(x, βi)/ρ(x*, β) = [ρ(x*, βi)/ρ(x*, β)] [ρ(x, βi)/ρ(x*, βi)],
where the x-independent ρ(x*, βi)/ρ(x*, β) has been moved out of the integral, leaving ρ(x, βi)/ρ(x*, βi) to be converted to the mean force integral at a fixed βi as before,
ρ(x*, β) = Σ_i Ni ⟨θ(x− < x < x+)⟩_{βi} / Σ_i Ni [ρ(x*, βi)/ρ(x*, β)] ∫_{x−}^{x+} exp[ ∫_{x*}^{x} (log ρ)′(y, βi) dy ] dx. | (7) |
For example, in the case of the potential energy U distribution in a canonical ensemble, we have w(U, β) = exp(−βU)/Z(β), with Z(β) being the partition function,
ρ(U*, β) = Σ_i Ni ⟨θ(U− < U < U+)⟩_{βi} / Σ_i Ni e^{(β − βi)U*} [Z(β)/Z(βi)] ∫_{U−}^{U+} exp[ ∫_{U*}^{U} (⟨∇ · v⟩_y − βi) dy ] dU, | (8) |
where v = ∇U/(∇U · ∇U). The regular WHAM5 is recovered with an infinitesimal window U− = U+ = U*. Generally, Eq. 7 improves the histogram method by using the mean force data, e.g., the dynamic temperature here. Note that since ⟨∇ · v⟩_U is the same under any temperature βi, its data from different temperatures can be combined.
Summary and practical notes
To summarize the method, we first compute a mean force profile (log ρ)′(x) at any x from Eqs. 4 [or 4'] and 5 using a simulation trajectory. It is then plugged into Eq. 1 to compute the distribution density ρ(x*), with the window (x−, x+) determined by Eq. 6.
Practically, we shall assume that both histogram and mean force data are collected in bins of size δx. We evaluate the double integral in Eq. 1 by first computing the inner integral v(x, x*) ≡ ∫_{x*}^{x} (log ρ)′(y) dy, and then the outer one D = ∫_{x−}^{x+} exp[v(x, x*)] dx. The step size for numerical integration is always the bin size δx. For the inner integral, (log ρ)′ is computed as the bin-averaged value, and v(x, x*) at every bin boundary x is accumulated as the running sum Σ_i (log ρ)′(x_i) δx. The outer integral is then evaluated by the trapezoidal rule as
D ≈ Σ_i ½ [f(x_i) + f(x_{i+1})] δx,
where f(x) = exp[v(x, x*)]. Since we usually need ρ(x*) at many x*, we pre-compute (log ρ)′(y) and then v(x, xO) = ∫_{xO}^{x} (log ρ)′(y) dy for some fixed reference xO. In this way, v(x, x*) = v(x, xO) − v(x*, xO) can be quickly retrieved in evaluating ρ(x*) at any x*.
Although the error of (log ρ)′(y) depends on the bin size, that of the integral v(x, x*), and hence of ρ(x*), does not. To see this, consider a small bin of width δx and δn data points. If the standard deviation of (log ρ)′(y) is σ, then the squared error of the bin average is σ²/δn. Its contribution to the squared error of v(x, x*) is multiplied by δx² and is thus σ²δx²/δn. If we now split the bin into two, then each sub-bin has roughly δn/2 points, each with a squared error of 2σ²/δn. But the contribution to the error of v(x, x*) from the two new bins, (2σ²/δn)(δx/2)² × 2 = σ²δx²/δn, remains the same.
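The bookkeeping described above, bin-averaged mean forces, a running integral v(x, xO) at the bin boundaries, and a trapezoidal outer integral, can be sketched as follows on a Gaussian toy sample. The conjugate force f = −x, the bin layout, and the function names are assumptions of this illustration.

```python
import numpy as np

# Sketch of the practical scheme: pre-compute v(x, xO) as a cumulative sum of
# the bin-averaged mean force, then evaluate rho(x*) for any x* via
# v(x, x*) = v(x, xO) - v(x*, xO) and the trapezoidal rule.
rng = np.random.default_rng(2)
raw = rng.normal(size=200_000)             # toy data; conjugate force is f = -x

dx = 0.05
edges = np.arange(-4.0, 4.0 + dx, dx)      # bin boundaries
data = raw[(raw >= edges[0]) & (raw < edges[-1])]
idx = np.digitize(data, edges) - 1
nbins = len(edges) - 1
hist = np.bincount(idx, minlength=nbins)

# bin-averaged mean force, measured from the data within each bin
mf = np.array([(-data[idx == i]).mean() if hist[i] else 0.0 for i in range(nbins)])

# v(x, xO) at every bin boundary, with xO = edges[0]
v = np.concatenate(([0.0], np.cumsum(mf * dx)))

def rho(k_star, half_bins):
    """rho at the center of bin k_star, with a window of +/- half_bins bins."""
    lo, hi = max(k_star - half_bins, 0), min(k_star + half_bins, nbins)
    frac = hist[lo:hi].sum() / len(data)    # histogram sum over the window
    v_star = 0.5 * (v[k_star] + v[k_star + 1])   # v at the bin center (midpoint)
    f = np.exp(v[lo:hi + 1] - v_star)       # exp[v(x, x*)] at bin boundaries
    D = np.sum(0.5 * (f[1:] + f[:-1])) * dx # trapezoidal rule
    return frac / D

est = rho(nbins // 2, 40)                   # x* near 0, window ~ +/- 2.0
exact = 1.0 / np.sqrt(2.0 * np.pi)
```

Because only differences of v within the window enter the result, the noisy (or empty) bins far in the tails do not affect the estimate, consistent with the error argument above.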
Practically, Eq. 5 may fail if a bin is empty. In this case, we symmetrically enlarge the bin to a window such that it contains at least one data point, then use Eq. 5.
APPLICATIONS
Potential energy distribution
We first compute a potential energy distribution ρ(U) in a canonical ensemble, in which w(rN, β) = exp[−βU(rN)]/Z(β), with β being the reciprocal temperature and Z(β) the partition function. Equations 1 and 5 become
ρ(U*) = ⟨θ(U− < U < U+)⟩ / ∫_{U−}^{U+} exp[ ∫_{U*}^{U} (log ρ)′(y) dy ] dU, with (log ρ)′(U) = ⟨∇ · v⟩U − β and v = ∇U/(∇U · ∇U).
Since (log ρ)′(U) is the difference between the dynamic temperature4, 6 ⟨∇ · v⟩ and the simulation temperature β, the distribution peaks at (log ρ)′(U) = 0 where the two cancel.
We performed a molecular dynamics (MD) simulation on a 256-particle Lennard-Jones system under a smoothly switched potential (see Appendix C, rs = 2.0 and rc = 3.0). The temperature β = 1.0, density ρ = 0.8, and time step Δt = 0.002. Velocity rescaling was used as the thermostat7 with a time step 0.01.
From a single trajectory of 10⁷ steps, we constructed two samples. In the test sample, 10⁴ frames (one every 1000 steps) were collected; in the reference one, all frames were used. This setting prevented possible sampling inaccuracy from affecting the comparison of different methods, as explained in Appendix D. The bin size for histogram-like data was always δU = 0.1. Unless specified otherwise, the test sample was used.
We first demonstrate the use of the fractional identity Eq. 1 with a fixed symmetric window of size ΔU = U+ − U− = 12.4, a value determined from Eq. 6 (γ = 1.0). The mean force (log ρ)′(U) was computed from a single bin at U. As shown in Fig. 2a, the resulting distribution was much smoother than the histogram (which was calculated from the number of visits to each bin). For comparison, we computed the result from the AJ identity with the same window size. Though the results were generally similar, the AJ identity sometimes yielded negative values at the two edges, while the fractional identity appeared to be more robust and closer to the reference.
Figure 2.
The potential energy distribution. (a) Comparison of the distributions from the histogram (gray dotted lines), the fractional identity (Eq. 1, blue circles and dashed line), and the Adib-Jarzynski (AJ)-like identity (Eq. A1, red squares and dot-dashed line). Data from the same test sample were used for all three; symbols were plotted with a spacing ΔU = 5.0 to avoid cluttering. The reference curve (black line) was computed from the reference sample (same trajectory, higher sampling rate). Lines were plotted with a spacing equal to the bin size δU = 0.1. Vertical lines at the right edge were due to negative or small output from the AJ identity. (b) Error measured from the KS difference as a function of window size ΔU. (c) Error measured from the entropic distance. (d) The distribution from the WHAM (gray dotted line) compared with an improved version using the fractional identity (Eq. 8, blue circles, dashed line). The style is similar to that in panel (a). Vertical dotted lines at the edges were due to empty output from WHAM. (e) Comparison of the distributions from a canonical ensemble (blue) and a microcanonical one (red).
To show the gain from the integral identity approach, we define a KS difference as √N ΔCDF [which is commonly used in the Kolmogorov-Smirnov (KS) test for detecting the difference between two distributions8], where N is the sample size, and ΔCDF is the maximal difference between the cumulative distribution function (CDF) of the resulting distribution and that from the reference. The smaller the quantity, the more accurate the test distribution. As the measure is independent of the bin size, it mainly detects the systematic bias in the test distribution rather than the smoothness of the distribution density. As the identity is optimized not for the CDF but for the distribution density, the KS difference serves as a stringent test.
The KS differences, computed for the histogram, the fractional identity Eq. 1, and the AJ identity Eq. A1, are shown in Fig. 2b. It is clear that both identities rendered more accurate distributions than the histogram. We also show there was an optimal window size that minimized the error. However, for the fractional identity, the optimal window size ΔU ≈ 20.0 was greater than the value 12.4 given by Eq. 6. Thus, a factor γ ≈ 1.5 was used in the other examples. Recall that the KS difference scales with √N; from Fig. 2b we thus estimated about a 20-fold increase of efficiency from using the optimal window. We also notice that with a smaller window, the fractional identity gave better estimates than the AJ identity. This was expected, as the fractional identity is the optimal modification of the AJ identity in this case, as shown in Appendix A. With a larger window, the errors from both identities grew rapidly due to the larger involvement of the mean force data. As the fractional identity quickly switches to a pure mean-force-based integral with a large window, its error growth was faster. The comparison shows that choosing the window size is crucial to the success of the integral identity, and an overly large window can be counterproductive.
We performed a similar comparison in terms of the entropic distance, defined as ∫ ρ(x) log[ρ(x)/ρref.(x)] dx for two distributions ρ(x) and ρref.(x). For the AJ identity, in case ρ(x) < 0, zero was assumed. Unlike the KS difference, this quantity directly compares the distribution densities. As shown in Fig. 2c, the fractional identity consistently produced a smaller entropic distance than the AJ identity, suggesting an improved smoothness. Interestingly, the entropic distance also had a minimum, which occurred at a similar location, ΔU ≈ 20.0, to that from the KS difference.
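The two error measures can be sketched compactly as below; writing the entropic distance in the Kullback-Leibler form ∫ ρ log(ρ/ρref.) dx is an assumption of this sketch, consistent with the zero-floor treatment of negative densities mentioned above, and the uniform test sample is illustrative.

```python
import numpy as np

def ks_difference(sample, ref_cdf):
    """sqrt(N) times the maximal |empirical CDF - reference CDF| (KS statistic)."""
    s = np.sort(np.asarray(sample))
    n = len(s)
    emp = np.arange(1, n + 1) / n          # empirical CDF at the sorted points
    return np.sqrt(n) * np.max(np.abs(emp - ref_cdf(s)))

def entropic_distance(p, p_ref, dx):
    """int p log(p / p_ref) dx on a common grid; bins with p <= 0 contribute 0."""
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / p_ref[m])) * dx

# sanity check: a uniform sample against the exact uniform CDF
rng = np.random.default_rng(4)
u = rng.uniform(size=10_000)
ks = ks_difference(u, lambda x: x)         # O(1) when the distributions match
```

For matching distributions the KS difference stays of order one (independent of N), while a systematic bias makes it grow as √N, which is why it isolates bias from smoothness.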
We now demonstrate the mean-force-improved WHAM introduced in Sec. 2D. In this case, we performed additional simulations at two neighboring temperatures T = 0.8 and T = 1.2. The reweighted distributions to T = 1.0 from both the original WHAM and the improved version are shown in Fig. 2d, and as expected, the latter was much smoother than the former.
We emphasize that the identity approach is ensemble-dependent because the mean force depends on the ensemble weight w. To illustrate the point, we simulated the same system using regular molecular dynamics without a canonical thermostat, i.e., we targeted a microcanonical ensemble in which the total energy was kept constant. In this ensemble, the weight for a configuration, after averaging out the momentum components, is w(rN) ∝ [Etot − U(rN)]^{Nf/2 − 1}, where Nf is the number of degrees of freedom, and K, U, and Etot are the kinetic, potential, and total energy, respectively. The mean force is accordingly (log ρ)′(U) = ⟨∇ · v⟩U − ⟨(Nf/2 − 1)/K⟩U, where Nf = 3N − 6, and the constant reference temperature β of the canonical ensemble is changed to the energy-dependent term ⟨(Nf/2 − 1)/K⟩U.
The microcanonical-ensemble simulation was similar to the canonical-ensemble one. During equilibration, the kinetic energy was regularly scaled to match ⟨K⟩ = Nf/(2β), and the total energy was kept constant afterward (Etot = −932). As shown in Fig. 2e, the distributions and mean forces (lower inset) from the two ensembles differed considerably, whereas the dynamic temperature ⟨∇ · v⟩ (upper inset) matched. This example shows the importance of applying the correct formula for the mean force.
Volume distribution
In the second example, we compute a volume distribution ρ(V) in an isothermal-isobaric (i.e., constant temperature and pressure) ensemble.9, 10 Unlike the previous case, the volume V is not a function of the system coordinates, but an additional variable in the ensemble weight w(rN, V). Particularly, the volume V serves as a scaling factor that translates the reduced (0 to 1) coordinates Ri to the actual ones as ri = V^{1/3} Ri. In terms of the reduced coordinates, the ensemble weight can be written as
w(RN, V) ∝ V^N exp[−βU(V^{1/3} RN) − βpV],
where β and p are the reciprocal temperature and the pressure, respectively.
According to Eq. 5, the conjugate force fV reduces to ∂log w/∂V in this case (the vector field v is the unit vector along the direction of the parameter V, so ∇ · v = 0 and v · ∇ = ∂/∂V). Thus,
fV = ∂log w/∂V = N/V + (β/3V) Σ_i ri · Fi − βp = β [pc(rN) − p],
which represents β times the difference between the virial pressure pc(rN) ≡ [N/β + (1/3) Σ_i ri · Fi]/V and the simulation pressure p.
There is, however, a subtle distinction between the apparent volume distribution ρ(V) defined above and the actual physical one, ρ̃(V). The difference arises from the fact that the partition function Z(β, V) in the canonical ensemble counts configurations with volume no larger than, instead of equal to, the volume V of the simulation box. In other words, unless there are particles lying precisely at all six boundary faces of the simulation box, it is always possible to shrink the box slightly without leaving out any particles. In this sense, the physical volume of a configuration is usually less than the volume of the box. It is then appropriate to introduce a differential partition function Z′ (Ref. 10) for all configurations with the volume falling precisely in (V − dV, V), as Z′ ∝ (∂Z/∂V) dV = ⟨pc⟩V Z dV, so that the actual physical volume distribution differs from ρ(V) by a factor of ⟨pc⟩V, i.e., ρ̃(V) ∝ ⟨pc⟩V ρ(V). The reader is referred to Koper and Reiss10 for a more thorough discussion.
However, the above correction does not strictly apply to a periodic system, such as the test case here. We calculated the adjusted distribution ρ̃(V) below only to illustrate the computing process, pretending the system were non-periodic. A direct sampling according to the physical volume distribution is inconvenient, as it requires one to know ⟨pc⟩V in advance. In the following, we use ρ(V) to populate configurations during the simulation, but report the adjusted ρ̃(V) after applying the post-simulation correction.
We performed an MD simulation on the 256-particle Lennard-Jones system using the switched potential (with rs = 2.5 and rc = 3.5). The temperature T = 1.24 and pressure p = 0.115 are around the critical point. Velocity rescaling was used as the thermostat7 with a time step 0.01. For the pressure control, Monte Carlo volume moves were attempted every two MD steps with a maximal magnitude of ±2.0% of the side length of the box. The trajectory contained 10⁷ steps with the time step dt = 0.002.
From the trajectory, we constructed two samples. In the test sample, one out of every 100 frames was used; in the reference one, all frames were used. This setting prevented possible sampling inaccuracy from affecting the comparison of different methods, as explained in Appendix D. In both cases, histogram-like data were collected using a bin width δV = 1.0. Unless specified otherwise, the test sample was used.
Since the volume changed almost by an order of magnitude, and the mean force fluctuation σf∝1/V, a fixed window size was not suitable. Thus we applied Eq. 6 with σf estimated from a local window, and γ = 1.5 (heuristic value).
To apply the correction, we further computed ⟨pc⟩V from a second integral identity, Eqs. B1, B2, in Appendix B. The window size of the second identity was similarly determined from the local σf but with γ = 3.0.
As shown in Fig. 3, the volume distribution ρ(V) or ρ̃(V) from the fractional identity was smoother than the histogram but still had some roughness. From the inset, we observe that the window size grew linearly with the volume V. This example also clearly illustrates the danger of using an overly large window. We show in Fig. 3 that the distribution from a pure mean-force integration, ρ(V) ∝ exp[∫^V (log ρ)′(V′) dV′] (which is the limiting case of using an infinite window, also corrected to the corresponding ρ̃(V) in Fig. 3b), manifested a much larger deviation from the reference. The deviation was however not systematic and diminished with the sample size: when calculated from the larger reference sample, the deviation was hardly noticeable, see Fig. 6b. The comparison shows that choosing a proper window is crucial to the success of the method.
Figure 3.
The volume distributions (a) ρ(V) and (b) ρ̃(V) (adjusted) at p = 0.115 and T = 1.24. Gray line in (a): the histogram; blue circles and dashed lines: the fractional identity; red squares and dot-dashed lines: pure mean-force integration (the limiting case of an infinite window). Inset of (a): the window size. Data from the same test sample were used for all three; the symbol spacing ΔV was 50.0 to avoid cluttering. The reference curves (black lines) were computed from the reference sample (same trajectory, higher sampling rate), adjusted in (b). Lines were plotted with a spacing equal to the bin size δV = 1.0.
Figure 6.
Distributions computed from the reference samples: (a) the energy distributions; (b) the volume distributions; (c) and (d) the radial distribution functions at T1 = 0.85 and T2 = 0.4, respectively. Gray dotted lines: histograms; blue solid lines: fractional identity; red dashed lines: alternative identities, as those in Figs. 2-4.
Radial distribution function
In the third application, we compute a radial distribution function g(r). Given that a test particle is at the origin, g(r) gives the relative probability density of finding another particle at a distance r away, such that g(r) = 1 in a non-interacting system. It relates to the radial distribution density ρ(r) by
ρ(r) = g(r) 4πr²/V, | (9) |
with V being the volume of the simulation box; 4πr² dr/V gives the probability of finding a different particle in a spherical shell of radius r and thickness dr around the test particle, if the particles were non-interacting. ρ(r) is normalized such that ∫₀^{L/2} ρ(r) dr equals the fraction (≈ π/6) of particle pairs with r < L/2, where L = V^{1/3} is the box size and L/2 is the maximal distance between two particles under the periodic boundary condition and the minimal image convention9 (of course, with r > L/2, ρ(r) becomes unphysical and should not be used). Thus, we modify Eq. 1 to
g(r*) = ⟨θ(r− < r12 < r+)⟩ / { (4π/V) ∫_{r−}^{r+} r² exp[ ∫_{r*}^{r} (log g)′(y) dy ] dr }.
To derive the mean force in a canonical ensemble, where w(rN) ∝ exp[−βU(rN)] with β being the reciprocal temperature, we apply Eq. 5 with a vector field whose components are vi = (δi1 − δi2) r̂12/2, where i is the particle index. Since ∇ · v = 2/r12, with r12 = |r12|, we have (log ρ)′(r) = ⟨2/r12 + (β/2) r̂12 · F12⟩r, where r̂12 = r12/r12 is the unit displacement vector from particle 2 to particle 1, F12 = F1 − F2 is the difference between the forces exerted on particles 1 and 2, and the average is evaluated at r12 = r. By Eq. 9,
(log g)′(r) = (log ρ)′(r) − 2/r = (β/2) ⟨r̂12 · F12⟩r.
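As a sanity check on the conjugate force just derived, the sketch below verifies, for a single pair interacting through a plain (unswitched) Lennard-Jones potential, an assumption of this illustration, that the per-pair sample 2/r + (β/2) r̂12 · F12 reduces to 2/r − βu′(r).

```python
import numpy as np

# Check of the conjugate-force sample for g(r): for a pure pair potential u(r),
# F12 = F1 - F2 = -2 u'(r) rhat, so a single pair contributes
# f_r = 2/r + (beta/2) rhat.F12 = 2/r - beta u'(r).
beta = 1.0

def lj_du(r):
    """Derivative u'(r) of the plain LJ potential u(r) = 4(r^-12 - r^-6)."""
    return 4.0 * (-12.0 * r**-13 + 6.0 * r**-7)

rng = np.random.default_rng(3)
r1 = rng.normal(size=3)                     # arbitrary position of particle 1
r2 = r1 + np.array([1.2, 0.0, 0.0])         # pair at distance 1.2

r12 = r1 - r2
r = np.linalg.norm(r12)
rhat = r12 / r
F1 = -lj_du(r) * rhat                       # force on particle 1 from the pair
F2 = -F1                                    # Newton's third law
f_sample = 2.0 / r + 0.5 * beta * np.dot(rhat, F1 - F2)
analytic = 2.0 / r - beta * lj_du(r)
```

In a simulation, F1 and F2 would include contributions from all other particles; those extra terms average to zero at fixed r12, so binning f_sample by pair distance yields (log ρ)′(r).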
We simulated the 256-particle Lennard-Jones system at the density ρ = 0.7 under two different temperatures, T1 = 0.85 and T2 = 0.4, using the switched potential (rs = 2.5 and rc = 3.5). Velocity rescaling with a time step 0.01 was used as the thermostat.7
After 10⁶ steps of equilibration, we simulated another 10⁷ steps with a time step dt = 0.002. We then constructed two samples from this trajectory; for the test sample, we picked 5 frames (one every 2 × 10⁶ steps), and for the reference one, 5000 frames (one every 2000 steps). This setting prevented possible sampling inaccuracy from affecting the comparison of different methods, as explained in Appendix D. Unless specified otherwise, the test sample was used. Note that each frame still contributes 256 × 255/2 = 32640 pairs of particles, although only about half (π/6 ≈ 0.52) of them satisfy r < L/2. The bin size δr = 0.002 was used in collecting histogram-like data.
Since this example was also used in the original paper of Adib and Jarzynski,1 it is instructive to compare the two identities. We include the AJ identity [Eq. (20) in Ref. 1] here for convenience,
| (10a) |
| (10b) |
where r is the distance from the test particle, u(r) is the pair potential (see Appendix C), R1 is a distance close to the repulsion core (we used R1 = 1.0 here), Rmax is half of the simulation box size, θ(x) is the step function: 1 if x > 0, 0 otherwise, is the radial mean force of a particle at a distance r away from the test particle, excluding the contribution from the test particle.
As shown in Figure 4, the fractional identity produced smooth distributions in good agreement with the respective references at both temperatures, despite a relatively small sample size. On the other hand, although the AJ identity also produced smooth distributions, there was an appreciable deviation from the reference at the lower temperature T2 = 0.4, especially around the principal peak r ≈ 1.1. The deviation was again not systematic, as it became negligible when the calculation was performed on the reference sample, see Figs. 6c, 6d. We note the large deviations from the AJ identity were similar to those observed in the previous example of the volume distribution when the mean-force integration was used to produce the distribution. Thus they were likely due to an overly large window, as the entire range of r, from 0 to Rmax, was used as the window by the AJ identity Eq. 10a. The error was larger at the lower temperature because the mean force changed more drastically there (hence a larger mean-force fluctuation). By contrast, in the fractional identity case, smaller windows, Δr = 0.14 for T1 = 0.85 and Δr = 0.09 for T2 = 0.4, were used according to Eq. 6 (γ = 1.5), and thus the output was more robust. We also note that the optimal window size shrank at the lower temperature. The trend is universal: since fx has a component βv · F in the canonical ensemble, as one increases β (or lowers the temperature), the fluctuation grows, and thus the window size shrinks. The example again illustrates the critical influence of the window size.
Figure 4.
The radial distribution function g(r). (a) T1 = 0.85; (b) T2 = 0.4. Gray line: the histogram; blue circles and dashed line: the fractional identity; red squares and dot-dashed line: the original Adib-Jarzynski (AJ) identity (data from the same test sample were used for all three; symbol spacing Δr was 0.1 to avoid cluttering). The reference curve (black line) was computed from the reference sample (same trajectory, higher sampling rate). Lines were plotted with a spacing equal to the bin size δr = 0.002.
Amino acid backbone dihedral angles
Finally, we compute a joint distribution ρ(φ, ψ) of the two backbone dihedrals φ and ψ of a glycine dipeptide. Here φ and ψ are the C′–N–Cα–C and N–Cα–C–N′ dihedral angles, respectively. We first generalize Eq. 1 to the two-dimensional case,
ρ(φ*, ψ*) = ⟨θ(window)⟩ / ∬_window exp[ log ρ(φ, ψ) − log ρ(φ*, ψ*) ] dφ dψ,
where θ(window) is 1 if (φ, ψ) falls into the window around (φ*, ψ*), and 0 otherwise.
In the denominator, log ρ(φ, ψ) can still be computed from the two mean force components ∂log ρ/∂φ and ∂log ρ/∂ψ, but not via a direct integration, as the calculation is overdetermined (one distribution but two derivatives). Instead, we constructed log ρ(φ, ψ) in such a way that its two partial derivatives matched the observed values with a minimal overall deviation, see Appendix E.
The mean forces are computed as configuration averages, in which β is the inverse temperature, F is the force, and the two vector fields are vφ = ∇1φ/(∇1φ · ∇1φ) and vψ = ∇4ψ/(∇4ψ · ∇4ψ). Note that by ∇1φ and ∇4ψ, we mean the gradient components from the atom C′ for φ, and those from the atom N′ for ψ, respectively. The above mean-force formulas are free from both the cross correlation and the two divergences ∇ · vφ and ∇ · vψ, see Appendix F for details.
We dissolved the glycine dipeptide in a 32 × 32 × 32 Å3 TIP3P11 water box, and ran the simulation for 36 ns with a time step of 1 fs. All chemical bonds of the peptide were allowed to vibrate. A double-precision GROMACS 4.512 was used as the simulation engine. The velocity-rescaling method,7 SETTLE,13 and the particle-mesh Ewald (PME) sum14 were used for the thermostat, the constraints in water molecules, and the long-range electrostatic interactions, respectively. Non-bonded interactions were cut off at 7 Å and shifted smoothly to zero at 8 Å. The PME grid spacing was 12 Å. Dihedral data were collected every step using a 1° × 1° bin.
Due to a relatively large mean-force fluctuation, we were only able to use small 4° × 4° windows for the fractional identity, according to Eq. 6 (however, in case a window was empty, we expanded it symmetrically until it included at least one data point). In Fig. 5, we show that the distribution ρ(φ, ψ) from the fractional identity is smoother than that from the normalized histogram. In particular, the barrier regions, e.g., φ ≈ 0° and ψ ≈ ±100°, were enhanced. Additionally, the forbidden band at φ ≈ ±180°, where the histogram was simply empty, was now filled by small but finite values. On the other hand, the peaks at the helical and extended conformations were well preserved. Nonetheless, the overall gain in this case was modest due to the large mean-force fluctuation.
Figure 5.
The joint distribution of the two backbone dihedrals in the glycine dipeptide. (a) The histogram; (b) the fractional identity.
CONCLUSIONS AND DISCUSSION
In conclusion, we presented an identity, Eqs. 1, 5, for estimating a general statistical distribution from data collected in molecular simulations. The new identity has broad applications (e.g., it applies to any variable x and any ensemble, and is easily extended to higher-dimensional distributions) and at the same time offers a robust and precise output.
The general expression for the conjugate force fx, Eq. 5, is also simpler than the conventional −β ∂U/∂x = −β ∇U · ∂rN/∂x15 in that it avoids an inconvenient coordinate transformation in computing ∂rN/∂x, replacing it with a simpler dot product β v · F plus a divergence ∇ · v. Thus, it is straightforwardly applicable, at least in principle, to an arbitrary x = X(rN). Although computing the divergence adds complexity, it can sometimes be simplified or avoided with a careful choice of the vector field v (as illustrated in the dihedral example).
We also showed that the window size should be carefully chosen to maximize the benefit from the identity. An overly wide window risks a large error from the mean-force integration, although it usually yields a smoother distribution.
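The bias half of this trade-off can be made concrete with a toy calculation that does not require the identity at all: even the expected value of a window-averaged density estimate drifts from the true density as the window widens, at a rate set by the curvature of ρ(x). A minimal sketch of our own for a standard normal density (not the paper's data or its Eq. 6):

```python
import math

def expected_window_estimate(x, width):
    """Expected value of (counts in window) / (N * width) for a standard
    normal sample: the density averaged over the window, which is biased
    wherever the density is curved."""
    def cdf(t):
        # standard normal CDF via the error function
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return (cdf(x + 0.5 * width) - cdf(x - 0.5 * width)) / width

true_peak = 1.0 / math.sqrt(2.0 * math.pi)  # rho(0) for N(0, 1)
bias_narrow = abs(expected_window_estimate(0.0, 0.1) - true_peak)
bias_wide = abs(expected_window_estimate(0.0, 1.0) - true_peak)
```

Here bias_wide is roughly a hundred times bias_narrow, since the leading bias grows as the square of the window width, whereas the statistical noise shrinks only as the inverse square root of the counts; the optimum sits in between.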
Finally, we distinguish our method from explicit smoothing methods16 in that it does not assume a smooth distribution. Although the use of the mean force (log ρ)′ implies a differentiable distribution density ρ(x), the mean force itself can be as oscillatory as apparent noise. It is possible that, with some approximations, the method can be further improved by incorporating elements of the explicit smoothing techniques.
ACKNOWLEDGMENTS
We thank Dr. Michael Deem for introducing us to the work of Adib and Jarzynski and for many encouraging and helpful discussions. We also thank Dr. Thomas Woolf, Dr. John Straub, Dr. Michael Shirts, and Dr. Thomas Truskett for helpful comments and discussions as well as the referee for critical comments. J.M. acknowledges support from National Institutes of Health (R01-GM067801), National Science Foundation (NSF) (MCB-0818353), the Welch Foundation (Q-1512), the Welch Chemistry and Biology Collaborative Grant from John S. Dunn Gulf Coast Consortium for Chemical Genomics, and the Faculty Initiatives Fund from Rice University. This work was also supported by the Shared University Grid at Rice funded by NSF under Grant No. EIA-0216467.
APPENDIX A: ALTERNATIVE DERIVATION OF THE FRACTIONAL IDENTITY
We first rephrase the work of Adib and Jarzynski in our context. The presentation is given for a general distribution, and thus formally differs from the original one,1 which focused on a radial distribution; nevertheless, the basic features are similar. First, the value of any function ρ(x) at x = x* can be evaluated as an integral over (x−, x*) as
where ϕ(x) is an arbitrary function subject to two conditions, ϕ(x−) = 0 and ϕ(x*) = 1; on the second line, the difference is converted to an integral over the window.
The domain of integration can be enlarged to a window (x−, x+) that encloses x* by applying the above equation to the two windows (x−, x*) and (x*, x+) (x− < x* < x+) with different ϕ(x)'s, and then linearly combining them, see Fig. 1b,
| (A1) |
where the combined ϕ(x) satisfies
| (A2) |
with ϕ(x*−) and ϕ(x*+), the values of ϕ(x) immediately to the left and right of x*, respectively, serving as the coefficients for combining the two windows. The function ϕ(x) is equivalent to the vector field u(r) in the original paper.1
The problem with Eq. A1 is that it does not always yield a positive output, since the correction from ρ′(x) can accidentally overwhelm the histogram contribution.
We now derive the fractional identity Eq. 1 from Eq. A1. We start from a simple observation: if f(x) is a function that equals unity at x = x*, i.e., f(x*) = 1, then Eq. A1 applies not only to ρ(x) itself, but also to the product ρ(x) f(x), i.e.,
An arbitrary f(x), or equivalently a reference distribution,1, 15 does not guarantee a non-negative output. However, if we choose f(x) such that ρ(x) f(x) is a constant, the second term on the right-hand side of the above equation vanishes, and
| (A3) |
Thus ρ(x*) is nonnegative as long as ϕ′(x) is. The function f(x) is obtained by integrating the distribution mean force from the boundary condition f(x*) = 1 as
It is easily verified that and hence .
Finally, we determine ϕ(x) based on the observation that ϕ′(x) acts as a weight in Eq. A3. Thus, to minimize the statistical error, ϕ′(x) should be inversely proportional to the variance of ρ(x) f(x).9 For a small window, we assume that the error of ρ(x) f(x) comes mainly from the number of visits ρ(x) rather than from the modulation factor f(x). Additionally, we assume the variance of ρ(x) is proportional to ρ(x), i.e., Poisson statistics;9 we thus have Var[ρ(x) f(x)] = Var[ρ(x)] f2(x) ∝ ρ(x) f2(x) ∝ f(x). ϕ′(x) can now be written as C/f(x), where the constant C is determined from Eq. A2 as (the singularity at x = x* is ignored). Solving the equation gives , and Eq. 1 is recovered.
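The inverse-variance rule invoked here is the standard minimum-variance prescription for combining independent unbiased estimates;9 a quick numerical check of the rule itself (a generic illustration with variances of our own choosing, not specific to ϕ′(x)):

```python
import numpy as np

# variance of the convex combination w*A + (1 - w)*B of two independent,
# unbiased estimates A and B with variances var_a and var_b
var_a, var_b = 4.0, 1.0
w = np.linspace(0.0, 1.0, 100001)
var_mix = w ** 2 * var_a + (1.0 - w) ** 2 * var_b

w_best = w[np.argmin(var_mix)]                  # numerical minimizer
w_rule = (1 / var_a) / (1 / var_a + 1 / var_b)  # inverse-variance weight
```

With var_a = 4 and var_b = 1 the optimal weight is 0.2: the noisier estimate is down-weighted in proportion to its inverse variance, just as ϕ′(x) ∝ 1/Var[ρ(x) f(x)] down-weights the noisier parts of the window.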
APPENDIX B: IMPROVING THE MEAN FORCE
We sometimes need a precise estimate of the mean force (log ρ)′(x) itself. If the second-order derivative (log ρ)″(x) is available, one can apply an Adib-Jarzynski-like identity (see Appendix A) as
| (B1) |
where φ(x) satisfies φ(x−) = φ(x+) = 0 and φ(x0 − δ) − φ(x0 + δ) = 1. For simplicity, we use a linear function φ(x) = (x − xb)/(x+ − x−), with xb = x− if x < x0 and xb = x+ otherwise. Note that the window (x−, x+) can differ from that in Eq. 1. An averaging expression for (log ρ)″(x) can be found by taking the derivative of Eq. 5
| (B2) |
where vx denotes the vector field given by Eq. 4 or 4' for the quantity x.
Equations B1, B2 are particularly useful in computing the volume distribution, in which a smooth multiplicative correction term β⟨pc(V)⟩V can be obtained from the mean force (log ρ)′(V) = β(⟨pc(V)⟩V − p) using the above method.

In the following, we list the formulas of vx · ∇fx for the first three examples in Sec. 3. For the potential energy distribution in the canonical ensemble, fU = ∇ · v − β,
where F = −∇U is the molecular force, M ≡ 2F · ∇∇U · F, h = 2∇∇U · F, and the “⋮” denotes the triple dot product between two tensors of order three. For the microcanonical ensemble, we add an additional term to the above formula.
For the volume distribution, v · ∇ → ∂/∂V, fV = N/V − β∂U/∂V, and
where on the second line, we have assumed that the potential energy U is a sum over particle pairs (m, n) of the pair potential u(r), i.e., U = ∑(m, n)u(rmn), with ϕ(r) = u′(r)/r and ψ(r) = ϕ′(r)/r. For the radial distribution function of a pair of particles 1 and 2,
where rjk is the displacement vector from k to j, and rjk = |rjk|.
APPENDIX C: SWITCHED LENNARD-JONES POTENTIAL
The Lennard-Jones potential u(r) = 4ɛ[(σ/r)12 − (σ/r)6] is switched at r = rs to a polynomial that goes to zero at r = rc. For simplicity, we use reduced units, in which both the energy unit ɛ and the diameter σ are 1.0. In the application to the energy distribution, we need the potential and its first three derivatives to be continuous, since the derivative of the dynamic temperature involves up to the third-order derivative of the potential. The continuity at r = rc is guaranteed by the vanishing of the first four polynomial coefficients. To ensure continuity up to the third-order derivative at r = rs, the following parameters are used:
where, Δr = rc − rs, ,, and .
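The construction can also be carried out numerically. Assuming a switch polynomial of the form p(r) = a4(rc − r)4 + a5(rc − r)5 + a6(rc − r)6 + a7(rc − r)7 (a degree-7 ansatz of ours for illustration): the absent k = 0..3 terms make p and its first three derivatives vanish at rc automatically, and matching u through u‴ at rs gives four linear equations for a4..a7:

```python
import numpy as np

def lj_derivs(r, nmax=3):
    """u(r) = 4(r**-12 - r**-6) (reduced units) and its derivatives
    u^(m)(r) for m = 0..nmax."""
    vals = []
    for m in range(nmax + 1):
        c12, c6 = 4.0, -4.0
        for i in range(m):
            c12 *= -12 - i  # each d/dr brings down the current exponent
            c6 *= -6 - i
        vals.append(c12 * r ** (-12 - m) + c6 * r ** (-6 - m))
    return vals

def p_derivs(r, a, rc, nmax=3):
    """The switch polynomial p(r) = sum_k a_k (rc - r)**k, k = 4..7, and
    its derivatives; each d/dr contributes a factor -k on (rc - r)**k."""
    s = rc - r
    vals = []
    for m in range(nmax + 1):
        v = 0.0
        for j, k in enumerate(range(4, 8)):
            ff = 1.0
            for i in range(m):
                ff *= k - i  # falling factorial k!/(k-m)!
            v += a[j] * (-1.0) ** m * ff * s ** (k - m)
        vals.append(v)
    return vals

def switch_coeffs(rs, rc):
    """Solve the 4x4 linear system matching p, p', p'', p''' to the
    Lennard-Jones values at r = rs."""
    A = np.zeros((4, 4))
    s = rc - rs
    for m in range(4):
        for j, k in enumerate(range(4, 8)):
            ff = 1.0
            for i in range(m):
                ff *= k - i
            A[m, j] = (-1.0) ** m * ff * s ** (k - m)
    return np.linalg.solve(A, np.array(lj_derivs(rs)))

rs, rc = 2.0, 2.5  # assumed switch and cutoff radii, for illustration only
a = switch_coeffs(rs, rc)
```

By construction, p_derivs(rs, a, rc) reproduces lj_derivs(rs) and p_derivs(rc, a, rc) vanishes identically, i.e., the switched potential is C3 across both joints.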
APPENDIX D: THE REFERENCE DISTRIBUTIONS
In each of Secs. 3A, 3B, 3C, we prepared both the test and the reference samples from the same trajectory but with different frame-sampling rates. Since the frames in the test sample were a subset of those in the reference one, any sampling inaccuracy, e.g., due to insufficient equilibration or sampling, would be shared by both samples, and thus would not affect the comparison of different methods. We also emphasize that, within either sample, the numbers of sampling points available to the different methods were exactly the same.
We used the fractional identity to produce the reference distributions in Figs. 2–4, which might seem unfair to the alternative identities. However, due to the large size of the reference sample, the reference distributions were insensitive to the choice of method. In Fig. 6, we show that the results from the fractional identity agreed well with the properly normalized histograms, as well as with those from the alternative identities [compare Figs. 2 with 6a, 3 with 6b, and 4 with 6c, 6d]. The latter comparison also shows that the alternative identities are unbiased, although less stable on a smaller sample. Note that we used the WHAM version in producing the reference distribution in Fig. 3a, which further improved the one in Fig. 6a at the edges.
APPENDIX E: POTENTIAL FROM THE TWO-DIMENSIONAL MEAN FORCE
Unlike the one-dimensional case, determining a two-dimensional "potential" u(x, y) ≡ log ρ from the two mean-force components f = ∂u/∂x and g = ∂u/∂y is an overdetermined problem, as the number of unknowns is only half the number of equations. Thus we seek the best fit, as follows.
We assume the potential is defined on a two-dimensional N × M grid with a cell (bin) size δx × δy. For a cell at (n, m), where n = 1, 2, …, N and m = 1, 2, …, M, we wish to minimize the difference between the mean-force value fn, m and the discrete derivative of u at the four cell corners, (un + 1, m + un + 1, m + 1 − un, m − un, m + 1)/(2δx) (and similarly for gn, m). To this end, we minimize the following action S:
At the minimum, we must have ∂S/∂un, m = 0 for every un, m, i.e.,
This set of linear equations can be solved using a Fourier transform. With ( and similarly defined), we have
A final inverse Fourier transform yields the desired potential un, m in the real space.
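Under periodic boundary conditions, which are natural for dihedral angles, the whole procedure reduces to a few lines of FFT algebra: each difference operator becomes a multiplication by its Fourier symbol, and the least-squares normal equations are solved mode by mode. A sketch in our own notation (the corner-averaged difference scheme follows the discretization above; grid sizes and the test potential are arbitrary choices of ours):

```python
import numpy as np

def potential_from_mean_force(f, g, dx, dy):
    """Least-squares potential u whose corner-averaged discrete gradients
    are closest to (f, g) on a periodic N x M grid, solved in Fourier space."""
    N, M = f.shape
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    ex = np.exp(2j * np.pi * np.arange(N) / N)[:, None]  # x-shift symbol
    ey = np.exp(2j * np.pi * np.arange(M) / M)[None, :]  # y-shift symbol
    Dx = (ex - 1.0) * (1.0 + ey) / (2.0 * dx)  # symbol of the d/dx scheme
    Dy = (ey - 1.0) * (1.0 + ex) / (2.0 * dy)  # symbol of the d/dy scheme
    denom = np.abs(Dx) ** 2 + np.abs(Dy) ** 2
    Uhat = np.zeros_like(F)
    ok = denom > 1e-12                # skip the undetermined zero mode(s)
    Uhat[ok] = (np.conj(Dx) * F + np.conj(Dy) * G)[ok] / denom[ok]
    return np.fft.ifft2(Uhat).real   # u is fixed only up to a constant

# demo: recover a known periodic potential from its own discrete gradients
N, M, dx, dy = 32, 24, 0.1, 0.2
i, j = np.arange(N)[:, None], np.arange(M)[None, :]
U_true = np.cos(2 * np.pi * i / N) * np.sin(4 * np.pi * j / M)
Uxp, Uyp = np.roll(U_true, -1, 0), np.roll(U_true, -1, 1)
Uxyp = np.roll(Uxp, -1, 1)
f = (Uxp + Uxyp - U_true - Uyp) / (2 * dx)  # corner-averaged d/dx
g = (Uyp + Uxyp - U_true - Uxp) / (2 * dy)  # corner-averaged d/dy
U_rec = potential_from_mean_force(f, g, dx, dy)
```

U_rec agrees with U_true up to the additive constant that the mean forces cannot determine.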
APPENDIX F: ON THE DIHEDRAL DISTRIBUTION
Here we derive a general mean force formula [analogous to Eq. 5] for the joint dihedral distribution. We start from the definition
Following a similar derivation leading to Eq. 5, we have
where vφ is a vector field satisfying vφ · ∇Φ = 1. We note that the last integral, which is hard to evaluate, vanishes if vφ · ∇Ψ = 0. Thus we shall generalize Eq. 4' to satisfy both vφ · ∇Φ = 1 and vφ · ∇Ψ = 0 as
| (F1) |
where Yφ and Yψ are vector fields such that ∇Φ · Yφ ≠ 0 and ∇Ψ · Yψ ≠ 0. Similarly
| (F2) |
If the two dihedrals are decoupled, ∇Φ · Yψ = ∇Ψ · Yφ = 0, then Eqs. F1, F2 reduce to Eq. 4', i.e., vφ = Yφ/(∇Φ · Yφ) and vψ = Yψ/(∇Ψ · Yψ). This condition is satisfied if Yφ = ∇1Φ and Yψ = ∇4Ψ, i.e., the components from the atom C′ in C′–N–Cα–C for ∇Φ, and those from the atom N′ in N–Cα–C–N′ for ∇Ψ.
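These conditions are easy to verify numerically on a toy five-atom chain (the coordinates below are arbitrary non-degenerate values of our own choosing): with Yφ supported on the first atom of Φ only, vφ · ∇Φ = 1 holds by construction, and vφ · ∇Ψ = 0 because Ψ does not involve that atom at all.

```python
import numpy as np

def dihedral(p):
    """Dihedral of four points from the plane normals m = r12 x r32 and
    n = r32 x r34 (r_jk = r_j - r_k); returns the angle in [0, pi]."""
    r12, r32, r34 = p[0] - p[1], p[2] - p[1], p[2] - p[3]
    m, n = np.cross(r12, r32), np.cross(r32, r34)
    c = np.dot(m, n) / (np.linalg.norm(m) * np.linalg.norm(n))
    return np.arccos(np.clip(c, -1.0, 1.0))

def num_grad(fun, x, h=1e-6):
    """Central-difference gradient of fun with respect to every coordinate."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        e = np.zeros_like(x)
        e[idx] = h
        g[idx] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g

# five-atom chain C'-N-Ca-C-N': Phi uses atoms 0-3, Psi uses atoms 1-4
R = np.array([[1.2, 0.0, -0.3], [0.0, 0.0, 0.0], [0.0, 0.0, 1.5],
              [0.4, 1.0, 1.8], [1.3, 1.1, 2.4]])
grad_phi = num_grad(lambda r: dihedral(r[:4]), R)
grad_psi = num_grad(lambda r: dihedral(r[1:]), R)

# Y_phi lives on the first atom (C') only, so v_phi does too
v_phi = np.zeros_like(R)
v_phi[0] = grad_phi[0] / np.dot(grad_phi[0], grad_phi[0])
```

The field satisfies both conditions: Σ vφ · ∇Φ = 1, and Σ vφ · ∇Ψ = 0 exactly, since grad_psi vanishes on atom 0. The mirror construction on the N′ atom gives vψ.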
We can also show that in this case ∇ · vφ = ∇ · vψ = 0 (i.e., if only the 1, 4 atoms are involved in Yφ and Yψ). Using Φ as an example, we first label the C′, N, Cα, and C atoms by 1, 2, 3, and 4, respectively, and express the cosine of the dihedral as cos Φ = m̂ · n̂, where m̂ = m/m and n̂ = n/n are the unit normal vectors of the planes 1-2-3 and 2-3-4, respectively, with m = r12 × r32, m = |m|, n = r32 × r34, and n = |n|. The components of the gradient are listed below:
| (F3) |
For completeness, we sketch a geometric derivation of Eqs. F3. First, the gradient for particle 1 must be parallel to m̂, since a displacement within the plane 1-2-3 leaves the plane normal, and hence cos Φ, unchanged. The magnitude of the gradient is the inverse of the perpendicular distance from particle 1 to the line connecting particles 2 and 3, i.e., |∇1Φ| = r32/m. Similarly, |∇4Φ| = r32/n, with the direction along n̂. The equation for ∇2Φ can be derived from the fact that the dihedral is invariant under a rotation about any axis h passing through particle 3. By using h = m and n, and solving the equations for ∇2Φ, we reach the second equation. The equation for ∇3Φ follows from the invariance of the dihedral under any translation d, i.e., d · (∇1Φ + ∇2Φ + ∇3Φ + ∇4Φ) = 0, or ∇3Φ = −(∇1Φ + ∇2Φ + ∇4Φ).
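The two geometric claims for particle 1, the direction along the plane normal and the magnitude r32/m, can be confirmed against a brute-force numerical derivative (the test geometry below is an arbitrary non-degenerate choice of ours):

```python
import numpy as np

def dihedral(r1, r2, r3, r4):
    """Angle between the plane normals m = r12 x r32 and n = r32 x r34."""
    m = np.cross(r1 - r2, r3 - r2)
    n = np.cross(r3 - r2, r3 - r4)
    c = np.dot(m, n) / (np.linalg.norm(m) * np.linalg.norm(n))
    return np.arccos(np.clip(c, -1.0, 1.0))

r1, r2 = np.array([1.2, 0.0, -0.3]), np.zeros(3)
r3, r4 = np.array([0.0, 0.0, 1.5]), np.array([0.4, 1.0, 1.8])

# central-difference gradient of the dihedral with respect to particle 1
h, g1 = 1e-6, np.zeros(3)
for i in range(3):
    e = np.zeros(3)
    e[i] = h
    g1[i] = (dihedral(r1 + e, r2, r3, r4)
             - dihedral(r1 - e, r2, r3, r4)) / (2 * h)

m = np.cross(r1 - r2, r3 - r2)
r32 = np.linalg.norm(r3 - r2)
grad_mag = np.linalg.norm(g1)
cos_to_m = np.dot(g1, m) / (grad_mag * np.linalg.norm(m))
```

Here grad_mag matches r32/|m| and |cos_to_m| is 1 (the sign depends on the angle convention), confirming |∇1Φ| = r32/m with the direction along the plane normal.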
Now if Yφ = ∇1Φ, then vφ is parallel to m̂. Thus the gradient flow forms concentric circles around the axis connecting particles 2 and 3. By Gauss's law, we must have ∇ · vφ = 0, since the flow has no source or sink. We have thus reached the formulas given in the main text for the canonical ensemble: (and similarly ).
References
- Adib A. B. and Jarzynski C., J. Chem. Phys. 122, 014114 (2005). 10.1063/1.1829631
- Lyubartsev A. P., Martsinovski A. A., Shevkunov S. V., and Vorontsov-Velyaminov P. N., J. Chem. Phys. 96, 1776 (1992). 10.1063/1.462133; Marinari E. and Parisi G., Europhys. Lett. 19, 451 (1992). 10.1209/0295-5075/19/6/002; Zhang C. and Ma J., Phys. Rev. E 76, 036708 (2007). 10.1103/PhysRevE.76.036708
- Swendsen R. H. and Wang J. S., Phys. Rev. Lett. 57, 2607 (1986). 10.1103/PhysRevLett.57.2607; Geyer C. J., in Proceedings of the 23rd Symposium on the Interface (American Statistical Association, New York, 1991); Hukushima K. and Nemoto K., J. Phys. Soc. Jpn. 65, 1604 (1996). 10.1143/JPSJ.65.1604; Hansmann U. H. E., Chem. Phys. Lett. 281, 140 (1997). 10.1016/S0009-2614(97)01198-6
- Rugh H., Phys. Rev. Lett. 78, 772 (1997). 10.1103/PhysRevLett.78.772
- Ferrenberg A. M. and Swendsen R. H., Phys. Rev. Lett. 61, 2635 (1988). 10.1103/PhysRevLett.61.2635; Ferrenberg A. M. and Swendsen R. H., Phys. Rev. Lett. 63, 1195 (1989). 10.1103/PhysRevLett.63.1195; Chodera J. D., Swope W. C., Pitera J. W., Seok C., and Dill K. A., J. Chem. Theory Comput. 3, 26 (2007). 10.1021/ct0502864; Kim J., Keyes T., and Straub J. E., J. Chem. Phys. 135, 061103 (2011). 10.1063/1.3626150
- Butler B. D., Ayton G., Jepps O. G., and Evans D. J., J. Chem. Phys. 109, 6519 (1998). 10.1063/1.477301; Jepps O. G., Ayton G., and Evans D. J., Phys. Rev. E 62, 4757 (2000). 10.1103/PhysRevE.62.4757; Yan Q. and de Pablo J. J., Phys. Rev. Lett. 90, 035701 (2003). 10.1103/PhysRevLett.90.035701; Braga C. and Travis K. P., J. Chem. Phys. 123, 134101 (2005). 10.1063/1.2013227; Adib A. B., Phys. Rev. E 71, 056128 (2005). 10.1103/PhysRevE.71.056128
- Bussi G., Donadio D., and Parrinello M., J. Chem. Phys. 126, 014101 (2007). 10.1063/1.2408420
- Press W. H., Teukolsky S. A., Vetterling W. T., and Flannery B. P., Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. (Cambridge University Press, Cambridge, 1992).
- Frenkel D. and Smit B., Understanding Molecular Simulation: From Algorithms to Applications, 2nd ed. (Academic, San Diego, 2002).
- Koper G. J. M. and Reiss H., J. Phys. Chem. 100, 422 (1996). 10.1021/jp951819f
- Jorgensen W. L., Chandrasekhar J., Madura J. D., Impey R. W., and Klein M. L., J. Chem. Phys. 79, 926 (1983). 10.1063/1.445869
- Van Der Spoel D., Lindahl E., Hess B., Groenhof G., Mark A. E., and Berendsen H. J. C., J. Comput. Chem. 26, 1701 (2005). 10.1002/jcc.20291
- Miyamoto S. and Kollman P. A., J. Comput. Chem. 13, 952 (1992). 10.1002/jcc.540130805
- Essmann U., Perera L., Berkowitz M. L., Darden T., Lee H., and Pedersen L. G., J. Chem. Phys. 103, 8577 (1995). 10.1063/1.470117
- Basner J. E. and Jarzynski C., J. Phys. Chem. B 112, 12722 (2008). 10.1021/jp803635e
- Berg B. A. and Harris R. C., Comput. Phys. Commun. 179, 443 (2008). 10.1016/j.cpc.2008.03.010; van Zon R. and Schofield J., J. Chem. Phys. 132, 154110 (2010). 10.1063/1.3366523