The Journal of Chemical Physics. 2016 Mar 21;144(11):114112. doi: 10.1063/1.4943868

Multilevel summation with B-spline interpolation for pairwise interactions in molecular dynamics simulations

David J Hardy 1,a), Matthew A Wolff 2, Jianlin Xia 3, Klaus Schulten 1, Robert D Skeel 2
PMCID: PMC4808071  PMID: 27004867

Abstract

The multilevel summation method for calculating electrostatic interactions in molecular dynamics simulations constructs an approximation to a pairwise interaction kernel and its gradient, which can be evaluated at a cost that scales linearly with the number of atoms. The method smoothly splits the kernel into a sum of partial kernels of increasing range and decreasing variability with the longer-range parts interpolated from grids of increasing coarseness. Multilevel summation is especially appropriate in the context of dynamics and minimization, because it can produce continuous gradients. This article explores the use of B-splines to increase the accuracy of the multilevel summation method (for nonperiodic boundaries) without incurring additional computation other than a preprocessing step (whose cost also scales linearly). To obtain accurate results efficiently involves technical difficulties, which are overcome by a novel preprocessing algorithm. Numerical experiments demonstrate that the resulting method offers substantial improvements in accuracy and that its performance is competitive with an implementation of the fast multipole method in general and markedly better for Hamiltonian formulations of molecular dynamics. The improvement is great enough to establish multilevel summation as a serious contender for calculating pairwise interactions in molecular dynamics simulations. In particular, the method appears to be uniquely capable for molecular dynamics in two situations, nonperiodic boundary conditions and massively parallel computation, where the fast Fourier transform employed in the particle–mesh Ewald method falls short.

I. INTRODUCTION

The calculation of pairwise interactions among a large set of particles is vital to many simulations of physical phenomena. This calculation is done either directly or using fast methods. For doing fast N-body calculations, there are two common approaches: The first is to use hierarchical clustering methods (HCMs) such as the fast multipole method (FMM) and tree methods. The second, especially for periodic boundaries, is to use methods based on the fast Fourier transform (FFT), such as the (smooth) particle–mesh Ewald (PME)1 and particle–particle particle–mesh (P3M)2 methods. However, many of these computations can be done more quickly using a relatively obscure algorithm known as the multilevel summation method (MSM).3 The MSM is a simple, yet flexible, linear time algorithm based on multiscale piecewise polynomial interpolation of the interaction kernel and well suited for modern computer architectures, due to its use of moderately large grid stencils. Indeed, multilevel summation might be expected to outperform other methods for important classes of problems and to be a formidable competitor in other situations. In particular, the method appears to be uniquely capable for molecular dynamics in two situations, nonperiodic systems and massively parallel computation, where the FFT falls short.4

The calculation of pairwise interactions and the solution of elliptic partial differential equations are the time-limiting steps of applications that consume vast amounts of CPU cycles. Molecular dynamics (MD), in particular, can require months of computer time; hence, the extraordinary efforts to maximize performance, such as Desmond,5 OpenMM,6 NAMD,7 and GROMACS.8 The work presented here is motivated by problems in computational molecular biophysics. Of course, there are many other applications, including atomic level and coarser-grained simulation of all types of materials, particle methods for fluid dynamics, astrophysics, the Coulomb term in Hartree-Fock and density functional theory, integral transforms, and partial differential equations having explicit solutions involving integrals.9

Though the MSM will benefit many applications, it is uniquely qualified for molecular simulations involving nonperiodic boundary conditions, including solvent boundary potentials10,11 and the modeling of implicit (i.e., continuum) solvent.

The multilevel summation method was introduced for integral transforms in 1990.3 It was later applied to particle monopoles and dipoles in 2D,12 $C^1$ kernels for particles in 3D,13 eigenvalues,14 generalized Born potentials,15 interseismic stress interactions,16 Madelung constants of ionic crystals,17 and dispersion interactions.18 The MSM has been shown to have good parallel scalability compared to PME,19 it is an option in the molecular simulators NAMD4 and LAMMPS,20 and it has been used for a GPU implementation of the electrostatic potential calculation21 in the molecular visualization and analysis program VMD.22 One study21 reports a speedup of 26.4, over the use of a CPU alone, for calculating a map of the electrostatic potential of a $1.5 \times 10^6$ atom system. A recent article23 examines various implementation issues, including error estimation. That article, as well as Ref. 4 on the use of MSM in NAMD and the thesis (Ref. 24, Section 6.2), demonstrates that the MSM handles periodic boundary conditions in 1, 2, or 3 coordinates, including general parallelepipeds, in a straightforward manner.

A concern of previous studies4,13,23 of the multilevel summation method is its accuracy as a function of computational effort. The principal contribution of the present article is to address this shortcoming, by showing how to implement B-splines for nonperiodic boundaries and how their use makes the multilevel summation method an exceptionally efficient algorithm for MD. B-spline approximation has proved effective for the popular algorithm (S)PME1 for Coulomb interactions with periodic boundary conditions. However, for nonperiodic boundary conditions, there is currently no satisfactory algorithm. B-splines have several advantages over the $C^1$ piecewise polynomials used previously: In addition to higher regularity, B-splines provide one order of accuracy higher for the same work and provide nested function spaces for nested grids, making prolongation operations exact. As a consequence of the nesting property, reduced grid extensions are possible. The disadvantage of a B-spline is that it is nonzero at several grid points, which necessitates a preprocessing step to determine coefficients. B-spline coefficients for an interpolant of a given function can be obtained by convolving point values of that function with a special sequence that is derived from the B-spline itself.25 Given in the present article is a complete algorithm, which takes care of three difficulties. The first two are (i) generating the special sequence automatically and (ii) reducing the interpolation in 6 variables of the kernel to interpolation in 3 variables. The third is (iii) reducing the preprocessing time, for those situations where this matters, with minimal loss of accuracy. This is accomplished by quasi-interpolation, which maintains the desired order of accuracy at the expense of collocation (exactly matching values of the given function). Presented here is a novel algorithm that provides a stable way to do quasi-interpolation with arbitrarily small collocation error.
Numerical experiments indicate that the proposed MSM implementation is competitive with a modern C implementation of the FMM,26 and it is several times faster if the fast multipole parameters are chosen to avoid energy drift (which is consistent with previous results13). A secondary contribution of this article is some insight into the basic structure of the MSM and other N-body methods.

A. Specification of task

Considered here are Coulomb interactions in 3 dimensions. Let ri denote the position of particle i, and let qi be its partial charge. The task is to compute sums of the form

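$$U(r_1, r_2, \ldots, r_N) = \frac{1}{2}\sum_{i=1}^{N}\ \sum_{\substack{j=1 \\ j \notin \chi(i)}}^{N} q_i q_j\, k(r_i, r_j), \qquad k(r, r') = \frac{1}{|r - r'|} \tag{1}$$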

as well as derivatives of such sums. Here $\chi(i)$ consists of the indices of those particles that are excluded: $j = i$ and, in the case of molecular dynamics simulations, also values of $j$ corresponding to atoms covalently bonded to atom $i$ or to another atom that is covalently bonded to atom $i$. Generally, there is also a constant prefactor, which is omitted here. Kernels other than $k(r, r') = |r - r'|^{-1}$ are possible, as are extensions to dipoles, etc. It is computationally advantageous to have $k(r, r') = \kappa(r - r')$. (For multilevel summation, it substantially reduces memory requirements.)
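As a reference point for the fast method, the sum in Eq. (1) can be evaluated directly at $O(N^2)$ cost. The following is a minimal sketch, not the implementation used in the article; the function name `coulomb_energy` and the representation of $\chi(i)$ as a list of index sets (with $j = i$ handled separately) are assumptions made for illustration.

```python
import math

def coulomb_energy(positions, charges, excluded):
    """Direct O(N^2) evaluation of Eq. (1) with the constant prefactor omitted.

    positions: list of 3-tuples; charges: list of floats.
    excluded[i]: set of indices j (besides i itself) skipped for particle i
    (hypothetical representation of chi(i)).
    """
    n = len(positions)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if j == i or j in excluded[i]:
                continue
            total += charges[i] * charges[j] / math.dist(positions[i], positions[j])
    # Each unordered pair was visited twice, hence the factor 1/2.
    return 0.5 * total
```

Such a routine is only practical for small $N$, but it is useful for checking the accuracy of an approximate method.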

B. Basics of multilevel summation methods

The idea of the multilevel summation method is to create a multilevel separable approximation of the form

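$$k(r, r') \approx k_0(r, r') + \tilde{k}_{1+}(r, r'), \qquad \tilde{k}_{1+}(r, r') = \sum_{l=1}^{L}\sum_{m}\sum_{n} \varphi_m^l(r)\, K_{m,n}^l\, \varphi_n^l(r'). \tag{2}$$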

The superscripts $l$ index nested grid levels and the subscripts $m$, $n$ index grid points. The finest grid has a grid size $h$ chosen so that it has $O(N)$ grid points. The first term $k_0(r, r')$ is a direct calculation with a cutoff $a$ that is a small multiple of $h$. The basis functions $\varphi_m^l(r)$ have local support. In particular, for any value of $r$, there are at most $p^3$ values of $m$ for which $\varphi_m^l(r)$ is nonzero, where $p - 1$ is the degree of the piecewise polynomial. Also, for any level $l < L$ and any $m$, there are only about $\frac{4}{3}\pi(2a/h)^3$ values of $n$ for which $K_{m,n}^l$ is nonzero, e.g., 3239 if $a/h = 4.6$.

The utility of approximation (2) is realized when applied to sum (1) with a large number of particles N. If the distribution of particles is roughly uniform, as it is for condensed matter, then the cost of the direct calculation using k0 is O(N). The calculation for grid level l can be done as follows:

$$\sum_{m}\sum_{n} K_{m,n}^l\, q_m^l\, q_n^l,$$

where

$$q_m^l = \sum_{i} q_i\, \varphi_m^l(r_i). \tag{3}$$

The number of nonzeros in $\varphi_m^1(r_i)$ for each $i$ is $p^3$, yielding an operation count of $O(N)$ for $q^1$. The two-scale relation for B-splines enables a nested calculation of the $q_m^l$, $l > 1$. It has the form

$$q_m^l = \sum_{n} (I_{l-1}^{l})_{m,n}\, q_n^{l-1}, \tag{4}$$

where the sum over $n$ has $(p + 1)^3$ nonzero terms. There are $O(8^{1-l}N)$ elements in $q^l$, so the operation count for all levels $l$ is $O(N)$. The sum over $m$ in Eq. (3) has $O(8^{1-l}N)$ terms, and the sum over $n$ has about $\frac{4}{3}\pi(2a/h)^3$ terms for levels $l < L$. If $L$ is chosen so that the coarsest grid has about $N^{1/2}$ points, the double sum for $l = L$ will have about $N$ terms. The total operation count is thus $O(N)$. If forces are to be computed, the symmetry of the energy expression cannot be exploited and the computation is not quite as straightforward; see Section III A. Also discussed there is the treatment of excluded interactions.
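The grid charges of Eq. (3) and the nested calculation of Eq. (4) can be illustrated in one dimension with cubic B-splines ($p = 4$). This is a sketch under stated assumptions, not the article's code: the names `cubic_bspline`, `spread_charges`, and `restrict` are hypothetical, grids are stored as dictionaries for brevity, and the restriction weights are taken to be the two-scale coefficients $J_n = 2^{1-p}\binom{p}{p/2+|n|}$ of Section II E.

```python
import math

def cubic_bspline(u):
    """Centered cubic B-spline Phi(u), nonzero only for |u| < 2."""
    u = abs(u)
    if u < 1.0:
        return 2.0 / 3.0 - u * u + 0.5 * u ** 3
    if u < 2.0:
        return (2.0 - u) ** 3 / 6.0
    return 0.0

def spread_charges(xs, qs, h):
    """Eq. (3) in 1D: q_m = sum_i q_i Phi(x_i / h - m), stored as {m: q_m}."""
    grid = {}
    for x, q in zip(xs, qs):
        base = math.floor(x / h)
        for m in range(base - 1, base + 3):  # the p = 4 candidate grid points
            w = cubic_bspline(x / h - m)
            if w != 0.0:
                grid[m] = grid.get(m, 0.0) + q * w
    return grid

# Two-scale coefficients for p = 4 (assumed restriction weights).
J = {-2: 0.125, -1: 0.5, 0: 0.75, 1: 0.5, 2: 0.125}

def restrict(fine):
    """Eq. (4) in 1D: coarse q_m = sum_n J_{n-2m} q_n over fine grid charges."""
    coarse = {}
    for n, q in fine.items():
        for k, w in J.items():
            if (n - k) % 2 == 0:  # n - 2m = k requires n - k to be even
                m = (n - k) // 2
                coarse[m] = coarse.get(m, 0.0) + w * q
    return coarse
```

Because the B-spline spaces are nested, restricting the fine-grid charges gives the same result as spreading the particle charges directly onto the grid of spacing $2h$, and the total charge is preserved at every level.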

C. Outline

Sections II A–II E consider approximation issues. The multilevel approximation, Eq. (2), is based on a splitting of the kernel $k(r, r')$. Section II A shows that the splitting proposed in Ref. 13 is optimal in a certain theoretical sense. Construction of the B-spline coefficients $K_{m,n}^l$ is the topic of Sections II B and II C. Section II D presents theoretical results on the accuracy of B-spline interpolation, as well as an experimental comparison of the accuracy of B-splines to that of $C^1$ piecewise polynomials. Section II E shows how the two-scale relation for B-splines can be used to nest interpolation operations for the different levels, which does not hold exactly for $C^1$ piecewise polynomials.

Sections III A and III B consider algorithmic issues. Section III A gives the structure of the algorithm. Section III B compares the performance of an MSM implementation to that of an FMM.

Section IV A presents a detailed comparison between multilevel summation and alternative methods.

D. Conclusion

The multilevel summation method has two components (a softener for the interaction kernel and an interpolation scheme) and three parameters (grid size, ratio of cutoff to grid size, and order of interpolation). It combines the best features of HCMs and FFT-based 2-level methods, making it a strong candidate as the method of choice for molecular biophysics and structural biology. It shares with hierarchical clustering methods a geometry-based hierarchical structure resulting in calculations that are more parallelizable and have an essentially $O(N)$ operation count. It shares with FFT-based kernel-splitting methods their relative simplicity and the property of computing an interaction kernel having any degree of continuity. The use of B-splines reduces the error of multilevel summation by an order of magnitude compared to previously used $C^1$ interpolants. Coupled with innovative quasi-interpolation techniques, this significantly improves the performance of multilevel summation, as shown by numerical experiments and a partial error analysis. The availability of a fast method for nonperiodic boundaries encourages the development and use of models that do not require periodicity.

II. THEORY

A. Splitting the kernel

Multilevel summation is based on separation of length scales and interpolation from grids for all but the shortest length scale. The separation of scales is effected by splitting the kernel into a sum of partial kernels of increasing range and increasing length scale. In particular, an (L + 1)-level summation method uses

$$k(r, r') = k_0(r, r') + k_1(r, r') + \cdots + k_L(r, r'), \tag{5}$$

where the short-range part $k_0$ is calculated directly, and the other parts are interpolated as functions of $r$, $r'$ from pairs of identical 3-dimensional grids of increasing coarseness. In this way, the problem is reduced to calculating interactions between nearby particles at level 0 and between nearby grid points at higher levels. Typically, the $L + 1$ terms of the split kernel would have ranges $a, 2a, \ldots, 2^{L-1}a, +\infty$, respectively, where $a$ is a cutoff parameter. Because the range and the grid size are both doubling at each level, the number of interactions per grid point is the same at each level. For a kernel depending only on distance, $k(r, r') = g(|r - r'|)$, one can define a splitting

$$g(r) = g_0(r) + g_1(r) + \cdots + g_L(r), \tag{6}$$

as illustrated in Figure 1, and define $k_l(r, r') = g_l(|r - r'|)$. This approach to scale separation is proposed in Ref. 27; the original approach3 is to use a single smoothed kernel and perform scale separation on the (approximate) discretized smoothed kernel.

FIG. 1.

A sextic even-powers splitting of the 1/r kernel as a sum $g(r) = g_0(r) + g_1(r) + g_2(r)$.

The splitting of the 1/r kernel for the MSM is defined by a cutoff distance a and a dimensionless softening function γ(ρ), defined to be 1/ρ for ρ ≥ 1 and to have bounded higher derivatives for ρ ≤ 1. Specific formulas for γ are derived below. Multilevel splitting is neatly expressed as

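$$\frac{1}{\rho} = \gamma_0(\rho) + \frac{1}{2}\gamma_1\!\left(\frac{1}{2}\rho\right) + \cdots + \frac{1}{2^L}\gamma_L\!\left(\frac{1}{2^L}\rho\right),$$

where

$$\gamma_0(\rho) = \frac{1}{\rho} - \gamma(\rho), \qquad \gamma_l(\rho) = 2\gamma(2\rho) - \gamma(\rho),\ \ l = 1, 2, \ldots, L-1, \qquad \gamma_L(\rho) = 2\gamma(2\rho).$$

From this, define

$$g_l(r) = \frac{1}{a_l}\gamma_l\!\left(\frac{r}{a_l}\right), \qquad l = 0, 1, \ldots, L,$$

where $a_l = 2^l a$. Kernels $\gamma_l(\rho)$, $0 \le l \le L - 1$, cut off at $\rho = 1$, and accordingly, the first $L$ kernels $k_0, k_1, \ldots, k_{L-1}$ are zero beyond cutoff distances $a, 2a, \ldots, 2^{L-1}a$, respectively, but $k_L$ retains the infinitely long tail of $1/r$ for $r \ge 2^{L-1}a$. Because the kernels $k_1, k_2, \ldots, k_L$ lack the singularity at zero and have higher derivatives of decreasing magnitude, they can be well approximated on grids of spacing $h, 2h, \ldots, 2^{L-1}h$, respectively, for an appropriate choice of $h$.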

The accuracy of interpolants of order p is known to be dependent on the magnitude of the pth derivative of the function being interpolated. In particular (Ref. 24, Sec. 3.1.2), what matters for accuracy of the interpolant and its gradient is

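$$M_p = \left\| \frac{\partial^p}{\partial u^p}\zeta \right\|$$

and

$$M_p' = \left\| \frac{\partial}{\partial v}\frac{\partial^{p-1}}{\partial u^{p-1}}\zeta \right\|,$$

where

$$\zeta(u, v, w) = \gamma\!\left(\sqrt{u^2 + v^2 + w^2}\right).$$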

This requires $\zeta$ to be $C^{p-1}$. Assuming $p$ is even, it can be shown that $C^{p-1}$ continuity implies that $(d^k/d\rho^k)\gamma(1) = (-1)^k k!$, $k = 0, 1, \ldots, p - 1$, and, by expanding $\gamma(|u|)$ in a Maclaurin series for each of $u \le 0$ and $u \ge 0$, that $(d^k/d\rho^k)\gamma(0) = 0$, $k = 1, 3, \ldots, p - 1$. As a heuristic, minimize $\int_0^1 \left((d^p/d\rho^p)\gamma(\rho)\right)^2 d\rho$ for the function $\gamma(\rho)$. Applying the calculus of variations and integrating by parts yields additional conditions

$$\frac{d^k}{d\rho^k}\gamma(0) = 0, \qquad k = p + 1, p + 3, \ldots, 2p - 1$$

and

$$\frac{d^{2p}}{d\rho^{2p}}\gamma(\rho) \equiv 0.$$

The result is a softener defined in terms of even powers.

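The even-powered softening functions are obtained for $\rho \le 1$ by the Taylor expansion of $\rho^{-1} = s^{-1/2}$ about $s = 1$,

$$s^{-1/2} = 1 - \frac{1}{2}(s - 1) + \frac{3}{8}(s - 1)^2 - \frac{5}{16}(s - 1)^3 + \cdots.$$

Truncate this expansion so that $s^{-1/2} = \tau_p(s) + O((s - 1)^p)$, where $\tau_p(s)$ is a polynomial of degree $p - 1$. Hence, $\Delta(s) = s^{-1/2} - \tau_p(s)$ and its first $p - 1$ derivatives vanish at $s = 1$, as do those of $\Delta(\rho^2) = \rho^{-1} - \tau_p(\rho^2)$. Therefore, the even-powered softening functions defined by

$$\gamma_p(\rho) = \begin{cases} \tau_p(\rho^2), & \text{for } 0 \le \rho \le 1, \\ 1/\rho, & \text{for } \rho \ge 1 \end{cases}$$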

satisfy all the conditions given above. This construction is equivalent to that of Ref. 27, Eq. (13).
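The construction above can be checked numerically. The sketch below builds $\tau_p$ from the binomial series, forms the level kernels $g_l$, and lets one confirm that they sum to $1/r$. The function names (`gamma`, `gamma_level`, `g_level`) are hypothetical, and the code is meant as an illustration rather than the article's implementation.

```python
def gamma(rho, p):
    """Even-powered softening: tau_p(rho^2) for rho < 1, and 1/rho otherwise."""
    if rho >= 1.0:
        return 1.0 / rho
    s = rho * rho
    total, coeff, power = 0.0, 1.0, 1.0
    for k in range(p):                       # terms of degree 0 .. p-1 in (s - 1)
        total += coeff * power
        power *= s - 1.0
        coeff *= -(2 * k + 1) / (2 * k + 2)  # next binomial coefficient of s^(-1/2)
    return total

def gamma_level(l, L, rho, p):
    """gamma_0, gamma_l (1 <= l <= L-1), and gamma_L of the multilevel splitting."""
    if l == 0:
        return 1.0 / rho - gamma(rho, p)
    if l == L:
        return 2.0 * gamma(2.0 * rho, p)
    return 2.0 * gamma(2.0 * rho, p) - gamma(rho, p)

def g_level(l, L, r, a, p):
    """g_l(r) = gamma_l(r / a_l) / a_l with a_l = 2^l a."""
    a_l = 2 ** l * a
    return gamma_level(l, L, r / a_l, p) / a_l
```

For any $r > 0$, summing `g_level(l, L, r, a, p)` over $l = 0, \ldots, L$ reproduces $1/r$ up to rounding, because the sum telescopes regardless of the choice of $\gamma$; the choice of $\gamma$ affects only smoothness.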

B. Spline interpolation

Let f(x) be a bounded function defined for all real x, and consider the problem of interpolating it using splines with knots from a uniform grid xm = mh, m = 0, ± 1, ± 2, …. For basis functions, consider the use of B-splines of fixed degree p − 1 where p is even, which are advantageous due to the minimal number of grid cells on which they are nonzero.

1. B-splines

Construction of a B-spline can be done by means of a recurrence (Ref. 1, Eq. (4.1) and Ref. 28, Thm. 4.3(viii)). Let $Q_1$ be the indicator function for the half-open interval $[0, 1)$. The recurrence defining the B-spline $Q_k$ of degree $k - 1$ is

$$Q_k(u) = \frac{u}{k - 1}Q_{k-1}(u) + \frac{k - u}{k - 1}Q_{k-1}(u - 1). \tag{7}$$

For interpolation, use the centered B-spline $\Phi(u) = Q_p(u + p/2)$ of degree $p - 1$ as an unscaled basis function. It has local support $[-p/2, p/2]$ consisting of just $p$ grid cells along the $u$ axis.

A Taylor expansion for each piece of the B-spline $\Phi(u)$ can be precomputed using the recurrence in Eq. (7) and the relation (Ref. 1, Eq. (4.2) and Ref. 28, Thm. 4.3(vii))

$$\frac{d}{du}Q_k(u) = Q_{k-1}(u) - Q_{k-1}(u - 1).$$
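The recurrence of Eq. (7) and the derivative relation translate directly into code. The following is a minimal recursive sketch with hypothetical function names (`Q`, `dQ`, `Phi`); it evaluates pointwise rather than precomputing the Taylor expansions described above, so it is meant for checking values, not for performance.

```python
def Q(k, u):
    """B-spline of degree k - 1 with support [0, k], via the recurrence of Eq. (7)."""
    if k == 1:
        return 1.0 if 0.0 <= u < 1.0 else 0.0  # indicator of the half-open interval
    return (u * Q(k - 1, u) + (k - u) * Q(k - 1, u - 1.0)) / (k - 1)

def dQ(k, u):
    """Derivative of Q_k from the relation Q_k' = Q_{k-1}(u) - Q_{k-1}(u - 1)."""
    return Q(k - 1, u) - Q(k - 1, u - 1.0)

def Phi(u, p):
    """Centered B-spline of degree p - 1 (p even), supported on [-p/2, p/2]."""
    return Q(p, u + p / 2)
```

For the cubic case, `Phi(0.0, 4)` and `Phi(1.0, 4)` evaluate to 2/3 and 1/6, and the integer translates of `Phi` sum to one at any point.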

2. Interpolation in one dimension

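By design, the interpolant of $f(x)$ has the form

$$\tilde{f}(x) = \sum_{n} \hat{f}_n\, \varphi_n(x),$$

where $\varphi_n(x) = \Phi(x/h - n)$. For any particular value of $x$, only $p$ terms of the sum are nonzero. The coefficients $\hat{f}_n$ are chosen so that $\tilde{f}(x_m) = f(x_m)$ at all grid points $x_m$.

The problem of determining the interpolant $\tilde{f}(x)$ reduces to that of finding the interpolant $\Psi(u)$ that satisfies

$$\Psi(u) = \begin{cases} 1, & u = 0, \\ 0, & u = \pm 1, \pm 2, \ldots, \end{cases}$$

sometimes called a “fundamental” spline. Assuming there is a solution

$$\Psi(u) = \sum_{m} \omega_m\, \Phi(u - m), \tag{8}$$

the interpolant of $f(x)$ is given by

$$\tilde{f}(x) = \sum_{n} f(nh)\, \Psi(x/h - n) = \sum_{m} \hat{f}_m\, \Phi(x/h - m),$$

where

$$\hat{f}_m = \sum_{n} \omega_{m-n}\, f(nh). \tag{9}$$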

It is shown in Schoenberg25 (p. 37) and Chui28 (p. 110) that there exist unique coefficients $\omega_m$ in Eq. (8) if $\Psi(u)$ is required to be bounded. In the limit as the degree $p \to \infty$, $\Psi(u)$ becomes the sinc function $\sin(\pi u)/(\pi u)$. Shown in Figure 2 are plots of the cubic, quintic, and septic fundamental splines and the sinc function.

FIG. 2.

The cubic, quintic, and septic fundamental splines and the sinc function.

Placing this into context, the evaluation of the interpolant requires computation at three levels:

1. Prepreprocessing: compute universal values $\omega_m$; compute Taylor expansions for the $p$ pieces of $\Phi(u)$ using recurrences.

2. Preprocessing: compute coefficients $\hat{f}_m$.

3. Processing: calculate values $\tilde{f}(x)$ (and derivatives) using Horner’s rule.

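To obtain formulas for prepreprocessing, it is convenient to work with discrete operators that act on sequences. With $\hat{f}$ denoting the sequence with terms $\hat{f}_n$ and $f^h$ denoting that with terms $f^h_n = f(nh)$, one can write

$$f^h = B\hat{f},$$

where

$$B = \sum_{n=1-p/2}^{(p/2)-1} \Phi(n)\, E^n$$

and $E$ is the forward shift operator, $(E\hat{f})_n = \hat{f}_{n+1}$. Therefore, it follows from $\hat{f} = B^{-1}f^h$ and Eq. (9), together with the symmetry $\omega_{-n} = \omega_n$, that

$$B^{-1} = \sum_{n} \omega_n\, E^n. \tag{10}$$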

Appendix A provides an algorithm for computing the coefficients ωn.

3. Interpolation for convolution kernels in one dimension

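Consider the interpolation of $F(x, x') = f(x - x')$ from values on a 2-dimensional grid with spacing $h$ using basis functions $\varphi_m(x)\,\varphi_n(x')$,

$$\tilde{F}(x, x') = \sum_{m}\sum_{n} \hat{F}_{mn}\, \varphi_m(x)\, \varphi_n(x').$$

With $\hat{F}$ denoting the sequence with terms $\hat{F}_{mn}$ and $F^h$ denoting that with terms $F^h_{mn} = F(mh, nh)$, one can write

$$F^h = B_1 B_2 \hat{F},$$

where $B_1$ operates on the first index and $B_2$ on the second. Let $f^h$ denote the 1-dimensional sequence with $f^h_k = f(kh)$. To relate $F^h$ to $f^h$, write

$$F^h = T f^h,$$

where $T$ is the operator mapping a 1-dimensional sequence $g$ to a 2-dimensional sequence $Tg$ defined by $(Tg)_{mn} = g_{m-n}$. It is straightforward to show

$$B_1 T g = T B g, \qquad \text{and} \qquad B_2 T g = T B g,$$

where the second equality uses the relation $\Phi(-m) = \Phi(m)$. Therefore, $\hat{F} = B_2^{-1}B_1^{-1}F^h = B_2^{-1}B_1^{-1}Tf^h = B_2^{-1}TB^{-1}f^h = TB^{-2}f^h$, and the B-spline interpolant of $F(x, x') = f(x - x')$ is given by

$$\tilde{F}(x, x') = \sum_{m}\sum_{n} (B^{-2}f^h)_{m-n}\, \varphi_m(x)\, \varphi_n(x'). \tag{11}$$

Coefficients $\omega'_m$ in the expansion

$$B^{-2} = \sum_{n} \omega'_n\, E^n$$

(written with a prime here to distinguish them from the coefficients $\omega_n$ of $B^{-1}$ in Eq. (10)) are given in Table I.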

TABLE I.

Tabulation of the coefficients of $E^m$ in the expansion of $B^{-2}$, by spline degree $p - 1$, for $m = 0, 1, \ldots, 12$, with $\mu = \infty$.

m Cubic Quintic Septic Nonic Undecic
0 3.464 12.379 51.971 241.384 1190.122
1 −1.732 −9.377 −45.671 −225.114 −1140.060
2 0.679 5.809 34.575 189.064 1014.420
3 −0.240 −3.266 −24.022 −147.928 −853.182
4 0.080 1.735 15.825 110.306 688.291
5 −0.026 −0.889 −10.061 −79.525 −538.377
6 0.008 0.444 6.237 55.945 411.423
7 −0.002 −0.217 −3.794 −38.635 −308.820
8 0.001 0.105 2.275 26.301 228.557
9 0.000 −0.050 −1.348 −17.700 −167.246
10 0.000 0.024 0.792 11.801 121.250
11 0.000 −0.011 −0.461 −7.807 −87.226
12 0.000 0.005 0.267 5.131 62.340
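The coefficients of $B^{-1}$ and $B^{-2}$ can be approximated numerically from the Fourier symbol of $B$, namely $b(\theta) = \sum_n \Phi(n)\cos(n\theta)$. The sketch below uses an inverse FFT on a periodic grid of $M$ points; this is a swapped-in technique chosen for brevity and is not the algorithm of Appendix A. The function names and the default $M$ are assumptions, and the periodic wrap-around error is negligible only when the coefficients have decayed well before index $M/2$.

```python
import numpy as np

def Q(k, u):
    """B-spline of degree k - 1 on [0, k], via the recurrence of Eq. (7)."""
    if k == 1:
        return 1.0 if 0.0 <= u < 1.0 else 0.0
    return (u * Q(k - 1, u) + (k - u) * Q(k - 1, u - 1.0)) / (k - 1)

def Phi(u, p):
    """Centered B-spline of degree p - 1."""
    return Q(p, u + p / 2)

def inverse_power_coefficients(p, power, count, M=1024):
    """First `count` coefficients (m = 0, 1, ...) of B^(-power) in powers of E.

    power = 1 gives the omega_m of Eq. (10); power = 2 gives the entries of Table I.
    """
    theta = 2.0 * np.pi * np.arange(M) / M
    symbol = np.zeros(M)
    for n in range(1 - p // 2, p // 2):
        symbol += Phi(float(n), p) * np.cos(n * theta)  # real, even, positive symbol
    coeffs = np.fft.ifft(1.0 / symbol ** power).real
    return coeffs[:count]
```

For the cubic case, `inverse_power_coefficients(4, 2, 4)` agrees with the first four entries of the cubic column of Table I to the three decimals shown, and `inverse_power_coefficients(4, 1, 2)` returns approximately $\sqrt{3}$ and $3 - 2\sqrt{3}$.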

4. Quasi-interpolation

In practice, an expansion of $B^{-2}$ in negative and positive powers of $E$ must be truncated at some point. Due to slow convergence, it is not always possible to amortize the preprocessing cost of obtaining adequate accuracy. Following the suggestion of Chui28 (Section 4.5), the truncation is done in such a way that the property of being exact for polynomials of degree $p - 1$ is preserved, thus maintaining $p$th order accuracy. At the same time, it is desirable to have control on the approximation error at the grid points. The expansion of Chui28 (Eq. (4.5.14)) cannot be used, because the norm of the operator is not bounded uniformly with respect to the number of terms.

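To overcome this limitation, one proceeds as follows: The central difference operator $\delta$ satisfies $\delta^2 = E - 2 + E^{-1}$, and it is not hard to see, using the B-spline symmetry,

$$B = \Phi(0) + \sum_{m=1}^{(p/2)-1} \Phi(m)\left(E^m + E^{-m}\right),$$

that $B$ can be expressed as a polynomial of degree $(p/2) - 1$ in $\delta^2$,

$$B = B_{p/2}(\delta^2).$$

A formula for the coefficients of $B_{p/2}$ is derived in Appendix B from a formula in Ref. 25. As a consequence of the formulation in central differences, $B^{-2}$ can be expressed as

$$B^{-2} = 1 + b_1\delta^2 + \cdots + b_{(p/2)-1}\delta^{p-2} + \delta^p \sum_{m} c_m E^m, \tag{12}$$

where $c_{-m} = c_m$. For example,

$$B^{-2} = \begin{cases} 1 - \frac{1}{3}\delta^2 + O(\delta^4), & p = 4, \\ 1 - \frac{1}{2}\delta^2 + \frac{41}{240}\delta^4 + O(\delta^6), & p = 6. \end{cases}$$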

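The truncation of $B^{-2}$ given by

$$A = 1 + b_1\delta^2 + \cdots + b_{(p/2)-1}\delta^{p-2} + \delta^p \sum_{m=-\mu}^{\mu} c_m E^m$$

is exact for polynomials of degree $\le p - 1$ and has an interpolation accuracy at grid points that is determined by the adjustable parameter $\mu$. For quasi-interpolation, Eq. (11) becomes

$$\tilde{F}(x, x') = \sum_{m}\sum_{n} (A f^h)_{m-n}\, \varphi_m(x)\, \varphi_n(x'). \tag{13}$$

For implementation, one expresses this using $A = \sum_{m=-\mu-p/2}^{\mu+p/2} \omega_{\mu,m} E^m$. Note that $\omega_{\mu,m}$ equals the coefficient of $E^m$ in the full expansion of $B^{-2}$ for $|m| \le \mu - p/2$. Details on computing coefficients for $A$ are provided in Appendix C.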
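For the cubic case ($p = 4$, $b_1 = -1/3$), the truncated operator $A$ can be assembled explicitly. The sketch below is an illustration under stated assumptions and is not the procedure of Appendix C: it uses $B = 1 + \delta^2/6$, whose symbol is $1 - t/6$ with $t = 2 - 2\cos\theta$, from which a short calculation gives the symbol of $\sum_m c_m E^m$ as $(1/12 - t/108)/(1 - t/6)^2$; the $c_m$ are then obtained by an inverse FFT. The function name `quasi_interp_cubic` and the default $M$ are hypothetical.

```python
import numpy as np

def quasi_interp_cubic(mu, M=512):
    """Coefficients omega_{mu,m}, m = -(mu+2) .. mu+2, of A for cubic B-splines.

    Returns an array a with a[mu + 2 + m] = omega_{mu,m}.
    """
    theta = 2.0 * np.pi * np.arange(M) / M
    t = 2.0 - 2.0 * np.cos(theta)                       # symbol of -delta^2
    c_hat = (1.0 / 12.0 - t / 108.0) / (1.0 - t / 6.0) ** 2
    c_full = np.fft.ifft(c_hat).real                    # c_m stored at index m mod M
    c = np.array([c_full[m % M] for m in range(-mu, mu + 1)])
    # delta^4 = E^2 - 4E + 6 - 4E^-1 + E^-2 applied to the truncated sum of c_m E^m
    a = np.convolve([1.0, -4.0, 6.0, -4.0, 1.0], c)
    center = mu + 2
    a[center] += 1.0                                             # the term 1
    a[center - 1:center + 2] += (-1.0 / 3.0) * np.array([1.0, -2.0, 1.0])  # b_1 delta^2
    return a
```

Even for small $\mu$, applying $A$ followed by $B$ twice reproduces any cubic polynomial sequence exactly, because $\delta^4$ annihilates cubics. For large $\mu$ the central coefficients approach the cubic column of Table I.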

Note that (S)PME1 does not perform exact interpolation; rather, it interpolates after applying a low-pass filter (i.e., truncating a Fourier series).

5. Quasi-interpolation for convolution kernels in three dimensions

Interpolation from values on a 3-dimensional grid with spacing $h$ can be represented as a linear combination of basis functions $\varphi_m(r) = \varphi_{m_x}(r_x)\,\varphi_{m_y}(r_y)\,\varphi_{m_z}(r_z)$, where $m$ indexes the points of the grid. Let $\tilde{F}(r, r')$ be a spline (quasi-)interpolant of the kernel $F(r, r') = f(r - r')$ at points $(r, r') = (hm, hn)$ for all integer vectors $m$, $n$. Extending Eq. (13) to 3 dimensions gives

$$\tilde{F}(r, r') = \sum_{m}\sum_{n} (A_x A_y A_z f^h)_{m-n}\, \varphi_m(r)\, \varphi_n(r'), \tag{14}$$

where $(f^h)_k = f(hk)$.

C. Kernel approximations

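Consider now the construction of the approximation given in Eq. (2). This employs nested grids indexed from 1 through $L$ with the $l$th grid having grid size $h_l = 2^{l-1}h$ and basis functions $\varphi_m^l(r) = \varphi_{m_x}^l(r_x)\,\varphi_{m_y}^l(r_y)\,\varphi_{m_z}^l(r_z)$, where $\varphi_n^l(x) = \Phi(x/h_l - n)$. In this section, the grids are taken to be of infinite extent; Section III A determines actual finite limits.

Let $\tilde{k}_l(r, r')$ be a spline (quasi-)interpolant of the kernel $k_l(r, r') = \kappa_l(r - r') = g_l(|r - r'|)$ at points $(r, r') = (h_l m, h_l n)$ for all integer vectors $m$, $n$. Applying Eq. (14) gives

$$\tilde{k}_l(r, r') = \sum_{m}\sum_{n} \hat{k}_{m,n}^l\, \varphi_m^l(r)\, \varphi_n^l(r'), \tag{15}$$

where

$$\hat{k}_{m,n}^l = (A_x A_y A_z \kappa^l)_{m-n} \qquad \text{and} \qquad \kappa_k^l = \kappa_l(h_l k).$$

Note that

$$\kappa_k^l = \kappa_l(h_l k) = g_l(|h_l k|) = \frac{1}{a_l}\gamma_l\!\left(\frac{h_l}{a_l}|k|\right) = \frac{2^{-l}}{a}\gamma_l\!\left(\frac{h}{2a}|k|\right) = \frac{2^{-l}}{a}\gamma_k^l,$$

where

$$\gamma_k^l = \gamma_l\!\left(\frac{h}{2a}|k|\right). \tag{16}$$

Hence,

$$\hat{k}_{m,n}^l = \frac{2^{-l}}{a}K_{m-n}^l,$$

where

$$K^l = A_x A_y A_z \gamma^l. \tag{17}$$

This enables one to write Eq. (15) as

$$\tilde{k}_l(r, r') = \sum_{m}\sum_{n} \frac{2^{-l}}{a}K_{m-n}^l\, \varphi_m^l(r)\, \varphi_n^l(r'). \tag{18}$$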

If the spline degree $p - 1$ exceeds 1, the spline interpolants $\tilde{k}_l(r, r')$, $l = 1, 2, \ldots, L - 1$, have (generally) nonzero coefficients $K_{m-n}^l$ for the entire domain even though the range of $k_l$ is limited. Nonzero values of $\tilde{k}_l(r, r')$ beyond the range are purely interpolation error. So, for example, one might include only those basis functions $\varphi_m^l(r)\,\varphi_n^l(r')$ whose support intersects that of the (finite-range) kernel $k_l(r, r')$. Because the kernel has spherical support and the basis functions have cubic support, it can be shown that the multi-index difference $m - n$ would be included only if $h_l m - h_l n = r_c + r_s$ for some $\|r_c\|_\infty < p h_l$ and $|r_s| < a_l$, where $\|\cdot\|_\infty$ denotes the maximum norm for vectors. Then the difference $h_l m - h_l n$ would lie in a region that is cubic but with rounded edges and corners. However, including all these terms would not only be a bit complicated but also costly (compared to $C^1$ interpolation). Moreover, the more distant pairs of basis functions make very small contributions. As a compromise, in Eq. (18), we use only those $K_{m-n}^l$ for which $\|h_l m - h_l n\|_\infty < a_l$. The resulting cubic stencil is still more costly than the spherical stencil used by $C^1$ interpolation. However, timings for the large water sphere of Section II D 2 show only a small increase in setup time for a cubic rather than a spherical stencil and no increase in the time to do an energy/force calculation. With this choice, the approximation becomes

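$$k_l(r, r') \approx \sum_{m}\sum_{n} \varphi_m^l(r)\, K_{m,n}^l\, \varphi_n^l(r'), \tag{19}$$

where

$$K_{m,n}^l = \begin{cases} (2^{-l}/a)\,K_{m-n}^l, & l = L \ \text{or}\ \|m - n\|_\infty < 2\alpha, \\ 0, & \text{otherwise}, \end{cases} \tag{20}$$

where $\|\cdot\|_\infty$ is the maximum norm and

$$\alpha = a/h.$$

In conclusion, one approximates $k_l(r, r')$ as given by Eq. (19) and substitutes into Eq. (5) to obtain the approximation to $k(r, r')$ given by Eq. (2).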

1. Preprocessing

The grid stencils $K_k^l$ can be precomputed using Eqs. (17) and (16). For $l < L$, as indicated in Eq. (20), values of $K_k^l$ are needed only for $|k| < 2\alpha$ in the maximum norm, and values of $\gamma_k^l$ are nonzero only for $|k| < 2\alpha$. Hence, to compute these $K_k^l$, the coefficients of $E^k$ in the expansion of $B^{-2}$ are needed only for $|k| < 4\alpha$, so using $\mu \ge 4\alpha + p/2$ suffices for exact interpolation. Moreover, for levels $l < L$, $\gamma_k^l$ and therefore $K_k^l$ are independent of $l$.

For level $L$, the range of $k$ depends on the range of the particles. Assume there is a region $\Omega$, e.g., a rectangular box or an ellipsoid, known to contain all particles for every evaluation of the energy and forces. With such a bound, values of $K_{m-n}^L$ are needed only for $m, n \in M_L$, where

$$M_l = \{\, m : \varphi_m^l(r) \ne 0 \ \text{for some}\ r \in \Omega \,\}. \tag{21}$$

Experimental evidence (not shown) suggests using $\mu \ge 3p/2$ for computing $K_k^L$.

The values of the stencils $K_k^l$ are invariant under permutations of the multi-index $k$ and under reflections in the direction of each of the 3 axes, resulting in a 48-fold symmetry, with a commensurate reduction in computational cost.
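The effect of this symmetry on storage can be illustrated by mapping each multi-index to a canonical representative. The sketch below is only a counting exercise with hypothetical function names; it does not compute stencil values.

```python
from itertools import product

def canonical(k):
    """Representative of k under axis reflections and coordinate permutations."""
    return tuple(sorted(abs(c) for c in k))

def unique_stencil_points(radius):
    """Canonical representatives of all k with max-norm at most `radius`."""
    indices = product(range(-radius, radius + 1), repeat=3)
    return {canonical(k) for k in indices}
```

For a cubic stencil of radius 8 (that is, $17^3 = 4913$ points), only 165 distinct values need to be computed. The reduction factor is somewhat less than 48 because points lying on symmetry planes have fewer than 48 images.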

D. Accuracy

There is, in the thesis,24 a rigorous error analysis for multilevel approximation using $C^1$ piecewise polynomials, which provides error bounds in terms of the fundamental method parameters. An important result (Ref. 24, Eq. (3.47)) that transfers to B-splines is that, for the energy, each level contributes an error roughly half that of the previous level. For the force, the factor is one quarter.

The first part of this section compares theoretically the accuracy of different types of piecewise polynomial interpolants. A more complete error analysis, though desirable, is beyond the scope of the present study. The remainder of this section presents results of numerical experiments.

1. Theoretical evidence

A simple examination of results from error analysis for 1 dimension indicates that, for the same computational effort, the accuracy of B-spline interpolation is

1. comparable to centered $C^0$ interpolation (used in Ref. 27) and

2. superior to Taylor interpolation (used by the fast multipole method) or Hermite interpolation (investigated in Ref. 24),

if the computational effort of evaluating a derivative is assumed to be the same as that for the function. However, the second observation does not necessarily transfer to a six-dimensional interpolation of the kernel.

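A routine calculation using classical results shows that centered $C^0$ piecewise polynomial interpolation of degree $p - 1$ has the error bound

$$\frac{1 \cdot 3 \cdots (p - 1)}{2 \cdot 4 \cdots p}\left(\frac{h}{2}\right)^{p} \|f^{(p)}\|_\infty.$$

For $(p/2)$-fold piecewise Hermite interpolation and for piecewise Taylor interpolation, a good error bound for spacing $h$ is

$$\frac{1}{p!}\left(\frac{h}{2}\right)^{p} \|f^{(p)}\|_\infty.$$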

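For B-spline interpolation, M. Reimer29 (Eqs. (12), (15), (20)) provides the error bound

$$\left(\frac{1 \cdot 3 \cdots (p - 1)}{2 \cdot 4 \cdots p} + \|L\|\right)\left(\frac{h}{2}\right)^{p} \|f^{(p)}\|_\infty,$$

where $L$ is the linear operator mapping a function $f$ to its interpolation error $\tilde{f} - f$. Asymptotically,30,31

$$\|L\| = \frac{2}{\pi}\left(\log p + 2\log\frac{4}{\pi} + \gamma\right) + O\!\left(\frac{1}{p}\right) \quad \text{as } p \to \infty,$$

where $\gamma$ is the Euler-Mascheroni constant.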

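Let $h_{\mathrm{eff}}$ denote the “effective” spacing: For 2-point piecewise Hermite interpolation of derivatives of order 0 through $(p/2) - 1$, choose spacing $h = (p/2)\,h_{\mathrm{eff}}$ to keep the work the same, and for piecewise Taylor interpolation, choose spacing $h = p\,h_{\mathrm{eff}}$ to keep the work the same. Otherwise, take $h = h_{\mathrm{eff}}$. Using Stirling’s formula, the error term is $(C_p h_{\mathrm{eff}}/2)^p \|f^{(p)}\|_\infty$, where

$$C_p = \begin{cases} 1 - \dfrac{1}{2p}\log p + O\!\left(\dfrac{1}{p}\right), & \text{centered}, \\[2mm] 1 + \left(\dfrac{4}{\pi} - 1\right)\dfrac{1}{2p}\log p + O\!\left(\dfrac{1}{p}\right), & \text{B-spline}, \\[2mm] \dfrac{1}{2}e\left(1 - \dfrac{1}{2p}\log p + O\!\left(\dfrac{1}{p}\right)\right), & \text{Hermite}, \\[2mm] e\left(1 - \dfrac{1}{2p}\log p + O\!\left(\dfrac{1}{p}\right)\right), & \text{Taylor}. \end{cases}$$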

2. Empirical evidence

Results of numerical experiments are presented that compare the accuracy of B-spline to $C^1$ piecewise polynomial basis functions for various cutoffs $a$ and that examine their order of accuracy. The experiments use $C^{p-2}$ and $C^{p-1}$ softening functions with the $C^1$ piecewise polynomials and B-splines, respectively.

Specifically, these computations determine the accuracy of forces for an equilibrated sphere of 10 002 water molecules with radius 42 Å. They calculate error in mass-weighted long-range forces relative to the exact long-range forces, where the mass-weighted norm $m_i^{-1}|F_i^{1+}|^2$ is applied to each long-range force $F_i^{1+}$. Dividing by mass gives mass-weighted acceleration, which has the effect of ascribing greater importance to positions of heavy atoms compared to hydrogen. (It is common to use mass-weighted coordinates $\bar{x} = M^{1/2}x$ in computing the RMSD between two structures.32) Because interpolation is applied to only the softened kernel $k_{1+} = k_1 + \cdots + k_L$, the calculations compare errors relative to forces arising from the softened kernel, using $k_{1+}(r, r')$ in place of $k(r, r')$ for energy and forces. A good value for the grid spacing $h$ is 2.5 Å, for which there are approximately $N$ grid points at the finest level (see Section III B 1).

Figure 3 exhibits the accuracy of the long-range parts of forces for B-spline and $C^1$ basis functions for a range of $\alpha = a/h$, for each of $p = 4, 6, 8, 10$. The cutoff $a$ ranges through values 5, 6, …, 20 Å, i.e., $2 \le a/h \le 8$. One sees for $p \le 8$ that the B-spline interpolants are an order of magnitude more accurate for equal computational effort. For $p = 10$, accuracy is significantly improved by extending the “radius” of the grid-to-grid stencils $K_k^l$ to $|k| < 2\alpha + 1$. Also plotted on top of each graph of numerical data is a straight line indicating the inferred theoretical slope as $\alpha \to \infty$. Two features of these slopes require explanation. (i) For basis functions of degree $p - 1$, it is observed that the order is $p - 1$ for $C^1$ piecewise polynomials and $p$ for B-splines. These orders of accuracy in the gradients are one greater than what is expected from typical theoretical considerations.24 This can be explained by the fact that interpolation error vanishes at grid points, and as a consequence, the error in the first derivative changes sign within each grid cell. A sum of many such errors accumulates slowly due to cancellation. (ii) There is a gradual steepening of the slope (the observed order of accuracy) of the graph for the $C^1$ interpolant, which is expected. However, the opposite is happening for the B-splines. The initial excess error (which is greater for larger $p$) for B-splines is a consequence of the truncation of the stencil specified by Eq. (20). This conclusion is confirmed by increasing the width of the stencil by 50%.

FIG. 3.

Comparison between B-spline and $C^1$ piecewise polynomial basis functions for relative error in mass-weighted force vs. scaled cutoff α = a/h. Results are given for piecewise polynomials of degrees 3, 5, 7, and 9.

E. Nesting of spline function spaces

A useful property is that of nested interpolation, in which the basis functions of a coarser grid are interpolated exactly at the next finer level. This holds for B-spline interpolation but not for $C^1$ piecewise polynomial interpolation.

A coarse-grid B-spline can be expressed in terms of fine-grid B-splines using the two-scale relation

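$$\Phi(u) = \sum_{n=-p/2}^{p/2} J_n\, \Phi(2u - n),$$

where

$$J_n = 2^{1-p} \binom{p}{(p/2) + |n|}. \tag{22}$$

From this relation, one gets $\varphi_m^l(x) = \sum_n J_n\, \varphi_{n+2m}^{l-1}(x) = \sum_n J_{n-2m}\, \varphi_n^{l-1}(x)$. (Note that $J_n = 0$ for |n| > p/2.) In three dimensions, this becomes

$$\varphi_{\mathbf{m}}^l(\mathbf{r}) = \sum_{\mathbf{n}} J_{\mathbf{n}-2\mathbf{m}}\, \varphi_{\mathbf{n}}^{l-1}(\mathbf{r}),$$

where

$$J_{\mathbf{n}} = J_{n_x} J_{n_y} J_{n_z}. \tag{23}$$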

Plotted in Figure 4(a) is a coarse-grid cubic B-spline and the 5 fine-grid cubic B-splines that sum up to it.
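The two-scale relation is easy to check numerically. The sketch below evaluates the centered B-spline from the standard truncated-power formula (a choice made here for brevity, not necessarily how the paper evaluates it) and compares both sides of the relation for the cubic case p = 4. The function names are hypothetical.

```python
from math import comb, factorial

def bspline_Q(p, u):
    """Cardinal B-spline of order p (degree p - 1) supported on [0, p]."""
    total = 0.0
    for k in range(p + 1):
        if u > k:
            total += (-1) ** k * comb(p, k) * (u - k) ** (p - 1)
    return total / factorial(p - 1)

def Phi(p, u):
    """Centered B-spline: Phi(u) = Q_p(u + p/2)."""
    return bspline_Q(p, u + p / 2)

def two_scale_coeffs(p):
    """J_n = 2^(1-p) * binom(p, p/2 + |n|) for n = -p/2, ..., p/2."""
    half = p // 2
    return {n: comb(p, half + abs(n)) / 2 ** (p - 1) for n in range(-half, half + 1)}

p = 4
J = two_scale_coeffs(p)
# Largest mismatch between Phi(u) and sum_n J_n Phi(2u - n) on a sample of points.
max_defect = max(
    abs(Phi(p, u) - sum(Jn * Phi(p, 2 * u - n) for n, Jn in J.items()))
    for u in [i / 16 - 2.5 for i in range(81)]
)
print(J, max_defect)
```

For p = 4 the coefficients are 1/8, 4/8, 6/8, 4/8, 1/8, which are the weights of the 5 fine-grid cubic B-splines that sum to the coarse-grid one.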

FIG. 4.

(a) A coarse-grid cubic B-spline and 5 fine-grid cubic B-splines that sum up to it. (b) A coarse-grid $C^1$ piecewise cubic and 5 fine-grid $C^1$ piecewise cubics that interpolate it, but do not sum up exactly.

The $C^1$ piecewise polynomials used previously13,24 do not span the space of $C^1$ piecewise polynomials with equidistant knots. For example, the $C^1$ piecewise cubic g(u) with knots at the integers and values g(0) = 1, g′(0) = 0 and g(u) ≡ 0 for |u| ≥ 1 cannot be expressed as a linear combination of the $C^1$ piecewise cubic nodal basis functions. Moreover, nesting does not hold exactly for $C^1$ (nor $C^0$) piecewise polynomials. Consider the case of piecewise cubics: A 2h-grid $C^1$ piecewise cubic nodal basis function has support [−4h, 4h]. It cannot be exactly duplicated by h-grid basis functions at −2h, −h, 0, h, 2h. It can be interpolated, with some error, by h-grid basis functions at −3h, −h, 0, h, 3h, whose combined support is now [−5h, 5h]. To accommodate the expansion in support, $C^1$ piecewise polynomials require an extension of an additional (p/2) − 1 grid points (Ref. 24, Section 6.1.1). Plotted in Figure 4(b) are a coarse-grid $C^1$ piecewise cubic and 5 fine-grid $C^1$ piecewise cubics that interpolate it, but do not sum up exactly.

III. ALGORITHM AND PERFORMANCE

A. The algorithm

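Eqs. (1) and (2) approximate the electrostatic energy as $U(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) \approx U^0 + U^{1+}$ where

$$U^0 = \frac{1}{2}\sum_i \sum_{j \notin \chi(i)} q_i q_j\, k_0(\mathbf{r}_i, \mathbf{r}_j) - \frac{1}{2}\sum_i \sum_{j \in \chi(i)} q_i q_j\, k_{1+}(\mathbf{r}_i, \mathbf{r}_j)$$

and

$$U^{1+} = \frac{1}{2}\sum_i \sum_j q_i q_j\, \tilde{k}_{1+}(\mathbf{r}_i, \mathbf{r}_j). \tag{24}$$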

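Substituting Eq. (2) into Eq. (24) gives

$$U^{1+} = \frac{1}{2}\sum_i q_i \sum_l \sum_{\mathbf{m}} \varphi_{\mathbf{m}}^l(\mathbf{r}_i) \sum_{\mathbf{n}} K_{\mathbf{m},\mathbf{n}}^l \sum_j \varphi_{\mathbf{n}}^l(\mathbf{r}_j)\, q_j.$$

Taking the negative gradient with respect to $\mathbf{r}_i$ gives the ith long-range force

$$\mathbf{F}_i^{1+} = -q_i \sum_l \sum_{\mathbf{m}} \nabla\varphi_{\mathbf{m}}^l(\mathbf{r}_i) \sum_{\mathbf{n}} K_{\mathbf{m},\mathbf{n}}^l \sum_j \varphi_{\mathbf{n}}^l(\mathbf{r}_j)\, q_j.$$

Letting

$$q_{\mathbf{n}}^l = \sum_j \varphi_{\mathbf{n}}^l(\mathbf{r}_j)\, q_j \tag{25}$$

and

$$E(\mathbf{r}) = \sum_l \sum_{\mathbf{m}} \varphi_{\mathbf{m}}^l(\mathbf{r})\, e_{\mathbf{m}}^l, \tag{26}$$

where

$$e_{\mathbf{m}}^l = \sum_{\mathbf{n}} K_{\mathbf{m},\mathbf{n}}^l\, q_{\mathbf{n}}^l$$

gives

$$U^{1+} = \frac{1}{2}\sum_i q_i\, E(\mathbf{r}_i) \quad \text{and} \quad \mathbf{F}_i^{1+} = -q_i \nabla E(\mathbf{r}_i). \tag{27}$$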

An algorithm for computing the energy and forces from these expressions is presented and derived in a high level form and later described in greater detail.

1. Overview of algorithm

The structure of the algorithm is represented by Figure 5. Different levels correspond to different sets of points, particle positions at the lowest level and grid points at higher levels. Small circles on the left represent charges and small circles on the right represent electric potentials that accumulate as one descends. Arrows are linear operators, or matrices, that map one set of values to another. The confluence of two arrows indicates addition of a longer-range contribution to a given grid level or to the particle level. There are six different types of linear operators. Three of them depend on particle positions, for which the matrix elements are computed as needed rather than stored: short-range interactions, interpolation, anterpolation. The other three types of linear operators are convolutions whose stencils can be precomputed: grid interactions, prolongation, and restriction.

FIG. 5.

Diagram of algorithmic steps.

The various steps are the following:

  • 1.

    Short-range interactions. Compute U0 and its gradients.

  • 2.
    Anterpolation. Compute level-1 charges $q^1$ using
    $$q_{\mathbf{n}}^1 = \sum_i \varphi_{\mathbf{n}}^1(\mathbf{r}_i)\, q_i. \tag{28}$$
    This mapping of particle charges to charges at grid points is not interpolation but rather the adjoint of interpolation. It is not the charges that are being interpolated, but their long-range effect (via interpolation of the interaction kernel).
  • 3.
    Restriction. Compute higher level charges using Eq. (4),
    $$q^l = I_{l-1}^l\, q^{l-1}, \quad l = 2, 3, \ldots, L, \tag{29}$$
    where the restriction operator $I_{l-1}^l$ is defined by
    $$(I_{l-1}^l)_{\mathbf{m},\mathbf{n}} = J_{\mathbf{n}-2\mathbf{m}}. \tag{30}$$
  • 4.
    Grid to grid mapping and prolongation. Compute accumulated electric potentials for each grid level using the recurrence
    $$e^{L+} = K^L q^L, \tag{31}$$
    $$e^{l+} = K^l q^l + I_{l+1}^l\, e^{(l+1)+}, \quad l = L-1, L-2, \ldots, 1, \tag{32}$$
    where the prolongation operator $I_{l+1}^l = (I_l^{l+1})^T$.
  • 5.
    Interpolation. Interpolating grid values yields the electric potential
    $$E(\mathbf{r}) = \sum_{\mathbf{m}} \varphi_{\mathbf{m}}^1(\mathbf{r})\, e_{\mathbf{m}}^{1+}, \tag{33}$$
    which is used in formulas (27) for computing energy and forces.
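Steps 3 and 4 above can be sketched on a one-dimensional grid hierarchy using dense matrices. This is an illustration of the recurrences (29), (31), and (32) only: the grid sizes, the charges, and in particular the matrices `K` are made-up placeholders (the actual stencils come from Eq. (20)), and all function names are hypothetical. The final check uses the fact that prolongation is the transpose of restriction, so the energy computed from the accumulated level-1 potential equals the sum of the per-level energies.

```python
from math import comb

def restriction_matrix(n_coarse, n_fine, p):
    """(I)_{m,n} = J_{n-2m}, with J_k = 2^(1-p) * binom(p, p/2 + |k|)."""
    half = p // 2
    I = [[0.0] * n_fine for _ in range(n_coarse)]
    for m in range(n_coarse):
        for n in range(n_fine):
            k = n - 2 * m
            if abs(k) <= half:
                I[m][n] = comb(p, half + abs(k)) / 2 ** (p - 1)
    return I

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def grid_potentials(q1, K, R):
    """K[0] acts on level 1, R[0] maps level 1 to level 2, and so on."""
    q = [q1]
    for Rl in R:                          # restriction, Eq. (29)
        q.append(matvec(Rl, q[-1]))
    e = matvec(K[-1], q[-1])              # top level, Eq. (31)
    for l in range(len(R) - 1, -1, -1):   # descend through the levels, Eq. (32)
        prolonged = matvec(transpose(R[l]), e)
        e = [a + b for a, b in zip(matvec(K[l], q[l]), prolonged)]
    return q, e

p = 4
R = [restriction_matrix(5, 9, p), restriction_matrix(3, 5, p)]
# Placeholder symmetric "kernels", one per level (NOT the stencils of Eq. (20)).
K = [[[1.0 / (2 ** l * (1 + abs(i - j))) for j in range(n)] for i in range(n)]
     for l, n in enumerate((9, 5, 3))]
q1 = [0.0, 1.0, -0.5, 0.0, 2.0, 0.0, -1.5, 0.25, 0.0]
q, e1 = grid_potentials(q1, K, R)
energy = 0.5 * sum(a * b for a, b in zip(q1, e1))
per_level = 0.5 * sum(sum(a * b for a, b in zip(ql, matvec(Kl, ql)))
                      for ql, Kl in zip(q, K))
print(energy, per_level)
```

A production code would apply these operators as stencils rather than forming matrices; the dense form is used here only to make the transpose relation explicit.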

The matrices representing these linear operators are all sparse, except for the operator $K^L$ representing interactions between all pairs of grid points on the top level grid. The matrices for $K^l$, 1 ≤ l ≤ L − 1, each have $(2\lfloor 2a/h \rfloor + 1)^3$ nonzero elements per row. The matrix that represents interactions $k_0$ at the particle level has about $\frac{4}{3}\pi (a/h^*)^3$ nonzero elements per row, assuming a particle density of $(h^*)^{-3}$. The matrices for $I_l^{l+1}$, 1 ≤ l ≤ L − 1, each have no more than $(p+1)^3$ nonzero elements along each row and column. The matrices implicit in Eqs. (33)/(27), representing interpolation, each have no more than $(p+1)^3$ nonzero elements along each row. Pseudocode for these matrix–vector multiplications is given in Ref. 24, Section 2.3.

a. Exact treatment of exclusions.

The foregoing algorithm interpolates the long-range part of excluded interactions, thereby introducing interpolation error $\tilde{k}_{1+}(\mathbf{r}_i, \mathbf{r}_j) - k_{1+}(\mathbf{r}_i, \mathbf{r}_j)$ for nonexistent terms, $j \in \chi(i)$. The algorithm can be augmented to remove the interpolation error of these excluded interactions by doing an additional $O(p^3 N L) = O(p^3 N \log N)$ operations. However, two independent implementations of this augmented algorithm yield results for molecular systems that are less accurate. Presumably, there is a fortuitous cancellation of errors of excluded terms with those of included terms. The numerical experiments performed in this study do not correct the interpolation error due to excluded interactions.

b. Derivation of grid to grid operations.

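Given here are derivations for Eqs. (29)–(33). To show Eq. (29), use Eq. (30) to write the two-scale relation (23) as

$$\varphi_{\mathbf{m}}^l(\mathbf{r}) = \sum_{\mathbf{n}} (I_{l-1}^l)_{\mathbf{m},\mathbf{n}}\, \varphi_{\mathbf{n}}^{l-1}(\mathbf{r}). \tag{34}$$

Application of this to Eq. (25) yields the recurrence (4). The next step is to exploit the nesting property to express $E(\mathbf{r})$ in terms of the level 1 basis functions. Applying the two-scale relation to the last two terms of $E(\mathbf{r})$ given by Eq. (26) yields

$$\sum_{\mathbf{m}} \varphi_{\mathbf{m}}^{L-1}(\mathbf{r})\, e_{\mathbf{m}}^{L-1} + \sum_{\mathbf{m}} \varphi_{\mathbf{m}}^{L}(\mathbf{r})\, e_{\mathbf{m}}^{L} = \sum_{\mathbf{m}} \varphi_{\mathbf{m}}^{L-1}(\mathbf{r}) \left(e^{L-1} + I_L^{L-1} e^L\right)_{\mathbf{m}} = \sum_{\mathbf{m}} \varphi_{\mathbf{m}}^{L-1}(\mathbf{r})\, e_{\mathbf{m}}^{(L-1)+}, \tag{35}$$

where $e^{(L-1)+} = e^{L-1} + I_L^{L-1} e^L$. Repeated use of this transformation yields Eq. (33) with $e^{1+}$ obtained via the recurrence

$$e^{L+} = e^L, \qquad e^{l+} = e^l + I_{l+1}^l\, e^{(l+1)+}, \quad l = L-1, L-2, \ldots, 1.$$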

2. Detailed version of algorithm

At the beginning of the algorithm, determine the index sets $M_l$ defined by Eq. (21).

a. Short-range interactions.

The short-range calculation involves looping over all pairs of particles within a given distance a of each other. Avoiding pairs whose separation distance is much beyond the cutoff a is achieved by the spatial hashing of atoms into bins and considering for each atom just those atoms in the surrounding bins.
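The spatial hashing described above can be sketched as a simple cell list. This is a minimal illustration under the assumption that the bin edge equals the cutoff a, so that every pair within the cutoff lies in the same or an adjacent bin; the function name and the random test positions are hypothetical.

```python
import itertools
import random
from collections import defaultdict

def pairs_within_cutoff(positions, a):
    """Return the set of index pairs (i, j), i < j, with separation < a."""
    bins = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        bins[(int(x // a), int(y // a), int(z // a))].append(i)
    pairs = set()
    for (bx, by, bz), members in bins.items():
        # Visit this bin and its 26 neighbors; .get() avoids creating empty bins.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            for j in bins.get((bx + dx, by + dy, bz + dz), ()):
                for i in members:
                    if i < j:
                        d2 = sum((p - q) ** 2
                                 for p, q in zip(positions[i], positions[j]))
                        if d2 < a * a:
                            pairs.add((i, j))
    return pairs

random.seed(0)
pts = [(random.uniform(0, 30), random.uniform(0, 30), random.uniform(0, 30))
       for _ in range(200)]
found = pairs_within_cutoff(pts, 7.0)
print(len(found), "pairs within the cutoff")
```

Only atoms in 27 bins are examined per atom, so the cost grows linearly with the number of atoms at fixed density, in contrast to the quadratic cost of testing all pairs.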

b. Anterpolation.

The anterpolation step loops over all particles, spreading the charge of each particle onto $p^3$ surrounding grid points. It computes level 1 charges $q_{\mathbf{n}}^1$, $\mathbf{n} \in M_1$, using Eq. (28): for each particle i, it adds nonzero $\varphi_{\mathbf{n}}^1(\mathbf{r}_i)\, q_i$ to $q_{\mathbf{n}}^1$.

The evaluation of the B-spline basis functions is required for both this step and the interpolation step, with the latter also requiring a gradient evaluation. For each particle i, one expresses $\mathbf{r}_i = (\mathbf{m} + \mathbf{u})\, h_1$ where $\mathbf{m}$ is an integer triple and $0 \le u_\kappa < 1$ for κ = x, y, z. Values of $\varphi_{\mathbf{m}+\mathbf{k}}^1(\mathbf{r}_i)$ and $\nabla\varphi_{\mathbf{m}+\mathbf{k}}^1(\mathbf{r}_i)$ are nonzero only for $1 - p/2 \le \mathbf{k} \le p/2$ (componentwise). These values are obtained from

$$\varphi_{m_\kappa + k}^1(\mathbf{r}_i) = \Phi(u_\kappa - k) = Q_p(u_\kappa + (p/2) - k), \quad k = 1 - p/2,\ 2 - p/2,\ \ldots,\ p/2,$$

and its first derivative. To get B-spline values and first derivatives, apply Horner’s rule to each piece of the B-spline.
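The weight evaluation for one particle can be sketched in one dimension as follows. For brevity the sketch uses the truncated-power formula for $Q_p$ and the identity $Q_p'(u) = Q_{p-1}(u) - Q_{p-1}(u-1)$ instead of Horner's rule on each polynomial piece; the function names and the sample position are hypothetical. In three dimensions the weights are products of three such one-dimensional weights.

```python
from math import comb, factorial, floor

def bspline_Q(p, u):
    """Cardinal B-spline of order p (degree p - 1) with support [0, p]."""
    total = 0.0
    for k in range(p + 1):
        if u > k:
            total += (-1) ** k * comb(p, k) * (u - k) ** (p - 1)
    return total / factorial(p - 1)

def bspline_dQ(p, u):
    """First derivative: Q_p'(u) = Q_{p-1}(u) - Q_{p-1}(u - 1)."""
    return bspline_Q(p - 1, u) - bspline_Q(p - 1, u - 1)

def weights_1d(x, h, p):
    """Grid nodes, weights, and weight derivatives for a particle at x."""
    m = floor(x / h)
    u = x / h - m                      # 0 <= u < 1
    half = p // 2
    nodes, w, dw = [], [], []
    for k in range(1 - half, half + 1):
        nodes.append(m + k)
        w.append(bspline_Q(p, u + half - k))
        dw.append(bspline_dQ(p, u + half - k) / h)   # chain rule: d/dx = (1/h) d/du
    return nodes, w, dw

x, h, p = 3.7, 2.5, 4
nodes, w, dw = weights_1d(x, h, p)
print(nodes, w, dw)
```

The p weights sum to one and their derivatives sum to zero, so anterpolation places exactly the particle's charge on the grid.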

c. Restriction.

The restriction step loops over the coarser of two consecutive grids, collecting charge for each point on the coarser grid from nearby points on the finer grid using a fixed stencil of coefficients. Combining Eqs. (29) and (30) gives the following: for l = 2, 3, …, L, compute higher level grid charges using

$$q_{\mathbf{m}}^l = \sum_{\mathbf{n}} J_{\mathbf{n}-2\mathbf{m}}\, q_{\mathbf{n}}^{l-1}, \quad \text{for } \mathbf{m} \in M_l,$$

where the sum is over all $\mathbf{n} \in M_{l-1}$ such that $-p/2 \le \mathbf{n} - 2\mathbf{m} \le p/2$ (componentwise) and where $J_k$ is defined in Eq. (22). The computation is linear in the order p if the collecting is done in one coordinate direction at a time. The calculation involves the use of two intermediate grids with grid spacings that are doubled in one or two directions. The algorithm given in Ref. 24, Table 2.10 shows how to reduce the required intermediate buffer space to just $O(N^{2/3})$.
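The saving from working one coordinate direction at a time can be illustrated as follows. This sketch stores grids as dictionaries, scatters from fine points rather than gathering at coarse points, and ignores the index sets $M_l$ (the coarse grid is unbounded); these simplifications and all names are assumptions made for the example. The three one-dimensional passes touch p + 1 coefficients per point each, whereas the direct form touches up to $(p+1)^3$.

```python
import random
from collections import defaultdict
from math import comb

def two_scale_coeffs(p):
    half = p // 2
    return {k: comb(p, half + abs(k)) / 2 ** (p - 1) for k in range(-half, half + 1)}

def restrict_axis(grid, J, axis):
    """Halve the resolution along one axis: coarse m receives J_{n-2m} * q_n."""
    out = defaultdict(float)
    for idx, val in grid.items():
        n = idx[axis]
        for k, Jk in J.items():
            if (n - k) % 2 == 0:           # n - 2m = k has an integer solution m
                m = (n - k) // 2
                out[idx[:axis] + (m,) + idx[axis + 1:]] += Jk * val
    return out

def restrict_direct(grid, J):
    """Same result with the full tensor-product stencil J_nx * J_ny * J_nz."""
    out = defaultdict(float)
    for (nx, ny, nz), val in grid.items():
        for kx, Jx in J.items():
            for ky, Jy in J.items():
                for kz, Jz in J.items():
                    if (nx - kx) % 2 == 0 and (ny - ky) % 2 == 0 and (nz - kz) % 2 == 0:
                        m = ((nx - kx) // 2, (ny - ky) // 2, (nz - kz) // 2)
                        out[m] += Jx * Jy * Jz * val
    return out

random.seed(1)
J = two_scale_coeffs(4)
fine = {(i, j, k): random.uniform(-1, 1)
        for i in range(6) for j in range(6) for k in range(6)}
coarse = fine
for axis in range(3):                      # x pass, then y, then z
    coarse = restrict_axis(coarse, J, axis)
direct = restrict_direct(fine, J)
max_diff = max(abs(coarse[m] - direct[m]) for m in direct)
print(max_diff)
```

The grids produced after the first and second passes play the role of the two intermediate grids mentioned above, with spacing doubled in one and two directions, respectively.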

d. Grid to grid mapping and prolongation.

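Combining Eqs. (31) and (20) gives the following: compute electric potentials for the top grid level

$$e_{\mathbf{m}}^{L+} = \frac{2^{-L}}{a} \sum_{\mathbf{n}} K_{\mathbf{m}-\mathbf{n}}^{L}\, q_{\mathbf{n}}^{L}, \quad \mathbf{m} \in M_L,$$

where the sum is over all $\mathbf{n} \in M_L$. Combining Eqs. (32) and (20)/(30) gives the following: compute accumulated electric potentials for each lower grid level l = L − 1, L − 2, …, 1 using the recurrence

$$e_{\mathbf{m}}^{l+} = \frac{2^{-l}}{a} \sum_{\mathbf{n}} K_{\mathbf{m}-\mathbf{n}}^{l}\, q_{\mathbf{n}}^{l} + \sum_{\mathbf{n}} J_{\mathbf{m}-2\mathbf{n}}\, e_{\mathbf{n}}^{(l+1)+}, \quad \mathbf{m} \in M_l.$$

The first sum is over all $\mathbf{n} \in M_l$ such that $|\mathbf{m} - \mathbf{n}| < 2\alpha$. The second sum is the prolongation step, which loops over the coarser grid, distributing electric potential from each grid point to nearby points on the fine grid according to some fixed stencil. Specifically, the outer loop is over all $\mathbf{n} \in M_{l+1}$ such that $-p/2 \le \mathbf{m} - 2\mathbf{n} \le p/2$ (componentwise). Just as for restriction, it is linear in the order p.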

e. Interpolation.

The interpolation step loops over all particles, and for each particle interpolates the electric potential and electric field from nearby grid points. Specifically, for each particle i, add $\frac{1}{2} q_i \sum_{\mathbf{m}} \varphi_{\mathbf{m}}^1(\mathbf{r}_i)\, e_{\mathbf{m}}^{1+}$ to the energy U, and add $-q_i \sum_{\mathbf{m}} \nabla\varphi_{\mathbf{m}}^1(\mathbf{r}_i)\, e_{\mathbf{m}}^{1+}$ to the ith force $\mathbf{F}_i$. This requires evaluation of p B-spline basis functions along each dimension.

B. Performance analysis

1. Choosing optimal grid size

Ref. 13 analyzes the effect of grid size h and cutoff a on computational cost for a desired error tolerance and a specified order p. The error and cost depend on the ratios $h/h^*$ and h/a = 1/α where $h^* = (\mathrm{volume}(\Omega)/N)^{1/3}$, which is a measure of average distance between nearest neighbors. It is shown that the optimal ratio $h/h^*$ is a value near 1 that is practically independent of the desired accuracy, so it is the ratio h/a that should be varied to control accuracy. This conclusion is confirmed in Ref. 24 with the benefit of the more detailed cost analysis.

To keep the operation count linear in N, it is enough to choose the number of levels L just large enough that the number of grid points at level L does not exceed N. At the same time, it is disadvantageous to reduce the number of grid points below $(2\alpha)^3$, because this would incur a greater operation count than choosing L to be one less.

2. Choosing optimal degree and cutoff

The optimal degree for spline interpolation is determined empirically, using the sphere of 10 002 water molecules from Section II D. CPU time and relative mass-weighted root-mean-square error in the forces, $\left(\sum_{i=1}^{N} m_i^{-1}|\tilde{\mathbf{F}}_i - \mathbf{F}_i|^2 \big/ \sum_{i=1}^{N} m_i^{-1}|\mathbf{F}_i|^2\right)^{1/2}$, are computed for orders p = 4, 6, 8, 10 and relative cutoffs α = 2.4, 2.8, …, 8 (cutoff distances 6 Å ≤ a ≤ 20 Å with grid spacing 2.5 Å) and $C^{p-1}$ softener. Results are shown in Figure 6. Based on these results, the following formula is constructed for choosing p for a given α:

$$p = \text{the element of } \{4, 6, 8\} \text{ nearest } 1.25\alpha + 0.25, \tag{36}$$

i.e., p is chosen to be the middle value 6 for 3.8 < α < 5.4 and otherwise 4 or 8.
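Formula (36) translates directly into code. The function name `choose_order` is hypothetical, and the handling of exact ties (for example α = 3.8, where 1.25α + 0.25 = 5) is an assumption, since the formula does not say which neighbor to prefer; this sketch returns the smaller order.

```python
def choose_order(alpha):
    """Return the element of {4, 6, 8} nearest to 1.25 * alpha + 0.25."""
    target = 1.25 * alpha + 0.25
    # min() keeps the first candidate on a tie, i.e., the smaller order.
    return min((4, 6, 8), key=lambda p: abs(p - target))

for a in (6, 7, 12, 14, 20):               # cutoff distances in angstroms
    alpha = a / 2.5                        # grid spacing h = 2.5 angstroms
    print(f"a = {a:2d}  alpha = {alpha:.1f}  p = {choose_order(alpha)}")
```

With h = 2.5 Å, a cutoff of 7 Å (α = 2.8) selects the cubic case p = 4 and a cutoff of 12 Å (α = 4.8) selects the quintic case p = 6.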

FIG. 6.

The MSM performance comparison for B-spline interpolation of degree 3, 5, 7, 9. Each line plot varies the cutoff distance 6, 7, …, 20 Å with grid spacing 2.5 Å.

A representative accuracy is, say, a 0.5% error in forces. The effect of a time step of 1 fs is to distort highest frequencies by 1.6%.

3. Comparison to FMM

The efficiency of a C++ implementation of the MSM is compared with a very recent C implementation26 of the Uniform FMM Laplace Solver of the FMM-Laplace library.33 (The FMM-Laplace library is built with Intel Cilk disabled to compel execution on a single CPU core.) The timing includes construction of the DAG (directed acyclic graph) as well as scaling the position input and force output, which would be part of the solver if used within an MD simulator. Both codes are compiled using the most recent version of the Intel C++ compiler (version 16.0.0) with flags to enable compiler optimization using the AVX2 (Advanced Vector Extensions 2) SIMD (single instruction, multiple data) instruction set. The MSM implementation arranges the grid points into 8-element clusters, so as to be able to make explicit use of the AVX2 instructions, benefiting especially from the supported FMA (fused multiply–add) instructions.4 Testing was performed using a single core of an Intel Haswell processor (Xeon E5-2680 v3). The grid to grid, restriction, and prolongation calculations are performed in single precision. This is never a problem because the optimal way to do calculations of higher accuracy is to increase the cutoff distance,13,24 thereby increasing the overall contribution of the short-range part while decreasing the contribution of the long-range part, which is the part being computed in single precision.

The test system is an equilibrated sphere of 10 002 water molecules (30 006 atoms) with radius 42 Å. For MSM, the finest grid spacing is fixed at h = 2.5 Å. The calculation and timings are repeated 10 times per data point to suitably “warm up” the hardware and compensate for any other background activities that the workstation might incidentally perform during testing, and the minimum time is reported.

The graph in Figure 7 shows, on the vertical axis, CPU time in seconds plotted against the relative mass-weighted root-mean-square error in the forces and the absolute error in the total potential energy. The MSM lines in the graphs are generated by varying the short-range cutoff distance, where the points correspond to relative cutoffs α = 2.4, 2.8, 3.2, …, 8 (a = 6, 7, 8, …, 20 Å), with the spline degree determined from the relative cutoff α using formula (36). Also shown are two data points for FMM based on setting the accuracy parameter to either 3 or 6 (digits of accuracy), which are the two possible values provided by the library. For each point, the CPU times and the errors have been averaged over 100 separated time steps, with error bars showing the standard deviation.

FIG. 7.

Comparing efficiency of the MSM with the FMM. The MSM varies the cutoff distance 6, 7, 8, …, 20 Å with grid spacing 2.5 Å and B-spline order determined from formula (36). The FMM varies the force accuracy from 3 to 6 digits. The left plot shows the relative error in force, and the right plot shows the absolute error in total potential energy. Each point is the average over 100 separated time steps, with error bars showing the standard deviation.

Figure 7 shows the MSM to be comparable in efficiency to the FMM. Approximating the force to a relative error of 0.5% should be sufficient for use with MD, provided that the approximation is continuous. However, FMM produces discontinuous forces, for which it is necessary to use high accuracy for stable dynamics,34 as confirmed below.

4. Stable dynamics

Stability of dynamics is investigated for MSM and FMM for a constant energy MD simulation of the sphere of 10 002 water molecules. The water model is TIP3P with rigid bonds and angles as intended by the CHARMM force field.35 A quartic restraining force is applied to model the surface tension of a 42 Å water sphere using the parameters suggested by the CHARMM documentation.36 Simulations are run using NAMD7 modified to include both the sequential MSM and FMM codes. The Verlet integrator is used with a 1 fs time step. The tested methods include the MSM using both cubic B-spline interpolation with cutoff distance 7 Å (α = 2.8) and quintic B-spline interpolation with cutoff distance 12 Å (α = 4.8) and the FMM using both the 3-digit (9 term expansion) and 6-digit (18 term expansion) accuracy options.

Figure 8 compares the total energy of two of the simulations, the MSM with quintic B-splines and the FMM with 6-digit accuracy, over a 200 ps trajectory. Energy averages and standard deviations for all four simulations are shown in Table II. Although the cubic B-spline MSM and 3-digit FMM calculations both produce forces within 0.5% accuracy, the MSM is stable, whereas the FMM is not. The 6-digit FMM calculation and the two MSM calculations are considered energy conserving, by the criterion that the standard deviation of total energy is within 20% of the standard deviations of kinetic and potential energies.37–39 However, the plot reveals less stability with 6-digit FMM than with MSM, and the FMM has larger standard deviation in total energy.

FIG. 8.

Total energy for a sphere of 10 002 water molecules for 200 ps constant energy MD simulations comparing the MSM with the FMM.

TABLE II.

The average μ and standard deviation σ of the total energy E, kinetic energy T, and potential energy U are compared for four simulations.

Method                  μ_E          σ_E       μ_T         σ_T       μ_U          σ_U
FMM 3-digit             −62 147.09   8931.82   23 702.65   3566.13   −85 849.74   5374.94
FMM 6-digit             −77 270.85      0.32   17 998.10     91.27   −95 268.95     91.39
MSM cubic B-spline      −76 229.99      0.18   17 899.09     91.32   −94 129.08     91.43
MSM quintic B-spline    −77 270.31      0.18   17 997.73     91.58   −95 268.04     91.68

5. Scaling with problem size

Figure 9 shows the MSM code scaling linearly with system size on a single processor. For the experiments, several water spheres are created, each roughly twice the number of water molecules from the previous one, up to almost $2 \times 10^6$ atoms. The error is checked against that of the FMM code to ensure that MSM is consistently providing about six digits of accuracy in the energy.

FIG. 9.

Linear scaling of the MSM with system size on a single processor.

IV. DISCUSSION

A. Comparison to other N-body methods

A fast N-body method constructs an approximation that facilitates computation by transferring the problem to a mesh/lattice/grid (still having O(N) points). Fast N-body methods can be classified as 2-level or multilevel. Two-level methods, such as PME and P3M, exploit the translation invariance $k(\mathbf{r}, \mathbf{r}') = \kappa(\mathbf{r} - \mathbf{r}')$ of most kernels to perform calculations of mesh–mesh interactions using an FFT. Multilevel methods, such as tree methods, the FMM, and the MSM, typically assume that the kernel $k(\mathbf{r}, \mathbf{r}')$ becomes more slowly varying as $|\mathbf{r} - \mathbf{r}'| \to \infty$ and exploit this property to do an approximation, using O(log N) levels of cells or grids. (Oscillatory kernels can also be handled, e.g., Ref. 40, but with greater complication.) Multilevel methods effect a separation of length scales, in which an increase in the range of the interactions is balanced by a commensurate decrease in the number of interacting cells or grid points. Both classes of methods obtain O(N log N) operation counts by means of a separable approximation of the kernel. Variants of multilevel methods use nesting to reduce the cost to O(N). An alternative classification is to distinguish “kernel-splitting methods” (KSMs), such as PME, P3M, and MSM, from “hierarchical clustering methods” (HCMs), such as tree methods, the FMM, and hierarchical matrix methods.41 This is discussed below in greater detail. Not discussed here are “indirect” methods based on solving an equivalent elliptic partial differential equation, e.g., the Poisson equation. Such approaches can be attributed to a historical accident, namely, the prior development of fast Poisson solvers.

1. Kernel-splitting methods

Kernel-splitting methods are of two kinds. FFT-based 2-level methods do a single splitting. It is typical to use erf(βr) for the long-range part, where β is a parameter chosen to optimize performance. With this choice, the short-range part decays very rapidly, and its cutoff distance a is set to a small multiple of 1/β. Such methods interpolate the long-range part from a grid in either real or reciprocal space to permit the use of a 3-dimensional FFT. This is an O(N log N) algorithm on account of its use of the FFT. In contrast, the MSM is an O(N) method that differs from 2-level methods in that it performs the calculation using a set of nested grids of increasing coarseness instead of just a single fine grid. For a 2-level method, the array of basis function coefficients is $K^{1+} = K^1$, and $\frac{1}{2}(q^1)^T K^1 q^1$ is the long-range energy, including excluded terms. Interpolation in real space produces an operator $K^1$ that is a nonperiodic convolution, so the long-range energy is computable by a 3-dimensional FFT if the dimension of the matrix is increased 8-fold.42 Interpolation in reciprocal space is more involved (Ref. 43, Sec. 4.3). We consider the common kernel $k_1(\mathbf{r}, \mathbf{r}') = \kappa_1(\mathbf{r} - \mathbf{r}')$. The long-range part is modified so that it decays rapidly beyond the diameter ℓ of the cluster of particles and then the cluster is replicated periodically with lattice spacing $\bar{\ell} > 2\ell$. The result is a kernel $\kappa_1^{\mathrm{p}}(\mathbf{r})$ that has a rapidly converging Fourier series with analytically tractable coefficients that are inexpensive to evaluate. This involves making an error comparable to that of truncating the short-range part at a distance a, if $\bar{\ell}$ exceeds 2ℓ by roughly 2a. The kernel $\kappa_1^{\mathrm{p}}(\mathbf{r} - \mathbf{r}')$ is approximated by an interpolated truncated Fourier series,1 giving

$$\frac{1}{2}(q^1)^T K^1 q^1 = \frac{1}{2}\left(\overline{F q^1}\right)^T D\, F q^1, \tag{37}$$

where F is implemented as a 3-dimensional FFT and D is a diagonal matrix. This is PME, a variant of an older approach, viz., P3M. In this case the domain has been enlarged more than 8-fold. Also, here the FFT is used not only to exploit the convolution property but to perform low-pass filtering as well.

The use of an FFT has the advantage that all the approximation errors are incurred on the finest grid; whereas, the use of multiple levels almost doubles the energy error and increases by a factor 4/3 the force error. To compensate for the loss of accuracy in the energy would require an increase in the cutoff a by a factor $2^{1/(p+1)}$. The need for a longer cutoff for the MSM is balanced by the fact that an FFT delivers its best performance only for grid dimensions that are powers of 2. Moreover, multilevel methods have several advantages over FFT-based 2-level methods: (i) For FFT-based methods, there are moderate difficulties and major inefficiencies handling nonperiodic boundaries. Specifically, each nonperiodic direction requires a cell dimension at least double the system dimension, which affects the FFT times.43 Indeed, full electrostatics is rarely used in the case of nonperiodic boundary conditions due partly to the high cost. (ii) Another drawback is the difficulty of parallelizing the FFT in 3 dimensions. As a result, PME does not scale as easily as the MSM to large numbers of processors.19 Indeed, the development of massively parallel computers is reviving interest in the use of FMM for MD.44,45 Even with a single CPU node, the inherently non-local data access patterns of FFTs are less efficient than calculations with the wide 3-dimensional stencils of the direct gridpoint-to-gridpoint interactions of the MSM. (iii) Finally, FFTs cannot exploit situations where adaptive grids might be profitable, such as implicit solvent models; this would be possible with the MSM.

Force approximations obtained from KSMs violate Newton’s third law unlike those obtained from HCMs. As a consequence, linear momentum fails to be conserved. Although not serious, this can be inconvenient. The usual remedy is to replace $\mathbf{F}_i$ by $\mathbf{F}_i - (1/N)\sum_j \mathbf{F}_j$, but this yields nonconservative forces and significant energy drift. However, such energy drift is avoided and linear momentum is conserved if the mass-weighted correction $\mathbf{F}_i - (m_i/m_{\mathrm{tot}})\sum_j \mathbf{F}_j$ is used instead.46
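The mass-weighted correction can be sketched as follows; the function name and the toy forces are hypothetical. The example also shows a consequence of the formula that follows by simple algebra: every particle's acceleration is shifted by the same vector, namely the net force divided by the total mass.

```python
def remove_net_force(forces, masses):
    """Replace F_i by F_i - (m_i / m_tot) * sum_j F_j."""
    m_tot = sum(masses)
    net = [sum(f[c] for f in forces) for c in range(3)]
    return [tuple(f[c] - (m / m_tot) * net[c] for c in range(3))
            for f, m in zip(forces, masses)]

# Toy forces that do not sum to zero (made up for illustration).
masses = [16.0, 1.0, 1.0]
forces = [(1.0, 0.0, 2.0), (0.5, -1.0, 0.0), (0.0, 0.25, 0.5)]
corrected = remove_net_force(forces, masses)

residual = [sum(f[c] for f in corrected) for c in range(3)]
# Change in acceleration of each particle, (F_i' - F_i) / m_i.
shifts = [tuple((fc[c] - f[c]) / m for c in range(3))
          for fc, f, m in zip(corrected, forces, masses)]
print(residual, shifts)
```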

2. Hierarchical clustering methods

Hierarchical clustering methods employ an oct-tree decomposition of space to partition the set of all particle pairs into pairs of particle clusters where the size of two clusters in a pair increases with separation distance. Interactions at the bottom level between small nearby clusters are computed directly. All other cluster pairs are, by construction, well separated and the interactions between particle pairs from a given pair of clusters are based on a (polynomial) approximation for the kernel $k(\mathbf{r}, \mathbf{r}')$ particular to that cluster pair. Truncated Taylor expansions are used in practice. For the kernel $|\mathbf{r} - \mathbf{r}'|^{-1}$, each such Taylor polynomial is harmonic and can be expressed in terms of $p^2$ spherical harmonics, where p − 1 is the degree of the polynomial, instead of $\frac{1}{6}p^3 + \frac{1}{2}p^2 + \frac{1}{3}p$ monomials. (The coefficients of these expansions are multipoles.) At the most basic level there is a close relationship between the multilevel summation method and hierarchical clustering methods and how they achieve their good scaling as a function of N. In particular, the fast multipole and related algorithms47,48 have the same structure as the algorithm of Section III A 1. Many of the techniques for one class of methods transfer to the other. But there is one fundamental difference: hierarchical clustering algorithms do not split the interaction kernel; each interaction is present in only one of the terms of the multilevel sum. For more detailed information on HCMs or 2-level methods, Ref. 49 is recommended.
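The two term counts quoted above can be compared directly. The monomial count is the number of monomials of degree at most p − 1 in three variables, which is the binomial coefficient C(p + 2, 3); the helper names below are hypothetical.

```python
from fractions import Fraction
from math import comb

def monomial_count(p):
    """Number of monomials of degree <= p - 1 in three variables."""
    return comb(p + 2, 3)

def closed_form(p):
    """The same count written as p^3/6 + p^2/2 + p/3."""
    return Fraction(p ** 3, 6) + Fraction(p ** 2, 2) + Fraction(p, 3)

def harmonic_count(p):
    """Number of spherical harmonics of degree <= p - 1."""
    return p * p

ratios = {p: monomial_count(p) / harmonic_count(p) for p in (4, 6, 8)}
for p, r in ratios.items():
    print(f"p = {p}: {monomial_count(p)} monomials vs {harmonic_count(p)} harmonics, ratio {r:.3f}")
```

For p = 4, 6, 8 the counts are 20, 56, 120 monomials against 16, 36, 64 spherical harmonics.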

The main advantage of HCMs is that they can exploit special properties of the kernel. In particular, the harmonicity of the 1/r kernel can be used to reduce the number of terms in an order 4/6/8 approximation by factors of 1.25/1.56/1.88, respectively. However, these savings apply only to special kernels. Moreover, the disadvantages are significant: (i) An HCM produces an approximation to k(r, r′) that is discontinuous as a function of particle positions r, r′. This feature is intrinsic to HCMs because the shortest range interaction is not a polynomial and cannot be continuously matched to a longer range polynomial approximation. In contrast, KSMs can attain any degree of continuity. Lack of continuity is problematic for dynamics and minimization and disastrous for Hamiltonian dynamics13,50 (which requires bounded Hessians to conserve total energy). Hence, HCMs may not be usable unless high accuracy is desired. This is the observation of Section III B. By contrast, all of the points plotted for multilevel summation conserve energy well. This is consistent with past experiments13,43 showing that HCMs perform poorly compared to other methods for MD electrostatics. Indeed, HCMs are seldom used for MD, and a fairly recent review51 does not mention them. (ii) From an implementation point of view, HCMs are more complicated due to the need for a list of pairs of interacting oct-tree cells, not to mention the (possible) use of special polynomials, such as spherical harmonics, that exploit properties of the kernel. This complexity not only makes it more challenging to utilize the capabilities of new computer architectures but makes it difficult to integrate the method into an application. (iii) To reduce the cost of computing forces, it is beneficial to evaluate slowly varying interactions less often than those that vary most rapidly and a good way to do this is multiple time stepping (or subcycling). 
Within each (outer) step, each interaction is integrated with multiple substeps of a size that is some fraction of the overall step size. For Hamiltonian systems, a fixed multiplicity must be chosen for each interaction. (Extensive experience indicates that failure to use exactly the same symplectic map for each outer time step results in a secular drift in energy.) For a nonbonded interaction, the appropriate multiplicity will vary, depending on the distance between the two particles. However, kernel-splitting methods provide a partitioning of the interaction, each part of which can be integrated with its own fixed multiplicity, at the same time exploiting the finite ranges of all but the highest-level part. The FMM does not provide a splitting of the interactions.

Acknowledgments

The work of D.J.H. and K.S. is supported by NSF Grant No. CHE090957273 and NIH Grant No. 9P41GM104601. That of D.J.H. is also supported by NSF Grant No. CCF08-30582 and a Computational Science and Engineering Fellowship (University of Illinois). The work of M.A.W., J.X., and R.D.S. is supported by NSF Grant No. CHE09-57024. That of R.D.S. is also supported by NSF Grant No. CCF08-30582. Additionally, the authors are grateful to B. Zhang and J. Huang for sharing their FMM code.

APPENDIX A: FUNDAMENTAL SPLINE COEFFICIENTS

To compute the coefficients $\omega_n$ of Section II B 2, proceed as follows: Let ψ be the bi-infinite sequence of all zeros except for $\psi_0 = 1$, and let ω be the sequence whose nth term is $\omega_n$. Applying Eq. (10) to ψ gives $B^{-1}\psi = R\omega$ where R is the reversing operator. From this follows

$$B R \omega = \psi,$$

which is an infinite banded system of linear equations. To solve this system, first obtain a Cholesky factorization $B = G G^T$ where $G = g_0 + g_1 E^{-1} + \cdots + g_{(p/2)-1} E^{1-p/2}$. Note that $G^T = R G R$. To calculate the coefficients, create a semi-infinite banded matrix whose elements on the nth diagonal are Φ(n) and apply the Cholesky algorithm until there is no change from one row to the row that follows it. This converged row is the generic row of the operator G. Using $G^T = R G R$, the solution of $B R \omega = \psi$ can be broken into two steps:

$$G \xi = \psi, \qquad G \omega = R \xi.$$

Choosing $\xi_n = 0$ for n < 0, the value of $\xi_0$ is simply $1/g_0$. Exploiting symmetry $\omega = R\omega$, solve a system of p/2 linear equations to obtain $\omega_n$, n = 0, 1, …, (p/2) − 1, and then use forward substitution to obtain $\omega_n$, n = p/2, (p/2) + 1, …. This process appears similar to one given in Ref. 52.
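The convergence of the Cholesky rows can be observed on a finite matrix. The sketch below is a simplification, not the procedure described above: it truncates the banded system to a 41 × 41 matrix for the cubic case p = 4, factors it with a dense Cholesky routine, and solves for ω with ψ placed at the center. The truncation size and all function names are assumptions made for the example. For the cubic B-spline, whose values at the integers are 1/6, 2/3, 1/6, the solution has the known closed form $\omega_n = \sqrt{3}\,(\sqrt{3} - 2)^{|n|}$, which provides a check.

```python
from math import comb, factorial, sqrt

def bspline_Q(p, u):
    """Cardinal B-spline of order p with support [0, p]."""
    total = 0.0
    for k in range(p + 1):
        if u > k:
            total += (-1) ** k * comb(p, k) * (u - k) ** (p - 1)
    return total / factorial(p - 1)

def cholesky(A):
    """Dense lower-triangular factor G with A = G G^T."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(G[i][k] * G[j][k] for k in range(j))
            G[i][j] = sqrt(s) if i == j else s / G[j][j]
    return G

def solve_with_factor(G, b):
    """Solve G G^T x = b by forward and back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(G[k][i] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x

p, n = 4, 41
def phi(d):
    # Phi(d) = Q_p(d + p/2); it vanishes for |d| >= p/2.
    return bspline_Q(p, d + p / 2) if abs(d) < p // 2 else 0.0

B = [[phi(i - j) for j in range(n)] for i in range(n)]
G = cholesky(B)
g0, g1 = G[n - 1][n - 1], G[n - 1][n - 2]   # converged generic row of G
psi = [0.0] * n
psi[n // 2] = 1.0
omega = solve_with_factor(G, psi)
print(g0, g1, omega[n // 2], omega[n // 2 + 1])
```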

APPENDIX B: COEFFICIENTS OF THE BLURRING OPERATOR

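It is shown below, using Ref. 25, Eq. (1.6) on p. 22, that the degree (p/2) − 1 polynomial $B_{p/2}$ of Section II B 4 is obtained from the following Maclaurin expansion:

$$\frac{s\sinh(sz)}{s^2 + 2(1 - \cosh(sz))} = \sum_{\substack{p=2 \\ p\ \mathrm{even}}} B_{p/2}(s^2)\, z^{p-1}. \tag{B1}$$

In particular,

$$\begin{aligned}
\frac{s\sinh(sz)}{s^2 + 2(1 - \cosh(sz))}
&= z\left(1 + \frac{s^2}{6}z^2 + \frac{s^4}{120}z^4 + O(z^6)\right)\left(1 - z^2 - \frac{s^2}{12}z^4 + O(z^6)\right)^{-1} \\
&= z\left(1 + \frac{s^2}{6}z^2 + \frac{s^4}{120}z^4 + O(z^6)\right)\left(1 + z^2 + \left(1 + \frac{s^2}{12}\right)z^4 + O(z^6)\right) \\
&= z + \left(1 + \frac{s^2}{6}\right)z^3 + \left(1 + \frac{s^2}{4} + \frac{s^4}{120}\right)z^5 + O(z^7).
\end{aligned}$$

1. Derivation of Eq. (B1)

From Eqs. (1.6) and (1.7) of Ref. 25, p. 22,

$$\frac{t - 1}{t - e^z} = 1 + \sum_{n=1} \frac{z^n}{(t-1)^n} \sum_{j=0}^{n-1} Q_{n+1}(j+1)\, t^j.$$

Taking the odd part w.r.t. z gives

$$\frac{(t-1)\sinh z}{t^2 + 1 - 2t\cosh z} = \sum_{\substack{p=2 \\ p\ \mathrm{even}}} \frac{z^{p-1}}{(t-1)^{p-1}} \sum_{j=0}^{p-2} Q_p(j+1)\, t^j = \sum_{\substack{p=2 \\ p\ \mathrm{even}}} \frac{z^{p-1}}{(t-1)^{p-1}} \sum_{m=1-p/2}^{(p/2)-1} Q_p((p/2)+m)\, t^{m+(p/2)-1}.$$

Multiplying by t/(t − 1) gives

$$\frac{\sinh z}{t + t^{-1} - 2\cosh z} = \sum_{\substack{p=2 \\ p\ \mathrm{even}}} \frac{z^{p-1}}{(t - 2 + t^{-1})^{p/2}} \sum_{m=1-p/2}^{(p/2)-1} Q_p((p/2)+m)\, t^m.$$

Let $t = 1 + s\sqrt{1 + s^2/4} + s^2/2$. The second summation can be shown to be a polynomial of degree (p/2) − 1 in $s^2$, denoted by $B_{p/2}(s^2)$. Also, $t - 2 + t^{-1} = s^2$. Replacing z by sz yields the stated result.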

APPENDIX C: COEFFICIENTS OF THE ANTI-BLURRING OPERATOR

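To get the expansion (12) of Section II B 4, write

$$B(\delta^2)^2 \left(b_0 + b_1\delta^2 + \cdots + b_{(p/2)-1}\delta^{p-2}\right) = 1 - \delta^p\, C(\delta^2), \tag{C1}$$

where C is a polynomial of degree p − 3. For example, for p = 4,

$$B^2\left(1 - \frac{1}{3}\delta^2\right) = 1 - \delta^4\left(\frac{1}{12} + \frac{1}{108}\delta^2\right)$$

and for p = 6,

$$B^2\left(1 - \frac{1}{2}\delta^2 + \frac{41}{240}\delta^4\right) = 1 - \delta^6\left(-\frac{1}{20} - \frac{221}{19200}\delta^2 - \frac{13}{19200}\delta^4 - \frac{41}{3456000}\delta^6\right).$$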
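Both examples can be checked with exact rational arithmetic by treating each side as a polynomial in $\delta^2$. The sketch assumes $B = 1 + \frac{1}{6}\delta^2$ for p = 4 and $B = 1 + \frac{1}{4}\delta^2 + \frac{1}{120}\delta^4$ for p = 6, which are the polynomials $B_2$ and $B_3$ read off from the expansion in Appendix B; the helper `poly_mul` is hypothetical.

```python
from fractions import Fraction as Fr

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in x = delta^2."""
    out = [Fr(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# p: (coefficients of B, coefficients b_0, b_1, ... of the anti-blurring polynomial)
cases = {
    4: ([Fr(1), Fr(1, 6)], [Fr(1), Fr(-1, 3)]),
    6: ([Fr(1), Fr(1, 4), Fr(1, 120)], [Fr(1), Fr(-1, 2), Fr(41, 240)]),
}
products = {p: poly_mul(poly_mul(B, B), b) for p, (B, b) in cases.items()}
for p, coeffs in products.items():
    # Coefficients of x^1 ... x^(p/2 - 1) vanish; the rest equal minus those of C.
    print(p, [str(c) for c in coeffs])
```

The leading coefficient is 1 and the coefficients of $\delta^2, \ldots, \delta^{p-2}$ vanish, which is the defining property of the $b_k$; the remaining coefficients are those of $-\delta^p C(\delta^2)$.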

Comparing (C1) with (12), one sees that

$$B^2 \sum_m c_m E^m = C(\delta^2).$$

This is an infinite system of linear equations to solve for the coefficients $c_m$ similar to that for the $\omega_m$ but with 2(p − 3) additional nonzero values on the right-hand side and with $B^2$ in place of B. The symmetry of Φ(n) implies that $R B R = B$. Hence,

$$G G^T = R G R\, R G^T R = G^T G,$$

whence

$$B^2 = G_2 G_2^T,$$

where

$$G_2 = G^2.$$

The algorithm from Appendix A can be applied by solving $G_2 \xi = C(\delta^2)\psi$ and $G_2 c = R\xi$ where c is the symmetric sequence of unknowns $c_n$, taking care to calculate $\xi_n$, 3 − p ≤ n ≤ 0.

REFERENCES

  • 1.Essmann U., Perera L., Berkowitz M. L., Darden T., Lee H., and Pederson L., J. Chem. Phys. 103, 8577 (1995). 10.1063/1.470117 [DOI] [Google Scholar]
  • 2. Hockney R. W. and Eastwood J. W., Computer Simulation Using Particles (McGraw-Hill, New York, 1981).
  • 3. Brandt A. and Lubrecht A. A., J. Comput. Phys. 90, 348 (1990). doi: 10.1016/0021-9991(90)90171-V
  • 4. Hardy D. J., Wu Z., Phillips J. C., Stone J. E., Skeel R. D., and Schulten K., J. Chem. Theory Comput. 11, 766 (2014). doi: 10.1021/ct5009075
  • 5. Bowers K. J., Chow E., Xu H., Dror R. O., Eastwood M. P., Gregersen B. A., Klepeis J. L., Kolossváry I., Moraes M. A., Sacerdoti F. D., Salmon J. K., Shan Y., and Shaw D. E., in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC06) (ACM Press, New York, 2006).
  • 6. Friedrichs M. S., Eastman P., Vaidyanathan V., Houston M., LeGrand S., Beberg A. L., Ensign D. L., Bruns C. M., and Pande V. S., J. Comput. Chem. 30, 864 (2009). doi: 10.1002/jcc.21209
  • 7. Phillips J. C., Braun R., Wang W., Gumbart J., Tajkhorshid E., Villa E., Chipot C., Skeel R. D., Kalé L., and Schulten K., J. Comput. Chem. 26, 1781 (2005). doi: 10.1002/jcc.20289
  • 8. Hess B., Kutzner C., van der Spoel D., and Lindahl E., J. Chem. Theory Comput. 4, 435 (2008). doi: 10.1021/ct700301q
  • 9. Greengard L. and Rokhlin V., Acta Numer. 6, 229 (1997). doi: 10.1017/S0962492900002725
  • 10. Beglov D. and Roux B., J. Chem. Phys. 100, 9050 (1994). doi: 10.1063/1.466711
  • 11. Im W., Berneche S., and Roux B., J. Chem. Phys. 114, 2924 (2001). doi: 10.1063/1.1336570
  • 12. Sandak B., J. Comput. Chem. 22, 717 (2001). doi: 10.1002/jcc.1039
  • 13. Skeel R. D., Tezcan I., and Hardy D. J., J. Comput. Chem. 23, 673 (2002). doi: 10.1002/jcc.10072
  • 14. Livne O. E. and Brandt A., SIAM J. Matrix Anal. Appl. 24, 439 (2002). doi: 10.1137/S0895479801383695
  • 15. Lee M. S., Salsbury J. F. R., and Olson M. A., J. Comput. Chem. 25, 1967 (2004). doi: 10.1002/jcc.20119
  • 16. Shin S., Zöller G., Holschneider M., and Reich S., Comput. Geosci. 37, 1075 (2011). doi: 10.1016/j.cageo.2010.11.011
  • 17. Suwan I., Brandt A., and Ilyin V., J. Math. Stat. 8, 361 (2012). doi: 10.3844/jmssp.2012.361.372
  • 18. Tameling D., Springer P., Bientinesi P., and Ismail A. E., J. Chem. Phys. 140, 024105 (2014). doi: 10.1063/1.4857735
  • 19. Izaguirre J. A., Hampton S. S., and Matthey T., J. Parallel Distrib. Comput. 65, 949 (2005). doi: 10.1016/j.jpdc.2005.03.006
  • 20. Plimpton S., J. Comput. Phys. 117, 1 (1995). doi: 10.1006/jcph.1995.1039
  • 21. Hardy D. J., Stone J. E., and Schulten K., Parallel Comput. 35, 164 (2009). doi: 10.1016/j.parco.2008.12.005
  • 22. Humphrey W., Dalke A., and Schulten K., J. Mol. Graphics 14, 33 (1996). doi: 10.1016/0263-7855(96)00018-5
  • 23. Moore S. G. and Crozier P. S., J. Chem. Phys. 140, 234112 (2014). doi: 10.1063/1.4883695
  • 24. Hardy D. J., “Multilevel summation for the fast evaluation of forces for the simulation of biomolecules,” Ph.D. thesis, University of Illinois at Urbana-Champaign (2006), http://hdl.handle.net/2142/11173.
  • 25. Schoenberg I. J., Cardinal Spline Interpolation, CBMS-NSF Regional Conference Series in Applied Mathematics Vol. 12 (Society for Industrial and Applied Mathematics, Philadelphia, 1973).
  • 26. Zhang B., personal communication (2015).
  • 27. Brandt A. and Venner C., SIAM J. Sci. Comput. 19, 468 (1998). doi: 10.1137/S106482759528555X
  • 28. Chui C. K., An Introduction to Wavelets (Academic Press, 1992).
  • 29. Reimer M., Numer. Math. 44, 417 (1984). doi: 10.1007/BF01405572
  • 30. Richards F., J. Approx. Theory 14, 83 (1975). doi: 10.1016/0021-9045(75)90080-5
  • 31. Meinardus G., J. Approx. Theory 16, 289 (1976). doi: 10.1016/0021-9045(76)90060-5
  • 32. Roe D. R., RMSD analysis in CPPTRAJ, 2014, http://ambermd.org/tutorials/analysis/tutorial1/.
  • 33. Huang J., Jia J., Zhang B., Lu B.-Z., and Cheng X., FMMLAP-uni: Uniform FMM Laplace solver, 2010, available online from http://fastmultipole.org/Main/FMMSuite/.
  • 34. Bishop T. C., Skeel R. D., and Schulten K., J. Comput. Chem. 18, 1785 (1997).
  • 35. Brooks B. R., Brooks C. L. III, MacKerell A. D., Nilsson L., Petrella R. J., Roux B., Won Y., Archontis G., Bartels C., Boresch S., Caflisch A., Caves L., Cui Q., Dinner A. R., Feig M., Fischer S., Gao J., Hodoscek M., Im W., Kuczera K., Lazaridis T., Ma J., Ovchinnikov V., Paci E., Pastor R. W., Post C. B., Pu J. Z., Schaefer M., Tidor B., Venable R. M., Woodcock H. L., Wu X., Yang W., York D. M., and Karplus M., J. Comput. Chem. 30, 1545 (2009). doi: 10.1002/jcc.21287
  • 36. Venable R., CHARMM documentation for c37b1, 2012, http://www.charmm.org/charmm/documentation/by-version/c37b1/params/doc/mmfp/.
  • 37. van Gunsteren W. F. and Berendsen H. J. C., Mol. Phys. 34, 1311 (1977). doi: 10.1080/00268977700102571
  • 38. Berendsen H. J. C. and van Gunsteren W. F., Molecular Liquids (Springer, 1984), pp. 475–500.
  • 39. Berendsen H. J. C. and van Gunsteren W. F., “Molecular dynamics simulation of statistical mechanical systems,” in Proceedings of the International School of Physics, “Enrico Fermi,” edited by Ciccotti G. C. and Hoover W. G. (North-Holland, Amsterdam, 1986), Vol. 97, pp. 43–65.
  • 40. Brandt A., Comput. Phys. Commun. 65, 24 (1991). doi: 10.1016/0010-4655(91)90151-A
  • 41. Hackbusch W., Hierarchische Matrizen: Algorithmen und Analysis (Springer, 2009).
  • 42. Neelov A., Ghasemi S. A., and Goedecker S., J. Chem. Phys. 127, 024109 (2007). doi: 10.1063/1.2746328
  • 43. Pollock E. L. and Glosli J., Comput. Phys. Commun. 95, 93 (1996). doi: 10.1016/0010-4655(96)00043-4
  • 44. Andoh Y., Yoshii N., Fujimoto K., Mizutani K., Kojima H., Yamada A., Okazaki S., Kawaguchi K., Nagao H., Iwahashi K., Mizutani F., Minami K., ichi Ichikawa S., Komatsu H., Ishizuki S., Takeda Y., and Fukushima M., J. Chem. Theory Comput. 9, 3201 (2013). doi: 10.1021/ct400203a
  • 45. Ohno Y., Yokota R., Koyama H., Morimoto G., Hasegawa A., Masumoto G., Okimoto N., Hirano Y., Ibeid H., Narumi T., and Taiji M., Comput. Phys. Commun. 185, 2575 (2014). doi: 10.1016/j.cpc.2014.06.004
  • 46. Skeel R. D., Hardy D. J., and Phillips J. C., J. Comput. Phys. 225, 1 (2007). doi: 10.1016/j.jcp.2007.03.010
  • 47. Greengard L. and Rokhlin V., J. Comput. Phys. 73, 325 (1987). doi: 10.1016/0021-9991(87)90140-9
  • 48. Duan Z.-H. and Krasny R., J. Comput. Chem. 22, 184 (2001).
  • 49. Griebel M., Knapek S., and Zumbusch G., Numerical Simulation in Molecular Dynamics (Springer, Berlin, Heidelberg, 2007).
  • 50. Biesiadecki J. J. and Skeel R. D., J. Comput. Phys. 109, 318 (1993). doi: 10.1006/jcph.1993.1220
  • 51. Koehl P., Curr. Opin. Struct. Biol. 16, 142 (2006). doi: 10.1016/j.sbi.2006.03.001
  • 52. Lai M.-J., Math. Comput. 63, 689 (1994). doi: 10.1090/S0025-5718-1994-1248971-3

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
