Author manuscript; available in PMC: 2020 Oct 20.
Published in final edited form as: Appl Comput Harmon Anal. 2020;49(3):https://doi.org/10.1016/j.acha.2020.06.002.

A FAST SIMPLE ALGORITHM FOR COMPUTING THE POTENTIAL OF CHARGES ON A LINE

ZYDRUNAS GIMBUTAS 1, NICHOLAS F MARSHALL 2, VLADIMIR ROKHLIN 3
PMCID: PMC7574583  NIHMSID: NIHMS1629758  PMID: 33088166

Abstract

We present a fast method for evaluating expressions of the form

$$u_j = \sum_{i=1,\, i \neq j}^{n} \frac{\alpha_i}{x_i - x_j}, \quad \text{for } j = 1, \ldots, n,$$

where $\alpha_i$ are real numbers, and $x_i$ are points in a compact interval of $\mathbb{R}$. This expression can be viewed as representing the electrostatic potential generated by charges on a line in $\mathbb{R}^3$. While fast algorithms for computing the electrostatic potential of general distributions of charges in $\mathbb{R}^3$ exist, in a number of situations in computational physics it is useful to have a simple and extremely fast method for evaluating the potential of charges on a line; we present such a method in this paper, and report numerical results for several examples.

2010 Mathematics Subject Classification: 31C20 (primary) and 41A55, 41A50 (secondary)

Keywords: Fast multipole method, Chebyshev system, generalized Gaussian quadrature

1. Introduction and motivation

1.1. Introduction.

In this paper, we describe a simple fast algorithm for evaluating expressions of the form

$$u_j = \sum_{i=1,\, i \neq j}^{n} \frac{\alpha_i}{x_i - x_j}, \quad \text{for } j = 1, \ldots, n, \tag{1}$$

where $\alpha_i$ are real numbers, and $x_i$ are points in a compact interval of $\mathbb{R}$. This expression can be viewed as representing the electrostatic potential generated by charges on a line in $\mathbb{R}^3$. We remark that fast algorithms for computing the electrostatic potential generated by general distributions of charges in $\mathbb{R}^3$ exist; see, for example, the Fast Multipole Method [9], whose relation to the method presented in this paper is discussed in §1.2. However, in a number of situations in computational physics it is useful to have a simple and extremely fast method for evaluating the potential of charges on a line; we present such a method in this paper. Under mild assumptions the presented method involves $O(n \log n)$ operations and has a small constant. The method is based on writing the potential $1/r$ as

$$\frac{1}{r} = \int_0^{\infty} e^{-rt}\, dt.$$

We show that there exists a small set of quadrature nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ such that for a large range of values of $r$ we have

$$\frac{1}{r} \approx \sum_{j=1}^{m} w_j e^{-r t_j}, \tag{2}$$

see Lemma 4.5, which is a quantitative version of (2). Numerically the nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ are computed using a procedure for constructing generalized Gaussian quadratures, see §5.2. An advantage of representing $1/r$ as a sum of exponentials is that the translation operator

$$\frac{1}{r} \mapsto \frac{1}{r + r'} \tag{3}$$

can be computed by taking an inner product of the weights $(w_1, \ldots, w_m)$ with a diagonal transformation of the vector $(e^{-r t_1}, \ldots, e^{-r t_m})$. Indeed, we have

$$\frac{1}{r + r'} \approx \sum_{j=1}^{m} w_j e^{-(r + r') t_j} = \sum_{j=1}^{m} w_j e^{-r' t_j} e^{-r t_j}. \tag{4}$$

The algorithm described in §3 leverages the existence of this diagonal translation operator to efficiently evaluate (1).
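To make the role of (4) concrete, the following sketch builds a sum-of-exponentials approximation of $1/r$ and checks that the shift $r \mapsto r + r'$ amounts to multiplying the $j$-th term by $e^{-r' t_j}$. The nodes and weights here come from a plain trapezoid-rule discretization of the integral after the substitution $t = e^s$; this is an assumption made only for illustration and is not the generalized Gaussian quadrature used in this paper. The function names, the step size `h`, and the truncation constants are likewise hypothetical.

```python
import math

def trapezoid_exp_sum(r_min, r_max, h=0.25):
    """Nodes t_j and weights w_j with 1/r ~ sum_j w_j*exp(-r*t_j) on [r_min, r_max].

    Obtained by applying the trapezoid rule with step h to
    1/r = int_{-inf}^{inf} exp(-r*e^s) e^s ds  (substitution t = e^s).
    Illustrative stand-in only, not the quadrature of the paper.
    """
    s_lo = math.log(1e-17 / r_max)   # below s_lo the integrand is negligible
    s_hi = math.log(45.0 / r_min)    # above s_hi, exp(-r*e^s) <= e^-45
    count = int(math.ceil((s_hi - s_lo) / h)) + 1
    nodes = [math.exp(s_lo + k * h) for k in range(count)]
    weights = [h * t for t in nodes]
    return nodes, weights

def translate(terms, r_shift, nodes):
    """Diagonal translation: w_j*exp(-r*t_j) -> w_j*exp(-(r + r_shift)*t_j)."""
    return [v * math.exp(-r_shift * t) for v, t in zip(terms, nodes)]

nodes, weights = trapezoid_exp_sum(1.0, 1000.0)
r, r_shift = 3.0, 42.5

# Terms of the exponential sum for 1/r, then the same terms shifted to 1/(r + r_shift).
terms = [w * math.exp(-r * t) for t, w in zip(nodes, weights)]
shifted = translate(terms, r_shift, nodes)

rel_err_direct = abs(r * sum(terms) - 1.0)
rel_err_shifted = abs((r + r_shift) * sum(shifted) - 1.0)
```

The point of the sketch is that `translate` touches each term independently, so moving the evaluation point costs $O(m)$ operations regardless of how many charges contributed to the terms.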

1.2. Relation to past work.

We emphasize that fast algorithms for computing the potential generated by arbitrary distributions of charges in $\mathbb{R}^3$ exist. An example of such an algorithm is the Fast Multipole Method that was introduced by [9] and has been extended by several authors including [7, 10, 16]. In this paper, we present a simple scheme for the special case where the charges are on a line, which occurs in a number of numerical calculations, see §1.3. The presented scheme has a much smaller runtime constant compared to general methods, and is based on the diagonal form (4) of the translation operator (3). The idea of using the diagonal form of this translation operator to accelerate numerical computations has been studied by several authors; in particular, the diagonal form is used in algorithms by Dutt, Gu and Rokhlin [6], and Yavin and Rokhlin [22] and was subsequently studied in detail by Beylkin and Monzón [1, 2].

The current paper improves upon these past works by taking advantage of robust generalized Gaussian quadrature codes [4] that were not previously available; these codes construct a quadrature rule that is exact for functions in the linear span of a given Chebyshev system, and can be viewed as a constructive version of Lemma 4.2 of Kreĭn [13]. The resulting fast algorithm presented in §3 simplifies past approaches, and has a small runtime constant; in particular, its computational cost is similar to the computational cost of 5-10 Fast Fourier Transforms on data of a similar length, see §5.

1.3. Motivation.

Expressions of the form (1) appear in a number of situations in computational physics. In particular, such expressions arise in connection with the Hilbert Transform

$$Hf(x) = \lim_{\varepsilon \to 0} \frac{1}{\pi} \int_{|x - y| \ge \varepsilon} \frac{f(y)}{y - x}\, dy.$$

For example, the computation of the projection $P_m f$ of a function $f$ onto the first $m + 1$ functions in a family of orthogonal polynomials can be reduced to an expression of the form (1) by using the Christoffel–Darboux formula, which is related to the Hilbert transform; we detail the reduction of $P_m f$ to an expression of the form (1) in the following.

Let $\{p_k\}_{k=0}^{\infty}$ be a family of monic polynomials that are orthogonal with respect to the weight $w(x) \ge 0$ on $(a, b) \subseteq \mathbb{R}$. Consider the projection operator

$$P_m f(x) := \int_a^b \sum_{k=0}^{m} \frac{p_k(x) p_k(y)}{h_k} f(y) w(y)\, dy,$$

where $h_k := \int_a^b p_k(x)^2 w(x)\, dx$. Let $x_1, \ldots, x_n$ and $w_1, \ldots, w_n$ be the $n > m/2$ point Gaussian quadrature nodes and weights associated with $\{p_k\}_{k=0}^{\infty}$, and set

$$u_j := \sum_{i=1}^{n} \sum_{k=0}^{m} \frac{p_k(x_j) p_k(x_i)}{h_k} f(x_i) w(x_i), \quad \text{for } j = 1, \ldots, n. \tag{5}$$

By construction the polynomial that interpolates the values $u_1, \ldots, u_n$ at the points $x_1, \ldots, x_n$ will accurately approximate $P_m f$ on $(a, b)$ when $f$ is sufficiently smooth, see for example §7.4.6 of Dahlquist and Björck [5]. Directly evaluating (5) would require $\Omega(n^2)$ operations. In contrast, the algorithm of this paper together with the Christoffel–Darboux formula can be used to evaluate (5) in $O(n \log n)$ operations. The Christoffel–Darboux formula states that

$$\sum_{k=0}^{m} \frac{p_k(x) p_k(y)}{h_k} = \frac{1}{h_m} \frac{p_{m+1}(x) p_m(y) - p_m(x) p_{m+1}(y)}{x - y}, \tag{6}$$

see §18.2(v) of [17]. Using (6) to rewrite (5) yields

$$u_j = \frac{1}{h_m} \left( f(x_j) + \sum_{i=1,\, i \neq j}^{n} \frac{p_{m+1}(x_j) p_m(x_i) - p_m(x_j) p_{m+1}(x_i)}{x_j - x_i} f(x_i) w(x_i) \right), \tag{7}$$

where we have used the fact that the diagonal term of the double summation is equal to $f(x_j)/h_m$. The summation in (7) can be rearranged into two expressions of the form (1), and thus the method of this paper can be used to compute a representation of $P_m f$ in $O(n \log n)$ operations.
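The rearrangement of the summation in (7) can be written out explicitly. As a sketch, set $c_i := f(x_i) w(x_i)$, $\beta_i := p_m(x_i) c_i$, and $\gamma_i := p_{m+1}(x_i) c_i$; these three symbols are introduced here only for illustration. Since $1/(x_j - x_i) = -1/(x_i - x_j)$, the sum over $i \neq j$ splits as

```latex
\sum_{i=1,\, i \neq j}^{n}
  \frac{p_{m+1}(x_j)\, p_m(x_i) - p_m(x_j)\, p_{m+1}(x_i)}{x_j - x_i}\, c_i
  = -\, p_{m+1}(x_j) \sum_{i=1,\, i \neq j}^{n} \frac{\beta_i}{x_i - x_j}
    \;+\; p_m(x_j) \sum_{i=1,\, i \neq j}^{n} \frac{\gamma_i}{x_i - x_j}.
```

Each of the two sums on the right has the form (1), with weights $\beta_i$ and $\gamma_i$ respectively, so two evaluations of an expression of the form (1) followed by $O(n)$ pointwise multiplications would give all of the values $u_j$.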

Remark 1.1. Analogs of the Christoffel–Darboux formula hold for many other families of functions; for example, if $J_\nu(w)$ is a Bessel function of the first kind, then we have

$$\sum_{k=1}^{\infty} 2(\nu + k) J_{\nu+k}(w) J_{\nu+k}(z) = \frac{wz}{w - z} \left( J_{\nu+1}(w) J_\nu(z) - J_\nu(w) J_{\nu+1}(z) \right),$$

see [21]. This formula can be used to write a projection operator related to Bessel functions in an analogous form to (7), and the algorithm of this paper can be similarly applied.

Remark 1.2. A simple modification of the algorithm presented in this paper can be used to evaluate more general expressions of the form

$$v_j = \sum_{i=1}^{n} \frac{\alpha_i}{x_i - y_j}, \quad \text{for } j = 1, \ldots, m,$$

where x1 , … , xn are source points, and y1 , … , ym are target points. For simplicity, this paper focuses on the case where the source and target points are the same, which is the case in the projection application described above.

2. Main result

2.1. Main result.

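Our principal analytical result is the following theorem, which provides precise accuracy and computational complexity guarantees for the algorithm presented in this paper, which is detailed in §3.

Theorem 2.1. Let $x_1 < \cdots < x_n \in [a, b]$ and $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ be given. Set

$$u_j := \sum_{i=1,\, i \neq j}^{n} \frac{\alpha_i}{x_i - x_j}, \quad \text{for } j = 1, \ldots, n.$$

Given $\delta > 0$ and $\varepsilon > 0$, the algorithm described in §3 computes values $\tilde{u}_j$ such that

$$\frac{|\tilde{u}_j - u_j|}{\sum_{i=1}^{n} |\alpha_i|} \le \varepsilon, \quad \text{for } j = 1, \ldots, n \tag{8}$$

in $O\big(n \log(\delta^{-1}) \log(\varepsilon^{-1}) + N_\delta\big)$ operations, where

$$N_\delta := \sum_{j=1}^{n} \#\{x_i : |x_j - x_i| < \delta(b - a)\}. \tag{9}$$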

The proof of Theorem 2.1 is given in §4. Under typical conditions, the presented algorithm involves $O(n \log n)$ operations. The following corollary describes a case of interest, where the points $x_1, \ldots, x_n$ are Chebyshev nodes for a compact interval $[a, b]$ (we define Chebyshev nodes in §4.2).

Corollary 2.1. Fix $\varepsilon = 10^{-15}$, and let the points $x_1, \ldots, x_n$ be Chebyshev nodes on $[a, b]$. If $\delta = 1/n$, then the algorithm of §3 involves $O(n \log n)$ operations.

The proof of Corollary 2.1 is given in §4.5. The following corollary states that a similar result holds for uniformly random points.

Corollary 2.2. Fix $\varepsilon = 10^{-15}$, and suppose that $x_1, \ldots, x_n$ are sampled uniformly at random from $[a, b]$. If $\delta = 1/n$, then the algorithm of §3 involves $O(n \log n)$ operations with high probability.

The proof of Corollary 2.2 is immediate from standard probabilistic estimates. The following remark describes an adversarial configuration of points.

Remark 2.1. Fix $\varepsilon > 0$, and let $x_1, \ldots, x_{2n}$ be a collection of points such that $x_1, \ldots, x_n$ and $x_{n+1}, \ldots, x_{2n}$ are evenly spaced in $[0, 2^{-n}]$ and $[1 - 2^{-n}, 1]$, respectively, that is

$$x_j = 2^{-n} \left( \frac{j - 1}{n - 1} \right), \quad \text{and} \quad x_{n+j} = 1 + 2^{-n} \left( \frac{j - n}{n - 1} \right), \quad \text{for } j = 1, \ldots, n.$$

We claim that Theorem 2.1 cannot guarantee a complexity better than $O(n^2)$ for this configuration of points. Indeed, if $\delta \ge 2^{-n}$, then $N_\delta \ge n^2/2$, and if $\delta < 2^{-n}$, then $\log_2(\delta^{-1}) > n$. In either case

$$n \log(\delta^{-1}) + N_\delta = \Omega(n^2).$$

This complexity is indicative of the performance of the algorithm for this point configuration; the reason that the algorithm performs poorly is that structures exist at two different scales. If such a configuration were encountered in practice, it would be possible to modify the algorithm of §3 to also involve two different scales to achieve evaluation in $O(n \log n)$ operations.
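The quantity $N_\delta$ from (9) is easy to compute for a given set of sorted points, which gives a quick way to see the two-scale behaviour described in Remark 2.1. The sketch below is only illustrative: the function names are hypothetical, and $b - a$ is taken to be the distance between the first and last point, which is an assumption of this example.

```python
from bisect import bisect_left, bisect_right

def count_near_pairs(points, delta):
    """N_delta from (9): sum over j of #{i : |x_j - x_i| < delta*(b - a)}.

    `points` must be sorted in ascending order; (b - a) is taken to be
    points[-1] - points[0] (an assumption of this sketch).
    """
    radius = delta * (points[-1] - points[0])
    total = 0
    for xj in points:
        hi = bisect_left(points, xj + radius)    # number of x_i <  xj + radius
        lo = bisect_right(points, xj - radius)   # number of x_i <= xj - radius
        total += hi - lo
    return total

def two_scale_points(n):
    """The 2n points of Remark 2.1: two clusters of width 2^-n, at 0 and at 1."""
    scale = 2.0 ** (-n)
    left = [scale * (j - 1) / (n - 1) for j in range(1, n + 1)]
    right = [1.0 + scale * (j - n) / (n - 1) for j in range(1, n + 1)]
    return left + right

n = 10
pts = two_scale_points(n)
# delta slightly above 2^-n: every point sees its whole cluster.
n_delta_large = count_near_pairs(pts, 2.0 ** (1 - n))
# delta far below the spacing inside a cluster: every point sees only itself.
n_delta_small = count_near_pairs(pts, 2.0 ** (-2 * n))
```

With the larger value of `delta` the count is quadratic in the cluster size, while the smaller value keeps $N_\delta$ linear at the price of $\log(\delta^{-1})$ growing like $n$; this is the trade-off behind the $\Omega(n^2)$ bound in the remark.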

3. Algorithm

3.1. High level summary.

The algorithm involves passing over the points $x_1, \ldots, x_n$ twice. First, we pass over the points in ascending order and compute

$$\tilde{u}_j^+ \approx \sum_{i=1}^{j-1} \frac{\alpha_i}{x_i - x_j}, \quad \text{for } j = 1, \ldots, n, \tag{10}$$

and second, we pass over the points in descending order and compute

$$\tilde{u}_j^- \approx \sum_{i=j+1}^{n} \frac{\alpha_i}{x_i - x_j}, \quad \text{for } j = 1, \ldots, n. \tag{11}$$

Finally, we define $\tilde{u}_j := \tilde{u}_j^+ + \tilde{u}_j^-$ for $j = 1, \ldots, n$ such that

$$\tilde{u}_j \approx \sum_{i=1,\, i \neq j}^{n} \frac{\alpha_i}{x_i - x_j}, \quad \text{for } j = 1, \ldots, n.$$

We call the computation of $\tilde{u}_1^+, \ldots, \tilde{u}_n^+$ the forward pass of the algorithm, and the computation of $\tilde{u}_1^-, \ldots, \tilde{u}_n^-$ the backward pass of the algorithm. The forward pass of the algorithm computes the potential generated by all points to the left of a given point, while the backward pass of the algorithm computes the potential generated by all points to the right of a given point. In §3.2 and §3.3 we give an informal and a detailed description of the forward pass of the algorithm, respectively. The backward pass of the algorithm is identical except it considers the points in reverse order.

3.2. Informal description.

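In the following, we give an informal description of the forward pass of the algorithm that computes

$$\tilde{u}_j^+ \approx \sum_{i=1}^{j-1} \frac{\alpha_i}{x_i - x_j}, \quad \text{for } j = 1, \ldots, n.$$

Assume that we are given a small set of nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ such that

$$\frac{1}{r} \approx \sum_{i=1}^{m} w_i e^{-r t_i} \quad \text{for } r \in [\delta(b - a), b - a], \tag{12}$$

where $\delta > 0$ is given and fixed. The existence and computation of such nodes and weights is described in §4.4 and §5.2. We divide the sum defining $\tilde{u}_j^+$ into two parts:

$$\tilde{u}_j^+ \approx \sum_{i=1}^{j_0} \frac{\alpha_i}{x_i - x_j} + \sum_{i=j_0+1}^{j-1} \frac{\alpha_i}{x_i - x_j}, \tag{13}$$

where $j_0 = \max\{i : x_j - x_i > \delta(b - a)\}$. By definition, the points $x_1, \ldots, x_{j_0}$ are all distance at least $\delta(b - a)$ from $x_j$. Therefore, by (12)

$$\tilde{u}_j^+ \approx -\sum_{i=1}^{j_0} \sum_{k=1}^{m} w_k \alpha_i e^{-(x_j - x_i) t_k} + \sum_{i=j_0+1}^{j-1} \frac{\alpha_i}{x_i - x_j}.$$

If we define

$$g_k(j_0) = \sum_{i=1}^{j_0} \alpha_i e^{-(x_{j_0} - x_i) t_k}, \quad \text{for } k = 1, \ldots, m, \tag{14}$$

then it is straightforward to verify that

$$\tilde{u}_j^+ \approx -\sum_{k=1}^{m} w_k g_k(j_0) e^{-(x_j - x_{j_0}) t_k} + \sum_{i=j_0+1}^{j-1} \frac{\alpha_i}{x_i - x_j}. \tag{15}$$

Observe that we can update $g_k(j_0)$ to $g_k(j_0 + 1)$ using the following formula

$$g_k(j_0 + 1) = \alpha_{j_0+1} + e^{-(x_{j_0+1} - x_{j_0}) t_k} g_k(j_0), \quad \text{for } k = 1, \ldots, m. \tag{16}$$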

We can now summarize the algorithm for computing $\tilde{u}_1^+, \ldots, \tilde{u}_n^+$. For each $j$, we compute $\tilde{u}_j^+$ by the following three steps:

  1. Update $g_1, \ldots, g_m$ as necessary

  2. Use $g_1, \ldots, g_m$ to evaluate the potential from $x_i$ such that $x_j - x_i > \delta(b - a)$

  3. Directly evaluate the potential from $x_i$ such that $0 < x_j - x_i < \delta(b - a)$

By (16), each update of $g_1, \ldots, g_m$ requires $O(m)$ operations, and we must update $g_1, \ldots, g_m$ at most $n$ times, so we conclude that the total cost of the first step of the algorithm is $O(nm)$ operations. For each $j = 1, \ldots, n$, the second and third step of the algorithm involve $O(m)$ and $O(\#\{x_i : 0 < x_j - x_i < \delta(b - a)\})$ operations, respectively, see (15). It follows that the total cost of the second and third step of the algorithm is $O(nm + N_\delta)$ operations, where $N_\delta$ is defined in (9). We conclude that $\tilde{u}_1^+, \ldots, \tilde{u}_n^+$ can be computed in $O(nm + N_\delta)$ operations. In §4, we complete the proof of the computational complexity guarantees of Theorem 2.1 by showing that there exist $m = O(\log(\delta^{-1}) \log(\varepsilon^{-1}))$ nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ that satisfy (12), where $\varepsilon > 0$ is the approximation error in (12).
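The identities (15) and (16) used in this description follow directly from the definition (14) by factoring a common exponential out of each term. As a short worked check, the far-field part of (15) is

```latex
\sum_{i=1}^{j_0} \alpha_i e^{-(x_j - x_i) t_k}
  = e^{-(x_j - x_{j_0}) t_k} \sum_{i=1}^{j_0} \alpha_i e^{-(x_{j_0} - x_i) t_k}
  = e^{-(x_j - x_{j_0}) t_k}\, g_k(j_0),
```

and the update of $g_k$ when one more point is absorbed is

```latex
\begin{aligned}
g_k(j_0 + 1)
  &= \sum_{i=1}^{j_0+1} \alpha_i e^{-(x_{j_0+1} - x_i) t_k} \\
  &= \alpha_{j_0+1}
     + e^{-(x_{j_0+1} - x_{j_0}) t_k} \sum_{i=1}^{j_0} \alpha_i e^{-(x_{j_0} - x_i) t_k} \\
  &= \alpha_{j_0+1} + e^{-(x_{j_0+1} - x_{j_0}) t_k}\, g_k(j_0).
\end{aligned}
```

In both cases the only property used is $e^{-(s + s') t_k} = e^{-s t_k} e^{-s' t_k}$, which is the diagonal form (4) of the translation operator.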

3.3. Detailed description.

In the following, we give a detailed description of the forward pass of the algorithm that computes $\tilde{u}_1^+, \ldots, \tilde{u}_n^+$. Suppose that $\delta > 0$ and $\varepsilon > 0$ are given and fixed. We describe the algorithm under the assumption that we are given quadrature nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ such that

$$\left| \frac{1}{r} - \sum_{j=1}^{m} w_j e^{-r t_j} \right| \le \varepsilon \quad \text{for } r \in [\delta(b - a), b - a]. \tag{17}$$

The existence of such weights and nodes is established in §4.4, and the computation of such nodes and weights is discussed in §5.2. To simplify the description of the algorithm, we assume that $x_0 = -\infty$ is a placeholder node that does not generate a potential.

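Algorithm 3.1. Input: $x_1 < \cdots < x_n \in [a, b]$, $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$. Output: $\tilde{u}_1^+, \ldots, \tilde{u}_n^+$.

    j_0 = 0 and g_1 = ... = g_m = 0

    main loop:
    for j = 1, ..., n

        update g_1, ..., g_m and j_0:
        while x_j − x_{j_0+1} > δ(b − a)
            for i = 1, ..., m
                g_i = g_i e^{−(x_{j_0+1} − x_{j_0}) t_i} + α_{j_0+1}
            end for
            j_0 = j_0 + 1
        end while

        compute potential from x_i such that x_i ≤ x_{j_0}:
        ũ_j^+ = 0
        for i = 1, ..., m
            ũ_j^+ = ũ_j^+ − w_i g_i e^{−(x_j − x_{j_0}) t_i}
        end for

        compute potential from x_i such that x_{j_0+1} ≤ x_i ≤ x_{j−1}:
        for i = j_0 + 1, ..., j − 1
            ũ_j^+ = ũ_j^+ + α_i / (x_i − x_j)
        end for
    end for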

Remark 3.1. In some applications, it may be necessary to evaluate an expression of the form (1) for many different weights $\alpha_1, \ldots, \alpha_n$ associated with a fixed set of points $x_1, \ldots, x_n$. For example, in the projection application described in §1.3 the weights $\alpha_1, \ldots, \alpha_n$ correspond to a function that is being projected, while the points $x_1, \ldots, x_n$ are a fixed set of quadrature nodes. In such situations, pre-computing the exponentials $e^{-(x_j - x_{j_0}) t_i}$ used in Algorithm 3.1 will significantly improve the runtime, see §5.1.
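Algorithm 3.1 translates fairly directly into code. The following Python sketch of the forward pass, with the backward pass obtained by running the same routine on reflected data, is meant only as an illustration under stated assumptions: it is not the Fortran implementation used for the experiments in §5, the nodes and weights come from a simple trapezoid-rule construction rather than the generalized Gaussian quadratures of §5.2 (so $m$ is several times larger than necessary), $b - a$ is taken to be $x_n - x_1$, and all function names are hypothetical.

```python
import math
import random

def exp_sum_nodes(r_min, r_max, h=0.25):
    """Stand-in for the nodes/weights of (17): 1/r ~ sum_k w_k exp(-r t_k)
    on [r_min, r_max], from the trapezoid rule applied after t = e^s."""
    s_lo = math.log(1e-17 / r_max)
    s_hi = math.log(45.0 / r_min)
    count = int(math.ceil((s_hi - s_lo) / h)) + 1
    t = [math.exp(s_lo + k * h) for k in range(count)]
    return t, [h * tk for tk in t]

def forward_pass(x, alpha, t, w, cutoff):
    """Approximate sum_{i<j} alpha_i / (x_i - x_j) for ascending x (Algorithm 3.1)."""
    g = [0.0] * len(t)
    j0 = -1  # 0-based index of the last point absorbed into g; -1 plays the role of x_0
    out = []
    for j in range(len(x)):
        # Update g and j0: absorb every point farther than `cutoff` to the left of x[j].
        while j0 + 1 < j and x[j] - x[j0 + 1] > cutoff:
            if j0 >= 0:
                gap = x[j0 + 1] - x[j0]
                g = [gk * math.exp(-gap * tk) for gk, tk in zip(g, t)]
            g = [gk + alpha[j0 + 1] for gk in g]
            j0 += 1
        u = 0.0
        if j0 >= 0:
            # Far field: points x[0..j0], through the exponential sum, see (15).
            d = x[j] - x[j0]
            u -= sum(wk * gk * math.exp(-d * tk) for wk, gk, tk in zip(w, g, t))
        # Near field: points x[j0+1..j-1], evaluated directly.
        for i in range(j0 + 1, j):
            u += alpha[i] / (x[i] - x[j])
        out.append(u)
    return out

def potential_on_line(x, alpha, delta):
    """Forward pass plus backward pass; the latter reuses forward_pass on reflected data."""
    span = x[-1] - x[0]
    cutoff = delta * span
    t, w = exp_sum_nodes(cutoff, span)
    fwd = forward_pass(x, alpha, t, w, cutoff)
    x_ref = [-v for v in reversed(x)]
    a_ref = list(reversed(alpha))
    bwd = [-v for v in reversed(forward_pass(x_ref, a_ref, t, w, cutoff))]
    return [p + q for p, q in zip(fwd, bwd)]

def direct_sum(x, alpha):
    """Reference O(n^2) evaluation of (1)."""
    n = len(x)
    return [sum(alpha[i] / (x[i] - x[j]) for i in range(n) if i != j) for j in range(n)]

# Small comparison against direct evaluation, with the error scaled as in (8).
random.seed(0)
x = sorted(random.uniform(1.0, 10.0) for _ in range(300))
alpha = [random.uniform(0.0, 1.0) for _ in range(300)]
fast = potential_on_line(x, alpha, delta=0.05)
exact = direct_sum(x, alpha)
scaled_err = max(abs(p - q) for p, q in zip(fast, exact)) / sum(abs(a) for a in alpha)
```

The exponentials computed inside `forward_pass` depend only on the points and the nodes, not on `alpha`, so in the setting of Remark 3.1 they could be tabulated once and reused for every new set of weights.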

4. Proof of Main Result

4.1. Organization.

In this section we complete the proof of Theorem 2.1; the section is organized as follows. In §4.2 we give mathematical preliminaries. In §4.3 we state and prove two technical lemmas. In §4.4 we prove Lemma 4.5, which together with the analysis in §3 establishes Theorem 2.1. In §4.5 we prove Corollary 2.1 and Corollary 2.2.

4.2. Preliminaries.

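Let $a < b \in \mathbb{R}$ and $n \in \mathbb{Z}_{>0}$ be fixed, and suppose that $f : [a, b] \to \mathbb{R}$, and $x_1 < \cdots < x_n \in [a, b]$ are given. The interpolating polynomial $P$ of the function $f$ at $x_1, \ldots, x_n$ is the unique polynomial of degree at most $n - 1$ such that

$$P(x_j) = f(x_j), \quad \text{for } j = 1, \ldots, n.$$

This interpolating polynomial $P$ can be explicitly defined by

$$P(x) = \sum_{j=1}^{n} f(x_j) q_j(x), \tag{18}$$

where $q_j$ is the nodal polynomial for $x_j$, that is,

$$q_j(x) = \prod_{k=1,\, k \neq j}^{n} \frac{x - x_k}{x_j - x_k}. \tag{19}$$

We say $x_1, \ldots, x_n$ are Chebyshev nodes for the interval $[a, b]$ if

$$x_j = \frac{b + a}{2} + \frac{b - a}{2} \cos\left( \pi \frac{j - \frac{1}{2}}{n} \right), \quad \text{for } j = 1, \ldots, n. \tag{20}$$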

The following lemma is a classical result in approximation theory. It says that a smooth function on a compact interval is accurately approximated by the interpolating polynomial of the function at Chebyshev nodes, see for example §4.5.2 of Dahlquist and Björck [5].

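Lemma 4.1. Let $f \in C^n([a, b])$, and $x_1, \ldots, x_n$ be Chebyshev nodes for $[a, b]$. If $P$ is the interpolating polynomial for $f$ at $x_1, \ldots, x_n$, then

$$\sup_{x \in [a, b]} |f(x) - P(x)| \le \frac{2M}{n!} \left( \frac{b - a}{4} \right)^n,$$

where

$$M = \sup_{x \in [a, b]} |f^{(n)}(x)|.$$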

In addition to Lemma 4.1, we require a result about the existence of generalized Gaussian quadratures for Chebyshev systems. In 1866, Gauss [8] established the existence of quadrature nodes $x_1, \ldots, x_n$ and weights $w_1, \ldots, w_n$ for an interval $[a, b]$ such that

$$\int_a^b f(x)\, dx = \sum_{j=1}^{n} w_j f(x_j),$$

whenever $f(x)$ is a polynomial of degree at most $2n - 1$. This result was generalized from polynomials to Chebyshev systems by Kreĭn [13]. A collection of functions $f_0, \ldots, f_n$ on $[a, b]$ is a Chebyshev system if every nonzero generalized polynomial

$$g(t) = a_0 f_0(t) + \cdots + a_n f_n(t), \quad \text{for } a_0, \ldots, a_n \in \mathbb{R},$$

has at most $n$ distinct zeros in $[a, b]$. The following result of Kreĭn says that any function in the span of a Chebyshev system of $2n$ functions can be integrated exactly by a quadrature with $n$ nodes and $n$ weights.

Lemma 4.2 (Kreĭn [13]). Let $f_0, \ldots, f_{2n-1}$ be a Chebyshev system of continuous functions on $[a, b]$, and $w : (a, b) \to \mathbb{R}$ be a continuous positive weight function. Then, there exist unique nodes $x_1, \ldots, x_n$ and weights $w_1, \ldots, w_n$ such that

$$\int_a^b f(x) w(x)\, dx = \sum_{j=1}^{n} w_j f(x_j),$$

whenever $f$ is in the span of $f_0, \ldots, f_{2n-1}$.

4.3. Technical Lemmas.

In this section, we state and prove two technical lemmas that are involved in the proof of Theorem 2.1. We remark that a similar version of Lemma 4.3 appears in [18].

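Lemma 4.3. Fix $a > 0$ and $t \in [0, \infty)$, and let $r_1, \ldots, r_n$ be Chebyshev nodes for $[a, 2a]$. If $P_t(r)$ is the interpolating polynomial for $e^{-rt}$ at $r_1, \ldots, r_n$, then

$$\sup_{r \in [a, 2a]} |e^{-rt} - P_t(r)| \le \frac{1}{4^n}.$$

Proof. We have

$$\sup_{r \in [a, 2a]} \left| \frac{\partial^n}{\partial r^n} e^{-rt} \right| = \sup_{r \in [a, 2a]} |t^n e^{-rt}| = t^n e^{-ta}.$$

By writing the derivative of $t^n e^{-ta}$ as

$$\frac{d}{dt} t^n e^{-ta} = \left( \frac{n}{a} - t \right) a t^{n-1} e^{-at},$$

we can deduce that the maximum of $t^n e^{-ta}$ occurs at $t = n/a$, that is,

$$\sup_{t \in [0, \infty)} t^n e^{-ta} = \left( \frac{n}{a} \right)^n e^{-a(n/a)}. \tag{21}$$

By (21) and the result of Lemma 4.1, we conclude that

$$\sup_{r \in [a, 2a]} |e^{-rt} - P_t(r)| \le \frac{2 \left( \frac{n}{a} \right)^n e^{-a(n/a)}}{n!} \left( \frac{a}{4} \right)^n = \frac{2 n^n e^{-n}}{n!} \frac{1}{4^n}.$$

It remains to show that $2 n^n e^{-n} \le n!$. Since $\ln(x)$ is an increasing function, we have

$$n \ln n - n + 1 = \int_1^n \ln(x)\, dx \le \int_1^n \sum_{j=1}^{n-1} \chi_{[j, j+1]}(x) \ln(j + 1)\, dx = \sum_{j=1}^{n} \ln(j).$$

Exponentiating both sides of this inequality gives $e\, n^n e^{-n} \le n!$, which is a classical inequality related to Stirling's approximation. Since $e > 2$, this completes the proof. □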
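The bound of Lemma 4.3 can be checked numerically on a sample grid. The sketch below evaluates the interpolating polynomial through the nodal polynomials (19) at the Chebyshev nodes (20) on $[a, 2a]$ and compares the sampled error with $4^{-n}$ for several values of $t$. It is only an illustration: the helper names, the choice $a = 1.5$, $n = 8$, and the sampling density are assumptions of this example, and a sampled maximum is of course not a proof.

```python
import math

def chebyshev_nodes(a, b, n):
    """Chebyshev nodes (20) on [a, b]."""
    return [(b + a) / 2 + (b - a) / 2 * math.cos(math.pi * (j - 0.5) / n)
            for j in range(1, n + 1)]

def interpolate(nodes, values, x):
    """Evaluate the interpolating polynomial (18) using the nodal polynomials (19)."""
    total = 0.0
    for j, xj in enumerate(nodes):
        q = 1.0
        for k, xk in enumerate(nodes):
            if k != j:
                q *= (x - xk) / (xj - xk)
        total += values[j] * q
    return total

def max_interp_error(a, t, n, samples=400):
    """Largest sampled |exp(-r t) - P_t(r)| for r in [a, 2a]."""
    nodes = chebyshev_nodes(a, 2 * a, n)
    values = [math.exp(-r * t) for r in nodes]
    worst = 0.0
    for s in range(samples + 1):
        r = a + a * s / samples
        worst = max(worst, abs(math.exp(-r * t) - interpolate(nodes, values, r)))
    return worst

n, a = 8, 1.5
# t = n / a is the maximizer found in (21), so it is the most demanding case.
errors = {t: max_interp_error(a, t, n) for t in (0.0, 0.5, 2.0, n / a, 10.0, 50.0)}
bound = 4.0 ** (-n)
```

The important feature is that the bound is uniform in $t \in [0, \infty)$: small $t$ gives a nearly constant function, large $t$ gives a function that is already negligible on $[a, 2a]$, and the intermediate case $t = n/a$ is controlled by (21).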

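Lemma 4.4. Suppose that $\varepsilon > 0$ and $M > 1$ are given. Then, there exist

$$m = O(\log(M) \log(\varepsilon^{-1}))$$

values $r_1, \ldots, r_m \in [1, M]$ such that for all $r \in [1, M]$ we have

$$\sup_{t \in [0, \infty)} \left| e^{-rt} - \sum_{j=1}^{m} c_j(r) e^{-r_j t} \right| \le \varepsilon, \tag{22}$$

for some choice of coefficients $c_j(r)$ that depend on $r$.

Proof. We construct an explicit set of $m := (\lfloor \log_2 M \rfloor + 1)(\lfloor \log_4 \varepsilon^{-1} \rfloor + 1)$ points and coefficients such that (22) holds. Set $n := \lfloor \log_4 \varepsilon^{-1} \rfloor + 1$. We define the points $r_1, \ldots, r_m$ by

$$r_{in+k} := 2^{i-1} \left( 3 + \cos\left( \pi \frac{k - \frac{1}{2}}{n} \right) \right), \tag{23}$$

for $k = 1, \ldots, n$ and $i = 0, \ldots, \lfloor \log_2 M \rfloor$, and define the coefficients $c_1(r), \ldots, c_m(r)$ by

$$c_{in+k}(r) := \chi_{[2^i, 2^{i+1})}(r) \prod_{l=1,\, l \neq k}^{n} \frac{r - r_{in+l}}{r_{in+k} - r_{in+l}}, \tag{24}$$

for $k = 1, \ldots, n$ and $i = 0, \ldots, \lfloor \log_2 M \rfloor$. We claim that

$$\sup_{r \in [1, M]} \sup_{t \in [0, \infty)} \left| e^{-rt} - \sum_{j=1}^{m} c_j(r) e^{-r_j t} \right| \le \varepsilon.$$

Indeed, fix $r \in [1, M]$, and let $i_0 \in \{0, \ldots, \lfloor \log_2 M \rfloor\}$ be the unique integer such that $r \in [2^{i_0}, 2^{i_0+1})$. By the definition of the coefficients, see (24), we have

$$\sum_{j=1}^{m} c_j(r) e^{-r_j t} = \sum_{k=1}^{n} e^{-r_{i_0 n + k} t} \prod_{l=1,\, l \neq k}^{n} \frac{r - r_{i_0 n + l}}{r_{i_0 n + k} - r_{i_0 n + l}}.$$

We claim that the right hand side of this equation is the interpolating polynomial $P_{t, i_0}(r)$ for $e^{-rt}$ at $r_{i_0 n + 1}, \ldots, r_{(i_0 + 1) n}$, that is,

$$\sum_{k=1}^{n} e^{-r_{i_0 n + k} t} \prod_{l=1,\, l \neq k}^{n} \frac{r - r_{i_0 n + l}}{r_{i_0 n + k} - r_{i_0 n + l}} = P_{t, i_0}(r).$$

Indeed, see (18) and (19). Since the points $r_{i_0 n + 1}, \ldots, r_{(i_0 + 1) n}$ are Chebyshev nodes for the interval $[2^{i_0}, 2^{i_0 + 1}]$, and since $i_0$ was chosen such that $r \in [2^{i_0}, 2^{i_0 + 1})$, it follows from Lemma 4.3 that

$$|e^{-rt} - P_{t, i_0}(r)| \le \frac{1}{4^n} \quad \text{for } t \in [0, \infty).$$

Since $n = \lfloor \log_4 \varepsilon^{-1} \rfloor + 1$ the proof is complete. □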

Remark 4.1. The proof of Lemma 4.4 has the additional consequence that the coefficients $c_1(r), \ldots, c_m(r)$ in (22) can be chosen such that they satisfy

$$|c_j(r)| \le 2 \quad \text{for } j = 1, \ldots, m.$$

Indeed, in (24) the coefficients $c_j(r)$ are either equal to zero or equal to the nodal polynomial, see (19), for Chebyshev nodes on an interval that contains $r$. The nodal polynomials for Chebyshev nodes on an interval $[a, b]$ are bounded by 2 on $[a, b]$, see for example [18]. The fact that $e^{-rt}$ can be approximated as a linear combination of functions $e^{-r_1 t}, \ldots, e^{-r_m t}$ with small coefficients means that the approximation of Lemma 4.4 can be used in finite precision environments without any unexpected catastrophic cancellation.

4.4. Completing the proof of Theorem 2.1.

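Previously in §3.2, we proved that the algorithm of §3 involves $O(nm + N_\delta)$ operations. To complete the proof of Theorem 2.1 it remains to show that there exist

$$m = O(\log(\varepsilon^{-1}) \log(\delta^{-1}))$$

points $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ that satisfy (17); we show the existence of such nodes and weights in the following lemma, and thus complete the proof of Theorem 2.1. The computation of such nodes and weights is described in §5.2.

Lemma 4.5. Fix $a < b \in \mathbb{R}$, and let $\delta > 0$ and $\varepsilon > 0$ be given. Then, there exist $m = O(\log(\varepsilon^{-1}) \log(\delta^{-1}))$ nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ such that

$$\left| \frac{1}{r} - \sum_{j=1}^{m} w_j e^{-r t_j} \right| \le \varepsilon, \quad \text{for } r \in [\delta(b - a), b - a]. \tag{25}$$

Proof. Fix $a < b \in \mathbb{R}$, and let $\delta, \varepsilon > 0$ be given. By the possibility of rescaling $r$, $w_j$, and $t_j$, we may assume that $b - a = \delta^{-1}$ such that we want to establish (25) for $r \in [1, \delta^{-1}]$. By Lemma 4.4 we can choose $2m = O(\log(\varepsilon^{-1}) \log(\delta^{-1}))$ points $r_0, \ldots, r_{2m-1} \in [1, \delta^{-1}]$, and coefficients $c_0(r), \ldots, c_{2m-1}(r)$ depending on $r$ such that

$$\sup_{r \in [1, \delta^{-1}]} \sup_{t \in [0, \infty)} \left| e^{-rt} - \sum_{j=0}^{2m-1} c_j(r) e^{-r_j t} \right| \le \frac{\varepsilon}{2 \log(2 \varepsilon^{-1})}. \tag{26}$$

The collection of functions $e^{-r_0 t}, \ldots, e^{-r_{2m-1} t}$ forms a Chebyshev system of continuous functions on the interval $[0, \log(2 \varepsilon^{-1})]$, see for example [12]. Thus, by Lemma 4.2 there exist $m$ quadrature nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ such that

$$\int_0^{\log(2 \varepsilon^{-1})} f(t)\, dt = \sum_{j=1}^{m} w_j f(t_j),$$

whenever $f(t)$ is in the span of $e^{-r_0 t}, \ldots, e^{-r_{2m-1} t}$. By the triangle inequality

$$\left| \frac{1}{r} - \sum_{j=1}^{m} w_j e^{-r t_j} \right| \le \left| \frac{1}{r} - \int_0^{\log(2 \varepsilon^{-1})} e^{-rt}\, dt \right| + \left| \int_0^{\log(2 \varepsilon^{-1})} e^{-rt}\, dt - \sum_{j=1}^{m} w_j e^{-r t_j} \right|. \tag{27}$$

Recall that we have assumed $r \in [1, \delta^{-1}]$, in particular, $r \ge 1$ so it follows that

$$\left| \frac{1}{r} - \int_0^{\log(2 \varepsilon^{-1})} e^{-rt}\, dt \right| \le \frac{\varepsilon}{2}. \tag{28}$$

By (26), the function $e^{-rt}$ can be approximated to error $\varepsilon / (2 \log(2 \varepsilon^{-1}))$ in the $L^\infty$-norm on $[0, \log(2 \varepsilon^{-1})]$ by functions in the span of $e^{-r_0 t}, \ldots, e^{-r_{2m-1} t}$. Since our quadrature is exact for these functions, we conclude that

$$\left| \int_0^{\log(2 \varepsilon^{-1})} e^{-rt}\, dt - \sum_{j=1}^{m} w_j e^{-r t_j} \right| \le \frac{\varepsilon}{2}. \tag{29}$$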

Combining (27), (28), and (29) completes the proof. □
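The truncation bound (28) used in this proof can be verified in one line. Writing $T := \log(2 \varepsilon^{-1})$ (a shorthand introduced here only for this check) and using $r \ge 1$, the discarded tail of the integral is

```latex
\left| \frac{1}{r} - \int_0^{T} e^{-rt}\, dt \right|
  = \int_T^{\infty} e^{-rt}\, dt
  = \frac{e^{-rT}}{r}
  \le e^{-T}
  = \frac{\varepsilon}{2}.
```

This is why the quadrature only needs to be constructed on the finite interval $[0, \log(2 \varepsilon^{-1})]$, whose length grows only logarithmically in $\varepsilon^{-1}$.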

4.5. Proof of Corollary 2.1.

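In this section, we prove Corollary 2.1, which states that the algorithm of §3 involves $O(n \log n)$ operations when $x_1, \ldots, x_n$ are Chebyshev nodes, $\varepsilon = 10^{-15}$, and $\delta = 1/n$.

Proof of Corollary 2.1. By rescaling the problem we may assume that $[a, b] = [-1, 1]$ such that the Chebyshev nodes $x_1, \ldots, x_n$ are given by

$$x_j = \cos\left( \pi \frac{j - \frac{1}{2}}{n} \right), \quad \text{for } j = 1, \ldots, n.$$

By the result of Theorem 2.1, it suffices to show that $N_\delta = O(n \log n)$, where

$$N_\delta := \sum_{j=1}^{n} \#\left\{ x_i : |x_j - x_i| < \frac{1}{n} \right\}.$$

It is straightforward to verify that the number of Chebyshev nodes within an interval of radius $1/n$ around the point $-1 < x < 1$ is $O\big(1/\sqrt{1 - x^2}\big)$, that is,

$$\#\left\{ x_i : |x - x_i| < \frac{1}{n} \right\} = O\left( \frac{1}{\sqrt{1 - x^2}} \right), \quad \text{for } -1 < x < 1.$$

This estimate, together with the fact that the first and last Chebyshev node are distance at least $1/n^2$ from $1$ and $-1$, respectively, gives the estimate

$$\sum_{j=1}^{n} \#\left\{ x_i : |x_j - x_i| < \frac{1}{n} \right\} = O\left( \int_{1/n^2}^{\pi - 1/n^2} \frac{n}{\sqrt{1 - \cos(t)^2}}\, dt \right). \tag{30}$$

Let $\pi/2 > \eta > 0$ be a fixed parameter; direct calculation yields

$$\int_{\eta}^{\pi - \eta} \frac{1}{\sqrt{1 - \cos(t)^2}}\, dt = 2 \log\left( \cot\left( \frac{\eta}{2} \right) \right) = O(\log(\eta^{-1})).$$

Combining this estimate with (30) yields $N_\delta = O(n \log n)$ as was to be shown. □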
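The growth rate established in this proof can also be observed empirically. The following sketch counts, for Chebyshev nodes on $[-1, 1]$, the pairs within distance $1/n$ of each other (the quantity bounded in (30)) and divides by $n \log n$. The function names and the chosen values of $n$ are assumptions of this example, and a bounded ratio for a few values of $n$ is only a sanity check, not a substitute for the argument above.

```python
import math
from bisect import bisect_left, bisect_right

def chebyshev_nodes(n):
    """Chebyshev nodes on [-1, 1], returned in ascending order."""
    return sorted(math.cos(math.pi * (j - 0.5) / n) for j in range(1, n + 1))

def near_count(points, radius):
    """Sum over j of #{i : |x_j - x_i| < radius} for sorted points."""
    return sum(bisect_left(points, x + radius) - bisect_right(points, x - radius)
               for x in points)

# Ratio N_delta / (n log n) for radius 1/n, as in the proof of Corollary 2.1.
ratios = {}
for n in (250, 1000, 4000):
    pts = chebyshev_nodes(n)
    ratios[n] = near_count(pts, 1.0 / n) / (n * math.log(n))
```

The nodes cluster near the endpoints, so the points closest to $\pm 1$ have on the order of $\sqrt{n}$ neighbours within distance $1/n$; the logarithmic factor in $N_\delta$ comes from summing these endpoint contributions.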

5. Numerical results and implementation details

5.1. Numerical results.

We report numerical results for two different point distributions: uniformly random points in $[1, 10]$, and Chebyshev nodes in $[-1, 1]$. In both cases, we choose the weights $\alpha_1, \ldots, \alpha_n$ uniformly at random from $[0, 1]$, and test the algorithm for

$$n = 1000 \times 2^k \text{ points}, \quad \text{for } k = 0, \ldots, 10.$$

We time two different versions of the algorithm: a standard implementation, and an implementation that uses precomputed exponentials. Precomputing exponentials may be advantageous in situations where the expression

$$u_j = \sum_{i=1,\, i \neq j}^{n} \frac{\alpha_i}{x_i - x_j}, \quad \text{for } j = 1, \ldots, n, \tag{31}$$

must be evaluated for many different weights $\alpha_1, \ldots, \alpha_n$ associated with a fixed set of points $x_1, \ldots, x_n$, see Remark 3.1. We find that using precomputed exponentials makes the algorithm approximately ten times faster, see Tables 1, 2, and 3. In addition to reporting timings, we report the absolute relative difference between the output of the algorithm of §3 and the output of direct evaluation; we define the absolute relative difference $\epsilon_r$ between the output $\tilde{u}_j$ of the algorithm of §3 and the output $u_j^d$ of direct calculation by

$$\epsilon_r := \sup_{j = 1, \ldots, n} \frac{|\tilde{u}_j - u_j^d|}{\bar{u}_j}, \quad \text{where} \quad \bar{u}_j := \sum_{i=1,\, i \neq j}^{n} \left| \frac{\alpha_i}{x_i - x_j} \right|. \tag{32}$$

Table 1.

Key for column labels of Tables 2, 3, and 4.

Label Definition
n number of points
tw time of algorithm of §3 without precomputation in seconds
tp time of precomputing exponentials for algorithm of §3 in seconds
tu time of algorithm of §3 using precomputed exponentials in seconds
td time of direct evaluation in seconds
ϵr maximum absolute relative difference defined in (32)
tf time of FFT using precomputed exponentials (for time comparison only)

Table 2.

Numerical results for uniformly random points in [1, 10].

n          tw          tp          tu          td          ϵr
1000       0.74E-03    0.18E-02    0.93E-04    0.66E-03    0.19E-14
2000       0.19E-02    0.31E-02    0.19E-03    0.25E-02    0.30E-14
4000       0.42E-02    0.61E-02    0.43E-03    0.10E-01    0.52E-14
8000       0.85E-02    0.10E-01    0.89E-03    0.37E-01    0.72E-14
16000      0.18E-01    0.25E-01    0.18E-02    0.14E+00    0.92E-14
32000      0.38E-01    0.49E-01    0.37E-02    0.59E+00    0.19E-13
64000      0.84E-01    0.98E-01    0.78E-02    0.23E+01    0.21E-13
128000     0.16E+00    0.19E+00    0.18E-01    0.95E+01    0.35E-13
256000     0.37E+00    0.53E+00    0.34E-01    0.40E+02    0.59E-13
512000     0.75E+00    0.10E+01    0.71E-01    0.19E+03    0.88E-13
1024000    0.17E+01    0.23E+01    0.15E+00    0.81E+03    0.14E-12

Table 3.

Numerical results for Chebyshev nodes on [−1, 1].

n          tw          tp          tu          td          ϵr
1000       0.54E-03    0.12E-02    0.74E-04    0.60E-03    0.11E-14
2000       0.15E-02    0.26E-02    0.15E-03    0.24E-02    0.14E-14
4000       0.38E-02    0.51E-02    0.37E-03    0.99E-02    0.39E-14
8000       0.83E-02    0.10E-01    0.85E-03    0.38E-01    0.35E-14
16000      0.19E-01    0.23E-01    0.17E-02    0.14E+00    0.58E-14
32000      0.41E-01    0.48E-01    0.37E-02    0.62E+00    0.89E-14
64000      0.98E-01    0.90E-01    0.82E-02    0.24E+01    0.12E-13
128000     0.22E+00    0.19E+00    0.23E-01    0.10E+02    0.19E-13
256000     0.44E+00    0.47E+00    0.32E-01    0.40E+02    0.26E-13
512000     0.84E+00    0.94E+00    0.73E-01    0.19E+03    0.52E-13
1024000    0.19E+01    0.19E+01    0.14E+00    0.84E+03    0.64E-13

Dividing by $\bar{u}_j$ accounts for the fact that the calculations are performed in finite precision; any remaining loss of accuracy in the numerical results is a consequence of the large number of addition and multiplication operations that are performed. All calculations are performed in double precision, and the algorithm of §3 is run with $\varepsilon = 10^{-15}$. The parameter $\delta > 0$ is set via an empirically determined heuristic. The numerical experiments were performed on a laptop with an Intel Core i5-8350U CPU and 7.7 GiB of memory; the code was written in Fortran and compiled with gfortran with standard optimization flags. The results are reported in Tables 1, 2, and 3.

To put the run time of the algorithm in context, we additionally perform a time comparison to the Fast Fourier Transform (FFT), which also has complexity $O(n \log n)$. Specifically, we compare the run time of the algorithm of §3 on random data using precomputed exponentials with the run time of an FFT implementation from FFTPACK [20] on random data of the same length using precomputed exponentials. We report these timings in Table 4; we find that the FFT is roughly 5-10 times faster than our implementation of the algorithm of §3. We remark that no significant effort was made to optimize our implementation, and that it may be possible to improve the run time by vectorization.

Table 4.

Time comparison with FFT.

n          tu          tf
1000       0.91E-04    0.16E-04
2000       0.28E-03    0.37E-04
4000       0.41E-03    0.44E-04
8000       0.93E-03    0.85E-04
16000      0.18E-02    0.24E-03
32000      0.38E-02    0.41E-03
64000      0.81E-02    0.88E-03
128000     0.18E-01    0.19E-02
256000     0.38E-01    0.59E-02
512000     0.71E-01    0.12E-01
1024000    0.14E+00    0.25E-01

5.2. Computing nodes and weights.

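The algorithm of §3 is described under the assumption that nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ are given such that

$$\left| \frac{1}{r} - \sum_{j=1}^{m} w_j e^{-r t_j} \right| \le \varepsilon \quad \text{for } r \in [\delta(b - a), b - a], \tag{33}$$

where $\varepsilon > 0$ and $\delta > 0$ are fixed parameters. As in the proof of Lemma 4.5 we note that by rescaling $r$ it suffices to find nodes and weights satisfying

$$\left| \frac{1}{r} - \sum_{j=1}^{m} w_j e^{-r t_j} \right| \le \varepsilon \quad \text{for } r \in [1, \delta^{-1}]. \tag{34}$$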

Indeed, if the nodes $t_1, \ldots, t_m$ and weights $w_1, \ldots, w_m$ satisfy (34), then the nodes $t_1/c, \ldots, t_m/c$ and weights $w_1/c, \ldots, w_m/c$ with $c = \delta(b - a)$ satisfy (33) with $\varepsilon$ replaced by $\varepsilon/c$, since the substitution $r \mapsto r/c$ maps $[\delta(b - a), b - a]$ onto $[1, \delta^{-1}]$. Thus, in order to implement the algorithm of §3 it suffices to tabulate nodes and weights that are valid for $r \in [1, M]$ for various values of $M$. In the implementation used in the numerical experiments in this paper, we tabulated nodes and weights valid for $r \in [1, M]$ for

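$$M = 4^k \quad \text{for } k = 1, \ldots, 10.$$

For example, in Tables 5 and 6 we have listed $m = 33$ nodes $t_1, \ldots, t_{33}$ and weights $w_1, \ldots, w_{33}$ such that

$$\left| \frac{1}{r} - \sum_{j=1}^{33} w_j e^{-r t_j} \right| \le 10^{-15},$$

for all $r \in [1, 1024]$.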

Table 5.

A list of 33 nodes t1, … , t33.

0.2273983006898589D−03, 0.1206524521003404D−02, 0.3003171636661616D−02,
0.5681878572654425D−02, 0.9344657316017281D−02, 0.1414265501822061D−01,
0.2029260691940998D−01, 0.2809891134697047D−01, 0.3798133147119762D−01,
0.5050795277167632D−01, 0.6643372693847560D−01, 0.8674681067847460D−01,
0.1127269233505314D+00, 0.1460210820252656D+00, 0.1887424688689547D+00,
0.2435986924712581D+00, 0.3140569015209982D+00, 0.4045552087678740D+00,
0.5207726670656921D+00, 0.6699737362118449D+00, 0.8614482005965975D+00,
0.1107074709906516D+01, 0.1422047253849542D+01, 0.1825822499573290D+01,
0.2343379511131976D+01, 0.3006948272874077D+01, 0.3858496861353812D+01,
0.4953559345813267D+01, 0.6367677940017810D+01, 0.8208553424367139D+01,
0.1064261195532074D+02, 0.1396688222191633D+02, 0.1889449184151398D+02

Table 6.

A list of 33 weights w1, …, w33.

0.5845245927410881D−03, 0.1379782337905140D−02, 0.2224121503815854D−02,
0.3150105276431181D−02, 0.4200370923383030D−02, 0.5431379037435571D−02,
0.6918794756934398D−02, 0.8763225538492927D−02, 0.1109565843047196D−01,
0.1408264766413004D−01, 0.1793263393523491D−01, 0.2290557147478609D−01,
0.2932752351846237D−01, 0.3761087060298772D−01, 0.4828044150885936D−01,
0.6200636888239893D−01, 0.7964527252809662D−01, 0.1022921587521237D+00,
0.1313462348178323D+00, 0.1685948994092301D+00, 0.2163218289369589D+00,
0.2774479391081561D+00, 0.3557192797195578D+00, 0.4559662159666857D+00,
0.5844792718191478D+00, 0.7495918095861060D+00, 0.9626599456939077D+00,
0.1239869481076760D+01, 0.1605927580173348D+01, 0.2102583514906888D+01,
0.2811829220697454D+01, 0.3937959064316012D+01, 0.6294697335695096D+01
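The tabulated rule can be exercised directly. The sketch below transcribes the values of Tables 5 and 6 into Python literals, measures the largest sampled deviation of the exponential sum from $1/r$ on $[1, 1024]$, and then rescales the rule to another interval. It is an illustration under stated assumptions: the function names, the log-spaced sampling grid, and the example interval with $b - a = 5$ are hypothetical, and the rescaling uses the factor $c = \delta(b - a)$, which maps $[1, \delta^{-1}]$ onto $[\delta(b - a), b - a]$ and scales the absolute error by $1/c$.

```python
import math

# Nodes t_1..t_33 and weights w_1..w_33 transcribed from Tables 5 and 6.
NODES = [
    0.2273983006898589e-03, 0.1206524521003404e-02, 0.3003171636661616e-02,
    0.5681878572654425e-02, 0.9344657316017281e-02, 0.1414265501822061e-01,
    0.2029260691940998e-01, 0.2809891134697047e-01, 0.3798133147119762e-01,
    0.5050795277167632e-01, 0.6643372693847560e-01, 0.8674681067847460e-01,
    0.1127269233505314e+00, 0.1460210820252656e+00, 0.1887424688689547e+00,
    0.2435986924712581e+00, 0.3140569015209982e+00, 0.4045552087678740e+00,
    0.5207726670656921e+00, 0.6699737362118449e+00, 0.8614482005965975e+00,
    0.1107074709906516e+01, 0.1422047253849542e+01, 0.1825822499573290e+01,
    0.2343379511131976e+01, 0.3006948272874077e+01, 0.3858496861353812e+01,
    0.4953559345813267e+01, 0.6367677940017810e+01, 0.8208553424367139e+01,
    0.1064261195532074e+02, 0.1396688222191633e+02, 0.1889449184151398e+02,
]
WEIGHTS = [
    0.5845245927410881e-03, 0.1379782337905140e-02, 0.2224121503815854e-02,
    0.3150105276431181e-02, 0.4200370923383030e-02, 0.5431379037435571e-02,
    0.6918794756934398e-02, 0.8763225538492927e-02, 0.1109565843047196e-01,
    0.1408264766413004e-01, 0.1793263393523491e-01, 0.2290557147478609e-01,
    0.2932752351846237e-01, 0.3761087060298772e-01, 0.4828044150885936e-01,
    0.6200636888239893e-01, 0.7964527252809662e-01, 0.1022921587521237e+00,
    0.1313462348178323e+00, 0.1685948994092301e+00, 0.2163218289369589e+00,
    0.2774479391081561e+00, 0.3557192797195578e+00, 0.4559662159666857e+00,
    0.5844792718191478e+00, 0.7495918095861060e+00, 0.9626599456939077e+00,
    0.1239869481076760e+01, 0.1605927580173348e+01, 0.2102583514906888e+01,
    0.2811829220697454e+01, 0.3937959064316012e+01, 0.6294697335695096e+01,
]

def exp_sum(r, nodes, weights):
    """Evaluate sum_j w_j * exp(-r * t_j)."""
    return sum(w * math.exp(-r * t) for t, w in zip(nodes, weights))

def max_abs_error(nodes, weights, r_lo, r_hi, samples=2000):
    """Largest sampled |1/r - sum_j w_j exp(-r t_j)| on a log-spaced grid."""
    worst = 0.0
    for k in range(samples + 1):
        r = r_lo * (r_hi / r_lo) ** (k / samples)
        worst = max(worst, abs(1.0 / r - exp_sum(r, nodes, weights)))
    return worst

def rescale(nodes, weights, c):
    """Map a rule valid on [1, M] to one valid on [c, c*M]; the error scales by 1/c."""
    return [t / c for t in nodes], [w / c for w in weights]

err_unit = max_abs_error(NODES, WEIGHTS, 1.0, 1024.0)

# Hypothetical example: b - a = 5 and delta = 1/1024, so c = delta * (b - a).
c = 5.0 / 1024.0
nodes_c, weights_c = rescale(NODES, WEIGHTS, c)
err_scaled = max_abs_error(nodes_c, weights_c, c, 5.0)
```

Because the rule is tabulated once for the normalized interval $[1, M]$, adapting it to a particular problem costs only $O(m)$ divisions.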

The nodes and weights satisfying (34) can be computed by using a procedure for generating generalized Gaussian quadratures for Chebyshev systems together with the proof of Lemma 4.5. Indeed, the proof of Lemma 4.5 is constructive with the exception of the step that invokes Lemma 4.2 of Kreĭn. The procedure described in [4] is a constructive version of Lemma 4.2: given a Chebyshev system of functions, it generates the corresponding quadrature nodes and weights. We remark that generalized Gaussian quadrature generation codes are powerful tools for numerical computation with a wide range of applications. The quadrature generation code used in this paper was an optimized version of [4] recently developed by Serkh for [19].

Acknowledgements.

The authors would like to thank Jeremy Hoskins for many useful discussions. Certain commercial equipment is identified in this paper to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that equipment identified is necessarily the best available for the purpose.

N.F.M. was supported in part by NSF DMS-1903015.

V.R. was supported in part by AFOSR FA9550-16-1-0175 and ONR N00014-14-1-0797.

Contributor Information

ZYDRUNAS GIMBUTAS, National Institute of Standards and Technology, Boulder, CO 80305, USA.

NICHOLAS F. MARSHALL, Department of Mathematics, Princeton University, Princeton, NJ 08540, USA

VLADIMIR ROKHLIN, Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA.

References

  • [1] Beylkin Gregory and Monzón Lucas, Approximation by exponential sums revisited, Appl. Comput. Harmon. Anal. 28 (2010), no. 2, 131–149. MR2595881
  • [2] Beylkin Gregory and Monzón Lucas, On approximation of functions by exponential sums, Appl. Comput. Harmon. Anal. 19 (2005), no. 1, 17–48. MR2147060
  • [3] Braess Dietrich, Nonlinear approximation theory, Springer Series in Computational Mathematics, vol. 7, Springer-Verlag, Berlin, 1986. MR866667
  • [4] Bremer James, Gimbutas Zydrunas, and Rokhlin Vladimir, A nonlinear optimization procedure for generalized Gaussian quadratures, SIAM J. Sci. Comput. 32 (2010), no. 4, 1761–1788. MR2671296
  • [5] Dahlquist Germund and Björck Åke, Numerical methods, Dover Publications, Inc., Mineola, NY, 2003, Translated from the Swedish by Ned Anderson, Reprint of the 1974 English translation. MR1978058
  • [6] Dutt A, Gu M, and Rokhlin V, Fast algorithms for polynomial interpolation, integration, and differentiation, SIAM J. Numer. Anal. 33 (1996), no. 5, 1689–1711. MR1411845
  • [7] Fong William and Darve Eric, The black-box fast multipole method, J. Comput. Phys. 228 (2009), no. 23, 8712–8725. MR2558773
  • [8]. Gauss CF, Methodus nova integralium valores per approximationem inveniendi, Werke 3 (1866), 163–196.
  • [9]. Greengard Leslie, The rapid evaluation of potential fields in particle systems, ACM Distinguished Dissertations, MIT Press, Cambridge, MA, 1988. MR936632
  • [10]. Greengard Leslie and Rokhlin Vladimir, A new version of the fast multipole method for the Laplace equation in three dimensions, Acta numerica, 1997, Acta Numer., vol. 6, Cambridge Univ. Press, Cambridge, 1997, pp. 229–269. MR1489257
  • [11]. Jakob-Chien Rüdiger and Alpert Bradley K., A fast spherical filter with uniform resolution, J. Comput. Phys. 136 (1997), no. 2, 580–584.
  • [12]. Karlin Samuel and Studden William J., Tchebycheff systems: With applications in analysis and statistics, Pure and Applied Mathematics, Vol. XV, Interscience Publishers John Wiley & Sons, New York-London-Sydney, 1966. MR0204922
  • [13]. Kreĭn MG, The ideas of P. L. Čebyšev and A. A. Markov in the theory of limiting values of integrals and their further development, Amer. Math. Soc. Transl. (2) 12 (1959), 1–121. MR0113106
  • [14]. Ma J, Rokhlin V, and Wandzura S, Generalized Gaussian quadrature rules for systems of arbitrary functions, SIAM J. Numer. Anal. 33 (1996), no. 3, 971–996. MR1393898
  • [15]. Martinsson Per-Gunnar, Rokhlin Vladimir, and Tygert Mark, On interpolation and integration in finite-dimensional spaces of bounded functions, Commun. Appl. Math. Comput. Sci. 1 (2006), 133–142. MR2244272
  • [16]. Nabors K, Korsmeyer FT, Leighton FT, and White J, Preconditioned, adaptive, multipole-accelerated iterative methods for three-dimensional first-kind integral equations of potential theory, SIAM J. Sci. Comput. 15 (1994), no. 3, 713–735, Iterative methods in numerical linear algebra (Copper Mountain Resort, CO, 1992). MR1273161
  • [17]. NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/, Release 1.0.22 of 2019-03-15. Olver FWJ, Olde Daalhuis AB, Lozier DW, Schneider BI, Boisvert RF, Clark CW, Miller BR, and Saunders BV, eds.
  • [18]. Rokhlin V, A fast algorithm for the discrete Laplace transformation, J. Complexity 4 (1988), no. 1, 12–32. MR939693
  • [19]. Serkh Kirill, On the Solution of Elliptic Partial Differential Equations on Regions with Corners, ProQuest LLC, Ann Arbor, MI, 2016, Thesis (Ph.D.)–Yale University. MR3564124
  • [20]. Swarztrauber PN, Vectorizing the FFTs, Parallel Computations (Rodrigue G, ed.), Academic Press, 1982, pp. 51–83.
  • [21]. Tygert M, Analogues for Bessel Functions of the Christoffel-Darboux Identity, Yale Tech. Rep. (2016).
  • [22]. Yarvin Norman and Rokhlin Vladimir, An improved fast multipole algorithm for potential fields on the line, SIAM J. Numer. Anal. 36 (1999), no. 2, 629–666. MR1675269
  • [23]. Yarvin N and Rokhlin V, Generalized Gaussian quadratures and singular value decompositions of integral operators, SIAM J. Sci. Comput. 20 (1998), no. 2, 699–718. MR1642612
