Author manuscript; available in PMC: 2019 Jun 5.
Published in final edited form as: Bernoulli (Andover). 2018 Dec 12;25(1):89–111. doi: 10.3150/17-BEJ960

Stein’s method and approximating the quantum harmonic oscillator

IAN W MCKEAGUE 1, EROL A PEKÖZ 2, YVIK SWAN 3
PMCID: PMC6550468  NIHMSID: NIHMS1029584  PMID: 31178654

Abstract

Hall et al. (2014) recently proposed that quantum theory can be understood as the continuum limit of a deterministic theory in which there is a large, but finite, number of classical “worlds.” A resulting Gaussian limit theorem for particle positions in the ground state, agreeing with quantum theory, was conjectured in Hall et al. (2014) and proven by McKeague and Levin (2016) using Stein’s method. In this article we show how quantum position probability densities for higher energy levels beyond the ground state may arise as distributional fixed points in a new generalization of Stein’s method. These are then used to obtain a rate of distributional convergence for conjectured particle positions in the first energy level above the ground state to the (two-sided) Maxwell distribution; new techniques must be developed for this setting where the usual “density approach” Stein solution (see Chatterjee and Shao (2011)) has a singularity.

Keywords: Interacting particle system, Higher energy levels, Maxwell distribution, Stein’s method

1. Introduction

Hall et al. (2014) proposed a many interacting worlds (MIW) theory for interpreting quantum mechanics in terms of a large but finite number of classical “worlds.” In the case of the MIW harmonic oscillator, an energy minimization argument was used to derive a recursion giving the location of the oscillating particle as viewed in each of the worlds. Hall et al. conjectured that the empirical distribution of these locations converges to Gaussian as the total number of worlds N increases. McKeague and Levin (2016) recently proved such a result and provided a rate of convergence. More specifically, McKeague and Levin showed that if x1, … xN is a decreasing, zero-mean sequence of real numbers satisfying the recursion relation

$$x_{n+1} = x_n - \frac{1}{x_1 + \cdots + x_n}, \tag{1}$$

then the empirical distribution of the xn tends to standard Gaussian when N → ∞. Here xn represents the location of the oscillating particle in the nth world, and the Gaussian limit distribution agrees with quantum theory for a particle in the lowest energy (ground) state.

The hypothesized correspondence with quantum theory suggests that stable configurations should also exist at higher energies in the MIW theory. Moreover, the empirical distributions of these configurations should converge to distributions with densities of the form

$$p_k(x) = \frac{(\mathrm{He}_k(x))^2}{k!}\,\varphi(x), \quad x \in \mathbb{R}, \tag{2}$$

where φ(x) is the standard normal density,

$$\mathrm{He}_k(x) = (-1)^k e^{x^2/2}\,\frac{d^k}{dx^k}\,e^{-x^2/2}$$

is the (probabilist’s) kth Hermite polynomial, and k is a non-negative integer. The ground state discussed above corresponds to k = 0 and has the standard Gaussian limit. However, the question of how to characterize higher energy MIW states corresponding to k ≥ 1 is still unresolved as far as we know.
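As a quick numerical illustration of (2), the following minimal sketch (assuming NumPy is available; the helper names `p_k` and `trapezoid` are hypothetical and not from the text) evaluates the densities for k = 0, …, 3 and integrates them on a grid. It relies on NumPy's probabilists' Hermite module and on a plain trapezoidal rule, so it is a sanity check rather than a precise computation.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def p_k(x, k):
    """Density p_k(x) = He_k(x)^2 / k! * phi(x), as in (2)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0  # select the k-th probabilists' Hermite polynomial
    phi = np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)
    return He.hermeval(x, coeffs) ** 2 / math.factorial(k) * phi

def trapezoid(y, x):
    """Plain trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

x = np.linspace(-12.0, 12.0, 200001)
masses = [trapezoid(p_k(x, k), x) for k in range(4)]
print(masses)  # each entry should be close to 1
```

For k = 1 the function reduces to x²φ(x), the two-sided Maxwell density studied in the rest of the paper.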

The energy minimization approach of Hall et al. (2014) starts with an analysis of the Hamiltonian for the MIW harmonic oscillator:

H0(x,p)=E(p)+V(x)+U0(x),

where the locations of particles (having unit mass) in the N worlds are specified by x = (x1, … , xN) with x1 > x2 > … > xN, and their momenta by p = (p1, … , pN). Here $E(p) = \sum_{n=1}^{N} p_n^2/2$ is the kinetic energy, $V(x) = \sum_{n=1}^{N} x_n^2$ is the potential energy (for the parabolic trap), and

$$U_0(x) = \sum_{n=1}^{N} \left( \frac{1}{x_{n+1} - x_n} - \frac{1}{x_n - x_{n-1}} \right)^2$$

is called the “interworld” potential, where x0 = ∞ and xN+1 = −∞. In the ground state, there is no movement because all the momenta pn have to vanish for the total energy to be minimized. In this case, as mentioned above, Hall et al. (2014) showed that the particle locations xn satisfy (1) and McKeague and Levin (2016) showed that the empirical distribution tends to a standard Gaussian distribution.

Our contribution in the present article is to derive an interworld potential for the second energy state (k = 1) and show that the empirical distribution of the configuration that minimizes the corresponding Hamiltonian has a limit distribution that again agrees with quantum theory. The interworld potential in this case is shown to be

$$U_1(x) = 9 \sum_{n=1}^{N} \left( \frac{1}{x_{n+1}^3 - x_n^3} - \frac{1}{x_n^3 - x_{n-1}^3} \right)^2 x_n^4 \tag{3}$$

and the minimizer of the corresponding Hamiltonian H1(x, p) = E(p) + V(x) + U1(x) is shown to satisfy the recursion

$$x_{n+1}^3 = x_n^3 - 3 \left( \sum_{i=1}^{n} \frac{1}{x_i} \right)^{-1}. \tag{4}$$

Further, we show that if x1, … , xN is a decreasing, zero-mean solution, then the empirical distribution of the xn converges to the (two-sided) Maxwell distribution having density $p_1(x) = x^2 e^{-x^2/2}/\sqrt{2\pi}$. The entire sequence x1, … , xN should be viewed as indexed by N, though we suppress notation for this dependence and write x1, … , xN instead of $x_{1,N}, \ldots, x_{N,N}$. We also give a rate of convergence using a new extension of Stein’s method. Our approach is generalizable to recursions that converge to the distributions of other higher energy states of the quantum harmonic oscillator, although we do not pursue such extensions here.

We initially thought that the MIW interpretation could be based on a “universal” interworld potential function U0 that applies to all energy levels, with the densities pk(x) then arising as limits of local minima of H0. However, this idea turned out to be analytically unworkable. Here we propose an alternative approach in terms of adapting the interworld potential to each higher energy level. Minimizing the resulting Hamiltonian is then tractable and the solution can be shown to converge to pk(x), at least in the case k = 1. Hall et al. (2014) derived their interworld potential U0 as a discretization of Bohm’s quantum potential summed over the particle ensemble, see Bohm (1952). The challenge in general is to extend this derivation to higher-energy wave functions in a way that leads to an explicit recursion minimizing the resulting Hamiltonian, and to show that it agrees with pk(x) in the limit. A major contribution here, in addition to providing a rate of convergence, is a general method for finding such interworld potential functions and their associated particle recursions.

Stein’s method (see Stein (1986), Chen et al. (2010) and Ross (2011)) is a well-established technique for obtaining explicit error bounds for distributional limit theorems. However, the usual “density approach” (see Chatterjee and Shao (2011)) for applying Stein’s method does not seem to work in cases where the density function vanishes at a point in the interior of the support of the target distribution (here we have p1 (0) = 0 and the support is the whole real line). As we elaborate later, in this case the solution to the Stein equation will have a singularity and also unbounded derivatives. This motivates the new technique we will develop to handle such distributions. While there are plenty of examples of Stein’s method applied to distributions with a density having a zero on the boundary of the support (the gamma and beta distributions, for example), there have been no examples (that we know of) with a zero in the interior of the support; the higher energy distributions pk (x), for k ≥ 1, appear to be the first such distributions considered. The price one has to pay with our approach for handling these zeros is more complicated estimates involving couplings. In our case, however, analytical properties of the recursion (4) can fortunately be exploited to establish such estimates.

In Section 2 we generalize the argument of Hall et al. (2014) to derive the interworld potential, and show how it leads to the solution (4). In Section 3 we introduce the notion of a generalized zero-bias transformation, and show that the distributional properties of eigenstates of the quantum harmonic oscillator can be characterized in terms of fixed points of this transformation. Also, we derive the generalized zero-bias distribution for the empirical distribution of general configurations. Section 4 develops our results based on the new extension of Stein’s method to show convergence of the configuration that minimizes the Hamiltonian of the second energy state.

2. Interworld potentials for higher energy states

Hall et al. (2014) introduced their MIW theory from the perspective of the de Broglie–Bohm interpretation of quantum mechanics, which is mathematically equivalent to standard quantum theory. They used this approach to construct an ansatz for the conjectured interworld potential U0 governing the ground state wave function of the quantum harmonic oscillator. In this section we introduce an extended version of this ansatz aimed at providing a MIW characterization of the higher energy eigenstates.

Our argument follows along the lines of Section IIIA of Hall et al. (2014) with the major difference being that we now need to introduce a more general way of approximating the density of particle location for a stationary wave function ψ(x), namely for a density of the form $p(x) = |\psi(x)|^2 = b(x)\varphi(x)$, where b(x) is a non-negative, even, smooth function having finitely many zeros. Here b represents a “baseline” that varies more rapidly than φ(x). Let x1 > x2 > … > xN. Bohm’s quantum potential summed over the ensemble {xn} is defined by

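$$U_\psi(x) = \sum_{n=1}^{N} \left[ \frac{p'(x_n)}{p(x_n)} \right]^2 \tag{5}$$

where we are using dimensionless units. An approximation to $p(x_n)$ based on ignoring φ(x) is given (up to a normalizing constant) by

$$\tilde p(x_n) = \frac{b(x_n)}{B(x_n) - B(x_{n+1})},$$

where $B(x) = \int_0^x b(t)\,dt$ is the cumulative baseline function. This suggests

$$\frac{p'(x_n)}{p(x_n)} \approx \frac{\tilde p(x_n) - \tilde p(x_{n-1})}{(x_n - x_{n-1})\,\tilde p(x_n)} \approx \left[ \frac{1}{B(x_n) - B(x_{n+1})} - \frac{1}{B(x_{n-1}) - B(x_n)} \right] b(x_n),$$

where we set $B(x_0) = \infty$ and $B(x_{N+1}) = -\infty$. Our proposed ansatz for the interworld potential is then based on inserting the above expression into (5) to obtain

$$U_b(x) = \sum_{n=1}^{N} \left[ \frac{1}{B(x_{n+1}) - B(x_n)} - \frac{1}{B(x_n) - B(x_{n-1})} \right]^2 b(x_n)^2. \tag{6}$$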

Note that our earlier assumptions about b imply that B is strictly increasing, so $U_b$ is well-defined. In the simplest cases $b(x) = 1$ and $b(x) = x^2$ the above expression for $U_b$ agrees with the interworld potentials $U_0$ and $U_1$ defined in the Introduction. We conjecture that the interworld potential $U_b$ is suitable for obtaining MIW approximations to the class of target distributions of the form $p_k(x)$. Indeed, there may be a natural affinity between our new version of Stein’s method and the densities $p_k(x)$ for all the energy levels of the quantum harmonic oscillator.
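The agreement with U1 can be checked directly. The short computation below is a worked instance of (6) for the baseline b(x) = x², using only the definition of B as the integral of b from 0 to x; setting b(x) = 1, B(x) = x in the same way gives U0.

```latex
b(x) = x^2
\;\Longrightarrow\;
B(x) = \int_0^x t^2\,dt = \frac{x^3}{3},
\qquad
\frac{1}{B(x_{n+1}) - B(x_n)} = \frac{3}{x_{n+1}^3 - x_n^3},
\\[6pt]
U_b(x)
= \sum_{n=1}^{N}\left[\frac{3}{x_{n+1}^3 - x_n^3} - \frac{3}{x_n^3 - x_{n-1}^3}\right]^2 x_n^4
= 9\sum_{n=1}^{N}\left(\frac{1}{x_{n+1}^3 - x_n^3} - \frac{1}{x_n^3 - x_{n-1}^3}\right)^2 x_n^4
= U_1(x).
```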

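Specializing to the case $b(x) = x^2$, the following argument characterizes the minimizer of the Hamiltonian $H_1$ (i.e., the ground state when the interworld potential is $U_1$) in terms of a solution to the recursion (4). In any ground state the particles do not move, so the kinetic energy E vanishes. Then, adapting the argument of Hall et al. (2014) to apply to $H_1$, we have

$$\begin{aligned}
9(N-1)^2 &= 9\left[\sum_{n=1}^{N-1} \frac{x_{n+1}^3 - x_n^3}{x_{n+1}^3 - x_n^3}\right]^2
= 9\left[\sum_{n=1}^{N} \left(\frac{1}{x_{n+1}^3 - x_n^3} - \frac{1}{x_n^3 - x_{n-1}^3}\right) x_n^2 \left(x_n - \frac{\overline{x_N^3}}{x_n^2}\right)\right]^2 \\
&\le 9\left[\sum_{n=1}^{N} \left(\frac{1}{x_{n+1}^3 - x_n^3} - \frac{1}{x_n^3 - x_{n-1}^3}\right)^2 x_n^4\right]\left[\sum_{n=1}^{N}\left(x_n - \frac{\overline{x_N^3}}{x_n^2}\right)^2\right]
\le U_1(x)\,V(x),
\end{aligned}$$

where the first inequality is Cauchy–Schwarz. So $U_1 \ge 9(N-1)^2/V$, leading to

$$H_1 = U_1 + V \ge \frac{9(N-1)^2}{V} + V \ge 6(N-1)$$

with the last inequality being equality for $V = 3(N-1)$. It follows that $H_1$ is minimized when $V = 3(N-1)$, the mean $\overline{x_N^3}$ of $\{x_n^3,\ n = 1, \ldots, N\}$ vanishes, and

$$\frac{1}{x_n} = \alpha\left[\frac{1}{x_{n+1}^3 - x_n^3} - \frac{1}{x_n^3 - x_{n-1}^3}\right]$$

for some constant α. Summing the right-hand side of the above display over the first n terms telescopes, leading to the recursion (4) by rearranging and noting that $\alpha = -V/(N-1) = -3$.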

The following lemma provides the basic properties we need to ensure the existence of a solution of the Maxwell recursion (4) that minimizes the Hamiltonian H1, as well as ensuring that the solution is unique. This result is analogous to Lemma 1 of McKeague and Levin (2016) concerning solutions of (1), but the difference here is that the variance is 3, agreeing with the Maxwell distribution (rather than close to standard normal in the case of (1)).

Lemma 2.1. Suppose N is even. Every zero-median solution x1, … , xN of (4) satisfies:

  • (P1) Zero-mean: x1 + … + xN = 0.

  • (P2) Maxwell variance: $x_1^2 + \cdots + x_N^2 = 3(N-1)$.

  • (P3) Symmetry: xn = −xN+1−n for n = 1, … , N.

Further, there exists a unique solution x1, … , xN such that (P1) and

  • (P4) Strictly decreasing: x1 > … > xN

hold. This solution has the zero-median property, and thus also satisfies (P2) and (P3).

Proof. The proof follows identical steps to the proof of Lemma 1 of McKeague and Levin (2016), apart from the variance property (P2), which is proved using (P1) and (P3) as follows. Denote $S_n = \sum_{i=1}^{n} x_i^{-1}$ for n = 1, … , N, and set $S_0 = 0$. Using (4) we can write

$$\begin{aligned}
3(N-1) &= 3\sum_{n=1}^{N-1} S_n S_n^{-1} = \sum_{n=1}^{N-1} S_n\,(x_n^3 - x_{n+1}^3) = \sum_{n=1}^{N-1}\left[(S_{n-1} + x_n^{-1})\,x_n^3 - S_n x_{n+1}^3\right] \\
&= \sum_{n=1}^{N-1}\left[S_{n-1}x_n^3 - S_n x_{n+1}^3 + x_n^2\right] = x_1^2 + \cdots + x_{N-1}^2 - S_{N-1}x_N^3,
\end{aligned}$$

where we used the recursion in the second equality, and the last equality is from a telescoping sum. (P3) implies $S_N = 0$, so $-S_{N-1} = 1/x_N$, and (P2) follows. □
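The properties in Lemma 2.1 can be observed numerically. The sketch below is one possible way to compute the symmetric solution of (4) for even N, not a method taken from the text: it bisects on x1 until the zero-median condition x_{N/2} + x_{N/2+1} = 0 holds and then reflects the positive half. The function names `forward` and `solve_maxwell`, the bisection bracket and the iteration count are all assumptions made for illustration.

```python
import math
import numpy as np

def forward(x1, m):
    """Run recursion (4) from x1 for m steps.

    Returns [x_1, ..., x_{m+1}], or None if some x_n with n <= m is
    non-positive (which signals that x1 was chosen too small)."""
    xs, s = [x1], 0.0
    for _ in range(m):
        if xs[-1] <= 0:
            return None
        s += 1.0 / xs[-1]  # S_n = sum_{i<=n} 1/x_i
        xs.append(float(np.cbrt(xs[-1] ** 3 - 3.0 / s)))
    return xs

def solve_maxwell(N, iters=200):
    """Symmetric, strictly decreasing solution of (4) for even N (sketch)."""
    assert N >= 2 and N % 2 == 0
    m = N // 2

    def gap(x1):
        xs = forward(x1, m)
        # zero-median condition: x_m + x_{m+1} = 0
        return -1.0 if xs is None else xs[m - 1] + xs[m]

    # Bracket: gap < 0 for tiny x1; gap > 0 once x1^2 > 3 * (1 + log N).
    lo, hi = 1e-9, math.sqrt(3 * (1 + math.log(N))) + 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    top = forward(hi, m)[:m]  # positive half x_1 > ... > x_m
    return np.array(top + [-v for v in reversed(top)])  # reflect, as in (P3)

xs = solve_maxwell(22)
print(xs.sum(), (xs**2).sum())  # (P1) and (P2): about 0 and about 3 * 21
```

For N = 2 the condition can be solved by hand: x2 = −x1 in (4) gives 2x1² = 3, so x1 = √(3/2), which the sketch reproduces.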

Although in the sequel we concentrate on the case k = 1 (see Figure 1), to conclude this section we briefly discuss general densities of the form $p_k$ given in (2). The above argument for $b(x) = x^2$ can be extended to general $U_b$ under the condition that B(x) is proportional to xb(x), which is the case when b(x) is proportional to $x^r$ for some even non-negative integer r (but not for the square of the kth Hermite polynomial unless k = 0 or 1). Under this condition, it can be shown that the minimizer of the Hamiltonian based on $U_b$ is a symmetric solution of the recursion

$$B(x_{n+1}) = B(x_n) - \left(\sum_{i=1}^{n} \frac{x_i}{b(x_i)}\right)^{-1}. \tag{7}$$

Figure 1.

Example with $b(x) = x^2$, N = 22, showing the piecewise constant density having mass 1/(N – 1) uniformly distributed over the intervals between successive xn compared with the Maxwell density, where the breaks in the histogram are the successive xn satisfying the recursion (4).

We have not been able to show that this recursion minimizes the Hamiltonian for general b, but our numerical results suggest that it is very close if not identical to a minimizer. With k = 2 we have $b(x) = (x^2 - 1)^2/2$, $B(x) = x^5/10 - x^3/3 + x/2$, and the symmetric solution of the resulting recursion produces a remarkably good agreement with $p_k$, see Figure 2.

Figure 2.

Example with $b(x) = \mathrm{He}_k(x)^2/k!$ for k = 2, N = 41, where the breaks in the histogram are the successive xn satisfying the recursion (7) and the red curve is $p_k(x)$.

3. Generalized zero-bias transformations

Let W be a symmetric random variable and $b: \mathbb{R} \to \mathbb{R}$ a non-negative function such that $\sigma^2 = E[W^2/b(W)] < \infty$. Goldstein and Reinert (1997) give a distributional fixed point characterization of the Gaussian distribution, which we generalize in the definition below.

Definition 3.1. If there is a random variable W* such that

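$$\sigma^2\, E\left[\frac{f'(W^*)}{b(W^*)}\right] = E\left[\frac{W f(W)}{b(W)}\right]$$

for all absolutely continuous functions $f: \mathbb{R} \to \mathbb{R}$ such that $E\,|W f(W)/b(W)| < \infty$, we say that W* has the b-generalized-zero-bias distribution of W.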

Remark 3.2. Goldstein and Reinert (1997) study the case b(x) = 1 and show that W* has the same distribution as W if and only if W has a Gaussian distribution. Distributional fixed point characterizations for exponential, gamma and other nonnegative distributions and the connection with Stein’s method have been studied in Peköz and Röllin (2011), Peköz et al. (2013), and Peköz et al. (2016).

Remark 3.3. By a routine extension of the proof of Proposition 2.1 of Chen et al. (2010), it can be shown that there exists a unique distribution for W*, and it is absolutely continuous with density

$$p^*(x) \propto b(x)\, E\left[\frac{W}{b(W)}\,\mathbf{1}_{W > x}\right].$$

We note in passing that the $\sigma^2$ should be on the other side of the equality in the first display of Chen et al.’s proposition, which corresponds to b(x) = 1, the usual zero-bias distribution of W. The composition of the b-generalized-zero-bias transformation with the (1/b)-generalized-zero-bias transformation is the usual zero-bias transformation.

Remark 3.4. With φ the standard normal density and b a φ-integrable function, if W has density

p(x)=b(x)φ(x), (8)

then its distribution is a fixed point for the b-generalized-zero-bias transformation since

$$p^*(x) = b(x)\int_x^\infty \frac{t}{b(t)}\,p(t)\,dt = b(x)\int_x^\infty t\,\varphi(t)\,dt = p(x).$$

The following result gives the b-generalized-zero-bias distribution of the uniform distribution on N points.

Proposition 3.5. Given an integer N > 1, let x1 > x2 > … > xN be such that b(xn) > 0 for all n. Let PN be the empirical distribution of the xn:

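$$P_N(A) = \frac{\#\{n : x_n \in A\}}{N}$$

for any Borel set $A \subset \mathbb{R}$. Under the symmetry condition $x_n = -x_{N-n+1}$ for n = 1, … , N, the b-generalized-zero-bias distribution $P_N^*$ of $P_N$ is defined, and has density

$$p^*(x) \propto b(x)\left[\sum_{i=1}^{n} \frac{x_i}{b(x_i)}\right]$$

for $x_{n+1} < x \le x_n$ (n = 1, … , N – 1), and $p^*(x) = 0$ if $x > x_1$ or $x \le x_N$.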

Proof. Immediate from Remark 3.3. □
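One consequence of Proposition 3.5, used later for the coupling in the proof of Corollary 3.7, is that the generalized zero-bias density spreads its mass evenly over the gaps between successive points when the configuration solves the recursion (7). A short worked computation, using only the density formula above and the definition of B as an antiderivative of b, is:

```latex
\int_{x_{n+1}}^{x_n} p^*(x)\,dx
\;\propto\; \left[\sum_{i=1}^{n}\frac{x_i}{b(x_i)}\right]\int_{x_{n+1}}^{x_n} b(x)\,dx
= \left[\sum_{i=1}^{n}\frac{x_i}{b(x_i)}\right]\bigl(B(x_n) - B(x_{n+1})\bigr)
= 1
\qquad (n = 1,\ldots,N-1),
```

where the last equality is exactly recursion (7). The N − 1 intervals therefore carry equal mass, that is, mass 1/(N − 1) each after normalization. For b(x) = x² this is the statement that recursion (4) gives equal Maxwell-type mass to each gap.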

Recall the following distances between distribution functions F and G. The Kolmogorov distance is

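$$d_K(F,G) = \sup_{x \in \mathbb{R}} |F(x) - G(x)|,$$

and the Wasserstein distance is

$$d_W(F,G) = \sup_{h \in \mathcal{H}} \left| \int_{\mathbb{R}} h\,dF - \int_{\mathbb{R}} h\,dG \right|$$

where

$$\mathcal{H} = \{h : \mathbb{R} \to \mathbb{R} \text{ Lipschitz with } \|h'\| \le 1\}$$

and ∥ · ∥ is the supremum norm. Using Proposition 1.2 in Ross (2011), these two metrics are seen to be related by

$$d_K(F,G) \le \sqrt{2C\,d_W(F,G)}$$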

if G has density bounded by C.
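For distributions on the real line the Wasserstein distance above coincides with the L¹ distance between the distribution functions, ∫|F(x) − G(x)| dx, which makes it easy to evaluate numerically. The sketch below uses that standard identity to approximate the distance between an empirical distribution and the two-sided Maxwell law; the function names, grid width and grid size are illustrative assumptions, and the Maxwell distribution function Φ(x) − xφ(x) is obtained by differentiating and checking that the result is x²φ(x).

```python
import math
import numpy as np

def maxwell_cdf(x):
    """CDF of the two-sided Maxwell density x^2 phi(x): Phi(x) - x phi(x)."""
    Phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return Phi - x * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def wasserstein_to_maxwell(points, half_width=12.0, n_grid=200001):
    """Approximate d_W(P_N, Maxwell) as the integral of |F_N - G| on a grid."""
    pts = np.sort(np.asarray(points, dtype=float))
    grid = np.linspace(-half_width, half_width, n_grid)
    F = np.searchsorted(pts, grid, side="right") / len(pts)  # empirical CDF
    G = np.array([maxwell_cdf(t) for t in grid])
    diff = np.abs(F - G)
    return float(np.sum((diff[1:] + diff[:-1]) * np.diff(grid)) / 2)

# Point mass at 0: the distance equals E|M| = 4 / sqrt(2 pi).
print(wasserstein_to_maxwell([0.0]), 4 / math.sqrt(2 * math.pi))
```

Passing the points of a solution of (4) instead of `[0.0]` gives a numerical view of the convergence discussed in Corollary 3.7.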

Restricting attention to the special case b(x) = x2, we can now state our main result, along with an important corollary.

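Theorem 3.6. Suppose W* is constructed on the same probability space as the zero-mean random variable W and is distributed according to the $x^2$-generalized-zero-bias distribution of W. Let M have the two-sided Maxwell density $x^2 e^{-x^2/2}/\sqrt{2\pi}$. Then there exist positive finite constants λ1, λ2, λ3 and λ4 such that

$$d_W(\mathcal{L}(W), \mathcal{L}(M)) \le \lambda_1 E|W - W^*| + \lambda_2 E\bigl[|W^*(W - W^*)|\bigr] + \lambda_3 E\left|\frac{1}{W} - \frac{1}{W^*}\right| + \lambda_4 E\left|1 - \frac{W}{W^*}\right|. \tag{9}$$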

Proof. The inequality follows immediately from Theorem 4.4. Finiteness of the constants (along with explicit upper bounds) is detailed in Proposition 4.5. □

The following corollary gives a rate of convergence of the solution to (4) to the two-sided Maxwell distribution in terms of the Wasserstein distance; we postpone the proof until Section 4.3.

Corollary 3.7. Suppose x1, … xN is a monotonic, zero-mean, finite sequence of real numbers satisfying (4), let PN be the empirical distribution of these values, and let M be as in Theorem 3.6. Then there is a constant C > 0 such that

dW(PN,L(M))ClogNN.

4. The Stein equation and its solutions

4.1. General considerations on Stein’s method and the problem with Stein’s density approach

Let F and G be two cumulative distribution functions which one wishes to compare. Denote by $L^1(F)$ (resp., $L^1(G)$) the class of Borel measurable functions $h : \mathbb{R} \to \mathbb{R}$ such that $\int |h|\,dF < \infty$ (resp., $\int |h|\,dG < \infty$). A discrepancy measure between F and G is an integral probability metric if it can be written in the form

$$d_{\mathcal{H}}(F,G) := \sup_{h \in \mathcal{H}} \left| \int h\,dF - \int h\,dG \right|$$

for some class of test functions $\mathcal{H} \subset L^1(F) \cap L^1(G)$. The aforementioned Kolmogorov and Wasserstein distances are two important examples of integral probability metrics.

Suppose that F is absolutely continuous with density p on the real line, and introduce the operator $h \mapsto f_h$ which, to each $h \in L^1(F)$, assigns the function

$$f_h(x) = \frac{1}{p(x)} \int_{-\infty}^{x} (h(u) - F(h))\,p(u)\,du \tag{10}$$

where F(h) = ∫ h dF. The integrability condition $h \in L^1(F)$ guarantees that $f_h$ is the unique absolutely continuous solution to the differential equation

$$f_h'(x) + \frac{p'(x)}{p(x)}\,f_h(x) = h(x) - F(h)$$

that also satisfies the boundary conditions $\lim_{x \to \pm\infty} p(x) f_h(x) = 0$. Under the assumption that $\mathcal{H} \subset L^1(G)$ we can integrate with respect to G on both sides of the differential equation (known, in the Stein community argot, as a “Stein equation”) to get

$$d_{\mathcal{H}}(F,G) = \sup_{h \in \mathcal{H}} \left| E\left[ f_h'(W) + \frac{p'(W)}{p(W)}\,f_h(W) \right] \right|, \tag{11}$$

with W a random variable distributed according to G. This last expression provides a means of bounding integral probability metrics (and thus in particular the Kolmogorov and Wasserstein distances) in terms of the action of a differential operator over a class of functions.

The steps outlined above form the basis of what is known as the “density approach” to Stein’s method (see e.g. Chatterjee and Shao (2011)), which is the most intuitive extension of Stein’s method of normal approximation (as described in Chen et al. (2010)) to arbitrary continuous target distributions. In order for (11) to be of practical use, however, it is crucial that the functions $p'/p$, $f_h$ and $f_h'$ be amenable to computations; it is particularly important that $f_h$ and its first derivatives be bounded. Such conditions are not met in the case of the two-sided Maxwell distribution $p(x) = x^2\varphi(x)$ with which we are concerned in this paper. Indeed, for such a density, we have on the one hand $p'(x)/p(x) = 2/x - x$ and, on the other hand, $f_h(x) = x^{-2} e^{x^2/2} \int_{-\infty}^{x} (h(u) - F(h))\,u^2 e^{-u^2/2}\,du$, both of which have a singularity at x = 0. Because of this, applying the classical Stein’s method toolkit to the right-hand side of (11) will ultimately lead to trivial upper bounds and more elaborate methods need to be devised. This will be performed in the coming sections.
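The singularity is easy to exhibit in closed form. As an illustration (the test function h(x) = x is a choice made here for convenience, not one singled out in the text), symmetry of the Maxwell density gives F(h) = 0, and the density-approach solution (10) can be computed explicitly:

```latex
\int_{-\infty}^{x} u^3 e^{-u^2/2}\,du = -(x^2+2)\,e^{-x^2/2}
\qquad\Longrightarrow\qquad
f_h(x) = x^{-2} e^{x^2/2}\int_{-\infty}^{x} u \cdot u^2 e^{-u^2/2}\,du
= -\frac{x^2+2}{x^2} = -1 - \frac{2}{x^2}.
```

The antiderivative can be verified by differentiating $-(u^2+2)e^{-u^2/2}$. Even for this 1-Lipschitz test function, $f_h$ is unbounded near x = 0, so bounds that rely on the supremum norm of $f_h$ or its derivative are unavailable.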

Before proceeding to the description of our proposal, we stress that the classical version of the “density approach” to Stein’s method that we have just described actually breaks down for any target density p such that $p(x_0) = 0$ at some $x_0$ not on the edges of the support. Indeed, in general, the Stein solution (10) is the product of two terms: one term with a singularity wherever the density has a zero, and a second term that vanishes at the endpoints of the range of the target random variable. This results in the peculiar behavior of singularities inside the range of the random variable when the density has a zero there. Note that for the one-sided Maxwell distribution the solution $f_h(x)$ has the same form as for its two-sided counterpart, though it is now only defined for x ≥ 0; since in this one-sided case when x = 0 the second term $\int_{-\infty}^{0} (h(u) - F(h))\,u^2 e^{-u^2/2}\,du$ vanishes, we would have $f_h(0) = 0$. This term doesn’t vanish (unless h is an even function) for the two-sided Maxwell case, thus giving rise to the singularity at x = 0.

4.2. Coupling based Stein’s method for densities of the form (8)

Let X be a random variable with probability density function p which we assume to be of the form (8). Let W be a symmetric random variable whose distribution we want to compare with that of X. First, we introduce the random variable W* proposed in Definition 3.1 and write

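$$E[h(W)] - E[h(X)] = E\left[\frac{f'(W)}{b(W)} - \frac{W f(W)}{b(W)}\right] = E\left[\frac{f'(W)}{b(W)} - \frac{f'(W^*)}{b(W^*)}\right] \tag{12}$$

for $f = f_h$ solutions to the differential equation

$$\frac{f'(w)}{b(w)} - \frac{w f(w)}{b(w)} = h(w) - E[h(X)]. \tag{13}$$

Taking suprema over all $h \in \mathcal{H}$, we deduce

$$d_{\mathcal{H}}(\mathcal{L}(W), \mathcal{L}(X)) = \sup_{h \in \mathcal{H}} \left| E\left[\frac{f'(W)}{b(W)} - \frac{f'(W^*)}{b(W^*)}\right] \right| \tag{14}$$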

for all H such that the solutions to (13) are well-defined. Expression (14) provides an alternative to (11) which we will now prove to be useful to our purpose.

At this stage the next typical “Stein-method” step is to write W* = W + (W* – W) and Taylor expand the integrand in (14) around W to deduce a bound on dH(L(W),L(X)) expressed in terms of the difference between W* and W. Unfortunately, for similar reasons as those described in Section 4.1, the solutions to (13) also have singularities which make this intuition unexploitable directly. We propose to bypass this difficulty by introducing intermediate functions τX and g – to be defined later on in the text – for which

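$$\begin{aligned}
E[h(W)] - E[h(X)] &= E\left[\frac{f'(W)}{b(W)} - \frac{f'(W^*)}{b(W^*)}\right] \\
&= E\bigl[W(\tau_X(W) - 1)\,g(W) - W^*(\tau_X(W^*) - 1)\,g(W^*)\bigr] + E\bigl[\tau_X(W)\,g'(W) - \tau_X(W^*)\,g'(W^*)\bigr].
\end{aligned} \tag{15}$$

Bounding integral probability metrics $d_{\mathcal{H}}(\cdot,\cdot)$ between $\mathcal{L}(W)$ and $\mathcal{L}(X)$ then boils down to finding bounds on the four terms provided in (15). Obviously this will only lead to reasonable results if the intermediate functions $\tau_X$ and g are chosen wisely.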

4.3. The Stein kernel equation for densities of the form (8)

We start by introducing the integral operator

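$$h - \Phi(h) \mapsto T_\varphi^{-1}(h - \Phi(h))(w) := \frac{1}{\varphi(w)} \int_{w}^{\infty} (h(u) - \Phi(h))\,\varphi(u)\,du \tag{16}$$

with Φ(h) = ∫ h dΦ and Φ the standard Gaussian cumulative distribution function. (The notation $T_\varphi^{-1}$ is taken from Ley, Reinert and Swan (2017).) We also introduce the function

$$\tau_X(x) = \frac{1}{p(x)} \int_{x}^{\infty} u\,p(u)\,du$$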

which is called the “Stein kernel” of X (or, equivalently, of p) – again we refer to Ley, Reinert and Swan (2017) for intuition and first properties.

Remark 4.1. Stein kernels were introduced in Stein (1986); Cacoullos and Papathanasiou (1989), and have proven to be of great use in Gaussian analysis, see, e.g., Nourdin and Peccati (2009) and Chatterjee (2009). Their importance in the abstract approach to Stein’s method has been investigated in Döbler (2015), where it is shown that they have a regularizing effect on the solutions to general Stein equations.

Lemma 4.2. Let $x \mapsto b(x)$ be a nonnegative even function with support a subset of (−∞, ∞) and such that $\lim_{x \to \pm\infty} b(x)\varphi(x) = 0$. Suppose furthermore that b is absolutely continuous and integrable w.r.t. φ with integral $\int b(x)\varphi(x)\,dx = 1$. Let X be a random variable with density $x \mapsto b(x)\varphi(x)$. Then

$$\tau_X(x) = 1 + \frac{T_\varphi^{-1} b'(x)}{b(x)} \tag{17}$$

under the convention that the ratio is set to zero at all points x such that b(x) = 0 and $T_\varphi^{-1} b'(x) \ne 0$. Let $h : \mathbb{R} \to \mathbb{R}$ be a Borel function such that $E|h(X)| < \infty$, and set $\tilde h = h - E[h(X)]$. Then

$$g_h(x) = \frac{\int_{-\infty}^{x} b(u)\,\tilde h(u)\,\varphi(u)\,du}{b(x)\varphi(x) + \int_{x}^{\infty} b'(u)\,\varphi(u)\,du} \tag{18}$$

is the unique solution g of the ODE

$$\tau_X(x)\,g'(x) - x\,g(x) = \tilde h(x) \tag{19}$$

which satisfies the asymptotic property $\lim_{x \to \pm\infty} \tau_X(x)\varphi(x)b(x)g(x) = 0$.

Proof. Integrating by parts in the definition of the Stein kernel for p = bφ we get (assuming that $\lim_{x \to \pm\infty} b(x)\varphi(x) = 0$)

$$\int_{x}^{+\infty} y\,p(y)\,dy = \int_{x}^{\infty} b(y)\,(-\varphi'(y))\,dy = b(x)\varphi(x) + \int_{x}^{\infty} b'(y)\,\varphi(y)\,dy$$

so that (17) follows by definition (16) of the inverse Stein operator. For the second claim we follow (Nourdin and Peccati, 2012, Proposition 3.2.2) and note how

$$\tau_X(x)\,g'(x) - x\,g(x) = \frac{(\tau_X(x)\,g(x)\,p(x))'}{p(x)}$$

so that any solution to (19) has the form

$$g(x) = \frac{1}{\tau_X(x)\,p(x)} \int_{-\infty}^{x} \tilde h(u)\,p(u)\,du + \frac{d}{\tau_X(x)\,p(x)}, \tag{20}$$

where $d \in \mathbb{R}$. By dominated convergence, one infers that

$$\lim_{x \to \pm\infty} \int_{-\infty}^{x} \tilde h(y)\,b(y)\,\varphi(y)\,dy = 0,$$

so that the first summand in (20) has the announced form (18) and the asymptotic property is satisfied if and only if d = 0. □
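For the Maxwell case b(x) = x², the integral defining the Stein kernel can be evaluated in closed form, since the tail integral of u³φ(u) from x to ∞ equals (x² + 2)φ(x); this gives τ_X(x) = 1 + 2/x². The sketch below checks that value by direct numerical integration. The function names, the truncation point and the grid size are assumptions chosen for illustration.

```python
import math
import numpy as np

def phi(x):
    """Standard normal density."""
    return np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def stein_kernel(x, upper=12.0, n=200001):
    """tau_X(x) = (1/p(x)) * int_x^inf u p(u) du for p(u) = u^2 phi(u), x > 0.

    The infinite upper limit is truncated at `upper`, where the integrand
    is negligible; the integral uses a plain trapezoidal rule."""
    u = np.linspace(x, upper, n)
    y = u**3 * phi(u)
    tail = float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2)
    return tail / (x**2 * float(phi(x)))

for x in (0.5, 1.0, 2.0):
    print(x, stein_kernel(x), 1 + 2 / x**2)  # the two values should agree
```

In the same setting, taking h(x) = x (so that E[h(X)] = 0) the constant function g = −1 solves the ODE (19), which illustrates how g can stay bounded where the density-approach solution does not.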

Our next result provides the connection between the Stein equations (13) and (19).

Lemma 4.3. Suppose that b only has isolated zeros. Let all notations be as above and introduce the function g = gf defined at all x such that b(x) > 0 through

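$$\frac{f'(x) - x f(x)}{b(x)} = \tau_X(x)\,g'(x) - x\,g(x).$$

Then

$$E\left[\frac{f'(W)}{b(W)} - \frac{f'(W^*)}{b(W^*)}\right] = E\bigl[W(\tau_X(W) - 1)\,g(W) - W^*(\tau_X(W^*) - 1)\,g(W^*)\bigr] + E\bigl[\tau_X(W)\,g'(W) - \tau_X(W^*)\,g'(W^*)\bigr]. \tag{21}$$

Proof. Since

$$\frac{f'(x) - x f(x)}{b(x)} = \frac{(f(x)\varphi(x))'}{b(x)\varphi(x)}$$

and

$$\tau_X(x)\,g'(x) - x\,g(x) = \frac{(b(x)\,\tau_X(x)\,g(x)\,\varphi(x))'}{b(x)\varphi(x)}$$

at all x for which b(x) ≠ 0, we deduce that f and g are mutually defined by $f = (b\tau_X)\,g$. This in turn gives

$$\frac{f'(x)}{b(x)} = \left(\frac{b'(x)}{b(x)}\,\tau_X(x) + \tau_X'(x)\right) g(x) + \tau_X(x)\,g'(x) =: \psi(x)\,g(x) + \tau_X(x)\,g'(x) \tag{50}$$

which, combined with $\psi(x) = x(\tau_X(x) - 1)$ (that is easily derived using the various definitions involved), leads to the useful identity

$$\frac{f'(x)}{b(x)} = x(\tau_X(x) - 1)\,g(x) + \tau_X(x)\,g'(x) \tag{22}$$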

from which (21) is directly derived. □

Combining identities (12) and (21) we get (15), as promised. As already mentioned in the introduction, the price to pay for circumventing the singularities is the necessity to bound several additional quantities concerning the couplings we obtain. The explicit nature of the recursion described in Section 2 nevertheless allows us to compute the resulting quantities satisfactorily, leading to the bounds claimed in Theorem 3.6 and Corollary 3.7. This we perform in the Maxwell case in the next sections.

4.4. Approximating the two-sided Maxwell distribution

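Theorem 4.4. Let $p(x) = x^2\varphi(x)$, and take f a solution to the Stein equation

$$\frac{f'(w)}{w^2} - \frac{w f(w)}{w^2} = \tilde h(w), \tag{23}$$

where $\tilde h$ is a function having bounded first derivative and zero-mean under p. Set $c = \|\tilde h'\|$. Then for any coupling of W and W* on a joint probability space such that W* has the $x^2$-generalized zero biased distribution for W,

$$\left| E\left[\frac{f'(W)}{W^2} - \frac{f'(W^*)}{(W^*)^2}\right] \right| \le \lambda_1 E|W - W^*| + \lambda_2 E\bigl[|W^*(W - W^*)|\bigr] + \lambda_3 E\left|\frac{1}{W} - \frac{1}{W^*}\right| + \lambda_4 E\left|1 - \frac{W}{W^*}\right| \tag{24}$$

with

$$\lambda_1 \le 6c, \quad \lambda_2 \le 7c, \quad \lambda_3 \le 18c \quad\text{and}\quad \lambda_4 \le 22c. \tag{25}$$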

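Proof. With $b(x) = x^2$ we have $\tau_X(x) = 1 + 2/x^2$ and $\psi(x) = 2/x$, so that (21) becomes

$$\begin{aligned}
E\left[\frac{f'(W)}{W^2} - \frac{f'(W^*)}{(W^*)^2}\right]
&= E\left[\frac{2}{W}\,g(W) - \frac{2}{W^*}\,g(W^*)\right] + E\left[\left(1 + \frac{2}{W^2}\right) g'(W) - \left(1 + \frac{2}{(W^*)^2}\right) g'(W^*)\right] \\
&= 2E\left[\left(\frac{1}{W} - \frac{1}{W^*}\right) g(W)\right] + 2E\left[\frac{1}{W^*}\bigl(g(W) - g(W^*)\bigr)\right] + E\bigl[g'(W) - g'(W^*)\bigr] \\
&\quad + 2E\left[\frac{1}{W^2}\,g'(W) - \frac{1}{(W^*)^2}\,g'(W^*)\right].
\end{aligned}$$

The first two terms are dealt with easily to get

$$\left| 2E\left[\left(\frac{1}{W} - \frac{1}{W^*}\right) g(W)\right] + 2E\left[\frac{1}{W^*}\bigl(g(W) - g(W^*)\bigr)\right] \right| \le 2\|g\|\,E\left|\frac{1}{W} - \frac{1}{W^*}\right| + 2\|g'\|\,E\left|\frac{W - W^*}{W^*}\right|.$$

For the last two terms we introduce the function

$$\chi(x) = \frac{g'(x)}{x}$$

to get on the one hand

$$E\bigl[g'(W) - g'(W^*)\bigr] = E\left[W\,\frac{g'(W)}{W} - W^*\,\frac{g'(W^*)}{W^*}\right] = E\bigl[(W - W^*)\,\chi(W)\bigr] + E\bigl[W^*\bigl(\chi(W) - \chi(W^*)\bigr)\bigr]$$

so that

$$\bigl| E[g'(W) - g'(W^*)] \bigr| \le \|\chi\|\,E|W - W^*| + \|\chi'\|\,E\bigl[|W^*(W - W^*)|\bigr]$$

and, on the other hand

$$E\left[\frac{1}{W^2}\,g'(W) - \frac{1}{(W^*)^2}\,g'(W^*)\right] = E\left[\frac{1}{W}\,\chi(W) - \frac{1}{W^*}\,\chi(W^*)\right] = E\left[\left(\frac{1}{W} - \frac{1}{W^*}\right)\chi(W)\right] + E\left[\frac{1}{W^*}\bigl(\chi(W) - \chi(W^*)\bigr)\right]$$

so that

$$2\left| E\left[\frac{1}{W^2}\,g'(W) - \frac{1}{(W^*)^2}\,g'(W^*)\right] \right| \le 2\|\chi\|\,E\left|\frac{1}{W} - \frac{1}{W^*}\right| + 2\|\chi'\|\,E\left|\frac{W - W^*}{W^*}\right|.$$

Combining these different estimates, and noting that $|(W - W^*)/W^*| = |1 - W/W^*|$, we obtain (24), with λ1, λ2, λ3 and λ4 expressed in terms of ∥χ∥, ∥χ′∥, ∥g∥ and ∥g′∥ as follows:

$$\lambda_1 = \|\chi\|, \quad \lambda_2 = \|\chi'\|, \quad \lambda_3 = 2(\|g\| + \|\chi\|) \quad\text{and}\quad \lambda_4 = 2(\|g'\| + \|\chi'\|).$$

The inequalities in (25) are proved in Proposition 4.5 below. □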

The next step is to bound ∥χ∥, ∥χ′∥, ∥g∥ and ∥g′∥ in a non trivial way; this we achieve in the next proposition.

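Proposition 4.5. Let $h : \mathbb{R} \to \mathbb{R}$ be absolutely continuous and integrable with respect to $p(x) = x^2\varphi(x)$. Set $c = \|h'\|$, which we suppose to be finite. Let X ~ p, define

$$g_0(x) = \begin{cases} e^{x^2/2} \displaystyle\int_{x}^{\infty} y^2\,(h(y) - E[h(X)])\,e^{-y^2/2}\,dy & \text{if } x > 0, \\[8pt] -e^{x^2/2} \displaystyle\int_{-\infty}^{x} y^2\,(h(y) - E[h(X)])\,e^{-y^2/2}\,dy & \text{if } x \le 0, \end{cases} \tag{26}$$

and set

$$g(x) = \frac{g_0(x)}{x^2 + 2} \quad\text{and}\quad \chi(x) = \frac{g'(x)}{x}. \tag{27}$$

Then

$$\|g\| \le 3c, \quad \|g'\| \le 4c, \quad \|\chi\| \le 6c \quad\text{and}\quad \|\chi'\| \le 7c.$$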
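These four bounds are exactly what is needed for the constants in (25). Substituting them into the expressions for λ1, … , λ4 obtained in the proof of Theorem 4.4 gives the following arithmetic:

```latex
\lambda_1 = \|\chi\| \le 6c, \qquad
\lambda_2 = \|\chi'\| \le 7c, \\[4pt]
\lambda_3 = 2\bigl(\|g\| + \|\chi\|\bigr) \le 2(3c + 6c) = 18c, \qquad
\lambda_4 = 2\bigl(\|g'\| + \|\chi'\|\bigr) \le 2(4c + 7c) = 22c.
```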

Remark 4.6. The function g0 defined in (26) satisfies

$$g_0'(x) - x\,g_0(x) = -x^2\,\bigl(h(x) - E[Z^2 h(Z)]\bigr) \tag{28}$$

with Z ~ φ a standard Gaussian random variable.

Remark 4.7. The function g defined in (27) satisfies

$$-\frac{(g(x)\,\tau(x)\,p(x))'}{p(x)} = h(x) - E[h(X)]$$

with X ~ p and $\tau(x) = 1 + 2/x^2$.

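Proof. In order to simplify future notations we introduce $\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt$, $\bar\Phi(x) = \int_{x}^{\infty} \varphi(t)\,dt$, $\Upsilon(x) = e^{x^2/2} \int_{-\infty}^{x} t^2 e^{-t^2/2}\,dt$ and $\bar\Upsilon(x) = e^{x^2/2} \int_{x}^{\infty} t^2 e^{-t^2/2}\,dt$. Using the identity

$$\int_a^b t^2 e^{-t^2/2}\,dt = a\,e^{-a^2/2} - b\,e^{-b^2/2} + \int_a^b e^{-t^2/2}\,dt, \quad a \le b, \tag{29}$$

we deduce that $\Upsilon(x) = -x + e^{x^2/2} \int_{-\infty}^{x} e^{-t^2/2}\,dt$ and $\bar\Upsilon(x) = x + e^{x^2/2} \int_{x}^{\infty} e^{-t^2/2}\,dt$ and thus

$$\bar\Upsilon(x) \le |x| + \sqrt{\frac{\pi}{2}} \ \text{ for } x \ge 0, \quad \Upsilon(x) \le |x| + \sqrt{\frac{\pi}{2}} \ \text{ for } x \le 0, \quad\text{and}\quad \lim_{x \to -\infty} \frac{\Upsilon(x)}{|x|} = \lim_{x \to \infty} \frac{\bar\Upsilon(x)}{x} = 1. \tag{30}$$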

The proof is now broken down into several steps.

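Step 1: rewrite the solutions. Following (Chen et al., 2010, page 39) we rewrite the test functions in terms of their derivatives (still with Z a standard normal random variable)

$$\begin{aligned}
h(y) - E[h(X)] &= h(y) - E[Z^2 h(Z)] = \int_{-\infty}^{\infty} z^2\,(h(y) - h(z))\,\varphi(z)\,dz \\
&= \int_{-\infty}^{y} z^2 \left(\int_z^y h'(t)\,dt\right) \varphi(z)\,dz - \int_{y}^{\infty} z^2 \left(\int_y^z h'(t)\,dt\right) \varphi(z)\,dz.
\end{aligned}$$

Changing the order of integration then using (29) leads to the rhs becoming

$$\begin{aligned}
&\int_{-\infty}^{y} h'(t) \left[\int_{-\infty}^{t} z^2 \varphi(z)\,dz\right] dt - \int_{y}^{\infty} h'(t) \left[\int_{t}^{\infty} z^2 \varphi(z)\,dz\right] dt \\
&\quad = \int_{-\infty}^{y} h'(t) \left[-t\varphi(t) + \int_{-\infty}^{t} \varphi(z)\,dz\right] dt - \int_{y}^{\infty} h'(t) \left[t\varphi(t) + \int_{t}^{\infty} \varphi(z)\,dz\right] dt \\
&\quad = -\int_{-\infty}^{\infty} h'(t)\,t\,\varphi(t)\,dt + \int_{-\infty}^{y} h'(t)\,\Phi(t)\,dt - \int_{y}^{\infty} h'(t)\,\bar\Phi(t)\,dt,
\end{aligned}$$

and thus

$$h(y) - E[h(X)] = \int_{-\infty}^{y} h'(t)\,\Phi(t)\,dt - \int_{y}^{\infty} h'(t)\,\bar\Phi(t)\,dt - E[Z h'(Z)]. \tag{31}$$

We deduce the following useful bound

$$\frac{|h(x) - E[h(X)]|}{x^2 + 2} \le c\,\frac{2\left(|x| + \frac{1}{\sqrt{2\pi}}\right) + \sqrt{\frac{2}{\pi}}}{x^2 + 2} \le 2c. \tag{32}$$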

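Plugging (31) in (26) leads to (we restrict the discussion to x > 0, the other case following by symmetry)

$$g_0(x) = \underbrace{-E[Z h'(Z)]\,\bar\Upsilon(x)}_{I(x)} + \underbrace{e^{x^2/2} \int_{x}^{\infty} \int_{-\infty}^{y} y^2 e^{-y^2/2}\,h'(t)\,\Phi(t)\,dt\,dy}_{II(x)} - \underbrace{e^{x^2/2} \int_{x}^{\infty} \int_{y}^{\infty} y^2 e^{-y^2/2}\,h'(t)\,\bar\Phi(t)\,dt\,dy}_{III(x)}$$

To deal with the quantities II(x) and III(x) we again interchange integrations to get

$$\begin{aligned}
II(x) &= e^{x^2/2} \int_{-\infty}^{x} \left(\int_{x}^{\infty} y^2 e^{-y^2/2}\,dy\right) h'(t)\,\Phi(t)\,dt + e^{x^2/2} \int_{x}^{\infty} \left(\int_{t}^{\infty} y^2 e^{-y^2/2}\,dy\right) h'(t)\,\Phi(t)\,dt \\
&= \bar\Upsilon(x) \int_{-\infty}^{x} h'(t)\,\Phi(t)\,dt + e^{x^2/2} \int_{x}^{\infty} e^{-t^2/2}\,\bar\Upsilon(t)\,h'(t)\,\Phi(t)\,dt
\end{aligned}$$

and

$$\begin{aligned}
III(x) &= e^{x^2/2} \int_{x}^{\infty} \left(\int_{x}^{t} y^2 e^{-y^2/2}\,dy\right) h'(t)\,\bar\Phi(t)\,dt = e^{x^2/2} \int_{x}^{\infty} \left(e^{-x^2/2}\,\bar\Upsilon(x) - e^{-t^2/2}\,\bar\Upsilon(t)\right) h'(t)\,\bar\Phi(t)\,dt \\
&= \bar\Upsilon(x) \int_{x}^{\infty} h'(t)\,\bar\Phi(t)\,dt - e^{x^2/2} \int_{x}^{\infty} e^{-t^2/2}\,\bar\Upsilon(t)\,h'(t)\,\bar\Phi(t)\,dt
\end{aligned}$$

and thus if x ≥ 0 we have

$$g_0(x) = -E[Z h'(Z)]\,\bar\Upsilon(x) + \bar\Upsilon(x) \int_{-\infty}^{x} h'(t)\,\Phi(t)\,dt - \bar\Upsilon(x) \int_{x}^{\infty} h'(t)\,\bar\Phi(t)\,dt + e^{x^2/2} \int_{x}^{\infty} e^{-t^2/2}\,\bar\Upsilon(t)\,h'(t)\,dt. \tag{33}$$

By a similar argument we deduce that if x < 0 then

$$g_0(x) = E[Z h'(Z)]\,\Upsilon(x) + \Upsilon(x) \int_{x}^{\infty} h'(t)\,\bar\Phi(t)\,dt - \Upsilon(x) \int_{-\infty}^{x} h'(t)\,\Phi(t)\,dt + e^{x^2/2} \int_{-\infty}^{x} e^{-t^2/2}\,\Upsilon(t)\,h'(t)\,dt. \tag{34}$$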

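Step 2: a bound on ∥g∥. Supposing ∥h′∥ ≤ c we can use (33) and the first claim in (30) to deduce that for x ≥ 0:

$$|g_0(x)| \le c\,E|Z| \left(x + \sqrt{\frac{\pi}{2}}\right) + c \left(x + \sqrt{\frac{\pi}{2}}\right) \int_{-\infty}^{x} \Phi(t)\,dt + c \left(x + \sqrt{\frac{\pi}{2}}\right) \int_{x}^{\infty} \bar\Phi(t)\,dt + c\,e^{x^2/2} \int_{x}^{\infty} e^{-t^2/2} \left(t + \sqrt{\frac{\pi}{2}}\right) dt.$$

The last two terms decrease strictly to 0 as x → ∞, with maximum value c/2 and c(1 + π/2), respectively. The first term is equal to $c\left(\sqrt{2/\pi}\,x + 1\right)$ and the second one is equal to

$$c \left(x + \sqrt{\frac{\pi}{2}}\right) \int_{-\infty}^{x} \Phi(t)\,dt = c \left(x + \sqrt{\frac{\pi}{2}}\right) \bigl(x\,\Phi(x) + \varphi(x)\bigr) \le c \left(x^2 + \left(\sqrt{\frac{\pi}{2}} + \frac{1}{\sqrt{2\pi}}\right) x + \frac{1}{2}\right).$$

Similar (symmetric) bounds hold for x ≤ 0 and thus, collecting all these estimates, we may conclude:

$$|g(x)| = \frac{|g_0(x)|}{x^2 + 2} \le 3c. \tag{35}$$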

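Step 3: a bound on ∥g′∥. Here we start by rewriting the derivative as

$$g'(x) = \frac{g_0'(x)}{x^2 + 2} - \frac{2x}{(x^2 + 2)^2}\,g_0(x). \tag{36}$$

Using (35), the second summand is easily seen to be uniformly bounded (by 3c). We are left with the first summand for which we start by rewriting the numerator, for x ≥ 0, using (33):

$$\begin{aligned}
g_0'(x) &= -\bar\Upsilon'(x)\,E[Z h'(Z)] + \bar\Upsilon'(x) \int_{-\infty}^{x} h'(t)\,\Phi(t)\,dt - \bar\Upsilon'(x) \int_{x}^{\infty} h'(t)\,\bar\Phi(t)\,dt + \bar\Upsilon(x) \bigl(h'(x)\,\Phi(x) + h'(x)\,\bar\Phi(x)\bigr) \\
&\quad + x\,e^{x^2/2} \int_{x}^{\infty} \bar\Upsilon(t)\,h'(t)\,e^{-t^2/2}\,dt - e^{x^2/2}\,\bar\Upsilon(x)\,h'(x)\,e^{-x^2/2}
\end{aligned}$$

which leads to

$$g_0'(x) = -\bar\Upsilon'(x)\,E[Z h'(Z)] + \bar\Upsilon'(x) \int_{-\infty}^{x} h'(t)\,\Phi(t)\,dt - \bar\Upsilon'(x) \int_{x}^{\infty} h'(t)\,\bar\Phi(t)\,dt + x\,e^{x^2/2} \int_{x}^{\infty} \bar\Upsilon(t)\,h'(t)\,e^{-t^2/2}\,dt. \tag{37}$$

Now we can use the fact that $\bar\Upsilon'(x) = x\,e^{x^2/2} \int_{x}^{\infty} e^{-t^2/2}\,dt \le 1$ for all x ≥ 0 as well as all the arguments outlined at the previous step to deduce the bound $|g_0'(x)| \le c\left(\sqrt{2/\pi} + 2\left(x + 1/\sqrt{2\pi}\right) + 1/\sqrt{2\pi}\right) \le 2c(x + 1)$, whence

$$\frac{|g_0'(x)|}{x^2 + 2} \le c\,\frac{2x + 2}{x^2 + 2} \le c. \tag{38}$$

Similar (symmetric) arguments hold also for negative x and thus $|g'(x)| \le 4c$.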

Step 4: a bound on χ(x) = g′(x)/x. Using (36) we know that

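$$\chi(x) = \frac{g_0'(x)}{x(x^2+2)} - \frac{2}{(x^2+2)^2}\,g_0(x). \tag{39}$$

The second summand in (39) is bounded using (35) and the inequality $2/(x^2+2) \le 1$ to get

$$\left|\frac{2}{(x^2+2)^2}\,g_0(x)\right| \le 3c. \tag{40}$$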

For the first summand we use (37) to deduce

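$$\frac{g_0'(x)}{x} = -\frac{\Upsilon'(x)}{x}E[Zh'(Z)] + \frac{\Upsilon'(x)}{x}\int_{-\infty}^{x}h'(t)\Phi(t)\,dt - \frac{\Upsilon'(x)}{x}\int_x^\infty h'(t)\bar\Phi(t)\,dt + e^{x^2/2}\int_x^\infty \Upsilon(t)h'(t)e^{-t^2/2}\,dt.$$

At this stage it is useful to remark that, for x ≥ 0, the function $\Upsilon'(x)/x$ is strictly decreasing with maximal value $\sqrt{\pi/2}$, and hence $\left|g_0'(x)/x\right| \le c\left(1 + 2\left(x+\tfrac{1}{\sqrt{2\pi}}\right) + \tfrac{1}{\sqrt{2\pi}}\right) \le c(2x+3)$, and thus $\left|\dfrac{g_0'(x)}{x(x^2+2)}\right| \le 3c$ which, combined with (40), leads (after applying the symmetric arguments for x ≤ 0) to ∣χ(x)∣ ≤ 6c.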
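The two properties of Υ′ used in Steps 3 and 4 are statements about the Gaussian Mills-type ratio R(x) = e^{x²/2}∫_x^∞ e^{−t²/2} dt, since Υ′(x) = xR(x). The following sketch checks them on a grid. It is an illustration only: the function names and the grid are our own, and the integral is evaluated through the standard identity ∫_x^∞ e^{−t²/2} dt = √(π/2)·erfc(x/√2).

```python
import math

def mills(x):
    """R(x) = e^{x^2/2} * int_x^inf e^{-t^2/2} dt, i.e. Upsilon'(x)/x for x > 0."""
    return math.exp(x * x / 2) * math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))

def upsilon_prime(x):
    """Upsilon'(x) = x * R(x)."""
    return x * mills(x)

GRID = [k * 0.01 for k in range(801)]  # x in [0, 8]

def is_strictly_decreasing(values):
    """True if each entry is strictly smaller than the previous one."""
    return all(b < a for a, b in zip(values, values[1:]))

if __name__ == "__main__":
    ratios = [mills(x) for x in GRID]
    print("R(0)               :", ratios[0])
    print("sqrt(pi/2)         :", math.sqrt(math.pi / 2))
    print("R decreasing       :", is_strictly_decreasing(ratios))
    print("max of Upsilon'(x) :", max(upsilon_prime(x) for x in GRID))
```

The grid check is consistent with R being decreasing from R(0) = √(π/2) and with Υ′(x) ≤ 1 on [0, ∞).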

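Step 5: a bound on ∥χ′∥. Direct computations using (28) and (39) give

$$\chi(x) = \frac{1}{x^2+2}\left(1-\frac{2}{x^2+2}\right)g_0(x) - \frac{x}{x^2+2}\bigl(h(x)-E[Z^2h(Z)]\bigr)$$

and thus

$$\chi'(x) = \frac{2x(2-x^2)}{(x^2+2)^2}\,\frac{g_0(x)}{x^2+2} + \left(1-\frac{2}{x^2+2}\right)\frac{g_0'(x)}{x^2+2} - \frac{2-x^2}{x^2+2}\,\frac{h(x)-E[Z^2h(Z)]}{x^2+2} - \frac{x}{x^2+2}\,h'(x).$$

Using the bounds $|2x(2-x^2)/(x^2+2)^2| \le 1$, $|1-2/(x^2+2)| \le 1$, $|(2-x^2)/(x^2+2)| \le 1$ and $|x/(x^2+2)| \le 1$, as well as (35), (38) and (32), we conclude (after applying the symmetric arguments for x ≤ 0) that ∣χ′(x)∣ ≤ 7c. □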

4.5. Verifying bounds on expectations

In this section we find bounds on the expectations in Theorem 3.6 in order to prove Corollary 3.7. We will make use of the following lemma.

Lemma 4.8. If x1, … , xN is the unique strictly decreasing zero-mean solution of (4), then $x_1 = O(\sqrt{\log N})$.

Proof. To simplify the notation, note that it suffices to consider the rescaled recursion $x_{n+1}^3 = x_n^3 - S_n^{-1}$, where $S_n$ is defined in the proof of Lemma 2.1. Expressing $x_1^3$ as a telescoping sum, we have

$$x_1^3 = \sum_{n=1}^{m-1}\left(x_n^3 - x_{n+1}^3\right) + x_m^3 = \sum_{n=1}^{m-1} S_n^{-1} + x_m^3 \le \sum_{n=1}^{m-1}\left(\frac{n}{x_1}\right)^{-1} + x_m^3 \le x_1(1+\log m) + x_m^3,$$

where we have used Euler’s approximation to the harmonic sum for the last inequality. By the variance property (P2) (in this rescaled case $x_1^2+\cdots+x_N^2 = N-1$) we have that $x_1$ is bounded away from zero (as a sequence indexed by N) and $x_m$ is bounded, so $x_m^3/x_1$ is bounded. Dividing the above display by $x_1$ gives $x_1^2 \le 1 + \log m + x_m^3/x_1$, and we then obtain $x_1 = O(\sqrt{\log N})$. □
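The harmonic-sum step in the proof above, namely that the sum of 1/n over n = 1, …, m − 1 is at most 1 + log m, can be confirmed directly for a range of m. This is a small illustrative check; the helper name and the tested values of m are our own choices.

```python
import math

def harmonic_partial_sum(m):
    """Return sum_{n=1}^{m-1} 1/n (zero when m = 1)."""
    return sum(1.0 / n for n in range(1, m))

def euler_bound_holds(m):
    """Check the inequality sum_{n=1}^{m-1} 1/n <= 1 + log m."""
    return harmonic_partial_sum(m) <= 1.0 + math.log(m)

if __name__ == "__main__":
    for m in (1, 2, 10, 100, 10_000):
        print(m, harmonic_partial_sum(m), 1.0 + math.log(m), euler_bound_holds(m))
```

Combined with the telescoping identity, this is what turns the sum of the terms $S_n^{-1}$ into the factor 1 + log m and hence into the √(log N) growth of x1.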

Proof of Corollary 3.7. From Proposition 3.5 and the recursion (4), note that p*(x) puts mass 1/(N – 1) on each interval between successive xn, so it is easy to create a coupling of $W \sim \mathbb{P}_N$ with W* ~ p*(x) such that

$$|W - W^*| \le x_n - x_{n+1}$$

when $W^* \in [x_{n+1}, x_n]$. For a detailed proof of such a coupling, see the construction given in McKeague and Levin (2016). From (P3) (see Lemma 2.1) and Lemma 4.8 we then have

$$E|W - W^*| \le \frac{1}{N-1}\sum_{n=1}^{N-1}(x_n - x_{n+1}) = \frac{2x_1}{N-1} = O\!\left(\frac{\sqrt{\log N}}{N}\right). \tag{41}$$

Second, using $|W| \le x_1 = O(\sqrt{\log N})$ it follows immediately that

$$E\bigl[|W(W - W^*)|\bigr] = O\!\left(\frac{\log N}{N}\right).$$

Third, the zero-median property gives

$$2x_m^3 = x_m^3 - x_{m+1}^3 = S_m^{-1} \ge \left(\frac{m}{x_m}\right)^{-1} = \frac{x_m}{m},$$

where m = N/2 + 1, so $x_m^{-1} = O(\sqrt{N})$. By symmetry

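$$E\left|\frac{1}{W}-\frac{1}{W^*}\right| = E\left[\left|\frac{1}{W}-\frac{1}{W^*}\right|\mathbf{1}\{W^* \in (x_{m+1}, x_m]\}\right] + 2\sum_{n=1}^{m-1}E\left[\left|\frac{1}{W}-\frac{1}{W^*}\right|\mathbf{1}\{W^* \in (x_{n+1}, x_n]\}\right].$$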

From Proposition 3.5 note that p*(x) ∝ x2 for x ∈ (xm+1, xm]. Also using the fact that p*(x) puts mass 1/(N – 1) on this interval, the first term above can be written

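$$\frac{6}{x_m^3(N-1)}\int_0^{x_m}\left(\frac{1}{x}-\frac{1}{x_m}\right)x^2\,dx \le \frac{3}{x_m(N-1)} = O\!\left(\frac{1}{\sqrt{N}}\right).$$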
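The integral in the preceding display can be evaluated exactly: the integrand equals x − x²/x_m, so the integral is x_m²/6 and the whole expression equals 1/(x_m(N − 1)), which is below the stated bound 3/(x_m(N − 1)). The sketch below confirms this with a midpoint rule. The values of x_m and N are hypothetical placeholders chosen for illustration, not values computed from the recursion (4).

```python
import math

def middle_interval_term(x_m, N, steps=10_000):
    """Midpoint-rule value of 6/(x_m^3 (N-1)) * int_0^{x_m} (1/x - 1/x_m) x^2 dx."""
    width = x_m / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * width          # midpoint of the k-th subinterval
        total += (1.0 / x - 1.0 / x_m) * x * x
    return 6.0 * total * width / (x_m ** 3 * (N - 1))

def exact_value(x_m, N):
    """Closed form 1/(x_m (N-1)) obtained from int_0^{x_m} (x - x^2/x_m) dx = x_m^2/6."""
    return 1.0 / (x_m * (N - 1))

def stated_bound(x_m, N):
    """Upper bound 3/(x_m (N-1)) used in the text."""
    return 3.0 / (x_m * (N - 1))

if __name__ == "__main__":
    N = 1000                            # hypothetical number of worlds
    x_m = 1.0 / math.sqrt(N)            # hypothetical x_m of order 1/sqrt(N)
    print(middle_interval_term(x_m, N), exact_value(x_m, N), stated_bound(x_m, N))
```

With x_m of order 1/√N, both the exact value and the bound are of order 1/√N, matching the rate claimed for the first term.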

The second term is bounded above by the telescoping sum

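$$\frac{2}{N-1}\sum_{n=1}^{m-1}\left(\frac{1}{x_{n+1}}-\frac{1}{x_n}\right) = \frac{2}{N-1}\left(\frac{1}{x_m}-\frac{1}{x_1}\right) = O\!\left(\frac{1}{\sqrt{N}}\right),$$

so we have

$$E\left|\frac{1}{W}-\frac{1}{W^*}\right| = O\!\left(\frac{1}{\sqrt{N}}\right).$$

Fourth,

$$E\left|1-\frac{W^*}{W}\right| \le \sqrt{N}\,E|W-W^*| = O\!\left(\sqrt{\frac{\log N}{N}}\right),$$

using $|W| \ge x_m$, $x_m^{-1} = O(\sqrt{N})$ and (41). The Corollary now follows from Theorem 3.6. □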

Acknowledgements

The research of Ian McKeague was partially supported by NSF Grant DMS-1307838 and NIH Grant 2R01GM095722-05. The research of Yvik Swan was partially supported by the Fonds de la Recherche Scientifique - FNRS under Grant no F.4539.16. We also thank the Institute for Mathematical Sciences at National University of Singapore for support during the Workshop on New Directions in Stein’s Method (May 18–29, 2015) where work on the paper was initiated.

References

  1. Bohm D (1952). A suggested interpretation of the quantum theory in terms of “hidden” variables. I. Phys. Rev. 85, 166–179.
  2. Cacoullos T and Papathanasiou V (1989). Characterizations of distributions by variance bounds. Statist. Probab. Lett. 7, 351–356.
  3. Chatterjee S (2009). Fluctuations of eigenvalues and second order Poincaré inequalities. Probab. Theory Related Fields 143, 1–40.
  4. Chatterjee S and Shao Q-M (2011). Non-normal approximation by Stein’s method of exchangeable pairs with application to the Curie–Weiss model. Ann. Appl. Probab. 21, 464–483.
  5. Chen L, Goldstein L and Shao Q-M (2010). Normal Approximation by Stein’s Method. Springer Verlag.
  6. Döbler C (2015). Stein’s method of exchangeable pairs for the beta distribution and generalizations. Electron. J. Probab. 20, 1–34.
  7. Goldstein L and Reinert G (1997). Stein’s method and the zero bias transformation with application to simple random sampling. Ann. Appl. Probab. 7, 935–952.
  8. Hall MJW, Deckert DA and Wiseman HM (2014). Quantum phenomena modeled by interactions between many classical worlds. Phys. Rev. X 4, 041013.
  9. Ley C, Reinert G and Swan Y (2017). Stein’s method for comparison of univariate distributions. Probab. Surv. 14, 1–52.
  10. McKeague IW and Levin B (2016). Convergence of empirical distributions in an interpretation of quantum mechanics. Ann. Appl. Probab. 26, 2540–2555.
  11. Nourdin I and Peccati G (2009). Stein’s method on Wiener chaos. Probab. Theory Related Fields 145, 75–118.
  12. Nourdin I and Peccati G (2012). Normal approximations with Malliavin calculus: from Stein’s method to universality. Vol. 192. Cambridge University Press.
  13. Peköz E and Röllin A (2011). New rates for exponential approximation and the theorems of Rényi and Yaglom. Ann. Probab. 39, 587–608.
  14. Peköz E, Röllin A and Ross N (2013). Degree asymptotics with rates for preferential attachment random graphs. Ann. Appl. Probab. 23, 1188–1218.
  15. Peköz E, Röllin A and Ross N (2016). Generalized gamma approximation with rates for urns, walks and trees. Ann. Probab. 44, 1776–1816.
  16. Ross N (2011). Fundamentals of Stein’s method. Probab. Surv. 8, 210–293.
  17. Stein C (1986). Approximate Computation of Expectations. Institute of Mathematical Statistics Lecture Notes–Monograph Series, 7. Institute of Mathematical Statistics, Hayward, CA.
