Author manuscript; available in PMC: 2022 Jan 21.
Published in final edited form as: Inf. Inference. 2020 Aug 13;10(4):1287–1351. doi: 10.1093/imaiai/iaaa016

Wavelet invariants for statistically robust multi-reference alignment

Matthew Hirn 1, Anna Little 2
PMCID: PMC8782248  NIHMSID: NIHMS1726636  PMID: 35070296

Abstract

We propose a nonlinear, wavelet-based signal representation that is translation invariant and robust to both additive noise and random dilations. Motivated by the multi-reference alignment problem and generalizations thereof, we analyze the statistical properties of this representation given a large number of independent corruptions of a target signal. We prove the nonlinear wavelet-based representation uniquely defines the power spectrum but allows for an unbiasing procedure that cannot be directly applied to the power spectrum. After unbiasing the representation to remove the effects of the additive noise and random dilations, we recover an approximation of the power spectrum by solving a convex optimization problem, and thus reduce to a phase retrieval problem. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.

Keywords: multi-reference alignment, method of invariants, wavelets, signal processing, wavelet scattering transform

1. Introduction

The goal in classic multi-reference alignment (MRA) is to recover a hidden signal f : ℝ → ℝ from a collection of noisy measurements. Specifically, the following data model is assumed.

Model 1 (Classic MRA).

The classic MRA data model consists of M independent observations of a compactly supported, real-valued signal f ∈ L²(ℝ):

y_j(x) = f(x − t_j) + ε_j(x),  1 ⩽ j ⩽ M,  (1.1)

where:

  1. supp(y_j) ⊆ [−1/2, 1/2] for 1 ⩽ j ⩽ M.

  2. {t_j}_{j=1}^M are independent samples of a random variable t.

  3. {ε_j(x)}_{j=1}^M are independent white noise processes on [−1/2, 1/2], with variance σ².

The signal is thus subjected to both random translation and additive noise. The MRA problem arises in numerous applications, including structural biology [32,64,65,70,71,79], single cell genomic sequencing [51], radar [43,85], crystalline simulations [76], image registration [18,40,69] and signal processing [85]. It is a simplified model relevant for cryo-electron microscopy (cryo-EM), an imaging technique for molecules that achieves near atomic resolution [11,14,75]. In this application one seeks to recover a three-dimensional reconstruction of the molecule from many noisy two-dimensional images/projections [41]. Although MRA ignores the tomographic projection of cryo-EM, investigation of the simplified model provides important insights. For example, [5,66] investigate the optimal sample complexity for MRA and demonstrate that M = Θ(σ⁶) is required to fully recover f in the low signal-to-noise regime when the translation distribution is periodic; this optimal sample complexity is the same for cryo-EM [7,82]. Recent work has established an improved sample complexity of M = Θ(σ⁴) for MRA when the translation distribution is aperiodic [1], and this rate has been shown to also hold in the more complicated setting of cryo-EM if the viewing angles are non-uniformly distributed [72]. Problems closely related to Model 1 include the heterogeneous MRA problem, where the unknown signal f is replaced with a template of k unknown signals f₁, . . . , f_k [16,54,66,77], as well as multi-reference factor analysis, where the underlying (random) signal follows a low-rank factor model and one seeks to recover its covariance matrix [50].

Approaches for solving MRA generally fall into two categories: synchronization methods and methods that estimate the signal directly, i.e. without estimating nuisance parameters. Synchronization methods attempt to recover the signal by aligning the translations and then averaging. They include methods based on angular synchronization [8,15,24,67,73,84], where for each pair of signals the best pairwise shift is computed and then the translations are estimated from this pairwise information [6], and semi-definite programming [4,9,10,25], which approximates the quasi-maximum likelihood estimator of the shifts by relaxing a non-convex rank constraint. However, these methods fail in the low signal-to-noise regime. Methods that estimate the signal directly include both the method of moments [44,48,72] and expectation maximization, or EM-type, algorithms [1,30]; a number of EM-type algorithms have also been developed for the more complicated cryo-EM problem [33,68]. An important special case of the method of moments is the method of invariants, which seeks to recover f by computing translation invariant features, and thus avoids aligning the translations. However, the task is a difficult one, as a complete representation is needed to recover the signal, and yet the representation may be difficult to invert and corrupted by statistical bias. Generally, the signal is recovered from translation invariant moments, which are estimated in the Fourier domain [29,44]. Recent work [5,14] utilizes such Fourier invariants (mean, power spectrum and bispectrum) and recovers f̂ by solving a non-convex optimization problem on the manifold of phases.

Classic MRA however fails to capture many of the biological phenomena arising in molecular imaging, such as the random rotations of the molecules and the tomographic projection associated with the imaging of three-dimensional objects. Another shortcoming is that the model fails to capture the dynamics that arise from flexible regions in macromolecular structures. These flexible regions are very important in structural biology, for example in understanding molecular interactions [36,39,52,53] and molecular recognition of epigenetic regulators of histone tails [17,31,58]. The large-scale dynamics of these regions make imaging challenging [81], and thus sample preparation in cryo-EM generally seeks to minimize these dynamics by focusing on well-folded macromolecules frozen in vitreous ice [63]. However, this ‘may severely impact... the nature of the intrinsic dynamics and interactions displayed by macromolecules’ [63]. Although modern cryo-EM is making great strides in understanding flexible systems [3,37,38,59], formulating models that are more capable of capturing the motions associated with the flexible regions of macromolecules could open the door to applying cryo-EM more broadly, i.e. to less well-folded macromolecules. Mathematically, the motion of the flexible region can be modeled as a diffeomorphism. See Fig. 1, which shows a molecule with a flexible side chain (1(a)) and a diffeomorphism resulting from movement of the flexible region (1(b)). Figure 1(a) is taken from [63], and Fig. 1(b) was obtained by deforming it.

Fig. 1. Dynamics arising from flexible regions in macromolecular structures [63].

This article thus generalizes the classic MRA problem to include a random diffeomorphism. Specifically, we consider recovering a hidden signal f : ℝ → ℝ from

y_j(x) = L_{τ_j} f(x − t_j) + ε_j(x),  1 ⩽ j ⩽ M,

where L_τ is a dilation operator that dilates by a factor of (1 − τ). The dilation operator L_τ is a simplified model for more general diffeomorphisms L_ζ f(x) = f(ζ(x)), since in the simplest case when ζ(x) is affine, L_ζ simply translates and dilates f (see Section 2.1). Dilations are also relevant for the analysis of time-warped audio signals, which can arise from the Doppler effect and in speech processing and bioacoustics. For example, [60–62] consider a stationary random signal f(x) which is time-warped, i.e. D_ζ f(x) = √(ζ′(x)) f(ζ(x)), and use a maximum likelihood approach to estimate ζ. In [27,28], a similar stochastic time warping model is analyzed using wavelet-based techniques. The noisy dilation MRA model considered here corresponds to the simplest case of time-warping, when ζ is an affine function. This special case is in fact very important in imaging applications [22,23,46,57,69,80], where it is critical to compute features which are scale invariant, as objects are naturally dilated by the ‘zoom’ of an image.

A new approach is needed to solve this more general MRA problem, as Fourier invariants will fail, being unstable to the action of diffeomorphisms, including dilations. The instability occurs in the high frequencies, where even a small diffeomorphism can significantly alter the Fourier modes. We instead propose L²(ℝ) wavelet coefficient norms as invariants, using a continuous wavelet transform. This approach is inspired by the invariant scattering representation of [56], which is provably stable to the actions of small diffeomorphisms. However, here we replace local averages of the modulus of the wavelet coefficients with global averages (i.e. integrations) of the modulus squared, thus providing rigid invariants that can be statistically unbiased. Similar invariant coefficients have been utilized in a number of applications, including predicting molecular properties [34,35] and quantum chemical energies [45], and in microcanonical ensemble models for texture synthesis [19]. Recent work [42] has also generalized such coefficients to graphs.

1.1. Notation

The Fourier transform of a signal f ∈ L¹(ℝ) is

f̂(ω) = ∫ f(x) e^{−ixω} dx.

We remind the reader that compactly supported L²(ℝ) functions are in L¹(ℝ). The power spectrum is the nonlinear transform P : L²(ℝ) → L¹(ℝ) that maps f to

(Pf)(ω) = |f̂(ω)|²,  ω ∈ ℝ.

We write f(x) ≲ g(x) when f(x) ⩽ C g(x) for some absolute constant C. We also write f(x) = O(g(x)) if |f(x)| ⩽ C g(x) for all x ⩾ x₀ for some constants x₀, C > 0; f(x) = o(g(x)) denotes f(x)/g(x) → 0 as x → ∞; f(x) = Θ(g(x)) denotes C₁ g(x) ⩽ |f(x)| ⩽ C₂ g(x) for all x ⩾ x₀ for some constants x₀, C₁, C₂ > 0. The minimum of a and b is denoted a ∧ b, and the maximum by a ∨ b.

2. MRA models and the method of invariants

Standard MRA models are generalized to models that include deformations of the underlying signal in Section 2.1. Section 2.2 reviews power spectrum invariants and introduces L²(ℝ) wavelet coefficient invariants. Theorem 2.4 proves that wavelet coefficient invariants computed with a continuous wavelet transform and a suitable mother wavelet are equivalent to the power spectrum, showing there is no information loss in the transition from one representation to the other.

2.1. MRA data models

A standard MRA scenario considers the problem of recovering a signal f ∈ L²(ℝ) in which one observes random translations of the signal, each of which is corrupted by additive noise. The problem is particularly difficult when the signal-to-noise ratio (SNR) is low, as registration methods become intractable. In [5,13,14,16,54,74] the authors propose a method using Fourier-based invariants, which are invariant to translations and thus eliminate the need to register signals.

A more general MRA scenario incorporates random deformations of the signal f, which could be used to model underlying physical variability that is not captured by rigid transformations and additive noise models. For example [4,7] consider a discrete signal f corrupted by an arbitrary group action, [47,85] consider random deformations arising in RADAR and [2] considers a generalization of MRA where signals are rescaled by random constants. Another natural mathematical model is small, random diffeomorphisms, which leads to observations of the form

y_j(x) = L_{ζ_j} f(x − t_j) + ε_j(x),  1 ⩽ j ⩽ M,  (2.1)

where ζ_j ∈ C¹(ℝ) is a random diffeomorphism, t_j is a random translation and the signals ε_j(x) are independent white noise random processes. The transform L_ζ is the action of the diffeomorphism ζ on f,

L_ζ f(x) = f(ζ(x)).

If ‖(ζ⁻¹)′‖_∞ < ∞, then one can verify L_ζ : L²(ℝ) → L²(ℝ).

One of the keys to the Fourier invariant approach of [5,13,14,16,54,74] is that the authors can unbias the Fourier invariants of the noisy signals, thus allowing them to devise an unbiased estimator of the Fourier invariants of the signal f (or of a mixture of signals in the heterogeneous MRA case). For the diffeomorphism model (2.1) this would require developing a procedure for unbiasing the (Fourier) invariants of {y_j}_{j=1}^M against both additive noise and random diffeomorphisms.

In order to get a handle on the difficulties associated with the proposed diffeomorphism model, in this paper we consider random dilations of the signal f, which corresponds to restricting the diffeomorphism to be of the form

ζ(x) = x/(1 − τ),  |τ| ⩽ 1/2.

Specifically, we assume the following noisy dilation MRA model.

Model 2 (Noisy dilation MRA data model).

The noisy dilation MRA data model consists of M independent observations of a compactly supported, real-valued signal f ∈ L²(ℝ):

y_j(x) = L_{τ_j} f(x − t_j) + ε_j(x),  1 ⩽ j ⩽ M,  (2.2)

where L_τ is an L¹(ℝ) normalized dilation operator,

L_τ f(x) = (1 − τ)⁻¹ f((1 − τ)⁻¹ x).

In addition, we assume the following:

  1. supp(y_j) ⊆ [−1/2, 1/2] for 1 ⩽ j ⩽ M.

  2. {t_j}_{j=1}^M are independent samples of a random variable t.

  3. {τ_j}_{j=1}^M are independent samples of a bounded, symmetric random variable τ satisfying
    τ ∈ ℝ,  E(τ) = 0,  Var(τ) = η²,  |τ| ⩽ 1/2.
  4. {ε_j(x)}_{j=1}^M are independent white noise processes on [−1/2, 1/2] with variance σ².

Remark 2.1

The interval [−1/2, 1/2] is arbitrary and can be replaced with any interval of length 1. In addition, the spatial box size is arbitrary, i.e. [−1/2, 1/2] can be replaced with [−N/2, N/2]. All results still hold with σ√N replacing σ wherever it appears.

Thus, the hidden signal f is supported on an interval of length 1, and we observe M independent instances of the signal that have been randomly translated, randomly dilated and corrupted by additive white noise. We assume the hidden signal is real, but the proposed methods can also handle complex-valued signals with minor modifications. Recall ε(x) is a white noise process if ε(x) = dB_x, i.e. it is the derivative of a Brownian motion with variance σ².

While the noisy dilation MRA model does not capture the full richness of the diffeomorphism model, it already presents significant mathematical difficulties. Indeed, as we show in Section 5, Fourier invariants, specifically the power spectrum, cannot be used to form accurate estimators under the action of dilations and random additive noise. The reason is that Fourier measurements are not stable to the action of small dilations (measured here by |τ|), since the displacement of (L_τ f)^(ω) relative to f̂(ω) depends on |ω|. Intuitively, high-frequency modes are unstable, and yet high frequencies are often critical; for example, removing high frequencies increases the sample complexity needed to distinguish between signals in a heterogeneous MRA model [5]. We thus replace Fourier-based invariants with wavelet coefficient invariants, which are defined in Section 2.2. As we show, the wavelet invariants of the signal f can be accurately estimated from wavelet invariants of the noisy signals {y_j}_{j=1}^M, with no information loss relative to the power spectrum of f.
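To make the data model concrete, the following sketch generates synthetic observations under Model 2. The test signal, the uniform translation law and the uniform dilation law are illustrative assumptions, not choices prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical Gabor-type test signal, effectively supported in [-1/2, 1/2]
    return np.exp(-20 * x**2) * np.cos(16 * x)

def sample_model2(M, sigma=0.1, eta=0.05, n=512):
    """Draw M observations y_j(x) = L_{tau_j} f(x - t_j) + eps_j(x) on [-1/2, 1/2]."""
    x = np.linspace(-0.5, 0.5, n, endpoint=False)
    dx = x[1] - x[0]
    ys = np.empty((M, n))
    for j in range(M):
        t = rng.uniform(-0.05, 0.05)                            # random translation t_j
        tau = rng.uniform(-np.sqrt(3) * eta, np.sqrt(3) * eta)  # E[tau] = 0, Var(tau) = eta^2
        # L^1-normalized dilation: L_tau f(x) = (1 - tau)^{-1} f((1 - tau)^{-1} x)
        ys[j] = f((x - t) / (1 - tau)) / (1 - tau)
        # discretized white noise with variance sigma^2: each sample ~ N(0, sigma^2 / dx)
        ys[j] += sigma * rng.standard_normal(n) / np.sqrt(dx)
    return ys
```

The 1/√dx scaling discretizes ε(x) = dB_x so that the empirical noise power spectrum has expectation σ² on a unit-length interval.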

For future reference we also define the following dilation MRA model, which includes random translations and random dilations but no additive noise. Thus, Models 1 and 3 are both special cases of Model 2.

Model 3 (Dilation MRA data model).

The dilation MRA data model consists of M independent observations of a compactly supported, real-valued signal f ∈ L²(ℝ):

y_j(x) = L_{τ_j} f(x − t_j),  1 ⩽ j ⩽ M,  (2.3)

where L_τ is an L¹(ℝ) normalized dilation operator,

L_τ f(x) = (1 − τ)⁻¹ f((1 − τ)⁻¹ x).

In addition, we assume (1)–(3) of Model 2.

2.2. Method of invariants

We now discuss how invariant representations can be used to solve MRA data models and introduce the wavelet invariants used in this article.

2.2.1. Motivation and related work

Let T_t f(x) = f(x − t) denote the operator that translates by t acting on a signal f. Invariant measurement models seek a representation Φ(f) ∈ B in a Banach space B such that

Φ(T_t f) = Φ(f),  ∀ t ∈ ℝ.  (2.4)

In MRA problems, one additionally requires that

Φ(f) = Φ(g)  ⟺  g = T_t f for some t ∈ ℝ.  (2.5)

The first condition (2.4) removes the need to align random translations of the signal f, whereas the second condition (2.5) ensures that if one can estimate Φ(f) from the collection {Φ(y_j)}_{j=1}^M, then one can recover an estimate of f (up to translation) by solving

f = arg inf_{g ∈ L¹∩L²(ℝ)} ‖Φ(g) − Φ(f)‖_B,  (2.6)

where ‖·‖_B is the Banach space norm.

When the observed signals {y_j}_{j=1}^M are corrupted by more than just a random translation, though, as in Model 2, estimating Φ(f) from {Φ(y_j)}_{j=1}^M is not always straightforward. Indeed, one would like to compute

Φ̄_M(f) = (1/M) Σ_{j=1}^M Φ(y_j),  (2.7)

but the quantity Φ̄_M(f) is not always an unbiased estimator of Φ(f), meaning that lim_{M→∞} Φ̄_M(f) ≠ Φ(f). In order to circumvent this issue, one must select a representation Φ such that

E Φ(y_j) = Φ(f) + b_Φ(f, M),  (2.8)

where b_Φ(f, M) is a bias term depending on the choice of Φ, f and the signal corruption model M. If (2.8) holds and if we can compute a b̃ such that E b̃_Φ(y_j, M) = b_Φ(f, M) + δ for |b_Φ(f, M)| ≫ |δ|, then one can amend (2.7) to reduce the bias:

Φ̃_M(f) = (1/M) Σ_{j=1}^M (Φ(y_j) − b̃_Φ(y_j, M)),

in which case

lim_{M→∞} Φ̃_M(f) = Φ(f) + δ

almost surely by the law of large numbers. The main difficulty therefore is twofold. On the one hand, one must design a representation Φ that satisfies (2.4), (2.5) and (2.8) with a bias b that can be estimated; on the other hand, the optimization (2.6) must be tractable. For random translation plus additive noise models (i.e. Model 1), the authors of [5,14] describe a representation Φ based on Fourier invariants that satisfies the outlined requirements and for which one can solve (2.6) despite the optimization being non-convex. The Fourier invariants include f̂(0) (i.e. the integral of f), the power spectrum of f and the bispectrum of f. Each invariant captures successively more information about f. While f̂(0) carries limited information, the power spectrum recovers the magnitude of the Fourier transform, namely it recovers the non-negative, real-valued function ρ(ω) such that f̂(ω) = ρ(ω)e^{iθ(ω)}, but the phase information θ(ω) is lost. Since (T_t f)^(ω) = e^{−iωt} f̂(ω), the power spectrum is invariant to translations, as the Fourier modulus kills the phase factor induced by a translation t of f. However, it is in general not possible to recover a signal from its power spectrum, although in certain special cases the phase information can be resolved; results along these lines are in the field of phase retrieval [26,78]. The bispectrum is also translation invariant and invertible so long as f̂(ω) ≠ 0 [66].

In Section 5 we show that it is impossible to significantly reduce the power spectrum bias for Model 2, which includes translations, dilations and additive noise. We thus propose replacing the power spectrum with the L²(ℝ) norms of the wavelet coefficients of the signal f. These invariants satisfy (2.4) and (2.8) for Model 2 and yield a convex formulation of (2.6). They do not satisfy (2.5) for general f ∈ L²(ℝ), but Theorem 2.4 in Section 2.2.2 shows that knowing the wavelet invariants of f is equivalent to knowing the power spectrum of f, which means that any phase retrieval setting in which recovery is possible will also be possible with the specified wavelet invariants. For example, if the signal lives in a spline or shift invariant space in addition to being real-valued, then it can be recovered from its phaseless measurements [26,78].

2.2.2. Wavelet invariants

We now define the wavelet invariants used in this article. A wavelet ψ ∈ L²(ℝ) is a waveform that is localized in both space and frequency and has zero average,

∫ ψ(x) dx = 0.

Note throughout this article ψ will always denote a wavelet in L¹∩L²(ℝ) with zero average, satisfying ‖ψ‖₂ = 1 as well as the classic admissibility condition ∫ |ψ̂(ω)|²/|ω| dω < ∞. A dilation of the wavelet by a factor λ ∈ (0, ∞) is denoted

ψ_λ(x) = λ^{1/2} ψ(λx),

where the normalization guarantees that ‖ψ_λ‖₂ = ‖ψ‖₂ = 1. The continuous wavelet transform W computes

Wf = { f ∗ ψ_λ(x) : λ ∈ (0, ∞), x ∈ ℝ }.

The parameter λ corresponds to a frequency variable. Indeed, if ξ₀ is the central frequency of ψ, the wavelet coefficients f ∗ ψ_λ recover the frequencies of f in a band of size proportional to λ centered at λξ₀. Thus, high frequencies are grouped into larger packets, which we shall use to obtain a stable, invariant representation of f.

The wavelet transform Wf is equivariant to translations but not invariant. Integrating the wavelet coefficients over x yields translation invariant coefficients, but they are trivial since ∫ ψ_λ = 0. We therefore compute L²(ℝ) norms in the x variable, yielding the following nonlinear wavelet invariants:

Definition 2.1 (Wavelet invariants).

The L² wavelet invariants of a real-valued signal f ∈ L¹∩L²(ℝ) are given by

(Sf)(λ) = ‖f ∗ ψ_λ‖₂²,  λ ∈ (0, ∞),  (2.9)

where ψ_λ(x) = λ^{1/2} ψ(λx) are dilations of a mother wavelet ψ.

Throughout this article ψ can be taken to be a Morlet wavelet, in which case ψ is constructed to have frequency centered at ξ via ψ(x) = C_ξ π^{−1/4} e^{−x²/2} (e^{iξx} − e^{−ξ²/2}) with C_ξ = (1 + e^{−ξ²} − 2e^{−3ξ²/4})^{−1/2}, but results hold more generally for what we refer to as k-admissible wavelets, where k ⩾ 0 is an even integer. See Appendix A for a precise description of this admissibility criterion. The wavelet invariants can be expressed in the frequency domain as

(Sf)(λ) = (1/2π) ∫ |f̂(ω)|² |ψ̂_λ(ω)|² dω,
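The frequency-domain formula lends itself to direct discretization. The sketch below assumes a Morlet profile with an arbitrary center frequency XI and omits the normalization constant C_ξ (which only rescales S by a constant); it computes (Sf)(λ) from samples of |f̂|²:

```python
import numpy as np

XI = 3 * np.pi / 4   # assumed Morlet center frequency (illustrative choice)

def morlet_hat(w):
    """Fourier transform of a Morlet wavelet with center frequency XI,
    up to the overall normalization constant C_xi."""
    return np.sqrt(2 * np.pi) * np.pi ** -0.25 * (
        np.exp(-0.5 * (w - XI) ** 2) - np.exp(-0.5 * XI ** 2) * np.exp(-0.5 * w ** 2))

def wavelet_invariants(f_hat, w, lambdas):
    """(Sf)(lam) = (1/2pi) * int |f_hat(w)|^2 |psi_hat_lam(w)|^2 dw, where
    psi_lam(x) = lam^{1/2} psi(lam x) implies psi_hat_lam(w) = lam^{-1/2} psi_hat(w/lam)."""
    dw = w[1] - w[0]
    Pf = np.abs(f_hat) ** 2
    S = np.empty(len(lambdas))
    for i, lam in enumerate(lambdas):
        psi2 = np.abs(morlet_hat(w / lam)) ** 2 / lam   # |psi_hat_lam(w)|^2
        S[i] = np.sum(Pf * psi2) * dw / (2 * np.pi)     # Riemann sum of the integral
    return S
```

Because the computation only touches |f̂|², translating f leaves every invariant unchanged, which is exactly the invariance (2.4).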

which motivates the following definition of ‘wavelet invariant derivatives’.

Definition 2.2 (Wavelet invariant derivatives).

The n-th derivative of (Sf)(λ) is defined as

(Sf)^{(n)}(λ) := (1/2π) ∫ |f̂(ω)|² (dⁿ/dλⁿ) |ψ̂_λ(ω)|² dω.
Remark 2.2

Definition 2.1 assumes f : ℝ → ℝ, which allows the wavelet ψ to be either real or complex. Our results can easily be extended to complex f, but a strictly complex wavelet would be needed, with (Sf)(λ) computed for all λ ∈ (−∞, ∞) \ {0}.

Remark 2.3

For a discrete signal of length n, computing the wavelet invariants via a continuous wavelet transform is O(n²), while computing the power spectrum is O(n log n). Thus, one pays a computational cost to achieve greater stability with no loss of information. On the other hand, if wavelet invariants are computed for a dyadic wavelet transform (i.e. only for O(log n) values of λ), the computational cost is the same and stability is maintained, but more information is lost.

Remark 2.4

When (Pf)(ω) = |f̂(ω)|² is continuous, Definition 2.2 reduces to a normal derivative, i.e. one can check that (Sf)^{(n)}(λ) = (dⁿ/dλⁿ)(Sf)(λ). However, when Pf is not continuous, in general (Sf)^{(n)}(λ) ≠ (dⁿ/dλⁿ)(Sf)(λ), and (Sf)^{(n)}(λ) is more convenient for controlling the error of the estimators proposed in this article. Throughout this article, the notation (Sf)^{(n)}(λ) will thus denote the derivative of Definition 2.2 and (dⁿ/dλⁿ)(Sf)(λ) will denote the standard derivative.

Under mild conditions, one can show that S : L²(ℝ) → L¹∩C(0, ∞). The values λ = 2ʲ for j ∈ ℤ correspond to rigid versions of first-order L²(ℝ) wavelet scattering invariants [56]. The continuous wavelet transform Wf is extremely redundant; indeed, for suitably chosen mother wavelets, the dyadic wavelet transform with λ = 2ʲ for j ∈ ℤ is a complete representation of f. However, the corresponding operator S restricted to λ = 2ʲ is not invertible. When one utilizes every frequency λ ∈ (0, ∞), though, the resulting L²(ℝ) norms (Sf)(λ) = ‖f ∗ ψ_λ‖₂² uniquely determine the power spectrum of f, so long as the wavelet ψ satisfies a type of independence condition.

Condition 2.3

Define

|ψ̂_λ⁺(ω)|² = (|ψ̂_λ(ω)|² + |ψ̂_λ(−ω)|²) 1(ω ⩾ 0).

If, for any finite sequence {ω_i}_{i=1}^n of distinct positive frequencies, the collection {|ψ̂_λ⁺(ω_i)|²}_{i=1}^n is linearly independent as functions of λ, we say the wavelet ψ satisfies the linear independence condition.

Remark 2.5

Condition 2.3 is stated in terms of |ψ̂_λ⁺(ω)|² to avoid assumptions on whether ψ is real or complex. When ψ(x) ∈ ℝ, |ψ̂_λ⁺(ω)|² = 2|ψ̂_λ(ω)|² for ω ⩾ 0. When ψ is complex analytic, |ψ̂_λ⁺(ω)|² = |ψ̂_λ(ω)|². When ψ is complex but not complex analytic, |ψ̂_λ⁺(ω)|² simply incorporates a reflection of |ψ̂_λ(ω)|² about the origin. Since we assume f(x) ∈ ℝ, |ψ̂_λ⁺(ω)|² uniquely defines (Sf)(λ), since (Sf)(λ) = (1/2π)⟨|f̂|², |ψ̂_λ⁺|²⟩ by the Plancherel and Fourier convolution theorems.

Theorem 2.4

Let f, g ∈ L¹∩L²(ℝ) and assume ψ satisfies Condition 2.3 and ψ̂ has compact support. Then,

Sf = Sg  ⟺  Pf = Pg.
Proof.

First assume Pf = Pg, which means |f̂(ω)|² = |ĝ(ω)|² for almost every ω. Using the Plancherel and Fourier convolution theorems,

(Sf)(λ) = ∫ |f ∗ ψ_λ(x)|² dx = (1/2π) ∫ |f̂(ω)|² |ψ̂_λ(ω)|² dω = (1/2π) ∫ |ĝ(ω)|² |ψ̂_λ(ω)|² dω = (Sg)(λ),  ∀ λ ∈ (0, ∞).

Now suppose Sf = Sg. Since Sf and Sg are continuous in λ, we have

0 = (Sf)(λ) − (Sg)(λ) = (1/2π) ∫ (|f̂(ω)|² − |ĝ(ω)|²) |ψ̂_λ(ω)|² dω,  ∀ λ ∈ (0, ∞).

Since f ∈ L¹∩L²(ℝ) we have f̂ ∈ L²∩L^∞(ℝ) and thus Pf ∈ L¹∩L^∞(ℝ). By interpolation Pf ∈ L²(ℝ), and the same holds for Pg. By applying Lemma 2.1 (stated below) with p(ω) = (Pf)(ω) − (Pg)(ω) (note p is continuous since f, g ∈ L¹(ℝ)), we conclude Pf = Pg for almost every ω. □

Lemma 2.1

Let p ∈ L²(ℝ) be continuous and assume p(ω) = p(−ω), ψ̂ has compact support and Condition 2.3 holds. Then,

∫ p(ω) |ψ̂_λ(ω)|² dω = 0  ∀ λ > 0  ⟹  p = 0 a.e.

The proof of Lemma 2.1 is in Appendix C. We remark that many wavelets satisfy Condition 2.3 and have compactly supported Fourier transform, so Theorem 2.4 is broadly applicable. For example, Proposition 2.1 below proves that any complex analytic wavelet with compactly supported Fourier transform satisfies Condition 2.3. Morlet wavelets satisfy Condition 2.3 (see Lemma C.1 in Appendix C) but do not have compactly supported Fourier transform; however, ψ̂ does have fast decay for a Morlet wavelet, and numerically we observe no issues. We also note that the assumption that ψ̂ has compact support in Theorem 2.4 can be removed if f, g are bandlimited. The following proposition, proved in Appendix C, gives sufficient conditions guaranteeing Condition 2.3.

Proposition 2.1

The following are sufficient to guarantee Condition 2.3:

  1. |ψ̂(ω)|² has compact support contained in an interval [a, b], where a and b have the same sign, e.g. complex analytic wavelets with compactly supported Fourier transform.

  2. |ψ̂(ω)|² ∈ C^∞(ℝ) and there exists an N such that all derivatives of order at least N are non-zero at ω = 0, e.g. the Morlet wavelet.

Remark 2.6

In practice, Pf and Sf are implemented as discrete vectors, and Sf is obtained from Pf via matrix multiplication, i.e. Sf = F(Pf) for some real matrix F with F^T F strictly positive definite. Thus, ‖Pf − Pg‖₂ ⩽ σ_min⁻¹ ‖Sf − Sg‖₂, where σ_min > 0 is the smallest singular value of the matrix F, and the spectral decay of F, which can be explicitly computed, thus determines the stability of the representation. The smoother the wavelet, the more rapidly the spectrum decays, since when ψ ∈ C^p, F^T F is defined by a C^p kernel and thus has eigenvalues that decay like o(1/n^{p+1}) [20]. There is thus a tradeoff between smoothness and stability. In this article we choose smoothness over stability, since smoothness is required for unbiasing noisy dilation MRA, and in our experiments the Morlet wavelet yielded the best results. We therefore invert the representation by solving an optimization problem that is initialized to be close to the desired solution (see Section 6.5), and we avoid computing the pseudo-inverse of F, which is unstable for our smooth wavelet.
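As a rough numerical illustration of this remark, one can assemble the matrix F mapping a discretized Pf to Sf and inspect its singular value decay. The Gaussian bump standing in for |ψ̂|², and the grid sizes, are assumptions for the sketch, not the wavelet used in the paper's experiments:

```python
import numpy as np

# Frequency grid for Pf and scale grid for Sf.
w = np.linspace(0.1, 40.0, 400)
dw = w[1] - w[0]
lambdas = np.linspace(0.5, 30.0, 200)

def psi_hat_sq(w):
    # hypothetical smooth mother-wavelet profile |psi_hat(w)|^2, peaked at w = 1
    return np.exp(-((w - 1.0) ** 2) / 0.5)

# Sf = F @ Pf with F[i, k] = |psi_hat(w_k / lambda_i)|^2 / lambda_i * dw / (2*pi).
F = np.array([psi_hat_sq(w / lam) / lam for lam in lambdas]) * dw / (2 * np.pi)

# The singular value decay of F controls the stability bound
# ||Pf - Pg||_2 <= sigma_min^{-1} ||Sf - Sg||_2; a smooth profile decays quickly.
svals = np.linalg.svd(F, compute_uv=False)
```

The rapid decay of `svals` for this smooth profile is exactly why the remark avoids the pseudo-inverse of F and instead inverts via a well-initialized optimization.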

3. Unbiasing for classic MRA

In this section we consider the classic MRA model (Model 1). We discuss unbiasing results for both the power spectrum and wavelet invariants, as well as simulation results comparing the two methods. In the following proposition we establish unbiasing results for the power spectrum by rederiving some results from [14], extended to the continuum setting. The proposition is proved in Appendix D.

Proposition 3.1

Assume Model 1. Define the following estimator of (Pf)(ω):

(P̃f)(ω) := (1/M) Σ_{j=1}^M (P y_j)(ω) − σ².

Then with probability at least 1 − 1/t²,

|(Pf)(ω) − (P̃f)(ω)| ⩽ (2tσ/√M)(‖f‖₁ + σ).  (3.1)

We obtain an identical result for wavelet invariants (Proposition 3.2) when signals are corrupted by additive noise only. See Appendix D for the proof.

Proposition 3.2

Assume Model 1. Define the following estimator of (Sf)(λ):

(S̃f)(λ) := (1/M) Σ_{j=1}^M (S y_j)(λ) − σ².

Then with probability at least 1 − 1/t²,

|(Sf)(λ) − (S̃f)(λ)| ⩽ (2tσ/√M)(‖f‖₁ + σ).  (3.2)

As M → ∞, the error of both the power spectrum and wavelet invariant estimators decays to zero at the same rate, and one can perfectly unbias both representations. As demonstrated in Section 5, this is not possible for noisy dilation MRA (Model 2), as there is a non-vanishing bias term. However, a nonlinear unbiasing procedure on the wavelet invariants can significantly reduce the bias.
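The estimator of Proposition 3.1 is easy to check by Monte Carlo. In the sketch below, circular shifts on a periodic grid stand in for continuous translations (the power spectrum is exactly shift invariant under `np.roll`), and the grid size, noise level and sample size are illustrative choices; with this discretization the noise contributes exactly σ² per frequency bin in expectation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, M = 256, 0.25, 3000
x = np.linspace(-0.5, 0.5, n, endpoint=False)
dx = x[1] - x[0]
f = np.exp(-5 * x**2) * np.cos(16 * x)   # medium frequency Gabor from the experiments

def power_spectrum(y):
    # continuous Fourier transform approximated by dx * FFT; P y = |y_hat|^2
    return np.abs(dx * np.fft.fft(y)) ** 2

Pf = power_spectrum(f)
acc = np.zeros(n)
for _ in range(M):
    shift = rng.integers(n)                                   # random circular translation
    y = np.roll(f, shift) + sigma * rng.standard_normal(n) / np.sqrt(dx)
    acc += power_spectrum(y)

biased = acc / M               # averages to Pf + sigma^2 on this grid
unbiased = acc / M - sigma**2  # the estimator of Proposition 3.1
```

Subtracting σ² removes the additive-noise bias entirely, leaving only the O(1/√M) fluctuation of (3.1).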

We illustrate and compare additive noise unbiasing for power spectrum estimation using P̃f, the power spectrum method of Proposition 3.1, and S̃f, the wavelet invariant method of Proposition 3.2. To approximate Pf from the wavelet invariants S̃f, we apply the convex optimization algorithm described in Section 6.5 to obtain PS̃f, the power spectrum approximation that best matches the wavelet invariants S̃f. Thus, throughout this article, PS̃f denotes a power spectrum estimator obtained by first unbiasing wavelet invariants and then running an optimization procedure, while P̃f denotes an estimator computed by directly unbiasing the power spectrum. Our simulations compare the L² error of both of these estimators, i.e. we compare ‖Pf − P̃f‖₂ and ‖Pf − PS̃f‖₂.

Figure 2(a) shows the uncorrupted power spectrum (red curve) of a medium frequency Gabor function (f(x) = e^{−5x²} cos(16x)), and the power spectrum after the signal is corrupted by additive noise with level σ = 2⁻³ (blue curve); the SNR of the experiment is 0.56 (see Section 6.1). Figure 2(b) shows the L² error of the power spectrum estimation for the two methods as a function of log₂(M) for a fixed SNR, and Fig. 2(c) shows the L² error as a function of log₂(σ) for a fixed M. The L² errors for the two methods are similar; however, estimation via wavelet invariants is advantageous when the sample size M is small or the additive noise level σ is large. As M becomes very large or σ very small, the power spectrum method is preferable, as the smoothing procedure of the wavelet invariants may numerically erase some extremely small scale features of the original power spectrum.

Fig. 2. Simulation results for the additive noise model for the medium frequency Gabor f(x) = e^{−5x²} cos(16x).

4. Unbiasing for dilation MRA

In this section we analyze the dilation MRA model (Model 3). We thus assume the signals have been randomly translated and dilated but there is no additive noise.

In fact there is a simple algorithm to recover f under this model. Since ‖f_{τ_j}‖₂² = ‖f‖₂²/(1 − τ_j), the quantity (1/M) Σ_{j=1}^M 1/‖y_j‖₂² is an unbiased estimator of 1/‖f‖₂², and so ‖f‖₂² can be accurately approximated. Once ‖f‖₂² is recovered, one can take any signal y_j and dilate it so that ‖y_j‖₂² = ‖f‖₂², and the result will be an accurate approximation of the hidden signal f for M large. However, this approach collapses in the presence of even a small amount of additive noise. In the presence of additive noise, an alternative is to attempt a synchronization by centering each signal. The center c_f of a signal f can be defined in the classical way by

c_f = (1/‖f‖₂²) ∫ x |f(x)|² dx.

Since the signals y_j(x + c_f + t_j) are perfectly aligned, one can thus attempt an alignment by defining ỹ_j(x) = y_j(x + c_{y_j}). However, c_{y_j} − (c_f + t_j) = O((σ ∨ σ²) + η), so significant errors arise in the synchronization that cannot be resolved by averaging. As our goal is ultimately to produce a method that can be extended to the noisy dilation MRA model, we abandon both the trivial solution (which cannot be extended to noisy dilation MRA) and the synchronization approach (which produces large errors) and explore a method based on empirical averages.
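The norm identity behind the trivial noiseless estimator is easy to verify numerically. In this sketch the Gabor-type test signal and the uniform dilation law are illustrative assumptions; ‖f‖₂² is recovered from dilated copies alone, since norms are unaffected by translation:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-0.5, 0.5, 2048, endpoint=False)
dx = x[1] - x[0]
f = lambda u: np.exp(-20 * u**2) * np.cos(16 * u)   # hypothetical test signal
norm2_f = np.sum(f(x) ** 2) * dx                     # ||f||_2^2 on the grid

M, eta = 4000, 0.08
taus = rng.uniform(-np.sqrt(3) * eta, np.sqrt(3) * eta, M)  # E[tau]=0, Var(tau)=eta^2

# ||L_tau f||_2^2 = ||f||_2^2 / (1 - tau), so averaging 1/||y_j||_2^2
# gives an unbiased estimator of 1/||f||_2^2 when E[tau] = 0.
inv = np.empty(M)
for j, tau in enumerate(taus):
    y = f(x / (1 - tau)) / (1 - tau)      # L^1-normalized dilation of f
    inv[j] = 1.0 / (np.sum(y ** 2) * dx)
est = 1.0 / inv.mean()                    # estimate of ||f||_2^2
```

Adding even mild additive noise to `y` destroys the identity ‖y_j‖₂² = ‖f‖₂²/(1 − τ_j), which is exactly why the article abandons this route for noisy dilation MRA.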

We first observe that random dilations cause (1/M) Σ_{j=1}^M (P y_j)(ω) and (1/M) Σ_{j=1}^M (S y_j)(λ) to be biased estimators of (Pf)(ω) and (Sf)(λ), and the bias for both is O(η²), where η² is the variance of the dilation distribution. However, if the moments of the dilation distribution are known and Pf, Sf are sufficiently smooth, one can apply an unbiasing procedure to the above estimators so that the resulting bias is O(η^{k+2}), where k ⩾ 2 is an even integer.

Throughout this section, we assume k ⩾ 2 is an even integer, and define the constants Ci from the first k/2 even moments of τ by E[τi]=Ciηi for i = 2, 4, . . . , k. Note since we assume E[τ2]=η2, C2 = 1. We define the constants B2, B4, . . . , Bk by solving

$\frac{C_i}{i!} - B_2\frac{C_{i-2}}{(i-2)!} - \cdots - B_{i-2}\frac{C_2}{2!} - B_i = 0$ (4.1)

for i = 2, 4, . . . , k; these constants are deterministic functions of the moments of τ. A non-recursive formula related to the Euler numbers can be derived, which defines Bi explicitly in terms of C2, . . . , Ci; however, the recursive formula (4.1) is easier to implement numerically.
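The recursion can be sketched in a few lines, assuming (4.1) is read as $C_i/i! - \sum_{m=2,4,\ldots,i-2} B_m C_{i-m}/(i-m)! - B_i = 0$ for each even i; the function name and dictionary interface are illustrative, not the paper's implementation.

```python
from math import factorial

def unbias_constants(C, k):
    """Solve the recursion (4.1) for B_2, B_4, ..., B_k, given the
    normalized even moments C = {i: C_i} with C[2] = 1.  For each
    even i, B_i = C_i/i! - sum over even m < i of B_m * C_{i-m}/(i-m)!."""
    B = {}
    for i in range(2, k + 1, 2):
        s = C[i] / factorial(i)
        for m in range(2, i, 2):
            s -= B[m] * C[i - m] / factorial(i - m)
        B[i] = s
    return B

# Gaussian dilation distribution: E[tau^4] = 3 eta^4, so C_4 = 3.
B = unbias_constants({2: 1.0, 4: 3.0}, 4)
```

With $C_2 = 1$ the recursion gives $B_2 = 1/2$ regardless of the distribution, and $B_4 = C_4/24 - 1/4$.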

We introduce two additional moment-based constants that are defined by the Ci, Bi constants:

$T := \max_{i=0,2,\ldots,k} C_i^{1/i}$ (4.2)
$E := \max_{i=0,2,\ldots,k}\ \max_{j=0,\ldots,k+2-i}\left(T^j\, j!\, |B_i|\right)^{\frac{1}{i+j}},$ (4.3)

where $C_0 = |B_0| = 1$, and when $i = j = 0$ in (4.3), $\left(T^j\, j!\, |B_i|\right)^{\frac{1}{i+j}}$ is replaced with 1.

Remark 4.1

Since the distribution of τ is bounded, we are guaranteed that T < ∞, and in general we can consider both T and E to be O(1) constants. For example, for the uniform distribution, $T \leqslant \sqrt{3}$ and $|B_i| \leqslant |\mathrm{Euler}(i)|/i! \leqslant 1$, which gives $E \leqslant 3$.

We utilize the following two lemmas, which are proved in Appendix E, to derive results for both the power spectrum and wavelet invariants.

Lemma 4.1

Let $F_\lambda(\tau) = L((1-\tau)\lambda)$ for some function $L \in C^{k+2}(0, \infty)$ and a random variable τ satisfying the assumptions of Section 2.1, and let k ⩾ 2 be an even integer. Assume there exist functions $\Lambda_i: \mathbb{R}\to\mathbb{R}$, $R: \mathbb{R}\to\mathbb{R}$ such that

$|\lambda^i L^{(i)}(\lambda)| \leqslant \Lambda_i(\lambda) \quad \text{for } 0 \leqslant i \leqslant k+2, \qquad \frac{\Lambda_{k+2}((1-\tau)\lambda)}{\Lambda_{k+2}(\lambda)} \leqslant R(\lambda),$

and define the following estimator of L(λ):

$G_\lambda(\tau) := F_\lambda(\tau) - B_2\eta^2 F''_\lambda(\tau) - B_4\eta^4 F^{(4)}_\lambda(\tau) - \cdots - B_k\eta^k F^{(k)}_\lambda(\tau).$

Then Gλ(τ) satisfies

$|\mathbb{E}\,G_\lambda(\tau) - L(\lambda)| \lesssim_k R(\lambda)\,\Lambda_{k+2}(\lambda)\,(2E\eta)^{k+2},$
$\operatorname{Var} G_\lambda(\tau) \lesssim_k 2\,R(\lambda)^2\,\Lambda(\lambda)^2,$

where

$\Lambda(\lambda)^2 := \sum_{\substack{0 \leqslant i,j \leqslant k+2 \\ i+j \geqslant 2}} \Lambda_i(\lambda)\,\Lambda_j(\lambda)\,(2E\eta)^{i+j}$

and E is the absolute constant defined in (4.3).

Lemma 4.2

Let the assumptions and notation of Lemma 4.1 hold, and let τ1, . . . , τM be independent. Define

$\widetilde{L}(\lambda) := \frac{1}{M}\sum_{j=1}^M G_\lambda(\tau_j).$

Then with probability at least 1 − 1/t2

$|\widetilde{L}(\lambda) - L(\lambda)| \lesssim_k R(\lambda)\left(\Lambda_{k+2}(\lambda)\,(2E\eta)^{k+2} + \frac{t\,\Lambda(\lambda)}{\sqrt{M}}\right).$

The deviation of the estimator $\widetilde{L}(\lambda)$ from $L(\lambda)$ thus depends on two things: (1) the bias of the estimator, which is $O(\eta^{k+2})$, and (2) the standard deviation of the estimator, which is $O(\eta M^{-1/2})$, since $\Lambda(\lambda) = O(\eta)$.
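The bias reduction of Lemmas 4.1 and 4.2 can be checked numerically. The sketch below takes the illustrative choices $L(\lambda) = e^{-\lambda^2}$, a uniform dilation distribution, and the order k = 2 correction with $B_2 = 1/2$; the expectation over τ is approximated by a midpoint rule, and all names are assumptions for this sketch, not the paper's code.

```python
import math

def L(lam):           # target function, L(lambda) = exp(-lambda^2)
    return math.exp(-lam * lam)

def L2(lam):          # its second derivative
    return (4 * lam * lam - 2) * math.exp(-lam * lam)

def F(lam, tau):      # F_lambda(tau) = L((1 - tau) lambda)
    return L((1 - tau) * lam)

def G(lam, tau, eta, B2=0.5):
    # order-2 corrected estimator: F - B_2 eta^2 F''(tau), where
    # (d/dtau)^2 F_lambda(tau) = lambda^2 L''((1 - tau) lambda)
    return F(lam, tau) - B2 * eta**2 * lam**2 * L2((1 - tau) * lam)

eta, lam, n = 0.05, 1.3, 20000
half = math.sqrt(3) * eta       # uniform tau on [-sqrt(3) eta, sqrt(3) eta]
taus = [-half + (i + 0.5) * (2 * half / n) for i in range(n)]
bias_plain = abs(sum(F(lam, t) for t in taus) / n - L(lam))       # O(eta^2)
bias_corrected = abs(sum(G(lam, t, eta) for t in taus) / n - L(lam))  # O(eta^4)
```

The corrected average should be substantially less biased than the plain one, reflecting the $O(\eta^2) \to O(\eta^4)$ improvement.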

4.1. Power spectrum results for dilation MRA

We now show how this unbiasing procedure based on both the moments of τ and the even derivatives of Py can be used to obtain an estimator of Pf.

Proposition 4.1

Assume Model 3 and $Pf \in C^{k+2}(\mathbb{R})$. Define the following estimator of $(Pf)(\omega)$:

$(\widetilde{Pf})(\omega) := \frac{1}{M}\sum_{j=1}^M\left[(Py_j)(\omega) - B_2\eta^2\omega^2 (Py_j)''(\omega) - \cdots - B_k\eta^k\omega^k (Py_j)^{(k)}(\omega)\right]$

where the constants Bi satisfy (4.1). Let

$\Omega_i(\omega) = |\omega^i (Pf)^{(i)}(\omega)| \quad \text{for } 0 \leqslant i \leqslant k+2, \qquad R(\omega) = \max_\tau \frac{\Omega_{k+2}((1-\tau)\omega)}{\Omega_{k+2}(\omega)}.$

Then for all ω ≠ 0, with probability at least 1 − 1/t2,

$|(\widetilde{Pf})(\omega) - (Pf)(\omega)| \lesssim_k R(\omega)\left(\Omega_{k+2}(\omega)\,(2E\eta)^{k+2} + \frac{t\,\Omega(\omega)}{\sqrt{M}}\right),$ (4.4)

where

$\Omega(\omega)^2 = \sum_{\substack{0 \leqslant i,j \leqslant k+2 \\ i+j \geqslant 2}} \Omega_i(\omega)\,\Omega_j(\omega)\,(2E\eta)^{i+j}.$
Proof.

Since Pf is a translation invariant representation, we can ignore the translation factors $\{t_j\}_{j=1}^M$ and consider the model $y_j = L_{\tau_j}f$. In addition, since $y_j(x) \in \mathbb{R}$, $(Py_j)(\omega) = (Py_j)(-\omega)$, and it is sufficient to consider ω ∈ (0, ∞). Proposition 4.1 then follows directly from Lemma 4.2 with λ = ω, $L = Pf$, since $(Py_j)(\omega) = (Pf)((1-\tau_j)\omega) = F_\omega(\tau_j)$, $\Lambda_i = \Omega_i$ and Λ = Ω. □

We postpone a discussion of the shortcomings of Proposition 4.1 to Section 4.3, where we compare the power spectrum and wavelet invariant results for dilation MRA.

4.2. Wavelet invariant results for dilation MRA

We now apply the same unbiasing procedure to the wavelet invariants. Unlike for the power spectrum, where the error may depend on the frequency ω (see (4.4) and Section 4.3), the wavelet invariant error can be uniformly bounded independently of λ with high probability. The following two lemmas establish bounds on the derivatives of $(Sf)(\lambda)$ and are needed to prove Proposition 4.2; they are proved in Appendix B.

Lemma 4.3 (Low-frequency bound).

Assume $P\psi \in C^m(\mathbb{R})$ and $f \in L^1(\mathbb{R})$. Then the quantity $|\lambda^m (Sf)^{(m)}(\lambda)|$ can be bounded uniformly over all λ. Specifically:

$|\lambda^m (Sf)^{(m)}(\lambda)| \leqslant \Psi_m \|f\|_1^2$

for $\Psi_m$ defined in (A.1).

Lemma 4.4 (High-frequency bound for differentiable functions).

Assume $P\psi \in C^m(\mathbb{R})$ and $f' \in L^1(\mathbb{R})$. Then the quantity $|\lambda^m (Sf)^{(m)}(\lambda)|$ can be bounded by

$|\lambda^m (Sf)^{(m)}(\lambda)| \leqslant \frac{\Theta_m}{\lambda^2}\|f'\|_1^2$

for $\Theta_m$ defined in (A.2).

When ψ is a Morlet wavelet or more generally when ψ is (k + 2)-admissible as described in Appendix A, these lemmas allow one to bound the error of the order k wavelet invariant estimator for dilation MRA in terms of the following quantities:

$\Lambda_i(\lambda) = \Psi_i\|f\|_1^2 \wedge \frac{\Theta_i}{\lambda^2}\|f'\|_1^2, \qquad \Lambda(\lambda)^2 = \sum_{\substack{0 \leqslant i,j \leqslant k+2 \\ i+j \geqslant 2}} \Lambda_i(\lambda)\,\Lambda_j(\lambda)\,(2E\eta)^{i+j},$ (4.5)

where $\Psi_i$, $\Theta_i$ are defined in (A.1), (A.2) and E is defined in (4.3).

Proposition 4.2

Assume Model 3, the notation in (4.5), and that ψ is (k + 2)-admissible. Define the following estimator of (Sf)(λ):

$(\widetilde{Sf})(\lambda) := \frac{1}{M}\sum_{j=1}^M\left[(Sy_j)(\lambda) - B_2\eta^2\lambda^2 (Sy_j)''(\lambda) - \cdots - B_k\eta^k\lambda^k (Sy_j)^{(k)}(\lambda)\right]$

where the constants Bi satisfy (4.1). Then with probability at least 1 − 1/t2,

$|(\widetilde{Sf})(\lambda) - (Sf)(\lambda)| \lesssim_k \left(\Lambda_{k+2}(\lambda)\,(2E\eta)^{k+2} + \frac{t\,\Lambda(\lambda)}{\sqrt{M}}\right).$
Proof.

Since Sf is a translation invariant representation, we can ignore the translation factors $\{t_j\}_{j=1}^M$ and consider the model $y_j = L_{\tau_j}f$. Since ψ is (k + 2)-admissible, $\hat\psi \in C^{k+2}(\mathbb{R})$, which guarantees $(Sf)(\lambda) \in C^{k+2}(0, \infty)$. We note that since $f \in L^1(\mathbb{R})$, Pf is continuous, and the Leibniz integral rule guarantees that $(Sf)^{(n)}(\lambda) = \frac{d^n}{d\lambda^n}(Sf)(\lambda)$ for 1 ⩽ n ⩽ k + 2. By applying Lemma 4.3, we have $|\lambda^i(Sf)^{(i)}(\lambda)| \leqslant \Psi_i\|f\|_1^2$ for all 0 ⩽ i ⩽ k + 2, so that Lemma 4.2 holds for $L(\lambda) = (Sf)(\lambda)$, $\Lambda_i(\lambda) = \Psi_i\|f\|_1^2$ and $R(\lambda) = 1$. Now by applying Lemma 4.4, we have $|\lambda^i(Sf)^{(i)}(\lambda)| \leqslant \Theta_i\|f'\|_1^2/\lambda^2$ for all 0 ⩽ i ⩽ k + 2, so that Lemma 4.2 also holds for $L(\lambda) = (Sf)(\lambda)$, $\Lambda_i(\lambda) = \Theta_i\|f'\|_1^2/\lambda^2$ and $R(\lambda) = 4$ (note since $|\tau| \leqslant \frac{1}{2}$, $\Lambda_{k+2}((1-\tau)\lambda)/\Lambda_{k+2}(\lambda) \leqslant 4$). Thus, Lemma 4.2 in fact holds with $\Lambda_i(\lambda) = \Psi_i\|f\|_1^2 \wedge \Theta_i\|f'\|_1^2/\lambda^2$; since $(Sy_j)(\lambda) = (Sf)((1-\tau_j)\lambda) = F_\lambda(\tau_j)$, we obtain Proposition 4.2. □

Since $\Lambda_i(\lambda) \leqslant \Psi_i\|f\|_1^2$, Proposition 4.2 guarantees that the error can be uniformly bounded independent of λ. In addition, if the signal is smooth, the error for high-frequency λ will have the favorable scaling $\lambda^{-2}$. An important question in practice is how to choose k, i.e. what order wavelet invariant estimator minimizes the bias. Consider for example when $f \in L^1(\mathbb{R})$ and $\Lambda_{k+2}(\lambda) = \Psi_{k+2}\|f\|_1^2$. By using a second-order estimator, we can decrease the bias from $O(\eta^2)$ to $O(\eta^4)$, and we can further decrease the bias to $O(\eta^6)$ by choosing k = 4. However, $\Psi_k$ increases very rapidly in k. Indeed, as can be seen from (A.1), $\Psi_k$ increases like k!. Thus, one possible heuristic (assuming η is known) is to choose $k = \tilde{k}$ where $\tilde{k}$ minimizes the bias upper bound $k\,\Psi_{k+2}(2E\eta)^{k+2}$. Since $\Psi_k$ increases factorially, $\Psi_k \sim (Ck)^k$ for some constant C, and $\tilde{k} + 2$ will be inversely proportional to η, that is $(\tilde{k}+2) \sim \eta^{-1}$. The following corollary of Proposition 4.2 then holds for any $k \leqslant \tilde{k}$.

Corollary 4.1

Under the assumptions of Proposition 4.2, if $\Psi_i(2E\eta)^i$ is decreasing for $i \leqslant k+2$, then with probability at least 1 − 1/t²:

$|(\widetilde{Sf})(\lambda) - (Sf)(\lambda)| \lesssim \|f\|_1^2\left(k\,\Psi_{k+2}(2E\eta)^{k+2} + \frac{t\,k^2\eta}{\sqrt{M}}\right).$ (4.6)

Similarly, if $\Theta_i(2E\eta)^i$ is decreasing for $i \leqslant k+2$, then with probability at least 1 − 1/t²:

$|(\widetilde{Sf})(\lambda) - (Sf)(\lambda)| \lesssim \frac{\|f'\|_1^2}{\lambda^2}\left(k\,\Theta_{k+2}(2E\eta)^{k+2} + \frac{t\,k^2\eta}{\sqrt{M}}\right).$ (4.7)

Remark 4.2

We observe that for a discrete lattice I of λ values, we can define the discrete $L^1$-norm by $\|g\|_{L^1(I)} = \sum_{\lambda\in I}|g(\lambda)|\,\Delta\lambda$. Assume the lattice has cardinality n, and that $\Psi_i(2E\eta)^i$, $\Theta_i(2E\eta)^i$ are decreasing for $i \leqslant k+2$. Applying Proposition 4.2 with $t = \sqrt{n}\,s$ and a union bound over the lattice gives

$\|\widetilde{Sf} - Sf\|_{L^1(I)} \lesssim_k \left(\|f\|_1^2\,\Psi_{k+2} + \|f'\|_1^2\,\Theta_{k+2}\right)(2E\eta)^{k+2} + \frac{s\sqrt{n}\,k^2\eta}{\sqrt{M}}\left(\|f\|_1^2 + \|f'\|_1^2\right)$

with probability at least 1 − 1/s². When $n \ll M$, which is the context for MRA, the $L^1$-norm of the error is $O(\eta^{k+2})$ as M → ∞.

4.3. Comparison

Although Propositions 4.2 and 4.1 at first glance appear quite similar, the wavelet invariant method has several important advantages over the power spectrum method, which we enumerate in the following remarks.

Remark 4.3

Proposition 4.2 (wavelet invariants) applies to any signal satisfying $f \in L^1(\mathbb{R})$, but Proposition 4.1 requires $Pf \in C^{k+2}(\mathbb{R})$. Thus, as k is increased, the power spectrum results apply to an increasingly restrictive function class. Furthermore, as discussed in Section 5, if the signal contains any additive noise, $Py_j$ is not even $C^1$, which means the unbiasing procedure of Proposition 4.1 cannot be applied. On the other hand, by choosing $P\psi \in C^\infty(\mathbb{R})$, Sf will inherit the smoothness of the wavelet, and the wavelet invariant results will hold for any $f \in L^1(\mathbb{R})$ and any k.

Remark 4.4

Since $(Pf_\tau)(\xi) = (Pf)((1-\tau)\xi)$, dilation will transport the frequency content at ξ to (1 − τ)ξ, so that the displacement is τξ. Thus, when ξ is very large, $|(Pf)(\xi) - (Pf_\tau)(\xi)|$ can be large even for τ small. Because the wavelet invariants bin the frequency content, and these bins become increasingly large in the high frequencies, this does not occur for wavelet invariants. More specifically, there is always a signal f and frequency ξ for which $|(Pf)(\xi) - (\widetilde{Pf})(\xi)|$ is large regardless of k. Consider for example when $(Pf)(\omega) = e^{-(\omega-\xi)^2}$. Then $\Omega_k(\xi) \sim \xi^k$, and $|(Pf)(\xi) - (\widetilde{Pf})(\xi)| \gtrsim 1$. However, for M large enough, the order k wavelet invariant estimator satisfies $|(Sf)(\lambda) - (\widetilde{Sf})(\lambda)| = O(k\Psi_{k+2}\eta^{k+2})$ for all λ. The wavelet invariants are thus stable for high-frequency signals, where the power spectrum fails.

Remark 4.5

For the wavelet invariants there will be a unique $\tilde{k}$ that minimizes $k\,\Psi_{k+2}(2E\eta)^{k+2}$, and $\tilde{k}$ does not depend on λ. Furthermore, $\tilde{k}$ can be explicitly computed given the wavelet ψ and the moment constant E. On the other hand, the minimum of $k\,\Omega_{k+2}(\omega)(2E\eta)^{k+2}$ with respect to k will depend on both the frequency ω and the signal f, so that $\tilde{k} = \tilde{k}(\omega, f)$, and it becomes unclear how to choose the unbiasing order.
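The computation of $\tilde{k}$ reduces to a one-dimensional search. The sketch below uses the purely illustrative assumption $\Psi_m = m!$ (the actual constants depend on the wavelet) and the default E = 3; it is not the paper's implementation.

```python
from math import factorial

def select_order(eta, E=3.0, kmax=20, Psi=None):
    """Choose the even unbiasing order k minimizing the bias upper bound
    k * Psi_{k+2} * (2 E eta)^{k+2}.  Psi maps an order m to the constant
    Psi_m; the default factorial growth is an illustrative assumption."""
    if Psi is None:
        Psi = lambda m: float(factorial(m))
    bound = lambda k: k * Psi(k + 2) * (2 * E * eta) ** (k + 2)
    return min(range(2, kmax + 1, 2), key=bound)
```

Smaller η supports higher-order unbiasing before the factorial growth of $\Psi_{k+2}$ dominates, consistent with $(\tilde{k}+2) \sim \eta^{-1}$.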

4.4. Simulation results for dilation MRA

We first illustrate the unbiasing procedure of Propositions 4.1 and 4.2 for the high-frequency signal $f(x) = e^{-5x^2}\cos(32x)$. Figure 3 shows the power spectrum estimator $\widetilde{Pf}$ and the wavelet invariant estimator $\widetilde{P_Sf}$ for k = 0, 2, 4 for both small and large dilations, where $\widetilde{P_Sf}$ denotes the combined wavelet invariant unbiasing plus optimization procedure (see Section 6.5). Higher order unbiasing is beneficial for both methods for small dilations but fails for the power spectrum for large dilations. Both methods will of course fail for η large enough, but for high-frequency signals the power spectrum fails much sooner.

Fig. 3.

Order k = 0, 2, 4 power spectrum estimators $\widetilde{Pf}$ (first two figures) and wavelet invariant estimators $\widetilde{P_Sf}$ (last two figures) for the signal $f_3(x) = e^{-5x^2}\cos(32x)$. Figures 3(a) and 3(c) show small dilations and Figs 3(b) and 3(d) show large dilations.

Next we compare $\|Pf - \widetilde{Pf}\|_2$ and $\|Pf - \widetilde{P_Sf}\|_2$, the $L^2$ errors of estimating the power spectrum of the target signal via the power spectrum estimators of Proposition 4.1 and via the wavelet invariant estimators of Proposition 4.2, followed by a convex optimization procedure. We consider order k = 0, 2, 4 estimators for both the power spectrum and wavelet invariants on the following Gabor atoms of increasing frequency:

$f_1(x) = e^{-5x^2}\cos(8x)$
$f_2(x) = e^{-5x^2}\cos(16x)$
$f_3(x) = e^{-5x^2}\cos(32x).$

These functions satisfy f = Real(h) where $(Ph)(\omega) = (\pi/5)\,e^{-(\omega-\xi)^2/10}$ for ξ = 8, 16, 32, and thus exhibit the behavior described in Remark 4.4.

Simulation results are shown in Fig. 4; the horizontal axis shows $\log_2(M)$ while the vertical axis shows $\log_2(\mathrm{Error})$. For each value of M, the error was calculated for 10 independent simulations and then averaged. The unbiasing procedure of Propositions 4.1 and 4.2 requires knowledge of the moments of the dilation distribution, but in practice these are unknown. Thus, the first two even moments of the dilation distribution $(\eta^2, C_4\eta^4)$ were estimated empirically with the fourth-order estimators described in Section 6.3 (see Definition 6.1). For the low-frequency signal, the fourth-order power spectrum estimator was best for both small and large dilations and is preferable due to the lower computational cost (see Remark 2.3). For the high-frequency signal, the fourth-order wavelet invariant estimator was best for large dilations and WSC k = 2 and k = 4 were best and equivalent for small dilations. For the medium-frequency signal, the higher order power spectrum estimators were best for small dilations while the higher order wavelet invariant estimators were best for large dilations. Thus, the simulation results confirm that the wavelet invariants will have an advantage over Fourier invariants when the signals are either high frequency or corrupted by large dilations. We remark that one obtains nearly identical error plots with oracle knowledge of the dilation moments, indicating that the empirical moment estimation procedure is highly accurate in the absence of additive noise, even for small M values.

Fig. 4.

$L^2$ error with standard error bars for dilation model (empirical moment estimation). Top row shows results for small dilations (η = 0.06) and bottom row shows results for large dilations (η = 0.12). First, second and third columns show results for low-, medium- and high-frequency Gabor signals. All plots have the same axis limits.

5. Noisy dilation MRA model

Finally, we consider the noisy dilation MRA model (Model 2) where signals are randomly translated and dilated and corrupted by additive noise. Section 5.1 gives unbiasing results for wavelet invariants and Section 5.2 reports relevant simulations.

5.1. Wavelet invariant results for noisy dilation MRA

To state Proposition 5.1 as succinctly as possible, we also define the following quantity

$\Psi := \sum_{m=0,2,\ldots,k} \Psi_m (E\eta)^m,$ (5.1)

where E is defined in (4.3) and $\Psi_m$ is defined in (A.1).

Proposition 5.1

Assume Model 2 and that ψ is (k + 2)-admissible. Define the following estimator of (Sf)(λ):

$(\widetilde{Sf})(\lambda) := \frac{1}{M}\sum_{j=1}^M\left[(Sy_j)(\lambda) - B_2\eta^2\lambda^2 (Sy_j)''(\lambda) - \cdots - B_k\eta^k\lambda^k (Sy_j)^{(k)}(\lambda)\right] - \sigma^2$

where the constants Bi satisfy (4.1). Then with probability at least 1 − 1/t2

$|(\widetilde{Sf})(\lambda) - (Sf)(\lambda)| \lesssim_k \Lambda_{k+2}(\lambda)(2E\eta)^{k+2} + \frac{t}{\sqrt{M}}\left[k\,\Lambda(\lambda) + \Psi\sigma^2 + \sqrt{\Psi\left(\Lambda_0(\lambda) + \Lambda(\lambda)\right)}\,\sigma\right],$ (5.2)

where E, Λ(λ), Ψ are as defined in (4.3), (4.5), (5.1).

The following corollary is an immediate consequence of Proposition 5.1.

Corollary 5.1

Let the assumptions of Proposition 5.1 hold, and in addition assume $\Psi_i(2E\eta)^i$ is decreasing for $i \leqslant k+2$. Then with probability at least 1 − 1/t²

$|(\widetilde{Sf})(\lambda) - (Sf)(\lambda)| \lesssim k\,\Psi_{k+2}(2E\eta)^{k+2}\|f\|_1^2 + \frac{t\,k}{\sqrt{M}}\left[k\eta\|f\|_1^2 + \sigma\|f\|_1 + \sigma^2\right].$ (5.3)

We remark that there are two components to the estimation error bounded by the right-hand side of (5.3): the first two terms are the error due to dilation, as in Corollary 4.1 of Proposition 4.2, and the last two terms are the error due to additive noise, as given in Proposition 3.2. Thus, the wavelet invariant representation allows for a decomposition of the error of the noisy dilation MRA model into the sum of the errors of the random dilation model and the additive noise model. This is possible because the representation inherits the differentiability of the wavelet and is not possible when $P\psi \notin C^k(\mathbb{R})$, in which case the dilation unbiasing procedure has a more complicated effect on the additive noise. A result equivalent to Proposition 5.1 cannot be stated for the power spectrum, because the nonlinear unbiasing procedure of Proposition 4.1 cannot be applied to the power spectra of signals from the noisy dilation MRA corruption model, since they are not differentiable in the presence of additive noise.

Proof of Proposition 5.1.

Since Sf is a translation invariant representation, we can ignore the translation factors $\{t_j\}_{j=1}^M$ and consider the model $y_j = f_{\tau_j} + \varepsilon_j$. For notational convenience, we define the following order k derivative 'unbiasing' operator:

$\mathcal{A}_\lambda g(\lambda) := g(\lambda) - B_2\eta^2\lambda^2\frac{d^2}{d\lambda^2}g(\lambda) - \cdots - B_k\eta^k\lambda^k\frac{d^k}{d\lambda^k}g(\lambda),$ (5.4)

which is defined on any function of λ, so that we can express our estimator by

$(\widetilde{Sf})(\lambda) = \frac{1}{M}\sum_{j=1}^M\left[\frac{1}{2\pi}\int |\hat{y}_j(\omega)|^2\, \mathcal{A}_\lambda|\hat\psi_\lambda(\omega)|^2\, d\omega\right] - \sigma^2 = \frac{1}{M}\sum_{j=1}^M\left[\frac{1}{2\pi}\int \left(|\hat{f}_{\tau_j}(\omega)|^2 + \hat{f}_{\tau_j}(\omega)\overline{\hat\varepsilon}_j(\omega) + \overline{\hat{f}}_{\tau_j}(\omega)\hat\varepsilon_j(\omega) + |\hat\varepsilon_j(\omega)|^2\right) \mathcal{A}_\lambda|\hat\psi_\lambda(\omega)|^2\, d\omega\right] - \sigma^2.$

We can thus decompose the error as follows:

$|(\widetilde{Sf})(\lambda) - (Sf)(\lambda)| \leqslant \underbrace{\left|\frac{1}{M}\sum_{j=1}^M\frac{1}{2\pi}\int\left(\hat{f}_{\tau_j}(\omega)\overline{\hat\varepsilon}_j(\omega) + \overline{\hat{f}}_{\tau_j}(\omega)\hat\varepsilon_j(\omega)\right)\mathcal{A}_\lambda|\hat\psi_\lambda(\omega)|^2\,d\omega\right|}_{\text{Cross Term Error}} + \underbrace{\left|\frac{1}{M}\sum_{j=1}^M\frac{1}{2\pi}\int|\hat{f}_{\tau_j}(\omega)|^2\,\mathcal{A}_\lambda|\hat\psi_\lambda(\omega)|^2\,d\omega - (Sf)(\lambda)\right|}_{\text{Dilation Error}} + \underbrace{\left|\frac{1}{M}\sum_{j=1}^M\frac{1}{2\pi}\int|\hat\varepsilon_j(\omega)|^2\,\mathcal{A}_\lambda|\hat\psi_\lambda(\omega)|^2\,d\omega - \sigma^2\right|}_{\text{Additive Noise Error}}.$

To bound the above terms we utilize the following two lemmas, which are proved in Appendix F.

Lemma 5.1

Let the notation and assumptions of Proposition 5.1 hold, and let Aλ be the operator defined in (5.4). Then with probability at least 1 − 1/t2

$\left|\frac{1}{M}\sum_{j=1}^M \frac{1}{2\pi}\int |\hat\varepsilon_j(\omega)|^2\, \mathcal{A}_\lambda|\hat\psi_\lambda(\omega)|^2\, d\omega - \sigma^2\right| \leqslant \frac{2\,t\,k\,\Psi\sigma^2}{\sqrt{M}}.$

Lemma 5.2

Let the notation and assumptions of Proposition 5.1 hold, and let Aλ be the operator defined in (5.4). Then with probability at least 1 − 1/t2

$\left|\frac{1}{M}\sum_{j=1}^M \frac{1}{2\pi}\int \left(\hat{f}_{\tau_j}(\omega)\overline{\hat\varepsilon}_j(\omega) + \overline{\hat{f}}_{\tau_j}(\omega)\hat\varepsilon_j(\omega)\right)\mathcal{A}_\lambda|\hat\psi_\lambda(\omega)|^2\, d\omega\right| \leqslant \frac{t}{\sqrt{M}}\sqrt{\Psi\left(\Lambda_0(\lambda) + \Lambda(\lambda)\right)}\,\sigma.$

Applying Proposition 4.2 to bound the dilation error, Lemma 5.1 to bound the additive noise error, and Lemma 5.2 to bound the cross term error gives (5.2). □

5.2. Simulation results for noisy dilation MRA

We once again consider the Gabor atoms of varying frequency introduced in Section 4.4, and compare the $L^2$ error of estimating the power spectrum (1) by averaging the power spectra of the noisy signals and applying additive noise unbiasing; this is the zero-order power spectrum method (PS k = 0), defined in Proposition 3.1; and (2) by approximating the wavelet invariants with the estimators given in Proposition 5.1 for k = 0, 2, 4, and then applying the optimization procedure described in Section 6.5; we refer to these methods as WSC k = i for i = 0, 2, 4. We emphasize that for the noisy dilation MRA model, it is impossible to define higher order methods for the power spectrum.

We first consider the errors obtained given oracle knowledge of the noise moments, both additive and dilation. Results are shown in Fig. 5 for all parameter combinations resulting from $\sigma = 2^{-4}, 2^{-3}$ (giving SNR = 2.2, 0.56) and η = 0.06, 0.12. The horizontal axis shows $\log_2(M)$ and the vertical axis shows $\log_2(\mathrm{Error})$; for each value of M, the error was calculated for 10 independent simulations and then averaged. For all simulations τ was given a uniform distribution, a challenging regime for dilations, and the sample size ranged over 16 ⩽ M ⩽ 131,072. For the medium- and high-frequency signals, for large enough M, WSC k = 2 and WSC k = 4 have significantly smaller error than the order zero estimators, indicating that the nonlinear unbiasing procedure of Proposition 5.1 contributes a definitive advantage. For the high-frequency signal and large M, the error using WSC k = 4 is decreased by a factor of about 3 from the PS k = 0 error. For small dilations (η = 0.06), there is not much of a difference in performance between WSC k = 2 and WSC k = 4, but the gap between these estimators widens for large dilations (η = 0.12), as the fourth-order correction becomes more important. For the low-frequency signal under small dilations, PS k = 0 achieves the smallest error for large M. However, when M is small or the dilations are large, the WSC estimators have the advantage for the low-frequency signal as well, and WSC k = 4 is once again the best estimator for large M.

Fig. 5.

$L^2$ error with standard error bars for noisy dilation MRA model (oracle moment estimation). First, second and third columns show results for low-, medium- and high-frequency Gabor signals. All plots have the same axis limits.

We note that although in general recovering the power spectrum is insufficient for recovering the signal, the signal can be recovered when $\hat{f}(\omega) \in \mathbb{R}$ and $\hat{f}(\omega) \geqslant 0$ by taking the inverse Fourier transform of the root power spectrum. Figure 6 shows the approximate signals recovered by this procedure from PS k = 0 (Fig. 6(c)) and WSC k = 4 (Fig. 6(b)) for the high-frequency Gabor signal $f_3(x)$ (Fig. 6(a)). The WSC-recovered signal is a much better approximation of the target signal. The recovered power spectra are shown in Fig. 6(d); PS k = 0 is much flatter than the target power spectrum, while WSC k = 4 is a good approximation of both the shape and height of the target power spectrum.

Fig. 6.

Signal recovery results for $f_3(x) = e^{-5x^2}\cos(32x)$ with M = 20,000, η = 0.12, SNR = 2.2.

Appendix G outlines an empirical procedure for estimating the moments of τ in the special case when t = 0 in the noisy dilation MRA model (i.e. no random translations). All simulations reported in Fig. 5 are repeated (with minor modifications) with empirical additive and dilation moment estimation, and the results are reported in Fig. G7 of Appendix G.

Appendix H contains additional simulation results for a variety of high-frequency signals.

Remark 5.1

One could also solve noisy dilation MRA with an EM algorithm. Appendix I describes how the method proposed in [1] can be extended to solve Model 2. Although EM algorithms provide a flexible tool for accurate parameter estimation in a variety of MRA models, the primary disadvantage is the high computational cost of each iteration. Each iteration costs $O(Mn^3)$, while wavelet invariant estimators can be computed in $O(Mn^2)$. In addition, the statistical priors chosen may bias the signal reconstruction [12], and the algorithm will generally only converge to a local maximum. In this article we thus explore whether it is possible to solve noisy dilation MRA more efficiently and accurately by nonlinear unbiasing procedures.

6. Numerical implementation

In this section we describe the numerical implementation of the proposed method used to generate the results reported in Sections 3, 4.4 and 5.2. Section 6.1 describes how signals were generated, and Sections 6.2 and 6.3 describe empirical procedures for estimating the additive noise level and the moments of the dilation distribution τ. Finally, Section 6.4 discusses how the derivatives used for unbiasing were computed, and Section 6.5 describes the convex optimization algorithm used to recover Pf from Sf. All simulations used a Morlet wavelet constructed with ξ = 3π/4.

6.1. Signal generation and SNR

All signals were defined on [−N/4, N/4] and then padded with zeros to obtain a signal defined on [−N/2, N/2]; the additive noise was also defined on [−N/2, N/2]. Signals were sampled at a rate of $1/2^\ell$, thus resolving frequencies in the interval $[-2^\ell\pi, 2^\ell\pi]$ with a frequency sampling rate of 2π/N. We used $N = 2^5$ and $\ell = 5$ in all experiments, keeping the box size and resolution fixed. For each experiment with hidden signal f, the SNR was calculated by $\mathrm{SNR} = \left(\frac{1}{N}\int_{-N/2}^{N/2} f(x)^2\,dx\right)/\sigma^2$.

6.2. Empirical estimation of additive noise level

The additive noise level σ² can be estimated from the mean vertical shift of the mean power spectrum $\frac{1}{M}\sum_{j=1}^M|\hat{y}_j(\omega)|^2$ in the tails of the distribution. Specifically, for $\Sigma = [-2^\ell\pi, 2^\ell\pi] \setminus [-2^{\ell-1}\pi, 2^{\ell-1}\pi]$, we define

$\tilde\sigma^2 = \frac{1}{|\Sigma|}\sum_{\omega\in\Sigma}\frac{1}{M}\sum_{j=1}^M|\hat{y}_j(\omega)|^2.$

If we choose ℓ large enough so that the target signal frequencies are essentially contained in the interval $[-2^{\ell-1}\pi, 2^{\ell-1}\pi]$, then $|\hat{y}_j(\omega)|^2 = |\hat\varepsilon_j(\omega)|^2$ for $\omega \in \Sigma$, and this is a robust and unbiased estimation procedure since $\mathbb{E}|\hat\varepsilon_j(\omega)|^2 = \sigma^2$ by Lemma D.1.
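A discrete analog of this tail-averaging procedure can be sketched as follows; the periodogram normalization $|Y_k|^2/N$ (for which $\mathbb{E}|Y_k|^2/N = \sigma^2$ under white noise) differs from the paper's continuous convention, and all names are illustrative.

```python
import cmath
import random

def estimate_sigma2(signals):
    """Estimate the additive-noise level from the mean periodogram over
    the upper half of the frequencies, where the clean signal is
    assumed to vanish; E|Y_k|^2 / N = sigma^2 for white noise."""
    N = len(signals[0])
    tail = range(N // 4, 3 * N // 4)   # highest frequencies in DFT order
    total = 0.0
    for y in signals:
        Y = [sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                 for n in range(N)) for k in tail]
        total += sum(abs(c) ** 2 / N for c in Y)
    return total / (len(signals) * len(tail))

random.seed(0)
sigma = 0.5
noise_only = [[random.gauss(0, sigma) for _ in range(64)] for _ in range(40)]
s2 = estimate_sigma2(noise_only)   # should be close to sigma^2 = 0.25
```

Averaging over both frequencies and independent signals keeps the relative error of the estimate small even at this modest sample size.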

6.3. Empirical moment estimation for dilation MRA

Given the additive noise level, the moments of the dilation distribution τ for dilation MRA (Model 3) can be empirically estimated from the mean and variance of the random variables αm(yj) defined by

$\alpha_m(y_j) = \int_0^{2^\ell\pi} \omega^m\, |\hat{y}_j(\omega)|^2\, d\omega$ (6.1)

for integer m ⩾ 0. More specifically, we define the order m squared coefficient of variation by

$\mathrm{CV}_m := \frac{\operatorname{Var}[\alpha_m(y_j)]}{\left|\mathbb{E}[\alpha_m(y_j)]\right|^2}.$ (6.2)

The following proposition guarantees that for M large the second and fourth moments of the dilation distribution can be recovered from $\mathrm{CV}_0$, $\mathrm{CV}_1$. In fact one could continue this procedure for higher m values, i.e. $\{\mathrm{CV}_m\}_{m=0}^{k/2-1}$ will define estimators of the first k/2 even moments of τ, accurate up to $O(\eta^{k+2})$, but for brevity we omit the general case.

Proposition 6.1

Assume Model 3 and CV0, CV1 defined by (6.1) and (6.2). Then

$\mathrm{CV}_0 = \eta^2 + (3C_4 - 3)\eta^4 + O(\eta^6)$
$\mathrm{CV}_1 = 4\eta^2 + (25C_4 - 33)\eta^4 + O(\eta^6).$
Proof.

Since $y_j = L_{\tau_j}f(x - t_j)$,

$\alpha_m(y_j) = \int_0^{2^\ell\pi}\omega^m|\hat{f}((1-\tau_j)\omega)|^2\,d\omega = \int_0^{(1-\tau_j)2^\ell\pi}\frac{\xi^m}{(1-\tau_j)^m}|\hat{f}(\xi)|^2\,\frac{d\xi}{1-\tau_j} = (1-\tau_j)^{-(m+1)}\alpha_m(f),$

where we assume we have chosen ℓ large enough so that the target signal frequencies are essentially supported in $[-2^{\ell-1}\pi, 2^{\ell-1}\pi]$. Thus,

$\mathrm{CV}_m = \frac{\mathbb{E}[\alpha_m(y_j)^2] - \left(\mathbb{E}[\alpha_m(y_j)]\right)^2}{\left(\mathbb{E}[\alpha_m(y_j)]\right)^2} = \frac{\mathbb{E}[(1-\tau_j)^{-2(m+1)}]}{\left(\mathbb{E}[(1-\tau_j)^{-(m+1)}]\right)^2} - 1.$

When m = 0, we have

$\begin{aligned}\mathrm{CV}_0 &= \frac{\mathbb{E}[(1-\tau_j)^{-2}]}{\left(\mathbb{E}[(1-\tau_j)^{-1}]\right)^2} - 1 = \frac{\mathbb{E}[1 + 2\tau + 3\tau^2 + 4\tau^3 + 5\tau^4 + O(\tau^5)]}{\left(\mathbb{E}[1 + \tau + \tau^2 + \tau^3 + \tau^4 + O(\tau^5)]\right)^2} - 1\\ &= \frac{1 + 3\eta^2 + 5C_4\eta^4 + O(\eta^6)}{\left(1 + \eta^2 + C_4\eta^4 + O(\eta^6)\right)^2} - 1 = \frac{1 + 3\eta^2 + 5C_4\eta^4 + O(\eta^6)}{1 + 2\eta^2 + (2C_4+1)\eta^4 + O(\eta^6)} - 1\\ &= \left(1 + 3\eta^2 + 5C_4\eta^4 + O(\eta^6)\right)\left(1 - 2\eta^2 + (3 - 2C_4)\eta^4 + O(\eta^6)\right) - 1\\ &= \eta^2 + (3C_4 - 3)\eta^4 + O(\eta^6).\end{aligned}$

When m = 1, we have

$\begin{aligned}\mathrm{CV}_1 &= \frac{\mathbb{E}[(1-\tau_j)^{-4}]}{\left(\mathbb{E}[(1-\tau_j)^{-2}]\right)^2} - 1 = \frac{\mathbb{E}[1 + 4\tau + 10\tau^2 + 20\tau^3 + 35\tau^4 + O(\tau^5)]}{\left(\mathbb{E}[1 + 2\tau + 3\tau^2 + 4\tau^3 + 5\tau^4 + O(\tau^5)]\right)^2} - 1\\ &= \frac{1 + 10\eta^2 + 35C_4\eta^4 + O(\eta^6)}{\left(1 + 3\eta^2 + 5C_4\eta^4 + O(\eta^6)\right)^2} - 1 = \frac{1 + 10\eta^2 + 35C_4\eta^4 + O(\eta^6)}{1 + 6\eta^2 + (9 + 10C_4)\eta^4 + O(\eta^6)} - 1\\ &= \left(1 + 10\eta^2 + 35C_4\eta^4 + O(\eta^6)\right)\left(1 - 6\eta^2 + (27 - 10C_4)\eta^4 + O(\eta^6)\right) - 1\\ &= 4\eta^2 + (25C_4 - 33)\eta^4 + O(\eta^6). \qquad\Box\end{aligned}$

We cannot compute $\mathrm{CV}_m$ exactly, but by replacing Var, $\mathbb{E}$ with their finite sample estimators, we obtain an approximation $\widetilde{\mathrm{CV}}_m$ satisfying $\widetilde{\mathrm{CV}}_m \to \mathrm{CV}_m$ as M → ∞. Motivated by Proposition 6.1, we thus use $\widetilde{\mathrm{CV}}_0$, $\widetilde{\mathrm{CV}}_1$ to define estimators of η² and $C_4\eta^4$.

Definition 6.1

Assume Model 3 and let $\widetilde{\mathrm{CV}}_0$, $\widetilde{\mathrm{CV}}_1$ be the empirical versions of (6.2). Define the second-order estimator of η² by $\tilde\eta^2 = \widetilde{\mathrm{CV}}_0$. Define the fourth-order estimators of $(\eta^2, C_4\eta^4)$ by the unique positive solution $(\tilde\eta^2, \tilde{C}_4)$ of

$\widetilde{\mathrm{CV}}_0 = \eta^2 + (3C_4 - 3)\eta^4$
$\widetilde{\mathrm{CV}}_1 = 4\eta^2 + (25C_4 - 33)\eta^4.$
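Solving this 2 × 2 system reduces to a quadratic: writing $a = \eta^2$ and $b = C_4\eta^4$, substituting b from the first equation into the second eliminates b. A sketch (the function name is illustrative):

```python
def dilation_moments(cv0, cv1):
    """Solve the fourth-order system of Definition 6.1 for
    (eta^2, C_4 eta^4).  With a = eta^2, b = C_4 eta^4:
        cv0 = a + 3 b - 3 a^2
        cv1 = 4 a + 25 b - 33 a^2
    Eliminating b gives 24 a^2 + 13 a + (3 cv1 - 25 cv0) = 0,
    whose unique positive root is the estimate of eta^2."""
    c = 3 * cv1 - 25 * cv0
    a = (-13 + (169 - 96 * c) ** 0.5) / 48
    b = (cv0 - a + 3 * a * a) / 3
    return a, b

# Round-trip check: moments -> (CV0, CV1) -> recovered moments.
eta2, C4 = 0.01, 1.8               # uniform dilation distribution has C_4 = 9/5
b_true = C4 * eta2 ** 2
cv0 = eta2 + 3 * b_true - 3 * eta2 ** 2
cv1 = 4 * eta2 + 25 * b_true - 33 * eta2 ** 2
a_rec, b_rec = dilation_moments(cv0, cv1)
```

Because the quadratic's two roots have a negative sum and this constant term, the positive root is unique for small η, matching the "unique positive solution" of Definition 6.1.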

For noisy dilation MRA (Model 2), estimating the dilation moments is more difficult. We give a procedure for estimating the moments in the special case t = 0 in Appendix G. Empirical moment estimation procedures that are simultaneously robust to translations, dilations and additive noise are an important area of future research.

6.4. Derivatives

All derivatives were approximated numerically using finite difference calculations. A sixth-order finite difference approximation was used for second derivatives, and a fourth-order finite difference approximation was used for fourth derivatives. This procedure was done on the empirical mean for each representation, not the individual signals. In fact, since the wavelet is known, $\frac{d^n}{d\lambda^n}|\hat\psi_\lambda(\omega)|^2$ could be computed analytically, and $(Sy_j)^{(n)}(\lambda)$ computed using Definition 2.2. Thus, error due to finite difference approximations could be avoided for wavelet invariant derivatives.
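For concreteness, a sketch of the sixth-order central stencil for second derivatives (these are the standard finite-difference weights, not code from the paper):

```python
import math

# Sixth-order central-difference weights for a second derivative,
# over offsets -3, ..., 3; the approximation is the weighted sum / h^2.
W2_ORDER6 = [1/90, -3/20, 3/2, -49/18, 3/2, -3/20, 1/90]

def second_derivative(g, x, h):
    return sum(w * g(x + (i - 3) * h) for i, w in enumerate(W2_ORDER6)) / h**2

# Sanity check on a known function: (sin)'' = -sin.
approx = second_derivative(math.sin, 0.5, 0.05)
exact = -math.sin(0.5)
```

The truncation error scales like $h^6$, which is why such a stencil is accurate enough to differentiate the smooth empirical mean of the representation.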

6.5. Optimization

In this section we describe the convex optimization algorithm for computing $\widetilde{P_Sf}$, the power spectrum approximation that best matches the wavelet invariants $\widetilde{Sf}$. Since the wavelet invariants are only computed for λ > 0, we also incorporate zero frequency information into the loss function via $(\widetilde{Pf})(0)$, an approximation of the power spectrum at frequency zero. For all of the examples reported in this article, the quasi-Newton algorithm was used to solve an unconstrained optimization problem minimizing the following convex loss function:

$\mathrm{loss}(\hat{g}) := \sum_\lambda\left(\left\langle \hat{g}^2, |\hat\psi^+_\lambda|^2\right\rangle - (\widetilde{Sf})(\lambda)\right)^2 + \left(\hat{g}(0)^2 - (\widetilde{Pf})(0)\right)^2,$

where

$|\hat\psi^+_\lambda(\omega)|^2 = \left(|\hat\psi_\lambda(\omega)|^2 + |\hat\psi_\lambda(-\omega)|^2\right)\mathbb{1}(\omega \geqslant 0).$

Letting $\hat{g}^*$ denote the minimizer of the above loss function, we then define $\widetilde{P_Sf} := \hat{g}^*(\omega)^2$. Theorem 2.4 ensures that when the loss function is defined with the exact wavelet invariants Sf, it has a unique minimizer corresponding to Pf. Whenever $f(x) \in \mathbb{R}$, the symmetry of $(Pf)(\omega)$ ensures that $(Sf)(\lambda) = \left\langle|\hat{f}|^2, |\hat\psi^+_\lambda|^2\right\rangle$, and thus it is sufficient to optimize over the non-negative frequencies and then symmetrically extend the solution. Such a procedure ensures the output of the optimization algorithm is symmetric while avoiding adding constraints to the optimization. The algorithm was initialized using the mean power spectrum with additive noise unbiasing only, i.e. PS k = 0. The optimization output does depend on various numerical tolerance parameters, which were held fixed for all examples.

Remark 6.1

Alternatively, one can invert the representation by applying a pseudo-inverse with Tikhonov regularization. Specifically, if F is the matrix defining the wavelet invariants, so that $Sy = F(Py)$, then one can define $\widetilde{P_Sf} = (F^TF + \lambda I)^{-1}F^T(\widetilde{Sf})$. This procedure however requires careful selection of the hyper-parameter λ and did not work as well as inverting via optimization in our experiments.
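A self-contained sketch of this regularized inversion in pure Python, using a toy well-conditioned F (the actual wavelet invariant matrix is larger and ill-conditioned, which is why λ must be tuned with care):

```python
def tikhonov_inverse(F, s, lam):
    """Compute (F^T F + lam I)^{-1} F^T s by forming the normal equations
    and solving with Gaussian elimination; F is a list of rows."""
    n = len(F[0])
    FtF = [[sum(F[r][i] * F[r][j] for r in range(len(F))) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Fts = [sum(F[r][i] * s[r] for r in range(len(F))) for i in range(n)]
    A = [row[:] + [b] for row, b in zip(FtF, Fts)]   # augmented system
    for c in range(n):                               # forward elimination
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= m * A[c][j]
    x = [0.0] * n                                    # back substitution
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][j] * x[j] for j in range(r + 1, n))) / A[r][r]
    return x

# Toy check: recover Py from Sy = F Py with tiny regularization.
F = [[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 0.0, 2.0], [1.0, 1.0, 1.0]]
Py = [1.0, 2.0, 3.0]
Sy = [sum(F[r][j] * Py[j] for j in range(3)) for r in range(4)]
rec = tikhonov_inverse(F, Sy, 1e-8)
```

For noiseless data a tiny λ recovers Py nearly exactly; with noisy invariants, larger λ trades bias for stability, which is the hyper-parameter selection issue noted above.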

7. Conclusion

This article considers a generalization of classic MRA which incorporates random dilations in addition to random translations and additive noise, and proposes solving the problem with a wavelet invariant representation. These wavelet invariants have several desirable properties relative to Fourier invariants, which allow for the construction of unbiasing procedures that cannot be constructed for Fourier invariants. Unbiasing the representation is critical for high-frequency signals, where even small diffeomorphisms cause a large perturbation. After unbiasing, the power spectrum of the target signal can be recovered from a convex optimization procedure.

Several directions remain for further investigation, including extending results to higher dimensions and considering rigid transformations instead of translations. Such extensions could be especially relevant to image processing, where variations in the size of an object can be modeled as dilations. Incorporating the effect of tomographic projection would also lead to results more directly relevant to problems such as cryo-EM. The tools of the present article, although significantly reducing the bias, do not allow for a completely unbiased estimator for noisy dilation MRA due to the bad scaling of certain intrinsic constants. Thus, an important open question is whether it is possible to define unbiased estimators for noisy dilation MRA using a different approach. The noisy dilation MRA model of this article corresponds to linear diffeomorphisms, and constructing unbiasing procedures that apply to more general diffeomorphisms is also an important future direction. In addition, one can construct wavelet invariants that characterize higher order auto-correlation functions such as the bispectrum, and future work will investigate full signal recovery with such invariants.

Acknowledgements

We would like to thank the reviewers for their detailed comments and insights that greatly improved the manuscript. We would also like to thank Stephanie Hickey for providing useful references on flexible regions of macromolecular structures.

Funding

Alfred P. Sloan Foundation (Sloan Fellowship FG-2016–6607 to M.H.); Defense Advanced Research Projects Agency (Young Faculty Award D16AP00117 to M.H.); National Science Foundation (grant 1912906 to A.L.; grant 1620216 and CAREER award 1845856 to M.H.).

A. Wavelet admissibility conditions

This appendix describes the wavelet admissibility conditions that are needed for the main results in this article, namely Propositions 4.2 and 5.1. The wavelet ψ is k-admissible if ψ^Ck() and Ψk < ∞, Θk < ∞ where

$\Psi_k := \frac{1}{2\pi}\sum_{i=0}^k \binom{k}{i}\frac{k!}{i!}\left\|\omega^i(P\psi)^{(i)}(\omega)\right\|_1,$ (A.1)
$\Theta_k := \frac{1}{2\pi}\sum_{i=0}^k \binom{k}{i}\frac{k!}{i!}\left\|\omega^{i-2}(P\psi)^{(i)}(\omega)\right\|_1.$ (A.2)

For ψ to be k-admissible, it is sufficient that $\hat\psi \in C^k(\mathbb{R})$, that $(P\psi)^{(i)}$ decays faster than $|\omega|^{-(i+1)}$, and that $\int\frac{|\hat\psi(\omega)|^2}{\omega^2}\,d\omega < \infty$ (see Lemma B.1 in Appendix B). The condition $\int\frac{|\hat\psi(\omega)|^2}{\omega^2}\,d\omega < \infty$ is slightly stronger than the classic admissibility condition $C_\psi := \int\frac{|\hat\psi(\omega)|^2}{|\omega|}\,d\omega < \infty$ [55, Theorem 4.4]. When $\hat\psi$ is continuously differentiable, $\hat\psi(0) = 0$ is sufficient to guarantee $C_\psi < \infty$; but here we need $\hat\psi(\omega) \sim \omega^{\frac{1}{2}+\epsilon}$ for some ϵ > 0 as ω → 0. If this condition is removed, we are not guaranteed $\Theta_k < \infty$, but all results in fact still hold, with $\Lambda_k(\lambda) = \Psi_k\|f\|_1^2$ replacing $\Lambda_k(\lambda) = \Psi_k\|f\|_1^2 \wedge \Theta_k\|f'\|_1^2/\lambda^2$ in Propositions 4.2 and 5.1. Any wavelet with fast decay satisfies this stronger admissibility condition, and it ensures that a smooth signal will enjoy a fast decay of wavelet invariants.

Remark A.1

The Morlet wavelet $\psi(x) = g(x)(e^{i\xi x} - C)$ is k-admissible for any k, since $\hat\psi \in C^\infty(\mathbb{R})$, has fast decay, and $\hat\psi(\omega) \sim \omega$ as ω → 0. One can also choose $\hat\psi$ to be an order-(k + 1) spline of compact support.
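The constant C in the Morlet wavelet is chosen so that the wavelet has zero mean. The sketch below assumes a unit-variance Gaussian envelope, for which $C = e^{-\xi^2/2}$, and uses the paper's choice ξ = 3π/4; the function name is illustrative.

```python
import math

def morlet(x, xi):
    """Morlet wavelet psi(x) = g(x) (e^{i xi x} - C) with a unit Gaussian
    envelope g(x) = exp(-x^2/2); C = exp(-xi^2/2) makes psi zero-mean,
    since int g(x) e^{i xi x} dx = sqrt(2 pi) exp(-xi^2/2)."""
    g = math.exp(-x * x / 2)
    C = math.exp(-xi * xi / 2)
    return complex(g * (math.cos(xi * x) - C), g * math.sin(xi * x))

# psi_hat(0) = int psi(x) dx should vanish (midpoint rule on [-12, 12]).
xi = 3 * math.pi / 4
n = 4800
h = 24.0 / n
total = sum(morlet(-12 + (i + 0.5) * h, xi) for i in range(n))
mean_zero = abs(total * h)
```

Zero mean is exactly the cancellation that makes $\hat\psi(\omega) \sim \omega$ as ω → 0, as required by the admissibility discussion above.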

B. Properties of wavelet invariants

This appendix establishes several important properties of wavelet invariants. Lemma B.1 gives sufficient conditions guaranteeing that a wavelet is k-admissible. Lemmas 4.3 and 4.4 bound wavelet invariant derivatives. Lemma B.2 bounds terms that arise in the dilation unbiasing procedure of Sections 4.2 and 5.

Lemma B.1 (k-admissible).

If $\hat\psi \in C^k(\mathbb{R})$, $(P\psi)^{(i)}$ decays faster than $|\omega|^{-(i+1)}$, and $\int\frac{|\hat\psi(\omega)|^2}{\omega^2}\,d\omega < \infty$, then ψ is k-admissible.

Proof.

We first note that $\hat\psi \in C^k(\mathbb{R})$ guarantees $P\psi \in C^k(\mathbb{R})$. Since $(P\psi)^{(i)}$ decays faster than $|\omega|^{-(i+1)}$ and $P\psi \in C^k(\mathbb{R})$, $\omega^i(P\psi)^{(i)}(\omega) \in L^1(\mathbb{R})$ for 0 ⩽ i ⩽ k, so $\Psi_k < \infty$. Also $P\psi \in C^k(\mathbb{R})$ and $\omega^i(P\psi)^{(i)} \in L^1(\mathbb{R})$ implies $\omega^{i-2}(P\psi)^{(i)} \in L^1(\mathbb{R})$ for 2 ⩽ i ⩽ k. In addition, $\omega^{-2}(P\psi)(\omega) \in L^1(\mathbb{R})$ by assumption. Thus, to conclude $\Theta_k < \infty$, it only remains to show $\omega^{-1}(P\psi)'(\omega) \in L^1(\mathbb{R})$. Since $(P\psi)'$ is continuous and decays faster than $|\omega|^{-2}$, only the integrability around the origin needs to be verified. We note that $\int\frac{|\hat\psi(\omega)|^2}{\omega^2}\,d\omega < \infty$ and $P\psi$ continuous implies $P\psi \sim \omega^{1+\epsilon}$ for some ϵ > 0 as ω → 0. Thus, $(P\psi)' \sim \omega^\epsilon$ as ω → 0, so that $\omega^{-1}(P\psi)' \sim \omega^{\epsilon-1}$; the function is thus integrable around the origin since ϵ − 1 > −1. □

Lemma 4.3 (Low frequency bound).

Assume $P\psi \in C^m(\mathbb{R})$ and $f \in L^1(\mathbb{R})$. Then the quantity $|\lambda^m (Sf)^{(m)}(\lambda)|$ can be bounded uniformly over all λ. Specifically,

$|\lambda^m (Sf)^{(m)}(\lambda)| \leqslant \Psi_m \|f\|_1^2$

for $\Psi_m$ defined in (A.1).

Proof.

Let g(ω)=(Pψ)(ω)=|ψ^(ω)|2, and let

gλ(ω):=1λg(ωλ)=|ψ^λ(ω)|2.

Utilizing Definition 2.2 we obtain

λm(Sf)(m)(λ)=12π|f^(ω)|2[λmdmdλmgλ(ω)]dω.

Expanding the derivative gives

λmdmdλmgλ(ω)=Cm,0gλ(ω)+Cm,1ωgλ(ω)+Cm,2ω2gλ(ω)+Cm,mωmgλ(m)(ω),Cm,i=(1)m(mi)m!i!.

Utilizing f^f1 and gλ(i)(ω)=1λi+1g(i)(ωλ), one obtains

|λm(Sf)(m)(λ)|i=0m|Cm,i|2π|f^(ω)|2|ωigλ(i)(ω)|dωf12i=0m|Cm,i|2π|ωigλ(i)(ω)|dω=f12i=0m|Cm,i|2π|ωig(i)(ω)|dω=f12i=0m|Cm,i|2πωig(i)(ω)1=Ψmf12.

Lemma 4.4 (High frequency bound for differentiable functions).

Assume PψCm(), and fL1(). Then the quantity |λm(Sf)(m)(λ)| can be bounded by

|λm(Sf)(m)(λ)|Θmλ2f12

for Θm defined in (A.2).

Proof.

Recall from the proof of Lemma 4.3 that

|λm(Sf)(m)(λ)|i=0m|Cm,i|2π|f^(ω)|2|ωigλ(i)(ω)|dω

where gλ(ω)=1λg(ωλ)=|ψ^λ(ω)|2 and Cm,i=(1)m(mi)m!i!. Since ωf^(ω)f1 and gλ(i)(ω)=1λi+1g(i)(ωλ), we obtain

|λm(Sf)(m)(λ)|i=0m|Cm,i|2π|ωf^(ω)|2|ωi2gλ(i)(ω)|dωf12i=0m|Cm,i|2π|ωi2gλ(i)(ω)|dω=f12λ2i=0m|Cm,i|2π|ωi2g(i)(ω)|dω=f12λ2i=0m|Cm,i|2πωi2g(i)(ω)1=Θmλ2f12.

Lemma B.2

Assume Pf ∈ C0(ℝ) and ψ is m-admissible, and let Bm, E, Ψm, Θm be as defined in (4.1), (4.3), (A.1) and (A.2). Then,

12π|f^(ω)|2|Bmηmλmdmdλm|ψ^λ(ω)|2dω(Eη)mΛm(λ),

where

Λm(λ) = ‖f‖12Ψm ∧ ‖f‖12Θmλ−2.

Proof.

From the proof of Lemma 4.3:

12π|f^(ω)|2|λmdmdλm|ψ^λ(ω)|2dωΨmf12.

From the proof of Lemma 4.4:

12π|f^(ω)|2|λmdmdλm|ψ^λ(ω)|2dωΘmf12λ2.

Utilizing |Bm| ⩽ Em gives

12π|f^(ω)|2|Bmηmλmdmdλm|ψ^λ(ω)|2dω(Eη)m(f12Ψmf12Θmλ2).

The following corollary is obtained from Lemma B.2 when f is a Dirac delta function.

Corollary B.1

Assume ψ is m-admissible, and let Bm, E, Ψm be as defined in (4.1), (4.3), (A.1). Then,

12π|Bmηmλmdmdλm|ψ^λ(ω)|2dω(Eη)mΨm.

C. Power spectrum and wavelet invariant equivalence

This appendix contains supporting results for demonstrating the equivalence of the power spectrum and wavelet invariants. Lemma 2.1 establishes that wavelet invariants uniquely determine any bandlimited L2 function, as long as the wavelet satisfies the linear independence Condition 2.3 and a mild integrability condition. Proposition 2.1 gives two criteria that are sufficient to guarantee Condition 2.3. Finally, Lemma C.1 establishes that the Morlet wavelet satisfies Condition 2.3.

Lemma 2.1

Let p ∈ L2(ℝ) be continuous, and assume p(ω) = p(−ω), that ψ^ has compact support, and that ψ satisfies Condition 2.3. If ∫ p(ω)|ψ^λ(ω)|2 dω = 0 for all λ > 0, then p = 0 almost everywhere.

Proof.

Since p is continuous, there exists an ϵ > 0 such that on (0, ϵ) one either has p = 0, p > 0, or p < 0. Claim: one must have p = 0. Suppose not, and without loss of generality assume p > 0 on (0, ϵ) and that the support of |ψ^+(ω)|2 is contained in the interval [1, 2]. Now choose λ0 small enough so that |ψ^λ0+(ω)|2 is supported on [ϵ/2, ϵ], i.e. λ0 = ϵ/2. Clearly, there must exist a subset M ⊂ [ϵ/2, ϵ] of positive measure such that |ψ^λ0+(ω)|2 > 0 on M. Then,

0=0p(ω)|ψ^λ0+(ω)|2dω=ϵ/2ϵp(ω)|ψ^λ0+(ω)|2dωMp(ω)|ψ^λ0+(ω)|2dω0.

We conclude

Mp(ω)|ψ^λ0+(ω)|2dω=0,

but this is impossible since the integrand is strictly positive on M. We thus conclude that p = 0 on (0, ϵ). It is thus sufficient to consider only frequencies in [ϵ, ∞).

Assume p(ω)|ψ^λ(ω)|2dω=0 for all λ. Since p(ω) = p(−ω),

p(ω)|ψ^λ(ω)|2dω=0p(ω)|ψ^λ+(ω)|2dω=ϵp(ω)|ψ^λ+(ω)|2dω=p,|ψ^λ+|2I=0λ,

where I = [ϵ, ∞). We now define |ϕ^λ+(ω)|2:=λβ|ψ^λ+(ω)|2 for some β > 0, and observe that

0p(ω)|ϕ^λ+(ω)|2dω=λ0|p,|ϕ^λ+|2+|2dλ=0|p,|ϕ^λ+|2I|2dλ=0.

Note

0|p,|ϕ^λ+|2I|2dλ=0p,|ϕ^λ+|2Ip¯,|ϕ^λ+|2Idλ=0(Ip(ω1)|ϕ^λ+(ω1)|2dω1)(Ip(ω2)¯|ϕ^λ+(ω2)|2dω2)dλ=Ip(ω2)¯(Ip(ω1)(0|ϕ^λ+(ω1)|2|ϕ^λ+(ω2)|2dλ)dω1)dω2.

We now apply the change of variable ωi = 1/ξi, and let g(ξi) = p(1/ξi). We obtain

0=01/ϵg(ξ2)¯(01/ϵg(ξ1)(01ξ12ξ22|ϕ^λ+(1ξ1)|2|ϕ^λ+(1ξ2)|2dλ)dξ1)dξ2. (C.1)

Now consider the kernel

k(ξ1,ξ2)=01ξ12ξ22|ϕ^λ+(1ξ1)|2|ϕ^λ+(1ξ2)|2dλ.

Note that k is a strictly positive definite kernel function if for any finite sequence {ξi}i=1n in [0, 1/ϵ], the n by n matrix A defined by

Aij=k(ξi,ξj)

is strictly positive definite [83]. Viewing ξ˜i(λ)=ξi2|ϕ^λ+(1/ξi)|2 as functions of λ, we see that

Aij=ξ˜i(λ),ξ˜j(λ)+

and A is thus a Gram matrix. Since the ξ˜i(λ) are linearly independent if and only if the |ψ^λ+(ωi)|2 are linearly independent, and the |ψ^λ+(ωi)|2 are linearly independent by assumption, we can conclude that A and thus k are strictly positive definite. Now consider the corresponding integral operator on [0, 1/ϵ]:

Kg(ξ2)=01/ϵg(ξ1)k(ξ1,ξ2)dξ1.

Since ψL1(), |ψ^λ+|2 and thus |ϕ^λ+|2 are continuous, and k will thus be continuous as long as it remains bounded. To check boundedness we observe that k(ξ1, ξ2)2k(ξ1, ξ1)k(ξ2, ξ2) [21], and

k(ξ,ξ)=01ξ4|ϕ^λ+(1ξ)|4dλ=01ξ41λ2+2β|ψ^+(1λξ)|4dλ=01ξ4(ωξ)2+2β|ψ^+(ω)|4dωξω2=ξ2β30ω2β|ψ^+(ω)|4dω3ξ2β30ω2β|ψ^(ω)|4dω3ξ2β3ωβPψ22.

Since ψ^ has compact support, clearly ‖ωβPψ‖22 < ∞, and k is thus bounded on the compact interval [0, 1/ϵ] as long as β ⩾ 3/2. Since k is continuous and [0, 1/ϵ] is compact, K : L2[0, 1/ϵ] → L2[0, 1/ϵ] is a compact, self-adjoint operator and by Mercer’s Theorem K is also strictly positive definite [83]. Since ⟨Kg, g⟩[0,1/ϵ] = 0 by (C.1), we conclude g = 0 in L2[0, 1/ϵ]. Thus, p(1/ξ) = 0 for almost every ξ ∈ (0, 1/ϵ], which implies p(ω) = 0 for almost every ω ∈ [ϵ, ∞). Since p(ω) = p(−ω) and p = 0 on (0, ϵ), p = 0 for almost every ω. □

Proposition 2.1

The following are sufficient to guarantee Condition 2.3:

  1. |ψ^(ω)|2 has compact support contained in an interval [a, b], where a and b have the same sign, e.g. complex analytic wavelets with compactly supported Fourier transform.

  2. |ψ^(ω)|2 ∈ C∞(ℝ) and there exists an N such that all derivatives of order at least N are non-zero at ω = 0, e.g. the Morlet wavelet.

Proof.

Let {ωi}i=1n be a finite sequence of distinct positive frequencies, and let ω˜i(λ)=1|λ||ψ^+(ωiλ)|2 denote the corresponding functions of λ.

First assume (i). Without loss of generality we assume that [a, b] is a positive interval and that |ψ^(ω)|2 > 0 on (a, a + ϵ) for some ϵ > 0. Clearly, |ψ^+(ω)|2 = |ψ^(ω)|2. A simple calculation shows that the support of ω˜i(λ) is contained in the interval [ωi/b, ωi/a], and ω˜i(λ) > 0 in a neighborhood of ωi/a. Assume we have ordered the ωi so that ω1 > . . . > ωn > 0. Now suppose

c1ω˜1(λ)++cnω˜n(λ)=0.

Note ω˜1(λ) is the only function in the above collection with support in a neighborhood of ω1/a; thus, we must have c1 = 0, so that

c2ω˜2(λ)++cnω˜n(λ)=0.

But now ω˜2(λ) is the only function in the above collection with support in a neighborhood of ω2/a, so we must have c2 = 0, and proceeding iteratively we conclude that c1 = . . . = cn = 0. Thus, {ω˜i(λ)}i=1n is a linearly independent set, and Condition 2.3 holds.

Now assume (ii). Since dndωn(|ψ^+(ω)|2)|ω=0=2dndωn(|ψ^(ω)|2)|ω=0, |ψ^+(ω)|2 is C∞(ℝ) and all derivatives of order at least N are non-zero at ω = 0. Note {ω˜i(λ)}i=1n={|λ|1|ψ^+(ωi/λ)|2}i=1n are linearly independent if and only if {|ψ^+(ωi/λ)|2}i=1n are linearly independent. Defining λ˜=1/λ, this holds if and only if {|ψ^+(ωiλ˜)|2}i=1n={g(ωiλ˜)}i=1n are linearly independent as functions of λ˜, where we define g(ω)=|ψ^+(ω)|2. Assume

c1g(ω1λ˜)+c2g(ω2λ˜)++cng(ωnλ˜)=0.

Differentiating m times for NmN + n − 1, we obtain

c1ω1Ng(N)(ω1λ˜)++cnωnNg(N)(ωnλ˜)=0c1ω1N+n1g(N+n1)(ω1λ˜)++cnωnN+n1g(N+n1)(ωnλ˜)=0.

The above holds for all λ˜. We now take the limit as λ˜0 to obtain

g(N)(0)(ω1Nc1+ω2Nc2+ωnNcn)=0g(N+1)(0)(ω1N+1c1+ω2N+1c2+ωnN+1cn)=0g(N+n1)(0)(ω1N+n1c1+ω2N+n1c2+ωnN+n1cn)=0.

Since g(m) (0) ≠ 0, we obtain

[ω1NωnNω1N+1ωnN+1ω1N+n1ωnN+n1][c1c2cn]=[000][11ω1ωnω1(n1)ωn(n1)]:=A[ω1N000ω2N000ωnN]:=B[c1c2cn]=[000].

Since A is a Vandermonde matrix constructed from distinct ωi, det(A) ≠ 0. Since the ωi are non-zero, det(B) ≠ 0. Thus, det(AB) = det(A) det(B) ≠ 0. We conclude AB is invertible and so all ci = 0, which gives Condition 2.3. □

Lemma C.1

Suppose we construct a Morlet wavelet with parameter ξ, that is ψ(x) = Cξ π−1/4 e−x²/2 (eiξx − e−ξ²/2) for Cξ = (1 + e−ξ² − 2e−3ξ²/4)−1/2. Then, for almost all ξ ∈ ℝ+, the wavelet satisfies Condition 2.3.

Proof.

The Fourier transform ψ^ has form

ψ^(ω) = C˜ξ e−ω²/2 (eξω − 1)

for some constant C˜ξ depending on ξ, so that

g(ω) := C˜ξ−2 |ψ^(ω)|2 = e−ω² (eξω − 1)².

From direct calculation or a computer algebra system, one obtains

g(n)(0) = Hn(ξ) − 2Hn(ξ/2) for n odd, and g(n)(0) = Hn(ξ) − 2Hn(ξ/2) + (−1)n/2 n!/(n/2)! for n even,

where Hn(ξ) is the nth degree physicist’s Hermite polynomial. We have g′(0) = 0, but for n > 1, g(n)(0) = 0 only when ξ is a root of the above polynomial. Since the set of roots of the polynomials {g(n)(0)}n=1∞ is countable, if ξ is selected at random from ℝ+, it is not a root of any of these polynomials with probability 1, and g(n)(0) ≠ 0 for all n > 1. Thus, the wavelet satisfies criterion (ii) of Proposition 2.1, and thus the linear independence Condition 2.3. □
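As a quick numerical check of Lemma C.1, one can verify by finite differences that g(ω) = e−ω²(eξω − 1)² satisfies g′(0) = 0 while g″(0) = 2ξ² ≠ 0 for ξ ≠ 0 (the n = 2 case); the choice ξ = 1 and the step size h are arbitrary.

```python
from math import exp

def g(w, xi):
    """Squared Fourier modulus (up to constant) of the Morlet wavelet."""
    return exp(-w * w) * (exp(xi * w) - 1.0) ** 2

xi, h = 1.0, 1e-4
# Central finite differences for the first two derivatives at 0
g1 = (g(h, xi) - g(-h, xi)) / (2 * h)                 # ~ g'(0) = 0
g2 = (g(h, xi) - 2 * g(0.0, xi) + g(-h, xi)) / h**2   # ~ g''(0) = 2*xi^2
```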

D. Supporting results: classic MRA

This appendix contains supporting results for Section 3. The first two lemmas (Lemmas D.1 and D.2) establish additive noise bounds for the power spectrum and are needed to prove Proposition 3.1. The next two lemmas (Lemmas D.3 and D.4) establish additive noise bounds for wavelet invariants and are needed to prove Proposition 3.2.

Lemma D.1

Let ε(x) be a white noise process on [−1/2, 1/2] with variance σ2. Then, for all frequencies ω, ξ,

E[|ε^(ω)|2]=σ2 (D.1)
E[|ε^(ω)|4]3σ4 (D.2)
E[|ε^(ω)|2|ε^(ξ)|2]3σ4. (D.3)

Proof.

By Proposition J.1,

E[|ε^(ω)|2]=E[ε^(ω)ε^(ω)^]=E[(1/21/2eiωxdBx)(1/21/2eiωxdBx)]=σ21/21/2dx=σ2,

which shows (D.1). By Proposition J.2,

E[|ε^(ω)|4]=E[ε^(ω)2(ε^¯(ω))2]=E[(1/21/2eiωxdBx)2(1/21/2eiωxdBx)2]=2σ4(1/21/2dx)2+σ4(1/21/2e2iωxdx)(1/21/2e2iωxdx)2σ4+σ4(1/21/2|e2iωx|dx)(1/21/2|e2iωx|dx)=3σ4,

which shows (D.2). Finally, by Proposition J.3, we have

E[|ε^(ω)|2|ε^(ξ)|2]=E[(1/21/2eiωxdBx)(1/21/2eiωxdBx)(1/21/2eiξxdBx)(1/21/2eiξxdBx)]=σ4[(1/21/2ei(ω+ξ)xdx)(1/21/2ei(ω+ξ)xdx)]+σ4[(1/21/2ei(ξω)xdx)(1/21/2ei(ωξ)xdx)+(1/21/2dx)(1/21/2dx)]σ4[3(1/21/2dx)(1/21/2dx)]=3σ4,

which gives (D.3). □
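The identities above can be spot-checked by Monte Carlo, discretizing the white noise process as independent Brownian increments on a grid over [−1/2, 1/2]; the grid size, sample count, noise level and test frequency below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, M, omega = 0.5, 256, 20000, 7.0
x = np.linspace(-0.5, 0.5, n, endpoint=False)
dx = 1.0 / n
# Independent Brownian increments dB_x ~ N(0, sigma^2 * dx) on the grid
dB = rng.normal(0.0, sigma * np.sqrt(dx), size=(M, n))
# eps_hat(omega) = int exp(-i*omega*x) dB_x, as a Riemann sum per sample
eps_hat = dB @ np.exp(-1j * omega * x)
m2 = np.mean(np.abs(eps_hat) ** 2)   # (D.1): equals sigma^2 in expectation
m4 = np.mean(np.abs(eps_hat) ** 4)   # (D.2): at most 3*sigma^4 in expectation
```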

Lemma D.2

Let ε(x) be a white noise process on [−1/2, 1/2] with variance σ2. Then, for any signal f ∈ L1(ℝ),

E[(P(f+ε))(ω)]=(Pf)(ω)+σ2
Var[(P(f+ε))(ω)]4σ2(Pf)(ω)+2σ4.

Proof.

Since E[ε^(ω)]=E[ε^(ω)¯]=0 and E[|ε^(ω)|2]=σ2 by Lemma D.1,

E[(P(f+ε))(ω)]=E[(f^(ω)+ε^(ω))(f^(ω)¯+ε^(ω)¯)]=E[|f^(ω)|2+f^(ω)ε^(ω)¯+ε^(ω)f^(ω)¯+|ε^(ω)|2]=(Pf)(ω)+σ2.

We now control Var[(P(f + ε))(ω)]. Note that:

[(P(f+ε))(ω)]2=(|f^(ω)|2+f^(ω)ε^(ω)¯+ε^(ω)f^(ω)¯+|ε^(ω)|2)2

and that

E[|ε^(ω)|2ε^(ω)]=E[(1/21/2eiωxdBx)(1/21/2eiωsdBs)(1/21/2eiωpdBp)]=0,

since even when x = s = p, E[(ΔBx)3]=0. Ignoring the terms with zero expectation, we thus get

E[(P(f+ε))(ω)2]=E(|f^(ω)|4+4|f^(ω)|2|ε^(ω)|2+|ε^(ω)|4+f^(ω)2ε^(ω)2^+ε^(ω)2f^(ω)^2)E(|f^(ω)|4+6|f^(ω)|2|ε^(ω)|2+|ε^(ω)|4)=[(Pf)(ω)]2+6σ2(Pf)(ω)+3σ4

where the last line follows from Lemma D.1. Thus,

Var[(P(f+ε))(ω)]=E[(P(f+ε))(ω)2](E[(P(f+ε))(ω)])2[(Pf)(ω)]2+6σ2(Pf)(ω)+3σ4((Pf)(ω)+σ2)2=4σ2(Pf)(ω)+2σ4.

Proposition 3.1

Assume Model 1. Define the following estimator of (Pf)(ω):

(Pf˜)(ω):=1Mj=1M(Pyj)(ω)σ2.

Then, with probability at least 1 − 1/t2,

|(Pf)(ω)(Pf˜)(ω)|2tσM(f1+σ). (3.1)

Proof.

Let ftj(x)=f(xtj) so that yj=ftj+εj. We first note since ftj^(ω)=eiωtjf^(ω), the power spectrum is translation invariant, that is (Pftj)(ω)=(Pf)(ω) for all ω, tj. Thus, by Lemma D.2,

E[(Pyj)(ω)]=E[(P(ftj+εj))(ω)]=(Pftj)(ω)+σ2=(Pf)(ω)+σ2

and

Var[(Pyj)(ω)]=Var[(P(ftj+εj))(ω)]4σ2(Pftj)(ω)+2σ4=4σ2(Pf)(ω)+2σ4.

Since the yj are independent,

Var(1Mj=1M(Pyj)(ω))1M(4σ2(Pf)(ω)+2σ4).

Applying Chebyshev’s inequality to the random variable X=1Mj=1M(Pyj)(ω), we obtain

(|1Mj=1M(Pyj)(ω)((Pf)(ω)+σ2)|t(2σ(Pf)(ω)+2σ2)M)1t2.

Observing that ((Pf)(ω))1/2 = |f^(ω)| ⩽ ‖f‖1 gives (3.1). □
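A discrete sketch of this estimator (an illustration, not the paper's implementation): cyclic shifts stand in for continuous translations, and for i.i.d. Gaussian noise samples the additive-noise bias of the length-n periodogram is nσ² per frequency, playing the role of σ² above. The signal and parameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, sigma = 128, 5000, 0.5
x = np.linspace(-0.5, 0.5, n, endpoint=False)
f = np.exp(-50 * x**2)                    # hypothetical target signal
Pf = np.abs(np.fft.fft(f)) ** 2           # its discrete power spectrum

# Observations y_j = f(x - t_j) + eps_j, with cyclic shifts as translations
shifts = rng.integers(0, n, size=M)
Y = np.stack([np.roll(f, s) for s in shifts]) + rng.normal(0.0, sigma, (M, n))

# Debias the averaged periodogram: the additive noise contributes
# n * sigma^2 per frequency in this discretization
Pf_est = np.mean(np.abs(np.fft.fft(Y, axis=1)) ** 2, axis=0) - n * sigma**2
err = np.max(np.abs(Pf_est - Pf)) / np.max(Pf)
```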

Lemma D.3

Let ε(x) be a white noise process on [−1/2, 1/2] with variance σ2. Then,

E[(Sε)(λ)]=σ2
E[(Sε)(λ)2]3σ4.

Proof.

Since E[|ε^(ω)|2]=σ2 by Lemma D.1, we have

E[(Sε)(λ)]=E[ε*ψλ22]=E[12πε^ψ^λ22]=E[12π|ε^(ω)|2|ψ^λ(ω)|2dω]=σ22π|ψ^λ(ω)|2dω=σ2ψλ22=σ2.

Since by Lemma D.1, E[|ε^(ω)|2|ε^(ξ)|2]3σ4, we also have

E[(Sε)(λ)2]=E[ε*ψλ24]=E[1(2π)2ε^ψ^λ22ε^ψ^λ22]=E[1(2π)2|ε^(ω)|2|ε^(ξ)|2|ψ^λ(ω)|2|ψ^λ(ξ)|2dωdξ]3σ4(2π)2|ψ^λ(ω)|2|ψ^λ(ξ)|2dωdξ=3σ4(ψλ22)2=3σ4.

Lemma D.4

Let ε(x) be a white noise process on [−1/2, 1/2] with variance σ2. Then, for any signal f ∈ L1(ℝ),

E[(S(f+ε))(λ)]=(Sf)(λ)+σ2
Var[(S(f+ε))(λ)]4σ2(Sf)(λ)+2σ4.

Proof.

Utilizing E[ε]=E[ε¯]=0 and Lemma D.3, we have

E[(S(f+ε))(λ)]=E[|(f+ε)*ψλ(u)|2du]=|f*ψλ(u)|2+E[|ε*ψλ(u)|2du]=(Sf)(λ)+E[(Sε)(λ)]=(Sf)(λ)+σ2.

To bound E[(S(f+ε))(λ)2], note that

[(S(f+ε))(λ)]2=(|f*ψλ(u1)|2+(ε*ψλ(u1))(f¯*ψλ(u1)¯)+(f*ψλ(u1))(ε¯*ψλ(u1)¯)+ε*ψλ(u1))|2du1)(|f*ψλ(u2)|2+(ε*ψλ(u2))(f¯*ψλ(u2)¯)+(f*ψλ(u2))(ε¯*ψλ(u2)¯)+ε*ψλ(u2))|2du2).

When we take expectation, any term involving one or three ε terms disappears, so that

E[(S(f+ε))(λ)2]=E[|f*ψλ(u1)|2|f*ψλ(u2)|2du1du2+|f*ψλ(u1)|2ε*ψλ(u2))|2du1du2+(ε*ψλ(u1))(f¯*ψλ(u1)¯)(ε*ψλ(u2))(f¯*ψλ(u2)¯)du1du2+(ε*ψλ(u1))(f¯*ψλ(u1)¯)(f*ψλ(u2))(ε¯*ψλ(u2)¯)du1du2+(f*ψλ(u1))(ε¯*ψλ(u1)¯)(ε*ψλ(u2))(f¯*ψλ(u2)¯)du1du2+(f*ψλ(u1))(ε¯*ψλ(u1)¯)(f*ψλ(u2))(ε¯*ψλ(u2)¯)du1du2+ε*ψλ(u1))|2|f*ψλ(u2)|2du1du2+ε*ψλ(u1))|2ε*ψλ(u2))|2du1du2]E[|f*ψλ(u1)|2|f*ψλ(u2)|2du1du2+6|f*ψλ(u1)|2ε*ψλ(u2))|2du1du2+ε*ψλ(u1))|2ε*ψλ(u2))|2du1du2]=E[[(Sf)(λ)]2+6(Sf)(λ)(Sε)(λ)+[(Sε)(λ)]2]=[(Sf)(λ)2]+6σ2(Sf)(λ)+3σ4,

where the last line follows from Lemma D.3. Thus,

Var[(S(f+ε))(λ)]=E[(S(f+ε))(λ)2](E[(S(f+ε))(λ)])2[(Sf)(λ)]2+6σ2(Sf)(λ)+3σ4[(Sf)(λ)+σ2]2=4σ2(Sf)(λ)+2σ4.

Proposition 3.2

Assume Model 1. Define the following estimator of (Sf)(λ):

(Sf˜)(λ):=1Mj=1M(Syj)(λ)σ2.

Then, with probability at least 1 − 1/t2,

|(Sf)(λ)(Sf˜)(λ)|2tσM(f1+σ). (D.4)

Proof.

Let ftj(x)=f(xtj) so that yj=ftj+εj. We first note that the wavelet invariants are translation invariant, that is Sftj=Sf for all tj. We now compute the mean and variance of the coefficients (Syj)(λ). By Lemma D.4,

E[(Syj)(λ)]=E[(S(ftj+εj))(λ)]=(Sftj)(λ)+σ2=(Sf)(λ)+σ2

and

Var[(Syj)(λ)]=Var[(S(ftj+εj))(λ)]4σ2(Sftj)(λ)+2σ4=4σ2(Sf)(λ)+2σ4.

Since the yj are independent,

Var[1Mj=1M(Syj)(λ)]1M[4σ2(Sf)(λ)+2σ4].

Applying Chebyshev’s inequality to the random variable X=1Mj=1M(Syj)(λ) gives

(|1Mj=1M(Syj)(λ)[(Sf)(λ)+σ2]|t(2σ(Sf)(λ)+2σ2)M)1t2.

By Young’s convolution inequality, (Sf)(λ)=f*ψλ22f12ψλ22=f12, which gives (D.4). □
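The analogous debiasing for wavelet invariants can be sketched the same way; the band-pass filter below is a hypothetical stand-in for ψλ, normalized to unit norm so that the discrete noise bias is nσ². This is an illustrative discretization, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, M, sigma = 128, 4000, 0.5
x = np.linspace(-0.5, 0.5, n, endpoint=False)
f = np.exp(-50 * x**2) * np.cos(20 * x)

# Hypothetical band-pass filter standing in for psi_lambda (unit l2 norm)
psi = np.exp(-x**2 / 2) * np.cos(20 * x)
psi /= np.linalg.norm(psi)
psi_hat = np.fft.fft(psi)

def S(y):
    """Discrete wavelet invariant ||y * psi||_2^2 (circular convolution)."""
    return np.sum(np.abs(np.fft.ifft(np.fft.fft(y) * psi_hat)) ** 2)

Sf = S(f)                                  # translation-invariant target
vals = [S(np.roll(f, s) + rng.normal(0.0, sigma, n))
        for s in rng.integers(0, n, size=M)]
# Debias: E S(eps) = n * sigma^2 * ||psi||_2^2 = n * sigma^2 here
Sf_est = np.mean(vals) - n * sigma**2
```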

E. Supporting results: dilation MRA

This appendix contains the technical details of the dilation unbiasing procedure that is central to Propositions 4.1, 4.2 and 5.1. Lemma 4.1 bounds the bias and variance of the estimator, and Lemma 4.2 bounds the error of the estimator given M independent samples.

Lemma 4.1

Let Fλ(τ) = L((1 − τ)λ) for some function L ∈ Ck+2(0, ∞) and a random variable τ satisfying the assumptions of Section 2.1, and let k ⩾ 2 be an even integer. Assume there exist functions Λi, R : (0, ∞) → ℝ such that

|λiL(i)(λ)|Λi(λ)for0ik+2,Λk+2((1τ)λ)Λk+2(λ)R(λ),

and define the following estimator of L(λ):

Gλ(τ) := Fλ(τ) − B2η2Fλ″(τ) − B4η4Fλ(4)(τ) − ⋯ − BkηkFλ(k)(τ).

Then Gλ(τ) satisfies

|EGλ(τ)L(λ)|kR(λ)Λk+2(λ)(2Eη)k+2
VarGλ(τ)k2R(λ)2Λ(λ)2

where

Λ(λ)2:=0i,jk+2,i+j2Λi(λ)Λj(λ)(2Eη)i+j

and E is the absolute constant defined in (4.3).

Proof.

We Taylor expand Fλ(τ) about τ = 0

Fλ(τ)=Fλ(0)+Fλ(0)τ+Fλ(0)2τ2++Fλ(k+1)(0)(k+1)!τk+1+0τFλ(k+2)(t)(k+1)!(τt)k+1dt:=R0(τ,λ).

We note

E[Fλ(τ)]=Fλ(0)+Fλ(0)2η2++Fλk(0)k!Ckηk+E[R0(τ,λ)],

which motivates an unbiasing with the first k/2 even derivatives, and thus a Taylor expansion of these derivatives

Fλ(τ)=Fλ(0)+Fλ(0)τ++Fλ(k+1)(0)(k+1)!τk+1+0τFλ(k+2)(t)(k+1)!(τt)k+1dt:=R0(τ,λ)
Fλ(τ)=Fλ(0)+Fλ(3)(0)τ++Fλ(k+1)(0)(k1)!τk1+0τFλ(k+2)(t)(k1)!(τt)k1dt:=R2(τ,λ)
Fλ(4)(τ)=Fλ(4)(0)+Fλ(5)(0)τ++Fλ(k+1)(0)(k3)!τk3+0τFλ(k+2)(t)(k3)!(τt)k3dt:=R4(τ,λ)
Fλ(k)(τ)=Fλ(k)(0)+Fλ(k+1)(0)τ+0τFλ(k+2)(t)(τt)dt:=Rk(τ,λ).

Multiplication of the ith even derivative by Biηi gives

Fλ(τ)=Fλ(0)+Fλ(0)τ++Fλ(k+1)(0)(k+1)!τk+1+R0(τ,λ)
B2η2Fλ(τ)=B2η2Fλ(0)+B2η2Fλ(3)(0)τ++B2η2Fλ(k+1)(0)(k1)!τk1+B2η2R2(τ,λ)
B4η4Fλ(4)(τ)=B4η4Fλ(4)(0)+B4η4Fλ(5)(0)τ++B4η4Fλ(k+1)(0)(k3)!τk3+B4η4R4(τ,λ)
BkηkFλ(k)(τ)=BkηkFλ(k)(0)+BkηkFλ(k+1)(0)τ+BkηkRk(τ,λ).

We want an estimator that targets Fλ(0) = L(λ). We thus consider the following variable as an estimator:

Gλ(τ) := Fλ(τ) − B2η2Fλ″(τ) − B4η4Fλ(4)(τ) − ⋯ − BkηkFλ(k)(τ)

and show that E[Gλ(τ)]=Fλ(0)+O(ηk+2) for constants Bi chosen according to (4.1). We have

E[Fλ(τ)]=Fλ(0)+Fλ(0)C22η2++Fλ(k)(0)Ckk!ηk+E[R0(τ,λ)]E[B2η2Fλ(τ)]=Fλ(0)B2η2+Fλ(4)(0)B2C22η4++Fλ(k)(0)B2Ck2(k2)!ηk+E[B2η2R2(τ,λ)]E[B4η4Fλ(4)(τ)]=Fλ(4)(0)B4η4+Fλ(6)(0)B4C22η6++Fλ(k)(0)B4Ck4(k4)!ηk+E[B4η4R4(τ,λ)]E[Bk2ηk2Fλ(k2)(τ)]=Fλ(k2)(0)Bk2ηk2+Fλ(k)(0)Bk2C22ηk+E[Bk2ηk2Rk2(τ,λ)]E[BkηkFλ(k)(τ)]=Fλ(k)(0)Bkηk+E[BkηkRk(τ,λ)].

That is,

E[Gλ(τ)]=Fλ(0)+Fλ(0)(C22!B2)η2+Fλ(4)(0)(C44!B2C22!B4)η4+Fλ(6)(0)(C66!B2C44!B4C22!B6)η6+Fλ(k)(0)(Ckk!B2Ck2(k2)!Bk2C22!Bk)ηk+H1(λ)

where

H1(λ)=E[R0(λ,τ)B2η2R2(τ,λ)BkηkRk(λ,τ)].

Since (4.1) guarantees that

B2=C22!B4=C44!(C22!)2B6=C66!C2C42!4!(C44!(C22!)2)C22!Bk=Ckk!B2Ck2(k2)!Bk2C22!,

the coefficients of η2, η4, . . . , ηk vanish, and we obtain

E[Gλ(τ)]=Fλ(0)+H1(λ).

First we bound the bias H1(λ). In the remainder of the proof we let B0 = −1 to simplify notation, so that

H1(λ)=i=0,2,,kBiRi(λ,τ)ηi.

We first obtain a bound for |BiRi(λ, τ)ηi|. Note

(k+1i)!ηiRi(λ,τ)=ηi0τFλ(k+2)(t)(τt)k+1idt=ηi0τλk+2L(k+2)((1t)λ)(τt)k+1idt.

We observe that

|((1t)λ)k+2L(k+2)((1t)λ)|Λk+2((1t)λ)|λk+2L(k+2)((1t)λ)|1(1t)k+2Λk+2((1t)λ)Λk+2(λ)Λk+2(λ)|λk+2L(k+2)((1t)λ)|R(λ)Λk+2(λ)(1t)k+2

so that

R(λ)Λk+2(λ)(1t)k+2λk+2L(k+2)((1t)λ)R(λ)Λk+2(λ)(1t)k+2.

Now assume first of all that τ is positive. We have

|(k+1i)!ηiRi(λ,τ)|ηiR(λ)Λk+2(λ)0τ(τt)k+1i(1t)k+2dtηiR(λ)Λk+2(λ)0ττk+1i(1t)k+2dt=ηiτk+1iR(λ)Λk+2(λ)1(k+1)(1(1τ)k+11)2k+2R(λ)k+1ηiτk+2iΛk+2(λ)

where the last line follows since 1(1τ)k+122k+1τ for τ[0,12]. A similar argument can be applied when τ is negative, and we can conclude

|BiηiRi(λ,τ)|2k+2R(λ)(k+1)(k+1i)!Λk+2(λ)|Bi|ηi|τ|k+2i, (E.1)

which gives

E|BiηiRi(λ,τ)|2k+2R(λ)(k+1)(k+1i)!Λk+2(λ)Tk+2i|Bi|ηk+2=2k+2(k+2i)R(λ)k+1Λk+2(λ)Tk+2i(k+2i)!|Bi|ηk+2.

We thus obtain

|E[Gλ(τ)]L(λ)|=|H1(λ)|R(λ)Λk+2(λ)k+1(2Eη)k+2i=0,2,,k(k+2i)R(λ)kΛk+2(λ)(2Eη)k+2,

which establishes the bound on the bias. We now bound the variance. We note

Gλ(τ)=i=0,2,,kj=0,1,,k+1iBij!Fλ(i+j)(0)ηiτj:=(I)+i=0,2,,kBiRi(λ,τ)ηi:=(II).

Thus,

Var[Gλ(τ)]=E[Gλ(τ)2]E[Gλ(τ)]2=E[(I)(I)]+2E[(I)(II)]+E[(II)(II)]Fλ(0)22Fλ(0)H1(λ)H1(λ)2(E[(I)(I)]Fλ(0)2):=(A)+(2E[(I)(II)]2Fλ(0)H1(λ)):=(B)+E[(II)(II)]:=(C)

and we proceed to bound each term.

(I)(I)Fλ(0)2=i=0,2,,k=0,2,,kj=0k+1is=0k+1BiBj!!Fλ(i+j)(0)Fλ(+s)(0)ηi+τj+s1E

where 1E is an indicator function indicating that i,j,,s are not all zero. We have

E|BiBj!!Fλ(i+j)(0)Fλ(+s)(0)ηi+τj+s||BiB|j!!Cj+sΛi+j(λ)Λ+s(λ)ηi++j+s|BiB|j!!TjTsΛi+j(λ)Λ+s(λ)ηi++j+sEi+jE+sΛi+j(λ)Λ+s(λ)ηi++j+s=(Λi+j(λ)(Eη)i+j)(Λ+s(λ)(Eη)+s).

Noting that only terms where j + s is even survive expectation, and letting i˜=i+j and ˜=+s, we obtain

E[(I)(I)]Fλ(0)2i=0,2,,k=0,2,,kj=0k+1is=0k+1Λi+j(λ)(4Tη)i+jΛ+s(λ)(4Tη)+s1E1(j+seven)=i˜=0k+1˜=0k+1Ci˜,˜Λi˜(λ)(Eη)i˜Λ˜(λ)(Eη)˜

for coefficients Ci˜˜ such that C0,0 = 0, Ci˜˜=0 if i˜+˜ is odd, and Ci˜˜˜k2. Thus,

E[(I)(I)]Fλ(0)2k22i˜+˜2k+2i˜+˜evenΛi˜(λ)Λ˜(λ)(Eη)i˜+˜k2Λ(λ)2.

Next we bound E[(II)(II)].

(II)(II)=i=0,2,,k=0,2,kBiBRi(λ,τ)R(λ,τ)ηi+.

Utilizing Equation (E.1), we have

|BiBRi(λ,τ)R(λ,τ)ηi+|22k+4R(λ)2|BiB|(k+1)2(k+1i)!(k+1)!Λk+2(λ)2ηi+|τ|2k+4i,

which gives

E|BiBRi(λ,τ)R(λ,τ)ηi+|22k+4R(λ)2T2k+4i|BiB|(k+1)2(k+1i)!(k+1)!Λk+2(λ)2η2k+4R(λ)2(k+2i)(k+2)(k+1)2(Tk+2i|Bi|(k+2i)!)(Tk+2|B|(k+2)!)Λk+2(λ)2(2η)2k+4R(λ)2(k+2i)(k+2)(k+1)2Λk+2(λ)2(2Eη)2k+4

so that

E[(II)(II)]R(λ)2(k+1)2Λk+2(λ)2(2Eη)2k+4i=0,2,,k=0,2,k(k+1i)(k+2)k2R(λ)2Λk+2(λ)2(2Eη)2k+4k2R(λ)2Λ(λ)2.

Finally we bound the cross term 2E[(I)(II)]2Fλ(0)H1(λ).

(I)(II)=i=0,2,,kj=0k+1i=0,2,,kBij!Fλ(i+j)(0)ηiτjBR(λ,τ)η (E.2)

Since |Fλ(i+j)(0)|Λi+j(λ) and |BR(λ,τ)η|2k+2R(λ)|B|(k+1)(k+1)!Λk+2(λ)ητk+2 from (E.1), we have

|Bij!Fλ(i+j)(0)ηiτjBR(λ,τ)η|2k+2R(λ)|BiB|(k+1)j!(k+1)!Λi+j(λ)Λk+2(λ)ηi+τk+2+j

so that

E|Bij!Fλ(i+j)(0)ηiτjBR(λ,τ)η|2k+2R(λ)Tk+2+j|BiB|(k+1)j!(k+1)!Λi+j(λ)Λk+2(λ)ηi+j+k+2=2k+2R(λ)(k+2)(k+1)(Tj|Bi|j!)(Tk+2|B|(k+2)!)Λi+j(λ)Λk+2(λ)ηi+j+k+2=R(λ)(k+2)(k+1)[(Eη)i+jΛi+j(λ)][(2Eη)k+2Λk+2(λ)].

The same bound holds for the terms of Fλ(0)H1(λ), which arise from i = 0, j = 0 in (E.2), so that

2E[(I)(II)]2Fλ(0)H1(λ)(i=0,2,,kj=0k+1i(Eη)i+jΛi+j(λ))(=0,2,,kR(λ)(k+2)(k+1)(2Eη)k+2Λk+2(λ))(ki˜=0k+1Λi˜(λ)(Eη)i˜)(kR(λ)(2Eη)k+2Λk+2(λ))k2R(λ)i˜=0k+1Λi˜(λ)Λk+2(λ)(2Eη)i˜+k+2k2R(λ)Λ(λ)2.

Thus, Var[Gλ(τ)]k2R(λ)2Λ(λ)2 and the lemma is proved. □
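The recursion (4.1) for the unbiasing coefficients used in the proof above can be sketched in a few lines; here C_i denotes the normalized moment E[τ^i]/η^i (so C2 = 1), and the values C4 = 3, C6 = 15 of a centered Gaussian τ are an illustrative assumption.

```python
from math import factorial

def unbias_coeffs(C, k):
    """Coefficients B_2, B_4, ..., B_k from the recursion (4.1):
    B_m = C_m/m! - sum over even j < m of B_j * C_{m-j}/(m-j)!."""
    B = {}
    for m in range(2, k + 1, 2):
        B[m] = C[m] / factorial(m) - sum(
            B[j] * C[m - j] / factorial(m - j) for j in range(2, m, 2))
    return B

# Normalized moments C_i = E[tau^i]/eta^i of a centered Gaussian tau
B = unbias_coeffs({2: 1.0, 4: 3.0, 6: 15.0}, 6)
```

For instance, B[4] reproduces C4/4! − (C2/2!)² = −1/8 in this Gaussian case.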

Lemma 4.2

Let the assumptions and notation of Lemma 4.1 hold, and let τ1, . . . , τM be independent. Define

L˜(λ):=1Mj=1MGλ(τj).

Then, with probability at least 1 − 1/t2,

|L˜(λ)L(λ)|kR(λ)(Λk+2(λ)(2Eη)k+2+tΛ(λ)M).

Proof.

By Lemma 4.1 and the independence of the τj, we have

|L(λ)EL˜(λ)|kR(λ)Λk+2(λ)(2Eη)k+2
VarL˜(λ)1Mk2Λ(λ)2

so by Chebyshev’s inequality we can conclude that with probability at least 1 − 1/t², we have

|L˜(λ)E[L˜(λ)]|tkR(λ)Λ(λ)M,

which gives

|L(λ)L˜(λ)||L(λ)E[L˜(λ)]|+|E[L˜(λ)]L˜(λ)|kR(λ)Λk+2(λ)(2Eη)k+2+tkR(λ)Λ(λ)M.
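To illustrate Lemmas 4.1 and 4.2 in the simplest case k = 2, take the test function L(λ) = e^(−λ) (an arbitrary choice) and Gaussian dilations, so B2 = C2/2! = 1/2; the second-order correction reduces the O(η²) bias of the naive sample mean to O(η⁴).

```python
import numpy as np

rng = np.random.default_rng(3)
eta, lam, M = 0.1, 1.0, 1_000_000
tau = rng.normal(0.0, eta, size=M)    # dilation factors, E[tau^2] = eta^2

L = lambda s: np.exp(-s)              # test function; target is L(lam)
F = L((1 - tau) * lam)                # F_lam(tau_j)
F2 = lam**2 * L((1 - tau) * lam)      # F_lam''(tau) = lam^2 L''((1-tau)lam); L'' = L here
B2 = 0.5                              # C_2/2! with C_2 = 1 for Gaussian tau
G = F - B2 * eta**2 * F2              # estimator G_lam(tau_j) of Lemma 4.1

err_raw = abs(F.mean() - L(lam))      # biased by O(eta^2)
err_unb = abs(G.mean() - L(lam))      # bias reduced to O(eta^4)
```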

F. Supporting results: noisy dilation MRA

This appendix contains supporting results needed to prove Proposition 5.1, which defines a wavelet invariant estimator for noisy dilation MRA. Lemma 5.1 controls the additive noise error and Lemma 5.2 controls the cross-term error. Lemma F.1 guarantees that the dilation unbiasing procedure applied to the additive noise still has mean σ2, which is needed to prove Lemma 5.1.

Lemma 5.1

Let the notation and assumptions of Proposition 5.1 hold, and let Aλ be the operator defined in (5.4). Then, with probability at least 1 − 1/t2,

|1Mj=1M12π|ε^j(ω)|2Aλ|ψ^λ(ω)|2dωσ22tkΨσ2M.

Proof.

Let

D(εj,λ):=12π|ϵj^(ω)|2Aλ|ψ^λ(ω)|2dω.

By Lemma D.1, Eε[|ε^j(ω)|2]=σ2, and we thus obtain

Eε[D(εj,λ)]=Eε[12π|εj^(ω)|2Aλ|ψ^λ(ω)|2dω]=Eε[12π|ε^j(ω)|2|ψ^λ(ω)|2dω12π|ε^j(ω)|2B2η2λ2ddλ2|ψ^λ(ω)|2dω12π|εj^(ω)|2Bkηkλkddλk|ψ^λ(ω)|2dω]=σ2(12π|ψ^λ(ω)|2dωB2η22πλ2ddλ2|ψ^λ(ω)|2dωBkηk2πλkddλk|ψ^λ(ω)|2dω)=σ2(100)=σ2,

where we have used Lemma F.1 to conclude ∫λm(dmdλm|ψ^λ(ω)|2)dω=0 for m = 2, . . . , k. Also, since (a1 + ⋯ + an)² ⩽ n(a1² + ⋯ + an²) by the Cauchy–Schwarz inequality, we obtain

Eε[D(εj,λ)2]Eε[km=0,2,..,k(Bmηm2π|ε^j(ω)|2λmdmdλm|ψ^λ(ω)|2dω)2]

where we let ddλ0|ψ^λ(ω)|2 denote |ψ^λ(ω)|2 and B0 = 1. By Lemma D.1, we have Eε[|εj(ω)|2|εj(ξ)|2]3σ4 for all frequencies ω, ξ, so that

Eε[(Bmηm2π|εj^(ω)|2λmdmdλm|ψ^λ(ω)|2dω)2]Eε[Bm2η2m4π2|ε^j(ω)|2|εj^(ξ)|2|λmdmdλm|ψ^λ(ω)|2||λmdmdλm|ψ^λ(ξ)|2dωdξ]3σ4(12π|Bmηmλmdmdλm|ψ^λ(ω)|2dω)23σ4Ψm2(Eη)2m,

where the last line follows from Corollary B.1 in Appendix B. We thus obtain

Eε[D(εj,λ)2]km=0,2,..,kEε[(Bmηm2π|ε^j(ω)|2λmdmdλm|ψ^λ(ω)|2dω)2]3kσ4m=0,2,..,kΨm2(Eη)2m:=(I)

so that

Eε[D(εj,λ)σ2]=0
Varε[D(εj,λ)σ2]=Varε[D(εj,λ)]Eε[(D(εj,λ))2](I).

Thus,

Varε(1Mj=1MD(εj,λ)σ2)(I)M

so that by Chebyshev’s inequality with probability at least 1 − 1/t2

|1Mj=1MD(εj,λ)σ2|t(I)Mt3k(m=0,2,,kΨm(Eη)m)σ2M=2tkΨσ2M.

Lemma 5.2

Let the notation and assumptions of Proposition 5.1 hold, and let Aλ be the operator defined in (5.4). Then, with probability at least 1 − 1/t2,

|1Mj=1M12π(f^τj(ω)ε^j(ω)+f^τj(ω)ε^j(ω))Aλ|ψ^λ(ω)|2dωtMΨ(Λ0(λ)+Λ(λ))σ.

Proof.

We have

1Mj=1M12π(f^τj(ω)ε^^j(ω)+f^τj(ω)ε^j(ω))Aλ|ψ^λ(ω)|2dω=1Mj=1MYj+Y¯j

where

Yj:=12π(fτj^¯(ω)ε^j(ω))Aλ|ψ^λ(ω)|2dω.

The random variable Yj has randomness depending on both εj and τj. Note that

Eε,τ[Yj]=Eε,τ[Eε,τ[Yjτj]]

since Yj is integrable. Thus, since Eε,τ[ε^j(ω)]=0, we obtain Eε,τ[Yjτj]=0, which yields Eε,τ[Yj]=0. We also have

Varε,τ[Yj]=Eε,τ[Yj2]Eε,τ[(12π|f^τj¯(ω)||ε^j(ω)||Aλ|ψ^λ(ω)|2dω)2]Eε,τ[(12π|f^τj¯(ω)|2|Aλ|ψ^λ(ω)|2dω)(12π|ε^j(ω)|2|Aλ|ψ^λ(ω)|2dω)]=Eτ[12π|f^τj¯(ω)|2|Aλ|ψ^λ(ω)|2dω]Eε[12π|ε^j(ω)|2|Aλ|ψ^λ(ω)|2dω].

Letting B0 = 1 and applying Lemma B.2, we have

Eτ[12π|f^τj¯(ω)|2|Aλ|ψ^λ(ω)|2dω]Eτ[m=0,2,,k12π|f^τj(ω)|2|Bmηmλmdmdλm|ψ^λ(ω)|2|dω]Eτ[m=0,2,,k(Eη)m(fτj12Ψmfτj12Θmλ2)]m=0,2,,k(Eη)m(f12Ψm4f12Θmλ2)4m=0,2,,k(Eη)mΛm(λ)Λ0(λ)+Λ(λ)

since τj12 guarantees fτj1=11τjf12f1. Also,

Eε[12π|ε^j(ω)|2|Aλ|ψ^λ(ω)|2|dω]Eε[m=0,2,,k12π|ε^j(ω)|2|Bmηmλmdmdλm|ψ^λ(ω)|2|dω]=σ2(m=0,2,,k12π|Bmηmλmdmdλm|ψ^λ(ω)|2|dω)σ2m=0,2,,k(Eη)mΨm=σ2Ψ

where the second line follows from Lemma D.1 in Appendix D and the next to last line from Corollary B.1 in Appendix B. We thus have

Eε,τ[Yj]=0
Varε,τ[Yj]σ2Ψ(Λ0(λ)+Λ(λ))

and an identical argument can be applied to the Yj¯ so that by Chebyshev’s inequality with probability at least 1 − 1/t2

|1Mj=1MYj+Y¯j||1Mj=1MYj|+|1Mj=1MYj¯|tΨΛ0(λ)+Λ(λ)σM.

Lemma F.1

Assume ψ is k-admissible. Then,

λm(dmdλm|ψ^λ(ω)|2)dω=0 (F.1)

for all 1 ⩽ mk.

Proof.

We recall that since ψ is k-admissible, |ψ^λ(ω)|2Ck(), and to simplify notation we let g=|ψ^|2 and

gλ(ω)=1λg(ωλ)=|ψ^λ(ω)|2.

We first establish that

λk(ddλkgλ(ω))=ddω(ωλk1ddλk1gλ(ω))(k1)λk1ddλk1gλ(ω). (F.2)

The proof is by induction. When k = 1, we obtain

LHSofEqn.(F.2)=λddλ(1λg(ωλ))=ωλ2g(ωλ)1λg(ωλ)=ωgλ(ω)gλ(ω)

and

RHSofEqn.(F.2)=ddω(ωgλ(ω))=ωgλ(ω)gλ(ω),

so the base case is established. We now assume that Equation (F.2) holds and show it also holds for k+1 replacing k. By the inductive hypothesis

ddλkgλ(ω)=ddω(ωλ1ddλk1gλ(ω))(k1)λ1ddλk1gλ(ω)ddλk+1gλ(ω)=ddω(ωλ1ddλkgλ(ω)+ddλk1gλ(ω)ωλ2)(k1)(λ1ddλkgλ(ω)+ddλk1gλ(ω)(λ2))=ddω(ωλ1ddλkgλ(ω))(k1)λ1ddλkgλ(ω)+ddω(ωλ2ddλk1gλ(ω))+(k1)λ2ddλk1gλ(ω)=λ1ddλkgλ(ω)byinductivehypothesis=ddω(ωλ1ddλkgλ(ω))kλ1ddλkgλ(ω)

so that

λk+1ddλk+1gλ(ω)=ddω(ωλkddλkgλ(ω))kλkddλkgλ(ω).

Thus, (F.2) is established. We now use integration by parts to show (F.2) implies (F.1) in the Lemma. The proof of (F.1) is once again by induction. When k = 1, we have already shown

λ(ddλgλ(ω))=ωgλ(ω)gλ(ω). (F.3)

Integration by parts gives

ωgλ(ω)dω=(ωgλ(ω))|gλ(ω)dω=gλ(ω)dω.

Note ωgλ(ω) vanishes at ±∞ since g ∈ L1(ℝ) guarantees gλ ∈ L1(ℝ), and thus gλ must decay faster than ω−1. Utilizing (F.3),

ωgλ(ω)gλ(ω)dω=0λ(ddλgλ(ω))dω=0

and the base case is established. We now assume

λk1(ddλk1gλ(ω))dω=0.

By integrating Equation (F.2), we obtain

λk(ddλkgλ(ω))dω=ddω(ωλk1ddλk1gλ(ω))dω(k1)λk1ddλk1gλ(ω)dω=0byinduc.hypo.=ωddω(λk1ddλk1gλ(ω))dωλk1ddλk1gλ(ω)dω=0byinduc.hypo.=ωλk1ddλk1gλ(ω)|+λk1ddλk1gλ(ω)dω=0byinduc.hypo.=0.

We are guaranteed ωλk1ddλk1gλ(ω) vanishes at ±∞ since in the proof of Lemma 4.3 we showed λk1ddλk1gλ(ω)=j=0k1Cjωjgλ(j)(ω), and ωjgλ(j)L1() implies ωj+1gλ(j) vanishes at ±∞. □

G. Moment estimation for noisy dilation MRA

In this appendix we outline a moment estimation procedure for noisy dilation MRA (Model 2) in the special case t = 0, i.e. signals are randomly dilated and subjected to additive noise but are not translated. This procedure is a generalization of the method presented in Section 6.3.

Given the additive noise level, the moments of the dilation distribution τ can be empirically estimated from the mean and variance of the random variables βm(yj) defined by

βm(yj) = ∫0^(2πℓ) ωm ŷj(ω) dω (G.1)

for integer m ⩾ 0. To account for the effect of additive noise on the above random variables, we define

gm(ℓ, σ) = ∫0^(2πℓ) ∫0^(2πℓ) 2σ2 ξm ωm sin((1/2)(ξ − ω))/(ξ − ω) dω dξ (G.2)

and an order m additive noise adjusted squared coefficient of variation by

CVm := (Var[βm(yj)] − gm(ℓ, σ)) / |E[βm(yj)]|2. (G.3)

Remark G.1

If the noisy signals are supported in [−N/2, N/2] instead of [−1/2, 1/2], (G.2) is replaced with

gm(N, ℓ, σ) = ∫0^(2πℓ) ∫0^(2πℓ) 2σ2 ξm ωm sin((N/2)(ξ − ω))/(ξ − ω) dω dξ.

The following proposition mirrors Proposition 6.1 for dilation MRA; its proof appears at the end of Appendix G.

Proposition G.1

Assume Model 2 with t = 0 and CV0, CV1 defined by (G.1), (G.2) and (G.3). Then,

CV0=η2+(3C43)η4+O(η6)
CV1=4η2+(25C433)η4+O(η6).

Once again we cannot compute CVm exactly, but by replacing Var, E with their finite sample estimators, we obtain approximations CV˜m that can be used to define estimators of the dilation moments.

Definition G.1

Assume Model 3 with t = 0 and CV˜0, CV˜1 the empirical counterparts of (G.3). Define the second-order estimator of η2 by η˜2=CV˜0. Define the fourth-order estimators of (η2, C4η4) by the unique positive solution (η˜2, C˜4) of

CV˜0=η2+(3C43)η4
CV˜1=4η2+(25C433)η4.

As M → ∞, the second-order moment estimator is accurate up to O(η4) and the fourth-order moment estimators are accurate up to O(η6). However, in the finite sample regime, the gm(ℓ, σ) appearing in (G.3) will be replaced with gm(ℓ, σ) ± O(σ2/√M), so that the estimators given in Definition G.1 are subject to an error of order O(σ2/√M). More generally, the additive noise fluctuations imply that estimating the first k/2 even moments of τ up to an O(ηk+1) error will require σ2/√M ≲ ηk+1, or M ≳ σ4η−2(k+1).
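One way to solve the quartic system of Definition G.1 in closed form: writing a = η² and b = C4η⁴, the system reads CV˜0 = a + 3b − 3a² and CV˜1 = 4a + (25/3)b − 3a², and eliminating b leaves the quadratic (16/3)a² + (11/9)a = CV˜1 − (25/9)CV˜0, whose unique positive root gives η˜². A minimal sketch with synthetic CV values:

```python
from math import sqrt

def dilation_moments(cv0, cv1):
    """Fourth-order estimators (eta^2, C4*eta^4) of Definition G.1.

    With a = eta^2 and b = C4*eta^4 the system is
        cv0 = a + 3b - 3a^2,   cv1 = 4a + (25/3)b - 3a^2,
    and eliminating b gives (16/3)a^2 + (11/9)a = cv1 - (25/9)cv0."""
    d = cv1 - (25.0 / 9.0) * cv0
    A, B = 16.0 / 3.0, 11.0 / 9.0
    a = (-B + sqrt(B * B + 4.0 * A * d)) / (2.0 * A)   # unique positive root
    b = (cv0 - a + 3.0 * a * a) / 3.0
    return a, b

# Round trip with eta^2 = 0.01 and C4 = 3 (so C4*eta^4 = 3e-4)
cv0 = 0.01 + 3 * 3e-4 - 3 * 0.01**2
cv1 = 4 * 0.01 + (25.0 / 3.0) * 3e-4 - 3 * 0.01**2
a, b = dilation_moments(cv0, cv1)
```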

Having established an empirical moment estimation procedure for noisy dilation MRA when t = 0, we repeat the simulations of Section 5.2 on the restricted model, but estimate the additive and dilation moments empirically. Since accurately estimating the moments of τ is difficult for large σ, we make three modifications to the oracle set-up. First, we lower the additive noise level by a factor of 2 from the oracle simulations, and consider all parameter combinations resulting from σ = 2^−5, 2^−4 (giving SNR = 9.0, 2.2) and η = 0.06, 0.12. Secondly, we take M substantially larger than for the oracle simulations, with 16,384 ⩽ M ⩽ 370,727. Thirdly, we compute WSC k = 4 only for large dilations. For large dilations (η2, C4η4) are approximated with fourth-order estimators, while for small dilations η2 is approximated with a second-order estimator (see Definition G.1).

Results are shown in Fig. G7, and the same overall behavior observed in the oracle simulations for large M holds. The additive noise level was estimated empirically as described in Section 6.2. For the medium- and high-frequency signals, WSC k = 2 has substantially smaller error than both PS k = 0 and WSC k = 0; for the high-frequency signal, the error is decreased by at least a factor of 2 for large dilations and a factor of 4 for small dilations relative to both zero order estimators. When WSC k = 4 is defined, it has a smaller error than WSC k = 2 for the high-frequency signal, while WSC k = 2 is preferable for the low- and medium-frequency signals. We observe that for the oracle simulations WSC k = 4 is preferable at all frequencies, so this is most likely due to error in the moment estimation degrading the WSC k = 4 estimator. For the low-frequency signal, PS k = 0 once again achieves the smallest error for small dilations, while for large dilations the higher order wavelet methods appear to surpass PS k = 0 for M large enough.

Fig. G7.

L2 error with standard error bars for noisy dilation MRA model (t = 0, empirical moment estimation). First, second and third columns show results for low-, medium- and high-frequency signals. All plots have the same axis limits.

Proof of Proposition G.1.

Since yj=Lτjf+εj, we have

E[βm(yj)]=E[02πωm(f^τj(ω)+ε^j(ω))dω]=E[02πωmf^τj(ω)dω]=E[02πωmf^((1τj)ω)dω]=E[022π(1τj)ξm(1τj)mf^(ξ)dξ(1τj)]=βm(f)E[(1τj)(m+1)].

We now compute the variance. We first establish that

gm(,σ)=E[(02πωmε^j(ω)dω)(02πωmε^j(ω)¯dω)].

By Theorem 4.5 of [49],

E[ε^j(ω)ε^j(ξ)¯]=E[(1/21/2eiωtdBt)(1/21/2eiξtdBt)]=σ21/21/2ei(ξω)tdt=2σ2sin(12(ξω))(ξω)

so that

E[(02πε^j(ω)dω)(02πε^j(ω)¯dω)]=02π02πωmξmE[ε^j(ω)ε^j(ξ)¯]dωdξ=02π02πωmξm2σ2sin(12(ξω))(ξω)dωdξ=gm(,σ).

We thus obtain

[|βm(yj)|2]=E[(02πωm(f^τj(ω)+ε^j(ω))dω)(02πωm(f^τj(ω)¯+ε^j(ω)¯)dω)]=E[(02πωmf^((1τj)ω)dω)(02πωmf^((1τj)ω)¯dω)+(02πωmε^j(ω)dω)(02πωmε^j(ω)¯dω)]=E[(1τj)2(m+1)βm(f)βm(f)¯]+gm(,σ)=|βm(f)|2E[(1τj)2(m+1)]+gm(,σ).

Thus,

Var[βm(yj)]gm(,σ)=E[|βm(yj)|2]gm(,σ)|E[βm(yj)]|2=|βm(f)|2E[(1τj)2(m+1)]|βm(f)|2(E[(1τj)(m+1)])2.

Dividing by |E[βm(yj)]|2 gives

CVm=E[(1τj)2(m+1)](E[(1τj)(m+1)])21,

and the remainder of the proof is identical to the proof of Proposition 6.1. □

H. Additional simulations for noisy dilation MRA

We investigate the L2 error of estimating the power spectrum using PS (k = 0) and WSC (k = 0, 2, 4) for three additional high-frequency functions:

f4(x) = 1.175 cos(32x) 1(x ∈ [−0.2, 0.2])
f5(x) = 0.299 e^(−0.04x²) cos(30x + 1.5x²)
f6(x) = (2.304/π) cos(35x) sinc(3x).

The multiplicative constants were chosen so that the L2 norms of f4, f5, f6 are comparable with the L2 norms of the Gabor signals f1, f2, f3 defined in Section 4.4. The signal f4 is not continuous and has compact support, with a slowly decaying, oscillating Fourier transform given by f^4(ω)/0.47 = sinc(0.2(ω − 32)) + sinc(0.2(ω + 32)). The signal f5 is a linear chirp whose instantaneous frequency varies linearly. The signal f6 is slowly decaying in space, with a discontinuous Fourier transform of compact support given by f^6(ω)/0.384 = 1(ω ∈ [−38, −32]) + 1(ω ∈ [32, 38]).

Implementation details were as described in Section 6, and simulations were run with oracle moment estimation on the full model (parameter values as described in Section 5.2). Figure H8 shows the L2 error. As for the high-frequency Gabor signal in Section 5.2, WSC (k = 2) and WSC (k = 4) significantly outperformed the zero-order estimators. In addition, for large dilations, WSC (k = 4) outperformed WSC (k = 2) on f4 and f6.

Fig. H8.

L2 error with standard error bars for the noisy dilation MRA model (oracle moment estimation). The first, second and third columns show results for f4, f5 and f6, respectively. All plots for the same signal have the same axis limits.

I. Expectation maximization algorithm for noisy dilation MRA

In this appendix we discuss how the expectation-maximization (EM) algorithm proposed in [1] can be extended to solve noisy dilation MRA. We first summarize the EM framework, which differentiates between observed data $y = \{y_j\}_{j=1}^M$, latent variables $s = \{s_j\}_{j=1}^M$ and model parameters $x$. The goal is to produce the $x$ that maximizes the marginalized likelihood function

$$p(y \,|\, x) = \int p(y, s \,|\, x)\, ds.$$

Maximizing p(y|x) directly is generally intractable because enumerating the various values for s is too costly. However, EM algorithms can be used to find local maxima of the above function, by iterating between estimating the conditional distribution of latent variables given the current estimate of parameters (E-step) and estimating parameters given the current estimate of the conditional distribution of latent variables (M-step). Specifically, the iterative procedure updates xk, the current estimate of x, by

$$Q(x \,|\, x_k) = \mathbb{E}_{s|y,x_k}\big[\log p(y, s \,|\, x)\big] \qquad \text{(E-step)} \tag{I.1}$$
$$x_{k+1} = \operatorname*{arg\,max}_x\, Q(x \,|\, x_k) \qquad \text{(M-step)} \tag{I.2}$$

Since (under certain conditions) log p(y|x) improves at least as much as Q at each iteration [30], the algorithm converges to a local maximum of p(y|x). This framework can be applied to noisy dilation MRA, and explicit formulas for both the E-step and M-step can be derived. Assume for simplicity that signals have been discretized to have length n and that the translation distribution ρt and dilation distribution ρτ are unknown and also discrete, with n possible values $\{t_\ell\}_{\ell=1}^n$ and $\{\tau_q\}_{q=1}^n$, respectively. Letting $x = (f, \rho_t, \rho_\tau)$ denote the parameters, $s_j = (t_j, \tau_j)$ denote the latent/nuisance variables, and $p_x$ denote conditioning on $x$, the likelihood function has the form

$$p(y, s \,|\, x) = p_x(y \,|\, s)\, p_x(s) = \prod_{j=1}^M \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\big\|L_{\tau_j} T_{t_j} f - y_j\big\|_2^2\right) \rho_t(t_j)\, \rho_\tau(\tau_j).$$

Thus (up to a constant), the log likelihood has the form

$$\log p(y, s \,|\, x) = -\sum_{j=1}^M \frac{1}{2\sigma^2}\big\|L_{\tau_j} T_{t_j} f - y_j\big\|_2^2 + \sum_{j=1}^M \log\rho_t(t_j) + \sum_{j=1}^M \log\rho_\tau(\tau_j). \tag{I.3}$$

Given the current estimate $x_k = (f_k, \rho_t^k, \rho_\tau^k)$ of the parameters, the E-step is performed by first computing the conditional distribution of the latent variables,

$$w_{\ell,q,j}^k = \mathbb{P}\big(t_j = t_\ell,\, \tau_j = \tau_q \,\big|\, y_j, x_k\big) = C_{kj} \exp\left(-\frac{1}{2\sigma^2}\big\|L_{\tau_q} T_{t_\ell} f_k - y_j\big\|_2^2\right) \rho_t^k(t_\ell)\, \rho_\tau^k(\tau_q), \tag{I.4}$$

where $C_{kj}$ is a normalizing constant so that $\sum_{\ell,q} w_{\ell,q,j}^k = 1$. These weights are then used to compute $Q$ by combining (I.1), (I.3) and (I.4):

$$Q\big(f, \rho_t, \rho_\tau \,\big|\, f_k, \rho_t^k, \rho_\tau^k\big) = \sum_{j=1}^M \sum_{\ell=1}^n \sum_{q=1}^n w_{\ell,q,j}^k \left(-\frac{1}{2\sigma^2}\big\|L_{\tau_q} T_{t_\ell} f - y_j\big\|_2^2 + \log\rho_t(t_\ell) + \log\rho_\tau(\tau_q)\right), \tag{I.5}$$

up to a constant. The M-step is then computed by

$$\big(f_{k+1}, \rho_t^{k+1}, \rho_\tau^{k+1}\big) = \operatorname*{arg\,max}_{f, \rho_t, \rho_\tau}\, Q\big(f, \rho_t, \rho_\tau \,\big|\, f_k, \rho_t^k, \rho_\tau^k\big). \tag{I.6}$$

Since $f$, $\rho_t$, $\rho_\tau$ all appear in distinct sums in (I.5), performing the maximization in (I.6) is straightforward. Since

$$\big\|L_{\tau_q} T_{t_\ell} f - y_j\big\|_2^2 = \frac{1}{1-\tau_q}\big\|f - T_{-t_\ell} L_{\tau_q}^{-1} y_j\big\|_2^2,$$

it is easy to check that

$$f_{k+1} = \frac{1}{C} \sum_{j=1}^M \sum_{\ell=1}^n \sum_{q=1}^n \frac{w_{\ell,q,j}^k}{1-\tau_q}\, T_{-t_\ell} L_{\tau_q}^{-1} y_j, \qquad C = \sum_{j=1}^M \sum_{\ell=1}^n \sum_{q=1}^n \frac{w_{\ell,q,j}^k}{1-\tau_q}. \tag{I.7}$$

Using Lemma 15 in [1], one can also obtain closed-form expressions for the updates to $\rho_t^k$, $\rho_\tau^k$:

$$\rho_t^{k+1}(t_\ell) = \frac{\tilde{w}_{k\ell}}{\sum_{\ell'} \tilde{w}_{k\ell'}} \quad \text{for } \tilde{w}_{k\ell} = \sum_j \sum_q w_{\ell,q,j}^k, \qquad \rho_\tau^{k+1}(\tau_q) = \frac{\tilde{v}_{kq}}{\sum_{q'} \tilde{v}_{kq'}} \quad \text{for } \tilde{v}_{kq} = \sum_j \sum_\ell w_{\ell,q,j}^k.$$

Note that when a discrete signal defined on some fixed grid is dilated, its dilation is defined on a different grid. Thus, computing (I.4) and (I.7) will involve off-grid interpolation, a subtlety not arising in classic MRA, and this interpolation may contribute additional error. We also note that one can always force the translation distribution to be uniform by retranslating the signals uniformly, and in this case all sums over $\ell$ in this section could be eliminated. This would improve the computational complexity of the algorithm but may be disadvantageous in terms of sample complexity, as in classic MRA a uniform translation distribution requires a larger sample size for accurate estimation than an aperiodic translation distribution [1].
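The updates (I.4)-(I.7) can be sketched as a single EM iteration. In the minimal Python sketch below, the translation $T_t$ is modeled as a circular grid shift and the off-grid dilation uses linear interpolation; both are simplifying choices of ours, and all names and parameter values are hypothetical rather than taken from the paper:

```python
import numpy as np

def L_op(f, tau, x):
    # Dilation L_tau f(x) = (1 - tau)^(-1) f(x / (1 - tau)), matching
    # f_hat_tau(omega) = f_hat((1 - tau) omega). Linear interpolation is one
    # simple choice for the off-grid resampling noted in the text.
    return np.interp(x / (1.0 - tau), x, f, left=0.0, right=0.0) / (1.0 - tau)

def L_inv(y, tau, x):
    # Inverse dilation: L_tau^{-1} y(x) = (1 - tau) y((1 - tau) x).
    return (1.0 - tau) * np.interp((1.0 - tau) * x, x, y, left=0.0, right=0.0)

def em_step(f, rho_t, rho_tau, ys, x, shifts, taus, sigma):
    # One E-step/M-step pass over eqs. (I.4)-(I.7).
    dx = x[1] - x[0]
    f_acc = np.zeros_like(f)
    C = 0.0
    w_t = np.zeros(len(shifts))
    w_tau = np.zeros(len(taus))
    for yj in ys:
        # E-step (I.4): posterior log-weights over the latent (shift, dilation) pair.
        logw = np.empty((len(shifts), len(taus)))
        for l, t in enumerate(shifts):
            for q, tau in enumerate(taus):
                resid = L_op(np.roll(f, t), tau, x) - yj
                logw[l, q] = (-np.sum(resid**2) * dx / (2.0 * sigma**2)
                              + np.log(rho_t[l]) + np.log(rho_tau[q]))
        w = np.exp(logw - logw.max())
        w /= w.sum()  # the normalizing constant C_{kj}
        # M-step accumulators for the signal update (I.7) and the rho updates.
        for l, t in enumerate(shifts):
            for q, tau in enumerate(taus):
                f_acc += w[l, q] / (1.0 - tau) * np.roll(L_inv(yj, tau, x), -t)
                C += w[l, q] / (1.0 - tau)
        w_t += w.sum(axis=1)
        w_tau += w.sum(axis=0)
    return f_acc / C, w_t / w_t.sum(), w_tau / w_tau.sum()

# Tiny noiseless demonstration; all sizes and parameter values are arbitrary.
x = np.linspace(-0.5, 0.5, 64)
f0 = np.exp(-50.0 * x**2)
ys = [L_op(np.roll(f0, 4), 0.1, x)]
f1, rho_t1, rho_tau1 = em_step(f0, np.ones(3) / 3, np.ones(3) / 3, ys, x,
                               shifts=[-4, 0, 4], taus=[-0.1, 0.0, 0.1], sigma=0.1)
```

On synthetic data generated from a known signal, iterating em_step should concentrate rho_t and rho_tau near the true latent values, though only convergence to a local maximum is guaranteed, as noted above.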

J. Supporting results: stochastic calculus

This appendix contains several stochastic calculus results that are used to control the statistics of the additive noise. Proposition J.1 is a simple generalization of Theorem 4.5 of [49]. Proposition J.2 controls the second moment of the stochastic quantity in Proposition J.1 and is in fact a special case of Proposition J.3. Both Propositions J.2 and J.3 are proved with standard techniques from stochastic calculus, and for brevity we omit the proofs.

Proposition J.1

Assume $\int_0^T |f(t)|^2\, dt < \infty$, and let $B_t$ be a Brownian motion with variance $\sigma^2$. Then,

$$\mathbb{E}\left[\left(\int_0^T f(t)\, dB_t\right)\left(\int_0^T \overline{f(t)}\, dB_t\right)\right] = \sigma^2 \int_0^T f(t)\overline{f(t)}\, dt.$$

Proposition J.2

Let f(t) be a bounded and continuous complex deterministic function on [0, T], and let Bt be a Brownian motion with variance σ2. Then, for a fixed non-random time T, we have

$$\mathbb{E}\left[\left(\int_0^T f(t)\, dB_t\right)^2 \left(\int_0^T \overline{f(t)}\, dB_t\right)^2\right] = 2\sigma^4 \left(\int_0^T |f(t)|^2\, dt\right)^2 + \sigma^4 \left(\int_0^T f(t)^2\, dt\right)\left(\int_0^T \overline{f(t)}^2\, dt\right).$$

Corollary J.1

When f(t) is real, the above reduces to

$$\mathbb{E}\left[\left(\int_0^T f(t)\, dB_t\right)^4\right] = 3\sigma^4 \left(\int_0^T f(t)^2\, dt\right)^2.$$
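Corollary J.1 admits a quick Monte Carlo sanity check: the Euler discretization of $\int_0^T f\, dB_t$ is Gaussian with variance $\sigma^2 \sum_k f(t_k)^2\, \Delta t$, so its empirical fourth moment should approach $3\sigma^4 (\int_0^T f(t)^2\, dt)^2$. The discretization, the integrand and the sample sizes below are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, T, n, N = 0.7, 1.0, 100, 40_000
t = np.linspace(0.0, T, n, endpoint=False)
dt = T / n
f = np.cos(2.0 * np.pi * t)  # arbitrary real integrand on [0, T]

# N independent realizations of the Ito integral int_0^T f dB; each row of dB
# holds Brownian increments with variance sigma^2 * dt.
dB = sigma * np.sqrt(dt) * rng.standard_normal((N, n))
I = dB @ f  # shape (N,): sum_k f(t_k) dB_k

exact = 3.0 * sigma**4 * (np.sum(f**2) * dt) ** 2
estimate = float(np.mean(I**4))
```

With these sample sizes the Monte Carlo estimate typically agrees with the exact value to within a few percent.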

Proposition J.3

Let f(t), g(t) be bounded and continuous complex deterministic functions on [0, T], and let Bt be a Brownian motion with variance σ2. Then, for a fixed non-random time T, we have

$$\begin{aligned}
\mathbb{E}\left[\left(\int_0^T f(t)\, dB_t\right)\left(\int_0^T \overline{f(t)}\, dB_t\right)\left(\int_0^T g(t)\, dB_t\right)\left(\int_0^T \overline{g(t)}\, dB_t\right)\right] = \sigma^4 \Bigg[&\left(\int_0^T f(t)g(t)\, dt\right)\left(\int_0^T \overline{f(t)g(t)}\, dt\right) \\
&+ \left(\int_0^T f(t)\overline{g(t)}\, dt\right)\left(\int_0^T \overline{f(t)}g(t)\, dt\right) \\
&+ \left(\int_0^T |f(t)|^2\, dt\right)\left(\int_0^T |g(t)|^2\, dt\right)\Bigg].
\end{aligned}$$

Contributor Information

Matthew Hirn, Department of Computational Mathematics, Science and Engineering, Department of Mathematics and Center for Quantum Computing, Science and Engineering, Michigan State University, East Lansing, MI 48824.

Anna Little, Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824.

References

1. Abbe E, Bendory T, Leeb W, Pereira JM, Sharon N & Singer A (2018) Multireference alignment is easier with an aperiodic translation distribution. IEEE Trans. Inf. Theory, 65, 3565–3584.
2. Aizenbud Y, Landa B & Shkolnisky Y (2019) Rank-one multi-reference factor analysis. arXiv preprint arXiv:1905.12442.
3. Bai X-C, Rajendra E, Yang G, Shi Y & Scheres SHW (2015) Sampling the conformational space of the catalytic subunit of human γ-secretase. eLife, 4, e11182.
4. Bandeira A, Chen Y, Lederman RR & Singer A (2020) Non-unique games over compact groups and orientation estimation in cryo-EM. Inverse Probl.
5. Bandeira A, Rigollet P & Weed J (2017) Optimal rates of estimation for multi-reference alignment. arXiv preprint arXiv:1702.08546.
6. Bandeira AS (2015) Synchronization problems and alignment. Topics in Mathematics of Data Science Lecture Notes. Cambridge, MA: Massachusetts Institute of Technology.
7. Bandeira AS, Blum-Smith B, Kileel J, Perry A, Weed J & Wein AS (2017) Estimation under group actions: recovering orbits from invariants. arXiv preprint arXiv:1712.10163.
8. Bandeira AS, Boumal N & Singer A (2017) Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Math. Programming, 163, 145–167.
9. Bandeira AS, Boumal N & Voroninski V (2016) On the low-rank approach for semidefinite programs arising in synchronization and community detection. Conf. Learn. Theory, 49, 361–382.
10. Bandeira AS, Charikar M, Singer A & Zhu A (2014) Multireference alignment using semidefinite programming. Proceedings of the 5th Conference on Innovations in Theoretical Computer Science. ACM, pp. 459–470.
11. Bartesaghi A, Merk A, Banerjee S, Matthies D, Wu X, Milne JLS & Subramaniam S (2015) 2.2 Å resolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor. Science, 348, 1147–1151.
12. Bendory T, Bartesaghi A & Singer A (2020) Single-particle cryo-electron microscopy: mathematical theory, computational challenges, and opportunities. IEEE Signal Process. Mag, 37, 58–76.
13. Bendory T, Boumal N, Leeb W, Levin E & Singer A (2019) Multi-target detection with application to cryo-electron microscopy. Inverse Probl.
14. Bendory T, Boumal N, Ma C, Zhao Z & Singer A (2017) Bispectrum inversion with application to multireference alignment. IEEE Trans. Signal Process, 66, 1037–1050.
15. Boumal N (2016) Nonconvex phase synchronization. SIAM J. Optim, 26, 2355–2377.
16. Boumal N, Bendory T, Lederman RR & Singer A (2018) Heterogeneous multireference alignment: a single pass approach. 2018 52nd Annual Conference on Information Sciences and Systems (CISS). IEEE, pp. 1–6.
17. Bowman GD & Poirier MG (2015) Post-translational modifications of histones that influence nucleosome dynamics. Chem. Rev, 115, 2274–2295.
18. Brown LG (1992) A survey of image registration techniques. ACM Computing Surv. (CSUR), 24, 325–376.
19. Bruna J & Mallat S (2018) Multiscale sparse microcanonical models. Math. Stat. Learn, 1, 257–315.
20. Buescu J & Paixão AC (2007) Eigenvalue distribution of positive definite kernels on unbounded domains. Integral Equ. Oper. Theory, 57, 19–41.
21. Buescu J, Paixão AC, Garcia F & Lourtie I (2004) Positive-definiteness, integral equations and Fourier transforms. J. Integral Equ. Appl, 16, 33–52.
22. Capodiferro L, Cusani R, Jacovitti G & Vascotto M (1987) A correlation based technique for shift, scale, and rotation independent object identification. ICASSP'87: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 12. IEEE, pp. 221–224.
23. Chandran V & Elgar SL (1992) Position, rotation, and scale invariant recognition of images using higher-order spectra. ICASSP'92: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5. IEEE, pp. 213–216.
24. Chen Y & Candès EJ (2018) The projected power method: an efficient algorithm for joint alignment from pairwise differences. Comm. Pure Appl. Math, 71, 1648–1714.
25. Chen Y, Guibas LJ & Huang Q-X (2014) Near-optimal joint object matching via convex relaxation. Proceedings of the 31st International Conference on Machine Learning, vol. 32 of Proceedings of Machine Learning Research, pp. 100–108.
26. Cheng C, Jiang J & Sun Q (2017) Phaseless sampling and reconstruction of real-valued signals in shift-invariant spaces. J. Fourier Anal. Appl, 1–34.
27. Clerc M & Mallat S (2002) The texture gradient equation for recovering shape from texture. IEEE Trans. Pattern Anal. Mach. Intell, 24, 536–549.
28. Clerc M & Mallat S (2003) Estimating deformations of stationary processes. Ann. Stat, 31, 1772–1821.
29. Collis WB, White PR & Hammond JK (1998) Higher-order spectra: the bispectrum and trispectrum. Mech. Syst. Signal Process, 12, 375–394.
30. Dempster AP, Laird NM & Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Ser. B (Methodological), 39, 1–22.
31. DesJarlais R & Tummino PJ (2016) Role of histone-modifying enzymes and their complexes in regulation of chromatin biology. Biochemistry, 55, 1584–1599.
32. Diamond R (1992) On the multiple simultaneous superposition of molecular structures by rigid body transformations. Protein Sci, 1, 1279–1287.
33. Dvornek NC, Sigworth FJ & Tagare HD (2015) SubspaceEM: a fast maximum-a-posteriori algorithm for cryo-EM single particle reconstruction. J. Struct. Biol, 190, 200–214.
34. Eickenberg M, Exarchakis G, Hirn M & Mallat S (2017) Solid harmonic wavelet scattering: predicting quantum molecular energy from invariant descriptors of 3D electronic densities. Adv. Neural Inf. Proc. Syst. 30 (NIPS 2017), 6540–6549.
35. Eickenberg M, Exarchakis G, Hirn M, Mallat S & Thiry L (2018) Solid harmonic wavelet scattering for predictions of molecule properties. J. Chem. Phys, 148, 241732.
36. Ekman D, Björklund AK, Frey-Skött J & Elofsson A (2005) Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions. J. Mol. Biol, 348, 231–243.
37. Fernandez-Leiro R, Conrad J, Scheres SHW & Lamers MH (2015) Cryo-EM structures of the E. coli replicative DNA polymerase reveal its dynamic interactions with the DNA sliding clamp, exonuclease and τ. eLife, 4, e11134.
38. Fischer N, Neumann P, Konevega AL, Bock LV, Ficner R, Rodnina MV & Stark H (2015) Structure of the E. coli ribosome–EF-Tu complex at <3 Å resolution by Cs-corrected cryo-EM. Nature, 520, 567–570.
39. Forneris F, Wu J & Gros P (2012) The modular serine proteases of the complement cascade. Curr. Opin. Struct. Biol, 22, 333–341.
40. Foroosh H, Zerubia JB & Berthod M (2002) Extension of phase correlation to subpixel registration. IEEE Trans. Image Process, 11, 188–200.
41. Frank J (2006) Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State. Oxford, United Kingdom: Oxford University Press.
42. Gao F, Wolf G & Hirn M (2019) Geometric scattering for graph data analysis. Proceedings of the 36th International Conference on Machine Learning, PMLR, vol. 97, pp. 2122–2131.
43. Gil-Pita R, Rosa-Zurera M, Jarabo-Amores P & López-Ferreras F (2005) Using multilayer perceptrons to align high range resolution radar signals. International Conference on Artificial Neural Networks. Springer, pp. 911–916.
44. Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.
45. Hirn M, Mallat S & Poilvert N (2017) Wavelet scattering regression of quantum chemical energies. Multiscale Model. Simul, 15, 827–863. arXiv:1605.04654.
46. Hotta K, Mishima T & Kurita T (2001) Scale invariant face detection and classification method using shift invariant features extracted from log-polar image. IEICE Trans. Inf. Syst, 84, 867–878.
47. Hudson S & Psaltis D (1993) Correlation filters for aircraft identification from radar range profiles. IEEE Trans. Aerosp. Electron. Syst, 29, 741–748.
48. Kam Z (1980) The reconstruction of structure from electron micrographs of randomly oriented particles. J. Theor. Biol, 82, 15–39.
49. Klebaner FC (2012) Introduction to Stochastic Calculus With Applications. Singapore: World Scientific Publishing Company.
50. Landa B & Shkolnisky Y (2019) Multi-reference factor analysis: low-rank covariance estimation under unknown translations. arXiv preprint arXiv:1906.00211.
51. Leggett RM, Heavens D, Caccamo M, Clark MD & Davey RP (2015) NanoOK: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles. Bioinformatics, 32, 142–144.
52. Levitt M (2009) Nature of the protein universe. Proc. Natl. Acad. Sci, 106, 11079–11084.
53. Lim WA (2002) The modular logic of signaling proteins: building allosteric switches from simple binding domains. Curr. Opin. Struct. Biol, 12, 61–68.
54. Ma C, Bendory T, Boumal N, Sigworth F & Singer A (2019) Heterogeneous multireference alignment for images with application to 2D classification in single particle reconstruction. IEEE Trans. Image Process, 29, 1699–1710.
55. Mallat S (2008) A Wavelet Tour of Signal Processing: The Sparse Way, 3rd edn. Cambridge, MA: Academic Press.
56. Mallat S (2012) Group invariant scattering. Comm. Pure Appl. Math, 65, 1331–1398.
57. Martinec D & Pajdla T (2007) Robust rotation and translation estimation in multiview reconstruction. 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp. 1–8.
58. McGinty RK & Tan S (2016) Recognition of the nucleosome by chromatin factors and enzymes. Curr. Opin. Struct. Biol, 37, 54–61.
59. Merk A, Bartesaghi A, Banerjee S, Falconieri V, Rao P, Davis MI, Pragani R, Boxer MB, Earl LA, Milne JLS, et al. (2016) Breaking cryo-EM resolution barriers to facilitate drug discovery. Cell, 165, 1698–1707.
60. Meynard A & Torrésani B (2018) Spectral analysis for nonstationary audio. IEEE/ACM Trans. Audio Speech Lang. Process, 26, 2371–2380.
61. Omer H & Torrésani B (2013) Estimation of frequency modulations on wideband signals; applications to audio signal analysis. 10th International Conference on Sampling Theory and Applications, Bremen, Germany, pp. 29–32.
62. Omer H & Torrésani B (2017) Time-frequency and time-scale analysis of deformed stationary processes, with application to non-stationary sound modeling. Appl. Comput. Harmon. Anal, 43, 1–22.
63. Palamini M, Canciani A & Forneris F (2016) Identifying and visualizing macromolecular flexibility in structural biology. Front. Mol. Biosci, 3, 47.
64. Park W & Chirikjian GS (2014) An assembly automation approach to alignment of noncircular projections in electron microscopy. IEEE Trans. Automat. Sci. Eng, 11, 668–679.
65. Park W, Midgett CR, Madden DR & Chirikjian GS (2011) A stochastic kinematic model of class averaging in single-particle electron microscopy. Int. J. Robot. Res, 30, 730–754.
66. Perry A, Weed J, Bandeira A, Rigollet P & Singer A (2017) The sample complexity of multi-reference alignment. SIAM J. Math. Data Sci, 1, 497–517.
67. Perry A, Wein AS, Bandeira AS & Moitra A (2018) Message-passing algorithms for synchronization problems over compact groups. Comm. Pure Appl. Math, 71, 2275–2322.
68. Punjani A, Rubinstein JL, Fleet DJ & Brubaker MA (2017) cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods, 14, 290.
69. Robinson D, Farsiu S & Milanfar P (2007) Optimal registration of aliased images using variable projection with applications to super-resolution. Comput. J, 52, 31–42.
70. Sadler BM & Giannakis GB (1992) Shift- and rotation-invariant object reconstruction using the bispectrum. JOSA A, 9, 57–69.
71. Scheres SHW, Valle M, Nuñez R, Sorzano COS, Marabini R, Herman GT & Carazo J-M (2005) Maximum-likelihood multi-reference refinement for electron microscopy images. J. Mol. Biol, 348, 139–149.
72. Sharon N, Kileel J, Khoo Y, Landa B & Singer A (2020) Method of moments for 3-D single particle ab initio modeling with non-uniform distribution of viewing angles. Inverse Probl, 36, 044003.
73. Singer A (2011) Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal, 30, 20–36.
74. Singer A (2018) Mathematics for cryo-electron microscopy. Proceedings of the International Congress of Mathematicians, vol. 4. Rio de Janeiro, pp. 4013–4032.
75. Sirohi D, Chen Z, Sun L, Klose T, Pierson TC, Rossmann MG & Kuhn RJ (2016) The 3.8 Å resolution cryo-EM structure of Zika virus. Science, 352, 467–470.
76. Sonday B, Singer A & Kevrekidis IG (2013) Noisy dynamic simulations in the presence of symmetry: data alignment and model reduction. Comput. Math. Appl, 65, 1535–1557.
77. Sorzano COS, Bilbao-Castro JR, Shkolnisky Y, Alcorlo M, Melero R, Caffarena-Fernández G, Li M, Xu G, Marabini R & Carazo JM (2010) A clustering approach to multireference alignment of single-particle projections in electron microscopy. J. Struct. Biol, 171, 197–206.
78. Sun W (2017) Phaseless sampling and linear reconstruction of functions in spline spaces. arXiv preprint arXiv:1709.04779.
79. Theobald DL & Steindel PA (2012) Optimal simultaneous superpositioning of multiple structures with missing data. Bioinformatics, 28, 1972–1979.
80. Tsatsanis MK & Giannakis GB (1990) Translation, rotation, and scaling invariant object and texture classification using polyspectra. Advanced Signal Processing Algorithms, Architectures, and Implementations, vol. 1348. Bellingham, WA: International Society for Optics and Photonics, pp. 103–115.
81. Villarreal SA & Stewart PL (2014) Cryo-EM and image sorting for flexible protein/DNA complexes. J. Struct. Biol, 187, 76–83.
82. Wein AS (2018) Statistical estimation in the presence of group actions. Ph.D. Thesis, Massachusetts Institute of Technology.
83. Winkler J & Niranjan M (2002) Uncertainty in Geometric Computations, vol. 704. Berlin, Germany: Springer Science & Business Media.
84. Zhong Y & Boumal N (2018) Near-optimal bounds for phase synchronization. SIAM J. Optim, 28, 989–1016.
85. Zwart JP, van der Heiden R, Gelsema S & Groen F (2003) Fast translation invariant classification of HRR range profiles in a zero phase representation. IEE Proc. Radar Sonar Nav, 150, 411–418.
