Wavelet invariants for statistically robust multi-reference alignment

Matthew Hirn; Anna Little

doi:10.1093/imaiai/iaaa016

. Author manuscript; available in PMC: 2022 Jan 21.

Published in final edited form as: Inf inference. 2020 Aug 13;10(4):1287–1351. doi: 10.1093/imaiai/iaaa016

Wavelet invariants for statistically robust multi-reference alignment

Matthew Hirn ¹, Anna Little ^2,^†

PMCID: PMC8782248 NIHMSID: NIHMS1726636 PMID: 35070296

Abstract

We propose a nonlinear, wavelet-based signal representation that is translation invariant and robust to both additive noise and random dilations. Motivated by the multi-reference alignment problem and generalizations thereof, we analyze the statistical properties of this representation given a large number of independent corruptions of a target signal. We prove the nonlinear wavelet-based representation uniquely defines the power spectrum but allows for an unbiasing procedure that cannot be directly applied to the power spectrum. After unbiasing the representation to remove the effects of the additive noise and random dilations, we recover an approximation of the power spectrum by solving a convex optimization problem, and thus reduce to a phase retrieval problem. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.

Keywords: multi-reference alignment, method of invariants, wavelets, signal processing, wavelet scattering transform

1. Introduction

The goal in classic multi-reference alignment (MRA) is to recover a hidden signal $f : ℝ \to ℝ$ from a collection of noisy measurements. Specifically, the following data model is assumed.

Model 1 (Classic MRA).

The classic MRA data model consists of M independent observations of a compactly supported, real-valued signal $f \in L^{2} (ℝ)$ :

y_{j} (x) = f (x - t_{j}) + ε_{j} (x), 1 ⩽ j ⩽ M,

(1.1)

where:

$supp (y_{j}) \subseteq [- \frac{1}{2}, \frac{1}{2}]$ for 1 ⩽ j ⩽ M.
${t_{j}}_{j = 1}^{M}$ are independent samples of a random variable $t \in ℝ$ .
${ε_{j} (x)}_{j = 1}^{M}$ are independent white noise processes on $[- \frac{1}{2}, \frac{1}{2}]$ , with variance σ².

The signal is thus subjected to both random translation and additive noise. The MRA problem arises in numerous applications, including structural biology [32,64,65,70,71,79], single cell genomic sequencing [51], radar [43,85], crystalline simulations [76], image registration [18,40,69] and signal processing [85]. It is a simplified model relevant for cryo-electron microscopy (cryo-EM), an imaging technique for molecules that achieves near atomic resolution [11,14,75]. In this application one seeks to recover a three-dimensional reconstruction of the molecule from many noisy two-dimensional images/projections [41]. Although MRA ignores the tomographic projection of cryo-EM, investigation of the simplified model provides important insights. For example, [5,66] investigate the optimal sample complexity for MRA and demonstrate that M = Θ(σ⁶) is required to fully recover f in the low signal-to-noise regime when the translation distribution is periodic; this optimal sample complexity is the same for cryo-EM [7,82]. Recent work has established an improved sample complexity of M = Θ(σ⁴) for MRA when the translation distribution is aperiodic [1], and this rate has been shown to also hold in the more complicated setting of cryo-EM, if the viewing angles are non-uniformly distributed [72]. Problems closely related to Model 1 include the heterogenous MRA problem, where the unknown signal f is replaced with a template of k unknown signals f₁, . . . , f_k [16,54,66,77], as well as multi-reference factor analysis, where the underlying (random) signal follows a low-rank factor model and one seeks to recover its covariance matrix [50].

Approaches for solving MRA generally fall into two categories: synchronization methods and methods that estimate the signal directly, i.e. without estimating nuisance parameters. Synchronization methods attempt to recover the signal by aligning the translations and then averaging. They include methods based on angular synchronization [8,15,24,67,73,84], where for each pair of signals the best pairwise shift is computed and then the translations are estimated from this pairwise information [6], and semi-definite programming [4,9,10,25], which approximates the quasi-maximum likelihood estimator of the shifts by relaxing a non-convex rank constraint. However, these methods fail in the low signal-to-noise regime. Methods that estimate the signal directly include both the method of moments [44,48,72] and expectation maximization, or EM-type, algorithms [1,30]; a number of EM-type algorithms have also been developed for the more complicated cryo-EM problem [33,68]. An important special case of the method of moments is the method of invariants, which seeks to recover f by computing translation invariant features, and thus avoids aligning the translations. However, the task is a difficult one, as a complete representation is needed to recover the signal, and yet the representation may be difficult to invert and corrupted by statistical bias. Generally, the signal is recovered from translation invariant moments, which are estimated in the Fourier domain [29,44]. Recent work [5,14] utilizes such Fourier invariants (mean, power spectrum and bispectrum) and recovers $\hat{f}$ by solving a non-convex optimization problem on the manifold of phases.

Classic MRA however fails to capture many of the biological phenomena arising in molecular imaging, such as the random rotations of the molecules and the tomographic projection associated with the imaging of three-dimensional objects. Another shortcoming is that the model fails to capture the dynamics that arise from flexible regions in macromolecular structures. These flexible regions are very important in structural biology, for example in understanding molecular interactions [36,39,52,53] and molecular recognition of epigenetic regulators of histone tails [17,31,58]. The large-scale dynamics of these regions makes imaging challenging [81], and thus sample preparation in cryo-EM generally seeks to minimize these dynamics by focusing on well-folded macromolecules frozen in vitreous ice [63]. However, this ‘may severely impact... the nature of the intrinsic dynamics and interactions displayed by macromolecules’ [63]. Although modern cryo-EM is making great strides in understanding flexible systems [3,37,38,59], formulating models that are more capable of capturing the motions associated with the flexible regions of macromolecules could open the door to applying cryo-EM more broadly, i.e. to less well-folded macromolecules. Mathematically, the motion of the flexible region can be modeled as a diffeomorphism. See Fig. 1, which shows a molecule with a flexible side chain (1(a)) and a diffeomorphism resulting from movement of the flexible region (1(b)). Figure 1(a) is taken from [63], and Fig. 1(b) was obtained by deforming it.

Fig. 1. — Dynamics arising from flexible regions in macromolecular structures [63].

This article thus generalizes the classic MRA problem to include a random diffeomorphism. Specifically, we consider recovering a hidden signal $f : ℝ \to ℝ$ from

y_{j} (x) = L_{τ_{j}} f (x - t_{j}) + ε_{j} (x), 1 ⩽ j ⩽ M,

where L_τ is a dilation operator that dilates by a factor of (1−τ). The dilation operator L_τ is a simplified model for more general diffeomorphisms L_ζ f(x) = f(ζ(x)), since in the simplest case when ζ(x) is affine, L_ζ simply translates and dilates f (see Section 2.1). Dilations are also relevant for the analysis of time-warped audio signals, which can arise from the Doppler effect and in speech processing and bioacoustics. For example, [60–62] consider a stationary random signal f(x), which is time-warped, i.e. $D_{ζ} f (x) = \sqrt{ζ^{'} (x)} f (ζ (x))$ , and use a maximum likelihood approach to estimate ζ. In [27,28], a similar stochastic time warping model is analyzed using wavelet based techniques. The noisy dilation MRA model considered here corresponds to the simplest case of time-warping, when ζ is an affine function. This special case is in fact very important in imaging applications [22,23,46,57,69,80], where it is critical to compute features which are scale invariant, as objects are naturally dilated by the ‘zoom’ of an image.

A new approach is needed to solve this more general MRA problem, as Fourier invariants will fail, being unstable to the action of diffeomorphisms, including dilations. The instability occurs in the high frequencies, where even a small diffeomorphism can significantly alter the Fourier modes. We instead propose $L^{2} (ℝ)$ wavelet coefficient norms as invariants, using a continuous wavelet transform. This approach is inspired by the invariant scattering representation of [56], which is provably stable to the actions of small diffeomorphisms. However, here we replace local averages of the modulus of the wavelet coefficients with global averages (i.e. integrations) of the modulus squared, thus providing rigid invariants that can be statistically unbiased. Similar invariant coefficients have been utilized in a number of applications including predicting molecular properties [34,35] and quantum chemical energies [45], and in microcanonical ensemble models for texture synthesis [19]. Recent work [42] has also generalized such coefficients to graphs.

1.1. Notation

The Fourier transform of a signal $f \in L^{1} (ℝ)$ is

\hat{f} (ω) = \int f (x) e^{- i x ω} d x .

We remind the reader that compactly supported $L^{2} (ℝ)$ functions are in $L^{1} (ℝ)$ . The power spectrum is the nonlinear transform $P : L^{2} (ℝ) \to L^{1} (ℝ)$ that maps f to

(P f) (ω) = | \hat{f} (ω) |^{2}, ω \in ℝ .

We denote f(x) ⩽ Cg(x) for some absolute constant C by f(x) ≲ g(x). We also write f(x) = O(g(x)) if |f(x)| ⩽ Cg(x) for all x ⩾ x₀ for some constants x₀, C > 0; f(x) = o(g(x)) denotes f(x)/g(x) → 0 as x → ∞; f(x) = Θ(g(x)) denotes C₁g(x) ⩽ |f(x)| ⩽ C₂g(x) for all x ⩾ x₀ for some constants x₀, C₁, C₂ > 0. The minimum of a and b is denoted a ∧ b, and the maximum by a ∨ b.

2. MRA models and the method of invariants

Standard MRA models are generalized to models that include deformations of the underlying signal in Section 2.1. Section 2.2 reviews power spectrum invariants and introduces $L^{2} (ℝ)$ wavelet coefficient invariants. Theorem 2.4 proves wavelet coefficient invariants computed with a continuous wavelet transform and a suitable mother wavelet are equivalent to the power spectrum, showing there is no information loss in the transition from one representation to the other.

2.1. MRA data models

A standard MRA scenario considers the problem of recovering a signal $f \in L^{2} (ℝ)$ in which one observes random translations of the signal, each of which is corrupted by additive noise. The problem is particularly difficult when the signal-to-noise ratio (SNR) is low, as registration methods become intractable. In [5,13,14,16,54,74] the authors propose a method using Fourier-based invariants, which are invariant to translations and thus eliminate the need to register signals.

A more general MRA scenario incorporates random deformations of the signal f, which could be used to model underlying physical variability that is not captured by rigid transformations and additive noise models. For example [4,7] consider a discrete signal f corrupted by an arbitrary group action, [47,85] consider random deformations arising in RADAR and [2] considers a generalization of MRA where signals are rescaled by random constants. Another natural mathematical model is small, random diffeomorphisms, which leads to observations of the form

y_{j} (x) = L_{ζ_{j}} f (x - t_{j}) + ε_{j} (x), 1 ⩽ j ⩽ M,

(2.1)

where $ζ_{j} \in C^{1} (ℝ)$ is a random diffeomorphism, $t_{j} \in ℝ$ is a random translation and the signals ε_j(x) are independent white noise random processes. The transform L_ζ is the action of the diffeomorphism ζ on f,

L_{ζ} f (x) = f (ζ (x)) .

If ${‖ {(ζ^{- 1})}^{'} ‖}_{\infty} < \infty$ , then one can verify $L_{ζ} : L^{2} (ℝ) \to L^{2} (ℝ)$ .

One of the keys to the Fourier invariant approach of [5,13,14,16,54,74] is the authors can unbias the Fourier invariants of the noisy signals, thus allowing them to devise an unbiased estimator of the Fourier invariants of the signal f (or a mixture of signals in the heterogeneous MRA case). For the diffeomorphism model (2.1) this would require developing a procedure for unbiasing the (Fourier) invariants of ${y_{j}}_{j = 1}^{M}$ against both additive noise and random diffeomorphisms.

In order to get a handle on the difficulties associated with the proposed diffeomorphism model, in this paper we consider random dilations of the signal f, which corresponds to restricting the diffeomorphism to be of the form

ζ (x) = \frac{x}{1 - τ}, | τ | ⩽ 1 / 2.

Specifically, we assume the following noisy dilation MRA model.

Model 2 (Noisy dilation MRA data model).

The noisy dilation MRA data model consists of M independent observations of a compactly supported, real-valued signal $f \in L^{2} (ℝ)$ :

y_{j} (x) = L_{τ_{j}} f (x - t_{j}) + ε_{j} (x), 1 ⩽ j ⩽ M,

(2.2)

where L_τ is an $L^{1} (ℝ)$ normalized dilation operator,

L_{τ} f (x) = {(1 - τ)}^{- 1} f ({(1 - τ)}^{- 1} x) .

In addition, we assume the following:

$supp (y_{j}) \subseteq [- \frac{1}{2}, \frac{1}{2}]$ for 1 ⩽ j ⩽ M.
${t_{j}}_{j = 1}^{M}$ are independent samples of a random variable $t \in ℝ$ .
${τ_{j}}_{j = 1}^{M}$ are independent samples of a bounded, symmetric random variable τ satisfying
$τ \in ℝ, E (τ) = 0, Var (τ) = η^{2}, | τ | ⩽ 1 / 2.$
${ε_{j} (x)}_{j = 1}^{M}$ are independent white noise processes on $[- \frac{1}{2}, \frac{1}{2}]$ with variance σ².

Remark 2.1

The interval $[- \frac{1}{2}, \frac{1}{2}]$ is arbitrary and can be replaced with any interval of length 1. In addition, the spatial box size is arbitrary, i.e. $[- \frac{1}{2}, \frac{1}{2}]$ , can be replaced with $[- \frac{N}{2}, \frac{N}{2}]$ . All results still hold with $σ \sqrt{N}$ replacing σ wherever it appears.

Thus, the hidden signal f is supported on an interval of length 1, and we observe M independent instances of the signal that have been randomly translated, randomly dilated and corrupted by additive white noise. We assume the hidden signal is real, but the proposed methods can also handle complex valued signals with minor modifications. Recall ε(x) is a white noise process if ε(x) = dB_x, i.e. it is the derivative of a Brownian motion with variance σ².

While the noisy dilation MRA model does not capture the full richness of the diffeomorphism model, it already presents significant mathematical difficulties. Indeed, as we show in Section 5, Fourier invariants, specifically the power spectrum, cannot be used to form accurate estimators under the action of dilations and random additive noise. The reason is that Fourier measurements are not stable to the action of small dilations (measured here by |τ|), since the displacement of $\hat{L_{τ} f} (ω)$ relative to $\hat{f} (ω)$ depends on |ω|. Intuitively, high-frequency modes are unstable, and yet high frequencies are often critical; for example removing high frequencies increases the sample complexity needed to distinguish between signals in a heterogeneous MRA model [5]. We thus replace Fourier-based invariants with wavelet coefficient invariants, which are defined in Section 2.2. As we show the wavelet invariants of the signal f can be accurately estimated from wavelet invariants of the noisy signals ${y_{j}}_{j = 1}^{M}$ , with no information loss relative to the power spectrum of f.

For future reference we also define the following dilation MRA model, which includes random translations and random dilations but no additive noise. Thus, Models 1 and 3 are both special cases of Model 2.

Model 3 (Dilation MRA data model).

The dilation MRA data model consists of M independent observations of a compactly supported, real-valued signal $f \in L^{2} (ℝ)$ :

y_{j} (x) = L_{τ_{j}} f (x - t_{j}), 1 ⩽ j ⩽ M,

(2.3)

where L_τ is an $L^{1} (ℝ)$ normalized dilation operator,

L_{τ} f (x) = {(1 - τ)}^{- 1} f ({(1 - τ)}^{- 1} x) .

In addition, we assume (i)–(iii) of Model 2.

2.2. Method of invariants

We now discuss how invariant representations can be used to solve MRA data models and introduce the wavelet invariants used in this article.

2.2.1. Motivation and related work

Let T_tf(x) = f(x − t) denote the operator that translates by t acting on a signal f. Invariant measurement models seek a representation $Φ (f) \in B$ in a Banach space $B$ such that

Φ (T_{t} f) = Φ (f), \forall t \in ℝ .

(2.4)

In MRA problems, one additionally requires that

Φ (f) = Φ (g) \Leftrightarrow g = T_{t} f for some t \in ℝ .

(2.5)

The first condition (2.4) removes the need to align random translations of the signal f, whereas the second condition (2.5) ensures that if one can estimate Φ(f) from the collection ${Φ (y_{j})}_{j = 1}^{M}$ , then one can recover an estimate of f (up to translation) by solving

f^{⋆} = \underset{g \in L^{1} \cap L^{2} (ℝ)}{\arg \inf} ‖ Φ (g) - Φ (f) ‖ B,

(2.6)

where $‖ \cdot ‖_{B}$ is the Banach space norm.

When the observed signals ${y_{j}}_{j = 1}^{M}$ are corrupted by more than just a random translation, though, as in Model 2, estimating Φ(f) from ${Φ (y_{j})}_{j = 1}^{M}$ is not always straightforward. Indeed, one would like to compute

{\bar{Φ}}_{M} (f) = \frac{1}{M} \sum_{j = 1}^{M} Φ (y_{j}),

(2.7)

but the quantity ${\bar{Φ}}_{M} (f)$ is not always an unbiased estimator of Φ(f), meaning that $\lim_{M \to \infty} {\bar{Φ}}_{M} (f) \neq Φ (f)$ . In order to circumvent this issue, one must select a representation Φ such that

E Φ (y_{j}) = Φ (f) + b_{Φ} (f, M),

(2.8)

where $b_{Φ} (f, M)$ is a bias term depending on the choice of Φ, f, and the signal corruption model $M$ . If (2.8) holds and if we can compute a $\tilde{b}$ such that $E {\tilde{b}}_{Φ} (y_{j}, M) = b_{Φ} (f, M) + δ$ for $| b_{Φ} (f, M) | ≫ | δ |$ , then one can amend (2.7) to reduce the bias

{\tilde{Φ}}_{M} (f) = \frac{1}{M} \sum_{j = 1}^{M} (Φ (y_{j}) - {\tilde{b}}_{Φ} (y_{j}, M)),

in which case

\lim_{M \to \infty} {\tilde{Φ}}_{M} (f) = Φ (f) + δ

almost surely by the law of large numbers. The main difficulty therefore is twofold. On the one hand, one must design a representation Φ that satisfies (2.4), (2.5) and (2.8) with a bias b that can be estimated; on the other hand, the optimization (2.6) must be tractable. For random translation plus additive noise models (i.e., Model 1), the authors of [5,14] describe a representation Φ based on Fourier invariants that satisfies the outlined requirements and for which one can solve (2.6) despite the optimization being non-convex. The Fourier invariants include $\hat{f} (0)$ (i.e. the integral of f), the power spectrum of f and the bispectrum of f. Each invariant captures successively more information in f. While $\hat{f} (0)$ carries limited information, the power spectrum recovers the magnitude of the Fourier transform, namely it recovers the non-negative, real-valued function ρ(ω) such that $\hat{f} (ω) = ρ (ω) e^{i θ (ω)}$ , but the phase information Θ(ω) is lost. Since $\hat{T_{t} f} (ω) = e^{- i ω t} \hat{f} (ω)$ , the power spectrum is invariant to translations as the Fourier modulus kills the phase factor induced by a translation t of f. However, it is in general not possible to recover a signal from its power spectrum, although in certain special cases the phase information can be resolved; results along these lines are in the field of phase retrieval [26,78]. The bispectrum is also translation invariant and invertible so long as $\hat{f} (ω) \neq 0$ [66].

In Section 5 we show that it is impossible to significantly reduce the power spectrum bias for Model 2, which includes translations, dilations and additive noise. We thus propose replacing the power spectrum with the $L^{2} (ℝ)$ norms of the wavelet coefficients of the signal f. These invariants satisfy (2.4) and (2.8) for Model 2 and yield a convex formulation of (2.6). They do not satisfy (2.5) for general $f \in L^{2} (ℝ)$ , but Theorem 2.4 in Section 2.2.2 shows that knowing the wavelet invariants of f is equivalent to knowing the power spectrum of f, which means that any phase retrieval setting in which recovery is possible will also be possible with the specified wavelet invariants. For example if the signal lives in a spline or shift invariant space in addition to being realvalued, then it can be recovered from its phaseless measurements [26,78].

2.2.2. Wavelet invariants

We now define the wavelet invariants used in this article. A wavelet $ψ \in L^{2} (ℝ)$ is a waveform that is localized in both space and frequency and has zero average,

\int ψ (x) d x = 0.

Note throughout this article ψ will always denote a wavelet in $L^{1} \cap L^{2} (ℝ)$ with zero average, satisfying ‖ψ‖₂ = 1 as well as the classic admissability condition $\int \frac{| \hat{ψ} (ω) |^{2}}{ω} d ω < \infty$ . A dilation of the wavelet by a factor λ ∈ (0,∞) is denoted,

ψ_{λ} (x) = λ^{1 / 2} ψ (λ x),

where the normalization guarantees that ‖ψ_λ‖₂ = ‖ψ‖₂ = 1. The continuous wavelet transform W computes

W f = {f * ψ_{λ} (x) : λ \in (0, \infty), x \in ℝ} .

The parameter λ corresponds to a frequency variable. Indeed, if ξ₀ is the central frequency of ψ, the wavelet coefficients f ∗ ψ_λ recover the frequencies of f in a band of size proportional to λ centered at λξ₀. Thus, high frequencies are grouped into larger packets, which we shall use to obtain a stable, invariant representation of f.

The wavelet transform Wf is equivariant to translations but not invariant. Integrating the wavelet coefficients over x yields translation invariant coefficients, but they are trivial since $\int ψ_{λ} = 0$ . We therefore compute $L^{2} (ℝ)$ norms in the x variable, yielding the following nonlinear wavelet invariants:

Definition 2.1 (Wavelet invariants).

The L² wavelet invariants of a real-valued signal $f \in L^{1} \cap L^{2} (ℝ)$ are given by

(S f) (λ) = {‖ f * ψ_{λ} ‖}_{2}^{2}, λ \in (0, \infty),

(2.9)

where ψ_λ(x) = λ^1/2ψ(λx) are dilations of a mother wavelet ψ.

Throughout this article ψ can be taken as a Morlet wavelet, in which case ψ is constructed to have frequency centered at ξ by $ψ (x) = C_{ξ} π^{- 1 / 4} e^{- x^{2} / 2} (e^{i ξ x} - e^{- ξ^{2} / 2})$ for $C_{ξ} = {(1 - e^{- ξ^{2}} - 2 e^{- 3 ξ^{2} / 4})}^{- 1 / 2}$ , but results hold more generally for what we refer to as k-admissible wavelets, where k ⩾ 0 is an even integer. See Appendix A for a precise description of this admissibility criteria. The wavelet invariants can be expressed in the frequency domain as

(S f) (λ) = {\frac{1}{2 π} \int | \hat{f} (ω) |}^{2} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω,

which motivates the following definition of ‘wavelet invariant derivatives’.

Definition 2.2 (Wavelet invariant derivatives).

The n-th derivative of (Sf)(λ) is defined as

{(S f)}^{(n)} (λ) : = {\frac{1}{2 π} \int | \hat{f} (ω) |}^{2} \frac{d^{n}}{d λ^{n}} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω .

Remark 2.2

Definition 2.1 assumes $f : ℝ \to ℝ$ , which allows the wavelet ψ to be either real or complex. Our results can easily be extended to complex f, but a strictly complex wavelet would be needed, with Sf(λ) computed for all λ ∈ (−∞, ∞) \ 0.

Remark 2.3

For a discrete signal of length n, computing the wavelet invariants via a continuous wavelet transform is O(n²), while computing the power spectrum is O(n log n). Thus, one pays a computational cost to achieve greater stability with no loss of information. On the other hand, if wavelet invariants are computed for a dyadic wavelet transform (i.e. only for O(log n) λ’s), the computational cost is the same and stability is maintained, but more information is lost.

Remark 2.4

When $(P f) (ω) = {| \hat{f} (ω) |}^{2}$ is continuous, Definition 2.2 reduces to a normal derivative, i.e. one can check that ${(S f)}^{(n)} (λ) = \frac{d^{n}}{d λ^{n}} (S f) (λ)$ . However, when Pf is not continuous, in general ${(S f)}^{(n)} (λ) \neq \frac{d^{n}}{d λ^{n}} (S f) (λ)$ and (Sf)⁽ⁿ⁾(λ) is more convenient for controlling the error of the estimators proposed in this article. Throughout this article, the notation (Sf)⁽ⁿ⁾(λ) will thus denote the derivative of Definition 2.2 and $\frac{d^{n}}{d λ^{n}} (S f) (λ)$ will denote the standard derivative.

Under mild conditions, one can show that $S : L^{2} (ℝ) \to L^{1} \cap C (0, \infty)$ . The values λ = 2^j for $j \in ℤ$ correspond to rigid versions of first-order $L^{2} (ℝ)$ wavelet scattering invariants [56]. The continuous wavelet transform Wf is extremely redundant; indeed, for suitably chosen mother wavelets, the dyadic wavelet transform with λ = 2^j for $j \in ℤ$ is a complete representation of f. However, the corresponding operator S restricted to λ = 2^j is not invertible. When one utilizes every frequency λ ∈ (0,∞), though, the resulting $L^{2} (ℝ)$ norms $(S f) (λ) = {‖ f * ψ_{λ} ‖}_{2}^{2}$ uniquely determine the power spectrum of f, so long as the wavelet ψ satisfies a type of independence condition.

Condition 2.3

Define

{| {\hat{ψ}}_{λ}^{+} (ω) |}^{2} = ({| {\hat{ψ}}_{λ} (ω) |}^{2} + {| {\hat{ψ}}_{λ} (- ω) |}^{2}) \cdot 1 (ω ⩾ 0) .

If for any finite sequence ${ω_{i}}_{i = 1}^{n}$ of distinct positive frequencies, the collection ${{| {\hat{ψ}}_{λ}^{+} (ω_{i}) |}^{2}}_{i = 1}^{n}$ is linearly independent functions of λ, we say the wavelet ψ satisfies the linear independence condition.

Remark 2.5

Condition 2.3 is stated in terms of ${| {\hat{ψ}}_{λ}^{+} (ω) |}^{2}$ to avoid assumptions on whether ψ is real or complex. When $ψ (x) \in ℝ$ , ${| {\hat{ψ}}_{λ}^{+} (ω) |}^{2} = 2 {| {\hat{ψ}}_{λ} (ω) |}^{2}$ for ω ⩾ 0. When ψ is complex analytic, ${| {\hat{ψ}}_{λ}^{+} (ω) |}^{2} = {| {\hat{ψ}}_{λ} (ω) |}^{2}$ . When $ψ \in ℂ$ but not complex analytic, ${| {\hat{ψ}}_{λ}^{+} (ω) |}^{2}$ simply incorporates a reflection of ${| {\hat{ψ}}_{λ} (ω) |}^{2}$ about the origin. Since we assume $f (x) \in ℝ, {| {\hat{ψ}}_{λ}^{+} (ω) |}^{2}$ uniquely defines (Sf)(λ), since $(S f) (λ) = \frac{1}{2 π} 〈 | \hat{f} |^{2}, {| {\hat{ψ}}_{λ}^{+} |}^{2} 〉$ by the Plancherel and Fourier convolution theorems.

Theorem 2.4

Let $f \in L^{1} \cap L^{2} (ℝ)$ and assume ψ satisfies Condition 2.3 and $\hat{ψ}$ has compact support. Then,

S f = S g \Leftrightarrow P f = P g .

Proof.

First assume Pf = Pg, which means ${| \hat{f} (ω) |}^{2} = | \hat{g} (ω) |^{2}$ for almost every $ω \in ℝ$ . Using the Plancheral and Fourier convolution theorems,

(S f) (λ) = \int {| f * ψ_{λ} (x) |}^{2} d x = {\frac{1}{2 π} \int | \hat{f} (ω) |}^{2} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω = \frac{1}{2 π} \int | \hat{g} (ω) |^{2} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω = (S g) (λ), \forall λ \in (0, \infty) .

Now suppose Sf = Sg. Since Sf and Sg are continuous in λ, we have

0 = (S f) (λ) - (S g) (λ) = \frac{1}{2 π} \int ({| \hat{f} (ω) |}^{2} - | \hat{g} (ω) |^{2}) {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω, \forall λ \in (0, \infty) .

Since $f \in L^{1} \cap L^{2} (ℝ)$ we have $\hat{f} \in L^{2} \cap L^{\infty} (ℝ)$ and thus $P f \in L^{1} \cap L^{\infty} (ℝ)$ . By interpolation we have $P f \in L^{2} (ℝ)$ , and the same for Pg. By applying Lemma 2.1 (stated below) with p(ω) = (Pf)(ω) − (Pg)(ω) (note p is continuous since $f, g \in L^{1} (ℝ)$ ), we conclude Pf = Pg for almost every ω. □

Lemma 2.1

Let $p \in L^{2} (ℝ)$ be continuous and assume p(ω) = p(−ω), $\hat{ψ}$ has compact support and Condition 2.3. Then,

\int p (ω) {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω = 0 \forall λ > 0 \Rightarrow p = 0 a . e .

The proof of Lemma 2.1 is in Appendix C. We remark that many wavelets satisfy Condition 2.3 and have compactly supported Fourier transform, so Theorem 2.4 is broadly applicable. For example, Proposition 2.1 below proves that any complex analytic wavelet with compactly supported Fourier transform satisfies Condition 2.3. Morlet wavelets satisfy Condition 2.3 (see Lemma C.1 in Appendix C) but do not have compactly supported Fourier transform; however, $\hat{ψ}$ does have fast decay for a Morlet wavelet and numerically we observe no issues. We also note, the assumption that $\hat{ψ}$ has compact support in Theorem 2.4 can be removed if f, g are bandlimited. The following Proposition, proved in Appendix C, gives some sufficient conditions guaranteeing Condition 2.3.

Proposition 2.1

The following are sufficient to guarantee Condition 2.3:

$| \hat{ψ} (ω) |^{2}$ has a compact support contained in the interval [a, b], where a and b have the same sign, e.g. complex analytic wavelets with compactly supported Fourier transform.
$| \hat{ψ} (ω) |^{2} \in C^{\infty} (ℝ)$ and there exists an N such that all derivatives of order at least N are non-zero at ω = 0, e.g. the Morlet wavelet.

Remark 2.6

In practice, Pf, Sf are implemented as discrete vectors, and Sf is obtained from Pf via matrix multiplication, i.e. Sf = F(Pf) for some real matrix F with F^TF strictly positive definite. Thus, $‖ P f - P g ‖_{2} ⩽ σ_{\min}^{- 1} ‖ S f - S g ‖_{2}$ , where σ_min > 0 is the smallest singular value of the matrix F, and the spectral decay of F, which can be explicitly computed, thus determines the stability of the representation. The smoother the wavelet, the more rapidly the spectrum decays, since when Pψ ∈ C^p, F^TF is defined by a C^p kernel and thus has eigenvalues that decay like o(1/n^p+1) [20]. There is thus a tradeoff between smoothness and stability. In this article we choose smoothness over stability, since smoothness is required for unbiasing noisy dilation MRA, and in our experiments the Morlet wavelet yielded the best results. We therefore invert the representation by solving an optimization problem that is initialized to be close to the desired solution (see Section 6.5), and we avoid computing the pseudo-inverse of F, which is unstable for our smooth wavelet.

3. Unbiasing for classic MRA

In this section we consider the classic MRA model (Model 1). We discuss unbiasing results for both the power spectrum and wavelet invariants, as well as simulation results comparing the two methods. In the following proposition we establish unbiasing results for the power spectrum by rederiving some results from [14], extended to the continuum setting. The proposition is proved in Appendix D.

Proposition 3.1

Assume Model 1. Define the following estimator of (Pf)(ω):

(\tilde{P f}) (ω) : = \frac{1}{M} \sum_{j = 1}^{M} (P y_{j}) (ω) - σ^{2} .

Then with probability at least 1 − 1/t²,

| (P f) (ω) - (\tilde{P f}) (ω) | ⩽ \frac{2 t σ}{\sqrt{M}} (‖ f ‖_{1} + σ) .

(3.1)

We obtain an identical result for wavelet invariants (Proposition 3.2) when signals are corrupted by additive noise only. See Appendix D for the proof.

Proposition 3.2

Assume Model 1. Define the following estimator of (Sf)(λ):

(\tilde{S f}) (λ) : = \frac{1}{M} \sum_{j = 1}^{M} (S y_{j}) (λ) - σ^{2} .

Then with probability at least 1 − 1/t²,

| (S f) (λ) - (\tilde{S f}) (λ) | ⩽ \frac{2 t σ}{\sqrt{M}} (‖ f ‖_{1} + σ) .

(3.2)

As M → ∞, the error of both the power spectrum and wavelet invariant estimators decays to zero at the same rate, and one can perfectly unbias both representations. As demonstrated in Section 5, this is not possible for noisy dilation MRA (Model 2), as there is a non-vanishing bias term. However, a nonlinear unbiasing procedure on the wavelet invariants can significantly reduce the bias.

We illustrate and compare additive noise unbiasing for power spectrum estimation using $(\tilde{P f})$ , the power spectrum method of Proposition 3.1 and $(\tilde{S f})$ , the wavelet invariant method of Proposition 3.2. To approximate (Pf) from the wavelet invariants $(\tilde{S f})$ , we apply the convex optimization algorithm described in Section 6.5 to obtain $(\tilde{P_{S} f})$ , the power spectrum approximation that best matches the wavelet invariants $(\tilde{S f})$ . Thus, throughout this article, $(\tilde{P_{S} f})$ denotes a power spectrum estimator obtained by first unbiasing wavelet invariants and then running an optimization procedure, while $(\tilde{P f})$ denotes an estimator computed by directly unbiasing the power spectrum. Our simulations compare the L² error of both of these estimators, i.e. we compare $‖ P f - \tilde{P f} ‖_{2}$ and ${‖ P f - \tilde{P_{S} f} ‖}_{2}$ .

Figure 2(a) shows the uncorrupted power spectrum (red curve) of a medium frequency Gabor function $(f (x) = e^{- 5 x^{2}} \cos (16 x))$ , and the power spectrum after the signal is corrupted by additive noise with level σ = 2⁻³ (blue curve); the SNR of the experiment is 0.56 (see Section 6.1). Figure 2(b) shows the L² error of the power spectrum estimation for the two methods as a function of log₂(M) for a fixed SNR, and Fig. 2(c) shows the L² error as a function of log₂(σ) for a fixed M. The L² errors for the two methods are similar; however, estimation via wavelet invariants is advantageous when the sample size M is small or the additive noise level σ is large. As M becomes very large or σ very small, the power spectrum method is preferable as the smoothing procedure of the wavelet invariants may numerically erase some extremely small scale features of the original power spectrum.

Fig. 2. — Simulation results for additive noise model for medium frequency Gabor $f (x) = e^{- 5 x^{2}} \cos (16 x)$ .

4. Unbiasing for dilation MRA

In this section we analyze the dilation MRA model (Model 3). We thus assume the signals have been randomly translated and dilated but there is no additive noise.

In fact there is a simple algorithm to recover f under this model. Since ${‖ f_{τ_{j}} ‖}_{2}^{2} = ‖ f ‖_{2}^{2} / (1 - τ_{j})$ , $\frac{1}{M} \sum_{j = 1}^{M} 1 / {‖ f_{τ_{j}} ‖}_{2}^{2}$ is an unbiased estimator of $1 / {‖ f ‖}_{2}^{2}$ , and so ${‖ f ‖}_{2}^{2}$ can be accurately approximated. Once is recovered, one can take any signal y_j and dilate it so that ${‖ y_{j} ‖}_{2}^{2} = ‖ f ‖_{2}^{2}$ , and the result will be an accurate approximation of the hidden signal f for M large. However, this approach collapses in the presence of even a small amount of additive noise. In the presence of additive noise, an alternative is to attempt a synchronization by centering each signal. The center c_f of signal f can be defined in the classical way by

c_{f} = \frac{1}{‖ f ‖_{2}^{2}} \int x | f (x) |^{2} d x .

Since the signals y_j(x − (c_f + t_j)) are perfectly aligned, one can thus attempt an alignment by defining ${\tilde{y}}_{j} (x) = y_{j} (x - c_{y_{j}})$ . However c_yj − (c_f + t_j) = O(σ ∨ σ² + η), so significant errors arise in the synchronization that cannot be resolved by averaging. As our goal is ultimately to produce a method that can be extended to the noisy dilation MRA model, we abandon both the trivial solution (which cannot be extended to noisy dilation MRA) and the synchronization approach (which produces large errors) and explore a method based on empirical averages.

We first observe that random dilations cause $\frac{1}{M} \sum_{j = 1}^{M} (P y_{j}) (ω)$ and $\frac{1}{M} \sum_{j = 1}^{M} (S y_{j}) (λ)$ to be biased estimators of (Pf)(ω) and (Sf)(λ), and the bias for both is O(η²), where η² is the variance of the dilation distribution. However, if the moments of the dilation distribution are known and Pf, Sf are sufficiently smooth, one can apply an unbiasing procedure to the above estimators so that the resulting bias is O(η^k+2), where k ⩾ 2 is an even integer.

Throughout this section, we assume k ⩾ 2 is an even integer, and define the constants C_i from the first k/2 even moments of τ by $E [τ^{i}] = C_{i} η^{i}$ for i = 2, 4, . . . , k. Note since we assume $E [τ^{2}] = η^{2}$ , C₂ = 1. We define the constants B₂, B₄, . . . , B_k by solving

\frac{C_{i}}{i!} - \frac{B_{2} C_{i - 2}}{(i - 2)!} - \dots - \frac{B_{i - 2} C_{2}}{2!} - B_{i} = 0

(4.1)

for i = 2, 4, . . . , k; these constants are deterministic functions of the moments of τ. A non-recursive formula related to the Euler numbers can be derived, which defines B_i explicitly in terms of C₂, . . . , C_i; however, the recursive formula (4.1) is easier to implement numerically.

We introduce two additional moment-based constants that are defined by the C_i, B_i constants:

T : = \max_{i = 0, 2, \dots} C_{i}^{\frac{1}{i}}

(4.2)

E : = \max_{i = 0, 2, \dots, k} \max_{j = 0, \dots, k + 2 - i} {(\frac{T^{j}}{j!} | B_{i} |)}^{\frac{1}{i + j}},

(4.3)

where C₀, |B₀| = 1, and when i = j = 0 in (4.3), ${(\frac{T^{j}}{j!} | B_{i} |)}^{\frac{1}{i + j}}$ is replaced with 1.

Remark 4.1

Since the distribution of τ is bounded, we are guaranteed that T < ∞, and in general can consider both T and E to be O(1) constants. For example for the uniform distribution, $T ⩽ \sqrt{3}$ and $| B_{i} | ⩽ \frac{| Euler (i) |}{i!} ⩽ 1$ , which gives $E ⩽ \sqrt{3}$ .

We utilize the following two lemmas, which are proved in Appendix E, to derive results for both the power spectrum and wavelet invariants.

Lemma 4.1

Let F_λ(τ) = L((1 − τ)λ) for some function L ∈ C^k+2(0, ∞) and a random variable τ satisfying the assumptions of Section 2.1, and let k ⩾ 2 be an even integer. Assume there exist functions $Λ_{i} : ℝ \to ℝ$ , $R : ℝ \to ℝ$ such that

| λ^{i} L^{(i)} (λ) | ⩽ Λ_{i} (λ) for 0 ⩽ i ⩽ k + 2, \frac{Λ_{k + 2} ((1 - τ) λ)}{Λ_{k + 2} (λ)} ⩽ R (λ),

and define the following estimator of L(λ):

G_{λ} (τ) : = F_{λ} (τ) - B_{2} η^{2} F_{λ}^{''} (τ) - B_{4} η^{4} F_{λ}^{(4)} (τ) - \dots - B_{k} η^{k} F_{λ}^{(k)} (τ) .

Then G_λ(τ) satisfies

| E G_{λ} (τ) - L (λ) | ≲ k R (λ) Λ_{k + 2} (λ) {(2 E η)}^{k + 2}

Var G_{λ} (τ) ≲ k^{2} R {(λ)}^{2} Λ {(λ)}^{2}

where

Λ {(λ)}^{2} : = \sum_{0 ⩽ i, j ⩽ k + 2, i + j ⩾ 2} Λ_{i} (λ) Λ_{j} (λ) {(2 E η)}^{i + j}

and E is the absolute constant defined in (4.3).

Lemma 4.2

Let the assumptions and notation of Lemma 4.1 hold, and let τ₁, . . . , τ_M be independent. Define

\tilde{L} (λ) : = \frac{1}{M} \sum_{j = 1}^{M} G_{λ} (τ_{j}) .

Then with probability at least 1 − 1/t²

| \tilde{L} (λ) - L (λ) | ≲ k R (λ) (Λ_{k + 2} (λ) {(2 E η)}^{k + 2} + \frac{t Λ (λ)}{\sqrt{M}}) .

The deviation of the estimator $\tilde{L} (λ)$ from L(λ) thus depends on two things: (1) the bias of the estimator that is O(η^k+2) and (2) the standard deviation of the estimator that is $O (η M^{- \frac{1}{2}})$ , since Λ(λ) = O(η).

4.1. Power spectrum results for dilation MRA

We now show how this unbiasing procedure based on both the moments of τ and the even derivatives of Py can be used to obtain an estimator of Pf.

Proposition 4.1

Assume Model 3 and $P f \in C^{k + 2} (ℝ)$ . Define the following estimator of (Pf)(ω):

(\tilde{P f}) (ω) : = \frac{1}{M} \sum_{j = 1}^{M} [(P y_{j}) (ω) - B_{2} η^{2} ω^{2} {(P y_{j})}^{''} (ω) - \dots - B_{k} η^{k} λ^{k} {(P y_{j})}^{(k)} (ω)]

where the constants B_i satisfy (4.1). Let

Ω_{i} (ω) = | ω^{i} {(P f)}^{i} (ω) | for 0 ⩽ i ⩽ k + 2, R (ω) = \max_{τ} \frac{Ω_{k + 2} ((1 - τ) ω)}{Ω_{k + 2} (ω)} .

Then for all ω ≠ 0, with probability at least 1 − 1/t²,

| (\tilde{P f}) (ω) - (P f) (ω) | ≲ k R (ω) (Ω_{k + 2} (ω) {(2 E η)}^{k + 2} + \frac{t Ω (ω)}{\sqrt{M}}),

(4.4)

where

Ω (ω) = \sum_{0 ⩽ i, j ⩽ k + 2, i + j ⩾ 2} Ω_{i} (ω) Ω_{j} (ω) {(2 E η)}^{i + j} .

Proof.

Since Pf is a translation invariant representation, we can ignore the translation factors ${t_{k}}_{k = 1}^{M}$ and consider the model $y_{j} = L_{τ_{j}} f$ . In addition since $y_{j} (x) \in ℝ$ , (Py_j)(ω) = (Py_j)(−ω) and it is sufficient to consider ω ∈ (0, ∞). Proposition 4.1 then follows directly from Lemma 4.2 with λ = ω, L = Pf since (Py_j)(ω) = (Pf)((1 − τ_j)ω) = F_ω(τ_j), Λ_i = Ω_i, and Λ = Ω. □

We postpone a discussion of the shortcomings of Proposition 4.1 to Section 4.3, where we compare the power spectrum and wavelet invariant results for dilation MRA.

4.2. Wavelet invariant results for dilation MRA

We now apply the same unbiasing procedure to the wavelet invariants. Unlike for the power spectrum, where the error may depend on the frequency ω (see (4.4) and Section 4.3), the wavelet invariant error can be uniformly bounded independently of λ with high probability. The following two Lemmas establish bounds on the derivatives of (Sf)(λ) and are needed to prove Proposition 4.2; they are proved in Appendix B.

Lemma 4.3 (Low-frequency bound).

Assume $P ψ \in C^{m} (ℝ)$ and $f \in L^{1} (ℝ)$ . Then the quantity |λ^m(Sf)^(m)(λ)| can be bounded uniformly over all λ. Specifically:

| λ^{m} {(S f)}^{(m)} (λ) | ⩽ Ψ_{m} ‖ f ‖_{1}^{2}

for Ψ_m defined in (A1).

Lemma 4.4 (High-frequency bound for differentiable functions).

Assume $P ψ \in C^{m} (ℝ)$ , and $f^{'} \in L^{1} (ℝ)$ . Then the quantity |λ^m(Sf)^(m)(λ)| can be bounded by

| λ^{m} {(S f)}^{(m)} (λ) | ⩽ \frac{Θ_{m}}{λ^{2}} {‖ f^{'} ‖}_{1}^{2}

for Θ_m defined in (A2).

When ψ is a Morlet wavelet or more generally when ψ is (k + 2)-admissible as described in Appendix A, these lemmas allow one to bound the error of the order k wavelet invariant estimator for dilation MRA in terms of the following quantities:

Λ_{i} (λ) = Ψ_{i} ‖ f ‖_{1}^{2} \land \frac{Θ_{i}}{λ^{2}} {‖ f^{'} ‖}_{1}^{2}, Λ {(λ)}^{2} = \sum_{0 ⩽ i, j ⩽ k + 2, i + j ⩾ 2} Λ_{i} (λ) Λ_{j} (λ) {(2 E η)}^{i + j},

(4.5)

where Ψ_i, Θ_i are defined in (A1), (A2) and E is defined in (4.3).

Proposition 4.2

Assume Model 3, the notation in (4.5), and that ψ is (k + 2)-admissible. Define the following estimator of (Sf)(λ):

(\tilde{S f}) (λ) : = \frac{1}{M} \sum_{j = 1}^{M} [(S y_{j}) (λ) - B_{2} η^{2} λ^{2} {(S y_{j})}^{''} (λ) - \dots - B_{k} η^{k} λ^{k} {(S y_{j})}^{(k)} (λ)]

where the constants B_i satisfy (4.1). Then with probability at least 1 − 1/t²,

| (\tilde{S f}) (λ) - (S f) (λ) | ≲ k (Λ_{k + 2} (λ) {(2 E η)}^{k + 2} + \frac{t Λ (λ)}{\sqrt{M}}) .

Proof.

Since Sf is a translation invariant representation, we can ignore the translation factors ${t_{k}}_{k = 1}^{M}$ and consider the model $y_{j} = L_{τ_{j}} f$ . Since ψ is k + 2-admissible, $\hat{ψ} \in C^{k + 2} (ℝ)$ , which guarantees(Sf)(λ) ∈ C^k+2(0, ∞). We note that since $f \in L^{1} (ℝ)$ , Pf is continuous, and the Leibniz integral rule guarantees that ${(S f)}^{(n)} (λ) = \frac{d^{n}}{d λ^{n}} (S f) (λ)$ for 1 ⩽ n⩽ k + 2. By applying Lemma 4.3, we have $| λ^{i} {(S f)}^{(i)} (λ) | ⩽ Ψ_{i} ‖ f ‖_{1}^{2}$ for all 0 ⩽ i ⩽ k + 2, so that Lemma 4.2 holds for L(λ) = (Sf)(λ), $Λ_{i} (λ) = Ψ_{i} ‖ f ‖_{1}^{2}$ , and R(λ) = 1. Now by applying Lemma 4.4, we have $| λ^{i} {(S f)}^{(i)} (λ) | ⩽ Θ_{i} {‖ f^{'} ‖}_{1}^{2} / λ^{2}$ for all 0 ⩽ i ⩽ k + 2, so that Lemma 4.2 also holds for L(λ) = (Sf)(λ), $Λ_{i} (λ) = Θ_{i} {‖ f^{'} ‖}_{1}^{2} / λ^{2}$ , and R(λ) = 4 (note since $| τ | ⩽ \frac{1}{2}$ , $Λ_{k + 2} ((1 - τ) λ) / Λ_{k + 2} (λ) ⩽ 4$ ). Thus, Lemma 4.2 in fact holds with $Λ_{i} (λ) = (Ψ_{i} ‖ f ‖_{1}^{2} \land \frac{Θ_{i}}{λ^{2}} {‖ f^{'} ‖}_{1}^{2})$ ; since (Sy_j)(λ) = (Sf)((1 − τ_j)λ) = F_λ(τ_j), we obtain Proposition 4.2 □

Since $Λ_{i} (λ) ⩽ Ψ_{i} ‖ f ‖_{1}^{2}$ , Proposition 4.2 guarantees that the error can be uniformly bounded independent of λ. In addition, if the signal is smooth, the error for high-frequency λ will have the favorable scaling λ⁻². An important question in practice is how to choose k, i.e. what order wavelet invariant estimator minimizes the bias. Consider for example when $f^{'} \notin L^{1} (ℝ)$ , and $Λ_{k + 2} (λ) = Ψ_{k + 2} ‖ f ‖_{1}^{2}$ . By using a second-order estimator, we can decrease the bias from O(η²) to O(η⁴), and we can further decrease the bias to O(η⁶) by choosing k = 4. However, Ψ_k increases very rapidly in k. Indeed, as can be seen from (A1), Ψ_k increases like k!. Thus, one possible heuristic (assuming η is known) is to choose $k = \tilde{k}$ where $\tilde{k}$ minimizes the bias upper bound kΨ_k+2(2Eη)^k+2. Since Ψ_k increases factorially, Ψ_k ∼ (C_k)^k for some constant C, and $\tilde{k} + 2$ will be inversely proportional to η, that is $(\tilde{k} + 2) ~ η^{- 1}$ . The following corollary of Proposition 4.2 then holds for any $k ⩽ \tilde{k}$ .

Corollary 4.1

Under the assumptions of Proposition 4.2, if Ψ_i(2Eη)ⁱ is decreasing for i ⩽ k + 2, then with probability at least 1 − 1/t²:

| (\tilde{S f}) (λ) - (S f) (λ) | ≲ ‖ f ‖_{1}^{2} (k Ψ_{k + 2} {(2 E η)}^{k + 2} + \frac{t k^{2} η}{\sqrt{M}}) .

(4.6)

Similarly, if Θ_i(2Eη)ⁱ is decreasing for i ⩽ k + 2, then with probability at least 1 − 1/t²:

| (\tilde{S f}) (λ) - (S f) (λ) | ≲ \frac{{‖ f^{'} ‖}_{1}^{2}}{λ^{2}} (k Θ_{k + 2} {(2 E η)}^{k + 2} + \frac{t k^{2} η}{\sqrt{M}}) .

(4.7)

Remark 4.2

We observe that for a discrete lattice I of λ values, we can define the discrete 1-norm by $‖ g ‖_{L^{1} (I)} = \sum_{λ \in I} | g (λ) | Δ λ$ . Assume the lattice has cardinality n, and that Ψ_i(2Eη)ⁱ, Θ_i(2Eη)ⁱ are decreasing for i ⩽ k + 2. Applying Proposition 4.2 with $t = \sqrt{n} s$ and a union bound over the lattice gives

‖ \tilde{S f} - S f ‖_{L^{1} (I)} ≲ k (‖ f ‖_{1}^{2} Ψ_{k + 2} + {‖ f^{'} ‖}_{1}^{2} Θ_{k + 2}) {(2 E η)}^{k + 2} + \frac{s \sqrt{n} k^{2} η}{\sqrt{M}} (‖ f ‖_{1}^{2} + {‖ f^{'} ‖}_{1}^{2})

with probability at least 1 − 1/s². When n ≪ M, which is the context for MRA, the 1-norm of the error is O(η^k+2) as M → ∞.

4.3. Comparison

Although Propositions 4.2 and 4.1 at first glance appear quite similar, the wavelet invariant method has several important advantages over the power spectrum method, which we enumerate in the following remarks.

Remark 4.3

Proposition 4.2 (wavelet invariants) applies to any signal satisfying $f \in L^{1} (ℝ)$ , but Proposition 4.1 requires $P f \in C^{k + 2} (ℝ)$ . Thus, as k is increased, the power spectrum results apply to an increasingly restrictive function class. Furthermore, as discussed in Section 5, if the signal contains any additive noise, Py_j is not even C¹, which means the unbiasing procedure of Proposition 4.1 cannot be applied. On the other hand, by choosing $P ψ \in C^{\infty} (ℝ)$ , Sf will inherit the smoothness of the wavelet, and the wavelet invariant results will hold for any $f \in L^{1} (ℝ)$ and any k.

Remark 4.4

Since (Pf_τ)(ξ) = (Pf)((1 − τ)ξ), dilation will transport the frequency content at ξ to (1 − τ)ξ, so that the displacement is τξ. Thus, when ξ is very large, |(Pf)(ξ) − (Pf_τ)(ξ)| can be large even for τ small. Because the wavelet invariants bin the frequency content, and these bins become increasingly large in the high frequencies, this does not occur for wavelet invariants. More specifically, there is always a signal f and frequency ξ for which $| (P f) (ξ) - (\tilde{P f}) (ξ) |$ is large regardless of k. Consider for example when $(P f) (ω) = e^{- {(ω - ξ)}^{2}}$ . Then Ω_k(ξ) ∼ ξ^k, and $| (P f) (ξ) - (\tilde{P f}) (ξ) | ≳ 1$ . However, for M large enough, the order k wavelet invariant estimator satisfies $| (S f) (λ) - (\tilde{S f}) (λ) | = O (k Ψ_{k + 2} η^{k + 2})$ for all λ. The wavelet invariants are thus stable for high-frequency signals, where the power spectrum fails.

Remark 4.5

For the wavelet invariants there will be a unique $\tilde{k}$ that minimizes kΨ_k+2(2Eη)^k+2, and $\tilde{k}$ does not depend on λ. Furthermore, $\tilde{k}$ can be explicitly computed given the wavelet ψ and moment constant E. On the other hand, the minimum of kΩ_k+2(ω)(2Eω)^k+2 with respect to k will depend on both the frequency ω and the signal f, so that $\tilde{k} = \tilde{k} (ω, f)$ , and it becomes unclear how to choose the unbiasing order.

4.4. Simulation results for dilation MRA

We first illustrate the unbiasing procedure of Propositions 4.1 and 4.2 for the high-frequency signal $f (x) = e^{- 5 x^{2}} \cos (32 x)$ . Figure 3 shows the power spectrum estimator $\tilde{P f}$ and the wavelet invariant estimator $\tilde{P_{S} f}$ for k = 0, 2, 4 for both small and large dilations, where $\tilde{P_{S} f}$ denotes the combined wavelet invariant unbiasing plus optimization procedure (see Section 6.5). Higher order unbiasing is beneficial for both methods for small dilations but fails for the power spectrum for large dilations. Both methods will of course fail for η large enough, but for high-frequency signals the power spectrum fails much sooner.

Fig. 3. — Order k = 0, 2, 4 power spectrum estimators $\tilde{P f}$ (first two figures) and wavelet invariant estimators $\tilde{P_{S} f}$ (last two figures) for the signal $f_{3} (x) = e^{- 5 x^{2}} \cos (32 x)$ . Figures 3(a) and 3(c) show small dilations and Figs 3(b) and 3(d) show large dilations.

Next we compare ${‖ P f - \tilde{P f} ‖}_{2}$ and ${‖ P f - \tilde{P_{S} f} ‖}_{2}$ , the L² error of estimating the power spectrum of the target signal via the power spectrum estimators of Proposition 4.1 and via the wavelet invariant estimators of Proposition 4.2, followed by a convex optimization procedure. We consider order k = 0, 2, 4 estimators for both the power spectrum and wavelet invariants on the following Gabor atoms of increasing frequency:

f_{1} (x) = e^{- 5 x^{2}} \cos (8 x)

f_{2} (x) = e^{- 5 x^{2}} \cos (16 x)

f_{3} (x) = e^{- 5 x^{2}} \cos (32 x) .

These functions satisfy f = Real(h) where $(P h) (ω) = (π / 5) e^{- {(ω - ξ)}^{2} / 10}$ for ξ = 8, 16, 32, and thus exhibit the behavior described in Remark 4.4.

Simulation results are shown in Fig. 4; the horizontal axis shows log₂(M) while the vertical axis shows log₂(Error). For each value of M, the error was calculated for 10 independent simulations and then averaged. The unbiasing procedure of Propositions 4.1 and 4.2 requires knowledge of the moments of the dilation distribution, but in practice these are unknown. Thus, the first two even moments of the dilation distribution (η²,C₄η⁴) were estimated empirically with the fourth-order estimators described in Section 6.3 (see Definition 6.1). For the low-frequency signal, the fourth-order power spectrum estimator was best for both small and large dilations and is preferable due to the lower computational cost (see Remark 2.3). For the high-frequency signal, the fourth-order wavelet invariant estimator was best for large dilations and WSC k = 2 and k = 4 were best and equivalent for small dilations. For the medium-frequency signal, the higher order power spectrum estimators were best for small dilations while the higher order wavelet invariant estimators were best for large dilations. Thus, the simulation results confirm that the wavelet invariants will have an advantage over Fourier invariants when the signals are either high frequency or corrupted by large dilations. We remark that one obtains nearly identical error plots with oracle knowledge of the dilation moments, indicating that the empirical moment estimation procedure is highly accurate in the absence of additive noise, even for small M values.

5. Noisy dilation MRA model

Finally, we consider the noisy dilation MRA model (Model 2) where signals are randomly translated and dilated and corrupted by additive noise. Section 5.1 gives unbiasing results for wavelet invariants and Section 5.2 reports relevant simulations.

5.1. Wavelet inariant results for noisy dilation MRA

To state Proposition 5.1 as succinctly as possible, we also define the following quantity

Ψ : = \sum_{m = 0, 2, \dots, k} Ψ_{m} {(E η)}^{m},

(5.1)

where E is defined in (4.3) and Ψ_m is defined in (A1).

Proposition 5.1

Assume Model 2 and that ψ is (k + 2)-admissible. Define the following estimator of (Sf)(λ):

(\tilde{S f}) (λ) : = \frac{1}{M} \sum_{j = 1}^{M} [(S y_{j}) (λ) - B_{2} η^{2} λ^{2} {(S y_{j})}^{''} (λ) - \dots - B_{k} η^{k} λ^{k} {(S y_{j})}^{(k)} (λ)] - σ^{2}

where the constants B_i satisfy (4.1). Then with probability at least 1 − 1/t²

| (\tilde{S f}) (λ) - (S f) (λ) | ≲ k Λ_{k + 2} (λ) {(2 E η)}^{k + 2} + \frac{t}{\sqrt{M}} [k Λ (λ) + Ψ σ^{2} + \sqrt{Ψ (Λ_{0} (λ) + Λ (λ))} σ],

(5.2)

where E, Λ(λ), Ψ are as defined in (4.3), (4.5), (5.1).

The following corollary is an immediate consequence of Proposition 5.1.

Corollary 5.1

Let the assumptions of Proposition 5.1 hold, and in addition assume Ψ_i(2Eη)ⁱ is decreasing for i ⩽ k + 2. Then with probability at least 1 − 1/t²

| (\tilde{S f}) (λ) - (S f) (λ) | ≲ k Ψ_{k + 2} {(2 E η)}^{k + 2} ‖ f ‖_{1}^{2} + \frac{t k}{\sqrt{M}} [k η ‖ f ‖_{1}^{2} + σ ‖ f ‖_{1} + σ^{2}] .

(5.3)

We remark that there are two components to the estimation error bounded by the right-hand side of (5.3): the first two terms are the error due to dilation, as in Corollary 4.1 of Proposition 4.2, and the last two terms are the error due to additive noise, as given in Proposition 3.2. Thus, the wavelet invariant representation allows for a decomposition of the error of the noisy dilation MRA model into the sum of the errors of the random dilation model and the additive noise model. This is possible because the representation inherits the differentiability of the wavelet and is not possible when $P ψ \notin C^{k} (ℝ)$ , in which case the dilation unbiasing procedure has a more complicated effect on the additive noise. A result equivalent to Proposition 5.1 cannot be made for the power spectrum, because the nonlinear unbiasing procedure of Proposition 4.1 cannot be applied to the power spectra of signals from the noisy dilation MRA corruption model, since they are not differentiable in the presence of additive noise.

Proof of Proposition 5.1.

Since Sf is a translation invariant representation, we can ignore the translation factors ${t_{j}}_{j = 1}^{M}$ and consider the model $y_{j} = f_{τ_{j}} + ε_{j}$ . For notational convenience, we define the following order k derivative ‘unbiasing’ operator:

A_{λ} g (λ) : = g (λ) - B_{2} η^{2} λ^{2} \frac{d}{d λ^{2}} g (λ) - \dots - B_{k} η^{k} λ^{k} \frac{d}{d λ^{k}} g (λ),

(5.4)

which is defined on any function of λ, so that we can express our estimator by

(\tilde{S f}) (λ) = \frac{1}{M} \sum_{j = 1}^{M} [\frac{1}{2 π} \int {| {\hat{y}}_{j} (ω) |}^{2} A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω] - σ^{2} = \frac{1}{M} \sum_{j = 1}^{M} [\frac{1}{2 π} \int (| {{\hat{f}}_{τ_{j}} (ω) |}^{2} + {\hat{f}}_{τ_{j}} (ω) {\bar{\hat{ε}}}_{j} (ω) + {\bar{\hat{f}}}_{τ_{j}} (ω) {\hat{ε}}_{j} (ω) + {| {\hat{ε}}_{j} (ω) |}^{2}) A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω] - σ^{2} .

We can thus decompose the error as follows:

\begin{array}{l} | (\tilde{S f}) (λ) - (S f) (λ) | ⩽ \underset{Cross Term Error}{\underset{︸}{{| \frac{1}{M} \sum_{j = 1}^{M} \frac{1}{2 π} \int ({\hat{f}}_{τ_{j}} (ω) {\bar{\hat{ε}}}_{j} (ω) + {\bar{\hat{f}}}_{τ_{j}} (ω) {\hat{ε}}_{j} (ω)) A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} d ω ∣}} ∣ \\ + \underset{Dilation Error}{\underset{︸}{{| \frac{1}{M} \sum_{j = 1}^{M} \frac{1}{2 π} \int | {\hat{f}}_{τ_{j}} (ω) |}^{2} A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω - (S f) (λ) ∣}} + \underset{Additive Noise Error}{\underset{︸}{{| \frac{1}{M} \sum_{j = 1}^{M} \frac{1}{2 π} \int | {\hat{ε}}_{j} (ω) |}^{2} A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω - σ^{2} ∣}} . \end{array}

To bound the above terms we utilize the following two Lemmas, which are proved in Appendix F.

Lemma 5.1

Let the notation and assumptions of Proposition 5.1 hold, and let A_λ be the operator defined in (5.4). Then with probability at least 1 − 1/t²

{| \frac{1}{M} \sum_{j = 1}^{M} \frac{1}{2 π} \int | {\hat{ε}}_{j} (ω) |}^{2} A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω - σ^{2} ∣ ⩽ \frac{2 t \sqrt{k} Ψ σ^{2}}{\sqrt{M}} .

Lemma 5.2

Let the notation and assumptions of Proposition 5.1 hold, and let A_λ be the operator defined in (5.4). Then with probability at least 1 − 1/t²

{| \frac{1}{M} \sum_{j = 1}^{M} \frac{1}{2 π} \int ({\hat{f}}_{τ_{j}} (ω) {\bar{\hat{ε}}}_{j} (ω) + {\bar{\hat{f}}}_{τ_{j}} (ω) {\hat{ε}}_{j} (ω)) A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} d ω | ≲ \frac{t}{\sqrt{M}} \sqrt{Ψ (Λ_{0} (λ) + Λ (λ))} σ .

Applying Proposition 4.2 to bound the dilation error, Lemma 5.1 to bound the additive noise error, and Lemma 5.2 to bound the cross term error gives (5.2). □

5.2. Simulation results for noisy dilation MRA

We once again consider the Gabor atoms of varying frequency introduced in Section 4.4, and compare the L² error of estimating the power spectrum by (1) averaging the power spectra of the noisy signals, and applying additive noise unbiasing; this is the zero-order power spectrum method (PS k = 0), defined in Proposition 3.1, and (2) by approximating the wavelet invariants by the estimators given in Proposition 5.1 for k = 0, 2, 4, and then applying the optimization procedure described in Section 6.5; we refer to these methods as WSC k = i for i = 0, 2, 4. We emphasize that for the noisy dilation MRA model, it is impossible to define higher order methods for the power spectrum.

We first consider the errors obtained given oracle knowledge of the noise moments, both additive and dilation. Results are shown in Fig. 5 for all parameter combinations resulting from σ = 2⁻⁴, 2⁻³ (giving SNR = 2.2, 0.56) and η = 0.06, 0.12. The horizontal axis shows log₂(M) and the vertical axis shows log₂(Error); for each value of M, the error was calculated for 10 independent simulations and then averaged. For all simulations τ was given a uniform distribution, a challenging regime for dilations, and the sample size ranged over 16 ⩽ M ⩽ 131, 072. For the medium- and high-frequency signals, for large enough M, WSC k = 2 and WSC k = 4 have significantly smaller error than the order zero estimators, indicating that the nonlinear unbiasing procedure of Proposition 5.1 contributes a definitive advantage. For the high-frequency signal and large M, the error using WSC k = 4 is decreased by a factor of about 3 from the PS k = 0 error. For small dilations (η = 0.06), there is not much of a difference in performance between WSC k = 2 and WSC k = 4, but the gap between these estimators widens for large dilations (η = 0.12), as the fourth-order correction becomes more important. For the low-frequency signal under small dilations, PS k = 0 achieves the smallest error for large M. However, when M is small or the dilations are large, the WSC estimators have the advantage for the low-frequency signal as well, and WSC k = 4 is once again the best estimator for large M.

We note that although in general recovering the power spectrum is insufficient for recovering the signal, the signal can be recovered when $\hat{f} (ω) \in ℝ$ and $\hat{f} (ω) ⩾ 0$ by taking the inverse Fourier transform of the root power spectrum. Figure 6 shows the approximate signals recovered by this procedure from PS k = 0 (Fig. 6(c)) and WSC k = 4 (Fig. 6(b)) for the high-frequency Gabor signal f₃(x) (Fig. 6(a)). The WSC-recovered signal is a much better approximation of the target signal. The recovered power spectra are shown in Fig. 6(d); PS k = 0 is much flatter than the target power spectrum, while WSC k = 4 is a good approximation of both the shape and height of the target power spectrum.

Fig. 6. — Signal recovery results for $f_{3} (x) = e^{- 5 x^{2}} \cos (32 x)$ with M = 20, 000, η = 0.12, SNR = 2.2.

Appendix G outlines an empirical procedure for estimating the moments of τ in the special case when t = 0 in the noisy dilation MRA model (i.e. no random translations). All simulations reported in Fig. 5 are repeated (with minor modifications) with empirical additive and dilation moment estimation, and the results are reported in Fig. G7 of Appendix G.

Appendix H contains additional simulation results for a variety of high-frequency signals.

Remark 5.1

One could also solve noisy dilation MRA with an EM algorithm. Appendix I describes how the method proposed in [1] can be extended to solve Model 2. Although EM algorithms provide a flexible tool for accurate parameter estimation in a variety of MRA models, the primary disadvantage is the high computational cost of each iteration. Each iteration costs O(Mn³), while wavelet invariant estimators can be computed in O(Mn²). In addition the statistical priors chosen may bias the signal reconstruction [12], and the algorithm will generally only converge to a local maximum. In this article we thus explore whether it is possible to solve noisy dilation MRA more efficiently and accurately by nonlinear unbiasing procedures.

6. Numerical implementation

In this section we describe the numerical implementation of the proposed method used to generate the results reported in Sections 3, 4.4 and 5.2. Section 6.1 describes how signals were generated, and Sections 6.2 and 6.3 describe empirical procedures for estimating the additive noise level and the moments of the dilation distribution τ. Finally, Section 6.4 discusses how the derivatives used for unbiasing were computed, and Section 6.5 describes the convex optimization algorithm used to recover Pf from Sf. All simulations used a Morlet wavelet constructed with ξ = 3π/4.

6.1. Signal generation and SNR

All signals were defined on [−N/4, N/4] and then padded with zeros to obtain a signal defined on [−N/2, N/2]; the additive noise was also defined on [−N/2, N/2]. Signals were sampled at a rate of 1/2^ℓ, thus resolving frequencies in the interval [−2^ℓπ, 2^ℓπ] with a frequency sampling rate of 2π/N. We used N = 2⁵ and ℓ = 5 in all experiments, keeping the box size and resolution fixed. For each experiment with hidden signal f, the SNR was calculated by $SNR = (\frac{1}{N} \int_{- N / 2}^{N / 2} f {(x)}^{2} d x) / σ^{2}$ .

6.2. Empirical estimation of additive noise level

The additive noise level σ² can be estimated from the mean vertical shift of the mean power spectrum $\frac{1}{M} \sum_{j = 1}^{M} {| {\hat{y}}_{j} (ω) |}^{2}$ in the tails of the distribution. Specifically, for Σ = [−2^ℓπ, 2^ℓπ] \ [−2^ℓ−1π, 2^ℓ−1π], we define

{\tilde{σ}}^{2} = \frac{1}{| Σ |} \sum_{ω \in Σ} \frac{1}{M} \sum_{j = 1}^{M} {| {\hat{y}}_{j} (ω) |}^{2} .

If we choose ℓ large enough so that the target signal frequencies are essentially contained in the interval $[- 2^{ℓ - 1} π, 2^{ℓ - 1} π], {| {\hat{y}}_{j} (ω) |}^{2} = {| {\hat{ε}}_{j} (ω) |}^{2}$ for ω ∈ Σ, and this is a robust and unbiased estimation procedure since $E {| {\hat{ε}}_{j} (ω) |}^{2} = σ^{2}$ by Lemma D.1.

6.3. Empirical moment estimation for dilation MRA

Given the additive noise level, the moments of the dilation distribution τ for dilation MRA (Model 3) can be empirically estimated from the mean and variance of the random variables α_m(y_j) defined by

α_{m} (y_{j}) = \int_{0}^{2^{ℓ} π} ω^{m} {| {\hat{y}}_{j} (ω) |}^{2} d ω

(6.1)

for integer m ⩾ 0. More specifically, we define the order m squared coefficient of variation by

C V_{m} : = \frac{Var [α_{m} (y_{j})]}{{| E [α_{m} (y_{j})] |}^{2}} .

(6.2)

The following proposition guarantees that for M large the second and fourth moments of the dilation distribution can be recovered from CV₀, CV₁. In fact one could continue this procedure for higher m values, i.e. ${C V_{m}}_{m = 0}^{k / 2 - 1}$ will define estimators of the first $\frac{k}{2}$ even moments of τ, accurate up to O(η^k+2), but for brevity we omit the general case.

Proposition 6.1

Assume Model 3 and CV₀, CV₁ defined by (6.1) and (6.2). Then

C V_{0} = η^{2} + (3 C_{4} - 3) η^{4} + O (η^{6})

C V_{1} = 4 η^{2} + (25 C_{4} - 33) η^{4} + O (η^{6}) .

Proof.

Since $y_{j} = L_{τ_{j}} f (x - t_{j})$ ,

α_{m} (y_{j}) = {\int_{0}^{2^{ℓ} π} ω^{m} | \hat{f} ((1 - τ_{j}) ω) |}^{2} d ω = {\int_{0}^{2^{ℓ} π (1 - τ_{j})} \frac{ξ^{m}}{{(1 - τ_{j})}^{m}} | \hat{f} (ξ) |}^{2} \frac{d ξ}{(1 - τ_{j})} = {(1 - τ_{j})}^{- (m + 1)} α_{m} (f),

where we assume we have chosen ℓ large enough so that the target signal frequencies are essentially supported in [−2^ℓ−1π, 2^ℓ−1π]. Thus,

C V_{m} = \frac{E [α_{m} {(y_{j})}^{2}] - {(E [α_{m} (y_{j})])}^{2}}{{(E [α_{m} (y_{j})])}^{2}} = \frac{E [{(1 - τ_{j})}^{- 2 (m + 1)}]}{{(E [{(1 - τ_{j})}^{- (m + 1)}])}^{2}} - 1.

When m = 0, we have

C V_{0} = \frac{E [{(1 - τ_{j})}^{- 2}]}{{(E [{(1 - τ_{j})}^{- 1}])}^{2}} - 1 = \frac{E [1 + 2 τ + 3 τ^{2} + 4 τ^{3} + 5 τ^{4} + O (τ^{5})]}{{(E [1 + τ + τ^{2} + τ^{3} + τ^{4} + O (τ^{5})])}^{2}} - 1 = \frac{1 + 3 η^{2} + 5 C_{4} η^{4} + O (η^{6})}{{(1 + η^{2} + C_{4} η^{4} + O (η^{6}))}^{2}} - 1 = \frac{1 + 3 η^{2} + 5 C_{4} η^{4} + O (η^{6}))}{1 + 2 η^{2} + (2 C_{4} + 1) η^{4} + O (η^{6})} - 1 = (1 + 3 η^{2} + 5 C_{4} η^{4} + O (η^{6})) (1 - 2 η^{2} + (3 - 2 C_{4}) η^{4} + O (η^{6})) - 1 = η^{2} + (3 C_{4} - 3) η^{4} + O (η^{6}) .

When m = 1, we have

C V_{1} = \frac{E [{(1 - τ_{j})}^{- 4}]}{{(E [{(1 - τ_{j})}^{- 2}])}^{2}} - 1 = \frac{E [1 + 4 τ + 10 τ^{2} + 20 τ^{3} + 35 τ^{4} + O (τ^{5})]}{{(E [1 + 2 τ + 3 τ^{2} + 4 τ^{3} + 5 τ^{4} + O (τ^{5})])}^{2}} - 1 = \frac{1 + 10 η^{2} + 35 C_{4} η^{4} + O (η^{6})}{{(1 + 3 η^{2} + 5 C_{4} η^{4} + O (η^{6}))}^{2}} - 1 = \frac{1 + 10 η^{2} + 35 C_{4} η^{4} + O (η^{6})}{(1 + 6 η^{2} + (9 + 10 C_{4}) η^{4} + O (η^{6}))} - 1 = (1 + 10 η^{2} + 35 C_{4} η^{4} + O (η^{6})) (1 - 6 η^{2} + (27 - 10 C_{4}) η^{4} + O (η^{6})) - 1 = 4 η^{2} + (25 C_{4} - 33) η^{4} + O (η^{6}) .

We cannot compute CV_m exactly, but by replacing Var, $E$ with their finite sample estimators, we obtain an approximate ${\tilde{C V}}_{m} \to C V_{m}$ as M → ∞. Motivated by Proposition G.1, we thus use ${\tilde{C V}}_{0}$ , ${\tilde{C V}}_{1}$ to define estimators of η² and C₄η⁴.

Definition 6.1

Assume Model 3 and let ${\tilde{C V}}_{0}$ , ${\tilde{C V}}_{1}$ be the empirical versions of (6.2). Define the second-order estimator of η² by ${\tilde{η}}^{2} = {\tilde{C V}}_{0}$ . Define the fourth-order estimators of (η², C₄η⁴) by the unique positive solution ( ${\tilde{η}}^{2}$ , ${\tilde{C}}_{4}$ ) of

{\tilde{C V}}_{0} = η^{2} + (3 C_{4} - 3) η^{4}

{\tilde{C V}}_{1} = 4 η^{2} + (25 C_{4} - 33) η^{4} .

For noisy dilation MRA (Model 2), estimating the dilation moments is more difficult. We give a procedure for estimating the moments in the special case t = 0 in Appendix G. Empirical moment estimation procedures that are simultaneously robust to translations, dilations and additive noise are an important area of future research.

6.4. Derivatives

All derivatives were approximated numerically using finite difference calculations. A sixth-order finite difference approximation was used for second derivatives, and a fourth-order finite difference approximation was used for fourth derivatives. This procedure was done on the empirical mean for each representation, not the individual signals. In fact since the wavelet is known, $\frac{d^{n}}{d λ^{n}} {| {\hat{ψ}}_{λ} (ω) |}^{2}$ could be computed analytically, and (Sy_j)⁽ⁿ⁾(λ) computed using Definition 2.2. Thus error due to finite difference approximations could be avoided for wavelet invariant derivatives.

6.5. Optimization

In this section we describe the convex optimization algorithm for computing $(\tilde{P_{S} f})$ , the power spectrum approximation that best matches the wavelet invariants $(\tilde{S f})$ . Since the wavelet invariants are only computed for λ > 0, we also incorporate zero frequency information into the loss function via $(\tilde{P f}) (0)$ , an approximation of the power spectrum at frequency zero. For all of the examples reported in this article, the quasi-newton algorithm was used to solve an unconstrained optimization problem minimizing the following convex loss function:

loss (\hat{g}) : = \sum_{λ} {(〈 {\hat{g}}^{2}, {| {\hat{ψ}}_{λ}^{+} |}^{2} 〉 - \tilde{S f} (λ))}^{2} + {(\hat{g} {(0)}^{2} - (\tilde{P f}) (0))}^{2} .

where

{| {\hat{ψ}}_{λ}^{+} (ω) |}^{2} = ({| {\hat{ψ}}_{λ} (ω) |}^{2} + {| {\hat{ψ}}_{λ} (- ω) |}^{2}) \cdot 1 (ω ⩾ 0) .

Letting $\hat{g} *$ denote the minimizer of the above loss function, we then define $(\tilde{P_{S} f}) : = \hat{g} * {(ω)}^{2}$ . Theorem 2.4 ensures that when the loss function is defined with the exact wavelet invariants Sf, it has a unique minimizer corresponding to Pf. Whenever $f (x) \in ℝ$ , the symmetry of (Pf)(ω) ensures that $(S f) (λ) = 〈 | \hat{f} |^{2}, {| {\hat{ψ}}_{λ}^{+} |}^{2} 〉$ , and thus it is sufficient to optimize over the non-negative frequencies and then symmetrically extend the solution. Such a procedure ensures the output of the optimization algorithm is symmetric while avoiding adding constraints to the optimization. The algorithm was initialized using the mean power spectrum with additive noise unbiasing only, i.e. PS k = 0. The optimization output does depend on various numerical tolerance parameters, which were held fixed for all examples.

Remark 6.1

Alternatively, one can invert the representation by applying a pseudo-inverse with Tikhonov regularization. Specifically, if F is the matrix defining the wavelet invariants, so that Sy = F(Py), then one can define $(\tilde{P_{S} f}) = {(F^{T} F + λ I)}^{- 1} F^{T} (\tilde{S f})$ . This procedure however requires careful selection of the hyper-parameter λ and did not work as well as inverting via optimization in our experiments.

7. Conclusion

This article considers a generalization of classic MRA, which incorporates random dilations in addition to random translations and additive noise and proposes solving the problem with a wavelet invariant representation. These wavelet invariants have several desirable properties over Fourier invariants, which allow for the construction of unbiasing procedures that cannot be constructed for Fourier invariants. Unbiasing the representation is critical for high-frequency signals, where even small diffeomorphisms cause a large perturbation. After unbiasing, the power spectrum of the target signal can be recovered from a convex optimization procedure.

Several directions remain for further investigation, including extending results to higher dimensions and considering rigid transformations instead of translations. Such extensions could be especially relevant to image processing, where variations in the size of an object can be modeled as dilations. Incorporating the effect of tomographic projection would also lead to results more directly relevant to problems such as cryo-EM. The tools of the present article, although significantly reducing the bias, do not allow for a completely unbiased estimator for noisy dilation MRA due to the bad scaling of certain intrinsic constants. Thus, an important open question is whether it is possible to define unbiased estimators for noisy dilation MRA using a different approach. The noisy dilation MRA model of this article corresponds to linear diffeomorphisms, and constructing unbiasing procedures that apply to more general diffeomorphisms is also an important future direction. In addition, one can construct wavelet invariants that characterize higher order auto-correlation functions such as the bispectrum, and future work will investigate full signal recovery with such invariants.

Acknowledgements

We would like to thank the reviewers for their detailed comments and insights that greatly improved the manuscript. We would also like to thank Stephanie Hickey for providing useful references on flexible regions of macromolecular structures.

Funding

Alfred P. Sloan Foundation (Sloan Fellowship FG-2016–6607 to M.H.); Defense Advanced Research Projects Agency (Young Faculty Award D16AP00117 to M.H.); National Science Foundation (grant 1912906 to A.L.; grant 1620216 and CAREER award 1845856 to M.H.).

A. Wavelet admissibility conditions

This appendix describes the wavelet admissibility conditions that are needed for the main results in this article, namely Propositions 4.2 and 5.1. The wavelet ψ is k-admissible if $\hat{ψ} \in C^{k} (ℝ)$ and Ψ_k < ∞, Θ_k < ∞ where

Ψ_{k} : = \frac{1}{2 π} \sum_{i = 0}^{k} (\begin{array}{l} k \\ i \end{array}) \frac{k!}{i!} {‖ ω^{i} {(P ψ)}^{(i)} (ω) ‖}_{1},

(A.1)

Θ_{k} : = \frac{1}{2 π} \sum_{i = 0}^{k} (\begin{array}{l} k \\ i \end{array}) \frac{k!}{i!} {‖ ω^{i - 2} {(P ψ)}^{(i)} (ω) ‖}_{1} .

(A.2)

For ψ to be k-admissible, it is sufficient for $\hat{ψ} \in C^{k} (ℝ)$ , (Pψ)⁽ⁱ⁾ to decay faster than ωⁱ⁺¹, and $\int \frac{| \hat{ψ} (ω) |^{2}}{ω^{2}} d ω < \infty$ (see Lemma B.1 in Appendix B). The condition $\int \frac{| \hat{ψ} (ω) |^{2}}{ω^{2}} d ω < \infty$ is slightly stronger than the classic admissability condition $C_{ψ} : = \int \frac{| \hat{ψ} (ω) |^{2}}{ω} d ω < \infty$ [55, Theorem 4.4]. When $\hat{ψ}$ is continuously differentiable, $\hat{ψ} (0) = 0$ is sufficient to guarantee C_ψ < ∞; but here we need $\hat{ψ} (ω) ~ ω^{\frac{1}{2} + ϵ}$ for some ϵ > 0 as ω → 0. If this condition is removed, we are not guaranteed Θ_k < ∞, but all results in fact still hold, with $Λ_{k} (λ) = Ψ_{k} ‖ f ‖_{1}^{2}$ replacing $Λ_{k} (λ) = Ψ_{k} ‖ f ‖_{1}^{2} \land \frac{Θ_{k} {‖ f^{'} ‖}_{1}^{2}}{λ^{2}}$ in Propositions 4.2 and 5.1. Any wavelet with fast decay satisfies this stronger admissibility condition, and it ensures that a smooth signal will enjoy a fast decay of wavelet invariants.

Remark A.1

The Morlet wavelet ψ(x) =→g(x)(e^iξx − C) is k-admissible for any k, since $\hat{ψ} \in C^{\infty} (ℝ)$ , Pψ has fast decay, and $\hat{ψ} (ω) ~ ω$ as ω → 0. One can also choose $\hat{ψ}$ to be an order k + 1-spline of compact support.

B. Properties of wavelet invariants

This appendix establishes several important properties of wavelet invariants. Lemma B.1 gives sufficient conditions guaranteeing that a wavelet is k-admissible. Lemmas 4.3 and 4.4 bound wavelet invariant derivatives. Lemma B.2 bounds terms that arise in the dilation unbiasing procedure of Sections 4.2 and 5.

Lemma B.1 (k-admissible).

If $\hat{ψ} \in C^{k} (ℝ)$ , (Pψ)⁽ⁱ⁾ decays fast than ωⁱ⁺¹, and $\int \frac{| \hat{ψ} (ω) |^{2}}{ω^{2}} d ω < \infty$ , then ψ is k-admissible.

Proof.

We first note that $\hat{ψ} \in C^{k} (ℝ)$ guarantees $P ψ \in C^{k} (ℝ)$ . Since (Pψ)⁽ⁱ⁾ decays faster than ωⁱ⁺¹ and $P ψ \in C^{k} (ℝ)$ , $ω^{i} {(P ψ)}^{(i)} (ω) \in L^{1} (ℝ)$ for 0 ⩽ i ⩽ k, so Ψ_k < ∞. Also $P ψ \in C^{k} (ℝ)$ and $ω^{i} {(P ψ)}^{(i)} \in L^{1} (ℝ)$ implies $ω^{i - 2} {(P ψ)}^{(i)} \in L^{1} (ℝ)$ for 2 ⩽ i ⩽ k. In addition, $ω^{- 2} (P ψ) (ω) \in L^{1} (ℝ)$ by assumption. Thus, to conclude Θ_k < ∞, it only remains to show $ω^{- 1} {(P ψ)}^{'} (ω) \in L^{1} (ℝ)$ . Since (Pψ)′ is continuous and decays faster than ω², only the integrability around the origin needs to be verified. We note that $\int \frac{| \hat{ψ} (ω) |^{2}}{ω^{2}} d ω < \infty$ and Pψ continuous implies Pψ ∼ ω^1+ϵ for some ϵ > 0 as ω → 0. Thus, (Pψ)′ ∼ ω^ϵ as ϵ → 0, so that ω⁻¹(Pψ)′ ∼ ω^ϵ−1; the function is thus integrable around the origin since ϵ − 1 > −1. □

Lemma 4.3 (Low frequency bound).

Assume $P ψ \in C^{m} (ℝ)$ and $f \in L^{1} (ℝ)$ . Then the quantity |λ^m(Sf)^(m)(λ)| can be bounded uniformly over all λ. Specifically,

| λ^{m} {(S f)}^{(m)} (λ) | ⩽ Ψ_{m} ‖ f ‖_{1}^{2}

for Ψ_m defined in (A.1).

Proof.

Let $g (ω) = (P ψ) (ω) = | \hat{ψ} (ω) |^{2}$ , and let

g_{λ} (ω) : = \frac{1}{λ} g (\frac{ω}{λ}) = {| {\hat{ψ}}_{λ} (ω) |}^{2} .

Utilizing Definition 2.2 we obtain

λ^{m} {(S f)}^{(m)} (λ) = \frac{1}{2 π} \int | \hat{f} (ω) |^{2} [λ^{m} \frac{d^{m}}{d λ^{m}} g_{λ} (ω)] d ω .

Expanding the derivative gives

λ^{m} \frac{d^{m}}{d λ^{m}} g_{λ} (ω) = C_{m, 0} g_{λ} (ω) + C_{m, 1} ω g_{λ}^{'} (ω) + C_{m, 2} ω^{2} g_{λ}^{''} (ω) + \dots C_{m, m} ω^{m} g_{λ}^{(m)} (ω), C_{m, i} = {(- 1)}^{m} (\begin{matrix} m \\ i \end{matrix}) \frac{m!}{i!} .

Utilizing $‖ \hat{f} ‖_{\infty} ⩽ ‖ f ‖_{1}$ and $g_{λ}^{(i)} (ω) = \frac{1}{λ^{i + 1}} g^{(i)} (\frac{ω}{λ})$ , one obtains

| λ^{m} {(S f)}^{(m)} (λ) | ⩽ \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \int | \hat{f} (ω) |^{2} | ω^{i} g_{λ}^{(i)} (ω) | d ω ⩽ ‖ f ‖_{1}^{2} \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \int | ω^{i} g_{λ}^{(i)} (ω) | d ω = ‖ f ‖_{1}^{2} \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \int | ω^{i} g^{(i)} (ω) | d ω = ‖ f ‖_{1}^{2} \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \cdot {‖ ω^{i} g^{(i)} (ω) ‖}_{1} = Ψ_{m} ‖ f ‖_{1}^{2} .

Lemma 4.4 (High frequency bound for differentiable functions).

Assume $P ψ \in C^{m} (ℝ)$ , and $f^{'} \in L^{1} (ℝ)$ . Then the quantity |λ^m(Sf)^(m)(λ)| can be bounded by

| λ^{m} {(S f)}^{(m)} (λ) | ⩽ \frac{Θ_{m}}{λ^{2}} {‖ f^{'} ‖}_{1}^{2}

for Θ_m defined in (A.2).

Proof.

Recall from the proof of Lemma 4.3 that

| λ^{m} {(S f)}^{(m)} (λ) | ⩽ \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \int | \hat{f} (ω) |^{2} | ω^{i} g_{λ}^{(i)} (ω) | d ω

where $g_{λ} (ω) = \frac{1}{λ} g (\frac{ω}{λ}) = {| {\hat{ψ}}_{λ} (ω) |}^{2}$ and $C_{m, i} = {(- 1)}^{m} (\begin{matrix} m \\ i \end{matrix}) \frac{m!}{i!}$ . Since $‖ ω \hat{f} (ω) ‖_{\infty} ⩽ {‖ f^{'} ‖}_{1}$ and $g_{λ}^{(i)} (ω) = \frac{1}{λ^{i + 1}} g^{(i)} (\frac{ω}{λ})$ , we obtain

| λ^{m} {(S f)}^{(m)} (λ) | ⩽ \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \int | ω \hat{f} (ω) |^{2} | ω^{i - 2} g_{λ}^{(i)} (ω) | d ω ⩽ {‖ f^{'} ‖}_{1}^{2} \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \int | ω^{i - 2} g_{λ}^{(i)} (ω) | d ω = \frac{{‖ f^{'} ‖}_{1}^{2}}{λ^{2}} \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \int | ω^{i - 2} g^{(i)} (ω) | d ω = \frac{{‖ f^{'} ‖}_{1}^{2}}{λ^{2}} \sum_{i = 0}^{m} \frac{| C_{m, i} |}{2 π} \cdot {‖ ω^{i - 2} g^{(i)} (ω) ‖}_{1} = \frac{Θ_{m}}{λ^{2}} {‖ f^{'} ‖}_{1}^{2} .

Lemma B.2

Assume $P f \in C^{0} (ℝ)$ and ψ is m-admissible, and let B_m, E,Ψ_m,Θ_m be as defined in (4.1), (4.3), (A.1) (A.2). Then,

{\frac{1}{2 π} \int | \hat{f} (ω) |^{2} \cdot | B_{m} η^{m} λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω ⩽ {(E η)}^{m} Λ_{m} (λ),

where

Λ_{m} (λ) = (‖ f ‖_{1}^{2} Ψ_{m} \land \frac{{‖ f^{'} ‖}_{1}^{2} Θ_{m}}{λ^{2}}) .

Proof.

From the proof of Lemma 4.3:

{\frac{1}{2 π} \int | \hat{f} (ω) |^{2} \cdot | λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω ⩽ Ψ_{m} ‖ f ‖_{1}^{2} .

From the proof of Lemma 4.4:

{\frac{1}{2 π} \int | \hat{f} (ω) |^{2} \cdot | λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω ⩽ Θ_{m} \frac{{‖ f^{'} ‖}_{1}^{2}}{λ^{2}} .

Utilizing |B_m| ⩽ E_m gives

{\frac{1}{2 π} \int | \hat{f} (ω) |^{2} \cdot | B_{m} η^{m} λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω ⩽ {(E η)}^{m} (‖ f ‖_{1}^{2} Ψ_{m} \land \frac{{‖ f^{'} ‖}_{1}^{2} Θ_{m}}{λ^{2}}) .

The following Corollary is obtained from Lemma B.2 when f is a dirac-delta function.

Corollary B.1

Assume ψ is m-admissible, and let B_m, E,Ψ_m be as defined in (4.1), (4.3), (A1). Then,

{\frac{1}{2 π} \int | B_{m} η^{m} λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω ⩽ {(E η)}^{m} Ψ_{m} .

C. Power spectrum and wavelet invariant equivalence

This appendix contains supporting results for demonstrating the equivalence of the power spectrum and wavelet invariants. Lemma 2.1 establishes that wavelet invariants uniquely determine any bandlimited L² function, as long as the wavelet satisfies the linear independence Condition 2.3 and a mild integrability condition. Proposition 2.1 gives two criteria that are sufficient to guarantee Condition 2.3. Finally, Lemma C.1 establishes that the Morlet wavelet satisfies Condition 2.3.

Lemma 2.1

Let $p \in L^{2} (ℝ)$ be continuous and assumep(w) = p(−w), $\hat{ψ}$ has compact support and Condition 2.3. Then

Proof.

Since p is continuous, there exists an ϵ > 0 such that on (0, ϵ) one either has p = 0, p > 0, or p < 0. Claim: one must have p = 0. Suppose not, and without loss of generality assume p > 0 on (0, ϵ) and that the support of ${| {\hat{ψ}}^{+} (ω) |}^{2}$ is contained in the interval [1, 2]. Now choose λ₀ small enough so that ${| {\hat{ψ}}_{λ_{0}}^{+} (ω) |}^{2}$ is supported on [ϵ/1, ϵ) i.e. λ₀ = ϵ/2. Clearly, there must exist a subset $M \subseteq [ϵ / 2, ϵ]$ of positive measure such that ${| {\hat{ψ}}_{λ_{0}}^{+} (ω) |}^{2} > 0$ on $M$ . Then,

0 = \int_{0}^{\infty} p (ω) {| {\hat{ψ}}_{λ_{0}}^{+} (ω) |}^{2} d ω = \int_{ϵ / 2}^{ϵ} p (ω) {| {\hat{ψ}}_{λ_{0}}^{+} (ω) |}^{2} d ω ⩾ \int_{M} p (ω) {| {\hat{ψ}}_{λ_{0}}^{+} (ω) |}^{2} d ω ⩾ 0.

We conclude

\int_{M} p (ω) {| {\hat{ψ}}_{λ_{0}}^{+} (ω) |}^{2} d ω = 0,

but this is impossible since the integrand is strictly positive on $M$ . We thus conclude that p = 0 on (0, ϵ). Thus, it is sufficient to only consider frequencies [ϵ, ∞].

Assume $\int p (ω) {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω = 0$ for all λ. Since p(ω) = p(−ω),

\int p (ω) {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω = \int_{0}^{\infty} p (ω) {| {\hat{ψ}}_{λ}^{+} (ω) |}^{2} d ω = \int_{ϵ}^{\infty} p (ω) {| {\hat{ψ}}_{λ}^{+} (ω) |}^{2} d ω = {〈 p, {| {\hat{ψ}}_{λ}^{+} |}^{2} 〉}_{I} = 0 \forall λ,

where I = [ϵ, ∞). We now define ${| {\hat{ϕ}}_{λ}^{+} (ω) |}^{2} : = λ^{- β} {| {\hat{ψ}}_{λ}^{+} (ω) |}^{2}$ for some β > 0, and observe that

\int_{0}^{\infty} p (ω) {| {\hat{ϕ}}_{λ}^{+} (ω) |}^{2} d ω = \forall λ \Rightarrow {\int_{0}^{\infty} | {〈 p, {| {\hat{ϕ}}_{λ}^{+} |}^{2} 〉}_{ℝ^{+}} |}^{2} d λ = {\int_{0}^{\infty} | {〈 p, {| {\hat{ϕ}}_{λ}^{+} |}^{2} 〉}_{I} |}^{2} d λ = 0.

Note

\int_{0}^{\infty} {| {〈 p, {| {\hat{ϕ}}_{λ}^{+} |}^{2} 〉}_{I} |}^{2} d λ = \int_{0}^{\infty} {〈 p, {| {\hat{ϕ}}_{λ}^{+} |}^{2} 〉}_{I} {〈 \bar{p}, {| {\hat{ϕ}}_{λ}^{+} |}^{2} 〉}_{I} d λ = \int_{0}^{\infty} (\int_{I} p (ω_{1}) {| {\hat{ϕ}}_{λ}^{+} (ω_{1}) |}^{2} d ω_{1}) (\int_{I} \bar{p (ω_{2})} {| {\hat{ϕ}}_{λ}^{+} (ω_{2}) |}^{2} d ω_{2}) d λ = \int_{I} \bar{p (ω_{2})} (\int_{I} p (ω_{1}) (\int_{0}^{\infty} {| {\hat{ϕ}}_{λ}^{+} (ω_{1}) |}^{2} {| {\hat{ϕ}}_{λ}^{+} (ω_{2}) |}^{2} d λ) d ω_{1}) d ω_{2} .

We now apply the change of variable ω_i = 1/ξ_i, and let g(ξ_i) = p(1/ξ_i). We obtain

0 = \int_{0}^{1 / ϵ} \bar{g (ξ_{2})} (\int_{0}^{1 / ϵ} g (ξ_{1}) (\int_{0}^{\infty} \frac{1}{ξ_{1}^{2} ξ_{2}^{2}} | {\hat{ϕ}}_{λ}^{+} (\frac{1}{ξ_{1}}) |^{2} {| {\hat{ϕ}}_{λ}^{+} (\frac{1}{ξ_{2}}) |}^{2} d λ) d ξ_{1}) d ξ_{2} .

(C.1)

Now consider the kernel

k (ξ_{1}, ξ_{2}) = \int_{0}^{\infty} \frac{1}{ξ_{1}^{2} ξ_{2}^{2}} | {\hat{ϕ}}_{λ}^{+} (\frac{1}{ξ_{1}}) | 2 {| {\hat{ϕ}}_{λ}^{+} (\frac{1}{ξ_{2}}) |}^{2} d λ .

Note that k is a strictly positive definite kernel function if for any finite sequence ${ξ_{i}}_{i = 1}^{n}$ in [0, 1/ϵ], the n by n matrix A defined by

A_{i j} = k (ξ_{i}, ξ_{j})

is strictly positive definite [83]. Viewing ${\tilde{ξ}}_{i} (λ) = ξ_{i}^{- 2} {| {\hat{ϕ}}_{λ}^{+} (1 / ξ_{i}) |}^{2}$ as functions of λ, we see that

A_{i j} = {〈 {\tilde{ξ}}_{i} (λ), {\tilde{ξ}}_{j} (λ) 〉}_{ℝ^{+}}

and A is thus a Gram matrix. Since the ${\tilde{ξ}}_{i} (λ)$ are linearly independent if and only if the ${| {\hat{ψ}}_{λ}^{+} (ω_{i}) |}^{2}$ are linearly independent, and the ${| {\hat{ψ}}_{λ}^{+} (ω_{i}) |}^{2}$ are linearly independent by assumption, we can conclude that A and thus k are strictly positive definite. Now consider the corresponding integral operator on [0, 1/ϵ]:

K g (ξ_{2}) = \int_{0}^{1 / ϵ} g (ξ_{1}) k (ξ_{1}, ξ_{2}) d ξ_{1} .

Since $ψ \in L^{1} (ℝ)$ , ${| {\hat{ψ}}_{λ}^{+} |}^{2}$ and thus ${| {\hat{ϕ}}_{λ}^{+} |}^{2}$ are continuous, and k will thus be continuous as long as it remains bounded. To check boundedness we observe that k(ξ₁, ξ₂)² ⩽ k(ξ₁, ξ₁)k(ξ₂, ξ₂) [21], and

k (ξ, ξ) = \int_{0}^{\infty} \frac{1}{ξ^{4}} {| {\hat{ϕ}}_{λ}^{+} (\frac{1}{ξ}) |}^{4} d λ = \int_{0}^{\infty} \frac{1}{ξ^{4}} \frac{1}{λ^{2 + 2 β}} {| {\hat{ψ}}^{+} (\frac{1}{λ ξ}) |}^{4} d λ = \int_{0}^{\infty} \frac{1}{ξ^{4}} {(ω ξ)}^{2 + 2 β} {| {\hat{ψ}}^{+} (ω) |}^{4} \frac{d ω}{ξ ω^{2}} = ξ^{2 β - 3} \int_{0}^{\infty} ω^{2 β} {| {\hat{ψ}}^{+} (ω) |}^{4} d ω ⩽ 3 ξ^{2 β - 3} \int_{0}^{\infty} ω^{2 β} | \hat{ψ} (ω) |^{4} d ω ⩽ 3 ξ^{2 β - 3} {‖ ω^{β} P ψ ‖}_{2}^{2} .

Since $\hat{ψ}$ has a compact support, clearly ${‖ ω^{β} P ψ ‖}_{2}^{2} < \infty$ , and k is thus bounded on the compact interval [0, 1/ϵ] as long as β ⩾ 3/2. Since k is continuous and [0, 1/ϵ] is compact, K : L²[0, 1/ϵ] → L²[0, 1/ϵ] is a compact, self-adjoint operator and by Mercer’s Theorem K is also strictly positive definite [83]. Since ${〈 K g, g 〉}_{[0, 1 / ϵ]} = 0$ by (C.1), we conclude g = 0 in L²[0, 1/ϵ]. Thus, p(1/ξ) = 0 for almost every ξ ∈ (0, 1/ϵ], which implies p(ω) = 0 for almost every ω ∈ [ϵ, ∞). Since p(ω) = p(−ω) and p = 0 on (0, ϵ), p = 0 for almost every $ω \in ℝ$ . □

Proposition 2.1

The following are sufficient to guarantee Condition 2.3:

$| \hat{ψ} (ω) |^{2}$ has a compact support contained in the interval [a, b], where a and b have the same sign, e.g. complex analytic wavelets with compactly supported Fourier transform.
$| \hat{ψ} (ω) |^{2} \in C^{\infty} (ℝ)$ and there exists an N such that all derivatives of order at least N are non-zero ω = 0, e.g. the Morlet wavelet.

Proof.

Let ${ω_{i}}_{i = 1}^{n}$ be a finite sequence of distinct positive frequencies, and let ${\tilde{ω}}_{i} (λ) = \frac{1}{| λ |} {| {\hat{ψ}}^{+} (\frac{ω_{i}}{λ}) |}^{2}$ denote the corresponding functions of λ.

First assume (i). Without loss of generality we assume that [a, b] is a positive interval and that $| \hat{ψ} (ω) |^{2} > 0$ on (a, a + ϵ) for some ϵ > 0. Clearly, ${| {\hat{ψ}}^{+} (ω) |}^{2} = | \hat{ψ} (ω) |^{2}$ . A simple calculation shows that the support of ${\tilde{ω}}_{i} (λ)$ is contained in the interval $[\frac{ω_{i}}{b}, \frac{ω_{i}}{a}]$ , and ${\tilde{ω}}_{i} (λ) > 0$ in a neighborhood of $\frac{ω_{i}}{a}$ . Assume we have ordered the ω_i so that ω₁ > . . . > ω_n > 0. Now suppose

c_{1} {\tilde{ω}}_{1} (λ) + \dots + c_{n} {\tilde{ω}}_{n} (λ) = 0.

Note ${\tilde{ω}}_{1} (λ)$ is the only function in the above collection with support in a neighborhood of $\frac{ω_{1}}{a}$ ; thus, we must have c₁ = 0, so that

c_{2} {\tilde{ω}}_{2} (λ) + \dots + c_{n} {\tilde{ω}}_{n} (λ) = 0.

But now ${\tilde{ω}}_{2} (λ)$ is the only function in the above collection with support in a neighborhood of $\frac{ω_{2}}{a}$ , so we must have c₂ = 0, and proceeding iteratively we conclude that c₁ = . . . = c_n = 0. Thus, ${{\tilde{ω}}_{i} (λ)}_{i = 1}^{n}$ is a linearly independent set, and Condition 2.3 holds.

Now assume (ii). Since ${\frac{d^{n}}{d ω^{n}} ({| {\hat{ψ}}^{+} (ω) |}^{2}) |}_{ω = 0} = {2 \frac{d^{n}}{d ω^{n}} (| \hat{ψ} (ω) |^{2}) |}_{ω = 0}$ , ${| {\hat{ψ}}^{+} (ω) |}^{2}$ is $C^{\infty} (ℝ)$ and all derivatives of order at least N are non-zero at ω = 0. Note ${{\tilde{ω}}_{i} (λ)}_{i = 1}^{n} = {| λ |^{- 1} {| {\hat{ψ}}^{+} (ω_{i} / λ) |}^{2}}_{i = 1}^{n}$ are linearly independent if and only if ${{| {\hat{ψ}}^{+} (ω_{i} / λ) |}^{2}}_{i = 1}^{n}$ are linearly independent. Defining $\tilde{λ} = 1 / λ$ , this holds if and only if ${{| {\hat{ψ}}^{+} (ω_{i} \tilde{λ}) |}^{2}}_{i = 1}^{n} = {g (ω_{i} \tilde{λ})}_{i = 1}^{n}$ are linearly independent as functions of $\tilde{λ}$ , where we define $g (ω) = {| {\hat{ψ}}^{+} (ω) |}^{2}$ . Assume

c_{1} g (ω_{1} \tilde{λ}) + c_{2} g (ω_{2} \tilde{λ}) + \dots + c_{n} g (ω_{n} \tilde{λ}) = 0.

Differentiating m times for N ⩽ m ⩽ N + n − 1, we obtain

c_{1} ω_{1}^{N} g^{(N)} (ω_{1} \tilde{λ}) + \dots + c_{n} ω_{n}^{N} g^{(N)} (ω_{n} \tilde{λ}) = 0 ⋮ c_{1} ω_{1}^{N + n - 1} g^{(N + n - 1)} (ω_{1} \tilde{λ}) + \dots + c_{n} ω_{n}^{N + n - 1} g^{(N + n - 1)} (ω_{n} \tilde{λ}) = 0.

The above holds for all $\tilde{λ}$ . We now take the limit as $\tilde{λ} \to 0$ to obtain

g^{(N)} (0) (ω_{1}^{N} c_{1} + ω_{2}^{N} c_{2} + \dots ω_{n}^{N} c_{n}) = 0 g^{(N + 1)} (0) (ω_{1}^{N + 1} c_{1} + ω_{2}^{N + 1} c_{2} + \dots ω_{n}^{N + 1} c_{n}) = 0 ⋮ g^{(N + n - 1)} (0) (ω_{1}^{N + n - 1} c_{1} + ω_{2}^{N + n - 1} c_{2} + \dots ω_{n}^{N + n - 1} c_{n}) = 0.

Since g(m) (0) ≠ 0, we obtain

\begin{array}{l} [\begin{matrix} ω_{1}^{N} & \dots & ω_{n}^{N} \\ ω_{1}^{N + 1} & \dots & ω_{n}^{N + 1} \\ ⋮ & ⋮ \\ ω_{1}^{N + n - 1} & \dots & ω_{n}^{N + n - 1} \end{matrix}] [\begin{matrix} c_{1} \\ c_{2} \\ ⋮ \\ c_{n} \end{matrix}] = [\begin{matrix} 0 \\ 0 \\ ⋮ \\ 0 \end{matrix}] \\ \underset{: = A}{\underset{︸}{[\begin{matrix} 1 & \dots & 1 \\ ω_{1} & \dots & ω_{n} \\ ⋮ & ⋮ \\ ω_{1}^{(n - 1)} & \dots & ω_{n}^{(n - 1)} \end{matrix}]}} \underset{: = B}{\underset{︸}{[\begin{matrix} ω_{1}^{N} & 0 & \dots & 0 \\ 0 & ω_{2}^{N} & \dots & 0 \\ ⋮ & ⋮ \\ 0 & 0 & \dots & ω_{n}^{N} \end{matrix}]}} [\begin{matrix} c_{1} \\ c_{2} \\ ⋮ \\ c_{n} \end{matrix}] = [\begin{matrix} 0 \\ 0 \\ ⋮ \\ 0 \end{matrix}] . \end{array}

Since A is a Vandermonde matrix constructed from distinct ω_i, det(A) ≠ 0. Since the ω_i are non-zero, det(B) ≠ 0. Thus, det(AB) = det(A) det(B) ≠ 0. We conclude AB is invertible and so all c_i = 0, which gives Condition 2.3. □

Lemma C.1

Suppose we construct a Morlet wavelet with parameter ξ, that is $ψ (x) = C_{ξ} π^{- 1 / 4} e^{- x^{2} / 2} (e^{i ξ x} - e^{- ξ^{2} / 2})$ for $C_{ξ} = {(1 - e^{- ξ^{2}} - 2 e^{- 3 ξ^{2} / 4})}^{- 1 / 2}$ . Then, for almost all $ξ \in ℝ^{+}$ , the wavelet satisfies Condition 2.3.

Proof.

The Fourier transform $\hat{ψ}$ has form

\hat{ψ} (ω) = {\tilde{C}}_{ξ} e^{- ω^{2} / 2} (e^{ξ ω} - 1)

for some constant ${\tilde{C}}_{ξ}$ depending on ξ, so that

g (ω) : = {\tilde{C}}_{ξ}^{- 2} | \hat{ψ} (ω) |^{2} = e^{- ω^{2}} {(e^{ξ ω} - 1)}^{2} .

From direct calculation or a computer algebra system, one obtains

g^{(n)} (0) = {\begin{array}{l} H_{n} (ξ) - 2 H_{n} (ξ / 2) & n odd \\ H_{n} (ξ) - 2 H_{n} (ξ / 2) + \frac{{(- 1)}^{\frac{n}{2}} n!}{(\frac{n}{2})!} & n even \end{array}

where H_n(ξ) is the n^th degree physicist’s Hermite polynomial. We have g′(0) = 0, but for n > 1, g⁽ⁿ⁾(0) = 0 only when ξ is a root of the above polynomial. Since the set of roots of the polynomials ${g^{(n)} (0)}_{n = 1}^{\infty}$ is countable, if ξ is selected at random from $ℝ$ , it is not a root of any of these polynomials with probability 1, and g⁽ⁿ⁾(0) ≠ 0 for all n. Thus, the wavelet satisfies criterion (ii) of Proposition 2.1, and thus the linear independence Condition 2.3. □

D. Supporting results: classic MRA

This appendix contains supporting results for Section 3. The first two lemmas (Lemmas D.1 and Lemma D.2) establish additive noise bounds for the power spectrum and are needed to prove Proposition 3.1. The next two lemmas (Lemmas D.3 and Lemma D.4) establish additive noise bounds for wavelet invariants and are needed to prove Proposition 3.2.

Lemma D.1

Let ε(x) be a white noise processes on $[- \frac{1}{2}, \frac{1}{2}]$ with variance σ². Then, for all frequencies ω, ξ,

E [| \hat{ε} (ω) |^{2}] = σ^{2}

(D.1)

E [| \hat{ε} (ω) |^{4}] ⩽ 3 σ^{4}

(D.2)

E [| \hat{ε} (ω) |^{2} | \hat{ε} (ξ) |^{2}] ⩽ 3 σ^{4} .

(D.3)

Proof.

By Proposition J.1,

E [| \hat{ε} (ω) |^{2}] = E [\hat{ε} (ω) \hat{\hat{ε} (ω)}] = E [(\int_{- 1 / 2}^{1 / 2} e^{- i ω x} d B_{x}) (\int_{- 1 / 2}^{1 / 2} e^{i ω x} d B_{x})] = σ^{2} \int_{- 1 / 2}^{1 / 2} d x = σ^{2},

which shows (D.1). By Proposition J.2,

E [| \hat{ε} (ω) |^{4}] = E [\hat{ε} {(ω)}^{2} {(\bar{\hat{ε}} (ω))}^{2}] = E [{(\int_{- 1 / 2}^{1 / 2} e^{- i ω x} d B_{x})}^{2} {(\int_{- 1 / 2}^{1 / 2} e^{i ω x} d B_{x})}^{2}] = 2 σ^{4} {(\int_{- 1 / 2}^{1 / 2} d x)}^{2} + σ^{4} (\int_{- 1 / 2}^{1 / 2} e^{- 2 i ω x} d x) (\int_{- 1 / 2}^{1 / 2} e^{2 i ω x} d x) ⩽ 2 σ^{4} + σ^{4} (\int_{- 1 / 2}^{1 / 2} | e^{- 2 i ω x} | d x) (\int_{- 1 / 2}^{1 / 2} | e^{2 i ω x} | d x) = 3 σ^{4},

which shows (D.2). Finally, by Proposition J.3, we have

E [| \hat{ε} (ω) |^{2} | \hat{ε} (ξ) |^{2}] = E [(\int_{- 1 / 2}^{1 / 2} e^{- i ω x} d B_{x}) (\int_{- 1 / 2}^{1 / 2} e^{i ω x} d B_{x}) (\int_{- 1 / 2}^{1 / 2} e^{- i ξ x} d B_{x}) (\int_{- 1 / 2}^{1 / 2} e^{i ξ x} d B_{x})] = σ^{4} [(\int_{- 1 / 2}^{1 / 2} e^{- i (ω + ξ) x} d x) (\int_{- 1 / 2}^{1 / 2} e^{i (ω + ξ) x} d x)] + σ^{4} [(\int_{- 1 / 2}^{1 / 2} e^{i (ξ - ω) x} d x) (\int_{- 1 / 2}^{1 / 2} e^{i (ω - ξ) x} d x) + (\int_{- 1 / 2}^{1 / 2} d x) (\int_{- 1 / 2}^{1 / 2} d x)] ⩽ σ^{4} [3 (\int_{- 1 / 2}^{1 / 2} d x) (\int_{- 1 / 2}^{1 / 2} d x)] = 3 σ^{4},

which gives (D.3). □

Lemma D.2

Let ε(x) be a white noise processes on $[- \frac{1}{2}, \frac{1}{2}]$ with variance σ². Then, for any signal $f \in L^{1} (ℝ)$ ,

E [(P (f + ε)) (ω)] = (P f) (ω) + σ^{2}

Var [(P (f + ε)) (ω)] ⩽ 4 σ^{2} (P f) (ω) + 2 σ^{4} .

Proof.

Since $E [\hat{ε} (ω)] = E [\bar{\hat{ε} (ω)}] = 0$ and $E [| \hat{ε} (ω) |^{2}] = σ^{2}$ by Lemma D.1,

E [(P (f + ε)) (ω)] = E [(\hat{f} (ω) + \hat{ε} (ω)) (\bar{\hat{f} (ω)} + \bar{\hat{ε} (ω)})] = E [| \hat{f} (ω) |^{2} + \hat{f} (ω) \bar{\hat{ε} (ω)} + \hat{ε} (ω) \bar{\hat{f} (ω)} + | \hat{ε} (ω) |^{2}] = (P f) (ω) + σ^{2} .

We now control Var[(P(f + ε))(ω)]. Note that:

{[(P (f + ε)) (ω)]}^{2} = {(| \hat{f} (ω) |^{2} + \hat{f} (ω) \bar{\hat{ε} (ω)} + \hat{ε} (ω) \bar{\hat{f} (ω)} + | \hat{ε} (ω) |^{2})}^{2}

and that

E [| \hat{ε} (ω) |^{2} \hat{ε} (ω)] = E [(\int_{- 1 / 2}^{1 / 2} e^{- i ω x} d B_{x}) (\int_{- 1 / 2}^{1 / 2} e^{i ω s} d B_{s}) (\int_{- 1 / 2}^{1 / 2} e^{- i ω p} d B_{p})] = 0,

since even when x = s = p, $E [{(Δ B_{x})}^{3}] = 0$ . Ignoring the terms with zero expectation, we thus get

E [(P (f + ε)) {(ω)}^{2}] = E (| \hat{f} (ω) |^{4} + 4 | \hat{f} (ω) |^{2} | \hat{ε} (ω) |^{2} + | \hat{ε} (ω) |^{4} + \hat{f} {(ω)}^{2} \hat{\hat{ε} {(ω)}^{2}} + \hat{ε} {(ω)}^{2} {\hat{\hat{f} (ω)}}^{2}) ⩽ E (| \hat{f} (ω) |^{4} + 6 | \hat{f} (ω) |^{2} | \hat{ε} (ω) |^{2} + | \hat{ε} (ω) |^{4}) = {[(P f) (ω)]}^{2} + 6 σ^{2} (P f) (ω) + 3 σ^{4}

where the last line follows from Lemma D.1. Thus,

Var [(P (f + ε)) (ω)] = E [(P (f + ε)) {(ω)}^{2}] - {(E [(P (f + ε)) (ω)])}^{2} ⩽ {[(P f) (ω)]}^{2} + 6 σ^{2} (P f) (ω) + 3 σ^{4} - {((P f) (ω) + σ^{2})}^{2} = 4 σ^{2} (P f) (ω) + 2 σ^{4} .

□

Proposition 3.1

Assume Model 1. Define the following estimator of (Pf)(ω):

(\tilde{P f}) (ω) : = \frac{1}{M} \sum_{j = 1}^{M} (P y_{j}) (ω) - σ^{2} .

Then, with probability at least 1 − 1/t²,

| (P f) (ω) - (\tilde{P f}) (ω) | ⩽ \frac{2 t σ}{\sqrt{M}} (‖ f ‖_{1} + σ) .

(3.1)

Proof.

Let $f^{t_{j}} (x) = f (x - t_{j})$ so that $y_{j} = f^{t_{j}} + ε_{j}$ . We first note since $\hat{f^{t} j} (ω) = e^{- i ω t_{j}} \hat{f} (ω)$ , the power spectrum is translation invariant, that is $(P f^{t_{j}}) (ω) = (P f) (ω)$ for all ω, t_j. Thus, by Lemma D.2,

E [(P y_{j}) (ω)] = E [(P (f^{t_{j}} + ε_{j})) (ω)] = (P f^{t_{j}}) (ω) + σ^{2} = (P f) (ω) + σ^{2}

and

Var [(P y_{j}) (ω)] = Var [(P (f^{t_{j}} + ε_{j})) (ω)] ⩽ 4 σ^{2} (P f^{t_{j}}) (ω) + 2 σ^{4} = 4 σ^{2} (P f) (ω) + 2 σ^{4} .

Since the y_j are independent,

Var (\frac{1}{M} \sum_{j = 1}^{M} (P y_{j}) (ω)) ⩽ \frac{1}{M} (4 σ^{2} (P f) (ω) + 2 σ^{4}) .

Applying Chebyshev’s inequality to the random variable $X = \frac{1}{M} \sum_{j = 1}^{M} (P y_{j}) (ω)$ , we obtain

ℙ (| \frac{1}{M} \sum_{j = 1}^{M} (P y_{j}) (ω) - ((P f) (ω) + σ^{2}) | ⩾ \frac{t (2 σ \sqrt{(P f) (ω)} + \sqrt{2} σ^{2})}{\sqrt{M}}) ⩽ \frac{1}{t^{2}} .

Observing that $\sqrt{(P f) (ω)} = | \hat{f} (ω) | ⩽ ‖ f ‖_{1}$ gives (3.1). □

Lemma D.3

Let ε(x) be a white noise processes on $[- \frac{1}{2}, \frac{1}{2}]$ with variance σ². Then,

E [(S ε) (λ)] = σ^{2}

E [(S ε) {(λ)}^{2}] ⩽ 3 σ^{4} .

Proof.

Since $E [| \hat{ε} (ω) |^{2}] = σ^{2}$ by Lemma D.1, we have

E [(S ε) (λ)] = E [{‖ ε * ψ_{λ} ‖}_{2}^{2}] = E [\frac{1}{2 π} {‖ \hat{ε} \cdot {\hat{ψ}}_{λ} ‖}_{2}^{2}] = E [\frac{1}{2 π} \int | \hat{ε} (ω) |^{2} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω] = \frac{σ^{2}}{2 π} \int {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω = σ^{2} {‖ ψ_{λ} ‖}_{2}^{2} = σ^{2} .

Since by Lemma D.1, $E [| \hat{ε} (ω) |^{2} | \hat{ε} (ξ) |^{2}] ⩽ 3 σ^{4}$ , we also have

E [(S ε) {(λ)}^{2}] = E [{‖ ε * ψ_{λ} ‖}_{2}^{4}] = E [\frac{1}{{(2 π)}^{2}} {‖ \hat{ε} \cdot {\hat{ψ}}_{λ} ‖}_{2}^{2} {‖ \hat{ε} \cdot {\hat{ψ}}_{λ} ‖}_{2}^{2}] = E [\frac{1}{{(2 π)}^{2}} \iint | \hat{ε} (ω) |^{2} | \hat{ε} (ξ) |^{2} {| {\hat{ψ}}_{λ} (ω) |}^{2} {| {\hat{ψ}}_{λ} (ξ) |}^{2} d ω d ξ] ⩽ \frac{3 σ^{4}}{{(2 π)}^{2}} \iint {| {\hat{ψ}}_{λ} (ω) |}^{2} {| {\hat{ψ}}_{λ} (ξ) |}^{2} d ω d ξ = 3 σ^{4} {({‖ ψ_{λ} ‖}_{2}^{2})}^{2} = 3 σ^{4} .

□

Lemma D.4

Let ε(x) be a white noise processes on $[- \frac{1}{2}, \frac{1}{2}]$ with variance σ². Then, for any signal $f \in L^{1} (ℝ)$

E [(S (f + ε)) (λ)] = (S f) (λ) + σ^{2}

Var [(S (f + ε)) (λ)] ⩽ 4 σ^{2} (S f) (λ) + 2 σ^{4} .

Proof.

Utilizing $E [ε] = E [\bar{ε}] = 0$ and Lemma D.3, we have

E [(S (f + ε)) (λ)] = E [\int {| (f + ε) * ψ_{λ} (u) |}^{2} d u] = \int {| f * ψ_{λ} (u) |}^{2} + E [\int {| ε * ψ_{λ} (u) |}^{2} d u] = (S f) (λ) + E [(S ε) (λ)] = (S f) (λ) + σ^{2} .

To bound $E [(S (f + ε)) {(λ)}^{2}]$ , note that

{[(S (f + ε)) (λ)]}^{2} = {(\int {| f * ψ_{λ} (u_{1}) |}^{2} + (ε * ψ_{λ} (u_{1})) (\bar{f} * \bar{ψ_{λ} (u_{1})}) + (f * ψ_{λ} (u_{1})) (\bar{ε} * \bar{ψ_{λ} (u_{1})}) + ∣ ε * ψ_{λ} (u_{1})) |}^{2} d u_{1}) \cdot {(\int {| f * ψ_{λ} (u_{2}) |}^{2} + (ε * ψ_{λ} (u_{2})) (\bar{f} * \bar{ψ_{λ} (u_{2})}) + (f * ψ_{λ} (u_{2})) (\bar{ε} * \bar{ψ_{λ} (u_{2})}) + ∣ ε * ψ_{λ} (u_{2})) |}^{2} d u_{2}) .

When we take expectation, any term involving one or three ε terms disappears, so that

E [(S (f + ε)) {(λ)}^{2}] = E [\iint {| f * ψ_{λ} (u_{1}) |}^{2} {| f * ψ_{λ} (u_{2}) |}^{2} d u_{1} d u_{2} {+ \iint {| f * ψ_{λ} (u_{1}) |}^{2} ∣ ε * ψ_{λ} (u_{2})) |}^{2} d u_{1} d u_{2} + \iint (ε * ψ_{λ} (u_{1})) (\bar{f} * \bar{ψ_{λ} (u_{1})}) (ε * ψ_{λ} (u_{2})) (\bar{f} * \bar{ψ_{λ} (u_{2})}) d u_{1} d u_{2} + \iint (ε * ψ_{λ} (u_{1})) (\bar{f} * \bar{ψ_{λ} (u_{1})}) (f * ψ_{λ} (u_{2})) (\bar{ε} * \bar{ψ_{λ} (u_{2})}) d u_{1} d u_{2} + \iint (f * ψ_{λ} (u_{1})) (\bar{ε} * \bar{ψ_{λ} (u_{1})}) (ε * ψ_{λ} (u_{2})) (\bar{f} * \bar{ψ_{λ} (u_{2})}) d u_{1} d u_{2} + \iint (f * ψ_{λ} (u_{1})) (\bar{ε} * \bar{ψ_{λ} (u_{1})}) (f * ψ_{λ} (u_{2})) (\bar{ε} * \bar{ψ_{λ} (u_{2})}) d u_{1} d u_{2} {+ \iint ∣ ε * ψ_{λ} (u_{1})) |}^{2} {| f * ψ_{λ} (u_{2}) |}^{2} d u_{1} d u_{2} {+ \iint ∣ ε * ψ_{λ} (u_{1})) |}^{2} ∣ ε * ψ_{λ} (u_{2})) |^{2} d u_{1} d u_{2}] ⩽ E [\iint {| f * ψ_{λ} (u_{1}) |}^{2} {| f * ψ_{λ} (u_{2}) |}^{2} d u_{1} d u_{2} {+ 6 \iint {| f * ψ_{λ} (u_{1}) |}^{2} ∣ ε * ψ_{λ} (u_{2})) |}^{2} d u_{1} d u_{2} {+ \iint ∣ ε * ψ_{λ} (u_{1})) |}^{2} ∣ ε * ψ_{λ} (u_{2})) |^{2} d u_{1} d u_{2}] = E [{[(S f) (λ)]}^{2} + 6 (S f) (λ) (S ε) (λ) + {[(S ε) (λ)]}^{2}] = [(S f) {(λ)}^{2}] + 6 σ^{2} (S f) (λ) + 3 σ^{4},

where the last line follows from Lemma D.3. Thus,

Var [(S (f + ε)) (λ)] = E [(S (f + ε)) {(λ)}^{2}] - {(E [(S (f + ε)) (λ)])}^{2} ⩽ {[(S f) (λ)]}^{2} + 6 σ^{2} (S f) (λ) + 3 σ^{4} - {[(S f) (λ) + σ^{2}]}^{2} = 4 σ^{2} (S f) (λ) + 2 σ^{4} .

□

Proposition 3.2

Assume Model 1. Define the following estimator of (Sf)(λ):

(\tilde{S f}) (λ) : = \frac{1}{M} \sum_{j = 1}^{M} (S y_{j}) (λ) - σ^{2} .

Then, with probability at least 1 − 1/t²,

| (S f) (λ) - (\tilde{S f}) (λ) | ⩽ \frac{2 t σ}{\sqrt{M}} (‖ f ‖_{1} + σ) .

(D.4)

Proof.

Let $f^{t_{j}} (x) = f (x - t_{j})$ so that $y_{j} = f^{t_{j}} + ε_{j}$ . We first note that the wavelet invariants are translation invariant, that is $S f^{t_{j}} = S f$ for all t_j. We now compute the mean and variance of the coefficients (Sy_j)(λ). By Lemma D.4,

E [(S y_{j}) (λ)] = E [(S (f^{t_{j}} + ε_{j})) (λ)] = (S f^{t_{j}}) (λ) + σ^{2} = (S f) (λ) + σ^{2}

and

Var [(S y_{j}) (λ)] = Var [(S (f^{t_{j}} + ε_{j})) (λ)] ⩽ 4 σ^{2} (S f^{t_{j}}) (λ) + 2 σ^{4} = 4 σ^{2} (S f) (λ) + 2 σ^{4} .

Since the y_j are independent,

Var [\frac{1}{M} \sum_{j = 1}^{M} (S y_{j}) (λ)] ⩽ \frac{1}{M} [4 σ^{2} (S f) (λ) + 2 σ^{4}] .

Applying Chebyshev’s inequality to the random variable $X = \frac{1}{M} \sum_{j = 1}^{M} (S y_{j}) (λ)$ gives

ℙ (| \frac{1}{M} \sum_{j = 1}^{M} (S y_{j}) (λ) - [(S f) (λ) + σ^{2}] | ⩾ \frac{t (2 σ \sqrt{(S f) (λ)} + \sqrt{2} σ^{2})}{\sqrt{M}}) ⩽ \frac{1}{t^{2}} .

By Young’s convolution inequality, $(S f) (λ) = {‖ f * ψ_{λ} ‖}_{2}^{2} ⩽ ‖ f ‖_{1}^{2} {‖ ψ_{λ} ‖}_{2}^{2} = ‖ f ‖_{1}^{2}$ , which gives (D.4). □

E. Supporting results: dilation MRA

This appendix contains the technical details of the dilation unbiasing procedure that is central to Propositions 4.1, 4.2 and 5.1. Lemma 4.1 bounds the bias and variance of the estimator, and Lemma 4.2 bounds the error of the estimator given M independent samples.

Lemma 4.1

Let Fλ(τ) = L((1 − τ)λ) for some function L ∈ C^k+2(0, ∞) and a random variable τ satisfying the assumptions of Section 2.1, and let k ⩾ 2 be an even integer. Assume there exist functions $Λ_{i} : ℝ \to ℝ$ , $R : ℝ \to ℝ$ such that

| λ^{i} L^{(i)} (λ) | ⩽ Λ_{i} (λ) for 0 ⩽ i ⩽ k + 2, \frac{Λ_{k + 2} ((1 - τ) λ)}{Λ_{k + 2} (λ)} ⩽ R (λ),

and define the following estimator of L(λ):

G_{λ} (τ) : = F_{λ} (τ) - B_{2} η^{2} F_{λ}^{''} (τ) - B_{4} η^{4} F_{λ}^{(4)} (τ) - \dots - B_{k} η^{k} F_{λ}^{(k)} (τ) .

Then G_λ(τ) satisfies

| E G_{λ} (τ) - L (λ) | ≲ k R (λ) Λ_{k + 2} (λ) {(2 E η)}^{k + 2}

Var G_{λ} (τ) ≲ k^{2} R {(λ)}^{2} Λ {(λ)}^{2}

where

Λ {(λ)}^{2} : = \sum_{0 ⩽ i, j ⩽ k + 2, i + j ⩾ 2} Λ_{i} (λ) Λ_{j} (λ) {(2 E η)}^{i + j}

and E is the absolute constant defined in (4.3).

Proof.

We Taylor expand F_λ(τ) about τ = 0

F_{λ} (τ) = F_{λ} (0) + F_{λ}^{'} (0) τ + \frac{F_{λ}^{''} (0)}{2} τ^{2} + \dots + \frac{F_{λ}^{(k + 1)} (0)}{(k + 1)!} τ^{k + 1} + \underset{: = R_{0} (τ, λ)}{\underset{︸}{\int_{0}^{τ} \frac{F_{λ}^{(k + 2)} (t)}{(k + 1)!} {(τ - t)}^{k + 1} d t}} .

We note

E [F_{λ} (τ)] = F_{λ} (0) + \frac{F_{λ}^{''} (0)}{2} η^{2} + \dots + \frac{F_{λ}^{k} (0)}{k!} C_{k} η^{k} + E [R_{0} (τ, λ)],

which motivates an unbiasing with the first k/2 even derivatives, and thus a Taylor expansion of these derivatives

F_{λ} (τ) = F_{λ} (0) + F_{λ}^{'} (0) τ + \dots + \frac{F_{λ}^{(k + 1)} (0)}{(k + 1)!} τ^{k + 1} + \underset{: = R_{0} (τ, λ)}{\underset{︸}{\int_{0}^{τ} \frac{F_{λ}^{(k + 2)} (t)}{(k + 1)!} {(τ - t)}^{k + 1} d t}}

F_{λ}^{''} (τ) = F_{λ}^{''} (0) + F_{λ}^{(3)} (0) τ + \dots + \frac{F_{λ}^{(k + 1)} (0)}{(k - 1)!} τ^{k - 1} + \underset{: = R_{2} (τ, λ)}{\underset{︸}{\int_{0}^{τ} \frac{F_{λ}^{(k + 2)} (t)}{(k - 1)!} {(τ - t)}^{k - 1} d t}}

F_{λ}^{(4)} (τ) = F_{λ}^{(4)} (0) + F_{λ}^{(5)} (0) τ + \dots + \frac{F_{λ}^{(k + 1)} (0)}{(k - 3)!} τ^{k - 3} + \underset{: = R_{4} (τ, λ)}{\underset{︸}{\int_{0}^{τ} \frac{F_{λ}^{(k + 2)} (t)}{(k - 3)!} {(τ - t)}^{k - 3} d t}} ⋮

F_{λ}^{(k)} (τ) = F_{λ}^{(k)} (0) + F_{λ}^{(k + 1)} (0) τ + \underset{: = R_{k} (τ, λ)}{\underset{︸}{\int_{0}^{τ} F_{λ}^{(k + 2)} (t) (τ - t) d t}} .

Multiplication of the i^th even derivative by B_iηⁱ gives

F_{λ} (τ) = F_{λ} (0) + F_{λ}^{'} (0) τ + \dots + \frac{F_{λ}^{(k + 1)} (0)}{(k + 1)!} τ^{k + 1} + R_{0} (τ, λ)

B_{2} η^{2} F_{λ}^{''} (τ) = B_{2} η^{2} F_{λ}^{''} (0) + B_{2} η^{2} F_{λ}^{(3)} (0) τ + \dots + B_{2} η^{2} \frac{F_{λ}^{(k + 1)} (0)}{(k - 1)!} τ^{k - 1} + B_{2} η^{2} R_{2} (τ, λ)

B_{4} η^{4} F_{λ}^{(4)} (τ) = B_{4} η^{4} F_{λ}^{(4)} (0) + B_{4} η^{4} F_{λ}^{(5)} (0) τ + \dots + B_{4} η^{4} \frac{F_{λ}^{(k + 1)} (0)}{(k - 3)!} τ^{k - 3} + B_{4} η^{4} R_{4} (τ, λ) ⋮

B_{k} η^{k} F_{λ}^{(k)} (τ) = B_{k} η^{k} F_{λ}^{(k)} (0) + B_{k} η^{k} F_{λ}^{(k + 1)} (0) τ + B_{k} η^{k} R_{k} (τ, λ) .

We want an estimator that targets F_λ(0) = L(λ). We thus consider the following variable as an estimator:

G_{λ} (τ) : = F_{λ} (τ) - B_{2} η^{2} F_{λ}^{''} (τ) - B_{4} η^{4} F_{λ}^{(4)} (τ) - \dots - B_{k} η^{k} F_{λ}^{(k)} (τ)

and show that $E [G_{λ} (τ)] = F_{λ} (0) + O (η^{k + 2})$ for constants B_i chosen according to (4.1). We have

E [F_{λ} (τ)] = F_{λ} (0) + F_{λ}^{''} (0) \frac{C_{2}}{2} η^{2} + \dots + F_{λ}^{(k)} (0) \frac{C_{k}}{k!} η^{k} + E [R_{0} (τ, λ)] E [B_{2} η^{2} F_{λ}^{''} (τ)] = F_{λ}^{''} (0) B_{2} η^{2} + F_{λ}^{(4)} (0) \frac{B_{2} C_{2}}{2} η^{4} + \dots + F_{λ}^{(k)} (0) \frac{B_{2} C_{k - 2}}{(k - 2)!} η^{k} + E [B_{2} η^{2} R_{2} (τ, λ)] E [B_{4} η^{4} F_{λ}^{(4)} (τ)] = F_{λ}^{(4)} (0) B_{4} η^{4} + F_{λ}^{(6)} (0) \frac{B_{4} C_{2}}{2} η^{6} + \dots + F_{λ}^{(k)} (0) \frac{B_{4} C_{k - 4}}{(k - 4)!} η^{k} + E [B_{4} η^{4} R_{4} (τ, λ)] ⋮ E [B_{k - 2} η^{k - 2} F_{λ}^{(k - 2)} (τ)] = F_{λ}^{(k - 2)} (0) B_{k - 2} η^{k - 2} + F_{λ}^{(k)} (0) \frac{B_{k - 2} C_{2}}{2} η^{k} + E [B_{k - 2} η^{k - 2} R_{k - 2} (τ, λ)] E [B_{k} η^{k} F_{λ}^{(k)} (τ)] = F_{λ}^{(k)} (0) B_{k} η^{k} + E [B_{k} η^{k} R_{k} (τ, λ)] .

That is,

E [G_{λ} (τ)] = F_{λ} (0) + F_{λ}^{''} (0) (\frac{C_{2}}{2!} - B_{2}) η^{2} + F_{λ}^{(4)} (0) (\frac{C_{4}}{4!} - \frac{B_{2} C_{2}}{2!} - B_{4}) η^{4} + F_{λ}^{(6)} (0) (\frac{C_{6}}{6!} - \frac{B_{2} C_{4}}{4!} - \frac{B_{4} C_{2}}{2!} - B_{6}) η^{6} \dots + F_{λ}^{(k)} (0) (\frac{C_{k}}{k!} - \frac{B_{2} C_{k - 2}}{(k - 2)!} - \dots - \frac{B_{k - 2} C_{2}}{2!} - B_{k}) η^{k} + H_{1} (λ)

where

H_{1} (λ) = E [R_{0} (λ, τ) - B_{2} η^{2} R_{2} (τ, λ) - \dots - B_{k} η^{k} R_{k} (λ, τ)] .

Since (4.1) guarantees that

B_{2} = \frac{C_{2}}{2!} B_{4} = \frac{C_{4}}{4!} - {(\frac{C_{2}}{2!})}^{2} B_{6} = \frac{C_{6}}{6!} - \frac{C_{2} C_{4}}{2! 4!} - (\frac{C_{4}}{4!} - {(\frac{C_{2}}{2!})}^{2}) \frac{C_{2}}{2!} ⋮ B_{k} = \frac{C_{k}}{k!} - \frac{B_{2} C_{k - 2}}{(k - 2)!} - \dots - \frac{B_{k - 2} C_{2}}{2!},

the coefficients of η², η⁴, . . . , η^k vanish, and we obtain

E [G_{λ} (τ)] = F_{λ} (0) + H_{1} (λ) .

First we bound the bias H₁(λ). In the remainder of the proof we let B₀ = −1 to simplify notation, so that

H_{1} (λ) = \sum_{i = 0, 2, \dots, k} - B_{i} R_{i} (λ, τ) η^{i} .

We first obtain a bound for |B_iR_i(λ, τ)ηⁱ|. Note

(k + 1 - i)! η^{i} R_{i} (λ, τ) = η^{i} \int_{0}^{τ} F_{λ}^{(k + 2)} (t) {(τ - t)}^{k + 1 - i} d t = η^{i} \int_{0}^{τ} λ^{k + 2} L^{(k + 2)} ((1 - t) λ) {(τ - t)}^{k + 1 - i} d t .

We observe that

| {((1 - t) λ)}^{k + 2} L^{(k + 2)} ((1 - t) λ) | ⩽ Λ_{k + 2} ((1 - t) λ) | λ^{k + 2} L^{(k + 2)} ((1 - t) λ) | ⩽ \frac{1}{{(1 - t)}^{k + 2}} \frac{Λ_{k + 2} ((1 - t) λ)}{Λ_{k + 2} (λ)} Λ_{k + 2} (λ) | λ^{k + 2} L^{(k + 2)} ((1 - t) λ) | ⩽ \frac{R (λ) Λ_{k + 2} (λ)}{{(1 - t)}^{k + 2}}

so that

- \frac{R (λ) Λ_{k + 2} (λ)}{{(1 - t)}^{k + 2}} ⩽ λ^{k + 2} L^{(k + 2)} ((1 - t) λ) ⩽ \frac{R (λ) Λ_{k + 2} (λ)}{{(1 - t)}^{k + 2}} .

Now assume first of all that τ is positive. We have

| (k + 1 - i)! η^{i} R_{i} (λ, τ) | ⩽ η^{i} R (λ) Λ_{k + 2} (λ) \int_{0}^{τ} \frac{{(τ - t)}^{k + 1 - i}}{{(1 - t)}^{k + 2}} d t ⩽ η^{i} R (λ) Λ_{k + 2} (λ) \int_{0}^{τ} \frac{τ^{k + 1 - i}}{{(1 - t)}^{k + 2}} d t = η^{i} τ^{k + 1 - i} R (λ) Λ_{k + 2} (λ) \frac{1}{(k + 1)} (\frac{1}{{(1 - τ)}^{k + 1}} - 1) ⩽ \frac{2^{k + 2} R (λ)}{k + 1} η^{i} τ^{k + 2 - i} Λ_{k + 2} (λ)

where the last line follows since $\frac{1}{{(1 - τ)}^{k + 1}} ⩽ 2 \cdot 2^{k + 1} τ$ for $τ \in [0, \frac{1}{2}]$ . A similar argument can be applied when τ is negative, and we can conclude

| B_{i} η^{i} R_{i} (λ, τ) | ⩽ \frac{2^{k + 2} R (λ)}{(k + 1) (k + 1 - i)!} Λ_{k + 2} (λ) | B_{i} | η^{i} | τ |^{k + 2 - i},

(E.1)

which gives

E | B_{i} η^{i} R_{i} (λ, τ) | ⩽ \frac{2^{k + 2} R (λ)}{(k + 1) (k + 1 - i)!} Λ_{k + 2} (λ) T^{k + 2 - i} | B_{i} | η^{k + 2} = \frac{2^{k + 2} (k + 2 - i) R (λ)}{k + 1} Λ_{k + 2} (λ) \frac{T^{k + 2 - i}}{(k + 2 - i)!} | B_{i} | η^{k + 2} .

We thus obtain

| E [G_{λ} (τ)] - L (λ) | = | H_{1} (λ) | ⩽ \frac{R (λ) Λ_{k + 2} (λ)}{k + 1} {(2 E η)}^{k + 2} \sum_{i = 0, 2, \dots, k} (k + 2 - i) ≲ R (λ) k Λ_{k + 2} (λ) {(2 E η)}^{k + 2},

which establishes the bound on the bias. We now bound the variance. We note

G_{λ} (τ) = \underset{: = (I)}{\underset{︸}{\sum_{i = 0, 2, \dots, k} \sum_{j = 0, 1, \dots, k + 1 - i} \frac{- B_{i}}{j!} F_{λ}^{(i + j)} (0) η^{i} τ^{j}}} + \underset{: = (I I)}{\underset{︸}{\sum_{i = 0, 2, \dots, k} - B_{i} R_{i} (λ, τ) η^{i}}} .

Thus,

Var [G_{λ} (τ)] = E [G_{λ} {(τ)}^{2}] - E {[G_{λ} (τ)]}^{2} = E [(I) (I)] + 2 E [(I) (II)] + E [(II) (II)] - F_{λ} {(0)}^{2} - 2 F_{λ} (0) H_{1} (λ) - H_{1} {(λ)}^{2} ⩽ \underset{: = (A)}{\underset{︸}{(E [(I) (I)] - F_{λ} {(0)}^{2})}} + \underset{: = (B)}{\underset{︸}{(2 E [(I) (II)] - 2 F_{λ} (0) H_{1} (λ))}} + \underset{: = (C)}{\underset{︸}{E [(II) (II)]}}

and we proceed to bound each term.

(I) (I) - F_{λ} {(0)}^{2} = \sum_{i = 0, 2, \dots, k} \sum_{ℓ = 0, 2, \dots, k} \sum_{j = 0}^{k + 1 - i} \sum_{s = 0}^{k + 1 - ℓ} \frac{B_{i} B_{ℓ}}{j! ℓ!} F_{λ}^{(i + j)} (0) F_{λ}^{(ℓ + s)} (0) η^{i + ℓ} τ^{j + s} 1_{E}

where 1_E is an indicator function indicating that i,j,ℓ,s are not all zero. We have

E | \frac{B_{i} B_{ℓ}}{j! ℓ!} F_{λ}^{(i + j)} (0) F_{λ}^{(ℓ + s)} (0) η^{i + ℓ} τ^{j + s} | ⩽ \frac{| B_{i} B_{ℓ} |}{j! ℓ!} C_{j + s} Λ_{i + j} (λ) Λ_{ℓ + s} (λ) η^{i + ℓ + j + s} ⩽ \frac{| B_{i} B_{ℓ} |}{j! ℓ!} T^{j} T^{s} Λ_{i + j} (λ) Λ_{ℓ + s} (λ) η^{i + ℓ + j + s} ⩽ E^{i + j} E^{ℓ + s} Λ_{i + j} (λ) Λ_{ℓ + s} (λ) η^{i + ℓ + j + s} = (Λ_{i + j} (λ) {(E η)}^{i + j}) (Λ_{ℓ + s} (λ) {(E η)}^{ℓ + s}) .

Noting that only terms where j + s is even survive expectation, and letting $\tilde{i} = i + j$ and $\tilde{ℓ} = ℓ + s$ , we obtain

E [(I) (I)] - F_{λ} {(0)}^{2} ⩽ \sum_{i = 0, 2, \dots, k} \sum_{ℓ = 0, 2, \dots, k} \sum_{j = 0}^{k + 1 - i} \sum_{s = 0}^{k + 1 - ℓ} Λ_{i + j} (λ) {(4 T η)}^{i + j} Λ_{ℓ + s} (λ) {(4 T η)}^{ℓ + s} 1_{E} 1 (j + s even) = \sum_{\tilde{i} = 0}^{k + 1} \sum_{\tilde{ℓ} = 0}^{k + 1} C_{\tilde{i}, \tilde{ℓ}} Λ_{\tilde{i}} (λ) {(E η)}^{\tilde{i}} Λ_{\tilde{ℓ}} (λ) {(E η)}^{\tilde{ℓ}}

for coefficients $C_{\tilde{i} \tilde{ℓ}}$ such that C_0,0 = 0, $C_{\tilde{i} \tilde{ℓ}} = 0$ if $\tilde{i} + \tilde{ℓ}$ is odd, and $C_{\tilde{i} \tilde{ℓ} \tilde{ℓ}} ⩽ k^{2}$ . Thus,

E [(I) (I)] - F_{λ} {(0)}^{2} ⩽ k^{2} \sum_{\begin{matrix} 2 ⩽ \tilde{i} + \tilde{ℓ} ⩽ 2 k + 2 \\ \tilde{i} + \tilde{ℓ} even \end{matrix}} Λ_{\tilde{i}} (λ) Λ_{\tilde{ℓ}} (λ) {(E η)}^{\tilde{i} + \tilde{ℓ}} ⩽ k^{2} Λ {(λ)}^{2} .

Next we bound $E [(II) (II)]$ .

(II) (II) = \sum_{i = 0, 2, \dots, k} \sum_{ℓ = 0, 2, \dots k} B_{i} B_{ℓ} R_{i} (λ, τ) R_{ℓ} (λ, τ) η^{i + ℓ} .

Utilizing Equation (E.1), we have

| B_{i} B_{ℓ} R_{i} (λ, τ) R_{ℓ} (λ, τ) η^{i + ℓ} | ⩽ \frac{2^{2 k + 4} R {(λ)}^{2} | B_{i} B_{ℓ} |}{{(k + 1)}^{2} (k + 1 - i)! (k + 1 - ℓ)!} Λ_{k + 2} {(λ)}^{2} η^{i + ℓ} | τ |^{2 k + 4 - i - ℓ},

which gives

E | B_{i} B_{ℓ} R_{i} (λ, τ) R_{ℓ} (λ, τ) η^{i + ℓ} | ⩽ \frac{2^{2 k + 4} R {(λ)}^{2} T^{2 k + 4 - i - ℓ} | B_{i} B_{ℓ} |}{{(k + 1)}^{2} (k + 1 - i)! (k + 1 - ℓ)!} Λ_{k + 2} {(λ)}^{2} η^{2 k + 4} ⩽ \frac{R {(λ)}^{2} (k + 2 - i) (k + 2 - ℓ)}{{(k + 1)}^{2}} (\frac{T^{k + 2 - i} | B_{i} |}{(k + 2 - i)!}) (\frac{T^{k + 2 - ℓ} | B_{ℓ} |}{(k + 2 - ℓ)!}) Λ_{k + 2} {(λ)}^{2} {(2 η)}^{2 k + 4} ⩽ \frac{R {(λ)}^{2} (k + 2 - i) (k + 2 - ℓ)}{{(k + 1)}^{2}} Λ_{k + 2} {(λ)}^{2} {(2 E η)}^{2 k + 4}

so that

E [(II) (II)] ⩽ \frac{R {(λ)}^{2}}{{(k + 1)}^{2}} Λ_{k + 2} {(λ)}^{2} {(2 E η)}^{2 k + 4} \sum_{i = 0, 2, \dots, k} \sum_{ℓ = 0, 2, \dots k} (k + 1 - i) (k + 2 - ℓ) ≲ k^{2} R {(λ)}^{2} Λ_{k + 2} {(λ)}^{2} {(2 E η)}^{2 k + 4} ⩽ k^{2} R {(λ)}^{2} Λ {(λ)}^{2} .

Finally we bound the cross term $2 E [(I) (II)] - 2 F_{λ} (0) H_{1} (λ)$ .

(I) (II) = \sum_{i = 0, 2, \dots, k} \sum_{j = 0}^{k + 1 - i} \sum_{ℓ = 0, 2, \dots, k} \frac{B_{i}}{j!} F_{λ}^{(i + j)} (0) η^{i} τ^{j} B_{ℓ} R_{ℓ} (λ, τ) η^{ℓ}

(E.2)

Since $| F_{λ}^{(i + j)} (0) | ⩽ Λ_{i + j} (λ)$ and $| B_{ℓ} R_{ℓ} (λ, τ) η^{ℓ} | ⩽ \frac{2^{k + 2} R (λ) | B_{ℓ} |}{(k + 1) (k + 1 - ℓ)!} Λ_{k + 2} (λ) η^{ℓ} τ^{k + 2 - ℓ}$ from (E.1), we have

| \frac{B_{i}}{j!} F_{λ}^{(i + j)} (0) η^{i} τ^{j} B_{ℓ} R_{ℓ} (λ, τ) η^{ℓ} | ⩽ \frac{2^{k + 2} R (λ) | B_{i} B_{ℓ} |}{(k + 1) j! (k + 1 - ℓ)!} Λ_{i + j} (λ) Λ_{k + 2} (λ) η^{i + ℓ} τ^{k + 2 + j - ℓ}

so that

E | \frac{B_{i}}{j!} F_{λ}^{(i + j)} (0) η^{i} τ^{j} B_{ℓ} R_{ℓ} (λ, τ) η^{ℓ} | ⩽ \frac{2^{k + 2} R (λ) T^{k + 2 + j - ℓ} | B_{i} B_{ℓ} |}{(k + 1) j! (k + 1 - ℓ)!} Λ_{i + j} (λ) Λ_{k + 2} (λ) η^{i + j + k + 2} = \frac{2^{k + 2} R (λ) (k + 2 - ℓ)}{(k + 1)} (\frac{T^{j} | B_{i} |}{j!}) (\frac{T^{k + 2 - ℓ} | B_{ℓ} |}{(k + 2 - ℓ)!}) Λ_{i + j} (λ) Λ_{k + 2} (λ) η^{i + j + k + 2} = \frac{R (λ) (k + 2 - ℓ)}{(k + 1)} [{(E η)}^{i + j} Λ_{i + j} (λ)] \cdot [{(2 E η)}^{k + 2} Λ_{k + 2} (λ)] .

The same bound holds for the terms of F_λ(0)H₁(λ), which arise from i = 0, j = 0 in (E.2), so that

2 E [(I) (II)] - 2 F_{λ} (0) H_{1} (λ) ≲ (\sum_{i = 0, 2, \dots, k} \sum_{j = 0}^{k + 1 - i} {(E η)}^{i + j} Λ_{i + j} (λ)) (\sum_{ℓ = 0, 2, \dots, k} \frac{R (λ) (k + 2 - ℓ)}{(k + 1)} {(2 E η)}^{k + 2} Λ_{k + 2} (λ)) ≲ (k \sum_{\tilde{i} = 0}^{k + 1} Λ_{\tilde{i}} (λ) {(E η)}^{\tilde{i}}) (k R (λ) {(2 E η)}^{k + 2} Λ_{k + 2} (λ)) ⩽ k^{2} R (λ) \sum_{\tilde{i} = 0}^{k + 1} Λ_{\tilde{i}} (λ) Λ_{k + 2} (λ) {(2 E η)}^{\tilde{i} + k + 2} ⩽ k^{2} R (λ) Λ {(λ)}^{2} .

Thus, $Var [G_{λ} (τ)] ≲ k^{2} R {(λ)}^{2} Λ {(λ)}^{2}$ and the lemma is proved. □

Lemma 4.2

Let the assumptions and notation of Lemma 4.1 hold, and let τ₁, . . . , τ_M be independent. Define

\tilde{L} (λ) : = \frac{1}{M} \sum_{j = 1}^{M} G_{λ} (τ_{j}) .

Then, with probability at least 1 − 1/t²,

| \tilde{L} (λ) - L (λ) | ≲ k R (λ) (Λ_{k + 2} (λ) {(2 E η)}^{k + 2} + \frac{t Λ (λ)}{\sqrt{M}}) .

Proof.

By Lemma 4.1 and the independence of the τ_j, we have

| L (λ) - E \tilde{L} (λ) | ≲ k R (λ) Λ_{k + 2} (λ) {(2 E η)}^{k + 2}

Var \tilde{L} (λ) ≲ \frac{1}{M} k^{2} Λ {(λ)}^{2}

so by Chebyshev’s inequality we can conclude that with probability at least 1 − 1/t, we have

| \tilde{L} (λ) - E [\tilde{L} (λ)] | ⩽ \frac{t k R (λ) Λ (λ)}{\sqrt{M}},

which gives

| L (λ) - \tilde{L} (λ) | ⩽ | L (λ) - E [\tilde{L} (λ)] | + | E [\tilde{L} (λ)] - \tilde{L} (λ) | ≲ k R (λ) Λ_{k + 2} (λ) {(2 E η)}^{k + 2} + \frac{t k R (λ) Λ (λ)}{\sqrt{M}} .

□

F. Supporting results: noisy dilation MRA

This appendix contains supporting results needed to prove Proposition 5.1, which defines a wavelet invariant estimator for noisy dilation MRA. Lemma 5.1 controls the additive noise error and Lemma 5.2 controls the cross-term error. Lemma F.1 guarantees that the dilation unbiasing procedure applied to the additive noise still has mean σ², which is needed to prove Lemma 5.1.

Lemma 5.1

Let the notation and assumptions of Proposition 5.1 hold, and let A_λ be the operator defined in (5.4). Then, with probability at least 1 − 1/t²,

{| \frac{1}{M} \sum_{j = 1}^{M} \frac{1}{2 π} \int | {\hat{ε}}_{j} (ω) |}^{2} A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω - σ^{2} ∣ ⩽ \frac{2 t \sqrt{k} Ψ σ^{2}}{\sqrt{M}} .

Proof.

Let

D (ε_{j}, λ) : = \frac{1}{2 π} \int {| \hat{ϵ_{j}} (ω) |}^{2} A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω .

By Lemma D.1, $E_{ε} [{| {\hat{ε}}_{j} (ω) |}^{2}] = σ^{2}$ , and we thus obtain

E_{ε} [D (ε_{j}, λ)] = E_{ε} [\frac{1}{2 π} \int {| \hat{ε_{j}} (ω) |}^{2} A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω] = E_{ε} [\frac{1}{2 π} \int {| {\hat{ε}}_{j} (ω) |}^{2} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω - \frac{1}{2 π} \int {| {\hat{ε}}_{j} (ω) |}^{2} B_{2} η^{2} λ^{2} \frac{d}{d λ^{2}} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω - \dots - \frac{1}{2 π} \int {| \hat{ε_{j}} (ω) |}^{2} B_{k} η^{k} λ^{k} \frac{d}{d λ^{k}} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω] = σ^{2} (\frac{1}{2 π} \int {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω - \frac{B_{2} η^{2}}{2 π} \int λ^{2} \frac{d}{d λ^{2}} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω - \dots - \frac{B_{k} η^{k}}{2 π} \int λ^{k} \frac{d}{d λ^{k}} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω) = σ^{2} (1 - 0 - \dots - 0) = σ^{2},

where we have used Lemma F.1 to conclude $\int λ^{m} (\frac{d^{m}}{d λ^{m}} {| {\hat{ψ}}_{λ} (ω) |}^{2}) d ω = 0$ m = 2, . . . , k. Also since ${(a_{1} + \dots + a_{n})}^{2} ⩽ n (a_{1}^{2} + \dots + a_{n}^{2})$ by the Cauchy–Schwarz inequality, we obtain

E_{ε} [D {(ε_{j}, λ)}^{2}] ⩽ E_{ε} [k \sum_{m = 0, 2, .., k} {(\frac{B_{m} η^{m}}{2 π} \int {| {\hat{ε}}_{j} (ω) |}^{2} λ^{m} \frac{d^{m}}{d λ^{m}} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω)}^{2}]

where we let $\frac{d}{d λ^{0}} {| {\hat{ψ}}_{λ} (ω) |}^{2}$ denote ${| {\hat{ψ}}_{λ} (ω) |}^{2}$ and B0 = 1. By Lemma D.1, we have $E_{ε} [{| ε_{j} (ω) |}^{2} {| ε_{j} (ξ) |}^{2}] ⩽ 3 σ^{4}$ for all frequencies ω, ξ, so that

E_{ε} [{(\frac{B_{m} η^{m}}{2 π} \int {| \hat{ε_{j}} (ω) |}^{2} λ^{m} \frac{d^{m}}{d λ^{m}} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω)}^{2}] ⩽ E_{ε} [{\frac{B_{m}^{2} η^{2 m}}{4 π^{2}} \iint {| {\hat{ε}}_{j} (ω) |}^{2} {| \hat{ε_{j}} (ξ) |}^{2} | λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} | \cdot | λ^{m} \frac{d^{m}}{d λ^{m}} {| {\hat{ψ}}_{λ} (ξ) |}^{2} ∣ d ω d ξ] ⩽ 3 σ^{4} {({\frac{1}{2 π} \int | B_{m} η^{m} λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω)}^{2} ⩽ 3 σ^{4} Ψ_{m}^{2} {(E η)}^{2 m},

where the last line follows from Corollary B.1 in Appendix B. We thus obtain

E_{ε} [D {(ε_{j}, λ)}^{2}] ⩽ k \sum_{m = 0, 2, .., k} E_{ε} [{(\frac{B_{m} η^{m}}{2 π} \int {| {\hat{ε}}_{j} (ω) |}^{2} λ^{m} \frac{d^{m}}{d λ^{m}} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω)}^{2}] ⩽ 3 k σ^{4} \sum_{m = 0, 2, .., k} Ψ_{m}^{2} {(E η)}^{2 m} : = (I)

so that

E_{ε} [D (ε_{j}, λ) - σ^{2}] = 0

{Var}_{ε} [D (ε_{j}, λ) - σ^{2}] = {Var}_{ε} [D (ε_{j}, λ)] ⩽ E_{ε} [{(D (ε_{j}, λ))}^{2}] ⩽ (I) .

Thus,

{Var}_{ε} (\frac{1}{M} \sum_{j = 1}^{M} D (ε_{j}, λ) - σ^{2}) ⩽ \frac{(I)}{M}

so that by Chebyshev’s inequality with probability at least 1 − 1/t²

| \frac{1}{M} \sum_{j = 1}^{M} D (ε_{j}, λ) - σ^{2} | ⩽ \frac{t \sqrt{(I)}}{\sqrt{M}} ⩽ t \sqrt{3 k} (\sum_{m = 0, 2, \dots, k} Ψ_{m} {(E η)}^{m}) \frac{σ^{2}}{\sqrt{M}} = 2 t \sqrt{k} Ψ \frac{σ^{2}}{\sqrt{M}} .

□

Lemma 5.2

Let the notation and assumptions of Proposition 5.1 hold, and let A_λ be the operator defined in (5.4). Then, with probability at least 1 − 1/t²,

{| \frac{1}{M} \sum_{j = 1}^{M} \frac{1}{2 π} \int ({\hat{f}}_{τ_{j}} (ω) {\hat{ε}}_{j} (ω) + {\hat{f}}_{τ_{j}} (ω) {\hat{ε}}_{j} (ω)) A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} d ω ∣ ≲ \frac{t}{\sqrt{M}} \sqrt{Ψ (Λ_{0} (λ) + Λ (λ))} σ .

Proof.

We have

\frac{1}{M} \sum_{j = 1}^{M} \frac{1}{2 π} \int ({\hat{f}}_{τ_{j}} (ω) {\hat{\hat{ε}}}_{j} (ω) + {\hat{f}}_{τ_{j}} (ω) {\hat{ε}}_{j} (ω)) A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω = \frac{1}{M} \sum_{j = 1}^{M} Y_{j} + {\bar{Y}}_{j}

where

Y_{j} : = \frac{1}{2 π} \int (\bar{\hat{f_{τ_{j}}}} (ω) {\hat{ε}}_{j} (ω)) A_{λ} {| {\hat{ψ}}_{λ} (ω) |}^{2} d ω .

The random variable Y_j has randomness depending on both ε_j and τ_j. Note that

E_{ε, τ} [Y_{j}] = E_{ε, τ} [E_{ε, τ} [Y_{j} ∣ τ_{j}]]

since Y_j is integrable. Thus, since $E_{ε, τ} [{\hat{ε}}_{j} (ω)] = 0$ , we obtain $E_{ε, τ} [Y_{j} ∣ τ_{j}] = 0$ , which yields $E_{ε, τ} [Y_{j}] = 0$ . We also have

{Var}_{ε, τ} [Y_{j}] = E_{ε, τ} [Y_{j}^{2}] ⩽ E_{ε, τ} [{({\frac{1}{2 π} \int | \bar{{\hat{f}}_{τ_{j}}} (ω) | \cdot | {\hat{ε}}_{j} (ω) | \cdot | A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω)}^{2}] ⩽ E_{ε, τ} [({\frac{1}{2 π} \int {| \bar{{\hat{f}}_{τ_{j}}} (ω) |}^{2} \cdot | A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω) ({\frac{1}{2 π} \int {| {\hat{ε}}_{j} (ω) |}^{2} \cdot | A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω)] = E_{τ} [{\frac{1}{2 π} \int {| \bar{{\hat{f}}_{τ_{j}}} (ω) |}^{2} \cdot | A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω] E_{ε} [{\frac{1}{2 π} \int {| {\hat{ε}}_{j} (ω) |}^{2} \cdot | A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω] .

Letting B₀ = 1 and applying Lemma B.2, we have

E_{τ} [{\frac{1}{2 π} \int {| \bar{{\hat{f}}_{τ_{j}}} (ω) |}^{2} \cdot | A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} ∣ d ω] ⩽ E_{τ} [{\sum_{m = 0, 2, \dots, k} \frac{1}{2 π} \int {| {\hat{f}}_{τ_{j}} (ω) |}^{2} \cdot | B_{m} η^{m} λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} | d ω] ⩽ E_{τ} [\sum_{m = 0, 2, \dots, k} {(E η)}^{m} ({‖ f_{τ_{j}} ‖}_{1}^{2} Ψ_{m} \land \frac{{‖ f_{τ_{j}}^{'} ‖}_{1}^{2} Θ_{m}}{λ^{2}})] ⩽ \sum_{m = 0, 2, \dots, k} {(E η)}^{m} (‖ f ‖_{1}^{2} Ψ_{m} \land \frac{4 {‖ f^{'} ‖}_{1}^{2} Θ_{m}}{λ^{2}}) ⩽ 4 \sum_{m = 0, 2, \dots, k} {(E η)}^{m} Λ_{m} (λ) ≲ Λ_{0} (λ) + Λ (λ)

since ${‖ τ_{j} ‖}_{\infty} ⩽ \frac{1}{2}$ guarantees ${‖ f_{τ_{j}}^{'} ‖}_{1} = \frac{1}{1 - τ_{j}} {‖ f^{'} ‖}_{1} ⩽ 2 {‖ f^{'} ‖}_{1}$ . Also,

E_{ε} [{\frac{1}{2 π} \int {| {\hat{ε}}_{j} (ω) |}^{2} \cdot | A_{λ} | {\hat{ψ}}_{λ} (ω) |}^{2} | d ω] ⩽ E_{ε} [{\sum_{m = 0, 2, \dots, k} \frac{1}{2 π} \int {| {\hat{ε}}_{j} (ω) |}^{2} \cdot | B_{m} η^{m} λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} | d ω] = σ^{2} ({\sum_{m = 0, 2, \dots, k} \frac{1}{2 π} \int | B_{m} η^{m} λ^{m} \frac{d^{m}}{d λ^{m}} | {\hat{ψ}}_{λ} (ω) |}^{2} | d ω) ⩽ σ^{2} \sum_{m = 0, 2, \dots, k} {(E η)}^{m} Ψ_{m} = σ^{2} Ψ

where the second line follows from Lemma D.1 in Appendix D and the next to last line from Corollary B.1 in Appendix B. We thus have

E_{ε, τ} [Y_{j}] = 0

{Var}_{ε, τ} [Y_{j}] ≲ σ^{2} Ψ (Λ_{0} (λ) + Λ (λ))

and an identical argument can be applied to the $\bar{Y_{j}}$ so that by Chebyshev’s inequality with probability at least 1 − 1/t²

| \frac{1}{M} \sum_{j = 1}^{M} Y_{j} + {\bar{Y}}_{j} | ⩽ | \frac{1}{M} \sum_{j = 1}^{M} Y_{j} | + | \frac{1}{M} \sum_{j = 1}^{M} \bar{Y_{j}} | ≲ t \sqrt{Ψ} \sqrt{Λ_{0} (λ) + Λ (λ)} \frac{σ}{\sqrt{M}} .

□

Lemma F.1

Assume ψ is k-admissible. Then,

\int λ^{m} (\frac{d^{m}}{d λ^{m}} {| {\hat{ψ}}_{λ} (ω) |}^{2}) d ω = 0

(F.1)

for all 1 ⩽ m ⩽ k.

Proof.

We recall that since ψ is k-admissible, ${| {\hat{ψ}}_{λ} (ω) |}^{2} \in C^{k} (ℝ)$ , and to simplify notation we let $g = | \hat{ψ} |^{2}$ and

g_{λ} (ω) = \frac{1}{λ} g (\frac{ω}{λ}) = {| {\hat{ψ}}_{λ} (ω) |}^{2} .

We first establish that

λ^{k} (\frac{d}{d λ^{k}} g_{λ} (ω)) = \frac{d}{d ω} (- ω λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω)) - (k - 1) λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω) .

(F.2)

The proof is by induction. When k = 1, we obtain

LHS of Eqn . (F . 2) = λ \frac{d}{d λ} (\frac{1}{λ} g (\frac{ω}{λ})) = - \frac{ω}{λ^{2}} g^{'} (\frac{ω}{λ}) - \frac{1}{λ} g (\frac{ω}{λ}) = - ω g_{λ}^{'} (ω) - g_{λ} (ω)

and

RHS of Eqn . (F .2) = \frac{d}{d ω} (- ω g_{λ} (ω)) = - ω g_{λ}^{'} (ω) - g_{λ} (ω),

so the base case is established. We now assume that Equation (F.2) holds and show it also holds for k+1 replacing k. By the inductive hypothesis

\frac{d}{d λ^{k}} g_{λ} (ω) = \frac{d}{d ω} (- ω λ^{- 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω)) - (k - 1) λ^{- 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω) \frac{d}{d λ^{k + 1}} g_{λ} (ω) = \frac{d}{d ω} (- ω λ^{- 1} \frac{d}{d λ^{k}} g_{λ} (ω) + \frac{d}{d λ^{k - 1}} g_{λ} (ω) ω λ^{- 2}) - (k - 1) (λ^{- 1} \frac{d}{d λ^{k}} g_{λ} (ω) + \frac{d}{d λ^{k - 1}} g_{λ} (ω) (- λ^{- 2})) = \frac{d}{d ω} (- ω λ^{- 1} \frac{d}{d λ^{k}} g_{λ} (ω)) - (k - 1) λ^{- 1} \frac{d}{d λ^{k}} g_{λ} (ω) + \underset{= - λ^{- 1} \frac{d}{d λ^{k}} g_{λ} (ω) by inductive hypothesis}{\underset{︸}{\frac{d}{d ω} (ω λ^{- 2} \frac{d}{d λ^{k - 1}} g_{λ} (ω)) + (k - 1) λ^{- 2} \frac{d}{d λ^{k - 1}} g_{λ} (ω)}} = \frac{d}{d ω} (- ω λ^{- 1} \frac{d}{d λ^{k}} g_{λ} (ω)) - k λ^{- 1} \frac{d}{d λ^{k}} g_{λ} (ω)

so that

λ^{k + 1} \frac{d}{d λ^{k + 1}} g_{λ} (ω) = \frac{d}{d ω} (- ω λ^{k} \frac{d}{d λ^{k}} g_{λ} (ω)) - k λ^{k} \frac{d}{d λ^{k}} g_{λ} (ω) .

Thus, (F.2) is established. We now use integration by parts to show (F.2) implies (F.1) in the Lemma. The proof of (F.1) is once again by induction. When k = 1, we have already shown

λ (\frac{d}{d λ} g_{λ} (ω)) = - ω g_{λ}^{'} (ω) - g_{λ} (ω) .

(F.3)

Integration by parts gives

\int ω g_{λ}^{'} (ω) d ω = {(ω g_{λ} (ω)) |}_{- \infty}^{\infty} - \int g_{λ} (ω) d ω = \int g_{λ} (ω) d ω .

Note ωg_λ(ω) vanishes at ±∞ since $g \in L^{1} (ℝ)$ guarantees $g_{λ} \in L^{1} (ℝ)$ , and thus g_λ must decay faster that ω⁻¹. Utilizing (F.3),

\int ω g_{λ}^{'} (ω) - g_{λ} (ω) d ω = 0 \Rightarrow \int λ (\frac{d}{d λ} g_{λ} (ω)) d ω = 0

and the base case is established. We now assume

\int λ^{k - 1} (\frac{d}{d λ^{k - 1}} g_{λ} (ω)) d ω = 0.

By integrating Equation (F.2), we obtain

\int λ^{k} (\frac{d}{d λ^{k}} g_{λ} (ω)) d ω = \int \frac{d}{d ω} (- ω λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω)) d ω - (k - 1) \underset{= 0 by induc . hypo .}{\underset{︸}{\int λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω) d ω}} = \int - ω \frac{d}{d ω} (λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω)) d ω - \underset{= 0 by induc . hypo .}{\underset{︸}{\int λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω) d ω}} = - {ω λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω) |}_{- \infty}^{\infty} + \underset{= 0 by induc . hypo .}{\underset{︸}{\int λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω) d ω}} = 0.

We are guaranteed $- ω λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω)$ vanishes at ±∞ since in the proof of Lemma 4.3 we showed $λ^{k - 1} \frac{d}{d λ^{k - 1}} g_{λ} (ω) = \sum_{j = 0}^{k - 1} C_{j} ω^{j} g_{λ}^{(j)} (ω)$ , and $ω^{j} g_{λ}^{(j)} \in L^{1} (ℝ)$ implies $ω^{j + 1} g_{λ}^{(j)}$ vanishes at ±∞. □

G. Moment estimation for noisy dilation MRA

In this appendix we outline a moment estimation procedure for noisy dilation MRA (Model 2) in the special case t = 0, i.e. signals are randomly dilated and subjected to additive noise but are not translated. This procedure is a generalization of the method presented in Section 6.3.

Given the additive noise level, the moments of the dilation distribution τ can be empirically estimated from the mean and variance of the random variables β_m(y_j) defined by

β_{m} (y_{j}) = \int_{0}^{2^{ℓ} π} ω^{m} {\hat{y}}_{j} (ω) d ω

(G.1)

for integer m ⩾ 0. To account for the effect of additive noise on the above random variables, we define

g_{m} (ℓ, σ) = \int_{0}^{2^{ℓ} π} \int_{0}^{2^{ℓ} π} \frac{2 σ^{2} ξ^{m} ω^{m} \sin (\frac{1}{2} (ξ - ω))}{(ξ - ω)} d ω d ξ

(G.2)

and an order m additive noise adjusted squared coefficient of variation by

C V_{m} : = \frac{Var [β_{m} (y_{j})] - g_{m} (ℓ, σ)}{{| E [β_{m} (y_{j})] |}^{2}} .

(G.3)

Remark G.1

If the noisy signals are supported in $[- \frac{N}{2}, \frac{N}{2}]$ instead of $[- \frac{1}{2}, \frac{1}{2}]$ , (G.2) is replaced with

g_{m} (N, ℓ, σ) = \int_{0}^{2^{ℓ} π} \int_{0}^{2^{ℓ} π} \frac{2 σ^{2} ξ^{m} ω^{m} \sin (\frac{N}{2} (ξ - ω))}{(ξ - ω)} d ω d ξ .

The following proposition mirrors Proposition 6.1 for dilation MRA; its proof appears at the end of Appendix G.

Proposition G.1

Assume Model 2 with t = 0 and CV₀, CV₁ defined by (G.1), (G.2) and (G.3). Then,

C V_{0} = η^{2} + (3 C_{4} - 3) η^{4} + O (η^{6})

C V_{1} = 4 η^{2} + (25 C_{4} - 33) η^{4} + O (η^{6}) .

Once again we cannot compute CV_m exactly, but by replacing Var, $E$ with their finite sample estimators, we obtain approximations ${\tilde{C V}}_{m}$ that can be used to define estimators of the dilation moments.

Definition G.1

Assume Model 3 with t = 0 and ${\tilde{C V}}_{0}$ , ${\tilde{C V}}_{1}$ the empirical counterparts of (G.3). Define the second-order estimator of η² by ${\tilde{η}}^{2} = {\tilde{C V}}_{0}$ . Define the fourth-order estimators of (η², C₄η⁴) by the unique positive solution ( ${\tilde{η}}^{2}$ , ${\tilde{C}}_{4}$ ) of

{\tilde{C V}}_{0} = η^{2} + (3 C_{4} - 3) η^{4}

{\tilde{C V}}_{1} = 4 η^{2} + (25 C_{4} - 33) η^{4} .

As M → ∞, the second-order moment estimator is accurate up to O(η⁴) and the fourth-order moment estimators are accurate up to O(η⁶). However, in the finite sample regime, the g_m(ℓ, σ) appearing in (G.3) will be replaced with $g_{m} (ℓ, σ) \pm O (σ^{2} / \sqrt{M})$ , so that the estimators given in Definition G.1 are subject to an error of order $O (σ^{2} / \sqrt{M})$ . More generally, the additive noise fluctuations imply that to estimate the first k/2 even moments of τ up to an O(η^k+1) error will require $σ^{2} / \sqrt{M} ⩽ η^{k + 1}$ , or M ⩾ σ⁴η^−2(k+1).

Having established an empirical moment estimation procedure for noisy dilation MRA when t = 0, we repeat the simulations of Section 5.2 on the restricted model, but estimate the additive and dilation moments empirically. Since accurately estimating the moments of τ is difficult for σ large, we make three modifications to the oracle set-up. First, we lower the additive noise level by a factor of 2 from the oracle simulations, and consider all parameter combinations resulting from σ = 2⁻⁵, 2⁻⁴ (giving SNR = 9.0, 2.2) and η = 0.06, 0.12. Secondly, we take M substantially larger than for the oracle simulations, with 16, 384 ⩽ M ⩽ 370, 727. Thirdly, we compute WSC k = 4 only for large dilations. For large dilations (η², C₄η⁴) are approximated with fourth-order estimators, while for small dilations η² is approximated with a second-order estimator (see Definition G.1).

Results are shown in Fig. G7, and the same overall behavior observed in the oracle simulations for large M holds. The additive noise level was estimated empirically as described in Section 6.2. For the medium- and high-frequency signal, WSC k = 2 has substantially smaller error than both PS k = 0 and WSC k = 0; for the large-frequency signal, the error is decreased by at least a factor of 2 for large dilations and a factor of 4 for small dilations relative to both zero order estimators. When WSC k = 4 is defined, it has a smaller error than WSC k = 2 for the high-frequency signal, while WSC k = 2 is preferable for the low- and medium-frequency signal. We observe that for the oracle simulations WSC k = 4 is preferable for all frequencies, so this is most likely due to error in the moment estimation degrading the WSC k = 4 estimator. For the low-frequency signal, PS k = 0 once again achieves the smallest error for small dilations, while for large dilations the higher order wavelet methods appear to surpass PS k = 0 for M large enough.

Proof of Proposition G.1.

Since $y_{j} = L_{τ_{j}} f + ε_{j}$ , we have

E [β_{m} (y_{j})] = E [\int_{0}^{2^{ℓ} π} ω^{m} ({\hat{f}}_{τ_{j}} (ω) + {\hat{ε}}_{j} (ω)) d ω] = E [\int_{0}^{2^{ℓ} π} ω^{m} {\hat{f}}_{τ_{j}} (ω) d ω] = E [\int_{0}^{2^{ℓ} π} ω^{m} \hat{f} ((1 - τ_{j}) ω) d ω] = E [\int_{0}^{2^{2} π (1 - τ_{j})} \frac{ξ^{m}}{{(1 - τ_{j})}^{m}} \hat{f} (ξ) \frac{d ξ}{(1 - τ_{j})}] = β_{m} (f) E [{(1 - τ_{j})}^{- (m + 1)}] .

We now compute the variance. We first establish that

g_{m} (ℓ, σ) = E [(\int_{0}^{2^{ℓ} π} ω^{m} {\hat{ε}}_{j} (ω) d ω) (\int_{0}^{2^{ℓ} π} ω^{m} \bar{{\hat{ε}}_{j} (ω)} d ω)] .

By Thm 4.5 of [49]

E [{\hat{ε}}_{j} (ω) \bar{{\hat{ε}}_{j} (ξ)}] = E [(\int_{- 1 / 2}^{1 / 2} e^{- i ω t} d B_{t}) (\int_{- 1 / 2}^{1 / 2} e^{i ξ t} d B_{t})] = σ^{2} \int_{- 1 / 2}^{1 / 2} e^{i (ξ - ω) t} d t = \frac{2 σ^{2} \sin (\frac{1}{2} (ξ - ω))}{(ξ - ω)}

so that

E [(\int_{0}^{2^{ℓ} π} {\hat{ε}}_{j} (ω) d ω) (\int_{0}^{2^{ℓ} π} \bar{{\hat{ε}}_{j} (ω)} d ω)] = \int_{0}^{2^{ℓ} π} \int_{0}^{2^{ℓ} π} ω^{m} ξ^{m} E [{\hat{ε}}_{j} (ω) \bar{{\hat{ε}}_{j} (ξ)}] d ω d ξ = \int_{0}^{2^{ℓ} π} \int_{0}^{2^{ℓ} π} ω^{m} ξ^{m} \frac{2 σ^{2} \sin (\frac{1}{2} (ξ - ω))}{(ξ - ω)} d ω d ξ = g_{m} (ℓ, σ) .

We thus obtain

[{| β_{m} (y_{j}) |}^{2}] = E [(\int_{0}^{2^{ℓ} π} ω^{m} ({\hat{f}}_{τ_{j}} (ω) + {\hat{ε}}_{j} (ω)) d ω) (\int_{0}^{2^{ℓ} π} ω^{m} (\bar{{\hat{f}}_{τ_{j}} (ω)} + \bar{{\hat{ε}}_{j} (ω)}) d ω)] = E [(\int_{0}^{2^{ℓ} π} ω^{m} \hat{f} ((1 - τ_{j}) ω) d ω) (\int_{0}^{2^{ℓ} π} ω^{m} \bar{\hat{f} ((1 - τ_{j}) ω)} d ω) + (\int_{0}^{2^{ℓ} π} ω^{m} {\hat{ε}}_{j} (ω) d ω) (\int_{0}^{2^{ℓ} π} ω^{m} \bar{{\hat{ε}}_{j} (ω)} d ω)] = E [{(1 - τ_{j})}^{- 2 (m + 1)} β_{m} (f) \bar{β_{m} (f)}] + g_{m} (ℓ, σ) = {| β_{m} (f) |}^{2} E [{(1 - τ_{j})}^{- 2 (m + 1)}] + g_{m} (ℓ, σ) .

Thus,

Var [β_{m} (y_{j})] - g_{m} (ℓ, σ) = E [{| β_{m} (y_{j}) |}^{2}] - g_{m} (ℓ, σ) - {| E [β_{m} (y_{j})] |}^{2} = {| β_{m} (f) |}^{2} E [{(1 - τ_{j})}^{- 2 (m + 1)}] - {| β_{m} (f) |}^{2} {(E [{(1 - τ_{j})}^{- (m + 1)}])}^{2} .

Dividing by ${| E [β_{m} (y_{j})] |}^{2}$ gives

C V_{m} = \frac{E [{(1 - τ_{j})}^{- 2 (m + 1)}]}{{(E [{(1 - τ_{j})}^{- (m + 1)}])}^{2}} - 1,

and the remainder of the proof is identical to the proof of Proposition 6.1. □

H. Additional simulations for noisy dilation MRA

We investigate the L² error of estimating the power spectrum using PS (k = 0) and WSC (k = 0, 2, 4) for three additional high-frequency functions:

f_{4} (x) = 1.175 \cos (32 x) \cdot 1 (x \in [- 0.2, 0.2])

f_{5} (x) = 0.299 \exp^{- 0.04 x^{2}} \cos (30 x + 1.5 x^{2})

f_{6} (x) = (2.304 / π) \cos (35 x) sinc (3 x) .

The multiplicative constants were chosen so that the L² norms of f₄, f₅, f₆ are comparable with the L² norms of the Gabor signals f₁, f₂, f₃ defined in Section 4.4. The signal f₄ is not continuous and has compact support, with a slowly decaying, oscillating Fourier transform given by ${\hat{f}}_{4} (ω) / 0.47 = sinc (0.2 (ω - 32)) + sinc (0.2 (- ω - 32))$ . The signal f₅ is a linear chirp with a constantly varying instantaneous frequency. The signal f₆ is slowly decaying in space, with a discontinuous Fourier transform of compact support given by ${\hat{f}}_{6} (ω) / 0.384 = 1 (ω \in [- 38, - 32]) + 1 (ω \in [32, 38])$ .

Implementation details were as described in Section 6, and simulations were run with oracle moment estimation on the full model (parameter values as described in Section 5.2). Figure H8 shows the L² error. As for the high-frequency Gabor in Section 5.2, WSC (k = 2) and WSC (k = 4) significantly outperformed the zero order estimators. In addition for large dilations, the WSC (k = 4) outperformed WSC (k = 2) on f₄ and f₆.

Fig. H8. — L² error with standard error bars for noisy dilation MRA model (oracle moment estimation). First, second and third columns show results for f₄, f₅ and f₆. All plots for the same signal have the same axis limits.

I. Expectation maximization algorithm for noisy dilation MRA

In this appendix we discuss how the expectation-maximization (EM) algorithm proposed in [1] can be extended to solve noisy dilation MRA. We first summarize the EM framework, which differentiates between observed data $y = {y_{j}}_{j = 1}^{M}$ , latent variables $s = {s_{j}}_{j = 1}^{M}$ and model parameters x. The goal is to produce the x that maximizes the marginalized likelihood function

p (y | x) = \int p (y, s | x) d s .

Maximizing p(y|x) directly is generally not tenable because enumerating the various values for s is too costly. However, EM algorithms can be used to find local maxima of the above function, by iterating between estimating the conditional distribution of latent variables given the current estimate of parameters (E-step) and estimating parameters given the current estimate of the conditional distribution of latent variables (M-step). Specifically, the iterative procedure updates x^k, the current estimate of x, by

Q (x | x^{k}) = E_{s | y, x^{k}} [\log p (y, s | x)] E-step

(I.1)

x^{k + 1} = \arg \max_{x} Q (x | x^{k}) M-step .

(I.2)

Since (under certain conditions) log p(y|x) improves at least as much as Q at each iteration [30], the algorithm converges to a local maximum of p(y|x). This framework can be applied to noisy dilation MRA, and explicit formulas for both the E-step and M-step can be derived. Assume for simplicity that signals have been discretized to have length n and that the translation distribution ρ_t and dilation distribution ρ_τ are unknown and also discrete with n possible values ${t^{ℓ}}_{ℓ = 1}^{n}$ and ${τ^{q}}_{q = 1}^{n}$ , respectively. Letting x = (f, ρ_t, ρ_τ) denote the parameters, sj = (t^j, τ^j) denote the latent/nuisance variables, and px denote conditioning on x, the likelihood function has form

p (y, s | x) = p_{x} (y | s) p_{x} (s) = \prod_{j = 1}^{M} \frac{1}{{(2 π σ^{2})}^{\frac{n}{2}}} \exp (- \frac{1}{2 σ^{2}} {‖ L_{τ_{j}} T_{t_{j}} f - y_{j} ‖}_{2}^{2}) ρ_{t} (t_{j}) ρ_{τ} (τ_{j}) .

Thus (up to a constant), the log likelihood has form

\log p (y, s | x) = \sum_{j = 1}^{M} - \frac{1}{2 σ^{2}} {‖ L_{τ_{j}} T_{t_{j}} f - y_{j} ‖}_{2}^{2} + \sum_{j = 1}^{M} \log ρ_{t} (t_{j}) + \sum_{j = 1}^{M} \log ρ_{τ} (τ_{j}) .

(I.3)

Given the current estimate $x^{k} = (f^{k}, ρ_{t}^{k}, ρ_{τ}^{k})$ of parameters, the E-step is performed by first computing the conditional distribution of the latent variables

w_{k}^{ℓ, q, j} = ℙ (t_{j} = t^{ℓ}, τ_{j} = τ^{q} | x^{k}) = C_{k}^{j} \exp (- \frac{1}{2 σ^{2}} {‖ L_{τ_{j}} T_{t_{j}} f^{k} - y_{j} ‖}_{2}^{2}) ρ_{t}^{k} (t^{ℓ}) ρ_{τ}^{k} (τ^{q}),

(I.4)

where $C_{k}^{j}$ is a normalizing constant so that $\sum_{ℓ, q} w_{k}^{ℓ, q, j} = 1$ . These weights are then used to compute Q, that is, by combining (I1), (I3) and (I4):

Q (f, ρ_{t}, ρ_{τ} | f^{k}, ρ_{t}^{k}, ρ_{τ}^{k}) = \sum_{j = 1}^{M} \sum_{ℓ = 1}^{n} \sum_{q = 1}^{n} w_{k}^{ℓ, q, j} (- \frac{1}{2 σ^{2}} {‖ L_{τ_{j}} T_{t_{j}} f - y_{j} ‖}_{2}^{2} + \log ρ_{t} (t^{ℓ}) + \log ρ_{τ} (τ^{q})),

(I.5)

up to a constant. The M-step is then computed by

(f^{k + 1}, ρ_{t}^{k + 1}, ρ_{τ}^{k + 1}) = \arg \max_{f, ρ_{t}, ρ_{τ}} Q (f, ρ_{t}, ρ_{τ} | f^{k}, ρ_{t}^{k}, ρ_{τ}^{k}) .

(I.6)

Since f, ρ_t, ρ_τ all appear in distinct sums in (I5), performing the maximization in (I6) is straightforward. Since ${‖ L_{τ_{j}} T_{t_{j}} f - y_{i} ‖}_{2}^{2} = \frac{1}{1 - τ_{q}} f - T_{l}^{- 1} L_{τ}^{- 1} y_{j} ‖_{2}^{2},$ it is easy to check that

{‖ L_{τ_{j}} T_{t_{j}} f - y_{j} ‖}_{2}^{2} = \frac{1}{1 - τ_{q}} f - T_{ℓ}^{- 1} L_{τ}^{- 1} y_{j} ‖_{2}^{2}

f^{k + 1} = \frac{1}{C} \sum_{j = 1}^{M} \sum_{ℓ = 1}^{n} \sum_{q = 1}^{n} \frac{w_{k}^{ℓ, q, j}}{(1 - τ_{q})} T_{ℓ}^{- 1} L_{τ}^{- 1} y_{j}, C = \sum_{j = 1}^{M} \sum_{ℓ = 1}^{n} \sum_{q = 1}^{n} \frac{w_{k}^{ℓ, q, j}}{(1 - τ_{q})} .

(I.7)

Using Lemma 15 in [1], one can also obtain closed form expressions for the updates to $ρ_{t}^{k}$ , $ρ_{τ}^{k}$ :

ρ_{t}^{k + 1} (t^{ℓ}) = \frac{{\tilde{w}}_{k}^{ℓ}}{\sum_{ℓ^{'}} {\tilde{w}}_{k}^{ℓ^{'}}} for {\tilde{w}}_{k}^{ℓ} = \sum_{j} \sum_{q} w_{k}^{ℓ, q, j}, ρ_{τ}^{k + 1} (τ^{q}) = \frac{{\tilde{v}}_{k}^{q}}{\sum_{q^{'}} {\tilde{v}}_{k}^{q^{q}}} for {\tilde{v}}_{k}^{q} = \sum_{j} \sum_{ℓ} w_{k}^{ℓ, q, j} .

Note when a discrete signal defined on some fixed grid is dilated, its dilation is defined on a different grid. Thus, computing (I.4) and (I.7) will involve off-grid interpolation, a subtlety not arising in classic MRA, and this interpolation may contribute additional error. We also note that one can always force the translation distribution to be uniform by retranslating the signals uniformly, and in this case all sums over ℓ in this section could be eliminated. This would improve the computational complexity of the algorithm but may be disadvantageous in terms of sample complexity, as in classic MRA a uniform translation distribution requires a larger sample size for accurate estimation than an aperiodic translation distribution [1].

J. Supporting results: stochastic calculus

This appendix contains several stochastic calculus results that are used to control the statistics of the additive noise. Proposition J.1 is a simple generalization of Thm 4.5 of [49]. Proposition J.2 controls the second moment of the stochastic quantity in Proposition J.1 and is in fact a special case of Proposition J.3. Both Propositions J.2 and J.3 are proved with standard techniques from stochastic calculus, and for brevity we omit the proofs.

Proposition J.1

Assume $\int_{0}^{T} f {(t)}^{2} d t < \infty$ , $\int_{0}^{T} {\bar{f (t)}}^{2} d t < \propto$ , and let B_t be a Brownian motion with variance σ². Then,

E [(\int_{0}^{T} f (t) d B_{t}) (\int_{0}^{T} \bar{f (t)} d B_{t})] = σ^{2} \int_{0}^{T} f (t) \bar{f (t)} d t .

Proposition J.2

Let f(t) be a bounded and continuous complex deterministic function on [0, T], and let B_t be a Brownian motion with variance σ². Then, for a fixed non-random time T, we have

E [{(\int_{0}^{T} f (t) d B_{t})}^{2} {(\int_{0}^{T} \bar{f (t)} d B_{t})}^{2}] = 2 σ^{4} {(\int_{0}^{T} | f (t) |^{2} d t)}^{2} + σ^{4} (\int_{0}^{T} f {(t)}^{2} d t) (\int_{0}^{T} {\bar{f (t)}}^{2} d t) .

Corollary J.1

When f(t) is real, the above reduces to

E [{(\int_{0}^{T} f (t) d B_{t})}^{4}] = 3 σ^{4} {(\int_{0}^{T} f {(t)}^{2} d t)}^{2} .

Proposition J.3

Let f(t), g(t) be bounded and continuous complex deterministic functions on [0, T], and let B_t be a Brownian motion with variance σ². Then, for a fixed non-random time T, we have

E [(\int_{0}^{T} f (t) d B_{t}) (\int_{0}^{T} \bar{f (t)} d B_{t}) (\int_{0}^{T} g (t) d B_{t}) (\int_{0}^{T} \bar{g (t)} d B_{t})] = σ^{4} [(\int_{0}^{T} f (t) g (t) d t) (\int_{0}^{T} \bar{f (t) g (t)} d t) + (\int_{0}^{T} f (t) \bar{g (t)} d t) (\int_{0}^{T} \bar{f (t)} g (t) d t) + (\int_{0}^{T} | f (t) |^{2} d t) (\int_{0}^{T} | g (t) |^{2} d t)] .

Contributor Information

Matthew Hirn, Department of Computational Mathematics, Science and Engineering, Department of Mathematics and Center for Quantum Computing, Science and Engineering, Michigan State University, East Lansing, MI 48824.

Anna Little, Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824.

References

1.Abbe E, Bendory T, Leeb W, Pereira JM, Sharon N & Singer A (2018) Multireference alignment is easier with an aperiodic translation distribution. IEEE Trans. Inf. Theory, 65, 3565–3584. [Google Scholar]
2.Aizenbud Y, Landa B & Shkolnisky Y (2019) Rank-one multi-reference factor analysis. arXiv preprint arXiv:1905.12442. [Google Scholar]
3.Bai X. c., Rajendra E, Yang G, Shi Y & Scheres SHW (2015) Sampling the conformational space of the catalytic subunit of human γ-secretase. eLife, 4, e11182. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Bandeira A, Chen Y, Lederman RR & Singer A (2020) Non-unique games over compact groups and orientation estimation in cryo-EM. Inverse Probl. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Bandeira A, Rigollet P & Weed J (2017) Optimal rates of estimation for multi-reference alignment. arXiv preprint at arXiv:1702.08546. [Google Scholar]
6.Bandeira AS (2015) Synchronization problems and alignment. Topics in Mathematics of Data Science Lecture Notes. Cambridge, MA: Massachusetts Institute of Technology. [Google Scholar]
7.Bandeira AS, Blum-Smith B, Kileel J, Perry A, Weed J & Wein AS (2017) Estimation under group actions: recovering orbits from invariants. arXiv preprint arXiv:1712.10163. [Google Scholar]
8.Bandeira AS, Boumal N & Singer A (2017) Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Math. Programming, 163, 145–167. [Google Scholar]
9.Bandeira AS, Boumal N & Voroninski V (2016) On the low-rank approach for semidefinite programs arising in synchronization and community detection. Conf. Learn. Theory, 49, 361–382. [Google Scholar]
10.Bandeira AS, Charikar M, Singer A & Zhu A (2014) Multireference alignment using semidefinite programming. Proceedings of the 5th Conference on Innovations in Theoretical Computer Science. ACM, pp. 459–470. [Google Scholar]
11.Bartesaghi A, Merk A, Banerjee S, Matthies D, Wu X, Milne JLS & Subramaniam S (2015) 2.2 åresolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor. Science, 348, 1147–1151. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Bendory T, Bartesaghi A & Singer A (March 2020) Single-particle cryo-electron microscopy: mathematical theory, computational challenges, and opportunities. IEEE Signal Process. Mag, 37, 58–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Bendory T, Boumal N, Leeb W, Levin E & Singer A (2019) Multi-target detection with application to cryo-electron microscopy. Inverse Probl. [Google Scholar]
14.Bendory T, Boumal N, Ma C, Zhao Z & Singer A (2017) Bispectrum inversion with application to multireference alignment. IEEE Trans. Signal Process, 66, 1037–1050. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Boumal N (2016) Nonconvex phase synchronization. SIAM J. Optim, 26, 2355–2377. [Google Scholar]
16.Boumal N, Bendory T, Lederman RR & Singer A (2018) Heterogeneous multireference alignment: a single pass approach. 2018 52nd Annual Conference on Information Sciences and Systems (CISS). IEEE, pp. 1–6. [Google Scholar]
17.Bowman GD & Poirier MG (2015) Post-translational modifications of histones that influence nucleosome dynamics. Chem. Rev, 115, 2274–2295. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Brown LG (1992) A survey of image registration techniques. ACM Computing Surv. (CSUR), 24, 325–376. [Google Scholar]
19.Bruna J & Mallat S (2018) Multiscale sparse microcanonical models. Math. Stat. Learn, 1, 257–315. [Google Scholar]
20.Buescu J & Paixão AC (2007) Eigenvalue distribution of positive definite kernels on unbounded domains. Integral Equ. Oper. Theory, 57, 19–41. [Google Scholar]
21.Buescu J, Paixao AC, Garcia F & Lourtie I (2004) Positive-definiteness, integral equations and fourier transforms. J. Integral Equ. Appl, 16, 33–52. [Google Scholar]
22.Capodiferro L, Cusani R, Jacovitti G & Vascotto M (1987) A correlation based technique for shift, scale, and rotation independent object identification. ICASSP’87: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 12. IEEE, pp. 221–224. [Google Scholar]
23.Chandran V & Elgar SL (1992) Position, rotation, and scale invariant recognition of images using higher-order spectra. ICASSP’92: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5. IEEE, pp. 213–216. [Google Scholar]
24.Chen Y & Candès EJ (2018) The projected power method: an efficient algorithm for joint alignment from pairwise differences. Comm. Pure Appl. Math, 71, 1648–1714. [Google Scholar]
25.Chen Y, Guibas LJ & Huang Q-X (2014) Near-optimal joint object matching via convex relaxation. Proceedings of the 31st International Conference on Machine Learning, vol. 32 of Proceedings of Machine Learning Research, pp. 100–108. [Google Scholar]
26.Cheng C, Jiang J & Sun Q (2017) Phaseless sampling and reconstruction of real-valued signals in shift-invariant spaces. J. Fourier Anal. Appl, 1–34. [Google Scholar]
27.Clerc M & Mallat S (2002) The texture gradient equation for recovering shape from texture. IEEE Trans. Pattern Anal. Mach. Intell, 24, 536–549. [Google Scholar]
28.Clerc M & Mallat S (2003) Estimating deformations of stationary processes. Ann. Stat, 31, 1772–1821. [Google Scholar]
29.Collis WB, White PR & Hammond JK (1998) Higher-order spectra: the bispectrum and trispectrum. Mech. Syst. Signal Process, 12, 375–394. [Google Scholar]
30.Dempster AP, Laird NM & Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J. Royal Stat. Soc. Ser. B (Methodological), 39, 1–22. [Google Scholar]
31.DesJarlais R & Tummino PJ (2016) Role of histone-modifying enzymes and their complexes in regulation of chromatin biology. Biochemistry, 55, 1584–1599. [DOI] [PubMed] [Google Scholar]
32.Diamond R (1992) On the multiple simultaneous superposition of molecular structures by rigid body transformations. Protein Sci, 1, 1279–1287. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Dvornek NC, Sigworth FJ & Tagare HD (2015) Subspaceem: a fast maximum-a-posteriori algorithm for cryo-EM single particle reconstruction. J. Struct. Biol, 190, 200–214. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Eickenberg M, Exarchakis G, Hirn M & Mallat S (2017) Solid harmonic wavelet scattering: predicting quantum molecular energy from invariant descriptors of 3D electronic densities. Adv. Neural Inf. Proc. Syst 30 (NIPS 2017), 6540–6549. [Google Scholar]
35.Eickenberg M, Exarchakis G, Hirn M, Mallat S & Thiry L (2018) Solid harmonic wavelet scattering for predictions of molecule properties. J. Chem. Phys, 148, 241732. [DOI] [PubMed] [Google Scholar]
36.Ekman D, Björklund AK, Frey-Skött J & Elofsson A (2005) Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions. J. Mol. Biol, 348, 231–243. [DOI] [PubMed] [Google Scholar]
37.Fernandez-Leiro R, Conrad J, Scheres SHW & Lamers MH (2015) Cryo-EM structures of the E. coli replicative dna polymerase reveal its dynamic interactions with the DNA sliding clamp, exonuclease and t. eLife, 4, e11134. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Fischer N, Neumann P, Konevega AL, Bock LV, Ficner R, Rodnina MV & Stark H (2015) Structure of the E. coli ribosome–ef-tu complex at< 3 åresolution by c s-corrected cryo-EM. Nature, 520, 567–570. [DOI] [PubMed] [Google Scholar]
39.Forneris F, Wu J & Gros P (2012) The modular serine proteases of the complement cascade. Curr. Opin. Struct. Biol, 22, 333–341. [DOI] [PubMed] [Google Scholar]
40.Foroosh H, Zerubia JB & Berthod M (2002) Extension of phase correlation to subpixel registration. IEEE Trans. Image Process, 11, 188–200. [DOI] [PubMed] [Google Scholar]
41.Frank J (2006) Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State. Oxford, United Kingdom: Oxford University Press. [Google Scholar]
42.Gao F, Wolf G & Hirn M (2019) Geometric scattering for graph data analysis. Proceedings of the 36th International Conference on Machine Learning, PMLR, vol. 97, pp. 2122–2131. [Google Scholar]
43.Gil-Pita R, Rosa-Zurera M, Jarabo-Amores P & López-Ferreras F (2005) Using multilayer perceptrons to align high range resolution radar signals. International Conference on Artificial Neural Networks. Springer, pp. 911–916. [Google Scholar]
44.Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054. [Google Scholar]
45.Hirn M, Mallat S & Poilvert N (2017) Wavelet scattering regression of quantum chemical energies. Multiscale Model. Simul, 15, 827–863. arXiv:1605.04654. [Google Scholar]
46.Hotta K, Mishima T & Kurita T (2001) Scale invariant face detection and classification method using shift invariant features extracted from log-polar image. IEICE Trans. Inf. Syst, 84, 867–878. [Google Scholar]
47.Hudson S & Psaltis D (1993) Correlation filters for aircraft identification from radar range profiles. IEEE Trans. Aerosp. Electron. Syst, 29, 741–748. [Google Scholar]
48.Kam Z (1980) The reconstruction of structure from electron micrographs of randomly oriented particles. Journal of Theoretical Biology, 82, 15–39. [DOI] [PubMed] [Google Scholar]
49.Klebaner FC (2012) Introduction to Stochastic Calculus With Applications. Singapore: World Scientific Publishing Company. [Google Scholar]
50.Landa B & Shkolnisky Y (2019) Multi-reference factor analysis: low-rank covariance estimation under unknown translations. arXiv preprint arXiv:1906.00211. [Google Scholar]
51.Leggett RM, Heavens D, Caccamo M, Clark MD & Davey RP (2015) Nanook: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles. Bioinformatics, 32, 142–144. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Levitt M (2009) Nature of the protein universe. Proc. Natl. Acad. Sci, 106, 11079–11084. [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Lim WA (2002) The modular logic of signaling proteins: building allosteric switches from simple binding domains. Curr. Opin. Struct. Biol, 12, 61–68. [DOI] [PubMed] [Google Scholar]
54.Ma C, Bendory T, Boumal N, Sigworth F & Singer A (2019) Heterogeneous multireference alignment for images with application to 2d classification in single particle reconstruction. IEEE Trans. Image Process, 29, 1699–1710. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Mallat S (2008) A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way, 3rd edn. Cambridge, MA: Academic Press. [Google Scholar]
56.Mallat S (2012) Group invariant scattering. Comm. Pure Appl. Math, 65: 1331–1398. [Google Scholar]
57.Martinec D & Pajdla T Robust rotation and translation estimation in multiview reconstruction. 2007 IEEE Conference on Computer Vision and Pattern Recognition, vol. 2007. IEEE, pp. 1–8. [Google Scholar]
58.McGinty RK & Tan S (2016) Recognition of the nucleosome by chromatin factors and enzymes. Curr. Opin. Struct. Biol, 37, 54–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Merk A, Bartesaghi A, Banerjee S, Falconieri V, Rao P, Davis MI, Pragani R, Boxer MB, Earl LA, Milne JLS, et al. (2016) Breaking cryo-EM resolution barriers to facilitate drug discovery. Cell, 165, 1698–1707. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Meynard A & Torrésani B (2018) Spectral analysis for nonstationary audio. IEEE/ACM Trans. Audio Speech Lang. Process, 26, 2371–2380. [Google Scholar]
61.Omer H & Torrésani B (2013) Estimation of frequency modulations on wideband signals; applications to audio signal analysis. 10th International Conference on Sampling Theory and Applications, Bremen, Germany, 2013, 29–32. [Google Scholar]
62.Omer H & Torrésani B (2017) Time-frequency and time-scale analysis of deformed stationary processes, with application to non-stationary sound modeling. Appl. Comput. Harmon. Anal, 43, 1–22. [Google Scholar]
63.Palamini M, Canciani A & Forneris F (2016) Identifying and visualizing macromolecular flexibility in structural biology. Front. Mol. Biosci, 3, 47. [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Park W & Chirikjian GS (2014) An assembly automation approach to alignment of noncircular projections in electron microscopy. IEEE Trans. Automat. Sci. Eng, 11, 668–679. [Google Scholar]
65.Park W, Midgett CR, Madden DR & Chirikjian GS (2011) A stochastic kinematic model of class averaging in single-particle electron microscopy. Int. J. Robot. Res, 30, 730–754. [DOI] [PMC free article] [PubMed] [Google Scholar]
66.Perry A, Weed J, Bandeira A, Rigollet P & Singer A (2017) The sample complexity of multi-reference alignment. SIAM J. Math. Data Sci, 1, 497–517. [Google Scholar]
67.Perry A, Wein AS, Bandeira AS & Moitra A (2018) Message-passing algorithms for synchronization problems over compact groups. Comm. Pure Appl. Math, 71, 2275–2322. [Google Scholar]
68.Punjani A, Rubinstein JL, Fleet DJ & Brubaker MA (2017) Cryosparc: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods, 14, 290. [DOI] [PubMed] [Google Scholar]
69.Robinson D, Farsiu S & Milanfar P (2007) Optimal registration of aliased images using variable projection with applications to super-resolution. Comput. J, 52, 31–42. [Google Scholar]
70.Sadler BM & Giannakis GB (1992) Shift-and rotation-invariant object reconstruction using the bispectrum. JOSA A, 9, 57–69. [Google Scholar]
71.Sjors HW, Scheres MV, Nuñez R, Sorzano COS, Marabini R, Herman GT & Carazo J-M (2005) Maximum-likelihood multi-reference refinement for electron microscopy images. J. Mol. Biol, 348, 139–149. [DOI] [PubMed] [Google Scholar]
72.Sharon N, Kileel J, Khoo Y, Landa B & Singer A (2020) Method of moments for 3-D single particle ab initio modeling with non-uniform distribution of viewing angles. Inverse Probl, 36, 044003. [Google Scholar]
73.Singer A (2011) Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal, 30, 20–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
74.Singer A (2018) Mathematics for cryo-electron microscopy. Proceedings of the International Congress of Mathematicians, vol. 4. Rio de Janeiro, pp. 4013–4032. [Google Scholar]
75.Sirohi D, Chen Z, Sun L, Klose T, Pierson TC, Rossmann MG & Kuhn RJ (2016) The 3.8 åresolution cryo-EM structure of Zika virus. Science, 352, 467–470. [DOI] [PMC free article] [PubMed] [Google Scholar]
76.Sonday B, Singer A & Kevrekidis IG (2013) Noisy dynamic simulations in the presence of symmetry: data alignment and model reduction. Comput. Math. Appl, 65, 1535–1557. [Google Scholar]
77.Sorzano COS, Bilbao-Castro JR, Shkolnisky Y, Alcorlo M, Melero R, Caffarena-Fernández G, Li M, Xu G, Marabini R & Carazo JM (2010) A clustering approach to multireference alignment of single-particle projections in electron microscopy. J. Struct. Biol, 171, 197–206. [DOI] [PMC free article] [PubMed] [Google Scholar]
78.Sun W (2017) Phaseless sampling and linear reconstruction of functions in spline spaces. arXiv preprint arXiv:1709.04779. [Google Scholar]
79.Theobald DL & Steindel PA (2012) Optimal simultaneous superpositioning of multiple structures with missing data. Bioinformatics, 28, 1972–1979. [DOI] [PMC free article] [PubMed] [Google Scholar]
80.Tsatsanis MK & Giannakis GB (1990) Translation, rotation, and scaling invariant object and texture classification using polyspectra. Advanced Signal Processing Algorithms, Architectures, and Implementations, vol. 1348. Bellingham, WA: International Society for Optics and Photonics, pp. 103–115. [Google Scholar]
81.Villarreal SA & Stewart PL (2014) Cryo-em and image sorting for flexible protein/dna complexes. J. Struct. Biol, 187, 76–83. [DOI] [PubMed] [Google Scholar]
82.Wein AS (2018) Statistical estimation in the presence of group actions. Ph.D. Thesis, Massachusetts Institute of Technology. [Google Scholar]
83.Winkler J & Niranjan M (2002) Uncertainty in Geometric Computations, vol. 704. Berlin, Germany: Springer Science & Business Media. [Google Scholar]
84.Zhong Y & Boumal N (2018) Near-optimal bounds for phase synchronization. SIAM J. Optim, 28, 989–1016. [Google Scholar]
85.Zwart JP, van der Heiden R, Gelsema S & Groen F (2003) Fast translation invariant classification of HRR range profiles in a zero phase representation. IEE Proc. Radar Sonar Nav, 150, 411–418. [Google Scholar]

[R1] 1.Abbe E, Bendory T, Leeb W, Pereira JM, Sharon N & Singer A (2018) Multireference alignment is easier with an aperiodic translation distribution. IEEE Trans. Inf. Theory, 65, 3565–3584. [Google Scholar]

[R2] 2.Aizenbud Y, Landa B & Shkolnisky Y (2019) Rank-one multi-reference factor analysis. arXiv preprint arXiv:1905.12442. [Google Scholar]

[R3] 3.Bai X. c., Rajendra E, Yang G, Shi Y & Scheres SHW (2015) Sampling the conformational space of the catalytic subunit of human γ-secretase. eLife, 4, e11182. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Bandeira A, Chen Y, Lederman RR & Singer A (2020) Non-unique games over compact groups and orientation estimation in cryo-EM. Inverse Probl. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Bandeira A, Rigollet P & Weed J (2017) Optimal rates of estimation for multi-reference alignment. arXiv preprint at arXiv:1702.08546. [Google Scholar]

[R6] 6.Bandeira AS (2015) Synchronization problems and alignment. Topics in Mathematics of Data Science Lecture Notes. Cambridge, MA: Massachusetts Institute of Technology. [Google Scholar]

[R7] 7.Bandeira AS, Blum-Smith B, Kileel J, Perry A, Weed J & Wein AS (2017) Estimation under group actions: recovering orbits from invariants. arXiv preprint arXiv:1712.10163. [Google Scholar]

[R8] 8.Bandeira AS, Boumal N & Singer A (2017) Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Math. Programming, 163, 145–167. [Google Scholar]

[R9] 9.Bandeira AS, Boumal N & Voroninski V (2016) On the low-rank approach for semidefinite programs arising in synchronization and community detection. Conf. Learn. Theory, 49, 361–382. [Google Scholar]

[R10] 10.Bandeira AS, Charikar M, Singer A & Zhu A (2014) Multireference alignment using semidefinite programming. Proceedings of the 5th Conference on Innovations in Theoretical Computer Science. ACM, pp. 459–470. [Google Scholar]

[R11] 11.Bartesaghi A, Merk A, Banerjee S, Matthies D, Wu X, Milne JLS & Subramaniam S (2015) 2.2 åresolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor. Science, 348, 1147–1151. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Bendory T, Bartesaghi A & Singer A (March 2020) Single-particle cryo-electron microscopy: mathematical theory, computational challenges, and opportunities. IEEE Signal Process. Mag, 37, 58–76. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Bendory T, Boumal N, Leeb W, Levin E & Singer A (2019) Multi-target detection with application to cryo-electron microscopy. Inverse Probl. [Google Scholar]

[R14] 14.Bendory T, Boumal N, Ma C, Zhao Z & Singer A (2017) Bispectrum inversion with application to multireference alignment. IEEE Trans. Signal Process, 66, 1037–1050. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] 15.Boumal N (2016) Nonconvex phase synchronization. SIAM J. Optim, 26, 2355–2377. [Google Scholar]

[R16] 16.Boumal N, Bendory T, Lederman RR & Singer A (2018) Heterogeneous multireference alignment: a single pass approach. 2018 52nd Annual Conference on Information Sciences and Systems (CISS). IEEE, pp. 1–6. [Google Scholar]

[R17] 17.Bowman GD & Poirier MG (2015) Post-translational modifications of histones that influence nucleosome dynamics. Chem. Rev, 115, 2274–2295. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Brown LG (1992) A survey of image registration techniques. ACM Computing Surv. (CSUR), 24, 325–376. [Google Scholar]

[R19] 19.Bruna J & Mallat S (2018) Multiscale sparse microcanonical models. Math. Stat. Learn, 1, 257–315. [Google Scholar]

[R20] 20.Buescu J & Paixão AC (2007) Eigenvalue distribution of positive definite kernels on unbounded domains. Integral Equ. Oper. Theory, 57, 19–41. [Google Scholar]

[R21] 21.Buescu J, Paixao AC, Garcia F & Lourtie I (2004) Positive-definiteness, integral equations and fourier transforms. J. Integral Equ. Appl, 16, 33–52. [Google Scholar]

[R22] 22.Capodiferro L, Cusani R, Jacovitti G & Vascotto M (1987) A correlation based technique for shift, scale, and rotation independent object identification. ICASSP’87: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 12. IEEE, pp. 221–224. [Google Scholar]

[R23] 23.Chandran V & Elgar SL (1992) Position, rotation, and scale invariant recognition of images using higher-order spectra. ICASSP’92: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5. IEEE, pp. 213–216. [Google Scholar]

[R24] 24.Chen Y & Candès EJ (2018) The projected power method: an efficient algorithm for joint alignment from pairwise differences. Comm. Pure Appl. Math, 71, 1648–1714. [Google Scholar]

[R25] 25.Chen Y, Guibas LJ & Huang Q-X (2014) Near-optimal joint object matching via convex relaxation. Proceedings of the 31st International Conference on Machine Learning, vol. 32 of Proceedings of Machine Learning Research, pp. 100–108. [Google Scholar]

[R26] 26.Cheng C, Jiang J & Sun Q (2017) Phaseless sampling and reconstruction of real-valued signals in shift-invariant spaces. J. Fourier Anal. Appl, 1–34. [Google Scholar]

[R27] 27.Clerc M & Mallat S (2002) The texture gradient equation for recovering shape from texture. IEEE Trans. Pattern Anal. Mach. Intell, 24, 536–549. [Google Scholar]

[R28] 28.Clerc M & Mallat S (2003) Estimating deformations of stationary processes. Ann. Stat, 31, 1772–1821. [Google Scholar]

[R29] 29.Collis WB, White PR & Hammond JK (1998) Higher-order spectra: the bispectrum and trispectrum. Mech. Syst. Signal Process, 12, 375–394. [Google Scholar]

[R30] 30.Dempster AP, Laird NM & Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J. Royal Stat. Soc. Ser. B (Methodological), 39, 1–22. [Google Scholar]

[R31] 31.DesJarlais R & Tummino PJ (2016) Role of histone-modifying enzymes and their complexes in regulation of chromatin biology. Biochemistry, 55, 1584–1599. [DOI] [PubMed] [Google Scholar]

[R32] 32.Diamond R (1992) On the multiple simultaneous superposition of molecular structures by rigid body transformations. Protein Sci, 1, 1279–1287. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] 33.Dvornek NC, Sigworth FJ & Tagare HD (2015) Subspaceem: a fast maximum-a-posteriori algorithm for cryo-EM single particle reconstruction. J. Struct. Biol, 190, 200–214. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] 34.Eickenberg M, Exarchakis G, Hirn M & Mallat S (2017) Solid harmonic wavelet scattering: predicting quantum molecular energy from invariant descriptors of 3D electronic densities. Adv. Neural Inf. Proc. Syst 30 (NIPS 2017), 6540–6549. [Google Scholar]

[R35] 35.Eickenberg M, Exarchakis G, Hirn M, Mallat S & Thiry L (2018) Solid harmonic wavelet scattering for predictions of molecule properties. J. Chem. Phys, 148, 241732. [DOI] [PubMed] [Google Scholar]

[R36] 36.Ekman D, Björklund AK, Frey-Skött J & Elofsson A (2005) Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions. J. Mol. Biol, 348, 231–243. [DOI] [PubMed] [Google Scholar]

[R37] 37.Fernandez-Leiro R, Conrad J, Scheres SHW & Lamers MH (2015) Cryo-EM structures of the E. coli replicative dna polymerase reveal its dynamic interactions with the DNA sliding clamp, exonuclease and t. eLife, 4, e11134. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] 38.Fischer N, Neumann P, Konevega AL, Bock LV, Ficner R, Rodnina MV & Stark H (2015) Structure of the E. coli ribosome–ef-tu complex at< 3 åresolution by c s-corrected cryo-EM. Nature, 520, 567–570. [DOI] [PubMed] [Google Scholar]

[R39] 39.Forneris F, Wu J & Gros P (2012) The modular serine proteases of the complement cascade. Curr. Opin. Struct. Biol, 22, 333–341. [DOI] [PubMed] [Google Scholar]

[R40] 40.Foroosh H, Zerubia JB & Berthod M (2002) Extension of phase correlation to subpixel registration. IEEE Trans. Image Process, 11, 188–200. [DOI] [PubMed] [Google Scholar]

[R41] 41.Frank J (2006) Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State. Oxford, United Kingdom: Oxford University Press. [Google Scholar]

[R42] 42.Gao F, Wolf G & Hirn M (2019) Geometric scattering for graph data analysis. Proceedings of the 36th International Conference on Machine Learning, PMLR, vol. 97, pp. 2122–2131. [Google Scholar]

[R43] 43.Gil-Pita R, Rosa-Zurera M, Jarabo-Amores P & López-Ferreras F (2005) Using multilayer perceptrons to align high range resolution radar signals. International Conference on Artificial Neural Networks. Springer, pp. 911–916. [Google Scholar]

[R44] 44.Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054. [Google Scholar]

[R45] 45.Hirn M, Mallat S & Poilvert N (2017) Wavelet scattering regression of quantum chemical energies. Multiscale Model. Simul, 15, 827–863. arXiv:1605.04654. [Google Scholar]

[R46] 46.Hotta K, Mishima T & Kurita T (2001) Scale invariant face detection and classification method using shift invariant features extracted from log-polar image. IEICE Trans. Inf. Syst, 84, 867–878. [Google Scholar]

[R47] 47.Hudson S & Psaltis D (1993) Correlation filters for aircraft identification from radar range profiles. IEEE Trans. Aerosp. Electron. Syst, 29, 741–748. [Google Scholar]

[R48] 48.Kam Z (1980) The reconstruction of structure from electron micrographs of randomly oriented particles. Journal of Theoretical Biology, 82, 15–39. [DOI] [PubMed] [Google Scholar]

[R49] 49.Klebaner FC (2012) Introduction to Stochastic Calculus With Applications. Singapore: World Scientific Publishing Company. [Google Scholar]

[R50] 50.Landa B & Shkolnisky Y (2019) Multi-reference factor analysis: low-rank covariance estimation under unknown translations. arXiv preprint arXiv:1906.00211. [Google Scholar]

[R51] 51.Leggett RM, Heavens D, Caccamo M, Clark MD & Davey RP (2015) Nanook: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles. Bioinformatics, 32, 142–144. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R52] 52.Levitt M (2009) Nature of the protein universe. Proc. Natl. Acad. Sci, 106, 11079–11084. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R53] 53.Lim WA (2002) The modular logic of signaling proteins: building allosteric switches from simple binding domains. Curr. Opin. Struct. Biol, 12, 61–68. [DOI] [PubMed] [Google Scholar]

[R54] 54.Ma C, Bendory T, Boumal N, Sigworth F & Singer A (2019) Heterogeneous multireference alignment for images with application to 2d classification in single particle reconstruction. IEEE Trans. Image Process, 29, 1699–1710. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R55] 55.Mallat S (2008) A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way, 3rd edn. Cambridge, MA: Academic Press. [Google Scholar]

[R56] 56.Mallat S (2012) Group invariant scattering. Comm. Pure Appl. Math, 65: 1331–1398. [Google Scholar]

[R57] 57.Martinec D & Pajdla T Robust rotation and translation estimation in multiview reconstruction. 2007 IEEE Conference on Computer Vision and Pattern Recognition, vol. 2007. IEEE, pp. 1–8. [Google Scholar]

[R58] 58.McGinty RK & Tan S (2016) Recognition of the nucleosome by chromatin factors and enzymes. Curr. Opin. Struct. Biol, 37, 54–61. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R59] 59.Merk A, Bartesaghi A, Banerjee S, Falconieri V, Rao P, Davis MI, Pragani R, Boxer MB, Earl LA, Milne JLS, et al. (2016) Breaking cryo-EM resolution barriers to facilitate drug discovery. Cell, 165, 1698–1707. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R60] 60.Meynard A & Torrésani B (2018) Spectral analysis for nonstationary audio. IEEE/ACM Trans. Audio Speech Lang. Process, 26, 2371–2380. [Google Scholar]

[R61] 61.Omer H & Torrésani B (2013) Estimation of frequency modulations on wideband signals; applications to audio signal analysis. 10th International Conference on Sampling Theory and Applications, Bremen, Germany, 2013, 29–32. [Google Scholar]

[R62] 62.Omer H & Torrésani B (2017) Time-frequency and time-scale analysis of deformed stationary processes, with application to non-stationary sound modeling. Appl. Comput. Harmon. Anal, 43, 1–22. [Google Scholar]

[R63] 63.Palamini M, Canciani A & Forneris F (2016) Identifying and visualizing macromolecular flexibility in structural biology. Front. Mol. Biosci, 3, 47. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R64] 64.Park W & Chirikjian GS (2014) An assembly automation approach to alignment of noncircular projections in electron microscopy. IEEE Trans. Automat. Sci. Eng, 11, 668–679. [Google Scholar]

[R65] 65.Park W, Midgett CR, Madden DR & Chirikjian GS (2011) A stochastic kinematic model of class averaging in single-particle electron microscopy. Int. J. Robot. Res, 30, 730–754. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R66] 66.Perry A, Weed J, Bandeira A, Rigollet P & Singer A (2017) The sample complexity of multi-reference alignment. SIAM J. Math. Data Sci, 1, 497–517. [Google Scholar]

[R67] 67.Perry A, Wein AS, Bandeira AS & Moitra A (2018) Message-passing algorithms for synchronization problems over compact groups. Comm. Pure Appl. Math, 71, 2275–2322. [Google Scholar]

[R68] 68.Punjani A, Rubinstein JL, Fleet DJ & Brubaker MA (2017) Cryosparc: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods, 14, 290. [DOI] [PubMed] [Google Scholar]

[R69] 69.Robinson D, Farsiu S & Milanfar P (2007) Optimal registration of aliased images using variable projection with applications to super-resolution. Comput. J, 52, 31–42. [Google Scholar]

[R70] 70.Sadler BM & Giannakis GB (1992) Shift-and rotation-invariant object reconstruction using the bispectrum. JOSA A, 9, 57–69. [Google Scholar]

[R71] 71.Sjors HW, Scheres MV, Nuñez R, Sorzano COS, Marabini R, Herman GT & Carazo J-M (2005) Maximum-likelihood multi-reference refinement for electron microscopy images. J. Mol. Biol, 348, 139–149. [DOI] [PubMed] [Google Scholar]

[R72] 72.Sharon N, Kileel J, Khoo Y, Landa B & Singer A (2020) Method of moments for 3-D single particle ab initio modeling with non-uniform distribution of viewing angles. Inverse Probl, 36, 044003. [Google Scholar]

[R73] 73.Singer A (2011) Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal, 30, 20–36. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R74] 74.Singer A (2018) Mathematics for cryo-electron microscopy. Proceedings of the International Congress of Mathematicians, vol. 4. Rio de Janeiro, pp. 4013–4032. [Google Scholar]

[R75] 75.Sirohi D, Chen Z, Sun L, Klose T, Pierson TC, Rossmann MG & Kuhn RJ (2016) The 3.8 åresolution cryo-EM structure of Zika virus. Science, 352, 467–470. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R76] 76.Sonday B, Singer A & Kevrekidis IG (2013) Noisy dynamic simulations in the presence of symmetry: data alignment and model reduction. Comput. Math. Appl, 65, 1535–1557. [Google Scholar]

[R77] 77.Sorzano COS, Bilbao-Castro JR, Shkolnisky Y, Alcorlo M, Melero R, Caffarena-Fernández G, Li M, Xu G, Marabini R & Carazo JM (2010) A clustering approach to multireference alignment of single-particle projections in electron microscopy. J. Struct. Biol, 171, 197–206. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R78] 78.Sun W (2017) Phaseless sampling and linear reconstruction of functions in spline spaces. arXiv preprint arXiv:1709.04779. [Google Scholar]

[R79] 79.Theobald DL & Steindel PA (2012) Optimal simultaneous superpositioning of multiple structures with missing data. Bioinformatics, 28, 1972–1979. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R80] 80.Tsatsanis MK & Giannakis GB (1990) Translation, rotation, and scaling invariant object and texture classification using polyspectra. Advanced Signal Processing Algorithms, Architectures, and Implementations, vol. 1348. Bellingham, WA: International Society for Optics and Photonics, pp. 103–115. [Google Scholar]

[R81] 81.Villarreal SA & Stewart PL (2014) Cryo-em and image sorting for flexible protein/dna complexes. J. Struct. Biol, 187, 76–83. [DOI] [PubMed] [Google Scholar]

[R82] 82.Wein AS (2018) Statistical estimation in the presence of group actions. Ph.D. Thesis, Massachusetts Institute of Technology. [Google Scholar]

[R83] 83.Winkler J & Niranjan M (2002) Uncertainty in Geometric Computations, vol. 704. Berlin, Germany: Springer Science & Business Media. [Google Scholar]

[R84] 84.Zhong Y & Boumal N (2018) Near-optimal bounds for phase synchronization. SIAM J. Optim, 28, 989–1016. [Google Scholar]

[R85] 85.Zwart JP, van der Heiden R, Gelsema S & Groen F (2003) Fast translation invariant classification of HRR range profiles in a zero phase representation. IEE Proc. Radar Sonar Nav, 150, 411–418. [Google Scholar]

PERMALINK

Wavelet invariants for statistically robust multi-reference alignment

Matthew Hirn

Anna Little

Abstract

1. Introduction

Model 1 (Classic MRA).

Fig. 1.

1.1. Notation

2. MRA models and the method of invariants

2.1. MRA data models

Model 2 (Noisy dilation MRA data model).

Remark 2.1

Model 3 (Dilation MRA data model).

2.2. Method of invariants

2.2.1. Motivation and related work

2.2.2. Wavelet invariants

Definition 2.1 (Wavelet invariants).

Definition 2.2 (Wavelet invariant derivatives).

Remark 2.2

Remark 2.3

Remark 2.4

Condition 2.3

Remark 2.5

Theorem 2.4

Proof.

Lemma 2.1

Proposition 2.1

Remark 2.6

3. Unbiasing for classic MRA

Proposition 3.1

Proposition 3.2

Fig. 2.

4. Unbiasing for dilation MRA

Remark 4.1

Lemma 4.1

Lemma 4.2

4.1. Power spectrum results for dilation MRA

Proposition 4.1

Proof.

4.2. Wavelet invariant results for dilation MRA

Lemma 4.3 (Low-frequency bound).

Lemma 4.4 (High-frequency bound for differentiable functions).

Proposition 4.2

Proof.

Corollary 4.1

Remark 4.2

4.3. Comparison

Remark 4.3

Remark 4.4

Remark 4.5

4.4. Simulation results for dilation MRA

Fig. 3.

Fig. 4.

5. Noisy dilation MRA model

5.1. Wavelet inariant results for noisy dilation MRA

Proposition 5.1

Corollary 5.1

Proof of Proposition 5.1.

Lemma 5.1

Lemma 5.2

5.2. Simulation results for noisy dilation MRA

Fig. 5.

Fig. 6.

Remark 5.1

6. Numerical implementation

6.1. Signal generation and SNR

6.2. Empirical estimation of additive noise level

6.3. Empirical moment estimation for dilation MRA

Proposition 6.1

Proof.

Definition 6.1

6.4. Derivatives

6.5. Optimization

Remark 6.1

7. Conclusion

Acknowledgements

Funding

A. Wavelet admissibility conditions

Remark A.1