Sparse Recovery Beyond Compressed Sensing: Separable Nonlinear Inverse Problems

Brett Bernstein; Sheng Liu; Chrysa Papadaniil; Carlos Fernandez-Granda

doi:10.1109/tit.2020.2985015

Author manuscript; available in PMC: 2021 Sep 1.

Published in final edited form as: IEEE Trans Inf Theory. 2020 Apr 1;66(9):5904–5926. doi: 10.1109/tit.2020.2985015

PMCID: PMC7480821 NIHMSID: NIHMS1621482 PMID: 32921802

Abstract

Extracting information from nonlinear measurements is a fundamental challenge in data analysis. In this work, we consider separable inverse problems, where the data are modeled as a linear combination of functions that depend nonlinearly on certain parameters of interest. These parameters may represent neuronal activity in a human brain, frequencies of electromagnetic waves, fluorescent probes in a cell, or magnetic relaxation times of biological tissues. Separable nonlinear inverse problems can be reformulated as underdetermined sparse-recovery problems, and solved using convex programming. This approach has had empirical success in a variety of domains, from geophysics to medical imaging, but lacks a theoretical justification. In particular, compressed-sensing theory does not apply, because the measurement operators are deterministic and violate incoherence conditions such as the restricted-isometry property. Our main contribution is a theory for sparse recovery adapted to deterministic settings. We show that convex programming succeeds in recovering the parameters of interest, as long as their values are sufficiently distinct with respect to the correlation structure of the measurement operator. The theoretical results are illustrated through numerical experiments for two applications: heat-source localization and estimation of brain activity from electroencephalography data.

Keywords: Sparse recovery, convex programming, incoherence, correlated measurements, dual certificates, nonlinear inverse problems, source localization

I. Introduction

A. Separable Nonlinear Inverse Problems

The inverse problem of extracting information from nonlinear measurements is a fundamental challenge in many applied domains, including geophysics, microscopy, astronomy, medical imaging, and signal processing. In this work, we focus on separable nonlinear (SNL) problems [1], [2], where the data are modeled as samples from a linear combination of functions that depend nonlinearly on certain quantities of interest. Depending on the application, these quantities may represent neuronal activity in a human brain, oscillation frequencies of electromagnetic waves, locations of fluorescent probes in a cell, magnetic-resonance relaxation times of biological tissues, or positions of celestial objects in the sky.

Mathematically, the goal in an SNL problem is to estimate k parameters $θ_{1}, \dots, θ_{k} \in R^{p}$ from samples of a function

f (t) ≔ \sum_{i = 1}^{k} c_{i} φ_{t} (θ_{i}),

(I.1)

where $c_{1}, \dots, c_{k} \in R$ are unknown coefficients. The dependence between each component and the corresponding parameter at a particular value of t is governed by a nonlinear map $φ_{t} : R^{p} \to R$ . For simplicity of exposition, we assume that t is one-dimensional and f is a real-valued function, but the framework can be directly extended to multidimensional and complex-valued measurements. The data are samples of f at n locations $s_{1}, \dots, s_{n} \in R$

y ≔ [\begin{matrix} f (s_{1}) \\ ⋮ \\ f (s_{n}) \end{matrix}] = \sum_{i = 1}^{k} c_{i} \vec{φ} (θ_{i}),

(I.2)

where $\vec{φ} {(θ_{i})}_{j} ≔ φ_{s_{j}} (θ_{i})$ , 1 ≤ j ≤ n. Each $\vec{φ} (θ_{i}) \in R^{n}$ is a feature vector associated to one of the parameters θ_i. Without loss of generality, we assume that the feature vectors are normalized, i.e. ${∥ \vec{φ} (θ) ∥}_{2} = 1$ for all $θ \in R^{p}$ . The following examples illustrate the importance of SNL problems in a range of applications.

Deconvolution of point sources: Deconvolution consists of estimating a signal from samples of its convolution with a fixed kernel K. When the signal is modeled as a superposition of point sources or spikes, representing fluorescent probes in microscopy [4], [5], celestial bodies in astronomy [6] or interfaces between geological layers in seismography [7], this is an SNL problem where θ₁, …, θ_k are the locations of the spikes. In that case, φ_t (θ) is a shifted copy of the convolution kernel K (t − θ), as illustrated in the top row of Figure 1.
Spectral super-resolution: Super-resolving the spectrum of a multisinusoidal signal from samples taken over a short period of time is an important problem in communications, radar, and signal processing [8]. This is an SNL problem where φ_t (θ) is a complex exponential exp(−i2πθt) with frequency θ (see the second row of Figure 1).
Heat-source localization: Finding the position of several heat sources in a material with known conductivity from temperature measurements is an SNL problem where φ_t (θ) is the Green’s function of the heat equation parametrized by the location θ of a particular heat source [9]. The bottom row of Figure 1 shows an example (see Section IV-A for more details).
Estimation of neural activity: Electroencephalography measurements of the electric potential field on the surface of the head can be used to detect regions of focalized activity in the brain [10]. The data are well approximated by an SNL model where the parameters are the locations of these regions [11]. The function φ_t (θ) represents the potential at a specific location t on the scalp, which originates from neural activity at position θ in the brain. This function can be computed by solving the Poisson differential equation taking into account the geometry and electric properties of the head [12]. Figure 2 shows an example. See Section IV-B for more details.
Quantitative magnetic-resonance imaging: The magnetic-resonance relaxation times T₁ and T₂ of biological tissues govern the local fluctuations of the magnetic field measured by MR imaging systems [13]. MR fingerprinting is a technique to estimate these parameters by fitting an SNL model where each component corresponds to a different tissue [14]–[16]. In this case, the parameter $θ \in R^{2}$ encodes the values of T₁ and T₂ and the function φ_t (θ) can be computed by solving the Bloch differential equations [17].
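The measurement model in Equation (I.2) can be made concrete with a small numerical sketch. The Gaussian kernel, the sample grid `s`, and the parameter values `thetas` below are illustrative assumptions, not values taken from the experiments in this paper; only the coefficients (1, 2, 0.5) mirror the example in Figure 1.

```python
import numpy as np

def feature(theta, s):
    """Unit-norm feature vector: a Gaussian kernel centered at theta, sampled at s."""
    v = np.exp(-0.5 * (s - theta) ** 2)
    return v / np.linalg.norm(v)

s = np.linspace(-10.0, 10.0, 81)        # assumed sample locations s_1, ..., s_n
thetas = np.array([-3.0, 0.5, 4.0])     # hypothetical parameters theta_1, ..., theta_k
c = np.array([1.0, 2.0, 0.5])           # coefficients c_1, ..., c_k

# Equation (I.2): the data are a linear combination of the feature vectors.
features = np.stack([feature(t, s) for t in thetas], axis=1)   # shape (n, k)
y = features @ c

column_norms = np.linalg.norm(features, axis=0)
print(y.shape, column_norms)
```

The data depend linearly on the coefficients `c` but nonlinearly on `thetas`, which is what makes the problem separable.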

Fig. 1: — Illustration of three inverse problems that can be modeled as SNL problems: deconvolution in reflection seismology (the convolution kernel is a Ricker wavelet [3]), super-resolution of spectral lines, and heat-source localization. The left column shows the continuous measurements *φ_t* for three parameters θ₁, θ₂, θ₃. The right column shows the data samples corresponding to an example where the coefficients are set to c ≔ (1, 2, 0.5). For super-resolution, only the real part of the data is shown.

Fig. 2: — Localization of brain-activity sources from EEG data is an SNL problem. The image in the top left corner shows the position of three sources in a human brain, situated in the occipital (θ₁), temporal (θ₂) and frontal (θ₃) lobes. The remaining three images show the EEG data corresponding to each of these sources, obtained from 256 sensors located on the surface of the head (see Section IV-B for more details).

B. Reformulation as a Sparse-Recovery Problem

A natural approach to estimate the parameters of an SNL model is to solve the nonlinear least-squares problem,

\underset{{\tilde{θ}}_{1}, …, {\tilde{θ}}_{k} \in R^{p}, \tilde{c} \in R^{k}}{minimize} ‖ y - \sum_{i = 1}^{k} {\tilde{c}}_{i} \vec{φ} ({\tilde{θ}}_{i}) ‖_{2}^{2} .

(I.3)

Unfortunately, the resulting cost function is typically non-convex and has local minima, as illustrated by the simple example in Figure 3. Consequently, local-descent methods do not necessarily recover the true parameters, even in the absence of noise, and global optimization becomes intractable unless k is very small.

Fig. 3: Nonlinear least-squares cost functions associated to deterministic SNL problems often have non-optimal local minima. The graph depicts the landscape of the nonlinear least squares cost function $L (θ) = \min_{\tilde{c} \in R} {∥ \vec{φ} (θ_{1}) - \tilde{c} \vec{φ} (θ) ∥}_{2}^{2}$ associated to a deconvolution problem, where the data is generated by convolving a Ricker wavelet [3] with a single spike located at θ₁. In addition to the global minimum at θ₁ there are several spurious local minima.
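The landscape described in Figure 3 can be reproduced approximately with a short script. Because the features have unit norm, the inner minimization over the coefficient has a closed form, which gives L(θ) = 1 − ρ_{θ₁}(θ)², where ρ is the inner product of the two feature vectors. The wavelet width, the sample grid, and the spike location used here are assumptions made for illustration.

```python
import numpy as np

def ricker(t, width=1.0):
    """Ricker (Mexican-hat) wavelet, up to a scaling constant."""
    u = (t / width) ** 2
    return (1.0 - u) * np.exp(-0.5 * u)

def feature(theta, s):
    """Unit-norm samples of the wavelet shifted to theta."""
    v = ricker(s - theta)
    return v / np.linalg.norm(v)

s = np.linspace(-10.0, 10.0, 201)      # assumed sample locations
theta_true = 0.0                       # hypothetical spike location theta_1
grid = np.linspace(-6.0, 6.0, 601)     # candidate values of theta

phi_true = feature(theta_true, s)
# Optimal coefficient is c* = <phi(theta_1), phi(theta)>, so L = 1 - rho^2.
rho = np.array([phi_true @ feature(t, s) for t in grid])
L = 1.0 - rho ** 2

# Strict local minima of the sampled cost function (interior grid points only).
is_local_min = (L[1:-1] < L[:-2]) & (L[1:-1] < L[2:])
local_minima = grid[1:-1][is_local_min]
global_min = grid[np.argmin(L)]
print("local minima at:", local_minima)
print("global minimum at:", global_min)
```

A local-descent method started near one of the side minima would stall there, which is the failure mode the text describes.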

Alternatively, we can reformulate the SNL problem as a sparse-recovery problem and leverage ℓ₁-norm minimization to solve it. This approach was pioneered in the 1970s by geophysicists working on spike deconvolution in the context of reflection seismology [18]–[22]. Since then, it has been applied to many SNL problems such as acoustic sensing [23], [24], radar [25], [26], electroencephalography (EEG) [27], [28], positron emission tomography (PET) [29]–[31], direction of arrival [32], [33], quantitative magnetic resonance imaging [15], [16], and source localization [9], [34], [35]. Our goal is to provide a theory of sparse recovery via convex optimization explaining the empirical success of this approach.

Let us represent the parameters $θ_{1}, \dots, θ_{k} \in R^{p}$ of an SNL model as a superposition of Dirac measures or spikes in $R^{p}$ , interpreted as a p-dimensional parameter space,

μ ≔ \sum_{i = 1}^{k} c_{i} δ_{θ_{i}},

(I.4)

where δ_{θ_i} denotes a Dirac measure supported at θ_i. Intuitively, the atomic measure μ is a signal that encodes the parameters of interest and their corresponding coefficients. The data described by Equation (I.2) can now be expressed as

y = \int_{R^{p}} \vec{φ} (θ) μ (d θ) .

(I.5)

The SNL problem is equivalent to recovering μ from these linear measurements. The price we pay for linearizing is that the linear inverse problem is extremely underdetermined: y has dimension n, but μ lives in a continuous space of infinite dimensionality! To solve the problem, we need to exploit the assumption that the data only depends on a small number of parameters or, equivalently, that μ is sparse.

For an SNL problem to be well posed, θ₁, …, θ_k should be the only set of k or fewer parameters such that Equation (I.2) holds. In that case, μ is the solution to the sparse-recovery problem

\begin{matrix} \underset{\tilde{μ}}{minimize} & | support (\tilde{μ}) | \\ subject to & \int_{R^{p}} \vec{φ} (θ) \tilde{μ} (d θ) = y, \end{matrix}

(I.6)

where minimization occurs over the set of measures in $R^{p}$ . The cardinality of the support of an atomic measure is a nonconvex function, which is notoriously challenging to minimize. A fundamental insight underlying many approaches to sparse estimation in high-dimensional statistics and signal processing is that one can bypass this difficulty by replacing the nonconvex function with a convex counterpart. In particular, minimizing the ℓ₁ norm instead of the cardinality function has proven to be very effective in many applications.

In order to analyze the application of ℓ₁-norm minimization to SNL problems, we consider a continuous setting, where the optimization variable is a measure supported on a continuous domain. The goal is to obtain an analysis that is valid for arbitrarily fine discretizations of the domain. This is important because, as we will see below, a fine discretization results in a highly-correlated linear operator, which violates the usual assumptions made in the literature on sparse recovery.

In the case of measures supported on a continuous domain, the continuous counterpart of the ℓ₁ norm is the total-variation (TV) norm [36], [37]¹. Indeed, the TV norm of the atomic measure μ in Equation (I.4) equals the ℓ₁ norm of its coefficients ‖c‖₁. Just as the ℓ₁ norm is the dual norm of the ℓ_∞ norm, the TV norm is defined by

‖ μ ‖_{TV} ≔ \sup_{f \in C, ‖ f ‖_{\infty} \leq 1} | \int f d μ |,

(I.7)

where the supremum is taken over all continuous functions in the unit $L_{\infty}$ -norm ball. Replacing the cardinality function by this sparsity-promoting norm yields the following convex program

\begin{matrix} \underset{\tilde{μ}}{minimize} & ‖ \tilde{μ} ‖_{TV} \\ subject to & \int_{R^{p}} \vec{φ} (θ) \tilde{μ} (d θ) = y . \end{matrix}

(I.8)

The goal of this paper is to understand when the solution to Problem (I.8) exactly recovers the parameters of an SNL model.
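On a finite grid, the TV norm reduces to the ℓ₁ norm and the analogue of Problem (I.8) becomes a linear program. The following is a minimal sketch under stated assumptions: the Gaussian kernel, the grid `grid`, the sample locations `s`, and the spike positions are hypothetical, and `scipy.optimize.linprog` is used as a generic solver rather than as the method used in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def feature(theta, s):
    """Unit-norm Gaussian feature vector sampled at s."""
    v = np.exp(-0.5 * (s - theta) ** 2)
    return v / np.linalg.norm(v)

s = np.linspace(-10.0, 10.0, 21)         # assumed sample locations (n = 21)
grid = np.linspace(-8.0, 8.0, 65)        # assumed parameter grid (m = 65, spacing 0.25)
Phi = np.stack([feature(eta, s) for eta in grid], axis=1)   # n x m with m > n

# Hypothetical ground truth: three spikes on the grid at -6, 0 and 6.
x_true = np.zeros(grid.size)
x_true[[8, 32, 56]] = [1.0, 2.0, 0.5]
y = Phi @ x_true

# l1 minimization as a linear program: write x = u - v with u, v >= 0 and
# minimize sum(u) + sum(v) subject to Phi (u - v) = y.
m = grid.size
res = linprog(
    c=np.ones(2 * m),
    A_eq=np.hstack([Phi, -Phi]),
    b_eq=y,
    bounds=(0, None),
    method="highs",
)
x_hat = res.x[:m] - res.x[m:]
print("solver message:", res.message)
print("l1 norm of estimate:", np.abs(x_hat).sum())
print("largest entries at:", np.sort(grid[np.argsort(-np.abs(x_hat))[:3]]))
```

Since `x_true` is feasible, the ℓ₁ norm of the estimate cannot exceed that of `x_true`. Whether the estimate coincides with `x_true` depends on how well separated the spikes are, which is the question studied in the rest of the paper.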

C. Compressed Sensing

Section I-B shows that solving an SNL problem is equivalent to recovering a sparse signal from linear underdetermined measurements. This is reminiscent of compressed sensing [38]–[40]. In its most basic formulation, the goal of compressed sensing is to estimate a signal $x \in R^{m}$ with k nonzeros from linear measurements $y \in R^{n}$ given by y := Ax, where $A \in R^{n \times m}$ and m > n. Remarkably, exact recovery of x is still possible under certain conditions on the matrix A, even though the linear system is underdetermined.

Overdetermined linear inverse problems are said to be ill posed when the measurement matrix is ill conditioned. This occurs when there exist vectors that lie close to the null space of the matrix, or equivalently when a subset of its columns is highly correlated. Analogously, the compressed-sensing problem is ill posed if any sparse subset of columns is highly correlated, because this implies that sparse vectors lie close to the null space. Early works on compressed sensing derive recovery guarantees assuming a bound on the maximum correlation between the columns of the measurement matrix A (sometimes called incoherence). They prove that tractable algorithms such as ℓ₁-norm minimization and greedy techniques achieve exact recovery as long as the maximum correlation is of order $n^{-1/2}$ for sparsity levels k of up to order $\sqrt{n}$ [41]–[44], even if the data are corrupted by additive noise [45], [46]. These results were subsequently strengthened to sparsity levels of order n (up to logarithmic factors) [38], [47]–[50] under stricter assumptions on the conditioning of sparse subsets of columns in the measurement matrix, such as the restricted-isometry property [48] or the restricted-eigenvalue condition [49].

The question is whether compressed-sensing theory applies to SNL problems. Let us consider an SNL problem where the parameter space is discretized to yield a finite-dimensional version of the sparse-recovery problem described in Section I-B. The measurement model in Equation (I.5) can then be expressed as

y = \sum_{j = 1}^{m} \vec{φ} (η_{j}) x_{j}

(I.9)

= Φ x .

(I.10)

Φ is a measurement matrix whose columns correspond to the feature vectors of $η_{1}, \dots, η_{m} \in R^{p}$ , which denote the m points of the discretized parameter space. The signal $x \in R^{m}$ is the discrete version of μ in Equation (I.4): a sparse vector, such that x_j = c_i when η_j = θ_i for some i ∈ {1, …, k} and x_j = 0 otherwise. For compressed-sensing theory to apply here, the intercolumn correlations of Φ should be very low. Figure 4 shows the intercolumn correlations of a typical compressed-sensing matrix: each column is almost uncorrelated with every other column. Figures 5 and 6 show the correlation function $ρ_{θ} : R^{p} \to R$

ρ_{θ} (η) ≔ 〈 \vec{φ} (θ), \vec{φ} (η) 〉,

(I.11)

for the different SNL problems discussed in Section I-A. The intercolumn correlations of the measurement matrix Φ in the corresponding discretized SNL problems are given by samples of ρ_{η_i} at the locations of the remaining grid points η_j, i ≠ j. The contrast with the intercolumn structure of the compressed-sensing matrix is striking: nearby columns in all SNL measurement matrices are very highly correlated. This occurs under any reasonable discretization of the parameter space; discretizing very coarsely would result in inaccurate parameter estimates, defeating the whole point of solving the SNL problem.

Fig. 4: — Columns and intercolumn correlations for a typical compressed-sensing matrix $A \in R^{100 \times 300}$ with i.i.d. standard Gaussian entries. The notation *A_i* denotes the ith column of A. Note that $∣ A_{i}^{T} A_{j} ∣$ is small whenever i ≠ j.

Fig. 5: — Correlation structure of the measurement operators arising in deconvolution, super-resolution, and heat-source localization. The left column shows the discrete data $\vec{φ} (θ_{i})$ for three parameter values. The right column gives the absolute values of the corresponding correlation functions $ρ_{θ_{i}} (η) = \vec{φ} {(θ_{i})}^{T} \vec{φ} (η)$ .

Fig. 6: — Example of correlation functions for the electroencephalography brain-activity localization problem. Each column shows three views of the correlation function $ρ_{θ_{i}} (η) = \vec{φ} {(θ_{i})}^{T} \vec{φ} (η)$ , i = 1, 2, 3, corresponding to one of the neural-activity sources shown in Figure 2.

The difference between the correlation structure of the measurement matrix in compressed-sensing and SNL problems is not surprising. The entries of compressed-sensing matrices are random. As a result, small subsets of columns are almost uncorrelated with high probability. In contrast, matrices in discretized SNL problems arise from a deterministic model tied to an underlying continuous parameter space and to a function φ_t that is typically smooth. Since φ_t (θ) ≈ φ_t (θ′) when θ ≈ θ′, nearby columns are highly correlated. These matrices do not satisfy any of the conditioning properties of sparse submatrices commonly assumed in compressed sensing. In conclusion, the answer to our previous question is a resounding no: compressed-sensing theory does not apply to SNL problems.
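The contrast between the two correlation structures can be checked numerically. The sketch below compares the largest intercolumn correlation of an i.i.d. Gaussian matrix of the size used in Figure 4 with that of a discretized Gaussian deconvolution matrix. The kernel, the grids, and the random seed are assumptions, and the columns of both matrices are normalized so that the correlations are on the same scale.

```python
import numpy as np

def max_offdiag_correlation(M):
    """Largest absolute inner product between distinct unit-normalized columns."""
    M = M / np.linalg.norm(M, axis=0)
    G = np.abs(M.T @ M)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 300))       # i.i.d. standard Gaussian entries

s = np.linspace(-10.0, 10.0, 100)         # assumed sample locations
grid = np.linspace(-8.0, 8.0, 300)        # assumed fine parameter grid
Phi = np.exp(-0.5 * (s[:, None] - grid[None, :]) ** 2)   # Gaussian deconvolution

mu_random = max_offdiag_correlation(A)
mu_snl = max_offdiag_correlation(Phi)
print("random matrix:  ", mu_random)
print("discretized SNL:", mu_snl)
```

Refining `grid` pushes the second number closer to one, whereas the first depends only on the matrix dimensions.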

D. Beyond Sparsity and Randomness: Separation and Correlation Decay

The fact that compressed-sensing theory does not apply to SNL problems involving deterministic measurements is not a theoretical artifact. Sparsity is not a strong enough condition to ensure that such SNL problems are well posed. If φ_t is smooth, which is usually the case in applications, the features $\vec{φ} (θ)$ corresponding to parameters that are clustered in the parameter space are highly correlated. This can be seen in the correlation plots of Figures 5 and 6. As a result, different sparse combinations of features may yield essentially the same data. For a detailed analysis of this issue in the context of super-resolution and deconvolution of point sources we refer the reader to Section 3.2 in [51] (see also [52] and [53]) and Section 2.1 in [54], respectively.

Additional assumptions beyond sparsity are necessary to establish recovery guarantees for SNL problems. At the very least, the features $\vec{φ} (θ_{1}), \dots, \vec{φ} (θ_{k})$ in the data cannot be too correlated. For arbitrary SNL problems it is challenging to define simple conditions to preclude this from happening. However, in most practical situations, SNL problems exhibit correlation decay, meaning that the correlation function ρ_θ defined in Equation (I.11) is bounded by a decaying function away from θ. This is a natural property: the more separated two parameters θ and θ′ are in the parameter space, the less correlated we expect their features $\vec{φ} (θ)$ and $\vec{φ} (θ^{'})$ to be. All the examples in Section I-A have correlation decay (see Figures 5 and 6).

For SNL problems with correlation decay there is a simple way of ensuring that the features corresponding to the true parameters θ₁, …, θ_k are not highly correlated: imposing a minimum separation between them in the parameter space. The main contribution of this paper is showing that this is in fact sufficient to guarantee that TV-norm minimization achieves exact recovery, under some additional conditions on the derivatives of the correlation function.

E. Organization

In Section II we propose a theoretical framework for the analysis of sparse estimation in the context of SNL inverse problems. We focus on the case p = 1 for simplicity, but our results can be extended to higher dimensions, as described in Section III-E. Our main results are Theorems II.4 and II.6, which establish exact-recovery results for SNL problems with correlation decay under a minimum separation on the true parameters. Section III contains the proof of these results, which is based on a novel dual-certificate construction. Section IV illustrates the theoretical results through numerical experiments for two applications: heat-source localization and estimation of brain activity from electroencephalography data.

II. Main Results

A. Correlation decay

In this section we formalize the notion of correlation decay by defining several conditions on the correlation function ρ_θ and on its derivatives. Throughout we assume that the problem is one dimensional (p ≔ 1).

To alleviate notation we define

ρ_{θ}^{(q, r)} (η) ≔ {\vec{φ}}^{(q)} {(θ)}^{T} {\vec{φ}}^{(r)} (η),

(II.1)

for q = 0, 1 and r = 0, 1, 2, where ${\vec{φ}}^{(q)}$ is the qth derivative of $\vec{φ}$ . Recall that we assume ${∥ \vec{φ} (θ) ∥}_{2} = 1$ for all $θ \in R$ . This implies ρ_θ(θ) = 1 and $ρ_{θ}^{(1, 0)} (θ) = ρ_{θ}^{(0, 1)} (θ) = 0$ for all $θ \in R$ . Plots of these derivatives are shown in Figure 8 for the deconvolution, super-resolution, and heat-source localization problems. Our conditions take the form of bounds in different regions of the parameter space: a near region, an intermediate region, and a far region, as depicted in Figure 7.

Fig. 8: — Derivatives of the correlation functions arising in deconvolution, super-resolution, and heat-source localization. In all cases the correlation function is locally concave and all derivatives exhibit decaying tails. In the case of super-resolution the tail decay is not summable, but can be made summable by applying a window function to the measurements.

Fig. 7: — Illustration of the near, intermediate and decay conditions on the correlation function defined in Section II-A. This example shows a correlation function arising in deconvolution.

In the near region the correlation can be arbitrarily close to one, but is locally bounded by a quadratic function.

Condition II.1 (Near Condition). The correlation function ρ_θ satisfies the near condition if

ρ_{θ}^{(0, 2)} (η) \leq - γ_{0} and

(II.2)

| ρ_{θ}^{(1, 2)} (η) | \leq γ_{1} ‖ {\vec{φ}}^{(1)} (θ) ‖_{2}^{2}

(II.3)

hold for all η in [N⁻, N⁺], where N^± ≔ θ ± N, and γ₀, γ₁ are positive constants.

Equation (II.2) requires correlations to be concave locally, which is natural since the maximum of ρ_θ is attained at θ. Equation (II.3) is a regularity condition that requires $ρ_{θ}^{(0, 2)} (η)$ to vary smoothly as we change the center θ. The normalization quantity ${∥ {\vec{φ}}^{(1)} (θ) ∥}_{2}^{2}$ captures how sensitive the features $\vec{φ}$ are to perturbations. If this quantity is small for some θ, then we require more regularity from ρ_θ because θ is harder to distinguish from nearby points using the measurements $\vec{φ}$ .
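The concavity requirement in Equation (II.2) can be checked numerically for a given measurement model. The sketch below estimates the largest admissible γ₀ for a Gaussian kernel by approximating $ρ_{θ}^{(0, 2)}$ with central second differences. The kernel, the dense sample grid `s`, the near-region width `N`, and the step `h` are all assumptions chosen for illustration.

```python
import numpy as np

def feature(theta, s):
    """Unit-norm Gaussian feature vector sampled at s."""
    v = np.exp(-0.5 * (s - theta) ** 2)
    return v / np.linalg.norm(v)

def rho(theta, eta, s):
    """Correlation function rho_theta(eta)."""
    return feature(theta, s) @ feature(eta, s)

s = np.arange(-20.0, 20.25, 0.25)     # assumed dense uniform samples
theta, N, h = 0.0, 1.0, 1e-3          # center, near-region half-width, difference step

etas = np.linspace(theta - N, theta + N, 201)
# Central second difference in eta approximates rho_theta^{(0,2)}(eta).
d2 = np.array([
    (rho(theta, e + h, s) - 2.0 * rho(theta, e, s) + rho(theta, e - h, s)) / h**2
    for e in etas
])
gamma0 = -d2.max()                    # largest gamma_0 for which (II.2) holds on this grid
print("gamma_0 estimate:", gamma0)
```

For densely sampled Gaussian features the correlation is very close to exp(−(η − θ)²/4), whose second derivative on [θ − 1, θ + 1] is largest at the endpoints, where it equals −0.25 e^{−1/4} ≈ −0.195.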

In the intermediate region the correlation function ρ_θ is bounded but can otherwise fluctuate arbitrarily. In addition, we require a similar constraint on its derivative with respect to the position of the center θ.

Condition II.2 (Intermediate Condition). The correlation function ρ_θ satisfies the intermediate condition if

| ρ_{θ}^{(0, 0)} (η) | \leq γ_{2} < 1 and

(II.4)

| ρ_{θ}^{(1, 0)} (η) | \leq γ_{3} ‖ {\vec{φ}}^{(1)} (θ) ‖_{2}^{2}

(II.5)

hold for η < N⁻ and η > N⁺, where N^± are defined as in the near condition, and γ₂, γ₃ are positive constants.

In the decay region the correlation and its derivatives are bounded by a decaying function.

Condition II.3 (Decay Condition). The correlation function ρ_θ satisfies the decay condition with decay constant σ > 0 if

| ρ_{θ}^{(0, r)} (η) | \leq C_{0, r} e^{- (| η - θ | - D) / σ} and

(II.6)

| ρ_{θ}^{(1, r)} (η) | \leq C_{1, r} e^{- (| η - θ | - D) / σ} ‖ {\vec{φ}}^{(1)} (θ) ‖_{2}^{2}

(II.7)

hold for η < D⁻ and η > D⁺, where r = 0, 1, 2, D^± ≔ θ ± D, and C_q,r are positive constants.

The choice of exponential decay is for concreteness, and can be replaced by a different summable decay bound². Figure 8 shows the derivatives of the correlation functions for several SNL problems.
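As an illustration of the decay condition, the smallest constant C_{0,0} compatible with Equation (II.6) can be estimated on a grid by dividing the correlation by the exponential envelope. The Gaussian kernel, the sample grid, and the choices D = 0 and σ = 1 below are assumptions made for this sketch.

```python
import numpy as np

s = np.arange(-30.0, 30.25, 0.25)        # assumed dense uniform samples
theta, D, sigma = 0.0, 0.0, 1.0          # assumed center and decay parameters

def feature(t):
    """Unit-norm Gaussian feature vector sampled at s."""
    v = np.exp(-0.5 * (s - t) ** 2)
    return v / np.linalg.norm(v)

etas = np.linspace(-12.0, 12.0, 2401)
rho = np.array([feature(theta) @ feature(e) for e in etas])

outside = np.abs(etas - theta) > D       # decay region: eta < D^- or eta > D^+
envelope = np.exp(-(np.abs(etas - theta) - D) / sigma)
C00 = np.max(np.abs(rho[outside]) / envelope[outside])
print("smallest admissible C_{0,0} on this grid:", C00)
```

For this kernel the correlation is close to exp(−(η − θ)²/4), so the ratio to the envelope peaks at |η − θ| = 2 with value e. A Gaussian tail is therefore covered by an exponential bound, at the cost of a constant larger than one.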

The sample locations s₁, …, s_n have a direct effect on the correlation function, and hence on whether an SNL problem satisfies Conditions II.1 to II.3. In Figure 9 we depict the correlation function ρ_θ for different sampling patterns in a Gaussian deconvolution problem with $\vec{φ} {(θ)}_{i} ≔ \exp (- {(θ - s_{i})}^{2} ∕ 2)$ . For both regular and irregular sampling patterns, the correlation functions are well behaved as long as there are samples close to the true parameter. In contrast, when there are few samples, or the samples are distant from the true parameter, Conditions II.1 to II.3 may be violated. We refer the interested reader to [54] for a more precise analysis of sampling patterns for deconvolution of point sources.

Fig. 9: Plots of the correlation function *ρ_θ*(η) for a deconvolution problem with varying sampling patterns (indicated by the x markers on the horizontal axis). Specifically, $\vec{φ} {(θ)}_{i} ≔ \exp (- {(θ - s_{i})}^{2} ∕ 2)$ are samples from a θ-centered Gaussian kernel. In the dense uniform case, the correlation function satisfies Conditions II.1 to II.3. For an irregular sampling pattern, the correlation function will in general satisfy the conditions if there are some samples close to θ. For sparse uniform samples, the correlation function is nearly flat at θ since there are too few measurements to distinguish nearby parameters. If the samples are only located to the right of θ, then all parameters on the left side have nearly identical measurements.
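The effect of the sampling pattern can be illustrated by evaluating the correlation between two nearby parameters under different sample locations. The three patterns below are hypothetical stand-ins for the ones in Figure 9, not the exact patterns used there.

```python
import numpy as np

def rho(theta, eta, s):
    """Correlation between unit-normalized Gaussian features centered at theta and eta."""
    a = np.exp(-0.5 * (s - theta) ** 2)
    b = np.exp(-0.5 * (s - eta) ** 2)
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

theta, eta = 0.0, 0.5                     # two nearby parameters (assumed values)
patterns = {
    "dense uniform": np.arange(-10.0, 10.5, 0.5),
    "sparse uniform": np.arange(-12.0, 13.0, 6.0),
    "right of theta only": np.arange(4.0, 10.5, 0.5),
}
values = {name: rho(theta, eta, s) for name, s in patterns.items()}
for name, value in values.items():
    print(f"{name:>20}: rho = {value:.6f}")
```

With dense samples the correlation drops noticeably when the parameter moves by 0.5. With sparse or one-sided samples it stays very close to one, so the two parameters are nearly indistinguishable from the data.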

B. Exact Recovery for SNL Problems with Uniform Correlation Decay

In this section we focus on SNL problems where the correlation function ρ_θ of the measurement operator is approximately translation invariant, meaning that ρ_θ has similar properties for any value of θ. Examples of such SNL problems include super-resolution, deconvolution, and heat-source localization if the conductivity is approximately uniform. We prove that TV-norm minimization recovers a superposition of Dirac measures exactly as long as the support satisfies a separation condition related to the decay properties of the correlation function and its derivatives.

Theorem II.4. Let Θ ≔ {θ₁, …, θ_k} be the support of the measure μ defined in Equation (I.4). Assume that the correlation functions ρ_{θ_i}, θ_i ∈ Θ, satisfy Conditions II.1 to II.3 for fixed constants N, D, σ, $γ \in R^{4}$ , and $C \in R^{2 \times 3}$ . Then μ is the unique solution to Problem (I.8) as long as the minimum separation of Θ satisfies

\min_{i \neq j, θ_{i}, θ_{j} \in Θ} | θ_{i} - θ_{j} | > 2 D + Δ σ, Δ ≔ \max (\log (1 + 2 (C_{0, 0} + C_{1, 1})), λ_{1}, λ_{2}),

(II.8)

where

λ_{1} = 2 \log (\frac{2 A}{- C_{0, 0} + \sqrt{C_{0, 0}^{2} + 4 (1 - γ_{2}) A}}),

(II.9)

λ_{2} = 2 \log (\frac{2 B}{- C_{0, 2} + \sqrt{C_{0, 2}^{2} + 4 γ_{0} B}}),

(II.10)

A = 2 (2 C_{0, 0} + C_{1, 1} - C_{1, 1} γ_{2} + C_{0, 1} γ_{3}) + 1 - γ_{2}, B = 2 ((2 C_{0, 0} + C_{1, 1}) γ_{0} + C_{0, 2} + C_{0, 1} γ_{1}) + γ_{0},

and the constants C_q,r are chosen so that

C_{0, 1} C_{1, 0} = C_{0, 0} C_{1, 1}, and C_{0, 1} C_{1, 2} = C_{1, 1} C_{0, 2} .

(II.11)

Note that the condition in Equation (II.11) is only needed to simplify the statement and proof of our results.

Theorem II.4 establishes that TV minimization recovers the true parameters of an SNL problem when the support separation is larger than a constant that is proportional to the rate of decay of the correlation function and its derivatives. This separation is measured from the edges of the intermediate regions, as depicted in Figure 10. In stark contrast to compressed-sensing theory, the result holds for correlation functions that are arbitrarily close to one in the near regions, and may have arbitrary bounded fluctuations in the intermediate regions.

Fig. 10: — Illustration of the minimum-separation condition required by Theorem II.4.

Our statement and proof of Theorem II.4 focus on clarity over sharpness. For example, in the Gaussian deconvolution problem depicted in Figure 9 with dense uniform samples, the theorem requires a minimum separation of approximately 6.6 to guarantee exact parameter recovery. To obtain this bound, we let

\begin{array}{l} N = 1, D = 0, σ = 1, \\ γ = {[\begin{matrix} 0.185 & 0.983 & 0.788 & 0.868 \end{matrix}]}^{T}, \\ C = [\begin{matrix} 2.818 & 3.348 & 4.200 \\ 6.786 & 8.060 & 10.113 \end{matrix}], \end{array}

(II.12)

and verify the conditions numerically. This is in contrast with the minimum separation of approximately 3 proven in [54]. Part of this discrepancy comes from applying exponential decay bounds to a Gaussian-shaped correlation function.
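The separation bound can be evaluated directly by plugging the constants of Equation (II.12) into Equations (II.8) to (II.10). In the sketch below the variable names are ours, and the parenthesization of B is an assumption chosen to mirror the structure of A.

```python
import math

# Constants from Equation (II.12).
N, D, sigma = 1.0, 0.0, 1.0
g0, g1, g2, g3 = 0.185, 0.983, 0.788, 0.868      # gamma_0, ..., gamma_3
C = [[2.818, 3.348, 4.200],                      # C[q][r] stores C_{q,r}
     [6.786, 8.060, 10.113]]

A = 2 * (2 * C[0][0] + C[1][1] - C[1][1] * g2 + C[0][1] * g3) + 1 - g2
# Assumed reading of B: the factor 2 multiplies everything except the final gamma_0.
B = 2 * ((2 * C[0][0] + C[1][1]) * g0 + C[0][2] + C[0][1] * g1) + g0

lam1 = 2 * math.log(2 * A / (-C[0][0] + math.sqrt(C[0][0] ** 2 + 4 * (1 - g2) * A)))
lam2 = 2 * math.log(2 * B / (-C[0][2] + math.sqrt(C[0][2] ** 2 + 4 * g0 * B)))
delta = max(math.log(1 + 2 * (C[0][0] + C[1][1])), lam1, lam2)

min_separation = 2 * D + delta * sigma           # right-hand side of (II.8)
print("lambda_1 =", lam1, "lambda_2 =", lam2)
print("minimum separation =", min_separation)
```

With these constants the maximum is attained by λ₂, and the resulting bound rounds to the value of approximately 6.6 quoted above. The constants also satisfy the two identities in Equation (II.11) up to rounding.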

Our result requires conditions on the correlation functions centered at the true support Θ, and also on their derivatives. The decay conditions on the derivatives constrain the correlation structure of the measurement operator when we perturb the position of the true parameters. For example, they implicitly bound pairwise correlations centered in a small neighborhood of the support. Exploring to what extent these conditions are necessary is an interesting question for future research.

C. Exact Recovery for SNL Problems with Nonuniform Correlation Decay

The measurement operators associated to many SNL problems of practical interest have nonuniform correlations. Figures 5 and 6 show that this is the case for heat-source localization with spatially-varying conductivity, and for estimation of brain-activity from EEG data. Our goal in this section is to establish recovery guarantees for such problems.

The conditions on the correlation structure of the measurement operator required by Theorem II.4 only pertain to the correlation functions centered at the true parameters. In order to generalize the result, we allow the correlation function centered at each parameter to satisfy the conditions in Section II-A with different constants. This makes it possible for the correlation to have near and intermediate regions of varying widths around each element of the support, as well as different decay constants in the far region. Our main result is that TV-norm minimization achieves exact recovery for SNL problems with nonuniform correlation structure, as long as the support satisfies a minimum-separation condition dependent on the corresponding support-centered correlation functions.

Let Θ be the support of our signal of interest, and assume ρ_{θ_i} satisfies the decay conditions in Section II-A with parameters σ_i, D_i, and N_i, which may differ for each θ_i ∈ Θ. Extending the notation, we let $N_{i}^{\pm} ≔ θ_{i} \pm N_{i}$ and $D_{i}^{\pm} ≔ θ_{i} \pm D_{i}$ denote the endpoints of the near and decay regions, respectively. Intuitively, when σ_i and D_i are small, the corresponding correlation function ρ_{θ_i} is “narrower” and should require less separation than “wider” correlation functions with large values of σ_i and D_i. This is illustrated in Figure 11, where we depict ρ_{θ_i} for the heat-source localization problem at three different values of i. The decay becomes more pronounced towards the right due to the changing thermal conductivity of the underlying medium. For the problem to be well posed, one would expect θ₁ to require more separation from other active sources than θ₂, which in turn should require more than θ₃. We confirm this intuition through numerical experiments in Section IV-A. To make it mathematically precise, we define the following generalized notion of support separation.

Fig. 11: — Support-centered correlations with varying decay parameters for the heat-source localization problem. The vertical dashed lines indicate the locations of $D_{i}^{\pm}$ for i = 1, 2, 3, whereas the curved dashed lines indicate the exponential decay bounds. As we move from left to right along the θ-axis the thermal conductivity is decreasing, which causes the correlation functions to become narrower.

Definition II.5 (Generalized Support Separation). Suppose that, for all θ_i ∈ Θ, ρ_{θ_i} satisfies Condition II.3 with parameters D_i and σ_i. Define the normalized distance d(θ_i, θ_j) for i ≠ j by

d (θ_{i}, θ_{j}) = \frac{| θ_{i} - θ_{j} | - D_{i} - D_{j}}{\max (σ_{i}, σ_{j})} .

(II.13)

Assume that Θ is ordered so that θ₁ < θ₂ < … < θ_k. Θ has separation Δ > 0 if d(θ_i, θ_j) > |i − j|Δ for all θ_i, θ_j ∈ Θ with i ≠ j.
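As an illustration of Definition II.5, the following minimal Python sketch evaluates the normalized distance (II.13) and the largest Δ for which a given support satisfies the separation condition. The function names and the numerical values of θ_i, D_i, and σ_i are hypothetical and chosen only for illustration.

```python
def normalized_distance(theta_i, theta_j, d_i, d_j, sigma_i, sigma_j):
    """Normalized distance d(theta_i, theta_j) from (II.13)."""
    return (abs(theta_i - theta_j) - d_i - d_j) / max(sigma_i, sigma_j)


def largest_separation(thetas, decay_widths, sigmas):
    """Supremum of the Delta values for which Definition II.5 holds.

    The support has separation Delta when d(theta_i, theta_j) > |i - j| * Delta
    for every pair (indices taken after sorting), so the tightest pair
    determines the value min d / |i - j|.
    """
    order = sorted(range(len(thetas)), key=lambda idx: thetas[idx])
    best = float("inf")
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            i, j = order[a], order[b]
            d = normalized_distance(thetas[i], thetas[j],
                                    decay_widths[i], decay_widths[j],
                                    sigmas[i], sigmas[j])
            best = min(best, d / (b - a))
    return best


# Hypothetical support: the rightmost parameter has the widest correlation.
thetas = [0.0, 1.0, 3.0]
decay_widths = [0.1, 0.1, 0.2]   # D_i
sigmas = [0.1, 0.2, 0.4]         # sigma_i
delta_max = largest_separation(thetas, decay_widths, sigmas)
print(delta_max)
```

In this example the binding constraint comes from the pair (θ₁, θ₃), which is why the definition compares d(θ_i, θ_j) against |i − j|Δ rather than Δ alone.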

The normalized distance d(θ_i, θ_j) is measured between the edges of the decay regions of θ_i and θ_j, and normalized by the level of decay. This allows sharply decaying correlation functions to be in close proximity with one another. We require d(θ_i, θ_j) > |i − j|Δ to prevent the parameters from becoming too clustered. If we only require the weaker condition d(θ_i, θ_j) > Δ, and if σ_i grows very quickly with i, then we could have d(θ₁, θ_j) ≈ Δ for all j > 1. This causes too much overlap between the correlation functions.

Figure 13 gives an example of parameters and correlation functions that satisfy the conditions of Definition II.5. The following theorem establishes exact-recovery guarantees under a condition on the generalized support separation.

Fig. 13: — Non-trivial configuration allowed by Definition II.5. The central parameter θ₄ has weaker decay, and thus requires more separation from the other parameters.

Theorem II.6. Suppose that, for all θ_i ∈ Θ, ρ_{θ_i} satisfies Conditions II.1 to II.3 and (II.11) with constants N ≔ N_i, D ≔ D_i, σ ≔ σ_i, C, and γ. Note that C and γ are the same for each θ_i. Then the true measure μ defined in (I.4) is the unique solution to Problem (I.8) when Θ has separation Δ (as determined by Definition II.5) satisfying

Δ > \max (\log (1 + 2 (C_{0, 0} + C_{1, 1})), λ_{1}, λ_{2}),

(II.14)

(II.9), (II.10), and (II.11).

The proof of Theorem II.6, which implies Theorem II.4, is given in Section III. The theorem establishes that TV minimization recovers the true parameters of SNL problems with nonuniform correlation decays when the generalized support separation is larger than a constant. Equivalently, exact recovery is achieved as long as each true parameter θ_i is separated from the rest by a distance that is proportional to the rate of decay of the correlation function centered at θ_i. The separation is measured from the edges of the intermediate regions, which can also vary in width as depicted in Figure 10. The result matches our intuition about SNL problems: the parameters can be recovered as long as they yield measurements that are not highly correlated. As mentioned previously, the theorem requires decay conditions on the derivatives of the correlation function, which constrain the correlation structure of the measurement operator.

D. Robustness to Noise

In practice, measurements are always corrupted by noisy perturbations. Noise can be taken into account in our measurement model (I.2) by incorporating an additive noise vector $z \in R^{n}$ :

y ≔ \sum_{i = 1}^{k} c_{i} \vec{φ} (θ_{i}) + z .

(II.15)

To adapt the TV-norm minimization problem (I.8) to such measurements, we relax the data consistency constraint from an equality to an inequality:

\begin{array}{ll} \underset{\tilde{μ}}{\text{minimize}} & ‖ \tilde{μ} ‖_{TV} \\ \text{subject to} & ‖ \int_{R^{p}} \vec{φ} (θ) \tilde{μ} (d θ) - y ‖_{2} \leq ξ, \end{array}

(II.16)

where ξ > 0 is a parameter that must be tuned according to the noise level. Previous works have established robustness guarantees for TV-norm minimization applied to specific SNL problems such as super-resolution [56], [57] and deconvolution [54] at small noise levels. These proofs are based on dual certificates. Combining the arguments in [54], [56] with our dual-certificate construction in Section III yields robustness guarantees for general SNL problems in terms of support recovery at high signal-to-noise regimes. We omit the details, since the proof would essentially mimic the ones in [54], [56]. In Figure 14 we show an application of Equation (II.16) to a noisy deconvolution problem taken from [54].

Fig. 14: — Deconvolution from noisy samples (in red) with a signal-to-noise ratio of 20.7 dB for the Ricker wavelet. The noise is iid Gaussian. The recovered signals are obtained by solving equation (II.16) on a fine grid which contains the location of the original spikes.

E. Discretization

The continuous optimization problem (II.16) can be solved by applying ℓ₁-norm minimization after discretizing the parameter space. This is a very popular approach in practice for a variety of SNL problems [16], [18], [23], [27], [32]. If the true parameters lie on the discretization grid, then our exact-recovery results translate immediately. The following corollary is a discrete version of Theorem II.6.

Corollary II.7. Assume Θ lies on a known discretized grid G ≔ {η₁, …, η_m} so that Θ ⊂ G. Furthermore, suppose the conditions of Theorem II.6 hold so that μ as defined in (I.4) is the unique solution to Problem (I.8). Define the dictionary $Φ \in R^{n \times m}$ by

Φ ≔ [\vec{φ} (η_{1}) \dots \vec{φ} (η_{m})] .

(II.17)

Then the ℓ₁-norm minimization problem

\begin{array}{ll} \underset{\tilde{x}}{\text{minimize}} & ‖ \tilde{x} ‖_{1} \\ \text{subject to} & Φ \tilde{x} = y \end{array}

(II.18)

has a unique solution $x \in R^{m}$ satisfying

x_{j} ≔ \begin{cases} c_{i} & \text{if } η_{j} = θ_{i}, \\ 0 & \text{otherwise}, \end{cases}

(II.19)

for j = 1, …, m.

Proof: If the support of $\tilde{μ}$ in Problem (I.8) is restricted to lie on G, then Problems (I.8) and (II.18) are the same. Thus ‖μ‖_TV must be smaller than ${∥ \tilde{x} ∥}_{1}$ for any $\tilde{x}$ such that $Φ \tilde{x} = y$ . By assumption μ is supported on G, so the result follows. ■
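To make the discretized problem (II.18) concrete, the sketch below solves it as a linear program for a small on-grid example. It assumes a Gaussian measurement model, hypothetical grid sizes and amplitudes, and SciPy's `linprog` solver; this is a substitution made here for illustration and is not the setup used in the experiments of Section IV.

```python
import numpy as np
from scipy.optimize import linprog


def gaussian_dictionary(grid, samples, width):
    """Dictionary (II.17) with unit-norm Gaussian columns (illustrative model)."""
    Phi = np.exp(-(samples[:, None] - grid[None, :]) ** 2 / (2 * width ** 2))
    return Phi / np.linalg.norm(Phi, axis=0)


def l1_minimize(Phi, y):
    """Solve min ||x||_1 subject to Phi x = y via the split x = u - v, u, v >= 0."""
    m = Phi.shape[1]
    result = linprog(np.ones(2 * m),
                     A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                     bounds=(0, None), method="highs")
    if result.status != 0:
        raise RuntimeError(result.message)
    return result.x[:m] - result.x[m:]


samples = np.linspace(0.0, 1.0, 20)      # hypothetical measurement locations
grid = np.linspace(0.0, 1.0, 51)         # discretization grid G
Phi = gaussian_dictionary(grid, samples, width=0.05)

x_true = np.zeros(grid.size)
x_true[15], x_true[35] = 1.0, -0.7       # two well-separated spikes on the grid
y = Phi @ x_true

x_hat = l1_minimize(Phi, y)
print("recovered support:", grid[np.abs(x_hat) > 1e-6])
```

The problem is underdetermined (20 measurements, 51 unknowns), so recovery of the two spikes relies on their separation rather than on the number of measurements.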

Of course, the true parameters may not lie on the grid used to solve the ℓ₁-norm minimization problem. The proof techniques used to derive robustness guarantees for superresolution and deconvolution in [54], [56] can be leveraged to provide some control over the discretization error. Performing a more accurate analysis of discretization error for SNL problems is an interesting direction for future research.

F. Related Work

1). Sparse Recovery via Convex Programming from Deterministic Measurements:

In [58], Donoho and Elad develop a theory of sparse recovery from generic measurements based on the spark and coherence of the measurement matrix Φ. The spark is defined to be the smallest positive value s such that Φ has s linearly dependent columns. The coherence, which we denote by M(Φ), is the maximum absolute correlation between any two columns of Φ. The authors show that exact recovery is achieved by ℓ₁-minimization when the number of true parameters is less than (1 + 1/M(Φ))/2. As discussed in Section I-C, these arguments are inapplicable to the finely-discretized parameter spaces occurring in SNL problems since neighboring columns of Φ have correlations approaching one. In [59], the authors provide a support-dependent condition for exact recovery of discrete vectors. Using our notation, they require that β(Θ)/(1 − α(Θ)) < 1 where

α (Θ) ≔ \max_{θ_{i} \in Θ} \sum_{θ_{k} \in Θ, k \neq i} | \vec{φ} {(θ_{i})}^{T} \vec{φ} (θ_{k}) | \quad \text{and} \quad β (Θ) ≔ \max_{η_{j} \notin Θ} \sum_{θ_{k} \in Θ} | \vec{φ} {(η_{j})}^{T} \vec{φ} (θ_{k}) | .

(II.20)

Here $\vec{φ} (η_{j})$ for η_j ∉ Θ ranges over the columns of Φ that do not correspond to the true parameters. This condition is also inapplicable for matrices arising in our problems of interest because β(Θ) approaches one (or larger) for finely-discretized parameter spaces. Sharper exact-recovery guarantees in subsequent works [38], [47]–[50], [60] require randomized measurements, and therefore do not hold for deterministic SNL problems as explained in Section I-C.
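The effect of fine discretization on the coherence-based guarantee can be checked numerically. The following sketch assumes a Gaussian measurement model with hypothetical sample locations and kernel width, and computes the coherence M(Φ) together with the sparsity bound (1 + 1/M(Φ))/2 for increasingly fine grids.

```python
import math


def gaussian_atom(eta, samples, width):
    """Unit-norm Gaussian measurement vector centred at eta (illustrative model)."""
    values = [math.exp(-((s - eta) ** 2) / (2 * width ** 2)) for s in samples]
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def coherence(atoms):
    """Maximum absolute inner product between two distinct columns, M(Phi)."""
    best = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            corr = abs(sum(a * b for a, b in zip(atoms[i], atoms[j])))
            best = max(best, corr)
    return best


def coherence_for_grid(grid_size, width=0.05):
    """Coherence of the dictionary built on a uniform grid of [0, 1]."""
    samples = [-0.5 + 0.025 * t for t in range(81)]          # sample locations
    grid = [g / (grid_size - 1) for g in range(grid_size)]   # parameter grid
    return coherence([gaussian_atom(eta, samples, width) for eta in grid])


for m in (6, 26, 101):
    M = coherence_for_grid(m)
    print(m, round(M, 4), round((1 + 1 / M) / 2, 3))
```

As the grid is refined, neighboring columns become almost parallel, the coherence approaches one, and the bound only covers signals with a single nonzero entry.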

2). Convex Programming Applied to Specific SNL Problems:

In [51], [61], the authors establish exact recovery guarantees for super-resolution via convex optimization by leveraging parameter separation (see Section I-D). Subsequent works build upon these results to study the robustness of this methodology to noise [56], [57], [62], [63], missing data [64], and outliers [65]. A similar analysis is carried out in [54] for deconvolution. The authors establish a sampling theorem for Gaussian and Ricker-wavelet convolution kernels, which characterizes what sampling patterns yield exact recovery under a minimum-separation condition. Other works have analyzed the Gaussian deconvolution problem under nonnegativity constraints [66], [67], and also for randomized measurements [68]. All of these works exploit the properties of specific measurement operators. In contrast, the present paper establishes a general theory that only relies on the correlation structure of the measurements. The works that are closer to this spirit are [69], [70], which analyze deconvolution via convex programming for generic convolution kernels. The results in [69] require quadratically decaying bounds on the first three derivatives of the convolution kernel. In [70], the authors prove exact recovery assuming bounds on the first four derivatives of the autocorrelation function of the convolution kernel. In contrast to these works, our results allow for discrete irregular sampling and for measurement operators that are not convolutional, which is necessary to analyze applications such as heat-source localization or estimation of brain activity.

Algorithms for solving the convex programs arising from SNL problems divide into two categories: grid-based and grid-free (or off-the-grid). In grid-based methods the parameter space is discretized, thus yielding a finite-dimensional ℓ₁ minimization problem that can be solved using standard methods (as we have done in Section IV). Grid-free methods attempt to directly solve the infinite-dimensional problem, but with weaker guarantees (see [71] and [72] for review articles). Recently, several authors [73]-[75] have proposed grid-free algorithms based on the conditional gradient method [76].

3). Other Methodologies:

SNL parameter recovery can be formulated as a nonlinear least squares problem [2]. For a fixed value of the parameters θ₁, …, θ_k, the optimal coefficients c₁, …, c_k in (I.1) have a closed form solution. This makes it possible to minimize the nonlinear cost function with respect to θ₁, …, θ_k directly, a technique known as variable projection [1]. As shown in Figure 3, a downside to this approach is that it may converge to suboptimal local minima, even in the absence of noise.

Prony’s method [8], [77] and the finite-rate-of-innovation (FRI) framework [78], [79] can be applied to tackle SNL problems, as long as one can recast them as spectral super-resolution problems. This provides a recovery method that avoids discretizing the parameter space. The FRI framework has also been applied to arbitrary non-bandlimited convolution kernels [80] and nonuniform sampling patterns [81], but without exact-recovery guarantees. These techniques have been recently extended by Dragotti and Murray-Bruce [82] to physics-driven SNL problems. By approximating complex exponentials with weighted sums of Green’s functions, they are able to recast parameter recovery as a related spectral super-resolution problem that approximates the true SNL problem.

III. Proof of Exact-Recovery Results

A. Dual Certificates

To prove Theorem II.6 we construct a certificate that guarantees exact recovery.

Proposition III.1 (Proof in Section A). Let $Θ = \{θ_{1}, \dots, θ_{k}\} \subset R$ denote the support of the signed atomic measure μ. Assume that for any sign pattern ξ ∈ {±1}^k there is a $\tilde{q} \in R^{n}$ such that $\tilde{Q} (θ) ≔ {\tilde{q}}^{T} \vec{φ} (θ)$ satisfies

\tilde{Q} (θ_{i}) = ξ_{i}, \forall θ_{i} \in Θ,

(III.1)

| \tilde{Q} (θ) | < 1, \forall θ \in Θ^{c} .

(III.2)

Then μ is the unique solution to problem (I.8).

To prove exact recovery of a signal we need to show that it is possible to interpolate the sign pattern of its amplitudes, which we denote by ξ, on its support Θ using an interpolating function $\tilde{Q} (θ)$ that is expressible as a linear combination of the coordinates of $\vec{φ} (θ)$ . The coefficient vector of this linear combination, denoted by $\tilde{q}$ , is known as a dual certificate in the literature because it certifies recovery and is a solution to the Lagrange dual of problem (I.8):

\begin{array}{ll} \underset{ν}{\text{maximize}} & ν^{T} y \\ \text{subject to} & \sup_{θ} | ν^{T} \vec{φ} (θ) | \leq 1 . \end{array}

(III.3)

Dual certificates have been widely used to derive guarantees for inverse problems involving random measurements, including compressed sensing [83], [84], matrix completion [85] and phase retrieval [86]. In such cases, the certificates are usually constructed by leveraging concentration bounds and other tools from probability theory [87]. In contrast, our setting is completely deterministic. More recently, dual certificates have been proposed for specific deterministic problems such as super-resolution [51] and deconvolution [54]. Our goal here is to provide a construction that is valid for generic SNL models with correlation decay.

B. Correlation-Based Dual Certificates

Our main technical contribution is a certificate that only depends on the correlation function of the measurement operator. In contrast, certificate constructions in previous works on SNL problems [51], [54], [69] typically rely on problem-specific structure, with the exception of [70] which proposes a certificate for time-invariant problems based on autocorrelation functions.

In our SNL problems of interest the function $\vec{φ} (θ)$ mapping the parameters of interest to the data is assumed to be continuous and smooth. As a result, Equations (III.1) and (III.2) imply that any valid interpolating function $\tilde{Q}$ reaches a local extremum at each θ_i ∈ Θ. Equivalently, $\tilde{Q}$ satisfies the following 2k interpolation equations for all θ_i ∈ Θ:

\tilde{Q} (θ_{i}) = ξ_{i},

(III.4)

{\tilde{Q}}^{(1)} (θ_{i}) = 0 .

(III.5)

Inspired by this observation, we define

q ≔ \sum_{i = 1}^{k} α_{i} \vec{φ} (θ_{i}) + β_{i} \frac{{\vec{φ}}^{(1)} (θ_{i})}{{‖ {\vec{φ}}^{(1)} (θ_{i}) ‖}_{2}^{2}}

(III.6)

where $α_{i}, β_{i} \in R$ , i = 1, …, k, are chosen so that $Q (θ) ≔ q^{T} \vec{φ} (θ)$ satisfies the interpolation equations. Crucially, this choice of coefficients yields an interpolation function Q that is a linear combination of correlation functions centered at Θ,

Q (θ) = \sum_{i = 1}^{k} α_{i} \vec{φ} {(θ_{i})}^{T} \vec{φ} (θ) + β_{i} \frac{{\vec{φ}}^{(1)} {(θ_{i})}^{T} \vec{φ} (θ)}{{‖ {\vec{φ}}^{(1)} (θ_{i}) ‖}_{2}^{2}}

(III.7)

= \sum_{i = 1}^{k} α_{i} ρ_{θ_{i}} (θ) + β_{i} \frac{ρ_{θ_{i}}^{(1, 0)} (θ)}{ρ_{θ_{i}}^{(1, 1)} (θ_{i})} .

(III.8)

In essence, we interpolate the sign pattern of the signal on its support using support-centered correlations. Since ρ_{θ_i}(θ_i) = 1 and $ρ_{θ_{i}}^{(1, 0)} (θ_{i}) = 0$ , Equations (III.4) and (III.8) imply α_i ≈ ξ_i when ρ(θ,η) and ρ^(1,0)(θ,η) decay as |θ − η| grows large and the support is sufficiently separated. The term that depends on β can be interpreted as a correction to the derivatives of Q so that (III.5) is satisfied. The normalizing factor used in (III.8) makes this explicit:

\left. \frac{\partial}{\partial θ} \right|_{θ = θ_{i}} β_{i} \frac{ρ_{θ_{i}}^{(1, 0)} (θ)}{ρ_{θ_{i}}^{(1, 1)} (θ_{i})} = β_{i}

(III.9)

for i = 1, …, k. Figure 15 illustrates the construction for the heat-source localization problem. The construction is inspired by previous interpolation-based certificates tailored to superresolution [51] and deconvolution [54].

Fig. 15: The image shows the interpolating function Q(θ) defined in Equation (III.8) for the heat-source localization problem. The function is a linear combination of ρ_{θ_i}(θ) (red curves) and $ρ_{θ_{i}}^{(1, 0)} (θ)$ (orange curves) for θ_i ∈ Θ.
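The construction in (III.6) to (III.8) can be reproduced numerically in a few lines. The sketch below is a minimal illustration under stated assumptions: it uses a Gaussian measurement model with hypothetical sample locations, kernel width, support, and sign pattern rather than the heat-source operator of Figure 15, and it approximates the derivative of the atoms by central finite differences.

```python
import numpy as np

SAMPLES = np.linspace(-0.5, 1.5, 201)   # hypothetical measurement locations
WIDTH = 0.05                            # hypothetical kernel width


def atom(theta):
    """Unit-norm Gaussian measurement vector phi(theta) (illustrative model)."""
    g = np.exp(-(SAMPLES - theta) ** 2 / (2 * WIDTH ** 2))
    return g / np.linalg.norm(g)


def d_atom(theta, h=1e-5):
    """Central finite-difference approximation of phi^(1)(theta)."""
    return (atom(theta + h) - atom(theta - h)) / (2 * h)


def interpolation_vector(support, signs):
    """Solve (III.4)-(III.5) for alpha and beta, and return q from (III.6)."""
    F = np.column_stack([atom(t) for t in support])       # phi(theta_i)
    F1 = np.column_stack([d_atom(t) for t in support])    # phi^(1)(theta_i)
    F1n = F1 / np.sum(F1 ** 2, axis=0)                    # divide by squared norms
    k = len(support)
    system = np.block([[F.T @ F, F.T @ F1n],              # rows: Q(theta_j) = xi_j
                       [F1.T @ F, F1.T @ F1n]])           # rows: Q'(theta_j) = 0
    rhs = np.concatenate([signs, np.zeros(k)])
    coeffs = np.linalg.solve(system, rhs)
    alpha, beta = coeffs[:k], coeffs[k:]
    return F @ alpha + F1n @ beta, alpha, beta


support = np.array([0.2, 0.5, 0.8])
signs = np.array([1.0, -1.0, 1.0])
q, alpha, beta = interpolation_vector(support, signs)

fine_grid = np.linspace(0.0, 1.0, 2001)
Q = np.array([q @ atom(t) for t in fine_grid])
print("max |Q| on the grid:", np.abs(Q).max())
print("alpha:", alpha)
print("beta:", beta)
```

For this well-separated support the coefficients α stay close to the sign pattern and β is small, in line with the interpretation of the β terms as a derivative correction.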

In the remainder of this section we show that our proposed construction yields a valid certificate if the conditions of Theorem II.6 hold. In Section III-C we prove Lemma III.2, which establishes that the interpolation equations have a unique solution and therefore Q satisfies (III.1).

Lemma III.2. Under the assumptions of Theorem II.6 there exist $α, β \in R^{k}$ which uniquely solve Equations (III.4) and (III.5).

In Section III-D we prove Lemma III.3, which establishes that Q satisfies (III.2). This completes the proof of Theorem II.6.

Lemma III.3. Let $α, β \in R^{k}$ be the coefficients obtained in Lemma III.2. Under the assumptions of Theorem II.6, the interpolating function Q defined in (III.8) satisfies |Q(θ)| < 1 for θ ∉ Θ.

C. Proof of Lemma III.2

To simplify notation we define the ith normalized correlation and its derivatives by

{\hat{ρ}}_{θ_{i}}^{(q, r)} (θ) ≔ \frac{ρ_{θ_{i}}^{(q, r)} (θ)}{{‖ {\vec{φ}}^{(q)} (θ_{i}) ‖}_{2}^{2}},

(III.10)

for q = 0, 1 and r = 0, 1, 2, where $ρ_{θ_{i}}^{(q, r)}$ is defined in Equation (II.1). Using this notation we have

Q (θ) = \sum_{i = 1}^{k} α_{i} ρ_{θ_{i}} (θ) + β_{i} \frac{ρ_{θ_{i}}^{(1, 0)} (θ)}{ρ_{θ_{i}}^{(1, 1)} (θ_{i})}

(III.11)

= \sum_{i = 1}^{k} α_{i} {\hat{ρ}}_{θ_{i}} (θ) + β_{i} {\hat{ρ}}_{θ_{i}}^{(1, 0)} (θ) .

(III.12)

To prove Lemma III.2 we establish the following stronger result, which also gives useful bounds on α and β. Throughout we assume that ρ satisfies Conditions II.1 to II.3 and (II.11) with parameters γ, C, N, D, and σ, and that Θ has separation Δ.

Lemma III.4. Let s ≔ 2e^{−Δ}/(1 − e^{−Δ}). If Δ > log(1 + 2(C_{0,0} + C_{1,1})) then there are $α, β \in R^{k}$ which uniquely solve equations (III.4) and (III.5). Furthermore, we have

{‖ α ‖}_{\infty} \leq \frac{1 - C_{1, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s}, \quad {‖ β ‖}_{\infty} \leq \frac{C_{0, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s}, \quad 1 - {‖ α - ξ ‖}_{\infty} \geq \frac{1 - (2 C_{0, 0} + C_{1, 1}) s}{1 - (C_{0, 0} + C_{1, 1}) s} .

(III.13)

To prove Lemma III.4 we begin by rewriting Equations (III.4) and (III.5) in block matrix-vector form for the Q function defined in Equation (III.8):

[\begin{matrix} I + P^{(0, 0)} & P^{(1, 0)} \\ P^{(0, 1)} & I + P^{(1, 1)} \end{matrix}] [\begin{matrix} α \\ β \end{matrix}] = [\begin{matrix} ξ \\ 0 \end{matrix}],

(III.14)

where $I \in R^{k \times k}$ is the identity matrix, and $P^{(q, r)} \in R^{k \times k}$ satisfies

P_{ij}^{(q, r)} = {\hat{ρ}}_{θ_{j}}^{(q, r)} (θ_{i})

(III.15)

for i ≠ j and $P_{ii}^{(q, r)} = 0$ . To see why $P_{ii}^{(1, 0)} = 0$ agrees with equations (III.4) and (III.5), note that

{\hat{ρ}}_{θ_{i}}^{(1, 0)} (θ_{i}) = \frac{{\vec{φ}}^{(1)} {(θ_{i})}^{T} \vec{φ} (θ_{i})}{{‖ {\vec{φ}}^{(1)} (θ_{i}) ‖}_{2}^{2}} = 0 .

(III.16)

Indeed, the atoms are normalized, that is, ${∥ \vec{φ} (θ) ∥}_{2} = 1$ for all θ, which implies

0 = \frac{d}{d θ} {‖ \vec{φ} (θ) ‖}_{2}^{2} = 2 {\vec{φ}}^{(1)} {(θ)}^{T} \vec{φ} (θ) .

For the same reason it follows that ${\hat{ρ}}_{θ_{i}}^{(0, 1)} (θ_{i}) = 0$ .

Our plan is to bound the norm of each P^(q,r). If these norms are sufficiently small then the matrix in Equation (III.14) is nearly the identity, and the desired result follows from a linear-algebraic argument. Define $ϵ_{i}^{(q, r)} (θ)$ by

ϵ_{i}^{(q, r)} (θ) ≔ \sum_{j \neq i} | {\hat{ρ}}_{θ_{j}}^{(q, r)} (θ) |,

(III.17)

where q = 0, 1 and r = 0, 1, 2. Here we think of θ as a point close to θ_i, so $ϵ_{i}^{(q, r)} (θ)$ captures the cumulative correlation from the other, more distant elements of Θ. We expect $ϵ_{i}^{(q, r)} (θ)$ to be small when ρ has sufficient decay and Θ is well separated. For a matrix A let ‖A‖_∞ denote the infinity-norm defined by

{‖ A ‖}_{\infty} ≔ \sup_{{‖ x ‖}_{\infty} = 1} {‖ A x ‖}_{\infty} .

(III.18)

‖A‖_∞ equals the maximum sum of absolute values in any row of A. We have

{‖ P^{(q, r)} ‖}_{\infty} = ϵ^{(q, r)} ≔ \max_{i \in {1, … k}} ϵ_{i}^{(q, r)} (θ_{i}) .

(III.19)

The following lemma shows that the matrix in Equation (III.14) is invertible when ϵ^(q,r) is sufficiently small.

Lemma III.5 (Proof in Section B). Suppose

ϵ^{(1, 1)} < 1 \quad \text{and}

(III.20)

c ≔ ϵ^{(0, 0)} + \frac{ϵ^{(1, 0)} ϵ^{(0, 1)}}{1 - ϵ^{(1, 1)}} < 1.

(III.21)

Then the matrix in (III.14) is invertible and

{‖ α ‖}_{\infty} \leq \frac{1}{1 - c}

(III.22)

{‖ β ‖}_{\infty} \leq \frac{ϵ^{(0, 1)} / (1 - ϵ^{(1, 1)})}{1 - c}

(III.23)

{‖ α - ξ ‖}_{\infty} \leq \frac{c}{1 - c} .

(III.24)

To apply Lemma III.5 we must first bound ϵ^(q,r) for q, r ∈ {0, 1}. By the decay and separation conditions (Condition II.3 and Definition II.5), θ_j lies in the exponentially decaying tail of ρ_{θ_i}, for i ≠ j. This gives

ϵ_{i}^{(q, r)} (θ_{i}) = \sum_{j \neq i} | {\hat{ρ}}_{θ_{j}}^{(q, r)} (θ_{i}) | \leq C_{q, r} \sum_{θ_{j} > θ_{i}} \exp (- d (θ_{i}, θ_{j}))

(III.25)

+ C_{q, r} \sum_{θ_{j} < θ_{i}} \exp (- d (θ_{i}, θ_{j}))

(III.26)

\leq 2 C_{q, r} \sum_{k = 1}^{\infty} \exp (- k Δ)

(III.27)

= \frac{2 C_{q, r} e^{- Δ}}{1 - e^{- Δ}} = C_{q, r} s,

(III.28)

where s ≔ 2e^{−Δ}/(1 − e^{−Δ}) and the distance d is defined in Definition II.5. As this bound is independent of i, we have

ϵ^{(q, r)} \leq C_{q, r} s

(III.29)

for q, r ∈ {0, 1}. In terms of the conditions of Lemma III.5, we obtain

c = ϵ^{(0, 0)} + \frac{ϵ^{(1, 0)} ϵ^{(0, 1)}}{1 - ϵ^{(1, 1)}} \leq C_{0, 0} s + \frac{C_{1, 0} C_{0, 1} s^{2}}{1 - C_{1, 1} s} = \frac{C_{0, 0} s + (C_{1, 0} C_{0, 1} - C_{0, 0} C_{1, 1}) s^{2}}{1 - C_{1, 1} s} = \frac{C_{0, 0} s}{1 - C_{1, 1} s},

(III.30)

assuming Equation (II.11) (C_{1,0}C_{0,1} = C_{0,0}C_{1,1}) holds. Thus we have c < 1, as required by (III.21), when $s < \frac{1}{C_{0, 0} + C_{1, 1}}$ . Using this, we can find conditions on Δ so that the hypotheses of Lemma III.5 hold. If Δ > log(1 + κ) for some κ > 0 then, using that f(x) = e^{−x}/(1 − e^{−x}) is decreasing for x > 0, we have

\frac{s}{2} = \frac{e^{- Δ}}{1 - e^{- Δ}} < \frac{1 / (1 + κ)}{1 - 1 / (1 + κ)} = \frac{1}{κ} .

(III.31)

Letting κ = 2C_1,1 this shows that Δ > log(1 + 2C_1,1) implies

ϵ^{(1, 1)} \leq C_{1, 1} s = 2 C_{1, 1} \frac{s}{2} < 2 C_{1, 1} \frac{1}{2 C_{1, 1}} = 1,

(III.32)

giving (III.20). For condition (III.21) we let κ = 2(C_0,0 + C_1,1). Then (III.31) shows that Δ > log(1 + 2(C_0,0 + C_1,1)) implies

s = 2 \frac{s}{2} < 2 \frac{1}{2 (C_{0, 0} + C_{1, 1})} = \frac{1}{C_{0, 0} + C_{1, 1}}

(III.33)

as required. Thus when Δ > log(1 + 2(C_0,0 + C_1,1)) both conditions of Lemma III.5 hold. By plugging (III.29) and (III.30) into the bounds of Lemma III.5 we obtain Lemma III.4. This completes the proof of Lemma III.2.
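The bounds of Lemma III.4 are simple to evaluate for given decay constants. The sketch below assumes hypothetical values C_{0,0} = C_{0,1} = C_{1,1} = 1 (the function and variable names are illustrative as well) and evaluates the three expressions in (III.13) for several separations above the threshold log(1 + 2(C_{0,0} + C_{1,1})).

```python
import math


def geometric_tail(delta):
    """s = 2 e^{-Delta} / (1 - e^{-Delta}), as defined in Lemma III.4."""
    return 2 * math.exp(-delta) / (1 - math.exp(-delta))


def coefficient_bounds(delta, c00, c01, c11):
    """Bounds (III.13) on ||alpha||_inf, ||beta||_inf and 1 - ||alpha - xi||_inf."""
    threshold = math.log(1 + 2 * (c00 + c11))
    if delta <= threshold:
        raise ValueError("need delta > log(1 + 2 (C00 + C11))")
    s = geometric_tail(delta)
    denom = 1 - (c00 + c11) * s          # positive above the threshold
    return {
        "alpha": (1 - c11 * s) / denom,
        "beta": c01 * s / denom,
        "one_minus_alpha_error": (1 - (2 * c00 + c11) * s) / denom,
    }


# Hypothetical decay constants.
c00, c01, c11 = 1.0, 1.0, 1.0
for delta in (2.0, 4.0, 8.0):
    print(delta, coefficient_bounds(delta, c00, c01, c11))
```

As Δ grows, s tends to zero, so the bound on ‖α‖_∞ tends to one and the bound on ‖β‖_∞ tends to zero, which matches the intuition that α_i ≈ ξ_i for well-separated supports.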

D. Proof of Lemma III.3

Lemma III.2 implies that we can solve (III.4) and (III.5) for α and β and then obtain Q via (III.8). To prove Lemma III.3 we must show that |Q(θ)| < 1 for θ ∈ Θ^c. This is accomplished in two steps. We first show that |Q(θ)| < 1 for θ that are not in the near region of any ρ_{θ_i}, i = 1, …, k. Secondly, we show that for any θ in the near region of some ρ_{θ_i} we have Q⁽²⁾(θ) < 0, where we assume ξ_i = 1 without loss of generality. This proves that Q has a local maximum at θ_i, and is smaller than one nearby.

To bound Q and Q⁽²⁾ we apply the triangle inequality to (III.8), obtaining the following lemma.

Lemma III.6. Fix θ_i ∈ Θ and let Q be defined as in (III.8). For all $θ \in R$

| Q (θ) | \leq {‖ α ‖}_{\infty} | {\hat{ρ}}_{θ_{i}} (θ) | + {‖ β ‖}_{\infty} | {\hat{ρ}}_{θ_{i}}^{(1, 0)} (θ) | + {‖ α ‖}_{\infty} ϵ_{i}^{(0, 0)} (θ) + {‖ β ‖}_{\infty} ϵ_{i}^{(1, 0)} (θ) .

(III.34)

If ${\hat{ρ}}_{θ_{i}}^{(0, 2)} (θ) \leq 0$ and ξ ∈ {−1, +1}^k is the sign pattern interpolated by Q then

Q^{(2)} (θ) \leq (1 - {‖ α - ξ ‖}_{\infty}) {\hat{ρ}}_{θ_{i}}^{(0, 2)} (θ) + {‖ β ‖}_{\infty} | {\hat{ρ}}_{θ_{i}}^{(1, 2)} (θ) | + {‖ α ‖}_{\infty} ϵ_{i}^{(0, 2)} (θ) + {‖ β ‖}_{\infty} ϵ_{i}^{(1, 2)} (θ) .

(III.35)

In the next lemma we show that the separation conditions in Definition II.5 ensure that the support does not cluster together. We assume that θ₁ < θ₂ < … < θ_{|Θ|}.

Lemma III.7 (Proof in Section C). Fix θ_i ∈ Θ and let θ > θ_i. If i ≤ k − 1, assume that

\frac{θ - D_{i}^{+}}{σ_{i}} \leq \frac{D_{i + 1}^{-} - θ}{σ_{i + 1}} .

(III.36)

Then

\frac{D_{j}^{-} - θ}{σ_{j}} \geq \frac{Δ}{2} \quad \text{if } j = i + 1,

(III.37)

\frac{D_{j}^{-} - θ}{σ_{j}} \geq Δ (j - (i + 1)) \quad \text{if } j > i + 1,

(III.38)

\frac{θ - D_{j}^{+}}{σ_{j}} \geq Δ (i - j) \quad \text{if } j < i,

(III.39)

where Δ is defined in Definition II.5, as long as Conditions II.1 to II.3 and (II.11) hold.

Inequality (III.36) implies that θ_i is the closest element of Θ to θ, if we use the generalized distance normalized by σ.

Bounding |Q(θ)| Outside the Near Region

Our goal is to bound |Q(θ)| for $θ \in R$ that are not in the near region of any ρ_{θ_i}. We can assume, flipping the axis if necessary, that the conditions of Lemma III.7 hold for the θ_i closest to θ and that $θ \geq N_{i}^{+}$ (recall that $N_{i}^{+} = θ_{i} + N_{i}$ is the boundary of the near region of ρ_{θ_i}). Then θ lies in the exponentially decaying tail of ρ_{θ_j} for j ≠ i, so that

ϵ_{i}^{(q, r)} (θ) = \sum_{j \neq i} | {\hat{ρ}}_{θ_{j}}^{(q, r)} (θ) | \leq C_{q, r} \exp (- (D_{i + 1}^{-} - θ) / σ_{i + 1}) + \sum_{j < i} C_{q, r} \exp (- (θ - D_{j}^{+}) / σ_{j})

(III.40)

+ \sum_{j > i + 1} C_{q, r} \exp (- (D_{j}^{-} - θ) / σ_{j})

(III.41)

\leq C_{q, r} e^{- Δ / 2} + 2 C_{q, r} \sum_{k = 1}^{\infty} e^{- k Δ}

(III.42)

= C_{q, r} (x + s),

(III.43)

where x = e^{−Δ/2} and s is defined in Lemma III.4. Plugging this into (III.34) and combining with Lemma III.4 yields

| Q (θ) | \leq \frac{1 - C_{1, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s} (| {\hat{ρ}}_{θ_{i}} (θ) | + C_{0, 0} (x + s)) + \frac{C_{0, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s} (| {\hat{ρ}}_{θ_{i}}^{(1, 0)} (θ) | + C_{1, 0} (x + s))

(III.44)

\leq \frac{1 - C_{1, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s} (γ_{2} + C_{0, 0} (x + s)) + \frac{C_{0, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s} (γ_{3} + C_{1, 0} (x + s)) .

(III.45)

Solving for |Q(θ)| < 1 we obtain

(1 - C_{1, 1} s) (γ_{2} + C_{0, 0} (x + s)) + (C_{0, 1} s) (γ_{3} + C_{1, 0} (x + s)) < 1 - (C_{0, 0} + C_{1, 1}) s .

(III.46)

Isolating s and x,

s (2 C_{0, 0} + C_{1, 1} - C_{1, 1} γ_{2} + C_{0, 1} γ_{3}) + x C_{0, 0} < 1 - γ_{2},

(III.47)

where we apply Equation (II.11) (C_1,0C_0,1 = C_0,0C_1,1) to cancel terms. Since $s = \frac{2 x^{2}}{1 - x^{2}}$ we obtain the inequality

\frac{2 x^{2}}{1 - x^{2}} a + x b < c

(III.48)

where a ≔ 2C_0,0 + C_1,1 − C_1,1γ₂ + C_0,1γ₃ > 0, b ≔ C_0,0, and c ≔ 1 − γ₂ > 0. The following lemma shows that this inequality is satisfied by our assumptions, completing this part of the proof.

Lemma III.8 (Proof in Section D). Let a, b, c, Δ > 0, and x ≔ e^{−Δ/2}. Then the inequality

\frac{2 x^{2}}{1 - x^{2}} a + x b < c

(III.49)

is satisfied when

Δ > 2 \log (\frac{2 (2 a + c)}{- b + \sqrt{b^{2} + 4 (2 a + c) c}}) .

(III.50)
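Lemma III.8 can be sanity-checked numerically. The sketch below evaluates the threshold (III.50) and the left-hand side of (III.49) for a few hypothetical triples (a, b, c); the function names are illustrative.

```python
import math


def separation_threshold(a, b, c):
    """Right-hand side of (III.50)."""
    root = math.sqrt(b * b + 4 * (2 * a + c) * c)
    return 2 * math.log(2 * (2 * a + c) / (root - b))


def left_hand_side(delta, a, b):
    """Left-hand side of (III.49) with x = exp(-delta / 2)."""
    x = math.exp(-delta / 2)
    return 2 * x * x / (1 - x * x) * a + x * b


# Hypothetical constants; any positive a, b, c can be used.
for a, b, c in [(1.0, 1.0, 0.5), (0.3, 2.0, 0.1), (5.0, 0.2, 0.9)]:
    delta = separation_threshold(a, b, c) + 1e-6   # just above the threshold
    print((a, b, c), round(delta, 4), left_hand_side(delta, a, b) < c)
```

The threshold is sufficient but not tight: at the threshold the quadratic (2a + c)x² + bx − c vanishes, which already forces the left-hand side of (III.49) to lie strictly below c.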

Bounding Q⁽²⁾(θ) in the Near Region

For the final part of the proof we must prove Q⁽²⁾(θ) < 0 for θ in the near region of some θ_i with ξ_i = 1. Since the interpolation equations (III.4) and (III.5) guarantee that Q(θ_i) = 1, strict concavity implies that Q has a unique maximum on $[N_{i}^{-}, N_{i}^{+}]$ , thus establishing that Q(θ) < 1 for $θ \in [N_{i}^{-}, N_{i}^{+}] \setminus \{θ_{i}\}$ . We cannot have Q(θ) ≤ −1 on the near region since we already showed that |Q(θ)| < 1 outside the near region. We can assume, without loss of generality, that the conditions of Lemma III.7 hold for some i and that $θ \leq N_{i}^{+}$ . By the same argument used to obtain Equation (III.43), we have

ϵ_{i}^{(q, r)} (θ) \leq C_{q, r} (x + s) .

(III.51)

Plugging this into (III.35) and applying Lemma III.4 we obtain

Q^{(2)} (θ) \leq \frac{1 - (2 C_{0, 0} + C_{1, 1}) s}{1 - (C_{0, 0} + C_{1, 1}) s} {\hat{ρ}}_{θ_{i}}^{(0, 2)} (θ)

(III.52)

+ \frac{1 - C_{1, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s} (C_{0, 2} (x + s)) + \frac{C_{0, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s} (| {\hat{ρ}}_{θ_{i}}^{(1, 2)} (θ) | + C_{1, 2} (x + s))

(III.53)

\leq \frac{1 - (2 C_{0, 0} + C_{1, 1}) s}{1 - (C_{0, 0} + C_{1, 1}) s} (- γ_{0}) + \frac{1 - C_{1, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s} (C_{0, 2} (x + s)) + \frac{C_{0, 1} s}{1 - (C_{0, 0} + C_{1, 1}) s} (γ_{1} + C_{1, 2} (x + s)),

(III.54)

where ${\hat{ρ}}_{θ_{i}}^{(0, 2)} (θ) \leq 0$ by Condition II.1. Solving for Q⁽²⁾(θ) < 0 yields

- (1 - (2 C_{0, 0} + C_{1, 1}) s) γ_{0} +

(III.55)

(1 - C_{1, 1} s) C_{0, 2} (x + s) + C_{0, 1} s (γ_{1} + C_{1, 2} (x + s)) < 0.

(III.56)

Grouping into terms involving s and x we obtain

s ((2 C_{0, 0} + C_{1, 1}) γ_{0} + C_{0, 2} + C_{0, 1} γ_{1}) + C_{0, 2} x < γ_{0},

(III.57)

where we apply Equation (II.11) (C_0,1C_1,2 = C_0,2C_1,1) to cancel terms. Letting a ≔ (2C_0,0 + C_1,1)γ₀ + C_0,2 + C_0,1γ₁ > 0, b ≔ C_0,2, and c ≔ γ₀ > 0 we obtain an inequality of the form (III.49). Applying Lemma III.8 completes the proof.

E. Extensions to Higher Dimensions

In this section we briefly describe an extension of our dual certificate construction to settings where the parameter space has dimension p > 1. We leave a more detailed analysis to future work. In higher dimensions, one can build the interpolating function Q by setting

Q (η) ≔ \sum_{i = 1}^{k} [α_{i} ρ_{θ_{i}} (η) + \sum_{l = 1}^{p} β_{i, l} \frac{\partial_{1, l} ρ_{θ_{i}} (η)}{{‖ \partial_{l} \vec{φ} (θ_{i}) ‖}_{2}^{2}}],

(III.58)

where $α \in R^{k}$ , $β \in R^{k \times p}$ , ∂_l denotes the partial derivative with respect to the lth coordinate, and ∂_{1,l} ρ_θ(η) denotes the partial derivative of ρ_θ(η) with respect to the lth coordinate of θ. The coefficients are chosen so that Q satisfies the analog of the interpolation Equations (III.4) and (III.5),

Q (θ_{i}) = ξ_{i},

(III.59)

\nabla Q (θ_{i}) = 0.

(III.60)

In Figure 16 we show an example of an interpolating function Q for the electroencephalography brain-activity localization problem. The interpolating function is associated to the signal with three active sources of brain activity from Figure 2. To control Q for problems with correlation decay, one can extend the correlation-decay conditions in Section II-A to higher dimensions by requiring analogous bounds on the first and second partial derivatives. For example, the local quadratic bound in Equation (II.2) becomes a bound on the eigenvalues of the Hessian of ρ_θ. Similar conditions have been used in [51], [69] to obtain recovery guarantees for the super-resolution and deconvolution problems in two dimensions.

Fig. 16: — Interpolation function Q associated to a valid dual certificate for the electroencephalography brain-activity localization problem. The dual certificate is associated to a signal consisting of the three sources of brain activity shown in Figure 2. Q interpolates the sign pattern of the signal, which equals +1 on θ₁ and θ₃, and −1 on θ₂. The certificate is built using the multidimensional extension of our proposed interpolation technique described in Section III-E.

IV. Numerical Experiments

A. Heat-source localization

Our theoretical results establish that convex programming yields exact recovery in parameter-estimation problems that have correlation decay. In this section we investigate this phenomenon numerically for a heat-source localization problem in one dimension. The heat sources are modeled as a collection of point sources,

μ ≔ \sum_{θ_{i} \in Θ} c_{i} δ_{θ_{i}},

(IV.1)

where Θ is a finite set of points in the unit interval and $c_{i} \in R$ for i = 1, …, k.

The heat u(θ,t) at position θ and time t is assumed to evolve following the heat equation with Neumann boundary conditions (see e.g. [9]),

\frac{\partial}{\partial t} u (θ, t) = \frac{\partial}{\partial θ} (c (θ) \frac{\partial}{\partial θ} u (θ, t)),

(IV.2)

\frac{\partial}{\partial θ} u (- 0.5, t) = \frac{\partial}{\partial θ} u (0.5, t) = 0

(IV.3)

on the unit interval, where c(θ) represents the conductivity of the medium at θ (see Figure 17). The data are heat measurements u(j, T), where j is sampled on a regular grid of 100 points at a fixed time T ≔ 10⁻⁴. Our goal is to recover the initial heat distribution at t = 0 represented by the heat sources μ. This is an SNL problem where $\vec{φ} (θ)$ corresponds to the measurements caused by a source located at θ.

Fig. 17: — The conductivity function c(θ) used in the experiment described in Section IV-A. Heat diffuses faster in the central region, where the conductivity is higher.
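To illustrate how a measurement vector $\vec{φ} (θ)$ can be computed for this model, the sketch below evolves a unit point source under (IV.2) and (IV.3) with an explicit conservative finite-difference scheme and samples the result on 100 grid points at T = 10⁻⁴. The conductivity profile, the spatial resolution, and the time-stepping scheme are assumptions made for illustration; they are not the conductivity of Figure 17 nor necessarily the solver used in our experiments.

```python
import numpy as np


def conductivity(theta):
    """Hypothetical conductivity profile, higher in the centre of [-0.5, 0.5]."""
    return 0.5 + np.exp(-(theta / 0.2) ** 2)


def simulate_heat(source, n_cells=400, final_time=1e-4):
    """Evolve a unit point source under (IV.2)-(IV.3) with an explicit scheme."""
    edges = np.linspace(-0.5, 0.5, n_cells + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    dx = edges[1] - edges[0]
    c_face = conductivity(edges[1:-1])            # conductivity at interior faces
    n_steps = int(np.ceil(final_time * c_face.max() / (0.4 * dx ** 2)))
    dt = final_time / n_steps                     # satisfies dt <= 0.4 dx^2 / max c
    u = np.zeros(n_cells)
    u[np.argmin(np.abs(centers - source))] = 1.0 / dx
    for _ in range(n_steps):
        flux = c_face * np.diff(u) / dx           # c * du/dtheta at interior faces
        u[:-1] += dt * flux / dx                  # flux through each right face
        u[1:] -= dt * flux / dx                   # same flux leaves the neighbour
    return centers, u                             # boundary faces carry no flux


def heat_atom(source, n_measurements=100):
    """Unit-norm measurement vector for a source located at `source`."""
    _, u = simulate_heat(source)
    idx = np.linspace(0, len(u) - 1, n_measurements).round().astype(int)
    samples = u[idx]
    return samples / np.linalg.norm(samples)


def spread(source):
    """Standard deviation of the heat profile at the final time."""
    centers, u = simulate_heat(source)
    dx = centers[1] - centers[0]
    mean = np.sum(centers * u) * dx
    return np.sqrt(np.sum((centers - mean) ** 2 * u) * dx)


print("spread at 0.0:", spread(0.0), "spread at 0.4:", spread(0.4))
```

With this hypothetical profile, a source in the high-conductivity centre produces a wider heat profile than a source near the edge, which is the mechanism behind the nonuniform correlation decay discussed in this section.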

If the conductivity is constant, then $\vec{φ} (θ)$ has a Gaussian-like shape with fixed width sampled on the measurement grid, and the correlation function ρ_θ is also shaped like a Gaussian with fixed width. In this example, the conductivity varies (see Figure 17), which results in correlations that are still mostly Gaussian-like but have very different widths (see Figure 18). The measurement operator therefore has nonuniform correlation-decay properties. According to the theory presented in Section II-C, we expect convex programming to recover the heat-source locations as long as the sources are separated by a minimum distance that takes into account the structure of the correlation decay. To verify whether this is the case, we consider two different separation measures. The first just measures the minimum separation between the sources,

Δ_{sep} ≔ \min_{i \neq j} | θ_{i} - θ_{j} | .

(IV.4)

Fig. 18: — Examples of the correlation functions *ρ_θ₁*, …, *ρ_θ₇* for heat sources with different separations and configurations. The sources used in the top row of figures are uniformly spaced on the unit interval. The heat sources in the bottom row have one central source, and two clusters of three sources located at the ends. From left to right the supports are dilated by different factors. This reduces the separation between the heat sources but maintains their relative position.

The second takes into account the correlation function of the SNL problem,

Δ_{corr} ≔ \min_{i \neq j} \frac{| θ_{i} - θ_{j} |}{\max (σ_{i}, σ_{j})},

(IV.5)

where σ_i, 1 ≤ i ≤ k, is the standard deviation of the best Gaussian upper bound on the correlation function centered at θ_i,

σ_{i} ≔ \inf \{ s > 0 \mid ρ_{θ_{i}} (θ) < e^{- {(θ_{i} - θ)}^{2} / (2 s^{2})} \text{ for all } θ \} .

(IV.6)
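For sampled correlation functions, the quantities in (IV.4)–(IV.6) can be estimated directly. The sketch below is one possible implementation, not the authors' code. It uses the observation that, for 0 < ρ < 1 at distance d from θ_i, the Gaussian bound holds exactly when s ≥ d / \sqrt{-2 \log ρ}, so the infimum in (IV.6) is the largest such ratio over the samples. Samples with d = 0 or ρ ≤ 0 impose no constraint and are skipped. The function names and example values are hypothetical.

```python
import math
from itertools import combinations


def gaussian_width(theta_i, grid, rho):
    """Smallest s whose Gaussian bound dominates the sampled correlation."""
    width = 0.0
    for theta, r in zip(grid, rho):
        d = abs(theta - theta_i)
        if d == 0.0 or r <= 0.0:
            continue  # this sample places no constraint on s
        if r >= 1.0:
            return math.inf  # no Gaussian bound can hold at this sample
        width = max(width, d / math.sqrt(-2.0 * math.log(r)))
    return width


def delta_sep(thetas):
    """Minimum separation between sources, as in (IV.4)."""
    return min(abs(a - b) for a, b in combinations(thetas, 2))


def delta_corr(thetas, sigmas):
    """Correlation-aware separation, as in (IV.5)."""
    pairs = combinations(zip(thetas, sigmas), 2)
    return min(abs(a - b) / max(sa, sb) for (a, sa), (b, sb) in pairs)


# Sanity check on an exactly Gaussian correlation of width 0.05.
grid = [i / 100 - 0.5 for i in range(101)]
rho = [math.exp(-t ** 2 / (2 * 0.05 ** 2)) for t in grid]
width = gaussian_width(0.0, grid, rho)

# Made-up configuration: the third source has a much wider correlation.
thetas = [0.0, 0.1, 0.3]
sigmas = [0.02, 0.02, 0.2]
sep = delta_sep(thetas)
corr = delta_corr(thetas, sigmas)
```

In the made-up configuration, the pair that is closest in absolute distance (the first two sources) is not the pair that determines Δ_corr, because the wide correlation of the third source makes it effectively close to its neighbor.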

The question is which of these separations characterizes the performance of the convex-programming approach more accurately. We consider two different heat-source configurations: one where the sources are uniformly spaced and another where they are clustered, as depicted in Figure 18. In the clustered configuration, groups of spikes are placed in regions of low conductivity where each ρ_{θ_i} exhibits sharp decay. This creates a noticeable discrepancy between the two measures of separation. We expect exact recovery to occur for much smaller values of Δ_sep in the clustered case than in the uniform case, since Δ_sep does not account for the correlation decay. In contrast, we expect Δ_corr to have similar thresholds for exact recovery in both cases, as it captures the inherent difficulty of the problem. To recover these sources we solve the discretized ℓ₁-norm minimization problem described in Section II-E using CVX [88]. The measurement matrix is computed by solving the differential equation on a grid of 10³ points in the parameter space.
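For reference, a discretized ℓ₁-norm minimization problem with equality constraints can be rewritten as a linear program, which is the standard way such problems are handed to general-purpose solvers. The notation below is introduced only for this sketch and may differ from Section II-E: Φ denotes the measurement matrix with one column per grid point, y the data, and N the number of grid points.

```latex
\begin{aligned}
&\min_{\tilde{c} \in \mathbb{R}^{N}} \; \|\tilde{c}\|_{1}
  \quad \text{subject to} \quad \Phi \tilde{c} = y \\
\Longleftrightarrow \quad
&\min_{c^{+},\, c^{-} \in \mathbb{R}^{N}} \; \mathbf{1}^{T} (c^{+} + c^{-})
  \quad \text{subject to} \quad \Phi (c^{+} - c^{-}) = y, \;\; c^{+} \geq 0, \;\; c^{-} \geq 0 .
\end{aligned}
```

A solution of the original problem is recovered as $\tilde{c} = c^{+} - c^{-}$. An optimal pair can always be chosen so that at most one of $c^{+}_{j}$ and $c^{-}_{j}$ is nonzero for each j, so the linear objective equals the ℓ₁ norm.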

Figure 19 shows the results. For both types of support, we observe that exact recovery occurs as long as the minimum separation Δ_sep between sources is larger than a certain value. However, this value is very different for the two types of support. It is much smaller for the clustered support. Intuitively, this is due to the fact that the clustered sources are mostly located in the region where the conductivity is lower, and consequently the correlation decay is much faster. Our theory suggests that exact recovery will occur at smaller separations for such configurations, which is what we observe. Quantifying separation using Δ_corr accounts for the variation in correlation decay, resulting in a similar phase transition for both types of support. This is consistent with the theory for SNL problems with nonuniform correlation decay developed in Section II-C.

Fig. 19: — The figure shows the performance of heat-source localization based on convex programming for the experiment described in Section IV-A. The upper row shows the result for sources with uniformly spaced supports, whereas the bottom row shows results for sources with clustered supports (see Figure 18). The left column plots the results with respect to the minimum separation Δ_sep. The right column plots the results in terms of the correlation-aware separation Δ_corr. Recovery succeeds if the relative ℓ₂-norm recovery error is smaller than 3 · 10⁻⁵. All failures have an absolute error above 9 · 10⁻³, and most have errors above 1. For comparison the ℓ₂ norm of the true signal equals $\sqrt{7}$ (≈ 2.65) in all cases.

B. Estimation of Brain Activity via Electroencephalography

In this section we consider the SNL problem of brain-activity localization from electroencephalography (EEG) data. Areas of focalized brain activity are usually known as sources. In EEG, as well as in magnetoencephalography, sources are usually modeled using electric dipoles, represented mathematically as point sources or Dirac measures at the corresponding locations [89]. EEG measurements are samples of the electric potential on the surface of the head obtained using an array of electrodes. The mapping from the source positions and amplitudes to the EEG data is governed by Poisson’s equation. For a fixed model of the head geometry, obtained for example from 3D images acquired using magnetic-resonance imaging, the mapping can be computed by solving the differential equation numerically. By linearity, this corresponds to an SNL model where θ represents the position of a dipole and φ_t (θ) is the value of the corresponding electrical potential at a location t of the scalp.

To simulate realistic EEG data with associated ground-truth source positions, we use Brainstorm [90], an open-source software for the analysis of EEG and other electrophysiological recordings. We use template ICBM152, which is a nonlinear average of 152 3D magnetic-resonance image scans [91], to model the geometry of the brain and head. The sources are modeled as electrical dipoles situated on the cortical surface, discretized as a tessellated grid with 15,000 points. The orientation of the dipoles is assumed to be perpendicular to the cortex. The data-acquisition sensor array has 256 channels (HydroCel GSN, EGI). The data corresponding to each point on the grid are simulated numerically using the open-source software OpenMEEG [92]. The computation assumes realistic electrical properties for the brain, skull and scalp. As a result, we obtain a 256 × 15,000 measurement operator that can be used to generate data corresponding to different combinations of sources. Figure 2 shows three examples.

The correlation structure of the EEG linear measurement operator is depicted in Figure 6. The complex geometric structure of the folds in the cortex induces an intricate correlation structure close to the source location. However, the correlation exhibits clear decay: points that are far enough from the source have low correlation values. Our theoretical results consequently suggest that ℓ₁-norm minimization should succeed in recovering superpositions of sources that are sufficiently separated. This is supported by the fact that for such superpositions, we can build valid dual certificates as illustrated in Figure 16.
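Given a discretized measurement operator, a correlation structure of this kind can be inspected numerically. The sketch below assumes that the correlation between two grid points is the inner product of their ℓ₂-normalized measurement vectors. The toy operator (three grid points, four sensors) and the function name are made up for illustration and are unrelated to the 256 × 15,000 EEG operator.

```python
import math


def correlation_matrix(columns):
    """Normalized inner products between measurement vectors.

    columns[i] is the measurement vector associated with the i-th grid point.
    """
    unit = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        unit.append([v / norm for v in col])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


# Toy operator: the first two grid points produce similar sensor readings,
# the third one is seen mostly by different sensors.
toy_columns = [
    [1.0, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.0],
    [0.0, 0.1, 0.9, 1.0],
]
rho = correlation_matrix(toy_columns)
```

Row i of the result is a sampled version of the correlation function centered at the i-th grid point. In the toy example the first two points are strongly correlated with each other and only weakly correlated with the third, which is the discrete analogue of correlation decay.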

In order to test our hypothesis, we randomly choose superpositions of three sources located at approximately the same distance from each other, for a range of distances. For each superposition we generate 8 different measurement vectors by assigning every possible combination of positive and negative unit-norm amplitudes to each source. We then estimate the source locations by solving an ℓ₁-norm minimization problem with equality constraints using CVX [88]. Figure 20 shows the results. When the separation is sufficiently large, exact recovery indeed occurs for all possible patterns of positive and negative amplitudes, as expected from our theoretical analysis. As the separation decreases, recovery fails for some of the patterns, as shown in the image on the right.

Fig. 20: — Result of the brain-source localization experiment described in Section IV-B. The horizontal axis indicates the distance between the sources. In the left image success is declared if the three sources are accurately recovered for *any* pattern of positive and negative amplitudes. In the right image success is declared if the sources are recovered for *all* patterns of positive and negative amplitudes. Recovery succeeds if the relative ℓ₂-norm recovery error is smaller than 10⁻⁴. All failures have an absolute error above 1.18. For comparison the ℓ₂ norm of the true signal equals $\sqrt{3}$ (≈ 1.73) in all cases.
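The bookkeeping behind the two success criteria can be sketched as follows. The code enumerates the 8 sign patterns for three sources and aggregates per-pattern relative errors into "any" and "all" outcomes, using the 10⁻⁴ threshold quoted in the caption. The error values are invented placeholders standing in for the output of the ℓ₁-norm minimization, and the function names are hypothetical.

```python
import math
from itertools import product


def relative_error(estimate, truth):
    """Relative l2-norm error between an estimate and the true amplitudes."""
    num = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)))
    return num / math.sqrt(sum(t * t for t in truth))


def sign_patterns(n_sources=3):
    """All assignments of +1 / -1 amplitudes to the sources."""
    return [list(p) for p in product((1.0, -1.0), repeat=n_sources)]


def summarize(errors, tol=1e-4):
    """Return (success for any pattern, success for all patterns)."""
    ok = [err < tol for err in errors]
    return any(ok), all(ok)


patterns = sign_patterns()
# Invented relative errors, one per sign pattern (placeholders only).
toy_errors = [1e-6, 2e-6, 0.9, 1e-6, 3e-6, 1e-6, 0.7, 1e-6]
any_ok, all_ok = summarize(toy_errors)
```

With these placeholder errors, the configuration would count as a success in the left image of Figure 20 and as a failure in the right image.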

V. Conclusion and Future Work

In this work, we establish that deterministic SNL problems can be solved via a tractable convex optimization program, as long as the parameters have a minimum separation in the parameter space with respect to the correlation structure of the measurement operator. As mentioned in Section II-D, our results can be used to establish some robustness guarantees at low noise levels in terms of support recovery. Deriving a more precise stability analysis in the spirit of [93] is an interesting research direction. Another interesting open problem is characterizing the performance of reweighting techniques, which are commonly applied in practice to enhance solutions obtained from ℓ₁-norm minimization [16], [94].

A drawback of the sparse-recovery framework discussed in this paper is computational complexity: for SNL problems in two or three dimensions it is very computationally expensive to solve ℓ₁-norm minimization problems, even if we discretize the domain. In the last few years, algorithms that minimize nonconvex cost functions directly via gradient descent have been shown to provably succeed in solving underdetermined inverse problems involving randomized measurements [95], [96]. As illustrated in Figure 3, nonlinear least-squares cost functions associated to deterministic SNL problems often have non-optimal local minima. An intriguing question is how to design cost functions for deterministic SNL problems that can be tackled directly by descent methods.

Another computationally efficient alternative is to perform recovery using machine learning. Recent works suggest calibrating a feedforward network to output the model parameters using a training set of examples, with applications in point-source deconvolution [97] and super-resolution of line spectra [98]. Understanding under what conditions such techniques can be expected to yield accurate estimates is a challenging question for future research.

Throughout the appendix, we assume there is some compact interval $I \subset R$ containing the support Θ of the true measure μ. In problem (I.8), the variable $\tilde{μ}$ takes values in the set of finite signed Borel measures supported on I.

Fig. 12: — Illustration of the generalized support-separation condition in Definition II.5.

Acknowledgment

B.B. is supported by the MacCracken Fellowship, and the Isaac Barkey and Ernesto Yhap Fellowship. C.P. is supported by NIH award NEI-R01-EY025673. C.F. is supported by NSF award DMS-1616340.

Biographies

Brett Bernstein is a Ph.D. student in Mathematics at the Courant Institute of Mathematical Sciences supervised by Carlos Fernandez-Granda. The goal of his research is to improve the algorithms, techniques, and workflows used in statistical machine learning and the study of inverse problems.

Sheng Liu is a Ph.D. student in Data Science at the NYU Center for Data Science supervised by Carlos Fernandez-Granda. The goal of his research is to improve the methodology used in machine learning and the study of inverse problems as well as their applications in healthcare.

Chrysa Papadaniil received a Ph.D. degree in electrical engineering from the Aristotle University of Thessaloniki in 2016. She was a Postdoctoral Researcher at the Center for Neural Science in New York University from 2016 to 2019. She is currently a Research Scientist at the Center for Brain Imaging in New York University. Her research focuses on brain-imaging modalities such as fMRI, EEG, MEG, and TMS.

Carlos Fernandez-Granda received a Ph.D. degree in electrical engineering from Stanford University in 2014. In 2015 he spent a year at Google as a Postdoctoral Researcher. He is currently an Assistant Professor of Mathematics and Data Science at the Courant Institute of Mathematical Sciences and the Center for Data Science at NYU. The goal of his research is to design new methodology for inverse problems drawing from optimization, machine learning and statistics, with a focus on signal processing and healthcare, as well as to analyze these methods mathematically in order to characterize their potential and limitations.

Appendix A

Proof of Proposition III.1

The following proof is standard, but given here for completeness.

Proof: Let ν be feasible for problem (I.8) and define h = ν − μ. By taking the Lebesgue decomposition of h with respect to |μ| we can write

h = h_{Θ} + h_{Θ^{c}},

(A.1)

where h_Θ is absolutely continuous with respect to |μ|, and h_{Θ^c} and |μ| are mutually singular. In other words, the support of h_Θ is contained in Θ, and h_{Θ^c}(Θ) = 0. This allows us to write

h_{Θ} = \sum_{θ_{j} \in Θ} b_{j} δ_{θ_{j}},

(A.2)

for some $b \in R^{|Θ|}$. Set ξ ≔ sign(b), where we arbitrarily choose ξ_j = ±1 if b_j = 0. By assumption there exists a corresponding Q interpolating ξ on Θ. Since μ and ν are feasible for problem (I.8) we have ∫ φ(θ) dh(θ) = 0. This implies

0 = q^{T} \int φ (θ) d h (θ)

(A.3)

= \int Q (θ) d h (θ)

(A.4)

= {‖ h_{Θ} ‖}_{TV} + \int Q (θ) d h_{Θ^{c}} (θ) .

(A.5)

Applying the triangle inequality, we obtain

{‖ ν ‖}_{TV} = {‖ μ + h_{Θ} ‖}_{TV} + {‖ h_{Θ^{c}} ‖}_{TV} (Mutually Singular)

(A.6)

\geq {‖ μ ‖}_{TV} + {‖ h_{Θ^{c}} ‖}_{TV} - {‖ h_{Θ} ‖}_{TV} (Triangle Inequality)

(A.7)

= {‖ μ ‖}_{TV} + {‖ h_{Θ^{c}} ‖}_{TV} + \int Q (θ) d h_{Θ^{c}} (θ) (Equation (A.3))

(A.8)

\geq {‖ μ ‖}_{TV} (| Q (θ) | \leq 1),

(A.9)

where the last inequality is strict if ‖h_{Θ^c}‖_TV > 0, since |Q(θ)| < 1 for θ ∈ Θ^c. This establishes that μ is optimal for problem (I.8), and that any other optimal solution must be supported on Θ. Equation (A.3) implies that any feasible solution supported on Θ must be equal to μ (since ‖h_Θ‖_TV = 0), completing the proof of uniqueness. ■
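For completeness, the step from (A.4) to (A.5) can be spelled out as follows. The derivation only uses the decomposition (A.1), the representation (A.2), and the fact that Q interpolates ξ = sign(b) on Θ. When b_j = 0 the arbitrary choice of ξ_j is harmless, because the corresponding term vanishes.

```latex
\begin{aligned}
\int Q(\theta) \, dh(\theta)
  &= \int Q(\theta) \, dh_{\Theta}(\theta) + \int Q(\theta) \, dh_{\Theta^{c}}(\theta) \\
  &= \sum_{\theta_{j} \in \Theta} b_{j} \, Q(\theta_{j}) + \int Q(\theta) \, dh_{\Theta^{c}}(\theta) \\
  &= \sum_{\theta_{j} \in \Theta} b_{j} \, \xi_{j} + \int Q(\theta) \, dh_{\Theta^{c}}(\theta) \\
  &= \sum_{\theta_{j} \in \Theta} |b_{j}| + \int Q(\theta) \, dh_{\Theta^{c}}(\theta)
   = \|h_{\Theta}\|_{TV} + \int Q(\theta) \, dh_{\Theta^{c}}(\theta) .
\end{aligned}
```

Combined with (A.3), this gives $-\|h_{\Theta}\|_{TV} = \int Q(\theta) \, dh_{\Theta^{c}}(\theta)$, which is the substitution used to pass from (A.7) to (A.8).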

Appendix B

Proof of Lemma III.5

Proof: For any matrix $A \in R^{n \times n}$ such that ‖A‖_∞ < 1 the Neumann series $\sum_{j = 0}^{\infty} A^{j}$ converges to ${(I - A)}^{- 1}$, which implies that $I - A$ is invertible [99]. By the triangle inequality and the submultiplicativity of the ∞-norm, this gives

{‖ {(I - A)}^{- 1} ‖}_{\infty} \leq \sum_{j = 0}^{\infty} {‖ A ‖}_{\infty}^{j} = \frac{1}{1 - {‖ A ‖}_{\infty}} .

(B.1)

Setting A = −P^(1,1) and applying ‖P^(1,1)‖_∞ = ε^(1,1) < 1 proves $I + P^{(1, 1)}$ is invertible. Let $C$ be the Schur complement of $I + P^{(1, 1)}$ in (III.14) so that

C = I + P^{(0, 0)} - P^{(1, 0)} {(I + P^{(1, 1)})}^{- 1} P^{(0, 1)} .

(B.2)

By the triangle inequality and (B.1) applied with A = −P^(1,1) we obtain

{‖ I - C ‖}_{\infty} \leq ϵ^{(0, 0)} + \frac{ϵ^{(1, 0)} ϵ^{(0, 1)}}{1 - ϵ^{(1, 1)}} = c < 1,

(B.3)

proving $C$ is invertible. As both $I + P^{(1, 1)}$ and its Schur complement $C$ are invertible, the matrix in (III.14) is also invertible (see e.g. [100]), which establishes the first claim.

By applying blockwise Gaussian elimination we solve (III.14) in terms of $C$ to obtain

α = C^{- 1} ξ

(B.4)

β = - {(I + P^{(1, 1)})}^{- 1} P^{(0, 1)} α .

(B.5)

Applying (B.1) with A = −P^(1,1) and with A = I − C, which satisfies ‖I − C‖_∞ ≤ c < 1 by (B.3), and noting that ‖ξ‖_∞ = 1, we obtain the required bounds on ‖α‖_∞ and ‖β‖_∞. Finally,

(I - C) α = α - ξ,

(B.6)

which implies ‖α − ξ‖_∞ ≤ c ‖α‖_∞ and completes the proof.
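The argument can be checked numerically on a small example. The sketch below uses made-up 2 × 2 blocks and assumes that ε^(i,j) denotes ‖P^(i,j)‖_∞ and that the system (III.14) has the block form implied by (B.2), (B.4) and (B.5), namely (I + P^(0,0)) α + P^(1,0) β = ξ and P^(0,1) α + (I + P^(1,1)) β = 0. The helper names are hypothetical, and the bounds checked are those that follow from (B.1) and (B.3), not a restatement of the lemma.

```python
def mat_vec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def inf_norm(m):
    """Induced infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in m)


def add(a, b, sign=1.0):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


I2 = [[1.0, 0.0], [0.0, 1.0]]
# Made-up blocks with small entries.
P00 = [[0.10, -0.05], [0.02, 0.08]]
P10 = [[0.20, 0.10], [-0.10, 0.15]]
P01 = [[0.05, -0.20], [0.10, 0.10]]
P11 = [[-0.10, 0.05], [0.05, 0.12]]
xi = [1.0, -1.0]

inv_11 = inv2(add(I2, P11))
C = add(add(I2, P00), mat_mul(mat_mul(P10, inv_11), P01), sign=-1.0)  # (B.2)
alpha = mat_vec(inv2(C), xi)                                         # (B.4)
beta = [-v for v in mat_vec(mat_mul(inv_11, P01), alpha)]            # (B.5)

e00, e10, e01, e11 = (inf_norm(p) for p in (P00, P10, P01, P11))
c = e00 + e10 * e01 / (1.0 - e11)                                    # (B.3)
norm_alpha = max(abs(v) for v in alpha)
dist = max(abs(a - s) for a, s in zip(alpha, xi))
```

For these blocks c is below 1, so the Schur complement is invertible, and the computed α and β satisfy both block equations. The quantities `norm_alpha` and `dist` can then be compared with 1 / (1 − c) and c ‖α‖_∞ respectively.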

Appendix C

Proof of Lemma III.7

Proof: Throughout we use the fact that Θ having separation Δ > 0 (Definition II.5) implies $D_{i}^{+} \leq D_{j}^{-}$ for j > i. When i ≤ k − 1, (III.36) implies that

θ \leq \frac{σ_{i} D_{i + 1}^{-} + σ_{i + 1} D_{i}^{+}}{σ_{i} + σ_{i + 1}},

where the righthand side is the σ-weighted average of $D_{i}^{+}$ and $D_{i + 1}^{-}$ . If $θ \geq D_{i}^{+}$ this gives

\frac{D_{i + 1}^{-} - θ}{σ_{i + 1}} \geq \frac{D_{i + 1}^{-} - D_{i}^{+}}{σ_{i} + σ_{i + 1}} \geq \frac{\max (σ_{i}, σ_{i + 1}) Δ}{σ_{i} + σ_{i + 1}} \geq \frac{Δ}{2} .

If $θ < D_{i}^{+}$ then

\frac{D_{i + 1}^{-} - θ}{σ_{i + 1}} > \frac{D_{i + 1}^{-} - D_{i}^{+}}{σ_{i + 1}} \geq \frac{D_{i + 1}^{-} - D_{i}^{+}}{\max (σ_{i}, σ_{i + 1})} \geq Δ > \frac{Δ}{2},

(C.1)

by the separation conditions, proving (III.37). If i + 1 < j ≤ |Θ| then d(θ_{i+1}, θ_j) ≥ Δ(j − (i + 1)), so that

\frac{D_{j}^{-} - θ}{σ_{j}} \geq \frac{D_{j}^{-} - D_{i + 1}^{+}}{σ_{j}} \geq \frac{D_{j}^{-} - D_{i + 1}^{+}}{\max (σ_{j}, σ_{i + 1})} \geq Δ (j - (i + 1)),

(C.2)

by the separation conditions applied to θ_j and θ_{i+1}, which implies (III.38). Finally, if j < i then

\frac{θ - D_{j}^{+}}{σ_{j}} \geq \frac{D_{i}^{-} - D_{j}^{+}}{σ_{j}} \geq \frac{D_{i}^{-} - D_{j}^{+}}{\max (σ_{j}, σ_{i})} \geq Δ (i - j),

(C.3)

by the separation conditions applied to θ_j and θ_i, which establishes (III.39).

Appendix D

Proof of Lemma III.8

Proof: Multiplying through by 1 − x² (which is positive by assumption) in (III.49) we obtain

- x^{3} b + (2 a + c) x^{2} + x b - c < 0 .

(D.1)

The above inequality is implied by the simpler quadratic inequality

(2 a + c) x^{2} + x b - c < 0,

(D.2)

where the omitted term −x³b is always negative. Since the quadratic inequality is satisfied for x = 0, it holds for all x between 0 and the positive root of the quadratic, which yields the condition

0 < x < \frac{- b + \sqrt{b^{2} + 4 (2 a + c) c}}{2 (2 a + c)} .

(D.3)

Translating this to a statement on Δ we obtain

Δ > 2 \log (\frac{2 (2 a + c)}{- b + \sqrt{b^{2} + 4 (2 a + c) c}}) .

(D.4)
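The chain (D.1)–(D.4) is easy to check numerically. The sketch below assumes positive constants a, b, c and the change of variable x = e^{−Δ/2}, which is what turns (D.3) into (D.4). The specific values and function names are made up for illustration.

```python
import math


def max_x(a, b, c):
    """Positive root of (2a + c) x^2 + b x - c, the right end of (D.3)."""
    lead = 2.0 * a + c
    return (-b + math.sqrt(b * b + 4.0 * lead * c)) / (2.0 * lead)


def min_separation(a, b, c):
    """Right-hand side of (D.4), assuming x = exp(-Delta / 2)."""
    return 2.0 * math.log(1.0 / max_x(a, b, c))


def quadratic(x, a, b, c):
    """Left-hand side of (D.2)."""
    return (2.0 * a + c) * x ** 2 + b * x - c


def cubic(x, a, b, c):
    """Left-hand side of (D.1)."""
    return -b * x ** 3 + quadratic(x, a, b, c)


a, b, c = 1.0, 1.0, 1.0  # made-up positive constants
x_star = max_x(a, b, c)
delta_star = min_separation(a, b, c)
# Sample the open interval (0, x_star) and evaluate the cubic in (D.1).
samples = [cubic(x_star * k / 100.0, a, b, c) for k in range(1, 100)]
```

Every sampled value of the cubic is negative, as the proof predicts, and any Δ larger than `delta_star` corresponds to an x inside the interval in (D.3).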

Footnotes

Not to be confused with the total variation of a piecewise-constant function used in image processing.

In the case of super-resolution, the decay is not summable, but can be made summable by applying a window to the data, which is standard practice in spectral super-resolution [55].

This paper was presented in part at the SIAM 2018 Annual Meeting and the Asilomar Conference on Signals, Systems, and Computers, November 2019.

Contributor Information

Brett Bernstein, Courant Institute of Mathematical Sciences, New York University, New York, NY 10011 USA.

Sheng Liu, Center for Data Science, New York University, New York, NY 10011 USA.

Chrysa Papadaniil, Center for Brain Imaging, New York University, New York, NY 10011 USA.

Carlos Fernandez-Granda, Courant Institute of Mathematical Sciences, New York University, New York, NY 10011 USA; Center for Data Science, New York University, New York, NY 10011 USA.

References

[1].Golub GH and Pereyra V, “The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate,” SIAM Journal on numerical analysis, vol. 10, no. 2, pp. 413–432, 1973. [Google Scholar]
[2].Golub GH, “Separable nonlinear least squares: the variable projection method and its applications,” Inverse problems, vol. 19, no. 2, p. R1, 2003. [Google Scholar]
[3].Ricker N, “The form and laws of propagation of seismic wavelets,” Geophysics, vol. 18, no. 1, pp. 10–40, 1953. [Google Scholar]
[4].Betzig E, Patterson GH, Sougrat R, Lindwasser OW, Olenych S, Bonifacino JS, Davidson MW, Lippincott-Schwartz J, and Hess HF, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science, vol. 313, no. 5793, pp. 1642–1645, 2006. [DOI] [PubMed] [Google Scholar]
[5].Hess ST, Girirajan TP, and Mason MD, “Ultra-high resolution imaging by fluorescence photoactivation localization microscopy,” Biophysical journal, vol. 91, no. 11, p. 4258, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
[6].Puschmann KG and Kneer F, “On super-resolution in astronomical imaging,” Astronomy and Astrophysics, vol. 436, pp. 373–378, 2005. [Google Scholar]
[7].Sheriff RE and Geldart LP, Exploration seismology. Cambridge university press, 1995. [Google Scholar]
[8].Stoica P and Moses RL, Spectral analysis of signals, 1st ed Upper Saddle River, New Jersey: Prentice Hall, 2005. [Google Scholar]
[9].Li Y, Osher S, and Tsai R, “Heat source identification based on constrained minimization,” Inverse Problems and Imaging, vol. 8, no. 1, pp. 199–221, 2014. [Google Scholar]
[10].Niedermeyer E and da Silva FL, Electroencephalography: basic principles, clinical applications, and related fields. Lippincott Williams & Wilkins, 2005. [Google Scholar]
[11].Michel CM, Murray MM, Lantz G, Gonzalez S, Spinelli L, and de Peralta RG, “EEG source imaging,” Clinical neurophysiology, vol. 115, no. 10, pp. 2195–2222, 2004. [DOI] [PubMed] [Google Scholar]
[12].Nunez PL and Srinivasan R, Electric fields of the brain: the neurophysics of EEG. Oxford University Press, USA, 2006. [Google Scholar]
[13].Nishimura DG, Principles of magnetic resonance imaging. Stanford University, 1996. [Google Scholar]
[14].Ma D, Gulani V, Seiberlich N, Liu K, Sunshine JL, Duerk JL, and Griswold MA, “Magnetic resonance fingerprinting,” Nature, vol. 495, no. 7440, p. 187, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
[15].McGivney D, Deshmane A, Jiang Y, Ma D, Badve C, Sloan A, Gulani V, and Griswold M, “Bayesian estimation of multicomponent relaxation parameters in magnetic resonance fingerprinting,” Magnetic Resonance in Medicine, 2017. [Online]. Available: 10.1002/mrm.27017 [DOI] [PMC free article] [PubMed] [Google Scholar]
[16].Tang S, Fernandez-Granda C, Lannuzel S, Bernstein B, Lattanzi R, Cloos M, Knoll F, and Assländer J, “Multicompartment magnetic resonance fingerprinting,” Inverse problems, vol. 34, no. 9, p. 4005, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
[17].Bloch F, “Nuclear induction,” Physical review, vol. 70, no. 7-8, p. 460, 1946. [Google Scholar]
[18].Taylor HL, Banks SC, and McCoy JF, “Deconvolution with the l1 norm,” Geophysics, vol. 44, no. 1, pp. 39–52, 1979. [Google Scholar]
[19].Claerbout JF and Muir F, “Robust modeling with erratic data,” Geophysics, vol. 38, no. 5, pp. 826–844, 1973. [Google Scholar]
[20].Levy S and Fullagar PK, “Reconstruction of a sparse spike train from a portion of its spectrum and application to high-resolution deconvolution,” Geophysics, vol. 46, no. 9, pp. 1235–1243, 1981. [Google Scholar]
[21].Santosa F and Symes WW, “Linear inversion of band-limited reflection seismograms,” SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 4, pp. 1307–1330, 1986. [Google Scholar]
[22].Debeye H and Van Riel P, “Lp-norm deconvolution,” Geophysical Prospecting, vol. 38, no. 4, pp. 381–403, 1990. [Google Scholar]
[23].Zhao Y, Dinstel A, Azimi-Sadjadi MR, and Wachowski N, “Localization of near-field sources in sonar data using the sparse representation framework,” in IEEE OCEANS 2011, 2011, pp. 1–6. [Google Scholar]
[24].Bertin N, Daudet L, Emiya V, and Gribonval R, “Compressive sensing in acoustic imaging,” in Compressed Sensing and its Applications. Springer, 2015, pp. 169–192. [Google Scholar]
[25].Potter LC, Ertin E, Parker JT, and Cetin M, “Sparsity and compressed sensing in radar imaging,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1006–1020, 2010. [Google Scholar]
[26].Tang Z, Blacquière G, and Leus G, “Aliasing-free wideband beamforming using sparse signal representation,” IEEE Transactions on Signal Processing, vol. 59, no. 7, pp. 3464–3469, 2011. [Google Scholar]
[27].Silva C, Maltez J, Trindade E, Arriaga A, and Ducla-Soares E, “Evaluation of l1 and l2 minimum norm performances on EEG localizations,” Clinical neurophysiology, vol. 115, no. 7, pp. 1657–1668, 2004. [DOI] [PubMed] [Google Scholar]
[28].Xu P, Tian Y, Chen H, and Yao D, “Lp norm iterative sparse solution for EEG source localization,” IEEE transactions on biomedical engineering, vol. 54, no. 3, pp. 400–409, 2007. [DOI] [PubMed] [Google Scholar]
[29].Gunn RN, Gunn SR, Turkheimer FE, Aston JA, and Cunningham VJ, “Positron emission tomography compartmental models: a basis pursuit strategy for kinetic modeling,” Journal of Cerebral Blood Flow & Metabolism, vol. 22, no. 12, pp. 1425–1439, 2002. [DOI] [PubMed] [Google Scholar]
[30].Reader AJ, Matthews JC, Sureau FC, Comtat C, Trebossen R, and Buvat I, “Fully 4d image reconstruction by estimation of an input function and spectral coefficients,” in Nuclear Science Symposium Conference Record, 2007. NSS’07. IEEE, vol. 5 IEEE, 2007, pp. 3260–3267. [Google Scholar]
[31].Heins P, Moeller M, and Burger M, “Locally sparse reconstruction using the ℓ1,1-norm,” arXiv preprint arXiv:1405.5908, 2014. [Google Scholar]
[32].Malioutov D, Cetin M, and Willsky AS, “A sparse signal reconstruction perspective for source localization with sensor arrays,” IEEE Trans. Signal Process, vol. 53, no. 8, pp. 3010–3022, Aug. 2005. [Google Scholar]
[33].Borcea L and Kocyigit I, “Resolution analysis of imaging with ℓ1 optimization,” SIAM Journal on Imaging Sciences, vol. 8, no. 4, pp. 3015–3050, 2015. [Google Scholar]
[34].Mamonov AV and Tsai YR, “Point source identification in nonlinear advection–diffusion–reaction systems,” Inverse Problems, vol. 29, no. 3, p. 035009, 2013. [Google Scholar]
[35].Pieper K, Tang BQ, Trautmann P, and Walter D, “Inverse point source location with the Helmholtz equation on a bounded domain,” arXiv preprint arXiv:1805.03310, 2018. [Google Scholar]
[36].Rudin W, Real and Complex Analysis, ser. Mathematics series; McGraw-Hill, 1987. [Online]. Available: https://books.google.com/books?id=NmW7QgAACAAJ [Google Scholar]
[37].Folland G, Real Analysis: Modern Techniques and Their Applications, ser Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley, 2013. [Online]. Available: https://books.google.com/books?id=wI4fAwAAQBAJ [Google Scholar]
[38].Donoho DL, “Compressed sensing,” IEEE Transactions on information theory, vol. 52, no. 4, pp. 1289–1306, 2006. [Google Scholar]
[39].Candès EJ and Wakin MB, “An introduction to compressive sampling,” IEEE signal processing magazine, vol. 25, no. 2, pp. 21–30, 2008. [Google Scholar]
[40].Foucart S and Rauhut H, A mathematical introduction to compressive sensing. Birkhäuser Basel, 2013, vol. 1, no. 3. [Google Scholar]
[41].Donoho DL and Elad M, “Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization,” Proceedings of the National Academy of Sciences, vol. 100, no. 5, pp. 2197–2202, 2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
[42].Donoho DL and Huo X, “Uncertainty principles and ideal atomic decomposition,” Information Theory, IEEE Transactions on, vol. 47, no. 7, pp. 2845–2862, 2001. [Google Scholar]
[43].Gribonval R and Nielsen M, “Sparse representations in unions of bases,” IEEE transactions on Information theory, vol. 49, no. 12, pp. 3320–3325, 2003. [Google Scholar]
[44].Tropp JA, “Greed is good: algorithmic results for sparse approximation,” IEEE Trans. Inf. Thy, vol. 50, no. 10, pp. 2231–2242, 2004. [Google Scholar]
[45].Donoho DL, Elad M, and Temlyakov VN, “Stable recovery of sparse overcomplete representations in the presence of noise,” IEEE Transactions on information theory, vol. 52, no. 1, pp. 6–18, 2006. [Google Scholar]
[46].Tropp JA, “Just relax: convex programming methods for identifying sparse signals in noise,” IEEE Trans. Inf. Thy, vol. 52, no. 3, pp. 1030–1051, 2006. [Google Scholar]
[47].Needell D and Tropp JA, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009. [Google Scholar]
[48].Candes EJ and Tao T, “Decoding by linear programming,” IEEE transactions on information theory, vol. 51, no. 12, pp. 4203–4215, 2005. [Google Scholar]
[49].Bickel PJ, Ritov Y, and Tsybakov AB, “Simultaneous analysis of lasso and Dantzig selector,” The Annals of Statistics, pp. 1705–1732, 2009. [Google Scholar]
[50].Candes EJ, “The restricted isometry property and its implications for compressed sensing,” Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589–592, 2008. [Google Scholar]
[51].Candès EJ and Fernandez-Granda C, “Towards a mathematical theory of super-resolution,” Communications on Pure and Applied Mathematics, vol. 67, no. 6, pp. 906–956. [Google Scholar]
[52].Moitra A, “Super-resolution, extremal functions and the condition number of Vandermonde matrices,” in Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), 2015. [Google Scholar]
[53].Slepian D, “Prolate spheroidal wave functions, Fourier analysis, and uncertainty. V - The discrete case,” Bell System Technical Journal, vol. 57, pp. 1371–1430, 1978. [Google Scholar]
[54].Bernstein B and Fernandez-Granda C, “Deconvolution of point sources: a sampling theorem and robustness guarantees,” arXiv preprint arXiv:1707.00808, 2017. [Google Scholar]
[55].Harris F, “On the use of windows for harmonic analysis with the discrete Fourier transform,” Proceedings of the IEEE, vol. 66, no. 1, pp. 51–83, 1978. [Google Scholar]
[56].Fernandez-Granda C, “Support detection in super-resolution,” in Proceedings of the 10th International Conference on Sampling Theory and Applications, 2013, pp. 145–148. [Google Scholar]
[57].Candès EJ and Fernandez-Granda C, “Super-resolution from noisy data,” Journal of Fourier Analysis and Applications, vol. 19, no. 6, pp. 1229–1254, 2013. [Google Scholar]
[58].Donoho DL and Elad M, “Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization,” Proceedings of the National Academy of Sciences, vol. 100, no. 5, pp. 2197–2202, 2003. [Online]. Available: http://www.pnas.org/content/100/5/2197 [DOI] [PMC free article] [PubMed] [Google Scholar]
[59].Dossal C, Peyré G, and Fadili J, “A numerical exploration of compressed sampling recovery,” Linear Algebra and its Applications, vol. 432, no. 7, pp. 1663–1679, 2010. [Google Scholar]
[60].Chandrasekaran V, Recht B, Parrilo PA, and Willsky AS, “The convex geometry of linear inverse problems,” Foundations of Computational Mathematics, vol. 12, no. 6, pp. 805–849, 2012. [Google Scholar]
[61].Fernandez-Granda C, “Super-resolution of point sources via convex programming,” Information and Inference, vol. 5, no. 3, pp. 251–303, 2016. [Google Scholar]
[62].Tang G, Shah P, Bhaskar BN, and Recht B, “Robust line spectral estimation,” in Signals, Systems and Computers, 2014 48th Asilomar Conference on IEEE, 2014, pp. 301–305. [Google Scholar]
[63].Duval V and Peyré G, “Exact support recovery for sparse spikes deconvolution,” Foundations of Computational Mathematics, pp. 1–41, 2015. [Google Scholar]
[64].Tang G, Bhaskar B, Shah P, and Recht B, “Compressed sensing off the grid,” Information Theory, IEEE Transactions on, vol. 59, no. 11, pp. 7465–7490, November 2013. [Google Scholar]
[65].Fernandez-Granda C, Tang G, Wang X, and Zheng L, “Demixing sines and spikes: Robust spectral super-resolution in the presence of outliers,” Information and Inference: A Journal of the IMA, vol. 7, no. 1, pp. 105–168, 2017. [Google Scholar]
[66].Schiebinger G, Robeva E, and Recht B, “Superresolution without separation,” Information and Inference, 2017. [Google Scholar]
[67].Eftekhari A, Tanner J, Thompson A, Toader B, and Tyagi H, “Sparse non-negative super-resolution-simplified and stabilised,” arXiv preprint arXiv:1804.01490, 2018. [Google Scholar]
[68].Poon C, Keriven N, and Peyré G, “A dual certificates analysis of compressive off-the-grid recovery,” arXiv preprint arXiv:1802.08464, 2018. [Google Scholar]
[69].Bendory T, Dekel S, and Feuer A, “Robust recovery of stream of pulses using convex optimization,” Journal of Mathematical Analysis and Applications, vol. 442, no. 2, pp. 511–536, 2016. [Google Scholar]
[70].Tang G and Recht B, “Atomic decomposition of mixtures of translation-invariant signals,” IEEE CAMSAP, 2013. [Google Scholar]
[71].Hettich R and Kortanek KO, “Semi-infinite programming: theory, methods, and applications,” SIAM review, vol. 35, no. 3, pp. 380–429, 1993. [Google Scholar]
[72].López M and Still G, “Semi-infinite programming,” European Journal of Operational Research, vol. 180, no. 2, pp. 491–518, 2007. [Google Scholar]
[73].Denoyelle Q, Duval V, Peyré G, and Soubies E, “The sliding Frank-Wolfe algorithm and its application to super-resolution microscopy,” Inverse Problems, 2019. [Google Scholar]
[74].Eftekhari A and Thompson A, “Sparse inverse problems over measures: Equivalence of the conditional gradient and exchange methods,” SIAM Journal on Optimization, vol. 29, no. 2, pp. 1329–1349, 2019. [Google Scholar]
[75].Boyd N, Schiebinger G, and Recht B, “The alternating descent conditional gradient method for sparse inverse problems,” SIAM Journal on Optimization, vol. 27, no. 2, pp. 616–639, 2017. [Google Scholar]
[76].Frank M and Wolfe P, “An algorithm for quadratic programming,” Naval research logistics quarterly, vol. 3, no. 1-2, pp. 95–110, 1956. [Google Scholar]
[77].de Prony BGR, “Essai éxperimental et analytique: sur les lois d e la dilatabilité de fluides élastique et sur celles de la force expansive de la vapeur de l’alkool, à différentes températures,” Journal de l’école Polytechnique, vol. 1, no. 22, pp. 24–76, 1795. [Google Scholar]
[78].Vetterli M, Marziliano P, and Blu T, “Sampling signals with finite rate of innovation,” IEEE transactions on Signal Processing, vol. 50, no. 6, pp. 1417–1428, 2002. [Google Scholar]
[79].Dragotti PL, Vetterli M, and Blu T, “Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets strang–fix,” IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 1741–1757, 2007. [Google Scholar]
[80].Urigüen JA, Blu T, and Dragotti PL, “Fri sampling with arbitrary kernels,” IEEE Transactions on Signal Processing, vol. 61, no. 21, pp. 5310–5323, 2013. [Google Scholar]
[81].Pan H, Blu T, and Vetterli M, “Towards generalized fri sampling with an application to source resolution in radioastronomy,” IEEE Transactions on Signal Processing, vol. 65, no. 4, pp. 821–835. [Google Scholar]
[82].Murray-Bruce J and Dragotti PL, “A universal sampling framework for solving physics-driven inverse source problems,” arXiv preprint arXiv:1702.05019, 2017. [Google Scholar]
[83].Candès EJ and Tao T, “Decoding by linear programming,” Information Theory, IEEE Transactions on, vol. 51, no. 12, pp. 4203–4215, 2005. [Google Scholar]
[84].Candes EJ and Plan Y, “A probabilistic and RIPless theory of compressed sensing,” Information Theory, IEEE Transactions on, vol. 57, no. 11, pp. 7235–7254, 2011. [Google Scholar]
[85].Candes E and Recht B, “Exact matrix completion via convex optimization,” Communications of the ACM, vol. 55, no. 6, pp. 111–119, 2012. [Google Scholar]
[86].Candes EJ, Eldar YC, Strohmer T, and Voroninski V, “Phase retrieval via matrix completion,” SIAM review, vol. 57, no. 2, pp. 225–251, 2015. [Google Scholar]
[87].Vershynin R, “Introduction to the non-asymptotic analysis of random matrices,” arXiv preprint arXiv:1011.3027, 2010. [Google Scholar]
[88].Grant M and Boyd S, “CVX: Matlab software for disciplined convex programming, version 1.21,” http://cvxr.com/cvx, Apr. 2011. [Google Scholar]
[89].Baillet S, Mosher JC, and Leahy RM, “Electromagnetic brain mapping,” IEEE Signal processing magazine, vol. 18, no. 6, pp. 14–30, 2001. [Google Scholar]
[90].Tadel F, Baillet S, Mosher JC, Pantazis D, and Leahy RM, “Brainstorm: a user-friendly application for meg/eeg analysis,” Computational intelligence and neuroscience, vol. 2011, p. 8, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
[91].Fonov V, Evans AC, Botteron K, Almli CR, McKinstry RC, Collins DL, Group BDC et al. , “Unbiased average age-appropriate atlases for pediatric studies,” Neuroimage, vol. 54, no. 1, pp. 313–327, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
[92].Gramfort A, Papadopoulo T, Olivi E, and Clerc M, “Openmeeg: opensource software for quasistatic bioelectromagnetics,” Biomedical engineering online, vol. 9, no. 1, p. 45, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
[93].Li Q and Tang G, “Approximate support recovery of atomic line spectral estimation: A tale of resolution and precision,” Applied and Computational Harmonic Analysis, 2018. [Google Scholar]
[94].Candes EJ, Wakin MB, and Boyd SP, “Enhancing sparsity by reweighted ℓ1 minimization,” Journal of Fourier analysis and applications, vol. 14, no. 5-6, pp. 877–905, 2008. [Google Scholar]
[95].Candes EJ, Li X, and Soltanolkotabi M, “Phase retrieval via wirtinger flow: Theory and algorithms,” IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985–2007, 2015. [Google Scholar]
[96].Ma C, Wang K, Chi Y, and Chen Y, “Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution,” arXiv preprint arXiv:1711.10467, 2017. [Google Scholar]
[97].Boyd N, Jonas E, Babcock HP, and Recht B, “Deeploco: Fast 3d localization microscopy using neural networks,” BioRxiv, p. 267096, 2018. [Google Scholar]
[98].Izacard G, Bernstein B, and Fernandez-Granda C, “A learning-based framework for line-spectra super-resolution,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2018, 2018. [Google Scholar]
[99].Reed M and Simon B, I: Functional Analysis, ser. Methods of Modern Mathematical Physics Elsevier Science, 1981. [Online]. Available: https://books.google.com/books?id=rpFTTjxOYpsC [Google Scholar]
[100].Zhang F, Axler S, and Gehring F, Matrix Theory: Basic Results and Techniques, ser. Universitext (Berlin. Print) Springer, 1999. [Online]. Available: https://books.google.com/books?id=z2hOMmPISNoC [Google Scholar]

