Proc. Natl. Acad. Sci. U.S.A. 2020 May 11; 117(21): 11226–11232. doi: 10.1073/pnas.1913995117

The Noise Collector for sparse recovery in high dimensions

Miguel Moscoso a,1, Alexei Novikov b, George Papanicolaou c,1, Chrysoula Tsogka d
PMCID: PMC7260980  PMID: 32393628

Significance

The ability to detect sparse signals from noisy, high-dimensional data is a top priority in modern science and engineering. For optimal results, current approaches need to tune parameters that depend on the level of noise, which is often difficult to estimate. We develop a parameter-free, computationally efficient, $\ell_1$-norm minimization approach that has a zero false discovery rate (no false positives) with high probability for any level of noise, while it detects the exact location of sparse signals when the noise is not too large.

Keywords: high-dimensional probability, convex geometry, sparsity-promoting algorithms, noisy data

Abstract

The ability to detect sparse signals from noisy, high-dimensional data is a top priority in modern science and engineering. It is well known that a sparse solution of the linear system $A\rho=b_0$ can be found efficiently with an $\ell_1$-norm minimization approach if the data are noiseless. However, detection of the signal from data corrupted by noise is still a challenging problem as the solution depends, in general, on a regularization parameter with optimal value that is not easy to choose. We propose an efficient approach that does not require any parameter estimation. We introduce a no-phantom weight $\tau$ and the Noise Collector matrix $C$ and solve an augmented system $A\rho+C\eta=b_0+e$, where $e$ is the noise. We show that the $\ell_1$-norm minimal solution of this system has zero false discovery rate for any level of noise, with probability that tends to one as the dimension of $b_0$ increases to infinity. We obtain exact support recovery if the noise is not too large and develop a fast Noise Collector algorithm, which makes the computational cost of solving the augmented system comparable with that of the original one. We demonstrate the effectiveness of the method in applications to passive array imaging.


We want to find sparse solutions $\rho\in\mathbb{R}^K$ for

$$A\rho = b \qquad [1]$$

from highly incomplete measurement data $b=b_0+e\in\mathbb{R}^N$ corrupted by noise $e$, where $1\ll N<K$. In the noiseless case, $\rho$ can be found exactly by solving the optimization problem (1)

$$\rho^* = \arg\min_{\rho}\ \|\rho\|_{\ell_1}, \quad \text{subject to}\ A\rho = b, \qquad [2]$$

provided that the measurement matrix $A\in\mathbb{R}^{N\times K}$ satisfies additional conditions (e.g., decoherence or restricted isometry properties) (2, 3) and that the solution vector $\rho$ has a small number $M$ of nonzero components or degrees of freedom. When measurements are noisy, exact recovery is no longer possible. However, the exact support of $\rho$ can still be determined if the noise is not too strong. The most commonly used approach is to solve the $\ell_2$-relaxed form of Eq. 2:

$$\rho_{\lambda} = \arg\min_{\rho}\ \lambda\|\rho\|_{\ell_1} + \|A\rho - b\|_{\ell_2}^2, \qquad [3]$$

which is known as Lasso in the statistics literature (4). There are sufficient conditions for the support of $\rho_\lambda$ to be contained within the true support [e.g., the works of Fuchs (5), Tropp (6), Wainwright (7), and Maleki et al. (8)]. These conditions depend on the signal-to-noise ratio (SNR), which is not known and must be estimated, and on the regularization parameter $\lambda$, which must be carefully chosen and/or adaptively changed (9). Although such an adaptive procedure improves the outcome, the resulting solutions tend to include a large number of “false positives” in practice (10). Belloni et al. (11) proposed to solve the square root Lasso minimization problem instead of Eq. 3, which makes the regularization parameter $\lambda$ independent of the SNR. Our contribution is a computationally efficient method for exact support recovery with no false positives in noisy settings. It also does not require an estimate of the SNR.
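
To make the comparison concrete, here is a minimal NumPy sketch of the Lasso formulation in Eq. 3, solved with plain iterative soft thresholding. The matrix sizes, noise level, step size, and $\lambda$ below are illustrative choices, not values taken from the paper.

```python
import numpy as np

def soft_threshold(x, r):
    # Componentwise soft thresholding S_r(x) = sign(x) * max(0, |x| - r).
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

def lasso_ista(A, b, lam, n_iter=2000):
    # Minimize lam * ||rho||_1 + ||A rho - b||_2^2 with a fixed step size (plain ISTA).
    step = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)   # safe step for the gradient 2 A^T (A rho - b)
    rho = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ rho - b)
        rho = soft_threshold(rho - step * grad, step * lam)
    return rho

# Illustrative sizes: N = 50 measurements, K = 200 unknowns, M = 3 nonzeros.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
A /= np.linalg.norm(A, axis=0)                     # unit-length columns, as assumed in the paper
rho_true = np.zeros(200); rho_true[[10, 70, 150]] = 1.0
b = A @ rho_true + 0.05 * rng.standard_normal(50)  # noisy data
print(np.flatnonzero(np.abs(lasso_ista(A, b, lam=0.1)) > 1e-3))   # estimated support
```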

Main Results

Suppose that $\rho$ is an $M$-sparse solution of system [1] with no noise, where the columns of $A$ have unit length. Our main result ensures that we can still recover the support of $\rho$ when the data are noisy by looking at the support of $\rho_\tau$ found as

$$(\rho_\tau,\eta_\tau) = \arg\min_{\rho,\eta}\ \tau\|\rho\|_{\ell_1} + \|\eta\|_{\ell_1}, \quad \text{subject to}\ A\rho + C\eta = b_0 + e, \qquad [4]$$

with an $O(1)$ weight $\tau$ and an appropriately chosen Noise Collector matrix $C\in\mathbb{R}^{N\times\Sigma}$, $\Sigma\gg K$. The minimization problem [4] can be understood as a relaxation of [2], as it works by absorbing all of the noise and, possibly, some signal in $C\eta_\tau$.
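
Problem [4] is an ordinary weighted $\ell_1$ minimization over the augmented dictionary $[A\,|\,C]$, so for small examples it can be solved directly as a linear program with an off-the-shelf solver (the fast iterative algorithm the paper actually uses is described later). The sketch below makes the usual positive/negative-part split; the function name, dimensions, and the random construction of $C$ are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def noise_collector_l1(A, C, b, tau):
    # Solve min tau*||rho||_1 + ||eta||_1  s.t.  A rho + C eta = b  as a linear program,
    # splitting each variable into nonnegative parts: rho = p - q, eta = u - v.
    N, K = A.shape
    S = C.shape[1]
    cost = np.concatenate([tau * np.ones(2 * K), np.ones(2 * S)])
    A_eq = np.hstack([A, -A, C, -C])
    res = linprog(cost, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    x = res.x
    rho = x[:K] - x[K:2 * K]
    eta = x[2 * K:2 * K + S] - x[2 * K + S:]
    return rho, eta

# Small illustrative example (dimensions chosen for speed, not from the paper).
rng = np.random.default_rng(1)
N, K, Sigma, tau = 40, 120, 200, 2.0
A = rng.standard_normal((N, K)); A /= np.linalg.norm(A, axis=0)
C = rng.standard_normal((N, Sigma)); C /= np.linalg.norm(C, axis=0)   # random unit columns
rho_true = np.zeros(K); rho_true[[5, 50]] = 1.0
e = rng.standard_normal(N); e *= 0.5 * np.linalg.norm(A @ rho_true) / np.linalg.norm(e)
rho, eta = noise_collector_l1(A, C, A @ rho_true + e, tau)
print(np.flatnonzero(np.abs(rho) > 1e-6))   # estimated support
```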

The following theorem shows that, if the signal is pure noise and the columns of $C$ are chosen independently and at random on the unit sphere $S^{N-1}=\{x\in\mathbb{R}^N : \|x\|_{\ell_2}=1\}$, then $C\eta_\tau=e$ for any level of noise, with large probability.

Theorem 1 (No-Phantom Signal). Suppose that $b_0=0$ and that $e/\|e\|_{\ell_2}$ is uniformly distributed on $S^{N-1}$. Fix $\beta>1$, and draw $\Sigma=N^{\beta}$ columns for $C$ independently from the uniform distribution on $S^{N-1}$. For any $\kappa>0$, there are constants $\tau=\tau(\kappa,\beta)$ and $N_0=N_0(\kappa,\beta)$ such that, for all $N>N_0$, $\rho_\tau$, the solution of Eq. 4, is $0$ with probability $1-1/N^{\kappa}$.

This theorem guarantees with large probability a zero false discovery rate in the absence of signals with meaningful information. The key to a zero false discovery rate is the choice of a no-phantom weight τ. Next, we generalize this result for the case in which the recorded signals carry useful information.

Theorem 2 (Zero False Discoveries). Let $\rho$ be an $M$-sparse solution of the noiseless system $A\rho=b_0$. Assume that $\kappa$, $\beta$, the Noise Collector, and the noise are the same as in Theorem 1. In addition, assume that the columns of $A$ are incoherent in the sense that $|\langle a_i,a_j\rangle|\le\frac{1}{3M}$ for $i\ne j$. Then, there are constants $\tau=\tau(\kappa,\beta)$ and $N_0=N_0(\kappa,\beta)$ such that $\mathrm{supp}(\rho_\tau)\subseteq\mathrm{supp}(\rho)$ for all $N>N_0$ with probability $1-1/N^{\kappa}$.

This theorem holds for any level of noise and the same value of τ as in Theorem 1. The incoherence conditions in Theorem 2 are needed to guarantee that the true signal does not create false positives elsewhere. Theorem 2 guarantees that the support of ρτ is inside the support of ρ. The next theorem shows that, if the noise is not too large, then ρτ and ρ have exactly the same support.

Theorem 3 (Exact Support Recovery). Keep the same assumptions as in Theorem 2. Let $\gamma=\min_{i\in\mathrm{supp}(\rho)}|\rho_i|/\|\rho\|_{\ell_\infty}$. There are constants $\tau=\tau(\kappa,\beta)$, $c_1=c_1(\kappa,\beta,\gamma)$, and $N_0=N_0(\kappa,\beta)$ such that, if the noise level satisfies $\|e\|_{\ell_2}\le c_1\|b_0\|_{\ell_2}^2\|\rho\|_{\ell_1}^{-1}\sqrt{N/\ln N}$, then for all $N>N_0$, $\mathrm{supp}(\rho_\tau)=\mathrm{supp}(\rho)$ with probability $1-1/N^{\kappa}$.

To interpret the last theorem, consider a model case where $A$ is the identity matrix and all coefficients of $b_0=\rho$ are either one or zero. Then, $\|b_0\|_{\ell_2}^2=\|\rho\|_{\ell_1}=M$. In this case, an acceptable relative level of noise is

$$\|e\|_{\ell_2}/\|b_0\|_{\ell_2}\lesssim\sqrt{N/(M\ln N)}. \qquad [5]$$

This means that $\|e\|_{\ell_2}\lesssim\sqrt{N/\ln N}$, and it implies that each coefficient of $b_0$ may be corrupted by $O(1/\sqrt{\ln N})$ on average and that some coefficients of $b_0$ may be corrupted by $O(1)$.
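
The arithmetic behind Eq. 5 in this model case is short; spelling it out,
$$\|e\|_{\ell_2}\;\le\;c_1\,\frac{\|b_0\|_{\ell_2}^2}{\|\rho\|_{\ell_1}}\sqrt{\frac{N}{\ln N}}\;=\;c_1\,\frac{M}{M}\sqrt{\frac{N}{\ln N}}\;=\;c_1\sqrt{\frac{N}{\ln N}},\qquad\text{so}\qquad\frac{\|e\|_{\ell_2}}{\|b_0\|_{\ell_2}}\;=\;\frac{\|e\|_{\ell_2}}{\sqrt{M}}\;\lesssim\;\sqrt{\frac{N}{M\ln N}}.$$
Spread over the $N$ coordinates, $\|e\|_{\ell_2}\lesssim\sqrt{N/\ln N}$ allows an average per-entry corruption of order $1/\sqrt{\ln N}$, while a few entries may be corrupted by $O(1)$.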

Motivation

We are interested in imaging sparse scenes accurately using limited and noisy data. Such imaging problems arise in many areas, such as medical imaging (12), structural biology (13), radar (14), and geophysics (15). In imaging, the $\ell_1$-norm minimization method in Eq. 2 is often used (16–21) as it has the desirable property of superresolution: that is, the enhancement of the fine-scale details of the images. This has been analyzed in different settings by Donoho (22), Candès and Fernandez-Granda (23), Fannjiang and Liao (24), and Borcea and Kocyigit (25) among others. We want to retain this property in our method when the data are corrupted by additive noise.

However, noise fundamentally limits the quality of the images formed with almost all computational imaging techniques. Specifically, $\ell_1$-norm minimization produces images that are unstable for low SNR due to the ill conditioning of superresolution reconstruction schemes. The instability emerges as clutter noise in the image, or grass, that degrades the resolution. Our initial motivation to introduce the Noise Collector matrix $C$ was to regularize the matrix $A$ and thus, to suppress the clutter in the images. We proposed in ref. 26 to seek the minimal $\ell_1$-norm solution of the augmented linear system $A\rho+C\eta=b$. The idea was to choose the columns of $C$ almost orthogonal to those of $A$. Indeed, the condition number of $[A\,|\,C]$ becomes $O(1)$ when $O(N)$ columns of $C$ are taken at random. This essentially follows from the bounds on the largest and the smallest nonzero singular values of random matrices (theorem 4.6.1 in ref. 27).

The idea to create a dictionary for noise is not new. For example, the work by Laska et al. (28) considers a specific version of the measurement noise model so that b=Aρ+Ce, where C is a matrix with fewer (orthonormal) columns than rows and the noise vector e is sparse. C represents the basis in which the noise is sparse and it is assumed to be known. Then, they show that it is possible to recover sparse signals and sparse noise exactly. We stress that we do not assume here that the noise is sparse. In our work, the noise is large (SNR can be small) and is evenly distributed across the data, and therefore, it cannot be sparsely accommodated.

To suppress the clutter, our theory in ref. 26 required exponentially many columns, and therefore, $\Sigma\sim e^{N}$. This seemed to make the Noise Collector impractical, but the numerical experiments suggested that $O(N)$ columns were enough to obtain excellent results. We address this issue here and explain why the Noise Collector matrix $C$ only needs algebraically many columns. Moreover, to absorb the noise completely and thus improve the algorithm in ref. 26, we now introduce the no-phantom weight $\tau$ in Eq. 4. Indeed, by weighting the columns of the Noise Collector matrix $C$ with respect to those in the model matrix $A$, the algorithm now produces images with no clutter at all regardless of how much noise is added to the data.

Finally, we want the Noise Collector to be efficient, with almost no extra computational cost with respect to the Lasso problem in Eq. 3. To this end, the Noise Collector is constructed using circulant matrices that allow for efficient matrix vector multiplications using fast Fourier transforms (FFTs).

We now explain how the Noise Collector works and reduce our theorems to basic estimates in high-dimensional probability.

The Noise Collector

The method has two main ingredients: the Noise Collector matrix $C$ and the no-phantom weight $\tau$. The construction of the Noise Collector matrix $C$ is guided by the following three key properties. First, its columns should be sufficiently orthogonal to the columns of $A$, so that it does not absorb signals with “meaningful” information. Second, the columns of $C$ should be uniformly distributed on the unit sphere $S^{N-1}$, so that a typical noise vector can be approximated well. Third, the number of columns in $C$ should grow more slowly than exponentially with $N$; otherwise, the method is impractical.

One way to guarantee all three properties is to impose

$$|\langle a_i,c_j\rangle| < \frac{\alpha}{\sqrt{N}}\ \ \forall\, i,j \qquad\text{and}\qquad |\langle c_i,c_j\rangle| < \frac{\alpha}{\sqrt{N}}\ \ \forall\, i\ne j \qquad [6]$$

with $\alpha>1$ and fill out $C$, drawing the $c_i$ at random with rejections, until the rejection rate becomes too high. Then, by construction, the columns of $C$ are almost orthogonal to the columns of $A$, and when the rejection rate becomes too high, this implies that we cannot pack more $N$-dimensional unit vectors into $C$; thus, we can approximate well a typical noise vector. Finally, the Kabatjanskii–Levenstein inequality (discussed in ref. 29) implies that the number $\Sigma$ of columns in $C$ grows at most polynomially: $\Sigma\lesssim N^{\alpha^2}$. The first estimate in Eq. 6 implies that any solution of $C\eta=a_i$ satisfies $\|\eta\|_{\ell_1}\gtrsim\sqrt{N}$ for any $i\le K$. This estimate measures how expensive it is to approximate columns of $A$ (i.e., the meaningful signal) with the Noise Collector. In turn, the no-phantom weight $\tau$ should be chosen so that it is expensive to approximate noise using columns of $A$. It cannot be taken too large, however, because we may lose the signal. In fact, one can prove that, if $\tau\ge\sqrt{N}/\alpha$, then $\rho_\tau\equiv 0$ for any $\rho$ and any level of noise. Intuitively, $\tau$ characterizes the rate at which the signal is lost as the noise increases. The most important property of the no-phantom weight $\tau$ is that it does not depend on the level of noise, and therefore, it is chosen before we start using the Noise Collector.
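
A minimal NumPy sketch of the rejection construction in Eq. 6 follows. The value of α, the stopping rule, and the cap on the number of accepted columns are illustrative choices; the paper only prescribes the decoherence bound α/√N and stopping when the rejection rate becomes too high.

```python
import numpy as np

def build_noise_collector(A, alpha=2.0, max_rejections=200, max_cols=500, rng=None):
    # Draw candidate unit vectors and keep those whose inner products with the columns of A
    # and with the previously accepted columns stay below alpha / sqrt(N), as in Eq. 6.
    # Stop after too many consecutive rejections or when max_cols columns have been accepted.
    rng = rng or np.random.default_rng()
    N = A.shape[0]
    bound = alpha / np.sqrt(N)
    cols, rejections = [], 0
    while rejections < max_rejections and len(cols) < max_cols:
        c = rng.standard_normal(N)
        c /= np.linalg.norm(c)
        ok_A = np.max(np.abs(A.T @ c)) < bound
        ok_C = (not cols) or np.max(np.abs(np.array(cols) @ c)) < bound
        if ok_A and ok_C:
            cols.append(c)
            rejections = 0
        else:
            rejections += 1
    return np.column_stack(cols)

# Example: N = 100 measurements, K = 50 model columns.
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 50)); A /= np.linalg.norm(A, axis=0)
C = build_noise_collector(A, rng=rng)
print(C.shape)   # number of accepted columns; it grows at most polynomially in N
```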

It is, however, more convenient for the proofs to use a probabilistic version of Eq. 6. Suppose that the columns of $C$ are drawn independently at random. Then, the dot product of any two random unit vectors is still typically of order $1/\sqrt{N}$ (27). If the number of columns grows polynomially, we only have to sacrifice an asymptotically negligible event where our Noise Collector does not satisfy the three key properties, and the decoherence constraints in Eq. 6 are weakened by a logarithmic factor only. This follows from basic estimates in high-dimensional probability. We will state them in the next lemma after we interpret problem [4] geometrically.

Consider the convex hulls

$$H_1=\left\{x\in\mathbb{R}^N \,:\, x=\sum_{i=1}^{\Sigma}\xi_i c_i,\ \sum_{i=1}^{\Sigma}|\xi_i|\le 1\right\}, \qquad [7]$$
$$H_2=\left\{x\in\mathbb{R}^N \,:\, x=\sum_{i=1}^{K}\xi_i a_i,\ \sum_{i=1}^{K}|\xi_i|\le 1\right\}, \qquad [8]$$

and $H(\tau)=\{\xi h_1 + (1-\xi)h_2/\tau,\ 0\le\xi\le 1,\ h_i\in H_i\}$. Theorem 1 states that, for a typical noise vector $e$, we can find $\lambda_0>0$ such that $e\in\lambda_0 H_1$ and $e\notin\lambda H(\tau)$ for any $\lambda<\lambda_0$.

Lemma 1 (Typical Width of Convex Hulls $H_i$). Suppose that $\Sigma=N^{\beta}$, $\beta>1$; vectors $c_i\in S^{N-1}$, $i=1,2,\ldots,\Sigma$, are drawn at random and independently; and $e\in S^{N-1}$. Then, for any $\kappa>0$, there are constants $c_0=c_0(\kappa,\beta)$, $\alpha=\sqrt{(\beta-1)/2}$, and $N_0=N_0(\kappa,\beta)$ such that, for all $N\ge N_0$,

$$\max\left\{\max_{i\le K}|\langle a_i,e\rangle|,\ \max_{i\le\Sigma}|\langle c_i,e\rangle|\right\} < c_0\sqrt{\ln N/N} \qquad [9]$$

and

$$\alpha\sqrt{\ln N}\;e/\sqrt{N}\ \in\ H_1 \qquad [10]$$

with probability $1-1/N^{\kappa}$.

We sketch the proof of estimates [9] and [10] in Proofs. Estimate [9] can also be derived from Milman’s version (30) of Dvoretzky’s theorem. Informally, inequality [9] states that $H_1$ and $H_2$ are contained in the $\ell_2$ ball of radius $c_0\sqrt{\ln N/N}$ except for a few spikes in statistically insignificant directions (Fig. 1, Left). Inequality [10] states that $H_1$ contains an $\ell_2$ ball of radius $\alpha\sqrt{\ln N/N}$ except for a few statistically insignificant directions.
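
A quick Monte Carlo illustration (not part of the paper’s argument) of the scaling in inequality [9]: the largest inner product between a fixed unit vector and polynomially many random unit vectors stays within a bounded multiple of $\sqrt{\ln N/N}$. The sampling trick below uses the fact that each inner product has the law $g/\sqrt{g^2+\chi^2_{N-1}}$ with $g$ standard normal, so the vectors never need to be stored.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 1.5
for N in (100, 1000, 10000):
    Sigma = int(N ** beta)
    g = rng.standard_normal(Sigma)
    dots = g / np.sqrt(g**2 + rng.chisquare(N - 1, Sigma))   # Sigma inner products <c_i, e>
    ratio = np.max(np.abs(dots)) / np.sqrt(np.log(N) / N)
    print(f"N={N:6d}  max_i |<c_i,e>| / sqrt(ln N / N) = {ratio:.2f}")
```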

Fig. 1.

(Left) The convex hull $H_1$ is an $\ell_2$ ball of radius $O(\sqrt{\ln N/N})$ with few spikes. (Right) The intersection of $H(\tau)$ with the span of $(a_1,e)$ is a rounded rhombus.

These inequalities immediately imply Theorem 1. We just need to explain how to choose the no-phantom weight $\tau$. There will be no phantoms if $H_2/\tau$ is strictly inside the $\ell_2$ ball of radius $\alpha\sqrt{\ln N/N}$ (up to the statistically insignificant spikes). This can be done if $\tau>c_0/\alpha$.

If the columns of $A$ are orthogonal to each other, then Theorem 2 follows from Theorem 1. We just need to project the linear system in Eq. 4 onto the span of the $a_i$, $i\notin\mathrm{supp}(\rho)$, and apply Theorem 1 to the projections. If, in addition, we assume that $b_0=a_1\rho_1$, then the proof of Theorem 3 is illustrated in Fig. 1, Right. In detail, a typical intersection of $V=\mathrm{span}(a_1,e)$ and $H(\tau)$ is a rounded rhombus, because it is the convex hull of $\pm a_1/\tau$ and the $\ell_2$ ball of radius $c_0\sqrt{\ln N/N}$. If $\lambda_0$ is the smallest dilation such that $a_1\rho_1+e\in\lambda_0 H(\tau)$, then there are two options: 1) $a_1\rho_1+e$ lies on the curved boundary of the rounded rhombus, and then $\mathrm{supp}(\rho_\tau)=\emptyset$; or 2) $a_1\rho_1+e$ lies on the flat boundary of the rounded rhombus, and then $\mathrm{supp}(\rho_\tau)=\mathrm{supp}(\rho)$. The second option happens if the vector $a_1\rho_1+e$ intersects the flat boundary of $H(\tau)$. This gives the support recovery estimate in Theorem 3.

In the general case, the columns of the combined matrix [A|C] are incoherent. This property allows us to prove Theorems 2 and 3 in Proofs using known techniques (26). In particular, we automatically have exact recovery using ref. 2 applied to [A|C] if the data are noiseless.

Lemma 2 (Exact Recovery). Suppose that $\rho$ is an $M$-sparse solution of $A\rho=b$ and that there is no noise, so that $e=0$. In addition, assume that the columns of $A$ are incoherent: $|\langle a_i,a_j\rangle|\le\frac{1}{3M}$ for $i\ne j$. Then, the solution to Eq. 4 satisfies $\rho_\tau=\rho$ for all

$$M<\frac{2}{3}\,\frac{\sqrt{N}}{c_0\,\tau\,\sqrt{\ln N}}\qquad\text{with probability}\quad 1-\frac{1}{N^{\kappa}}. \qquad [11]$$

Fast Noise Collector Algorithm

To find the minimizer in Eq. 4, we consider a variational approach. We define the function

$$F(\rho,\eta,z)=\lambda\left(\tau\|\rho\|_{\ell_1}+\|\eta\|_{\ell_1}\right)+\tfrac{1}{2}\|A\rho+C\eta-b\|_{\ell_2}^2+\langle z,\,b-A\rho-C\eta\rangle \qquad [12]$$

for a no-phantom weight τ and determine the solution as

$$\max_{z}\ \min_{\rho,\eta}\ F(\rho,\eta,z). \qquad [13]$$

The key observation is that this variational principle finds the minimum in Eq. 4 exactly for all values of the regularization parameter λ. Hence, the method has no tuning parameters. To determine the exact extremum in Eq. 13, we use the iterative soft thresholding algorithm GeLMA (generalized Lagrangian multiplier algorithm) (31) that works as follows.

First, pick values for $\beta$ and $\tau$. For optimal results, one can calibrate $\tau$ to be the smallest constant such that Theorem 1 holds: that is, such that no phantom signals appear when the algorithm is fed with pure noise. In our numerical experiments, we use $\beta=1.5$ and $\tau=2$.

Second, pick a value for the regularization parameter $\lambda$ (e.g., $\lambda=1$). Choose step sizes $\Delta t_1<2/\|[A\,|\,C]\|^2$ and $\Delta t_2<\lambda/\|A\|$.* Set $\rho_0=0$, $\eta_0=0$, and $z_0=0$, and iterate for $k\ge 0$:

$$\begin{aligned} r &= b-A\rho_k-C\eta_k,\\ \rho_{k+1} &= S_{\tau\lambda\Delta t_1}\!\left(\rho_k+\Delta t_1\,A^{*}(z_k+r)\right),\\ \eta_{k+1} &= S_{\lambda\Delta t_1}\!\left(\eta_k+\Delta t_1\,C^{*}(z_k+r)\right),\\ z_{k+1} &= z_k+\Delta t_2\,r, \end{aligned} \qquad [14]$$

where $S_r(y_i)=\mathrm{sign}(y_i)\max\{0,|y_i|-r\}$.
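
The iteration [14] is a direct recipe; the following NumPy transcription uses an explicit Noise Collector matrix for clarity (the circulant/FFT implementation described next avoids storing $C$), and the step sizes follow the bounds quoted above. The matrix sizes are illustrative.

```python
import numpy as np

def soft(x, r):
    # S_r(y) = sign(y) * max(0, |y| - r), applied componentwise.
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

def noise_collector_gelma(A, C, b, tau=2.0, lam=1.0, n_iter=5000):
    # Iteration [14]: soft-thresholded steps on rho and eta plus a multiplier update on z.
    AC = np.hstack([A, C])
    dt1 = 1.9 / np.linalg.norm(AC, 2) ** 2      # Delta t_1 < 2 / ||[A|C]||^2
    dt2 = 0.9 * lam / np.linalg.norm(A, 2)      # Delta t_2 < lambda / ||A||
    rho = np.zeros(A.shape[1]); eta = np.zeros(C.shape[1]); z = np.zeros(A.shape[0])
    for _ in range(n_iter):
        r = b - A @ rho - C @ eta
        rho = soft(rho + dt1 * (A.T @ (z + r)), tau * lam * dt1)
        eta = soft(eta + dt1 * (C.T @ (z + r)), lam * dt1)
        z = z + dt2 * r
    return rho, eta

# Illustrative sizes only; tau = 2 and lam = 1 are the values quoted in the text.
rng = np.random.default_rng(4)
N, K, Sigma = 100, 300, 1000
A = rng.standard_normal((N, K)); A /= np.linalg.norm(A, axis=0)
C = rng.standard_normal((N, Sigma)); C /= np.linalg.norm(C, axis=0)
rho_true = np.zeros(K); rho_true[[7, 42, 200]] = 1.0
b0 = A @ rho_true
e = rng.standard_normal(N); e *= np.linalg.norm(b0) / np.linalg.norm(e)   # SNR = 1
rho, eta = noise_collector_gelma(A, C, b0 + e)
print(np.flatnonzero(np.abs(rho) > 1e-3))   # indices of the recovered support estimate
```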

The Noise Collector matrix $C$ is computed by drawing $N^{\beta-1}$ normally distributed $N$-dimensional vectors, normalized to unit length. These are the generating vectors of the Noise Collector. From each of them, a circulant $N\times N$ matrix $C_i$, $i=1,\ldots,N^{\beta-1}$, is constructed. The Noise Collector matrix is obtained by concatenation, $C=[C_1\,|\,C_2\,|\,\cdots\,|\,C_{N^{\beta-1}}]$. Exploiting the circulant structure of the matrices $C_i$, we perform the matrix-vector multiplications $C\eta_k$ and $C^{*}(z_k+r)$ in Eq. 14 using the FFT (32). This makes the complexity associated with the Noise Collector $O(N^{\beta}\log N)$. Note that only the $N^{\beta-1}$ generating vectors are stored and not the entire $N\times N^{\beta}$ Noise Collector matrix. In practice, we use $\beta\approx 1.5$, which makes the cost of using the Noise Collector negligible, as typically $K\gg N^{\beta-1}$. The columns of the Noise Collector matrix $C$ with this circulant structure are uniformly distributed on $S^{N-1}$, and they satisfy Lemma 1. This implies that the theorems of this paper are still valid for such $C$.
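
A minimal sketch of the circulant implementation described above: each block $C_i$ is the circulant matrix generated by a random unit vector, so products with $C$ and $C^{*}$ reduce to FFT-based circular convolutions and correlations. The class and method names are ours, not from the paper.

```python
import numpy as np
from scipy.linalg import circulant

class CirculantNoiseCollector:
    # Noise Collector built from n_blocks circulant N x N blocks, each generated by a random
    # unit vector; only the generating vectors are stored, and products use FFTs.
    def __init__(self, N, n_blocks, rng=None):
        rng = rng or np.random.default_rng()
        G = rng.standard_normal((n_blocks, N))
        G /= np.linalg.norm(G, axis=1, keepdims=True)      # unit-length generating vectors
        self.G_hat = np.fft.fft(G, axis=1)                  # precomputed spectra, shape (n_blocks, N)
        self.N, self.n_blocks = N, n_blocks

    def matvec(self, eta):
        # C @ eta: sum over blocks of circular convolutions, O(N^beta log N) overall.
        eta = eta.reshape(self.n_blocks, self.N)
        return np.fft.ifft(self.G_hat * np.fft.fft(eta, axis=1), axis=1).real.sum(axis=0)

    def rmatvec(self, y):
        # C.T @ y: circular correlations with each generating vector.
        return np.fft.ifft(np.conj(self.G_hat) * np.fft.fft(y), axis=1).real.ravel()

# Consistency check against an explicitly assembled C (small sizes only).
rng = np.random.default_rng(5)
nc = CirculantNoiseCollector(N=64, n_blocks=4, rng=rng)
C_dense = np.hstack([circulant(np.fft.ifft(h).real) for h in nc.G_hat])
eta = rng.standard_normal(64 * 4); y = rng.standard_normal(64)
print(np.allclose(C_dense @ eta, nc.matvec(eta)), np.allclose(C_dense.T @ y, nc.rmatvec(y)))
```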

Application to Imaging

We consider passive array imaging of point sources. The problem consists of determining the positions zj and the complex amplitudes αj, j=1,,M, of a few point sources from measurements of polychromatic signals on an array of receivers (Fig. 2). The imaging system is characterized by the array aperture a, the distance L to the sources, the bandwidth B, and the central wavelength λ0.

Fig. 2.

General setup for passive array imaging. The source at $z_j$ emits a signal that is recorded at all array elements $x_r$, $r=1,\ldots,N_r$.

The sources are located inside an image window (IW), which is discretized with a uniform grid of points $y_k$, $k=1,\ldots,K$. The unknown is the source vector $\rho=[\rho_1,\ldots,\rho_K]\in\mathbb{C}^K$, with components $\rho_k$ that correspond to the complex amplitudes of the $M$ sources at the grid points $y_k$, $k=1,\ldots,K$, with $K\gg M$. For the true source vector, we have $\rho_k=\alpha_j$ if $y_k=z_j$ for some $j=1,\ldots,M$, while $\rho_k=0$ otherwise.

Denoting by G(x,y;ω) Green’s function for the propagation of a signal of angular frequency ω from point y to point x, we define the single-frequency Green’s function vector that connects a point y in the IW with all points xr, r=1,,Nr, on the array as

$$g(y;\omega)=\left[G(x_1,y;\omega),\,G(x_2,y;\omega),\,\ldots,\,G(x_{N_r},y;\omega)\right]\in\mathbb{C}^{N_r}.$$

In three dimensions, $G(x,y;\omega)=\dfrac{\exp\{i\omega|x-y|/c_0\}}{4\pi|x-y|}$ if the medium is homogeneous. The data for the imaging problem are the signals $b(x_r,\omega_l)=\sum_{j=1}^{M}\alpha_j G(x_r,z_j;\omega_l)$ recorded at receiver locations $x_r$, $r=1,\ldots,N_r$, at frequencies $\omega_l$, $l=1,\ldots,S$. These data are stacked in a column vector

$$b=\left[b(\omega_1),\,b(\omega_2),\,\ldots,\,b(\omega_S)\right]\in\mathbb{C}^{N};\qquad N=N_r\,S, \qquad [15]$$

with $b(\omega_l)=\left[b(x_1,\omega_l),\,b(x_2,\omega_l),\,\ldots,\,b(x_{N_r},\omega_l)\right]\in\mathbb{C}^{N_r}$. Then, $A\rho=b$, with $A$ being the $N\times K$ measurement matrix with columns $a_k$ that are the multiple-frequency Green’s function vectors

$$a_k=\left[g(y_k;\omega_1),\,g(y_k;\omega_2),\,\ldots,\,g(y_k;\omega_S)\right]\in\mathbb{C}^{N} \qquad [16]$$

normalized to have length 1. The system $A\rho=b$ relates the unknown vector $\rho\in\mathbb{C}^K$ to the data vector $b\in\mathbb{C}^N$.
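
For concreteness, here is a compact sketch of how the measurement matrix [16] and the data vector [15] can be assembled for a homogeneous medium. The array geometry, grid, and frequencies below are simplified placeholders, not the configuration used in the experiments that follow.

```python
import numpy as np

c0 = 3e8                                             # wave speed in a homogeneous medium

def green(x, y, omega):
    # 3D homogeneous Green's function G(x, y; omega) = exp(i omega |x-y| / c0) / (4 pi |x-y|).
    d = np.linalg.norm(x - y, axis=-1)
    return np.exp(1j * omega * d / c0) / (4 * np.pi * d)

def measurement_matrix(receivers, grid, omegas):
    # Columns a_k stack the Green's function vectors g(y_k; omega_l) over frequencies (Eq. 16),
    # normalized to unit length; rows are ordered frequency by frequency, so N = Nr * S.
    blocks = [green(receivers[:, None, :], grid[None, :, :], w) for w in omegas]
    A = np.vstack(blocks)
    return A / np.linalg.norm(A, axis=0)

# Simplified geometry (placeholders): a small linear array, a coarse image grid, a few frequencies.
receivers = np.stack([np.linspace(-0.25, 0.25, 25), np.zeros(25), np.zeros(25)], axis=1)
xs, zs = np.meshgrid(np.linspace(-0.1, 0.1, 21), np.linspace(0.45, 0.55, 21))
grid = np.stack([xs.ravel(), np.zeros(xs.size), zs.ravel()], axis=1)
omegas = 2 * np.pi * np.linspace(50e9, 70e9, 5)
A = measurement_matrix(receivers, grid, omegas)       # shape (N, K) with N = 25*5, K = 21*21
rho_true = np.zeros(A.shape[1], dtype=complex)
rho_true[np.random.default_rng(6).choice(A.shape[1], 3, replace=False)] = 1.0
b = A @ rho_true                                      # noiseless data vector, Eq. 15
print(A.shape, b.shape)
```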

Next, we illustrate the performance of the Noise Collector in this imaging setup. The most important features are that 1) no calibration is necessary with respect to the level of noise, that 2) exact support recovery is obtained for relatively large levels of noise [i.e., $\|e\|_{\ell_2}\le c_1\|b_0\|_{\ell_2}^2\sqrt{N}/(\|\rho\|_{\ell_1}\sqrt{\ln N})$], and that 3) we have zero false discovery rates for all levels of noise with high probability.

We consider a high-frequency microwave imaging regime with central frequency $f_0=60$ GHz corresponding to $\lambda_0=5$ mm. We make measurements for $S=25$ equally spaced frequencies spanning a bandwidth $B=20$ GHz. The array has $N_r=25$ receivers and an aperture $a=50$ cm. The distance from the array to the center of the imaging window is $L=50$ cm. Then, the resolution is $\lambda_0 L/a=5$ mm in cross-range (direction parallel to the array) and $c_0/B=15$ mm in range (direction of propagation). These parameters are typical in microwave scanning technology (33).

We seek to image a source vector with sparsity $M=12$ (Fig. 3, Left). The size of the imaging window is 20 × 60 cm, and the pixel spacing is 5 × 15 mm. The number of unknowns is, therefore, $K=1{,}681$, and the number of data is $N=625$. The size of the Noise Collector is taken to be $\Sigma=10^4$, and therefore, $\beta\approx 1.5$. When the data are noiseless, we obtain exact recovery, as expected (Fig. 3, Right).
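
The quoted problem sizes follow directly from the geometry; a short check (values taken from the text):

```python
# 25 receivers x 25 frequencies, and a 20 x 60 cm window discretized on a 5 x 15 mm grid.
N_r, S = 25, 25
nx = round(0.20 / 0.005) + 1    # 41 grid points in cross-range
nz = round(0.60 / 0.015) + 1    # 41 grid points in range
print(N_r * S, nx * nz)         # 625 data, 1681 unknowns
```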

Fig. 3.

Noiseless data. The exact solution is recovered for any value of $\lambda$ in algorithm [14]. (Left) The true image. (Right) The recovered solution vector, $\rho_\tau$, is plotted with red stars, and the true solution vector, $\rho$, is plotted with green circles.

In Fig. 4, we display the imaging results with and without the Noise Collector when the data are corrupted by additive noise. The SNR is 1, and therefore, the $\ell_2$ norms of the signal and the noise are equal. In column 1 of Fig. 4, we show the image recovered using $\ell_1$-norm minimization without the Noise Collector. There is a lot of grass in this image, with many nonzero values outside the true support. When the Noise Collector is used, the level of the grass is reduced and the image improves (column 2 of Fig. 4). Still, there are several false discoveries because we use $\tau=1$ in algorithm [14].

Fig. 4.

High level of noise; SNR = 1. (Column 1) $\ell_1$-norm minimization without the Noise Collector. (Column 2) $\ell_1$-norm minimization with a Noise Collector with $\Sigma=10^4$ columns and $\tau=1$ in algorithm [14]. (Column 3) $\ell_1$-norm minimization with a Noise Collector and the correct $\tau=2$ in algorithm [14]. (Column 4) $\ell_2$-norm solution restricted to the support. In Upper, we show the images. In Lower, we show the recovered solution vector with red stars and the true solution vector with green circles. NC = Noise Collector.

In column 3 of Fig. 4, we show the image obtained with a weight $\tau=2$ in algorithm [14]. With this weight, there are no false discoveries, and the recovered support is exact. This simplifies the imaging problem dramatically, as we can now restrict the inverse problem to the true support just obtained and then solve an overdetermined linear system using a classical $\ell_2$ approach. The results are shown in column 4 of Fig. 4. Note that this second step largely compensates for the signal that was lost in the first step due to the high level of noise.
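
The second, debiasing step is a plain least-squares solve on the recovered support. A minimal sketch follows; the names rho_tau, A, and b refer to the quantities of the previous sketches and are assumptions of this illustration.

```python
import numpy as np

def refit_on_support(A, b, support):
    # Restrict the system to the recovered support and solve the overdetermined
    # least-squares problem to re-estimate the complex amplitudes.
    rho = np.zeros(A.shape[1], dtype=complex)
    sol, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
    rho[support] = sol
    return rho

# Usage sketch, given a support recovered by the Noise Collector:
# support = np.flatnonzero(np.abs(rho_tau) > 1e-3)
# rho_refit = refit_on_support(A, b, support)
```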

In Fig. 5, we illustrate the performance of the Noise Collector for different sparsity levels $M$ and noise levels $\|e\|_{\ell_2}/\|b_0\|_{\ell_2}$. Success in recovering the true support of the unknown corresponds to a value of one (yellow in Fig. 5), and failure corresponds to a value of zero (blue in Fig. 5). The small phase transition zone (green in Fig. 5) contains intermediate values. The black lines in Fig. 5 are the theoretical prediction Eq. 5. These results are obtained by averaging over 10 realizations of the noise. We show results for three data sizes: $N=342$, $N=625$, and $N=961$. In our experiments, the nonzero components of the unknown $\rho$ take values in $[0.6,0.8]$, and therefore, $\|b_0\|_{\ell_2}/\|\rho\|_{\ell_1}=\mathrm{cst}/\sqrt{M}$.

Fig. 5.

Algorithm performance for exact support recovery. Success corresponds to a value of one (yellow), and failure corresponds to a value of zero (blue). The small phase transition zone (green) contains intermediate values. The black lines are the theoretical estimate $\sqrt{N/(M\ln N)}$. The ordinate and abscissa are the sparsity $M$ and $\|e\|_{\ell_2}/\|b_0\|_{\ell_2}$. The data sizes are (Left) $N=342$, (Center) $N=625$, and (Right) $N=961$.

Remark 1: We considered passive array imaging for ease of presentation. The same results hold for active array imaging with or without multiple scattering; ref. 34 discusses the detailed analytical setup.

Remark 2: We have considered a microwave imaging regime. Similar results can be obtained in other regimes.

Proofs

Proof of Lemma 1: Using the rotational invariance of all of our probability distributions, inequality [9] is true if

$$P\left(\max_i|\langle d_i,e\rangle|\ge c_0\sqrt{\ln N/N}\right)\le 1/N^{\kappa},$$

where the $d_i$, $i=1,2,\ldots,K+\Sigma$, are (possibly dependent) unit vectors uniformly distributed on $S^{N-1}$, and we can assume that $e=(1,0,\ldots,0)$. Denote the event

$$\Omega_t=\left\{\max_i|\langle d_i,e\rangle|\ge t/\sqrt{N}\right\}.$$

We have $P\left(|\langle d_i,e\rangle|\ge t/\sqrt{N}\right)\le 2\exp(-t^2/2)$ for each $d_i$. We obtain $P(\Omega_t)\le 2(K+\Sigma)\exp(-t^2/2)$ using the union bound. Choosing $t=c_0\sqrt{\ln N}$ for sufficiently large $c_0$, we get $P(\Omega_t)\le C\,N^{\beta}N^{-c_0^2/2}\le N^{-\kappa}$, where $c_0^2>2(\beta+\kappa)$ and $N\ge N_0$. Hence, Eq. 9 holds with probability $\ge 1-N^{-\kappa}$.

If $N$ columns $c_j$, $j\in S$, of $C$ satisfy

$$\min_{j\in S}|\langle c_j,e\rangle|\ge\theta,\qquad \theta=\alpha\sqrt{\ln N/N}, \qquad [17]$$

then their convex hull will contain $\theta e$ with probability $1-(1/2)^N$. Therefore, inequality [10] follows if [17] holds with probability $1-1/N^{\kappa}$. Using the rotational invariance of all of our probability distributions, we can assume that $e=(1,0,\ldots,0)$. For each $c_i$,

$$P\left(|\langle c_i,e\rangle|\ge\frac{t}{\sqrt{N}}\right)\approx\frac{2}{\sqrt{2\pi}}\int_t^{\infty}e^{-\frac{x^2}{2}}\,dx\ge\frac{1}{2}e^{-t^2}.$$

Split the index set $\{1,2,\ldots,\Sigma\}$ into $N$ nonoverlapping subsets $S_k$, $k=1,2,\ldots,N$, of size $N^{\beta-1}$. For each $S_k$,

$$P\left(\max_{i\in S_k}|\langle c_i,e\rangle|\le\alpha\sqrt{\frac{\ln N}{N}}\right)\le\left(1-\tfrac{1}{2}N^{-\alpha^2}\right)^{N^{\beta-1}}\le e^{-\frac{1}{2}N^{\frac{\beta-1}{2}}}$$

for $\alpha=\sqrt{(\beta-1)/2}$. By independence,

$$P\big([17]\ \text{holds}\big)\ \ge\ \prod_{k=1}^{N}P\left(\max_{i\in S_k}|\langle c_i,e\rangle|\ge\alpha\sqrt{\ln N/N}\right).$$

Then, $P\big([17]\ \text{holds}\big)\ge\left(1-e^{-\frac{1}{2}N^{\frac{\beta-1}{2}}}\right)^{N}\ge 1-Ne^{-\frac{1}{2}N^{\frac{\beta-1}{2}}}$. Choosing $N_0$ sufficiently large, we obtain [10].

Proof of Theorem 2: When the columns of $A$ are not orthogonal, we will choose a $\tau$ smaller than that in Theorem 1 by a factor of two. Suppose that the $M$-dimensional space $V$ is the span of the column vectors $a_j$, with $j$ in the support of $\rho$. Say that $V$ is spanned by $a_1,\ldots,a_M$. Let $W=V^{\perp}$ be the orthogonal complement of $V$. Consider the orthogonal decomposition $a_i=a_i^{v}+a_i^{w}$ for all $i\ge M+1$. Incoherence of the $a_i$ implies that $\|a_i^{w}\|_{\ell_2}\ge 1/2$ for all $i\ge M+1$. Indeed, fix any $i\ge M+1$. Suppose that $a_i^{v}=\sum_{k=1}^{M}\xi_k a_k$ and that $|\xi_j|=\max_{k\le M}|\xi_k|=\xi_l$. Thus, $\frac{1}{3M}\ge|\langle a_j,a_i^{v}\rangle|=\left|\left\langle a_j,\sum_{k=1}^{M}\xi_k a_k\right\rangle\right|\ge\xi_l\left(1-M\frac{1}{3M}\right)$. Then, $\xi_l\le 1/(2M)$. Therefore, $\|a_i^{v}\|_{\ell_2}\le\|\xi\|_{\ell_1}\le M\xi_l\le 1/2$, and $\|a_i^{w}\|_{\ell_2}\ge\|a_i\|_{\ell_2}-\|a_i^{v}\|_{\ell_2}\ge 1/2$.

Project system [4] onto $W$. Then, we obtain a new system of the form [4]. The $\ell_2$ norms of the columns of the new $A$ are at least $1/2$. In all other respects, the new system satisfies the conditions of Theorem 1. Indeed, $b_0$ is projected to zero. All $c_i$ and $e/\|e\|_{\ell_2}$ are projected to vectors uniformly distributed on $S^{N-M-1}$ by the concentration of measure (27). If any $a_i$, $i\ge M+1$, was used in an optimal approximation of $b_0+e$, then its projection $a_i^{w}$ is used in an optimal approximation of the projection of $b_0+e$ onto $W$. This is a contradiction to Lemma 1 if we choose $\tau>c_0/(2\alpha)$ and recall that $\|a_i^{w}\|_{\ell_2}\ge 1/2$.

Proof of Theorem 3: Choose $\tau$ as in Theorem 2. Incoherence of the $a_i$ implies that we can argue as in Proof of Theorem 2 and assume that $\langle a_i,a_j\rangle=0$ for $i\ne j$, $i,j\in\mathrm{supp}(\rho)$. Suppose that the $V_i$ are the two-dimensional (2D) spaces spanned by $e$ and $a_i$ for $i\in\mathrm{supp}(\rho)$. By Lemma 1, all $\lambda H(\tau)\cap V_i$ look like the rounded rhombi depicted in Fig. 1, Right, and $\lambda H_1\cap V_i\subset B_{\lambda}^{i}$ with probability $1-N^{-\kappa}$, where $B_{\lambda}^{i}$ is a 2D $\ell_2$ ball of radius $\lambda c_0\sqrt{\ln N/N}$. Thus, $\lambda H(\tau)\cap V_i\subset H_{\lambda}^{i}$ with probability $1-N^{-\kappa}$, where $H_{\lambda}^{i}$ is the convex hull of $B_{\lambda}^{i}$ and a vector $\lambda f_i$, $f_i=\rho_i\|\rho\|_{\ell_1}^{-1}\tau^{-1}a_i$. Then, $\mathrm{supp}(\rho_\tau)=\mathrm{supp}(\rho)$ if there exists $\lambda_0$ so that $\rho_i a_i+e$ lies on the flat boundary of $H_{\lambda_0}^{i}$ for all $i\in\mathrm{supp}(\rho)$.

If $\min_{i\in\mathrm{supp}(\rho)}|\rho_i|\ge\gamma\|\rho\|_{\ell_\infty}$, then there is a constant $c_2=c_2(\gamma)$ such that, if $\rho_i a_i+e$ lies on the flat boundary of $H_{\lambda}^{i}$ for some $i$ and some $\lambda$, then there exists $\lambda_0$ so that $\rho_i a_i+c_2 e$ lies on the flat boundary of $H_{\lambda_0}^{i}$ for all $i\in\mathrm{supp}(\rho)$. Suppose that $V$ is spanned by $e$ and $b_0$, that $H_{\lambda}\subset V$ is the convex hull of $B_{\lambda}$ and $\lambda f$, with $f=b_0\|\rho\|_{\ell_1}^{-1}\tau^{-1}$, where $B_{\lambda}\subset V$ is an $\ell_2$ ball of radius $\lambda c_0\sqrt{\ln N/N}$. If $b_0+c_2 e$ lies on the flat boundary of $H_{\lambda}$, then there must be an $i\in\mathrm{supp}(\rho)$ such that $\rho_i a_i+c_2 e$ lies on the flat boundary of $H_{\lambda}^{i}$. If

$$\frac{|\langle b_0,\,b_0+c_2 e\rangle|}{\|b_0\|_{\ell_2}\,\|b_0+c_2 e\|_{\ell_2}}\ \ge\ \frac{c_0\sqrt{\ln N}}{\sqrt{N}\,\|f\|_{\ell_2}}, \qquad [18]$$

then $b_0+c_2 e$ lies on the flat boundary of $H_{\lambda}$. Since $|\langle b_0,e\rangle|\le c_0\|e\|_{\ell_2}\|b_0\|_{\ell_2}/\sqrt{N}$ with probability $1-N^{-\kappa}$, Eq. 18 holds if $\|e\|_{\ell_2}/\|b_0\|_{\ell_2}\le\|f\|_{\ell_2}\sqrt{N}/(c_2 c_0\sqrt{\ln N})=c_1\|b_0\|_{\ell_2}\|\rho\|_{\ell_1}^{-1}\sqrt{N/\ln N}$.

Data Availability Statement.

There are no data in this paper; we present an algorithm, its theoretical analysis, and some numerical simulations.

Acknowledgments

The work of M.M. was partially supported by Spanish Ministerio de Ciencia e Innovación Grant FIS2016-77892-R. The work of A.N. was partially supported by NSF Grants DMS-1515187 and DMS-1813943. The work of G.P. was partially supported by Air Force Office of Scientific Research (AFOSR) Grant FA9550-18-1-0519. The work of C.T. was partially supported by AFOSR Grants FA9550-17-1-0238 and FA9550-18-1-0519. We thank Marguerite Novikov for drawing Fig. 1, Left.

Footnotes

The authors declare no competing interest.

*Choosing two step sizes instead of the smaller one Δt1 improves the convergence speed.

We chose to work with real numbers in the previous sections for ease of presentation, but the results also hold with complex numbers.

References

1. Chen S. S., Donoho D. L., Saunders M. A., Atomic decomposition by basis pursuit. SIAM Rev. 43, 129–159 (2001).
2. Donoho D. L., Elad M., Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc. Natl. Acad. Sci. U.S.A. 100, 2197–2202 (2003).
3. Candès E. J., Tao T., Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005).
4. Tibshirani R., Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996).
5. Fuchs J. J., Recovery of exact sparse representations in the presence of bounded noise. IEEE Trans. Inf. Theory 51, 3601–3608 (2005).
6. Tropp J. A., Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52, 1030–1051 (2006).
7. Wainwright M. J., Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Inf. Theory 55, 2183–2202 (2009).
8. Maleki A., Anitori L., Yang Z., Baraniuk R., Asymptotic analysis of complex Lasso via complex approximate message passing (CAMP). IEEE Trans. Inf. Theory 59, 4290–4308 (2013).
9. Zou H., The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006).
10. Sampson J. N., Chatterjee N., Carroll R. J., Müller S., Controlling the local false discovery rate in the adaptive Lasso. Biostatistics 14, 653–666 (2013).
11. Belloni A., Chernozhukov V., Wang L., Square-root Lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98, 791–806 (2011).
12. Trzasko J., Manduca A., Highly undersampled magnetic resonance image reconstruction via homotopic ℓ0-minimization. IEEE Trans. Med. Imag. 28, 106–121 (2009).
13. AlQuraishi M., McAdams H. H., Direct inference of protein–DNA interactions using compressed sensing methods. Proc. Natl. Acad. Sci. U.S.A. 108, 14819–14824 (2011).
14. Baraniuk R., Steeghs P., “Compressive radar imaging” in 2007 IEEE Radar Conference (IEEE, 2007), pp. 128–133.
15. Taylor H. L., Banks S. C., McCoy J. F., Deconvolution with the l1 norm. Geophysics 44, 39–52 (1979).
16. Malioutov D., Cetin M., Willsky A. S., A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process. 53, 3010–3022 (2005).
17. Romberg J., Imaging via compressive sampling. IEEE Signal Process. Mag. 25, 14–20 (2008).
18. Herman M. A., Strohmer T., High-resolution radar via compressed sensing. IEEE Trans. Signal Process. 57, 2275–2284 (2009).
19. Tropp J. A., Laska J. N., Duarte M. F., Romberg J. K., Baraniuk R. G., Beyond Nyquist: Efficient sampling of sparse bandlimited signals. IEEE Trans. Inf. Theory 56, 520–544 (2010).
20. Fannjiang A. C., Strohmer T., Yan P., Compressed remote sensing of sparse objects. SIAM J. Imag. Sci. 3, 595–618 (2010).
21. Chai A., Moscoso M., Papanicolaou G., Robust imaging of localized scatterers using the singular value decomposition and ℓ1 optimization. Inverse Probl. 29, 025016 (2013).
22. Donoho D. L., Superresolution via sparsity constraints. SIAM J. Math. Anal. 23, 1303–1331 (1992).
23. Candès E. J., Fernandez-Granda C., Towards a mathematical theory of super-resolution. Comm. Pure Appl. Math. 67, 906–956 (2014).
24. Fannjiang A. C., Liao W., Coherence pattern-guided compressive sensing with unresolved grids. SIAM J. Imag. Sci. 5, 179–202 (2012).
25. Borcea L., Kocyigit I., Resolution analysis of imaging with ℓ1 optimization. SIAM J. Imag. Sci. 8, 3015–3050 (2015).
26. Moscoso M., Novikov A., Papanicolaou G., Tsogka C., Imaging with highly incomplete and corrupted data. Inverse Probl. 36, 035010 (2020).
27. Vershynin R., High-Dimensional Probability: An Introduction with Applications in Data Science (Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2018).
28. Laska J. N., Davenport M. A., Baraniuk R. G., “Exact signal recovery from sparsely corrupted measurements through the pursuit of justice” in 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers (IEEE, Pacific Grove, CA, 2009), pp. 1556–1560.
29. Tao T., “A cheap version of the Kabatjanskii–Levenstein bound for almost orthogonal vectors.” https://terrytao.wordpress.com/2013/07/18/a-cheap-version-of-the-kabatjanskii-levenstein-bound-for-almost-orthogonal-vectors. Accessed 23 January 2019.
30. Milman V. D., A new proof of Dvoretzky’s theorem on cross-sections of convex bodies. Funkcional. Anal. i Priložen. 5, 28–37 (1971).
31. Moscoso M., Novikov A., Papanicolaou G., Ryzhik L., A differential equations approach to l1-minimization with applications to array imaging. Inverse Probl. 28, 105001 (2012).
32. Gray R. M., Toeplitz and circulant matrices: A review. Commun. Inf. Theory 2, 155–239 (2006).
33. Laviada J., Arboleya-Arboleya A., Alvarez-Lopez Y., Garcia-Gonzalez C., Las-Heras F., Phaseless synthetic aperture radar with efficient sampling for broadband near-field imaging: Theory and validation. IEEE Trans. Antennas Propag. 63, 573–584 (2015).
34. Chai A., Moscoso M., Papanicolaou G., Imaging strong localized scatterers with sparsity promoting optimization. SIAM J. Imaging Sci. 7, 1358–1387 (2014).
