Published in final edited form as: Appl Comput Harmon Anal. 2016 Mar 2;41(2):470–490. doi: 10.1016/j.acha.2016.02.003

Robust recovery of complex exponential signals from random Gaussian projections via low rank Hankel matrix reconstruction

Jian-Feng Cai, Xiaobo Qu, Weiyu Xu, Gui-Bo Ye

Abstract

This paper explores robust recovery of a superposition of R distinct complex exponential functions, with or without damping factors, from a few random Gaussian projections. We assume that the signal of interest has dimension 2N − 1 and that R < 2N − 1. This framework covers a large class of signals arising from real applications in biology, automation, imaging science, etc. To reconstruct such a signal, our algorithm seeks a low-rank Hankel matrix of the signal by minimizing its nuclear norm subject to consistency with the sampled data. Our theoretical results show that robust recovery is possible as long as the number of projections exceeds O(R ln² N). No incoherence or separation condition is required in our proof. Our method can be applied to spectral compressed sensing, where the signal of interest is a superposition of R complex sinusoids. Compared to existing results, our result does not need any separation condition on the frequencies, while achieving better or comparable bounds on the number of measurements. Furthermore, our method provides theoretical guidance on how many samples are required in state-of-the-art non-uniform sampling in NMR spectroscopy. The performance of our algorithm is further demonstrated by numerical experiments.

Keywords: Complex sinusoids, Random Gaussian projection, Low-rank Hankel matrix

1. Introduction

Many practical problems involve signals that can be modeled or approximated by a superposition of a few complex exponential functions. In particular, if we choose the exponential functions to be complex sinusoids, this model covers signals arising in the acceleration of medical imaging [16], analog-to-digital conversion [25], inverse scattering in seismic imaging [1], etc. Time-domain signals in nuclear magnetic resonance (NMR) spectroscopy, which are widely used to analyze compounds in chemistry and protein structures in biology, are another type of signal that can be modeled or approximated by a superposition of complex exponential functions [19]. How to recover such superpositions of complex exponential functions is of primary importance in these applications.

In this paper, we consider how to recover these complex exponentials from linear measurements of their superposition. More specifically, let x̂ ∈ ℂ^{2N−1} be a vector satisfying

$$\hat{x}_j = \sum_{k=1}^{R} c_k z_k^j, \qquad j = 0, 1, \ldots, 2N-2, \tag{1}$$

where z_k ∈ ℂ, k = 1, …, R, are some unknown complex numbers. In other words, x̂ is a superposition of R exponential functions. We assume R < 2N − 1. When |z_k| = 1, k = 1, …, R, x̂ is a superposition of complex sinusoids. When z_k = e^{τ_k}e^{2πιf_k}, k = 1, …, R, x̂ models the signal in NMR spectroscopy.

Since R < 2N − 1, the number of degrees of freedom determining x̂ is much less than the ambient dimension 2N − 1. Therefore, it is possible to recover x̂ from under-sampling [3,5,8,12]. In particular, we consider recovering x̂ from its linear measurements

$$b = A\hat{x}, \tag{2}$$

where A ∈ ℂ^{M×(2N−1)} with M < 2N − 1.

We will use a Hankel structure to reconstruct the signal of interest x̂. The Hankel structure originates from the matrix pencil method [15] for harmonic retrieval of complex sinusoids. The conventional matrix pencil method assumes that x̂ is fully observed and that the model order R is known, neither of which holds here. Following the ideas of the matrix pencil method in [15] and enhanced matrix completion (EMaC) in [10], we construct a Hankel matrix based on the signal x̂. More specifically, define the Hankel matrix Ĥ ∈ ℂ^{N×N} by

$$\hat{H}_{jk} = \hat{x}_{j+k}, \qquad j, k = 0, 1, \ldots, N-1. \tag{3}$$

Throughout this paper, indices of all vectors and matrices start from 0, instead of from 1 as in conventional notation. It can be shown that Ĥ is a matrix of rank R. Instead of reconstructing x̂ directly, we reconstruct the rank-R Hankel matrix Ĥ, subject to the constraint that (2) is satisfied.
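As a concrete illustration (our own sketch, not code from the paper; the parameter values are arbitrary), the following generates a signal of the form (1) and verifies numerically that the Hankel matrix (3) has rank R:

```python
import numpy as np

# Generate a superposition of R complex exponentials, Eq. (1), with mild
# damping (|z_k| < 1), and check that its Hankel matrix, Eq. (3), has rank R.
N, R = 16, 3
rng = np.random.default_rng(0)
z = np.exp(-0.05 * rng.random(R) + 2j * np.pi * rng.random(R))  # z_k
c = rng.standard_normal(R) + 1j * rng.standard_normal(R)        # c_k

x_hat = np.array([np.sum(c * z**j) for j in range(2 * N - 1)])          # Eq. (1)
H_hat = np.array([[x_hat[j + k] for k in range(N)] for j in range(N)])  # Eq. (3)

print(np.linalg.matrix_rank(H_hat))  # prints 3, i.e., rank R
```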

Low-rank matrix recovery has been widely studied [2,5,6,20]. It is well known that minimizing the nuclear norm tends to produce low-rank solutions. Therefore, we propose a nuclear norm minimization problem subject to the constraint (2). More specifically, for any given x ∈ ℂ^{2N−1}, let H(x) ∈ ℂ^{N×N} be the Hankel matrix whose first row and last column together form x, i.e., [H(x)]_{jk} = x_{j+k}. We propose to solve

$$\min_x \|H(x)\|_*, \quad \text{subject to } Ax = b, \tag{4}$$

where ‖·‖_* is the nuclear norm (the sum of all singular values), and A and b are from the linear measurements (2). When the observation is contaminated by noise, i.e.,

$$b = A\hat{x} + \eta,$$

we solve

$$\min_x \|H(x)\|_*, \quad \text{subject to } \|Ax - b\|_2 \le \delta, \tag{5}$$

where δ = ‖η‖₂ is the noise level. The reconstruction of low-rank Hankel matrices via nuclear norm minimization was also proposed in [13] for system identification and realization.
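For illustration, here is a minimal sketch of the noise-free program (4) using an off-the-shelf convex solver (this is our own example with arbitrary parameters, not the implementation used in the paper, which relies on ADMM as described in Section 5):

```python
import numpy as np
import cvxpy as cp

N, R, M = 8, 2, 10
n = 2 * N - 1
rng = np.random.default_rng(1)

z = np.exp(2j * np.pi * rng.random(R))               # R complex sinusoids
x_hat = np.array([np.sum(z**j) for j in range(n)])   # Eq. (1) with c_k = 1
A = rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))
b = A @ x_hat                                        # Eq. (2)

x = cp.Variable(n, complex=True)
# Rows of H(x): [H(x)]_{jk} = x_{j+k}, so row j is the slice x[j : j+N].
H = cp.vstack([cp.reshape(x[j:j + N], (1, N)) for j in range(N)])
prob = cp.Problem(cp.Minimize(cp.normNuc(H)), [A @ x == b])
prob.solve(solver=cp.SCS)
print(np.linalg.norm(x.value - x_hat) / np.linalg.norm(x_hat))  # small
```

For the noisy program (5), the equality constraint would be replaced by a ball constraint such as cp.norm(A @ x - b, 2) <= delta.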

An important theoretical question is how many measurements are required for a robust reconstruction of Ĥ via (4) or (5). For a generic unstructured N × N matrix of rank R, standard theory [6,7,9,20] indicates that O(NR · poly(log N)) measurements are needed for a robust reconstruction by nuclear norm minimization. This result, however, is unsatisfactory here, since Ĥ is determined by only 2N − 1 parameters and its actual number of degrees of freedom is governed by R. The main contribution of this paper is to prove that (4) and (5) give a robust recovery of Ĥ (hence of x̂) as soon as the number of projections exceeds O(R ln² N), if we choose the linear operator A to be a suitably scaled random Gaussian projection. This result is further extended to the robust reconstruction of low-rank Hankel or Toeplitz matrices from a few random Gaussian projections.

Our result can be applied to various signals that are superpositions of complex exponentials, including, but not limited to, signals of complex sinusoids and signals in accelerated NMR spectroscopy. When applied to complex sinusoids, our result does not need any separation condition on the frequencies, while requiring only O(R ln² N) measurements instead of the O(R ln⁴ N) in [10]. Furthermore, our theoretical result provides guidance on how many samples to choose for the model proposed in [19] for recovering NMR spectroscopy signals.

  • Complex sinusoids. When |z_k| = 1 for k = 1, …, R, we must have z_k = e^{2πιf_k} for some frequency f_k. In this case, x̂ is a superposition of complex sinusoids, arising for example in the analog-to-digital conversion of radio signals [25]. The problem of signal recovery from compressed linear measurements of a superposition of complex sinusoids is encountered in various applications. For example, in compressed sensing of spectrally sparse bandlimited signals [25], the random demodulator obtains linearly mixed measurements of the signals through matched filters. We refer the reader to Sections III and IV of [25] for details. In array signal processing for Direction of Arrival (DoA) estimation of electromagnetic waves [26], the signals received at the antennas of an antenna array are a superposition of complex sinusoids with different frequencies. If a battery-powered antenna array aims to save energy when sending measurements to the fusion center where the DoAs are calculated, it can send linear projections of the signals received across the antenna array. Moreover, if the antenna array is non-uniform, the observations across the array are linear non-uniform compressive samples of the corresponding uniform antenna array.

    The problem of recovering x̂ from as few linear measurements (2) as possible may be solved using compressed sensing (CS) [8]. One can discretize the domain of the frequencies f_k by a uniform grid. When the frequencies f_k indeed fall on the grid, x̂ is sparse in the discrete Fourier transform domain, and CS theory [8,12] suggests that it is possible to reconstruct x̂ from very few samples via ℓ₁-norm minimization, provided that R ≪ 2N − 1. Nevertheless, the frequencies f_k in our setting usually do not fall exactly on a grid. The basis mismatch between the true parameters and the discretization grid degrades the performance of conventional compressed sensing [11].

    To overcome this, the authors of [4,23] proposed to recover off-the-grid complex sinusoid frequencies using total variation minimization or atomic norm [9] minimization. They proved that total variation minimization or atomic norm minimization achieves a robust reconstruction of x̂ from a non-uniform sampling of very few of its entries, provided that the frequencies f_k, k = 1, …, R, are well separated. Another method for recovering off-the-grid frequencies is enhanced matrix completion (EMaC), proposed by Chen et al. [10], in which the Hankel structure plays a central role similar to our model. The main result in [10] is that the complex sinusoids can be robustly reconstructed via EMaC from very few non-uniformly sampled entries. Again, EMaC requires a separation of the frequencies, described implicitly by an incoherence condition.

    When applied to complex sinusoids, compared to the aforementioned existing results, our result in this paper does not need any separation condition on the frequencies, while achieving better or comparable bounds on the number of measurements.

  • Accelerated NMR spectroscopy. When z_k = e^{τ_k}e^{2πιf_k}, k = 1, …, R, x̂ models the signal in NMR spectroscopy, which arises frequently in studying short-lived molecular systems, monitoring chemical reactions in real time, high-throughput applications, etc. Recently, Qu et al. [19] proposed an algorithm based on low-rank Hankel matrices. In this specific application, A is a matrix that describes the under-sampling of NMR signals in the time domain. We remark that linear non-uniform subsampling of the signal can greatly speed up NMR spectroscopy [19]. Numerical results in [19] show the efficiency of this approach, but theoretical guarantees are still needed. Theoretical results on this model are vital, since they give guidance on how many samples should be chosen to guarantee robust recovery. Though the result in [10] applies to this problem, it needs an incoherence condition, which remains uncertain for diverse chemical and biological samples. Our result does not require any incoherence condition. Moreover, our bound is better than that in [10].

The rest of this paper is organized as follows. We begin with our model and main results in Section 2. Proofs of the main result are given in Section 3. Then, in Section 4, we extend the main result to the reconstruction of generic low-rank Hankel or Toeplitz matrices. The performance of our algorithm is demonstrated by numerical experiments in Section 5. Finally, in Section 6, we conclude the paper and point out possible future work.

2. Model and main results

Our approach is based on the observation that the Hankel matrix whose first row and last column consist of the entries of x̂ has rank R. Let Ĥ be the Hankel matrix defined by (3). Eq. (1) leads to the decomposition

$$\hat{H} = \begin{bmatrix} 1 & \cdots & 1 \\ z_1 & \cdots & z_R \\ \vdots & & \vdots \\ z_1^{N-1} & \cdots & z_R^{N-1} \end{bmatrix} \begin{bmatrix} c_1 & & \\ & \ddots & \\ & & c_R \end{bmatrix} \begin{bmatrix} 1 & z_1 & \cdots & z_1^{N-1} \\ \vdots & \vdots & & \vdots \\ 1 & z_R & \cdots & z_R^{N-1} \end{bmatrix}.$$

Therefore, the rank of Ĥ is R. Similar to enhanced matrix completion (EMaC) in [10], in order to reconstruct x̂, we first reconstruct the rank-R Hankel matrix Ĥ, subject to the constraint that (2) is satisfied. Then, x̂ is obtained directly by reading off the first row and last column of Ĥ. More specifically, for any given x ∈ ℂ^{2N−1}, let H(x) ∈ ℂ^{N×N} be the Hankel matrix whose first row and last column together form x, i.e., [H(x)]_{jk} = x_{j+k}. We propose to solve

$$\min_x \operatorname{rank}(H(x)), \quad \text{subject to } Ax = b, \tag{6}$$

where rank(H(x)) denotes the rank of H(x), and A and b are from the linear measurement (2). When there is noise contained in the observation, i.e., b = Ax̂ + η, we correspondingly solve

$$\min_x \operatorname{rank}(H(x)), \quad \text{subject to } \|Ax - b\|_2 \le \delta, \tag{7}$$

where δ = ||η||2 is the noise level.

Both problems are NP-hard and not easy to solve. Following the ideas of matrix completion and low-rank matrix recovery [6,7,9,20], it is possible to exactly recover the low-rank Hankel matrix via nuclear norm minimization. Therefore, it is reasonable to use nuclear norm minimization for our problem, which leads to the models in (4) and (5).

Theoretical results are desirable to guarantee the success of this Hankel matrix completion method. The results in [6,7,9,20] do not consider the Hankel structure. For a generic N × N rank-R matrix, they require O(NR·poly(log N)) measurements for robust recovery, which is far too many, since H(x) is determined by only 2N − 1 parameters. The theorems proposed in [23] work only for the special case where the signals of interest are superpositions of complex sinusoids, which excludes, e.g., the signals in NMR spectroscopy. While the results in [10] extend to complex exponentials, the performance guarantees in [4,10,23] require incoherence conditions, which imply knowledge of the frequency separation in spectroscopy; such knowledge is not available before actually sampling diverse chemical or biological samples. This limits the applicability of these theories.

It is challenging to provide a theorem guaranteeing exact recovery for model (4) with arbitrary linear measurements A. In this paper, we provide a theoretical result ensuring exact recovery when A is a scaled random Gaussian matrix. Our result does not assume any incoherence condition on the original signal.

Theorem 1

Let A = BD ∈ ℂ^{M×(2N−1)}, where B ∈ ℂ^{M×(2N−1)} is a random matrix whose real and imaginary parts are i.i.d. Gaussian with mean 0 and variance 1, and D ∈ ℝ^{(2N−1)×(2N−1)} is a diagonal matrix whose j-th diagonal entry is √(j+1) if j ≤ N − 1 and √(2N−1−j) otherwise. Then, there exists a universal constant C₁ > 0 such that, for an arbitrary ε > 0, if

$$M \ge \big(C_1\sqrt{R}\,\ln N + \sqrt{2}\,\varepsilon\big)^2 + 1,$$

then, with probability at least $1 - 2e^{-\frac{M-1}{8}}$, we have

  1. x̃ = x̂, where x̃ is the unique solution of (4) with b = Ax̂;

  2. ‖D(x̃ − x̂)‖₂ ≤ 2δ/ε, where x̃ is the unique solution of (5) with ‖b − Ax̂‖₂ ≤ δ.

The scaling matrix D is introduced to preserve energy. In particular, we will introduce a variable y = Dx, and then the operator 𝒢 defined by 𝒢y = H(x) satisfies ‖y‖₂ = ‖𝒢y‖_F; see the details in (9). This energy-preserving property is critical in our estimates, as will be seen later.

The number of measurements required is O(R ln² N), which is reasonably small compared with the number of parameters in H(x). Furthermore, there is a parameter ε in Theorem 1. For the noise-free case (a), the best choice of ε is obviously a number very close to 0. For the noisy case (b), we can balance the error bound and the number of measurements to get an optimal ε. On the one hand, according to the result in (b), in order to make the error in the noisy case as small as possible, we would like ε to be as large as possible. On the other hand, we would like to keep the number of measurements M of the order R ln² N. Therefore, a seemingly optimal choice is ε = O(√R ln N). With this choice of ε, the number of measurements is M = O(R ln² N) and the error satisfies ‖D(x̃ − x̂)‖₂ ≤ O(δ/√M).

Compared to the results in [10,23], our theorem does not require any incoherence condition on the matrix Ĥ. In particular, our proposed approach for complex sinusoid signals does not need any separation condition on the frequencies f_k, k = 1, 2, …, R. The reason why no separation condition is needed in the noiseless case may be the Hankel matrix reconstruction method itself. Our proof of this fact also depends on the assumption of Gaussian measurements, for which we have the tool of the Gaussian width analysis framework. For generic low-rank matrix reconstruction, it is well known that an incoherence condition is necessary for successful reconstruction if partial entries of the underlying matrix are sampled [6,7]; however, incoherence is not required if Gaussian random projections are used [9,20]. We are in the same situation, except for the additional Hankel structure. Our proposed approach uses Gaussian random projections of Ĥ, while the methods in [10,23] sample partial entries of Ĥ. However, even for non-uniform time-domain samples, we empirically observe that Hankel matrix completion does not seem to require a separation condition between frequencies. We thus conjecture that Hankel matrix completion does not require a separation-between-frequencies condition to recover missing data from noiseless measurements under non-uniform time-domain sampling, though we currently do not have a proof.

2.1. Hankel matrix completion for recovering off-the-grid frequencies

Our results also apply to recovering the frequencies in a superposition of complex sinusoids, not only the superposition itself. We divide our discussion into two cases.

The first case is the noise-free case, where the observations are not contaminated by additive noises. In this case, since we can recover the full signal of the superposition of the underlying sinusoids, we can use the single-snapshot MUSIC algorithm [27] to recover the underlying frequencies precisely.

The second case is the noisy case, where the observation is contaminated by additive noise. For this case, we have obtained a bound on the recovery error for the superposition signal (Theorem 1 of our paper). We can further recover the frequencies using the single-snapshot MUSIC algorithm by choosing the R smallest local minima of the surrogate criterion function R(ω) in [27]. In [27], the authors provided a stability result for recovering frequencies using the single-snapshot MUSIC algorithm (Theorem 3 of [27]). Specifically, the error in the surrogate criterion function R(ω) is upper bounded by the Euclidean norm of the observation noise multiplied by a constant C, where C depends on the largest and smallest nonzero singular values of the involved Hankel matrix. Moreover, the recovered frequency deviates from the true frequency on the order of the noise standard deviation when the noise is small (Remark 9 of [27]). We remark that this stability result from [27] is applicable without imposing a separation condition on the frequencies.

For frequencies satisfying a certain separation condition (Equation (23) of [27]), the authors of [27] further provide stronger and more explicit bounds on the stability of recovering frequencies from noisy data (by explicitly bounding the singular values of the involved Hankel matrix).

Moitra [28] proved that stability of recovering frequencies from noisy observations depends on the separation of frequencies. In particular, [28] shows a sharp phase transition for the relationship between the cutoff time observation index m (namely 2N − 1 in this paper) and the frequency separation δ. If m > 1/δ + 1, there is a polynomial-complexity estimator converging to the true frequencies at an inverse polynomial rate in terms of the magnitude of the noise. And conversely, when m < (1 − ε)/δ, no estimator can distinguish between a particular pair of δ-separated signals if the magnitude of the noise is not exponentially small.

However, the converse results in [28] deal with worst-case frequencies and worst-case frequency coefficients. Namely, if the separation condition is not satisfied, one can always find a worst-case pair of signals x and x′ such that telling them apart requires exponentially small noise. Thus Moitra's result in [28] is not about an average-case, fixed signal x. Moreover, Moitra's results do not mean that single-snapshot MUSIC cannot tolerate small noise in recovering frequencies. By comparison, our stability result in this paper is an average-case stability result, where our spectrally sparse signal is a fixed superposition of complex exponentials, and the stability is obtained over the ensemble of random Gaussian measurements. Our results are especially useful when the observations are noiseless or have high SNR.

3. Proof of Theorem 1

In this section, we prove the main result Theorem 1. The most crucial factors are that i) one has an explicit formula for the subdifferential of the objective function, and ii) the Gaussian width under the current measurement model is computable.

3.1. Orthonormal basis of the N × N Hankel matrices subspace

In this subsection, we introduce an orthonormal basis of the subspace of N × N Hankel matrices and use it to define a projection from ℂ^{N×N} onto the subspace of all N × N Hankel matrices.

Let E_j ∈ ℂ^{N×N}, j = 0, 1, …, 2N − 2, be the Hankel matrix satisfying

$$[E_j]_{kl} = \begin{cases} 1/\sqrt{K_j}, & \text{if } k + l = j, \\ 0, & \text{otherwise}, \end{cases} \qquad k, l = 0, \ldots, N-1, \tag{8}$$

where K_j = j + 1 for j ≤ N − 1 and K_j = 2N − 1 − j for j ≥ N − 1 is the number of non-zero entries of E_j. Then, it is easy to check that {E_j}_{j=0}^{2N−2} forms an orthonormal basis of the subspace of all N × N Hankel matrices under the standard inner product in ℂ^{N×N}.

Define a linear operator

$$\mathcal{G}: x \in \mathbb{C}^{2N-1} \mapsto \mathcal{G}x = \sum_{j=0}^{2N-2} x_j E_j \in \mathbb{C}^{N\times N}. \tag{9}$$

The adjoint 𝒢* of 𝒢 is

$$\mathcal{G}^*: X \in \mathbb{C}^{N\times N} \mapsto \mathcal{G}^*X \in \mathbb{C}^{2N-1}, \qquad [\mathcal{G}^*X]_j = \langle X, E_j \rangle.$$

Obviously, 𝒢*𝒢 is the identity operator on ℂ^{2N−1}, and 𝒢𝒢* is the orthogonal projector onto the subspace of all Hankel matrices.
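The following short sketch (our own numerical check, using the notation of this subsection) implements 𝒢 and 𝒢* and verifies that 𝒢*𝒢 is the identity and that 𝒢 preserves energy, i.e., ‖y‖₂ = ‖𝒢y‖_F:

```python
import numpy as np

N = 5
n = 2 * N - 1
K = np.array([j + 1 if j <= N - 1 else n - j for j in range(n)])  # K_j in (8)

def G(y):
    """G y = sum_j y_j E_j, i.e., [G y]_{kl} = y_{k+l} / sqrt(K_{k+l})."""
    X = np.zeros((N, N), dtype=complex)
    for k in range(N):
        for l in range(N):
            X[k, l] = y[k + l] / np.sqrt(K[k + l])
    return X

def G_adj(X):
    """[G* X]_j = <X, E_j>: sum of the j-th anti-diagonal over sqrt(K_j)."""
    return np.array([sum(X[k, j - k] for k in range(max(0, j - N + 1),
                                                    min(j, N - 1) + 1))
                     / np.sqrt(K[j]) for j in range(n)])

rng = np.random.default_rng(0)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.allclose(G_adj(G(y)), y)                                 # G*G = I
assert np.isclose(np.linalg.norm(y), np.linalg.norm(G(y), 'fro'))  # isometry
```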

3.2. Recovery condition based on restricted minimum gain condition

First of all, let us simplify the minimization problem (4) by introducing D ∈ ℝ^{(2N−1)×(2N−1)}, the diagonal matrix with j-th diagonal entry √K_j. Then, letting y = Dx, (4) is rewritten as

$$\min_y \|\mathcal{G}y\|_* \quad \text{subject to } By = b, \tag{10}$$

where B = AD^{−1}. Recall that 𝒢 satisfies 𝒢*𝒢 = ℐ, which is crucial in the subsequent analysis. Similarly, for the noisy case, (5) is rearranged to

$$\min_y \|\mathcal{G}y\|_* \quad \text{subject to } \|By - b\|_2 \le \delta. \tag{11}$$

By our assumption in Theorem 1, B ∈ ℂ^{M×(2N−1)} is a random matrix whose real and imaginary parts are both real-valued random matrices with i.i.d. Gaussian entries of mean 0 and variance 1. Writing ŷ = Dx̂, we will prove that, with dominant probability, ỹ = ŷ for problem (10) in the noise-free case (respectively, ‖ỹ − ŷ‖₂ ≤ 2δ/ε for problem (11) in the noisy case), where ỹ denotes the corresponding solution.

Let the descent cone of ||𝒢 ·||* at ŷ be

$$\mathfrak{T}(\hat{y}) = \{\lambda z \mid \lambda \ge 0, \ \|\mathcal{G}(\hat{y} + z)\|_* \le \|\mathcal{G}\hat{y}\|_*\}. \tag{12}$$

To characterize the recovery condition, we need the minimum value of ‖Bz‖₂/‖z‖₂ over nonzero z ∈ 𝔗(ŷ). This quantity is commonly called the minimum gain of the measurement operator B restricted to 𝔗(ŷ) [9]. In particular, if the minimum gain is bounded away from zero, then exact recovery (respectively, approximate recovery) for problem (10) (respectively, (11)) holds.

Lemma 1

Let 𝔗(ŷ) be defined by (12). Assume

$$\min_{z \in \mathfrak{T}(\hat{y}),\, z \ne 0} \frac{\|Bz\|_2}{\|z\|_2} \ge \varepsilon. \tag{13}$$
  1. Let ỹ be the solution of (10) with b = Bŷ. Then ỹ = ŷ.

  2. Let ỹ be the solution of (11) with ‖b − Bŷ‖₂ ≤ δ. Then ‖ỹ − ŷ‖₂ ≤ 2δ/ε.

Proof

Since (a) is a special case of (b) with δ = 0, we prove (b) only. The optimality of ỹ implies ỹ − ŷ ∈ 𝔗(ŷ). By (13), we have

$$\|\tilde{y} - \hat{y}\|_2 \le \frac{1}{\varepsilon}\|B(\tilde{y} - \hat{y})\|_2 \le \frac{1}{\varepsilon}\big(\|B\tilde{y} - b\|_2 + \|B\hat{y} - b\|_2\big) \le 2\delta/\varepsilon.$$

The minimum gain condition is a powerful concept and has been employed in recent recovery results for ℓ₁-norm minimization, block-sparse vector recovery, low-rank matrix reconstruction, and other atomic norm problems [9].

3.3. Bound of minimum gain via Gaussian width

Lemma 1 requires a lower bound on min_{z∈𝔗(ŷ)} ‖Bz‖₂/‖z‖₂. Gordon gave a way to obtain such a lower bound on the minimum gain using the Gaussian width of a set [9,14].

Definition 1

The Gaussian width of a set S ⊂ ℝp is defined as:

$$w(S) := \mathbb{E}_\xi\Big[\sup_{\gamma \in S} \gamma^T\xi\Big],$$

where ξ ∈ ℝ^p is a random vector with independent zero-mean, unit-variance Gaussian entries.

Let λ_n denote the expected length of an n-dimensional Gaussian random vector. Then λ_n = √2 Γ((n+1)/2)/Γ(n/2), and it can be tightly bounded as n/√(n+1) ≤ λ_n ≤ √n [9]. The following theorem, given as Corollary 1.2 in [14], provides a bound on the minimum gain of a random map Π: ℝ^p ↦ ℝ^n.
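As a quick numerical sanity check (ours, not from [9] or [14]), the identity for λ_n and the two-sided bound can be verified directly:

```python
import numpy as np
from scipy.special import gammaln

def lam(n):
    # lambda_n = sqrt(2) Gamma((n+1)/2) / Gamma(n/2), via log-gamma for stability
    return np.sqrt(2) * np.exp(gammaln((n + 1) / 2) - gammaln(n / 2))

for n in [1, 10, 100, 10000]:
    assert n / np.sqrt(n + 1) <= lam(n) <= np.sqrt(n)  # n/sqrt(n+1) <= lam_n <= sqrt(n)
```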

Theorem 2. (See Corollary 1.2 in [14].)

Let Ω be a closed subset of {x ∈ ℝ^p | ‖x‖₂ = 1}. Let Π ∈ ℝ^{n×p} be a random matrix with i.i.d. Gaussian entries with mean 0 and variance 1. Then, for any ε > 0,

$$P\Big(\min_{z \in \Omega}\|\Pi z\|_2 \ge \varepsilon\Big) \ge 1 - e^{-\frac{1}{2}(\lambda_n - w(\Omega) - \varepsilon)^2},$$

provided λ_n − w(Ω) − ε ≥ 0. Here n/√(n+1) ≤ λ_n ≤ √n, and w(Ω) is the Gaussian width of Ω.

By converting the complex setting of our problem to the real setting and using Theorem 2, we can bound (13) in terms of the Gaussian width of 𝔗_ℝ(ŷ) ∩ S^{4N−3}, where 𝔗_ℝ(ŷ) is the cone in ℝ^{4N−2} defined by

$$\mathfrak{T}_{\mathbb{R}}(\hat{y}) = \left\{\begin{bmatrix}\alpha\\ \beta\end{bmatrix} \ \middle|\ \alpha + \iota\beta \in \mathfrak{T}(\hat{y})\right\}. \tag{14}$$

Lemma 2

Let the real and imaginary parts of the entries of B ∈ ℂ^{M×(2N−1)} be i.i.d. Gaussian with mean 0 and variance 1. Let 𝔗_ℝ(ŷ) be defined by (14) and let S_c^{2N−2} be the unit sphere in ℂ^{2N−1}. Then for any ε > 0,

$$P\Big(\min_{z \in \mathfrak{T}(\hat{y}) \cap S_c^{2N-2}} \|Bz\|_2 \ge \varepsilon\Big) \ge 1 - 2e^{-\frac{1}{2}\big(\lambda_M - w(\mathfrak{T}_{\mathbb{R}}(\hat{y}) \cap S^{4N-3}) - \frac{\varepsilon}{\sqrt{2}}\big)^2},$$

where S^{4N−3} is the unit sphere in ℝ^{4N−2}.

Proof

In order to use Theorem 2, we convert the complex setting of our problem to the real setting of Theorem 2. We will use Roman letters for vectors and matrices in complex-valued spaces, and Greek letters for real-valued ones. Let B = Φ + ιΨ ∈ ℂ^{M×(2N−1)}, where both Φ ∈ ℝ^{M×(2N−1)} and Ψ ∈ ℝ^{M×(2N−1)} are real-valued random matrices whose entries are i.i.d. mean-0, variance-1 Gaussian. Then, for any z = α + ιβ ∈ ℂ^{2N−1} with α, β ∈ ℝ^{2N−1},

$$\|Bz\|_2 = \|(\Phi + \iota\Psi)(\alpha + \iota\beta)\|_2 = \|(\Phi\alpha - \Psi\beta) + \iota(\Psi\alpha + \Phi\beta)\|_2 = \left(\left\|\begin{bmatrix}\Phi & -\Psi\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\right\|_2^2 + \left\|\begin{bmatrix}\Psi & \Phi\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\right\|_2^2\right)^{1/2}.$$

Then

$$\min_{z = \alpha + \iota\beta \in \mathfrak{T}(\hat{y}) \cap S_c^{2N-2}} \left\|\begin{bmatrix}\Phi & -\Psi\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\right\|_2 \ge \frac{\varepsilon}{\sqrt{2}}, \quad \text{and} \quad \min_{z = \alpha + \iota\beta \in \mathfrak{T}(\hat{y}) \cap S_c^{2N-2}} \left\|\begin{bmatrix}\Psi & \Phi\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\right\|_2 \ge \frac{\varepsilon}{\sqrt{2}} \tag{15}$$

implies

$$\min_{z \in \mathfrak{T}(\hat{y}) \cap S_c^{2N-2}} \|Bz\|_2 \ge \varepsilon.$$

Therefore,

$$P\Big(\min_{z \in \mathfrak{T}(\hat{y}) \cap S_c^{2N-2}} \|Bz\|_2 \ge \varepsilon\Big) \ge P\big((15)\ \text{holds true}\big).$$

It is easy to see that both [Φ −Ψ] and [Ψ Φ] are real-valued random matrices with i.i.d. Gaussian entries of mean 0 and variance 1. By Theorem 2,

$$P\big((15)\ \text{holds true}\big) \ge 1 - 2e^{-\frac{1}{2}\big(\lambda_M - w(\mathfrak{T}_{\mathbb{R}}(\hat{y}) \cap S^{4N-3}) - \frac{\varepsilon}{\sqrt{2}}\big)^2},$$

and therefore we get the desired result.

3.4. Estimation of the Gaussian width w(𝔗_ℝ(ŷ) ∩ S^{4N−3})

Let 𝔗_ℝ(ŷ)° denote the polar cone of 𝔗_ℝ(ŷ) ⊂ ℝ^{4N−2}, i.e.,

$$\mathfrak{T}_{\mathbb{R}}(\hat{y})^\circ = \{\delta \in \mathbb{R}^{4N-2} \mid \gamma^T\delta \le 0 \ \text{for all } \gamma \in \mathfrak{T}_{\mathbb{R}}(\hat{y})\}. \tag{16}$$

Following the arguments in Proposition 3.6 in [9], we obtain

$$w(\mathfrak{T}_{\mathbb{R}}(\hat{y}) \cap S^{4N-3}) = \mathbb{E}\Big(\sup_{\gamma \in \mathfrak{T}_{\mathbb{R}}(\hat{y}) \cap S^{4N-3}} \xi^T\gamma\Big) \le \mathbb{E}\Big(\min_{\gamma \in \mathfrak{T}_{\mathbb{R}}(\hat{y})^\circ} \|\xi - \gamma\|_2\Big), \tag{17}$$

where ξ ∈ ℝ^{4N−2} is a random vector with i.i.d. Gaussian entries of mean 0 and variance 1. Hence, instead of estimating the Gaussian width w(𝔗_ℝ(ŷ) ∩ S^{4N−3}) directly, we bound E(min_{γ∈𝔗_ℝ(ŷ)°} ‖ξ − γ‖₂). For this purpose, let ℱ : ℝ^{4N−2} ↦ ℝ be defined by

$$\mathcal{F}\Big(\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\Big) = \|\mathcal{G}(\alpha + \iota\beta)\|_*. \tag{18}$$

The following lemma characterizes E(min_{γ∈𝔗_ℝ(ŷ)°} ‖ξ − γ‖₂) in terms of the subdifferential ∂ℱ of ℱ.

Lemma 3

Let 𝔗_ℝ(ŷ)° and ℱ be defined by (16) and (18) respectively. Let ω̂₁, ω̂₂ ∈ ℝ^{2N−1} be the real and imaginary parts of ŷ respectively, and denote ω̂ = [ω̂₁; ω̂₂] ∈ ℝ^{4N−2}. Then

$$\mathfrak{T}_{\mathbb{R}}(\hat{y})^\circ = \operatorname{cone}(\partial\mathcal{F}(\hat{\omega})) = \{\lambda\delta \mid \lambda \ge 0, \ \mathcal{F}(\gamma + \hat{\omega}) \ge \mathcal{F}(\hat{\omega}) + \gamma^T\delta \ \text{for all } \gamma \in \mathbb{R}^{4N-2}\}. \tag{19}$$
Proof

It is observed that 𝔗_ℝ(ŷ) in (14) is the descent cone of the function ℱ at ω̂:

$$\mathfrak{T}_{\mathbb{R}}(\hat{y}) = \{\lambda\delta \mid \lambda \ge 0, \ \mathcal{F}(\delta + \hat{\omega}) \le \mathcal{F}(\hat{\omega})\}.$$

According to Theorem 23.4 in [21], the cone polar to the descent cone is the conic hull of the subdifferential, which is exactly (19).

The following lemma estimates the Gaussian width w(𝔗_ℝ(ŷ) ∩ S^{4N−3}) in terms of E(‖𝒢g‖₂).

Lemma 4

Let 𝔗_ℝ(ŷ) and 𝒢 be defined by (14) and (9) respectively. Then

$$w(\mathfrak{T}_{\mathbb{R}}(\hat{y}) \cap S^{4N-3}) \le 3\sqrt{R}\cdot \mathbb{E}(\|\mathcal{G}g\|_2),$$

where the expectation is taken with respect to g ∈ ℂ^{2N−1}, a random vector whose real and imaginary parts have i.i.d. mean-0, variance-1 Gaussian entries.

Proof

By (17) and Lemma 3, we need to find ∂ℱ(ω̂) and thus 𝔗_ℝ(ŷ)°. Let Ω̂₁ = 𝒢ω̂₁ and Ω̂₂ = 𝒢ω̂₂. Then 𝒢ŷ = Ω̂₁ + ιΩ̂₂. Let a singular value decomposition of the rank-R matrix 𝒢ŷ be

$$\mathcal{G}\hat{y} = U\Sigma V^*, \quad \text{with } U = \Theta_1 + \iota\Theta_2, \ V = \Xi_1 + \iota\Xi_2, \tag{20}$$

where Θ₁, Θ₂, Ξ₁, Ξ₂ ∈ ℝ^{N×R} and Σ ∈ ℝ^{R×R}, and U ∈ ℂ^{N×R} and V ∈ ℂ^{N×R} satisfy U*U = V*V = I. Then, by direct calculation,

$$\Theta \triangleq \begin{bmatrix}\Theta_1 & -\Theta_2\\ \Theta_2 & \Theta_1\end{bmatrix} \in \mathbb{R}^{2N\times 2R}, \qquad \Xi \triangleq \begin{bmatrix}\Xi_1 & -\Xi_2\\ \Xi_2 & \Xi_1\end{bmatrix} \in \mathbb{R}^{2N\times 2R} \tag{21}$$

satisfy Θ^TΘ = Ξ^TΞ = I. Moreover, if we define $\hat{\Omega} = \begin{bmatrix}\hat{\Omega}_1 & -\hat{\Omega}_2\\ \hat{\Omega}_2 & \hat{\Omega}_1\end{bmatrix}$, then

$$\hat{\Omega} = \Theta\begin{bmatrix}\Sigma & \\ & \Sigma\end{bmatrix}\Xi^T \tag{22}$$

is a singular value decomposition of the real matrix Ω̂, and the singular values of Ω̂ are those of 𝒢ŷ, each repeated twice. Therefore,

$$\mathcal{F}(\hat{\omega}) = \|\mathcal{G}\hat{y}\|_* = \operatorname{tr}(\Sigma) = \frac{1}{2}\|\hat{\Omega}\|_*. \tag{23}$$
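The doubling of the singular values under this real embedding is easy to check numerically; the following sketch (our own verification, not part of the proof) confirms that the nuclear norm of the embedded real matrix is exactly twice that of the complex matrix, as used in (23):

```python
import numpy as np

rng = np.random.default_rng(4)
Z = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
# Real embedding [[Re Z, -Im Z], [Im Z, Re Z]] of the complex matrix Z.
E = np.block([[Z.real, -Z.imag], [Z.imag, Z.real]])

s_Z = np.linalg.svd(Z, compute_uv=False)
s_E = np.linalg.svd(E, compute_uv=False)
assert np.allclose(np.sort(s_E), np.sort(np.concatenate([s_Z, s_Z])))  # doubled
assert np.isclose(s_E.sum(), 2 * s_Z.sum())  # nuclear norm doubles
```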

Define a linear operator ℰ : ℝ^{4N−2} ↦ ℝ^{2N×2N} by

$$\mathcal{E}\Big(\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\Big) = \begin{bmatrix}\mathcal{G}\alpha & -\mathcal{G}\beta\\ \mathcal{G}\beta & \mathcal{G}\alpha\end{bmatrix}, \quad \text{with } \alpha, \beta \in \mathbb{R}^{2N-1}.$$

By (23) and the definition of Ω̂, we obtain ℱ(ω̂) = ½‖ℰω̂‖_*. From convex analysis and Ω̂ = ℰω̂, the subdifferential of ℱ is given by

$$\partial\mathcal{F}(\hat{\omega}) = \frac{1}{2}\mathcal{E}^*\,\partial\|\hat{\Omega}\|_*. \tag{24}$$

On the one hand, the adjoint ℰ* is given by, for any $\Delta = \begin{bmatrix}\Delta_{11} & \Delta_{12}\\ \Delta_{21} & \Delta_{22}\end{bmatrix} \in \mathbb{R}^{2N\times 2N}$ with each block in ℝ^{N×N},

$$\mathcal{E}^*\Delta = \begin{bmatrix}\mathcal{G}^*(\Delta_{11} + \Delta_{22})\\ \mathcal{G}^*(\Delta_{21} - \Delta_{12})\end{bmatrix}. \tag{25}$$

On the other hand, since (22) provides a singular value decomposition of Ω̂,

$$\partial\|\hat{\Omega}\|_* = \{\Theta\Xi^T + \Delta \mid \Theta^T\Delta = 0, \ \Delta\Xi = 0, \ \|\Delta\|_2 \le 1\}. \tag{26}$$

Combining (24), (25), (26) and (21) yields the subdifferential of ℱ at ω̂

$$\partial\mathcal{F}(\hat{\omega}) = \left\{\begin{bmatrix}\mathcal{G}^*\big(\Theta_1\Xi_1^T + \Theta_2\Xi_2^T + \frac{\Delta_{11} + \Delta_{22}}{2}\big)\\[2pt] \mathcal{G}^*\big(\Theta_2\Xi_1^T - \Theta_1\Xi_2^T + \frac{\Delta_{21} - \Delta_{12}}{2}\big)\end{bmatrix} \ \middle|\ \Delta = \begin{bmatrix}\Delta_{11} & \Delta_{12}\\ \Delta_{21} & \Delta_{22}\end{bmatrix}, \ \Theta^T\Delta = 0, \ \Delta\Xi = 0, \ \|\Delta\|_2 \le 1\right\}.$$

We are now ready for the estimation of the Gaussian width. Let 𝔖 be the set of complex-valued vectors

$$\mathfrak{S} = \{\mathcal{G}^*(UV^* + W) \mid U^*W = 0, \ WV = 0, \ \|W\|_2 \le 1\}, \tag{27}$$

where U, V are in (20). Then, it can be checked that

$$\mathfrak{H} \triangleq \left\{\begin{bmatrix}\alpha\\ \beta\end{bmatrix} \ \middle|\ \alpha + \iota\beta \in \mathfrak{S}\right\} \subseteq \partial\mathcal{F}(\hat{\omega}). \tag{28}$$

Indeed, for any W = Δ₁ + ιΔ₂ satisfying U*W = 0, WV = 0 and ‖W‖₂ ≤ 1, we choose $\Delta = \begin{bmatrix}\Delta_1 & -\Delta_2\\ \Delta_2 & \Delta_1\end{bmatrix}$. Obviously, this choice of Δ satisfies the constraints on Δ in ∂ℱ(ω̂). Furthermore, UV* + W = (Θ₁Ξ₁^T + Θ₂Ξ₂^T + Δ₁) + ι(Θ₂Ξ₁^T − Θ₁Ξ₂^T + Δ₂). Therefore, (28) holds.

With the help of (28), we get

$$\min_{\gamma \in \mathfrak{T}_{\mathbb{R}}(\hat{y})^\circ} \|\xi - \gamma\|_2 = \min_{\lambda \ge 0}\min_{\gamma \in \partial\mathcal{F}(\hat{\omega})} \|\xi - \lambda\gamma\|_2 \le \min_{\lambda \ge 0}\min_{\gamma \in \mathfrak{H}} \|\xi - \lambda\gamma\|_2. \tag{29}$$

We then convert the real-valued vectors to complex-valued vectors by letting g = ξ₁ + ιξ₂ and c = γ₁ + ιγ₂, where ξ₁ and ξ₂ are the first and second halves of ξ respectively, and similarly for γ₁ and γ₂. This leads to

$$\min_{\gamma \in \mathfrak{T}_{\mathbb{R}}(\hat{y})^\circ} \|\xi - \gamma\|_2 \le \min_{\lambda \ge 0}\min_{\gamma \in \mathfrak{H}} \|\xi - \lambda\gamma\|_2 = \min_{\lambda \ge 0}\min_{c \in \mathfrak{S}} \|g - \lambda c\|_2.$$

Since 𝒢*𝒢 is the identity operator and 𝒢𝒢* is an orthogonal projector, for any λ ≥ 0 and c ∈ 𝔖,

$$\|g - \lambda c\|_2 = \|\mathcal{G}g - \lambda\mathcal{G}c\|_F = \|\mathcal{G}g - \lambda\mathcal{G}\mathcal{G}^*(UV^* + W)\|_F = \big(\|\mathcal{G}g - \lambda(UV^* + W)\|_F^2 - \lambda^2\|(\mathcal{I} - \mathcal{G}\mathcal{G}^*)(UV^* + W)\|_F^2\big)^{1/2} \le \|\mathcal{G}g - \lambda(UV^* + W)\|_F, \tag{30}$$

where W satisfies the conditions in the definition of 𝔖 in (27). Define two orthogonal projectors ℘₁ and ℘₂ on ℂ^{N×N} by

$$\wp_1 X = UU^*X + XVV^* - UU^*XVV^*, \qquad \wp_2 X = (I - UU^*)X(I - VV^*).$$

Then, it can be easily checked that: ℘1X and ℘2X are orthogonal, X = ℘1X + ℘2X, and

$$\wp_1(UV^*) = UV^*, \quad \wp_2(UV^*) = 0, \quad \wp_1 W = 0, \quad \wp_2 W = W, \tag{31}$$

where U, V, W are the same as those in (27). We choose

$$\lambda = \|\wp_2(\mathcal{G}g)\|_2, \qquad W = \frac{1}{\lambda}\wp_2(\mathcal{G}g).$$

Then W satisfies the constraints in (27). This, together with (29), (30), and (31), implies

$$\min_{\gamma \in \mathfrak{T}_{\mathbb{R}}(\hat{y})^\circ} \|\xi - \gamma\|_2 \le \big\|\mathcal{G}g - \|\wp_2(\mathcal{G}g)\|_2\, UV^* - \wp_2(\mathcal{G}g)\big\|_F = \big\|\wp_1(\mathcal{G}g) - \|\wp_2(\mathcal{G}g)\|_2\, UV^*\big\|_F \le \|\wp_1(\mathcal{G}g)\|_F + \|\wp_2(\mathcal{G}g)\|_2\|UV^*\|_F = \|\wp_1(\mathcal{G}g)\|_F + \sqrt{R}\,\|\wp_2(\mathcal{G}g)\|_2.$$

We will estimate both ||℘1(𝒢g)||F and ||℘2(𝒢g)||2. For ||℘1(𝒢g)||F, we have

$$\|\wp_1(\mathcal{G}g)\|_F = \|UU^*(\mathcal{G}g) + (\mathcal{G}g)VV^* - UU^*(\mathcal{G}g)VV^*\|_F = \|UU^*(\mathcal{G}g) + (I - UU^*)(\mathcal{G}g)VV^*\|_F \le \|UU^*(\mathcal{G}g)\|_F + \|(I - UU^*)(\mathcal{G}g)VV^*\|_F \le \|UU^*(\mathcal{G}g)\|_F + \|(\mathcal{G}g)VV^*\|_F \le 2\sqrt{R}\,\|\mathcal{G}g\|_2,$$

where in the last line we have used the inequality

$$\|UU^*(\mathcal{G}g)\|_F \le \|UU^*\|_F\|\mathcal{G}g\|_2 \le \sqrt{R}\,\|\mathcal{G}g\|_2$$

and similarly ‖(𝒢g)VV*‖_F ≤ √R ‖𝒢g‖₂. For ‖℘₂(𝒢g)‖₂,

$$\|\wp_2(\mathcal{G}g)\|_2 = \|(I - UU^*)(\mathcal{G}g)(I - VV^*)\|_2 \le \|I - UU^*\|_2\,\|\mathcal{G}g\|_2\,\|I - VV^*\|_2 \le \|\mathcal{G}g\|_2.$$

Altogether, we obtain

$$\min_{\gamma \in \mathfrak{T}_{\mathbb{R}}(\hat{y})^\circ} \|\xi - \gamma\|_2 \le 3\sqrt{R}\,\|\mathcal{G}g\|_2,$$

which together with (17) gives

$$w(\mathfrak{T}_{\mathbb{R}}(\hat{y}) \cap S^{4N-3}) \le 3\sqrt{R}\cdot \mathbb{E}(\|\mathcal{G}g\|_2).$$

3.5. Bound of E(||𝒢g||2)

The estimation of E(‖𝒢g‖₂) plays an important role in proving Theorem 1, since it is needed to obtain a tight bound on the Gaussian width w(𝔗_ℝ(ŷ) ∩ S^{4N−3}). The following theorem gives a bound on E(‖𝒢g‖₂).

Theorem 3

Let g ∈ ℝ^{2N−1} be a random vector whose entries are i.i.d. Gaussian random variables with mean 0 and variance 1, or let g ∈ ℂ^{2N−1} be a random vector whose real and imaginary parts have i.i.d. Gaussian entries with mean 0 and variance 1. Then,

$$\mathbb{E}(\|\mathcal{G}g\|_2) \le C_1 \ln N,$$

where C₁ is a positive universal constant.

We will use the moment method (see Chapter 2.3 in [24] for more details) to prove Theorem 3. To help the reader follow the proof, we begin with the real case and first introduce some ideas and lemmas. Assume g ∈ ℝ^{2N−1} has i.i.d. standard Gaussian entries with mean 0 and variance 1. Notice that 𝒢g is symmetric. Therefore, for any even integer k, (tr((𝒢g)^k))^{1/k} is the ℓ_k-norm of the vector of singular values, which implies ‖𝒢g‖₂ ≤ (tr((𝒢g)^k))^{1/k}. This, together with Jensen's inequality, gives

$$\mathbb{E}(\|\mathcal{G}g\|_2) \le \mathbb{E}\big((\operatorname{tr}((\mathcal{G}g)^k))^{1/k}\big) \le \big(\mathbb{E}(\operatorname{tr}((\mathcal{G}g)^k))\big)^{1/k}. \tag{32}$$

Thus, in order to get an upper bound on E(‖𝒢g‖₂), we estimate E(tr((𝒢g)^k)). Denote M = 𝒢g. It is easy to see that

$$\mathbb{E}(\operatorname{tr}(M^k)) = \sum_{0 \le i_1, i_2, \ldots, i_k \le N-1} \mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_{k-1} i_k} M_{i_k i_1}). \tag{33}$$

Therefore, we only need to estimate the sum on the right-hand side of (33).

To simplify the notation, we denote i_{k+1} = i₁. Notice that M_{ij} = g_{i+j}/√K_{i+j}, where g_{i+j} is a Gaussian random variable and K_j is defined in (8). Hence, M_{i_ℓ i_{ℓ+1}} = M_{i_{ℓ'} i_{ℓ'+1}} if and only if i_ℓ + i_{ℓ+1} = i_{ℓ'} + i_{ℓ'+1}. In order to utilize this property, we introduce, for any given index sequence i₁, i₂, …, i_k, a graph and a notion of equivalent edges on it. More specifically, we construct the graph 𝔉_{i₁,i₂,…,i_k} with nodes i₁, i₂, …, i_k and edges (i₁, i₂), (i₂, i₃), …, (i_{k−1}, i_k), (i_k, i₁). Let the weight of the edge (i_ℓ, i_{ℓ+1}) be i_ℓ + i_{ℓ+1}. Edges with the same weight form an equivalence class. Obviously, M_{i_ℓ i_{ℓ+1}} = M_{i_{ℓ'} i_{ℓ'+1}} if and only if (i_ℓ, i_{ℓ+1}) and (i_{ℓ'}, i_{ℓ'+1}) are in the same equivalence class. Assume there are p equivalence classes of the edges of 𝔉_{i₁,i₂,…,i_k}, indexed by 1, 2, …, p according to their order of first appearance in the traversal i₁ → i₂ → … → i_k → i₁. We associate with the graph 𝔉_{i₁,i₂,…,i_k} a sequence c₁c₂…c_k, where c_j is the index of the equivalence class of the edge (i_j, i_{j+1}). We call c₁c₂…c_k the label of the equivalence classes of the graph 𝔉_{i₁,i₂,…,i_k}.

The label of the equivalence classes of the graph 𝔉_{i₁,i₂,…,i_k} plays an important role in bounding E(‖𝒢g‖₂). To help the reader understand this concept, we give two specific examples. For N = 6, k = 6, i₁ = 1, i₂ = 4, i₃ = 1, i₄ = 3, i₅ = 1, i₆ = 4, the label of the equivalence classes of the corresponding graph is 112211. For N = 6, k = 6, i₁ = 2, i₂ = 3, i₃ = 2, i₄ = 4, i₅ = 2, i₆ = 3, the label of the equivalence classes of the corresponding graph is 112211 as well. Therefore, several different index sequences i₁i₂…i_k may correspond to the same label. Let 𝔄_{c₁c₂…c_k} be the set of index sequences whose corresponding graph has label c₁c₂…c_k, i.e.

$$\mathfrak{A}_{c_1 c_2 \ldots c_k} = \{i_1 i_2 \ldots i_k \mid \text{the label of the equivalence classes of the graph } \mathfrak{F}_{i_1, i_2, \ldots, i_k} \text{ is } c_1 c_2 \ldots c_k\}. \tag{34}$$
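The labeling can be computed mechanically. The following sketch (a hypothetical helper of our own, not from the paper) reproduces the two examples above:

```python
def label(indices):
    """Label c_1 c_2 ... c_k of the equivalence classes of F_{i_1,...,i_k}:
    edges (i_j, i_{j+1}) (cyclic) are grouped by weight i_j + i_{j+1},
    with classes numbered in order of first appearance."""
    k = len(indices)
    weights = [indices[j] + indices[(j + 1) % k] for j in range(k)]
    classes = {}
    for w in weights:
        classes.setdefault(w, len(classes) + 1)
    return ''.join(str(classes[w]) for w in weights)

print(label([1, 4, 1, 3, 1, 4]))  # 112211 (first example above)
print(label([2, 3, 2, 4, 2, 3]))  # 112211 (second example above)
```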

For a given c₁c₂…c_k, 𝔄_{c₁c₂…c_k} is a subset of {i₁i₂…i_k | i_j ∈ {0, 1, …, N − 1}, j = 1, …, k}. The following lemma estimates the sum of E(M_{i₁i₂}M_{i₂i₃}⋯M_{i_{k−1}i_k}M_{i_ki₁}) over i₁i₂…i_k ∈ 𝔄_{c₁c₂…c_k}.

Lemma 5

Let ζ be the Riemann zeta function and let 𝔄_{c₁c₂…c_k} be defined by (34). Define B(s) = ln(N + 1) if s = 2 and B(s) = ζ(s/2) ≤ π²/6 for s ≥ 4. Then

$$\sum_{i_1 i_2 \ldots i_k \in \mathfrak{A}_{c_1 c_2 \ldots c_k}} \mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_{k-1} i_k} M_{i_k i_1}) \le N\prod_{\ell=1}^{p} B(s_\ell)(s_\ell - 1)!! \tag{35}$$

where p is the number of equivalence classes appearing in c₁c₂…c_k, and s_ℓ, ℓ = 1, …, p, is the number of occurrences of ℓ in c₁c₂…c_k.

Proof

We begin by finding the free indices for any i₁, i₂, …, i_k in the set 𝔄_{c₁c₂…c_k}. Let (j₁, j₂) be the first edge of class 1; the weight of the first class is then j₁ + j₂. For convenience, we define k₁(j₁) = j₁. The first edge of class 2 must have one vertex k₂(j₁, j₂), depending on j₁ and j₂, and one free vertex, denoted by j₃; the weight of the second class is k₂(j₁, j₂) + j₃. Similarly, the first edge in class 3 has a vertex k₃(j₁, j₂, j₃) and a free vertex j₄, with weight k₃(j₁, j₂, j₃) + j₄, and so on. Finally, the first edge in class p has a vertex k_p(j₁, j₂, …, j_p) and a free vertex j_{p+1}, with weight k_p(j₁, j₂, …, j_p) + j_{p+1}. Recall that the entry M_{ij} is g_{i+j}/√K_{i+j}, where g_{i+j} is a Gaussian random variable. Therefore, for any i₁i₂…i_k ∈ 𝔄_{c₁c₂…c_k},

$$\mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_{k-1} i_k} M_{i_k i_1}) = \prod_{\ell=1}^{p} \frac{1}{K_{m_\ell}^{s_\ell/2}}\,\mathbb{E}(g_{m_\ell}^{s_\ell}), \tag{36}$$

where m_ℓ = k_ℓ(j₁, j₂, …, j_ℓ) + j_{ℓ+1}. Therefore, the expectation is non-vanishing if and only if s₁, s₂, …, s_p are all even. In this case,

$$\mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_{k-1} i_k} M_{i_k i_1}) = \prod_{\ell=1}^{p} \frac{(s_\ell - 1)!!}{K_{m_\ell}^{s_\ell/2}}. \tag{37}$$

Summing (37) over 𝔄_{c₁c₂…c_k}, we obtain

$$\sum_{i_1 i_2 \ldots i_k \in \mathfrak{A}_{c_1 c_2 \ldots c_k}} \mathbb{E}(M_{i_1 i_2} \cdots M_{i_k i_1}) \le \sum_{j_1=0}^{N-1}\sum_{j_2=0}^{N-1}\cdots\sum_{j_{p+1}=0}^{N-1}\prod_{\ell=1}^{p}\frac{(s_\ell-1)!!}{K_{m_\ell}^{s_\ell/2}} = \sum_{j_1=0}^{N-1}\sum_{j_2=0}^{N-1}\Bigg(\frac{(s_1-1)!!}{K_{k_1(j_1)+j_2}^{s_1/2}}\sum_{j_3=0}^{N-1}\Bigg(\frac{(s_2-1)!!}{K_{k_2(j_1,j_2)+j_3}^{s_2/2}}\cdots\sum_{j_{p+1}=0}^{N-1}\frac{(s_p-1)!!}{K_{k_p(j_1,\ldots,j_p)+j_{p+1}}^{s_p/2}}\Bigg)\Bigg).$$

Since, for any 0 ≤ c ≤ N − 1,

$$\sum_{\ell=0}^{N-1}\frac{1}{K_{c+\ell}^{s/2}} \le \begin{cases} 1 + 1/2 + 1/3 + \cdots + 1/N \le \ln(N+1), & s = 2, \\ 1 + 1/2^{s/2} + \cdots + 1/N^{s/2} \le \zeta(s/2), & s = 4, 6, \ldots, \end{cases}$$

where ζ is the Riemann zeta function. By defining B(s) = ln(N + 1) if s = 2 and B(s) = ζ(s/2) ≤ π²/6 for s ≥ 4, the desired result easily follows.

The desired bound on E(‖𝒢g‖₂) can be obtained once we know how many different sets 𝔄_{c₁c₂…c_k} there are within {i₁i₂…i_k | i_j ∈ {0, 1, …, N − 1}, j = 1, …, k}. Let 𝔅_{s₁s₂…s_p} be the set of all labels with p equivalence classes in which the ℓ-th class contains s_ℓ edges, i.e.

$$\mathfrak{B}_{s_1 s_2 \ldots s_p} = \{c_1 c_2 \ldots c_k \mid c_1 c_2 \ldots c_k \text{ is a valid label of equivalence classes of a graph } \mathfrak{F}_{i_1, i_2, \ldots, i_k}, \text{ and there are } s_\ell \text{ occurrences of } \ell \text{ in } c_1 c_2 \ldots c_k\}. \tag{38}$$

Let ℭ_p be the set of all possible choices of p positive even numbers s₁, …, s_p satisfying s₁ + s₂ + … + s_p = k. Then

$$\mathbb{E}(\operatorname{tr}(M^k)) = \sum_{0 \le i_1, i_2, \ldots, i_k \le N-1} \mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_{k-1} i_k} M_{i_k i_1}) \le \sum_{p=1}^{k/2}\sum_{s_1 \ldots s_p \in \mathfrak{C}_p}\sum_{c_1 c_2 \ldots c_k \in \mathfrak{B}_{s_1 s_2 \ldots s_p}}\sum_{i_1 i_2 \ldots i_k \in \mathfrak{A}_{c_1 c_2 \ldots c_k}} \mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_{k-1} i_k} M_{i_k i_1}). \tag{39}$$

By bounding the cardinalities of 𝔅_{s₁s₂…s_p} and ℭ_p, we can derive a bound on E(tr(M^k)), and hence on E(‖𝒢g‖₂), for the real case. The complex case then follows directly from the results for the real case. We are now in a position to prove Theorem 3.

Proof of Theorem 3

Following (39), we need to count the cardinality of 𝔅_{s₁s₂…s_p}. For any c₁c₂…c_k ∈ 𝔅_{s₁s₂…s_p}, we must have c₁ = 1. Therefore, there are $\binom{k-1}{s_1-1}$ choices for the positions of the remaining 1's in c₁c₂…c_k. Once the positions of the 1's are fixed, the position of the first 2 has to be the first available slot, so there are $\binom{k-s_1-1}{s_2-1}$ choices for the positions of the remaining 2's, and so on. Thus,

$$|\mathfrak{B}_{s_1 s_2 \ldots s_p}| \le \binom{k-1}{s_1-1}\binom{k-s_1-1}{s_2-1}\cdots\binom{k-s_1-\cdots-s_{p-1}-1}{s_p-1} = \frac{(k-1)(k-2)\cdots(k-s_1+1)}{(s_1-1)!}\cdot\frac{(k-s_1-1)\cdots(k-s_1-s_2+1)}{(s_2-1)!}\cdots 1 = \frac{(k-1)!}{\prod_{\ell=1}^{p}(s_\ell-1)!\,\prod_{\ell=1}^{p-1}(k-s_1-\cdots-s_\ell)},$$

which together with (35) implies, for any s₁s₂…s_p ∈ ℭ_p,

$$\sum_{c_1 c_2 \ldots c_k \in \mathfrak{B}_{s_1 s_2 \ldots s_p}}\sum_{i_1 i_2 \ldots i_k \in \mathfrak{A}_{c_1 c_2 \ldots c_k}} \mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_k i_1}) \le \frac{N(k-1)!\,\prod_{\ell=1}^{p} B(s_\ell)}{\prod_{\ell=1}^{p}(s_\ell-2)!!\,\prod_{\ell=1}^{p-1}(k-s_1-\cdots-s_\ell)}. \tag{40}$$

Summing (40) over ℭp yields

$$\sum_{s_1 \ldots s_p \in \mathfrak{C}_p}\sum_{c_1 c_2 \ldots c_k \in \mathfrak{B}_{s_1 s_2 \ldots s_p}}\sum_{i_1 i_2 \ldots i_k \in \mathfrak{A}_{c_1 c_2 \ldots c_k}} \mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_k i_1}) \le N(k-1)!\sum_{s_1 \ldots s_p \in \mathfrak{C}_p}\frac{\prod_{\ell=1}^{p} B(s_\ell)}{\prod_{\ell=1}^{p}(s_\ell-2)!!\,\prod_{\ell=1}^{p-1}(k-s_1-\cdots-s_\ell)}. \tag{41}$$

Let us estimate the sum in the last line. Let s be the number of 2's in s₁s₂…s_p. Then,

$$\prod_{\ell=1}^{p} B(s_\ell) \le \ln^s(N+1)\Big(\frac{\pi^2}{6}\Big)^{p-s}. \tag{42}$$

Since each s₁, …, s_p ≥ 2 and there are p − s of them that are at least 4, we have

$$\prod_{\ell=1}^{p}(s_\ell-2)!! \ge 2^{p-s} \tag{43}$$

and k − s₁ − … − s_ℓ = s_{ℓ+1} + … + s_p ≥ 2(p − ℓ), which implies

$$\prod_{\ell=1}^{p-1}(k-s_1-\cdots-s_\ell) \ge \prod_{\ell=1}^{p-1}2(p-\ell) = 2^{p-1}(p-1)!. \tag{44}$$

There are $\binom{p}{s}$ choices for the positions of the s 2's. Moreover, once the s 2's in s₁s₂…s_p are chosen, there are at most

$$\Big(\frac{k}{2}-s\Big)\cdot\Big(\frac{k}{2}-s-1\Big)\cdots\Big(\frac{k}{2}-s-(p-s+1)\Big) \le \Big(\frac{k}{2}\Big)^{p-s}$$

choices for the remaining p − s values s_j. Altogether,

$$\begin{aligned} \sum_{s_1 \ldots s_p \in \mathfrak{C}_p}\sum_{c_1 c_2 \ldots c_k \in \mathfrak{B}_{s_1 s_2 \ldots s_p}}\sum_{i_1 i_2 \ldots i_k \in \mathfrak{A}_{c_1 c_2 \ldots c_k}} \mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_k i_1}) &\le N(k-1)!\sum_{s=0}^{p}\binom{p}{s}\Big(\frac{k}{2}\Big)^{p-s}\ln^s(N+1)\Big(\frac{\pi^2}{6}\Big)^{p-s}\frac{1}{2^{p-s}\,2^{p-1}(p-1)!} \\ &= \frac{2N(k-1)!}{(p-1)!}\sum_{s=0}^{p}\binom{p}{s}\Big(\frac{k}{2}\Big)^{p-s}\ln^s(N+1)\Big(\frac{\pi^2}{6}\Big)^{p-s}\frac{1}{4^{p-s}\,2^s} \\ &= \frac{2N(k-1)!}{(p-1)!}\sum_{s=0}^{p}\binom{p}{s}\Big(\frac{\pi^2 k}{48}\Big)^{p-s}\Big(\frac{\ln(N+1)}{2}\Big)^{s} \\ &= \frac{2N(k-1)!}{(p-1)!}\Big(\frac{\pi^2}{48}k + \frac{\ln(N+1)}{2}\Big)^{p}. \end{aligned} \tag{45}$$

Finally, (45) is summed over all possible p and we obtain

$$\mathbb{E}(\operatorname{tr}(M^k)) = \sum_{i_1, i_2, \ldots, i_k} \mathbb{E}(M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_k i_1}) \le \sum_{p=1}^{k/2}\sum_{s_1 \ldots s_p \in \mathfrak{C}_p}\sum_{c_1 \ldots c_k \in \mathfrak{B}_{s_1 \ldots s_p}}\sum_{i_1 \ldots i_k \in \mathfrak{A}_{c_1 \ldots c_k}} \mathbb{E}(M_{i_1 i_2} \cdots M_{i_k i_1}) \le 2N(k-1)!\sum_{p=1}^{k/2}\frac{\big(\frac{\pi^2}{48}k + \frac{\ln(N+1)}{2}\big)^{p}}{(p-1)!}. \tag{46}$$

By using the fact that, for any A > 0,

$$\sum_{p=1}^{k/2}\frac{A^p}{(p-1)!} = A\Big(1 + A + \frac{A^2}{2!} + \cdots + \frac{A^{k/2-1}}{(k/2-1)!}\Big) \le Ae^A,$$

(46) is rearranged into

$$\mathbb{E}(\operatorname{tr}(M^k)) \le 2N(k-1)!\Big(\frac{\pi^2}{48}k + \frac{\ln(N+1)}{2}\Big)e^{\frac{\pi^2}{48}k + \frac{\ln(N+1)}{2}} = 2N\sqrt{N+1}\,(k-1)!\Big(\frac{\pi^2}{48}k + \frac{\ln(N+1)}{2}\Big)e^{\frac{\pi^2}{48}k} \le 2(N+1)^{3/2}\,k^{k}\Big(\frac{\pi^2}{48} + \frac{\ln(N+1)}{2k}\Big)e^{\frac{\pi^2}{48}k}.$$

Let k be the smallest even integer greater than (24/π²)ln(N + 1). Then using ‖M‖₂ ≤ (tr(M^k))^{1/k} leads to

$$\mathbb{E}(\|M\|_2) \le \mathbb{E}\big((\operatorname{tr}(M^k))^{1/k}\big) \le \big(\mathbb{E}(\operatorname{tr}(M^k))\big)^{1/k} \le \big(2(N+1)^{3/2}\big)^{1/k}\,k\,\Big(\frac{\pi^2}{48} + \frac{\ln(N+1)}{2k}\Big)^{1/k}e^{\frac{\pi^2}{48}} \le 2^{\frac{\pi^2}{24\ln(N+1)}}\cdot e^{\frac{\pi^2}{16}}\cdot\frac{24}{\pi^2}\ln(N+1)\cdot\Big(\frac{\pi^2}{24}\Big)^{\frac{\pi^2}{24\ln(N+1)}}\cdot e^{\frac{\pi^2}{48}} \le C_1\ln N,$$

where C₁ is some universal constant.

Next, we treat the complex case. In this case, g ∈ ℂ^{2N−1}, where both its real and imaginary parts have i.i.d. Gaussian entries. Write g = ξ + ιη, where ξ, η ∈ ℝ^{2N−1} are real-valued Gaussian random vectors. From the real-valued case above, we derive

$$\mathbb{E}(\|\mathcal{G}\xi\|_2) \le C_1\ln N, \qquad \mathbb{E}(\|\mathcal{G}\eta\|_2) \le C_1\ln N.$$

Therefore,

$$\mathbb{E}(\|\mathcal{G}g\|_2) = \mathbb{E}(\|\mathcal{G}\xi + \iota\mathcal{G}\eta\|_2) \le \mathbb{E}(\|\mathcal{G}\xi\|_2) + \mathbb{E}(\|\mathcal{G}\eta\|_2) \le 2C_1\ln N.$$

3.6. Proof of Theorem 1

With Lemmas 1, 2, 4 and Theorem 3 in hand, we are in a position to prove Theorem 1.

Proof of Theorem 1

Since (10) is equivalent to (4) through the relation y = Dx, we only need to prove that ỹ = ŷ for noise-free data (‖ỹ − ŷ‖₂ ≤ 2δ/ε for noisy data) with dominant probability. According to Lemma 1, we only need to prove (13). By Lemma 2,

$$P\Big(\min_{z \in \mathfrak{T}(\hat{y}) \cap S_c^{2N-2}} \|Bz\|_2 \ge \varepsilon\Big) \ge 1 - 2e^{-\frac{1}{2}\big(\lambda_M - w(\mathfrak{T}_{\mathbb{R}}(\hat{y}) \cap S^{4N-3}) - \frac{\varepsilon}{\sqrt{2}}\big)^2}.$$

Lemma 4, Theorem 3, and the inequality λ_M ≥ M/√(M+1) imply that

$$\lambda_M - w(\mathfrak{T}_{\mathbb{R}}(\hat{y}) \cap S^{4N-3}) - \frac{\varepsilon}{\sqrt{2}} \ge \frac{M}{\sqrt{M+1}} - 3C_1\sqrt{R}\ln N - \frac{\varepsilon}{\sqrt{2}} \ge \sqrt{M-1} - 3C_1\sqrt{R}\ln N - \frac{\varepsilon}{\sqrt{2}}.$$

When M ≥ (6C₁√R ln N + √2 ε)² + 1, we easily get

$$P\Big(\min_{z \in \mathfrak{T}(\hat{y}) \cap S_c^{2N-2}} \|Bz\|_2 \ge \varepsilon\Big) \ge 1 - 2e^{-\frac{M-1}{8}},$$

which is the desired result.

4. Extension to structured low-rank matrix reconstruction

In this section, we extend our results to low-rank Hankel matrix reconstruction and low-rank Toeplitz matrix reconstruction from their Gaussian measurements.

Since the proof of Theorem 1 does not use the specific property that ŷ comes from an exponential signal, Theorem 1 holds true for arbitrary low-rank Hankel matrices. We have the following corollary, which states that any Hankel matrix of size N × N and rank R can be recovered exactly from O(R ln² N) of its Gaussian measurements, and that this reconstruction is robust to noise.

Corollary 1 (Low-rank Hankel matrix reconstruction)

Let Ĥ ∈ ℂ^{N×N} be a given Hankel matrix of rank R. Let x̂ ∈ ℂ^{2N−1} satisfy x̂_{i+j} = Ĥ_{ij} for 0 ≤ i, j ≤ N − 1. Let A = BD ∈ ℂ^{M×(2N−1)}, where B ∈ ℂ^{M×(2N−1)} is a random matrix whose real and imaginary parts are i.i.d. Gaussian with mean 0 and variance 1, and D ∈ ℝ^{(2N−1)×(2N−1)} is the same as in Theorem 1. Then, there exists a universal constant C₁ > 0 such that, for any ε > 0, if

$$M \ge \big(C_1\sqrt{R}\,\ln N + \sqrt{2}\,\varepsilon\big)^2 + 1,$$

then, with probability at least $1 - 2e^{-\frac{M-1}{8}}$, we have

  1. H(x̃) = Ĥ, where x̃ is the unique solution of

$$\min_x \|H(x)\|_* \quad \text{subject to } Ax = b$$

    with b = Ax̂;

  2. ‖H(x̃) − Ĥ‖_F ≤ 2δ/ε, where x̃ is the unique solution of

$$\min_x \|H(x)\|_* \quad \text{subject to } \|Ax - b\|_2 \le \delta$$

    with ‖b − Ax̂‖₂ ≤ δ.

Moreover, Theorem 1 can be extended to the reconstruction of a low-rank Toeplitz matrix from its Gaussian measurements. Let T̂ ∈ ℂ^{N×N} be a Toeplitz matrix. Let x̂ ∈ ℂ^{2N−1} be a vector satisfying x̂_{N−1+(i−j)} = T̂_{ij} for 0 ≤ i, j ≤ N − 1. Let P ∈ ℂ^{N×N} be the anti-diagonal matrix with ones on the anti-diagonal. Then, it is easy to check that T̂ = H(x̂)P. Thus, we define a linear operator T that maps a vector in ℂ^{2N−1} to an N × N Toeplitz matrix by T(x) = H(x)P. Since P is a unitary matrix, one has ‖T(x)‖_* = ‖H(x)P‖_* = ‖H(x)‖_*. Therefore, the above corollary can be adapted to low-rank Toeplitz matrices.
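A short numerical sketch (our own illustration of this reduction) verifies the identity T̂ = H(x̂)P and the equality of the nuclear norms:

```python
import numpy as np

N = 4
n = 2 * N - 1
rng = np.random.default_rng(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

H = np.array([[x[i + k] for k in range(N)] for i in range(N)])  # H(x)
P = np.fliplr(np.eye(N))              # ones on the anti-diagonal
T = H @ P                             # Toeplitz: T_{ij} = x_{N-1+i-j}

assert np.allclose(T, [[x[N - 1 + i - j] for j in range(N)] for i in range(N)])
nuc = lambda M: np.linalg.svd(M, compute_uv=False).sum()
assert np.isclose(nuc(T), nuc(H))     # ||T(x)||_* = ||H(x)||_* since P is unitary
```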

5. Numerical experiments

In this section, we use numerical experiments to demonstrate the empirical performance of our proposed approach and compare it with other methods, including those in [10,23]. We use superpositions of complex sinusoids as test signals; note that our approach is not limited to such signals but applies to any superposition of complex exponentials. We consider two sampling schemes, namely, the random Gaussian sampling model in Theorem 1 and the non-uniform sampling of entries studied in [10,23]. The latter scheme randomly observes M entries of x̂ whose locations are uniformly distributed over all M-subsets of {0, 1, …, 2N − 2}. We also consider two signal reconstruction algorithms from the given samples: Hankel nuclear norm minimization and atomic norm minimization. Therefore, we compare four approaches: Hankel nuclear norm minimization with random Gaussian sampling (our proposed approach), Hankel nuclear norm minimization with non-uniform sampling of entries (EMaC in [10]), atomic norm minimization with random Gaussian sampling, and atomic norm minimization with non-uniform sampling of entries (off-the-grid CS in [23]).

We fix N = 64, i.e., the dimension of the true signal is 127. We conduct experiments under different M and R for the different approaches. For each approach with fixed M and R, we perform 100 runs, where each run is executed as follows. We first generate the true signal x̂ = [x̂(0), x̂(1), …, x̂(126)]^T with x̂(t) = Σ_{k=1}^R c_k e^{ι2πf_k t} for t = 0, 1, …, 126, where the frequencies f_k are drawn uniformly at random from the interval [0, 1], and the complex coefficients c_k satisfy the model c_k = (1 + 10^{0.5m_k})e^{ι2πθ_k} with m_k and θ_k drawn uniformly at random from the interval [0, 1]. Then we take M samples of x̂ according to the corresponding sampling scheme. Finally, a reconstruction x̃ is obtained by solving the corresponding reconstruction algorithm, which is numerically implemented by the alternating direction method of multipliers (ADMM). If ‖x̃ − x̂‖₂/‖x̂‖₂ ≤ 10⁻³, we regard the run as a successful reconstruction.
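For reference, the following sketch (our own reading of the setup just described; the ADMM reconstruction step itself is omitted) generates one random test instance together with the success criterion:

```python
import numpy as np

N, R, M = 64, 4, 40
t = np.arange(2 * N - 1)
rng = np.random.default_rng(3)

f = rng.random(R)                                            # frequencies in [0, 1]
m, theta = rng.random(R), rng.random(R)
c = (1 + 10 ** (0.5 * m)) * np.exp(1j * 2 * np.pi * theta)   # coefficients c_k
x_hat = (c * np.exp(1j * 2 * np.pi * np.outer(t, f))).sum(axis=1)

A = rng.standard_normal((M, 2 * N - 1)) + 1j * rng.standard_normal((M, 2 * N - 1))
b = A @ x_hat                                                # random Gaussian samples

def success(x_rec, x_true, tol=1e-3):
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true) <= tol
```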

We plot in Fig. 1 the rate of successful reconstruction with respect to different M and R for the different approaches. The black and white regions indicate 0% and 100% successful reconstruction respectively, and grey indicates a rate between 0% and 100%. From the figure, we see that atomic norm minimization has similar performance under random Gaussian sampling and non-uniform sampling of entries. Moreover, Hankel nuclear norm minimization also has similar performance under these two sampling schemes. Compared with atomic norm minimization, the Hankel nuclear norm minimization method is more robust when neighboring frequencies are close, regardless of the sampling scheme used.

Fig. 1. Numerical results.

6. Conclusion and future works

In this paper, we study compressed sensing of a signal that is a weighted sum of R complex exponential functions, with or without damping factors. The measurements are obtained by random Gaussian projections. We prove that, as long as the number of measurements exceeds O(R ln² N), where the signal has dimension 2N − 1, the minimization (4) is guaranteed to give a robust reconstruction of the underlying signal. Compared to the results in [10,23], where partial entries of the underlying signal are observed, our proposed approach does not require any incoherence condition (i.e., it does not require any separation condition on frequencies for signals without damping).

The bound O(R ln² N) we obtained is not optimal. There are several possible directions for improving it. Firstly, we may improve the estimation of E(‖𝒢g‖₂), a key step in our proof. We empirically observed that E(‖𝒢g‖₂) = O(√(ln N)), which is better than the bound in Theorem 3; it would be interesting to prove this bound theoretically. Secondly, we may borrow techniques from compressed sensing to get the optimal bound under Gaussian measurements. Indeed, there has been recent work that yields precise bounds [17,18,22]. These results assume Gaussian measurements, but all quantities involved are real. It is interesting to explore whether those results can be extended to our setting.

Acknowledgments

This work was supported in part by NSF (United States) grant DMS-1418737, National Natural Science Foundation of China grants 61571380 and 61201045, the Simons Foundation (United States), and NIH (United States) grant 1R01EB020665-01.

References

  • 1. Borcea L, Papanicolaou G, Tsogka C, Berryman J. Imaging and time reversal in random media. Inverse Probl. 2002;18:1247–1279.
  • 2. Cai JF, Candès EJ, Shen Z. A singular value thresholding algorithm for matrix completion. SIAM J Optim. 2010;20:1956–1982.
  • 3. Candès E, Li X, Ma Y, Wright J. Robust principal component analysis? J ACM. 2011:1–37.
  • 4. Candès E, Fernandez-Granda C. Towards a mathematical theory of super-resolution. Comm Pure Appl Math. 2014;67(6):906–956.
  • 5. Candès E, Plan Y. Matrix completion with noise. Proc IEEE. 2009;98:925–936.
  • 6. Candès E, Recht B. Exact matrix completion via convex optimization. Found Comput Math. 2009;9:717–772.
  • 7. Candès E, Tao T. The power of convex relaxation: near-optimal matrix completion. IEEE Trans Inform Theory. 2010;56:2053–2080.
  • 8. Candès EJ, Romberg J, Tao T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inform Theory. 2006;52:489–509.
  • 9. Chandrasekaran V, Recht B, Parrilo PA, Willsky AS. The convex geometry of linear inverse problems. Found Comput Math. 2012;12:805–849.
  • 10. Chen Y, Chi Y. Robust spectral compressed sensing via structured matrix completion. IEEE Trans Inform Theory. 2014;60:6576–6601.
  • 11. Chi Y, Scharf LL, Pezeshki A, Calderbank AR. Sensitivity to basis mismatch in compressed sensing. IEEE Trans Signal Process. 2011;59:2182–2195.
  • 12. Donoho DL. Compressed sensing. IEEE Trans Inform Theory. 2006;52:1289–1306.
  • 13. Fazel M, Pong TK, Sun D, Tseng P. Hankel matrix rank minimization with applications to system identification and realization. SIAM J Matrix Anal Appl. 2013;34:946–977.
  • 14. Gordon Y. On Milman's inequality and random subspaces which escape through a mesh in ℝⁿ. In: Geometric Aspects of Functional Analysis, 1986/87. Lecture Notes in Math, vol. 1317. Springer; Berlin: 1988. pp. 84–106.
  • 15. Hua Y, Sarkar TK. Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise. IEEE Trans Acoust Speech Signal Process. 1990;38:814–824.
  • 16. Lustig M, Donoho D, Pauly JM. Sparse MRI: the application of compressed sensing for rapid MR imaging. Magn Reson Med. 2007;58:1182–1195. doi:10.1002/mrm.21391.
  • 17. Oymak S, Thrampoulidis C, Hassibi B. The squared-error of generalized LASSO: a precise analysis. In: 51st Annual Allerton Conference on Communication, Control, and Computing; IEEE; 2013. pp. 1002–1009.
  • 18. Thrampoulidis C, Oymak S, Hassibi B. Simple error bounds for regularized noisy linear inverse problems. In: IEEE International Symposium on Information Theory; IEEE; 2014. pp. 3007–3011.
  • 19. Qu X, Mayzel M, Cai JF, Chen Z, Orekhov V. Accelerated NMR spectroscopy with low-rank reconstruction. Angew Chem, Int Ed. 2015;54:852–854. doi:10.1002/anie.201409291.
  • 20. Recht B, Fazel M, Parrilo P. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 2010;52:471–501.
  • 21. Rockafellar RT. Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press; Princeton, NJ: 1997. Reprint of the 1970 original.
  • 22. Stojnic M. A framework to characterize performance of LASSO algorithms. 2013. arXiv:1303.7291.
  • 23. Tang G, Bhaskar BN, Shah P, Recht B. Compressive sensing off the grid. IEEE Trans Inform Theory. 2013;59:7465–7490.
  • 24. Tao T. Topics in Random Matrix Theory. American Mathematical Society; Providence, RI: 2012.
  • 25. Tropp JA, Laska JN, Duarte MF, Romberg JK, Baraniuk RG. Beyond Nyquist: efficient sampling of sparse bandlimited signals. IEEE Trans Inform Theory. 2010;56:520–544.
  • 26. Roy R, Kailath T. ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Trans Acoust Speech Signal Process. 1989;37(7):984–995.
  • 27. Liao W, Fannjiang A. MUSIC for single-snapshot spectral estimation: stability and super-resolution. arXiv:1404.1484.
  • 28. Moitra A. Super-resolution, extremal functions and the condition number of Vandermonde matrices. In: Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC); 2015.
