ASYMMETRY HELPS: EIGENVALUE AND EIGENVECTOR ANALYSES OF ASYMMETRICALLY PERTURBED LOW-RANK MATRICES

Yuxin Chen; Chen Cheng; Jianqing Fan

doi:10.1214/20-aos1963

. Author manuscript; available in PMC: 2021 Jul 23.

Published in final edited form as: Ann Stat. 2021 Jan 29;49(1):435–458. doi: 10.1214/20-aos1963

ASYMMETRY HELPS: EIGENVALUE AND EIGENVECTOR ANALYSES OF ASYMMETRICALLY PERTURBED LOW-RANK MATRICES

Yuxin Chen ^†, Chen Cheng ^‡, Jianqing Fan ^†

PMCID: PMC8300484 NIHMSID: NIHMS1639565 PMID: 34305194

Abstract

This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 and symmetric matrix $M^{⋆} \in ℝ^{n \times n}$ , yet only a randomly perturbed version M is observed. The noise matrix M − M^⋆ is composed of independent (but not necessarily homoscedastic) entries and is, therefore, not symmetric in general. This might arise if, for example, we have two independent samples for each entry of M^⋆ and arrange them in an asymmetric fashion. The aim is to estimate the leading eigenvalue and the leading eigenvector of M^⋆.

We demonstrate that the leading eigenvalue of the data matrix M can be $O (\sqrt{n})$ times more accurate (up to some log factor) than its (unadjusted) leading singular value of M in eigenvalue estimation. Moreover, the eigen-decomposition approach is fully adaptive to heteroscedasticity of noise, without the need of any prior knowledge about the noise distributions. In a nutshell, this curious phenomenon arises since the statistical asymmetry automatically mitigates the bias of the eigenvalue approach, thus eliminating the need of careful bias correction. Additionally, we develop appealing non-asymptotic eigenvector perturbation bounds; in particular, we are able to bound the perterbation of any linear function of the leading eigenvector of M (e.g. entrywise eigenvector perturbation). We also provide partial theory for the more general rank-r case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition could sometimes be quite beneficial.

Keywords: asymmetric matrices, eigenvalue perturbation, entrywise eigenvector perturbation, linear form of eigenvectors, heteroscedasticity

1. Introduction.

Consider an unknown symmetric and low-rank matrix $M^{⋆} \in ℝ^{n \times n}$ . What we have observed is a corrupted version

M = M^{⋆} + H,

(1)

with H denoting a noise matrix. A classical problem is concerned with estimating the leading eigenvalues and eigenspace of M^⋆ given observation M.

The current paper concentrates on a scenario where the noise matrix H (and hence M) consists of independently generated random entries and is hence asymmetric in general. This might arise, for example, when we have available multiple (e.g. two) samples for each entry of M^⋆ and arrange the samples in an asymmetric fashion. A natural approach that immediately comes to mind is based on singular value decomposition (SVD), which employs the leading singular values (resp. subspace) of M to approximately estimate the eigenvalues (resp. eigenspace) of M^⋆. By contrast, a much less popular alternative is based on eigen-decomposition of the asymmetric data matrix M, which attempts approximation using the leading eigenvalues and eigenspace of M. Given that eigen-decomposition of an asymmetric matrix is in general not as numerically stable as SVD, conventional wisdom often favors the SVD-based approach, unless certain symmetrization step is implemented prior to eigen-decomposition.

When comparing these two approaches numerically, however, a curious phenomenon arises, which largely motivates the research in this paper. Let us generate M^⋆ as a random rank-1 matrix with leading eigenvalue λ^⋆ = 1, and let H be a Gaussian random matrix whose entries are i.i.d. $N (0, σ^{2})$ with $σ = 1 / \sqrt{n log n}$ . Fig. 1(a) compares the empirical accuracy of estimating the 1st eigenvalue of M^⋆ via the leading eigenvalue (the blue line) and via the leading singular value of M (the red line). As it turns out, eigen-decomposition significantly outperforms vanilla SVD in estimating λ^⋆, and the advantage seems increasingly more remarkable as the dimensionality n grows. To facilitate comparison, we include an additional green line in Fig. 1(a), obtained by rescaling the red line by $2.5 / \sqrt{n}$ . Interestingly, this green line coincides almost perfectly with the blue line, thus suggesting orderwise gain of eigen-decomposition compared to SVD. What is more, this phenomenon does not merely happen under i.i.d. noise. Similar numerical behaviors are observed in the problem of matrix completion — as displayed in Fig. 1(b) — even though the components of the equivalent perturbation matrix are apparently far from identically distributed or homoscedastic.

Fig 1: — Numerical error |λ − λ^⋆| vs. the matrix dimension n, where λ is either the leading eigenvalue (the blue line) or the leading singular value (the red line) of M. Here, (a) is the case when {H_ij} are i.i.d. $N (0, σ^{2})$ with $σ = 1 / \sqrt{n log n}$ , and (b) is the matrix completion case with sampling rate p = 3 log n/n, where $M_{i, j} = \frac{1}{p} M_{i, j}^{⋆}$ independently with probability p and 0 otherwise. The results are averaged over 100 independent trials. The green lines are obtained by rescaling the corresponding red lines by $2.5 / \sqrt{n}$ .

The goal of the current paper is thus to develop a systematic understanding of this phenomenon, that is, why statistical asymmetry empowers eigen-decomposition and how to exploit this feature in statistical estimation. Informally, our findings suggest that: when M^⋆ is rank-1 and H is composed of zero-mean and independent (but not necessarily identically distributed or homoscedastic) entries,

the leading eigenvalue of M could be $O (\sqrt{n})$ times (up to some logarithmic factor) more accurate than the (unadjusted) leading singular value of M when estimating the 1st eigenvalue of M^⋆;¹
the perturbation of the leading eigenvector is well-controlled along an arbitrary deterministic direction; for example, the eigenvector perturbation is well-controlled in any coordinate, indicating that the eigenvector estimation error is spread out across all coordinates.

We will further provide partial theory to accommodate the rank-r case. As an important application, such a theory allows us to estimate the leading singular value and singular vectors of an asymmetric rank-1 matrix via eigen-decomposition of a certain dilation matrix, which also outperforms the vanilla SVD approach.

We would like to immediately remark that: for some scenarios (e.g. the case with i.i.d. Gaussian noise), it is possible to adjust the leading singular value of M to obtain the same accuracy as the leading eigenvalue of M. As it turns out, the advantages of the eigen-decomposition approach may become more evident in the presence of heteroscedasticity — the case where the noise has location-varying and unknown variance. We shall elaborate on this point in Section 4.1.2.

All in all, when it comes to low-rank matrix estimation, arranging the observed matrix samples in an asymmetric manner and invoking eigen-decomposition properly could sometimes be statistically beneficial.

2. Problem formulation.

2.1. Models and assumptions.

In this section, we formally introduce our models and assumptions. Consider a symmetric and low-rank matrix $M^{⋆} = {[M_{i j}^{⋆}]}_{1 \leq i, j \leq n} \in ℝ^{n \times n}$ . Suppose we are given a random copy of M^⋆ as follows

M = M^{⋆} + H,

(2)

where H = [H_ij]_1≤i,j≤n is a random noise matrix.

The current paper concentrates on independent — but not necessarily identically distributed or homoscedastic — noise. Specifically, we impose the following assumptions on H throughout this paper.

Assumption 1.

(Independent entries) The entries {H_ij}1_≤i,j≤n are independently generated;
(Zero mean) $E [H_{i j}] = 0$ for all 1 ≤ i, j ≤ n;
(Variance) $Var (H_{i j}) = E [H_{i j}^{2}] \leq σ_{n}^{2}$ for all 1 ≤ i, j ≤ n;
(Magnitude) Each H_ij (1 ≤ i, j ≤ n) satisfies either of the following conditions:
1. |H_ij| ≤ B_n;
2. H_ij has a symmetric distribution obeying $ℙ {| H_{i j} | > B_{n}} \leq c_{b} n^{- 12}$ for some universal constant c_b > 0.

Remark 1 (Notational convention).

In what follows, the dependency of σ_n and B_n on n shall often be suppressed whenever it is clear from the context, so as to simplify notation.

Note that we do not enforce the constraint H_ij = H_ji, and hence H and M are in general asymmetric matrices. Also, Condition 3 does not require the H_ij’s to have equal variance across different locations; in fact, they can be heteroscedastic. In addition, while Condition 4(a) covers the class of bounded random variables, Condition 4(b) allows us to accommodate a large family of heavy-tailed distributions (e.g. sub-exponential distributions). An immediate consequence of Assumption 1 is the following bound on the spectral norm ∥H∥ of H.

Lemma 1.

Under Assumption 1, there exist some universal constants c₀, C₀ > 0 such that with probability exceeding 1 − C₀n⁻¹⁰,

‖ H ‖ \leq c_{0} σ \sqrt{n log n} + c_{0} B log n .

(3)

Proof. This is a standard non-asymptotic result that follows immediately from the matrix Bernstein inequality Tropp (2015) and the union bound (for Assumption 4(b)). We omit the details for conciseness. □

2.2. Our goal.

The aim is to develop non-asymptotic eigenvalue and eigenvector perturbation bounds under this family of random and asymmetric noise matrices. Our theoretical development is divided into two parts. Below we introduce our goal as well as some notation used throughout.

Rank-1 symmetric case.. For the rank-1 case, we assume the eigen-decomposition of M^⋆ to be

M^{⋆} = λ^{⋆} u^{⋆} u^{⋆ ⊤}

(4)

with λ^⋆ and u^⋆ being its leading eigenvalue and eigenvector, respectively. We also denote by λ and u the leading eigenvalue and eigenvector of M, respectively. The following quantities are the focal points of this paper (see Section 4):

Eigenvalue perturbation: |λ − λ^⋆|;
Perturbation of linear forms of eigenvectors: min{|a^⊤(u − u^⋆)|, |a^⊤(u + u^⋆)|} for any fixed unit vector $min {| a^{⊤} (u - u^{⋆}) |, | a^{⊤} (u + u^{⋆}) |}$ ;
Entrywise eigenvector perturbation: min{∥u − u^⋆∥_∞, ∥u + u^⋆∥_∞}.

Rank-r symmetric case.. For the general rank-r case, we let the eigen-decomposition of M^⋆ be

M^{⋆} = U^{⋆} Σ^{⋆} U^{⋆ ⊤},

(5)

where the columns of $U^{⋆} = [u_{1}^{⋆}, \dots, u_{r}^{⋆}] \in ℝ^{n \times r}$ are the eigenvectors, and $Σ^{⋆} = diag (λ_{1}^{⋆}, \dots, λ_{r}^{⋆}) \in ℝ^{r \times r}$ is a diagonal matrix with the eigenvalues arranged in descending order by their magnitude, i.e. $|λ_{1}^{⋆}| \geq \dots \geq |λ_{r}^{⋆}|$ . We let $λ_{max}^{*} = |λ_{1}^{⋆}|$ and $λ_{min}^{*} = |λ_{r}^{⋆}|$ . In addition, we let the top-r eigenvalues (in magnitude) of M be λ₁, ⋯, λ_r (obeying |λ₁| ≥ ⋯ ≥ |λ_r|) and their corresponding normalized eigenvectors be u₁, ⋯, u_r. We will present partial eigenvalue perturbation results for this more general case, as detailed in Section 5.

As is well-known, eigen-decomposition can be applied to estimate the singular values and singular vectors of an asymmetric matrix M^⋆ via the standard dilation trick Tropp (2015). As a consequence, our results are also applicable for singular value and singular vector estimation. See Section 5.2 for details.

2.3. Incoherence conditions.

Finally, we single out an incoherence parameter that plays an important role in our theory, which captures how well the energy of the eigenvectors is spread out across all entries.

Definition 1 (Incoherence parameter).

The incoherence parameter of a rank-r symmetric matrix M^⋆ with eigen-decomposition M^⋆ = U^⋆Σ^⋆U^⋆⊤ is defined to be the smallest quantity μ obeying

{‖ U^{⋆} ‖}_{\infty} \leq \sqrt{\frac{μ}{n}},

(6)

where ∥·∥_∞ denotes the entrywise ℓ_∞ norm.

Remark 2.

An alternative definition of the incoherence parameter (Candès and Recht, 2009; Keshavan, Montanari and Oh, 2010; Chi, Lu and Chen, 2019; Chen et al., 2019a) is the smallest quantity μ₀ satisfying $\sqrt{μ_{0} r / n}$ . This is a weaker assumption than Definition 1, as it only requires the energy of U^⋆ to be spread out across all of its rows rather than all of its entries. Note that these two incoherent parameters are consistent in the rank-1 case; in the rank-r case one has μ₀ ≤ μ ≤ μ₀r.

2.4. Notation.

The standard basis vectors in $ℝ^{n}$ are denoted by e₁, ⋯, e_n. For any vector z, we let ∥z∥₂ and ∥z∥_∞ denote the ℓ₂ norm and the ℓ_∞ norm of z, respectively. For any matrix M, denote by ∥M∥, ∥M∥_F and ∥M∥_∞ the spectral norm, the Frobenius norm and the entrywise ℓ_∞ norm (the largest magnitude of all entries) of M, respectively. Let [n] := {1, ⋯, n}. In addition, the notation f(n) = O (g(n)) or f(n) ≲ g(n) means that there is a constant c > 0 such that |f(n)| ≤ c|g(n)|, f(n) ≳ g(n) means that there is a constant c > 0 such that |f(n)| ≥ c|g(n)|, and f(n) ≍ g(n) means that there exist constants c₁, c₂ > 0 such that c₁|g(n)| ≤ |f(n)| ≤ c₂|g(n)|.

3. Preliminaries.

Before continuing, we gather several preliminary facts that will be useful throughout. The readers familiar with matrix perturbation theory may proceed directly to the main theoretical development in Section 4.

3.1. Perturbation of eigenvalues of asymmetric matrices.

We begin with a standard result concerning eigenvalue perturbation of a diagonalizable matrix Bauer and Fike (1960). Note that the matrices under study might be asymmetric.

Theorem 1 (Bauer-Fike Theorem).

Consider a diagonalizable matrix $A \in ℝ^{n \times n}$ with eigen-decomposition A = VΛV⁻¹, where $V \in ℂ^{n \times n}$ is a non-singular eigenvector matrix and Λ is diagonal. Let $\tilde{λ}$ be an eigenvalue of A + H. Then there exists an eigenvalue λ of A such that

|λ - \tilde{λ}| \leq ‖ V ‖ ‖V^{- 1}‖ ‖ H ‖ .

(7)

In addition, if A is symmetric, then there exists an eigenvalue λ of A such that

| λ - \tilde{λ} | \leq ‖ H ‖ .

(8)

However, caution needs to be exercised as the Bauer-Fike Theorem does not specify which eigenvalue of A is close to an eigenvalue of A + H. Encouragingly, in the low-rank case of interest, the Bauer-Fike Theorem together with certain continuity of the spectrum allows one to localize the leading eigenvalues of the perturbed matrix.

Lemma 2.

Suppose M^⋆ is a rank-r symmetric matrix whose top-r eigenvalues obey $| λ_{1}^{⋆} | \geq \dots \geq | λ_{r}^{⋆} | > 0$ . If $‖ H ‖ < | λ_{r}^{⋆} | / 2$ , then the top-r eigenvalues λ₁, ⋯, λ_r of M = M^⋆ + H, sorted by modulus, obey that: for any 1 ≤ l ≤ r,

| λ_{l} - λ_{j}^{⋆} | \leq ‖ H ‖ for some 1 \leq j \leq r .

(9)

In addition, if r = 1, then both the leading eigenvalue and the leading eigenvector of M are real-valued.

This result, which we establish in Appendix 1.1, parallels Weyl’s inequality for symmetric matrices. Note, however, that the above bound (9) might be quite loose for specific settings. We will establish much sharper perturbation bounds when H contains independent random entries (see, e.g. Corollary 1).

3.2. The Neumann trick and eigenvector perturbation.

Next, we introduce a classical result dubbed as the “Neumann trick” Eldridge, Belkin and Wang (2018). This theorem, which is derived based on the Neumann series for a matrix inverse, has been applied to analyze eigenvectors in various settings Erdős et al. (2013); Jain and Netrapalli (2015); Eldridge, Belkin and Wang (2018).

Theorem 2 (Neumann trick).

Consider the matrices M^⋆ and M (see (5) and (2)). Suppose ∥H∥ < |λ_l| for some 1 ≤ l ≤ n. Then

u_{l} = \sum_{j = 1}^{r} \frac{λ_{j}^{⋆}}{λ_{l}} (u_{j}^{⋆ ⊤} u_{l}) {\sum_{s = 0}^{\infty} \frac{1}{λ_{l}^{s}} H^{s} u_{j}^{⋆}} .

(10)

Proof. We supply the proof in Appendix 1.2 for self-containedness. □

Remark 3.

In particular, if M^⋆ is a rank-1 matrix and ∥H∥ < |λ₁|, then

u_{1} = \frac{λ_{1}^{⋆}}{λ_{1}} (u_{1}^{⋆ ⊤} u_{1}) {\sum_{s = 0}^{\infty} \frac{1}{λ_{1}^{s}} H^{s} u_{1}^{⋆}} .

(11)

An immediate consequence of the Neumann trick is the following lemma, which asserts that each of the top-r eigenvectors of M resides almost within the top-r eigen-subspace of M^⋆, provided that ∥H∥ is sufficiently small. The proof is deferred to Appendix 1.3.

Lemma 3.

Suppose M^⋆ is a rank-r symmetric matrix with r non-zero eigenvalues obeying $1 = λ_{max}^{*} = |λ_{1}^{⋆}| \geq \dots \geq |λ_{r}^{⋆}| = λ_{min}^{*} > 0$ and associated eigenvectors $κ ≜ λ_{max}^{*} / λ_{min}^{*}$ . If ∥H∥ ≤ 1/(4κ), then the top-r eigenvectors u₁, ⋯, u_r of M = M^⋆ + H obey

\sum_{j = 1}^{r} {| u_{j}^{⋆ ⊤} u_{l} |}^{2} \geq 1 - \frac{64 κ^{4}}{9} ‖ H ‖^{2}, 1 \leq l \leq r .

(12)

In addition, if r = 1, then one further has

min {{‖ u_{1} - u_{1}^{⋆} ‖}_{2}, {‖ u_{1} + u_{1}^{⋆} ‖}_{2}} \leq \frac{8 \sqrt{2}}{3} ‖ H ‖ .

(13)

4. Perturbation analysis for the rank-1 case.

4.1. Main results: the rank-1 case.

This section presents perturbation analysis results when the truth M^⋆ is a symmetric rank-1 matrix. We shall start by presenting a master bound which, as we will see, immediately leads to our main findings.

4.1.1. A master bound.

Our master bound is concerned with the perturbation of linear forms of eigenvectors, as stated below.

Theorem 3 (Perturbation of linear forms of eigenvectors (rank-1)).

Consider a rank-1 symmetric matrix $M^{⋆} = λ^{⋆} u^{⋆} u^{⋆ ⊤} \in ℝ^{n \times n}$ with incoherence parameter μ (cf. Definition 1). Suppose the noise matrix H obeys Assumption 1, and assume the existence of some sufficiently small constant c₁ > 0 such that

max {σ \sqrt{n log n}, B log n} \leq c_{1} | λ^{⋆} | .

(14)

Then for any fixed vector $a \in ℝ^{n}$ with ∥a∥₂ = 1, probability at least 1 − O(n⁻¹⁰) one has

| a^{⊤} (u - \frac{u^{⋆ ⊤} u}{λ / λ^{⋆}} u^{⋆}) | ≲ \frac{max {σ \sqrt{n log n}, B log n}}{| λ^{⋆} |} \sqrt{\frac{μ}{n}} .

(15)

Remark 4 (The noise size).

We would like to remark on the range of the noise size covered by our theory. If the incoherence parameter of the truth M^⋆ obeys μ ≍ 1, then even the magnitude of the largest entry of M^⋆ cannot exceed the order of |λ^⋆|/n. One can thus interpret the condition (14) in this case as

σ ≲ \sqrt{\frac{n}{log n}} {‖ M^{⋆} ‖}_{\infty} and B ≲ \frac{n}{log n} {‖ M^{⋆} ‖}_{\infty} .

In other words, the standard deviation σ of each noise component is allowed to be substantially larger (i.e. $\sqrt{n / log n}$ times larger) than the magnitude of any of the true entries. In fact, this condition (14) matches, up to some log factor, the one required for spectral methods to perform noticeably better than random guessing.

In words, Theorem 3 tells us that: the quantity $\frac{u^{⋆ ⊤} u}{λ / λ^{⋆}} a^{⊤} u^{⋆}$ serves as a remarkably accurate approximation of the linear form a^⊤u. In particular, the approximation error is at most $O (1 / \sqrt{n})$ under the condition (14) for incoherent matrices. Encouragingly, this approximation accuracy holds true for an arbitrary deterministic direction (reflected by a). As a consequence, one can roughly interpret Theorem 3 as

u \approx \frac{λ^{⋆}}{λ} u^{⋆} u^{⋆ ⊤} u = \frac{1}{λ} M^{⋆} u,

(16)

where such an approximation is fairly accurate along any fixed direction. Compared with the identity $u = \frac{1}{λ} M u = \frac{1}{λ} (M^{⋆} + H) u$ , our results imply that Hu is exceedingly small along any fixed direction, even though H and u are highly dependent. As we shall explain in Section 4.3, this observation usually cannot happen when H is a symmetric random matrix or when one uses the leading singular vector instead, due to the significant bias resulting from symmetry.

This master theorem has several interesting implications, as we shall elucidate momentarily.

4.1.2. Eigenvalue perturbation.

To begin with, Theorem 3 immediately yields a much sharper non-asymptotic perturbation bound regarding the leading eigenvalue λ of M.

Corollary 1.

Under the assumptions of Theorem 3, with probability at least 1 − O(n⁻¹⁰) we have

| λ - λ^{⋆} | ≲ max {σ \sqrt{n log n}, B log n} \sqrt{\frac{μ}{n}} .

(17)

Proof. Without loss of generality, assume that λ^⋆ = 1. Taking a = u^⋆ in Theorem 3, we get

| u^{⋆ ⊤} u | \frac{| λ - 1 |}{λ} = | u^{⋆ ⊤} u - u^{⋆ ⊤} u^{⋆} \frac{u^{⋆ ⊤} u}{λ} | ≲ max {σ \sqrt{n log n}, B log n} \sqrt{\frac{μ}{n}} .

(18)

From Lemma 1 and the condition (14), we know ∥H∥ < 1/4, which combines with Lemma 2 and Lemma 3 yields λ ≍ |u^⋆⊤u| ≍ 1. Substitution into (18) yields

\begin{array}{l} |λ - 1| & ≲ |\frac{λ}{u^{⋆ ⊤} u}| max {σ \sqrt{n log n}, B log n} \sqrt{\frac{μ}{n}} \\ ≲ max {σ \sqrt{n log n}, B log n} \sqrt{\frac{μ}{n}} . \end{array}

For the vast majority of applications we encounter, the maximum possible noise magnitude B (cf. Assumption 1) obeys $B ≲ σ \sqrt{n / log n}$ , in which case the bound in Corollary 1 simplifies to

| λ - λ^{⋆} | ≲ σ \sqrt{μ log n} .

(19)

This means that the eigenvalue estimation error is not much larger than the variability of each noise component. In addition, we remind the reader that for a fairly broad class of noise (see Remark 4), the leading eigenvalue λ of M is guaranteed to be real-valued, an observation that has been made in Lemma 2. In practice, however, one might still encounter some scenarios where λ is complex-valued. As a result, we recommend the practitioner to use the real part of λ as the eigenvalue estimate, which clearly enjoys the same statistical guarantee as in Corollary 1.

Comparison to the vanilla SVD-based approach.. In order to facilitate comparison, we denote by λ_svd the largest singular value of M, and look at |λ_svd − λ^⋆|. Combining Weyl’s inequality, Lemma 1 and the condition (14), we arrive at

| λ_{svd} - λ^{⋆} | \leq ‖ H ‖ ≲ max {σ \sqrt{n log n}, B log n} .

(20)

When μ ≍ 1, this error bound w.r.t. this (unadjusted) singular value could be $\sqrt{n}$ times larger than the perturbation bound (17) derived for the leading eigenvalue. This corroborates our motivating experiments in Fig. 1.

Comparison to vanilla eigen-decomposition after symmetrization.. The reader might naturally wonder what would happen if we symmetrize the data matrix before performing eigen-decomposition. Consider, for example, the i.i.d. Gaussian noise case where $H_{i j} \overset{i.i . d.}{~} N (0, σ^{2})$ , and assume λ^⋆ > 0 for simplicity. The leading eigenvalue λ_sym of the symmetrized matrix (M + M^⊤)/2 has been extensively studied in the literature Füredi and Komlós (1981); Yin, Bai and Krishnaiah (1988); Péché (2006); Féral and Péché (2007); Benaych-Georges and Nadakuditi (2011); Renfrew and Soshnikov (2013); Knowles and Yin (2013). In particular, it has been shown (e.g. Capitaine, Donati-Martin and Féral (2009)) that, with probability approaching one,

λ_{sym} = λ^{⋆} + \frac{n σ^{2}}{2 λ^{⋆}} + O (σ \sqrt{log n}) .

(21)

If $σ = | λ^{⋆} | \sqrt{1 / (n log n)}$ (which is the setting in our numerical experiment), then this can be translated into

\frac{λ_{sym} - λ^{⋆}}{λ^{⋆}} = \frac{1}{2 log n} + O (\frac{1}{\sqrt{n}}) .

This implies that λ_sym suffers from a substantially larger bias than the leading eigenvalue λ obtained without symmetrization, since in this case we have (cf. Corollary 1)

| \frac{λ - λ^{⋆}}{λ^{⋆}} | ≲ \frac{1}{\sqrt{n}} .

(22)

Comparison to properly adjusted eigen-decomposition and SVD-based methods.. Armed with the approximation (21) in the i.i.d. Gaussian noise case, the careful reader might naturally suggest a properly corrected estimate λ_sym,c as follows (again assuming λ > 0)

λ_{sym, c} = \frac{1}{2} (λ_{sym} + \sqrt{λ_{sym}^{2} - 2 n σ^{2}}),

(23)

which is a shrinkage-type estimate chosen to satisfy $λ_{sym} = λ_{sym, c} + \frac{n σ^{2}}{2 λ_{sym, c}}$ . A little algebra reveals that: if $σ = 1 / \sqrt{n log n}$ , then

| \frac{λ_{sym, c} - λ^{⋆}}{λ^{⋆}} | ≲ \frac{1}{\sqrt{n}},

thus matching the estimation accuracy of λ (cf. (22)). In addition, some sort of universality results has been established as well in the literature (Capitaine, Donati-Martin and Féral, 2009), implying that the same approximation and correction are applicable to a broad family of zero-mean noise with identical variance. As we shall illustrate numerically in Section 4.2, this approach (i.e. λ_sym,c) performs almost identically to the one using vanilla eigen-decomposition without symmetrization. In addition, very similar observations have been made for the SVD-based approach (Silverstein, 1994; Yin, Bai and Krishnaiah, 1988; Péché, 2006; Féral and Péché, 2007; Benaych-Georges and Nadakuditi, 2012; Bryc and Silverstein, 2018); for the sake of brevity, we do not repeat the arguments here.

We would nevertheless like to single out a few statistical advantages of the eigen-decomposition approach without symmetrization. To begin with, λ is obtained via vanilla eigen-decomposition, and computing it does not rely on any kind of noise statistics. This is in stark contrast to the bias correction (23) in the presence of symmetric data, which requires prior knowledge about (or a very precise estimate of) the noise variance σ². Leaving out this prior knowledge matter, a more important issue is that the approximation formula (21) assumes identical variance of noise components across all entries (i.e. homoscedasticity). While an approximation of this kind has been found for more general cases beyond homoscedastic noise (e.g. Bryc and Silverstein (2018)), the approximation formula (e.g. (Bryc and Silverstein, 2018, Theorem 1.1)) becomes fairly complicated, requires prior knowledge about all variance parameters, and is thus difficult to implement in practice. In comparison, the vanilla eigen-decomposition approach analyzed in Corollary 1 imposes no restriction on the noise statistics and is fully adaptive to heteroscedastic noise.

Lower bounds.. To complete the picture, we provide a simple information-theoretic lower bound for the i.i.d. Gaussian noise case, which will be established in Appendix 2.

Lemma 4.

Fix any small constant ε > 0. Suppose that $H_{i j} \overset{i . i . d .}{~} N (0, σ^{2})$ . Consider three matrices

M = λ^{⋆} u^{⋆} u^{⋆ ⊤} + H, \tilde{M} = (λ^{⋆} + Δ) u^{⋆} u^{⋆ ⊤} + H, \hat{M} = (λ^{⋆} - Δ) u^{⋆} u^{⋆ ⊤} + H

with ${‖ u^{⋆} ‖}_{2} = 1.$ if $Δ \leq σ \sqrt{({log}_{2} 1.5 - ε) log 2}$ , then no algorithm can distinguish M, $\tilde{M}$ and $\hat{M}$ with p_e ≤ ε, where p_e is the minimax probability of error for testing three hypotheses (namely, the ones claiming that the true eigenvalues are λ^⋆, λ^⋆ + Δ, and λ^⋆ − Δ, respectively).

In short, Lemma 4 asserts that one cannot possibly locate an eigenvalue to within a precision of Δ much better than σ, which reveals a fundamental limit that cannot be broken by any algorithm. In comparison, the vanilla eigen-decomposition method based on asymmetric data achieves an accuracy of $| λ - λ^{⋆} | ≲ σ \sqrt{log n}$ (cf. Corollary 1 and (19)) for the incoherent case, thus matching the information-theoretic lower bound up to some log factor. In fact, the extra $\sqrt{log n}$ factor arises simply because we are aiming for a high-probability guarantee.

4.1.3. Perturbation of linear forms of eigenvectors.

The master bound in Theorem 3 admits a more convenient form when controlling linear functions of the eigenvectors. The result is this:

Corollary 2.

Under the same setting of Theorem 3, with probability at least 1 − O(n⁻¹⁰) we have

min {| a^{⊤} (u - u^{⋆}) |, | a^{⊤} (u + u^{⋆}) |} ≲ (| a^{⊤} u^{⋆} | + \sqrt{\frac{μ}{n}}) \frac{max {σ \sqrt{n log n}, B log n}}{| λ^{⋆} |} .

(24)

Proof. Without loss of generality, assume that u^⋆⊤u ≥ 0 and that λ^⋆ = 1. Then one has

\begin{array}{l} |a^{⊤} (u - u^{⋆})| & \leq |a^{⊤} u - a^{⊤} u^{⋆} \frac{u^{⋆ ⊤} u}{λ}| + |a^{⊤} u^{⋆}| |\frac{u^{⋆ ⊤} u}{λ} - 1| \\ \leq max {σ \sqrt{n log n}, B log n} \sqrt{\frac{μ}{n}} + |a^{⊤} u^{⋆}| |\frac{u^{⋆ ⊤} u}{λ} - 1|, \end{array}

where the last inequality arises from Theorem 3 as well as the definition of μ. In addition, apply Lemma 2 and Lemma 3 to obtain

| \frac{u^{⋆ ⊤} u}{λ} - 1 | \leq \frac{u^{⋆ ⊤} u}{λ} | 1 - λ | + | u^{⋆ ⊤} u - 1 | ≲ ‖ H ‖ ≲ max {σ \sqrt{n log n}, B log n} .

Putting the above bounds together concludes the proof. □

The perturbation of linear forms of eigenvectors (or singular vectors) has not yet been well explored even for the symmetric case. One scenario that has been studied is linear forms of singular vectors under i.i.d. Gaussian noise (Koltchinskii and Xia, 2016; Xia, 2016). Our analysis — which is certainly different from Koltchinskii and Xia (2016) as our emphasis is eigen-decomposition — does not rely on the Gaussianality assumption, and accommodates a much broader class of random noise. Another work that has looked at linear forms of the leading singular vector is Ma et al. (2019) for phase retrieval and blind deconvolution, although the vector a therein is specific to the problems (i.e. the design vectors) and cannot be made general.

Remark 5.

The perturbation theory for linear forms of eigenvectors has been substantially extended in our follow-up work; the interested reader is referred to Cheng, Wei and Chen (2020) for details.

4.1.4. Entrywise eigenvector perturbation.

A straightforward consequence of Corollary 2 that is worth emphasizing is sharp entrywise control of the leading eigenvector as follows.

Corollary 3.

Under the same setting of Theorem 3, with probability at least 1 − O(n⁻⁹) we have

min {{‖ u - u^{⋆} ‖}_{\infty}, {‖ u + u^{⋆} ‖}_{\infty}} ≲ \frac{max {σ \sqrt{n log n}, B log n}}{| λ^{⋆} |} \sqrt{\frac{μ}{n}} .

(25)

Proof. Recognizing that ${‖ u - u^{⋆} ‖}_{\infty} = {max}_{i} | e_{i}^{⊤} u - e_{i}^{⊤} u^{⋆} |$ and recalling our assumption $| e_{i}^{⊤} u | \leq \sqrt{μ / n}$ , we can invoke Corollary 2 and the union bound to establish this entrywise bound. □

We note that: while the ℓ₂ perturbation (or sin Θ distance) of eigenvectors or singular vectors has been extensively studied Davis and Kahan (1970); Wedin (1972); Vu (2011); Wang (2015); O’Rourke, Vu and Wang (2013); Cai and Zhang (2018), the entrywise eigenvector behavior was much less explored. The prior literature contains only a few entrywise eigenvector perturbation analysis results for settings very different from ours, e.g. the i.i.d. random matrix case Vu and Wang (2015); O’Rourke, Vu and Wang (2016), the symmetric low-rank case Fan, Wang and Zhong (2018); Abbe et al. (2017); Eldridge, Belkin and Wang (2018), and the case with transition matrices for reversible Markov chains (Chen et al., 2019b). Our results add another instance to this body of works in providing entrywise eigenvector perburation bounds.

4.2. Applications.

We apply our main results to two concrete matrix estimation problems and examine the effectiveness of these bounds. As before, M^⋆ is a rank-1 matrix with incoherence parameter μ and leading eigenvalue λ^⋆.

Low-rank matrix estimation from Gaussian noise.

Suppose that H is composed of i.i.d. Gaussian random variables $N (0, σ^{2})$ .² If $σ ≲ \frac{1}{\sqrt{n log n}}$ , applying Corollaries 1–3 reveals that with high probability,

| λ - λ^{⋆} | ≲ σ \sqrt{μ log n}

(26a)

min {{‖ u - u^{⋆} ‖}_{\infty}, {‖ u + u^{⋆} ‖}_{\infty}} ≲ \frac{σ \sqrt{μ log n}}{| λ^{⋆} |}

(26b)

min {| a^{⊤} (u - u^{⋆}) |, | a^{⊤} (u + u^{⋆}) |} ≲ (| a^{⊤} u^{⋆} | + \sqrt{\frac{μ}{n}}) \frac{σ \sqrt{n log n}}{| λ^{⋆} |}

(26c)

for any fixed unit vector $a \in ℝ^{n}$ . We have conducted additional numerical experiments in Fig. 2, which confirm our findings. It is also worth noting that empirically, eigen-decomposition and SVD applied to M achieve nearly identical ℓ₂ and ℓ_∞ errors when estimating the leading eigenvector of M^⋆. In addition, we also include the numerical estimation error of the corrected eigenvalue λ_sym,c (cf. (23)) of the symmetrized matrix (M + M^⊤)/2. As can be seen from Fig. 2, vanilla eigen-decomposition without symmetrization performs nearly identically to the one with symmetrization and proper correction.

Low-rank matrix completion.

Suppose that M is generated using random partial entries of M^⋆ as follows

M_{i j} = {\begin{array}{l} \frac{1}{p} M_{i j}^{⋆}, & with probability p, \\ 0, & else, \end{array}

(27)

where p denotes the fraction of the entries of M^⋆ being revealed. It is straightforward to verify that H = M − M^⋆ is zero-mean and obeys $| H_{i j} | \leq \frac{μ}{n p} ≔ B$ and $Var (H_{i j}) \leq \frac{μ^{2}}{p n^{2}}$ . Consequently, if $p ≳ \frac{μ^{2} log n}{n}$ , then invoking Corollaries 1–3 yields

\frac{| λ - λ^{⋆} |}{| λ^{⋆} |} ≲ \frac{1}{\sqrt{n}} \sqrt{\frac{μ^{3} log n}{p n}}

(28a)

min {{‖ u - u^{⋆} ‖}_{\infty}, {‖ u + u^{⋆} ‖}_{\infty}} ≲ \frac{1}{\sqrt{n}} \sqrt{\frac{μ^{3} log n}{p n}}

(28b)

min {| a^{⊤} (u - u^{⋆}) |, | a^{⊤} (u + u^{⋆}) |} ≲ (| a^{⊤} u^{⋆} | + \sqrt{\frac{μ}{n}}) \sqrt{\frac{μ^{2} log n}{p n}}

(28c)

with high probability, where $a \in ℝ^{n}$ is any fixed unit vector. Additional numerical simulations have been carried out in Fig. 3 to verify these findings. Empirically, eigen-decomposition outperforms SVD in estimating both the leading eigenvalue and eigenvector of M^⋆.

Finally, we remark that all the above applications assume the availability of an asymmetric data matrix M. One might naturally wonder whether there is anything useful we can say if only a symmetric matrix M is available. While this is in general difficult, our theory does have direct implications for both matrix completion and the case with i.i.d. Gaussian noise in the presence of symmetric data matrices; that is, it is possible to first asymmetrize the data matrix followed by eigen-decomposition. The interested reader is referred to Appendix 10 for details.

4.3. Why asymmetry helps?.

We take a moment to develop some intuition underlying Theorem 3, focusing on the case with λ^⋆ = 1 for simplicity. The key ingredient is the Neumann trick stated in Theorem 2. Specifically, in the rank-1 case we can expand

u = \frac{1}{λ} (u^{⋆ ⊤} u) \sum_{s = 0}^{\infty} \frac{1}{λ^{s}} H^{s} u^{⋆} .

A little algebra yields

| a^{⊤} (u - \frac{u^{⋆ ⊤} u}{λ} u^{⋆}) | = | \frac{u^{⋆ ⊤} u}{λ} \sum_{s = 1}^{\infty} \frac{a^{⊤} H^{s} u^{⋆}}{λ^{s}} | ≲ \sum_{s = 1}^{\infty} | \frac{a^{⊤} H^{s} u^{⋆}}{λ^{s}} |,

(29)

where the last inequality holds since (i) |u^⋆⊤u| ≤ 1, and (ii) λ is real-valued and obeys λ ≈ 1 if ∥H∥ ≪ 1 (in view of Lemma 2). As a result, the perturbation can be well-controlled as long as |a^⊤H^su^⋆| is small for every s ≥ 1.

As it turns out, a^⊤H^su^⋆ might be much better controlled when H is random and asymmetric, in comparison to the case where H is random and symmetric. To illustrate this point, it is perhaps the easiest to inspect the second-order term.

Asymmetric case: when H is composed of independent zero-mean entries each with variance $σ_{n}^{2}$ , one has
$E [a^{⊤} H^{2} u^{⋆}] = a^{⊤} E [H^{2}] u^{⋆} = a^{⊤} (σ^{2} I) u^{⋆} = σ^{2} a^{⊤} u^{⋆} .$
Symmetric case: when H is symmetric and its upper trangular part consists of independent zero-mean entries with variance $σ_{n}^{2}$ , it holds that
$E [a^{⊤} H^{2} u^{⋆}] = a^{⊤} E [H^{2}] u^{⋆} = a^{⊤} (n σ^{2} I) u^{⋆} = n σ^{2} a^{⊤} u^{⋆} .$

In words, the term a^⊤H²u^⋆ in the symmetric case might have a significantly larger bias compared to the asymmetric case. This bias effect is substantial when a^⊤u^⋆ is large (e.g. when a = u^⋆), which plays a crucial role in determining the size of eigenvalue perturbation.

The vanilla SVD-based approach can be interpreted in a similar manner. Specifically, we recognize that the leading singular value (resp. left singular vector) can be computed via the leading eigenvalue (resp. eigenvector) of the symmetric matrix MM^⊤. Given that MM^⊤ − M^⋆M^⋆⊤ is also symmetric, the aforementioned bias issue arises as well. This explains why vanilla eigen-decomposition might have an advantage over vanilla SVD when dealing with asymmetric matrices.

Finally, we remark that the aforementioned bias issue becomes less severe as ∥H∥ decreases. For example, when ∥H∥ is exceedingly small, the only dominant term on the right-hand side of (29) is a^⊤Hu^⋆, with all higher-order terms being vanishingly small. In this case, $E [a^{⊤} H u^{⋆}] = 0$ for both symmetric and asymmetric zero-mean noise matrices. As a consequence, the advantage of eigen-decomposition becomes negligible when dealing with nearly-zero noise. This observation is also confirmed in the numerical experiments reported in Fig. 2(a) and Fig. 3(a), where the two approaches achieve similar eigenvalue estimation accuracy when σ → 0 (resp. p → 1) in matrix estimation under Gaussian noise (resp. matrix completion). In fact, the case with very small ∥H∥ has been studied in the literature O’Rourke, Vu and Wang (2013); Vu (2011); Eldridge, Belkin and Wang (2018). For example, it was shown in O’Rourke, Vu and Wang (2013) that when $‖ H ‖ ≲ \frac{1}{\sqrt{n}} M^{⋆}$ , the singular value perturbation is also $\sqrt{n}$ times smaller than the bound predicted by Weyl’s theorem; similar improvement can be observed w.r.t. eigenvalue perturbation when H is symmetric (cf. (Eldridge, Belkin and Wang, 2018, Theorem 6)). By contrast, our eigenvalue perturbation results achieve this gain even when ∥H∥ is nearly as large as ∥M^⋆∥ (up to some logarithmic factor).

4.4. Proof outline of Theorem 3.

This subsection outlines the main steps for establishing Theorem 3. To simplify presentation, we shall assume without loss of generality that

λ^{⋆} = 1.

(30)

Throughout this paper, all the proofs are provided for the case when Conditions 1–3, 4(a) in Assumption 1 are valid. Otherwise, if Condition 4(b) is valid, then we can invoke the union bound to show that

M = M^{⋆} + \tilde{H}

(31)

with probability exceeding 1 − O(n⁻¹⁰), where ${\tilde{H}}_{i j} ≜ H_{i j} 1_{{| H_{i j} | \leq B}}$ is the truncated noise and has magnitude bounded by B. Since H_ij has symmetric distribution, it is seen that $E [{\tilde{H}}_{i j}] = 0$ and $Var ({\tilde{H}}_{i j}) \leq σ^{2}$ , which coincides with the case obeying Conditions 1–3, 4(a) in Assumption 1.

As already mentioned in Section 4.3, everything boils down to controlling |a^⊤H^su^⋆| for s ≥ 1. This is accomplished via the following lemma.

Lemma 5 (Bounding higher-order terms).

Consider any fixed unit vector $a \in ℝ^{n}$ and any positive integers s, k satisfying Bsk ≤ 2 and nσ²sk ≤ 2. Under the assumptions of Theorem 3,

| E [{(a^{⊤} H^{s} u^{⋆})}^{k}] | \leq \frac{s k}{2} max {{(B s k)}^{s k}, {(2 n σ^{2} s k)}^{s k / 2}} {(\sqrt{\frac{μ}{n}})}^{k} .

(32)

Proof. The proof of Lemma 5 is combinatorial in nature, which we defer to Appendix 3. □

Remark 6.

A similar result in (Tao, 2013, Lemma 2.3) has studied the bilinear forms of the high order terms of an i.i.d. random matrix, with a few distinctions. First of all, Tao (2013) assumes that each entry of the noise matrix is i.i.d. and has finite fourth moment (if the noise variance is rescaled to be 1); these assumptions break in examples like matrix completion. Moreover, Tao (2013) focuses on the case with k = 2, and does not lead to high-probability bounds (which are crucial for, e.g. entrywise error control).

Using Markov’s inequality and the union bound, we can translate Lemma 5 into a high probability bound as follows.

Corollary 4.

Under the assumptions of Lemma 5, there exists some universal constant c₂ > 0 such that

| a^{⊤} H^{s} u^{⋆} | \leq {(c_{2} max {B log n, \sqrt{n σ^{2} log n}})}^{s} \sqrt{\frac{μ}{n}}, \forall s \leq 20 log n

with probability 1 − O(n⁻¹⁰).

Proof. See Appendix 6. □

In addition, in view of Lemma 1 and the condition (14), one has

‖ H ‖ ≲ max {B log n, \sqrt{n σ^{2} log n}} < 1 / 10

(33)

with probability 1 − O(n⁻¹⁰), which together with Lemma 2 implies λ ≥ 3∥H∥. This further leads to

\begin{array}{l} \sum_{s : s \geq 20 log n} {(\frac{‖ H ‖}{λ})}^{s} & \leq \frac{‖ H ‖}{λ} \sum_{s : s \geq 20 log n - 1} {(\frac{‖ H ‖}{λ})}^{s} \leq \frac{‖ H ‖}{λ} \sum_{s : s \geq 20 log n - 1} \frac{1}{3^{s}} \\ ≲ max \{B log n, \sqrt{n σ^{2} log n}\} \cdot n^{- 10} . \end{array}

Putting the above bounds together and using the fact that λ is real-valued and λ ≥ 1/2 (cf. Lemma 2), we have

\begin{array}{l} | a^{⊤} (u - \frac{u^{⋆ ⊤} u}{λ} u^{⋆}) | \\ = | \frac{u^{⋆ ⊤} u}{λ} \sum_{s = 1}^{+ \infty} \frac{a^{⊤} H^{s} u^{⋆}}{λ^{s}} | \\ ≲ \sum_{s = 1}^{20 log n} \frac{1}{λ^{s}} | a^{⊤} H^{s} u^{⋆} | + \sum_{s = 20 log n}^{+ \infty} {(\frac{‖ H ‖}{λ})}^{s} \\ \leq \sqrt{\frac{μ}{n}} \sum_{s = 1}^{20 log n} {(2 c_{2} max {B log n, \sqrt{n σ^{2} log n}})}^{s} + \frac{max {B log n, \sqrt{n σ^{2} log n}}}{n^{10}} \\ ≲ max {B log n, \sqrt{n σ^{2} log n}} \sqrt{\frac{μ}{n}}, \end{array}

as long as max ${B log n, \sqrt{n σ^{2} log n}}$ is sufficiently small. Here, the last line also uses the fact that μ ≥ 1 (and hence $\sqrt{μ / n} ≫ n^{- 10}$ ). This concludes the proof.

5. Extension: perturbation analysis for the rank-r case.

5.1. Eigenvalue perturbation for the rank-r case.

The eigenvalue perturbation analysis in Section 4 can be extended to accommodate the case where M^⋆ is symmetric and rank-r, as detailed in this section. As before, assume that the r non-zero eigenvalues of M^⋆ obey $λ_{{max}^{⋆}} = | λ_{1}^{⋆} | \geq \dots \geq | λ_{r}^{⋆} | = λ_{{min}^{⋆}}$ . Once again, we start with a master bound.

Theorem 4 (Perturbation of linear forms of eigenvectors (rank-r)).

Consider a rank-r symmetric matrix $M^{⋆} \in ℝ^{n \times n}$ with incoherence parameter μ. Define $κ ≜ λ_{max}^{*} / λ_{min}^{*}$ . Suppose that

\frac{max \{σ \sqrt{n log n}, B log n\}}{λ_{max}^{*}} \leq \frac{c_{1}}{κ}

(34)

for some sufficiently small constants c₁ > 0. Then for any fixed unit vector $a \in ℝ^{n}$ and any 1 ≤ l ≤ r, with probability at least 1 − O(n⁻¹⁰) one has

\begin{array}{l} (35) & a^{⊤} (u_{l} - \sum_{j = 1}^{r} \frac{λ_{j}^{⋆} u_{j}^{⋆ ⊤} u_{l}}{λ_{l}} u_{j}^{⋆}) ∣ & ≲ max {σ \sqrt{n log n}, B log n} \frac{κ}{|λ_{l}|} \sqrt{\frac{μ r}{n}} \\ (36) & ≲ \frac{max {σ \sqrt{n log n}, B log n}}{λ_{max}^{*}} κ^{2} \sqrt{\frac{μ r}{n}} . \end{array}

This result allows us to control the perturbation of the linear form of eigenvectors. The perturbation upper bound grows as either the rank r or the condition number κ increases.

One of the most important consequences of Theorem 4 is a refinement of the Bauer-Fike theorem concerning eigenvalue perturbations as follows.

Corollary 5.

Consider the lth (1 ≤ l ≤ r) eigenvalue λ_l of M. Under the assumptions of Theorem 4, with probability at least 1 − O(n⁻¹⁰), there exists 1 ≤ j ≤ r such that

| λ_{l} - λ_{j}^{⋆} | ≲ max {σ \sqrt{n log n}, B log n} κ r \sqrt{\frac{μ}{n}},

(37)

provided that

\frac{max \{σ \sqrt{n log n}, B log n\}}{λ_{max}^{*}} \leq c_{1} / κ^{2}

(38)

for some sufficiently small constant c₁ > 0.

Proof. See Appendix 7. □

In comparison, the Bauer-Fike theorem (Lemma 2) together with Lemma 1 gives a perturbation bound

| λ_{l} - λ_{j}^{⋆} | \leq ‖ H ‖ ≲ max {σ \sqrt{n log n}, B log n} for some 1 \leq j \leq r .

(39)

For the low-rank case where $r ≪ \sqrt{n}$ , the eigenvalue perturbation bound derived in Corollary 5 can be much sharper than the Bauer-Fike theorem.

Another result that comes from Theorem 4 is the following bound that concerns linear forms of the eigen-subspace.

Corollary 6.

Under the same setting of Theorem 4, with probability 1 − O(n⁻⁹) we have

{‖a^{⊤} U‖}_{2} ≲ κ \sqrt{r} {‖a^{⊤} U^{⋆}‖}_{2} + \frac{max \{σ \sqrt{n log n}, B log n\}}{λ_{max}^{*}} κ^{2} r \sqrt{\frac{μ}{n}} .

(40)

Proof. See Appendix 8. □

Consequently, by taking a = e_i (1 ≤ i ≤ n) in Corollary 6, we arrive at the following statement regarding the alternative definition of the incoherence of the eigenvector matrix U (see Remark 2).

Corollary 7.

Under the same setting of Theorem 4, with probability 1 − O(n⁻⁸) we have

‖ U ‖_{2, \infty} ≲ κ r \sqrt{\frac{μ}{n}} .

(41)

Proof. Given that $‖ U ‖_{2, \infty} = {max}_{1 \leq i \leq n} {‖ e_{i}^{⊤} U ‖}_{2}$ and recalling our assumption implies ${‖ U^{⋆} ‖}_{2, \infty} \leq \sqrt{μ r / n}$ , we can invoke Corollary 6 and the union bound to derive the advertised entrywise bounds. □

Remark 7.

The eigenvector matrix is often employed to form a reasonably good initial guess for several nonconvex statistical estimation problems Keshavan, Montanari and Oh (2010), and the above kind of incoherence property is crucial in guaranteeing fast convergence of the subsequent nonconvex iterative refinement procedures Ma et al. (2019).

Unfortunately, these results fall short of providing simple perturbation bounds for the eigenvectors; in other words, the above-mentioned bounds do not imply the size of the difference between U and U^⋆. The challenge arises in part due to the lack of orthonormality of the eigenvectors when dealing with asymmetric matrices. Analyzing the eigenspace perturbation for the general rank-r case will likely require new analysis techniques, which we leave for future work. There is, however, some special case in which we can develop eigenvector perturbation theory, as detailed in the next subsection.

Remark 8.

The theory for the rank-r case has recently been significantly improved; see our follow-up work Cheng, Wei and Chen (2020) for details.

5.2. Application: spectral estimation when M^⋆ is asymmetric and rank-1.

In some scenarios, the above general rank results allow us to improve spectral estimation when M^⋆ is asymmetric. Consider the case where $M^{⋆} = λ^{⋆} u^{⋆} v^{⋆ ⊤} \in ℝ^{n_{1} \times n_{2}}$ is an asymmetric rank-1 matrix with leading singular value λ^⋆. Suppose that we observe two independent noisy copies of M^⋆, namely,

M_{1} = M^{⋆} + H_{1}, M_{2} = M^{⋆} + H_{2},

(42)

where H₁ and H₂ are independent noise matrices. The goal is to estimate the singular value and singular vectors of M^⋆ from M₁ and M₂.

We attempt estimation via the standard dilation trick (e.g. Tao (2012)). This consists of embedding the matrices of interest within a larger block matrix

M_{d}^{⋆} ≜ [\begin{matrix} 0 & M^{⋆} \\ M^{⋆ ⊤} & 0 \end{matrix}], M_{d} ≜ [\begin{matrix} 0 & M_{1} \\ M_{2}^{⊤} & 0 \end{matrix}] .

(43)

Here, we place M₁ and M₂ in two different subblocks, in order to “asymmetrize” the dilation matrix. The rationale is that $M_{d}^{⋆}$ is a rank-2 symmetric matrix with exactly two nonzero eigenvalues

λ_{1} (M_{d}^{⋆}) = λ^{⋆} and λ_{2} (M_{d}^{⋆}) = - λ^{⋆},

whose corresponding eigenvectors are given by

\frac{1}{\sqrt{2}} (\begin{array}{l} u^{⋆} \\ v^{⋆} \end{array}) and \frac{1}{\sqrt{2}} (\begin{array}{l} u^{⋆} \\ - v^{⋆} \end{array}),

respectively. This motivates us to perform eigen-decomposition of M_d, and use the top-2 eigenvalues and eigenvectors to estimate λ^⋆, u^⋆ and v^⋆, respectively.

Eigenvalue perturbation analysis.

As an immediate consequence of Corollary 5, the two leading eigenvalues of M_d provide fairly accurate estimates of the leading singular value λ^⋆ of M^⋆, as stated below.

Corollary 8.

Assume $M^{⋆} \in ℝ^{n_{1} \times n_{2}}$ is a rank-1 matrix with leading singular value λ^⋆ and incoherence parameter μ. Define $n ≜ n_{1} + n_{2}$ . Suppose that $λ_{1}^{d} \geq λ_{2}^{d}$ are the two leading eigenvalues of M_d (cf. (43)), and that H₁ and H₂ are independent and satisfy Assumption 1. Then with probability at least 1 − O(n⁻¹⁰),

max {| λ_{1}^{d} - λ^{⋆} |, | λ_{2}^{d} + λ^{⋆} |} ≲ max {σ \sqrt{n log n}, B log n} \sqrt{\frac{μ}{n}},

(44)

provided that

\frac{max {σ \sqrt{n log n}, B log n}}{λ^{⋆}} \leq c_{1}

(45)

for some sufficiently small constant c₁ > 0.

Proof. To begin with, it follows from Corollary 5 that both $λ_{1}^{d}$ and $λ_{2}^{d}$ are close to either λ^⋆ or −λ^⋆. Repeating similar arguments as in the proof of Lemma 2 (which we omit here), we can immediately show the separation between these two eigenvalues, namely, $λ_{1}^{d}$ (resp. $λ_{2}^{d}$ ) is close to λ^⋆ (resp. −λ^⋆). □

Eigenvector perturbation analysis.

We then move on to studying the eigenvector perturbation bounds. Specifically, denote by $u_{1}^{d}$ and $u_{2}^{d}$ the eigenvectors of M_d associated with its two leading eigenvalues $λ_{1}^{d}$ and $λ_{2}^{d}$ , respectively. Without loss of generality, we assume that $λ_{1}^{d} \geq λ_{2}^{d}$ . If we write

u_{1}^{dilation} = (_{u_{1, 2}^{d}}^{u_{1, 1}^{d}}) with u_{1, 1}^{d} \in ℝ^{n_{1}}, u_{1, 2}^{d} \in ℝ^{n_{2}},

then we can employ $u_{1, 1}^{d}$ and $u_{1, 2}^{d}$ to estimate u^⋆ and v^⋆ after proper normalization, namely,

u ≜ \frac{u_{1, 1}^{d}}{{‖u_{1, 1}^{d}‖}_{2}}, v ≜ \frac{u_{1, 2}^{d}}{{‖u_{1, 2}^{d}‖}_{2}} .

(46)

The following theorem develops error bounds for both u and v, which we establish in Appendix 9. Here, we denote min∥x ± y∥₂ = min{∥x − y∥₂, ∥x + y∥₂}, and min∥x ± y∥_∞ = min{∥x − y∥_∞, ∥x + y∥_∞}.

Theorem 5.

Suppose $M^{⋆} = λ^{⋆} u^{⋆} v^{⋆ ⊤} \in ℝ^{n_{1} \times n_{2}}$ is a rank-1 matrix with leading singular value λ^⋆ and incoherence parameter μ, where ∥u^⋆∥₂ = ∥v^⋆∥₂ = 1. Define $n ≜ n_{1} + n_{2}$ , and fix any unit vectors $a \in ℝ^{n_{1}}$ and $b \in ℝ^{n_{2}}$ . Then with probability at least 1 − O(n⁻¹⁰), the estimates u and v (cf. (46)) obey

max {min {‖ u \pm u^{⋆} ‖}_{2}, min {‖ v \pm v^{⋆} ‖}_{2}} ≲ \frac{max {σ \sqrt{n log n}, B log n}}{λ^{⋆}},

(47a)

max {min {‖ u \pm u^{⋆} ‖}_{\infty}, min {‖ v \pm v^{⋆} ‖}_{\infty}} ≲ \frac{max {σ \sqrt{n log n}, B log n}}{λ^{⋆}} \sqrt{\frac{μ}{n}},

(47b)

min | a^{⊤} (u \pm u^{⋆}) | ≲ (| a^{⊤} u^{⋆} | + \sqrt{\frac{μ}{n}}) \frac{max {σ \sqrt{n log n}, B log n}}{λ^{⋆}},

(47c)

min | b^{⊤} (v \pm v^{⋆}) | ≲ (| b^{⊤} v^{⋆} | + \sqrt{\frac{μ}{n}}) \frac{max {σ \sqrt{n log n}, B log n}}{λ^{⋆}},

(47d)

provided that there exists some some sufficiently small constant c₁ > 0 such that

\frac{max {σ \sqrt{n log n}, B log n}}{λ^{⋆}} \leq c_{1} .

(48)

Similar to the symmetric rank-1 case, the estimation errors of the estimates u and v are well-controlled in any deterministic direction (e.g. the entrywise errors are well-controlled). This allows us to complete the theory for the case when M^⋆ is a real-valued and rank-1 matrix.

Further, we conduct numerical experiments for matrix completion when M^⋆ is a rank-1 and asymmetric matrix in Fig. 4. Here, we suppose that at most 1 sample is observed for each entry, and we estimate the singular value and singular vectors of M^⋆ via the above-mentioned dilation trick, coupled with the asymmetrization procedure discussed in Section 10. The numerical performance confirms that the proposed technique outperforms vanilla SVD in spectral estimation.

Finally, we remark that the asymptotic behavior of the eigenvalues of asymmetric random matrices has been extensively explored in the physics literature (e.g. (Sommers et al., 1988; Khoruzhenko, 1996; Brezin and Zee, 1998; Chalker and Mehlig, 1998; Feinberg and Zee, 1997; Lytova and Tikhomirov, 2018)). Their focus, however, has largely been to pin down the asymptotic density of the eigenvalues, similar to the semi-circle law in the symmetric case. Nevertheless, a sharp perturbation bound for the leading eigenvalue — particularly for the low-rank case — is beyond their reach. A few recent papers began to explore the locations of eigenvalue outliers that fall outside the bulk predicted by the circular law Tao (2013); Rajagopalan (2015); Benaych-Georges and Rochet (2016); Bordenave and Capitaine (2016). The results reported therein either do not focus on obtaining the right convergence rate (e.g. providing only a bound like |λ − λ^⋆| = o(|λ^⋆|)) or are restricted to a special family of ground truth (e.g. the one with a diagonal block equal to identity) or i.i.d. noise. As a result, these prior results are insufficient to demonstrate the power and benefits of the eigen-decomposition method in the presence of data asymmetry.

5.3. Proof of Theorem 4.

Without loss of generality, we shall assume $λ_{max}^{*} = λ_{1}^{⋆} = 1$ throughout the proof. To begin with, Lemma 2 implies that for all 1 ≤ l ≤ r,

|λ_{l}| \geq |λ_{max}^{*}| - ‖ H ‖ > 1 / (2 κ) > ‖ H ‖

(49)

as long as ∥H∥ < 1/(2κ). In view of the Neumann trick (Theorem 2), we can derive

\begin{array}{l} (50) & |a^{⊤} u_{l} - \sum_{j = 1}^{r} \frac{λ_{j}^{⋆} u_{j}^{⋆ ⊤} u_{l}}{λ_{l}} a^{⊤} u_{j}^{⋆}| \\ = |\sum_{j = 1}^{r} \frac{λ_{j}^{⋆}}{λ_{l}} (u_{j}^{⋆ ⊤} u_{l}) \{\sum_{s = 1}^{\infty} \frac{1}{λ_{l}^{s}} a^{⊤} H^{s} u_{j}^{⋆}\}| \\ \leq (\sum_{j = 1}^{r} \frac{|λ_{j}^{⋆}|}{|λ_{l}|} |u_{j}^{⋆ ⊤} u_{l}|) \{max_{1 \leq j \leq r} \sum_{s = 1}^{\infty} \frac{1}{{|λ_{l}|}^{s}} |a^{⊤} H^{s} u_{j}^{⋆}|\} \\ \leq \sqrt{r \sum_{j = 1}^{r} {|u_{j}^{⋆ ⊤} u_{l}|}^{2}} \{max_{1 \leq j \leq r} \frac{|λ_{j}^{⋆}|}{|λ_{l}|}\} \{max_{1 \leq j \leq r} \sum_{s = 1}^{\infty} \frac{1}{{|λ_{l}|}^{s}} |a^{⊤} H^{s} u_{j}^{⋆}|\} \\ (51) & \leq \sqrt{r} \cdot \frac{1}{|λ_{l}|} \cdot \{max_{1 \leq j \leq r} \sum_{s = 1}^{\infty} \frac{1}{{|λ_{l}|}^{s}} |a^{⊤} H^{s} u_{j}^{⋆}|\}, \end{array}

where the third line follows since $\sum_{J = 1}^{R} {| u_{j}^{⋆ ⊤} u_{l} |}^{2} \leq {‖ u_{l} ‖}_{2}^{2} = 1$ , and the last inequality makes use of (49). Apply Corollary 4 to reach

\begin{array}{l} (51) & \leq \frac{\sqrt{r}}{|λ_{l}|} \sum_{s = 1}^{\infty} {(2 c_{2} κ max \{B log n, \sqrt{n σ^{2} log n}\})}^{s} \sqrt{\frac{μ}{n}} \\ ≲ \frac{κ}{|λ_{l}|} max \{B log n, \sqrt{n σ^{2} log n}\} \sqrt{\frac{μ r}{n}} \\ ≲ κ^{2} max \{B log n, \sqrt{n σ^{2} log n}\} \sqrt{\frac{μ r}{n},} \end{array}

with the proviso that |λ_l| > 1/(2κ) and $max {B log n, \sqrt{n σ^{2} log n}} \leq c_{1} / κ$ for some sufficiently small constant c₁ > 0. The condition |λ_l| > 1/(2κ) follows immediately by combining Lemma 2, Lemma 1 and the condition (34).

6. Discussions.

In this paper, we demonstrate the remarkable advantage of eigen-decomposition over SVD in the presence of asymmetric noise matrices. This is in stark contrast to conventional wisdom, which is generally not in favor of eigen-decomposition for asymmetric matrices. Our results only reflect the tip of an iceberg, and there are many outstanding issues left answered. We conclude the paper with a few future directions.

Sharper eigenvalue perturbation bounds for the rank-r case.

Our current results in Section 5 provide an eigenvalue perturbation bound on the order of $r / \sqrt{n}$ , assuming the truth is rank-r. However, numerical experiments suggest that the dependency on r might be improvable. It would be interesting to see whether further theoretical refinement is possible, e.g. whether it is possible to improve it to $O (\sqrt{r / n})$ .

Eigenvector perturbation bounds for the rank-r case.

As mentioned before, the current theory falls short of providing eigenvector perturbation bounds for the general rank-r case. The main difficulty lies in the lack of orthogonality of the eigenvectors of the observed matrix M. Nevertheless, when the size of the noise is not too large, it is possible to establish certain near-orthogonality of the eigenvectors, which might in turn lead to sharp control of eigenvector perturbation.

A challenging signal-to-noise ratio regime.

Take the rank-1 case for example: the present work focuses on the regime where $‖ H ‖ ≲ ‖M^{⋆}‖ / \sqrt{log n}$ , and it is known that spectral methods fail to yield reliable estimation if ∥H∥ ≫ ∥M^⋆∥. There is, however, a “gray” region (which includes, for example, the case with ∥H∥ ≈ ∥M^⋆∥) that has not been addressed. Developing non-asymptotic yet informative perturbation bounds for this regime is likely very challenging and requires new analysis techniques, which we leave for future investigation.

Correlated noise.

The current theoretical development relies heavily on the assumption that the noise matrix H contains independent random entries. There is no shortage of examples where the noise matrix is asymmetric but is not composed of independent entries. For instance, in blind deconvolution Li et al. (2018), the noise matrix is a sum of independent asymmetric matrices. Can we develop eigenvalue perturbation theory for this class of noise?

Statistical inference of eigenvalues and eigenvectors.

In various applications like network analysis and inference, one might be interested in determining the (asymptotic) eigenvalue and eigenvector distributions of a random data matrix, in order to produce valid confidence intervals Johnstone (2001); Bai and Yao (2008); Cai, Han and Pan (2017); Cape, Tang and Priebe (2018); Xia (2018); Bao, Ding and Wang (2018); Chen et al. (2019c). Can we use the current framework to characterize the distributions of the leading eigenvalues as well as certain linear forms of the eigenvectors of M when the noise matrix is non-symmetric?

Asymmetrization for other applications.

Given the abundant applications of spectral estimation, our findings are likely to be useful for other matrix eigenvalue problems and might extend to the tensor case Zhang and Xia (2018); Cai et al. (2019). Here, we conclude the paper with an example in covariance estimation (Baik and Silverstein, 2006; Fan, Wang and Zhong, 2018). Imagine that we observe a collection of n independent Gaussian vectors $X_{1}, \dots, X_{n} \in ℝ^{d}$ , which have mean zero and covariance matrix

Σ^{⋆} = v v^{⊤} + I_{d}

(52)

with v being a unit vector. This falls under the category of the spiked covariance model (Johnstone and Lu, 2009). One strategy to estimate the spectral norm λ^⋆ = 2 of Σ^⋆ is to look at the spectrum of the sample covariance matrix $\hat{Σ} = \frac{1}{n} \sum_{i = 1}^{n} X_{i} X_{i}^{⊤}$ . Motivated by the results of this paper, we propose an alternative strategy by looking at the following asymmetrized sample covariance matrix

{\hat{Σ}}_{asym} = \frac{2}{n} ({\sum_{i = 1}^{n / 2} Upper}_{} (X_{i} X_{i}^{⊤}) + \sum_{i = n / 2 + 1}^{n} Lower (X_{i} X_{i}^{⊤})),

(53)

where Upper(·) (resp. Lower(·)) extracts out the upper (resp. lower) triangular part of the matrix, including (resp. excluding) the diagonal entries. As can be seen from Fig. 5, the largest eigenvalue of the asymmetrized ${\hat{Σ}}_{asym}$ is much closer to the true spectral norm of Σ^⋆, compared to the largest singular value of the sample covariance matrix $\hat{Σ}$ . We leave the theoretical understanding of such findings to future investigation.

Fig 5: — Numerical experiments for the spiked covariance model, where the sample vectors are zero-mean Gaussian vectors with covariance matrix Σ^⋆ (cf. (52)). We plot |λ − λ^⋆| vs. n with d = n/10. The blue (resp. red) lines represent the average errors over 100 independent trials when λ is the leading eigenvalue of ${\hat{Σ}}_{asym}$ (resp. $\hat{Σ}$ ).

Supplementary Material

supplement

NIHMS1639565-supplement-supplement.pdf^{(1.1MB, pdf)}

Acknowledgment.

Y. Chen is supported in part by the AFOSR YIP award FA9550-19-1-0030, by the ARO grant W911NF-18-1-0303, by the ONR grant N00014-19-1-2120, by the NSF grants CCF-1907661 and IIS-1900140, and by the Princeton SEAS innovation award. J. Fan is supported in part by the NSF grants DMS-1662139 and DMS-1712591, by the ONR grant N00014-19-1-2120, and by the NIH grant 2R01-GM072611-13. C. Cheng is supported in part by the Elite Undergraduate Training Program of School of Mathematical Sciences in Peking University, and by the William R. Hewlett Stanford graduate fellowship. We thank Cong Ma for helpful discussions, and Zhou Fan for telling us an example of asymmetrizing the Gaussian matrix.

Footnotes

Supplement: Additional Proofs

(http://www.e-publications.org/ims/support/dowload/imsart-ims.zip). Additional proofs of the results in the paper can be found in the Supplementary Material.

More precisely, this gain is possible when ∥H∥ is nearly as large as ∥M^⋆∥ (up to some logarithmic factor).

In this case, one can take $B ≍ σ \sqrt{log n}$ , which clearly satisfies $B log n ≪ \sqrt{n σ^{2} log n}$ .

References.

Abbe E, Fan J, Wang K and Zhong Y (2017). Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank. under revision, Annals of Statistics. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bai Z and Yao J. f. (2008). Central limit theorems for eigenvalues in a spiked population model. In Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 44 447–474. Institut Henri Poincaré. [Google Scholar]
Baik J and Silverstein JW (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of multivariate analysis 97 1382–1408. [Google Scholar]
Bao Z, Ding X and Wang K (2018). Singular vector and singular subspace distribution for the matrix denoising model. arXiv preprint arXiv:1809.10476. [Google Scholar]
Bauer FL and Fike CT (1960). Norms and exclusion theorems. Numerische Mathematik 2 137–141. [Google Scholar]
Benaych-Georges F and Nadakuditi RR (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227 494–521. [Google Scholar]
Benaych-Georges F and Nadakuditi RR (2012). The singular values and vectors of low rank perturbations of large rectangular random matrices. Journal of Multivariate Analysis 111 120–135. [Google Scholar]
Benaych-Georges F and Rochet J (2016). Outliers in the single ring theorem. Probability Theory and Related Fields 165 313–363. [Google Scholar]
Bordenave C and Capitaine M (2016). Outlier eigenvalues for deformed iid random matrices. Communications on Pure and Applied Mathematics 69 2131–2194. [Google Scholar]
Brezin E and Zee A (1998). Non-Hermitean Delocalization: Multiple Scattering and Bounds. Nuclear Physics B 509 599–614. [Google Scholar]
Bryc W and Silverstein JW (2018). Singular values of large non-central random matrices. arXiv preprint arXiv:1802.02960. [Google Scholar]
Cai T, Han X and Pan G (2017). Limiting Laws for Divergent Spiked Eigenvalues and Largest Non-spiked Eigenvalue of Sample Covariance Matrices. arXiv preprint arXiv:1711.00217. [Google Scholar]
Cai TT and Zhang A (2018). Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. The Annals of Statistics 46 60–89. [Google Scholar]
Cai C, Li G, Chi Y, Poor HV and Chen Y (2019). Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees. arXiv preprint arXiv:1910.04267. [Google Scholar]
Candès EJ and Recht B (2009). Exact Matrix Completion via Convex Optimization. Foundations of Computational Mathematics 9 717–772. [Google Scholar]
Cape J, Tang M and Priebe CE (2018). Signal-plus-noise matrix models: eigenvector deviations and fluctuations. arXiv preprint arXiv:1802.00381. [Google Scholar]
Capitaine M, Donati-Martin C and Féral D (2009). The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. The Annals of Probability 37 1–47. [Google Scholar]
Chalker JT and Mehlig B (1998). Eigenvector statistics in non-Hermitian random matrix ensembles. Physical review letters 81 3367. [Google Scholar]
Chen Y, Chi Y, Fan J, Ma C and Yan Y (2019a). Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization. arXiv:1902.07698. [DOI] [PMC free article] [PubMed]
Chen Y, Fan J, Ma C and Wang K (2019b). Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking. Annals of Statistics 47 2204–2235. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen Y, Fan J, Ma C and Yan Y (2019c). Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences 116 22931–22937. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cheng C, Wei Y and Chen Y (2020). Inference for linear forms of eigenvectors under minimal eigenvalue separation: Asymmetry and heteroscedasticity. arXiv preprint arXiv:2001.04620. [Google Scholar]
Chi Y, Lu YM and Chen Y (2019). Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Transactions on Signal Processing 67 5239–5269. [Google Scholar]
Davis C and Kahan WM (1970). The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis 7 1–46. [Google Scholar]
Eldridge J, Belkin M and Wang Y (2018). Unperturbed: spectral analysis beyond Davis-Kahan. In Proceedings of Algorithmic Learning Theory 321–358. [Google Scholar]
Erdős L, Knowles A, Yau H-T, Yin J et al. (2013). Spectral statistics of Erdős-Rényi graphs I: local semicircle law. The Annals of Probability 41 2279–2375. [Google Scholar]
Fan J, Wang W and Zhong Y (2018). An ℓ_∞ Eigenvector Perturbation Bound and Its Application to Robust Covariance Estimation. Journal of Machine Learning 18 1–42. [PMC free article] [PubMed] [Google Scholar]
Feinberg J and Zee A (1997). Non-hermitian random matrix theory: Method of hermitian reduction. Nuclear Physics B 504 579–608. [Google Scholar]
Féral D and Péché S (2007). The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in mathematical physics 272 185–228. [Google Scholar]
Füredi Z and Komlós J (1981). The eigenvalues of random symmetric matrices. Combinatorica 1 233–241. [Google Scholar]
Jain P and Netrapalli P (2015). Fast exact matrix completion with finite samples. In Conference on Learning Theory 1007–1034. [Google Scholar]
Johnstone IM (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of statistics 295–327. [Google Scholar]
Johnstone IM and Lu AY (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association 104 682–693. [DOI] [PMC free article] [PubMed] [Google Scholar]
Keshavan RH, Montanari A and Oh S (2010). Matrix Completion From a Few Entries. IEEE Transactions on Information Theory 56 2980–2998. [Google Scholar]
Khoruzhenko B (1996). Large-N eigenvalue distribution of randomly perturbed asymmetric matrices. Journal of Physics A: Mathematical and General 29 L165. [Google Scholar]
Knowles A and Yin J (2013). The isotropic semicircle law and deformation of Wigner matrices. Communications on Pure and Applied Mathematics 66 1663–1749. [Google Scholar]
Koltchinskii V and Xia D (2016). Perturbation of linear forms of singular vectors under Gaussian noise. In High Dimensional Probability VII 397–423. Springer. [Google Scholar]
Li X, Ling S, Strohmer T and Wei K (2018). Rapid, robust, and reliable blind deconvolution via nonconvex optimization. Applied and Computational Harmonic Analysis. [Google Scholar]
Lytova A and Tikhomirov K (2018). On delocalization of eigenvectors of random non-Hermitian matrices. arXiv preprint arXiv:1810.01590. [Google Scholar]
Ma C, Wang K, Chi Y and Chen Y (2019). Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution. accepted to Foundations of Computational Mathematics. [Google Scholar]
O’Rourke S, Vu V and Wang K (2013). Random perturbation of low rank matrices: Improving classical bounds. arXiv preprint arXiv:1311.2657. [Google Scholar]
O’Rourke S, Vu V and Wang K (2016). Eigenvectors of random matrices: a survey. Journal of Combinatorial Theory, Series A 144 361–442. [Google Scholar]
Péché S (2006). The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields 134 127–173. [Google Scholar]
Rajagopalan AB (2015). Outlier eigenvalue fluctuations of perturbed iid matrices. arXiv preprint arXiv:1507.01441. [Google Scholar]
Renfrew D and Soshnikov A (2013). On finite rank deformations of Wigner matrices II: Delocalized perturbations. Random Matrices: Theory and Applications 2 1250015. [Google Scholar]
Silverstein JW (1994). The spectral radii and norms of large dimensional non-central random atrices matrices. Stochastic Models 10 525–532. [Google Scholar]
Sommers H, Crisanti A, Sompolinsky H and Stein Y (1988). Spectrum of large random asymmetric matrices. Physical review letters 60 1895. [DOI] [PubMed] [Google Scholar]
Tao T (2012). Topics in Random Matrix Theory. Graduate Studies in Mathematics. American Mathematical Society, Providence, Rhode Island. [Google Scholar]
Tao T (2013). Outliers in the spectrum of iid matrices with bounded rank perturbations. Probability Theory and Related Fields 155 231–263. [Google Scholar]
Tropp JA (2015). An Introduction to Matrix Concentration Inequalities. Found. Trends Mach. Learn 8 1–230. [Google Scholar]
Vu V (2011). Singular vectors under random perturbation. Random Structures & Algorithms 39 526–538. [Google Scholar]
Vu V and Wang K (2015). Random weighted projections, random quadratic forms and random eigenvectors. Random Structures and Algorithms 47 792–821. [Google Scholar]
Wang R (2015). Singular vector perturbation under Gaussian noise. SIAM Journal on Matrix Analysis and Applications 36 158–177. [Google Scholar]
Wedin P (1972). Perturbation bounds in connection with singular value decomposition. BIT Numerical Mathematics 12 99–111. [Google Scholar]
Xia D (2016). Statistical inference for large matrices, PhD thesis, Georgia Institute of Technology. [Google Scholar]
Xia D (2018). Confidence interval of singular vectors for high-dimensional and low-rank matrix regression. arXiv preprint arXiv:1805.09871. [Google Scholar]
Yin Y-Q, Bai Z-D and Krishnaiah PR (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probability theory and related fields 78 509–521. [Google Scholar]
Zhang A and Xia D (2018). Tensor SVD: Statistical and Computational Limits. IEEE Transactions on Information Theory 64 7311–7338. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supplement

NIHMS1639565-supplement-supplement.pdf^{(1.1MB, pdf)}

[R1] Abbe E, Fan J, Wang K and Zhong Y (2017). Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank. under revision, Annals of Statistics. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] Bai Z and Yao J. f. (2008). Central limit theorems for eigenvalues in a spiked population model. In Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 44 447–474. Institut Henri Poincaré. [Google Scholar]

[R3] Baik J and Silverstein JW (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of multivariate analysis 97 1382–1408. [Google Scholar]

[R4] Bao Z, Ding X and Wang K (2018). Singular vector and singular subspace distribution for the matrix denoising model. arXiv preprint arXiv:1809.10476. [Google Scholar]

[R5] Bauer FL and Fike CT (1960). Norms and exclusion theorems. Numerische Mathematik 2 137–141. [Google Scholar]

[R6] Benaych-Georges F and Nadakuditi RR (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227 494–521. [Google Scholar]

[R7] Benaych-Georges F and Nadakuditi RR (2012). The singular values and vectors of low rank perturbations of large rectangular random matrices. Journal of Multivariate Analysis 111 120–135. [Google Scholar]

[R8] Benaych-Georges F and Rochet J (2016). Outliers in the single ring theorem. Probability Theory and Related Fields 165 313–363. [Google Scholar]

[R9] Bordenave C and Capitaine M (2016). Outlier eigenvalues for deformed iid random matrices. Communications on Pure and Applied Mathematics 69 2131–2194. [Google Scholar]

[R10] Brezin E and Zee A (1998). Non-Hermitean Delocalization: Multiple Scattering and Bounds. Nuclear Physics B 509 599–614. [Google Scholar]

[R11] Bryc W and Silverstein JW (2018). Singular values of large non-central random matrices. arXiv preprint arXiv:1802.02960. [Google Scholar]

[R12] Cai T, Han X and Pan G (2017). Limiting Laws for Divergent Spiked Eigenvalues and Largest Non-spiked Eigenvalue of Sample Covariance Matrices. arXiv preprint arXiv:1711.00217. [Google Scholar]

[R13] Cai TT and Zhang A (2018). Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. The Annals of Statistics 46 60–89. [Google Scholar]

[R14] Cai C, Li G, Chi Y, Poor HV and Chen Y (2019). Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees. arXiv preprint arXiv:1910.04267. [Google Scholar]

[R15] Candès EJ and Recht B (2009). Exact Matrix Completion via Convex Optimization. Foundations of Computational Mathematics 9 717–772. [Google Scholar]

[R16] Cape J, Tang M and Priebe CE (2018). Signal-plus-noise matrix models: eigenvector deviations and fluctuations. arXiv preprint arXiv:1802.00381. [Google Scholar]

[R17] Capitaine M, Donati-Martin C and Féral D (2009). The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. The Annals of Probability 37 1–47. [Google Scholar]

[R18] Chalker JT and Mehlig B (1998). Eigenvector statistics in non-Hermitian random matrix ensembles. Physical review letters 81 3367. [Google Scholar]

[R19] Chen Y, Chi Y, Fan J, Ma C and Yan Y (2019a). Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization. arXiv:1902.07698. [DOI] [PMC free article] [PubMed]

[R20] Chen Y, Fan J, Ma C and Wang K (2019b). Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking. Annals of Statistics 47 2204–2235. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] Chen Y, Fan J, Ma C and Yan Y (2019c). Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences 116 22931–22937. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] Cheng C, Wei Y and Chen Y (2020). Inference for linear forms of eigenvectors under minimal eigenvalue separation: Asymmetry and heteroscedasticity. arXiv preprint arXiv:2001.04620. [Google Scholar]

[R23] Chi Y, Lu YM and Chen Y (2019). Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Transactions on Signal Processing 67 5239–5269. [Google Scholar]

[R24] Davis C and Kahan WM (1970). The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis 7 1–46. [Google Scholar]

[R25] Eldridge J, Belkin M and Wang Y (2018). Unperturbed: spectral analysis beyond Davis-Kahan. In Proceedings of Algorithmic Learning Theory 321–358. [Google Scholar]

[R26] Erdős L, Knowles A, Yau H-T, Yin J et al. (2013). Spectral statistics of Erdős-Rényi graphs I: local semicircle law. The Annals of Probability 41 2279–2375. [Google Scholar]

[R27] Fan J, Wang W and Zhong Y (2018). An ℓ_∞ Eigenvector Perturbation Bound and Its Application to Robust Covariance Estimation. Journal of Machine Learning 18 1–42. [PMC free article] [PubMed] [Google Scholar]

[R28] Feinberg J and Zee A (1997). Non-hermitian random matrix theory: Method of hermitian reduction. Nuclear Physics B 504 579–608. [Google Scholar]

[R29] Féral D and Péché S (2007). The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in mathematical physics 272 185–228. [Google Scholar]

[R30] Füredi Z and Komlós J (1981). The eigenvalues of random symmetric matrices. Combinatorica 1 233–241. [Google Scholar]

[R31] Jain P and Netrapalli P (2015). Fast exact matrix completion with finite samples. In Conference on Learning Theory 1007–1034. [Google Scholar]

[R32] Johnstone IM (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of statistics 295–327. [Google Scholar]

[R33] Johnstone IM and Lu AY (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association 104 682–693. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] Keshavan RH, Montanari A and Oh S (2010). Matrix Completion From a Few Entries. IEEE Transactions on Information Theory 56 2980–2998. [Google Scholar]

[R35] Khoruzhenko B (1996). Large-N eigenvalue distribution of randomly perturbed asymmetric matrices. Journal of Physics A: Mathematical and General 29 L165. [Google Scholar]

[R36] Knowles A and Yin J (2013). The isotropic semicircle law and deformation of Wigner matrices. Communications on Pure and Applied Mathematics 66 1663–1749. [Google Scholar]

[R37] Koltchinskii V and Xia D (2016). Perturbation of linear forms of singular vectors under Gaussian noise. In High Dimensional Probability VII 397–423. Springer. [Google Scholar]

[R38] Li X, Ling S, Strohmer T and Wei K (2018). Rapid, robust, and reliable blind deconvolution via nonconvex optimization. Applied and Computational Harmonic Analysis. [Google Scholar]

[R39] Lytova A and Tikhomirov K (2018). On delocalization of eigenvectors of random non-Hermitian matrices. arXiv preprint arXiv:1810.01590. [Google Scholar]

[R40] Ma C, Wang K, Chi Y and Chen Y (2019). Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution. accepted to Foundations of Computational Mathematics. [Google Scholar]

[R41] O’Rourke S, Vu V and Wang K (2013). Random perturbation of low rank matrices: Improving classical bounds. arXiv preprint arXiv:1311.2657. [Google Scholar]

[R42] O’Rourke S, Vu V and Wang K (2016). Eigenvectors of random matrices: a survey. Journal of Combinatorial Theory, Series A 144 361–442. [Google Scholar]

[R43] Péché S (2006). The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields 134 127–173. [Google Scholar]

[R44] Rajagopalan AB (2015). Outlier eigenvalue fluctuations of perturbed iid matrices. arXiv preprint arXiv:1507.01441. [Google Scholar]

[R45] Renfrew D and Soshnikov A (2013). On finite rank deformations of Wigner matrices II: Delocalized perturbations. Random Matrices: Theory and Applications 2 1250015. [Google Scholar]

[R46] Silverstein JW (1994). The spectral radii and norms of large dimensional non-central random atrices matrices. Stochastic Models 10 525–532. [Google Scholar]

[R47] Sommers H, Crisanti A, Sompolinsky H and Stein Y (1988). Spectrum of large random asymmetric matrices. Physical review letters 60 1895. [DOI] [PubMed] [Google Scholar]

[R48] Tao T (2012). Topics in Random Matrix Theory. Graduate Studies in Mathematics. American Mathematical Society, Providence, Rhode Island. [Google Scholar]

[R49] Tao T (2013). Outliers in the spectrum of iid matrices with bounded rank perturbations. Probability Theory and Related Fields 155 231–263. [Google Scholar]

[R50] Tropp JA (2015). An Introduction to Matrix Concentration Inequalities. Found. Trends Mach. Learn 8 1–230. [Google Scholar]

[R51] Vu V (2011). Singular vectors under random perturbation. Random Structures & Algorithms 39 526–538. [Google Scholar]

[R52] Vu V and Wang K (2015). Random weighted projections, random quadratic forms and random eigenvectors. Random Structures and Algorithms 47 792–821. [Google Scholar]

[R53] Wang R (2015). Singular vector perturbation under Gaussian noise. SIAM Journal on Matrix Analysis and Applications 36 158–177. [Google Scholar]

[R54] Wedin P (1972). Perturbation bounds in connection with singular value decomposition. BIT Numerical Mathematics 12 99–111. [Google Scholar]

[R55] Xia D (2016). Statistical inference for large matrices, PhD thesis, Georgia Institute of Technology. [Google Scholar]

[R56] Xia D (2018). Confidence interval of singular vectors for high-dimensional and low-rank matrix regression. arXiv preprint arXiv:1805.09871. [Google Scholar]

[R57] Yin Y-Q, Bai Z-D and Krishnaiah PR (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probability theory and related fields 78 509–521. [Google Scholar]

[R58] Zhang A and Xia D (2018). Tensor SVD: Statistical and Computational Limits. IEEE Transactions on Information Theory 64 7311–7338. [Google Scholar]

PERMALINK

ASYMMETRY HELPS: EIGENVALUE AND EIGENVECTOR ANALYSES OF ASYMMETRICALLY PERTURBED LOW-RANK MATRICES

Yuxin Chen

Chen Cheng

Jianqing Fan

Abstract

1. Introduction.

Fig 1:

2. Problem formulation.

2.1. Models and assumptions.

Assumption 1.

Remark 1 (Notational convention).

Lemma 1.

2.2. Our goal.

2.3. Incoherence conditions.

Definition 1 (Incoherence parameter).

Remark 2.

2.4. Notation.

3. Preliminaries.

3.1. Perturbation of eigenvalues of asymmetric matrices.

Theorem 1 (Bauer-Fike Theorem).

Lemma 2.

3.2. The Neumann trick and eigenvector perturbation.

Theorem 2 (Neumann trick).

Remark 3.

Lemma 3.

4. Perturbation analysis for the rank-1 case.

4.1. Main results: the rank-1 case.

4.1.1. A master bound.

Theorem 3 (Perturbation of linear forms of eigenvectors (rank-1)).

Remark 4 (The noise size).

4.1.2. Eigenvalue perturbation.

Corollary 1.

Lemma 4.

4.1.3. Perturbation of linear forms of eigenvectors.

Corollary 2.

Remark 5.

4.1.4. Entrywise eigenvector perturbation.

Corollary 3.

4.2. Applications.

Low-rank matrix estimation from Gaussian noise.

Fig 2:

Low-rank matrix completion.

Fig 3:

4.3. Why asymmetry helps?.

4.4. Proof outline of Theorem 3.

Lemma 5 (Bounding higher-order terms).

Remark 6.

Corollary 4.

5. Extension: perturbation analysis for the rank-r case.

5.1. Eigenvalue perturbation for the rank-r case.

Theorem 4 (Perturbation of linear forms of eigenvectors (rank-r)).

Corollary 5.

Corollary 6.

Corollary 7.

Remark 7.

Remark 8.

5.2. Application: spectral estimation when M⋆ is asymmetric and rank-1.

Eigenvalue perturbation analysis.

Corollary 8.

Eigenvector perturbation analysis.

Theorem 5.

Fig 4:

5.3. Proof of Theorem 4.

6. Discussions.

Sharper eigenvalue perturbation bounds for the rank-r case.

Eigenvector perturbation bounds for the rank-r case.

A challenging signal-to-noise ratio regime.

Correlated noise.

Statistical inference of eigenvalues and eigenvectors.

Asymmetrization for other applications.

Fig 5:

Supplementary Material

Acknowledgment.

Footnotes

References.

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

5.2. Application: spectral estimation when M^⋆ is asymmetric and rank-1.