Author manuscript; available in PMC: 2023 Jan 1.
Published in final edited form as: J Am Stat Assoc. 2020 Dec 8;117(538):996–1009. doi: 10.1080/01621459.2020.1840990

Asymptotic Theory of Eigenvectors for Random Matrices with Diverging Spikes

Jianqing Fan 1, Yingying Fan 2, Xiao Han 2, Jinchi Lv 2
PMCID: PMC9438751  NIHMSID: NIHMS1683727  PMID: 36060554

Abstract

Characterizing the asymptotic distributions of eigenvectors for large random matrices poses important challenges yet can provide useful insights into a range of statistical applications. To this end, in this paper we introduce a general framework of asymptotic theory of eigenvectors (ATE) for large spiked random matrices with diverging spikes and heterogeneous variances, and establish the asymptotic properties of the spiked eigenvectors and eigenvalues for the scenario of the generalized Wigner matrix noise. Under some mild regularity conditions, we provide the asymptotic expansions for the spiked eigenvalues and show that they are asymptotically normal after some normalization. For the spiked eigenvectors, we establish asymptotic expansions for the general linear combination and further show that it is asymptotically normal after some normalization, where the weight vector can be arbitrary. We also provide a more general asymptotic theory for the spiked eigenvectors using the bilinear form. Simulation studies verify the validity of our new theoretical results. Our family of models encompasses many popularly used ones such as the stochastic block models with or without overlapping communities for network analysis and the topic models for text analysis, and our general theory can be exploited for statistical inference in these large-scale applications.

Keywords: Random matrix theory, Generalized Wigner matrix, Low-rank matrix, Eigenvectors, Spiked eigenvalues, Asymptotic distributions, Asymptotic normality, High dimensionality, Networks and texts

1. Introduction

The big data era has brought us a tremendous amount of both structured and unstructured data including networks and texts in many modern applications. For network and text data, we are often interested in learning the cluster and other structural information for the underlying network communities and text topics. In these large-scale applications, we are given a network data matrix or can create such a matrix by calculating some similarity measure between text documents, where each entry of the data matrix is binary indicating the absence or presence of a link, or continuous indicating the strength of similarity between each pair of nodes or documents. Such applications naturally give rise to random matrices that can be used to reveal interesting latent structures of networks and texts for effective predictions and recommendations.

Random matrices have been widely exploited to model the interactions among the nodes of a network for applications ranging from physics and the social sciences to genomics and neuroscience. Random matrix theory (RMT) has a long history and originated with Wigner (1955), who modeled the nucleon-nucleus interactions to understand the behavior of atomic nuclei and linked the spacings of the levels of atomic nuclei to those of the eigenvalues of a random matrix. See, for example, Bai (1999) for a review of some classical technical tools such as the moment method and the Stieltjes transform as well as some more recent developments on the RMT, and Mehta (2004), Tao (2004), and Bai and Silverstein (2006) for detailed book-length accounts of the topic of random matrices.

There is a rich recent literature in mathematics on the asymptotic behaviors of eigenvalues and eigenvectors of random matrices (Erdős et al., 2013; Bourgade et al., 2018; Bourgade and Yau, 2017; Rudelson and Vershynin, 2016; Dekel et al., 2007). The main challenge in many RMT problems is caused by the strong dependence of eigenvalues when they are close to each other. Using the terminology of RMT, four regimes are often of interest: the bulk, subcritical edge, critical edge, and supercritical regimes. The first three regimes all have eigenvalues that are highly correlated with each other, while the last regime has weaker interactions among the eigenvalues. The last regime can be further divided into two categories according to the relative strength of the spiked eigenvalues compared to the noise, which can be roughly understood as the signal-to-noise ratio. There have been exciting developments in the recent mathematical literature for the case when the smallest spiked eigenvalue has the same order as the noise (Capitaine and Donati-Martin, 2018; Knowles and Yin, 2013; Bao et al., 2018). Due to the challenge caused by the constant signal-to-noise ratio, these existing results often take complicated forms and the asymptotic distributions generally depend on the noise matrix distribution in a complex way, limiting their practical use for statisticians. In this paper, we consider the setting of diverging spikes where the spiked eigenvalues are an order of magnitude larger than the noise level asymptotically. Although mathematically easier, such random matrices are of great interest to statisticians, because many statistical applications such as network analysis and text analysis fall into this regime. Yet formal results on the asymptotic expansions and asymptotic distributions of spiked eigenvectors are lacking even in this setting. This motivates our study in this paper.

There is a larger literature on the limiting distributions of eigenvalues than eigenvectors in RMT. For instance, the limiting spectral distribution of the Wigner matrix was generalized by Arnold (1967) and Arnold (1971). Marchenko and Pastur (1967) established the well-known Marchenko–Pastur law for the limiting spectral distribution of the sample covariance matrix including the Wishart matrix which plays an important role in statistical applications. In contrast, the asymptotic distribution of the largest nonspiked eigenvalue of Wigner matrix with Gaussian ensemble was revealed to be the Tracy–Widom law in Tracy and Widom (1994) and Tracy and Widom (1996). More recent developments on the asymptotic distribution of the largest nonspiked eigenvalue include Johnstone (2001), El Karoui (2007), Johnstone (2008), Erdös et al. (2011), and Knowles and Yin (2017). See also Füredi and Komlós (1981), Baik et al. (2005), Bai and Yao (2008), Knowles and Yin (2013), Pizzo et al. (2013), Renfrew and Soshnikov (2013), Knowles and Yin (2014), and Wang and Fan (2017) for the asymptotic distributions of the spiked eigenvalues of various random matrices and sample covariance matrices. For the eigenvectors, Capitaine and Donati-Martin (2018) and Bao et al. (2018) established their asymptotic distributions, which depend on the specific distribution of the Wigner matrix in a complicated way, in the challenging setting of constant signal-to-noise ratio. There is also a growing literature on the specific scenario and applications of large network matrices. To ensure consistency, Johnstone and Lu (2009) proposed the sparse principal component analysis to reduce the noise accumulation in high-dimensional random matrices. See, for example, McSherry (2001), Spielman and Teng (2007), Bickel and Chen (2009), Decelle et al. (2011), Rohe et al. (2011), Lei (2016), Abbe (2017), Jin et al. (2017), Chen and Lei (2018), and Vu (2018).

Matrix perturbation theory has been commonly used to characterize the deviations of empirical eigenvectors from the population ones, often under average errors (Horn and Johnson, 2012). In contrast, recently Fan et al. (2018) and Abbe et al. (2019) investigated random matrices with low expected rank and provided a tight bound for the difference between the empirical eigenvector and some linear transformation of the population eigenvector through a delicate entrywise eigenvector analysis for the first-order approximation under the maximum norm. See also Paul (2007), Koltchinskii and Lounici (2016), Koltchinskii and Xia (2016), and Wang and Fan (2017) for the asymptotics of the empirical eigenstructure of large random matrices. Yet despite these endeavors, the precise asymptotic distributions of the eigenvectors for large spiked random matrices still remain largely unknown even for the case of Wigner matrix noise. Indeed, characterizing the exact asymptotic distributions of eigenvectors in such a setting can provide useful insights into a range of statistical applications that involve the eigenspaces. In this sense, the asymptotic expansions and asymptotic distributions of eigenvectors established in this paper complement the existing work in the statistics literature.

The major contribution of this paper is introducing a general framework of asymptotic theory of eigenvectors (ATE) for large spiked random matrices with diverging spikes, where the mean matrix is low-rank and the noise matrix is the generalized Wigner matrix. The generalized Wigner matrix refers to a symmetric random matrix whose diagonal and upper diagonal entries are independent with zero mean, allowing for heterogeneous variances. Our family of models includes a variety of popularly used ones such as the stochastic block models with or without overlapping communities for network analysis and the topic models for text analysis. Under some mild regularity conditions, we establish the asymptotic expansions for the spiked eigenvalues and prove that they are asymptotically normal after some normalization. For the spiked eigenvectors, we provide asymptotic expansions for the general linear combination and further establish that it is asymptotically normal after some normalization for arbitrary weight vector. We also present a more general asymptotic theory for the spiked eigenvectors based on the bilinear form. To the best of our knowledge, these theoretical results are new to the literature. Our general theory can be exploited for statistical inference in a range of large-scale applications including network analysis and text analysis. For detailed comparisons with the literature, see Section 3.6.

The rest of the paper is organized as follows. Section 2 presents the model setting and theoretical setup for ATE. We establish the asymptotic expansions and asymptotic distributions for the spiked eigenvectors as well as the asymptotic distributions for the spiked eigenvalues in Section 3. Several specific statistical applications of our new asymptotic theory are discussed in Section 4. Section 5 presents some numerical examples to demonstrate our theoretical results. We further provide a more general asymptotic theory extending the results from Section 3 using the bilinear form in Section 6. Section 7 discusses some implications and extensions of our work. The proofs of main results are relegated to the Appendix. Additional technical details are provided in the Supplementary Material.

2. Model setting and theoretical setup

2.1. Model setting

As mentioned in the Introduction, we focus on the class of large spiked symmetric random matrices with low-rank mean matrices and generalized Wigner noise matrices. It is worth mentioning that our definition of the generalized Wigner matrix specified in Section 1 is broader than the conventional one in the classical RMT literature; see, for example, Yau (2012) for the formal mathematical definition with additional assumptions. To simplify the technical presentation, consider an $n \times n$ symmetric random matrix with the following structure

$$X = H + W, \qquad (1)$$

where $H = VDV^T$ is a deterministic latent mean matrix of low-rank structure, $V = (v_1, \cdots, v_K)$ is an $n \times K$ orthonormal matrix of population eigenvectors $v_k$'s with $V^TV = I_K$, $D = \mathrm{diag}(d_1, \cdots, d_K)$ is a diagonal matrix of population eigenvalues $d_k$'s with $|d_1| \ge \cdots \ge |d_K| > 0$, and $W = (w_{ij})_{1\le i,j\le n}$ is a symmetric random matrix of independent noises on and above the diagonal with zero mean $\mathbb{E}w_{ij} = 0$, variances $\sigma_{ij}^2 = \mathbb{E}w_{ij}^2$, and $\max_{1\le i,j\le n}|w_{ij}| \le 1$. The rank $K$ of the mean part is typically assumed to be of a smaller order than the random matrix size $n$, which is referred to as the matrix dimensionality hereafter for convenience. The boundedness assumption on $w_{ij}$ is made frequently for technical simplification and is satisfied in many real applications such as network analysis and text analysis. It can be relaxed to $\mathbb{E}|w_{ij}|^l \le C^{l-2}\mathbb{E}|w_{ij}|^2$ for $l \ge 2$, with $C$ some positive constant, and all the proofs and results carry through.
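To fix ideas, the following minimal Python sketch (our own illustration, not code from the paper) simulates one draw from model (1); the spike sizes and the variance profile are arbitrary choices made for the example.

```python
import numpy as np

# Minimal sketch of model (1): X = H + W with low-rank mean H = V D V^T and
# a generalized Wigner noise matrix with heterogeneous variances sigma_ij^2.
rng = np.random.default_rng(0)
n, K = 500, 2

V, _ = np.linalg.qr(rng.standard_normal((n, K)))   # orthonormal eigenvectors
D = np.diag([0.8 * n, 0.5 * n])                    # diverging spikes (arbitrary)
H = V @ D @ V.T

sigma = rng.uniform(0.2, 0.5, size=(n, n))         # arbitrary variance profile
sigma = np.triu(sigma) + np.triu(sigma, 1).T       # symmetrize sigma_ij
W = sigma * rng.choice([-1.0, 1.0], size=(n, n))   # scaled Rademacher noise
W = np.triu(W) + np.triu(W, 1).T                   # symmetric, |w_ij| <= 1
X = H + W
```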

In practice, it is either matrix X or matrix X – diag(X) that is readily available to us, where diag(·) denotes the diagonal part of a matrix. In the context of graphs, random matrix X characterizes the connectivity structure of a graph with self loops, while random matrix X–diag(X) corresponds to a graph without self loops. In the latter case, the observed data matrix can be decomposed as

$$X - \mathrm{diag}(X) = H + [W - \mathrm{diag}(X)]. \qquad (2)$$

Observe that $W - \mathrm{diag}(X)$ has a similar structure to $W$ in the sense of being symmetric and having bounded independent entries on and above the diagonal, provided that $\mathrm{diag}(X)$ has bounded entries in such a case. Thus models (1) and (2) share the same decomposition of a deterministic low-rank matrix plus a symmetric noise matrix with bounded entries, which is roughly all we need for the theoretical framework and technical analysis. For these reasons, to simplify the technical presentation we slightly abuse the notation by using $X$ and $W$ to represent the observed data matrix and the latent noise matrix, respectively, in either model (1) or model (2). Therefore, throughout the paper the data matrix $X$ may have diagonal entries all equal to zero and correspondingly the noise matrix $W$ may have a nonzero diagonal mean matrix, and our theory covers both cases.

In either of the two scenarios discussed above, we are interested in inferring the structural information in models (1) and (2), which often boils down to the latent eigenstructure $(D, V)$. Since both the eigenvector matrix $V$ and the eigenvalue matrix $D$ are unavailable to us, we resort to the observable random data matrix $X$ for extracting the structural information. To this end, we conduct a spectral decomposition of $X$, and denote by $\lambda_1, \cdots, \lambda_n$ its eigenvalues and $\hat{v}_1, \cdots, \hat{v}_n$ the corresponding eigenvectors. Without loss of generality, assume that $|\lambda_1| \ge \cdots \ge |\lambda_n|$ and denote by $\widehat{V} = (\hat{v}_1, \cdots, \hat{v}_K)$ an $n \times K$ matrix of spiked eigenvectors. As mentioned before, we aim at investigating the precise asymptotic behavior of the spiked empirical eigenvalues $\lambda_1, \cdots, \lambda_K$ and the spiked empirical eigenvectors $\hat{v}_1, \cdots, \hat{v}_K$ of the data matrix $X$. It is worth mentioning that our definition of spikedness differs from the conventional one in that the underlying rank order depends on the magnitude of the eigenvalues instead of the nonnegative eigenvalues that are usually assumed.

One concrete example is the stochastic block model (SBM), where the latent mean matrix $H$ takes the form $H = \Pi P \Pi^T$ with $\Pi = (\pi_1, \cdots, \pi_n)^T \in \mathbb{R}^{n\times K}$ a matrix of community membership vectors and $P = (p_{kl}) \in \mathbb{R}^{K\times K}$ a nonsingular matrix with $p_{kl} \in [0, 1]$ for $1 \le k, l \le K$. Here, for each $1 \le i \le n$, $\pi_i \in \{e_1, \cdots, e_K\}$ with $e_j \in \mathbb{R}^K$, $1 \le j \le K$, the unit vector with the $j$th component being one and all other components being zero. It is well known that the community information of the SBM is encoded completely in the eigenstructure of the mean matrix $H$, which serves as one of our motivations for investigating the precise asymptotic distributions of the empirical eigenvectors and eigenvalues.
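As a quick illustration of how the community information sits in the eigenstructure of $H$, a small sketch (our own, with hypothetical probability values) is given below; the rows of the top-$K$ eigenvector matrix of $H$ are constant within each community.

```python
import numpy as np

# Illustrative sketch of the SBM mean matrix H = Pi P Pi^T (our own example).
rng = np.random.default_rng(1)
n, K = 300, 3
labels = rng.integers(0, K, size=n)             # community of each node
Pi = np.eye(K)[labels]                          # n x K membership matrix, rows e_j
P = 0.1 * np.ones((K, K)) + 0.2 * np.eye(K)     # within > between probabilities
H = Pi @ P @ Pi.T

# The community structure is encoded in the top-K eigenvectors of H:
# rows of the eigenvector matrix are constant within each community.
vals, vecs = np.linalg.eigh(H)
top = vecs[:, np.argsort(-np.abs(vals))[:K]]
```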

2.2. Theoretical setup

We first introduce some notation that will be used throughout the paper. We use $a \ll b$ to represent $a/b \to 0$ as the matrix size $n$ increases. We say that an event $E_n$ holds with significant probability if $\mathbb{P}(E_n) = 1 - O(n^{-l})$ for some positive constant $l$ and sufficiently large $n$. For a matrix $A$, we use $\lambda_j(A)$ to denote its $j$th largest eigenvalue in magnitude, and $\|A\|_F$, $\|A\|$, and $\|A\|_\infty$ to denote the Frobenius norm, the spectral norm, and the matrix entrywise maximum norm, respectively. Denote by $A_{-k}$ the submatrix of $A$ formed by removing the $k$th column. For any $n$-dimensional unit vector $x = (x_1, \cdots, x_n)^T$, let $d_x = \|x\|_\infty$ represent the maximum norm of the vector.

We next introduce a definition that plays a key role in proving all asymptotic normality results in this paper.

Definition 1.

A pair of unit vectors $(x, y)$ of appropriate dimensions is said to satisfy the $W^l$-CLT condition for some positive integer $l$ if $x^T(W^l - \mathbb{E}W^l)y$ is asymptotically standard normal after some normalization, where CLT refers to the central limit theorem.

Lemmas 1 and 2 below provide some sufficient conditions under which (x, y) can satisfy the Wl-CLT condition defined in Definition 1 for l = 1 and 2, which is all we need for our technical analysis of asymptotic distributions. In this paper, we apply these lemmas with either x or y equal to vk. Therefore, a sufficient condition for the results in our paper is that ‖vk is small enough.

Lemma 1.

Assume that n-dimensional unit vectors x and y satisfy

$$\|x\|_\infty\|y\|_\infty \ll [\mathrm{var}(x^TWy)]^{1/2} = s_n. \qquad (3)$$

Then $x^TWy$ satisfies the Lyapunov condition for the CLT and we have $(x^TWy - \mathbb{E}x^TWy)/s_n \xrightarrow{\mathcal{D}} N(0,1)$ as $n \to \infty$, which entails that $(x, y)$ satisfies the $W^l$-CLT condition with $l = 1$.

To introduce the $W^2$-CLT condition, for any given unit vectors $x = (x_1, \cdots, x_n)^T$ and $y = (y_1, \cdots, y_n)^T$, we denote by $s_{x,y}^2$ and $\kappa_{x,y}$ the mean and the variance, respectively, of the random variable

$$\begin{aligned}
&\sum_{\substack{1\le k,i\le n\\ k\ge i}}\sigma_{ki}^2\Big[\sum_{1\le l<k\le n}w_{il}(x_ky_l + y_kx_l) + \sum_{1\le l<i\le n}w_{kl}(x_iy_l + y_ix_l) + (1-\delta_{ki})\,\mathbb{E}w_{ii}(x_iy_k + x_ky_i)\Big]^2 \\
&\quad + 2\sum_{\substack{1\le k,i\le n\\ k\ge i}}\gamma_{ki}(x_ky_k + x_iy_i)\times\Big[\sum_{1\le l<k\le n}w_{il}(x_ky_l + y_kx_l) + \sum_{1\le l<i\le n}w_{kl}(x_iy_l + y_ix_l) + (1-\delta_{ki})\,\mathbb{E}w_{ii}(x_iy_k + x_ky_i)\Big] \\
&\quad + \sum_{\substack{1\le k,i\le n\\ k\ge i}}\kappa_{ki}(x_ky_k + x_iy_i)^2, \qquad (4)
\end{aligned}$$

where $\gamma_{ki} = \mathbb{E}w_{ki}^3$ and $\kappa_{ki} = \mathbb{E}(w_{ki}^2 - \sigma_{ki}^2)^2$ for $k \ne i$, $\gamma_{kk} = 2(\mathbb{E}\omega_{kk}^3 - \sigma_{kk}^2\mathbb{E}\omega_{kk})$, $\kappa_{kk} = 4\mathbb{E}(\omega_{kk}^2 - \sigma_{kk}^2)^2$ with $\omega_{kk} = 2^{-1}w_{kk}$, $\sigma_{kk}^2 = \mathbb{E}\omega_{kk}^2$, and $\delta_{ki} = 1$ when $k = i$ and 0 otherwise. It is worth mentioning that the random variable given in (4) coincides with the one defined in (A.7) in Section B.2 of the Supplementary Material, which is simply the conditional variance of the random variable $x^T(W^2 - \mathbb{E}W^2)y$ given in (A.5) when expressed as a sum of martingale differences with respect to a suitably defined $\sigma$-algebra; see Section B.2 for more technical details and the precise expressions for $s_{x,y}^2$ and $\kappa_{x,y}$ given in (A.8) and (A.9), respectively.

Lemma 2.

Assume that $n$-dimensional unit vectors $x$ and $y$ satisfy $\|x\|_\infty\|y\|_\infty \to 0$, $\kappa_{x,y}^{1/4} \ll s_{x,y}$, and $s_{x,y} \to \infty$. Then we have $[x^T(W^2 - \mathbb{E}W^2)y]/s_{x,y} \xrightarrow{\mathcal{D}} N(0,1)$ as $n \to \infty$, which entails that $(x, y)$ satisfies the $W^2$-CLT condition.

Remark 1.

To provide more insights into the conditions of Lemmas 1 and 2, we discuss the special case of the standard Wigner matrix where $\sigma_{ij}^2 = p(1-p)$ with $p$ the expected value of the entries of $X$. Then $s_n^2 := \mathrm{var}(x^TWy) \in [p(1-p), 2p(1-p)]$ and condition (3) in Lemma 1 reduces to

$$\|x\|_\infty\|y\|_\infty \ll [\mathrm{var}(x^TWy)]^{1/2} \sim [p(1-p)]^{1/2}.$$

Moreover, (A.13) in the Supplementary Material ensures that Lemma 2 holds under the following sufficient conditions

$$\|x\|_\infty\|y\|_\infty \to 0, \quad n^{3/2}p(1-p)\|x\|_\infty^2\|y\|_\infty^2 \to 0, \quad\text{and}\quad np(1-p) \to \infty. \qquad (5)$$

Thus if either $\|x\|_\infty$ or $\|y\|_\infty$ is small enough, both lemmas hold. Indeed, in this scenario direct calculations show that $s_{x,y}^2 \sim np(1-p)$.
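A direct Monte Carlo check of this special case is straightforward; the sketch below (our own illustration) draws repeated centered Bernoulli Wigner matrices and inspects the empirical distribution of $x^TWy$.

```python
import numpy as np

# Quick Monte Carlo check of Lemma 1 in the standard Wigner case (our own
# illustration): with sigma_ij^2 = p(1-p) and delocalized unit vectors x, y,
# x^T W y should be approximately normal with sd close to s_n.
rng = np.random.default_rng(2)
n, p, reps = 400, 0.3, 2000
x = np.ones(n) / np.sqrt(n)                 # ||x||_inf = n^{-1/2}, delocalized
y = rng.standard_normal(n); y /= np.linalg.norm(y)

stats = np.empty(reps)
for r in range(reps):
    B = (rng.random((n, n)) < p).astype(float) - p   # centered Bernoulli(p)
    W = np.triu(B) + np.triu(B, 1).T                 # symmetric noise
    stats[r] = x @ W @ y
print(stats.mean(), stats.std())   # mean ~ 0, sd ~ sqrt(p(1-p)) roughly
```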

We see from Lemmas 1 and 2 that the $W^l$-CLT condition defined in Definition 1 can indeed be satisfied under some mild regularity conditions. In particular, Definition 1 is important to our technical analysis: to establish the asymptotic normality of the spiked eigenvectors and spiked eigenvalues, we first expand the target into the form $x^T(W^l - \mathbb{E}W^l)y$ with $l$ some positive integer plus some smaller order term, and then the asymptotic normality follows naturally if $(x, y)$ satisfies the $W^l$-CLT condition. To facilitate our technical presentation, let us introduce some further notation. For any $t \ne 0$ and given matrices $M_1$ and $M_2$ of appropriate dimensions, we define the function

$$R(M_1, M_2, t) = -\sum_{l=0,\, l\ne 1}^{L} t^{-(l+1)}M_1^T\mathbb{E}W^lM_2, \qquad (6)$$

where $L$ is some sufficiently large positive integer that will be specified later in our technical analysis. For each $1 \le k \le K$, any given matrices $M_1$ and $M_2$ of appropriate dimensions, and $n$-dimensional vector $u$, we further define the functions

$$P(M_1, M_2, t) = tR(M_1, M_2, t), \qquad \widetilde{P}_{k,t} = \big[t^2\big(A_{v_k,k,t}/t\big)'\big]^{-1}, \qquad (7)$$
$$b_{u,k,t} = u - V_{-k}\big[(D_{-k})^{-1} + R(V_{-k}, V_{-k}, t)\big]^{-1}R(u, V_{-k}, t)^T, \qquad (8)$$

where $D_{-k}$ denotes the submatrix of the diagonal matrix $D$ obtained by removing the $k$th row and $k$th column,

$$A_{u,k,t} = P(u, v_k, t) - P(u, V_{-k}, t)\big[t(D_{-k})^{-1} + P(V_{-k}, V_{-k}, t)\big]^{-1}P(V_{-k}, v_k, t), \qquad (9)$$

(·)′ denotes the derivative with respect to scalar t or complex variable z throughout the paper, and the rest of notation is the same as introduced before.
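In simulations the population quantity $\mathbb{E}W^l$ entering (6) is either known in closed form or can be approximated by averaging over independent copies of $W$. The sketch below is our own hypothetical helper along these lines; the sampler `draw_W`, the truncation `L = 4`, and the Monte Carlo size are assumptions made for illustration.

```python
import numpy as np

# Sketch (ours, hypothetical) of the truncated series R(M1, M2, t) in (6),
# approximating E W^l by averaging over Monte Carlo copies of W drawn from a
# user-supplied sampler draw_W(); L = 4 matches the simulations in Section 5.
def R_func(M1, M2, t, draw_W, L=4, reps=200):
    n = M1.shape[0]
    EW = [np.zeros((n, n)) for _ in range(L + 1)]
    for _ in range(reps):
        W = draw_W()
        Wl = np.eye(n)
        for l in range(L + 1):
            EW[l] += Wl / reps        # running average of W^l
            Wl = Wl @ W
    # R(M1, M2, t) = - sum_{l=0, l != 1}^{L} t^{-(l+1)} M1^T E[W^l] M2
    return -sum(t ** (-(l + 1)) * M1.T @ EW[l] @ M2
                for l in range(L + 1) if l != 1)
```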

3. Asymptotic distributions of spiked eigenvectors

3.1. Technical conditions

To facilitate our technical analysis, we need some basic regularity conditions.

Condition 1.

Assume that $\alpha_n = \|\mathbb{E}(W - \mathbb{E}W)^2\|^{1/2} \to \infty$ as $n \to \infty$.

Condition 2.

There exists a positive constant $c_0 < 1$ such that $\min\{|d_i|/|d_j| : 1 \le i < j \le K+1,\, d_i \ne -d_j\} \ge 1 + c_0$. In addition, either of the following two conditions holds:

  1. $|d_K|/(n^{\epsilon}\alpha_n) \to \infty$ with some small positive constant $\epsilon$,

  2. $\max_{i,j}\mathrm{var}(w_{ij}) \le (c_1^2\alpha_n^2)/n$ and $|d_K| > c\sqrt{n\log n}$ with some constants $c_1 \ge 1$ and $c > 4c_1(1 + 2^{-1}c_0)$.

Condition 3.

It holds that $|d_1| = O(|d_K|)$, $|d_K|\sigma_{\min}/\sqrt{n} \to \infty$, $\|v_k\|_\infty^2/\sigma_{\min} \to 0$, $\alpha_n^4\|v_k\|_\infty^4/(n\sigma_{\min}^2) \to 0$, and $\sigma_{\min}^2\sqrt{n} \to \infty$, where $\sigma_{\min} = \{\min_{1\le i,j\le n,\, i\ne j}\mathbb{E}w_{ij}^2\}^{1/2}$.

Conditions 1–2 are needed in all of our Theorems 1–5 and are imposed for our general model (1), including the specific case of sparse models. In contrast, Condition 3 is required only for Theorem 3 under some specific models with dense structures such as the stochastic block models with or without overlapping communities.

Condition 1 restricts essentially the sparsity level of the random matrix (e.g., given by a network). Note that it follows easily from $\max_{1\le i,j\le n}|w_{ij}| \le 1$ that $\alpha_n \le n^{1/2}$. It is a rather mild condition that can be satisfied by very sparse networks. For example, if $\mathbb{E}w_{12}^2 = \cdots = \mathbb{E}w_{1,\log n}^2 = 1/2$ and the other $w_{1j}$'s are equal to zero, then we have $\alpha_n^2 \ge 2^{-1}\log n$. Many network models in the literature satisfy this condition; see, for example, Jin et al. (2017), Lei (2016), and Zhang et al. (2015).

Condition 2 requires that the spiked population eigenvalues of the mean matrix $H$ (in the diagonal matrix $D$) are simple and that there is enough gap between the eigenvalues. The constant $c_0$ can be replaced by some $o(1)$ term and our theoretical results can still be proved with more delicate derivations. This requirement ensures that we can obtain precise higher order expansions of the general linear combination for each empirical eigenvector. Otherwise, if there exist some eigenvalues such that $d_i = d_{i+1}$, then $\hat{v}_i$ and $\hat{v}_{i+1}$ are generally no longer identifiable so we cannot derive clear asymptotic expansions for them; see also Abbe et al. (2019) for related discussions. Condition 2 also requires a gap between $\alpha_n$ and $|d_K|$. Since the parameter $\alpha_n$ reflects the strength of the noise matrix $W$, it requires essentially that the signal part $H$ dominates the noise part $W$ at some asymptotic rate. Similar conditions are commonly used in the network literature; see, for instance, Abbe et al. (2019) and Jin et al. (2017).

Condition 3 restricts our attention to some specific dense network models. In particular, |d1| = O(|dK|) assumes that the eigenvalues in D share the same order. The other assumptions in Condition 3 require essentially that the minimum variance of the off-diagonal entries of W cannot tend to zero too fast, which is used only to establish a more simplified theory under the more restrictive model; see Theorem 3.

3.2. Asymptotic distributions of spiked eigenvalues

We first present the asymptotic expansions and CLT for the spiked empirical eigenvalues $\lambda_1, \cdots, \lambda_K$. For each $1 \le k \le K$, denote by $t_k$ the solution to the equation

$$f_k(z) = 1 + d_k\big\{R(v_k, v_k, z) - R(v_k, V_{-k}, z)\big[(D_{-k})^{-1} + R(V_{-k}, V_{-k}, z)\big]^{-1}R(V_{-k}, v_k, z)\big\} = 0 \qquad (10)$$

when restricted to the interval $z \in [a_k, b_k]$, where

$$a_k = \begin{cases} d_k/(1 + 2^{-1}c_0), & d_k > 0 \\ (1 + 2^{-1}c_0)d_k, & d_k < 0 \end{cases} \quad\text{and}\quad b_k = \begin{cases} (1 + 2^{-1}c_0)d_k, & d_k > 0 \\ d_k/(1 + 2^{-1}c_0), & d_k < 0. \end{cases}$$

The following lemma characterizes the properties of the population quantities $t_k$ defined in (10): each $t_k$ is unique and is the asymptotic mean of $\lambda_k$.

Lemma 3.

Equation (10) has a unique solution in the interval $z \in [a_k, b_k]$ and thus the $t_k$'s are well defined. Moreover, for each $1 \le k \le K$ we have $t_k/d_k \to 1$ as $n \to \infty$.

It is seen from Lemma 3 that when the matrix size n is large enough, the values of tk and dk are very close to each other. The following theorem establishes the asymptotic expansions and CLT for λk and reveals that tk is in fact its asymptotic mean.

Theorem 1.

Under Conditions 1–2, for each $1 \le k \le K$ we have

$$\lambda_k - t_k = v_k^TWv_k + O_p(\alpha_nd_k^{-1}). \qquad (11)$$

Moreover, if $\mathrm{var}(v_k^TWv_k) \gg \alpha_n^2d_k^{-2}$ and the pair of vectors $(v_k, v_k)$ satisfies the $W^1$-CLT condition, then we have

$$\frac{\lambda_k - t_k - \mathbb{E}v_k^TWv_k}{[\mathrm{var}(v_k^TWv_k)]^{1/2}} \xrightarrow{\mathcal{D}} N(0,1). \qquad (12)$$

Capitaine et al. (2012) and Knowles and Yin (2014) established the joint distribution of the spiked eigenvalues for the deformed Wigner matrix in settings different from ours. Capitaine et al. (2012) assumed that $\mathbb{E}w_{ii}^2 = 1/2$ and $\mathbb{E}w_{ij}^2 = 1$ for $i \ne j$, while Knowles and Yin (2014) assumed that $\mathbb{E}w_{ij}^2 = 1$ for all $(i, j)$. Under their model settings, the smallest spiked eigenvalue $|d_K|$ and the noise level $\alpha_n$ are of the same order, and as a result, their asymptotic distributions depend on the distributions of the Wigner matrix. In contrast, our Theorem 1 is proved in the setting of diverging spikes. Thanks to the stronger signal-to-noise ratio, the noise matrix contributes to the distributions of the spiked eigenvalues in Theorem 1 in a global way, allowing for more heterogeneity in the variances of the entries of the noise matrix $W$.

Theorem 1 requires that $(v_k, v_k)$ satisfies the $W^1$-CLT condition and $\mathrm{var}(v_k^TWv_k) \gg \alpha_n^2d_k^{-2}$. To gain some insights into these two conditions, we will provide some sufficient conditions for such assumptions. Let us consider the specific case of $\sigma_{\min} > 0$, that is, the generalized Wigner matrix $W$ is nonsparse. We will show that as long as

$$\|v_k\|_\infty^2\sigma_{\min}^{-1} \to 0 \quad\text{and}\quad \sigma_{\min} \gg \alpha_n|d_k|^{-1}, \qquad (13)$$

the aforementioned two conditions in Theorem 1 hold. We first verify the W1-CLT condition. By Lemma 1, a sufficient condition for (vk, vk) to satisfy the W1-CLT condition is that

$$\|v_k\|_\infty^2 \ll [\mathrm{var}(v_k^TWv_k)]^{1/2} = [\mathbb{E}(v_k^TWv_k - \mathbb{E}v_k^TWv_k)^2]^{1/2}. \qquad (14)$$

Observe that it follows from $\sum_{1\le i\le n}(v_k)_i^2 = v_k^Tv_k = 1$ and $\sum_{1\le i\le n}(v_k)_i^4 \le \|v_k\|_\infty^2$ that

$$[\mathbb{E}(v_k^TWv_k - \mathbb{E}v_k^TWv_k)^2]^{1/2} \ge \Big[2\sum_{1\le i,j\le n,\, i\ne j}\sigma_{ij}^2(v_k)_i^2(v_k)_j^2\Big]^{1/2} \ge \sigma_{\min}\Big[2\sum_{1\le i,j\le n,\, i\ne j}(v_k)_i^2(v_k)_j^2\Big]^{1/2} = \sigma_{\min}\Big[2 - 2\sum_{1\le i\le n}(v_k)_i^4\Big]^{1/2} \ge \sigma_{\min}\big(2 - 2\|v_k\|_\infty^2\big)^{1/2}, \qquad (15)$$

where $(v_k)_i$ stands for the $i$th component of vector $v_k$. The assumption $\|v_k\|_\infty^2\sigma_{\min}^{-1} \to 0$ in (13) together with (15) ensures (14), which consequently entails that $(v_k, v_k)$ satisfies the $W^1$-CLT condition.

We next check the condition $\mathrm{var}(v_k^TWv_k) = \mathbb{E}(v_k^TWv_k - \mathbb{E}v_k^TWv_k)^2 \gg \alpha_n^2d_k^{-2}$. It follows directly from (15) that this condition holds under (13). In fact, since Condition 2 guarantees that $\alpha_n/|d_k|$ asymptotically vanishes, the assumption $\sigma_{\min} \gg \alpha_n|d_k|^{-1}$ can be very mild. In particular, for the Wigner matrix $W$ with $\sigma_{ij} \equiv 1$ for all $1 \le i, j \le n$, it holds that

$$\mathbb{E}(v_k^TWv_k - \mathbb{E}v_k^TWv_k)^2 = 2. \qquad (16)$$

Thus the condition $\mathbb{E}(v_k^TWv_k - \mathbb{E}v_k^TWv_k)^2 \gg \alpha_n^2d_k^{-2}$ reduces to $\alpha_n^2d_k^{-2} \ll 1$, which is guaranteed to hold under Condition 2.

We also would like to point out that one potential application of the new results in Theorem 1 is determining the number of spiked eigenvalues, which in the network models reduces to determining the number of non-overlapping (or possibly overlapping) communities or clusters.

3.3. Asymptotic distributions of spiked eigenvectors

We now present the asymptotic distributions of the spiked empirical eigenvectors $\hat{v}_k$ for $1 \le k \le K$. To this end, we will first establish the asymptotic expansions and CLT for the bilinear form

$$x^T\hat{v}_k\hat{v}_k^Ty$$

with $1 \le k \le K$, where $x, y \in \mathbb{R}^n$ are two arbitrary non-random unit vectors. Then by setting $y = v_k$, we can establish the asymptotic expansions and CLT for the general linear combination $x^T\hat{v}_k$. Although the limiting distribution of the bilinear form $x^T\hat{v}_k\hat{v}_k^Ty$ is the theoretical foundation for establishing the limiting distribution of the general linear combination $x^T\hat{v}_k$, due to the technical complexities we defer the theorems summarizing the limiting distribution of $x^T\hat{v}_k\hat{v}_k^Ty$ to a later technical section (i.e., Section 6), and present only the results for $x^T\hat{v}_k$ in this section. This should not harm the flow of the paper. Readers interested in our technical proofs can refer to Section 6 for more technical details; otherwise it is safe to skip that technical section. For each $1 \le k \le K$, we choose the direction of $\hat{v}_k$ such that $v_k^T\hat{v}_k \ge 0$ for the theoretical derivations, which is always possible after a sign change when needed.

Theorem 2.

Under Conditions 1–2, for each $1 \le k \le K$ we have the following properties:

  1. If the unit vector $u$ satisfies $|u^Tv_k| \in [0, 1)$ and $\alpha_n^2d_k^{-2} \ll \mathrm{var}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]$, then it holds that
    $$t_k\big(u^T\hat{v}_k + A_{u,k,t_k}\widetilde{P}_{k,t_k}^{1/2}\big) = (b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k + o_p\big(\{\mathrm{var}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]\}^{1/2}\big), \qquad (17)$$
    where the asymptotic mean has the expansion $A_{u,k,t_k}\widetilde{P}_{k,t_k}^{1/2} = -u^Tv_k + O(\alpha_n^2d_k^{-2})$. Furthermore, if $(b_{u,k,t_k} - v_kv_k^Tu, v_k)$ satisfies the $W^1$-CLT condition, then it holds that
    $$\frac{t_k\big(u^T\hat{v}_k + A_{u,k,t_k}\widetilde{P}_{k,t_k}^{1/2}\big) - \mathbb{E}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]}{\{\mathrm{var}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]\}^{1/2}} \xrightarrow{\mathcal{D}} N(0,1).$$
  2. If $(\alpha_n^4d_k^{-2} + 1) \ll \mathrm{var}(v_k^TW^2v_k)$, then it holds that
    $$2t_k^2\big(v_k^T\hat{v}_k + A_{v_k,k,t_k}\widetilde{P}_{k,t_k}^{1/2}\big) = v_k^T(W^2 - \mathbb{E}W^2)v_k + o_p\big\{[\mathrm{var}(v_k^TW^2v_k)]^{1/2}\big\}, \qquad (18)$$
    where the asymptotic mean has the expansion $A_{v_k,k,t_k}\widetilde{P}_{k,t_k}^{1/2} = -1 + 2^{-1}t_k^{-2}v_k^T\mathbb{E}W^2v_k + O(\alpha_n^3d_k^{-3})$. Furthermore, if $(v_k, v_k)$ satisfies the $W^2$-CLT condition, then it holds that
    $$\frac{2t_k^2\big(v_k^T\hat{v}_k + A_{v_k,k,t_k}\widetilde{P}_{k,t_k}^{1/2}\big)}{[\mathrm{var}(v_k^TW^2v_k)]^{1/2}} \xrightarrow{\mathcal{D}} N(0,1).$$

The two parts of Theorem 2 correspond to two different cases where $\mathrm{var}(u^T\hat{v}_k)$ can be of different magnitudes. To understand this, note that for large enough matrix size $n$, we have $|t_K| \gg \alpha_n$ by Condition 2 and Lemma 3. In view of (18), the asymptotic variance of $v_k^T\hat{v}_k$ is equal to $\mathrm{var}(2^{-1}t_k^{-2}v_k^TW^2v_k)$. In contrast, in light of (17), the asymptotic variance of $u^T\hat{v}_k$ with $|u^Tv_k| \in [0, 1)$ is equal to $\mathrm{var}[t_k^{-1}(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]$. Let us consider a specific case when $\mathrm{var}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k] \sim 1$. By Lemma 4 in Section 6, we have

$$\mathrm{var}\big(2^{-1}t_k^{-2}v_k^TW^2v_k\big) = O(\alpha_n^2t_k^{-4}) \ll \mathrm{var}\big[t_k^{-1}(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k\big] = O(t_k^{-2}).$$

This shows that the above two cases can be very different in the magnitude of the asymptotic variance of $u^T\hat{v}_k$ and thus should be analyzed separately.

To gain some insights into why $v_k^T\hat{v}_k$ has smaller variance, let us consider the simple case of $K = 1$. Then in view of our technical arguments, it holds that

$$v_1^T\hat{v}_1\hat{v}_1^Tv_1 = -\frac{1}{2\pi i}\oint_{\Omega_1}\frac{v_1^T(W - zI)^{-1}v_1}{1 + d_1v_1^T(W - zI)^{-1}v_1}\,dz = -\frac{1}{2\pi i}\oint_{\Omega_1}\frac{1}{[v_1^T(W - zI)^{-1}v_1]^{-1} + d_1}\,dz, \qquad (19)$$

where $i = \sqrt{-1}$ in the complex integrals represents the imaginary unit and the line integrals are taken over the contour $\Omega_1$, which is centered at $(a_1 + b_1)/2$ with radius $c_0|d_1|/2$. Then we can see that the population eigenvalue $d_1$ is enclosed by the contour $\Omega_1$. By the Taylor expansion, we can show that with significant probability,

$$[v_1^T(W - zI)^{-1}v_1]^{-1} = -z + v_1^TWv_1 + O(|z|^{-1}\alpha_n^2\log n).$$

Substituting the above expansion into (19) results in

$$-\frac{1}{2\pi i}\oint_{\Omega_1}\frac{dz}{[v_1^T(W - zI)^{-1}v_1]^{-1} + d_1} = -\frac{1}{2\pi i}\oint_{\Omega_1}\frac{dz}{d_1 - z + v_1^TWv_1 + O(|z|^{-1}\alpha_n^2\log n)}$$
$$= -\frac{1}{2\pi i}\oint_{\Omega_1}\frac{dz}{d_1 - z} + \frac{1}{2\pi i}\oint_{\Omega_1}\frac{v_1^TWv_1}{(d_1 - z)^2}\,dz + O(d_1^{-2}\alpha_n^2\log n) = -\frac{1}{2\pi i}\oint_{\Omega_1}\frac{dz}{d_1 - z} + O(d_1^{-2}\alpha_n^2\log n) \qquad (20)$$

with significant probability. Thus the asymptotic distribution of $v_1^T\hat{v}_1\hat{v}_1^Tv_1$ is determined by the $O_p(d_1^{-2}\alpha_n^2\log n)$ term, which has no contribution from $v_1^TWv_1$. On the other hand, our technical analysis for $u^T\hat{v}_1\hat{v}_1^Tv_1$ (which is much more complicated and can be found in the technical proofs) reveals that the dominating term is $u^TWv_1$ when $u \ne v_1$ or $-v_1$. This explains why we need to treat differently the two cases of $u$ close to or far away from $v_1$.
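For completeness, the two contour integrals in (20) can be evaluated exactly by the residue theorem, since $d_1$ lies inside $\Omega_1$:

```latex
% Residue computations behind (20); d_1 lies inside the contour \Omega_1.
-\frac{1}{2\pi i}\oint_{\Omega_1}\frac{dz}{d_1 - z}
  = \frac{1}{2\pi i}\oint_{\Omega_1}\frac{dz}{z - d_1}
  = \operatorname{Res}_{z = d_1}\frac{1}{z - d_1} = 1,
\qquad
\frac{1}{2\pi i}\oint_{\Omega_1}\frac{v_1^T W v_1}{(d_1 - z)^2}\,dz
  = v_1^T W v_1 \cdot \operatorname{Res}_{z = d_1}\frac{1}{(z - d_1)^2} = 0.
```

Hence the leading term of $v_1^T\hat{v}_1\hat{v}_1^Tv_1$ is exactly one, and the fluctuation is driven entirely by the $O_p(d_1^{-2}\alpha_n^2\log n)$ remainder.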

3.4. A more specific structure and an application

Theorem 2 in Section 3.3 provides some general sufficient conditions to ensure the asymptotic normality for the spiked empirical eigenvectors. Under some simplified but stronger assumptions in Condition 3, the same results on the empirical eigenvectors and eigenvalues continue to hold. Note that the stochastic block models with non-overlapping or overlapping communities can both be included as specific cases of our theoretical analysis. As mentioned before, we choose the direction of $\hat{v}_k$ such that $v_k^T\hat{v}_k \ge 0$ for each $1 \le k \le K$.

Theorem 3.

Under Conditions 1–3, for each $1 \le k \le K$ we have the following properties:

  1. (Eigenvalues) It holds that
    $$\frac{\lambda_k - t_k - \mathbb{E}v_k^TWv_k}{[\mathbb{E}(v_k^TWv_k - \mathbb{E}v_k^TWv_k)^2]^{1/2}} \xrightarrow{\mathcal{D}} N(0,1).$$
  2. (Eigenvectors) If the unit vector $u$ satisfies $\sigma_{\min}^{-1}\|v_k\|_\infty\|b_{u,k,t_k}^T - u^Tv_kv_k^T\| \to 0$ and $|u^Tv_k| \in [0, 1-\epsilon]$ for some positive constant $\epsilon$, then it holds that
    $$\frac{t_k\big(u^T\hat{v}_k + A_{u,k,t_k}\widetilde{P}_{k,t_k}^{1/2}\big) - \mathbb{E}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]}{\{\mathrm{var}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]\}^{1/2}} \xrightarrow{\mathcal{D}} N(0,1). \qquad (21)$$
    Moreover, it also holds that
    $$\frac{2t_k^2\big(v_k^T\hat{v}_k + A_{v_k,k,t_k}\widetilde{P}_{k,t_k}^{1/2}\big)}{[\mathrm{var}(v_k^TW^2v_k)]^{1/2}} \xrightarrow{\mathcal{D}} N(0,1). \qquad (22)$$

Theorem 2 also gives the asymptotic expansions for the asymptotic mean term $A_{u,k,t_k}\widetilde{P}_{k,t_k}^{1/2}$. It is seen that if $|d_K|$ diverges to infinity much faster than $\alpha_n^2$, then the $O(\cdot)$ terms in the asymptotic expansions of the mean become smaller order terms, and thus the following corollary follows immediately from Theorem 3.

Corollary 1.

Assume that Conditions 1–3 hold. For each $1 \le k \le K$, if the unit vector $u$ satisfies $|u^Tv_k| \in [0, 1-\epsilon]$ for some positive constant $\epsilon$ and $\alpha_n^4d_k^{-2} \ll \mathrm{var}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]$, then we have

$$\frac{t_k(u^T\hat{v}_k - u^Tv_k) - \mathbb{E}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]}{\{\mathrm{var}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]\}^{1/2}} \xrightarrow{\mathcal{D}} N(0,1). \qquad (23)$$

Moreover, if $\alpha_n^6d_k^{-3} \ll \mathrm{var}(v_k^TW^2v_k)$ then we have

$$\frac{2t_k^2(v_k^T\hat{v}_k - 1) + v_k^T\mathbb{E}W^2v_k}{[\mathrm{var}(v_k^TW^2v_k)]^{1/2}} \xrightarrow{\mathcal{D}} N(0,1). \qquad (24)$$

Theorem 3 includes the stochastic block model as a specific case. If $X$ is the affinity matrix from a stochastic block model with $K$ non-overlapping communities and the size of each community is of the same order $O(n)$, then it holds that $\|v_k\|_\infty = O(n^{-1/2})$, $d_K = O(n)$, $\alpha_n \le n^{1/2}$, and $\alpha_n\|v_k\|_\infty = O(1)$. Thus Condition 3 can be satisfied as long as $\sigma_{\min} \gg n^{-1/4}$, leading to the asymptotic normalities in Theorem 3.

Our Theorem 3 also covers the stochastic block models with overlapping communities. For example, the following network model was considered in Zhang et al. (2015)

$$\mathbb{E}X = \Theta\Pi P\Pi^T\Theta^T, \qquad (25)$$

where $\Theta$ is an $n \times n$ diagonal degree heterogeneity matrix, $\Pi$ is an $n \times K$ community membership matrix, and $P$ is a $K \times K$ nonsingular irreducible matrix with unit diagonal entries. Observe that the above model has a low-rank mean matrix and thus can be connected to our general form of eigendecomposition $\mathbb{E}X = H = VDV^T$. If the spiked eigenvalues and spiked eigenvectors satisfy $|d_k| = O(n)$ and $\|v_k\|_\infty = O(n^{-1/2})$ for all $1 \le k \le K$, then Condition 3 can be satisfied when $\sigma_{\min} \gg n^{-1/4}$. Consequently, the asymptotic normalities in Theorem 3 can hold.

3.5. Proofs architecture

The key mathematical tools are from complex analysis and random matrix theory. At a high level, our technical proofs consist of four steps. First, we apply Cauchy's residue theorem to represent the desired bilinear form $x^T\hat{v}_k\hat{v}_k^Ty$ with $1 \le k \le K$ as a complex integral over a contour for a functional of the Green function associated with the original random matrix $X = H + W$. It is worth mentioning that such an approach was used before to study the asymptotic distributions for linear combinations of eigenvectors in the setting of covariance matrix estimation for the case of i.i.d. Gaussian random matrices coupled with linear dependency. Second, we reduce the problem to one that involves a functional of the new Green function associated with only the noise part $W$ by extracting the spiked part. Such a step enables us to conduct precise high order asymptotic expansions. Third, we conduct delicate high order Taylor expansions for the noise part using the new Green function corresponding to the noise part. In this step, we apply the asymptotic expansion directly to the evaluated complex integral over the contour instead of an expansion of the integrand. Such a new way of asymptotic expansion is crucial to our study. Fourth, we bound the variance of $x^T(W^l - \mathbb{E}W^l)y$ using delicate random matrix techniques. In contrast to just counting the number of certain paths in a graph as used in the classical random matrix theory literature, we need to carefully bound the individual contributions toward the quantity $\alpha_n = \|\mathbb{E}(W - \mathbb{E}W)^2\|^{1/2}$; otherwise simple counting leads to a rather loose upper bound.

3.6. Comparisons with the statistics literature

In a related work, Tang and Priebe (2018) established the CLT for the entries of eigenvectors of a random adjacency matrix. Our work differs significantly from theirs in at least four important aspects. First, Tang and Priebe (2018) assumed a prior distribution on the mean adjacency matrix, while we assume a deterministic mean matrix. As a result, the asymptotic variance in Tang and Priebe (2018) is determined by the prior distribution and is the same for each entry of an eigenvector, while in our paper the CLT for different entries of an eigenvector can be different and the asymptotic variance depends on all entries of the eigenvector. While Tang and Priebe (2018) also provided the conditional CLT under the setting of the stochastic block model, their result conditions on just one node. Second, our model is much more general than that in Tang and Priebe (2018) in that the spiked eigenvalues can have different orders and different signs. Third, we establish the CLT for the general linear combinations of the components of normalized eigenvectors and the CLT for eigenvalues, while Tang and Priebe (2018) proved the CLT for the rows of $\Lambda^{1/2}\widehat{V}^T$, where $\Lambda \in \mathbb{R}^{K\times K}$ is the diagonal matrix formed by the $K$ spiked eigenvalues of the adjacency matrix and $\widehat{V} = (\hat{v}_1, \cdots, \hat{v}_K)$ is the matrix collecting the corresponding eigenvectors of the adjacency matrix. Fourth, through a dedicated analysis of the higher order expansion for the general linear combination $u^T\hat{v}_k$, we uncover an interesting phase transition phenomenon that the limiting distribution of $u^T\hat{v}_k$ is different when the deterministic weight vector $u$ is close to or far away from $v_k$ (modulo the sign), which is new to the literature.

Wang and Fan (2017) proved the asymptotic distribution of the linear form $v_i^T\hat{v}_k$ with $1 \le i, k \le K$, where the $v_i$'s and $\hat{v}_k$'s are the spiked population and empirical eigenvectors of some covariance matrix, respectively. Their asymptotic normality results cover the case of $v_1^T\hat{v}_1$ when $K = 1$, and $v_i^T\hat{v}_k$ for $1 \le i, k \le K$ with $i \ne k$ when $K \ge 2$. Similarly, Koltchinskii and Lounici (2016) considered the sample covariance matrix under the Gaussian distribution assumption, and derived the asymptotic expansion of the bilinear form $x^T\hat{v}_k\hat{v}_k^Ty$, where $x$ and $y$ are two deterministic unit vectors. They also obtained the asymptotic distribution of $x^T\hat{v}_k$. Different from Wang and Fan (2017) and Koltchinskii and Lounici (2016), in this paper we establish the asymptotic distribution of the general linear combination $u^T\hat{v}_k$ for the large structured symmetric random matrix from model (1) under fairly weak regularity conditions. Our proof techniques differ from those in Wang and Fan (2017) and Koltchinskii and Lounici (2016), and are also distinct from most of the existing ones in the literature.

4. Statistical applications

The new asymptotic expansions and asymptotic distributions of spiked eigenvectors and eigenvalues established in Section 3 have many natural statistical applications. Next we discuss three specific ones. See also Fan et al. (2019) for another application on testing the node membership profiles in network models.

4.1. Detecting the existence of clustering power

One potential application of Theorem 3 is to improve the results on community detection under model setting (25). Spectral methods have been used popularly in the literature for recovering the memberships of nodes in network models. For example, applying the K-means clustering algorithm to the $K$ spiked eigenvectors calculated from the adjacency matrix has been a prevalent method for inferring the memberships of nodes. However, it may not be true that all these $K$ eigenvectors are useful for clustering. For example, if eigenvector $v_k = n^{-1/2}\mathbf{1}$, then it has zero clustering power and should be dropped in the K-means clustering algorithm. This is especially important in large networks because including a useless high-dimensional eigenvector may significantly increase the noise in clustering. Theorem 3 suggests that we can test the hypothesis $H_0: v_k = n^{-1/2}\mathbf{1}$ using the test statistic $\hat{v}_k^T\mathbf{1}$. Then with the aid of Theorem 3, the asymptotic null distribution can be established and the critical value can be calculated. This naturally suggests a method for selecting important eigenvectors in community detection.
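A minimal sketch of this diagnostic is given below (our own illustrative implementation, not the paper's procedure); the statistic mirrors the left-hand side of (24) under the null, and the exact centering and variance come from Theorem 3, so a model-specific plug-in normalization is still needed before comparing with $N(0,1)$.

```python
import numpy as np

# Illustrative sketch (ours) of the Section 4.1 diagnostic: test
# H0: v_k = 1/sqrt(n) via the projection hat{v}_k^T 1.
def clustering_power_gap(X, k):
    n = X.shape[0]
    vals, vecs = np.linalg.eigh(X)
    order = np.argsort(-np.abs(vals))
    lam, v_hat = vals[order[k]], vecs[:, order[k]]
    ones = np.ones(n) / np.sqrt(n)
    if v_hat @ ones < 0:
        v_hat = -v_hat                        # sign convention v_k^T hat{v}_k >= 0
    return 2 * lam**2 * (v_hat @ ones - 1.0)  # compare with (24) under H0
```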

4.2. Detecting the existence of denser subgraph

Another application of Theorem 3 is to detect the existence of a denser community in a given random graph, the same problem as studied in Arias-Castro et al. (2014) and Verzelen et al. (2015). Specifically, assume that the data matrix X = (xij) is a symmetric adjacency matrix with independent Bernoulli entries on and above the diagonal. Let H=E[X] be the mean adjacency matrix. Consider the following null and alternative hypotheses

$$H_0: H = p\mathbf{1}\mathbf{1}^T \quad \text{vs.} \quad H_1: H = p\mathbf{1}\mathbf{1}^T + (q - p)\mathbf{l}\mathbf{l}^T,$$

where $\mathbf{l}$ is the vector with the first $n_1$ entries being 1 and all remaining entries being 0, and $q \in (p, 1]$. It can be seen that under the alternative hypothesis, there is a denser subgraph and $q$ measures the connectivity of nodes within it. Arias-Castro et al. (2014) and Verzelen et al. (2015) proposed tests for the above hypothesis in the setting of $n_1 = o(n)$. We focus on the same setting and in addition assume that $n^{-1} \ll p < q$ and $q \sim p$. We next discuss how to exploit our Theorem 3 to test the same hypothesis.

Under the null hypothesis, a natural estimator of $p$ is given by $\hat{p} = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}x_{ij}$. Moreover, direct calculations show that

$$v_1^T\mathbb{E}W^2v_1 = np(1-p) \quad\text{and}\quad \mathrm{var}(v_1^TW^2v_1) = p(1-p)[2(n-1) + p^3 + (1-p)^3]. \qquad (26)$$

Thus the mean and variance of v1TW2v1 in (26) can be estimated as

$$n\hat{p}(1-\hat{p}) \quad\text{and}\quad \hat{p}(1-\hat{p})[2(n-1) + \hat{p}^3 + (1-\hat{p})^3], \qquad (27)$$

respectively. In view of (24) in Corollary 1, since $v_1 = n^{-1/2}\mathbf{1}$ under the null hypothesis $H_0: H = p\mathbf{1}\mathbf{1}^T$, a natural test statistic for testing $H_0: H = p\mathbf{1}\mathbf{1}^T$ is given by

$$T_n = \frac{2\lambda_1^2\big(n^{-1/2}\mathbf{1}^T\hat{v}_1 - 1\big) + n\hat{p}(1-\hat{p})}{\big[\hat{p}(1-\hat{p})[2(n-1) + \hat{p}^3 + (1-\hat{p})^3]\big]^{1/2}}.$$

It can be seen that since $\lambda_1 \approx t_1$ (see Lemma 3), the asymptotic null distribution of $T_n$ is expected to be $N(0, 1)$ by resorting to (24) in Corollary 1. On the other hand, under the alternative hypothesis, since the leading eigenvector differs from $n^{-1/2}\mathbf{1}$, the term $n^{-1/2}\mathbf{1}^T\hat{v}_1 - 1$ in the numerator of $T_n$ is expected to take some negative value, and thus $T_n$ is expected to have different asymptotic behavior than $N(0, 1)$. In fact, we provide the proof sketch in Section D.5 of the Supplementary Material on the asymptotic null and alternative distributions. In particular, we show that the asymptotic null distribution of $T_n$ is $N(0, 1)$, and if $n_1^2(q-p)^2 \gg np + n_1^2(q-p)n^{-1}$, then $T_n \to -\infty$ with asymptotic probability one under the alternative hypothesis.
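The statistic $T_n$ is straightforward to compute from an observed adjacency matrix; a sketch of our own implementation of the displayed formula follows, with the sign of $\hat{v}_1$ fixed so that $\mathbf{1}^T\hat{v}_1 \ge 0$ in line with the sign convention $v_k^T\hat{v}_k \ge 0$.

```python
import numpy as np

# Sketch (ours) implementing the dense-subgraph test statistic T_n of
# Section 4.2, following the displayed formula.
def dense_subgraph_stat(X):
    n = X.shape[0]
    p_hat = (X.sum() - np.trace(X)) / (n * (n - 1))   # mean off-diagonal entry
    vals, vecs = np.linalg.eigh(X)
    i = np.argmax(np.abs(vals))
    lam1, v1 = vals[i], vecs[:, i]
    if v1.sum() < 0:
        v1 = -v1                                      # fix the sign convention
    num = 2 * lam1**2 * (v1.sum() / np.sqrt(n) - 1) + n * p_hat * (1 - p_hat)
    den = np.sqrt(p_hat * (1 - p_hat)
                  * (2 * (n - 1) + p_hat**3 + (1 - p_hat)**3))
    return num / den    # ~ N(0,1) under H0; very negative under H1
```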

4.3. Rank inference

Our theory can also be applied to statistical testing on the true rank K of the mean matrix H. Rank inference is an important problem in many high-dimensional network applications. See, for example, Lei (2016), Chen and Lei (2018), and Li et al. (2020), and the importance of the problem discussed therein. Consider the following hypotheses

$$H_0: K = K_0 \quad \text{vs.} \quad H_1: K > K_0,$$

where $K_0$ is some prespecified positive integer satisfying $K_0 \le K$. Define

$$\hat{w}_{ij} = x_{ij} - \sum_{k=1}^{K_0}\lambda_ke_i^T\hat{v}_k\hat{v}_k^Te_j = w_{ij} - \sum_{k=1}^{K}\big[\lambda_ke_i^T\hat{v}_k\hat{v}_k^Te_j - d_ke_i^Tv_kv_k^Te_j\big] + \sum_{k=K_0+1}^{K}\lambda_ke_i^T\hat{v}_k\hat{v}_k^Te_j. \qquad (28)$$

Under the null hypothesis $H_0: K = K_0$, the last term in (28) disappears and we can obtain the asymptotic expansion of $\hat{w}_{ij}$ around $w_{ij}$ explicitly by an application of Theorems 1 and 2. Then under some additional regularity conditions, it is expected that $\hat{w}_{ij}$ is close to $w_{ij}$. By the independence of $w_{ii}$, $i = 1, \cdots, n$, it holds that

$$\frac{\sum_{i=1}^nw_{ii}}{\sqrt{\sum_{i=1}^nw_{ii}^2}} \xrightarrow{\mathcal{D}} N(0,1) \quad \text{as } n \to \infty.$$

Since $\hat{w}_{ii} \approx w_{ii}$ under the null hypothesis, the following asymptotic distribution is expected to hold as well

$$T_n := \frac{\sum_{i=1}^n\hat{w}_{ii}}{\sqrt{\sum_{i=1}^n\hat{w}_{ii}^2}} \xrightarrow{\mathcal{D}} N(0,1). \qquad (29)$$

This naturally suggests a statistical test based on the statistic $T_n$ for testing $H_0: K = K_0$. Under the alternative hypothesis, since $\hat{w}_{ij}$ contains the smallest $K - K_0$ spiked eigenvalues and the corresponding eigenvectors, its asymptotic behavior is expected to be different, and consequently, the test can have nontrivial power. In fact, a more sophisticated version of this test constructed based on the off-diagonal entries of $\widehat{W}$ was investigated recently in Han et al. (2019).

The above asymptotic distribution can also be used to construct confidence intervals for the rank $K$. To understand this, note that $T_n$ defined in (29) is a function of $K_0$. Thus an immediate idea for the $100(1-\alpha)\%$ confidence interval construction is to identify all $K_0$ such that the corresponding $T_n$ falls into the range $[-\Phi^{-1}(1-\alpha), \Phi^{-1}(1-\alpha)]$, where $\Phi^{-1}(\cdot)$ is the inverse distribution function of the standard normal. Similar ideas can also be exploited to construct confidence intervals for other parameters in network models.
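A sketch of our own implementation of (28)–(29) and of the resulting confidence set for $K$ is given below; it assumes the diagonal of $X$ is observed (model (1)) and that `K_max` is a user-chosen upper bound.

```python
import numpy as np
from scipy.stats import norm

# Sketch (ours) of the rank-inference statistic (28)-(29) and the
# inversion-based confidence set for the rank K of Section 4.3.
def rank_stat(X, K0):
    vals, vecs = np.linalg.eigh(X)
    order = np.argsort(-np.abs(vals))[:K0]
    # Residual after removing the top-K0 spiked part, as in (28).
    W_hat = X - (vecs[:, order] * vals[order]) @ vecs[:, order].T
    d = np.diag(W_hat)
    return d.sum() / np.sqrt((d**2).sum())

def rank_confidence_set(X, K_max, alpha=0.05):
    z = norm.ppf(1 - alpha)      # matches the range [-Phi^{-1}(1-a), Phi^{-1}(1-a)]
    return [K0 for K0 in range(1, K_max + 1)
            if abs(rank_stat(X, K0)) <= z]
```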

5. Simulation studies

In this section, we use simulation studies to verify the validity of our theoretical results. We consider the stochastic block model with $K = 2$ communities. Assume that the number of nodes is $n$, the first $n/2$ nodes belong to the first community, and the rest belong to the second one. Then the adjacency matrix $X$ has the mean structure $\mathbb{E}X = H = ARA^T$, where $R$ is a $2 \times 2$ matrix of the connectivity probabilities, and $A = (a_1, a_2) \in \mathbb{R}^{n\times 2}$ with $a_1 = n^{-1/2}(\mathbf{1}^T, \mathbf{0}^T)^T$ and $a_2 = n^{-1/2}(\mathbf{0}^T, \mathbf{1}^T)^T$, where $\mathbf{0}, \mathbf{1} \in \mathbb{R}^{n/2}$ are vectors of zeros and ones, respectively. It is worth mentioning that $ARA^T$ is not the eigendecomposition of the mean matrix $H$, which is why we use different notation than that in model (1).

For the connectivity probability matrix R, we consider the structure

$$R = r\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},$$

where the parameter $r$ takes 6 different values 0.02, 0.05, 0.1, 0.2, 0.3, and 0.4. A similar model was considered in Abbe et al. (2019) and Lei (2016). For the connectivity matrix $X$, we simulate its entries on and above the diagonal as independent Bernoulli random variables with means given by the corresponding entries of the mean matrix $H$, and set the entries below the diagonal to be the same as the corresponding ones above the diagonal. We choose the number of nodes as $n = 3000$ and repeat the simulations 10,000 times.
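A sketch of our own generator for this design follows; note that we read $R$ as the matrix of within- and between-community connection probabilities (i.e., Bernoulli success probabilities $R_{c(i)c(j)}$ for nodes $i, j$ in communities $c(i), c(j)$), which is our interpretation of the scaling and an assumption of this sketch.

```python
import numpy as np

# Sketch (ours) of the Section 5 simulation design: two equal communities
# with connection probability 2r within and r between.
def simulate_two_block(n=3000, r=0.4, seed=None):
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    R = r * np.array([[2.0, 1.0], [1.0, 2.0]])
    H = R[np.ix_(labels, labels)]              # n x n matrix of probabilities
    U = rng.random((n, n))
    upper = np.triu((U < H).astype(float))     # Bernoulli draws on/above diagonal
    X = upper + np.triu(upper, 1).T            # symmetrize below the diagonal
    return X, H
```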

To verify our theoretical results, for each simulated connectivity matrix X we calculate its eigenvalues and corresponding eigenvectors. For the eigenvalues, we compare the empirical distribution of

$$\frac{\lambda_k - t_k}{[\mathrm{var}(v_k^TWv_k)]^{1/2}} \qquad (30)$$

with the standard normal distribution, where $t_k$ is the solution to equation (10). The exact expression of $R(v_k, V_{-k}, z)[(D_{-k})^{-1} + R(V_{-k}, V_{-k}, z)]^{-1}R(V_{-k}, v_k, z)$ in (10) is complicated. Since this term is much smaller than $R(v_k, v_k, z)$, we can calculate an approximation of $t_k$ by solving the equation

$$1 + d_kR(v_k, v_k, z) = 0 \qquad (31)$$

using the Newton–Raphson method. Guided by the theoretical derivations, we use $L = 4$ in the asymptotic expansion of $R(x, y, t)$ in (6) for all of our simulation examples. Tables 1–2 summarize the means and standard deviations of (30) with $k = 1$ and 2 calculated from the 10,000 repetitions as well as the p-values from the Anderson–Darling (AD) test for normality. Figure 1 presents the histograms of the normalized first and second eigenvalues (i.e., (30) with $k = 1$ and 2) from the 10,000 repetitions.
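A sketch of how such a Newton–Raphson solver for (31) might look is given below (our own code, not the paper's; the inputs `EW2` and `EW4`, standing for $\mathbb{E}W^2$ and $\mathbb{E}W^4$, are assumed known in closed form or Monte Carlo estimates, and the odd-moment term is dropped for brevity).

```python
import numpy as np

# Sketch (ours) of a Newton-Raphson solver for the approximate t_k equation
# (31), 1 + d_k * R(v_k, v_k, z) = 0, with the series (6) truncated at L = 4
# and the third-moment term omitted for brevity.
def solve_tk(d_k, v_k, EW2, EW4, n_iter=50):
    m2 = v_k @ EW2 @ v_k
    m4 = v_k @ EW4 @ v_k
    f  = lambda z: 1 - d_k * (1/z + m2/z**3 + m4/z**5)
    fp = lambda z: d_k * (1/z**2 + 3*m2/z**4 + 5*m4/z**6)
    z = d_k                                   # Lemma 3: t_k / d_k -> 1
    for _ in range(n_iter):
        z -= f(z) / fp(z)                     # Newton-Raphson update
    return z
```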

Table 1:

Simulation results for $(\lambda_1 - t_1)/[\mathrm{var}(v_1^TWv_1)]^{1/2}$

r 0.02 0.05 0.1 0.2 0.3 0.4

Mean 0.0719 0.0149 −0.0068 −0.0080 −0.0024 0.0124
Standard deviation 1.0107 1.0085 0.9927 1.0115 1.0023 1.0125
AD p-value 0.0725 0.5387 0.6263 0.2342 0.9243 0.2010

Table 2:

Simulation results for $(\lambda_2 - t_2)/[\mathrm{var}(v_2^TWv_2)]^{1/2}$

r 0.02 0.05 0.1 0.2 0.3 0.4

Mean 1.0761 0.2552 0.0681 0.0272 0.0093 0.0052
Standard deviation 0.9630 0.9820 0.9872 1.0100 1.0057 1.0005
AD p-value 0.5349 0.6722 0.8406 0.1806 0.0535 0.8341

Figure 1:


Histograms of the normalized eigenvalues (30) when r = 0.4, with the blue curves representing the standard normal density. Left panel: the first eigenvalue; right panel: the second eigenvalue.

For the eigenvectors, we evaluate the asymptotic normality of the linear combination $u^T\hat{v}_k$ with $k = 1$ and 2. We experiment with three different values of $u$: $a_1$, $(1, 0, \cdots, 0)^T$, and $v_k$. When $u = a_1$ or $(1, 0, \cdots, 0)^T$, we calculate the normalized statistic

$$\frac{t_k\big(u^T\hat{v}_k + A_{u,k,t_k}\widetilde{P}_{k,t_k}^{1/2}\big)}{\{\mathrm{var}[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k]\}^{1/2}}$$

using the 10,000 simulated data sets, while when u = vk we calculate the normalized statistic

$$\frac{2t_k^2\big(v_k^T\hat{v}_k + A_{v_k,k,t_k}\widetilde{P}_{k,t_k}^{1/2}\big)}{[\mathrm{var}(v_k^TW^2v_k)]^{1/2}}$$

instead. In either of the two cases above, the variance in the denominator is calculated as the sample variance from 2,000 simulated independent copies of the noise matrix $W$. We compare the empirical distributions of the above two normalized statistics with the standard normal distribution. The simulation results are summarized in Tables 3–8 and Figures 2–3.

Table 3:

Simulation results for $u^T\hat{v}_1$ with $u = a_1$

r 0.02 0.05 0.1 0.2 0.3 0.4

Mean −0.0573 −0.0140 −0.0023 −0.0045 −0.0071 −0.0069
Standard deviation 1.0335 1.0244 1.0011 1.0001 1.0214 1.0016
AD p-value 0.7879 0.4012 0.2417 0.5300 0.9482 0.9935

Table 8:

Simulation results for $u^T\hat{v}_2$ with $u = (1, 0, \cdots, 0)^T$

r 0.02 0.05 0.1 0.2 0.3 0.4

Mean 0.0622 0.0204 0.0018 −0.0074 −0.0119 −0.0049
Standard deviation 1.1221 1.0537 1.0272 1.0022 1.0088 0.9933
AD p-value 0.0003 0.5853 0.0930 0.6011 0.2423 0.4385

Figure 2:


Histograms corresponding to the first eigenvector $\hat{v}_1$ when r = 0.4, with the blue curves representing the standard normal density. Left panel: $u_1^T\hat{v}_1$; middle panel: $v_1^T\hat{v}_1$; right panel: $u_3^T\hat{v}_1$, where $u_1 = a_1$ and $u_3 = (1, 0, \cdots, 0)^T$.

Figure 3:


Histograms corresponding to the second eigenvector $\hat{v}_2$ when r = 0.4, with the blue curves representing the standard normal density. Left panel: $u_1^T\hat{v}_2$; middle panel: $v_2^T\hat{v}_2$; right panel: $u_3^T\hat{v}_2$, where $u_1 = a_1$ and $u_3 = (1, 0, \cdots, 0)^T$.

Our simulation results in Figure 1 and Tables 1–2 suggest that the normalized spiked eigenvalues have distributions very close to standard Gaussian, which supports our results in Theorem 1. Indeed, such large p-values are extremely impressive given the "sample size" (the number of simulations is 10,000). In general, the simulation results for the eigenvectors support our theoretical findings in Section 3. However, the results corresponding to the first spiked eigenvector $\hat{v}_1$ (Tables 3–5) are better than those for the second spiked eigenvector $\hat{v}_2$ (Tables 6–8). This is reasonable since for the larger spiked eigenvalue, the negligible terms that we dropped in the proofs of the asymptotic normality become relatively smaller and thus have smaller finite-sample effects on the asymptotic distributions. For the linear form $u^T\hat{v}_k$, when $u = v_k$ the convergence to standard normal is slower compared to the case of $u \ne v_k$. This again supports our theoretical findings in Section 3 and explains why we need to separate the cases of $u = v_k$ and $u \ne v_k$. Such an effect is especially prominent for $v_2^T\hat{v}_2$, whose sample mean is −11.8020 when r = 0.02 as shown in Table 7. However, it is seen from the same table (and the other tables) that as the spiked eigenvalue increases with r, the distribution gets closer and closer to standard Gaussian.

Table 5:

Simulation results for $u^T\hat{v}_1$ with $u = (1, 0, \cdots, 0)^T$

r 0.02 0.05 0.1 0.2 0.3 0.4

Mean 0.0025 0.0021 0.0003 0.0105 0.0061 −0.0122
Standard deviation 1.0432 1.0354 0.9871 1.0016 1.0205 0.9898
AD p-value 0.0044 0.4877 0.3752 0.1514 0.1304 0.3400

Table 4:

Simulation results for $v_1^T\hat{v}_1$

r 0.02 0.05 0.1 0.2 0.3 0.4

Mean −1.3288 −0.4817 −0.1900 −0.0742 −0.0409 −0.0186
Standard deviation 1.0940 1.0545 1.0338 0.9749 1.0030 1.0005
AD p-value 0.0582 0.4251 0.0251 0.0225 0.3312 0.2912

Table 7:

Simulation results for $v_2^T\hat{v}_2$

r 0.02 0.05 0.1 0.2 0.3 0.4

Mean −11.8020 −4.3274 −2.0057 −0.7447 −0.3526 −0.1650
Standard deviation 1.3775 1.1192 1.0980 1.0343 1.0104 1.0089
AD p-value 0.0000 0.0011 0.0422 0.3964 0.4980 0.1186

6. A more general asymptotic theory

As mentioned before, the asymptotic theory for the spiked eigenvectors in terms of the general linear combination and for the spiked eigenvalues presented in Section 3 is in fact a consequence of a more general asymptotic theory for the spiked eigenvectors in terms of the bilinear form. In this section, we focus our attention on such a more general asymptotic theory for the bilinear form $x^T\hat{v}_k\hat{v}_k^Ty$ with $1 \le k \le K$, where $x$ and $y$ are two arbitrary $n$-dimensional unit vectors. See Sections 3.5 and 3.6 for detailed discussions on the technical innovations of our novel ATE theoretical framework and comparisons with the existing literature on the asymptotic distributions of eigenvectors.

For technical reasons, we will break our main results on the asymptotic distributions of the bilinear form $x^T\hat{v}_k\hat{v}_k^Ty$ down into two theorems: Theorem 4 considers the case when either vector $x$ or vector $y$ is sufficiently far away from the population eigenvector $v_k$, and Theorem 5 studies the case when both vectors $x$ and $y$ are very close to $v_k$. The technical treatments for these two cases are different since in the latter scenario, the first-order term which determines the asymptotic distribution in Theorem 4 vanishes, and thus we need to consider higher order expansions to obtain the asymptotic distribution in Theorem 5. Let $J_{x,y,k,t_k}$, $L_{x,y,k,t_k}$, and $Q_{x,y,k,t_k}$ be the three rank-one matrices given in (113)–(115), respectively, in the proof of Theorem 5 in Section A.6. Denote by $\sigma_k^2 = \mathrm{var}[\mathrm{tr}(WJ_{x,y,k,t_k})]$ and

$$\tilde{\sigma}_k^2 = \mathrm{var}\big\{\mathrm{tr}[WJ_{x,y,k,t_k} + (W^2 - \mathbb{E}W^2)L_{x,y,k,t_k}] + \mathrm{tr}(Wv_kv_k^T)\,\mathrm{tr}(WQ_{x,y,k,t_k})\big\}. \qquad (32)$$

Both of the quantities above play an important role in our more general asymptotic theory.

Theorem 4.

Assume that Conditions 1–2 hold and $x$ and $y$ are two $n$-dimensional unit vectors. Then for each $1 \le k \le K$, if $\sigma_k^2 \gg t_k^{-4}\alpha_n^2(|A_{x,k,t_k}| + |A_{y,k,t_k}|)^2 + t_k^{-4}$ we have the asymptotic expansion

$$x^T\hat{v}_k\hat{v}_k^Ty = a_k + \mathrm{tr}(WJ_{x,y,k,t_k}) + O_p\big\{t_k^{-2}\alpha_n(|A_{x,k,t_k}| + |A_{y,k,t_k}|) + t_k^{-2}\big\}, \qquad (33)$$

where the quantity $a_k = A_{x,k,t_k}A_{y,k,t_k}\widetilde{P}_{k,t_k}$.

The assumption $\sigma_k^2 \gg t_k^{-4}\alpha_n^2(|A_{x,k,t_k}| + |A_{y,k,t_k}|)^2 + t_k^{-4}$ in Theorem 4 requires that the variance of the random variable $\mathrm{tr}(WJ_{x,y,k,t_k})$ is not too small, which, at a high level, requires that either vector $x$ or $y$ is sufficiently far away from the population eigenvector $v_k$. If $\sigma_{ij} \sim 1$ for each $(i, j)$ pair, then such an assumption restricts essentially that $J_{x,y,k,t_k}$ should not be too close to zero. This in turn ensures that the first-order expansion is sufficient for deriving the asymptotic normality of $x^T\hat{v}_k\hat{v}_k^Ty$. Theorem 4 also entails that a simple upper bound for $\tilde{\sigma}_k$ as defined in (32) can be shown to be $O(t_k^{-2}\alpha_n)$.

Theorem 5.

Assume that Conditions 1–2 hold and $x$ and $y$ are two $n$-dimensional unit vectors. Then for each $1 \le k \le K$, if $\sigma_k^2 = O(\tilde{\sigma}_k^2)$ and $\tilde{\sigma}_k^2 \gg t_k^{-6}\alpha_n^4(|A_{x,k,t_k}| + |A_{y,k,t_k}|)^2 + t_k^{-6}$ we have the asymptotic expansion

$$x^T\hat{v}_k\hat{v}_k^Ty = a_k + \mathrm{tr}[WJ_{x,y,k,t_k} + (W^2 - \mathbb{E}W^2)L_{x,y,k,t_k}] + \mathrm{tr}(Wv_kv_k^T)\,\mathrm{tr}(WQ_{x,y,k,t_k}) + O_p\big\{|t_k|^{-3}\alpha_n^2(|A_{x,k,t_k}| + |A_{y,k,t_k}|) + |t_k|^{-3}\big\}, \qquad (34)$$

where the quantity ak is given in (33).

The ATE theoretical framework for the more general asymptotic theory established in Theorems 4 and 5 is empowered by the following two technical lemmas.

Lemma 4.

For any n-dimensional unit vectors x and y, we have

$$x^T(W^l - \mathbb{E}W^l)y = O_p\big(\min\{\alpha_n^{l-1}, d_x\alpha_n^l, d_y\alpha_n^l\}\big) \qquad (35)$$

with $l \ge 1$ some bounded positive integer and $d_x = \|x\|_\infty$.

Lemma 5.

For any $n$-dimensional unit vectors $x$ and $y$, we have $\mathbb{E}x^TWy = O(1)$ and

$$\mathbb{E}x^TW^ly = O(\alpha_n^l) \qquad (36)$$

with $l \ge 2$ some bounded positive integer.

The detailed proofs of Lemmas 4 and 5 are provided in Sections B.5 and B.6 of the Supplementary Material. Our delicate technical arguments therein establish useful refinements of the classical idea of counting the number of nonzero terms from random matrix theory. In particular, Lemma 4 is the key building block for the high order Taylor expansions that involve polynomials of the quantities in the lemma with different choices of $(x, y, l)$.

7. Discussions

In contrast to the immense literature on the asymptotic distributions for eigenvalues of large spiked random matrices, the counterpart asymptotic theory for eigenvectors has remained largely underdeveloped in statistics literature for years. Yet such a theory is much desired for understanding the precise asymptotic properties of various statistical and machine learning algorithms that build upon the spectral information of the eigenspace constructed from observed data matrix. Our work in this paper provides a first attempt with a general ATE theoretical framework for underpinning the precise asymptotic expansions and asymptotic distributions for spiked eigenvectors and spiked eigenvalues of large spiked random matrices with diverging spikes. Our results complement existing ones in the RMT literature as well as the networks literature.

The family of models in our ATE framework includes many popularly used ones for large-scale applications including network analysis and text analysis such as the stochastic block models with or without overlapping communities and the topic models. Our general asymptotic theory for eigenvectors can be exploited to develop new useful tools for precise statistical inference in these applications. It would be interesting to investigate the problem of reproducible large-scale inference as in Barber and Candès (2015); Candès et al. (2018); Lu et al. (2018); Fan et al. (2019); Fan et al. (2019) in these model settings. It would also be interesting to develop a general method to determine the rank and provide robust rank inference in such high-dimensional low-rank models. These extensions are beyond the scope of the current paper and will be interesting topics for future research.

Supplementary Material


Table 6: Simulation results for $u^T\hat{v}_2$ with $u = a_1$

r                     0.02     0.05     0.1      0.2      0.3      0.4
Mean                  4.2611   1.0129   0.3067   0.0745   0.0219   0.0037
Standard deviation    1.2384   1.0952   1.0294   1.0098   1.0280   1.0044
AD p-value            0.3829   0.7535   0.3759   0.4105   0.9129   0.9873
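For readers unfamiliar with the last row, the following is a generic sketch (ours) of how entries like those in Table 6 can be produced: simulate replicates of the statistic, standardize them, and report the Anderson–Darling p-value for normality. The standard normal draws below are only a stand-in for standardized replicates of $u^T\hat{v}_2$; the paper's actual simulation design is not reproduced here.

```python
# Illustrative Anderson-Darling normality check on standardized replicates;
# the data-generating step is a placeholder, not the paper's simulation model.
import numpy as np
from statsmodels.stats.diagnostic import normal_ad

rng = np.random.default_rng(9)
reps = 500
z = rng.standard_normal(reps)   # stand-in for standardized u^T hat{v}_2 values
stat, pval = normal_ad(z)       # Anderson-Darling test for normality
print(round(np.mean(z), 4), round(np.std(z, ddof=1), 4), round(pval, 4))
```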

Acknowledgments

This work was supported by NIH grants R01-GM072611-14 and 1R01GM131407-01, NSF grants DMS-1662139, DMS-1712591, and DMS-1953356, NSF CAREER Award DMS-1150318, a grant from the Simons Foundation, and an Adobe Data Science Research Award. The authors sincerely thank the Joint Editor, Associate Editor, and referees for their valuable comments that helped improve the paper substantially.

A. Proofs of main results

Recall that Condition 2 involves two scenarios of the spike strength. We will first prove all the results under scenario i). Then in Section D of the Supplementary Material, we will adapt the proofs to show that the same results also hold under scenario ii). We provide the proofs of Theorems 1–5 and Corollary 1 in this appendix. Additional technical details, including the proofs of all the lemmas and further discussions on when the asymptotic normality can hold for the asymptotic expansion in Theorem 5, are contained in the Supplementary Material.

A.1. Proof of Theorem 1

The results on the asymptotic distributions of spiked eigenvalues in Theorem 1 are in fact a consequence of those on the asymptotic expansions and asymptotic distributions for the spiked eigenvectors, where a more general asymptotic theory of the eigenvectors is presented in Theorems 4–5 in Section 6. Let us define a matrix-valued function, referred to as the Green function associated with only the noise part W,

$$G(z) = (W - zI)^{-1} \qquad (37)$$

for z in the complex plane $\mathbb{C}$, where I stands for the identity matrix of size n. Recall that $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of matrix X and $\hat{v}_1, \ldots, \hat{v}_n$ are the corresponding eigenvectors. By Weyl's inequality, it holds that $\max_i|\lambda_i - d_i| \le \|W\|$. Thus, in view of Condition 2 and Lemma 6 in the Supplementary Material, all the spiked eigenvalues $\lambda_k$ with 1 ≤ k ≤ K of the observed random matrix X have magnitudes of larger order than the eigenvalues of the noise matrix W with significant probability as the matrix size n increases. This entails that with significant probability, the matrices $G(\lambda_k)$ with 1 ≤ k ≤ K are well defined and nonsingular. For the rest of this proof, we restrict all the derivations to such an event, which holds with asymptotic probability one.
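As a quick numeric illustration of the Weyl bound invoked above, the following sketch (ours; the Gaussian noise and the spike sizes are illustrative assumptions) checks that for X = H + W with H = VDV^T of rank K, every eigenvalue of X lies within $\|W\|$ of the corresponding eigenvalue of H.

```python
# Numeric sanity check (illustrative) of max_i |lambda_i(X) - lambda_i(H)| <= ||W||.
import numpy as np

rng = np.random.default_rng(1)
n, K = 300, 2
d = np.array([8.0 * n, -5.0 * n])                 # diverging spikes d_1, d_2
V, _ = np.linalg.qr(rng.standard_normal((n, K)))  # orthonormal spiked eigenvectors
H = V @ np.diag(d) @ V.T

A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2)                        # ||W|| ~ 2 sqrt(n) << |d_K|

lam_X = np.sort(np.linalg.eigvalsh(H + W))
lam_H = np.sort(np.linalg.eigvalsh(H))
print(np.max(np.abs(lam_X - lam_H)), "<=", np.linalg.norm(W, 2))
```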

It follows from the definition of the eigenvalue, the representation X = H+W = VDVT + W, (37), and the properties of the determinant function det(·) that for each 1 ≤ kK,

$$0 = \det(X - \lambda_kI) = \det(W - \lambda_kI + VDV^T) = \det[G^{-1}(\lambda_k) + VDV^T] = \det[G^{-1}(\lambda_k)]\det[I + G(\lambda_k)VDV^T],$$

which leads to $\det[I + G(\lambda_k)VDV^T] = 0$ since $\det[G^{-1}(\lambda_k)] = \det[G(\lambda_k)]^{-1}$ is nonzero. Using the identity det(I + AB) = det(I + BA) for matrices A and B, we obtain for each 1 ≤ k ≤ K,

$$0 = \det[I + G(\lambda_k)VDV^T] = \det[I + DV^TG(\lambda_k)V], \qquad (38)$$

where the second I represents an identity matrix of size K and we slightly abuse the notation for simplicity. Since the diagonal matrix D is nonsingular by assumption, it follows from (38) that

$$\det[d_kV^TG(\lambda_k)V + d_kD^{-1}] = d_k^K\det(D^{-1})\det[I + DV^TG(\lambda_k)V] = 0 \qquad (39)$$

for each 1 ≤ k ≤ K.
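The determinant identity det(I + AB) = det(I + BA) used in (38) is easy to verify numerically; here is a tiny check (ours, with arbitrary random matrices) where the two identity matrices have different sizes, exactly the situation exploited above.

```python
# Illustrative check of Sylvester's identity det(I_n + AB) = det(I_K + BA)
# with A of size n x K and B of size K x n.
import numpy as np

rng = np.random.default_rng(2)
n, K = 6, 2
A = rng.standard_normal((n, K))
B = rng.standard_normal((K, n))
lhs = np.linalg.det(np.eye(n) + A @ B)
rhs = np.linalg.det(np.eye(K) + B @ A)
print(np.isclose(lhs, rhs))  # True
```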

By the asymptotic expansions in (79), Lemmas 4 and 5, and Weyl's inequality $\max_k|\lambda_k - d_k| \le \|W\|$, we have for $j \ne \ell$, $d_kv_j^TG(\lambda_k)v_\ell = d_kO_p(\lambda_k^{-2}) = O_p(1/|d_k|)$. Thus we can see that all off-diagonal entries of the matrix $d_kV^TG(\lambda_k)V + d_kD^{-1}$ in (39) are of order $O_p(1/|d_k|)$. For $j \ne k$, the jth diagonal entry of $d_kV^TG(\lambda_k)V + d_kD^{-1}$ equals $d_kv_j^TG(\lambda_k)v_j + d_k/d_j$. By (78) and Lemma 4, we have $d_kv_j^TG(\lambda_k)v_j + 1 = o_p(1)$. Moreover, by Condition 2, $|d_k/d_j - 1| \ge c$ for some positive constant c. Hence, all these diagonal entries but the kth one are of order at least $O_p(1)$. Thus the matrix $(d_kv_i^TG(\lambda_k)v_j + \delta_{ij}d_k/d_i)_{1\le i,j\le K,\, i,j\ne k}$ is invertible with significant probability, where $\delta_{ij} = 1$ when i = j and 0 otherwise. Recall the determinant identity for block matrices from linear algebra

$$\det\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \det(A_{22})\det(A_{11} - A_{12}A_{22}^{-1}A_{21})$$

when the lower right block matrix A22 is nonsingular. Treating the kth diagonal entry of dkVT G(λk)V + dkD−1 as the first block, we have with significant probability

$$\det[d_kV^TG(\lambda_k)V + d_kD^{-1}] = 0, \qquad (40)$$

entailing $d_kv_k^TG(\lambda_k)v_k + 1 = d_kv_k^TF_k(\lambda_k)v_k$, where $F_k(z) = G(z)V_{-k}[D_{-k}^{-1} + V_{-k}^TG(z)V_{-k}]^{-1}V_{-k}^TG(z)$ and $A_{-k}$ denotes the submatrix of matrix A with the kth column removed. In light of (40) and the solution $\hat{t}_k$ to equation (94) in the proof of Theorem 4 in Section A.5, it holds from the uniqueness of $\hat{t}_k$ that

$$\lambda_k = \hat{t}_k. \qquad (41)$$

Therefore, combining equality (41) with the asymptotic expansion $\hat{t}_k - t_k = v_k^TWv_k + O_p(\alpha_n/t_k)$ obtained in (99) completes the proof of Theorem 1.
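The conclusion of this proof has a simple numerical face: for a single diverging spike, the spiked eigenvalue should track $d_1 + v_1^TWv_1$ up to smaller-order corrections. The sketch below is our illustration under assumed Gaussian noise and an assumed spike size; it is not the paper's simulation code.

```python
# Illustrative check that lambda_1 - d_1 is close to the first-order term v_1^T W v_1.
import numpy as np

rng = np.random.default_rng(3)
n = 500
d1 = 20.0 * n                        # spike far above ||W|| ~ 2 sqrt(n)
v1 = np.ones(n) / np.sqrt(n)

A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2)
X = d1 * np.outer(v1, v1) + W

lam1 = np.linalg.eigvalsh(X)[-1]     # largest (spiked) eigenvalue
print(lam1 - d1, "vs first-order term", v1 @ W @ v1)
```

The residual difference between the two printed numbers is of order n/d1 here, consistent with the $O_p(\alpha_n/t_k)$ remainder.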

A.2. Proof of Theorem 2

The results on the asymptotic distributions of spiked eigenvectors in Theorem 2 are also an implication of the more general asymptotic theory of the eigenvectors presented in Theorems 4–5 in Section 6 on the delicate asymptotic expansions and asymptotic distributions for the spiked eigenvectors. Recall that $\hat{V} = (\hat{v}_1, \ldots, \hat{v}_K)$ with $\hat{v}_k$ for 1 ≤ k ≤ K the empirical spiked eigenvectors of the observed random matrix X. Without loss of generality, let us choose the direction of eigenvector $\hat{v}_k$ such that $\hat{v}_k^Tv_k \ge 0$. Clearly, fixing the direction of $\hat{v}_k$ does not affect the distribution of $x^T\hat{v}_k\hat{v}_k^Ty$; that is, its distribution stays the same regardless of which direction of $\hat{v}_k$ is chosen as the eigenvector. We will separately consider the two cases of $v_k^T\hat{v}_k$ and $u^T\hat{v}_k$ with $u \ne v_k$, where the former relies on the second order expansion given in (111) in the proof of Theorem 5 in Section A.6, and the latter utilizes the first order expansion given in (107) in the proof of Theorem 4 in Section A.5.

We first consider $v_k^T\hat{v}_k$. Choosing $x = y = v_k$ in Theorem 4 gives $a_k = A_{v_k,k,t_k}^2\tilde{P}_{k,t_k}^{-1}$. By Lemma 5, it holds that

$$P(v_k,v_k,t_k) = \sum_{l=0}^{2L}v_k^T\mathbb{E}W^lv_k\,t_k^{-l} = 1 + O(\alpha_n^2/t_k^2) \qquad (42)$$

and

$$P(v_k,V_{-k},t_k) = \sum_{l=0}^{2L}v_k^T\mathbb{E}W^lV_{-k}\,t_k^{-l} = O(\alpha_n^2/t_k^2). \qquad (43)$$

Moreover, recalling the definition of $A_{u,k,t_k}$ in (9), $A_{v_k,k,t_k}$ can be rewritten as

$$A_{v_k,k,t_k} = -P(v_k,v_k,t_k) - t_k^{-1}P(v_k,V_{-k},t_k)\big[D_{-k}^{-1} + R(V_{-k},V_{-k},t_k)\big]^{-1}P(V_{-k},v_k,t_k).$$

Therefore, by (42)–(43), (A.16), and (91), we have

$$A_{v_k,k,t_k} = -1 + O(\alpha_n^2/t_k^2) \quad\text{and}\quad \tilde{P}_{k,t_k} = 1 + O(\alpha_n^2/t_k^2). \qquad (44)$$

Now recall the second order expansion of $x^T\hat{v}_k\hat{v}_k^Ty$ given in (111) in the proof of Theorem 5. We next calculate the order of each term in the expansion (111). First, we consider $b_{v_k,k,t_k}^T$. By (43), (A.16), and the definition in (8), we have

$$b_{v_k,k,t_k}^T - v_k^T = -R(v_k,V_{-k},t_k)\big[(D_{-k})^{-1} + R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T = O(\alpha_n^2/t_k^2). \qquad (45)$$

This together with (44) entails that

$$b_{v_k,k,t_k}^T + A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T = O(\alpha_n^2/t_k^2). \qquad (46)$$

It follows from Lemma 4 and (44) that

$$-A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1}\big(b_{v_k,k,t_k}^T + A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k/t_k = O_p(\alpha_n^2/|t_k|^3),$$

$$\begin{aligned}
&\tilde{P}_{k,t_k}^{-1}t_k^{-2}\big[2\tilde{P}_{k,t_k}^{-1}\big(A_{v_k,k,t_k}b_{v_k,k,t_k}^T + A_{v_k,k,t_k}b_{v_k,k,t_k}^T\big)Wv_kv_k^T + b_{v_k,k,t_k}^TWv_k\,b_{v_k,k,t_k}^T\big]Wv_k + 2A_{v_k,k,t_k}^2t_k^{-2}(v_k^TWv_k)^2 \\
&\quad+ A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1}\big\{t_k^{-2}v_k^TWv_k\,v_k^TWv_k - t_k^{-2}v_k^TWv_k\,R(v_k,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k\big\} \\
&\quad+ A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1}\big\{t_k^{-2}v_k^TWv_k\,v_k^TWv_k - t_k^{-2}v_k^TWv_k\,R(v_k,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k\big\} \\
&\quad+ A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-2}R(v_k,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T(W^2-\mathbb{E}W^2)v_k \\
&\quad+ A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-2}R(v_k,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T(W^2-\mathbb{E}W^2)v_k
= O_p\Big(\frac{1}{|t_k|^2} + \frac{\alpha_n}{|t_k|^3}\Big),
\end{aligned}$$

and

$$-\tilde{P}_{k,t_k}^{-1}t_k^{-2}\big(A_{v_k,k,t_k}v_k^T + A_{v_k,k,t_k}v_k^T\big)(W^2-\mathbb{E}W^2)v_k - 3t_k^{-2}A_{v_k,k,t_k}^2\tilde{P}_{k,t_k}^{-2}v_k^T(W^2-\mathbb{E}W^2)v_k = -v_k^T(W^2-\mathbb{E}W^2)v_k\,t_k^{-2} + O_p(\alpha_n^3/t_k^4).$$

Substituting the above equations into (111) results in

$$v_k^T\hat{v}_k\hat{v}_k^Tv_k - A_{v_k,k,t_k}^2\tilde{P}_{k,t_k}^{-1} = -v_k^T(W^2 - \mathbb{E}W^2)v_k/t_k^2 + O_p\big(|t_k|^{-2} + \alpha_n^2/|t_k|^3\big), \qquad (47)$$

where the leading term of the asymptotic expansion now depends on the second moments of the noise matrix W. Recall that $v_k^T\hat{v}_k \ge 0$. By (44) and (47) we have

$$v_k^T\hat{v}_k + A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1/2} = \frac{v_k^T(W^2-\mathbb{E}W^2)v_k}{2A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1/2}t_k^2} + O_p(\alpha_n^2/t_k^3) = -\frac{v_k^T(W^2-\mathbb{E}W^2)v_k}{2t_k^2} + O_p\big(|t_k|^{-2} + \alpha_n^2/|t_k|^3\big). \qquad (48)$$

We now consider an arbitrary unit vector $u \in \mathbb{R}^n$ with $|u^Tv_k| \in [0, 1)$ for investigating the asymptotic distribution of the general linear combination $u^T\hat{v}_k$. It follows from the first order expansion given in (107) in the proof of Theorem 4 and (46) that

$$u^T\hat{v}_k\hat{v}_k^Tv_k - A_{u,k,t_k}A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1} = -A_{v_k,k,t_k}\tilde{P}_{k,t_k}^{-1}\big(b_{u,k,t_k}^T + A_{u,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k/t_k + O_p(\alpha_n/t_k^2). \qquad (49)$$

Then dividing (49) by vkTv^k and using (44) and (48), we can deduce that

$$u^T\hat{v}_k + A_{u,k,t_k}\tilde{P}_{k,t_k}^{-1/2} = \tilde{P}_{k,t_k}^{-1/2}\big(b_{u,k,t_k}^T + A_{u,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k/t_k + O_p(\alpha_n/t_k^2) = \big(b_{u,k,t_k}^T - u^Tv_k\,v_k^T\big)Wv_k/t_k + O_p(\alpha_n/t_k^2). \qquad (50)$$

In view of the asymptotic expansions in (50) and (48), we can see that the desired asymptotic normality in the two parts of Theorem 2 follows from the conditions of Lemmas 1 or 2. More specifically, for (50), if $\alpha_n^2d_k^{-2} \ll \operatorname{var}\big[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k\big]$, then $\alpha_n/t_k^2 \ll |t_k|^{-1}\big\{\operatorname{var}\big[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k\big]\big\}^{1/2}$ and thus the first part of Theorem 2 in (17) holds in view of (50). Furthermore, if $(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k$ obeys the CLT, that is, $(b_{u,k,t_k} - v_kv_k^Tu,\, v_k)$ is W1-CLT, then we have

$$\frac{t_k\big(u^T\hat{v}_k + A_{u,k,t_k}\tilde{P}_{k,t_k}^{-1/2}\big) - \mathbb{E}\big[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k\big]}{\big\{\operatorname{var}\big[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k\big]\big\}^{1/2}} \xrightarrow{\mathscr{D}} N(0,1).$$

Similarly, the second part of Theorem 2 in (18) also holds under the condition $(\alpha_n^4d_k^{-2} + 1) \ll \operatorname{var}\big[v_k^T(W^2 - \mathbb{E}W^2)v_k\big]$, and the corresponding CLT holds if $(v_k, v_k)$ is W2-CLT. This concludes the proof of Theorem 2.

A.3. Proof of Theorem 3

The results on the asymptotic distributions of spiked eigenvalues and spiked eigenvectors in Theorem 3 are an application of those in Theorems 1 and 2 for a more specific structure of the low rank model (1), including the stochastic block model with both non-overlapping and overlapping communities as special cases.

First, note that (15) implies that the condition of Lemma 1 holds for $v_k^TWv_k$ under Condition 3. Consequently, $(v_k, v_k)$ is W1-CLT. In addition, (15) ensures that $\mathbb{E}(v_k^TWv_k - \mathbb{E}v_k^TWv_k)^2 \gg \alpha_n^2/d_k^2$ under Condition 3. Therefore, it follows from Theorem 1 that the first result of Theorem 3 holds. Recall that in (A.12), $s_{x,y}$ is defined as the expected value of the conditional variance of $v_k^T(W^2-\mathbb{E}W^2)v_k$. By definition, we have $\operatorname{var}\big[v_k^T(W^2-\mathbb{E}W^2)v_k\big] \ge s_{x,y} \ge c\sigma_{\min}^2n$. Thus the condition $(\alpha_n^4d_k^{-2}+1) \ll \operatorname{var}\big[v_k^T(W^2-\mathbb{E}W^2)v_k\big]$ in Theorem 2 is ensured by the assumptions

$$\sigma_{\min}^2n \to \infty,\qquad |d_K| \gg \sigma_{\min}^{-1}\alpha_n,\qquad \alpha_n \le n^{1/2}$$

in Condition 3. Moreover, by (A.13) we can see that the conditions of Lemma 2 are satisfied for $v_k^T(W^2-\mathbb{E}W^2)v_k$ under Condition 3. Thus $(v_k, v_k)$ is W2-CLT. Therefore, (22) holds by an application of (18) in Theorem 2.

It remains to show that the condition

$$\operatorname{var}\big[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k\big] \gg \alpha_n^2/d_k^2 \qquad (51)$$

in Theorem 2 can be guaranteed by Condition 3, so that the expansion in (17) holds. Moreover, the condition $\sigma_{\min}^{-1}\|v_k\|_\infty\big\|b_{u,k,t_k} - v_kv_k^Tu\big\|^{-1} \to 0$ ensures that $(b_{u,k,t_k} - v_kv_k^Tu,\, v_k)$ is W1-CLT. Combining these results entails that the asymptotic normality (21) holds. Now we proceed to verify (51). Consider an arbitrary unit vector $u \in \mathbb{R}^n$ satisfying $|u^Tv_k| \in [0, 1-\epsilon]$ for some positive constant ε. Recalling the definition of $b_{u,k,t}$ in (8), we have $b_{u,k,t}^Tv_k = \big\{u^T - R(u,V_{-k},t)\big[(D_{-k})^{-1} + R(V_{-k},V_{-k},t)\big]^{-1}V_{-k}^T\big\}v_k = u^Tv_k$. Thus it holds that $b_{u,k,t_k}^T - u^Tv_kv_k^T = b_{u,k,t_k}^T - b_{u,k,t_k}^Tv_kv_k^T = b_{u,k,t_k}^T(I - v_kv_k^T)$. Moreover, similar to (15) we can show that

$$\big[\mathbb{E}(u^TWv_k - \mathbb{E}u^TWv_k)^2\big]^{1/2} \ge \sigma_{\min}\big(2 - 2\|v_k\|_\infty^2\big)^{1/2}. \qquad (52)$$

This ensures that there exists some positive constant c1 such that

$$\operatorname{var}\big[(b_{u,k,t_k}^T - u^Tv_kv_k^T)Wv_k\big] \ge \sigma_{\min}^2\big(2 - 2\|v_k\|_\infty^2\big)\big\|b_{u,k,t_k}^T - u^Tv_kv_k^T\big\|^2 = \sigma_{\min}^2\big(2-2\|v_k\|_\infty^2\big)\big\|b_{u,k,t_k}^T(I - v_kv_k^T)\big\|^2 \ge c_1\sigma_{\min}^2\big[-(u^Tv_k)^2 + b_{u,k,t_k}^Tb_{u,k,t_k}\big], \qquad (53)$$

where we have applied $b_{u,k,t}^Tv_k = u^Tv_k$ again in the last step.

Let $\tilde{V} = (v_{K+1},\ldots,v_n)$ be an n × (n − K) matrix such that $(V, \tilde{V})$ is an orthogonal matrix of size n. Then the n-dimensional unit vector u can be represented as $u = \sum_{i=1}^na_iv_i$ for some scalars $a_i$. For each 1 ≤ k ≤ K, by the definition of R in (6) and Lemma 5 we can show that

$$R(V_{-k},V_{-k},t_k) + t_k^{-1}I = O(\alpha_n^2|t_k|^{-3}) \quad\text{and}\quad R(u,V_{-k},t_k) + t_k^{-1}u^TV_{-k} = O(\alpha_n^2|t_k|^{-3}). \qquad (54)$$

Therefore it holds that

$$R(u,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T + \sum_{1\le i\ne k\le K}a_i\big(t_kd_i^{-1}-1\big)^{-1}v_i^T = O(\alpha_n^2/t_k^2). \qquad (55)$$

Then it follows from (55) and (8) that

$$b_{u,k,t_k}^T = \sum_{i=1}^na_iv_i^T - R(u,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T$$

and

$$b_{u,k,t_k} - a_kv_k - \sum_{1\le i\ne k\le K}a_i\big[1+(t_kd_i^{-1}-1)^{-1}\big]v_i - \sum_{i=K+1}^na_iv_i = O(\alpha_n^2/t_k^2). \qquad (56)$$

Denote by $c_k = a_kv_k + \sum_{1\le i\ne k\le K}a_i\big[1+(t_kd_i^{-1}-1)^{-1}\big]v_i + \sum_{i=K+1}^na_iv_i$. By (56), we can obtain

$$-(u^Tv_k)^2 + b_{u,k,t_k}^Tb_{u,k,t_k} = -a_k^2 + \|c_k\|^2 + \|b_{u,k,t_k} - c_k\|^2 + 2(b_{u,k,t_k} - c_k)^Tc_k = \sum_{1\le i\ne k\le K}a_i^2\big[1+(t_kd_i^{-1}-1)^{-1}\big]^2 + \sum_{i=K+1}^na_i^2 + O(\alpha_n^2/t_k^2) + \text{some smaller order term}, \qquad (57)$$

where the small order term takes a rather complicated form and thus we omit its expression for simplicity. Since by assumption $|u^Tv_k| \in [0, 1-\epsilon]$, $u = \sum_{i=1}^na_iv_i$ is a unit vector, and $(v_1, \ldots, v_n)$ is an orthogonal matrix, it holds that

$$\sum_{1\le i\ne k\le n}a_i^2 \ge 1-(1-\epsilon)^2. \qquad (58)$$

Moreover, Condition 3 and Lemma 3 together entail that $t_kd_i^{-1}$ is bounded away from 0 and 1. Thus there exists some positive constant $c_2 < 1$ such that

$$\big[1+(t_kd_i^{-1}-1)^{-1}\big]^2 \ge c_2 \qquad (59)$$

for each 1 ≤ i ≠ k ≤ K. Therefore, combining (53) and (57)–(59), and by the assumption $\sigma_{\min} \gg \alpha_n/|t_k|$, we can obtain the desired claim in (51), which completes the proof of Theorem 3.

A.4. Proof of Corollary 1

The conclusions of Corollary 1 follow directly from the results of Theorem 3.

A.5. Proof of Theorem 4

The more general asymptotic theory in Theorem 4 focuses on the first order asymptotic expansion for the bilinear form $x^T\hat{v}_k\hat{v}_k^Ty$ with x and y two arbitrary n-dimensional unit vectors, while that in Theorem 5 further establishes the higher order (that is, second order) asymptotic expansion for the same bilinear form. We begin with the analysis for the first order asymptotic expansion. The main ingredients of the proof are as follows. First, we represent $x^T\hat{v}_k\hat{v}_k^Ty$ as an integral which is a functional of X = H + W. By doing so we can deal with the matrix H + W instead of the eigenvectors. Second, for the functional of H + W obtained in the previous step we extract the H part from H + W and further obtain a functional of W. Roughly speaking, we can get an explicit function of the form $f((W-tI)^{-1})$ with $|t| \gg \|W\|$. Third, by the matrix series expansion $(W-tI)^{-1} = -\sum_{l=0}^\infty t^{-(l+1)}W^l$, the function $f((W-tI)^{-1})$ can be approximated by $f(-\sum_{l=0}^Lt^{-(l+1)}W^l)$ for some positive integer L. Fourth, we can then calculate the first (or second, or higher) order expansion of $f(-\sum_{l=0}^Lt^{-(l+1)}W^l)$ since we have an explicit expression for the function f.
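The third step above is the key approximation device, and it is easy to see at work numerically. Below is a minimal sketch (ours; the matrix size, the value of t, and the truncation level are illustrative assumptions) showing that a short truncation of the matrix series already approximates the resolvent well when $|t| \gg \|W\|$.

```python
# Illustrative check of (W - tI)^{-1} = -sum_{l>=0} t^{-(l+1)} W^l for |t| >> ||W||.
import numpy as np

rng = np.random.default_rng(4)
n, t, L = 200, 200.0, 6
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2)          # ||W|| ~ 2 sqrt(n) ~ 28 << |t|

resolvent = np.linalg.inv(W - t * np.eye(n))
series = -sum(np.linalg.matrix_power(W, l) / t ** (l + 1) for l in range(L + 1))
print(np.linalg.norm(resolvent - series, 2))   # truncation error ~ (||W||/t)^{L+1}/t
```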

To facilitate our technical derivations, let us recall some basic matrix identities from the Sherman–Morrison–Woodbury formula. For any matrices A, B, C, and F of appropriate dimensions and any vectors a and b of appropriate dimensions, it holds that

$$(A+BFC)^{-1} = A^{-1} - A^{-1}B\big(F^{-1}+CA^{-1}B\big)^{-1}CA^{-1} \qquad (60)$$

and

$$(C+ab^T)^{-1}a = \frac{C^{-1}a}{1+b^TC^{-1}a} \qquad (61)$$

when the corresponding matrices for matrix inversion are nonsingular.
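Both identities are standard and can be verified directly; here is a tiny numeric check (ours, with arbitrary well-conditioned random matrices chosen only for illustration).

```python
# Illustrative checks of the Woodbury identity (60) and the Sherman-Morrison form (61).
import numpy as np

rng = np.random.default_rng(5)
n, k = 8, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned
B = rng.standard_normal((n, k))
F = rng.standard_normal((k, k)) + k * np.eye(k)
C = rng.standard_normal((k, n))
Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ F @ C)
rhs = Ainv - Ainv @ B @ np.linalg.inv(np.linalg.inv(F) + C @ Ainv @ B) @ C @ Ainv
print(np.allclose(lhs, rhs))                      # (60): True

a = rng.standard_normal(n)
b = rng.standard_normal(n)
Cmat = rng.standard_normal((n, n)) + n * np.eye(n)
lhs2 = np.linalg.inv(Cmat + np.outer(a, b)) @ a
rhs2 = np.linalg.solve(Cmat, a) / (1.0 + b @ np.linalg.solve(Cmat, a))
print(np.allclose(lhs2, rhs2))                    # (61): True
```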

To illustrate the main ideas of our proof, we first consider the simple case of K = 1 and $x = y = v_1$. The general case of K ≥ 1 and arbitrary unit vectors will be discussed later. Let $\Omega_1$ be a contour centered at $(a_1 + b_1)/2$ with radius $|b_1 - a_1|/2$, where the quantities $a_k$ and $b_k$ with 1 ≤ k ≤ K are defined in Section 3.2. Then it is seen that $d_1$ is enclosed by $\Omega_1$. In view of Condition 2, Lemma 6, and Weyl's inequality, we have

$$|\lambda_1 - d_1| \le \|W\| < \min\{|d_1-a_1|,\, |d_1-b_1|\}$$

and

$$|\lambda_j - d_1| \ge |d_1| - \|W\| > \max\{|d_1-a_1|,\, |d_1-b_1|\},\qquad j \ge 2$$

with significant probability. We can see that the contour Ω1 does not enclose any other eigenvalues λj with j ≠ 1. Thus, by Cauchy’s residue theorem from complex analysis, we have with significant probability

$$\frac{1}{2\pi i}\oint_{\Omega_1}\frac{1}{\lambda_1-z}\,dz = -1 \quad\text{and}\quad \frac{1}{2\pi i}\oint_{\Omega_1}\frac{1}{\lambda_j-z}\,dz = 0,\qquad j \ge 2,$$

where i associated with the complex integrals represents the imaginary unit $(-1)^{1/2}$ and the line integrals are taken over the contour $\Omega_1$. Noticing that $(X-zI)^{-1} = \sum_{j=1}^n(\lambda_j-z)^{-1}\hat{v}_j\hat{v}_j^T$, we can then obtain an integral representation of the desired bilinear form; that is, with significant probability,

$$v_1^T\hat{v}_1\hat{v}_1^Tv_1 = -v_1^T\hat{v}_1\hat{v}_1^Tv_1\cdot\frac{1}{2\pi i}\oint_{\Omega_1}\frac{1}{\lambda_1-z}\,dz = -\frac{1}{2\pi i}\oint_{\Omega_1}v_1^T\Big(\sum_{j=1}^n\frac{\hat{v}_j\hat{v}_j^T}{\lambda_j-z}\Big)v_1\,dz = -\frac{1}{2\pi i}\oint_{\Omega_1}v_1^T\tilde{G}(z)v_1\,dz, \qquad (62)$$

where the matrix-valued function $\tilde{G}(z) = (X - zI)^{-1}$ for z in the complex plane is referred to as the Green function associated with the original random matrix X = H + W.
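The contour-integral representation (62) can also be checked directly on a simulated matrix: integrating the Green function of X around the spiked eigenvalue recovers the spectral projector $\hat{v}_1\hat{v}_1^T$. The sketch below is our illustration (Gaussian noise, spike size, contour radius, and discretization are all assumed for the example), using a counterclockwise contour so that $-\frac{1}{2\pi i}\oint(X-zI)^{-1}dz$ equals the projector, matching the sign convention in (62).

```python
# Illustrative numeric version of (62): a discretized contour integral of the
# Green function around lambda_1 recovers the projector hat{v}_1 hat{v}_1^T.
import numpy as np

rng = np.random.default_rng(6)
n = 100
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2)
d1, v1 = 30.0 * np.sqrt(n), np.ones(n) / np.sqrt(n)
X = d1 * np.outer(v1, v1) + W

lam, U = np.linalg.eigh(X)
lam1, vhat1 = lam[-1], U[:, -1]

# trapezoidal discretization of -(2*pi*i)^{-1} \oint (X - zI)^{-1} dz
center, radius, m = lam1, 5.0, 400
proj = np.zeros((n, n), dtype=complex)
for theta in 2 * np.pi * np.arange(m) / m:
    z = center + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / m)
    proj -= np.linalg.inv(X - z * np.eye(n)) * dz / (2j * np.pi)
print(np.allclose(proj.real, np.outer(vhat1, vhat1), atol=1e-6))
```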

Note that by (1) and K = 1 for the simple case, we have $X = H + W = d_1v_1v_1^T + W$. Thus the line integral in (62) can be rewritten as

$$v_1^T\hat{v}_1\hat{v}_1^Tv_1 = -\frac{1}{2\pi i}\oint_{\Omega_1}v_1^T\big(W - zI + d_1v_1v_1^T\big)^{-1}v_1\,dz. \qquad (63)$$

With the aid of (60) and (61), the line integral in (63) can be further represented as

$$v_1^T\hat{v}_1\hat{v}_1^Tv_1 = -\frac{1}{2\pi i}\oint_{\Omega_1}\frac{v_1^T(W-zI)^{-1}v_1}{1+d_1v_1^T(W-zI)^{-1}v_1}\,dz. \qquad (64)$$

To analyze the integrand of the line integral on the right hand side of (64), we first consider the term $(W-zI)^{-1}$. Such a term admits the matrix series expansion

$$(W - zI)^{-1} = -\sum_{l=0}^\infty z^{-(l+1)}W^l. \qquad (65)$$

Let L be the smallest positive integer such that

$$\alpha_n^{L+1}(\log n)^{(L+1)/2}|d_K|^{-(L-2)} \to 0. \qquad (66)$$

Such an integer L always exists since $|d_K|/(n^\epsilon\alpha_n) \to \infty$ for some small positive constant ε by Condition 2 and $\alpha_n \le n^{1/2}$ by definition. Since we consider z on the contour $\Omega_1$, it follows that $|z| \ge c|d_1|$ for some positive constant c. Thus, by (65), Condition 1, and Lemma 6 in Section B.7 of the Supplementary Material, with the above choice of L in (66) we have with probability tending to one that

$$\Big\|\sum_{l=L+1}^\infty z^{-(l+1)}W^l\Big\| \le \sum_{l=L+1}^\infty\frac{C^l\alpha_n^l(\log n)^{l/2}}{|z|^{l+1}} = \frac{O\big\{C^{L+1}\alpha_n^{L+1}(\log n)^{(L+1)/2}\big\}}{|z|^{L+2}} = \frac{O(1)}{|z|^4}, \qquad (67)$$

where C is some positive constant. In light of (65) and (67), we can obtain the asymptotic expansion

$$v_1^T(W-zI)^{-1}v_1 = -\sum_{l=0}^Lz^{-(l+1)}v_1^TW^lv_1 - \sum_{l=L+1}^\infty z^{-(l+1)}v_1^TW^lv_1 = -\sum_{l=0}^Lz^{-(l+1)}v_1^TW^lv_1 + \frac{O_p(1)}{d_1^4} \qquad (68)$$

for z on the contour Ω1.

Directly working with the line integral in (62) or (64) is challenging for deriving the CLT for the bilinear form $v_1^T\hat{v}_1\hat{v}_1^Tv_1$. Next we introduce some simple facts about Cauchy's residue theorem. Assume that a complex function f(z) is holomorphic inside $\Omega_1$ except at one point t. Then it holds that

$$\frac{1}{2\pi i}\oint_{\Omega_1}f(z)\,dz = \operatorname{Res}(f,t),$$

where Res(f, t) represents the residue of function f at point t. In addition, assume that the Laurent series expansion of f around point t is given by

$$f(z) = \sum_{j=-\infty}^\infty a_j(z-t)^j$$

with $a_j$ some constants. Then we have $\operatorname{Res}(f,t) = (2\pi i)^{-1}\oint_{\Omega_1}f(z)\,dz = a_{-1}$. Furthermore, if $\lim_{z\to t}(z-t)f(z)$ exists, then the Laurent series expansion of f entails that

$$\lim_{z\to t}(z-t)f(z) = a_{-1}. \qquad (69)$$

Now let us consider the line integral in (64). Observe that the only singular point of the function $v_1^T(W-zI)^{-1}v_1/[1+d_1v_1^T(W-zI)^{-1}v_1]$ inside $\Omega_1$ is the solution to the equation

$$1+d_1v_1^T(W-zI)^{-1}v_1 = 0,$$

which we denote as $\hat{t}_1$. Let us use $[(W-\hat{t}_1I)^{-1}]'$ as a shorthand notation for $h'(\hat{t}_1)$ with $h(t) = (W-tI)^{-1}$. Then by Cauchy's residue theorem and in view of (64), we have

$$v_1^T\hat{v}_1\hat{v}_1^Tv_1 = -\frac{1}{2\pi i}\oint_{\Omega_1}\frac{v_1^T(W-zI)^{-1}v_1}{1+d_1v_1^T(W-zI)^{-1}v_1}\,dz = -\lim_{z\to\hat{t}_1}(z-\hat{t}_1)\frac{v_1^T(W-zI)^{-1}v_1}{1+d_1v_1^T(W-zI)^{-1}v_1} = -\frac{v_1^T(W-\hat{t}_1I)^{-1}v_1}{d_1v_1^T[(W-\hat{t}_1I)^{-1}]'v_1}.$$

Therefore, an application of the Taylor expansion to the function $v_1^T(W-\hat{t}_1I)^{-1}v_1/\big\{d_1v_1^T[(W-\hat{t}_1I)^{-1}]'v_1\big\}$ yields

$$\frac{v_1^T(W-\hat{t}_1I)^{-1}v_1}{d_1v_1^T[(W-\hat{t}_1I)^{-1}]'v_1} = \frac{-\sum_{l=0}^L\hat{t}_1^{-(l+1)}v_1^TW^lv_1 + d_1^{-4}O_p(1)}{d_1\big[\sum_{l=0}^L(l+1)\hat{t}_1^{-(l+2)}v_1^TW^lv_1 + d_1^{-4}O_p(1)\big]}. \qquad (70)$$

Note that $\hat{t}_1$ is a random variable that depends on the random matrix X. In fact, from (99) we can see that the asymptotic expansion of $\hat{t}_1$ is a polynomial of $v_1^TW^lv_1$. Thus the asymptotic expansion of (70) is also a polynomial function of $v_1^TW^lv_1$. Therefore, controlling the variances of $v_1^TW^lv_1$ facilitates identifying the leading term of the asymptotic expansion. So far we have laid out the major steps in deriving the asymptotic expansion for $v_1^T\hat{v}_1\hat{v}_1^Tv_1$. This sheds light on the detailed proof for the general case of $x^T\hat{v}_k\hat{v}_k^Ty$ with K ≥ 1.

We now move on to the general case of K ≥ 1 and arbitrary n-dimensional unit vectors x and y. The technical arguments for the general case are similar to those for the simple case of K = 1 and $x = y = v_1$ presented above, but with more delicate technical derivations. Similar to (62), it follows from Cauchy's residue theorem, the definitions of the eigenvalue and eigenvector, and (1) that the bilinear form $x^T\hat{v}_k\hat{v}_k^Ty$ for each 1 ≤ k ≤ K admits a natural integral representation; that is, with significant probability,

$$\begin{aligned}
x^T\hat{v}_k\hat{v}_k^Ty &= -\frac{1}{2\pi i}\oint_{\Omega_k}x^T\tilde{G}(z)y\,dz = -\frac{1}{2\pi i}\oint_{\Omega_k}x^T\Big(W-zI+\sum_{j=1}^Kd_jv_jv_j^T\Big)^{-1}y\,dz \\
&= \frac{1}{2\pi i}\oint_{\Omega_k}\frac{d_k\,x^T\big(W-zI+\sum_{1\le j\ne k\le K}d_jv_jv_j^T\big)^{-1}v_k\,v_k^T\big(W-zI+\sum_{1\le j\ne k\le K}d_jv_jv_j^T\big)^{-1}y}{1+d_kv_k^T\big(W-zI+\sum_{1\le j\ne k\le K}d_jv_jv_j^T\big)^{-1}v_k}\,dz,
\end{aligned}\qquad(71)$$

where the Green function G˜(z) associated with the original random matrix X is defined in (62) and the line integral is taken over a contour Ωk that is centered at (ak + bk)/2 with radius |bkak|/2. Then the contour Ωk encloses the population eigenvalue dk of the latent mean matrix H. Note that in the representation above, we have used the results, which can be derived from Condition 2, Lemma 6, and Weyl’s inequality, that for each j = 1, · · ·, K,

$$|\lambda_k - d_k| \le \|W\| < \min\{|d_k-a_k|,\, |d_k-b_k|\},$$
$$|\lambda_j - d_k| \ge |d_j - d_k| - |\lambda_j - d_j| \ge |d_j-d_k| - \|W\| > \max\{|d_k-a_k|,\, |d_k-b_k|\}$$

for j ≠ k with significant probability; that is, the contour $\Omega_k$ encloses $\lambda_k$ but no other eigenvalues with high probability.

An application of (60) leads to

$$\Big(W-zI+\sum_{1\le j\ne k\le K}d_jv_jv_j^T\Big)^{-1} = G(z) - G(z)V_{-k}\big[D_{-k}^{-1}+V_{-k}^TG(z)V_{-k}\big]^{-1}V_{-k}^TG(z), \qquad (72)$$

where the Green function G(z) associated with only the noise part W is defined in (37). To simplify the expression, let

$$F_k(z) = G(z)V_{-k}\big[D_{-k}^{-1}+V_{-k}^TG(z)V_{-k}\big]^{-1}V_{-k}^TG(z). \qquad (73)$$

Then in view of (72), the last line integral in (71) can be further represented as

$$x^T\hat{v}_k\hat{v}_k^Ty = \frac{1}{2\pi i}\oint_{\Omega_k}\frac{d_k\,x^T[G(z)-F_k(z)]v_k\,v_k^T[G(z)-F_k(z)]y}{1+d_kv_k^T[G(z)-F_k(z)]v_k}\,dz. \qquad (74)$$

It is challenging to analyze the terms in (74) since the expression of Fk(z) is complicated and we need to study the asymptotic expansion of Fk(z) carefully. In the proof below, we will see that Lemma 4 in Section 6 is a key ingredient of the technical arguments; see Section B.5 of Supplementary Material for the proof of this lemma.

We will conduct detailed calculations for the asymptotic expansion of $F_k(z)$. Let us choose L as the same positive integer as in (66). Then we have $\sum_{l=L+1}^\infty z^{-(l+1)}x^TW^ly = O_p(|z|^{-4})$ for z on the contour $\Omega_k$. It follows from Lemma 4 and Condition 2 that

$$\sum_{l=2}^Lz^{-(l+1)}x^T(W^l-\mathbb{E}W^l)y = O_p\big\{\alpha_n|z|^{-3}+\alpha_n^2|z|^{-4}+\cdots+\alpha_n^{L-1}|z|^{-(L+1)}\big\} = O_p(\alpha_n|z|^{-3}).$$

Therefore, similar to (68) we can show that

$$x^TG(z)y = -z^{-1}x^Ty - z^{-2}x^TWy - \sum_{l=2}^Lz^{-(l+1)}x^T\mathbb{E}W^ly - \sum_{l=L+1}^\infty z^{-(l+1)}x^TW^ly - \sum_{l=2}^Lz^{-(l+1)}x^T(W^l-\mathbb{E}W^l)y = -z^{-1}x^Ty - z^{-2}x^TWy - \sum_{l=2}^Lz^{-(l+1)}x^T\mathbb{E}W^ly + O_p\big(|z|^{-4}+\alpha_n|z|^{-3}\big). \qquad (75)$$

Moreover, since for z ∈ $\Omega_k$ we have $|z|^{-4} \ll \alpha_n|z|^{-3}$ by Condition 1, we can further obtain

$$x^TG(z)y = -z^{-1}x^Ty - z^{-2}x^TWy - \sum_{l=2}^Lz^{-(l+1)}x^T\mathbb{E}W^ly + O_p(\alpha_n|z|^{-3}). \qquad (76)$$

In fact, the probabilistic event associated with the small order term $O_p(\alpha_n|z|^{-3})$ in (76) holds uniformly over z since the term $O_p(\alpha_n|z|^{-3})$ is simply $|z|^{-3}O_p(\alpha_n)$.

To simplify the technical presentation, hereafter we use the generic notation u to denote either x or y unless specified otherwise, which means that the corresponding derivations and results hold when u is replaced by x and y. Since x and y can be chosen as any unit vectors, we can obtain from (76) the following asymptotic expansions by different choices of x and y

$$u^TG(z)v_k = -z^{-1}u^Tv_k - z^{-2}u^TWv_k - \sum_{l=2}^Lz^{-(l+1)}u^T\mathbb{E}W^lv_k + O_p(\alpha_n|z|^{-3}), \qquad (77)$$
$$v_k^TG(z)v_k = -z^{-1} - z^{-2}v_k^TWv_k - \sum_{l=2}^Lz^{-(l+1)}v_k^T\mathbb{E}W^lv_k + O_p(\alpha_n|z|^{-3}), \qquad (78)$$
$$v_k^TG(z)V_{-k} = -z^{-2}v_k^TWV_{-k} - \sum_{l=2}^Lz^{-(l+1)}v_k^T\mathbb{E}W^lV_{-k} + O_p(\alpha_n|z|^{-3}), \qquad (79)$$
$$u^TG(z)V_{-k} = -z^{-1}u^TV_{-k} - z^{-2}u^TWV_{-k} - \sum_{l=2}^Lz^{-(l+1)}u^T\mathbb{E}W^lV_{-k} + O_p(\alpha_n|z|^{-3}), \qquad (80)$$
$$V_{-k}^TG(z)V_{-k} = -z^{-1}I - z^{-2}V_{-k}^TWV_{-k} - \sum_{l=2}^Lz^{-(l+1)}V_{-k}^T\mathbb{E}W^lV_{-k} + O_p(\alpha_n|z|^{-3}). \qquad (81)$$

Thus it follows from (76)(81) that

$$\begin{aligned}
u^TF_k(z)v_k ={}& R(u,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R(V_{-k},v_k,z) - z^{-2}R(u,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}V_{-k}^TWv_k \\
&- z^{-2}u^TWV_{-k}\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R(V_{-k},v_k,z) \\
&+ z^{-2}R(u,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}V_{-k}^TWV_{-k}\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R(V_{-k},v_k,z) + \cdots \\
={}& R(u,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R(V_{-k},v_k,z) - z^{-2}R(u,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}V_{-k}^TWv_k + O_p(\alpha_n|z|^{-3})
\end{aligned}\qquad(82)$$

and

$$\begin{aligned}
v_k^TF_k(z)v_k &= v_k^TG(z)V_{-k}\big[D_{-k}^{-1}+V_{-k}^TG(z)V_{-k}\big]^{-1}V_{-k}^TG(z)v_k \\
&= R(v_k,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R(V_{-k},v_k,z) - z^{-2}R(v_k,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}V_{-k}^TWv_k + O_p(\alpha_n|z|^{-3}) \\
&= R(v_k,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R(V_{-k},v_k,z) + O_p(\alpha_n|z|^{-3}),
\end{aligned}\qquad(83)$$

where Fk(z) is defined in (73) and R is defined in (6).

With all the technical preparations above, we are now ready to analyze the terms in the representation (74). Specifically, let us consider the ratio $\{d_kx^T[G(z)-F_k(z)]v_k\,v_k^T[G(z)-F_k(z)]y\}/\{1+d_kv_k^T[G(z)-F_k(z)]v_k\}$ that appears as the integrand in (74). Similar to (75), taking the derivative of G(z) we have

$$x^TG'(z)y = x^T(W-zI)^{-2}y = \sum_{l=0}^\infty(l+1)z^{-(l+2)}x^TW^ly = R'(x,y,z) + 2z^{-3}x^TWy + z^{-4}O_p(\alpha_n). \qquad (84)$$

It follows from Lemmas 4–5 that

$$R'(v_k,V_{-k},z) = O(\alpha_n^2/z^4),\qquad R'(v_k,v_k,z) - \frac{1}{z^2} = O(\alpha_n^2/z^4),\qquad R'(V_{-k},V_{-k},z) - z^{-2}I = O(\alpha_n^2/z^4). \qquad (85)$$

By (79) and Lemmas 4–5, we can conclude that

$$v_k^TG(z)V_{-k} = z^{-2}O_p(1) + |z|^{-3}O_p(\alpha_n^2). \qquad (86)$$

Moreover, by (80) and (A.16) we have

{[Dk1+VkTG(z)Vk]1[Dk1+R(Vk,Vk,z)]1}=[Dk1+VkTG(z)Vk]1VkTG(z)Vk[Dk1+VkTG(z)Vk]1[Dk1+R(Vk,Vk,z)]1R(Vk,Vk,z)[Dk1+R(Vk,Vk,z)]1=O{VkTG(z)VkR(Vk,Vk,z)[Dk1+VkTG(z)Vk]12}+O{[Dk1+VkTG(z)Vk]1[Dk1+R(Vk,Vk,z)]1[Dk1+VkTG(z)Vk]1R(Vk,Vk,z)}=|z|1Op(1)+z2Op(αn) (87)

and

$$\big\{\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}\big\}' = -\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R'(V_{-k},V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1} = O(1). \qquad (88)$$

Note that in light of (84)(87), we can obtain

$$\begin{aligned}
v_k^TF_k'(z)v_k &= 2v_k^TG'(z)V_{-k}\big[D_{-k}^{-1}+V_{-k}^TG(z)V_{-k}\big]^{-1}V_{-k}^TG(z)v_k + v_k^TG(z)V_{-k}\big\{\big[D_{-k}^{-1}+V_{-k}^TG(z)V_{-k}\big]^{-1}\big\}'V_{-k}^TG(z)v_k \\
&= 2R'(v_k,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R(V_{-k},v_k,z) + R(v_k,V_{-k},z)\big\{\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}\big\}'R(V_{-k},v_k,z) \\
&\quad+ z^{-4}O_p(1) + z^{-6}O_p(\alpha_n^3).
\end{aligned}\qquad(89)$$

Combining the above result with (84) leads to

$$d_kv_k^T[G(z)-F_k(z)]'v_k = d_kz^{-2}\tilde{P}_{k,z} + 2z^{-3}d_kv_k^TWv_k + z^{-4}O_p(|d_k|\alpha_n) \qquad (90)$$

for z ∈ [ak, bk]. Further, recalling the definition in (7) and by (88), it holds that

$$\frac{1}{z^2}\tilde{P}_{k,z} = \big(A_{v_k,k,z}z^{-1}\big)' = R'(v_k,v_k,z) - 2R'(v_k,V_{-k},z)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}R(V_{-k},v_k,z) - R(v_k,V_{-k},z)\big\{\big[D_{-k}^{-1}+R(V_{-k},V_{-k},z)\big]^{-1}\big\}'R(V_{-k},v_k,z) = z^{-2} + O(\alpha_n^2/z^4). \qquad (91)$$

Plugging this into (90) and by Lemmas 45, we have for all z ∈ [ak, bk],

$$d_kv_k^T[G(z)-F_k(z)]'v_k = d_kz^{-2} + 2z^{-3}d_kv_k^TWv_k + z^{-4}O_p(|d_k|\alpha_n^2) = d_kz^{-2}\big[1+O_p(|z|^{-1}+|z|^{-2}\alpha_n^2)\big] = d_kz^{-2}[1+o_p(1)]. \qquad (92)$$

Thus $1+d_kv_k^T[G(z)-F_k(z)]v_k$ is a monotone function of z over [ak, bk] with probability tending to one.

Further, in light of expressions (78) and (83) we can obtain the asymptotic expansion

$$1+d_kv_k^T[G(z)-F_k(z)]v_k = f_k(z) - d_kz^{-2}v_k^TWv_k + z^{-2}O_p(\alpha_n) \qquad (93)$$

for all z ∈ [ak, bk], where $f_k(z)$ is defined in (10). Note that $f_k(a_k) = O(1)$, $f_k(b_k) = O(1)$, and $f_k(a_k)f_k(b_k) < 0$ as shown in the proof of Lemma 3 in Section B.4 of the Supplementary Material. These results together with (92), which gives the order of the derivative of $1+d_kv_k^T[G(z)-F_k(z)]v_k$, entail that there exists a unique solution $\hat{t}_k$ to the equation

$$1+d_kv_k^T[G(z)-F_k(z)]v_k = 0 \qquad (94)$$

for z in the interval [ak, bk]. Using Lemma 4, we can further show that (93) becomes

$$1+d_kv_k^T[G(z)-F_k(z)]v_k - f_k(z) = -d_kz^{-2}v_k^TWv_k + O_p(|z|^{-2}\alpha_n) = O_p(|z|^{-1}) \qquad (95)$$

for z ∈ [ak, bk]. Note that fk(z) is a monotone function over z ∈ [ak, bk] as shown in the proof of Lemma 3 and (A.17). Thus it follows from (94) and (95) that

$$\hat{t}_k - t_k = O_p(1). \qquad (96)$$

In fact, we can obtain a more precise order of t^ktk than the initial one in (96). In view of (93) and the definition of tk, we have

$$1+d_kv_k^T[G(t_k)-F_k(t_k)]v_k = -d_kt_k^{-2}v_k^TWv_k + O_p(\alpha_nt_k^{-2}). \qquad (97)$$

By (92) and (97), an application of the mean value theorem yields

$$0 = 1+d_kv_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]v_k = 1+d_kv_k^T[G(t_k)-F_k(t_k)]v_k + d_k\tilde{t}_k^{-2}\big[1+O_p(|d_k|^{-1}+|d_k|^{-2}\alpha_n^2)\big](\hat{t}_k - t_k), \qquad (98)$$

where $\tilde{t}_k$ is some number between $t_k$ and $\hat{t}_k$. The asymptotic expansions in (98) and (97) entail further that

$$\hat{t}_k - t_k = t_k^2\cdot t_k^{-2}v_k^TWv_k + O_p(\alpha_nt_k^{-1}) = v_k^TWv_k + O_p(\alpha_nt_k^{-1}). \qquad (99)$$
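The fixed-point characterization behind (94) and (99) is easy to probe numerically in the rank-one case: the root of $1 + d_1v_1^T(W-zI)^{-1}v_1 = 0$ near $d_1$ coincides with the spiked eigenvalue, and its deviation from $d_1$ tracks $v_1^TWv_1$. The sketch below is ours, under assumed Gaussian noise and an assumed spike size; the root bracket is also an illustrative choice.

```python
# Illustrative check that the root hat{t}_1 of the secular equation equals
# lambda_1 (as in (41)) and that hat{t}_1 - d_1 ~ v_1^T W v_1 (as in (99)).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(8)
n = 300
d1 = 20.0 * n
v1 = np.ones(n) / np.sqrt(n)
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2)
X = d1 * np.outer(v1, v1) + W

def g(z):
    return 1.0 + d1 * (v1 @ np.linalg.solve(W - z * np.eye(n), v1))

t_hat = brentq(g, d1 - 100.0, d1 + 100.0)   # root bracketed around d_1
lam1 = np.linalg.eigvalsh(X)[-1]
print(t_hat - lam1, t_hat - d1, v1 @ W @ v1)
```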

Now by similar arguments to those for obtaining (69), the integral in (74) can be evaluated as

$$x^T\hat{v}_k\hat{v}_k^Ty = \frac{1}{2\pi i}\oint_{\Omega_k}\frac{d_k\,x^T[G(z)-F_k(z)]v_k\,v_k^T[G(z)-F_k(z)]y}{1+d_kv_k^T[G(z)-F_k(z)]v_k}\,dz = \frac{\hat{t}_k^2\,x^T[G(\hat{t}_k)-F_k(\hat{t}_k)]v_k\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]y}{\hat{t}_k^2\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]'v_k}. \qquad (100)$$

By (90) we have

$$\frac{1}{\hat{t}_k^2\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]'v_k} = \tilde{P}_{k,\hat{t}_k}^{-1} - 2\hat{t}_k^{-1}\tilde{P}_{k,\hat{t}_k}^{-2}v_k^TWv_k + \hat{t}_k^{-2}O_p(\alpha_n) \qquad (101)$$

and (100) can be written as

$$x^T\hat{v}_k\hat{v}_k^Ty = \frac{\hat{t}_k^2\,x^T[G(\hat{t}_k)-F_k(\hat{t}_k)]v_k\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]y}{\hat{t}_k^2\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]'v_k} = \big[\tilde{P}_{k,\hat{t}_k}^{-1} - 2\hat{t}_k^{-1}\tilde{P}_{k,\hat{t}_k}^{-2}v_k^TWv_k + \hat{t}_k^{-2}O_p(\alpha_n)\big]\,\hat{t}_k^2\,x^T[G(\hat{t}_k)-F_k(\hat{t}_k)]v_k\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]y. \qquad (102)$$

Recall the definitions in (6) and (7). Then it follows from (77), (82), and (99) that

$$\begin{aligned}
\hat{t}_ku^T[G(\hat{t}_k)-F_k(\hat{t}_k)]v_k ={}& -P(u,v_k,\hat{t}_k) - \hat{t}_k^{-1}P(u,V_{-k},\hat{t}_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},\hat{t}_k)\big]^{-1}P(V_{-k},v_k,\hat{t}_k) \\
&- \hat{t}_k^{-1}u^TWv_k + \hat{t}_k^{-1}R(u,V_{-k},\hat{t}_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},\hat{t}_k)\big]^{-1}V_{-k}^TWv_k + O_p(\alpha_n\hat{t}_k^{-2}) \\
={}& -P(u,v_k,t_k) - t_k^{-1}P(u,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}P(V_{-k},v_k,t_k) \\
&- t_k^{-1}u^TWv_k + t_k^{-1}R(u,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k + O_p(\alpha_nt_k^{-2}) \\
={}& A_{u,k,t_k} - t_k^{-1}b_{u,k,t_k}^TWv_k + O_p(\alpha_nt_k^{-2}),
\end{aligned}\qquad(103)$$

where u stands for both x and y as mentioned before. Furthermore, by Lemma 5 and (99) we can conclude that

$$\tilde{P}_{k,\hat{t}_k} = \tilde{P}_{k,t_k} + O_p(\alpha_n^2t_k^{-3}). \qquad (104)$$

Combining the representation (102) and asymptotic expansions (103)(104), by Lemma 4 we can deduce that (100) can be further written as

xTv^kv^kTy=t^k2xT[G(t^k)Fk(t^k)]vkvkT[G(t^k)Fk(t^k)]yt^k2vkT[G(t^k)Fk(t^k)]vk=[P˜k,tk2tk1P˜k,tk2vkTWvk+Op(αntk2)][Ax,k,tktk1bx,k,tkTWvk+Op(αntk2)]×[Ay,k,tktk1by,k,tkTWvk+Op(αntk2)]=[P˜k,tk2tk1P˜k,tk2vkTWvk+Op(αntk2)]×[Ax,k,tkAy,k,tktk1(Ax,k,tkbx,k,tkT+Ay,k,tkby,k,tkT)Wvk+tk2bx,k,tkTWvkby,k,tkTWvk+Op(αncktk2)], (105)

where $c_k = |A_{x,k,t_k}| + |A_{y,k,t_k}| + |t_k|^{-1}$.

We can expand (105), or equivalently (100), further as

$$\begin{aligned}
x^T\hat{v}_k\hat{v}_k^Ty ={}& \big[\tilde{P}_{k,t_k}^{-1} - 2t_k^{-1}\tilde{P}_{k,t_k}^{-2}v_k^TWv_k + O_p(\alpha_nt_k^{-2})\big]\big[A_{x,k,t_k}A_{y,k,t_k} - t_k^{-1}\big(A_{x,k,t_k}b_{y,k,t_k}^T + A_{y,k,t_k}b_{x,k,t_k}^T\big)Wv_k + t_k^{-2}b_{x,k,t_k}^TWv_k\,b_{y,k,t_k}^TWv_k + O_p(\alpha_nc_kt_k^{-2})\big] \\
={}& \tilde{P}_{k,t_k}^{-1}A_{x,k,t_k}A_{y,k,t_k} - t_k^{-1}A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}\big(b_{y,k,t_k}^T + A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k - t_k^{-1}A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}\big(b_{x,k,t_k}^T + A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k \\
&+ t_k^{-2}\tilde{P}_{k,t_k}^{-1}\big[2\tilde{P}_{k,t_k}^{-1}\big(A_{x,k,t_k}b_{y,k,t_k}^T + A_{y,k,t_k}b_{x,k,t_k}^T\big)Wv_kv_k^T + b_{x,k,t_k}^TWv_k\,b_{y,k,t_k}^T\big]Wv_k \\
&- 2t_k^{-3}\tilde{P}_{k,t_k}^{-2}b_{x,k,t_k}^TWv_k\,b_{y,k,t_k}^TWv_k\,v_k^TWv_k + O_p\big\{\alpha_nc_kt_k^{-2}\big\}.
\end{aligned}\qquad(106)$$

Therefore, we have characterized the terms involving $t_k^{-1}$ for the desired first order asymptotic expansion. That is, by (106) we have

$$x^T\hat{v}_k\hat{v}_k^Ty = \tilde{P}_{k,t_k}^{-1}A_{x,k,t_k}A_{y,k,t_k} - t_k^{-1}A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}\big(b_{y,k,t_k}^T + A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k - t_k^{-1}A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}\big(b_{x,k,t_k}^T + A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k + O_p\big\{(\alpha_nc_k+1)t_k^{-2}\big\}. \qquad (107)$$

Thus if $\sigma_k^2 = t_k^{-2}\tilde{P}_{k,t_k}^{-2}\,\mathbb{E}\big[\big(A_{x,k,t_k}b_{y,k,t_k}^T + A_{y,k,t_k}b_{x,k,t_k}^T + 2A_{x,k,t_k}A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k\big]^2 \gg (\alpha_nc_k+1)^2t_k^{-4} \sim \alpha_n^2\big(|A_{x,k,t_k}|+|A_{y,k,t_k}|\big)^2t_k^{-4} + t_k^{-4}$ and $\big(A_{x,k,t_k}b_{y,k,t_k} + A_{y,k,t_k}b_{x,k,t_k} + 2A_{x,k,t_k}A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k,\, v_k\big)$ is W1-CLT, then (33) holds, where ∼ means the same asymptotic order. This concludes the proof of Theorem 4.

A.6. Proof of Theorem 5

We have characterized the first order asymptotic expansion for the bilinear form $x^T\hat{v}_k\hat{v}_k^Ty$ in the proof of Theorem 4 in Section A.5, where x and y are two arbitrary n-dimensional unit vectors. We now proceed with investigating the higher order (that is, second order) asymptotic expansion for the same bilinear form. More specifically, the proof of Theorem 5 involves further expanding the $O_p\{\alpha_nc_kt_k^{-2}\}$ term given in (106).

To gain some intuition, let us recall (75) and compare it with (77)–(81). By Lemma 4, we see that the order $O_p(\alpha_n|z|^{-3})$ comes from terms of the form $x^T(W^2-\mathbb{E}W^2)y/z^3$. Therefore, to obtain a higher order expansion we need to identify all terms of this form. It follows from (75) and Lemmas 4 and 5 that

$$x^TG(z)y = -z^{-1}x^Ty - z^{-2}x^TWy - z^{-3}x^T(W^2-\mathbb{E}W^2)y - \sum_{l=2}^Lz^{-(l+1)}x^T\mathbb{E}W^ly + O_p\big(|z|^{-4}+\alpha_n^2|z|^{-4}\big). \qquad (108)$$

Moreover, using similar arguments as for proving (101) and (103) but expanding to higher orders we can obtain

$$\big\{\hat{t}_k^2\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]'v_k\big\}^{-1} = \tilde{P}_{k,t_k}^{-1}\big\{1 - 2t_k^{-1}\tilde{P}_{k,t_k}^{-1}v_k^TWv_k - t_k^{-2}\tilde{P}_{k,t_k}^{-1}\big[3v_k^T(W^2-\mathbb{E}W^2)v_k - 2(v_k^TWv_k)^2\big]\big\} + O_p(\alpha_n^2|t_k|^{-3}) \qquad (109)$$

and

$$\begin{aligned}
\hat{t}_ku^T[G(\hat{t}_k)-F_k(\hat{t}_k)]v_k ={}& A_{u,k,t_k} - t_k^{-1}u^TWv_k + t_k^{-1}R(u,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k \\
&+ t_k^{-2}u^TWv_k\,v_k^TWv_k - t_k^{-2}v_k^TWv_k\,R(u,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k \\
&+ t_k^{-2}R(u,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T(W^2-\mathbb{E}W^2)v_k - t_k^{-2}u^T(W^2-\mathbb{E}W^2)v_k \\
&+ 2t_k^{-3}v_k^TWv_k\,R(u,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-2}V_{-k}^TWv_k + O_p(\alpha_n^2|t_k|^{-3}),
\end{aligned}\qquad(110)$$

where u represents both x and y as mentioned before.

Using the representations (100) and (102), and by the asymptotic expansions (109)–(110), we can obtain the $O_p(t_k^{-2})$ terms for the desired second order asymptotic expansion as follows:

$$\begin{aligned}
x^T\hat{v}_k\hat{v}_k^Ty ={}& \frac{\hat{t}_k^2\,x^T[G(\hat{t}_k)-F_k(\hat{t}_k)]v_k\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]y}{\hat{t}_k^2\,v_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]'v_k} \\
={}& \Big(\tilde{P}_{k,t_k}^{-1}\big\{1 - 2t_k^{-1}\tilde{P}_{k,t_k}^{-1}v_k^TWv_k - t_k^{-2}\tilde{P}_{k,t_k}^{-1}\big[3v_k^T(W^2-\mathbb{E}W^2)v_k - 2(v_k^TWv_k)^2\big]\big\} + O_p(\alpha_n^2|t_k|^{-3})\Big)\big[\hat{t}_kx^T[G(\hat{t}_k)-F_k(\hat{t}_k)]v_k\big]\big[\hat{t}_kv_k^T[G(\hat{t}_k)-F_k(\hat{t}_k)]y\big] \\
={}& \tilde{P}_{k,t_k}^{-1}A_{x,k,t_k}A_{y,k,t_k} - A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-1}\big(b_{y,k,t_k}^T + A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k - A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-1}\big(b_{x,k,t_k}^T + A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k \\
&+ \tilde{P}_{k,t_k}^{-1}t_k^{-2}\big[2\tilde{P}_{k,t_k}^{-1}\big(A_{x,k,t_k}b_{y,k,t_k}^T + A_{y,k,t_k}b_{x,k,t_k}^T\big)Wv_kv_k^T + b_{x,k,t_k}^TWv_k\,b_{y,k,t_k}^T\big]Wv_k + 2A_{x,k,t_k}A_{y,k,t_k}t_k^{-2}(v_k^TWv_k)^2 \\
&+ A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}\big\{t_k^{-2}x^TWv_k\,v_k^TWv_k - t_k^{-2}v_k^TWv_k\,R(x,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k\big\} \\
&+ A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}\big\{t_k^{-2}y^TWv_k\,v_k^TWv_k - t_k^{-2}v_k^TWv_k\,R(y,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k\big\} \\
&+ A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-2}R(x,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T(W^2-\mathbb{E}W^2)v_k \\
&+ A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-2}R(y,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T(W^2-\mathbb{E}W^2)v_k \\
&- \tilde{P}_{k,t_k}^{-1}t_k^{-2}\big(A_{y,k,t_k}x^T + A_{x,k,t_k}y^T\big)(W^2-\mathbb{E}W^2)v_k - 3t_k^{-2}A_{x,k,t_k}A_{y,k,t_k}\tilde{P}_{k,t_k}^{-2}v_k^T(W^2-\mathbb{E}W^2)v_k + O_p\big\{(\alpha_n^2c_k+1)|t_k|^{-3}\big\}.
\end{aligned}\qquad(111)$$

In contrast to the small order term $O_p\{\alpha_nc_kt_k^{-2}\}$ in (106) from the first order asymptotic expansion, we now have the small order term $O_p\{(\alpha_n^2c_k+1)|t_k|^{-3}\}$ from the second order asymptotic expansion.

Let us now simplify the expressions in the asymptotic expansion (111). A combination of (106) and (111) shows that the asymptotic distribution is determined by

$$\begin{aligned}
&-A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-1}\big(b_{y,k,t_k}^T + A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k - A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-1}\big(b_{x,k,t_k}^T + A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big)Wv_k \\
&+ \tilde{P}_{k,t_k}^{-1}t_k^{-2}\big[2\tilde{P}_{k,t_k}^{-1}\big(A_{x,k,t_k}b_{y,k,t_k}^T + A_{y,k,t_k}b_{x,k,t_k}^T\big)Wv_kv_k^T + b_{x,k,t_k}^TWv_k\,b_{y,k,t_k}^T\big]Wv_k + 2A_{x,k,t_k}A_{y,k,t_k}t_k^{-2}(v_k^TWv_k)^2 \\
&+ A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}\big\{t_k^{-2}x^TWv_k\,v_k^TWv_k - t_k^{-2}v_k^TWv_k\,R(x,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k\big\} \\
&+ A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}\big\{t_k^{-2}y^TWv_k\,v_k^TWv_k - t_k^{-2}v_k^TWv_k\,R(y,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^TWv_k\big\} \\
&+ A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-2}R(x,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T(W^2-\mathbb{E}W^2)v_k \\
&+ A_{x,k,t_k}\tilde{P}_{k,t_k}^{-1}t_k^{-2}R(y,V_{-k},t_k)\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T(W^2-\mathbb{E}W^2)v_k \\
&- \tilde{P}_{k,t_k}^{-1}t_k^{-2}\big(A_{y,k,t_k}x^T + A_{x,k,t_k}y^T\big)(W^2-\mathbb{E}W^2)v_k - 3t_k^{-2}A_{x,k,t_k}A_{y,k,t_k}\tilde{P}_{k,t_k}^{-2}v_k^T(W^2-\mathbb{E}W^2)v_k.
\end{aligned}\qquad(112)$$

To further simplify the notation, we define three terms

$$J_{x,y,k,t_k} = -\tilde{P}_{k,t_k}^{-1}t_k^{-1}v_k\big(A_{y,k,t_k}b_{x,k,t_k}^T + A_{x,k,t_k}b_{y,k,t_k}^T + 2A_{x,k,t_k}A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big), \qquad (113)$$
$$L_{x,y,k,t_k} = \tilde{P}_{k,t_k}^{-1}t_k^{-2}v_k\big\{-\big[A_{y,k,t_k}R(x,V_{-k},t_k) + A_{x,k,t_k}R(y,V_{-k},t_k)\big]\big[D_{-k}^{-1}+R(V_{-k},V_{-k},t_k)\big]^{-1}V_{-k}^T + A_{y,k,t_k}x^T + A_{x,k,t_k}y^T + 3A_{x,k,t_k}A_{y,k,t_k}\tilde{P}_{k,t_k}^{-1}v_k^T\big\}, \qquad (114)$$
$$Q_{x,y,k,t_k} = L_{x,y,k,t_k} - \tilde{P}_{k,t_k}^{-1}t_k^{-2}A_{x,k,t_k}A_{y,k,t_k}v_kv_k^T + 2\tilde{P}_{k,t_k}^{-2}t_k^{-2}v_k\big(A_{x,k,t_k}b_{y,k,t_k}^T + A_{y,k,t_k}b_{x,k,t_k}^T\big). \qquad (115)$$

Note that all three matrices defined in (113)–(115) are of rank one, and the identity $x^TAy = \operatorname{tr}(Ayx^T)$ holds for any matrix A and vectors x and y. Thus in view of (113)–(115), the lengthy expression given in (112) can be rewritten in the compact form

$$\operatorname{tr}\big[WJ_{x,y,k,t_k} - (W^2-\mathbb{E}W^2)L_{x,y,k,t_k}\big] + \operatorname{tr}(Wv_kv_k^T)\operatorname{tr}(WQ_{x,y,k,t_k}). \qquad (116)$$
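The trace identity invoked just above is elementary but worth making explicit; here is a one-line numeric check (ours, with an arbitrary rank-one matrix standing in for J, L, or Q).

```python
# Tiny illustrative check of x^T A y = tr(A y x^T) for a rank-one matrix A.
import numpy as np

rng = np.random.default_rng(7)
n = 10
x, y = rng.standard_normal(n), rng.standard_normal(n)
A = np.outer(rng.standard_normal(n), rng.standard_normal(n))  # rank one
print(np.isclose(x @ A @ y, np.trace(A @ np.outer(y, x))))    # True
```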

So far we have shown that the second order expansion of $x^T\hat{v}_k\hat{v}_k^Ty$ is given in (111). Note that $\tilde{\sigma}_k^2$ defined in (32) is essentially the variance of (116). Thus if $\tilde{\sigma}_k^2 \gg (\alpha_n^2c_k+1)^2t_k^{-6} \sim \alpha_n^4\big(|A_{x,k,t_k}|+|A_{y,k,t_k}|\big)^2t_k^{-6} + t_k^{-6}$, then (116) is the leading term of (111). Furthermore, the assumption $\sigma_k^2 = O(\tilde{\sigma}_k^2)$ entails that the first order expansion in Theorem 4 does not dominate the second order expansion. Therefore, we see that the asymptotic distribution in Theorem 5 is determined by the joint distribution of the three random variables specified in expression (116). This completes the proof of Theorem 5.

References

  1. Abbe E (2017). Community detection and stochastic block models: recent developments. Journal of Machine Learning Research 18(1), 6446–6531.
  2. Abbe E, Fan J, Wang K, and Zhong Y (2019). Entrywise eigenvector analysis of random matrices with low expected rank. The Annals of Statistics, to appear.
  3. Arias-Castro E, Verzelen N, et al. (2014). Community detection in dense random networks. The Annals of Statistics 42(3), 940–969.
  4. Arnold L (1967). On the asymptotic distribution of the eigenvalues of random matrices. J. Math. Anal. Appl. 20, 262–268.
  5. Arnold L (1971). On Wigner's semicircle law for the eigenvalues of random matrices. Probability Theory and Related Fields 19, 191–198.
  6. Bai ZD (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica 9, 611–677.
  7. Bai ZD and Silverstein JW (2006). Spectral Analysis of Large Dimensional Random Matrices. Springer.
  8. Bai ZD and Yao JF (2008). Central limit theorems for eigenvalues in a spiked population model. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 44, 447–474.
  9. Baik J, Arous GB, and Péché S (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability 33, 682–693.
  10. Bao Z, Ding X, and Wang K (2018). Singular vector and singular subspace distribution for the matrix denoising model. arXiv preprint arXiv:1809.10476.
  11. Barber RF and Candès EJ (2015). Controlling the false discovery rate via knockoffs. The Annals of Statistics 43, 2055–2085.
  12. Bickel PJ and Chen A (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences 106, 21068–21073.
  13. Billingsley P (1995). Probability and Measure. Wiley.
  14. Bourgade P and Yau H-T (2017). The eigenvector moment flow and local quantum unique ergodicity. Communications in Mathematical Physics 350, 231–278.
  15. Bourgade P, Yau H-T, and Yin J (2018). Random band matrices in the delocalized phase, I: Quantum unique ergodicity and universality. arXiv preprint arXiv:1807.01559.
  16. Candès EJ, Fan Y, Janson L, and Lv J (2018). Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B 80, 551–577.
  17. Capitaine M and Donati-Martin C (2018). Non universality of fluctuations of outlier eigenvectors for block diagonal deformations of Wigner matrices. arXiv preprint arXiv:1807.07773.
  18. Capitaine M, Donati-Martin C, and Féral D (2012). Central limit theorems for eigenvalues of deformations of Wigner matrices. Ann. Inst. H. Poincaré Probab. Statist. 48, 107–133.
  19. Chen K and Lei J (2018). Network cross-validation for determining the number of communities in network data. Journal of the American Statistical Association 113, 241–251.
  20. Decelle A, Krzakala F, Moore C, and Zdeborová L (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E 84, 066106.
  21. Dekel Y, Lee JR, and Linial N (2007). Eigenvectors of random graphs: Nodal domains. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 436–448. Springer.
  22. El Karoui N (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. Ann. Probab. 35, 663–714.
  23. Erdős L, Knowles A, Yau H-T, and Yin J (2013). Delocalization and diffusion profile for random band matrices. Communications in Mathematical Physics 323, 367–416.
  24. Erdős L, Yau H-T, and Yin J (2011). Rigidity of eigenvalues of generalized Wigner matrices. Advances in Mathematics 229, 1435–1515.
  25. Fan J, Fan Y, Han X, and Lv J (2019). SIMPLE: statistical inference on membership profiles in large networks. arXiv preprint arXiv:1910.01734.
  26. Fan J, Wang W, and Zhong Y (2018). An ℓ∞ eigenvector perturbation bound and its application to robust covariance estimation. Journal of Machine Learning Research 18, 1–42.
  27. Fan Y, Demirkaya E, Li G, and Lv J (2019). RANK: large-scale inference with graphical nonlinear knockoffs. Journal of the American Statistical Association, to appear.
  28. Fan Y, Lv J, Sharifvaghefi M, and Uematsu Y (2019). IPAD: stable interpretable forecasting with knockoffs inference. Journal of the American Statistical Association, to appear.
  29. Füredi Z and Komlós J (1981). The eigenvalues of random symmetric matrices. Combinatorica 1, 233–241.
  30. Han X, Yang Q, and Fan Y (2019). Universal rank inference via residual subsampling with application to large networks. arXiv preprint arXiv:1912.11583.
  31. Horn RA and Johnson CR (2012). Matrix Analysis (2nd edition). Cambridge University Press.
  32. Jin J, Ke ZT, and Luo S (2017). Estimating network memberships by simplex vertex hunting. https://arxiv.org/pdf/1708.07852.pdf.
  33. Johnstone IM (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29, 295–327.
  34. Johnstone IM (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy–Widom limits and rates of convergence. Ann. Statist. 36, 2638–2716.
  35. Johnstone IM and Lu AY (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association 104, 682–693.
  36. Knowles A and Yin J (2013). The isotropic semicircle law and deformation of Wigner matrices. Comm. Pure Appl. Math. 66, 1663–1749.
  37. Knowles A and Yin J (2014). The outliers of a deformed Wigner matrix. The Annals of Probability 42, 1980–2031.
  38. Knowles A and Yin J (2017). Anisotropic local laws for random matrices. Probability Theory and Related Fields 169, 257–352.
  39. Koltchinskii V and Lounici K (2016). Asymptotics and concentration bounds for bilinear forms of spectral projectors of sample covariance. Ann. Inst. H. Poincaré Probab. Statist. 52, 1976–2013.
  40. Koltchinskii V and Xia D (2016). Perturbation of linear forms of singular vectors under Gaussian noise. In: Houdré C, Mason D, Reynaud-Bouret P, Rosiński J (eds) High Dimensional Probability VII, 397–423.
  41. Lei J (2016). A goodness-of-fit test for stochastic block models. The Annals of Statistics 44, 401–424.
  42. Li T, Levina E, and Zhu J (2020). Network cross-validation by edge sampling. Biometrika 107(2), 257–276.
  43. Lu Y, Fan Y, Lv J, and Noble WS (2018). DeepPINK: reproducible feature selection in deep neural networks. Advances in Neural Information Processing Systems (NeurIPS 2018).
  44. Marchenko VA and Pastur LA (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik 1, 457–483.
  45. McSherry F (2001). Spectral partitioning of random graphs. Proceedings of the Forty-Second IEEE Symposium on Foundations of Computer Science, FOCS, 529–537.
  46. Mehta ML (2004). Random Matrices (3rd edition). Academic Press.
  47. Paul D (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17, 1617–1642.
  48. Pizzo A, Renfrew D, and Soshnikov A (2013). On finite rank deformations of Wigner matrices. Ann. Inst. Henri Poincaré Probab. Stat. 49, 64–94.
  49. Renfrew D and Soshnikov A (2013). On finite rank deformations of Wigner matrices II: Delocalized perturbations. Random Matrices: Theory Appl. 2, 1250015.
  50. Rohe K, Chatterjee S, and Yu B (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics 39, 1878–1915.
  51. Rudelson M and Vershynin R (2016). No-gaps delocalization for general random matrices. Geometric and Functional Analysis 26, 1716–1776.
  52. Spielman DA and Teng S-H (2007). Spectral partitioning works: Planar graphs and finite element meshes. Linear Algebra and Its Applications 421, 284–305.
  53. Tang M and Priebe CE (2018). Limit theorems for eigenvectors of the normalized Laplacian for random graphs. The Annals of Statistics 46, 2360–2415.
  54. Tao T (2004). Topics in Random Matrix Theory. American Mathematical Society.
  55. Tracy CA and Widom H (1994). Level-spacing distributions and the Airy kernel. Comm. Math. Phys. 159, 151–174.
  56. Tracy CA and Widom H (1996). On orthogonal and symplectic matrix ensembles. Comm. Math. Phys. 177, 727–754.
  57. Tropp J (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12, 389–434.
  58. Verzelen N, Arias-Castro E, et al. (2015). Community detection in sparse random networks. The Annals of Applied Probability 25(6), 3465–3510.
  59. Vu V (2018). A simple SVD algorithm for finding hidden partitions. Combinatorics, Probability and Computing 27, 124–140.
  60. Wang W and Fan J (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. The Annals of Statistics 45, 1342–1374.
  61. Wigner EP (1955). Characteristic vectors of bordered matrices with infinite dimensions. Ann. Math. 62, 548–564.
  62. Yau H-T (2012). Universality of generalized Wigner matrices. Quantum Theory from Small to Large Scales: Lecture Notes of the Les Houches Summer School 95, 675–692.
  63. Zhang Y, Levina E, and Zhu J (2015). Detecting overlapping communities in networks using spectral methods. https://arxiv.org/pdf/1412.3432.pdf.
