Author manuscript; available in PMC: 2021 Apr 7.
Published in final edited form as: Stat Sin. 2021 Apr;31(2):571–601. doi: 10.5705/ss.202019.0052

Asymptotics of eigenstructure of sample correlation matrices for high-dimensional spiked models

David Morales-Jimenez 1, Iain M Johnstone 2, Matthew R McKay 3, Jeha Yang 2
PMCID: PMC8026145  NIHMSID: NIHMS1602372  PMID: 33833489

Abstract

Sample correlation matrices are widely used, but for high-dimensional data little is known about their spectral properties beyond “null models”, which assume the data have independent coordinates. In the class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both leading eigenvalues and eigenvectors of sample correlation matrices, assuming a high-dimensional regime in which the ratio p/n, of number of variables p to sample size n, converges to a positive constant. While the first-order spectral properties of sample correlation matrices match those of sample covariance matrices, their asymptotic distributions can differ significantly. Indeed, the correlation-based fluctuations of both sample eigenvalues and eigenvectors are often remarkably smaller than those of their sample covariance counterparts.

Keywords: Sample correlation, eigenstructure, spiked models

1. Introduction

Estimating a correlation matrix is a fundamental statistical task. It is widely applied in areas such as viral sequence analysis and vaccine design in biology (Dahirel et al., 2011, Quadeer et al., 2014, 2018), large portfolio design in finance (Plerou et al., 2002), signal detection in radio astronomy (Leshem and van der Veen, 2001), and collaborative filtering (Liu et al., 2014, Ruan et al., 2016), among many others. In classical statistical settings, with a limited number of variables p and a large sample size n, the sample correlation matrix performs well and its statistical properties are well understood; see, for example, Girshick (1939), Konishi (1979), Fang and Krishnaiah (1982), Schott (1991), Kollo and Neudecker (1993), and Boik (2003). Modern applications, however, often exhibit high dimensionality, with large p and, in many cases, limited n. In such cases, sample correlation matrices become inaccurate owing to an aggregation of statistical noise across the matrix coordinates that is visible in the eigen-spectrum (El Karoui, 2009). This is particularly important in principal component analysis (PCA), which often involves projecting data onto the leading eigenvectors of the sample correlation matrix or, equivalently, onto those of the sample covariance matrix after standardizing the data.

Despite the extensive use of sample correlation matrices, relatively little is known about theoretical properties of their eigen-spectra in high dimensions. In contrast, sample covariance matrices have been studied extensively, and a rich body of literature now exists (e.g., Yao et al. (2015)). Their asymptotic properties have typically been described in high-dimensional settings in which the number of samples and variables both grow large, often though not always at the same rate, based on the theory of random matrices. Specific first- and second-order results for the eigenvalues and eigenvectors of sample covariance matrices are reviewed in Bai and Silverstein (2009), Couillet and Debbah (2011), and Yao et al. (2015).

For the spectra of high-dimensional sample correlation matrices, current theoretical results focus on the simplest “null model” scenario, in which the data are assumed to be independent. In this null model, correlation matrices share many of the same asymptotic properties as covariance matrices from independent and identically distributed (i.i.d.) data, with zero mean and unit variance. Thus, the empirical eigenvalue distribution converges to the Marchenko–Pastur distribution, almost surely (Jiang, 2004b), and the largest and smallest eigenvalues converge to the edges of this distribution (Jiang, 2004b, Xiao and Zhou, 2010). Moreover, the rescaled largest and smallest eigenvalues asymptotically follow the Tracy–Widom law (Bao et al., 2012, Pillai and Yin, 2012). Central limit theorems (CLTs) for linear spectral statistics have also been derived (Gao et al., 2017). A separate line of work studies the maximum absolute off-diagonal entry of sample correlation matrices, referred to as “coherence” (Jiang, 2004a, Cai and Jiang, 2011, 2012), which has been proposed as a statistic for conducting independence tests; see also Cochran et al. (1995), Mestre and Vallet (2017), and the references therein. Hero and Rajaratnam (2011, 2012) use a related statistic to identify variables exhibiting strong correlations, an approach referred to as “correlation screening.”

For non-trivial correlation models, however, asymptotic results for the spectra of sample correlation matrices are quite scarce. Notably, El Karoui (2009) shows that, for a fairly general class of covariance models with bounded spectral norm, to first order, the eigenvalues of sample correlation matrices asymptotically coincide with those of sample covariance matrices with unit-variance data, generalizing earlier results of Jiang (2004b) and Xiao and Zhou (2010). Under similar covariance assumptions, recent work also presents CLTs for linear spectral statistics of sample correlation matrices (Mestre and Vallet, 2017), extending the work of Gao et al. (2017). The first-order behavior again coincides with that of sample covariances; the asymptotic fluctuations, however, are quite different for sample correlation matrices.

This study considers a particular class of correlation matrix models, the so-called “spiked models,” in which a few large or small eigenvalues of the population covariance (or correlation) matrix are assumed to be well separated from the rest (Johnstone, 2001). Spiked covariance models are relevant in applications in which the primary covariance information lies in a relatively small number of eigenmodes. Such applications include collaborative signal detection in cognitive radio systems (Bianchi et al., 2009), fault detection in sensor networks (Couillet and Hachem, 2013), adaptive beamforming in array processing (Hachem et al., 2013, Vallet et al., 2015, Yang et al., 2018), and protein contact prediction in biology (Cocco et al., 2011, 2013). The spectral properties of spiked covariance models have been well studied, with precise analytical results established for the asymptotic first-order and distributional properties of both eigenvalues and eigenvectors; see, for example, Baik et al. (2005), Baik and Silverstein (2006), Paul (2007), Bai and Yao (2008), Benaych-Georges and Nadakuditi (2011), Couillet and Hachem (2013), Bloemendal et al. (2016). For reviews, see also Couillet and Debbah (2011, Chapter 9) and Yao et al. (2015, Chapter 11).

Less is known about the spectrum of sample correlation matrices under spiked models. Although the asymptotic first-order behavior is expected to coincide with that of the sample covariance, as a consequence of El Karoui (2009), a simple simulation reveals striking differences in the fluctuations of both sample eigenvalues and eigenvectors; see Figure 1.

Figure 1:


A simple simulation shows remarkable distributional differences between the sample covariance and the sample correlation. From $n = 200$ i.i.d. Gaussian samples $x_i \in \mathbb{R}^{100}$ with covariance $\Sigma = \mathrm{blkdiag}(\Sigma_s, I_{90})$, where $(\Sigma_s)_{i,j} = r^{|i-j|}$, $i, j = 1, \ldots, 10$, for $r = 0.95$, we compute the sample covariance and sample correlation, and show: (a) the empirical density (normalized histogram) of the largest sample eigenvalue, along with a Gaussian distribution with its estimated mean and standard deviation (solid line), and (b) a scatter plot of the leading sample eigenvector, projected onto the second (x-axis) and fourth (y-axis) population eigenvectors. A striking variance reduction is observed in the sample correlation for both (a) and (b). A similar variance reduction is observed for different choices of population eigenvectors in (b); the selected choice (the second and fourth eigenvectors) facilitates the illustration of an additional correlation effect in the sample-to-population eigenvector projections.
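The experiment described in this caption is easy to reproduce. The following sketch (our illustration, not code from the paper; it assumes numpy and uses a modest number of Monte Carlo replicates) contrasts the spread of the largest eigenvalue under the sample covariance and the sample correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 300
r, m, p = 0.95, 10, 90

# Population covariance: 10x10 AR(0.95) block, identity elsewhere
Sigma_s = r ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
Sigma = np.block([[Sigma_s, np.zeros((m, p))],
                  [np.zeros((p, m)), np.eye(p)]])
A = np.linalg.cholesky(Sigma)

ev_cov, ev_corr = [], []
for _ in range(reps):
    X = A @ rng.standard_normal((m + p, n))   # columns: i.i.d. N(0, Sigma) samples
    S = X @ X.T / n                           # sample covariance (known zero mean)
    d = 1 / np.sqrt(np.diag(S))
    R = d[:, None] * S * d[None, :]           # sample correlation
    ev_cov.append(np.linalg.eigvalsh(S)[-1])
    ev_corr.append(np.linalg.eigvalsh(R)[-1])

print(np.std(ev_cov), np.std(ev_corr))        # correlation: much smaller spread
```

The empirical standard deviation of the largest sample correlation eigenvalue is markedly smaller than that of its covariance counterpart, as in panel (a) of the figure.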

Here, we present theoretical results that describe these observed phenomena. We obtain asymptotic first-order and distributional results for the eigenvalues and eigenvectors of sample correlation matrices under a spiked model. Paul (2007) proved theorems for sample covariance matrices in the special case of Gaussian data. In essence, we present analogs of these theorems for sample correlation matrices, and extend them to non-Gaussian data. To first order, the eigenvalues and eigenvectors coincide asymptotically with those of sample covariance matrices; however, their fluctuations can be very different. Indeed, for both the largest sample correlation eigenvalues (Theorem 1) and the projections of the corresponding eigenvectors (Theorem 2), the asymptotic variances admit a decomposition into three terms. The first term is just the asymptotic variance for sample covariance matrices generated from Gaussian data; the second adds corrections due to non-Gaussianity; and the third captures further corrections due to the data normalization imposed by the sample correlation matrix (which amounts to normalizing the entries of the sample covariance matrix by the sample variances). Consistent with the example shown in Figure 1(a), in the CLT for the leading sample eigenvalues, the sample correlation eigenvalues often show lower fluctuations than those of the sample covariance eigenvalues, despite the variance normalization. As seen in Figure 1(b), the (normalized) eigenvector projections are typically asymptotically correlated, even for Gaussian data, unlike in the sample covariance setting of Paul (2007, Theorem 5).

Technical contributions

We build on and extend a set of random matrix tools for studying spiked covariance models. The companion manuscript (Johnstone and Yang, 2018), hereafter [JY], gives an exposition and a parallel treatment for sample covariance matrices. Important adaptations are needed here to account for the data normalization imposed by sample correlation matrices. Among the key technical contributions of our work, basic to our main theorems, are asymptotic first-order and distributional properties for bilinear forms and matrix quadratic forms with normalized entries (Section 4). A novel regularization-based proof strategy is used to establish the inconsistency of eigenvector projections in the case of "subcritical" spiked eigenvalues (Theorem 3).

Model M

Let $x \in \mathbb{R}^{m+p}$ be a random vector with finite $(4+\delta)$th moment for some $\delta > 0$. Consider the partition

$$x = \begin{bmatrix} \xi \\ \eta \end{bmatrix}.$$

Assume that $\xi \in \mathbb{R}^m$ has mean zero and covariance $\Sigma$, and is independent of $\eta \in \mathbb{R}^p$, which has i.i.d. components $\eta_i$ with mean zero and unit variance. Let $\Sigma_D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2)$ be the diagonal matrix containing the variances of the $\xi_i$, and let $\Gamma = \Sigma_D^{-1/2} \Sigma \Sigma_D^{-1/2}$ be the correlation matrix of $\xi$, with eigen-decomposition $\Gamma = P L P^T$, where $P = [p_1, \ldots, p_m]$ is the eigenvector matrix, and $L = \mathrm{diag}(\ell_1, \ldots, \ell_m)$ contains the spike correlation eigenvalues $\ell_1 \geq \cdots \geq \ell_m > 0$.

The correlation matrix of $x$ is therefore $\Gamma_x = \mathrm{blkdiag}(\Gamma, I)$, with eigenvalues $\ell_1, \ldots, \ell_m, 1, \ldots, 1$, and corresponding eigenvectors $\mathbf{p}_1, \ldots, \mathbf{p}_m, e_{m+1}, \ldots, e_{m+p}$, where $\mathbf{p}_i = [p_i^T\ 0_p^T]^T$ and $e_j$ is the $j$th canonical vector (i.e., a vector of all zeros, except for a one in the $j$th coordinate).

Consider a sequence of i.i.d. copies of $x$, the first $n$ of which fill the columns of the $(m+p) \times n$ data matrix $X = (x_{ij})$. We assume $m$ is fixed, whereas $p$ and $n$ increase with

$$\gamma_n = p/n \to \gamma > 0 \quad \text{as } p, n \to \infty.$$

Notation

Let $S = n^{-1} X X^T$ be the sample covariance matrix, and let $S_D = \mathrm{diag}(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_{m+p}^2)$ be the diagonal matrix containing the sample variances. Let $R = S_D^{-1/2} S S_D^{-1/2}$ be the sample correlation matrix, with the $\nu$th sample eigenvalue and eigenvector satisfying

$$R \hat{\mathbf{p}}_\nu = \hat{\ell}_\nu \hat{\mathbf{p}}_\nu,$$

where, for later use, we partition $\hat{\mathbf{p}}_\nu = [\hat{p}_\nu^T, \hat{v}_\nu^T]^T$. Here $\hat{p}_\nu$ is the subvector of $\hat{\mathbf{p}}_\nu$ restricted to the first $m$ coordinates.

For $\ell > 1 + \sqrt{\gamma}$, define

$$\rho(\ell, \gamma) = \ell + \frac{\gamma \ell}{\ell - 1}, \qquad \dot{\rho}(\ell, \gamma) = \frac{\partial \rho(\ell, \gamma)}{\partial \ell} = 1 - \frac{\gamma}{(\ell - 1)^2}.$$

For an index $\nu$ for which $\ell_\nu > 1 + \sqrt{\gamma}$ is a simple eigenvalue, set

$$\rho_\nu = \rho(\ell_\nu, \gamma), \quad \rho_{\nu n} = \rho(\ell_\nu, \gamma_n), \quad \dot{\rho}_\nu = \dot{\rho}(\ell_\nu, \gamma), \quad \dot{\rho}_{\nu n} = \dot{\rho}(\ell_\nu, \gamma_n). \tag{1.1}$$

We refer to eigenvalues satisfying $\ell_\nu > 1 + \sqrt{\gamma}$ as "supercritical," and those satisfying $\ell_\nu \leq 1 + \sqrt{\gamma}$ as "subcritical," with the quantity $1 + \sqrt{\gamma}$ referred to as the "phase transition."

To describe and interpret the variance terms in the limiting distributions to follow, we need some definitions. Let $\bar{\xi}_i = \xi_i / \sigma_i$ and $\kappa_{ij} = E\,\bar{\xi}_i \bar{\xi}_j$ denote the scaled components of $\xi$ and their covariances; of course $\kappa_{ii} = 1$. The corresponding scaled fourth-order cumulants are

$$\kappa_{iji'j'} = E[\bar{\xi}_i \bar{\xi}_j \bar{\xi}_{i'} \bar{\xi}_{j'}] - \kappa_{ij}\kappa_{i'j'} - \kappa_{ii'}\kappa_{jj'} - \kappa_{ij'}\kappa_{ji'}. \tag{1.2}$$

When $\xi$ is Gaussian, $\kappa_{iji'j'} \equiv 0$.

The effect of variance scaling in the correlation matrix is described using additional quadratic functions of $(\bar{\xi}_i)$, defined by

$$\chi_{ij} = \bar{\xi}_i \bar{\xi}_j, \qquad \psi_{ij} = \kappa_{ij}(\bar{\xi}_i^2 + \bar{\xi}_j^2)/2, \tag{1.3}$$
$$\check{\kappa}_{iji'j'} = \mathrm{Cov}(\psi_{ij}, \psi_{i'j'}) - \mathrm{Cov}(\psi_{ij}, \chi_{i'j'}) - \mathrm{Cov}(\chi_{ij}, \psi_{i'j'}). \tag{1.4}$$

Tensor notation

For convenience, it is useful to consider $\kappa_{iji'j'}$ and $\check{\kappa}_{iji'j'}$ as entries of four-dimensional tensor arrays $\kappa$ and $\check{\kappa}$, respectively, and to define an additional array $P^{\mu\mu'\nu\nu'}$ with entries $P^{\mu\mu'\nu\nu'}_{iji'j'} = p_{\mu,i}\, p_{\mu',j}\, p_{\nu,i'}\, p_{\nu',j'}$. In addition, define $P^\nu$ as $P^{\nu\nu\nu\nu}$. Finally, for a second array $A$ of the same dimensions,

$$[P^\nu, A] = \sum_{i,j,i',j'} P^\nu_{iji'j'} A_{iji'j'}.$$
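In code, this bracket is a plain four-index contraction; for instance, with numpy's einsum (an illustrative sketch, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
m = 4
# Orthonormal eigenvector matrix P (columns p_1, ..., p_m)
P, _ = np.linalg.qr(rng.standard_normal((m, m)))
A = rng.standard_normal((m, m, m, m))           # a generic 4-index array

nu = 1
p = P[:, nu]
Pnu = np.einsum('i,j,k,l->ijkl', p, p, p, p)    # entries p_i p_j p_i' p_j'
bracket = np.einsum('ijkl,ijkl->', Pnu, A)      # [P^nu, A]

# same contraction without materializing the rank-4 tensor
bracket2 = np.einsum('i,j,k,l,ijkl->', p, p, p, p, A)
print(bracket, bracket2)
```

The second form avoids forming the $m^4$-entry array, which matters only for large $m$; here $m$ is fixed and small.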

2. Main results

Our first main result, proved in Section 5, gives the asymptotic properties of the largest (spike) eigenvalues of the sample correlation matrix:

Theorem 1

Assume Model M, and that $\ell_\nu > 1 + \sqrt{\gamma}$ is a simple eigenvalue. As $p/n \to \gamma > 0$,

$$\text{(i)}\ \hat{\ell}_\nu \overset{a.s.}{\longrightarrow} \rho_\nu, \qquad \text{(ii)}\ \sqrt{n}\,(\hat{\ell}_\nu - \rho_{\nu n}) \overset{D}{\longrightarrow} N(0, \tilde{\sigma}_\nu^2), \tag{2.5}$$

where

$$\tilde{\sigma}_\nu^2 = 2 \dot{\rho}_\nu \ell_\nu^2 + \dot{\rho}_\nu^2 [P^\nu, \kappa] + \dot{\rho}_\nu^2 [P^\nu, \check{\kappa}]. \tag{2.6}$$

Centering at $\rho_{\nu n}$ rather than at $\rho_\nu$ is important. If, for example, $\gamma_n = \gamma + a n^{-1/2}$, then

$$\sqrt{n}\,(\hat{\ell}_\nu - \rho_\nu) \overset{D}{\longrightarrow} N\big(a \ell_\nu (\ell_\nu - 1)^{-1},\ \tilde{\sigma}_\nu^2\big),$$

and we see a limiting shift. Furthermore, it may also be beneficial to consider $\tilde{\sigma}_{\nu n}^2$ instead of $\tilde{\sigma}_\nu^2$, obtained by replacing $\dot{\rho}_\nu$ with $\dot{\rho}_{\nu n}$ in (2.6), such that

$$\sqrt{n}\,(\hat{\ell}_\nu - \rho_{\nu n})/\tilde{\sigma}_{\nu n} \overset{D}{\longrightarrow} N(0, 1).$$

The asymptotic first-order limit in (i), which follows as an easy consequence of El Karoui (2009), coincides with that of the νth largest eigenvalue of a sample covariance matrix computed from data with population covariance Γ (Paul, 2007). This implies that, when constructing R, normalizing by the sample variances has no effect on the leading eigenvalues, at least to first order.

However, key differences are seen when looking at the asymptotic distribution, given in (ii), and in the variance formula (2.6) in particular. This can be readily interpreted. The first term corresponds to the variance in the Gaussian-covariance case of Paul (2007), again for samples with covariance $\Gamma$. The second provides a correction of that result for non-Gaussian data; see the companion article [JY]. The third term describes the contribution specific to sample correlation matrices, representing the effect of normalizing the data by the sample variances. This term is often negative, and is evaluated explicitly for Gaussian data in Corollary 1 below, proved in the Supplementary Material, S1.1.
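A quick numerical sanity check of the first-order limit in (i) is possible by simulation; the following sketch (illustrative parameters of our own choosing, Gaussian data, $m = 2$) compares the largest sample correlation eigenvalue with $\rho(\ell_1, \gamma_n)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 2000, 500, 0.9                  # gamma_n = 0.25; Gamma = [[1, r], [r, 1]]
ell1 = 1 + r                              # supercritical: 1.9 > 1 + sqrt(0.25) = 1.5
gamma_n = p / n
rho = ell1 + gamma_n * ell1 / (ell1 - 1)  # rho(ell_1, gamma_n)

G = rng.standard_normal((2 + p, n))
G[1] = r * G[0] + np.sqrt(1 - r**2) * G[1]   # rows 0,1: unit variance, correlation r
S = G @ G.T / n
d = 1 / np.sqrt(np.diag(S))
R = d[:, None] * S * d[None, :]              # sample correlation matrix
ell_hat = np.linalg.eigvalsh(R)[-1]
print(ell_hat, rho)                          # close; fluctuations are O(n^{-1/2})
```

A single draw suffices here because the fluctuations of $\hat{\ell}_1$ around $\rho(\ell_1, \gamma_n)$ are of order $n^{-1/2}$.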

Corollary 1

For ξ Gaussian, the asymptotic variance in Theorem 1 simplifies to

$$\tilde{\sigma}_\nu^2 = 2 \ell_\nu^2 \dot{\rho}_\nu \big[1 - \dot{\rho}_\nu \big(2 \ell_\nu\, \mathrm{tr}\, P_{D,\nu}^4 - \mathrm{tr}\,(P_{D,\nu} \Gamma P_{D,\nu})^2\big)\big],$$

where $P_{D,\nu} = \mathrm{diag}(p_{\nu,1}, \ldots, p_{\nu,m})$.

Thus, computing the sample correlation results in the asymptotic variance being scaled by $1 - \dot{\rho}_\nu \Delta_\nu$, relative to the sample covariance, where

$$\Delta_\nu = 2 \ell_\nu\, \mathrm{tr}\, P_{D,\nu}^4 - \mathrm{tr}\,(P_{D,\nu} \Gamma P_{D,\nu})^2 = 2 \ell_\nu \sum_i p_{\nu,i}^4 - \sum_{i,j} (p_{\nu,i}\, \kappa_{ij}\, p_{\nu,j})^2$$

is often positive, implying that spiked eigenvalues of the sample correlation often exhibit a smaller variance than those of the sample covariance. Indeed, such variance reduction occurs if and only if

$$\sum_{i,j} (p_{\nu,i}\, \kappa_{ij}\, p_{\nu,j})^2 < 2 \ell_\nu \sum_i p_{\nu,i}^4 = \sum_{i,j} p_{\nu,i}\, \kappa_{ij}\, p_{\nu,j}\, (p_{\nu,i}^2 + p_{\nu,j}^2), \tag{2.7}$$

with the last identity following from the fact that $\ell_\nu p_{\nu,i} = \sum_j \kappa_{ij} p_{\nu,j}$. Condition (2.7), and variance reduction, holds in the following cases:

  1. both $\Gamma$ and $p_\nu$ have nonnegative entries, or

  2. $2 \ell_\nu \sum_i p_{\nu,i}^4 > 1$, or

  3. $2 \ell_\nu > \ell_1^2$.

In case (i), the inequalities $0 \leq p_{\nu,i}\, \kappa_{ij}\, p_{\nu,j} \leq 2 p_{\nu,i} p_{\nu,j} \leq p_{\nu,i}^2 + p_{\nu,j}^2$ yield (2.7). Note that if $\Gamma$ has nonnegative entries, then the Perron–Frobenius theorem establishes the existence of an eigenvector with nonnegative components for $\ell_1$; furthermore, if $\Gamma$ has positive entries, by the same theorem, $\ell_1$ is simple and associated with an eigenvector with positive components. Case (ii) follows from $\sum_{i,j}(p_{\nu,i}\, \kappa_{ij}\, p_{\nu,j})^2 \leq \sum_{i,j}(p_{\nu,i} p_{\nu,j})^2 = 1$, and holds if $\ell_\nu > m/2$, because $\sum_i p_{\nu,i}^4 \geq 1/m$. Case (iii) follows from the inequalities $2 p_{\nu,i}^2 p_{\nu,j}^2 \leq p_{\nu,i}^4 + p_{\nu,j}^4$ and $\sum_j \kappa_{ij}^2 = (\Gamma^2)_{ii} \leq \|\Gamma^2\| = \ell_1^2$. Note that this case is rather special, in that it has nothing to do with eigenvectors, and a necessary condition for it to hold is $\ell_1 \leq 2$.

Condition (2.7) can fail, however. For example, for even $m$ and $r \in (0, 1)$, consider

$$\Gamma = \begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix} \otimes \mathbf{1}_{m/2} \mathbf{1}_{m/2}^T,$$

where $\mathbf{1}_{m/2}$ is the $(m/2)$-dimensional vector of all ones, which corresponds to two negatively correlated groups of identical random variables. This has simple supercritical eigenvalues $\ell_1 = (1+r)m/2$ and $\ell_2 = (1-r)m/2$ when $m > 2(1+\sqrt{\gamma})/(1-r)$, with $p_{\nu,i}^2 = m^{-1}$ for $\nu = 1, 2$. One finds that $\Delta_2 = (1 - 2r - r^2)/2 < 0$ for $r > \sqrt{2} - 1$, although $\Delta_1 > 0$ because $\ell_1 > m/2$, which implies case (ii).
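This counterexample is easy to check numerically; the sketch below (our illustration, assuming numpy) builds the two-group $\Gamma$ for $m = 6$, $r = 0.5$ (so $r > \sqrt{2} - 1$) and evaluates $\Delta_\nu$ directly from its definition:

```python
import numpy as np

m, r = 6, 0.5
B = np.array([[1.0, -r], [-r, 1.0]])
Gamma = np.kron(B, np.ones((m // 2, m // 2)))   # two groups of identical variables

evals, evecs = np.linalg.eigh(Gamma)            # ascending order
ell = evals[::-1]                                # ell[0] = (1+r)m/2, ell[1] = (1-r)m/2
P = evecs[:, ::-1]

def Delta(nu):
    # Delta_nu = 2 ell_nu sum_i p_i^4 - sum_{i,j} (p_i kappa_ij p_j)^2
    p_ = P[:, nu]
    return 2 * ell[nu] * np.sum(p_**4) - np.sum((np.outer(p_, p_) * Gamma)**2)

print(Delta(0), Delta(1), (1 - 2*r - r**2) / 2)  # Delta_2 matches the closed form
```

Here `Delta(1)` reproduces $(1 - 2r - r^2)/2 = -0.125$, while `Delta(0)` is positive, as the text predicts.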

We turn now to the eigenvectors. Again, fix an index $\nu$ for which $\ell_\nu > 1 + \sqrt{\gamma}$ is a simple eigenvalue of $\Gamma$, with corresponding eigenvector $\mathbf{p}_\nu = [p_\nu^T\ 0_p^T]^T$. Recall that $\hat{\mathbf{p}}_\nu = [\hat{p}_\nu^T\ \hat{v}_\nu^T]^T$ is the $\nu$th sample eigenvector of $R$, and let $a_\nu = \hat{p}_\nu / \|\hat{p}_\nu\|$ be the corresponding normalized subvector of $\hat{\mathbf{p}}_\nu$, restricted to the first $m$ coordinates. The next result establishes a limit for the eigenvector projection $\langle \hat{\mathbf{p}}_\nu, \mathbf{p}_\nu \rangle$, and a CLT for the normalized cross-projections $P^T a_\nu = [p_1^T a_\nu, \ldots, p_m^T a_\nu]^T$; see Sections 6.1 and 6.2.

Theorem 2

Assume Model M, and that $\ell_\nu > 1 + \sqrt{\gamma}$ is a simple eigenvalue. Then, as $p/n \to \gamma > 0$,

$$\text{(i)}\ \langle \hat{\mathbf{p}}_\nu, \mathbf{p}_\nu \rangle^2 \overset{a.s.}{\longrightarrow} \dot{\rho}_\nu \ell_\nu / \rho_\nu, \qquad \text{(ii)}\ \sqrt{n}\,(P^T a_\nu - e_\nu) \overset{D}{\longrightarrow} N(0, \Sigma_\nu),$$

where $\Sigma_\nu = D_\nu \tilde{\Sigma}_\nu D_\nu$ with

$$D_\nu = \sum_{k \neq \nu}^m (\ell_\nu - \ell_k)^{-1} e_k e_k^T, \tag{2.8}$$
$$\tilde{\Sigma}_{\nu,kl} = \dot{\rho}_\nu^{-1} \ell_k \ell_\nu \delta_{k,l} + [P^{k\nu l\nu}, \kappa] + [P^{k\nu l\nu}, \check{\kappa}], \tag{2.9}$$

where $\delta_{k,l} = 1$ if $k = l$, and zero otherwise.

The CLT result in (ii) can be rephrased in terms of the entries of $a_\nu$, for which we readily obtain $\sqrt{n}\,(a_\nu - p_\nu) \overset{D}{\longrightarrow} N(0, P \Sigma_\nu P^T)$; note that $\Sigma_\nu$ has zeros in the $\nu$th row and the $\nu$th column.

As for the eigenvalues, Theorem 2 shows that the spiked eigenvectors of sample correlation matrices exhibit the same first-order behavior as those of the sample covariance (Paul, 2007). The difference again lies in the asymptotic fluctuations, captured by the covariance matrix $\Sigma_\nu$. Note that this is decomposed as a product of the diagonal matrix $D_\nu$ and the matrix $\tilde{\Sigma}_\nu$, which involves the three terms in (2.9). These terms have interpretations similar to those discussed previously for (2.6). That is, the first term captures the asymptotic fluctuations for a Gaussian-covariance model (Paul, 2007), the second term captures the effect of non-Gaussianity in the covariance case [JY], and the third term captures information specific to the correlation case, representing fluctuations due to sample variance normalization. Note that only the first term is diagonal in general, suggesting that the eigenvector projections may be asymptotically correlated, as seen earlier in Figure 1(b). This holds also for Gaussian data, evaluated explicitly in Corollary 2 below; see the Supplementary Material, S1.2, for the proof. We note an interesting contrast with the eigenvector projections for covariance matrices (Paul, 2007), which are described only by the leading term in (2.9).

Corollary 2

For $\xi$ Gaussian, the asymptotic covariance in Theorem 2 reduces to $\Sigma_\nu = D_\nu \tilde{\Sigma}_\nu D_\nu$, with

$$\tilde{\Sigma}_\nu = \ell_\nu \dot{\rho}_\nu^{-1} L + (\ell_\nu I + L)\big(\tfrac{1}{2} Z - \ell_\nu Y\big)(\ell_\nu I + L) + \ell_\nu \big(\ell_\nu^2 Y - L Y L\big),$$

where $Z = P^T P_{D,\nu} (\Gamma \circ \Gamma) P_{D,\nu} P$, $Y = P^T P_{D,\nu}^2 P$, and $\circ$ denotes the Hadamard product.

Thus, for Gaussian data, the entries of the asymptotic covariance matrix are given by (for $k, l \neq \nu$)

$$\Sigma_{\nu,kl} = (\ell_\nu - \ell_k)^{-1} (\ell_\nu - \ell_l)^{-1} \Big[\ell_\nu \dot{\rho}_\nu^{-1} \ell_k \delta_{k,l} + (\ell_\nu + \ell_k)(\ell_\nu + \ell_l)\, Z_{kl}/2 - \ell_\nu \big(\ell_\nu (\ell_k + \ell_l) + 2 \ell_k \ell_l\big) Y_{kl}\Big].$$

Consider now the subcritical case, in which $\nu$ is such that $1 < \ell_\nu \leq 1 + \sqrt{\gamma}$. Let $\mathbf{p}_\nu$ denote the corresponding population eigenvector, and let $\hat{\ell}_\nu$ and $\hat{\mathbf{p}}_\nu$ denote the corresponding sample eigenvalue and eigenvector, respectively. With proofs deferred to Sections 5.1 and 6.3, we have the following result:

Theorem 3

Assume Model M, and that $1 < \ell_\nu \leq 1 + \sqrt{\gamma}$ is a simple eigenvalue. Then, as $p/n \to \gamma > 0$,

$$\text{(i)}\ \hat{\ell}_\nu \overset{a.s.}{\longrightarrow} (1 + \sqrt{\gamma})^2, \qquad \text{(ii)}\ \langle \hat{\mathbf{p}}_\nu, \mathbf{p}_\nu \rangle^2 \overset{a.s.}{\longrightarrow} 0.$$

Once again, the asymptotic first-order limits of the sample eigenvalue and its associated eigenvector are the same as those obtained for the sample covariance (Paul, 2007).

Recall that our high-dimensional results assume an asymptotic regime in which $p/n \to \gamma > 0$, as opposed to the classical regime in which $p$ is fixed and $n \to \infty$. The case of fixed $p$ corresponds to $\gamma = 0$, and the spectral properties of the sample correlation matrix are then well understood; see, for example, Girshick (1939), Konishi (1979), Fang and Krishnaiah (1982), Schott (1991), Kollo and Neudecker (1993), and Boik (2003). When $\gamma = 0$, the function $\rho(\ell)$ reduces to the identity. Indeed, for fixed $p$, there is no high-dimensional component $\eta$ in Model M, and hence none of the biasing effect on $\rho(\ell, \gamma)$ that occurs when $\gamma > 0$. In particular, for fixed $p$ there is no counterpart to our Theorem 3.

To summarize, in comparison to the high-dimensional (p/nγ > 0) sample covariance setting, our results for the spiked eigenvalues and eigenvectors of sample correlation matrices confirm that the first-order asymptotic behavior is indeed equivalent to that of sample covariance matrices, in agreement with previous results and observations (El Karoui, 2009, Mestre and Vallet, 2017). While the eigenvalue limits in Theorem 1 and Theorem 3 follow as a straightforward consequence of El Karoui (2009), the eigenvector results of Theorem 2-(i) and Theorem 3-(ii) do not. In contrast to the first-order equivalences, important differences arise in the fluctuations of both the eigenvalues and eigenvectors, as shown by the asymptotic distributions of Theorem 1-(ii) and Theorem 2-(ii).

We illustrate these differences with a simple example having $\Gamma = (1-r) I_m + r \mathbf{1}_m \mathbf{1}_m^T$, where $r \in [0, 1]$; that is, a model with unit variances and constant correlation $r$ across all components. Moreover, $\xi$ is assumed to be Gaussian for simplicity. In this setting, $L = \mathrm{diag}(\ell_1, 1-r, \ldots, 1-r)$, where $\ell_1 = 1 + r(m-1)$ is supercritical if and only if $r > \sqrt{\gamma}/(m-1)$. Consider the largest sample eigenvalue $\hat{\ell}_1$ in such a supercritical case. From Corollary 1, the asymptotic variances for the sample covariance and the sample correlation can be computed, yielding

$$\sigma_1^2 = 2 \ell_1^2 \dot{\rho}_1, \qquad \tilde{\sigma}_1^2 = \sigma_1^2 (1 - \dot{\rho}_1 \Delta),$$

respectively, with $\Delta = 2 \ell_1\, \mathrm{tr}\, P_D^4 - \mathrm{tr}\,(P_D \Gamma P_D)^2$, and where

$$P_D \equiv P_{D,1} = m^{-1/2} I_m, \qquad \dot{\rho}_1 = 1 - \frac{\gamma}{r^2 (m-1)^2}.$$

Figure 2(a) plots these asymptotic variances versus $r$ for various $(\gamma, m)$. Indeed, the variance (fluctuation) for the sample correlation is consistently smaller than for the sample covariance. The difference is striking, becoming extremely large as $r \nearrow 1$. Similar trends are observed for various choices of $m$ and $\gamma$, being more pronounced for higher $m$, while not much affected by varying $\gamma$. This may be understood from the fact that, after writing $\Delta = r(2-r) + (1-r)^2 m^{-1} = 1 - (1-r)^2 (1 - m^{-1})$,

$$\frac{\tilde{\sigma}_1^2}{\sigma_1^2} = 1 - \dot{\rho}_1 \Delta \to \begin{cases} \gamma (m-1)^{-2}, & \text{as } r \nearrow 1,\ m \text{ fixed}, \\ (1-r)^2, & \text{as } m \to \infty,\ r \text{ fixed}. \end{cases}$$
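These two limits can be checked numerically from the displayed ratio (an illustrative sketch; the parameter values are our own):

```python
def ratio(r, m, gamma):
    """Variance ratio sigma_tilde_1^2 / sigma_1^2 = 1 - rho_dot_1 * Delta
    for the constant-correlation model."""
    rho_dot = 1 - gamma / (r * (m - 1))**2
    Delta = 1 - (1 - r)**2 * (1 - 1/m)
    return 1 - rho_dot * Delta

gamma, m = 0.5, 10
print(ratio(0.999, m, gamma), gamma / (m - 1)**2)   # r near 1: ~ gamma/(m-1)^2
print(ratio(0.6, 10**6, gamma), (1 - 0.6)**2)       # m large:  ~ (1-r)^2
```

Both evaluations sit close to their respective limiting values, illustrating why the reduction is governed by $m$ near $r = 1$ and by $r$ for large $m$.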

Turn now to the fluctuations of the leading sample eigenvector, in the same setting as above. Note that, in Corollary 2, for this particular case, one can deduce from $P^T \Gamma P = L$ that

$$Z = m^{-1}(1 - r^2) I_m + r^2 e_1 e_1^T, \qquad Y = m^{-1} I_m.$$

Also from Corollary 2, the asymptotic variances for the normalized sample-to-population eigenvector projection $p_2^T a_1$, in the sample covariance and sample correlation cases, are computed as

$$\Sigma_{1,22}^{\mathrm{cov}} = \frac{\ell_1 \ell_2}{(rm)^2 \dot{\rho}_1}, \qquad \Sigma_{1,22} = \Sigma_{1,22}^{\mathrm{cov}} - \zeta\, \frac{\ell_1 \ell_2 (\ell_1 + \ell_2)}{(rm)^2\, m},$$

respectively, where $\zeta = 1 - r + \frac{1}{2}(1+r)\big(1 + \frac{1-r}{rm}\big)^{-1}$, and we recall that $\ell_1 = 1 - r + rm$ and $\ell_2 = 1 - r$. These variances are numerically evaluated in Figure 2(b) for the same parameter choices as before and, again, as functions of $r$. Note, however, that for better visual appreciation, the range of $r$ has been restricted to supercritical values sufficiently above the critical point $\sqrt{\gamma}/(m-1)$, because the variance explodes at that point. The comparative evaluation again shows smaller variances for the sample correlation. The variance reduction here is less visible in the graphs, because both $\Sigma_{1,22}$ and $\Sigma_{1,22}^{\mathrm{cov}}$ vanish as $r \to 1$. The ratio, however, behaves quite similarly to the variance ratio $\tilde{\sigma}_1^2/\sigma_1^2$:

$$\frac{\Sigma_{1,22}}{\Sigma_{1,22}^{\mathrm{cov}}} = 1 - \zeta \dot{\rho}_1 \frac{\ell_1 + \ell_2}{m} \to \begin{cases} \gamma (m-1)^{-2}, & \text{as } r \nearrow 1,\ m \text{ fixed}, \\ (1-r)(1 - r/2), & \text{as } m \to \infty,\ r \text{ fixed}. \end{cases}$$
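As a numerical consistency check (our illustration, plain Python), the entrywise expression of Corollary 2, evaluated with the $Z$ and $Y$ above, agrees with the closed form in terms of $\zeta$, as does the ratio formula:

```python
r, m, gamma = 0.5, 4, 0.1
ell1, ell2 = 1 + r*(m - 1), 1 - r
rho_dot1 = 1 - gamma / (ell1 - 1)**2

# Corollary 2 entrywise, (k, l) = (2, 2), nu = 1: Z_22 = (1-r^2)/m, Y_22 = 1/m
Z22, Y22 = (1 - r**2) / m, 1.0 / m
bracket = (ell1 * ell2 / rho_dot1
           + (ell1 + ell2)**2 * Z22 / 2
           - ell1 * (ell1*(ell2 + ell2) + 2*ell2*ell2) * Y22)
Sigma_122 = bracket / (ell1 - ell2)**2

# Closed forms in terms of zeta
Sigma_cov = ell1 * ell2 / ((r*m)**2 * rho_dot1)
zeta = 1 - r + 0.5*(1 + r) / (1 + (1 - r)/(r*m))
Sigma_closed = Sigma_cov - zeta * ell1*ell2*(ell1 + ell2) / ((r*m)**2 * m)

print(Sigma_122, Sigma_closed)                                   # agree
print(Sigma_122 / Sigma_cov, 1 - zeta*rho_dot1*(ell1 + ell2)/m)  # ratio formula
```

The agreement holds to machine precision for any supercritical $(r, m, \gamma)$; the values shown use one arbitrary choice.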

Figure 2:


Differences in the fluctuations of sample eigenvalues and eigenvectors for an example Gaussian model with $\Gamma = (1-r) I_m + r \mathbf{1}_m \mathbf{1}_m^T$. Asymptotic variances are shown for (a) the largest sample eigenvalue $\hat{\ell}_1$, and (b) the normalized sample-to-population eigenvector projection $p_2^T a_1$.

We end the discussion of our main results with a few remarks about possible extensions. Our results assume that $\ell_\nu > 1$ is a simple eigenvalue, but extensions to small spikes with $\ell_\nu < 1$ and to spikes with multiplicities should be possible. Analogous results for eigenvalues have been obtained for sample covariance matrices with $\ell_\nu < 1$, including multiplicities greater than one (e.g., see Bai and Yao (2008)), giving reason to expect corresponding results for correlation matrices. Extensions of our results for eigenvalues and eigenvectors of sample correlation matrices for simple $\ell_\nu < 1$ should be fairly straightforward, although the cases $\gamma < 1$, $\gamma = 1$, and $\gamma > 1$ would need separate treatment. Extensions for spikes with multiplicities are also possible, but in this case, individual eigenvectors are not well defined, and one would need to consider subspace projections, requiring non-trivial modifications of our technical arguments.

The remainder of the paper proceeds as follows. First, in Section 3, we introduce key quantities and identities used in the derivations. Section 4 presents necessary asymptotic properties for bilinear forms and matrix quadratic forms with normalized entries, with the corresponding proofs relegated to the Supplementary Material, Section S3. These properties provide a foundation for describing the asymptotic convergence and distribution of eigenvalues and eigenvectors of sample correlation matrices, derived in Sections 5 and 6 respectively.

As already noted, a parallel treatment for the simpler case of covariance matrices is given in a supplementary manuscript [JY]. This aims at a unified exposition of known spectral properties of spiked covariance matrices as a benchmark for the current work, along with additional citations to the literature.

3. Preliminaries

We begin with a block representation and some associated reductions for the sample correlation matrix R. These are well known in the covariance matrix setting. As with the partition of x in Model M, consider

$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \qquad X_1 \in \mathbb{R}^{m \times n}, \quad X_2 \in \mathbb{R}^{p \times n}.$$

Write $S_D = \mathrm{blkdiag}(S_{D1}, S_{D2})$, with $S_{D1}$ containing the sample variances corresponding to $\xi$, and $S_{D2}$ containing those corresponding to $\eta$. Define the "normalized" data matrices $\bar{X}_1 = S_{D1}^{-1/2} X_1$ and $\bar{X}_2 = S_{D2}^{-1/2} X_2$, such that

$$R = n^{-1} \begin{bmatrix} \bar{X}_1 \bar{X}_1^T & \bar{X}_1 \bar{X}_2^T \\ \bar{X}_2 \bar{X}_1^T & \bar{X}_2 \bar{X}_2^T \end{bmatrix} = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}; \qquad \hat{\mathbf{p}}_\nu = \begin{bmatrix} \hat{p}_\nu \\ \hat{v}_\nu \end{bmatrix}.$$

Partitioning the eigenvector equation $R \hat{\mathbf{p}}_\nu = \hat{\ell}_\nu \hat{\mathbf{p}}_\nu$ according to $\hat{\mathbf{p}}_\nu = [\hat{p}_\nu^T, \hat{v}_\nu^T]^T$ yields

$$R_{11} \hat{p}_\nu + R_{12} \hat{v}_\nu = \hat{\ell}_\nu \hat{p}_\nu,$$
$$R_{21} \hat{p}_\nu + R_{22} \hat{v}_\nu = \hat{\ell}_\nu \hat{v}_\nu.$$

From the second equation, $\hat{v}_\nu = (\hat{\ell}_\nu I_p - R_{22})^{-1} R_{21} \hat{p}_\nu$. Substituting this into the first equation yields

$$K(\hat{\ell}_\nu)\, \hat{p}_\nu = \hat{\ell}_\nu \hat{p}_\nu, \qquad \text{with } K(t) = R_{11} + R_{12} (t I_p - R_{22})^{-1} R_{21}.$$

Thus, $\hat{\ell}_\nu$ is an eigenvalue of $K(\hat{\ell}_\nu)$, with associated eigenvector $\hat{p}_\nu$; this is central to our derivations. Note that $K(\hat{\ell}_\nu)$ is well defined if $\hat{\ell}_\nu$ is well separated from the eigenvalues of $R_{22}$; Section 5.1 shows that this occurs with probability one for all large $n$ when $\ell_\nu$ is supercritical. Furthermore, the normalization condition $\hat{p}_\nu^T \hat{p}_\nu + \hat{v}_\nu^T \hat{v}_\nu = 1$ yields

$$\hat{p}_\nu^T (I_m + Q_\nu)\, \hat{p}_\nu = 1, \qquad Q_\nu = R_{12} (\hat{\ell}_\nu I_p - R_{22})^{-2} R_{21}.$$

Phrased in terms of the signal-space normalized eigenvector $a_\nu = \hat{p}_\nu / \|\hat{p}_\nu\|$, we have

$$K(\hat{\ell}_\nu)\, a_\nu = \hat{\ell}_\nu a_\nu, \qquad a_\nu^T (I_m + Q_\nu)\, a_\nu = \|\hat{p}_\nu\|^{-2}. \tag{3.10}$$

Note also that the sample-to-population inner product can be rewritten as

$$\langle \hat{\mathbf{p}}_\nu, \mathbf{p}_\nu \rangle = \langle \hat{p}_\nu, p_\nu \rangle = \|\hat{p}_\nu\|\, \langle a_\nu, p_\nu \rangle. \tag{3.11}$$

In the derivation of our CLT results, we use an eigenvector perturbation formula with quadratic error bound given in [JY, Lemma 13], itself a modification of the arguments in Paul (2007). This yields the key expansion

$$a_\nu - p_\nu = R_{\nu n} D_\nu p_\nu + r_\nu, \tag{3.12}$$

where

$$R_{\nu n} = \frac{\ell_\nu}{\rho_{\nu n}} \sum_{k \neq \nu}^m (\ell_\nu - \ell_k)^{-1} p_k p_k^T, \qquad D_\nu = K(\hat{\ell}_\nu) - \frac{\rho_{\nu n}}{\ell_\nu}\, \Gamma, \qquad r_\nu = O(\|D_\nu\|^2).$$

The derivations of our eigenvalue and eigenvector results, presented in Sections 5 and 6, respectively, take (3.10), (3.11), and (3.12) as points of departure, and rely on asymptotic properties of the key objects $K(\hat{\ell}_\nu)$ and $Q_\nu$. In particular, $K(t)$ can be expressed as the random matrix quadratic form

$$K(t) = n^{-1} \bar{X}_1 B_n(t) \bar{X}_1^T, \tag{3.13}$$

where, using the Woodbury identity,

$$B_n(t) = I_n + n^{-1} \bar{X}_2^T (t I_p - R_{22})^{-1} \bar{X}_2 = t\, \big(t I_n - n^{-1} \bar{X}_2^T \bar{X}_2\big)^{-1}.$$

Thus, our key objects are random quadratic forms involving the normalized data matrices X¯1 and X¯2. The asymptotic properties of these forms are foundational to our results, and are presented next.

4. Quadratic forms with normalized entries

In this section, we establish the first-order (deterministic) convergence and a CLT for matrix quadratic forms of the type $n^{-1} \bar{X}_1 B_n \bar{X}_1^T$, where $B_n$ is a matrix with bounded spectral norm. While essential to our purposes, some of these technical results may be of independent interest; thus, we first present the general results, and then apply them in the context of Model M.

4.1. First-order convergence

To establish the first-order convergence, we first require some results on bilinear forms involving correlated random vectors of unit length. A main technical result (see Supplementary Material, S3.1) is the following:

Lemma 1

Let $B$ be an $n \times n$ nonrandom symmetric matrix, and let $x, y \in \mathbb{R}^n$ be random vectors with i.i.d. entries having mean zero, variance one, $E|x_i|^l,\ E|y_i|^l \leq \nu_l$, and $E[x_i y_i] = \rho$. Let $\bar{x} = \sqrt{n}\, x / \|x\|$ and $\bar{y} = \sqrt{n}\, y / \|y\|$. Then, for any $s \geq 1$,

$$E\big|n^{-1} \bar{x}^T B \bar{y} - \rho\, n^{-1} \mathrm{tr}\, B\big|^s \leq C_s \Big[n^{-s} \big(\nu_{2s}\, \mathrm{tr}\, B^s + (\nu_4\, \mathrm{tr}\, B^2)^{s/2}\big) + \|B\|^s \big(n^{-s/2} \nu_4^{s/2} + n^{-s+1} \nu_{2s}\big)\Big],$$

where $C_s$ is a constant depending only on $s$.

This is a generalization of Gao et al. (2017, Lemma 5), which established a corresponding bound for normalized quadratic forms. Lemma 1 leads to the following first-order convergence result:

Corollary 3

Let $x, y \in \mathbb{R}^n$ be random vectors with i.i.d. entries having mean zero, variance one, $E|x_i|^{4+\delta},\ E|y_i|^{4+\delta} < \infty$ for some $\delta > 0$, and $E[x_i y_i] = \rho$. Define $\bar{x} = \sqrt{n}\, x / \|x\|$ and $\bar{y} = \sqrt{n}\, y / \|y\|$, and let $B_n$ be a sequence of $n \times n$ symmetric matrices with $\|B_n\|$ bounded. Then,

$$n^{-1} \bar{x}^T B_n \bar{y} - n^{-1} \rho\, \mathrm{tr}\, B_n \overset{a.s.}{\longrightarrow} 0.$$

Proof. Because the $(4+\delta)$th moments and $\|B_n\|$ are bounded, from Lemma 1,

$$E\big|n^{-1} \bar{x}^T B_n \bar{y} - n^{-1} \rho\, \mathrm{tr}\, B_n\big|^{2+\delta/2} \leq O\big(n^{-(1+\delta/4)}\big).$$

The convergence then follows from Markov’s inequality and the Borel–Cantelli lemma. ☐
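Corollary 3 also lends itself to a quick Monte Carlo illustration (a sketch with parameters of our own choosing; $B_n$ is a simple diagonal matrix with bounded norm):

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 100_000, 0.6

x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # E[x_i y_i] = rho
xb = np.sqrt(n) * x / np.linalg.norm(x)                      # self-normalized vectors
yb = np.sqrt(n) * y / np.linalg.norm(y)

b = np.where(np.arange(n) % 2 == 0, 1.0, 2.0)                # B_n = diag(1,2,1,2,...)
lhs = (xb * b * yb).mean()                                   # n^{-1} xb^T B_n yb
rhs = rho * b.mean()                                         # rho n^{-1} tr B_n
print(lhs, rhs)
```

For this $B_n$, $\rho\, n^{-1}\mathrm{tr}\, B_n = 0.9$, and the bilinear form concentrates around it at rate $n^{-1/2}$.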

We now apply this to our Model M, with random matrices $B_n = B_n(\bar{X}_2)$ independent of $\bar{X}_1$:

Lemma 2

Assume Model M, and suppose that $B_n = B_n(\bar{X}_2)$ is a sequence of random symmetric matrices for which $\|B_n\|$ is $O_{a.s.}(1)$. Then,

$$n^{-1} \bar{X}_1 B_n(\bar{X}_2) \bar{X}_1^T - n^{-1} \mathrm{tr}\big(B_n(\bar{X}_2)\big)\, \Gamma \overset{a.s.}{\longrightarrow} 0.$$

Proof. This follows from Fubini’s theorem. Specifically, one may use the arguments in the proof of [JY, Lemma 5], applying Corollary 3 and noting that $\bar{X}_1$ is independent of $B_n(\bar{X}_2)$. ☐

4.2. Central Limit Theorem

To establish our main matrix quadratic-form CLT result, we first derive a CLT for scalar bilinear forms involving normalized random vectors. To this end, we must introduce some further notation. Consider zero-mean random vectors $(x, y) \in \mathbb{R}^M \times \mathbb{R}^M$, with

$$\mathrm{Cov}\begin{pmatrix} x \\ y \end{pmatrix} = C = \begin{pmatrix} C^{xx} & C^{xy} \\ C^{yx} & C^{yy} \end{pmatrix},$$

where $C^{xy}_{ll'} = E[x_l y_{l'}]$. Assume $C^{xx}_{ll} = C^{yy}_{ll} = 1$; that is, all components of the $x$ and $y$ vectors have unit variance, and set $\rho_l = C^{xy}_{ll} = E[x_l y_l]$. We first introduce notation for some quadratic functions of $x_l, y_l$. Let $z, w \in \mathbb{R}^M$, with

$$z_l = x_l y_l, \qquad w_l = \rho_l (x_l^2 + y_l^2)/2, \qquad C^{zz} = \mathrm{Cov}(z), \quad C^{wz} = \mathrm{Cov}(w, z), \quad \text{etc.}$$

Let $X = (x_{li}) \in \mathbb{R}^{M \times n}$ and $Y = (y_{li}) \in \mathbb{R}^{M \times n}$ be data matrices based on $n$ i.i.d. observations of $(x, y)$, and define the "normalized" data matrices $\bar{X} = \hat{\Sigma}_x^{-1/2} X$ and $\bar{Y} = \hat{\Sigma}_y^{-1/2} Y$, where $\hat{\Sigma}_x = \mathrm{diag}(\hat{\sigma}_{x1}^2, \ldots, \hat{\sigma}_{xM}^2)$, $\hat{\Sigma}_y = \mathrm{diag}(\hat{\sigma}_{y1}^2, \ldots, \hat{\sigma}_{yM}^2)$, $\hat{\sigma}_{xl}^2 = n^{-1} \sum_{i=1}^n x_{li}^2$, and $\hat{\sigma}_{yl}^2 = n^{-1} \sum_{i=1}^n y_{li}^2$. We then write $\bar{x}_{l\cdot}^T$ and $\bar{y}_{l\cdot}^T$ for the rows of the normalized data matrices:

$$\bar{X} = (\bar{x}_{li}) = \begin{bmatrix} \bar{x}_{1\cdot}^T \\ \vdots \\ \bar{x}_{M\cdot}^T \end{bmatrix}, \qquad \bar{Y} = (\bar{y}_{li}) = \begin{bmatrix} \bar{y}_{1\cdot}^T \\ \vdots \\ \bar{y}_{M\cdot}^T \end{bmatrix}.$$

With this setup, we have the following result, proved in the Supplementary Material, S3.2:

Proposition 1

Let Bn = (bn,ij) be random symmetric n × n matrices, independent of X, Y, such that for some finite β, ‖Bn‖ ≤ β for all n, and

n1i=1nbn,ii2pω,n1trBn2pθ,(n1trBn)2pϕ,

all finite. In addition, define ZnM, with components

Zn,l=n1/2[x¯l.TBny¯l.ρltrBn].

Then, ZnDNM(0,D), with

D=(θω)J+ωK1+ϕK2=θJ+ωK+ϕK2, (4.14)

where $K=K_{1}-J$, and $J$, $K_{1}$, $K_{2}$ are the matrices

$$J=C^{xy}\circ C^{yx}+C^{xx}\circ C^{yy},\qquad K_{1}=C^{zz},\qquad K_{2}=C^{ww}-C^{wz}-C^{zw},\tag{4.15}$$

with $\circ$ denoting the entrywise (Hadamard) product.

The entries of K are fourth-order cumulants of x and y:

$$K_{ll'}=E(x_{l}y_{l}x_{l'}y_{l'})-E(x_{l}y_{l})E(x_{l'}y_{l'})-E(x_{l}y_{l'})E(y_{l}x_{l'})-E(x_{l}x_{l'})E(y_{l}y_{l'}).\tag{4.16}$$

Hence, K vanishes if x, y are Gaussian.

The corresponding result for unnormalized vectors is established in [JY, Theorem 10]. The terms $\theta J+\omega K$ appear in that case as well; the additional term $\phi K_{2}$ reflects the normalization in $\bar{x}_{l\cdot}$ and $\bar{y}_{l\cdot}$. As in [JY], the proof is based on the martingale CLT, rather than the moment method used by Bai and Yao (2008), who stated a similar result for quadratic forms involving unnormalized random vectors.

While potentially of independent interest, Proposition 1 is important for our purposes through its application to Model M.

Proposition 2

Assume Model M, and consider Bn as in Proposition 1. Then,

$$W_{n}=n^{-1/2}\left[\bar{X}_{1}B_{n}\bar{X}_{1}^{T}-(\mathrm{tr}\,B_{n})\,\Gamma\right]\xrightarrow{D}W,$$

where W is a symmetric m × m Gaussian matrix with entries Wij, mean zero, and covariances given by

$$\mathrm{Cov}[W_{ij},W_{i'j'}]=\theta(\kappa_{ij'}\kappa_{ji'}+\kappa_{ii'}\kappa_{jj'})+\omega\,\kappa_{iji'j'}+\phi\,\check{\kappa}_{iji'j'},\tag{4.17}$$

for i ≤ j and i′ ≤ j′.

Proof. The result follows from Proposition 1 by turning the matrix quadratic form $\bar{X}_{1}B_{n}\bar{X}_{1}^{T}$ into a vector of bilinear forms; see, for example, [JY, Proposition 6] and Bai and Yao (2008, Proposition 3.1). Specifically, use an index $l$ for the $M=m(m+1)/2$ pairs $(i,j)$, with $1\le i\le j\le m$. Build the random vectors $(x,y)$ for Proposition 1 as follows: if $l=(i,j)$, then set $x_{l}=\bar{\xi}_{i}$ and $y_{l}=\bar{\xi}_{j}$. In the resulting covariance matrix $C$ for $(x,y)$, if also $l'=(i',j')$,

$$C_{ll'}^{xy}=E[\xi_{i}\xi_{j'}]/(\sigma_{i}\sigma_{j'})=\kappa_{ij'},\qquad C_{ll'}^{yx}=\kappa_{ji'},\qquad C_{ll'}^{xx}=\kappa_{ii'},\qquad C_{ll'}^{yy}=\kappa_{jj'}$$

and, in particular, $\rho_{l}=C_{ll}^{xy}=\kappa_{ij}$ and $\rho_{l'}=\kappa_{i'j'}$, whereas $C_{ll}^{xx}=C_{ll}^{yy}=1$. Component $W_{n,ij}$ corresponds to component $Z_{l}$ in Proposition 1. Thus, we conclude that $W_{n}\xrightarrow{D}W$, where $W$ is a Gaussian matrix with zero mean and $\mathrm{Cov}(W_{ij},W_{i'j'})=D_{ll'}$, given by Proposition 1. It remains to interpret the quantities in (4.14) in terms of Model M. Substituting $x_{l}=\bar{\xi}_{i}$ and $y_{l}=\bar{\xi}_{j}$ into (4.16) and chasing definitions, we obtain $J_{ll'}=\kappa_{ij'}\kappa_{ji'}+\kappa_{ii'}\kappa_{jj'}$ and $K_{ll'}=\kappa_{iji'j'}$. Observing that $z_{l}=x_{l}y_{l}=\chi_{ij}$ and $w_{l}=\rho_{l}(x_{l}^{2}+y_{l}^{2})/2=\psi_{ij}$, we similarly find that $K_{2,ll'}=\check{\kappa}_{iji'j'}$. ☐

5. Proofs of the eigenvalue results

In this section, we derive the main eigenvalue results, presented in Theorem 1 and Theorem 3-(i).

5.1. Preliminaries

Convergence properties of the eigenvalues of $R_{22}$

It is well known that the empirical spectral distribution (ESD) of $S_{22}$ converges weakly a.s. to the Marchenko–Pastur (MP) law $F_{\gamma}$, and that the extreme non-trivial eigenvalues converge to the edges of the support of $F_{\gamma}$. For the sample correlation case, Jiang (2004b) shows that the same is true for $R_{22}$. That is, the empirical distribution of the eigenvalues $\mu_{1}\ge\dots\ge\mu_{p}$ of the "noise" correlation matrix $R_{22}=n^{-1}\bar{X}_{2}\bar{X}_{2}^{T}$ converges weakly a.s. to the MP law $F_{\gamma}$, supported on $[a_{\gamma},b_{\gamma}]=[(1-\sqrt{\gamma})^{2},(1+\sqrt{\gamma})^{2}]$ if $\gamma\le1$, and on $\{0\}\cup[a_{\gamma},b_{\gamma}]$ otherwise. Furthermore, the ESD of the $n\times n$ companion matrix $C_{n}=n^{-1}\bar{X}_{2}^{T}\bar{X}_{2}$, denoted by $F_{n}$, converges weakly a.s. to the "companion MP law" $\underline{F}_{\gamma}=(1-\gamma)1_{[0,\infty)}+\gamma F_{\gamma}$, where $1_{A}$ denotes the indicator function of the set $A$.

In addition, Jiang (2004b) shows that

$$\mu_{1}\xrightarrow{a.s.}b_{\gamma}\qquad\text{and}\qquad\mu_{p\wedge n}\xrightarrow{a.s.}a_{\gamma}.\tag{5.18}$$

Based on these results, if $f_{n}\to f$ uniformly, as continuous functions on the closure $I$ of a bounded neighborhood of the support of $\underline{F}_{\gamma}$, then

$$\int f_{n}(x)\,F_{n}(dx)\xrightarrow{a.s.}\int f(x)\,\underline{F}_{\gamma}(dx).\tag{5.19}$$

If $\mathrm{supp}(F_{n})$ is not contained in $I$, then the integral on the left side may not be defined; however, such an event occurs for at most finitely many $n$, with probability one.
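The edge convergence (5.18) for a pure-noise sample correlation matrix is easy to reproduce numerically. The following sketch (an illustration, not part of the paper; the dimensions are our choices) standardizes i.i.d. Gaussian rows and compares the extreme eigenvalues with the MP edges:

```python
import numpy as np

# Pure-noise sample correlation matrix: eigenvalues should fill the
# Marchenko-Pastur bulk, with extremes near a_g = (1-sqrt(g))^2 and
# b_g = (1+sqrt(g))^2 (Jiang, 2004b; equation (5.18)).
rng = np.random.default_rng(2)
p, n = 800, 1600
gamma = p / n                                    # gamma = 0.5
X = rng.standard_normal((p, n))
# standardize each row by its sample second moment (mean-zero model)
Xbar = X / np.sqrt((X**2).mean(axis=1, keepdims=True))
mu = np.linalg.eigvalsh(Xbar @ Xbar.T / n)       # mu_p <= ... <= mu_1
a, b = (1 - np.sqrt(gamma))**2, (1 + np.sqrt(gamma))**2
print(mu[-1], b)                                 # largest vs upper edge
print(mu[0], a)                                  # smallest vs lower edge
```

At these sample sizes the extreme eigenvalues typically sit within a few percent of the edges, consistent with the a.s. limits.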

Almost sure limit of $\hat{\ell}_{\nu}$

The statements in Theorem 1-(i) and Theorem 3-(i) follow easily from known results. Specifically, denote the $\nu$th eigenvalue of the sample covariance matrix $S$ by $\hat{\lambda}_{\nu}$. The almost sure limits

$$\hat{\lambda}_{\nu}\xrightarrow{a.s.}\begin{cases}\rho_{\nu}, & \ell_{\nu}>1+\sqrt{\gamma},\\ (1+\sqrt{\gamma})^{2}, & 1<\ell_{\nu}\le1+\sqrt{\gamma},\end{cases}\tag{5.20}$$

were established in Baik and Silverstein (2006). From the proof of El Karoui (2009, Lemma 1),

$$\max_{i=1,\dots,m}\,|\hat{\lambda}_{i}-\hat{\ell}_{i}|\xrightarrow{a.s.}0.$$

Therefore, the same almost sure limits as in (5.20) hold for $\hat{\ell}_{\nu}$.
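The phase transition in (5.20) can be illustrated by a quick Monte Carlo experiment (a sketch, not part of the proof). It assumes the standard spiked-model formula $\rho(\ell)=\ell+\gamma\ell/(\ell-1)$ from Baik and Silverstein (2006), which we take to match (1.1), defined outside this excerpt:

```python
import numpy as np

# Baik-Silverstein limits (5.20) for the top sample covariance eigenvalue:
# a spike ell separates from the bulk at rho(ell) = ell + g*ell/(ell-1)
# only when ell > 1 + sqrt(g); otherwise it sticks to the edge (1+sqrt(g))^2.
rng = np.random.default_rng(0)
p, n = 1000, 2000
gamma = p / n                            # g = 0.5, threshold 1+sqrt(g) ~ 1.707

def top_sample_eigenvalue(ell):
    # population covariance: identity with a single spike ell in coordinate 0
    X = rng.standard_normal((p, n))
    X[0] *= np.sqrt(ell)
    return np.linalg.eigvalsh(X @ X.T / n)[-1]

l_super = top_sample_eigenvalue(4.0)     # supercritical: ell = 4
l_sub = top_sample_eigenvalue(1.3)       # subcritical:  ell = 1.3
rho = 4.0 + gamma*4.0/(4.0 - 1.0)        # limit ~ 4.667
edge = (1 + np.sqrt(gamma))**2           # bulk edge ~ 2.914
print(l_super, rho)
print(l_sub, edge)
```

The supercritical eigenvalue lands near $\rho(4)\approx4.67$, while the subcritical one stays near the bulk edge despite the spike.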

High-probability events $J_{n}^{\epsilon}$, $J_{n}^{\epsilon,1}$

When necessary, we may confine attention to the event $J_{n}^{\epsilon}=\{\hat{\ell}_{\nu}>\min(\rho_{\nu},\rho_{\nu n})-\epsilon,\ \mu_{1}\le b_{\gamma}+\epsilon\}$ or $J_{n}^{\epsilon,1}=\{\mu_{1}\le b_{\gamma}+\epsilon\}$, with $\epsilon>0$ chosen such that $\rho_{\nu}-b_{\gamma}\ge3\epsilon$, because from (2.5) (proven above) and (5.18), these events occur, with probability one, for all large $n$.

Asymptotic expansion of $K(\hat{\ell}_{\nu})$

We establish an asymptotic stochastic expansion for the quadratic form $K(\hat{\ell}_{\nu})$. Specifically, using the decomposition

$$K(\hat{\ell}_{\nu})=K(\rho_{\nu n})+[K(\hat{\ell}_{\nu})-K(\rho_{\nu n})],\tag{5.21}$$

we show that

$$K(\rho_{\nu n})\xrightarrow{a.s.}-\rho_{\nu}\,m(\rho_{\nu};\gamma)\,\Gamma=(\rho_{\nu}/\ell_{\nu})\,\Gamma\tag{5.22}$$

and

$$K(\hat{\ell}_{\nu})-K(\rho_{\nu n})=(\hat{\ell}_{\nu}-\rho_{\nu n})\left[-c(\rho_{\nu})\,\Gamma+o_{a.s.}(1)\right],\tag{5.23}$$

where, for $t\notin\mathrm{supp}(\underline{F}_{\gamma})$,

$$m(t;\gamma)=\int(x-t)^{-1}\,\underline{F}_{\gamma}(dx),\qquad c(t)=\int x\,(t-x)^{-2}\,\underline{F}_{\gamma}(dx).$$

Here, $m$ is the Stieltjes transform of the companion distribution $\underline{F}_{\gamma}$.
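As a numerical sanity check (not part of the proof), the closed-form evaluations used below, $m(\rho_{\nu};\gamma)=-1/\ell_{\nu}$ and $1+c(\rho_{\nu})\ell_{\nu}=\rho_{\nu}/(\ell_{\nu}\dot{\rho}_{\nu})$ (the latter appears in Section 6.1), can be verified by quadrature against these integral definitions. The sketch assumes the standard spiked-model parametrization $\rho(\ell)=\ell+\gamma\ell/(\ell-1)$, $\dot{\rho}(\ell)=1-\gamma/(\ell-1)^{2}$ from Baik and Silverstein (2006), which we take to match (1.1), defined outside this excerpt:

```python
import numpy as np

# Quadrature over the companion MP law (gamma <= 1):
#   F_under = (1-gamma)*delta_0 + gamma*F_gamma,
# where F_gamma has density sqrt((b-x)(x-a))/(2*pi*gamma*x) on [a, b].
# The substitution x = a + (b-a)*sin(theta)^2 removes the square-root
# edge singularities, so a plain trapezoidal rule is very accurate.
def companion_mp_integral(g, gamma, n_grid=200001):
    a, b = (1 - np.sqrt(gamma))**2, (1 + np.sqrt(gamma))**2
    theta = np.linspace(0.0, np.pi/2, n_grid)
    x = a + (b - a)*np.sin(theta)**2
    w = (b - a)**2*(np.sin(theta)*np.cos(theta))**2/(np.pi*gamma*x)
    y = g(x)*w
    h = theta[1] - theta[0]
    cont = h*(y.sum() - 0.5*y[0] - 0.5*y[-1])    # trapezoid rule, = int g dF_gamma
    return (1 - gamma)*g(np.array([0.0]))[0] + gamma*cont

gamma, ell = 0.5, 4.0                        # any gamma in (0,1], ell > 1+sqrt(gamma)
rho = ell + gamma*ell/(ell - 1)              # assumed form of rho_nu in (1.1)
rho_dot = 1 - gamma/(ell - 1)**2
m_num = companion_mp_integral(lambda x: 1.0/(x - rho), gamma)
c_num = companion_mp_integral(lambda x: x/(rho - x)**2, gamma)
print(m_num, -1/ell)                         # m(rho; gamma) = -1/ell
print(1 + c_num*ell, rho/(ell*rho_dot))      # 1 + c(rho)*ell = rho/(ell*rho_dot)
```

Both pairs agree to high precision, confirming the algebra behind (5.22) and the eigenvector limits of Section 6.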

In establishing (5.22), start by taking $n$ sufficiently large that $|\rho_{\nu n}-\rho_{\nu}|\le\epsilon$, with $\epsilon$ defined as above. For such $n$, on $J_{n}^{\epsilon,1}$, we have

$$\|B_{n}(\rho_{\nu n})\|\le\frac{\rho_{\nu}+\epsilon}{\epsilon}.$$

Because $J_{n}^{\epsilon,1}$ holds, with probability one, for all large $n$, $\|B_{n}(\rho_{\nu n})\|=O_{a.s.}(1)$ and, therefore, it follows from Lemma 2 that

$$K(\rho_{\nu n})-n^{-1}\mathrm{tr}\,B_{n}(\rho_{\nu n})\,\Gamma\xrightarrow{a.s.}0.$$

In addition, (5.19) yields

$$n^{-1}\mathrm{tr}\,B_{n}(\rho_{\nu n})=\int\rho_{\nu n}(\rho_{\nu n}-x)^{-1}\,F_{n}(dx)\xrightarrow{a.s.}\int\rho_{\nu}(\rho_{\nu}-x)^{-1}\,\underline{F}_{\gamma}(dx)=-\rho_{\nu}\,m(\rho_{\nu};\gamma).$$

Explicit evaluation gives $m(\rho_{\nu};\gamma)=-1/\ell_{\nu}$ [JY, Appendix A], and (5.22) follows.

To establish (5.23), we start by recalling that $C_{n}=n^{-1}\bar{X}_{2}^{T}\bar{X}_{2}$, and introduce the resolvent notation $Z(t)=(tI_{n}-C_{n})^{-1}$, so that $B_{n}(t)=tZ(t)$ and $K(t)=n^{-1}\bar{X}_{1}\,tZ(t)\,\bar{X}_{1}^{T}$. From the resolvent identity, that is, $A^{-1}-B^{-1}=A^{-1}(B-A)B^{-1}$ for square invertible $A$ and $B$, and noting that $tZ(t)=C_{n}Z(t)+I_{n}$, we have, for $t_{1},t_{2}>b_{\gamma}$,

$$t_{1}Z(t_{1})-t_{2}Z(t_{2})=-(t_{1}-t_{2})\,C_{n}Z(t_{1})Z(t_{2})$$

and, therefore,

$$K(\hat{\ell}_{\nu})-K(\rho_{\nu n})=-(\hat{\ell}_{\nu}-\rho_{\nu n})\,n^{-1}\bar{X}_{1}C_{n}Z(\hat{\ell}_{\nu})Z(\rho_{\nu n})\bar{X}_{1}^{T}.$$

Moreover, again by the resolvent identity, $Z(\hat{\ell}_{\nu})=Z(\rho_{\nu n})-(\hat{\ell}_{\nu}-\rho_{\nu n})Z(\hat{\ell}_{\nu})Z(\rho_{\nu n})$, which yields

$$K(\hat{\ell}_{\nu})-K(\rho_{\nu n})=-(\hat{\ell}_{\nu}-\rho_{\nu n})\,n^{-1}\bar{X}_{1}B_{n1}(\rho_{\nu n},\rho_{\nu n})\bar{X}_{1}^{T}+(\hat{\ell}_{\nu}-\rho_{\nu n})^{2}\,n^{-1}\bar{X}_{1}B_{n2}(\hat{\ell}_{\nu},\rho_{\nu n})\bar{X}_{1}^{T},\tag{5.24}$$

with $B_{nr}(t_{1},t_{2})$ defined as

$$B_{nr}(t_{1},t_{2})=C_{n}Z(t_{1})Z^{r}(t_{2}).\tag{5.25}$$

We now characterize the first-order behavior of the two matrix quadratic forms in (5.24). For the first, we simply mirror the arguments in the proof of (5.22) to obtain

$$n^{-1}\bar{X}_{1}B_{n1}(\rho_{\nu n},\rho_{\nu n})\bar{X}_{1}^{T}\xrightarrow{a.s.}c(\rho_{\nu})\,\Gamma.$$

For the second, we again apply similar reasoning, operating on the event $J_{n}^{\epsilon}$. Specifically, it is easy to establish that, on $J_{n}^{\epsilon}$ and for $n$ sufficiently large that $|\rho_{\nu n}-\rho_{\nu}|\le\epsilon$, $\|B_{n2}(\hat{\ell}_{\nu},\rho_{\nu n})\|$ is bounded. Hence, $\|B_{n2}(\hat{\ell}_{\nu},\rho_{\nu n})\|=O_{a.s.}(1)$, and it follows from Lemma 2 and (5.19) that

$$n^{-1}\bar{X}_{1}B_{n2}(\hat{\ell}_{\nu},\rho_{\nu n})\bar{X}_{1}^{T}=O_{a.s.}(1).$$

The expansion in (5.23) is obtained by combining the latter two equations with (5.24).

CLT for $K(\rho_{\nu n})$

We now specialize Proposition 2 to the matrix quadratic form $K(\rho_{\nu n})$.

Proposition 3

Assume Model M, and define $\rho_{\nu n}$ by (1.1) and $K(\rho_{\nu n})$ by (3.13). Then,

$$W_{n}(\rho_{\nu n})=\sqrt{n}\left[K(\rho_{\nu n})-n^{-1}\mathrm{tr}\,B_{n}(\rho_{\nu n})\,\Gamma\right]\xrightarrow{D}W^{\nu},$$

where $W^{\nu}$ is a symmetric Gaussian random matrix with entries $W_{ij}^{\nu}$, mean zero, and covariances given by

$$\mathrm{Cov}[W_{ij}^{\nu},W_{i'j'}^{\nu}]=\frac{\rho_{\nu}^{2}}{\ell_{\nu}^{2}\,\dot{\rho}_{\nu}}\,(\kappa_{ij'}\kappa_{ji'}+\kappa_{ii'}\kappa_{jj'})+\frac{\rho_{\nu}^{2}}{\ell_{\nu}^{2}}\,(\kappa_{iji'j'}+\check{\kappa}_{iji'j'}),\tag{5.26}$$

where $\rho_{\nu}$ and $\dot{\rho}_{\nu}$ are defined in (1.1), and the $\kappa$ terms are defined in (1.2) and (1.4).

Proof. Recall that $J_{n}^{\epsilon,1}=\{\mu_{1}\le b_{\gamma}+\epsilon\}$, and consider $n$ sufficiently large that $\rho_{\nu n}>\rho_{\nu}-\epsilon$. Then, we may apply Proposition 2 with $B_{n}=B_{n}(\rho_{\nu n})1_{J_{n}^{\epsilon,1}}$, which is independent of $\bar{X}_{1}$, and for which $\|B_{n}\|$ is bounded. Specifically, the result follows by applying Proposition 2 to $W_{n}(\rho_{\nu n})1_{J_{n}^{\epsilon,1}}$, along with the fact that $1_{J_{n}^{\epsilon,1}}\xrightarrow{a.s.}1$, and particularizing $\omega$, $\theta$, and $\phi$ in (4.17). These quantities, denoted respectively by $\omega_{\nu}$, $\theta_{\nu}$, and $\phi_{\nu}$, can be computed as in [JY, Appendix A], yielding

$$\omega_{\nu}=\phi_{\nu}=\frac{(\ell_{\nu}-1+\gamma)^{2}}{(\ell_{\nu}-1)^{2}}=\frac{\rho_{\nu}^{2}}{\ell_{\nu}^{2}},\qquad\theta_{\nu}=\frac{(\ell_{\nu}-1+\gamma)^{2}}{(\ell_{\nu}-1)^{2}-\gamma}=\frac{\omega_{\nu}}{\dot{\rho}_{\nu}}.$$
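The constants just computed can be cross-checked numerically (a sketch, not from the paper): by (5.19), $\theta_{\nu}=\int\rho_{\nu}^{2}(\rho_{\nu}-x)^{-2}\,\underline{F}_{\gamma}(dx)$ and $\phi_{\nu}=\big(\int\rho_{\nu}(\rho_{\nu}-x)^{-1}\,\underline{F}_{\gamma}(dx)\big)^{2}$ for $B_{n}(\rho_{\nu n})$, which should match the closed forms above ($\omega_{\nu}$ involves diagonal resolvent entries and is not checked here). As before, $\rho(\ell)=\ell+\gamma\ell/(\ell-1)$ is the standard formula assumed to match (1.1):

```python
import numpy as np

# int g dF_under with F_under = (1-gamma)*delta_0 + gamma*F_gamma,
# via the singularity-removing substitution x = a + (b-a)*sin(theta)^2.
def companion_mp_integral(g, gamma, n_grid=200001):
    a, b = (1 - np.sqrt(gamma))**2, (1 + np.sqrt(gamma))**2
    theta = np.linspace(0.0, np.pi/2, n_grid)
    x = a + (b - a)*np.sin(theta)**2
    w = (b - a)**2*(np.sin(theta)*np.cos(theta))**2/(np.pi*gamma*x)
    y = g(x)*w
    h = theta[1] - theta[0]
    cont = h*(y.sum() - 0.5*y[0] - 0.5*y[-1])
    return (1 - gamma)*g(np.array([0.0]))[0] + gamma*cont

gamma, ell = 0.5, 4.0
rho = ell + gamma*ell/(ell - 1)
rho_dot = 1 - gamma/(ell - 1)**2
theta_num = companion_mp_integral(lambda x: rho**2/(rho - x)**2, gamma)
phi_num = companion_mp_integral(lambda x: rho/(rho - x), gamma)**2
omega_cf = (ell - 1 + gamma)**2/(ell - 1)**2             # omega_nu = rho^2/ell^2
theta_cf = (ell - 1 + gamma)**2/((ell - 1)**2 - gamma)   # theta_nu = omega_nu/rho_dot
print(theta_num, theta_cf)
print(phi_num, omega_cf)
```

The quadrature values agree with the closed forms, and the algebraic identities $\theta_{\nu}=\omega_{\nu}/\dot{\rho}_{\nu}$, $\omega_{\nu}=(\rho_{\nu}/\ell_{\nu})^{2}$ hold exactly.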

Tightness properties

Lastly, we establish some tightness properties essential to the derivation of our second-order results.

We first establish a refinement of (5.22). Define $K_{0}(\rho;\gamma):=-\rho\,m(\rho;\gamma)\,\Gamma$, so that (5.22) is rewritten as $K(\rho_{\nu n})\xrightarrow{a.s.}K_{0}(\rho_{\nu};\gamma)$. Set $g_{\rho}(x)=\rho(\rho-x)^{-1}$, and write

$$\mathrm{tr}\,B_{n}(\rho)=\sum_{i=1}^{n}\rho(\rho-\mu_{i})^{-1}=\sum_{i=1}^{n}g_{\rho}(\mu_{i}).$$

In addition, introducing

$$G_{n}(g):=\sum_{i=1}^{n}g(\mu_{i})-n\int g(x)\,\underline{F}_{\gamma_{n}}(dx),$$

we have

$$K(\rho)-K_{0}(\rho;\gamma_{n})=K(\rho)-n^{-1}\mathrm{tr}\,B_{n}(\rho)\,\Gamma+\rho\,n^{-1}\Big[\sum_{i=1}^{n}(\rho-\mu_{i})^{-1}-n\int(\rho-x)^{-1}\,\underline{F}_{\gamma_{n}}(dx)\Big]\Gamma=n^{-1/2}W_{n}(\rho)+n^{-1}G_{n}(g_{\rho})\,\Gamma.\tag{5.27}$$

Lemma 3

Assume that Model M holds, and that $\ell_{\nu}>1+\sqrt{\gamma}$ is simple. For some $b>\rho_{1}$, let $I$ denote the interval $[b_{\gamma}+3\epsilon,\,b]$. Then,

$$\{G_{n}(g_{\rho}),\ \rho\in I\}\ \text{is uniformly tight},\tag{5.28}$$
$$\{\sqrt{n}\,[K(\rho)-K_{0}(\rho;\gamma_{n})],\ \rho\in I\}\ \text{is uniformly tight},\tag{5.29}$$
$$\hat{\ell}_{\nu}-\rho_{\nu n}=O_{p}(n^{-1/2}),\tag{5.30}$$
$$\|a_{\nu}-p_{\nu}\|=O_{p}(n^{-1/2}).\tag{5.31}$$

Proof. The proofs of (5.28)–(5.30) appear in the Supplementary Material, S2. We show (5.31) using the expansion $a_{\nu}-p_{\nu}=R_{\nu n}D_{\nu}p_{\nu}+r_{\nu}$, given in (3.12), from which we recall $\|r_{\nu}\|=O(\|D_{\nu}\|^{2})$, and note that $\|R_{\nu n}\|\le C$ and $D_{\nu}=K(\hat{\ell}_{\nu})-K_{0}(\rho_{\nu n};\gamma_{n})$. We then have $\|a_{\nu}-p_{\nu}\|=O_{p}(\|D_{\nu}\|+\|D_{\nu}\|^{2})$. Furthermore, from

$$\|D_{\nu}\|\le\|K(\hat{\ell}_{\nu})-K(\rho_{\nu n})\|+\|K(\rho_{\nu n})-K_{0}(\rho_{\nu n};\gamma_{n})\|,$$

the first term is $O_{p}(n^{-1/2})$ by (5.23) and (5.30), as is the second term by (5.29). Hence,

$$\|D_{\nu}\|=O_{p}(n^{-1/2}),\tag{5.32}$$

and the proof is completed. ☐

5.2. Eigenvalue fluctuations (Theorem 1-(ii))

The proof of Theorem 1-(ii) relies on the key expansion

$$\sqrt{n}\,(\hat{\ell}_{\nu}-\rho_{\nu n})\left[1+c(\rho_{\nu})\ell_{\nu}+o_{p}(1)\right]=p_{\nu}^{T}W_{n}(\rho_{\nu n})p_{\nu}+o_{p}(1),\tag{5.33}$$

which is obtained by combining the vector equations $K(\hat{\ell}_{\nu})a_{\nu}=\hat{\ell}_{\nu}a_{\nu}$ and $K_{0}(\rho_{\nu n};\gamma_{n})p_{\nu}=\rho_{\nu n}p_{\nu}$ with expansions (5.24) for $K(\hat{\ell}_{\nu})-K(\rho_{\nu n})$ and (5.27) for $K(\rho_{\nu n})-K_{0}(\rho_{\nu n};\gamma_{n})$. Specifically, we first use $[K(\hat{\ell}_{\nu})-\hat{\ell}_{\nu}I_{m}]a_{\nu}=0$ to obtain

$$p_{\nu}^{T}[K(\hat{\ell}_{\nu})-\hat{\ell}_{\nu}I_{m}]p_{\nu}=(a_{\nu}-p_{\nu})^{T}[K(\hat{\ell}_{\nu})-\hat{\ell}_{\nu}I_{m}](a_{\nu}-p_{\nu})=O_{p}(n^{-1}),\tag{5.34}$$

because $\|K(\hat{\ell}_{\nu})-\hat{\ell}_{\nu}I_{m}\|=O_{p}(1)$ from (5.21)–(5.23) and (2.5), and $\|a_{\nu}-p_{\nu}\|=O_{p}(n^{-1/2})$ from Lemma 3. In addition, because $[K_{0}(\rho_{\nu n};\gamma_{n})-\rho_{\nu n}I_{m}]p_{\nu}=0$, it follows that

$$\begin{aligned}p_{\nu}^{T}[K(\hat{\ell}_{\nu})-\hat{\ell}_{\nu}I_{m}]p_{\nu}&=p_{\nu}^{T}[K(\hat{\ell}_{\nu})-K_{0}(\rho_{\nu n};\gamma_{n})-(\hat{\ell}_{\nu}-\rho_{\nu n})I_{m}]p_{\nu}\\&=p_{\nu}^{T}[K(\hat{\ell}_{\nu})-K(\rho_{\nu n})-(\hat{\ell}_{\nu}-\rho_{\nu n})I_{m}]p_{\nu}+p_{\nu}^{T}[K(\rho_{\nu n})-K_{0}(\rho_{\nu n};\gamma_{n})]p_{\nu}\\&=-(\hat{\ell}_{\nu}-\rho_{\nu n})\left[1+c(\rho_{\nu})\ell_{\nu}+o_{p}(1)\right]+n^{-1/2}p_{\nu}^{T}W_{n}(\rho_{\nu n})p_{\nu}+o_{p}(n^{-1/2}),\end{aligned}\tag{5.35}$$

where the last equality follows from (5.23), (5.27), and (5.28). Combining (5.34) and (5.35) yields (5.33).

The asymptotic normality of $\sqrt{n}\,(\hat{\ell}_{\nu}-\rho_{\nu n})$ now follows from Proposition 3, with asymptotic variance

$$\tilde{\sigma}_{\nu}^{2}=\left[1+c(\rho_{\nu})\ell_{\nu}\right]^{-2}\mathrm{Var}[p_{\nu}^{T}W^{\nu}p_{\nu}]=(\dot{\rho}_{\nu}\ell_{\nu}/\rho_{\nu})^{2}\sum_{i,j,i',j'}P_{iji'j'}^{\nu}\,\mathrm{Cov}[W_{ij}^{\nu},W_{i'j'}^{\nu}],$$

where $W^{\nu}$ is the $m\times m$ symmetric Gaussian random matrix defined in Proposition 3, with covariance $\mathrm{Cov}[W_{ij}^{\nu},W_{i'j'}^{\nu}]$ given by (5.26), and $P_{iji'j'}^{\nu}=p_{\nu,i}\,p_{\nu,j}\,p_{\nu,i'}\,p_{\nu,j'}$. Using this in the developed expression for the variance above leads to

$$\tilde{\sigma}_{\nu}^{2}=\dot{\rho}_{\nu}\sum_{i,j,i',j'}P_{iji'j'}^{\nu}\,(\kappa_{ij'}\kappa_{ji'}+\kappa_{ii'}\kappa_{jj'})+\dot{\rho}_{\nu}^{2}\left[P^{\nu},\kappa+\check{\kappa}\right].\tag{5.36}$$

By symmetry and the eigen-equation $(\Gamma p_{\nu})_{i}=\sum_{j}\kappa_{ij}p_{\nu,j}=\ell_{\nu}p_{\nu,i}$, we have

$$\sum_{i,j,i',j'}P_{iji'j'}^{\nu}\,\kappa_{ii'}\kappa_{jj'}=\sum_{i,j,i',j'}P_{iji'j'}^{\nu}\,\kappa_{ij'}\kappa_{ji'}=\sum_{i,j}p_{\nu,i}\,p_{\nu,j}\,(\Gamma p_{\nu})_{i}\,(\Gamma p_{\nu})_{j}=\ell_{\nu}^{2}\sum_{i,j}(p_{\nu,i}\,p_{\nu,j})^{2}=\ell_{\nu}^{2}.$$

Therefore, the first sum in (5.36) reduces to $2\dot{\rho}_{\nu}\ell_{\nu}^{2}$, yielding formula (2.6) of Theorem 1.

6. Proofs of the eigenvector results

We now derive the main eigenvector results, presented in Theorem 2 and Theorem 3-(ii).

6.1. Eigenvector inconsistency (Theorem 2-(i))

The convergence result of Theorem 2-(i) follows from two facts, $a_{\nu}\xrightarrow{a.s.}p_{\nu}$ and $Q_{\nu}\xrightarrow{a.s.}c(\rho_{\nu})\Gamma$, which are shown below. Once these facts are established, from (3.10),

$$\|\hat{p}_{\nu}\|^{-2}\xrightarrow{a.s.}p_{\nu}^{T}\left(I_{m}+c(\rho_{\nu})\Gamma\right)p_{\nu}=1+c(\rho_{\nu})\ell_{\nu}=\frac{\rho_{\nu}}{\ell_{\nu}\dot{\rho}_{\nu}},$$

which leads to

$$\operatorname*{a.s.\,lim}\,\langle\hat{p}_{\nu},p_{\nu}\rangle^{2}=\operatorname*{a.s.\,lim}\,\|\hat{p}_{\nu}\|^{2}\langle a_{\nu},p_{\nu}\rangle^{2}=\operatorname*{a.s.\,lim}\,\|\hat{p}_{\nu}\|^{2}=\frac{\ell_{\nu}\dot{\rho}_{\nu}}{\rho_{\nu}}.$$
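This inconsistency limit is easy to check by simulation (an illustrative sketch, not from the paper; the uniform spike direction, sample sizes, and the parametrization $\rho(\ell)=\ell+\gamma\ell/(\ell-1)$, $\dot{\rho}(\ell)=1-\gamma/(\ell-1)^{2}$, assumed to match (1.1), are our choices):

```python
import numpy as np

# Sample correlation eigenvector inconsistency, Theorem 2-(i):
# <p_hat, p>^2 concentrates near ell*rho_dot/rho, strictly below 1.
rng = np.random.default_rng(1)
p, n, ell = 1000, 2000, 4.0
gamma = p / n
v = np.ones(p)/np.sqrt(p)                        # population spike direction
Sigma = np.eye(p) + (ell - 1)*np.outer(v, v)
Sigma /= 1 + (ell - 1)/p                         # rescale to unit diagonal
X = np.linalg.cholesky(Sigma) @ rng.standard_normal((p, n))
Xbar = X / np.sqrt((X**2).mean(axis=1, keepdims=True))   # standardize rows
R = Xbar @ Xbar.T / n                            # sample correlation matrix
p_hat = np.linalg.eigh(R)[1][:, -1]              # leading sample eigenvector
overlap = (p_hat @ v)**2
rho = ell + gamma*ell/(ell - 1)
rho_dot = 1 - gamma/(ell - 1)**2
print(overlap, ell*rho_dot/rho)                  # both close to 0.81
```

Even with a strong spike ($\ell=4$), the squared overlap stays bounded away from 1, matching $\ell_{\nu}\dot{\rho}_{\nu}/\rho_{\nu}\approx0.81$ here.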

Proof of $a_{\nu}\xrightarrow{a.s.}p_{\nu}$

This is a direct consequence of (3.12) and

$$D_{\nu}=K(\rho_{\nu n})-(\rho_{\nu n}/\ell_{\nu})\,\Gamma+K(\hat{\ell}_{\nu})-K(\rho_{\nu n})\xrightarrow{a.s.}0,$$

which follows from (5.22), (5.23), and the fact that $\hat{\ell}_{\nu}-\rho_{\nu n}\xrightarrow{a.s.}0$, given in (2.5).

Proof of $Q_{\nu}\xrightarrow{a.s.}c(\rho_{\nu})\Gamma$

With $\check{Z}(t)=(tI_{p}-R_{22})^{-1}$, we have

$$Q_{\nu}=R_{12}\check{Z}^{2}(\rho_{\nu})R_{21}+R_{12}\left[\check{Z}^{2}(\hat{\ell}_{\nu})-\check{Z}^{2}(\rho_{\nu})\right]R_{21}\equiv Q_{\nu1}+Q_{\nu2}.$$

Rewrite $Q_{\nu1}=n^{-1}\bar{X}_{1}\check{B}_{n1}\bar{X}_{1}^{T}$, with $\check{B}_{n1}=n^{-1}\bar{X}_{2}^{T}\check{Z}^{2}(\rho_{\nu})\bar{X}_{2}$. On the high-probability event $J_{n}^{\epsilon,1}=\{\mu_{1}\le b_{\gamma}+\epsilon\}$, with $\epsilon>0$ such that $\rho_{\nu}-b_{\gamma}\ge2\epsilon$, it is easily established that $\|\check{B}_{n1}\|$ is bounded and, consequently, that $\|\check{B}_{n1}\|=O_{a.s.}(1)$. Hence, Lemma 2 can be applied to $Q_{\nu1}$. Moreover, from (5.19) and noting that

$$n^{-1}\mathrm{tr}\,\check{B}_{n1}=n^{-1}\mathrm{tr}\,B_{n1}(\rho_{\nu},\rho_{\nu}),$$

with $B_{n1}$ defined in (5.25), we have

$$n^{-1}\mathrm{tr}\,\check{B}_{n1}\xrightarrow{a.s.}\int x\,(\rho_{\nu}-x)^{-2}\,\underline{F}_{\gamma}(dx)=c(\rho_{\nu}).$$

This and Lemma 2 imply that $Q_{\nu1}\xrightarrow{a.s.}c(\rho_{\nu})\Gamma$.

It remains to show $Q_{\nu2}\xrightarrow{a.s.}0$. Using a variant of the resolvent identity, that is, $A^{-2}-B^{-2}=-A^{-2}(A^{2}-B^{2})B^{-2}$ for square invertible $A$ and $B$, we rewrite

$$Q_{\nu2}=-2(\hat{\ell}_{\nu}-\rho_{\nu})\,n^{-1}\bar{X}_{1}\check{B}_{n2}\bar{X}_{1}^{T},$$

with $\check{B}_{n2}=n^{-1}\bar{X}_{2}^{T}\check{Z}^{2}(\hat{\ell}_{\nu})\left[\tfrac{1}{2}(\hat{\ell}_{\nu}+\rho_{\nu})I_{p}-R_{22}\right]\check{Z}^{2}(\rho_{\nu})\bar{X}_{2}$. Working on the high-probability event $J_{n}^{\epsilon}$, it can be verified that $\|\check{B}_{n2}\|=O_{a.s.}(1)$. Thus, Lemma 2, together with (5.19), implies that $n^{-1}\bar{X}_{1}\check{B}_{n2}\bar{X}_{1}^{T}=O_{a.s.}(1)$. Because $\hat{\ell}_{\nu}\xrightarrow{a.s.}\rho_{\nu}$, we conclude that $Q_{\nu2}\xrightarrow{a.s.}0$.

6.2. Eigenvector fluctuations (Theorem 2-(ii))

Again, we use the key expansion (3.12). Because $\|r_{\nu}\|=O(\|D_{\nu}\|^{2})=O_{p}(n^{-1})$ from (5.32), we have

$$\sqrt{n}\,(a_{\nu}-p_{\nu})=R_{\nu n}\sqrt{n}\,D_{\nu}p_{\nu}+o_{p}(1).$$

Furthermore, using a decomposition similar to that in the derivation of (5.35),

$$\sqrt{n}\,D_{\nu}=\sqrt{n}\left[K(\hat{\ell}_{\nu})-K(\rho_{\nu n})\right]+\sqrt{n}\left[K(\rho_{\nu n})-K_{0}(\rho_{\nu n};\gamma_{n})\right]=W_{n}(\rho_{\nu n})-\sqrt{n}\,(\hat{\ell}_{\nu}-\rho_{\nu n})\,c(\rho_{\nu})\,\Gamma+o_{p}(1),$$

where we use (5.23) and (5.27), along with (5.28) and (5.30) of Lemma 3. Hence, noting that $R_{\nu n}\Gamma p_{\nu}=\ell_{\nu}R_{\nu n}p_{\nu}=0$ from the definition of $R_{\nu n}$ in (3.12), we have

$$\sqrt{n}\,(a_{\nu}-p_{\nu})=R_{\nu n}W_{n}(\rho_{\nu n})p_{\nu}+o_{p}(1),$$

or equivalently,

$$\sqrt{n}\,(P^{T}a_{\nu}-e_{\nu})=\tilde{R}_{\nu n}\tilde{W}_{n}(\rho_{\nu n})e_{\nu}+o_{p}(1),$$

where

$$\tilde{R}_{\nu n}=\frac{\ell_{\nu}}{\rho_{\nu n}}\sum_{k\le m,\,k\ne\nu}(\ell_{\nu}-\ell_{k})^{-1}e_{k}e_{k}^{T},\qquad\tilde{W}_{n}(\rho_{\nu n})=P^{T}W_{n}(\rho_{\nu n})P.$$

The CLT for $P^{T}a_{\nu}$ now follows from Proposition 3. In particular,

$$\sqrt{n}\,(P^{T}a_{\nu}-e_{\nu})\xrightarrow{D}\tilde{R}_{\nu}w^{\nu}\sim N(0,\Sigma^{\nu}),$$

where $\tilde{R}_{\nu}=(\ell_{\nu}/\rho_{\nu})D_{\nu}$, recall (2.8), and $w^{\nu}=P^{T}W^{\nu}p_{\nu}$, with $W^{\nu}$ defined in Proposition 3. The covariance matrix $\Sigma^{\nu}=\tilde{R}_{\nu}E[w^{\nu}w^{\nu T}]\tilde{R}_{\nu}=D_{\nu}\tilde{\Sigma}_{\nu}D_{\nu}$, with $\tilde{\Sigma}_{\nu}=(\ell_{\nu}/\rho_{\nu})^{2}E[w^{\nu}w^{\nu T}]$. The $k$th component of $w^{\nu}$ is given by $w^{\nu}(k)=p_{k}^{T}W^{\nu}p_{\nu}=\sum_{i,j}p_{k,i}W_{ij}^{\nu}p_{\nu,j}$ and, therefore,

$$\tilde{\Sigma}_{\nu,kl}=\sum_{i,j,i',j'}p_{k,i}\,p_{\nu,j}\,p_{l,i'}\,p_{\nu,j'}\,(\ell_{\nu}/\rho_{\nu})^{2}\,\mathrm{Cov}[W_{ij}^{\nu},W_{i'j'}^{\nu}].\tag{6.37}$$

Theorem 2-(ii) follows after substituting (5.26) for $\mathrm{Cov}[W_{ij}^{\nu},W_{i'j'}^{\nu}]$ and noting that, when $k,l\ne\nu$,

$$\sum_{i,j,i',j'}p_{k,i}\,p_{\nu,j}\,p_{l,i'}\,p_{\nu,j'}\,(\kappa_{ii'}\kappa_{jj'}+\kappa_{ij'}\kappa_{ji'})=p_{k}^{T}\Gamma p_{l}\,p_{\nu}^{T}\Gamma p_{\nu}+p_{k}^{T}\Gamma p_{\nu}\,p_{\nu}^{T}\Gamma p_{l}=\delta_{kl}\,\ell_{k}\ell_{\nu}.$$

6.3. Eigenvector inconsistency in the subcritical case (Theorem 3-(ii))

From (3.10) and (3.11), it suffices to show that $a_{\nu}^{T}Q_{\nu}a_{\nu}\xrightarrow{a.s.}\infty$ in order for Theorem 3-(ii) to hold. We establish this by showing that $\lambda_{\min}(Q_{\nu})\xrightarrow{a.s.}\infty$. The approach uses a regularized version of $Q_{\nu}$,

$$Q_{\nu}^{\epsilon}(t)=R_{12}\left[(tI_{p}-R_{22})^{2}+\epsilon^{2}I_{p}\right]^{-1}R_{21},$$

for $\epsilon>0$. Observe that $Q_{\nu}\succeq Q_{\nu}^{\epsilon}(\hat{\ell}_{\nu})$ in the positive semidefinite order, so that

$$\liminf\lambda_{\min}(Q_{\nu})\ge\liminf\lambda_{\min}\!\left(Q_{\nu}^{\epsilon}(\hat{\ell}_{\nu})\right)=\liminf\lambda_{\min}\!\left(Q_{\nu}^{\epsilon}(b_{\gamma})+\Delta_{\nu}^{\epsilon}\right),$$

where $\Delta_{\nu}^{\epsilon}:=Q_{\nu}^{\epsilon}(\hat{\ell}_{\nu})-Q_{\nu}^{\epsilon}(b_{\gamma})$ (recall that $\hat{\ell}_{\nu}\xrightarrow{a.s.}b_{\gamma}$). We show that $\Delta_{\nu}^{\epsilon}\xrightarrow{a.s.}0$, and

$$Q_{\nu}^{\epsilon}(b_{\gamma})\xrightarrow{a.s.}\int x\left[(b_{\gamma}-x)^{2}+\epsilon^{2}\right]^{-1}\underline{F}_{\gamma}(dx)\,\Gamma=c_{\gamma}(\epsilon)\,\Gamma,\tag{6.38}$$

say. Because $\lambda_{\min}(\cdot)$ is a continuous function on $m\times m$ symmetric matrices, we conclude that

$$\liminf\lambda_{\min}(Q_{\nu})\ge c_{\gamma}(\epsilon)\,\lambda_{\min}(\Gamma),\tag{6.39}$$

and because $c_{\gamma}(\epsilon)\ge c(b_{\gamma}+\epsilon)$ and $c(b_{\gamma}+\epsilon)\nearrow\infty$ as $\epsilon\searrow0$, by [JY, Appendix A], we obtain $\lambda_{\min}(Q_{\nu})\xrightarrow{a.s.}\infty$. We write $Q_{\nu}^{\epsilon}(t)=n^{-1}\bar{X}_{1}\check{B}_{n}^{\epsilon}(t)\bar{X}_{1}^{T}$, with

$$\check{B}_{n}^{\epsilon}(t)=n^{-1}\bar{X}_{2}^{T}\left[(tI_{p}-n^{-1}\bar{X}_{2}\bar{X}_{2}^{T})^{2}+\epsilon^{2}I_{p}\right]^{-1}\bar{X}_{2}=H\,\mathrm{diag}\{f_{\epsilon}(\mu_{i},t)\}\,H^{T},$$

if we write the singular-value decomposition of $n^{-1/2}\bar{X}_{2}$ as $VM^{1/2}H^{T}$, with $M=\mathrm{diag}(\mu_{i})_{i=1}^{p}$, and define $f_{\epsilon}(\mu,t)=\mu\left[(t-\mu)^{2}+\epsilon^{2}\right]^{-1}$. Evidently, $\|\check{B}_{n}^{\epsilon}(t)\|\le\epsilon^{-2}\mu_{1}$, which is bounded almost surely. Thus, Lemma 2 may be applied to $Q_{\nu}^{\epsilon}(b_{\gamma})$, and because

$$n^{-1}\mathrm{tr}\,\check{B}_{n}^{\epsilon}(b_{\gamma})\xrightarrow{a.s.}\int f_{\epsilon}(x,b_{\gamma})\,\underline{F}_{\gamma}(dx)=c_{\gamma}(\epsilon)$$

from (5.19), our claim (6.38) follows.

Now consider $\Delta_{\nu}^{\epsilon}$. Fix $a\in\mathbb{R}^{m}$ such that $\|a\|_{2}=1$, and set $b=n^{-1/2}H^{T}\bar{X}_{1}^{T}a$. We have

$$a^{T}\Delta_{\nu}^{\epsilon}a=\sum_{i=1}^{p}b_{i}^{2}\left[f_{\epsilon}(\mu_{i},\hat{\ell}_{\nu})-f_{\epsilon}(\mu_{i},b_{\gamma})\right].$$

Because $|\partial f_{\epsilon}(\mu,t)/\partial t|=2\mu|t-\mu|/\left[(t-\mu)^{2}+\epsilon^{2}\right]^{2}\le\mu/\epsilon^{3}$, for $\mu,\epsilon>0$, by the arithmetic mean–geometric mean inequality, we have

$$|a^{T}\Delta_{\nu}^{\epsilon}a|\le\mu_{1}\epsilon^{-3}\,|\hat{\ell}_{\nu}-b_{\gamma}|\,\|b\|_{2}^{2}\le\mu_{1}\epsilon^{-3}\,|\hat{\ell}_{\nu}-b_{\gamma}|\,a^{T}R_{11}a\le\mu_{1}\epsilon^{-3}\,|\hat{\ell}_{\nu}-b_{\gamma}|\,\hat{\ell}_{1}\xrightarrow{a.s.}0,$$

from Cauchy's interlacing inequality for eigenvalues of symmetric matrices, Theorem 1-(i), and Theorem 3-(i). Therefore, $\Delta_{\nu}^{\epsilon}\xrightarrow{a.s.}0$, and the proof of (6.39) and, hence, of Theorem 3-(ii) is complete.

Supplementary Material

supp_sinica_final.pdf

Acknowledgments

This work was supported, in part, by NIH R01 EB001988 (IMJ, JY), the Hong Kong RGC General Research Fund 16202918 (MRM, DMJ), and a Samsung Scholarship (JY).

Footnotes

Supplementary Material

The online Supplementary Material provides proofs for the following: (i) the Gaussian particularizations of our main results (Corollaries 1 and 2); (ii) the instrumental tightness properties in Lemma 3; and (iii) the asymptotic properties of normalized bilinear forms in Lemma 1 and Proposition 1; see Sections S1, S2, and S3, respectively.

References

1. Bai Z and Yao J-F (2008). Central limit theorems for eigenvalues in a spiked population model. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 44(3), 447–474.
2. Bai ZD and Silverstein J (2009). Spectral Analysis of Large Dimensional Random Matrices (2nd ed.). New York: Springer.
3. Baik J, Ben Arous G, and Péché S (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability 33(5), 1643–1697.
4. Baik J and Silverstein JW (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis 97(6), 1382–1408.
5. Bao Z, Pan G, and Zhou W (2012). Tracy–Widom law for the extreme eigenvalues of sample correlation matrices. Electronic Journal of Probability 17, 1–32.
6. Benaych-Georges F and Nadakuditi RR (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227(1), 494–521.
7. Bianchi P, Najim J, Maida M, and Debbah M (2009). Performance analysis of some eigen-based hypothesis tests for collaborative sensing. In 2009 IEEE/SP 15th Workshop on Statistical Signal Processing, pp. 5–8.
8. Bloemendal A, Knowles A, Yau H-T, and Yin J (2016). On the principal components of sample covariance matrices. Probability Theory and Related Fields 164(1), 459–552.
9. Boik RJ (2003). Principal component models for correlation matrices. Biometrika 90(3), 679–701.
10. Cai TT and Jiang T (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Annals of Statistics 39(3), 1496–1525.
11. Cai TT and Jiang T (2012). Phase transition in limiting distributions of coherence of high-dimensional random matrices. Journal of Multivariate Analysis 107, 24–39.
12. Cocco S, Monasson R, and Sessak V (2011). High-dimensional inference with the generalized Hopfield model: Principal component analysis and corrections. Physical Review E 83(5), 051123.
13. Cocco S, Monasson R, and Weigt M (2013). From principal component to direct coupling analysis of co-evolution in proteins: Low-eigenvalue modes are needed for structure prediction. PLoS Computational Biology 9(8), 1–17.
14. Cochran D, Gish H, and Sinno D (1995). A geometric approach to multiple-channel signal detection. IEEE Transactions on Signal Processing 43(9), 2049–2057.
15. Couillet R and Debbah M (2011). Random Matrix Methods for Wireless Communications. Cambridge University Press.
16. Couillet R and Hachem W (2013). Fluctuations of spiked random matrix models and failure diagnosis in sensor networks. IEEE Transactions on Information Theory 59(1), 509–525.
17. Dahirel V, Shekhar K, Pereyra F, Miura T, Artyomov M, Talsania S, Allen TM, Altfeld M, Carrington MN, Irvine DJ, Walker BD, and Chakraborty AK (2011). Coordinate linkage of HIV evolution reveals regions of immunological vulnerability. Proceedings of the National Academy of Sciences 108(28), 11530–11535.
18. El Karoui N (2009). Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond. Annals of Applied Probability 19(6), 2362–2405.
19. Fang C and Krishnaiah P (1982). Asymptotic distributions of functions of the eigenvalues of some random matrices for nonnormal populations. Journal of Multivariate Analysis 12(1), 39–63.
20. Gao J, Han X, Pan G, and Yang Y (2017). High dimensional correlation matrices: The central limit theorem and its applications. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(3), 677–693.
21. Girshick MA (1939). On the sampling theory of roots of determinantal equations. Annals of Mathematical Statistics 10(3), 203–224.
22. Hachem W, Loubaton P, Mestre X, Najim J, and Vallet P (2013). A subspace estimator for fixed rank perturbations of large random matrices. Journal of Multivariate Analysis 114, 427–447.
23. Hero A and Rajaratnam B (2011). Large-scale correlation screening. Journal of the American Statistical Association 106(496), 1540–1552.
24. Hero A and Rajaratnam B (2012). Hub discovery in partial correlation graphs. IEEE Transactions on Information Theory 58(9), 6064–6078.
25. Jiang T (2004a). The asymptotic distributions of the largest entries of sample correlation matrices. Annals of Applied Probability 14(2), 865–880.
26. Jiang T (2004b). The limiting distributions of eigenvalues of sample correlation matrices. Sankhyā: The Indian Journal of Statistics (2003–2007) 66(1), 35–48.
27. Johnstone IM (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics 29(2), 295–327.
28. Johnstone IM and Yang J (2018). Notes on asymptotics of sample eigenstructure for spiked models with non-Gaussian data. arXiv:1810.10427.
29. Kollo T and Neudecker H (1993). Asymptotics of eigenvalues and unit-length eigenvectors of sample variance and correlation matrices. Journal of Multivariate Analysis 47(2), 283–300.
30. Konishi S (1979). Asymptotic expansions for the distributions of statistics based on the sample correlation matrix in principal component analysis. Hiroshima Mathematical Journal 9(3), 647–700.
31. Leshem A and van der Veen A-J (2001). Multichannel detection of Gaussian signals with uncalibrated receivers. IEEE Signal Processing Letters 8(4), 120–122.
32. Liu H, Hu Z, Mian A, Tian H, and Zhu X (2014). A new user similarity model to improve the accuracy of collaborative filtering. Knowledge-Based Systems 56, 156–166.
33. Mestre X and Vallet P (2017). Correlation tests and linear spectral statistics of the sample correlation matrix. IEEE Transactions on Information Theory 63(7), 4585–4618.
34. Paul D (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica 17, 1617–1642.
35. Pillai NS and Yin J (2012). Edge universality of correlation matrices. Annals of Statistics 40(3), 1737–1763.
36. Plerou V, Gopikrishnan P, Rosenow B, Amaral L, Guhr T, and Stanley H (2002). A random matrix approach to cross-correlations in financial data. Physical Review E 65, 066126.
37. Quadeer AA, Louie RHY, Shekhar K, Chakraborty AK, Hsing I-M, and McKay MR (2014). Statistical linkage analysis of substitutions in patient-derived sequences of genotype 1a Hepatitis C virus nonstructural protein 3 exposes targets for immunogen design. Journal of Virology 88(13), 7628–7644.
38. Quadeer AA, Morales-Jimenez D, and McKay MR (2018). Co-evolution networks of HIV/HCV are modular with direct association to structure and function. PLoS Computational Biology 14(9), 1–29.
39. Ruan D, Meng T, and Gao K (2016). A hybrid recommendation technique optimized by dimension reduction. In 2016 8th International Conference on Modelling, Identification and Control (ICMIC), pp. 429–433.
40. Schott JR (1991). A test for a specific principal component of a correlation matrix. Journal of the American Statistical Association 86(415), 747–751.
41. Vallet P, Mestre X, and Loubaton P (2015). Performance analysis of an improved MUSIC DoA estimator. IEEE Transactions on Signal Processing 63(23), 6407–6422.
42. Xiao H and Zhou W (2010). Almost sure limit of the smallest eigenvalue of some sample correlation matrices. Journal of Theoretical Probability 23(1), 1–20.
43. Yang L, McKay MR, and Couillet R (2018). High-dimensional MVDR beamforming: Optimized solutions based on spiked random matrix models. IEEE Transactions on Signal Processing 66(7), 1933–1947.
44. Yao J, Zheng S, and Bai Z (2015). Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
