Abstract
Sample correlation matrices are widely used, but for high-dimensional data little is known about their spectral properties beyond “null models”, which assume the data have independent coordinates. In the class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both the leading eigenvalues and eigenvectors of sample correlation matrices, assuming a high-dimensional regime in which the ratio p/n of the number of variables p to the sample size n converges to a positive constant. While the first-order spectral properties of sample correlation matrices match those of sample covariance matrices, their asymptotic distributions can differ significantly. Indeed, the correlation-based fluctuations of both sample eigenvalues and eigenvectors are often remarkably smaller than those of their sample covariance counterparts.
Keywords: Sample correlation, eigenstructure, spiked models
1. Introduction
Estimating a correlation matrix is a fundamental statistical task. It is widely applied in areas such as viral sequence analysis and vaccine design in biology (Dahirel et al., 2011, Quadeer et al., 2014, 2018), large portfolio design in finance (Plerou et al., 2002), signal detection in radio astronomy (Leshem and van der Veen, 2001), and collaborative filtering (Liu et al., 2014, Ruan et al., 2016), among many others. In classical statistical settings, with a limited number of variables p and a large sample size n, the sample correlation matrix performs well and its statistical properties are well understood; see, for example, Girshick (1939), Konishi (1979), Fang and Krishnaiah (1982), Schott (1991), Kollo and Neudecker (1993), and Boik (2003). Modern applications, however, often exhibit high dimensionality, with large p and, in many cases, limited n. In such cases, sample correlation matrices become inaccurate owing to an aggregation of statistical noise across the matrix coordinates that is visible in the eigen-spectrum (El Karoui, 2009). This is particularly important in principal component analysis (PCA), which often involves projecting data onto the leading eigenvectors of the sample correlation matrix or, equivalently, onto those of the sample covariance matrix after standardizing the data.
Despite the extensive use of sample correlation matrices, relatively little is known about theoretical properties of their eigen-spectra in high dimensions. In contrast, sample covariance matrices have been studied extensively, and a rich body of literature now exists (e.g., Yao et al. (2015)). Their asymptotic properties have typically been described in high-dimensional settings in which the number of samples and variables both grow large, often though not always at the same rate, based on the theory of random matrices. Specific first- and second-order results for the eigenvalues and eigenvectors of sample covariance matrices are reviewed in Bai and Silverstein (2009), Couillet and Debbah (2011), and Yao et al. (2015).
For the spectra of high-dimensional sample correlation matrices, current theoretical results focus on the simplest “null model” scenario, in which the data are assumed to be independent. In this null model, correlation matrices share many of the same asymptotic properties as covariance matrices from independent and identically distributed (i.i.d.) data, with zero mean and unit variance. Thus, the empirical eigenvalue distribution converges to the Marchenko–Pastur distribution, almost surely (Jiang, 2004b), and the largest and smallest eigenvalues converge to the edges of this distribution (Jiang, 2004b, Xiao and Zhou, 2010). Moreover, the rescaled largest and smallest eigenvalues asymptotically follow the Tracy–Widom law (Bao et al., 2012, Pillai and Yin, 2012). Central limit theorems (CLTs) for linear spectral statistics have also been derived (Gao et al., 2017). A separate line of work studies the maximum absolute off-diagonal entry of sample correlation matrices, referred to as “coherence” (Jiang, 2004a, Cai and Jiang, 2011, 2012), which has been proposed as a statistic for conducting independence tests; see also Cochran et al. (1995), Mestre and Vallet (2017), and the references therein. Hero and Rajaratnam (2011, 2012) use a related statistic to identify variables exhibiting strong correlations, an approach referred to as “correlation screening.”
For non-trivial correlation models, however, asymptotic results for the spectra of sample correlation matrices are quite scarce. Notably, El Karoui (2009) shows that, for a fairly general class of covariance models with bounded spectral norm, to first order, the eigenvalues of sample correlation matrices asymptotically coincide with those of sample covariance matrices with unit-variance data, generalizing earlier results of Jiang (2004b) and Xiao and Zhou (2010). Under similar covariance assumptions, recent work also presents CLTs for linear spectral statistics of sample correlation matrices (Mestre and Vallet, 2017), extending the work of Gao et al. (2017). First-order behavior again coincides with that of sample covariances; the asymptotic fluctuations, however, are quite different for sample correlation matrices.
This study considers a particular class of correlation matrix models, the so-called “spiked models,” in which a few large or small eigenvalues of the population covariance (or correlation) matrix are assumed to be well separated from the rest (Johnstone, 2001). Spiked covariance models are relevant in applications in which the primary covariance information lies in a relatively small number of eigenmodes. Such applications include collaborative signal detection in cognitive radio systems (Bianchi et al., 2009), fault detection in sensor networks (Couillet and Hachem, 2013), adaptive beamforming in array processing (Hachem et al., 2013, Vallet et al., 2015, Yang et al., 2018), and protein contact prediction in biology (Cocco et al., 2011, 2013). The spectral properties of spiked covariance models have been well studied, with precise analytical results established for the asymptotic first-order and distributional properties of both eigenvalues and eigenvectors; see, for example, Baik et al. (2005), Baik and Silverstein (2006), Paul (2007), Bai and Yao (2008), Benaych-Georges and Nadakuditi (2011), Couillet and Hachem (2013), Bloemendal et al. (2016). For reviews, see also Couillet and Debbah (2011, Chapter 9) and Yao et al. (2015, Chapter 11).
Less is known about the spectrum of sample correlation matrices under spiked models. Although the asymptotic first-order behavior is expected to coincide with that of the sample covariance, as a consequence of El Karoui (2009), a simple simulation reveals striking differences in the fluctuations of both sample eigenvalues and eigenvectors; see Figure 1.
Figure 1:
A simple simulation shows remarkable distributional differences between the sample covariance and sample correlation. From n = 200 i.i.d. Gaussian samples with covariance Σ = blkdiag(Σs, I90), where Σs is a spiked correlation block with parameter r = 0.95, we compute the sample covariance and sample correlation, and show: (a) the empirical density (normalized histogram) of the largest sample eigenvalue, along with a Gaussian distribution with its estimated mean and standard deviation (solid line); and (b) a scatter plot of the leading sample eigenvector, projected onto the second (x-axis) and fourth (y-axis) population eigenvectors. A striking variance reduction is observed in the sample correlation for both (a) and (b). A similar variance reduction is observed for different choices of population eigenvectors in (b); the selected choice (the second and fourth eigenvectors) facilitates the illustration of an additional correlation effect in the sample-to-population eigenvector projections.
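A minimal Python sketch of this experiment is given below. The exact form of the spike block Σs is an assumption here (we take a 10 × 10 equicorrelated block with r = 0.95); the variance-reduction effect being illustrated does not hinge on this particular choice.

import numpy as np

rng = np.random.default_rng(0)
n, m, p, r = 200, 10, 90, 0.95
# Hypothetical spike block: an equicorrelated block with correlation r.
Sigma_s = (1 - r) * np.eye(m) + r * np.ones((m, m))
Sigma = np.block([[Sigma_s, np.zeros((m, p))],
                  [np.zeros((p, m)), np.eye(p)]])
A = np.linalg.cholesky(Sigma)

lam_cov, lam_corr = [], []
for _ in range(1000):
    X = A @ rng.standard_normal((m + p, n))   # columns are i.i.d. N(0, Sigma) samples
    S = X @ X.T / n                           # sample covariance
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                    # sample correlation
    lam_cov.append(np.linalg.eigvalsh(S)[-1])
    lam_corr.append(np.linalg.eigvalsh(R)[-1])

print("largest-eigenvalue std, covariance :", np.std(lam_cov))
print("largest-eigenvalue std, correlation:", np.std(lam_corr))

The printed standard deviation for the correlation case comes out markedly smaller, matching panel (a) of the figure.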
Here, we present theoretical results to describe these observed phenomena. We obtain asymptotic first-order and distributional results for the eigenvalues and eigenvectors of sample correlation matrices under a spiked model. Paul (2007) proved theorems for sample covariance matrices in the special case of Gaussian data. In essence, we present analogs of these theorems for sample correlation matrices, and extend them to non-Gaussian data. To first order, the eigenvalues and eigenvectors coincide asymptotically with those of sample covariance matrices; however, their fluctuations can be very different. Indeed, for both the largest sample correlation eigenvalues (Theorem 1) and the projections of the corresponding eigenvectors (Theorem 2), the asymptotic variances admit a decomposition into three terms. The first term is just the asymptotic variance for sample covariance matrices generated from Gaussian data; the second adds corrections due to non-Gaussianity, and the third captures further corrections due to the data normalization imposed by the sample correlation matrix. (This last amounts to normalizing the entries of the sample covariance matrix using the sample variances.) Consistent with the example shown in Figure 1(a), in the CLT for the leading sample eigenvalues, the sample correlation eigenvalues often show lower fluctuations (despite the variance normalization) than those of the sample covariance eigenvalues. As seen in Figure 1(b), the (normalized) eigenvector projections are typically asymptotically correlated, even for Gaussian data, unlike the sample covariance setting of Paul (2007, Theorem 5).
Technical contributions
We build on and extend a set of random matrix tools for studying spiked covariance models. The companion manuscript (Johnstone and Yang, 2018) [JY] gives an exposition and parallel treatment for sample covariance matrices. Important adaptations are needed here to account for the data normalization imposed by sample correlation matrices. Among the key technical contributions of our work, basic to our main theorems, are asymptotic first-order and distributional properties for bilinear forms and matrix quadratic forms with normalized entries, presented in Section 4. A novel regularization-based proof strategy is used to establish the inconsistency of eigenvector projections in the case of “subcritical” spiked eigenvalues (Theorem 3).
Model M
Let x be an (m + p)-dimensional random vector with finite (4+δ)th moment for some δ > 0. Consider the partition x = (ξᵀ, ηᵀ)ᵀ, where ξ collects the first m coordinates, and η the remaining p.
Assume that ξ has mean zero and covariance Σ, and is independent of η, which has i.i.d. components ηi with mean zero and unit variance. Let ΣD = diag(σ1², …, σm²) be the diagonal matrix containing the variances of the ξi, and let Γ = ΣD^{−1/2} Σ ΣD^{−1/2} be the correlation matrix of ξ with eigen-decomposition Γ = PLPᵀ, where P = [p1, …, pm] is the eigenvector matrix, and L = diag(ℓ1, …, ℓm) contains the spike correlation eigenvalues ℓ1 ≥ … ≥ ℓm > 0.
The correlation matrix of x is therefore Γx = blkdiag(Γ, I), with eigenvalues ℓ1, …, ℓm, 1, …, 1, and corresponding eigenvectors p̃1, …, p̃m, em+1, …, em+p, where p̃ν = (pνᵀ, 0ᵀ)ᵀ and ej is the jth canonical vector (i.e., a vector of all zeros, except for a one in the jth coordinate).
Consider a sequence of i.i.d. copies of x, the first n of which fill the columns of the (m + p) × n data matrix X = (xij). We assume m is fixed, whereas p and n increase such that γn := p/n → γ > 0.
Notation
Let S = n⁻¹XXᵀ be the sample covariance matrix, and SD = diag(S) the diagonal matrix containing the sample variances. Let R = SD^{−1/2} S SD^{−1/2} be the sample correlation matrix, with corresponding νth sample eigenvalue λ̂ν and eigenvector ûν satisfying Rûν = λ̂νûν,
where, for later use, we partition ûν = (û1νᵀ, û2νᵀ)ᵀ. Here, û1ν is the subvector of ûν restricted to the first m coordinates.
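These definitions translate directly into code; a minimal numerical sketch (dimensions illustrative):

import numpy as np

rng = np.random.default_rng(1)
m_plus_p, n = 50, 200
X = rng.standard_normal((m_plus_p, n))    # (m+p) x n data matrix, variables in rows

S = X @ X.T / n                           # sample covariance S = n^{-1} X X^T
d = np.sqrt(np.diag(S))                   # sample standard deviations, i.e. S_D^{1/2}
R = S / np.outer(d, d)                    # sample correlation R = S_D^{-1/2} S S_D^{-1/2}

lam, U = np.linalg.eigh(R)                # eigenvalues ascending, eigenvectors in columns
lam_nu, u_nu = lam[-1], U[:, -1]          # leading sample eigenpair
print(np.allclose(R @ u_nu, lam_nu * u_nu))   # verifies R u = lambda u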
For ℓ > 1 + √γ, define the function ρ(ℓ, γ) := ℓ + γℓ/(ℓ − 1).
For an index ν for which ℓν is a simple eigenvalue, set
ρν := ρ(ℓν, γ) and ρνn := ρ(ℓν, γn). (1.1)
We refer to eigenvalues satisfying ℓν > 1 + √γ as “supercritical,” and those satisfying ℓν ≤ 1 + √γ as “subcritical,” with the quantity 1 + √γ referred to as the “phase transition.”
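A small helper makes the mapping ρ and the phase transition concrete (based on the definitions above; parameter values illustrative, and the subcritical behavior anticipates Theorem 3):

import numpy as np

def rho(ell, gamma):
    # asymptotic location of a supercritical sample eigenvalue
    return ell + gamma * ell / (ell - 1.0)

gamma = 0.5
threshold = 1.0 + np.sqrt(gamma)          # phase transition 1 + sqrt(gamma)
b_gamma = (1.0 + np.sqrt(gamma)) ** 2     # upper bulk edge
for ell in (1.5, 2.0, 5.0):
    if ell > threshold:
        print(f"ell = {ell}: supercritical, sample eigenvalue -> {rho(ell, gamma):.3f}")
    else:
        print(f"ell = {ell}: subcritical, sample eigenvalue -> bulk edge {b_gamma:.3f}")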
To describe and interpret the variance terms in the limiting distributions to follow, we need some definitions. Let ξ̄i := ξi/σi and κij := E[ξ̄iξ̄j] denote the scaled components of ξ and their covariances; of course, κii = 1. The corresponding scaled fourth-order cumulants are
κijkl := E[ξ̄iξ̄jξ̄kξ̄l] − κijκkl − κikκjl − κilκjk. (1.2)
When ξ is Gaussian, κijkl = 0.
The effect of variance scaling in the correlation matrix is described using additional quadratic functions of the ξ̄i, defined by
| (1.3) |
| (1.4) |
Tensor notation
For convenience, it is useful to regard the scaled cumulants κijkl and the quantities defined in (1.3)–(1.4) as entries of four-dimensional tensor arrays. For a pair of such arrays κ and A of the same dimensions, we write ⟨κ, A⟩ := Σi,j,k,l κijkl Aijkl for the corresponding contraction.
2. Main results
Our first main result, proved in Section 5, gives the asymptotic properties of the largest (spike) eigenvalues of the sample correlation matrix:
Theorem 1
Assume Model M, and that ℓν > 1 + √γ is a simple eigenvalue. As p/n → γ > 0,
(i) λ̂ν → ρν almost surely, and (ii) n^{1/2}(λ̂ν − ρνn) →d N(0, σν²), (2.5)
where the asymptotic variance σν² is given by
| (2.6) |
Centering at ρνn rather than at ρν is important. If, for example, γn = γ + an−1/2, then n^{1/2}(ρνn − ρν) = aℓν/(ℓν − 1), and we see a limiting shift. Furthermore, it may also be beneficial to consider σνn² instead of σν², obtained by replacing γ with γn in (2.6), such that n^{1/2}(λ̂ν − ρνn)/σνn →d N(0, 1).
The asymptotic first-order limit in (i), which follows as an easy consequence of El Karoui (2009), coincides with that of the νth largest eigenvalue of a sample covariance matrix computed from data with population covariance Γ (Paul, 2007). This implies that, when constructing R, normalizing by the sample variances has no effect on the leading eigenvalues, at least to first order.
However, key differences are seen when looking at the asymptotic distribution, given in (ii), and in the variance formula (2.6) in particular. This can be readily interpreted. The first term corresponds to the variance in the Gaussian-covariance case of Paul (2007), again for samples with covariance Γ. The second provides a correction of that result for non-Gaussian data; see the companion article [JY]. The third term describes the contribution specific to sample correlation matrices, representing the effect of normalizing the data by the sample variances. This term is often negative, and is evaluated explicitly for Gaussian data in Corollary 1 below, proved in the Supplementary Material, S1.1.
Corollary 1
For ξ Gaussian, the asymptotic variance in Theorem 1 simplifies to
where PD,ν = diag(pν,1, …, pν,m).
Thus, computing the sample correlation results in the asymptotic variance being scaled by , relative to the sample covariance, where
is often positive, implying that spiked eigenvalues of the sample correlation often exhibit a smaller variance than those of the sample covariance. Indeed, such variance reduction occurs iff
| (2.7) |
with the last identity following from the fact that . Condition (2.7), and variance reduction, holds in the following cases:
(i) both Γ and pν have nonnegative entries, or
(ii) , or
In case (i), the inequalities yield (2.7). Note that if Γ has nonnegative entries, then the Perron–Frobenius theorem establishes the existence of an eigenvector with nonnegative components for ℓ1; furthermore, if Γ has positive entries, by the same theorem, ℓ1 is simple and associated with an eigenvector with positive components. Case (ii) follows from , and holds if ℓν > m/2, because . Case (iii) follows from the inequalities and . Note that this is rather special, in that it has nothing to do with eigenvectors, and a necessary condition for it to hold is ℓ1 ≤ 2.
Condition (2.7) can fail, however. For example, for even m and r ∈ (0, 1), consider the block matrix Γ with diagonal blocks 1m/21m/2ᵀ and off-diagonal blocks −r 1m/21m/2ᵀ,
where 1m/2 is the (m/2)-dimensional vector of all ones, which corresponds to two negatively correlated groups of identical random vectors. This has simple supercritical eigenvalues ℓ1 = (1 + r)m/2 and ℓ2 = (1 − r)m/2 when (1 − r)m/2 > 1 + √γ, with pν = m^{−1/2}(1m/2ᵀ, ∓1m/2ᵀ)ᵀ for ν = 1, 2. One finds that Δ2 = (1 − 2r − r²)/2 < 0 for r > √2 − 1, although Δ1 > 0 because ℓ1 > m/2, which implies case (ii).
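The claimed eigenstructure and the sign of Δ2 are quickly verified numerically (a sketch using the block form written above):

import numpy as np

m, r = 6, 0.5                                  # r > sqrt(2) - 1, so Delta_2 < 0
J = np.ones((m // 2, m // 2))
Gamma = np.block([[J, -r * J], [-r * J, J]])   # two negatively correlated groups

lam = np.sort(np.linalg.eigvalsh(Gamma))[::-1]
print(lam[:2], "expected:", (1 + r) * m / 2, (1 - r) * m / 2)

Delta2 = (1 - 2 * r - r**2) / 2
print("Delta_2 =", Delta2, "(negative iff r >", np.sqrt(2) - 1, ")")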
We turn now to the eigenvectors. Again, fix an index ν for which ℓν > 1 + √γ is a simple eigenvalue of Γ, with corresponding eigenvector pν. Recall that ûν is the νth sample eigenvector of R, and let aν := û1ν/‖û1ν‖ be the corresponding normalized subvector of ûν, restricted to the first m coordinates. The next result establishes a limit for the eigenvector projection ⟨ûν, p̃ν⟩, and a CLT for the normalized cross-projections pkᵀaν, k ≠ ν; see Sections 6.1 and 6.2.
Theorem 2
Assume Model M, and that ℓν > 1 + √γ is a simple eigenvalue. Then, as p/n → γ > 0,
where with
| (2.8) |
| (2.9) |
where δk,l = 1 if k = l, and zero otherwise.
The CLT result in (ii) can be rephrased in terms of the entries of aν, for which we readily obtain ; note that Σν has zeros in the νth row and the νth column.
As for the eigenvalues, Theorem 2 shows that the spiked eigenvectors of sample correlation matrices exhibit the same first-order behavior as those of the sample covariance (Paul, 2007). The difference again lies in the asymptotic fluctuations, captured by the covariance matrix Σν. Note that Σν decomposes as the product of a diagonal matrix and a matrix involving the three terms in (2.9). These terms have interpretations similar to those discussed previously for (2.6). That is, the first term captures the asymptotic fluctuations for a Gaussian-covariance model (Paul, 2007), the second term captures the effect of non-Gaussianity in the covariance case [JY], and the third term captures information specific to the correlation case, representing fluctuations due to sample variance normalization. Note that only the first term is diagonal in general, suggesting that the eigenvector projections may be asymptotically correlated, as seen earlier in Figure 1(b), right panel. This holds also for Gaussian data, evaluated explicitly in Corollary 2 below; see Supplementary Material, S1.2, for the proof. We note an interesting contrast with the eigenvector projections for covariance matrices (Paul, 2007), described only by the leading term in (2.9).
Corollary 2
For ξ Gaussian, the asymptotic covariance in Theorem 2 reduces to ,
where , , and ∘ denotes the Hadamard product.
Thus, for Gaussian data, the entries of the asymptotic covariance matrix are given by (for k, l ≠ ν)
Consider now the subcritical case, in which ν is such that ℓν ≤ 1 + √γ. Let pν denote the corresponding population eigenvector, and let λ̂ν and ûν denote the corresponding sample eigenvalue and eigenvector, respectively. With proofs deferred to Sections 5.1 and 6.3, we have the following result:
Theorem 3
Assume Model M, and that ℓν ≤ 1 + √γ is a simple eigenvalue. Then, as p/n → γ > 0, (i) λ̂ν → bγ := (1 + √γ)² almost surely, and (ii) ⟨ûν, p̃ν⟩ → 0 almost surely.
Once again, the asymptotic first-order limits of the sample eigenvalue and its associated eigenvector are the same as those obtained for the sample covariance (Paul, 2007).
Recall that our high-dimensional results assume an asymptotic regime where p/n → γ > 0, as opposed to the classical regime where p is fixed and n → ∞. The case of fixed p corresponds to γ = 0 and the spectral properties of the sample correlation matrix are well understood; see, for example, Girshick (1939), Konishi (1979), Fang and Krishnaiah (1982), Schott (1991), Kollo and Neudecker (1993), and Boik (2003). When γ = 0, the function ρ(ℓ) reduces to the identity. Indeed, for fixed p, there is no high-dimensional component η in Model M, and hence no biasing effect on ρ(ℓ, γ) that occurs when γ > 0. In particular, for fixed p there is no counterpart to our Theorem 3.
To summarize, in comparison to the high-dimensional (p/n → γ > 0) sample covariance setting, our results for the spiked eigenvalues and eigenvectors of sample correlation matrices confirm that the first-order asymptotic behavior is indeed equivalent to that of sample covariance matrices, in agreement with previous results and observations (El Karoui, 2009, Mestre and Vallet, 2017). While the eigenvalue limits in Theorem 1 and Theorem 3 follow as a straightforward consequence of El Karoui (2009), the eigenvector results of Theorem 2-(i) and Theorem 3-(ii) do not. In contrast to the first-order equivalences, important differences arise in the fluctuations of both the eigenvalues and eigenvectors, as shown by the asymptotic distributions of Theorem 1-(ii) and Theorem 2-(ii).
We illustrate these differences with a simple example having covariance Σ = Γ = (1 − r)Im + r1m1mᵀ, where r ∈ [0, 1]; that is, a model with unit variances and constant correlation r across all components. Moreover, ξ is assumed to be Gaussian for simplicity. In this setting, L = diag(ℓ1, 1 − r, …, 1 − r), where ℓ1 = 1 + r(m − 1) is supercritical iff r > √γ/(m − 1). Consider the largest sample eigenvalue in such a supercritical case. From Corollary 1, the asymptotic variances for the sample covariance and the sample correlation can be computed, yielding
respectively, with , and where
Figure 2(a) plots these asymptotic variances versus r for various (γ,m). Indeed, the variance (fluctuation) for the sample correlation is consistently smaller than for the sample covariance. The difference is striking, becoming extremely large as r ↗ 1. Similar trends are observed for various choices of m and γ, being more pronounced for higher m, while not much affected by varying γ. This may be understood from the fact that, after writing Δ = r(2 − r) + (1 − r)2m−1 = 1 − (1 − r)2(1 − m−1),
We turn now to the fluctuations of the leading sample eigenvector, in the same setting as above. Note that, in Corollary 2, for this particular case, one can deduce from PᵀΓP = L that
Also from Corollary 2, the asymptotic variances for the normalized sample-to-population eigenvector projection , in the sample covariance and sample correlation cases, are computed as
respectively, where , and we recall that ℓ1 = 1 − r + rm and ℓ2 = 1 − r. These variances are numerically evaluated in Figure 2(b) for the same parameter choices as before and, again, as functions of r. Note, however, that for better visual appreciation, the range of r has been restricted to supercritical values sufficiently above the critical point r = √γ/(m − 1), because the variance explodes at that point. The comparative evaluation again shows smaller variances for the sample correlation. The variance reduction here is less visible in the graphs, because both Σ1,22 and vanish as r → 1. The ratio, however, behaves quite similarly to the variance ratio :
Figure 2:
Differences in the fluctuations of sample eigenvalues and eigenvectors for an example Gaussian model with constant correlation r across all components. Asymptotic variances are shown for (a) the largest sample eigenvalue, and (b) the normalized sample-to-population eigenvector projection.
We end the discussion of our main results with a few remarks about possible extensions. Our results assume that ℓν > 1 is a simple eigenvalue, but extensions for small spikes with ℓν < 1 and for spikes with multiplicities should be possible. Analogous results for eigenvalues have been obtained for sample covariance matrices for ℓν < 1, including multiplicities greater than one (e.g., see Bai and Yao (2008)), giving reason to expect corresponding results for correlation matrices. Extensions of our results for eigenvalues and eigenvectors of sample correlation matrices for simple ℓν < 1 should be fairly straightforward, though the cases γ < 1, γ = 1, and γ > 1 would need separate treatment. Extensions for spikes with multiplicities are also possible, but in this case the eigenvectors are not well defined and one would need to consider subspace projections, requiring non-trivial modifications of our technical arguments.
The remainder of the paper proceeds as follows. First, in Section 3, we introduce key quantities and identities used in the derivations. Section 4 presents necessary asymptotic properties for bilinear forms and matrix quadratic forms with normalized entries, with the corresponding proofs relegated to the Supplementary Material, Section S3. These properties provide a foundation for describing the asymptotic convergence and distribution of eigenvalues and eigenvectors of sample correlation matrices, derived in Sections 5 and 6 respectively.
As already noted, a parallel treatment for the simpler case of covariance matrices is given in a supplementary manuscript [JY]. This aims at a unified exposition of known spectral properties of spiked covariance matrices as a benchmark for the current work, along with additional citations to the literature.
3. Preliminaries
We begin with a block representation and some associated reductions for the sample correlation matrix R. These are well known in the covariance matrix setting. As with the partition of x in Model M, consider the partition X = (X1ᵀ, X2ᵀ)ᵀ, where the m × n block X1 contains the observations of ξ, and the p × n block X2 contains those of η.
Write SD = blkdiag(SD1, SD2), with SD1 containing the sample variances corresponding to ξ, and SD2 containing those corresponding to η. Define the “normalized” data matrices X̄1 := n^{−1/2} SD1^{−1/2} X1 and X̄2 := n^{−1/2} SD2^{−1/2} X2, such that the blocks of R are given by Rij = X̄iX̄jᵀ for i, j ∈ {1, 2}.
This partitioning of the eigenvector equation Rûν = λ̂νûν, along with Rij = X̄iX̄jᵀ, yields
R11û1ν + R12û2ν = λ̂νû1ν and R21û1ν + R22û2ν = λ̂νû2ν.
From the second equation, û2ν = (λ̂νIp − R22)⁻¹R21û1ν. Substituting this into the first equation yields
K(λ̂ν)û1ν = λ̂νû1ν, where K(t) := R11 + R12(tIp − R22)⁻¹R21.
Thus, λ̂ν is an eigenvalue of the m × m matrix K(λ̂ν), with associated eigenvector û1ν; this reduction is central to our derivations. Note that K(λ̂ν) is well defined if λ̂ν is well separated from the eigenvalues of R22; Section 5.1 shows that this occurs with probability one for all large n when ℓν is supercritical. Furthermore, the normalization condition ‖ûν‖ = 1 yields
1 = ‖û1ν‖² + û1νᵀR12(λ̂νIp − R22)⁻²R21û1ν.
Phrased in terms of the signal-space normalized eigenvector aν := û1ν/‖û1ν‖, we have
‖û1ν‖² = (1 + aνᵀQνaν)⁻¹, with Qν := R12(λ̂νIp − R22)⁻²R21. (3.10)
Note also that the sample-to-population inner product can be rewritten as
⟨ûν, p̃ν⟩ = pνᵀû1ν = ‖û1ν‖ pνᵀaν. (3.11)
In the derivation of our CLT results, we use an eigenvector perturbation formula with quadratic error bound given in [JY, Lemma 13], itself a modification of the arguments in Paul (2007). This yields the key expansion
aν = pν + Dνpν + rν, with ‖rν‖ = O(‖Dν‖²), (3.12)
where the matrix Dν is built from K(λ̂ν) and the spectral gaps of Γ; see [JY, Lemma 13].
The derivations of our eigenvalue and eigenvector results, presented in Sections 5 and 6 respectively, take (3.10), (3.11) and (3.12) as points of departure, and rely on asymptotic properties of the key objects K(λ̂ν) and Qν. In particular, K(t) can be expressed as the random matrix quadratic form
K(t) = X̄1Bn(t)X̄1ᵀ, (3.13)
where, using the Woodbury identity, Bn(t) := t(tIn − Cn)⁻¹, with Cn := X̄2ᵀX̄2.
Thus, our key objects are random quadratic forms involving the normalized data matrices X̄1 and X̄2. The asymptotic properties of these forms are foundational to our results, and are presented next.
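The reductions above are easy to verify numerically; a minimal sketch under the reconstructed forms of K(t) and Bn(t) (the factor injection used to create a spike is illustrative):

import numpy as np

rng = np.random.default_rng(2)
m, p, n = 3, 40, 100
X = rng.standard_normal((m + p, n))
X[:m] += 3.0 * rng.standard_normal((1, n))     # common factor -> one strong spike

S = X @ X.T / n
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)
lam = np.linalg.eigvalsh(R)[-1]                # leading sample eigenvalue of R

R11, R12, R22 = R[:m, :m], R[:m, m:], R[m:, m:]
K = R11 + R12 @ np.linalg.solve(lam * np.eye(p) - R22, R12.T)
print(np.min(np.abs(np.linalg.eigvalsh(K) - lam)))   # ~0: lam is an eigenvalue of K(lam)

Xbar = X / (np.sqrt(n) * d[:, None])           # normalized data, so that R = Xbar Xbar^T
X1b, X2b = Xbar[:m], Xbar[m:]
Cn = X2b.T @ X2b                               # n x n companion matrix
Bn = lam * np.linalg.inv(lam * np.eye(n) - Cn)
print(np.allclose(X1b @ Bn @ X1b.T, K))        # Woodbury form K(t) = Xbar_1 Bn(t) Xbar_1^T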
4. Quadratic forms with normalized entries
In this section, we establish the first-order (deterministic) convergence and a CLT for matrix quadratic forms of the type X̄1BnX̄1ᵀ, where Bn is a matrix with bounded spectral norm. While essential to our purposes, some of the technical results may be of independent interest; thus, we first present the general results, and then apply them in the context of Model M.
4.1. First-order convergence
To establish the first-order convergence, we first require some results on bilinear forms involving correlated random vectors of unit length. A main technical result (see Supplementary Material, S3.1) is the following:
Lemma 1
Let B be an n × n nonrandom symmetric matrix, and let x, y be random vectors with i.i.d. entries having mean zero, variance one, E[x1y1] = ρ, and suitable finite higher-order moments. Let x̃ = x/‖x‖ and ỹ = y/‖y‖. Then, for any s ≥ 1,
where Cs is a constant depending only on s.
This is a generalization of Gao et al. (2017, Lemma 5), which established a corresponding bound for normalized quadratic forms. Lemma 1 leads to the following first-order convergence result:
Corollary 3
Let x, y be random vectors of i.i.d. entries with mean zero, variance one, finite (4 + δ)th moments for some δ > 0, and E[x1y1] = ρ. Define x̃ = x/‖x‖ and ỹ = y/‖y‖, and let Bn be a sequence of n × n symmetric matrices, with ‖Bn‖ bounded. Then,
x̃ᵀBnỹ − ρ n⁻¹tr(Bn) → 0 almost surely.
Proof. Because the (4 + δ)th moment and ‖Bn‖ are bounded, from Lemma 1,
The convergence then follows from Markov’s inequality and the Borel–Cantelli lemma. ☐
We now apply this result to Model M, with random matrices Bn independent of the normalized data matrix X̄1:
Lemma 2
Assume Model M, and suppose that (Bn) is a sequence of random symmetric matrices, independent of X̄1, for which ‖Bn‖ is Oa.s.(1). Then,
X̄1BnX̄1ᵀ − n⁻¹tr(Bn)Γ → 0 almost surely.
Proof. This follows from Fubini’s theorem. Specifically, one may use the arguments in the proof of [JY, Lemma 5], applying Corollary 3, and noting that X̄1 is independent of Bn. ☐
4.2. Central Limit Theorem
To establish our main matrix quadratic-form CLT result, we first derive a CLT for scalar bilinear forms involving normalized random vectors. To this end, we must introduce some further notation. Consider zero-mean random vectors , with
where . Assume ; that is, all components of the x and y vectors have unit variance and . We first introduce notation for some quadratic functions of xl, yl. Let , with
Let X = (xli)M×n and Y = (yli)M×n be data matrices based on n i.i.d. observations of (x, y), and define the “normalized” data matrices and , where , , and ,. Then, we use the following notation for the rows and of the normalized data matrices
With this setup, we have the following result, proved in the Supplementary Material, S3.2:
Proposition 1
Let Bn = (bn,ij) be random symmetric n × n matrices, independent of X, Y, such that for some finite β, ‖Bn‖ ≤ β for all n, and
all finite. In addition, define , with components
Then, , with
| (4.14) |
where K = K1 − J and J,K1,K2 are matrices defined by
| (4.15) |
The entries of K are fourth-order cumulants of x and y:
| (4.16) |
Hence, K vanishes if x, y are Gaussian.
The corresponding result with unnormalized vectors is established in [JY, Theorem 10]. The terms θJ + ωK appear in that case, and the additional term ϕK2 reflects the normalization in x̃ and ỹ. As in [JY], the proof is based on the martingale CLT, rather than the moment method used in Bai and Yao (2008), which stated a similar result for quadratic forms involving unnormalized random vectors.
While potentially of independent interest, Proposition 1 is important for our purposes through its application to Model M.
Proposition 2
Assume Model M, and consider Bn as in Proposition 1. Then,
Wn := n^{1/2}[X̄1BnX̄1ᵀ − n⁻¹tr(Bn)Γ] →d W,
where W is a symmetric m × m Gaussian matrix with entries Wij, mean zero, and covariances given by
| (4.17) |
for i ≤ j and i′ ≤ j′.
Proof. The result follows from Proposition 1 by turning the matrix quadratic form into a vector of bilinear forms; see, for example, [JY, Proposition 6] and Bai and Yao (2008, Proposition 3.1). Specifically, use an index l for the M = m(m + 1)/2 pairs (i, j), with 1 ≤ i ≤ j ≤ m. Build the random vectors (x, y) for Proposition 1 as follows: if l = (i, j), then set xl = ξi/σi and yl = ξj/σj. In the resulting covariance matrix C for (x, y), if also l′ = (i′, j′),
and, in particular, and , whereas . Component Wn,ij corresponds to component Zl in Proposition 1. Thus, we conclude that , where W is a Gaussian matrix with zero mean and , given by Proposition 1. It remains to interpret the quantities in (4.14) in terms of Model M. Substituting and into (4.16) and chasing definitions, we obtain and . Observing that zl = xlyl = χij and , we similarly find that . ☐
5. Proofs of the eigenvalue results
In this section, we derive the main eigenvalue results, presented in Theorem 1 and Theorem 3-(i).
5.1. Preliminaries
Convergence properties of the eigenvalues of R22
It is well known that the empirical spectral distribution (ESD) of S22 converges weakly a.s. to the Marchenko–Pastur (MP) law Fγ, and that the extreme non-trivial eigenvalues converge to the edges of the support of Fγ. For the sample correlation case, Jiang (2004b) shows that the same is true for R22. That is, the empirical distribution of the eigenvalues μ1 ≥ … ≥ μp of the “noise” correlation matrix R22 = X̄2X̄2ᵀ converges weakly a.s. to the MP law Fγ, supported on [aγ, bγ] := [(1 − √γ)², (1 + √γ)²] if γ ≤ 1, and on {0} ∪ [aγ, bγ] otherwise. Furthermore, the ESD of the n × n companion matrix Cn = X̄2ᵀX̄2, denoted by F̄n, converges weakly a.s. to the “companion MP law” F̄γ = (1 − γ)1[0,∞) + γFγ, where 1A denotes the indicator function on set A.
In addition, Jiang (2004b) shows that
μ1 → bγ almost surely. (5.18)
Based on these results, if fn → f uniformly as continuous functions on the closure of a bounded neighborhood of the support of F̄γ, then
∫ fn dF̄n → ∫ f dF̄γ almost surely. (5.19)
If supp(F̄n) is not contained in this neighborhood, then the integral on the left side may not be defined. However, such an event occurs for at most finitely many n, with probability one.
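These null-model facts are easily illustrated numerically (a minimal sketch; dimensions illustrative):

import numpy as np

rng = np.random.default_rng(3)
p, n = 400, 1000
gamma = p / n
X2 = rng.standard_normal((p, n))           # i.i.d. "noise" block

S22 = X2 @ X2.T / n
d = np.sqrt(np.diag(S22))
R22 = S22 / np.outer(d, d)                 # noise sample correlation matrix

mu = np.linalg.eigvalsh(R22)
a_edge = (1 - np.sqrt(gamma)) ** 2         # Marchenko-Pastur edges a_gamma, b_gamma
b_edge = (1 + np.sqrt(gamma)) ** 2
print(f"empirical extremes [{mu[0]:.3f}, {mu[-1]:.3f}] "
      f"vs MP edges [{a_edge:.3f}, {b_edge:.3f}]")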
Almost sure limit of λ̂ν
The statements in Theorem 1-(i) and Theorem 3-(i) follow easily from known results. Specifically, denote by λνS the νth eigenvalue of the sample covariance matrix computed from standardized data (that is, with population covariance Γx). The almost sure limits
λνS → ρν if ℓν > 1 + √γ, and λνS → bγ otherwise, (5.20)
were established in Baik and Silverstein (2006). From the proof of El Karoui (2009, Lemma 1),
Therefore, the same almost sure limits as (5.20) hold for .
High-probability events Jnϵ, Jnϵ1
When necessary, we may confine attention to the event Jnϵ := {|λ̂ν − ρν| ≤ ϵ} or Jnϵ1 := {μ1 ≤ bγ + ϵ}, with ϵ > 0 chosen such that ρν − bγ ≥ 3ϵ, because, from (2.5) (proven above) and (5.18), these events occur with probability one for all large n.
Asymptotic expansion of K(ρνn)
We establish an asymptotic stochastic expansion for the matrix quadratic form K(ρνn). Specifically, using the decomposition
| (5.21) |
we show that
| (5.22) |
and
| (5.23) |
where, for t > bγ,
Here, m(·; γ) is the Stieltjes transform of the companion distribution F̄γ.
In establishing (5.22), start by taking sufficiently large n such that |ρνn – ρν| ≤ ϵ, with ϵ defined as above. For such n, on Jnϵ1, we have
Because Jnϵ1 holds with probability one for all large n, ‖Bn(ρνn)‖ = Oa.s.(1) and, therefore, it follows from Lemma 2 that
In addition, (5.19) yields
Explicit evaluation gives m(ρν; γ) = −1/ℓν [JY, Appendix A], and (5.22) follows.
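Though not needed for the argument, this evaluation is easily checked numerically against the empirical Stieltjes transform of the companion matrix (a sketch using unnormalized noise entries, which share the same limiting companion ESD):

import numpy as np

rng = np.random.default_rng(4)
p, n = 400, 1000
gamma = p / n
X2 = rng.standard_normal((p, n)) / np.sqrt(n)
Cn = X2.T @ X2                                  # n x n companion matrix

ell = 3.0                                       # supercritical: ell > 1 + sqrt(gamma)
rho_ell = ell + gamma * ell / (ell - 1.0)       # rho(ell, gamma)
mu = np.linalg.eigvalsh(Cn)
m_hat = np.mean(1.0 / (mu - rho_ell))           # empirical Stieltjes transform at rho
print(m_hat, "vs", -1.0 / ell)                  # should be close to -1/ell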
To establish (5.23), we start by recalling that Cn = X̄2ᵀX̄2, and introduce the resolvent notation Z(t) = (tIn − Cn)⁻¹, such that Bn(t) = tZ(t) and K(t) = X̄1Bn(t)X̄1ᵀ. From the resolvent identity, that is, A⁻¹ − B⁻¹ = A⁻¹(B − A)B⁻¹ for square invertible A and B, and noting that tZ(t) = CnZ(t) + I from the Woodbury identity, we have, for t1, t2 > bγ,
and, therefore,
Moreover, again by the resolvent identity, , which yields
| (5.24) |
with Bnr(t1, t2) defined as
| (5.25) |
We now characterize the first-order behavior of the two matrix quadratic forms in (5.24). For the first, we simply mirror the arguments of the proof of (5.22) to obtain
For the second, we again apply similar reasoning, operating on the event Jnϵ. Specifically, it is easy to establish that on Jnϵ, and for n sufficiently large that |ρνn – ρν| ≤ ϵ, is bounded. Hence, , and it follows from Lemma 2 and (5.19) that
The expansion in (5.23) is obtained by combining the latter two equations with (5.24).
CLT of K(ρνn)
We now specialize Proposition 2 to the matrix quadratic form K(ρνn).
Proposition 3
Assume Model M, and define ρνn by (1.1) and K(ρνn) by (3.13). Then,
n^{1/2}[K(ρνn) + ρνn m(ρνn; γn)Γ] →d Wν,
which is a symmetric Gaussian random matrix with entries Wν,ij, mean zero, and covariances given by
| (5.26) |
where ρν and ρνn are defined in (1.1), and the terms in parentheses are defined in (1.2) and (1.4).
Proof. Recall that Jnϵ1 = {μ1 ≤ bγ + ϵ}, and consider sufficiently large n such that ρνn > ρν – ϵ. Then, we may apply Proposition 2 with Bn = Bn(ρνn), which is independent of X̄1, and for which ‖Bn‖ is bounded. Specifically, the result follows by applying Proposition 2 to K(ρνn) = X̄1Bn(ρνn)X̄1ᵀ, and particularizing ω, θ, and ϕ in (4.17). These quantities, denoted respectively by ων, θν, and ϕν, can be computed as in [JY, Appendix A], yielding
Tightness properties
Lastly, we establish some tightness properties essential to the derivation of our second-order results.
We first establish a refinement of (5.22). Define K0(ρ; γ) := −ρm(ρ; γ)Γ, such that (5.22) reads K(ρν) − K0(ρν; γ) → 0 almost surely. Set gρ(x) = ρ(ρ − x)−1, and write
In addition, introducing
we have
| (5.27) |
Lemma 3
Assume that Model M holds, and that ℓν is simple. For some b > ρ1, let I denote the interval [bγ + 3ϵ, b]. Then,
| (5.28) |
| (5.29) |
| (5.30) |
| (5.31) |
Proof. The proofs of (5.28)–(5.30) appear in the Supplementary Material, S2. We show (5.31) using the expansion aν = pν + Dνpν + rν given in (3.12), from which we recall that ‖rν‖ = O(‖Dν‖²). We then have aν − pν = Op(‖Dν‖ + ‖Dν‖²). Furthermore, from
the first term is Op(n−1/2) by (5.23) and (5.30), as is the second term by (5.29). Hence,
| (5.32) |
and the proof is completed. ☐
5.2. Eigenvalue fluctuations (Theorem 1-(ii))
The proof of Theorem 1-(ii) relies on the key expansion
| (5.33) |
which is obtained by combining the vector equations K(λ̂ν)aν = λ̂νaν and K0(ρνn;γn)pν = ρνnpν with the expansions (5.24) for K(ρνn) − K(ρν) and (5.27) for K(ρνn) − K0(ρνn;γn). Specifically, we first use K(λ̂ν)aν = λ̂νaν to obtain
| (5.34) |
because from (5.21)–(5.23) and (2.5), and aν − pν = Op(n−1/2) from Lemma 3. In addition, because [K0(ρνn;γn) − ρνnIm]pν = 0, it follows that
| (5.35) |
where the last equality follows from (5.23), (5.27), and (5.28). Combining (5.34) and (5.35) yields (5.33).
The asymptotic normality of n^{1/2}(λ̂ν − ρνn) now follows from Proposition 3, with asymptotic variance
where Wν is the m × m symmetric Gaussian random matrix defined in Proposition 3, with covariance given by (5.26). Using this in the developed expression for the variance above leads to
| (5.36) |
By symmetry and the eigen-equation Γpν = ℓνpν, we have
Therefore, the first sum in (5.36) reduces to , yielding formula (2.6) of Theorem 1.
6. Proofs of the eigenvector results
We now derive the main eigenvector results, presented in Theorem 2 and Theorem 3-(ii).
6.1. Eigenvector inconsistency (Theorem 2-(i))
The convergence result of Theorem 2-(i) follows from two facts, shown below: the almost sure convergence aν → pν, and the almost sure convergence of the quadratic form aνᵀQνaν. Once these facts are established, from (3.10),
which leads to
Proof of aν → pν
This is a direct consequence of (3.12) and
which follows from (5.22), (5.23), and the fact that , given in (2.5).
Proof of the convergence of aνᵀQνaν
With , we have
Rewrite Qν = Qν1 + Qν2. On the high-probability event Jnϵ1 = {μ1 ≤ bγ + ϵ}, with ϵ > 0 such that ρν – bγ ≥ 2ϵ, it is easily established that the corresponding spectral norms are bounded. Hence, Lemma 2 can be applied to Qν1. Moreover, from (5.19) and noting that
with Bn1 defined in (5.25), we have
This and Lemma 2 imply that .
It remains to show . Using a variant of the resolvent identity, that is, A−2 − B−2 = −A−2(A2 − B2)B−2 for square invertible A and B, we rewrite
with . Working on the high-probability event Jnϵ, it can be verified that . Thus, Lemma 2 together with (5.19) imply that . Because , we conclude that .
6.2. Eigenvector fluctuations (Theorem 2-(ii))
Again, we use the key expansion (3.12). Because ‖rν‖ = O(‖Dν‖2) = Op(n−1) from (5.32), we have
Furthermore, using a similar decomposition to the derivation of (5.35),
where we use (5.23) and (5.27), along with (5.28) and (5.30) of Lemma 3. Hence, noting that pνᵀDνpν = 0 from the definition of Dν in (3.12), we have
or equivalently,
where
The CLT for Pᵀaν now follows from Proposition 3. In particular,
where , recall (2.8), and wν = PᵀWνpν, with Wν defined in Proposition 3. The covariance matrix , with . The kth component of wν is given by and, therefore,
| (6.37) |
Theorem 2-(ii) follows after substituting (5.26) for and noting that, when k, l ≠ ν,
6.3. Eigenvector inconsistency in the subcritical case (Theorem 3-(ii))
From (3.10) and (3.11), it suffices to show that aνᵀQνaν → ∞ almost surely in order for Theorem 3-(ii) to hold. We establish this by showing that λmin(Qν) → ∞ almost surely. The approach uses a regularized version of Qν,
for ϵ > 0. Observe that , such that
where (Recall that . We show that , and
| (6.38) |
say. Because λmin(·) is a continuous function on m × m matrices, we conclude that
| (6.39) |
and because cγ(ϵ) ≥ c(bγ + ϵ) and c(bγ + ϵ) ↗ ∞ as ϵ ↘ 0, by [JY, Appendix A], we obtain . We write , with
if we write the singular-value decomposition of , with and define . Evidently, is bounded almost surely. Thus, Lemma 2 may be applied to Qνϵ(bγ), and because
from (5.19), our claim (6.38) follows.
Now consider Δνϵ. Fix a with ‖a‖2 = 1. We have
Because , for μ, ϵ > 0, by the arithmetic-mean–geometric-mean inequality, we have
from Cauchy’s interlacing inequality for eigenvalues of symmetric matrices, Theorem 1-(i) and Theorem 3-(i). Therefore, , and the proof of (6.39) and, hence, of Theorem 3-(ii) is complete.
Acknowledgments
This work was supported, in part, by NIH R01 EB001988 (IMJ, JY), the Hong Kong RGC General Research Fund 16202918 (MRM, DMJ), and a Samsung Scholarship (JY).
Supplementary Material
The online Supplementary Material provides proofs for the following: (i) the Gaussian particularizations of our main results (Corollaries 1 and 2); (ii) the instrumental tightness properties in Lemma 3; and (iii) the asymptotic properties of normalized bilinear forms in Lemma 1 and Proposition 1; see Sections S1, S2, and S3, respectively.
References
- Bai Z and Yao J-F (2008). Central limit theorems for eigenvalues in a spiked population model. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 44(3), 447–474.
- Bai ZD and Silverstein J (2009). Spectral Analysis of Large Dimensional Random Matrices (2nd ed.). New York: Springer.
- Baik J, Ben Arous G, and Péché S (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability 33(5), 1643–1697.
- Baik J and Silverstein JW (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis 97(6), 1382–1408.
- Bao Z, Pan G, and Zhou W (2012). Tracy–Widom law for the extreme eigenvalues of sample correlation matrices. Electronic Journal of Probability 17, 1–32.
- Benaych-Georges F and Nadakuditi RR (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227(1), 494–521.
- Bianchi P, Najim J, Maida M, and Debbah M (2009). Performance analysis of some eigen-based hypothesis tests for collaborative sensing. In 2009 IEEE/SP 15th Workshop on Statistical Signal Processing, pp. 5–8.
- Bloemendal A, Knowles A, Yau H-T, and Yin J (2016). On the principal components of sample covariance matrices. Probability Theory and Related Fields 164(1), 459–552.
- Boik RJ (2003). Principal component models for correlation matrices. Biometrika 90(3), 679–701.
- Cai TT and Jiang T (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Annals of Statistics 39(3), 1496–1525.
- Cai TT and Jiang T (2012). Phase transition in limiting distributions of coherence of high-dimensional random matrices. Journal of Multivariate Analysis 107, 24–39.
- Cocco S, Monasson R, and Sessak V (2011). High-dimensional inference with the generalized Hopfield model: Principal component analysis and corrections. Physical Review E 83(5), 051123.
- Cocco S, Monasson R, and Weigt M (2013). From principal component to direct coupling analysis of co-evolution in proteins: Low-eigenvalue modes are needed for structure prediction. PLoS Computational Biology 9(8), 1–17.
- Cochran D, Gish H, and Sinno D (1995). A geometric approach to multiple-channel signal detection. IEEE Transactions on Signal Processing 43(9), 2049–2057.
- Couillet R and Debbah M (2011). Random Matrix Methods for Wireless Communications. Cambridge University Press.
- Couillet R and Hachem W (2013). Fluctuations of spiked random matrix models and failure diagnosis in sensor networks. IEEE Transactions on Information Theory 59(1), 509–525.
- Dahirel V, Shekhar K, Pereyra F, Miura T, Artyomov M, Talsania S, Allen TM, Altfeld M, Carrington MN, Irvine DJ, Walker BD, and Chakraborty AK (2011). Coordinate linkage of HIV evolution reveals regions of immunological vulnerability. Proceedings of the National Academy of Sciences 108(28), 11530–11535.
- El Karoui N (2009). Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond. Annals of Applied Probability 19(6), 2362–2405.
- Fang C and Krishnaiah P (1982). Asymptotic distributions of functions of the eigenvalues of some random matrices for nonnormal populations. Journal of Multivariate Analysis 12(1), 39–63.
- Gao J, Han X, Pan G, and Yang Y (2017). High dimensional correlation matrices: The central limit theorem and its applications. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(3), 677–693.
- Girshick MA (1939). On the sampling theory of roots of determinantal equations. Annals of Mathematical Statistics 10(3), 203–224.
- Hachem W, Loubaton P, Mestre X, Najim J, and Vallet P (2013). A subspace estimator for fixed rank perturbations of large random matrices. Journal of Multivariate Analysis 114, 427–447.
- Hero A and Rajaratnam B (2011). Large-scale correlation screening. Journal of the American Statistical Association 106(496), 1540–1552.
- Hero A and Rajaratnam B (2012). Hub discovery in partial correlation graphs. IEEE Transactions on Information Theory 58(9), 6064–6078.
- Jiang T (2004a). The asymptotic distributions of the largest entries of sample correlation matrices. Annals of Applied Probability 14(2), 865–880.
- Jiang T (2004b). The limiting distributions of eigenvalues of sample correlation matrices. Sankhyā: The Indian Journal of Statistics (2003–2007) 66(1), 35–48.
- Johnstone IM (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics 29(2), 295–327.
- Johnstone IM and Yang J (2018). Notes on asymptotics of sample eigenstructure for spiked models with non-Gaussian data. arXiv:1810.10427.
- Kollo T and Neudecker H (1993). Asymptotics of eigenvalues and unit-length eigenvectors of sample variance and correlation matrices. Journal of Multivariate Analysis 47(2), 283–300.
- Konishi S (1979). Asymptotic expansions for the distributions of statistics based on the sample correlation matrix in principal component analysis. Hiroshima Mathematical Journal 9(3), 647–700.
- Leshem A and van der Veen A-J (2001). Multichannel detection of Gaussian signals with uncalibrated receivers. IEEE Signal Processing Letters 8(4), 120–122.
- Liu H, Hu Z, Mian A, Tian H, and Zhu X (2014). A new user similarity model to improve the accuracy of collaborative filtering. Knowledge-Based Systems 56, 156–166.
- Mestre X and Vallet P (2017). Correlation tests and linear spectral statistics of the sample correlation matrix. IEEE Transactions on Information Theory 63(7), 4585–4618.
- Paul D (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica 17, 1617–1642.
- Pillai NS and Yin J (2012). Edge universality of correlation matrices. Annals of Statistics 40(3), 1737–1763.
- Plerou V, Gopikrishnan P, Rosenow B, Amaral L, Guhr T, and Stanley H (2002). A random matrix approach to cross-correlations in financial data. Physical Review E 65, 066126.
- Quadeer AA, Louie RHY, Shekhar K, Chakraborty AK, Hsing I-M, and McKay MR (2014). Statistical linkage analysis of substitutions in patient-derived sequences of genotype 1a Hepatitis C virus nonstructural protein 3 exposes targets for immunogen design. Journal of Virology 88(13), 7628–7644.
- Quadeer AA, Morales-Jimenez D, and McKay MR (2018). Co-evolution networks of HIV/HCV are modular with direct association to structure and function. PLoS Computational Biology 14(9), 1–29.
- Ruan D, Meng T, and Gao K (2016). A hybrid recommendation technique optimized by dimension reduction. In 2016 8th International Conference on Modelling, Identification and Control (ICMIC), pp. 429–433.
- Schott JR (1991). A test for a specific principal component of a correlation matrix. Journal of the American Statistical Association 86(415), 747–751.
- Vallet P, Mestre X, and Loubaton P (2015). Performance analysis of an improved MUSIC DoA estimator. IEEE Transactions on Signal Processing 63(23), 6407–6422.
- Xiao H and Zhou W (2010). Almost sure limit of the smallest eigenvalue of some sample correlation matrices. Journal of Theoretical Probability 23(1), 1–20.
- Yang L, McKay MR, and Couillet R (2018). High-dimensional MVDR beamforming: Optimized solutions based on spiked random matrix models. IEEE Transactions on Signal Processing 66(7), 1933–1947.
- Yao J, Zheng S, and Bai Z (2015). Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.