Abstract
Sample correlation matrices are widely used, but for high-dimensional data little is known about their spectral properties beyond “null models”, which assume the data have independent coordinates. In the class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both the leading eigenvalues and eigenvectors of sample correlation matrices, assuming a high-dimensional regime in which the ratio p/n of the number of variables p to the sample size n converges to a positive constant. While the first-order spectral properties of sample correlation matrices match those of sample covariance matrices, their asymptotic distributions can differ significantly. Indeed, the correlation-based fluctuations of both sample eigenvalues and eigenvectors are often remarkably smaller than those of their sample covariance counterparts.
Keywords: Sample correlation, eigenstructure, spiked models
1. Introduction
Estimating a correlation matrix is a fundamental statistical task. It is widely applied in areas such as viral sequence analysis and vaccine design in biology (Dahirel et al., 2011, Quadeer et al., 2014, 2018), large portfolio design in finance (Plerou et al., 2002), signal detection in radio astronomy (Leshem and van der Veen, 2001), and collaborative filtering (Liu et al., 2014, Ruan et al., 2016), among many others. In classical statistical settings, with a limited number of variables p and a large sample size n, the sample correlation matrix performs well and its statistical properties are well understood; see, for example, Girshick (1939), Konishi (1979), Fang and Krishnaiah (1982), Schott (1991), Kollo and Neudecker (1993), and Boik (2003). Modern applications, however, often exhibit high dimensionality, with large p and, in many cases, limited n. In such cases, sample correlation matrices become inaccurate owing to an aggregation of statistical noise across the matrix coordinates that is visible in the eigen-spectrum (El Karoui, 2009). This is particularly important in principal component analysis (PCA), which often involves projecting data onto the leading eigenvectors of the sample correlation matrix or, equivalently, onto those of the sample covariance matrix after standardizing the data.
Despite the extensive use of sample correlation matrices, relatively little is known about theoretical properties of their eigen-spectra in high dimensions. In contrast, sample covariance matrices have been studied extensively, and a rich body of literature now exists (e.g., Yao et al. (2015)). Their asymptotic properties have typically been described in high-dimensional settings in which the number of samples and variables both grow large, often though not always at the same rate, based on the theory of random matrices. Specific first- and second-order results for the eigenvalues and eigenvectors of sample covariance matrices are reviewed in Bai and Silverstein (2009), Couillet and Debbah (2011), and Yao et al. (2015).
For the spectra of high-dimensional sample correlation matrices, current theoretical results focus on the simplest “null model” scenario, in which the data are assumed to be independent. In this null model, correlation matrices share many of the same asymptotic properties as covariance matrices from independent and identically distributed (i.i.d.) data, with zero mean and unit variance. Thus, the empirical eigenvalue distribution converges to the Marchenko–Pastur distribution, almost surely (Jiang, 2004b), and the largest and smallest eigenvalues converge to the edges of this distribution (Jiang, 2004b, Xiao and Zhou, 2010). Moreover, the rescaled largest and smallest eigenvalues asymptotically follow the Tracy–Widom law (Bao et al., 2012, Pillai and Yin, 2012). Central limit theorems (CLTs) for linear spectral statistics have also been derived (Gao et al., 2017). A separate line of work studies the maximum absolute off-diagonal entry of sample correlation matrices, referred to as “coherence” (Jiang, 2004a, Cai and Jiang, 2011, 2012), which has been proposed as a statistic for conducting independence tests; see also Cochran et al. (1995), Mestre and Vallet (2017), and the references therein. Hero and Rajaratnam (2011, 2012) use a related statistic to identify variables exhibiting strong correlations, an approach referred to as “correlation screening.”
For non-trivial correlation models, however, asymptotic results for the spectra of sample correlation matrices are quite scarce. Notably, El Karoui (2009) shows that, for a fairly general class of covariance models with bounded spectral norm, to first order, the eigenvalues of sample correlation matrices asymptotically coincide with those of sample covariance matrices with unit-variance data, generalizing earlier results of Jiang (2004b) and Xiao and Zhou (2010). Under similar covariance assumptions, recent work also presents CLTs for linear spectral statistics of sample correlation matrices (Mestre and Vallet, 2017), extending the work of Gao et al. (2017). First-order behavior again coincides with that of sample covariances; the asymptotic fluctuations, however, are quite different for sample correlation matrices.
This study considers a particular class of correlation matrix models, the so-called “spiked models,” in which a few large or small eigenvalues of the population covariance (or correlation) matrix are assumed to be well separated from the rest (Johnstone, 2001). Spiked covariance models are relevant in applications in which the primary covariance information lies in a relatively small number of eigenmodes. Such applications include collaborative signal detection in cognitive radio systems (Bianchi et al., 2009), fault detection in sensor networks (Couillet and Hachem, 2013), adaptive beamforming in array processing (Hachem et al., 2013, Vallet et al., 2015, Yang et al., 2018), and protein contact prediction in biology (Cocco et al., 2011, 2013). The spectral properties of spiked covariance models have been well studied, with precise analytical results established for the asymptotic first-order and distributional properties of both eigenvalues and eigenvectors; see, for example, Baik et al. (2005), Baik and Silverstein (2006), Paul (2007), Bai and Yao (2008), Benaych-Georges and Nadakuditi (2011), Couillet and Hachem (2013), Bloemendal et al. (2016). For reviews, see also Couillet and Debbah (2011, Chapter 9) and Yao et al. (2015, Chapter 11).
Less is known about the spectrum of sample correlation matrices under spiked models. Although the asymptotic first-order behavior is expected to coincide with that of the sample covariance, as a consequence of El Karoui (2009), a simple simulation reveals striking differences in the fluctuations of both sample eigenvalues and eigenvectors; see Figure 1.
Figure 1:
A simple simulation shows remarkable distributional differences between the sample covariance and sample correlation. From n = 200 i.i.d. Gaussian samples with covariance Σ = blkdiag(Σs, I90), where Σs is a spiked correlation block with parameter r = 0.95, we compute the sample covariance and sample correlation, and show: (a) the empirical density (normalized histogram) of the largest sample eigenvalue, along with a Gaussian distribution with its estimated mean and standard deviation (solid line); and (b) a scatter plot of the leading sample eigenvector, projected onto the second (x-axis) and fourth (y-axis) population eigenvectors. A striking variance reduction is observed in the sample correlation for both (a) and (b). A similar variance reduction is observed for different choices of population eigenvectors in (b); the selected choice (the second and fourth eigenvectors) facilitates the illustration of an additional correlation effect in the sample-to-population eigenvector projections.
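A minimal Python sketch of this experiment is given below. The exact form of the spike block Σs is an assumption here (we take a 10 × 10 equicorrelated block with r = 0.95); the variance-reduction effect being illustrated does not hinge on this particular choice.

import numpy as np

rng = np.random.default_rng(0)
n, m, p, r = 200, 10, 90, 0.95
# Hypothetical spike block: an equicorrelated block with correlation r.
Sigma_s = (1 - r) * np.eye(m) + r * np.ones((m, m))
Sigma = np.block([[Sigma_s, np.zeros((m, p))],
                  [np.zeros((p, m)), np.eye(p)]])
A = np.linalg.cholesky(Sigma)

lam_cov, lam_corr = [], []
for _ in range(1000):
    X = A @ rng.standard_normal((m + p, n))   # columns are i.i.d. N(0, Sigma) samples
    S = X @ X.T / n                           # sample covariance
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                    # sample correlation
    lam_cov.append(np.linalg.eigvalsh(S)[-1])
    lam_corr.append(np.linalg.eigvalsh(R)[-1])

print("largest-eigenvalue std, covariance :", np.std(lam_cov))
print("largest-eigenvalue std, correlation:", np.std(lam_corr))

The printed standard deviation for the correlation case comes out markedly smaller, matching panel (a) of the figure.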
Here, we present theoretical results to describe these observed phenomena. We obtain asymptotic first-order and distributional results for the eigenvalues and eigenvectors of sample correlation matrices under a spiked model. Paul (2007) proved theorems for sample covariance matrices in the special case of Gaussian data. In essence, we present analogs of these theorems for sample correlation matrices, and extend them to non-Gaussian data. To first order, the eigenvalues and eigenvectors coincide asymptotically with those of sample covariance matrices; however, their fluctuations can be very different. Indeed, for both the largest sample correlation eigenvalues (Theorem 1) and the projections of the corresponding eigenvectors (Theorem 2), the asymptotic variances admit a decomposition into three terms. The first term is just the asymptotic variance for sample covariance matrices generated from Gaussian data; the second adds corrections due to non-Gaussianity, and the third captures further corrections due to the data normalization imposed by the sample correlation matrix. (This last amounts to normalizing the entries of the sample covariance matrix using the sample variances.) Consistent with the example shown in Figure 1(a), in the CLT for the leading sample eigenvalues, the sample correlation eigenvalues often show lower fluctuations (despite the variance normalization) than those of the sample covariance eigenvalues. As seen in Figure 1(b), the (normalized) eigenvector projections are typically asymptotically correlated, even for Gaussian data, unlike the sample covariance setting of Paul (2007, Theorem 5).
Technical contributions
We build on and extend a set of random matrix tools for studying spiked covariance models. The companion manuscript (Johnstone and Yang, 2018) [JY] gives an exposition and parallel treatment for sample covariance matrices. Important adaptations are needed here to account for the data normalization imposed by sample correlation matrices. Among the key technical contributions of our work, basic to our main theorems, are asymptotic first-order and distributional properties for bilinear forms and matrix quadratic forms with normalized entries, presented in Section 4. A novel regularization-based proof strategy is used to establish the inconsistency of eigenvector projections in the case of “subcritical” spiked eigenvalues (Theorem 3).
Model M
Let x be an (m + p)-dimensional random vector with finite (4+δ)th moment for some δ > 0. Consider the partition x = (ξᵀ, ηᵀ)ᵀ, where ξ collects the first m coordinates, and η the remaining p.
Assume that ξ has mean zero and covariance Σ, and is independent of η, which has i.i.d. components ηi with mean zero and unit variance. Let ΣD = diag(σ1², …, σm²) be the diagonal matrix containing the variances of the ξi, and let Γ = ΣD^{−1/2} Σ ΣD^{−1/2} be the correlation matrix of ξ with eigen-decomposition Γ = PLPᵀ, where P = [p1, …, pm] is the eigenvector matrix, and L = diag(ℓ1, …, ℓm) contains the spike correlation eigenvalues ℓ1 ≥ … ≥ ℓm > 0.
The correlation matrix of x is therefore Γx = blkdiag(Γ, I), with eigenvalues ℓ1, …, ℓm, 1, …, 1, and corresponding eigenvectors p̃1, …, p̃m, em+1, …, em+p, where p̃ν = (pνᵀ, 0ᵀ)ᵀ and ej is the jth canonical vector (i.e., a vector of all zeros, except for a one in the jth coordinate).
Consider a sequence of i.i.d. copies of x, the first n of which fill the columns of the (m + p) × n data matrix X = (xij). We assume m is fixed, whereas p and n increase such that γn := p/n → γ > 0.
Notation
Let S = n⁻¹XXᵀ be the sample covariance matrix, and SD = diag(S) the diagonal matrix containing the sample variances. Let R = SD^{−1/2} S SD^{−1/2} be the sample correlation matrix, with corresponding νth sample eigenvalue λ̂ν and eigenvector ûν satisfying Rûν = λ̂νûν,
where, for later use, we partition ûν = (û1νᵀ, û2νᵀ)ᵀ. Here, û1ν is the subvector of ûν restricted to the first m coordinates.
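These definitions translate directly into code; a minimal numerical sketch (dimensions illustrative):

import numpy as np

rng = np.random.default_rng(1)
m_plus_p, n = 50, 200
X = rng.standard_normal((m_plus_p, n))    # (m+p) x n data matrix, variables in rows

S = X @ X.T / n                           # sample covariance S = n^{-1} X X^T
d = np.sqrt(np.diag(S))                   # sample standard deviations, i.e. S_D^{1/2}
R = S / np.outer(d, d)                    # sample correlation R = S_D^{-1/2} S S_D^{-1/2}

lam, U = np.linalg.eigh(R)                # eigenvalues ascending, eigenvectors in columns
lam_nu, u_nu = lam[-1], U[:, -1]          # leading sample eigenpair
print(np.allclose(R @ u_nu, lam_nu * u_nu))   # verifies R u = lambda u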
For ℓ > 1 + √γ, define the function ρ(ℓ, γ) := ℓ + γℓ/(ℓ − 1).
For an index ν for which ℓν is a simple eigenvalue, set
ρν := ρ(ℓν, γ) and ρνn := ρ(ℓν, γn). (1.1)
We refer to eigenvalues satisfying ℓν > 1 + √γ as “supercritical,” and those satisfying ℓν ≤ 1 + √γ as “subcritical,” with the quantity 1 + √γ referred to as the “phase transition.”
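A small helper makes the mapping ρ and the phase transition concrete (based on the definitions above; parameter values illustrative, and the subcritical behavior anticipates Theorem 3):

import numpy as np

def rho(ell, gamma):
    # asymptotic location of a supercritical sample eigenvalue
    return ell + gamma * ell / (ell - 1.0)

gamma = 0.5
threshold = 1.0 + np.sqrt(gamma)          # phase transition 1 + sqrt(gamma)
b_gamma = (1.0 + np.sqrt(gamma)) ** 2     # upper bulk edge
for ell in (1.5, 2.0, 5.0):
    if ell > threshold:
        print(f"ell = {ell}: supercritical, sample eigenvalue -> {rho(ell, gamma):.3f}")
    else:
        print(f"ell = {ell}: subcritical, sample eigenvalue -> bulk edge {b_gamma:.3f}")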
To describe and interpret the variance terms in the limiting distributions to follow, we need some definitions. Let ξ̄i := ξi/σi and κij := E[ξ̄iξ̄j] denote the scaled components of ξ and their covariances; of course, κii = 1. The corresponding scaled fourth-order cumulants are
κijkl := E[ξ̄iξ̄jξ̄kξ̄l] − κijκkl − κikκjl − κilκjk. (1.2)
When ξ is Gaussian, κijkl = 0.
The effect of variance scaling in the correlation matrix is described using additional quadratic functions of the ξ̄i, defined by
| (1.3) |
| (1.4) |
Tensor notation
For convenience, it is useful to regard the scaled cumulants κijkl and the quantities defined in (1.3)–(1.4) as entries of four-dimensional tensor arrays. For a pair of such arrays κ and A of the same dimensions, we write ⟨κ, A⟩ := Σi,j,k,l κijkl Aijkl for the corresponding contraction.
2. Main results
Our first main result, proved in Section 5, gives the asymptotic properties of the largest (spike) eigenvalues of the sample correlation matrix:
Theorem 1
Assume Model M, and that ℓν > 1 + √γ is a simple eigenvalue. As p/n → γ > 0,
(i) λ̂ν → ρν almost surely, and (ii) n^{1/2}(λ̂ν − ρνn) →d N(0, σν²), (2.5)
where the asymptotic variance σν² is given by
| (2.6) |
Centering at ρνn rather than at ρν is important. If, for example, γn = γ + an−1/2, then n^{1/2}(ρνn − ρν) = aℓν/(ℓν − 1), and we see a limiting shift. Furthermore, it may also be beneficial to consider σνn² instead of σν², obtained by replacing γ with γn in (2.6), such that n^{1/2}(λ̂ν − ρνn)/σνn →d N(0, 1).
The asymptotic first-order limit in (i), which follows as an easy consequence of El Karoui (2009), coincides with that of the νth largest eigenvalue of a sample covariance matrix computed from data with population covariance Γ (Paul, 2007). This implies that, when constructing R, normalizing by the sample variances has no effect on the leading eigenvalues, at least to first order.
However, key differences are seen when looking at the asymptotic distribution, given in (ii), and in the variance formula (2.6) in particular. This can be readily interpreted. The first term corresponds to the variance in the Gaussian-covariance case of Paul (2007), again for samples with covariance Γ. The second provides a correction of that result for non-Gaussian data; see the companion article [JY]. The third term describes the contribution specific to sample correlation matrices, representing the effect of normalizing the data by the sample variances. This term is often negative, and is evaluated explicitly for Gaussian data in Corollary 1 below, proved in the Supplementary Material, S1.1.
Corollary 1
For ξ Gaussian, the asymptotic variance in Theorem 1 simplifies to
where PD,ν = diag(pν,1, …, pν,m).
Thus, computing the sample correlation results in the asymptotic variance being scaled by , relative to the sample covariance, where
is often positive, implying that spiked eigenvalues of the sample correlation often exhibit a smaller variance than those of the sample covariance. Indeed, such variance reduction occurs iff
| (2.7) |
with the last identity following from the fact that . Condition (2.7), and variance reduction, holds in the following cases:
(i) both Γ and pν have nonnegative entries, or
(ii) , or
In case (i), the inequalities yield (2.7). Note that if Γ has nonnegative entries, then the Perron–Frobenius theorem establishes the existence of an eigenvector with nonnegative components for ℓ1; furthermore, if Γ has positive entries, by the same theorem, ℓ1 is simple and associated with an eigenvector with positive components. Case (ii) follows from , and holds if ℓν > m/2, because . Case (iii) follows from the inequalities and . Note that this is rather special, in that it has nothing to do with eigenvectors, and a necessary condition for it to hold is ℓ1 ≤ 2.
Condition (2.7) can fail, however. For example, for even m and r ∈ (0, 1), consider the block matrix Γ with diagonal blocks 1m/21m/2ᵀ and off-diagonal blocks −r 1m/21m/2ᵀ,
where 1m/2 is the (m/2)-dimensional vector of all ones, which corresponds to two negatively correlated groups of identical random vectors. This has simple supercritical eigenvalues ℓ1 = (1 + r)m/2 and ℓ2 = (1 − r)m/2 when (1 − r)m/2 > 1 + √γ, with pν = m^{−1/2}(1m/2ᵀ, ∓1m/2ᵀ)ᵀ for ν = 1, 2. One finds that Δ2 = (1 − 2r − r²)/2 < 0 for r > √2 − 1, although Δ1 > 0 because ℓ1 > m/2, which implies case (ii).
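The claimed eigenstructure and the sign of Δ2 are quickly verified numerically (a sketch using the block form written above):

import numpy as np

m, r = 6, 0.5                                  # r > sqrt(2) - 1, so Delta_2 < 0
J = np.ones((m // 2, m // 2))
Gamma = np.block([[J, -r * J], [-r * J, J]])   # two negatively correlated groups

lam = np.sort(np.linalg.eigvalsh(Gamma))[::-1]
print(lam[:2], "expected:", (1 + r) * m / 2, (1 - r) * m / 2)

Delta2 = (1 - 2 * r - r**2) / 2
print("Delta_2 =", Delta2, "(negative iff r >", np.sqrt(2) - 1, ")")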
We turn now to the eigenvectors. Again, fix an index ν for which ℓν > 1 + √γ is a simple eigenvalue of Γ, with corresponding eigenvector pν. Recall that ûν is the νth sample eigenvector of R, and let aν := û1ν/‖û1ν‖ be the corresponding normalized subvector of ûν, restricted to the first m coordinates. The next result establishes a limit for the eigenvector projection ⟨ûν, p̃ν⟩, and a CLT for the normalized cross-projections pkᵀaν, k ≠ ν; see Sections 6.1 and 6.2.
Theorem 2
Assume Model M, and that ℓν > 1 + √γ is a simple eigenvalue. Then, as p/n → γ > 0,
where with
| (2.8) |
| (2.9) |
where δk,l = 1 if k = l, and zero otherwise.
The CLT result in (ii) can be rephrased in terms of the entries of aν, for which we readily obtain ; note that Σν has zeros in the νth row and the νth column.
As for the eigenvalues, Theorem 2 shows that the spiked eigenvectors of sample correlation matrices exhibit the same first-order behavior as those of the sample covariance (Paul, 2007). The difference again lies in the asymptotic fluctuations, captured by the covariance matrix Σν. Note that Σν decomposes as the product of a diagonal matrix and a matrix involving the three terms in (2.9). These terms have interpretations similar to those discussed previously for (2.6). That is, the first term captures the asymptotic fluctuations for a Gaussian-covariance model (Paul, 2007), the second term captures the effect of non-Gaussianity in the covariance case [JY], and the third term captures information specific to the correlation case, representing fluctuations due to sample variance normalization. Note that only the first term is diagonal in general, suggesting that the eigenvector projections may be asymptotically correlated, as seen earlier in Figure 1(b), right panel. This holds also for Gaussian data, evaluated explicitly in Corollary 2 below; see Supplementary Material, S1.2, for the proof. We note an interesting contrast with the eigenvector projections for covariance matrices (Paul, 2007), described only by the leading term in (2.9).
Corollary 2
For ξ Gaussian, the asymptotic covariance in Theorem 2 reduces to ,
where , , and ∘ denotes the Hadamard product.
Thus, for Gaussian data, the entries of the asymptotic covariance matrix are given by (for k, l ≠ ν)
Consider now the subcritical case, in which ν is such that ℓν ≤ 1 + √γ. Let pν denote the corresponding population eigenvector, and let λ̂ν and ûν denote the corresponding sample eigenvalue and eigenvector, respectively. With proofs deferred to Sections 5.1 and 6.3, we have the following result:
Theorem 3
Assume Model M, and that ℓν ≤ 1 + √γ is a simple eigenvalue. Then, as p/n → γ > 0, (i) λ̂ν → bγ := (1 + √γ)² almost surely, and (ii) ⟨ûν, p̃ν⟩ → 0 almost surely.
Once again, the asymptotic first-order limits of the sample eigenvalue and its associated eigenvector are the same as those obtained for the sample covariance (Paul, 2007).
Recall that our high-dimensional results assume an asymptotic regime where p/n → γ > 0, as opposed to the classical regime where p is fixed and n → ∞. The case of fixed p corresponds to γ = 0 and the spectral properties of the sample correlation matrix are well understood; see, for example, Girshick (1939), Konishi (1979), Fang and Krishnaiah (1982), Schott (1991), Kollo and Neudecker (1993), and Boik (2003). When γ = 0, the function ρ(ℓ) reduces to the identity. Indeed, for fixed p, there is no high-dimensional component η in Model M, and hence no biasing effect on ρ(ℓ, γ) that occurs when γ > 0. In particular, for fixed p there is no counterpart to our Theorem 3.
To summarize, in comparison to the high-dimensional (p/n → γ > 0) sample covariance setting, our results for the spiked eigenvalues and eigenvectors of sample correlation matrices confirm that the first-order asymptotic behavior is indeed equivalent to that of sample covariance matrices, in agreement with previous results and observations (El Karoui, 2009, Mestre and Vallet, 2017). While the eigenvalue limits in Theorem 1 and Theorem 3 follow as a straightforward consequence of El Karoui (2009), the eigenvector results of Theorem 2-(i) and Theorem 3-(ii) do not. In contrast to the first-order equivalences, important differences arise in the fluctuations of both the eigenvalues and eigenvectors, as shown by the asymptotic distributions of Theorem 1-(ii) and Theorem 2-(ii).
We illustrate these differences with a simple example having covariance Σ = Γ = (1 − r)Im + r1m1mᵀ, where r ∈ [0, 1]; that is, a model with unit variances and constant correlation r across all components. Moreover, ξ is assumed to be Gaussian for simplicity. In this setting, L = diag(ℓ1, 1 − r, …, 1 − r), where ℓ1 = 1 + r(m − 1) is supercritical iff r > √γ/(m − 1). Consider the largest sample eigenvalue in such a supercritical case. From Corollary 1, the asymptotic variances for the sample covariance and the sample correlation can be computed, yielding
respectively, with , and where
Figure 2(a) plots these asymptotic variances versus r for various (γ,m). Indeed, the variance (fluctuation) for the sample correlation is consistently smaller than for the sample covariance. The difference is striking, becoming extremely large as r ↗ 1. Similar trends are observed for various choices of m and γ, being more pronounced for higher m, while not much affected by varying γ. This may be understood from the fact that, after writing Δ = r(2 − r) + (1 − r)2m−1 = 1 − (1 − r)2(1 − m−1),
We turn now to the fluctuations of the leading sample eigenvector, in the same setting as above. Note that, in Corollary 2, for this particular case, one can deduce from PᵀΓP = L that
Also from Corollary 2, the asymptotic variances for the normalized sample-to-population eigenvector projection , in the sample covariance and sample correlation cases, are computed as
respectively, where , and we recall that ℓ1 = 1 − r + rm and ℓ2 = 1 − r. These variances are numerically evaluated in Figure 2(b) for the same parameter choices as before and, again, as functions of r. Note, however, that for better visual appreciation, the range of r has been restricted to supercritical values sufficiently above the critical point r = √γ/(m − 1), because the variance explodes at that point. The comparative evaluation again shows smaller variances for the sample correlation. The variance reduction here is less visible in the graphs, because both Σ1,22 and vanish as r → 1. The ratio, however, behaves quite similarly to the variance ratio :
Figure 2:
Differences in the fluctuations of sample eigenvalues and eigenvectors for an example Gaussian model with constant correlation r across all components. Asymptotic variances are shown for (a) the largest sample eigenvalue, and (b) the normalized sample-to-population eigenvector projection.
We end the discussion of our main results with a few remarks about possible extensions. Our results assume that ℓν > 1 is a simple eigenvalue, but extensions for small spikes with ℓν < 1 and for spikes with multiplicities should be possible. Analogous results for eigenvalues have been obtained for sample covariance matrices for ℓν < 1, including multiplicities greater than one (e.g., see Bai and Yao (2008)), giving reason to expect corresponding results for correlation matrices. Extensions of our results for eigenvalues and eigenvectors of sample correlation matrices for simple ℓν < 1 should be fairly straightforward, though the cases γ < 1, γ = 1, and γ > 1 would need separate treatment. Extensions for spikes with multiplicities are also possible, but in this case the eigenvectors are not well defined and one would need to consider subspace projections, requiring non-trivial modifications of our technical arguments.
The remainder of the paper proceeds as follows. First, in Section 3, we introduce key quantities and identities used in the derivations. Section 4 presents necessary asymptotic properties for bilinear forms and matrix quadratic forms with normalized entries, with the corresponding proofs relegated to the Supplementary Material, Section S3. These properties provide a foundation for describing the asymptotic convergence and distribution of eigenvalues and eigenvectors of sample correlation matrices, derived in Sections 5 and 6 respectively.
As already noted, a parallel treatment for the simpler case of covariance matrices is given in a supplementary manuscript [JY]. This aims at a unified exposition of known spectral properties of spiked covariance matrices as a benchmark for the current work, along with additional citations to the literature.
3. Preliminaries
We begin with a block representation and some associated reductions for the sample correlation matrix R. These are well known in the covariance matrix setting. As with the partition of x in Model M, consider the partition X = (X1ᵀ, X2ᵀ)ᵀ, where the m × n block X1 contains the observations of ξ, and the p × n block X2 contains those of η.
Write SD = blkdiag(SD1, SD2), with SD1 containing the sample variances corresponding to ξ, and SD2 containing those corresponding to η. Define the “normalized” data matrices X̄1 := n^{−1/2} SD1^{−1/2} X1 and X̄2 := n^{−1/2} SD2^{−1/2} X2, such that the blocks of R are given by Rij = X̄iX̄jᵀ for i, j ∈ {1, 2}.
This partitioning of the eigenvector equation Rûν = λ̂νûν, along with Rij = X̄iX̄jᵀ, yields
R11û1ν + R12û2ν = λ̂νû1ν and R21û1ν + R22û2ν = λ̂νû2ν.
From the second equation, û2ν = (λ̂νIp − R22)⁻¹R21û1ν. Substituting this into the first equation yields
K(λ̂ν)û1ν = λ̂νû1ν, where K(t) := R11 + R12(tIp − R22)⁻¹R21.
Thus, λ̂ν is an eigenvalue of the m × m matrix K(λ̂ν), with associated eigenvector û1ν; this reduction is central to our derivations. Note that K(λ̂ν) is well defined if λ̂ν is well separated from the eigenvalues of R22; Section 5.1 shows that this occurs with probability one for all large n when ℓν is supercritical. Furthermore, the normalization condition ‖ûν‖ = 1 yields
1 = ‖û1ν‖² + û1νᵀR12(λ̂νIp − R22)⁻²R21û1ν.
Phrased in terms of the signal-space normalized eigenvector aν := û1ν/‖û1ν‖, we have
‖û1ν‖² = (1 + aνᵀQνaν)⁻¹, with Qν := R12(λ̂νIp − R22)⁻²R21. (3.10)
Note also that the sample-to-population inner product can be rewritten as
⟨ûν, p̃ν⟩ = pνᵀû1ν = ‖û1ν‖ pνᵀaν. (3.11)
In the derivation of our CLT results, we use an eigenvector perturbation formula with quadratic error bound given in [JY, Lemma 13], itself a modification of the arguments in Paul (2007). This yields the key expansion
aν = pν + Dνpν + rν, with ‖rν‖ = O(‖Dν‖²), (3.12)
where the matrix Dν is built from K(λ̂ν) and the spectral gaps of Γ; see [JY, Lemma 13].
The derivations of our eigenvalue and eigenvector results, presented in Sections 5 and 6 respectively, take (3.10), (3.11) and (3.12) as points of departure, and rely on asymptotic properties of the key objects K(λ̂ν) and Qν. In particular, K(t) can be expressed as the random matrix quadratic form
K(t) = X̄1Bn(t)X̄1ᵀ, (3.13)
where, using the Woodbury identity, Bn(t) := t(tIn − Cn)⁻¹, with Cn := X̄2ᵀX̄2.
Thus, our key objects are random quadratic forms involving the normalized data matrices X̄1 and X̄2. The asymptotic properties of these forms are foundational to our results, and are presented next.
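The reductions above are easy to verify numerically; a minimal sketch under the reconstructed forms of K(t) and Bn(t) (the factor injection used to create a spike is illustrative):

import numpy as np

rng = np.random.default_rng(2)
m, p, n = 3, 40, 100
X = rng.standard_normal((m + p, n))
X[:m] += 3.0 * rng.standard_normal((1, n))     # common factor -> one strong spike

S = X @ X.T / n
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)
lam = np.linalg.eigvalsh(R)[-1]                # leading sample eigenvalue of R

R11, R12, R22 = R[:m, :m], R[:m, m:], R[m:, m:]
K = R11 + R12 @ np.linalg.solve(lam * np.eye(p) - R22, R12.T)
print(np.min(np.abs(np.linalg.eigvalsh(K) - lam)))   # ~0: lam is an eigenvalue of K(lam)

Xbar = X / (np.sqrt(n) * d[:, None])           # normalized data, so that R = Xbar Xbar^T
X1b, X2b = Xbar[:m], Xbar[m:]
Cn = X2b.T @ X2b                               # n x n companion matrix
Bn = lam * np.linalg.inv(lam * np.eye(n) - Cn)
print(np.allclose(X1b @ Bn @ X1b.T, K))        # Woodbury form K(t) = Xbar_1 Bn(t) Xbar_1^T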
4. Quadratic forms with normalized entries
In this section, we establish the first-order (deterministic) convergence and a CLT for matrix quadratic forms of the type X̄1BnX̄1ᵀ, where Bn is a matrix with bounded spectral norm. While essential to our purposes, some of the technical results may be of independent interest; thus, we first present the general results, and then apply them in the context of Model M.
4.1. First-order convergence
To establish the first-order convergence, we first require some results on bilinear forms involving correlated random vectors of unit length. A main technical result (see Supplementary Material, S3.1) is the following:
Lemma 1
Let B be an n × n nonrandom symmetric matrix, and let x, y be random vectors with i.i.d. entries having mean zero, variance one, E[x1y1] = ρ, and suitable finite higher-order moments. Let x̃ = x/‖x‖ and ỹ = y/‖y‖. Then, for any s ≥ 1,
where Cs is a constant depending only on s.
This is a generalization of Gao et al. (2017, Lemma 5), which established a corresponding bound for normalized quadratic forms. Lemma 1 leads to the following first-order convergence result:
Corollary 3
Let x, y be random vectors of i.i.d. entries with mean zero, variance one, finite (4 + δ)th moments for some δ > 0, and E[x1y1] = ρ. Define x̃ = x/‖x‖ and ỹ = y/‖y‖, and let Bn be a sequence of n × n symmetric matrices, with ‖Bn‖ bounded. Then,
x̃ᵀBnỹ − ρ n⁻¹tr(Bn) → 0 almost surely.
Proof. Because the (4 + δ)th moment and ‖Bn‖ are bounded, from Lemma 1,
The convergence then follows from Markov’s inequality and the Borel–Cantelli lemma. ☐
We now apply this result to Model M, with random matrices Bn independent of the normalized data matrix X̄1:
Lemma 2
Assume Model M, and suppose that (Bn) is a sequence of random symmetric matrices, independent of X̄1, for which ‖Bn‖ is Oa.s.(1). Then,
X̄1BnX̄1ᵀ − n⁻¹tr(Bn)Γ → 0 almost surely.
Proof. This follows from Fubini’s theorem. Specifically, one may use the arguments in the proof of [JY, Lemma 5], applying Corollary 3, and noting that X̄1 is independent of Bn. ☐
4.2. Central Limit Theorem
To establish our main matrix quadratic-form CLT result, we first derive a CLT for scalar bilinear forms involving normalized random vectors. To this end, we must introduce some further notation. Consider zero-mean random vectors , with
where . Assume ; that is, all components of the x and y vectors have unit variance and . We first introduce notation for some quadratic functions of xl, yl. Let , with
Let X = (xli)M×n and Y = (yli)M×n be data matrices based on n i.i.d. observations of (x, y), and define the “normalized” data matrices and , where , , and ,. Then, we use the following notation for the rows and of the normalized data matrices
With this setup, we have the following result, proved in the Supplementary Material, S3.2:
Proposition 1
Let Bn = (bn,ij) be random symmetric n × n matrices, independent of X, Y, such that for some finite β, ‖Bn‖ ≤ β for all n, and
all finite. In addition, define , with components
Then, , with
| (4.14) |
where K = K1 − J and J,K1,K2 are matrices defined by
| (4.15) |
The entries of K are fourth-order cumulants of x and y:
| (4.16) |
Hence, K vanishes if x, y are Gaussian.
The corresponding result with unnormalized vectors is established in [JY, Theorem 10]. The terms θJ + ωK appear in that case, and the additional term ϕK2 reflects the normalization in x̃ and ỹ. As in [JY], the proof is based on the martingale CLT, rather than the moment method used in Bai and Yao (2008), which stated a similar result for quadratic forms involving unnormalized random vectors.
While potentially of independent interest, Proposition 1 is important for our purposes through its application to Model M.
Proposition 2
Assume Model M, and consider Bn as in Proposition 1. Then,
Wn := n^{1/2}[X̄1BnX̄1ᵀ − n⁻¹tr(Bn)Γ] →d W,
where W is a symmetric m × m Gaussian matrix with entries Wij, mean zero, and covariances given by
| (4.17) |
for i ≤ j and i′ ≤ j′.
Proof. The result follows from Proposition 1 by turning the matrix quadratic form into a vector of bilinear forms; see, for example, [JY, Proposition 6] and Bai and Yao (2008, Proposition 3.1). Specifically, use an index l for the M = m(m + 1)/2 pairs (i, j), with 1 ≤ i ≤ j ≤ m. Build the random vectors (x, y) for Proposition 1 as follows: if l = (i, j), then set xl = ξi/σi and yl = ξj/σj. In the resulting covariance matrix C for (x, y), if also l′ = (i′, j′),
and, in particular, and , whereas . Component Wn,ij corresponds to component Zl in Proposition 1. Thus, we conclude that , where W is a Gaussian matrix with zero mean and , given by Proposition 1. It remains to interpret the quantities in (4.14) in terms of Model M. Substituting and into (4.16) and chasing definitions, we obtain and . Observing that zl = xlyl = χij and , we similarly find that . ☐
5. Proofs of the eigenvalue results
In this section, we derive the main eigenvalue results, presented in Theorem 1 and Theorem 3-(i).
5.1. Preliminaries
Convergence properties of the eigenvalues of R22
It is well known that the empirical spectral distribution (ESD) of S22 converges weakly a.s. to the Marchenko–Pastur (MP) law Fγ, and that the extreme non-trivial eigenvalues converge to the edges of the support of Fγ. For the sample correlation case, Jiang (2004b) shows that the same is true for R22. That is, the empirical distribution of the eigenvalues μ1 ≥ … ≥ μp of the “noise” correlation matrix R22 = X̄2X̄2ᵀ converges weakly a.s. to the MP law Fγ, supported on [aγ, bγ] := [(1 − √γ)², (1 + √γ)²] if γ ≤ 1, and on {0} ∪ [aγ, bγ] otherwise. Furthermore, the ESD of the n × n companion matrix Cn = X̄2ᵀX̄2, denoted by F̄n, converges weakly a.s. to the “companion MP law” F̄γ = (1 − γ)1[0,∞) + γFγ, where 1A denotes the indicator function on set A.
In addition, Jiang (2004b) shows that
μ1 → bγ almost surely. (5.18)
Based on these results, if fn → f uniformly as continuous functions on the closure of a bounded neighborhood of the support of F̄γ, then
∫ fn dF̄n → ∫ f dF̄γ almost surely. (5.19)
If supp(F̄n) is not contained in this neighborhood, then the integral on the left side may not be defined. However, such an event occurs for at most finitely many n, with probability one.
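These null-model facts are easily illustrated numerically (a minimal sketch; dimensions illustrative):

import numpy as np

rng = np.random.default_rng(3)
p, n = 400, 1000
gamma = p / n
X2 = rng.standard_normal((p, n))           # i.i.d. "noise" block

S22 = X2 @ X2.T / n
d = np.sqrt(np.diag(S22))
R22 = S22 / np.outer(d, d)                 # noise sample correlation matrix

mu = np.linalg.eigvalsh(R22)
a_edge = (1 - np.sqrt(gamma)) ** 2         # Marchenko-Pastur edges a_gamma, b_gamma
b_edge = (1 + np.sqrt(gamma)) ** 2
print(f"empirical extremes [{mu[0]:.3f}, {mu[-1]:.3f}] "
      f"vs MP edges [{a_edge:.3f}, {b_edge:.3f}]")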
Almost sure limit of λ̂ν
The statements in Theorem 1-(i) and Theorem 3-(i) follow easily from known results. Specifically, denote by λνS the νth eigenvalue of the sample covariance matrix computed from standardized data (that is, with population covariance Γx). The almost sure limits
λνS → ρν if ℓν > 1 + √γ, and λνS → bγ otherwise, (5.20)
were established in Baik and Silverstein (2006). From the proof of El Karoui (2009, Lemma 1),
Therefore, the same almost sure limits as (5.20) hold for .
High-probability events Jnϵ, Jnϵ1
When necessary, we may confine attention to the event Jnϵ := {|λ̂ν − ρν| ≤ ϵ} or Jnϵ1 := {μ1 ≤ bγ + ϵ}, with ϵ > 0 chosen such that ρν − bγ ≥ 3ϵ, because, from (2.5) (proven above) and (5.18), these events occur with probability one for all large n.
Asymptotic expansion of K(ρνn)
We establish an asymptotic stochastic expansion for the matrix quadratic form K(ρνn). Specifically, using the decomposition
| (5.21) |
we show that
| (5.22) |
and
| (5.23) |
where, for t > bγ,
Here, m(·; γ) is the Stieltjes transform of the companion distribution F̄γ.
In establishing (5.22), start by taking sufficiently large n such that |ρνn – ρν| ≤ ϵ, with ϵ defined as above. For such n, on Jnϵ1, we have
Because Jnϵ1 holds with probability one for all large n, ‖Bn(ρνn)‖ = Oa.s.(1) and, therefore, it follows from Lemma 2 that
In addition, (5.19) yields
Explicit evaluation gives m(ρν; γ) = −1/ℓν [JY, Appendix A], and (5.22) follows.
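Though not needed for the argument, this evaluation is easily checked numerically against the empirical Stieltjes transform of the companion matrix (a sketch using unnormalized noise entries, which share the same limiting companion ESD):

import numpy as np

rng = np.random.default_rng(4)
p, n = 400, 1000
gamma = p / n
X2 = rng.standard_normal((p, n)) / np.sqrt(n)
Cn = X2.T @ X2                                  # n x n companion matrix

ell = 3.0                                       # supercritical: ell > 1 + sqrt(gamma)
rho_ell = ell + gamma * ell / (ell - 1.0)       # rho(ell, gamma)
mu = np.linalg.eigvalsh(Cn)
m_hat = np.mean(1.0 / (mu - rho_ell))           # empirical Stieltjes transform at rho
print(m_hat, "vs", -1.0 / ell)                  # should be close to -1/ell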
To establish (5.23), we start by recalling that Cn = X̄2ᵀX̄2, and introduce the resolvent notation Z(t) = (tIn − Cn)⁻¹, such that Bn(t) = tZ(t) and K(t) = X̄1Bn(t)X̄1ᵀ. From the resolvent identity, that is, A⁻¹ − B⁻¹ = A⁻¹(B − A)B⁻¹ for square invertible A and B, and noting that tZ(t) = CnZ(t) + I from the Woodbury identity, we have, for t1, t2 > bγ,
and, therefore,
Moreover, again by the resolvent identity, , which yields
| (5.24) |
with Bnr(t1, t2) defined as
| (5.25) |
We now characterize the first-order behavior of the two matrix quadratic forms in (5.24). For the first, we simply mirror the arguments of the proof of (5.22) to obtain
For the second, we again apply similar reasoning, operating on the event Jnϵ. Specifically, it is easy to establish that on Jnϵ, and for n sufficiently large that |ρνn – ρν| ≤ ϵ, is bounded. Hence, , and it follows from Lemma 2 and (5.19) that
The expansion in (5.23) is obtained by combining the latter two equations with (5.24).
CLT of K(ρνn)
We now specialize Proposition 2 to the matrix quadratic form K(ρνn).
Proposition 3
Assume Model M, and define ρνn by (1.1) and K(ρνn) by (3.13). Then,
n^{1/2}[K(ρνn) + ρνn m(ρνn; γn)Γ] →d Wν,
which is a symmetric Gaussian random matrix with entries Wν,ij, mean zero, and covariances given by
| (5.26) |
where ρν and ρνn are defined in (1.1), and the terms in parentheses are defined in (1.2) and (1.4).
Proof. Recall that Jnϵ1 = {μ1 ≤ bγ + ϵ}, and consider sufficiently large n such that ρνn > ρν – ϵ. Then, we may apply Proposition 2 with Bn = Bn(ρνn), which is independent of X̄1, and for which ‖Bn‖ is bounded. Specifically, the result follows by applying Proposition 2 to K(ρνn) = X̄1Bn(ρνn)X̄1ᵀ, and particularizing ω, θ, and ϕ in (4.17). These quantities, denoted respectively by ων, θν, and ϕν, can be computed as in [JY, Appendix A], yielding
Tightness properties
Lastly, we establish some tightness properties essential to the derivation of our second-order results.
We first establish a refinement of (5.22). Define K0(ρ; γ) := −ρm(ρ; γ)Γ, such that (5.22) reads K(ρν) − K0(ρν; γ) → 0 almost surely. Set gρ(x) = ρ(ρ − x)−1, and write
In addition, introducing
we have
| (5.27) |
Lemma 3
Assume that Model M holds, and that ℓν is simple. For some b > ρ1, let I denote the interval [bγ + 3ϵ, b]. Then,
| (5.28) |
| (5.29) |
| (5.30) |
| (5.31) |
Proof. The proofs of (5.28)–(5.30) appear in the Supplementary Material, S2. We show (5.31) using the expansion aν = pν + Dνpν + rν given in (3.12), from which we recall that ‖rν‖ = O(‖Dν‖²). We then have aν − pν = Op(‖Dν‖ + ‖Dν‖²). Furthermore, from
the first term is Op(n−1/2) by (5.23) and (5.30), as is the second term by (5.29). Hence,
| (5.32) |
and the proof is completed. ☐
5.2. Eigenvalue fluctuations (Theorem 1-(ii))
The proof of Theorem 1-(ii) relies on the key expansion
| (5.33) |
which is obtained by combining the vector equations K(λ̂ν)aν = λ̂νaν and K0(ρνn;γn)pν = ρνnpν with the expansions (5.24) for K(ρνn) − K(ρν) and (5.27) for K(ρνn) − K0(ρνn;γn). Specifically, we first use K(λ̂ν)aν = λ̂νaν to obtain
| (5.34) |
because from (5.21)–(5.23) and (2.5), and aν − pν = Op(n−1/2) from Lemma 3. In addition, because [K0(ρνn;γn) − ρνnIm]pν = 0, it follows that
| (5.35) |
where the last equality follows from (5.23), (5.27), and (5.28). Combining (5.34) and (5.35) yields (5.33).
The asymptotic normality of n^{1/2}(λ̂ν − ρνn) now follows from Proposition 3, with asymptotic variance
where Wν is the m × m symmetric Gaussian random matrix defined in Proposition 3, with covariance given by (5.26). Using this in the developed expression for the variance above leads to
| (5.36) |
By symmetry and the eigen-equation Γpν = ℓνpν, we have
Therefore, the first sum in (5.36) reduces to , yielding formula (2.6) of Theorem 1.
6. Proofs of the eigenvector results
We now derive the main eigenvector results, presented in Theorem 2 and Theorem 3-(ii).
6.1. Eigenvector inconsistency (Theorem 2-(i))
The convergence result of Theorem 2-(i) follows from two facts, shown below: the almost sure convergence aν → pν, and the almost sure convergence of the quadratic form aνᵀQνaν. Once these facts are established, from (3.10),
which leads to
Proof of aν → pν
This is a direct consequence of (3.12) and
which follows from (5.22), (5.23), and the fact that , given in (2.5).
Proof of the convergence of aνᵀQνaν
With , we have
Rewrite Qν = Qν1 + Qν2. On the high-probability event Jnϵ1 = {μ1 ≤ bγ + ϵ}, with ϵ > 0 such that ρν – bγ ≥ 2ϵ, it is easily established that the corresponding spectral norms are bounded. Hence, Lemma 2 can be applied to Qν1. Moreover, from (5.19) and noting that
with Bn1 defined in (5.25), we have
This and Lemma 2 imply that .
It remains to show . Using a variant of the resolvent identity, that is, A−2 − B−2 = −A−2(A2 − B2)B−2 for square invertible A and B, we rewrite
with . Working on the high-probability event Jnϵ, it can be verified that . Thus, Lemma 2 together with (5.19) imply that . Because , we conclude that .
6.2. Eigenvector fluctuations (Theorem 2-(ii))
Again, we use the key expansion (3.12). Because ‖rν‖ = O(‖Dν‖2) = Op(n−1) from (5.32), we have
Furthermore, using a similar decomposition to the derivation of (5.35),
where we use (5.23) and (5.27), along with (5.28) and (5.30) of Lemma 3. Hence, noting that pνᵀDνpν = 0 from the definition of Dν in (3.12), we have
or equivalently,
where
The CLT for Pᵀaν now follows from Proposition 3. In particular,
where , recall (2.8), and wν = PᵀWνpν, with Wν defined in Proposition 3. The covariance matrix , with . The kth component of wν is given by and, therefore,
| (6.37) |
Theorem 2-(ii) follows after substituting (5.26) for and noting that, when k, l ≠ ν,
6.3. Eigenvector inconsistency in the subcritical case (Theorem 3-(ii))
From (3.10) and (3.11), it suffices to show that aνᵀQνaν → ∞ almost surely in order for Theorem 3-(ii) to hold. We establish this by showing that λmin(Qν) → ∞ almost surely. The approach uses a regularized version of Qν,
for ϵ > 0. Observe that , such that
where (Recall that . We show that , and
| (6.38) |
say. Because λmin(·) is a continuous function on m × m matrices, we conclude that
| (6.39) |
and because cγ(ϵ) ≥ c(bγ + ϵ) and c(bγ + ϵ) ↗ ∞ as ϵ ↘ 0, by [JY, Appendix A], we obtain . We write , with
if we write the singular-value decomposition of , with and define . Evidently, is bounded almost surely. Thus, Lemma 2 may be applied to Qνϵ(bγ), and because
from (5.19), our claim (6.38) follows.
Now consider Δνϵ. Fix a with ‖a‖2 = 1. We have
Because , for μ, ϵ > 0, by the arithmetic-mean–geometric-mean inequality, we have
from Cauchy’s interlacing inequality for eigenvalues of symmetric matrices, Theorem 1-(i) and Theorem 3-(i). Therefore, , and the proof of (6.39) and, hence, of Theorem 3-(ii) is complete.
Acknowledgments
This work was supported, in part, by NIH R01 EB001988 (IMJ, JY), the Hong Kong RGC General Research Fund 16202918 (MRM, DMJ), and a Samsung Scholarship (JY).
Supplementary Material
The online Supplementary Material provides proofs for the following: (i) the Gaussian particularizations of our main results (Corollaries 1 and 2); (ii) the instrumental tightness properties in Lemma 3; and (iii) the asymptotic properties of normalized bilinear forms in Lemma 1 and Proposition 1; see Sections S1, S2, and S3, respectively.
References
- Bai Z and Yao J-F (2008). Central limit theorems for eigenvalues in a spiked population model. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 44(3), 447–474.
- Bai ZD and Silverstein J (2009). Spectral Analysis of Large Dimensional Random Matrices (2nd ed.). New York: Springer.
- Baik J, Ben Arous G, and Péché S (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability 33(5), 1643–1697.
- Baik J and Silverstein JW (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis 97(6), 1382–1408.
- Bao Z, Pan G, and Zhou W (2012). Tracy–Widom law for the extreme eigenvalues of sample correlation matrices. Electronic Journal of Probability 17, 1–32.
- Benaych-Georges F and Nadakuditi RR (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227(1), 494–521.
- Bianchi P, Najim J, Maida M, and Debbah M (2009). Performance analysis of some eigen-based hypothesis tests for collaborative sensing. In 2009 IEEE/SP 15th Workshop on Statistical Signal Processing, pp. 5–8.
- Bloemendal A, Knowles A, Yau H-T, and Yin J (2016). On the principal components of sample covariance matrices. Probability Theory and Related Fields 164(1), 459–552.
- Boik RJ (2003). Principal component models for correlation matrices. Biometrika 90(3), 679–701.
- Cai TT and Jiang T (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Annals of Statistics 39(3), 1496–1525.
- Cai TT and Jiang T (2012). Phase transition in limiting distributions of coherence of high-dimensional random matrices. Journal of Multivariate Analysis 107, 24–39.
- Cocco S, Monasson R, and Sessak V (2011). High-dimensional inference with the generalized Hopfield model: Principal component analysis and corrections. Physical Review E 83(5), 051123.
- Cocco S, Monasson R, and Weigt M (2013). From principal component to direct coupling analysis of co-evolution in proteins: Low-eigenvalue modes are needed for structure prediction. PLoS Computational Biology 9(8), 1–17.
- Cochran D, Gish H, and Sinno D (1995). A geometric approach to multiple-channel signal detection. IEEE Transactions on Signal Processing 43(9), 2049–2057.
- Couillet R and Debbah M (2011). Random Matrix Methods for Wireless Communications. Cambridge University Press.
- Couillet R and Hachem W (2013). Fluctuations of spiked random matrix models and failure diagnosis in sensor networks. IEEE Transactions on Information Theory 59(1), 509–525.
- Dahirel V, Shekhar K, Pereyra F, Miura T, Artyomov M, Talsania S, Allen TM, Altfeld M, Carrington MN, Irvine DJ, Walker BD, and Chakraborty AK (2011). Coordinate linkage of HIV evolution reveals regions of immunological vulnerability. Proceedings of the National Academy of Sciences 108(28), 11530–11535.
- El Karoui N (2009). Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond. Annals of Applied Probability 19(6), 2362–2405.
- Fang C and Krishnaiah P (1982). Asymptotic distributions of functions of the eigenvalues of some random matrices for nonnormal populations. Journal of Multivariate Analysis 12(1), 39–63.
- Gao J, Han X, Pan G, and Yang Y (2017). High dimensional correlation matrices: The central limit theorem and its applications. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(3), 677–693.
- Girshick MA (1939). On the sampling theory of roots of determinantal equations. Annals of Mathematical Statistics 10(3), 203–224.
- Hachem W, Loubaton P, Mestre X, Najim J, and Vallet P (2013). A subspace estimator for fixed rank perturbations of large random matrices. Journal of Multivariate Analysis 114, 427–447.
- Hero A and Rajaratnam B (2011). Large-scale correlation screening. Journal of the American Statistical Association 106(496), 1540–1552.
- Hero A and Rajaratnam B (2012). Hub discovery in partial correlation graphs. IEEE Transactions on Information Theory 58(9), 6064–6078.
- Jiang T (2004a). The asymptotic distributions of the largest entries of sample correlation matrices. Annals of Applied Probability 14(2), 865–880.
- Jiang T (2004b). The limiting distributions of eigenvalues of sample correlation matrices. Sankhyā: The Indian Journal of Statistics (2003–2007) 66(1), 35–48.
- Johnstone IM (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics 29(2), 295–327.
- Johnstone IM and Yang J (2018). Notes on asymptotics of sample eigenstructure for spiked models with non-Gaussian data. arXiv:1810.10427.
- Kollo T and Neudecker H (1993). Asymptotics of eigenvalues and unit-length eigenvectors of sample variance and correlation matrices. Journal of Multivariate Analysis 47(2), 283–300.
- Konishi S (1979). Asymptotic expansions for the distributions of statistics based on the sample correlation matrix in principal component analysis. Hiroshima Mathematical Journal 9(3), 647–700.
- Leshem A and van der Veen A-J (2001). Multichannel detection of Gaussian signals with uncalibrated receivers. IEEE Signal Processing Letters 8(4), 120–122.
- Liu H, Hu Z, Mian A, Tian H, and Zhu X (2014). A new user similarity model to improve the accuracy of collaborative filtering. Knowledge-Based Systems 56, 156–166.
- Mestre X and Vallet P (2017). Correlation tests and linear spectral statistics of the sample correlation matrix. IEEE Transactions on Information Theory 63(7), 4585–4618.
- Paul D (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica 17, 1617–1642.
- Pillai NS and Yin J (2012). Edge universality of correlation matrices. Annals of Statistics 40(3), 1737–1763.
- Plerou V, Gopikrishnan P, Rosenow B, Amaral L, Guhr T, and Stanley H (2002). A random matrix approach to cross-correlations in financial data. Physical Review E 65, 066126.
- Quadeer AA, Louie RHY, Shekhar K, Chakraborty AK, Hsing I-M, and McKay MR (2014). Statistical linkage analysis of substitutions in patient-derived sequences of genotype 1a Hepatitis C virus nonstructural protein 3 exposes targets for immunogen design. Journal of Virology 88(13), 7628–7644.
- Quadeer AA, Morales-Jimenez D, and McKay MR (2018). Co-evolution networks of HIV/HCV are modular with direct association to structure and function. PLoS Computational Biology 14(9), 1–29.
- Ruan D, Meng T, and Gao K (2016). A hybrid recommendation technique optimized by dimension reduction. In 2016 8th International Conference on Modelling, Identification and Control (ICMIC), pp. 429–433.
- Schott JR (1991). A test for a specific principal component of a correlation matrix. Journal of the American Statistical Association 86(415), 747–751.
- Vallet P, Mestre X, and Loubaton P (2015). Performance analysis of an improved MUSIC DoA estimator. IEEE Transactions on Signal Processing 63(23), 6407–6422.
- Xiao H and Zhou W (2010). Almost sure limit of the smallest eigenvalue of some sample correlation matrices. Journal of Theoretical Probability 23(1), 1–20.
- Yang L, McKay MR, and Couillet R (2018). High-dimensional MVDR beamforming: Optimized solutions based on spiked random matrix models. IEEE Transactions on Signal Processing 66(7), 1933–1947.
- Yao J, Zheng S, and Bai Z (2015). Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.