Author manuscript; available in PMC: 2011 Mar 23.
Published in final edited form as: Ann Stat. 2010 Jan 1;38(6):3605–3629. doi: 10.1214/10-AOS821

CONVERGENCE AND PREDICTION OF PRINCIPAL COMPONENT SCORES IN HIGH-DIMENSIONAL SETTINGS

Seunggeun Lee 1, Fei Zou 1, Fred A Wright 1
PMCID: PMC3062912  NIHMSID: NIHMS274568  PMID: 21442047

Abstract

A number of settings arise in which it is of interest to predict Principal Component (PC) scores for new observations using data from an initial sample. In this paper, we demonstrate that naive approaches to PC score prediction can be substantially biased towards 0 in the analysis of large matrices. This phenomenon is largely related to known inconsistency results for sample eigenvalues and eigenvectors as both dimensions of the matrix increase. For the spiked eigenvalue model for random matrices, we expand the generality of these results, and propose bias-adjusted PC score prediction. In addition, we compute the asymptotic correlation coefficient between PC scores from sample and population eigenvectors. Simulation and real data examples from the genetics literature show the improved bias and numerical properties of our estimators.

Keywords and phrases: PCA, PC Scores, Random Matrix, PC Regression

1. Introduction

Principal component analysis (PCA) [19] is one of the leading statistical tools for analyzing multivariate data. It is especially popular in genetics/genomics, medical imaging, and chemometrics studies, where high-dimensional data are common. PCA is typically used as a dimension reduction tool. A small number of top-ranked principal component (PC) scores are computed by projecting the data onto the spaces spanned by the eigenvectors of the sample covariance matrix, and are used to summarize the data characteristics that contribute most to data variation. These PC scores can subsequently be used for data exploration and/or model predictions. For example, in genome-wide association studies (GWAS), PC scores are used to estimate ancestries of study subjects and as covariates to adjust for population stratification [24, 27]. In gene expression microarray studies, PC scores are used as synthetic “eigen-genes” or “meta-genes” intended to represent and discover gene expression patterns that might not be discernible from single-gene analysis [30].

Although PCA is widely applied in a number of settings, much of our theoretical understanding rests on a relatively small body of literature. Girshick [12] introduced the idea that the eigenvectors of the sample covariance matrix are maximum likelihood estimators. Here a key concept in a population view of PCA is that the data arise as p-variate values from a distinct set of n independent samples. Later, the asymptotic distributions of the eigenvalues and eigenvectors of the sample covariance matrix (i.e., the sample eigenvalues and eigenvectors) were derived for the situation where n goes to infinity and p is fixed [2, 13]. With the development of modern high-throughput technologies, it is not uncommon to have data where p is comparable in size to n, or substantially larger. Under the assumption that p and n grow at the same rate, that is, p/n → γ > 0, there has been considerable effort to establish convergence results for sample eigenvalues and eigenvectors (see the review [5]). The convergence of the sample eigenvalues and eigenvectors under the “spiked population” model proposed by Johnstone [18] has also been established [7, 23, 26]. For this model it is well known that the sample eigenvectors are not consistent estimators of the eigenvectors of the population covariance (i.e., the population eigenvectors) [17, 23, 26]. Furthermore, Paul [26] has derived the degree of discrepancy in terms of the angle between the sample and population eigenvectors, under Gaussian assumptions for 0 < γ < 1. More recently, Nadler [23] has extended the same result to the more general γ > 0 using a matrix perturbation approach.

These results have considerable potential practical utility in understanding the behavior of PC analysis and prediction in modern datasets, for which p may be large. The practical goals of this paper focus primarily on the prediction of PC scores for samples which were not included in the original PC analysis. For example, gene expression data of new breast cancer patients may be collected, and we might want to estimate their PC scores in order to classify their cancer sub-type. The recalculation of PCs using both new and old data might not be practical, e.g. if the application of PCs from gene expression is used as a diagnostic tool in clinical applications. For GWAS analysis, it is known that PC analysis which includes related individuals tends to generate spurious PC scores which do not reflect the true underlying population substructures. To overcome this problem, it is common practice to include only one individual per family/sibship in the initial PC analysis. Another example arises in cross-validation for PC regression, in which PC scores for the test set might be derived using PCA performed on the training set [16]. For all of these applications, the predicted PC scores for a new sample are usually estimated in the “naive” fashion, in which the data vector of the new sample is multiplied by the sample eigenvectors from the original PC analysis. Indeed, there appears to be relatively little recognition in the genetics or data mining literature that this approach may lead to misleading conclusions.

For low dimensional data, where p is fixed as n increases or otherwise much smaller than n, the predicted PC scores are nearly unbiased and well-behaved. However, for high-dimensional data, particularly with p > n, they tend to be biased and shrunken towards 0. The following simple example of a stratified population with three strata illustrates the shrinkage phenomenon for predicted PC scores. We generated a training data set with n = 100 and p = 5000. Among the 100 samples, 50 are from stratum 1, 30 are from stratum 2 and the rest from stratum 3. For each stratum, we first created a p-dimensional mean vector μk (k = 1, 2, 3). Each element of each mean vector was created by drawing randomly with replacement from {−0.3, 0, 0.3}, and thereafter considered a fixed property of the stratum. Then for each sample from the kth stratum, its p covariates were simulated from the multivariate normal distribution MVN(μk, 4I), where I is the p×p identity matrix. A test dataset with the same sample size and μk vectors was also simulated. Figure 1 shows that the predicted PC scores for the test data are much closer to 0 compared to the scores from the training data. This shrinkage phenomenon may create a serious problem if the predicted PC scores are used to classify new test samples, perhaps by similarity to previous apparent clusters in the original data. In addition, the predicted PC scores may produce incorrect results if used for downstream analyses (e.g., as covariates in association analyses).

Fig 1. Simulation results for p = 5000 and n = (50, 30, 20). Different symbols represent different groups. White background represents the training set and grey background the test set. A) First two PC scores of all simulated samples. B) Center of each group.
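The shrinkage seen in Figure 1 can be reproduced with a short simulation. The sketch below is a minimal NumPy illustration of the setup described above (stratum means drawn from {−0.3, 0, 0.3}, noise covariance 4I); the variable names are ours, and the projection of the test set onto the training eigenvectors is exactly the naive prediction discussed in Section 2.

```python
import numpy as np

rng = np.random.default_rng(0)
p, sizes = 5000, [50, 30, 20]                                  # 100 training samples in 3 strata
means = [rng.choice([-0.3, 0.0, 0.3], size=p) for _ in sizes]  # fixed stratum mean vectors

def simulate(sizes, means):
    # Each sample is MVN(mu_k, 4I); rows are variables, columns are samples (p x n).
    rows = [rng.normal(mu, 2.0, size=(n_k, len(mu))) for n_k, mu in zip(sizes, means)]
    return np.vstack(rows).T

X_train = simulate(sizes, means)        # p x n training matrix
X_test = simulate(sizes, means)         # independent test matrix with the same strata

# Sample eigenvectors of S = X X^T / n, obtained from the SVD of the training data.
U, svals, _ = np.linalg.svd(X_train, full_matrices=False)
train_scores = U[:, :2].T @ X_train     # sample PC scores
test_scores = U[:, :2].T @ X_test       # naively predicted PC scores

print(np.abs(train_scores).mean(axis=1))  # typical magnitude of the training scores ...
print(np.abs(test_scores).mean(axis=1))   # ... clearly exceeds that of the predicted scores
```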

In this paper, we investigate the degree of shrinkage bias associated with the predicted PC scores, and then propose new bias-adjusted PC score estimates. As the shrinkage phenomenon is largely related to the limiting behavior of the sample eigenvectors, our first step is to describe the discrepancy between the sample and population eigenvectors. To this end, we adopt the assumption that p and n are both large and grow at the same rate. By applying and extending results from random matrix theory, we establish the convergence of the sample eigenvalues and eigenvectors under the spiked population model. We generalize Theorem 4 of Paul [26], which describes the asymptotic angle between sample and population eigenvectors, to non-Gaussian random variables for any γ > 0. We further derive the asymptotic angle between PC scores from sample eigenvectors and population eigenvectors, and the asymptotic shrinkage factor of the PC score predictions. Finally, we construct estimators of the angles and the shrinkage factor. The theoretical results are presented in Section 2.

In Section 3, we report simulations to assess the finite-sample accuracy of the proposed asymptotic angle and shrinkage factor estimators. We also show the potential improvements in prediction accuracy for PC regression obtained by using the bias-adjusted PC scores. In Section 4, we apply our PC analysis to a real genome-wide association study, which demonstrates that the shrinkage phenomenon occurs in real studies and that adjustment is needed.

2. Method

2.1. General Setting

Throughout this paper, we use the superscript T to denote matrix transpose, →_p to denote convergence in probability, and →_{a.s.} to denote almost sure convergence. Let Λ = diag(λ_1, λ_2, …, λ_p) be a p × p diagonal matrix with λ_1 ≥ λ_2 ≥ ··· ≥ λ_p, and E = [e_1, …, e_p] a p × p orthogonal matrix.

Define the p × n data matrix X = [x_1, …, x_n], where x_j is the p-dimensional vector corresponding to the jth sample. For the remainder of the paper, we assume the following:

Assumption 1

X = EΛ^{1/2}Z, where Z = {z_ij} is a p × n matrix whose elements z_ij are i.i.d. random variables with E(z_ij) = 0, E(z_ij^2) = 1, and E(z_ij^4) < ∞.

Although the z_ij are i.i.d., Assumption 1 allows for very flexible covariance structures for X, and thus the results of this paper are quite general. The population covariance matrix of X is Σ = EΛE^T. The sample covariance matrix S equals

$$S = XX^T/n = E\Lambda^{1/2}ZZ^T\Lambda^{1/2}E^T/n.$$

The λ_k are the underlying population eigenvalues. The spiked population model defined in [18] assumes that all the population eigenvalues are 1, except the first m eigenvalues. That is, λ_1 ≥ λ_2 ≥ ··· ≥ λ_m > λ_{m+1} = ··· = λ_p = 1. The spectral decomposition of the sample covariance matrix is

$$S = UDU^T,$$

where D = diag(d_1, d_2, …, d_p) is a diagonal matrix of the ordered sample eigenvalues and U = [u_1, …, u_p] is the corresponding p × p sample eigenvector matrix. Then the PC score matrix is P = [p_1, p_2, …, p_n], where p_v^T = u_v^T X is the vth sample PC score. For a new observation x_new, its predicted PC score is similarly defined as U^T x_new, with the vth (PC) score equal to q_v = u_v^T x_new.
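In code, all of the quantities defined above can be obtained from a single SVD of X. The following is a minimal sketch (NumPy assumed; the function and variable names are ours, not from the paper):

```python
import numpy as np

def pca_and_naive_prediction(X, x_new):
    """X: p x n data matrix; x_new: p-vector for a new observation.
    Returns the sample eigenvalues d, eigenvectors U, the PC score matrix U^T X,
    and the naive (unadjusted) predicted scores q = U^T x_new."""
    n = X.shape[1]
    U, svals, _ = np.linalg.svd(X, full_matrices=False)
    d = svals ** 2 / n          # eigenvalues of S = X X^T / n
    P = U.T @ X                 # rows of P are the sample PC scores p_v^T
    q = U.T @ x_new             # naive predicted PC scores for the new sample
    return d, U, P, q
```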

2.2. Sample Eigenvalues and Eigenvectors

Under the classical setting of fixed p, it is well known that the sample eigenvalues and eigenvectors are consistent estimators of the corresponding population eigenvalues and eigenvectors [3]. Under the “large p, large n” framework, however, the consistency is not guaranteed. The following two lemmas summarize and extend some known convergence results.

Lemma 1

Let p/n → γ ≥ 0 as n → ∞.

  i) When γ = 0,
$$d_v \xrightarrow{a.s.} \begin{cases} \lambda_v, & \text{for } v \le m \\ 1, & \text{for } v > m; \end{cases} \qquad (1)$$
  ii) When γ > 0,
$$d_v \xrightarrow{a.s.} \begin{cases} \rho(\lambda_v), & \text{for } v \le k \\ (1+\sqrt{\gamma})^2, & \text{for } v = k+1, \end{cases} \qquad (2)$$

    where k is the number of λ_v greater than 1 + √γ, and ρ(x) = x(1 + γ/(x − 1)).

The result in ii) is due to Baik and Silverstein [7], while the proof of i) can be found in Section 6.3. The result in i) shows that when γ = 0, the sample eigenvalues converge to the corresponding population eigenvalues, which is consistent with the classical PC result where p is fixed. The result in ii) shows that for any non-zero γ, d_v is no longer a consistent estimator of λ_v. However, a consistent estimator of λ_v can be constructed from Equation (2). Define

$$\rho^{-1}(d) = \frac{d + 1 - \gamma + \sqrt{(d + 1 - \gamma)^2 - 4d}}{2}.$$

Then ρ^{-1}(d_v) is a consistent estimator of λ_v when λ_v > 1 + √γ. Furthermore, Baik et al. [6] have shown the √n-consistency of d_v for ρ(λ_v), and Bai and Yao [4] have shown that d_v is asymptotically normal.
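The functions ρ and ρ^{-1} are simple to evaluate numerically; a minimal sketch (NumPy assumed, names ours) is given below, together with a round-trip check.

```python
import numpy as np

def rho(lam, gamma):
    """rho(x) = x * (1 + gamma / (x - 1)), the a.s. limit of d_v when lambda_v > 1 + sqrt(gamma)."""
    return lam * (1.0 + gamma / (lam - 1.0))

def rho_inv(d, gamma):
    """Consistent estimator of lambda_v from a sample eigenvalue d above (1 + sqrt(gamma))^2."""
    s = d + 1.0 - gamma
    return (s + np.sqrt(s * s - 4.0 * d)) / 2.0

gamma = 50.0                                   # e.g. p = 5000, n = 100
lam = 4.0 * (1.0 + np.sqrt(gamma))
print(lam, rho_inv(rho(lam, gamma), gamma))    # the two values agree
```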

Lemma 2

Suppose p/n → γ ≥ 0 as n → ∞. Let ⟨·, ·⟩ denote the inner product between two vectors. Under the assumption of multiplicity one,

  i) if 0 < γ < 1 and the z_ij follow the standard normal distribution, then
$$\langle e_v, u_v\rangle \xrightarrow{a.s.} \begin{cases} \varphi(\lambda_v), & \text{if } \lambda_v > 1+\sqrt{\gamma} \\ 0, & \text{if } 1 < \lambda_v \le 1+\sqrt{\gamma}; \end{cases} \qquad (3)$$
  ii) removing the normality assumption on the z_ij, the following weaker convergence result holds for all γ ≥ 0:
$$\langle e_v, u_v\rangle \xrightarrow{p} \begin{cases} \varphi(\lambda_v), & \text{if } \lambda_v > 1+\sqrt{\gamma} \\ 0, & \text{if } 1 < \lambda_v \le 1+\sqrt{\gamma}. \end{cases} \qquad (4)$$

    Here $\varphi(x) = \sqrt{\left(1 - \dfrac{\gamma}{(x-1)^2}\right)\Big/\left(1 + \dfrac{\gamma}{x-1}\right)}$.

The inner product between two unit vectors is the cosine of the angle between them. Thus, Lemma 2 shows the convergence of the angle between the population and sample eigenvectors. For i), Paul [26] proved it for 0 < γ < 1, while Nadler [23] obtained the same conclusion for γ > 0 using the matrix perturbation approach under the Gaussian random noise model. We relax the Gaussian assumption on z and prove ii) for γ ≥ 0 in Section 6.4. The result of ii) is general enough for the application of PCA to, for example, genome-wide association mapping, where each entry of X is a standardized variable of SNP genotypes, which are typically coded as {0, 1, 2}, corresponding to discrete genotypes.

2.3. Sample and Predicted PC Scores

In this section, we first discuss convergence of the sample PC scores, which forms the basis for the investigation of the shrinkage phenomenon of the predicted PC scores. For the sample PC scores, we have

Theorem 1

Let g_v^T = e_v^T X/√(nλ_v) be the normalized vth PC score derived from the corresponding population eigenvector e_v, and p̃_v = p_v/√(nd_v) the normalized vth sample PC score. Suppose p/n → γ ≥ 0 as n → ∞. Under the multiplicity one assumption,

$$\langle g_v, \tilde p_v\rangle \xrightarrow{p} \begin{cases} \sqrt{1 - \dfrac{\gamma}{(\lambda_v-1)^2}}, & \text{if } \lambda_v > 1+\sqrt{\gamma} \\ 0, & \text{if } 1 < \lambda_v \le 1+\sqrt{\gamma}. \end{cases} \qquad (5)$$

The proof can be found in Section 6.7. In PC analysis, the sample PC scores are typically used to estimate certain latent variables (largely the PC scores from population eigenvectors) that represent the underlying data structures. The above result allows us to quantify the accuracy of the sample PC scores. Note that here ⟨g_v, p̃_v⟩ is the correlation coefficient between g_v and p̃_v. Compared to Equation (3) in Lemma 2, the angle between the PC scores is smaller than the angle between their corresponding eigenvectors.
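Both limits are easy to tabulate. The sketch below (names ours) evaluates the asymptotic cosine for eigenvectors (Lemma 2) and for PC scores (Theorem 1); with γ = 1 and λ = 8, as in the simulations of Section 3, it returns approximately 0.93 and 0.99, matching the "Angle" columns of Table 1.

```python
import numpy as np

def cos_angle_eigenvectors(lam, gamma):
    """Asymptotic cosine between the sample and population eigenvectors (Lemma 2)."""
    if lam <= 1.0 + np.sqrt(gamma):
        return 0.0
    return np.sqrt((1.0 - gamma / (lam - 1.0) ** 2) / (1.0 + gamma / (lam - 1.0)))

def cos_angle_scores(lam, gamma):
    """Asymptotic correlation between PC scores from the sample and population
    eigenvectors (Theorem 1); never smaller than the eigenvector cosine."""
    if lam <= 1.0 + np.sqrt(gamma):
        return 0.0
    return np.sqrt(1.0 - gamma / (lam - 1.0) ** 2)

print(cos_angle_eigenvectors(8.0, 1.0), cos_angle_scores(8.0, 1.0))  # ~0.93, ~0.99
```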

Before we formally derive the asymptotic shrinkage factor for the predicted PC scores, we first describe in mathematical terms the shrinkage phenomenon that was demonstrated in the Introduction. Note that the first population eigenvector e1 satisfies

$$e_1 = \arg\max_{a:\,a^Ta=1} E\left((a^Tx)^2\right)$$

for a random vector x that follows the same distribution as the x_j. For the data matrix X, its first sample eigenvector u_1 satisfies

$$u_1 = \arg\max_{a:\,a^Ta=1} \sum_{j=1}^n (a^Tx_j)^2.$$

Assuming that u1 and the new sample xnew are independent of each other, we have

$$E\left((u_1^Tx_{new})^2\right) = E\left(E\left(u_1^Tx_{new}x_{new}^Tu_1 \mid u_1\right)\right) = E\left(u_1^T E(x_{new}x_{new}^T)\, u_1\right) = E\left(u_1^T\Sigma u_1\right) \le e_1^T\Sigma e_1 = E\left((e_1^Tx_{new})^2\right). \qquad (6)$$

Since the u_1^Tx_j (j = 1, …, n) are identically distributed,

$$nE\left((e_1^Tx_j)^2\right) = E\left(\sum_{j=1}^n (e_1^Tx_j)^2\right) \le E\left(\sum_{j=1}^n (u_1^Tx_j)^2\right) = nE\left((u_1^Tx_j)^2\right). \qquad (7)$$

By (6) and (7), we can show that

$$E\left((u_1^Tx_{new})^2\right) \le E\left((e_1^Tx_{new})^2\right) = E\left((e_1^Tx_j)^2\right) \le E\left((u_1^Tx_j)^2\right),$$

which demonstrates the shrinkage feature of the predicted PC scores. The amount of the shrinkage, or the asymptotic shrinkage factor, is given by the following theorem:

Theorem 2

Suppose p/n → γ ≥ 0 as n → ∞ and λ_v > 1 + √γ. Under the multiplicity one assumption,

$$\sqrt{\frac{E(q_v^2)}{E(p_{vj}^2)}} \xrightarrow{n\to\infty} \frac{\lambda_v - 1}{\lambda_v + \gamma - 1}, \qquad (8)$$

where pvj is the jth element of pv.

The proof is given in Section 6.8. We call (λ_v − 1)/(λ_v + γ − 1) the (asymptotic) shrinkage factor for a new subject. As shown, the shrinkage factor is smaller than 1 if γ > 0. Quite sensibly, it is a decreasing function of γ and an increasing function of λ_v. The bias of the predicted PC score can be potentially large for high-dimensional data where p is substantially greater than n, and/or for data with relatively minor underlying structure where λ_v is small.
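The factor is a one-line computation; the sketch below (names ours) reproduces, for example, the theoretical factors in Table 2.

```python
def shrinkage_factor(lam, gamma):
    """Asymptotic shrinkage factor (lambda - 1) / (lambda + gamma - 1) of Theorem 2,
    valid for lambda > 1 + sqrt(gamma)."""
    return (lam - 1.0) / (lam + gamma - 1.0)

print(shrinkage_factor(8.0, 1.0))      # ~0.88: gamma = 1, PC 1 in Table 2
print(shrinkage_factor(4.0, 1.0))      # ~0.75: gamma = 1, PC 2
print(shrinkage_factor(44.0, 100.0))   # ~0.30: gamma = 100, PC 1 (heavy shrinkage)
```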

2.4. Rescaling of sample eigenvalues

The previous theorems are based on the assumption that all except the top m eigenvalues are equal to 1. Even under the spiked eigenvalue model, some rescaling of the sample eigenvalues may be necessary with real data.

For a given dataset, let the ordered population eigenvalues be Λ* = {ζλ_1, …, ζλ_m, ζ, …, ζ}, where ζ ≠ 1, and let the corresponding sample eigenvalues be D* = {d_1^*, …, d_n^*}. We can show that Equations (4), (5), and (8) still hold under these circumstances. However, ρ^{-1}(d_v^*) is no longer a consistent estimator of λ_v, because

$$d_v^* \xrightarrow{a.s.} \zeta\lambda_v\left(1 + \frac{\gamma}{\lambda_v - 1}\right) = \zeta\rho(\lambda_v).$$

To address this issue, Baik and Silverstein [7] have proposed a simple approach to estimate ζ. In their method, the top significantly large sample eigenvalues are first separated from the remaining, grouped sample eigenvalues. Then ζ is estimated as the ratio between the average of the grouped sample eigenvalues and the mean determined by the Marchenko–Pastur law [22]. To separate the eigenvalues, they suggested using a scree plot of the percent variance versus component number. However, for real data we may not be able to clearly separate the sample eigenvalues in such a manner and readily apply this approach. Thus we need an automated method which does not require a clear separation of the sample eigenvalues.

The expectation of the sum of the sample eigenvalues when ζ = 1 is

$$E\left(\sum_{v=1}^p d_v\right) = E(\mathrm{trace}(S)) = \mathrm{trace}(E(S)) = \mathrm{trace}(\Sigma) = \sum_{v=1}^p \lambda_v.$$

Thus, the sum of the rescaled eigenvalues is expected to be close to (∑_{v=1}^m λ_v + p − m). Let r_v = d_v^*/(∑_{v=1}^p d_v^*), and let d̂_v be a properly rescaled eigenvalue; then d̂_v should be very close to r_v(∑_{v=1}^m λ_v + p − m). Note that p/(∑_{v=1}^m λ_v + p − m) → 1 for fixed m and λ_v. Thus p·r_v is a properly adjusted eigenvalue. However, for finite n and p, the difference between p and (∑_{v=1}^m λ_v + p − m) can be substantial, especially when the first several λ_v are considerably larger than 1. To reduce this difference, we propose a novel method which iteratively estimates (∑_{v=1}^m λ_v + p − m) and d̂_v, as follows (a code sketch is given after the steps below).

  1. Initially set d̂_{v,0} = p·r_v.

  2. For the lth iteration, set λ̂_{v,l} = ρ^{-1}(d̂_{v,l−1}) if d̂_{v,l−1} > (1+√γ)², and λ̂_{v,l} = 1 if d̂_{v,l−1} ≤ (1+√γ)². Define k_l as the number of λ̂_{v,l} that are greater than 1, and let
$$\hat d_{v,l} = \left(\sum_{v=1}^{k_l}\hat\lambda_{v,l} + p - k_l\right) r_v.$$
  3. If ∑_{v=1}^{k_l} λ̂_{v,l} + p − k_l converges, let
$$\hat d_v = \hat d_{v,l}$$

and stop. Otherwise, go to step 2.
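A minimal sketch of this iteration is given below (NumPy assumed). It uses the rho_inv function from the earlier sketch; the tolerance and iteration cap are illustrative choices, not part of the algorithm as stated.

```python
import numpy as np

def rescale_eigenvalues(d, p, gamma, tol=1e-8, max_iter=100):
    """d: nonzero sample eigenvalues in decreasing order; p: data dimension.
    Returns rescaled eigenvalues d_hat_v, consistent for rho(lambda_v)."""
    d = np.asarray(d, dtype=float)
    r = d / d.sum()                          # r_v = d_v / sum of all sample eigenvalues
    thresh = (1.0 + np.sqrt(gamma)) ** 2
    total = float(p)                         # step 1: d_hat_{v,0} = p * r_v
    for _ in range(max_iter):
        d_hat = total * r
        lam_hat = np.ones_like(d_hat)        # lambda_hat_{v,l} = 1 below the threshold
        spiked = d_hat > thresh
        lam_hat[spiked] = rho_inv(d_hat[spiked], gamma)
        k = int(spiked.sum())                # number of lambda_hat greater than 1
        new_total = lam_hat[spiked].sum() + p - k
        if abs(new_total - total) < tol:     # step 3: stop once the sum has converged
            break
        total = new_total                    # step 2: update and repeat
    return total * r
```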

The consistency of d̂_v for ρ(λ_v) is shown in the following theorem.

Theorem 3

Let d̂_v be the rescaled sample eigenvalue from the proposed algorithm. Then, for λ_v > 1 + √γ with multiplicity one,

$$\hat d_v \xrightarrow{p} \rho(\lambda_v).$$

Since ρ^{-1}(d̂_v) →_p λ_v, φ(ρ^{-1}(d̂_v))² is a consistent estimator of φ(λ_v)². Combining this fact with Theorems 2 and 3, we obtain the bias-adjusted PC score q_v^*,

$$q_v^* = q_v\,\frac{\rho^{-1}(\hat d_v) + \gamma - 1}{\rho^{-1}(\hat d_v) - 1},$$

and the asymptotic correlation coefficient between g_v and p̃_v,

$$\sqrt{1 - \frac{\gamma}{\left(\rho^{-1}(\hat d_v) - 1\right)^2}}.$$
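Putting the pieces together, the bias adjustment and the estimated correlation are each one line given the rescaled eigenvalues; the sketch below (names ours) assumes rho_inv and rescale_eigenvalues from the earlier sketches.

```python
import numpy as np

def adjust_predicted_scores(q, d_hat, gamma):
    """Bias-adjusted predicted PC scores: multiply each naive score q_v by
    (rho_inv(d_hat_v) + gamma - 1) / (rho_inv(d_hat_v) - 1)."""
    lam_hat = rho_inv(np.asarray(d_hat, dtype=float), gamma)
    return np.asarray(q, dtype=float) * (lam_hat + gamma - 1.0) / (lam_hat - 1.0)

def estimated_score_correlation(d_hat, gamma):
    """Estimated asymptotic correlation between PC scores from sample and
    population eigenvectors: sqrt(1 - gamma / (rho_inv(d_hat_v) - 1)^2)."""
    lam_hat = rho_inv(np.asarray(d_hat, dtype=float), gamma)
    return np.sqrt(1.0 - gamma / (lam_hat - 1.0) ** 2)
```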

3. Simulation

First, we applied our bias adjustment process to the simulated data described in the Introduction. Our estimated asymptotic shrinkage factors are 0.465 and 0.329 for the first and second PC scores, respectively. The scatter plot of the top two bias adjusted PC scores is given in Figure 2. After the bias adjustment, the predicted PC scores of the test data are comparable to those of the training data. This indicates that our method is effective in correcting for the shrinkage bias.

Fig 2. Shrinkage-adjusted PC scores of the data in Figure 1. Different symbols represent different groups. White background represents the training set and grey background the test set. A) Plot of all simulated samples. B) Center of each group.

Next, we conducted a new simulation to check the accuracy of our estimators. For the jth sample (j = 1, …, n), its ith variable was generated as

$$x_{ij} = \begin{cases} \sqrt{\lambda_1}\, z_{ij}, & i = 1 \\ \sqrt{\lambda_2}\, z_{ij}, & i = 2 \\ z_{ij}, & i > 2, \end{cases}$$

where λ_1 > λ_2 > 1 and z_ij ~ N(0, 2²). Under this setting, λ_1 and λ_2 are the first and second population eigenvalues. The first and second population eigenvectors are e_1 = {1, 0, …, 0} and e_2 = {0, 1, 0, …, 0}, respectively. We set the standard deviation of z_ij to 2 instead of 1, which allows us to test whether the rescaling procedure works properly. We tried different values of γ and n, but set λ_1 and λ_2 to 4(1+√γ) and 2(1+√γ), respectively.

We split the simulated samples into test and training sets, each with n samples. We first estimated the asymptotic shrinkage factor based on the training samples. We then calculated the predicted PC scores on the test samples. To assess the accuracy of the shrinkage factor estimator for each PC, we empirically estimated the shrinkage factor by the square root of the ratio of the mean squared predicted PC scores of the test samples to the mean squared PC scores of the training samples. That is, for the vth PC, the empirical shrinkage factor is estimated by √(∑_{i=1}^n q_{vi}²/∑_{k=1}^n p_{vk}²). On the training samples, we also estimated the empirical angle between the sample and (known) population eigenvectors, as well as the empirical angle between PC scores from sample and population eigenvectors. The asymptotic theoretical estimates were also calculated. Tables 1 and 2 summarize the simulation results. Our asymptotic estimators provide accurate estimates for the angles and the shrinkage factor.
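For reference, the empirical shrinkage factor used in this comparison can be computed as in the following sketch (names ours; U denotes the eigenvector matrix estimated from the training data).

```python
import numpy as np

def empirical_shrinkage(U, X_train, X_test, v):
    """Square root of the ratio of the mean squared predicted PC scores (test set)
    to the mean squared sample PC scores (training set) for the vth component."""
    p_train = U[:, v] @ X_train   # sample PC scores of the training samples
    q_test = U[:, v] @ X_test     # naive predicted PC scores of the test samples
    return np.sqrt(np.mean(q_test ** 2) / np.mean(p_train ** 2))
```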

Table 1.

Cosine angle estimates of eigenvectors and PC scores based on 1,000 simulations. “Angle” indicates the theoretical asymptotic cos(angle), “Estimate1” the empirical cos(angle) estimator, and “Estimate2” the asymptotic cos(angle) estimator. For each estimator, each entry is the mean of 1,000 simulation results with the standard error in parentheses.

γ   n   PC 1: Angle   Angle Estimate1   Angle Estimate2   PC 2: Angle   Angle Estimate1   Angle Estimate2

Eigenvectors
1 100 0.93 0.93(0.013) 0.91(0.027) 0.82 0.81(0.053) 0.80(0.052)
200 0.93(0.009) 0.92(0.014) 0.81(0.030) 0.81(0.032)
20 100 0.70 0.69(0.037) 0.70(0.031) 0.51 0.50(0.053) 0.50(0.058)
200 0.69(0.023) 0.70(0.022) 0.51(0.036) 0.51(0.041)
100 100 0.53 0.53(0.034) 0.53(0.031) 0.37 0.35(0.043) 0.35(0.047)
200 0.53(0.024) 0.53(0.024) 0.36(0.029) 0.36(0.033)
500 100 0.38 0.38(0.029) 0.38(0.028) 0.25 0.24(0.033) 0.24(0.037)
200 0.38(0.020) 0.38(0.020) 0.25(0.021) 0.25(0.024)

PC Scores
1 100 0.99 0.99(0.004) 0.98(0.016) 0.94 0.93(0.036) 0.91(0.048)
200 0.99(0.003) 0.99(0.006) 0.94(0.019) 0.93(0.024)
20 100 0.98 0.97(0.083) 0.98(0.008) 0.89 0.86(0.105) 0.87(0.055)
200 0.97(0.055) 0.98(0.005) 0.88(0.073) 0.88(0.036)
100 100 0.97 0.97(0.079) 0.97(0.009) 0.88 0.85(0.109) 0.86(0.060)
200 0.97(0.058) 0.97(0.006) 0.86(0.076) 0.87(0.039)
500 100 0.97 0.96(0.084) 0.97(0.010) 0.87 0.83(0.117) 0.84(0.069)
200 0.96(0.058) 0.97(0.007) 0.86(0.076) 0.86(0.038)

Table 2.

Shrinkage factor estimates based on 1,000 simulations. “Factor” indicates the theoretical asymptotic factor, “Estimate1” the empirical shrinkage factor estimator, and “Estimate2” the asymptotic shrinkage factor estimator. For each estimator, each entry is the mean of 1,000 simulation results with the standard error in parentheses.

γ   n   PC 1: Factor   Factor Estimate1   Factor Estimate2   PC 2: Factor   Factor Estimate1   Factor Estimate2
1 100 0.88 0.88(0.017) 0.87(0.076) 0.75 0.75(0.044) 0.76(0.063)
200 0.88(0.013) 0.87(0.054) 0.75(0.027) 0.75(0.044)
20 100 0.51 0.51(0.037) 0.51(0.038) 0.33 0.34(0.033) 0.32(0.038)
200 0.51(0.025) 0.51(0.026) 0.34(0.022) 0.33(0.028)
100 100 0.30 0.30(0.024) 0.30(0.030) 0.17 0.17(0.019) 0.17(0.023)
200 0.30(0.017) 0.30(0.023) 0.18(0.013) 0.17(0.017)
500 100 0.16 0.15(0.014) 0.16(0.020) 0.08 0.08(0.010) 0.08(0.013)
200 0.15(0.010) 0.16(0.014) 0.08(0.007) 0.08(0.009)

Finally, we conducted a simulation to demonstrate an application of the bias-adjusted PC scores in PC regression. PC regression has been widely used in microarray gene-expression studies [9]. In this simulation, we let p = 5,000, and our setup is very similar to the first simulation of Bair et al. [8]. Let x_ij denote the gene expression level of the ith gene for the jth subject. We generated each x_ij according to

$$x_{ij} = \begin{cases} 3 + \varepsilon, & i \le g,\; j \le n/2 \\ 4 + \varepsilon, & i \le g,\; j > n/2 \\ 3.5 + \varepsilon, & i > g, \end{cases}$$

and the outcome variable yj as

$$y_j = \frac{2}{g}\sum_{i=1}^{g} x_{ij} + \varepsilon_y,$$

where n is the number of samples, g is the number of genes that are differentially expressed and associated with the phenotype, ε ~ N(0, 2²), and ε_y ~ N(0, 1). A total of eight different combinations of n and g were simulated. For the training data, we fit the PC regression with the first PC as the covariate and computed the mean squared error (MSE). For the test samples, simulated with the same configuration as the training samples, we applied the PC model built on the training data to predict the phenotypes using the unadjusted and adjusted PC scores. The results are presented in Table 3. We see that the MSE of the test set without bias adjustment is appreciably higher than that of the test set with bias adjustment, and the MSE of the test set with bias adjustment is comparable with the MSE of the training set.
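The sketch below outlines how the adjusted scores enter the PC regression comparison; it is our illustration under the setup above (shrink denotes the estimated asymptotic shrinkage factor for the first PC), not the exact code used for Table 3.

```python
import numpy as np

def pc_regression_mse(X_train, y_train, X_test, y_test, shrink):
    """Fit y on the first PC score of the training data, then compute the test MSE
    with naive and with shrinkage-adjusted predicted PC scores."""
    U, _, _ = np.linalg.svd(X_train, full_matrices=False)
    s_train = U[:, 0] @ X_train                       # first PC scores (training)
    beta = np.polyfit(s_train, y_train, 1)            # simple least-squares fit
    q = U[:, 0] @ X_test                              # naive predicted scores
    mse_naive = np.mean((np.polyval(beta, q) - y_test) ** 2)
    mse_adjusted = np.mean((np.polyval(beta, q / shrink) - y_test) ** 2)
    return mse_naive, mse_adjusted
```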

Table 3.

Mean squared error (MSE) of the PC regression based on the gene-expression microarray data simulation, with and without shrinkage adjustment. 1,000 simulations were conducted. Each entry in the table represents the mean of the MSE with the standard error in parentheses.

n g Test Data without Adjustment Test Data with Adjustment Training Data
100 150 1.97(0.256) 1.70(0.284) 1.61(0.284)
100 300 1.63(0.230) 1.17(0.167) 1.12(0.158)
100 500 1.43(0.204) 1.07(0.157) 1.03(0.147)
100 1000 1.22(0.182) 1.03(0.148) 0.99(0.142)
200 150 1.73(0.159) 1.33(0.133) 1.30(0.131)
200 300 1.39(0.139) 1.08(0.105) 1.07(0.110)
200 500 1.24(0.131) 1.04(0.105) 1.01(0.101)
200 1000 1.10(0.114) 1.02(0.101) 1.00(0.101)

4. Real data example

Here we demonstrate that the shrinkage phenomenon appears in real data, and can be adjusted by our method. For this purpose, genetic data on samples from unrelated individuals in the Phase 3 HapMap study [http://hapmap.ncbi.nlm.nih.gov/] were used. HapMap is a dense genotyping study designed to elucidate population genetic differences. The genetic data are discrete, assuming the values 0, 1, or 2 at each genomic marker (single-nucleotide polymorphism, SNP) for each individual. Data from CEU individuals (of northern and western European ancestry) were compared with data from TSI individuals (Toscani individuals from Italy, representing southern European ancestry).

Some initial data trimming steps are standard in genetic analysis. We first removed apparently related samples, then removed genomic markers with more than a 10% missing rate and those with minor allele frequency less than 0.01. To avoid spurious PC results, we further pruned out SNPs that are in high linkage disequilibrium (LD) [11]. Lastly, we excluded 7 samples with PC scores greater than 6 standard deviations away from the mean on at least one of the top significant PCs (i.e., those with Tracy–Widom (TW) test p-value < 0.01) [24, 27]. The final dataset contained 178 samples (101 CEU, 77 TSI) and 100,183 markers. We mean-centered and variance-standardized the genotypes for each marker [27]. The scree plot of the sample eigenvalues is presented in Figure 3. The first eigenvalue is substantially larger than the rest of the eigenvalues, although the TW test actually identifies two significant PCs. Figure 3 suggests that our data approximately satisfy the spiked eigenvalue assumption.

Fig 3. Scree plot of the first 30 sample eigenvalues, CEU+TSI dataset.

We estimated the asymptotic shrinkage factor and compared it with the following jackknife-based shrinkage factor estimate. For the first PC, we first computed the scores of all samples. Next, we removed one sample at a time and computed its (unadjusted) predicted PC score. We then calculated the jackknife estimate as the square root of the ratio of the mean squared predicted PC score to the mean squared sample PC score. The jackknife shrinkage factor estimate is 0.319, which is close to our asymptotic estimate of 0.325. Figure 4 shows the PC scores from the whole sample, the predicted PC score of an illustrative excluded sample, and its bias-adjusted predicted score. Clearly, the predicted PC score without adjustment is strongly biased towards zero, while the bias-adjusted PC score is not.
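One way to compute such a jackknife estimate is sketched below (NumPy assumed, names ours); each left-out sample is projected onto the eigenvector re-estimated from the remaining samples, with its sign aligned to the full-data eigenvector.

```python
import numpy as np

def jackknife_shrinkage(X, v=0):
    """Leave-one-out shrinkage estimate for the (v+1)th PC: square root of the
    mean squared predicted score over the mean squared full-sample score."""
    p, n = X.shape
    U_full, _, _ = np.linalg.svd(X, full_matrices=False)
    sample_scores = U_full[:, v] @ X
    pred_scores = np.empty(n)
    for j in range(n):
        X_minus = np.delete(X, j, axis=1)              # drop the jth sample
        U, _, _ = np.linalg.svd(X_minus, full_matrices=False)
        u = U[:, v] * np.sign(U[:, v] @ U_full[:, v])  # align the sign with the full-data PC
        pred_scores[j] = u @ X[:, j]
    return np.sqrt(np.mean(pred_scores ** 2) / np.mean(sample_scores ** 2))
```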

Fig 4. An instance with and without shrinkage adjustment, performed on HapMap CEU (*) and TSI (+). “*” and “+” represent PC scores using all data. The 161st sample was excluded from the PCA and its PC score was predicted. The grey rectangle represents the predicted PC score without shrinkage adjustment and the grey circle the predicted PC score after shrinkage adjustment.

5. Discussion and conclusions

In this paper we have identified and explored the shrinkage phenomenon of the predicted PC scores, and have developed a novel method to adjust these quantities. We have also constructed an asymptotic estimator of the correlation coefficient between PC scores from population eigenvectors and sample eigenvectors. In simulation experiments and real data analysis, we have demonstrated the accuracy of our estimates, and the ability to increase prediction accuracy in PC regression by adopting the shrinkage bias adjustment. To achieve this, we considered asymptotics in the large p, large n framework, under the spiked population model.

We believe that this asymptotic regime applies well to many high dimensional datasets. It is not, however, the only model paradigm applied to such data. For example, the large p small n paradigm [1, 14], which assumes p/n → ∞, has also been explored. Under this assumption, Jung and Marron [20] have shown that the consistency and the strong inconsistency of the sample eigenvectors to population eigenvectors depend on whether p increases at a slower or faster rate than λv. It may be argued that for real data where p/n is “large,” we should follow the paradigm of Ahn et al. [1], Hall et al. [14]. However, for any real study, it is unclear how to test whether p increases at a faster rate than λv, or vice versa, making the application of Ahn et al. [1], Hall et al. [14] difficult in practice. Furthermore, the scenario where p and λv grow at the same rate is scientifically more interesting, for which we are aware of no theoretical results. In contrast, our asymptotic results can be straightforwardly applied. Further, our simulation results indicate that for p/n as large as 500, our asymptotic results still hold well. We believe that the approach we describe here applies to many datasets.

Although the results from the spiked model are useful, it is likely that observed data has more structure than allowed by the model. Recently, several methods have been suggested to estimate population eigenvalues under more general scenarios [10, 29]. However, no analogous results are available for the eigenvectors. In data analysis, jackknife estimators, as demonstrated in the real data analysis section, can be used. However, resampling approaches are very computationally intensive, and it remains of interest to establish the asymptotic behavior of eigenvectors in a variety of situations.

We note that inconsistency of the sample eigenvectors does not necessarily imply poor performance of PCA. For example, PCA has been successfully applied in genome-wide association studies for accurate estimation of ethnicity [27], and in PC regression for microarrays [21]. However, for any individual study we cannot rule out the possibility of poor performance of the PC analysis. Our asymptotic result on the correlation coefficient between PC scores from sample and population eigenvectors provides us a measure to quantify the performance of PC analysis.

For the CEU/TSI data, SNP pruning was applied to adjust for strong LD among adjacent SNPs. Such SNP pruning is a common practice in the analysis of GWAS data, and has been implemented in the popular GWAS analysis software Plink [28]. The primary goal of SNP pruning is to avoid spurious PC results unrelated to population substructures. Technically, our approach does not rely on any independence assumption of the SNPs. However, strong local correlation may affect eigenvalues considerably. Thus the value in SNP pruning may be viewed as helping the data better accord with the assumptions of the spiked population model. From the CEU/TSI data and our experience in other GWAS data, we have found that the most common pruning procedure implemented in Plink is sufficient for us to then apply our methods.

6. Proofs

Note that EΛ^{1/2}ZZ^TΛ^{1/2}E^T and Λ^{1/2}ZZ^TΛ^{1/2} have the same eigenvalues, and E^TU is the eigenvector matrix of Λ^{1/2}ZZ^TΛ^{1/2}. Since the eigenvalues and the angles between sample and population eigenvectors are what we are concerned with, without loss of generality (WLOG) we assume in the sequel that Λ is the population covariance matrix.

6.1. Notations

We largely follow the notation in Paul [26]. We denote by λ_v(S) the vth largest eigenvalue of S. Let the suffix A represent the first m coordinates and B the remaining coordinates. Then we can partition S into

$$S = \begin{bmatrix} S_{AA} & S_{AB} \\ S_{BA} & S_{BB} \end{bmatrix}.$$

We similarly partition the vth eigenvector u_v^T into (u_{A,v}^T, u_{B,v}^T) and Z^T into [Z_A^T, Z_B^T]. Define R_v = ||u_{B,v}|| and let a_v = u_{A,v}/√(1 − R_v²); then ||a_v|| = 1.

Applying singular value decomposition (SVD) to ZB/n, we get

$$\frac{1}{\sqrt{n}} Z_B = VM^{1/2}H^T, \qquad (9)$$

where M = diag(μ_1, …, μ_{p−m}) is a (p−m) × (p−m) diagonal matrix of the ordered eigenvalues of S_BB, V is a (p−m) × (p−m) orthogonal matrix, and H is an n × (p−m) matrix. For n ≥ p−m, H has full-rank orthogonal columns. When n < p−m, H has more columns than rows and hence does not have full-rank orthogonal columns. In the latter case, we set H = [H_n, 0], where H_n is an n × n orthogonal matrix.

6.2. Propositions

We introduce two propositions for later use. The proofs of the two propositions can be found in Sections 6.5 and 6.6.

Proposition 1

Suppose Y is an n × m matrix with fixed m whose entries are i.i.d. random variables satisfying the moment conditions of z_ij in Assumption 1. Let C be an n × n symmetric non-negative definite random matrix that is independent of Y. Further assume ||C|| = O(1). Then

$$\frac{1}{n} Y^TCY - \frac{1}{n}\mathrm{trace}(C)\, I \xrightarrow{p} 0$$

as n → ∞

Proposition 2

Suppose y is an n-dimensional random vector which follows the same distribution as the row vectors of Y and is independent of S_BB. Let f(x) be a bounded continuous function on [(1−√γ)², (1+√γ)²] with f(0) = 0. Suppose F = diag(f(μ_1), …, f(μ_{p−m})), where {μ_i}_{i=1}^{p−m} are the ordered eigenvalues of M defined in (9). Then

$$\frac{1}{n} y^THFH^Ty - \gamma\int f(x)\,dF_\gamma(x) \xrightarrow{p} 0$$

as n → ∞, where F_γ(x) is the distribution function of the Marchenko–Pastur law with parameter γ [22].

6.3. Proof of Part i) of Lemma 1

6.3.1. When p is fixed

By the strong law of large numbers, S →_{a.s.} Λ. Since eigenvalues are continuous with respect to the operator norm, the lemma follows from the continuous mapping theorem.

6.3.2. When p → ∞

For every small ε > 0, there exist p̃(n) and γ_ε such that p̃(n)/n → γ_ε > 0, λ_v(1 + γ_ε/(λ_v − 1)) < λ_v + ε for all v ≤ m, (1+√γ_ε)² < 1 + ε, and (1−√γ_ε)² > 1 − ε. For simplicity, we denote p̃(n) by p̃. Suppose Z̃ is a (p̃ − p) × n matrix that satisfies the moment condition of z_ij in Assumption 1. Define an augmented data matrix X̃^T = [Z^TΛ^{1/2}, Z̃^T] and its sample covariance matrix S̃ = X̃X̃^T/n. Let S be the p × p upper-left submatrix of S̃. We also let Ŝ be the (m + 1) × (m + 1) upper-left submatrix of S̃. For v ≤ (m + 1), by the interlacing inequality (Theorem 4.3.15 of Horn and Johnson [15]),

$$\lambda_v(\hat S) \le \lambda_v(S) \le \lambda_v(\tilde S).$$

Since λ_v(Ŝ) →_{a.s.} λ_v, λ_v(S̃) →_{a.s.} λ_v(1 + γ_ε/(λ_v − 1)) < λ_v + ε for v ≤ m, and λ_v(S̃) →_{a.s.} (1+√γ_ε)² < 1 + ε for v = m + 1, we have

$$\lambda_v - o(1) \le \lambda_v(S) < \lambda_v + \varepsilon + o(1), \qquad \text{for } v \le m + 1.$$

Thus,

$$\lambda_v(S) \xrightarrow{a.s.} \lambda_v, \qquad \text{for } v \le m + 1. \qquad (10)$$

Similarly by the interlacing inequality, we get

$$\lambda_{\tilde p}(\tilde S) \le \lambda_p(S) \le \lambda_{m+1}(S).$$

Since λ_{m+1}(S) →_{a.s.} 1 and λ_{p̃}(S̃) →_{a.s.} (1−√γ_ε)² > 1 − ε, we conclude that

$$\lambda_p(S) \xrightarrow{a.s.} 1. \qquad (11)$$

Part i) of Lemma 1 follows from (10) and (11).

6.4. Proof of Part ii) of Lemma 2

Our proof of Lemma 2 (ii) closely follows the arguments in Paul [26]. From [26], it can be shown that

$$\left(S_{AA} + \frac{1}{n}\Lambda_A^{1/2}Z_AHM(d_vI - M)^{-1}H^TZ_A^T\Lambda_A^{1/2}\right)a_v = d_v a_v \qquad (12)$$

and

$$a_v^T\left(I + \frac{1}{n}\Lambda_A^{1/2}Z_AHM(d_vI - M)^{-2}H^TZ_A^T\Lambda_A^{1/2}\right)a_v = \frac{1}{1 - R_v^2}, \qquad (13)$$

where Λ_A = diag(λ_1, …, λ_m).

6.4.1. When λ_v > 1 + √γ

We can show that

$$\langle a_v, e_{A,v}\rangle \xrightarrow{p} 1 \qquad (14)$$

and

$$\frac{1}{n} z_{Av}^THM(d_vI - M)^{-2}H^Tz_{Av} \xrightarrow{p} \begin{cases} \gamma\displaystyle\int \frac{x}{(\rho_v - x)^2}\,dF_\gamma(x), & \text{for } \gamma > 0 \\ 0, & \text{for } \gamma = 0, \end{cases} \qquad (15)$$

where e_{A,v} is the vector of the first m coordinates of the vth population eigenvector e_v, ρ_v = λ_v(1 + γ/(λ_v − 1)), and z_{Av} is the vth row of Z_A. The proofs can be found in Section 6.4.3. Note that e_v is the vector with 1 in its vth coordinate and 0 elsewhere. WLOG, we assume that ⟨e_v, u_v⟩ ≥ 0. Since ⟨e_v, u_v⟩ = √(1 − R_v²)⟨e_{A,v}, a_v⟩ and, by (14), ⟨e_{A,v}, a_v⟩ →_p 1, the limit of ⟨e_v, u_v⟩ is that of √(1 − R_v²). By (13) and (15), we can show that

$$\frac{1}{1 - R_v^2} \xrightarrow{p} \begin{cases} 1 + \lambda_v\gamma\displaystyle\int\frac{x}{(\rho_v - x)^2}\,dF_\gamma(x), & \text{for } \gamma > 0 \\ 1, & \text{for } \gamma = 0. \end{cases} \qquad (16)$$

From Lemma B.2 of [26],

$$\int\frac{x}{(\rho_v - x)^2}\,dF_\gamma(x) = \frac{1}{(\lambda_v - 1)^2 - \gamma}. \qquad (17)$$

Thus

$$1 - R_v^2 \xrightarrow{p} \begin{cases} \left(1 - \dfrac{\gamma}{(\lambda_v - 1)^2}\right)\Big/\left(1 + \dfrac{\gamma}{\lambda_v - 1}\right), & \text{for } \gamma > 0 \\ 1, & \text{for } \gamma = 0. \end{cases} \qquad (18)$$

This concludes the proof of the first part of Lemma 2 ii).

6.4.2. When 1 < λ_v ≤ 1 + √γ

Here we only need to consider γ > 0, because no eigenvalue satisfies this condition when γ = 0. We first show that R_v →_p 1, which implies u_{A,v} →_p 0 and hence ⟨e_v, u_v⟩ →_p 0. For any ε > 0 and x ≥ 0, define

$$(x)_\varepsilon = \begin{cases} x & \text{if } x > \varepsilon \\ \varepsilon & \text{if } x \le \varepsilon \end{cases}$$

and

$$G_\varepsilon = \mathrm{diag}\left(\frac{\mu_1}{\left((d_v - \mu_1)^2\right)_\varepsilon}, \ldots, \frac{\mu_{p-m}}{\left((d_v - \mu_{p-m})^2\right)_\varepsilon}\right),$$

then by Propositions 1 and 2,

$$\frac{1}{n}z_{Av}^THG_\varepsilon H^Tz_{Av} \xrightarrow{p} \gamma\int \frac{x}{\left((\rho_v - x)^2\right)_\varepsilon}\,dF_\gamma(x). \qquad (19)$$

By monotone convergence theorem,

$$\gamma\int \frac{x}{\left((\rho_v - x)^2\right)_\varepsilon}\,dF_\gamma(x) \xrightarrow{\varepsilon\to 0} \gamma\int \frac{x}{(\rho_v - x)^2}\,dF_\gamma(x). \qquad (20)$$

RHS of (20) is

$$\int_a^b \frac{\sqrt{(b - x)(x - a)}}{2\pi(\rho_v - x)^2}\,dx, \qquad (21)$$

where a = (1−√γ)² and b = (1+√γ)². Since (21) equals ∞ for any a ≤ ρ_v ≤ b, we conclude that

$$\frac{1}{n}z_{Av}^THM(d_vI - M)^{-2}H^Tz_{Av} \xrightarrow{p} \infty. \qquad (22)$$

Therefore R_v →_p 1, which proves the second part of Lemma 2 ii).

6.4.3. Proof of (14) and (15)

Define R_v = ∑_{k≠v, k≤m} [λ_v/(ρ_v(λ_k − λ_v))] e_{A,k}e_{A,k}^T, 𝒟_v = S_AA + S_AB(d_vI − S_BB)^{-1}S_BA − (ρ_v/λ_v)Λ_A, α_v = ||R_v𝒟_v|| + |d_v − ρ_v|·||R_v||, and β_v = ||R_v𝒟_v e_{A,v}||.

With the exactly same argument of [26], it can be shown that

$$a_v - e_{A,v} = R_v\mathcal{D}_v e_{A,v} + r_v,$$

where r_v = −(1 − ⟨e_{A,v}, a_v⟩)e_{A,v} − R_v𝒟_v(a_v − e_{A,v}) + (d_v − ρ_v)R_v(a_v − e_{A,v}). By Lemma 1 of [25], r_v = o_p(1) if α_v = o_p(1) and β_v = o_p(1).

When γ = 0, S_AA − (ρ_v/λ_v)Λ_A →_p 0, and the remaining part of 𝒟_v is

$$S_{AB}(d_vI - S_{BB})^{-1}S_{BA} = \frac{1}{n}\Lambda_A^{1/2}Z_AHM(d_vI - M)^{-1}H^TZ_A^T\Lambda_A^{1/2}. \qquad (23)$$

Since d_v →_{a.s.} λ_v and μ_1 →_{a.s.} 1,

$$\|HM(d_vI - M)^{-1}H^T\| \xrightarrow{a.s.} 1/(\lambda_v - 1).$$

By Proposition 1,

$$0 \le \|(23)\| \le \frac{\lambda_1\, p\, \mu_1}{n(d_v - \mu_1)} + o_p(1) = o_p(1), \qquad (24)$$

hence 𝒟_v = o_p(1).

When γ > 0, 𝒟_v can be written as

$$\mathcal{D}_v = \left[S_{AA} - \Lambda_A\right] + \left[\Lambda_A^{1/2}\left(\frac{1}{n}Z_AHM(\rho_vI - M)^{-1}H^TZ_A^T - \frac{1}{n}\mathrm{trace}\left(M(\rho_vI - M)^{-1}\right)I\right)\Lambda_A^{1/2}\right] + \left[\left(\frac{1}{n}\mathrm{trace}\left(M(\rho_vI - M)^{-1}\right) - \gamma\int\frac{x}{\rho_v - x}\,dF_\gamma(x)\right)\Lambda_A\right] + \left[(\rho_v - d_v)\frac{1}{n}\Lambda_A^{1/2}Z_AHM(\rho_vI - M)^{-1}(d_vI - M)^{-1}H^TZ_A^T\Lambda_A^{1/2}\right] \qquad (25)$$

The first term on the RHS is o_p(1) by the weak law of large numbers. The second and third terms are o_p(1) by Propositions 1 and 2. For the fourth term, ρ_v − d_v = o_p(1) and the remaining factor is O_p(1). Therefore, 𝒟_v = o_p(1). Combining the above results with ||R_v|| = O_p(1) and d_v − ρ_v = o_p(1), we prove Equation (14).

For (15): when γ = 0, (15) can be proved in exactly the same way as (24). When γ > 0, d_v →_{a.s.} ρ_v and μ_1 →_{a.s.} (1+√γ)² < ρ_v, hence ||C|| →_{a.s.} (1+√γ)²/(ρ_v − (1+√γ)²)². Therefore the result follows from Propositions 1 and 2.

6.5. Proof of Proposition 1

Let μ1μ2 ≥ ··· ≥ μn be the ordered eigenvalues of C, and cij be the (i, j)th element of C. Suppose ys is the sth column of Y, and yij is the (i, j)th element of Y. We further define ψ(s,s)=1nysTCys1ntrace(C) and ψ(s,t)=1nysTCyt for st. The conditional mean of ψ(s, s) given C is

$$E(\psi(s,s)\mid C) = E\left(\frac{1}{n}\sum_{i,j}c_{ij}y_{is}y_{js}\,\Big|\,C\right) - \frac{1}{n}\sum_{i=1}^n\mu_i = \frac{1}{n}\sum_{i=1}^n c_{ii}E(y_{is}^2) + \frac{2}{n}\sum_{i<j}^n c_{ij}E(y_{is}y_{js}) - \frac{1}{n}\sum_{i=1}^n\mu_i = \frac{1}{n}\sum_{i=1}^n c_{ii} - \frac{1}{n}\sum_{i=1}^n \mu_i = 0 \qquad (26)$$

Thus, E(ψ(s, s)) = E(E(ψ(s, s)|C)) = E(0) = 0.

Next, the conditional variance of ψ(s, s) given C is

$$\mathrm{Var}(\psi(s,s)\mid C) = \frac{1}{n^2}\mathrm{Var}\left(\sum_{i,j}c_{ij}y_{is}y_{js}\,\Big|\,C\right) = \frac{1}{n^2}\sum_{i,j,l,q=1}^n c_{ij}c_{lq}\mathrm{Cov}(y_{is}y_{js},\, y_{ls}y_{qs}) \le \frac{4}{n^2}\sum_{i,j=1}^n c_{ij}^2\,\mathrm{Var}(y_{is}y_{js}) \le \frac{4\alpha}{n^2}\sum_{i,j=1}^n c_{ij}^2 = \frac{4\alpha}{n^2}\mathrm{trace}(C^2) = \frac{4\alpha}{n^2}\sum_{i=1}^n\mu_i^2, \qquad (27)$$

where α = max(1, E(y_{is}^4) − 1). Since ||C|| = O(1), μ_i² ≤ ||C||² = O(1). Therefore, Var(ψ(s, s)|C) ≤ O(1/n) and Var(ψ(s, s)) = Var(E(ψ(s, s)|C)) + E(Var(ψ(s, s)|C)) ≤ 0 + O(1/n) → 0 as n → ∞. By Chebyshev's inequality, we conclude that

$$\psi(s,s) \xrightarrow{p} 0.$$

We can similarly show that ψ(s,t) →_p 0; the details are omitted.

6.6. Proof of Proposition 2

Consider an expansion

$$\frac{1}{n}y^THFH^Ty - \gamma\int f(x)\,dF_\gamma(x) = \left[\frac{1}{n}y^THFH^Ty - \frac{1}{n}\mathrm{trace}(F)\right] + \left[\frac{1}{n}\mathrm{trace}(F) - \gamma\int f(x)\,dF_\gamma(x)\right] = (a) + (b).$$

We show that both (a) and (b) converge to 0 in probability.

  1. Since μ_1 →_{a.s.} (1+√γ)², μ_{min(p−m,n)} →_{a.s.} (1−√γ)², μ_k = 0 for k > min(p−m, n), and f(x) is continuous and bounded on [(1−√γ)², (1+√γ)²], there exists K > 0 such that sup_i |f(μ_i)| < K a.s. Let C = HFH^T; then trace(C) = trace(F). By Proposition 1, (a) = o_p(1).

  2. Let F_{p−m} be the empirical spectral distribution of S_BB; then
$$\frac{1}{n}\mathrm{trace}(F) = \frac{p-m}{n}\int f(x)\,dF_{p-m}(x),$$
    and ∫f(x)dF_{p−m}(x) →_p ∫f(x)dF_γ(x) [5, 22]. Thus
$$\frac{p-m}{n}\int f(x)\,dF_{p-m}(x) \xrightarrow{p} \gamma\int f(x)\,dF_\gamma(x),$$

    which shows that (b) = op(1).

Combining (a) and (b), we finish the proof.

6.7. Proof of Theorem 1

WLOG we assume ⟨g_v, p̃_v⟩ ≥ 0. Let e_v = (e_{A,v}, e_{B,v}); then e_{A,v} is the vector with 1 in its vth coordinate and 0 elsewhere, and e_{B,v} is the zero vector. Since S_{AA}u_{A,v} + S_{AB}u_{B,v} = d_vu_{A,v}, we have

$$\langle g_v, \tilde p_v\rangle = \frac{1}{n\sqrt{d_v\lambda_v}}\,e_v^TXX^Tu_v = \frac{e_{A,v}^TS_{AA}u_{A,v}}{\sqrt{d_v\lambda_v}} + \frac{e_{A,v}^TS_{AB}u_{B,v}}{\sqrt{d_v\lambda_v}} = \frac{d_v}{\sqrt{d_v\lambda_v}}\,e_{A,v}^Tu_{A,v} = \sqrt{\frac{d_v}{\lambda_v}}\,e_v^Tu_v \xrightarrow{p} \begin{cases} \sqrt{1 - \dfrac{\gamma}{(\lambda_v - 1)^2}}, & \text{for } \lambda_v > 1 + \sqrt{\gamma} \\ 0, & \text{for } 1 < \lambda_v \le 1 + \sqrt{\gamma}. \end{cases} \qquad (28)$$

6.8. Proof of Theorem 2

First, we show that the square of the denominator converges to ρ(λ_v). Since p_{vj} = u_v^Tx_j and E(p_{vi}^2) = E(p_{vj}^2) for i ≠ j,

$$E(p_{vj}^2) = \frac{1}{n}E\left(\sum_{j=1}^n p_{vj}^2\right) = \frac{1}{n}E\left(\sum_{j=1}^n (u_v^Tx_j)^2\right) = E\left(u_v^TXX^Tu_v/n\right) = E(d_v) \longrightarrow \rho(\lambda_v). \qquad (29)$$

Next we show that the square of the numerator converges to φ(λ_v)²(λ_v − 1) + 1. Define ũ_v := (I − e_ve_v^T)u_v/√(1 − (u_v^Te_v)²); then u_v can be expressed as

$$u_v = (u_v^Te_v)e_v + \sqrt{1 - (u_v^Te_v)^2}\,\tilde u_v.$$

Partition ũ_v = (ũ_{A,v}, ũ_{B,v}). From (14), a_v →_p e_{A,v}; therefore ũ_{A,v} →_p 0 and ũ_{B,v}^Tũ_{B,v} →_p 1. Since x_new and u_v are independent, we have

$$E(q_v^2\mid u_v) = E\left((u_v^Tx_{new})^2\mid u_v\right) = u_v^TE(x_{new}x_{new}^T)u_v = u_v^T\Lambda u_v = (u_v^Te_v)^2 e_v^T\Lambda e_v + \left(1 - (u_v^Te_v)^2\right)\tilde u_v^T\Lambda\tilde u_v + 2u_v^Te_v\sqrt{1 - (u_v^Te_v)^2}\,e_v^T\Lambda\tilde u_v = (u_v^Te_v)^2\lambda_v + \left(1 - (u_v^Te_v)^2\right)\left(\tilde u_{A,v}^T\Lambda_A\tilde u_{A,v} + \tilde u_{B,v}^T\tilde u_{B,v}\right) + 2u_v^Te_v\sqrt{1 - (u_v^Te_v)^2}\,e_{A,v}^T\Lambda_A\tilde u_{A,v} \xrightarrow{p} \varphi(\lambda_v)^2(\lambda_v - 1) + 1. \qquad (30)$$

From (29) and (30),

$$\sqrt{\frac{E(q_v^2)}{E(p_{vi}^2)}} \longrightarrow \sqrt{\frac{\varphi(\lambda_v)^2(\lambda_v - 1) + 1}{\rho(\lambda_v)}} = \frac{\lambda_v - 1}{\lambda_v + \gamma - 1}. \qquad (31)$$

6.9. Proof of Theorem 3

Since ρ^{-1}(p·r_v) → λ_v for v ≤ k, WLOG we assume that k_0 = k, where k is the number of λ_v greater than 1 + √γ. Set

$$h(x) = \sum_{v=1}^k\rho^{-1}(r_vx) + p - k - x. \qquad (32)$$

The first and second partial derivatives of h(x) are

$$\frac{\partial h(x)}{\partial x} = \frac{1}{2}\sum_{v=1}^k r_v + \frac{1}{2}\sum_{v=1}^k\frac{\left(xr_v - (1+\gamma)\right)r_v}{\sqrt{\left(xr_v - (1+\gamma)\right)^2 - 4\gamma}} - 1, \qquad (33)$$
$$\frac{\partial^2 h(x)}{\partial x^2} = -2\sum_{v=1}^k\frac{r_v^2\gamma}{\left(\left(xr_v - (1+\gamma)\right)^2 - 4\gamma\right)^{3/2}} < 0, \qquad (34)$$

so h(x) is a concave function of x for given r_v. From the fact that ρ^{-1}(r_vp) > 1 for v ≤ k, we know h(p) > 0. Because of the concavity of h, h(x) = 0 has a unique solution τ on [p, ∞), to which ∑_{v=1}^{k_l}λ̂_{v,l} + p − k_l converges. Thus d̂_v = τ r_v. Define d̄_v = r_vω, where ω = ∑_{v=1}^kλ_v + p − k, and let d_v denote the sample eigenvalue when ζ = 1. The sum of all d_v is

$$\sum_{v=1}^p d_v = \frac{1}{n}\mathrm{trace}(ZZ^T\Lambda) = \frac{1}{n}\sum_{i=1}^p\sum_{j=1}^n\lambda_i z_{ij}^2, \qquad (35)$$

thus

$$E\left(\sum_{v=1}^p d_v\Big/\omega\right) = \frac{\sum_{v=1}^m\lambda_v + p - m}{\omega} \longrightarrow 1, \qquad (36)$$

and

$$\mathrm{Var}\left(\sum_{v=1}^p d_v\Big/\omega\right) = \frac{1}{n}\,\frac{\sum_{v=1}^p\lambda_v^2}{\omega^2}\left(E(z_{11}^4) - 1\right) \longrightarrow 0. \qquad (37)$$

By (36) and (37)

$$\sum_{v=1}^p d_v\Big/\omega = 1 + o_p(1). \qquad (38)$$

Since d_v →_{a.s.} ρ(λ_v) for v ≤ k,

$$\bar d_v = d_v\,\omega\Big/\sum_{v=1}^p d_v = d_v\left(1 + o_p(1)\right) \xrightarrow{p} \rho(\lambda_v). \qquad (39)$$

Now we show that τ = ω + o_p(1). Plugging ω into h(x) and using the fact that ρ^{-1}(d̄_v) = λ_v + o_p(1), we get

$$h(\omega) = \sum_{v=1}^k\rho^{-1}(\bar d_v) - \sum_{v=1}^k\lambda_v = o_p(1). \qquad (40)$$

From the facts that h(x) is a continuous concave function, ω > p, and h(p) > 0, we conclude that

$$\omega = \tau + o_p(1). \qquad (41)$$

Therefore

$$\hat d_v = r_v\tau = r_v\left(\omega + o_p(1)\right) = \bar d_v + o_p(1) \xrightarrow{p} \rho(\lambda_v) \qquad (42)$$

for vk, which concludes the proof.

Contributor Information

Seunggeun Lee, Email: slee@bios.unc.edu.

Fei Zou, Email: fzou@bios.unc.edu.

Fred A. Wright, Email: fwright@bios.unc.edu.

References

1. Ahn J, Marron JS, Muller KM, Chi YY. The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika. 2007;94:760–766.
2. Anderson TW. Asymptotic theory for principal component analysis. Ann Math Statist. 1963;34:122–148.
3. Anderson TW. An Introduction to Multivariate Statistical Analysis. 3rd ed. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley-Interscience; 2003.
4. Bai Z, Yao J-F. Central limit theorems for eigenvalues in a spiked population model. Ann Inst Henri Poincaré Probab Stat. 2008;44:447–474.
5. Bai ZD. Methodologies in spectral analysis of large-dimensional random matrices, a review. Statist Sinica. 1999;9:611–677.
6. Baik J, Ben Arous G, Péché S. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann Probab. 2005;33:1643–1697.
7. Baik J, Silverstein JW. Eigenvalues of large sample covariance matrices of spiked population models. J Multivariate Anal. 2006;97:1382–1408.
8. Bair E, Hastie T, Paul D, Tibshirani R. Prediction by supervised principal components. J Amer Statist Assoc. 2006;101:119–137.
9. Bovelstad H, Nygard S, Storvold H, Aldrin M, Borgan O, Frigessi A, Lingjaerde O. Predicting survival from microarray data: a comparative study. Bioinformatics. 2007;23:2080–2087. doi:10.1093/bioinformatics/btm305.
10. El Karoui N. Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann Statist. 2008;36:2757–2790.
11. Fellay J, Shianna K, Ge D, Colombo S, Ledergerber B, Weale M, Zhang K, Gumbs C, Castagna A, Cossarizza A, et al. A whole-genome association study of major determinants for host control of HIV-1. Science. 2007;317:944–947. doi:10.1126/science.1143767.
12. Girshick M. Principal components. J Amer Statist Assoc. 1936;31:519–528.
13. Girshick M. On the sampling theory of roots of determinantal equations. Ann Math Statist. 1939;10:203–224.
14. Hall P, Marron JS, Neeman A. Geometric representation of high dimension, low sample size data. J R Stat Soc Ser B Stat Methodol. 2005;67:427–444.
15. Horn R, Johnson C. Matrix Analysis. Cambridge University Press; 1990.
16. Jackson J. A User's Guide to Principal Components. Wiley-Interscience; 2005.
17. Johnstone I, Lu A. On consistency and sparsity for principal components analysis in high dimensions. J Amer Statist Assoc. 2009;104:682–693. doi:10.1198/jasa.2009.0121.
18. Johnstone IM. On the distribution of the largest eigenvalue in principal components analysis. Ann Statist. 2001;29:295–327.
19. Jolliffe I. Principal Component Analysis. New York: Springer; 2002.
20. Jung S, Marron JS. PCA consistency in high dimension, low sample size context. Ann Statist. 2009;37:4104–4130.
21. Ma S, Kosorok MR, Fine JP. Additive risk models for survival data with high-dimensional covariates. Biometrics. 2006;62:202–210. doi:10.1111/j.1541-0420.2005.00405.x.
22. Marčenko V, Pastur L. Distribution of eigenvalues for some sets of random matrices. Sbornik: Mathematics. 1967;1:457–483.
23. Nadler B. Finite sample approximation results for principal component analysis: a matrix perturbation approach. Ann Statist. 2008;36:2791–2817.
24. Patterson N, Price A, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190. doi:10.1371/journal.pgen.0020190.
25. Paul D. Asymptotics of the leading sample eigenvalues for a spiked covariance model. Technical Report. 2005. http://anson.ucdavis.edu/debashis/techrep/eigenlimit.pdf.
26. Paul D. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist Sinica. 2007;17(4):1617–1642.
27. Price A, Patterson N, Plenge R, Weinblatt M, Shadick N, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–909. doi:10.1038/ng1847.
28. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M, Bender D, Maller J, Sklar P, de Bakker P, Daly M, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007;81(3):559–575. doi:10.1086/519795.
29. Rao NR, Mingo JA, Speicher R, Edelman A. Statistical eigen-inference from large Wishart matrices. Ann Statist. 2008;36:2850–2885.
30. Wall M, Rechtsteiner A, Rocha L. Singular value decomposition and principal component analysis. In: A Practical Approach to Microarray Data Analysis. 2003:91–109.
