J Am Stat Assoc. Author manuscript; available in PMC 2016 Sep 1.
Published in final edited form as: J Am Stat Assoc. 2014 Sep 25;110(511):1217–1228. doi: 10.1080/01621459.2014.958156

The Empirical Distribution of a Large Number of Correlated Normal Variables

David Azriel 1, Armin Schwartzman 2
PMCID: PMC4742377  NIHMSID: NIHMS628938  PMID: 26858467

Abstract

Motivated by the advent of high dimensional, highly correlated data, this work studies the limit behavior of the empirical cumulative distribution function (ecdf) of standard normal random variables under arbitrary correlation. First, we provide a necessary and sufficient condition for convergence of the ecdf to the standard normal distribution. Next, under general correlation, we show that the ecdf limit is a random, possibly infinite, mixture of normal distribution functions that depends on a number of latent variables and can serve as an asymptotic approximation to the ecdf in high dimensions. We provide conditions under which the dimension of the ecdf limit, defined as the smallest number of effective latent variables, is finite. Estimates of the latent variables are provided and their consistency proved. We demonstrate these methods in a real high-dimensional data example from brain imaging where it is shown that, while the study exhibits apparently strongly significant results, they can be entirely explained by correlation, as captured by the asymptotic approximation developed here.

Keywords: empirical null, dependent random variables, high dimensional data, factor analysis, asymptotic approximation, strong correlation

1 Introduction

The empirical cumulative distribution function (ecdf) and its large sample properties have a long and rich history in probability and statistics. However, most of this vast literature assumes that the variables used to construct the ecdf are independent (e.g., Wasserman, 2006, Chapter 2, and references therein). Under independence, the ecdf is a consistent estimator of the true cumulative distribution function (cdf). The consistency property continues to hold under various forms of weak dependence (e.g. Dedecker and Merlevéde (2007); Wu (2006)).

Motivated by modern problems in high dimensional data, where a large number of correlated variables are measured, it is of interest to study the asymptotic behavior of the ecdf when the variables involved are arbitrarily correlated. In particular, the present paper is motivated by large-scale multiple testing problems, where each of p tests produces a z-score Zi, i = 1, …, p. It has been pointed out by Bradley Efron that in large-scale multiple testing problems, the observed distribution of the z-scores often does not match the theoretical null distribution N(0, 1) (Efron, 2004, 2007a, b, 2008). Efron (2007a) conjectured that, even when the theoretical model is correct, the observed distribution of the test statistics can look different from the theoretical null distribution simply because of correlation between them. This interesting observation suggests that the ecdf may not always be consistent and calls for a detailed study of the ecdf of a large number of dependent variables.

Assuming that the variables Z1, Z2, …, Zp are marginally standard normal simplifies the problem and allows obtaining results for their ecdf under arbitrary dependence because all the dependence is expressed through correlation. In this situation, Efron (2007a) proposed the so-called empirical null as an approximation to the observed distribution of the z-scores, parametrized as a normal distribution with mean and variance other than 0 and 1. To further understand the effect of correlation, Efron (2010) derived the covariance function of the ecdf and applied it to estimating the variance of functions of the ecdf relevant in large-scale multiple testing, such as the local and tail false discovery rate. Schwartzman (2010) proposed to approximate the ecdf by a Gram-Charlier expansion and used its coefficients to establish some constraints on the extent of the departure of the ecdf from the marginal N(0, 1). However, these approaches have not succeeded in fully characterizing the behavior of the ecdf under correlation.

In this article we describe the asymptotic behavior of the ecdf of a large number of correlated standard normal variables. First, we show that in general, the ecdf need not converge to Φ, the standard normal cdf. A necessary and sufficient condition for convergence, which we call weak correlation, is that the average of the absolute pairwise correlations between the z-scores (or the average of the squares of the pairwise correlations) tends to zero with increasing dimension p. However, we show that in a wide range of strong correlation situations, the ecdf converges instead to a random distribution function. This random function can be written as a (possibly infinite) normal mixture parametrized by latent independent standard normal variables. It can be thought of as an analytic asymptotic approximation to the ecdf, where the latent variables can be consistently estimated, under some regularity conditions, from the observed data sequence. Further, it can be thought of as a dimension reduction of the p-dimensional ecdf in the sense that its inherent dimension is the number of the latent variables. We give a lower bound for this inherent dimension and show that, under certain regularity conditions, it can be achieved by a particular parametrization obtained via an eigendecomposition of the correlation matrix. This parametrization is based on the factor analysis model of Fan et al. (2012), who use it to calculate the false discovery proportion in large scale multiple testing under arbitrary correlation. Here we consider a more general framework in which decompositions other than eigendecompositions are also investigated.

As an illustration of the behavior of the ecdf as a random function as described in this paper, Figure 1 presents the histograms of two realizations of normal random variables under two correlation structures:

Figure 1. Histograms of various instances of 1000 standard normal variables with a one-block correlation structure (a) and a two-independent-blocks correlation structure (b). The red line is the standard normal density and the blue dashed line is the asymptotic approximation we use.

  1. One block: $\{Z_i\}_{i=1}^{1000}$ is a sequence of exchangeable random variables with correlation ρ = 0.9, i.e., for any $i \ne j$, $\mathrm{cor}(Z_i, Z_j) = \rho$.

  2. Two independent blocks: $\{Z_i\}_{i=1}^{1000}$ consists of two intercalated independent sequences of exchangeable random variables, i.e., for any $i \ne j$ such that $|i - j|$ is even, $\mathrm{cor}(Z_i, Z_j) = \rho$ (and 0 otherwise).

It can be seen that for both structures the empirical distribution differs from the standard normal density (red line). In case (a) the histogram is shifted and narrower than the standard normal density, while in case (b) it looks like a mixture of two normal distributions. Notice that the histograms change between the two realizations, suggesting that the distribution may not be converging to a deterministic limit. We will show that in these cases the empirical distribution indeed does not converge to a deterministic limit, but can be asymptotically approximated by a normal mixture estimated from the data, represented here by the dashed blue line. Not surprisingly, in case (a) the asymptotic approximation is of dimension 1, containing one latent standard normal variable, and in case (b) is of dimension 2, containing two independent latent standard normal variables. The fitted density in case (a) can be interpreted as Efron’s empirical null model of a shifted and scaled normal, but clearly that model cannot capture the behavior of the distribution in case (b).
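The two structures above are easy to simulate through their latent-variable representations. The following minimal numpy sketch is our own illustration (not code from the paper); the seed and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
p, rho = 1000, 0.9

# (a) One block: Z_i = sqrt(rho)*W1 + sqrt(1-rho)*eps_i, so cor(Z_i, Z_j) = rho.
W1 = rng.standard_normal()
Z_one = np.sqrt(rho) * W1 + np.sqrt(1 - rho) * rng.standard_normal(p)

# (b) Two intercalated blocks: even indices load on W2, odd indices on W3,
# giving cor(Z_i, Z_j) = rho when |i - j| is even and 0 otherwise.
W2, W3 = rng.standard_normal(2)
eps = rng.standard_normal(p)
idx = np.arange(p)
Z_two = np.where(idx % 2 == 0, np.sqrt(rho) * W2, np.sqrt(rho) * W3) \
        + np.sqrt(1 - rho) * eps
```

Plotting histograms of `Z_one` and `Z_two` against the standard normal density reproduces the qualitative behavior of Figure 1.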

When considering the entire range of possible correlation structures, we show that there are essentially three regimes. The first is what we call weak correlation; while the random variables need not be independent, the ecdf converges to Φ. The second regime is what we call finite dimensional correlation and is the main interest of this paper. Under this type of correlation, the ecdf can be approximated by a random function that depends on a finite number of latent independent standard normal random variables. Furthermore, under some regularity conditions, a representation using the smallest number of latent variables can be achieved and estimated consistently. The examples of Figure 1 belong to this regime. Finally, in the third regime, the limiting random function depends on an infinite number of independent standard normal random variables, like the ecdf itself.

As an illustration of the usefulness of the results presented in this article, we present an analysis of brain imaging data obtained from a study of cortical thickness of adults who had a diagnosis of attention deficit/hyperactivity disorder (ADHD) as children (Proal et al., 2011). In this study, it had been noticed before that, when searching for brain locations whose cortical thickness is related to clinical diagnosis, the histogram of the z-scores did not follow the theoretical standard normal distribution (Reiss et al., 2012). Here we do a slightly different analysis (where the correlation structure is known) and show that, while the study exhibits apparently strongly significant results, they can be entirely explained by correlation, as captured by the asymptotic approximation mentioned above.

The rest of the paper is organized as follows. After a brief treatment of weak correlation in Section 2, the main results of the paper for general correlation are given in Section 3. Section 4 discusses how to consistently estimate the latent variables. In Section 4 we also briefly discuss the case where the correlation matrix is unknown. Several concrete examples, including those of Figure 1, are presented in detail in Section 5. A data example is analyzed in Section 6. Section 7 considers some possible extensions of this work. All the proofs are given in a supplementary material document.

2 Weak correlation

2.1 Definition

To define weak correlation some notation is needed. For a given p × p matrix $R_p = (r_{ij})$ define the following average norms:

$$\|R_p\|_1^{(p)} := \frac{1}{p^2}\sum_{i,j}|r_{ij}| \quad\text{and}\quad \|R_p\|_2^{(p)} := \frac{1}{p}\Big(\sum_{i,j}r_{ij}^2\Big)^{1/2} = \frac{1}{p}\Big(\sum_i \lambda_i^2\Big)^{1/2},$$

where the $\lambda_i$'s are the eigenvalues of $R_p$. The latter is simply a scaled version of the Frobenius norm, and in both cases we use the superscript (p) to denote that the norm itself, not just the matrix, changes with p. If $\{R_p\}_{p=1}^\infty$ is a sequence of correlation matrices then, as p → ∞,

$$\|R_p\|_1^{(p)} \to 0 \iff \|R_p\|_2^{(p)} \to 0.$$

This is because, by Jensen's inequality, $\|R_p\|_1^{(p)} \le \|R_p\|_2^{(p)}$, while on the other hand, $|r_{ij}| \le 1$ and $r_{ij}^2 \le |r_{ij}|$, and so $\{\|R_p\|_2^{(p)}\}^2 \le \|R_p\|_1^{(p)}$. A similar argument holds if $R_p$ is a covariance matrix and the diagonal entries of $R_p$, i.e., the variances, are bounded.

Definition 1

Let $\{\xi_i\}_{i=1}^\infty$ be a sequence of standard normal variables with joint normal distribution and denote the correlation matrix of $(\xi_1, \dots, \xi_p)$ by $R_p$. If $\|R_p\|_1^{(p)} \to 0$, or equivalently, $\|R_p\|_2^{(p)} \to 0$, then $\{\xi_i\}_{i=1}^\infty$ is called weakly correlated. Otherwise it is called strongly correlated.
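As an illustration of Definition 1, the following numpy sketch (ours; the parameter values are illustrative) evaluates both average norms for an exchangeable correlation matrix, for which neither norm vanishes as p grows.

```python
import numpy as np

def avg_norms(R):
    """Return (||R||_1^(p), ||R||_2^(p)) for a p x p matrix R."""
    p = R.shape[0]
    n1 = np.abs(R).sum() / p**2     # average absolute entry
    n2 = np.sqrt((R**2).sum()) / p  # scaled Frobenius norm
    return n1, n2

# Exchangeable correlation with rho = 0.5: both norms tend to 0.5, not 0,
# so the sequence is strongly correlated.
for p in (100, 1000, 3000):
    R = np.full((p, p), 0.5)
    np.fill_diagonal(R, 1.0)
    print(p, avg_norms(R))
```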

2.2 Convergence of the ecdf

Let $\{Z_i\}_{i=1}^\infty$ be a sequence of standard normal variables with joint normal distribution. The ecdf is

$$\hat F_p(z) := \frac{1}{p}\sum_{i=1}^p I(Z_i \le z), \quad (1)$$

where I(·) denotes the indicator function. Correlation does not affect the expectation $E[\hat F_p(z)] = \Phi(z)$ of the ecdf, but it affects its covariance. The covariance function is given in Proposition 1 of Schwartzman (2010).

The following theorem establishes that a necessary and sufficient condition for consistency of the ecdf in $L^2$ is that $\{Z_i\}_{i=1}^\infty$ is weakly correlated.

Theorem 1

Let $Z_1, Z_2, \dots, Z_p, \dots$ be N(0, 1) variables with ecdf (1) and let $R_p$ denote the correlation matrix of $(Z_1, \dots, Z_p)$.

  1. (Sufficiency) If $\{Z_i\}_{i=1}^\infty$ is weakly correlated, then $\hat F_p(z)$ converges to Φ(z) in $L^2$ uniformly:

    $$\sup_z E\{\hat F_p(z) - \Phi(z)\}^2 \le \frac{1}{4p} + C\,\|R_p\|_1^{(p)} \to 0,$$

    where C is a universal constant.

  2. (Necessity) If $\{Z_i\}_{i=1}^\infty$ is not weakly correlated, i.e., $\|R_p\|_2^{(p)} \nrightarrow 0$, then $\hat F_p(z)$ does not converge to Φ(z) in $L^2$ for any z ≠ 0, i.e.:

    $$E\{\hat F_p(z) - \Phi(z)\}^2 \nrightarrow 0, \quad \forall z \ne 0.$$

Notice that according to Theorem 1 the convergence is either uniform or none at all; if the correlation is strong, i.e., not weak, then for any z ≠ 0 there is no convergence to Φ(z). Under a general correlation structure the ecdf may converge to a random function. In Section 3 below we aim at identifying and estimating this function.

2.3 Examples

It is easy to check that every Gaussian autoregressive moving average (ARMA) process is weakly correlated. So is every m-dependent Gaussian sequence with banded correlation matrix, where $r_{ij} = 0$ for $|i - j| > m$ and fixed finite m. This includes correlation in fixed finite blocks. In all these cases the ecdf converges to the standard normal distribution.

More generally, all Gaussian stationary ergodic processes are weakly correlated. This is because ergodicity requires that the autocorrelation function $\rho(\ell) = r_{i,i+\ell}$ satisfies $|\rho(\ell)| \to 0$ as $\ell \to \infty$, and therefore

$$\|R_p\|_1^{(p)} = \frac{1}{p^2}\sum_{i,j}|r_{ij}| = \frac{1}{p^2}\Big[p + 2\sum_{\ell=1}^{p-1}(p-\ell)|\rho(\ell)|\Big] \le \frac{1}{p} + \frac{2}{p}\sum_{\ell=1}^{p-1}|\rho(\ell)| \to 0.$$

It is not hard to check that even long-range correlation, defined by $\sum_{\ell=1}^\infty |\rho(\ell)| = \infty$, also implies weak correlation, except in the extreme case where $\sum_{\ell=1}^{p-1}|\rho(\ell)|$ is of order p (the largest possible).
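For a concrete check, the sketch below (our own, assuming an AR(1) autocorrelation $\rho(\ell) = \phi^\ell$ with an arbitrary $\phi$) evaluates $\|R_p\|_1^{(p)}$ through the displayed formula without forming the matrix; it decays roughly like 1/p, consistent with weak correlation.

```python
# ||R_p||_1^(p) for a stationary AR(1) process, computed from the
# autocorrelation function rho(l) = phi**l via the display above.
phi = 0.8
for p in (100, 1000, 10000):
    n1 = (p + 2 * sum((p - l) * phi**l for l in range(1, p))) / p**2
    bound = 1 / p + (2 / p) * phi / (1 - phi)  # geometric-series bound
    print(p, n1, bound)
```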

3 General correlation

3.1 An asymptotic approximation

To describe the asymptotic behavior of the ecdf for a general correlation structure, the main idea is to decompose the correlation into a strong correlation component and a weak correlation component. Then, the asymptotic behavior of the ecdf as a random function will be captured by the strong correlation component, while the weak component will converge as in Theorem 1.

Specifically, suppose that for every p we have the decomposition

$$R_p = A_p + B_p, \quad (2)$$

where Ap and Bp are symmetric positive semi-definite matrices. Write Ap as

$$A_p = L_p (L_p)^T, \quad (3)$$

where $L_p$ is of dimension $p \times k(p)$; let $\ell_i^{(p)}$ be the i-th row of $L_p$. If the matrix $A_p$ is the zero matrix then we define $k(p) := 1$ and $L_p := (0, \dots, 0)^T$.

For a matrix B, we define the matrix Cor(B) by

$$\{\mathrm{Cor}(B)\}_{ij} = \begin{cases} \dfrac{B_{ij}}{\sqrt{B_{ii}B_{jj}}} & B_{ii}B_{jj} \ne 0 \\[1ex] 0 & B_{ii}B_{jj} = 0. \end{cases}$$

Notice that if B is a positive semidefinite matrix, then |{Cor(B)}ij| ≤ 1. Obviously, if B is a correlation matrix, then Cor(B) = B. The following theorem states the main result.

Theorem 2

Under the previous setting and notation,

  1. For every p there exists a (non-unique) random vector $W_{k(p)}^{(p)} \sim N(0, I_{k(p)})$ such that

    $$\sup_z E\{\hat F_p(z) - \bar F_p(z)\}^2 \le \frac{1}{4p} + C\,\|\mathrm{Cor}(B_p)\|_1^{(p)}, \quad (4)$$

    where

    $$\bar F_p(z) := E[\hat F_p(z) \mid W_{k(p)}^{(p)}] = \frac{1}{p}\sum_{i=1}^p \Phi\bigg(\frac{z - \mu_i^{(p)}}{\sigma_i^{(p)}}\bigg) \quad (5)$$

    with $\mu_i^{(p)} := \ell_i^{(p)} W_{k(p)}^{(p)}$, $\sigma_i^{(p)} := \{(B_p)_{ii}\}^{1/2}$, and C is a universal constant. If $\sigma_i^{(p)} = 0$ we define $\Phi\big(\frac{z - \mu_i^{(p)}}{\sigma_i^{(p)}}\big) := I(\mu_i^{(p)} \le z)$.

  2. Therefore, if $\|\mathrm{Cor}(B_p)\|_1^{(p)} \to 0$ as $p \to \infty$, then $\sup_z E\{\hat F_p(z) - \bar F_p(z)\}^2 \to 0$.

The key idea of the proof is the following: due to decomposition (2) and equality (3), there exists a random vector $W_{k(p)}^{(p)} \sim N(0, I_{k(p)})$ such that

$$Z_i = \ell_i^{(p)} W_{k(p)}^{(p)} + \xi_i, \quad i = 1, \dots, p, \quad (6)$$

where $(\xi_1, \dots, \xi_p) \sim N(0, B_p)$. Therefore, the conditional random vector

$$(Z_1', \dots, Z_p') := (Z_1, \dots, Z_p) \mid W_{k(p)}^{(p)} = w_{k(p)}$$

is normal with conditional mean $\mu = L_p w_{k(p)}$ and conditional covariance matrix $B_p$. The $L^2$ distance between $\hat F_p(z)$ and $\bar F_p(z) := E[\hat F_p(z) \mid W_{k(p)}^{(p)}]$ can be essentially bounded uniformly by $\|\mathrm{Cor}(B_p)\|_1^{(p)}$.
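To make the mechanism of Theorem 2 concrete, here is a small Monte Carlo sketch (our own; the exchangeable structure and all parameter values are illustrative assumptions). It draws Z from equation (6) for the rank-one decomposition of Section 5.1 and compares the ecdf with the conditional cdf (5).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p, rho = 2000, 0.6

# Decomposition R_p = A_p + B_p with A_p = rho * 1 1^T = L_p L_p^T and
# B_p = (1 - rho) I_p, so k(p) = 1 and sigma_i^(p) = sqrt(1 - rho).
L = np.full((p, 1), np.sqrt(rho))
W = rng.standard_normal(1)                      # latent variable W_{k(p)}^{(p)}
xi = np.sqrt(1 - rho) * rng.standard_normal(p)  # residual with covariance B_p
Z = L @ W + xi                                  # equation (6)

z = np.linspace(-4, 4, 201)
F_hat = (Z[:, None] <= z).mean(axis=0)          # ecdf, equation (1)
mu = L @ W                                      # mu_i^(p) = l_i^(p) W
F_bar = norm.cdf((z - mu[:, None]) / np.sqrt(1 - rho)).mean(axis=0)  # (5)
print(np.abs(F_hat - F_bar).max())  # small, even though F_hat is far from Phi
```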

It is important to note that the random vector $W_{k(p)}^{(p)}$ is not unique. Because $W_{k(p)}^{(p)}$ has a spherically symmetric distribution, Equation (4) will hold if $W_{k(p)}^{(p)}$ is replaced by $QW_{k(p)}^{(p)}$, where Q is any $k(p) \times k(p)$ orthonormal matrix.

Definition 2

The sequence of random functions $\{G_p(z)\}_{p=1}^\infty$ is said to converge uniformly in $L^2$ to the (random) function G(z) if $\sup_z E\{G_p(z) - G(z)\}^2 \to 0$ as $p \to \infty$.

Corollary 1

Suppose $\bar F_p(z)$ satisfies the conditions of Theorem 2(ii). If there exists a (random) function $\bar F(z)$ such that $\bar F_p(z)$ converges to $\bar F(z)$ uniformly in $L^2$, then $\hat F_p(z)$ also converges to $\bar F(z)$ uniformly in $L^2$.

Theorem 2 holds for all decompositions of the form (2). We call every corresponding $\bar F_p(z)$ an asymptotic approximation of $\hat F_p(z)$. However, some decompositions may need a smaller number of latent variables k(p) than others. In the next section we characterize the best decomposition in the sense of giving the asymptotic approximation with the smallest number of latent variables. If $\bar F_p(z)$ converges to some $\bar F(z)$, then we call the latter the asymptotic representation of $\hat F_p(z)$.

3.2 Dimension reduction

Theorem 2 approximates $\hat F_p(z)$ by $\bar F_p(z)$, which is the projection of $\hat F_p(z)$ onto a space generated by $W_{k(p)}^{(p)}$. Thus, as explained below, $\hat F_p(z)$, which has dimension p, is approximated by $\bar F_p(z)$ with dimension $\mathrm{rank}(A_p)$. Hence, Theorem 2 can be regarded as a dimension reduction of the empirical distribution function, as stated in the following proposition.

Proposition 1

Let $\mathcal{F}$ be the collection of all one-dimensional distribution functions, where convergence is defined in the sense of weak convergence of the corresponding random variables.

  1. $\hat F_p(z)$ has dimension p: define the mapping $\mathcal{A}(Z_1, \dots, Z_p) = \hat F_p(z)$ from $\mathbb{R}^p$ to $\mathcal{F}$. Then $\mathcal{A}(\mathbb{R}^p) \subseteq \mathcal{F}$ is homeomorphic to $\mathbb{R}^p$.

  2. $\bar F_p(z)$ has dimension $\mathrm{rank}(A_p)$: define the mapping $\mathcal{B}(W_{k(p)}^{(p)}) = \bar F_p(z)$ from $\mathbb{R}^{k(p)}$ to $\mathcal{F}$. Then $\mathcal{B}(\mathbb{R}^{k(p)}) \subseteq \mathcal{F}$ is homeomorphic to $\mathbb{R}^{\mathrm{rank}(A_p)}$.

In order to achieve dimension reduction, we are interested in knowing whether there exist decompositions of the form (2) such that the approximation of Theorem 2 holds and the dimension of the approximation, $\mathrm{rank}(A_p)$, is finite. Consider the collection $\mathcal{D}$ of decompositions of $R_p$ that satisfy the conditions of Theorem 2 part (ii): $D \in \mathcal{D}$ if $D = \{(A_p, B_p)\}_{p=1}^\infty$ is such that $A_p, B_p$ satisfy (2) and $\|\mathrm{Cor}(B_p)\|_1^{(p)} \to 0$ as $p \to \infty$. Clearly, $\mathcal{D}$ is a large collection and some notion of optimality is required in order to decide which $D \in \mathcal{D}$ to choose. Given Proposition 1(ii), we are interested in decompositions where $\mathrm{rank}(A_p)$ is the smallest. Theorem 3 below states a lower bound on the limiting rank of $A_p$ and presents a decomposition that achieves it under certain conditions.

To state the result, we need to define the eigendecompositions of $R_p$ as a special case of the decompositions in $\mathcal{D}$. Suppose that the eigenvalues of $R_p$ are $\lambda_1^{(p)} \ge \dots \ge \lambda_p^{(p)}$ and the corresponding eigenvectors are $\gamma_1^{(p)}, \dots, \gamma_p^{(p)}$ (notice that everything depends on p). The eigendecomposition of $R_p$ is $\sum_{i=1}^p \lambda_i^{(p)} \gamma_i^{(p)} \{\gamma_i^{(p)}\}^T$. For k < p, define

$$A_{k,p} = \sum_{i=1}^k \lambda_i^{(p)} \gamma_i^{(p)} \{\gamma_i^{(p)}\}^T, \quad B_{k,p} = \sum_{i=k+1}^p \lambda_i^{(p)} \gamma_i^{(p)} \{\gamma_i^{(p)}\}^T. \quad (7)$$

Then, the correlation matrix $R_p$ can be expressed as $R_p = A_{k,p} + B_{k,p}$ and $A_{k,p} = L_{k,p}(L_{k,p})^T$, where

$$L_{k,p} := \Big(\sqrt{\lambda_1^{(p)}}\,\gamma_1^{(p)}, \dots, \sqrt{\lambda_k^{(p)}}\,\gamma_k^{(p)}\Big). \quad (8)$$

A critical quantity is the number of "big" eigenvalues, i.e., those of size of order p. Define

$$\underline{K} = \sum_{i=1}^\infty \underline{K}_i \quad\text{with}\quad \underline{K}_i = \begin{cases} 1 & \text{if } \liminf_{p\to\infty} \lambda_i^{(p)}/p > 0 \\ 0 & \text{if } \liminf_{p\to\infty} \lambda_i^{(p)}/p = 0, \end{cases} \quad i = 1, 2, \dots$$

and

$$\overline{K} = \sum_{i=1}^\infty \overline{K}_i \quad\text{with}\quad \overline{K}_i = \begin{cases} 1 & \text{if } \limsup_{p\to\infty} \lambda_i^{(p)}/p > 0 \\ 0 & \text{if } \limsup_{p\to\infty} \lambda_i^{(p)}/p = 0, \end{cases} \quad i = 1, 2, \dots$$

By definition we have that $\underline{K} \le \overline{K}$. Notice that $\overline{K}$ could be ∞ and that if $\underline{K} < \infty$ then $\underline{K}_i = 1$ for $i \le \underline{K}$ and $\underline{K}_i = 0$ otherwise; the same holds for $\overline{K}$. Since $\underline{K}$ and $\overline{K}$ are sums of indicators, they are either an integer or infinity. The following theorem states that $\underline{K}$ and $\overline{K}$ give lower bounds for the limiting rank of $A_p$ and presents a decomposition of the correlation matrix that achieves them under certain conditions.
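The quantities $\underline{K}$ and $\overline{K}$ can be probed numerically by tracking the scaled eigenvalues $\lambda_i^{(p)}/p$ along increasing p. A small sketch (ours; the exchangeable matrix is just a convenient test case):

```python
import numpy as np

# Scaled spectrum of the exchangeable matrix (13): lambda_1/p -> rho while
# lambda_i/p -> 0 for i >= 2, so there is exactly one "big" eigenvalue.
rho = 0.5
for p in (100, 400, 1600):
    R = np.full((p, p), rho)
    np.fill_diagonal(R, 1.0)
    lam = np.linalg.eigvalsh(R)[::-1]  # eigenvalues in decreasing order
    print(p, (lam[:3] / p).round(4))
```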

Theorem 3

Under the previous setting and notation:

  1. If $\{(A_p, B_p)\}_{p=1}^\infty \in \mathcal{D}$, then $\liminf_p \mathrm{rank}(A_p) \ge \underline{K}$ and $\limsup_p \mathrm{rank}(A_p) \ge \overline{K}$ (both $\underline{K}$ and $\overline{K}$ could be ∞).

  2. For $\overline{K} < \infty$ define the decomposition $D_{\overline{K}} = \{(A_{\overline{K},p}, B_{\overline{K},p})\}_{p=1}^\infty$ according to (7) with $k = \overline{K}$. If the nonzero diagonal terms of $B_{\overline{K},p}$ are bounded from below, i.e.,

    $$\liminf_{p\to\infty}\ \min_{1 \le i \le p:\ \{B_{\overline{K},p}\}_{ii} \ne 0} \{B_{\overline{K},p}\}_{ii} = \varepsilon_B > 0, \quad (9)$$

    then $D_{\overline{K}} \in \mathcal{D}$ and obviously $\mathrm{rank}(A_{\overline{K},p}) = \overline{K}$ for all $p > \overline{K}$.

The regularity condition (9) guarantees that the norm of the residual covariance in $B_{\overline{K},p}$ goes to zero because the correlation in it goes to zero, not because the variance in it goes to zero.

To better appreciate the result given by Theorem 3, it is useful to define the following concepts.

Definition 3

  1. If there exists $\{(A_p, B_p)\}_{p=1}^\infty \in \mathcal{D}$ such that $\limsup_p \mathrm{rank}(A_p) < \infty$, we say that $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation.

  2. If further $K := \underline{K} = \overline{K} < \infty$ then we say that $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension K.

With these definitions in mind, Theorem 3 implies that if $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation with asymptotic dimension K and the regularity condition (9) holds, then the decomposition $\{(A_{K,p}, B_{K,p})\}_{p=1}^\infty$ is "optimal" in the sense that $\lim_p \mathrm{rank}(A_{K,p}) = K$, so that it achieves the lowest dimension among all asymptotic approximations $\bar F_p(z)$ of $\hat F_p(z)$. We shall see in Section 5 that the regularity condition (9) holds for most typical correlation structures.

It is now possible to see that the number $\overline{K}$ determines three different convergence regimes. If $\overline{K} = \infty$ then $\{Z_i\}_{i=1}^\infty$ has no finite dimensional correlation. If $\overline{K} < \infty$ and (9) holds then $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation. When $\overline{K} = 0$, (9) holds trivially since $\{B_{0,p}\}_{ii} = \{R_p\}_{ii} = 1$, and therefore a necessary and sufficient condition for weak correlation, as in Theorem 1, is $\overline{K} = 0$. We summarize the results in the following corollary and in Figure 2.

Figure 2. Different cases of asymptotic approximation to the ecdf. The corresponding examples from Section 5 below appear in parentheses.

Corollary 2

  1. If $\overline{K} = \infty$ then $\{Z_i\}_{i=1}^\infty$ has no finite dimensional correlation.

  2. If $\overline{K} < \infty$ and (9) holds then $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation.

  3. If $K := \underline{K} = \overline{K} < \infty$ and (9) holds then $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension K and the decomposition $\{(A_{K,p}, B_{K,p})\}_{p=1}^\infty$ achieves it in the sense that $\lim_p \mathrm{rank}(A_{K,p}) = K$.

  4. $\overline{K} = 0$ if and only if $\{Z_i\}_{i=1}^\infty$ is weakly correlated.

The regularity condition (9) does not hold when the diagonal of $B_{\overline{K},p}$ contains elements that are arbitrarily small. In this case, captured by the label "unknown" in Figure 2, $\|B_{\overline{K},p}\|_1^{(p)}$ may be small but not necessarily $\|\mathrm{Cor}(B_{\overline{K},p})\|_1^{(p)}$. Since our bound (4) is based on $\|\mathrm{Cor}(B_{\overline{K},p})\|_1^{(p)}$ rather than on $\|B_{\overline{K},p}\|_1^{(p)}$, we cannot establish the asymptotic dimension in this case. However, we could not find an example of $R_p$ where (9) is not satisfied. A heuristic argument that (9) typically holds is

$$\{B_{\overline{K},p}\}_{ii} = \sum_{j=\overline{K}+1}^{p} \lambda_j^{(p)}\{\gamma_j^{(p)}\}_i^2 \approx \frac{1}{p}\sum_{j=\overline{K}+1}^{p} \lambda_j^{(p)} = 1 - \frac{1}{p}\sum_{j=1}^{\overline{K}} \lambda_j^{(p)} > 0,$$

where ≈ is typically true because $\gamma_j^{(p)}$ is a normalized vector and therefore $\{\gamma_j^{(p)}\}_i^2 \approx 1/p$; the last equality uses $\mathrm{tr}(R_p) = p$, and the result is typically bounded away from zero because the $\overline{K}$ big eigenvalues do not exhaust the total variance.

4 Estimating the asymptotic representation from the data

4.1 Estimating the latent variables

We now discuss how to estimate the underlying latent variables $W_K^{(p)}$ when the sequence $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension $K := \underline{K} = \overline{K}$. For ease of presentation we write everything in matrix/vector form; thus, we write $Z^p := (Z_1, \dots, Z_p)^T$ and $\xi^p := (\xi_1, \dots, \xi_p)^T$. In this notation, (6) becomes the linear regression equation $Z^p = L_{K,p} W_K^{(p)} + \xi^p$ and the least squares estimate of $W_K^{(p)}$ is $\hat W_K^{(p)} := (L_{K,p}^T L_{K,p})^{-1} L_{K,p}^T Z^p$.

When the eigendecomposition (7) is used, the columns of $L_{K,p}$ are orthogonal and $(L_{K,p}^T L_{K,p})^{-1}$ is a diagonal matrix whose i-th diagonal element is $1/\lambda_i^{(p)}$. Thus, in this case,

$$\hat W_K^{(p)} = \bigg(\frac{1}{\sqrt{\lambda_1^{(p)}}}\{\gamma_1^{(p)}\}^T Z^p, \dots, \frac{1}{\sqrt{\lambda_K^{(p)}}}\{\gamma_K^{(p)}\}^T Z^p\bigg)^T. \quad (10)$$

Fan et al. (2012) and Fan and Han (2014) consider a related framework where some of the variables $Z_i$ may have a non-zero mean, and therefore use an estimate that minimizes the $L^1$ distance under sparsity assumptions.
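A direct implementation of the least squares estimate (10) is straightforward; the sketch below (ours) assumes the correlation matrix R and the asymptotic dimension K are given.

```python
import numpy as np

def latent_ls_estimate(R, Z, K):
    """Least squares estimate (10) of W_K^(p) from the top-K eigenpairs of R."""
    lam, gam = np.linalg.eigh(R)                 # ascending eigenvalues
    lam, gam = lam[::-1][:K], gam[:, ::-1][:, :K]
    return (gam.T @ Z) / np.sqrt(lam)            # entries gamma_i^T Z / sqrt(lambda_i)
```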

Since $\{Z_i\}_{i=1}^p$ is strongly correlated, $\mathrm{Cov}(\hat W_K^{(p)})$ does not converge to 0 as p goes to infinity. However, the following proposition states that $\hat W_K^{(p)}$ is still consistent in an $L^2$ sense.

Proposition 2

Suppose that $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension K. Then

$$E\big[\{\hat W_K^{(p)} - W_K^{(p)}\}^T \{\hat W_K^{(p)} - W_K^{(p)}\}\big] \le \frac{Kp}{\lambda_K^{(p)}}\,\|B_{K,p}\|_2^{(p)} \to 0 \quad\text{as } p \to \infty.$$

4.2 Estimating the asymptotic representation

In real data problems $W_K^{(p)}$ and, thus, $\bar F_p(\cdot)$ are unknown. Proposition 2 suggests that if we plug $\hat W_K^{(p)}$ into $\bar F_p(\cdot)$ in place of $W_K^{(p)}$, then the $L^2$ distance between $\hat F_p(\cdot)$ and the plug-in version of $\bar F_p(\cdot)$ converges to zero. Indeed, this can be proved under some additional regularity conditions.

Theorem 4

  1. Define the plug-in ecdf estimate

    $$\hat{\bar F}_p(z) := \frac{1}{p}\sum_{i=1}^p \Phi\bigg(\frac{z - \hat\mu_i^{(p)}}{\sigma_i^{(p)}}\bigg) \quad (11)$$

    with $\hat\mu_i^{(p)} := \ell_i^{(p)} \hat W_K^{(p)}$, under the convention $\Phi\big(\frac{z - \hat\mu_i^{(p)}}{0}\big) := I(\hat\mu_i^{(p)} \le z)$. We have that

    $$\sup_z E\{\hat F_p(z) - \hat{\bar F}_p(z)\}^2 \le 3\sup_z E\{\hat F_p(z) - \bar F_p(z)\}^2 + \frac{3(p - |J_p|)}{p} + \frac{3}{2\pi}\max_{i \in J_p}\frac{1}{\{\sigma_i^{(p)}\}^2}\,E\big[\{\hat W_K^{(p)} - W_K^{(p)}\}^T\{\hat W_K^{(p)} - W_K^{(p)}\}\big],$$

    where $J_p := \{1 \le i \le p : \sigma_i^{(p)} > 0\}$ is the set of indexes for which $\sigma_i^{(p)} > 0$ and $|J_p|$ is the cardinality of $J_p$.

  2. Therefore, if $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension K, (9) holds, and also

    $$\frac{|J_p|}{p} \to 1 \quad\text{as } p \to \infty, \quad (12)$$

    then $\sup_z E\{\hat F_p(z) - \hat{\bar F}_p(z)\}^2 \to 0$ as $p \to \infty$.
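Putting Theorem 4 to work requires only the top-K eigenpairs. The following sketch (ours) computes the plug-in approximation (11) on a grid; it assumes R is a correlation matrix (unit diagonal) and that all $\sigma_i^{(p)} > 0$, i.e., $J_p = \{1, \dots, p\}$, otherwise the indicator convention applies.

```python
import numpy as np
from scipy.stats import norm

def plug_in_approx(R, Z, K, z_grid):
    """Plug-in approximation (11) from the eigendecomposition (7)-(8)."""
    lam, gam = np.linalg.eigh(R)
    lam, gam = lam[::-1][:K], gam[:, ::-1][:, :K]
    W_hat = (gam.T @ Z) / np.sqrt(lam)         # least squares estimate (10)
    L = gam * np.sqrt(lam)                     # L_{K,p} of equation (8)
    mu_hat = L @ W_hat                         # mu_hat_i = l_i^(p) W_hat
    sigma = np.sqrt(1.0 - (L**2).sum(axis=1))  # sigma_i = {(B_{K,p})_ii}^(1/2)
    return norm.cdf((z_grid - mu_hat[:, None]) / sigma[:, None]).mean(axis=0)
```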

4.3 Approximated eigendecomposition

In this section we study the case in which the largest K eigenvalues and eigenvectors of the correlation matrix are not known exactly and an approximation is used. We show that if the distance between the approximated eigenvalues and eigenvectors and the true ones goes to zero as p goes to infinity, then the result of the previous section still holds.

Let $\tilde\lambda_1^{(p)} \ge \dots \ge \tilde\lambda_K^{(p)}$ be an approximation to the K biggest eigenvalues, and let $\tilde\gamma_1^{(p)}, \dots, \tilde\gamma_K^{(p)}$ be norm-1 vectors that are an approximation to the corresponding eigenvectors. Define the $p \times K$ matrix $\tilde L_p := \big(\sqrt{\tilde\lambda_1^{(p)}}\,\tilde\gamma_1^{(p)}, \dots, \sqrt{\tilde\lambda_K^{(p)}}\,\tilde\gamma_K^{(p)}\big)$, and let $\tilde\ell_i^{(p)}$ be the i-th row of the matrix. Define

$$\tilde W_K^{(p)} := \bigg(\frac{1}{\sqrt{\tilde\lambda_1^{(p)}}}\{\tilde\gamma_1^{(p)}\}^T Z^p, \dots, \frac{1}{\sqrt{\tilde\lambda_K^{(p)}}}\{\tilde\gamma_K^{(p)}\}^T Z^p\bigg)^T;$$

and let $\tilde\mu_i^{(p)} := \tilde\ell_i^{(p)} \tilde W_K^{(p)}$ and $\tilde\sigma_i^{(p)} := \big\{1 - \sum_{j=1}^K \tilde\lambda_j^{(p)}\{\tilde\gamma_j^{(p)}\}_i^2\big\}^{1/2}$, where $\{x\}_i$ is the i-th element of the vector x. Finally, define $\tilde F_p(z) := \frac{1}{p}\sum_{i=1}^p \Phi\big(\frac{z - \tilde\mu_i^{(p)}}{\tilde\sigma_i^{(p)}}\big)$. The following proposition bounds the $L^2$ distance between $\tilde F_p(z)$ and $\hat{\bar F}_p(z)$.

Proposition 3

Suppose that $\sigma_i^{(p)}, \tilde\sigma_i^{(p)} \ge \varepsilon_B$, $i = 1, \dots, p$, for some $\varepsilon_B > 0$.

  1. The following inequality holds:

    $$E\{\tilde F_p(z) - \hat{\bar F}_p(z)\}^2 \le C(\varepsilon_B)\,\frac{\lambda_K^{(p)}}{p}\sum_{i=1}^K\Bigg[\bigg\{\frac{1}{\sqrt{\lambda_i^{(p)}/p}} - \frac{1}{\sqrt{\tilde\lambda_i^{(p)}/p}}\bigg\}^2 + \bigg\{\sqrt{\frac{\lambda_i^{(p)}}{p}} - \sqrt{\frac{\tilde\lambda_i^{(p)}}{p}}\bigg\}^2 + 2\{\gamma_i^{(p)} - \tilde\gamma_i^{(p)}\}^T\{\gamma_i^{(p)} - \tilde\gamma_i^{(p)}\}\Bigg] + C(\varepsilon_B)\Bigg[\sum_{i=1}^K\bigg\{\frac{\lambda_i^{(p)}}{p} - \frac{\tilde\lambda_i^{(p)}}{p}\bigg\}^2 + \sum_{i=1}^K\{\gamma_i^{(p)} - \tilde\gamma_i^{(p)}\}^T\{\gamma_i^{(p)} - \tilde\gamma_i^{(p)}\}\Bigg]^{1/2},$$

    where $C(\varepsilon_B)$ is a constant that depends on $\varepsilon_B$.

  2. Therefore, if $\frac{\tilde\lambda_i^{(p)} - \lambda_i^{(p)}}{p} \to 0$ and $\{\tilde\gamma_i^{(p)} - \gamma_i^{(p)}\}^T\{\tilde\gamma_i^{(p)} - \gamma_i^{(p)}\} \to 0$ as $p \to \infty$ for $i = 1, \dots, K$, and the conditions of Theorem 4(ii) hold, then $\sup_z E\{\tilde F_p(z) - \hat F_p(z)\}^2 \to 0$ as $p \to \infty$.

Proposition 3 and its proof imply that if there exist consistent estimates of the first K eigenvalues and eigenvectors, then the result of Theorem 4(ii) still holds; see also Fan and Han (2014). A systematic study of the case where the correlation matrix is unknown is left to future research.

5 Examples

We now consider some examples of correlation structures. In Examples 1–3, $K = \underline{K} = \overline{K}$ is finite, conditions (9) and (12) are met, and therefore the decomposition $\{(A_{K,p}, B_{K,p})\}_{p=1}^\infty$ is optimal and $\hat{\bar F}_p$ is consistent; in Example 3, however, $\bar F_p(z)$ does not have a limit. In Example 4, $\underline{K} = 1$ but $\overline{K} > 1$, and in Example 5, $\underline{K} = \overline{K} = \infty$.

5.1 Exchangeable correlation: asymptotic dimension 1

Suppose that $\{Z_i\}_{i=1}^\infty$ is a sequence of exchangeable random variables with correlation ρ ≥ 0, i.e., for any $i \ne j$, $\mathrm{cor}(Z_i, Z_j) = \rho$ and

$$R_p = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}. \quad (13)$$

The eigenvalues of $R_p$ are $\lambda_1^{(p)} = \rho p + 1 - \rho$ (with corresponding eigenvector $(1, \dots, 1)^T/\sqrt{p}$) and $\lambda_2^{(p)} = \dots = \lambda_p^{(p)} = 1 - \rho$; hence, $\underline{K} = \overline{K} = 1$. According to (7), we have that $A_{1,p} = [\rho + (1 - \rho)/p]\,\mathbf{1}_p\mathbf{1}_p^T$ and $B_{1,p} = (1 - \rho)[I_p - (1/p)\mathbf{1}_p\mathbf{1}_p^T]$, where $I_p$ is the $p \times p$ identity matrix and $\mathbf{1}_p$ is the length-p vector with 1 in every entry. It is easy to check that for every i, $\lim_p \{B_{1,p}\}_{ii} = 1 - \rho > 0$, and (9) holds. Also, $|J_p| = p$ and (12) holds. Thus, in this case $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation with asymptotic dimension K = 1 and the decomposition $\{(A_{1,p}, B_{1,p})\}_{p=1}^\infty$ is optimal.

To write the asymptotic representation $\bar F(z)$ in this case, it is easier to work with the asymptotically equivalent decomposition $A_p = \rho\,\mathbf{1}_p\mathbf{1}_p^T$, $B_p = (1 - \rho)I_p$, which is in $\mathcal{D}$ and has $\mathrm{rank}(A_p) = K = 1$. For this decomposition $\mu_i^{(p)} = \sqrt{\rho}\,W_1$ and $\sigma_i^{(p)} = \sqrt{1 - \rho}$ do not depend on i or p and

$$\bar F_p(z) = \frac{1}{p}\sum_{i=1}^p \Phi\bigg(\frac{z - \mu_i^{(p)}}{\sigma_i^{(p)}}\bigg) = \frac{1}{p}\sum_{i=1}^p \Phi\bigg(\frac{z - \sqrt{\rho}\,W_1}{\sqrt{1 - \rho}}\bigg) = \Phi\bigg(\frac{z - \sqrt{\rho}\,W_1}{\sqrt{1 - \rho}}\bigg) = \bar F(z). \quad (14)$$

That is, $\hat F_p(z)$, which has dimension p, is approximated by $\bar F_p(z)$ with dimension 1. Moreover, $\bar F_p(z) = \bar F(z)$ above does not depend on p and is therefore the asymptotic representation: $\hat F_p(z)$ converges to $\bar F(z)$ in the sense of Definition 2.

Given the sequence $\{Z_i\}_{i=1}^p$, the regression estimate (10) of $W_1$ is $\hat W_1^{(p)} = \bar Z_p/\sqrt{\rho}$, where $\bar Z_p := \frac{1}{p}\sum_{i=1}^p Z_i$. An illustration of the estimated asymptotic representation $\hat{\bar F}_p(z)$ in (11) is shown in Figure 1(a).
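A quick numeric sanity check of this example (our own sketch; ρ and the seed are arbitrary) recovers $W_1$ from $\bar Z_p$ and evaluates the one-dimensional representation (14):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, rho = 1000, 0.9
W1 = rng.standard_normal()
Z = np.sqrt(rho) * W1 + np.sqrt(1 - rho) * rng.standard_normal(p)

W1_hat = Z.mean() / np.sqrt(rho)  # regression estimate Zbar_p / sqrt(rho)
print(W1, W1_hat)                 # close for large p

z = np.linspace(-4, 4, 201)
F_rep = norm.cdf((z - np.sqrt(rho) * W1_hat) / np.sqrt(1 - rho))  # plug-in (14)
```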

Notice that the distribution specified by (14) is a shifted and scaled normal distribution, corresponding to Efron's empirical null model (Efron, 2004, 2007a, b, 2008). Fitting an empirical null in this case would amount to estimating $W_1$ and ρ. This example shows that Efron's empirical null model is justified under exchangeable correlation. However, as can be seen from the above theory and the following examples, it does not capture the effect of many other correlation structures.

When ρ ≤ 0 in (13), positive semi-definiteness of the correlation matrix requires ρ ≥ −1/(p − 1). Therefore, in the high-dimensional setting, too much negative correlation is not possible. Consequently, the histograms of the Z’s are typically narrower than N(0, 1) (see Figure 1).

5.2 Two exchangeable correlation blocks: asymptotic dimension 2

Suppose that $R_p$ consists of two blocks of the form (13), the first of size $n_1(p) \times n_1(p)$ and the second of size $n_2(p) \times n_2(p)$, with $n_1(p) + n_2(p) = p$ and $\lim_{p\to\infty} n_1(p)/p = \pi$. We assume that there is a constant correlation $\rho_B$ between the two blocks (which can be zero or negative). Thus,

$$R_p = \begin{pmatrix} (1 - \rho_1)I_{n_1(p)} + \rho_1\,\mathbf{1}\mathbf{1}^T & \rho_B\,\mathbf{1}\mathbf{1}^T \\ \rho_B\,\mathbf{1}\mathbf{1}^T & (1 - \rho_2)I_{n_2(p)} + \rho_2\,\mathbf{1}\mathbf{1}^T \end{pmatrix},$$

where each $\mathbf{1}$ denotes a vector of ones of the appropriate dimension.

Let $I_1(p)$ and $I_2(p)$ denote the sets of indexes that belong to Blocks 1 and 2. If $0 \le \rho_B < \sqrt{\rho_1\rho_2}$, then a sequence $\{Z_i\}_{i=1}^\infty$ with the above covariance can be generated by the model

$$Z_i = \begin{cases} \sqrt{\rho_1}\big(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_2\big) + \sqrt{1 - \rho_1}\,\varepsilon_i & i \in I_1(p) \\ \sqrt{\rho_2}\big(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_3\big) + \sqrt{1 - \rho_2}\,\varepsilon_i & i \in I_2(p), \end{cases}$$

where $\rho_B' = \rho_B/\sqrt{\rho_1\rho_2}$, and $W_1, W_2, W_3, \{\varepsilon_i\}_{i=1}^p$ are i.i.d. N(0, 1). Comparing with (6), we can write

$$\bar F_p(z) = \frac{n_1(p)}{p}\,\Phi\bigg(\frac{z - \sqrt{\rho_1}(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_2)}{\sqrt{1 - \rho_1}}\bigg) + \frac{n_2(p)}{p}\,\Phi\bigg(\frac{z - \sqrt{\rho_2}(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_3)}{\sqrt{1 - \rho_2}}\bigg)$$
$$\to \bar F(z) = \pi\,\Phi\bigg(\frac{z - \sqrt{\rho_1}(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_2)}{\sqrt{1 - \rho_1}}\bigg) + (1 - \pi)\,\Phi\bigg(\frac{z - \sqrt{\rho_2}(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_3)}{\sqrt{1 - \rho_2}}\bigg).$$

If $\rho_B = 0$, then the asymptotic approximation $\bar F(z)$ depends only on $W_2, W_3$ and has dimension $\underline{K} = \overline{K} = 2$. In this case $R_p$ has two uncorrelated blocks of the type presented in Section 5.1; the generalization of the formulas there is straightforward (see also Figure 1(b)).

For $\rho_B \ne 0$, however, the above $\bar F(z)$ depends on $W_1, W_2, W_3$ and, thus, has dimension three; it is suboptimal. To obtain the optimal asymptotic representation in more generality, we work with the eigendecomposition. The matrix $R_p$ has two eigenvalues

$$\lambda_{1,2}^{(p)} := \frac{\eta_1(p) + \eta_2(p)}{2} \pm \bigg[\bigg(\frac{\eta_1(p) - \eta_2(p)}{2}\bigg)^2 + n_1(p)n_2(p)\rho_B^2\bigg]^{1/2},$$

where $\eta_j(p) = 1 + \rho_j\{n_j(p) - 1\}$, $j = 1, 2$, and the other p − 2 eigenvalues are either $1 - \rho_1$ or $1 - \rho_2$. For large p, positive semi-definiteness requires that $\rho_B^2 \le \rho_1\rho_2$. On the boundary, when $\rho_B^2 = \rho_1\rho_2$, $\lambda_2^{(p)}$ converges to a positive constant and $\underline{K} = \overline{K} = 1$.

We now concentrate on the case $\rho_B^2 < \rho_1\rho_2$. Then, $\lambda_{1,2}^{(p)}$ are of order p and $\underline{K} = \overline{K} = 2$. The corresponding eigenvectors are

$$\big(\underbrace{x_j(p), \dots, x_j(p)}_{n_1(p)\ \text{times}}, \underbrace{y_j(p), \dots, y_j(p)}_{n_2(p)\ \text{times}}\big)^T, \quad j = 1, 2,$$

where $x_j(p)$ and $y_j(p)$ are given by

$$x_1(p) = \frac{-\sqrt{n_2(p)}\,\rho_B}{\sqrt{(\eta_1(p) - \lambda_1^{(p)})^2 + n_1(p)n_2(p)\rho_B^2}}, \quad y_1(p) = \frac{(\eta_1(p) - \lambda_1^{(p)})/\sqrt{n_2(p)}}{\sqrt{(\eta_1(p) - \lambda_1^{(p)})^2 + n_1(p)n_2(p)\rho_B^2}},$$
$$x_2(p) = \frac{(\eta_2(p) - \lambda_2^{(p)})/\sqrt{n_1(p)}}{\sqrt{(\eta_2(p) - \lambda_2^{(p)})^2 + n_1(p)n_2(p)\rho_B^2}}, \quad y_2(p) = \frac{-\sqrt{n_1(p)}\,\rho_B}{\sqrt{(\eta_2(p) - \lambda_2^{(p)})^2 + n_1(p)n_2(p)\rho_B^2}}.$$

We have that

$$\mu_i^{(p)} = \begin{cases} \sqrt{\lambda_1^{(p)}}\,x_1(p)\,W_1 + \sqrt{\lambda_2^{(p)}}\,x_2(p)\,W_2, & i \in I_1(p) \\ \sqrt{\lambda_1^{(p)}}\,y_1(p)\,W_1 + \sqrt{\lambda_2^{(p)}}\,y_2(p)\,W_2, & i \in I_2(p), \end{cases}$$

and

$$\sigma_i^{(p)} = \begin{cases} \Big(1 - \big[\lambda_1^{(p)}\{x_1(p)\}^2 + \lambda_2^{(p)}\{x_2(p)\}^2\big]\Big)^{1/2}, & i \in I_1(p) \\ \Big(1 - \big[\lambda_1^{(p)}\{y_1(p)\}^2 + \lambda_2^{(p)}\{y_2(p)\}^2\big]\Big)^{1/2}, & i \in I_2(p). \end{cases}$$

Moreover, $\{B_{2,p}\}_{ii} \to 1 - \rho_1$ for $i \in I_1(p)$ and $\{B_{2,p}\}_{ii} \to 1 - \rho_2$ for $i \in I_2(p)$ as $p \to \infty$. Thus, (9) holds with $\overline{K} = \underline{K} = 2$. Also, $|J_p| = p$ and (12) holds. The corresponding asymptotic representation can be obtained by writing $\bar F_p(z)$ in terms of the $\mu_i^{(p)}$ and $\sigma_i^{(p)}$ above and taking the limit as p → ∞.

The regression estimate of $W_j^{(p)}$ is

$$\hat W_j^{(p)} = \bigg\{x_j(p)\sum_{i \in I_1(p)} Z_i + y_j(p)\sum_{i \in I_2(p)} Z_i\bigg\}\Big/\sqrt{\lambda_j^{(p)}}, \quad j = 1, 2.$$

5.3 Finite dimensional correlation with no asymptotic representation

In this example, the correlation is finite dimensional with K = 2 and the $L^2$ distance between $\hat F_p$ and $\bar F_p$ goes to zero, but $\bar F_p$ itself has no limit. Let $0 < \rho < 1$, let $\{\alpha_i\}_{i=1}^\infty \subset (0, 1)$ be any sequence, and let $W_1, W_2, \{\varepsilon_i\}_{i=1}^\infty$ be independent standard normal random variables. Define

$$Z_i = \sqrt{\rho\alpha_i}\,W_1 + \sqrt{\rho(1 - \alpha_i)}\,W_2 + \sqrt{1 - \rho}\,\varepsilon_i;$$

then, $Z_i \sim N(0, 1)$ and $\mathrm{Cov}(Z_i, Z_j) = \rho\big\{\sqrt{\alpha_i\alpha_j} + \sqrt{(1 - \alpha_i)(1 - \alpha_j)}\big\}$ for $i \ne j$. We can write $R_p = A_p + B_p$, where $A_p = L_p(L_p)^T$,

$$L_p = \begin{pmatrix} \sqrt{\rho\alpha_1} & \sqrt{\rho(1 - \alpha_1)} \\ \sqrt{\rho\alpha_2} & \sqrt{\rho(1 - \alpha_2)} \\ \vdots & \vdots \\ \sqrt{\rho\alpha_p} & \sqrt{\rho(1 - \alpha_p)} \end{pmatrix}, \quad\text{and}\quad B_p = (1 - \rho)I_p.$$

Then, $\bar F_p(z) = \frac{1}{p}\sum_{i=1}^p \Phi\big(\frac{z - \mu_i}{\sqrt{1 - \rho}}\big)$, where $\mu_i := \sqrt{\rho\alpha_i}\,W_1 + \sqrt{\rho(1 - \alpha_i)}\,W_2$, and it is clear that for certain choices of $\{\alpha_i\}_{i=1}^\infty \subset (0, 1)$, $\bar F_p(z)$ has no limit for any z.

5.4 Independent exchangeable correlation blocks with no asymptotic dimension

In this example there are two independent blocks similar to the example in Section 5.2 (with $\rho_B = 0$) and with the same notation, but here the proportions $n_1(p)/p$ and $n_2(p)/p$ change infinitely often. Consequently, there are infinitely many p's with two "big" eigenvalues and infinitely many p's with only one "big" eigenvalue. Therefore, in this example, $\underline{K} = 1$ while $\overline{K} = 2$. This means that the correlation is finite dimensional and (9) holds, but there is no asymptotic representation and no well defined asymptotic dimension.

Consider the subsequence $m_1 = 2$, $m_{k+1} = m_k^2$ for $k \ge 1$; i.e., $m_k = 2^{(2^{k-1})}$. Suppose that $Z_i$ belongs to the first block if $i \in \{m_{k-1} + 1, \dots, m_k\}$ for even k, and otherwise it belongs to the second block. That is, at $p = m_k$ for even k,

$$n_1(p) = n_1(m_{k-1}) + m_k - m_{k-1} = n_1(\sqrt{p}) + p - \sqrt{p}, \quad\text{and}\quad n_2(p) = n_2(m_{k-1}) = n_2(\sqrt{p}),$$

and for odd k,

$$n_1(p) = n_1(m_{k-1}) = n_1(\sqrt{p}), \quad\text{and}\quad n_2(p) = n_2(m_{k-1}) + m_k - m_{k-1} = n_2(\sqrt{p}) + p - \sqrt{p}.$$

When $p = m_k$ for even k, $m_k - \sqrt{m_k} \le n_1(m_k) \le m_k$, and therefore the eigenvalue associated with this block satisfies

$$\lim_{k \to \infty,\ k\ \text{even}} \frac{\rho\,n_1(m_k) + 1 - \rho}{m_k} = \rho,$$

but for odd k, $n_1(m_k) \le m_{k-1} = \sqrt{m_k}$ and $\lim_{k \to \infty,\ k\ \text{odd}} \frac{\rho\,n_1(m_k) + 1 - \rho}{m_k} = 0$; and vice versa for the other block. The largest eigenvalue satisfies

$$\frac{\lambda_1^{(p)}}{p} = \max\bigg\{\frac{\rho\,n_1(p) + 1 - \rho}{p},\ \frac{\rho\,n_2(p) + 1 - \rho}{p}\bigg\} \ge \frac{\rho\,p/2 + 1 - \rho}{p},$$

and therefore $\liminf_p \lambda_1^{(p)}/p \ge \rho/2 > 0$ and $\underline{K}_1 = 1$. For the second eigenvalue,

$$\frac{\lambda_2^{(p)}}{p} = \min\bigg\{\frac{\rho\,n_1(p) + 1 - \rho}{p},\ \frac{\rho\,n_2(p) + 1 - \rho}{p}\bigg\};$$

since the lim inf of both terms is 0, $\liminf_p \lambda_2^{(p)}/p = 0$ and $\underline{K}_2 = 0$. Thus, $\underline{K} = 1$.

We next show that $\overline{K} = 2$. When $p = m_k$ one block is larger than the other, and when $p = m_{k+1}$ the other block is larger. Therefore, for each k there exists $p_k$ between $m_k$ and $m_{k+1}$ such that the two blocks are of equal size, i.e., $n_1(p_k) = n_2(p_k) = p_k/2$. Thus,

$$\frac{\lambda_2^{(p_k)}}{p_k} = \min\bigg\{\frac{\rho\,n_1(p_k) + 1 - \rho}{p_k},\ \frac{\rho\,n_2(p_k) + 1 - \rho}{p_k}\bigg\} = \frac{\rho}{2} + \frac{1 - \rho}{p_k};$$

therefore, $\limsup_p \lambda_2^{(p)}/p \ge \lim_k \lambda_2^{(p_k)}/p_k = \rho/2 > 0$. Since the other eigenvalues are constant (1 − ρ), we have that $\overline{K} = 2$.

Moreover, in this case $\|B_{1,p}\|_2^{(p)}$ does not converge to 0 and therefore $\|\mathrm{Cor}(B_{1,p})\|_2^{(p)} \nrightarrow 0$; hence, the decomposition $\{(A_{1,p}, B_{1,p})\}$ is not in $\mathcal{D}$. This is because

$$\|B_{1,p_k}\|_2^{(p_k)} = \frac{\big[\{\lambda_2^{(p_k)}\}^2 + \{\lambda_3^{(p_k)}\}^2 + \dots + \{\lambda_{p_k}^{(p_k)}\}^2\big]^{1/2}}{p_k} \ge \frac{\lambda_2^{(p_k)}}{p_k} = \frac{\rho}{2} + \frac{1 - \rho}{p_k},$$

so that $\liminf_k \|\mathrm{Cor}(B_{1,p_k})\|_2^{(p_k)} \ge \liminf_k \|B_{1,p_k}\|_2^{(p_k)} \ge \rho/2$.

This example can easily be generalized so that $\overline{K} = 3, 4, \dots$ or $\overline{K} = \infty$. Given $M < \infty$, to obtain $\overline{K} = M$, for even k one can divide the observations $Z_{m_k}, \dots, Z_{m_{k+1}}$ into M − 1 independent blocks with correlation ρ within blocks. To obtain $\overline{K} = \infty$, for even k one can divide the observations $Z_{m_k}, \dots, Z_{m_{k+1}}$ into k blocks.

5.5 No finite dimensional approximation

In this example, $\underline{K} = \overline{K} = \infty$, and therefore, according to Theorem 3, for every decomposition in $\mathcal{D}$, $\lim_p \mathrm{rank}(A_p) = \infty$. Suppose that at $p = 2^n$, $R_p$ consists of n independent blocks of sizes $2^{n-1}, 2^{n-2}, \dots, 2$ and an additional block of size 2, each block of the form (13), all with the same correlation parameter ρ. At $p = 2^n$, there are n − 1 "big" eigenvalues, $\rho(p/2^i - 1) + 1$ for $i = 1, \dots, n - 1$, one eigenvalue $1 + \rho$ (from the last block of size 2), and the rest are equal to $1 - \rho$. For any fixed $i \in \mathbb{N}$ and large enough p, the eigenvalue $\lambda_i^{(p)}$ is equal to $\rho(p/2^i - 1) + 1$ and therefore

$$\frac{\lambda_i^{(p)}}{p} = \frac{\rho(p/2^i - 1) + 1}{p} \to \frac{\rho}{2^i} > 0 \quad\text{as } p \to \infty.$$

Therefore, for any fixed $i \in \mathbb{N}$, $\underline{K}_i = 1$ and $\underline{K} = \sum_{i=1}^\infty \underline{K}_i = \infty$.

6 Data example

As a practical application, we use the methods developed above to analyze a high-dimensional data set obtained from brain imaging. The data belongs to a study of cortical thickness of adults who had a diagnosis of attention deficit/hyperactivity disorder (ADHD) as children (Proal et al., 2011). The data set consists of cortical thickness measurements for about 80000 cortical voxels, obtained from magnetic resonance imaging (MRI) scans, as well as demographic and behavioral measurements, for each of n = 139 individuals. In this study, it had been noticed by Reiss et al. (2012) that z-scores corresponding to the voxelwise relationship between cortical thickness and ADHD diagnosis did not follow the theoretical standard normal distribution. Instead, the distribution of z-scores exhibited a substantial shift away from zero, indicating a possible widespread cortical thinning over the brain for individuals with ADHD. It is unclear, however, whether those results could have been caused by correlation between voxels rather than by a real relationship with clinical diagnosis.

In order to apply the methods developed in this paper, here we perform a slightly different analysis where the correlation structure can be taken as known. Specifically, we follow the approach of Owen (2005) of performing a regression of the observed trait for the subjects on the high-dimensional predictors, one dimension at a time. Owen (2005) and also Fan et al. (2012) used this approach in the context of genomic data. In our case, the trait is a global assessment of behavior, while the predictors are the cortical thickness measurements. For ease of computation, in the following analysis we use a random sample of p = 1000 voxels, which is enough to show the effect we want to highlight.

6.1 Regression analysis

Let $Y_j$ denote the global assessment of the j-th subject and let $X_j = \{X_j(i)\}_{i=1}^p$ be the cortical thickness in the p voxels of the j-th subject. For each voxel $i \in \{1, \dots, p\}$, consider the simple regression model

$$Y_j = \alpha(i) + \beta(i)X_j(i) + \varepsilon_j, \quad j = 1, \dots, n,$$

where the ε’s are i.i.d with mean 0 and variance σ2. Let Yj=Yj-1nj=1nYj and Xj(i)=Xj(i)-1nj=1nXj(i) be the centered variables and define si=j=1nXj(i)Xj(). The least squares estimate of β(i) is β^(i)=j=1nXj(i)Yj/sii. According to the model, we can write β^(i)=β(i)+j=1nXj(i)εj/sii and so Cov(β̂(i), β̂(ℓ)) = σ2si/[siisℓℓ].

Consider the p hypotheses $H_0(i): \beta(i) = 0$ versus $H_1(i): \beta(i) \ne 0$, $i = 1, \dots, p$. The z-score for the i-th test is $Z_i = \hat\beta(i)/\mathrm{se}[\hat\beta(i)] = \hat\beta(i)\sqrt{s_{ii}}/\sigma$. Thus, under the global null hypothesis, i.e., that all the $H_0(i)$'s are true, we have that $Z_1, \dots, Z_p$ are each approximately N(0, 1) with correlation matrix $R := \{r_{i\ell}\}_{i,\ell=1}^p$ given by the pairwise correlations $r_{i\ell} := \mathrm{Cor}(Z_i, Z_\ell) = s_{i\ell}/\sqrt{s_{ii}s_{\ell\ell}}$. Notice that the pairwise correlations between the z-scores are precisely the pairwise correlations between the cortical thickness measurements at each voxel. Because the regression is conditional on the voxelwise measurements, we take the pairwise correlations as fixed. To compute the z-scores, we use the variance estimate $\hat\sigma^2 = \frac{1}{p(n-2)}\sum_{i=1}^p\sum_{j=1}^n [\tilde Y_j - \hat\beta(i)\tilde X_j(i)]^2$ and ignore its negligible contribution to their variability.
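In code, the voxelwise z-scores and their (known) correlation matrix follow directly from these formulas. A minimal sketch (ours; the function name is an assumption, X is the n × p predictor matrix and Y the length-n trait vector):

```python
import numpy as np

def regression_z_scores(X, Y):
    """Voxelwise z-scores under the global null, following Section 6.1.
    X: n x p matrix of predictors (cortical thickness); Y: length-n trait."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                        # centered predictors
    Yc = Y - Y.mean()                              # centered response
    s = (Xc**2).sum(axis=0)                        # s_ii
    beta_hat = Xc.T @ Yc / s                       # least squares slopes
    resid = Yc[:, None] - Xc * beta_hat            # residuals, one column per voxel
    sigma2_hat = (resid**2).sum() / (p * (n - 2))  # pooled variance estimate
    Z = beta_hat * np.sqrt(s) / np.sqrt(sigma2_hat)
    R = np.corrcoef(X, rowvar=False)               # r_il = s_il / sqrt(s_ii s_ll)
    return Z, R
```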

6.2 The distribution of the z-scores

Consider the decomposition $R_p = A_{k,p} + B_{k,p}$, where $A_{k,p}, B_{k,p}$ are of the form (7). The asymptotic dimension, if it exists, is unknown. In the next subsection we discuss the choice of k; for now we work with k = 2. After k is set, $W_k^{(p)}$ is estimated via (10). Define

$$\hat\mu_i := \ell_i \hat W_k^{(p)}, \quad\text{and}\quad \sigma_i := \{(B_{k,p})_{ii}\}^{1/2}, \quad i = 1, \dots, p,$$

where $\ell_i$ is the i-th row of $L_{k,p}$ defined in (8). The empirical distribution is approximated by $\hat{\bar F}(z)$ given by (11), and the density is approximated by $\hat{\bar f}(z) := \frac{1}{p}\sum_{i=1}^p \frac{1}{\sigma_i}\varphi\big(\frac{z - \hat\mu_i}{\sigma_i}\big)$, where φ is the standard normal density. Figure 3(a) plots the histogram of the Z's, $\hat{\bar f}$, and φ. The approximation captures the shape of the empirical distribution remarkably well, even though it is based on only two latent variables.

Figure 3. Histogram of $Z_1, Z_2, \dots, Z_{1000}$ for the real data (a) and the simulation results (b). The red line is φ, the standard normal density, and the blue dashed line is the approximation we use, $\hat{\bar f}$.
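The fitted density used in Figure 3 is the corresponding normal mixture; a short sketch (ours, reusing $\hat\mu_i$ and $\sigma_i$ computed as above, e.g., inside `plug_in_approx`):

```python
import numpy as np
from scipy.stats import norm

def mixture_density(mu_hat, sigma, z_grid):
    """f_bar(z) = (1/p) * sum_i (1/sigma_i) * phi((z - mu_hat_i) / sigma_i)."""
    u = (z_grid - mu_hat[:, None]) / sigma[:, None]
    return (norm.pdf(u) / sigma[:, None]).mean(axis=0)
```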

Recall that the approximation is done under the complete null hypothesis, i.e., that there is no effect. The validity of the complete null hypothesis is confirmed by a false discovery rate (FDR) analysis, which shows no significant voxels after applying the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995) at an FDR level of 0.2. Nevertheless, the z-scores exhibit a strong shift toward positive values, indicating a possible positive correlation between cortical thickness and behavior. Our approximation indicates that the reason that many Z's are large may be the correlation between them and not a true effect. To illustrate this point we simulated $Y_1, \dots, Y_n \sim N(0, 1)$ i.i.d. and independent of the X's and repeated the same procedure as before. One such simulated instance is given in Figure 3(b). Without correlation, we would expect the histogram of the resulting z-scores to follow the theoretical null density (red). However, the correlation creates the impression of a strong positive effect, not unlike the one seen in panel (a).

6.3 The number of latent variables

We now discuss the choice of k, the number of latent variables. Recall that the asymptotic dimension K, if it exists, is the optimal choice, as discussed in Section 3.2, and equals the number of eigenvalues of order p. A scree plot of the 50 largest eigenvalues of $R_p$ is shown in Figure 4(a). Cattell's graphical test indicates an elbow at k = 3, and after 10–20 eigenvalues the values are almost constant.

Figure 4. (a) The 50 largest eigenvalues of $R_p$, ordered according to their size. (b) Histogram of $Z_1, Z_2, \dots, Z_{1000}$ and the approximation $\hat{\bar f}$ for k = 2 (blue dashed), 10 (red dotted), and 100 (solid green).

Theorem 2 states that under the decomposition $R_p = A_{k,p} + B_{k,p}$, the $L^2$ distance between $\hat F_p$ and $\bar F_p$ is bounded by $\frac{1}{4p} + C\,\|\mathrm{Cor}(B_{k,p})\|_1^{(p)}$. As k increases the bound decreases, and $\|\mathrm{Cor}(B_{k,p})\|_1^{(p)} = 0$ when k = p. However, as k increases, the dimension of the representation also increases, since the dimension is $\mathrm{rank}(A_{k,p})$ (Proposition 1). Furthermore, for large k the empirical distribution and the approximation $\hat{\bar f}$ are very close and the problem of overfitting arises. To illustrate this point, we plot $\hat{\bar f}$ for different choices of k in Figure 4(b). It can be seen that with k = 100 the approximation is much closer to the histogram, but such a high dimension is unnecessary to capture the global behavior of the histogram. The differences between k = 2 and k = 10 are rather small, suggesting that k = 2 suffices.
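One data-driven way to examine this tradeoff (our suggestion, not a procedure from the paper) is to monitor the residual term of the Theorem 2 bound as a function of k; the sketch assumes the diagonal of $B_{k,p}$ stays away from zero, as in condition (9).

```python
import numpy as np

def residual_corr_norm(R, k):
    """||Cor(B_{k,p})||_1^(p) for the eigendecomposition residual B_{k,p}."""
    p = R.shape[0]
    lam, gam = np.linalg.eigh(R)
    lam, gam = lam[::-1], gam[:, ::-1]         # decreasing order
    B = (gam[:, k:] * lam[k:]) @ gam[:, k:].T  # B_{k,p} of equation (7)
    d = np.sqrt(np.diag(B))                    # requires positive diagonal
    return (np.abs(B / np.outer(d, d))).sum() / p**2

# Plotting residual_corr_norm(R, k) against k shows the decreasing bound,
# to be weighed against the growing dimension rank(A_{k,p}) = k.
```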

7 Summary and extensions

In this work we have studied the limit of the ecdf of marginally standard normal variables when strong correlation is present. As predicted by Efron (2007a), we have shown that the limit is indeed not standard normal. Specifically, we have shown that under a regime that we call finite dimensional correlation (and some regularity conditions), the limit is a finite mixture of scaled normals with random means, which reduces to Efron's empirical null model when the correlation structure is exchangeable. Moreover, we have shown that if the correlation is not finite dimensional, then the limit can still be approximated by a mixture of normals, but an infinite number of components may be required.

The main technique for achieving these results has been a decomposition of the correlation matrix R into two matrices A and B, where A captures the strong correlation and B captures the weakly correlated residual noise. The form of the limiting distribution of the ecdf is determined by A, while the residual noise represented by B goes to zero asymptotically. The key to achieving the asymptotic representation of the ecdf with the smallest dimension is to choose B so that it contains the largest amount of variance while remaining weakly correlated.

For future work, we consider the following extensions. First, we assumed that all random variables have variance 1. If the variances are not 1, but are bounded above and below, then the random variables could be standardized and all our results follow. If they are not bounded, then still for each finite p one can standardize and obtain inequality (4) of Theorem 2.

In Theorem 2 we proved that the $L^2$ distance between $\hat F_p$ and $\bar F_p$ converges to zero if the residual correlation not captured by $\bar F_p$ is weak. Could the same be said of the $L^2$ distance between the moments of $\hat F_p$ and $\bar F_p$? Let $\hat m_{n,p} := \frac{1}{p}\sum_{i=1}^p Z_i^n$ and $\bar m_{n,p} := \int z^n\,d\bar F_p(z)$ be the n-th moments of $\hat F_p$ and $\bar F_p$. For the first moment and an appropriate decomposition $R_p = A_p + B_p$,

$$\bar m_{1,p} = \frac{1}{p}\sum_{i=1}^p \int z\,\frac{1}{\sigma_i^{(p)}}\,\varphi\bigg(\frac{z - \mu_i^{(p)}}{\sigma_i^{(p)}}\bigg)dz = \frac{1}{p}\sum_{i=1}^p \mu_i^{(p)}.$$

By the definition of $\mu_i^{(p)}$ and equation (6), we have that $Z_i - \mu_i^{(p)} = \xi_i$. Therefore,

$$E(\hat m_{1,p} - \bar m_{1,p})^2 = \frac{1}{p^2}E\bigg\{\sum_{i=1}^p (Z_i - \mu_i^{(p)})\bigg\}^2 = \frac{1}{p^2}E\bigg(\sum_{i=1}^p \xi_i\bigg)^2 = \frac{1}{p^2}\sum_{i,j}(B_p)_{ij},$$

and the distance converges to zero if and only if $\frac{1}{p^2}\sum_{i,j}(B_p)_{ij} \to 0$. This condition is weaker than that of Theorem 2, and thus convergence of the ecdf implies convergence of the first moment. Proving convergence of higher moments is more difficult and is left for future work.

About the choice of the number of components k in practice, the problem is not unlike that of determining the number of components in factor analysis or the effective dimension in principal components analysis. The connection with these techniques is worth exploring.

Finally, about the normality assumption: it is used in the proof of Theorem 2 by applying Mehler's expansion to the joint density. Similar expansions are available for other distributions, such as chi-square and gamma (e.g., Koudou, 1998; Schwartzman and Lin, 2011). Thus it may be possible to extend our results to those distributions as well.

Supplementary Material

Acknowledgments

The authors are grateful to Philip Reiss from the Department of Child and Adolescent Psychiatry, New York University School of Medicine, for providing the brain imaging data. This work was partially supported by NIH grant R01-CA157528.

Contributor Information

David Azriel, Email: davidazr@ie.technion.ac.il, Lecturer at the Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa 32000, Israel, and Postdoctoral Research Associate in the Department of Statistics of the Wharton School of the University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 19104.

Armin Schwartzman, Email: armin.schwartzman@ncsu.edu, Associate Professor at the Department of Statistics, North Carolina State University, Raleigh, NC 27695.

References

  1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57:289–300.
  2. Dedecker J, Merlevéde F. The empirical distribution function for dependent variables: asymptotic and nonasymptotic results in Lp. ESAIM: Probability and Statistics. 2007;11:102–114.
  3. Efron B. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association. 2004;99:96–104.
  4. Efron B. Correlation and large-scale simultaneous hypothesis testing. Journal of the American Statistical Association. 2007a;102:93–103.
  5. Efron B. Size, power and false discovery rates. The Annals of Statistics. 2007b;35:1351–1377.
  6. Efron B. Simultaneous inference: when should hypothesis testing problems be combined? The Annals of Applied Statistics. 2008;2:197–223.
  7. Efron B. Correlated z-values and the accuracy of large-scale statistical estimates. Journal of the American Statistical Association. 2010;105:1042–1055. doi: 10.1198/jasa.2010.tm09129.
  8. Fan J, Han X. Estimation of false discovery proportion with unknown dependence. 2014, submitted. http://arxiv.org/abs/1305.7007. doi: 10.1111/rssb.12204.
  9. Fan J, Han X, Gu W. Estimating false discovery proportion under arbitrary covariance dependence. Journal of the American Statistical Association. 2012;107:1019–1035. doi: 10.1080/01621459.2012.720478.
  10. Koudou AE. Lancaster bivariate probability distributions with Poisson, negative binomial and gamma margins. Test. 1998;7:95–110.
  11. Owen AB. Variance of the number of false discoveries. Journal of the Royal Statistical Society, Series B. 2005;67:411–426.
  12. Proal E, Reiss PT, Klein RG, Mannuzza S, Gotimer K, Ramos-Olazagasti MA, Lerch JP, He Y, Zijdenbos A, Kelly C, Milham MP, Castellanos FX. Brain gray matter deficits at 33-year follow-up in adults with attention-deficit/hyperactivity disorder established in childhood. Archives of General Psychiatry. 2011;68:1122–1134. doi: 10.1001/archgenpsychiatry.2011.117.
  13. Reiss PT, Schwartzman A, Lu F, Huang L, Proal E. Paradoxical results of adaptive false discovery rate procedures in neuroimaging studies. NeuroImage. 2012;63:1833–1840. doi: 10.1016/j.neuroimage.2012.07.040.
  14. Schwartzman A. Comment on "Correlated z-values and the accuracy of large-scale statistical estimates" by Bradley Efron. Journal of the American Statistical Association. 2010;105(491):1059–1063. doi: 10.1198/jasa.2010.tm10237.
  15. Schwartzman A, Lin X. The effect of correlation in false discovery rate estimation. Biometrika. 2011;98:199–214. doi: 10.1093/biomet/asq075.
  16. Wasserman L. All of Nonparametric Statistics: A Concise Course in Nonparametric Statistical Inference. New York: Springer; 2006.
  17. Wu WB. Oscillations of empirical distribution functions under dependence. IMS Lecture Notes–Monograph Series, High Dimensional Probability. 2006;51:53–61.
