J Am Stat Assoc. Author manuscript; available in PMC 2016 Sep 1.
Published in final edited form as: J Am Stat Assoc. 2014 Sep 25;110(511):1217–1228. doi: 10.1080/01621459.2014.958156

The Empirical Distribution of a Large Number of Correlated Normal Variables

David Azriel 1, Armin Schwartzman 2
PMCID: PMC4742377  NIHMSID: NIHMS628938  PMID: 26858467

Abstract

Motivated by the advent of high dimensional, highly correlated data, this work studies the limit behavior of the empirical cumulative distribution function (ecdf) of standard normal random variables under arbitrary correlation. First, we provide a necessary and sufficient condition for convergence of the ecdf to the standard normal distribution. Next, under general correlation, we show that the ecdf limit is a random, possibly infinite, mixture of normal distribution functions that depends on a number of latent variables and can serve as an asymptotic approximation to the ecdf in high dimensions. We provide conditions under which the dimension of the ecdf limit, defined as the smallest number of effective latent variables, is finite. Estimates of the latent variables are provided and their consistency proved. We demonstrate these methods in a real high-dimensional data example from brain imaging where it is shown that, while the study exhibits apparently strongly significant results, they can be entirely explained by correlation, as captured by the asymptotic approximation developed here.

Keywords: empirical null, dependent random variables, high dimensional data, factor analysis, asymptotic approximation, strong correlation

1 Introduction

The empirical cumulative distribution function (ecdf) and its large sample properties have a long and rich history in probability and statistics. However, most of this vast literature assumes that the variables used to construct the ecdf are independent (e.g., Wasserman, 2006, Chapter 2, and references therein). Under independence, the ecdf is a consistent estimator of the true cumulative distribution function (cdf). The consistency property continues to hold under various forms of weak dependence (e.g. Dedecker and Merlevéde (2007); Wu (2006)).

Motivated by modern problems in high dimensional data, where a large number of correlated variables are measured, it is of interest to study the asymptotic behavior of the ecdf when the variables involved are arbitrarily correlated. In particular, the present paper is motivated by large-scale multiple testing problems, where each of p tests produces a z-score Zi, i = 1, …, p. It has been pointed out by Bradley Efron that in large-scale multiple testing problems, the observed distribution of the z-scores often does not match the theoretical null distribution N(0, 1) (Efron, 2004, 2007a, b, 2008). Efron (2007a) conjectured that, even when the theoretical model is correct, the observed distribution of the test statistics can look different from the theoretical null distribution simply because of correlation between them. This interesting observation suggests that the ecdf may not always be consistent and calls for a detailed study of the ecdf of a large number of dependent variables.

Assuming that the variables Z1, Z2, …, Zp are marginally standard normal simplifies the problem and allows obtaining results for their ecdf under arbitrary dependence because all the dependence is expressed through correlation. In this situation, Efron (2007a) proposed the so-called empirical null as an approximation to the observed distribution of the z-scores, parametrized as a normal distribution with mean and variance other than 0 and 1. To further understand the effect of correlation, Efron (2010) derived the covariance function of the ecdf and applied it to estimating the variance of functions of the ecdf relevant in large-scale multiple testing, such as the local and tail false discovery rate. Schwartzman (2010) proposed to approximate the ecdf by a Gram-Charlier expansion and used its coefficients to establish some constraints on the extent of the departure of the ecdf from the marginal N(0, 1). However, these approaches have not succeeded in fully characterizing the behavior of the ecdf under correlation.

In this article we describe the asymptotic behavior of the ecdf of a large number of correlated standard normal variables. First, we show that in general, the ecdf need not converge to Φ, the standard normal cdf. A necessary and sufficient condition for convergence, which we call weak correlation, is that the average of the absolute pairwise correlations between the z-scores (or the average of the squares of the pairwise correlations) tends to zero with increasing dimension p. However, we show that in a wide range of strong correlation situations, the ecdf converges instead to a random distribution function. This random function can be written as a (possibly infinite) normal mixture parametrized by latent independent standard normal variables. It can be thought of as an analytic asymptotic approximation to the ecdf, where the latent variables can be consistently estimated, under some regularity conditions, from the observed data sequence. Further, it can be thought of as a dimension reduction of the p-dimensional ecdf in the sense that its inherent dimension is the number of the latent variables. We give a lower bound for this inherent dimension and show that, under certain regularity conditions, it can be achieved by a particular parametrization obtained via an eigendecomposition of the correlation matrix. This parametrization is based on the factor analysis model of Fan et al. (2012), who use it to calculate the false discovery proportion in large scale multiple testing under arbitrary correlation. Here we consider a more general framework in which decompositions other than eigendecompositions are also investigated.

As an illustration of the behavior of the ecdf as a random function as described in this paper, Figure 1 presents the histograms of two realizations of normal random variables under two correlation structures:

Figure 1. Histograms of various instances of 1000 standard normal variables with a one-block correlation structure (a) and a two-independent-blocks correlation structure (b). The red line is the standard normal density and the blue dashed line is the asymptotic approximation we use.

  1. One block: $\{Z_i\}_{i=1}^{1000}$ is a sequence of exchangeable random variables with correlation ρ = 0.9, i.e., for any $i \ne j$, $\mathrm{cor}(Z_i, Z_j) = \rho$.

  2. Two independent blocks: $\{Z_i\}_{i=1}^{1000}$ consists of two intercalated independent sequences of exchangeable random variables, i.e., for any $i \ne j$ such that $|i - j|$ is even, $\mathrm{cor}(Z_i, Z_j) = \rho$ (and 0 otherwise).

It can be seen that for both structures the empirical distribution differs from the standard normal density (red line). In case (a) the histogram is shifted and narrower than the standard normal density, while in case (b) it looks like a mixture of two normal distributions. Notice that the histograms change between the two realizations, suggesting that the distribution may not be converging to a deterministic limit. We will show that in these cases the empirical distribution indeed does not converge to a deterministic limit, but can be asymptotically approximated by a normal mixture estimated from the data, represented here by the dashed blue line. Not surprisingly, in case (a) the asymptotic approximation is of dimension 1, containing one latent standard normal variable, and in case (b) is of dimension 2, containing two independent latent standard normal variables. The fitted density in case (a) can be interpreted as Efron’s empirical null model of a shifted and scaled normal, but clearly that model cannot capture the behavior of the distribution in case (b).
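The two structures above are easy to simulate through their latent-variable representations. The following minimal numpy sketch is our own illustration (not code from the paper); the seed and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
p, rho = 1000, 0.9

# (a) One block: Z_i = sqrt(rho)*W1 + sqrt(1-rho)*eps_i, so cor(Z_i, Z_j) = rho.
W1 = rng.standard_normal()
Z_one = np.sqrt(rho) * W1 + np.sqrt(1 - rho) * rng.standard_normal(p)

# (b) Two intercalated blocks: even indices load on W2, odd indices on W3,
# giving cor(Z_i, Z_j) = rho when |i - j| is even and 0 otherwise.
W2, W3 = rng.standard_normal(2)
eps = rng.standard_normal(p)
idx = np.arange(p)
Z_two = np.where(idx % 2 == 0, np.sqrt(rho) * W2, np.sqrt(rho) * W3) \
        + np.sqrt(1 - rho) * eps
```

Plotting histograms of `Z_one` and `Z_two` against the standard normal density reproduces the qualitative behavior of Figure 1.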

When considering the entire range of possible correlation structures, we show that there are essentially three regimes. The first is what we call weak correlation; while the random variables need not be independent, the ecdf converges to Φ. The second regime is what we call finite dimensional correlation and is the main interest of this paper. Under this type of correlation, the ecdf can be approximated by a random function that depends on a finite number of latent independent standard normal random variables. Furthermore, under some regularity conditions, a representation using the smallest number of latent variables can be achieved and estimated consistently. The examples of Figure 1 belong to this regime. Finally, in the third regime, the limiting random function depends on an infinite number of independent standard normal random variables, like the ecdf itself.

As an illustration of the usefulness of the results presented in this article, we present an analysis of brain imaging data obtained from a study of cortical thickness of adults who had a diagnosis of attention deficit/hyperactivity disorder (ADHD) as children (Proal et al., 2011). In this study, it had been noticed before that, when searching for brain locations whose cortical thickness is related to clinical diagnosis, the histogram of the z-scores did not follow the theoretical standard normal distribution (Reiss et al., 2012). Here we do a slightly different analysis (where the correlation structure is known) and show that, while the study exhibits apparently strongly significant results, they can be entirely explained by correlation, as captured by the asymptotic approximation mentioned above.

The rest of the paper is organized as follows. After a brief treatment of weak correlation in Section 2, the main results of the paper for general correlation are given in Section 3. Section 4 discusses how to consistently estimate the latent variables. In Section 4 we also briefly discuss the case where the correlation matrix is unknown. Several concrete examples, including those of Figure 1, are presented in detail in Section 5. A data example is analyzed in Section 6. Section 7 considers some possible extensions of this work. All the proofs are given in a supplementary material document.

2 Weak correlation

2.1 Definition

To define weak correlation some notation is needed. For a given p × p matrix $R_p = (r_{ij})$ define the following average norms:

$$\|R_p\|_1^{(p)} := \frac{1}{p^2}\sum_{i,j}|r_{ij}| \quad\text{and}\quad \|R_p\|_2^{(p)} := \frac{1}{p}\Big(\sum_{i,j}r_{ij}^2\Big)^{1/2} = \frac{1}{p}\Big(\sum_i \lambda_i^2\Big)^{1/2},$$

where the $\lambda_i$'s are the eigenvalues of $R_p$. The latter is simply a scaled version of the Frobenius norm, and in both cases we use the superscript (p) to denote that the norm itself, not just the matrix, changes with p. If $\{R_p\}_{p=1}^\infty$ is a sequence of correlation matrices then, as p → ∞,

$$\|R_p\|_1^{(p)} \to 0 \iff \|R_p\|_2^{(p)} \to 0.$$

This is because, by Jensen's inequality, $\|R_p\|_1^{(p)} \le \|R_p\|_2^{(p)}$, while on the other hand, $|r_{ij}| \le 1$ and $r_{ij}^2 \le |r_{ij}|$, and so $\{\|R_p\|_2^{(p)}\}^2 \le \|R_p\|_1^{(p)}$. A similar argument holds if $R_p$ is a covariance matrix and the diagonal entries of $R_p$, i.e., the variances, are bounded.

Definition 1

Let $\{\xi_i\}_{i=1}^\infty$ be a sequence of standard normal variables with joint normal distribution and denote the correlation matrix of $(\xi_1, \dots, \xi_p)$ by $R_p$. If $\|R_p\|_1^{(p)} \to 0$, or equivalently, $\|R_p\|_2^{(p)} \to 0$, then $\{\xi_i\}_{i=1}^\infty$ is called weakly correlated. Otherwise it is called strongly correlated.
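As an illustration of Definition 1, the following numpy sketch (ours; the parameter values are illustrative) evaluates both average norms for an exchangeable correlation matrix, for which neither norm vanishes as p grows.

```python
import numpy as np

def avg_norms(R):
    """Return (||R||_1^(p), ||R||_2^(p)) for a p x p matrix R."""
    p = R.shape[0]
    n1 = np.abs(R).sum() / p**2     # average absolute entry
    n2 = np.sqrt((R**2).sum()) / p  # scaled Frobenius norm
    return n1, n2

# Exchangeable correlation with rho = 0.5: both norms tend to 0.5, not 0,
# so the sequence is strongly correlated.
for p in (100, 1000, 3000):
    R = np.full((p, p), 0.5)
    np.fill_diagonal(R, 1.0)
    print(p, avg_norms(R))
```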

2.2 Convergence of the ecdf

Let $\{Z_i\}_{i=1}^\infty$ be a sequence of standard normal variables with joint normal distribution. The ecdf is

$$\hat F_p(z) := \frac{1}{p}\sum_{i=1}^p I(Z_i \le z), \quad (1)$$

where I(·) denotes the indicator function. Correlation does not affect the expectation $E[\hat F_p(z)] = \Phi(z)$ of the ecdf, but it affects its covariance. The covariance function is given in Proposition 1 of Schwartzman (2010).

The following theorem establishes that a necessary and sufficient condition for consistency of the ecdf in $L^2$ is that $\{Z_i\}_{i=1}^\infty$ is weakly correlated.

Theorem 1

Let $Z_1, Z_2, \dots, Z_p, \dots$ be N(0, 1) variables with ecdf (1) and let $R_p$ denote the correlation matrix of $(Z_1, \dots, Z_p)$.

  1. (Sufficiency) If $\{Z_i\}_{i=1}^\infty$ is weakly correlated, then $\hat F_p(z)$ converges to Φ(z) in $L^2$ uniformly:

    $$\sup_z E\{\hat F_p(z) - \Phi(z)\}^2 \le \frac{1}{4p} + C\,\|R_p\|_1^{(p)} \to 0,$$

    where C is a universal constant.

  2. (Necessity) If $\{Z_i\}_{i=1}^\infty$ is not weakly correlated, i.e., $\|R_p\|_2^{(p)} \nrightarrow 0$, then $\hat F_p(z)$ does not converge to Φ(z) in $L^2$ for any z ≠ 0, i.e.:

    $$E\{\hat F_p(z) - \Phi(z)\}^2 \nrightarrow 0, \quad \forall z \ne 0.$$

Notice that according to Theorem 1 the convergence is either uniform or none at all; if the correlation is strong, i.e., not weak, then for any z ≠ 0 there is no convergence to Φ(z). Under a general correlation structure the ecdf may converge to a random function. In Section 3 below we aim at identifying and estimating this function.

2.3 Examples

It is easy to check that every Gaussian autoregressive moving average (ARMA) process is weakly correlated. So is every m-dependent Gaussian sequence with banded correlation matrix, where $r_{ij} = 0$ for $|i - j| > m$ and fixed finite m. This includes correlation in fixed finite blocks. In all these cases the ecdf converges to the standard normal distribution.

More generally, all Gaussian stationary ergodic processes are weakly correlated. This is because ergodicity requires that the autocorrelation function $\rho(\ell) = r_{i,i+\ell}$ satisfies $|\rho(\ell)| \to 0$ as $\ell \to \infty$, and therefore

$$\|R_p\|_1^{(p)} = \frac{1}{p^2}\sum_{i,j}|r_{ij}| = \frac{1}{p^2}\Big[p + 2\sum_{\ell=1}^{p-1}(p-\ell)|\rho(\ell)|\Big] \le \frac{1}{p} + \frac{2}{p}\sum_{\ell=1}^{p-1}|\rho(\ell)| \to 0.$$

It is not hard to check that even long-range correlation, defined by $\sum_{\ell=1}^\infty |\rho(\ell)| = \infty$, also implies weak correlation, except in the extreme case where $\sum_{\ell=1}^{p-1}|\rho(\ell)|$ is of order p (the largest possible).
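For a concrete check, the sketch below (our own, assuming an AR(1) autocorrelation $\rho(\ell) = \phi^\ell$ with an arbitrary $\phi$) evaluates $\|R_p\|_1^{(p)}$ through the displayed formula without forming the matrix; it decays roughly like 1/p, consistent with weak correlation.

```python
# ||R_p||_1^(p) for a stationary AR(1) process, computed from the
# autocorrelation function rho(l) = phi**l via the display above.
phi = 0.8
for p in (100, 1000, 10000):
    n1 = (p + 2 * sum((p - l) * phi**l for l in range(1, p))) / p**2
    bound = 1 / p + (2 / p) * phi / (1 - phi)  # geometric-series bound
    print(p, n1, bound)
```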

3 General correlation

3.1 An asymptotic approximation

To describe the asymptotic behavior of the ecdf for a general correlation structure, the main idea is to decompose the correlation into a strong correlation component and a weak correlation component. Then, the asymptotic behavior of the ecdf as a random function will be captured by the strong correlation component, while the weak component will converge as in Theorem 1.

Specifically, suppose that for every p we have the decomposition

$$R_p = A_p + B_p, \quad (2)$$

where Ap and Bp are symmetric positive semi-definite matrices. Write Ap as

$$A_p = L_p (L_p)^T, \quad (3)$$

where $L_p$ is of dimension $p \times k(p)$; let $\ell_i^{(p)}$ be the i-th row of $L_p$. If the matrix $A_p$ is the zero matrix then we define $k(p) := 1$ and $L_p := (0, \dots, 0)^T$.

For a matrix B, we define the matrix Cor(B) by

$$\{\mathrm{Cor}(B)\}_{ij} = \begin{cases} \dfrac{B_{ij}}{\sqrt{B_{ii}B_{jj}}} & B_{ii}B_{jj} \ne 0 \\[1ex] 0 & B_{ii}B_{jj} = 0. \end{cases}$$

Notice that if B is a positive semidefinite matrix, then |{Cor(B)}ij| ≤ 1. Obviously, if B is a correlation matrix, then Cor(B) = B. The following theorem states the main result.

Theorem 2

Under the previous setting and notation,

  1. For every p there exists a (non-unique) random vector $W_{k(p)}^{(p)} \sim N(0, I_{k(p)})$ such that

    $$\sup_z E\{\hat F_p(z) - \bar F_p(z)\}^2 \le \frac{1}{4p} + C\,\|\mathrm{Cor}(B_p)\|_1^{(p)}, \quad (4)$$

    where

    $$\bar F_p(z) := E[\hat F_p(z) \mid W_{k(p)}^{(p)}] = \frac{1}{p}\sum_{i=1}^p \Phi\bigg(\frac{z - \mu_i^{(p)}}{\sigma_i^{(p)}}\bigg) \quad (5)$$

    with $\mu_i^{(p)} := \ell_i^{(p)} W_{k(p)}^{(p)}$, $\sigma_i^{(p)} := \{(B_p)_{ii}\}^{1/2}$, and C is a universal constant. If $\sigma_i^{(p)} = 0$ we define $\Phi\big(\frac{z - \mu_i^{(p)}}{\sigma_i^{(p)}}\big) := I(\mu_i^{(p)} \le z)$.

  2. Therefore, if $\|\mathrm{Cor}(B_p)\|_1^{(p)} \to 0$ as $p \to \infty$, then $\sup_z E\{\hat F_p(z) - \bar F_p(z)\}^2 \to 0$.

The key idea of the proof is the following: due to decomposition (2) and equality (3), there exists a random vector $W_{k(p)}^{(p)} \sim N(0, I_{k(p)})$ such that

$$Z_i = \ell_i^{(p)} W_{k(p)}^{(p)} + \xi_i, \quad i = 1, \dots, p, \quad (6)$$

where $(\xi_1, \dots, \xi_p) \sim N(0, B_p)$. Therefore, the conditional random vector

$$(Z_1', \dots, Z_p') := (Z_1, \dots, Z_p) \mid W_{k(p)}^{(p)} = w_{k(p)}$$

is normal with conditional mean $\mu = L_p w_{k(p)}$ and conditional covariance matrix $B_p$. The $L^2$ distance between $\hat F_p(z)$ and $\bar F_p(z) := E[\hat F_p(z) \mid W_{k(p)}^{(p)}]$ can be essentially bounded uniformly by $\|\mathrm{Cor}(B_p)\|_1^{(p)}$.
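To make the mechanism of Theorem 2 concrete, here is a small Monte Carlo sketch (our own; the exchangeable structure and all parameter values are illustrative assumptions). It draws Z from equation (6) for the rank-one decomposition of Section 5.1 and compares the ecdf with the conditional cdf (5).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p, rho = 2000, 0.6

# Decomposition R_p = A_p + B_p with A_p = rho * 1 1^T = L_p L_p^T and
# B_p = (1 - rho) I_p, so k(p) = 1 and sigma_i^(p) = sqrt(1 - rho).
L = np.full((p, 1), np.sqrt(rho))
W = rng.standard_normal(1)                      # latent variable W_{k(p)}^{(p)}
xi = np.sqrt(1 - rho) * rng.standard_normal(p)  # residual with covariance B_p
Z = L @ W + xi                                  # equation (6)

z = np.linspace(-4, 4, 201)
F_hat = (Z[:, None] <= z).mean(axis=0)          # ecdf, equation (1)
mu = L @ W                                      # mu_i^(p) = l_i^(p) W
F_bar = norm.cdf((z - mu[:, None]) / np.sqrt(1 - rho)).mean(axis=0)  # (5)
print(np.abs(F_hat - F_bar).max())  # small, even though F_hat is far from Phi
```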

It is important to note that the random vector $W_{k(p)}^{(p)}$ is not unique. Because $W_{k(p)}^{(p)}$ has a spherically symmetric distribution, Equation (4) will hold if $W_{k(p)}^{(p)}$ is replaced by $QW_{k(p)}^{(p)}$, where Q is any $k(p) \times k(p)$ orthonormal matrix.

Definition 2

The sequence of random functions $\{G_p(z)\}_{p=1}^\infty$ is said to converge uniformly in $L^2$ to the (random) function G(z) if $\sup_z E\{G_p(z) - G(z)\}^2 \to 0$ as $p \to \infty$.

Corollary 1

Suppose $\bar F_p(z)$ satisfies the conditions of Theorem 2(ii). If there exists a (random) function $\bar F(z)$ such that $\bar F_p(z)$ converges to $\bar F(z)$ uniformly in $L^2$, then $\hat F_p(z)$ also converges to $\bar F(z)$ uniformly in $L^2$.

Theorem 2 holds for all decompositions of the form (2). We call every corresponding $\bar F_p(z)$ an asymptotic approximation of $\hat F_p(z)$. However, some decompositions may need a smaller number of latent variables k(p) than others. In the next section we characterize the best decomposition in the sense of giving the asymptotic approximation with the smallest number of latent variables. If $\bar F_p(z)$ converges to some $\bar F(z)$, then we call the latter the asymptotic representation of $\hat F_p(z)$.

3.2 Dimension reduction

Theorem 2 approximates $\hat F_p(z)$ by $\bar F_p(z)$, which is the projection of $\hat F_p(z)$ onto a space generated by $W_{k(p)}^{(p)}$. Thus, as explained below, $\hat F_p(z)$, which has dimension p, is approximated by $\bar F_p(z)$ with dimension $\mathrm{rank}(A_p)$. Hence, Theorem 2 can be regarded as a dimension reduction of the empirical distribution function, as stated in the following proposition.

Proposition 1

Let $\mathcal{F}$ be the collection of all one-dimensional distribution functions, where convergence is defined in the sense of weak convergence of the corresponding random variables.

  1. $\hat F_p(z)$ has dimension p: define the mapping $\mathcal{A}(Z_1, \dots, Z_p) = \hat F_p(z)$ from $\mathbb{R}^p$ to $\mathcal{F}$. Then $\mathcal{A}(\mathbb{R}^p) \subseteq \mathcal{F}$ is homeomorphic to $\mathbb{R}^p$.

  2. $\bar F_p(z)$ has dimension $\mathrm{rank}(A_p)$: define the mapping $\mathcal{B}(W_{k(p)}^{(p)}) = \bar F_p(z)$ from $\mathbb{R}^{k(p)}$ to $\mathcal{F}$. Then $\mathcal{B}(\mathbb{R}^{k(p)}) \subseteq \mathcal{F}$ is homeomorphic to $\mathbb{R}^{\mathrm{rank}(A_p)}$.

In order to achieve dimension reduction, we are interested in knowing whether there exist decompositions of the form (2) such that the approximation of Theorem 2 holds and the dimension of the approximation, $\mathrm{rank}(A_p)$, is finite. Consider the collection $\mathcal{D}$ of decompositions of $R_p$ that satisfy the conditions of Theorem 2 part (ii): $D \in \mathcal{D}$ if $D = \{(A_p, B_p)\}_{p=1}^\infty$ is such that $A_p, B_p$ satisfy (2) and $\|\mathrm{Cor}(B_p)\|_1^{(p)} \to 0$ as $p \to \infty$. Clearly, $\mathcal{D}$ is a large collection and some notion of optimality is required in order to decide which $D \in \mathcal{D}$ to choose. Given Proposition 1(ii), we are interested in decompositions where $\mathrm{rank}(A_p)$ is the smallest. Theorem 3 below states a lower bound on the limiting rank of $A_p$ and presents a decomposition that achieves it under certain conditions.

To state the result, we need to define the eigendecompositions of $R_p$ as a special case of the decompositions in $\mathcal{D}$. Suppose that the eigenvalues of $R_p$ are $\lambda_1^{(p)} \ge \dots \ge \lambda_p^{(p)}$ and the corresponding eigenvectors are $\gamma_1^{(p)}, \dots, \gamma_p^{(p)}$ (notice that everything depends on p). The eigendecomposition of $R_p$ is $\sum_{i=1}^p \lambda_i^{(p)} \gamma_i^{(p)} \{\gamma_i^{(p)}\}^T$. For k < p, define

$$A_{k,p} = \sum_{i=1}^k \lambda_i^{(p)} \gamma_i^{(p)} \{\gamma_i^{(p)}\}^T, \quad B_{k,p} = \sum_{i=k+1}^p \lambda_i^{(p)} \gamma_i^{(p)} \{\gamma_i^{(p)}\}^T. \quad (7)$$

Then, the correlation matrix $R_p$ can be expressed as $R_p = A_{k,p} + B_{k,p}$ and $A_{k,p} = L_{k,p}(L_{k,p})^T$, where

$$L_{k,p} := \Big(\sqrt{\lambda_1^{(p)}}\,\gamma_1^{(p)}, \dots, \sqrt{\lambda_k^{(p)}}\,\gamma_k^{(p)}\Big). \quad (8)$$

A critical quantity is the number of "big" eigenvalues, i.e., those of size of order p. Define

$$\underline{K} = \sum_{i=1}^\infty \underline{K}_i \quad\text{with}\quad \underline{K}_i = \begin{cases} 1 & \text{if } \liminf_{p\to\infty} \lambda_i^{(p)}/p > 0 \\ 0 & \text{if } \liminf_{p\to\infty} \lambda_i^{(p)}/p = 0, \end{cases} \quad i = 1, 2, \dots$$

and

$$\overline{K} = \sum_{i=1}^\infty \overline{K}_i \quad\text{with}\quad \overline{K}_i = \begin{cases} 1 & \text{if } \limsup_{p\to\infty} \lambda_i^{(p)}/p > 0 \\ 0 & \text{if } \limsup_{p\to\infty} \lambda_i^{(p)}/p = 0, \end{cases} \quad i = 1, 2, \dots$$

By definition we have that $\underline{K} \le \overline{K}$. Notice that $\overline{K}$ could be ∞ and that if $\underline{K} < \infty$ then $\underline{K}_i = 1$ for $i \le \underline{K}$ and $\underline{K}_i = 0$ otherwise; the same holds for $\overline{K}$. Since $\underline{K}$ and $\overline{K}$ are sums of indicators, they are either an integer or infinity. The following theorem states that $\underline{K}$ and $\overline{K}$ give lower bounds for the limiting rank of $A_p$ and presents a decomposition of the correlation matrix that achieves them under certain conditions.
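The quantities $\underline{K}$ and $\overline{K}$ can be probed numerically by tracking the scaled eigenvalues $\lambda_i^{(p)}/p$ along increasing p. A small sketch (ours; the exchangeable matrix is just a convenient test case):

```python
import numpy as np

# Scaled spectrum of the exchangeable matrix (13): lambda_1/p -> rho while
# lambda_i/p -> 0 for i >= 2, so there is exactly one "big" eigenvalue.
rho = 0.5
for p in (100, 400, 1600):
    R = np.full((p, p), rho)
    np.fill_diagonal(R, 1.0)
    lam = np.linalg.eigvalsh(R)[::-1]  # eigenvalues in decreasing order
    print(p, (lam[:3] / p).round(4))
```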

Theorem 3

Under the previous setting and notation:

  1. If $\{(A_p, B_p)\}_{p=1}^\infty \in \mathcal{D}$, then $\liminf_p \mathrm{rank}(A_p) \ge \underline{K}$ and $\limsup_p \mathrm{rank}(A_p) \ge \overline{K}$ (both $\underline{K}$ and $\overline{K}$ could be ∞).

  2. For $\overline{K} < \infty$ define the decomposition $D_{\overline{K}} = \{(A_{\overline{K},p}, B_{\overline{K},p})\}_{p=1}^\infty$ according to (7) with $k = \overline{K}$. If the nonzero diagonal terms of $B_{\overline{K},p}$ are bounded from below, i.e.,

    $$\liminf_{p\to\infty}\ \min_{1 \le i \le p:\ \{B_{\overline{K},p}\}_{ii} \ne 0} \{B_{\overline{K},p}\}_{ii} = \varepsilon_B > 0, \quad (9)$$

    then $D_{\overline{K}} \in \mathcal{D}$ and obviously $\mathrm{rank}(A_{\overline{K},p}) = \overline{K}$ for all $p > \overline{K}$.

The regularity condition (9) guarantees that the norm of the residual covariance in $B_{\overline{K},p}$ goes to zero because the correlation in it goes to zero, not because the variance in it goes to zero.

To better appreciate the result given by Theorem 3, it is useful to define the following concepts.

Definition 3

  1. If there exists $\{(A_p, B_p)\}_{p=1}^\infty \in \mathcal{D}$ such that $\limsup_p \mathrm{rank}(A_p) < \infty$, we say that $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation.

  2. If further $K := \underline{K} = \overline{K} < \infty$ then we say that $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension K.

With these definitions in mind, Theorem 3 implies that if $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation with asymptotic dimension K and the regularity condition (9) holds, then the decomposition $\{(A_{K,p}, B_{K,p})\}_{p=1}^\infty$ is "optimal" in the sense that $\lim_p \mathrm{rank}(A_{K,p}) = K$, so that it achieves the lowest dimension among all asymptotic approximations $\bar F_p(z)$ of $\hat F_p(z)$. We shall see in Section 5 that the regularity condition (9) holds for most typical correlation structures.

It is now possible to see that the number $\overline{K}$ determines three different convergence regimes. If $\overline{K} = \infty$ then $\{Z_i\}_{i=1}^\infty$ has no finite dimensional correlation. If $\overline{K} < \infty$ and (9) holds then $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation. When $\overline{K} = 0$, (9) holds trivially since $\{B_{0,p}\}_{ii} = \{R_p\}_{ii} = 1$, and therefore a necessary and sufficient condition for weak correlation, as in Theorem 1, is $\overline{K} = 0$. We summarize the results in the following corollary and in Figure 2.

Figure 2. Different cases of asymptotic approximation to the ecdf. The corresponding examples from Section 5 below appear in parentheses.

Corollary 2

  1. If $\overline{K} = \infty$ then $\{Z_i\}_{i=1}^\infty$ has no finite dimensional correlation.

  2. If $\overline{K} < \infty$ and (9) holds then $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation.

  3. If $K := \underline{K} = \overline{K} < \infty$ and (9) holds then $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension K and the decomposition $\{(A_{K,p}, B_{K,p})\}_{p=1}^\infty$ achieves it in the sense that $\lim_p \mathrm{rank}(A_{K,p}) = K$.

  4. $\overline{K} = 0$ if and only if $\{Z_i\}_{i=1}^\infty$ is weakly correlated.

The regularity condition (9) does not hold when the diagonal of $B_{\overline{K},p}$ contains elements that are arbitrarily small. In this case, captured by the label "unknown" in Figure 2, $\|B_{\overline{K},p}\|_1^{(p)}$ may be small but not necessarily $\|\mathrm{Cor}(B_{\overline{K},p})\|_1^{(p)}$. Since our bound (4) is based on $\|\mathrm{Cor}(B_{\overline{K},p})\|_1^{(p)}$ rather than on $\|B_{\overline{K},p}\|_1^{(p)}$, we cannot establish the asymptotic dimension in this case. However, we could not find an example of $R_p$ where (9) is not satisfied. A heuristic argument that (9) typically holds is

$$\{B_{\overline{K},p}\}_{ii} = \sum_{j=\overline{K}+1}^{p} \lambda_j^{(p)}\{\gamma_j^{(p)}\}_i^2 \approx \frac{1}{p}\sum_{j=\overline{K}+1}^{p} \lambda_j^{(p)} = 1 - \frac{1}{p}\sum_{j=1}^{\overline{K}} \lambda_j^{(p)} > 0,$$

where ≈ is typically true because $\gamma_j^{(p)}$ is a normalized vector and therefore $\{\gamma_j^{(p)}\}_i^2 \approx 1/p$; the last equality uses $\mathrm{tr}(R_p) = p$, and the result is typically bounded away from zero because the $\overline{K}$ big eigenvalues do not exhaust the total variance.

4 Estimating the asymptotic representation from the data

4.1 Estimating the latent variables

We now discuss how to estimate the underlying latent variables $W_K^{(p)}$ when the sequence $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension $K := \underline{K} = \overline{K}$. For ease of presentation we write everything in matrix/vector form; thus, we write $Z^p := (Z_1, \dots, Z_p)^T$ and $\xi^p := (\xi_1, \dots, \xi_p)^T$. In this notation, (6) becomes the linear regression equation $Z^p = L_{K,p} W_K^{(p)} + \xi^p$ and the least squares estimate of $W_K^{(p)}$ is $\hat W_K^{(p)} := (L_{K,p}^T L_{K,p})^{-1} L_{K,p}^T Z^p$.

When the eigendecomposition (7) is used, the columns of $L_{K,p}$ are orthogonal and $(L_{K,p}^T L_{K,p})^{-1}$ is a diagonal matrix whose i-th diagonal element is $1/\lambda_i^{(p)}$. Thus, in this case,

$$\hat W_K^{(p)} = \bigg(\frac{1}{\sqrt{\lambda_1^{(p)}}}\{\gamma_1^{(p)}\}^T Z^p, \dots, \frac{1}{\sqrt{\lambda_K^{(p)}}}\{\gamma_K^{(p)}\}^T Z^p\bigg)^T. \quad (10)$$

Fan et al. (2012) and Fan and Han (2014) consider a related framework where some of the variables $Z_i$ may have a non-zero mean, and therefore use an estimate that minimizes the $L^1$ distance under sparsity assumptions.
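A direct implementation of the least squares estimate (10) is straightforward; the sketch below (ours) assumes the correlation matrix R and the asymptotic dimension K are given.

```python
import numpy as np

def latent_ls_estimate(R, Z, K):
    """Least squares estimate (10) of W_K^(p) from the top-K eigenpairs of R."""
    lam, gam = np.linalg.eigh(R)                 # ascending eigenvalues
    lam, gam = lam[::-1][:K], gam[:, ::-1][:, :K]
    return (gam.T @ Z) / np.sqrt(lam)            # entries gamma_i^T Z / sqrt(lambda_i)
```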

Since $\{Z_i\}_{i=1}^p$ is strongly correlated, $\mathrm{Cov}(\hat W_K^{(p)})$ does not converge to 0 as p goes to infinity. However, the following proposition states that $\hat W_K^{(p)}$ is still consistent in an $L^2$ sense.

Proposition 2

Suppose that $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension K. Then

$$E\big[\{\hat W_K^{(p)} - W_K^{(p)}\}^T \{\hat W_K^{(p)} - W_K^{(p)}\}\big] \le \frac{Kp}{\lambda_K^{(p)}}\,\|B_{K,p}\|_2^{(p)} \to 0 \quad\text{as } p \to \infty.$$

4.2 Estimating the asymptotic representation

In real data problems $W_K^{(p)}$ and, thus, $\bar F_p(\cdot)$ are unknown. Proposition 2 suggests that if we plug $\hat W_K^{(p)}$ into $\bar F_p(\cdot)$ in place of $W_K^{(p)}$, then the $L^2$ distance between $\hat F_p(\cdot)$ and the plug-in version of $\bar F_p(\cdot)$ converges to zero. Indeed, this can be proved under some additional regularity conditions.

Theorem 4

  1. Define the plug-in ecdf estimate

    $$\hat{\bar F}_p(z) := \frac{1}{p}\sum_{i=1}^p \Phi\bigg(\frac{z - \hat\mu_i^{(p)}}{\sigma_i^{(p)}}\bigg) \quad (11)$$

    with $\hat\mu_i^{(p)} := \ell_i^{(p)} \hat W_K^{(p)}$, under the convention $\Phi\big(\frac{z - \hat\mu_i^{(p)}}{0}\big) := I(\hat\mu_i^{(p)} \le z)$. We have that

    $$\sup_z E\{\hat F_p(z) - \hat{\bar F}_p(z)\}^2 \le 3\sup_z E\{\hat F_p(z) - \bar F_p(z)\}^2 + \frac{3(p - |J_p|)}{p} + \frac{3}{2\pi}\max_{i \in J_p}\frac{1}{\{\sigma_i^{(p)}\}^2}\,E\big[\{\hat W_K^{(p)} - W_K^{(p)}\}^T\{\hat W_K^{(p)} - W_K^{(p)}\}\big],$$

    where $J_p := \{1 \le i \le p : \sigma_i^{(p)} > 0\}$ is the set of indexes for which $\sigma_i^{(p)} > 0$ and $|J_p|$ is the cardinality of $J_p$.

  2. Therefore, if $\{Z_i\}_{i=1}^\infty$ has asymptotic dimension K, (9) holds, and also

    $$\frac{|J_p|}{p} \to 1 \quad\text{as } p \to \infty, \quad (12)$$

    then $\sup_z E\{\hat F_p(z) - \hat{\bar F}_p(z)\}^2 \to 0$ as $p \to \infty$.
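Putting Theorem 4 to work requires only the top-K eigenpairs. The following sketch (ours) computes the plug-in approximation (11) on a grid; it assumes R is a correlation matrix (unit diagonal) and that all $\sigma_i^{(p)} > 0$, i.e., $J_p = \{1, \dots, p\}$, otherwise the indicator convention applies.

```python
import numpy as np
from scipy.stats import norm

def plug_in_approx(R, Z, K, z_grid):
    """Plug-in approximation (11) from the eigendecomposition (7)-(8)."""
    lam, gam = np.linalg.eigh(R)
    lam, gam = lam[::-1][:K], gam[:, ::-1][:, :K]
    W_hat = (gam.T @ Z) / np.sqrt(lam)         # least squares estimate (10)
    L = gam * np.sqrt(lam)                     # L_{K,p} of equation (8)
    mu_hat = L @ W_hat                         # mu_hat_i = l_i^(p) W_hat
    sigma = np.sqrt(1.0 - (L**2).sum(axis=1))  # sigma_i = {(B_{K,p})_ii}^(1/2)
    return norm.cdf((z_grid - mu_hat[:, None]) / sigma[:, None]).mean(axis=0)
```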

4.3 Approximated eigendecomposition

In this section we study the case in which the largest K eigenvalues and eigenvectors of the correlation matrix are not known exactly and an approximation is used. We show that if the distance between the approximated eigenvalues and eigenvectors and the true ones goes to zero as p goes to infinity, then the result of the previous section still holds.

Let $\tilde\lambda_1^{(p)} \ge \dots \ge \tilde\lambda_K^{(p)}$ be an approximation to the K biggest eigenvalues, and let $\tilde\gamma_1^{(p)}, \dots, \tilde\gamma_K^{(p)}$ be norm-1 vectors that are an approximation to the corresponding eigenvectors. Define the $p \times K$ matrix $\tilde L_p := \big(\sqrt{\tilde\lambda_1^{(p)}}\,\tilde\gamma_1^{(p)}, \dots, \sqrt{\tilde\lambda_K^{(p)}}\,\tilde\gamma_K^{(p)}\big)$, and let $\tilde\ell_i^{(p)}$ be the i-th row of the matrix. Define

$$\tilde W_K^{(p)} := \bigg(\frac{1}{\sqrt{\tilde\lambda_1^{(p)}}}\{\tilde\gamma_1^{(p)}\}^T Z^p, \dots, \frac{1}{\sqrt{\tilde\lambda_K^{(p)}}}\{\tilde\gamma_K^{(p)}\}^T Z^p\bigg)^T;$$

and let $\tilde\mu_i^{(p)} := \tilde\ell_i^{(p)} \tilde W_K^{(p)}$ and $\tilde\sigma_i^{(p)} := \big\{1 - \sum_{j=1}^K \tilde\lambda_j^{(p)}\{\tilde\gamma_j^{(p)}\}_i^2\big\}^{1/2}$, where $\{x\}_i$ is the i-th element of the vector x. Finally, define $\tilde F_p(z) := \frac{1}{p}\sum_{i=1}^p \Phi\big(\frac{z - \tilde\mu_i^{(p)}}{\tilde\sigma_i^{(p)}}\big)$. The following proposition bounds the $L^2$ distance between $\tilde F_p(z)$ and $\hat{\bar F}_p(z)$.

Proposition 3

Suppose that $\sigma_i^{(p)}, \tilde\sigma_i^{(p)} \ge \varepsilon_B$, $i = 1, \dots, p$, for some $\varepsilon_B > 0$.

  1. The following inequality holds:

    $$E\{\tilde F_p(z) - \hat{\bar F}_p(z)\}^2 \le C(\varepsilon_B)\,\frac{\lambda_K^{(p)}}{p}\sum_{i=1}^K\Bigg[\bigg\{\frac{1}{\sqrt{\lambda_i^{(p)}/p}} - \frac{1}{\sqrt{\tilde\lambda_i^{(p)}/p}}\bigg\}^2 + \bigg\{\sqrt{\frac{\lambda_i^{(p)}}{p}} - \sqrt{\frac{\tilde\lambda_i^{(p)}}{p}}\bigg\}^2 + 2\{\gamma_i^{(p)} - \tilde\gamma_i^{(p)}\}^T\{\gamma_i^{(p)} - \tilde\gamma_i^{(p)}\}\Bigg] + C(\varepsilon_B)\Bigg[\sum_{i=1}^K\bigg\{\frac{\lambda_i^{(p)}}{p} - \frac{\tilde\lambda_i^{(p)}}{p}\bigg\}^2 + \sum_{i=1}^K\{\gamma_i^{(p)} - \tilde\gamma_i^{(p)}\}^T\{\gamma_i^{(p)} - \tilde\gamma_i^{(p)}\}\Bigg]^{1/2},$$

    where $C(\varepsilon_B)$ is a constant that depends on $\varepsilon_B$.

  2. Therefore, if $\frac{\tilde\lambda_i^{(p)} - \lambda_i^{(p)}}{p} \to 0$ and $\{\tilde\gamma_i^{(p)} - \gamma_i^{(p)}\}^T\{\tilde\gamma_i^{(p)} - \gamma_i^{(p)}\} \to 0$ as $p \to \infty$ for $i = 1, \dots, K$, and the conditions of Theorem 4(ii) hold, then $\sup_z E\{\tilde F_p(z) - \hat F_p(z)\}^2 \to 0$ as $p \to \infty$.

Proposition 3 and its proof imply that if there exist consistent estimates of the first K eigenvalues and eigenvectors, then the result of Theorem 4(ii) still holds; see also Fan and Han (2014). A systematic study of the case where the correlation matrix is unknown is left to future research.

5 Examples

We now consider some examples of correlation structures. In Examples 1–3, $K = \underline{K} = \overline{K}$ is finite, conditions (9) and (12) are met, and therefore the decomposition $\{(A_{K,p}, B_{K,p})\}_{p=1}^\infty$ is optimal and $\hat{\bar F}_p$ is consistent; in Example 3, however, $\bar F_p(z)$ does not have a limit. In Example 4, $\underline{K} = 1$ but $\overline{K} > 1$, and in Example 5, $\underline{K} = \overline{K} = \infty$.

5.1 Exchangeable correlation: asymptotic dimension 1

Suppose that $\{Z_i\}_{i=1}^\infty$ is a sequence of exchangeable random variables with correlation ρ ≥ 0, i.e., for any $i \ne j$, $\mathrm{cor}(Z_i, Z_j) = \rho$ and

$$R_p = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}. \quad (13)$$

The eigenvalues of $R_p$ are $\lambda_1^{(p)} = \rho p + 1 - \rho$ (with corresponding eigenvector $(1, \dots, 1)^T/\sqrt{p}$) and $\lambda_2^{(p)} = \dots = \lambda_p^{(p)} = 1 - \rho$; hence, $\underline{K} = \overline{K} = 1$. According to (7), we have that $A_{1,p} = [\rho + (1 - \rho)/p]\,\mathbf{1}_p\mathbf{1}_p^T$ and $B_{1,p} = (1 - \rho)[I_p - (1/p)\mathbf{1}_p\mathbf{1}_p^T]$, where $I_p$ is the $p \times p$ identity matrix and $\mathbf{1}_p$ is the length-p vector with 1 in every entry. It is easy to check that for every i, $\lim_p \{B_{1,p}\}_{ii} = 1 - \rho > 0$, and (9) holds. Also, $|J_p| = p$ and (12) holds. Thus, in this case $\{Z_i\}_{i=1}^\infty$ has finite dimensional correlation with asymptotic dimension K = 1 and the decomposition $\{(A_{1,p}, B_{1,p})\}_{p=1}^\infty$ is optimal.

To write the asymptotic representation $\bar F(z)$ in this case, it is easier to work with the asymptotically equivalent decomposition $A_p = \rho\,\mathbf{1}_p\mathbf{1}_p^T$, $B_p = (1 - \rho)I_p$, which is in $\mathcal{D}$ and has $\mathrm{rank}(A_p) = K = 1$. For this decomposition $\mu_i^{(p)} = \sqrt{\rho}\,W_1$ and $\sigma_i^{(p)} = \sqrt{1 - \rho}$ do not depend on i or p and

$$\bar F_p(z) = \frac{1}{p}\sum_{i=1}^p \Phi\bigg(\frac{z - \mu_i^{(p)}}{\sigma_i^{(p)}}\bigg) = \frac{1}{p}\sum_{i=1}^p \Phi\bigg(\frac{z - \sqrt{\rho}\,W_1}{\sqrt{1 - \rho}}\bigg) = \Phi\bigg(\frac{z - \sqrt{\rho}\,W_1}{\sqrt{1 - \rho}}\bigg) = \bar F(z). \quad (14)$$

That is, $\hat F_p(z)$, which has dimension p, is approximated by $\bar F_p(z)$ with dimension 1. Moreover, $\bar F_p(z) = \bar F(z)$ above does not depend on p and is therefore the asymptotic representation: $\hat F_p(z)$ converges to $\bar F(z)$ in the sense of Definition 2.

Given the sequence $\{Z_i\}_{i=1}^p$, the regression estimate (10) of $W_1$ is $\hat W_1^{(p)} = \bar Z_p/\sqrt{\rho}$, where $\bar Z_p := \frac{1}{p}\sum_{i=1}^p Z_i$. An illustration of the estimated asymptotic representation $\hat{\bar F}_p(z)$ in (11) is shown in Figure 1(a).
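A quick numeric sanity check of this example (our own sketch; ρ and the seed are arbitrary) recovers $W_1$ from $\bar Z_p$ and evaluates the one-dimensional representation (14):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, rho = 1000, 0.9
W1 = rng.standard_normal()
Z = np.sqrt(rho) * W1 + np.sqrt(1 - rho) * rng.standard_normal(p)

W1_hat = Z.mean() / np.sqrt(rho)  # regression estimate Zbar_p / sqrt(rho)
print(W1, W1_hat)                 # close for large p

z = np.linspace(-4, 4, 201)
F_rep = norm.cdf((z - np.sqrt(rho) * W1_hat) / np.sqrt(1 - rho))  # plug-in (14)
```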

Notice that the distribution specified by (14) is a shifted and scaled normal distribution, corresponding to Efron's empirical null model (Efron, 2004, 2007a, b, 2008). Fitting an empirical null in this case would amount to estimating $W_1$ and ρ. This example shows that Efron's empirical null model is justified under exchangeable correlation. However, as can be seen from the above theory and the following examples, it does not capture the effect of many other correlation structures.

When ρ ≤ 0 in (13), positive semi-definiteness of the correlation matrix requires ρ ≥ −1/(p − 1). Therefore, in the high-dimensional setting, too much negative correlation is not possible. Consequently, the histograms of the Z’s are typically narrower than N(0, 1) (see Figure 1).

5.2 Two exchangeable correlation blocks: asymptotic dimension 2

Suppose that $R_p$ consists of two blocks of the form (13), the first of size $n_1(p) \times n_1(p)$ and the second of size $n_2(p) \times n_2(p)$, with $n_1(p) + n_2(p) = p$ and $\lim_{p\to\infty} n_1(p)/p = \pi$. We assume that there is a constant correlation $\rho_B$ between the two blocks (which can be zero or negative). Thus,

$$R_p = \begin{pmatrix} (1 - \rho_1)I_{n_1(p)} + \rho_1\,\mathbf{1}\mathbf{1}^T & \rho_B\,\mathbf{1}\mathbf{1}^T \\ \rho_B\,\mathbf{1}\mathbf{1}^T & (1 - \rho_2)I_{n_2(p)} + \rho_2\,\mathbf{1}\mathbf{1}^T \end{pmatrix},$$

where each $\mathbf{1}$ denotes a vector of ones of the appropriate dimension.

Let $I_1(p)$ and $I_2(p)$ denote the sets of indexes that belong to Blocks 1 and 2. If $0 \le \rho_B < \sqrt{\rho_1\rho_2}$, then a sequence $\{Z_i\}_{i=1}^\infty$ with the above covariance can be generated by the model

$$Z_i = \begin{cases} \sqrt{\rho_1}\big(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_2\big) + \sqrt{1 - \rho_1}\,\varepsilon_i & i \in I_1(p) \\ \sqrt{\rho_2}\big(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_3\big) + \sqrt{1 - \rho_2}\,\varepsilon_i & i \in I_2(p), \end{cases}$$

where $\rho_B' = \rho_B/\sqrt{\rho_1\rho_2}$, and $W_1, W_2, W_3, \{\varepsilon_i\}_{i=1}^p$ are i.i.d. N(0, 1). Comparing with (6), we can write

$$\bar F_p(z) = \frac{n_1(p)}{p}\,\Phi\bigg(\frac{z - \sqrt{\rho_1}(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_2)}{\sqrt{1 - \rho_1}}\bigg) + \frac{n_2(p)}{p}\,\Phi\bigg(\frac{z - \sqrt{\rho_2}(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_3)}{\sqrt{1 - \rho_2}}\bigg)$$
$$\to \bar F(z) = \pi\,\Phi\bigg(\frac{z - \sqrt{\rho_1}(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_2)}{\sqrt{1 - \rho_1}}\bigg) + (1 - \pi)\,\Phi\bigg(\frac{z - \sqrt{\rho_2}(\sqrt{\rho_B'}\,W_1 + \sqrt{1 - \rho_B'}\,W_3)}{\sqrt{1 - \rho_2}}\bigg).$$

If $\rho_B = 0$, then the asymptotic approximation $\bar F(z)$ depends only on $W_2, W_3$ and has dimension $\underline{K} = \overline{K} = 2$. In this case $R_p$ has two uncorrelated blocks of the type presented in Section 5.1; the generalization of the formulas there is straightforward (see also Figure 1(b)).

For $\rho_B \ne 0$, however, the above $\bar F(z)$ depends on $W_1, W_2, W_3$ and, thus, has dimension three; it is suboptimal. To obtain the optimal asymptotic representation in more generality, we work with the eigendecomposition. The matrix $R_p$ has two eigenvalues

$$\lambda_{1,2}^{(p)} := \frac{\eta_1(p) + \eta_2(p)}{2} \pm \bigg[\bigg(\frac{\eta_1(p) - \eta_2(p)}{2}\bigg)^2 + n_1(p)n_2(p)\rho_B^2\bigg]^{1/2},$$

where $\eta_j(p) = 1 + \rho_j\{n_j(p) - 1\}$, $j = 1, 2$, and the other p − 2 eigenvalues are either $1 - \rho_1$ or $1 - \rho_2$. For large p, positive semi-definiteness requires that $\rho_B^2 \le \rho_1\rho_2$. On the boundary, when $\rho_B^2 = \rho_1\rho_2$, $\lambda_2^{(p)}$ converges to a positive constant and $\underline{K} = \overline{K} = 1$.

We now concentrate on the case $\rho_B^2 < \rho_1\rho_2$. Then, $\lambda_{1,2}^{(p)}$ are of order p and $\underline{K} = \overline{K} = 2$. The corresponding eigenvectors are

$$\big(\underbrace{x_j(p), \dots, x_j(p)}_{n_1(p)\ \text{times}}, \underbrace{y_j(p), \dots, y_j(p)}_{n_2(p)\ \text{times}}\big)^T, \quad j = 1, 2,$$

where $x_j(p)$ and $y_j(p)$ are given by

$$x_1(p) = \frac{-\sqrt{n_2(p)}\,\rho_B}{\sqrt{(\eta_1(p) - \lambda_1^{(p)})^2 + n_1(p)n_2(p)\rho_B^2}}, \quad y_1(p) = \frac{(\eta_1(p) - \lambda_1^{(p)})/\sqrt{n_2(p)}}{\sqrt{(\eta_1(p) - \lambda_1^{(p)})^2 + n_1(p)n_2(p)\rho_B^2}},$$
$$x_2(p) = \frac{(\eta_2(p) - \lambda_2^{(p)})/\sqrt{n_1(p)}}{\sqrt{(\eta_2(p) - \lambda_2^{(p)})^2 + n_1(p)n_2(p)\rho_B^2}}, \quad y_2(p) = \frac{-\sqrt{n_1(p)}\,\rho_B}{\sqrt{(\eta_2(p) - \lambda_2^{(p)})^2 + n_1(p)n_2(p)\rho_B^2}}.$$

We have that

$$\mu_i^{(p)} = \begin{cases} \sqrt{\lambda_1^{(p)}}\,x_1(p)\,W_1 + \sqrt{\lambda_2^{(p)}}\,x_2(p)\,W_2, & i \in I_1(p) \\ \sqrt{\lambda_1^{(p)}}\,y_1(p)\,W_1 + \sqrt{\lambda_2^{(p)}}\,y_2(p)\,W_2, & i \in I_2(p), \end{cases}$$

and

$$\sigma_i^{(p)} = \begin{cases} \Big(1 - \big[\lambda_1^{(p)}\{x_1(p)\}^2 + \lambda_2^{(p)}\{x_2(p)\}^2\big]\Big)^{1/2}, & i \in I_1(p) \\ \Big(1 - \big[\lambda_1^{(p)}\{y_1(p)\}^2 + \lambda_2^{(p)}\{y_2(p)\}^2\big]\Big)^{1/2}, & i \in I_2(p). \end{cases}$$

Moreover, $\{B_{2,p}\}_{ii} \to 1 - \rho_1$ for $i \in I_1(p)$ and $\{B_{2,p}\}_{ii} \to 1 - \rho_2$ for $i \in I_2(p)$ as $p \to \infty$. Thus, (9) holds with $\overline{K} = \underline{K} = 2$. Also, $|J_p| = p$ and (12) holds. The corresponding asymptotic representation can be obtained by writing $\bar F_p(z)$ in terms of the $\mu_i^{(p)}$ and $\sigma_i^{(p)}$ above and taking the limit as p → ∞.

The regression estimate of $W_j^{(p)}$ is

$$\hat W_j^{(p)} = \bigg\{x_j(p)\sum_{i \in I_1(p)} Z_i + y_j(p)\sum_{i \in I_2(p)} Z_i\bigg\}\Big/\sqrt{\lambda_j^{(p)}}, \quad j = 1, 2.$$

5.3 Finite dimensional correlation with no asymptotic representation

In this example, the correlation is finite dimensional with K = 2 and the $L^2$ distance between $\hat F_p$ and $\bar F_p$ goes to zero, but $\bar F_p$ itself has no limit. Let $0 < \rho < 1$, let $\{\alpha_i\}_{i=1}^\infty \subset (0, 1)$ be any sequence, and let $W_1, W_2, \{\varepsilon_i\}_{i=1}^\infty$ be independent standard normal random variables. Define

$$Z_i = \sqrt{\rho\alpha_i}\,W_1 + \sqrt{\rho(1 - \alpha_i)}\,W_2 + \sqrt{1 - \rho}\,\varepsilon_i;$$

then, $Z_i \sim N(0, 1)$ and $\mathrm{Cov}(Z_i, Z_j) = \rho\big\{\sqrt{\alpha_i\alpha_j} + \sqrt{(1 - \alpha_i)(1 - \alpha_j)}\big\}$ for $i \ne j$. We can write $R_p = A_p + B_p$, where $A_p = L_p(L_p)^T$,

$$L_p = \begin{pmatrix} \sqrt{\rho\alpha_1} & \sqrt{\rho(1 - \alpha_1)} \\ \sqrt{\rho\alpha_2} & \sqrt{\rho(1 - \alpha_2)} \\ \vdots & \vdots \\ \sqrt{\rho\alpha_p} & \sqrt{\rho(1 - \alpha_p)} \end{pmatrix}, \quad\text{and}\quad B_p = (1 - \rho)I_p.$$

Then, $\bar F_p(z) = \frac{1}{p}\sum_{i=1}^p \Phi\big(\frac{z - \mu_i}{\sqrt{1 - \rho}}\big)$, where $\mu_i := \sqrt{\rho\alpha_i}\,W_1 + \sqrt{\rho(1 - \alpha_i)}\,W_2$, and it is clear that for certain choices of $\{\alpha_i\}_{i=1}^\infty \subset (0, 1)$, $\bar F_p(z)$ has no limit for any z.

5.4 Independent exchangeable correlation blocks with no asymptotic dimension

In this example there are two independent blocks similar to the example in Section 5.2 (with $\rho_B = 0$) and with the same notation, but here the proportions $n_1(p)/p$ and $n_2(p)/p$ change infinitely often. Consequently, there are infinitely many p's with two "big" eigenvalues and infinitely many p's with only one "big" eigenvalue. Therefore, in this example, $\underline{K} = 1$ while $\overline{K} = 2$. This means that the correlation is finite dimensional and (9) holds, but there is no asymptotic representation and no well defined asymptotic dimension.

Consider the subsequence $m_1 = 2$, $m_{k+1} = m_k^2$ for $k \ge 1$; i.e., $m_k = 2^{(2^{k-1})}$. Suppose that $Z_i$ belongs to the first block if $i \in \{m_{k-1} + 1, \dots, m_k\}$ for even k, and otherwise it belongs to the second block. That is, at $p = m_k$ for even k,

$$n_1(p) = n_1(m_{k-1}) + m_k - m_{k-1} = n_1(\sqrt{p}) + p - \sqrt{p}, \quad\text{and}\quad n_2(p) = n_2(m_{k-1}) = n_2(\sqrt{p}),$$

and for odd k,

$$n_1(p) = n_1(m_{k-1}) = n_1(\sqrt{p}), \quad\text{and}\quad n_2(p) = n_2(m_{k-1}) + m_k - m_{k-1} = n_2(\sqrt{p}) + p - \sqrt{p}.$$

When $p = m_k$ for even k, $m_k - \sqrt{m_k} \le n_1(m_k) \le m_k$, and therefore the eigenvalue associated with this block satisfies

$$\lim_{k \to \infty,\ k\ \text{even}} \frac{\rho\,n_1(m_k) + 1 - \rho}{m_k} = \rho,$$

but for odd k, $n_1(m_k) \le m_{k-1} = \sqrt{m_k}$ and $\lim_{k \to \infty,\ k\ \text{odd}} \frac{\rho\,n_1(m_k) + 1 - \rho}{m_k} = 0$; and vice versa for the other block. The largest eigenvalue satisfies

$$\frac{\lambda_1^{(p)}}{p} = \max\bigg\{\frac{\rho\,n_1(p) + 1 - \rho}{p},\ \frac{\rho\,n_2(p) + 1 - \rho}{p}\bigg\} \ge \frac{\rho\,p/2 + 1 - \rho}{p},$$

and therefore $\liminf_p \lambda_1^{(p)}/p \ge \rho/2 > 0$ and $\underline{K}_1 = 1$. For the second eigenvalue,

$$\frac{\lambda_2^{(p)}}{p} = \min\bigg\{\frac{\rho\,n_1(p) + 1 - \rho}{p},\ \frac{\rho\,n_2(p) + 1 - \rho}{p}\bigg\};$$

since the lim inf of both terms is 0, $\liminf_p \lambda_2^{(p)}/p = 0$ and $\underline{K}_2 = 0$. Thus, $\underline{K} = 1$.

We next show that $\overline{K} = 2$. When $p = m_k$ one block is larger than the other, and when $p = m_{k+1}$ the other block is larger. Therefore, for each k there exists $p_k$ between $m_k$ and $m_{k+1}$ such that the two blocks are of equal size, i.e., $n_1(p_k) = n_2(p_k) = p_k/2$. Thus,

$$\frac{\lambda_2^{(p_k)}}{p_k} = \min\bigg\{\frac{\rho\,n_1(p_k) + 1 - \rho}{p_k},\ \frac{\rho\,n_2(p_k) + 1 - \rho}{p_k}\bigg\} = \frac{\rho}{2} + \frac{1 - \rho}{p_k};$$

therefore, $\limsup_p \lambda_2^{(p)}/p \ge \lim_k \lambda_2^{(p_k)}/p_k = \rho/2 > 0$. Since the other eigenvalues are constant (1 − ρ), we have that $\overline{K} = 2$.

Moreover, in this case $\|B_{1,p}\|_2^{(p)}$ does not converge to 0 and therefore $\|\mathrm{Cor}(B_{1,p})\|_2^{(p)} \nrightarrow 0$; hence, the decomposition $\{(A_{1,p}, B_{1,p})\}$ is not in $\mathcal{D}$. This is because

$$\|B_{1,p_k}\|_2^{(p_k)} = \frac{\big[\{\lambda_2^{(p_k)}\}^2 + \{\lambda_3^{(p_k)}\}^2 + \dots + \{\lambda_{p_k}^{(p_k)}\}^2\big]^{1/2}}{p_k} \ge \frac{\lambda_2^{(p_k)}}{p_k} = \frac{\rho}{2} + \frac{1 - \rho}{p_k},$$

so that $\liminf_k \|\mathrm{Cor}(B_{1,p_k})\|_2^{(p_k)} \ge \liminf_k \|B_{1,p_k}\|_2^{(p_k)} \ge \rho/2$.

This example can easily be generalized so that $\overline{K} = 3, 4, \dots$ or $\overline{K} = \infty$. Given $M < \infty$, to obtain $\overline{K} = M$, for even k one can divide the observations $Z_{m_k}, \dots, Z_{m_{k+1}}$ into M − 1 independent blocks with correlation ρ within blocks. To obtain $\overline{K} = \infty$, for even k one can divide the observations $Z_{m_k}, \dots, Z_{m_{k+1}}$ into k blocks.

5.5 No finite dimensional approximation

In this example, $\underline{K} = \overline{K} = \infty$, and therefore, according to Theorem 3, for every decomposition in $\mathcal{D}$, $\lim_p \mathrm{rank}(A_p) = \infty$. Suppose that at $p = 2^n$, $R_p$ consists of n independent blocks of sizes $2^{n-1}, 2^{n-2}, \dots, 2$ and an additional block of size 2, each block of the form (13), all with the same correlation parameter ρ. At $p = 2^n$, there are n − 1 "big" eigenvalues, $\rho(p/2^i - 1) + 1$ for $i = 1, \dots, n - 1$, one eigenvalue $1 + \rho$ (from the last block of size 2), and the rest are equal to $1 - \rho$. For any fixed $i \in \mathbb{N}$ and large enough p, the eigenvalue $\lambda_i^{(p)}$ is equal to $\rho(p/2^i - 1) + 1$ and therefore

$$\frac{\lambda_i^{(p)}}{p} = \frac{\rho(p/2^i - 1) + 1}{p} \to \frac{\rho}{2^i} > 0 \quad\text{as } p \to \infty.$$

Therefore, for any fixed $i \in \mathbb{N}$, $\underline{K}_i = 1$ and $\underline{K} = \sum_{i=1}^\infty \underline{K}_i = \infty$.

6 Data example

As a practical application, we use the methods developed above to analyze a high-dimensional data set obtained from brain imaging. The data belongs to a study of cortical thickness of adults who had a diagnosis of attention deficit/hyperactivity disorder (ADHD) as children (Proal et al., 2011). The data set consists of cortical thickness measurements for about 80000 cortical voxels, obtained from magnetic resonance imaging (MRI) scans, as well as demographic and behavioral measurements, for each of n = 139 individuals. In this study, it had been noticed by Reiss et al. (2012) that z-scores corresponding to the voxelwise relationship between cortical thickness and ADHD diagnosis did not follow the theoretical standard normal distribution. Instead, the distribution of z-scores exhibited a substantial shift away from zero, indicating a possible widespread cortical thinning over the brain for individuals with ADHD. It is unclear, however, whether those results could have been caused by correlation between voxels rather than by a real relationship with clinical diagnosis.

In order to apply the methods developed in this paper, here we perform a slightly different analysis where the correlation structure can be taken as known. Specifically, we follow the approach of Owen (2005) of performing a regression of the observed trait for the subjects on the high-dimensional predictors, one dimension at a time. Owen (2005) and also Fan et al. (2012) used this approach in the context of genomic data. In our case, the trait is a global assessment of behavior, while the predictors are the cortical thickness measurements. For ease of computation, in the following analysis we use a random sample of p = 1000 voxels, which is enough to show the effect we want to highlight.

6.1 Regression analysis

Let $Y_j$ denote the global assessment of the j-th subject and let $X_j = \{X_j(i)\}_{i=1}^p$ be the cortical thickness in the p voxels of the j-th subject. For each voxel $i \in \{1, \dots, p\}$, consider the simple regression model

$$Y_j = \alpha(i) + \beta(i)X_j(i) + \varepsilon_j, \quad j = 1, \dots, n,$$

where the ε’s are i.i.d with mean 0 and variance σ2. Let Yj=Yj-1nj=1nYj and Xj(i)=Xj(i)-1nj=1nXj(i) be the centered variables and define si=j=1nXj(i)Xj(). The least squares estimate of β(i) is β^(i)=j=1nXj(i)Yj/sii. According to the model, we can write β^(i)=β(i)+j=1nXj(i)εj/sii and so Cov(β̂(i), β̂(ℓ)) = σ2si/[siisℓℓ].

Consider the p hypotheses $H_0(i): \beta(i) = 0$ versus $H_1(i): \beta(i) \ne 0$, $i = 1, \dots, p$. The z-score for the i-th test is $Z_i = \hat\beta(i)/\mathrm{se}[\hat\beta(i)] = \hat\beta(i)\sqrt{s_{ii}}/\sigma$. Thus, under the global null hypothesis, i.e., that all the $H_0(i)$'s are true, we have that $Z_1, \dots, Z_p$ are each approximately N(0, 1) with correlation matrix $R := \{r_{i\ell}\}_{i,\ell=1}^p$ given by the pairwise correlations $r_{i\ell} := \mathrm{Cor}(Z_i, Z_\ell) = s_{i\ell}/\sqrt{s_{ii}s_{\ell\ell}}$. Notice that the pairwise correlations between the z-scores are precisely the pairwise correlations between the cortical thickness measurements at each voxel. Because the regression is conditional on the voxelwise measurements, we take the pairwise correlations as fixed. To compute the z-scores, we use the variance estimate $\hat\sigma^2 = \frac{1}{p(n-2)}\sum_{i=1}^p\sum_{j=1}^n [\tilde Y_j - \hat\beta(i)\tilde X_j(i)]^2$ and ignore its negligible contribution to their variability.
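In code, the voxelwise z-scores and their (known) correlation matrix follow directly from these formulas. A minimal sketch (ours; the function name is an assumption, X is the n × p predictor matrix and Y the length-n trait vector):

```python
import numpy as np

def regression_z_scores(X, Y):
    """Voxelwise z-scores under the global null, following Section 6.1.
    X: n x p matrix of predictors (cortical thickness); Y: length-n trait."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                        # centered predictors
    Yc = Y - Y.mean()                              # centered response
    s = (Xc**2).sum(axis=0)                        # s_ii
    beta_hat = Xc.T @ Yc / s                       # least squares slopes
    resid = Yc[:, None] - Xc * beta_hat            # residuals, one column per voxel
    sigma2_hat = (resid**2).sum() / (p * (n - 2))  # pooled variance estimate
    Z = beta_hat * np.sqrt(s) / np.sqrt(sigma2_hat)
    R = np.corrcoef(X, rowvar=False)               # r_il = s_il / sqrt(s_ii s_ll)
    return Z, R
```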

6.2 The distribution of the z-scores

Consider the decomposition $R_p = A_{k,p} + B_{k,p}$, where $A_{k,p}, B_{k,p}$ are of the form (7). The asymptotic dimension, if it exists, is unknown. In the next subsection we discuss the choice of k; for now we work with k = 2. After k is set, $W_k^{(p)}$ is estimated via (10). Define

$$\hat\mu_i := \ell_i \hat W_k^{(p)}, \quad\text{and}\quad \sigma_i := \{(B_{k,p})_{ii}\}^{1/2}, \quad i = 1, \dots, p,$$

where $\ell_i$ is the i-th row of $L_{k,p}$ defined in (8). The empirical distribution is approximated by $\hat{\bar F}(z)$ given by (11), and the density is approximated by $\hat{\bar f}(z) := \frac{1}{p}\sum_{i=1}^p \frac{1}{\sigma_i}\varphi\big(\frac{z - \hat\mu_i}{\sigma_i}\big)$, where φ is the standard normal density. Figure 3(a) plots the histogram of the Z's, $\hat{\bar f}$, and φ. The approximation captures the shape of the empirical distribution remarkably well, even though it is based on only two latent variables.

Figure 3. Histogram of $Z_1, Z_2, \dots, Z_{1000}$ for the real data (a) and the simulation results (b). The red line is φ, the standard normal density, and the blue dashed line is the approximation we use, $\hat{\bar f}$.
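The fitted density used in Figure 3 is the corresponding normal mixture; a short sketch (ours, reusing $\hat\mu_i$ and $\sigma_i$ computed as above, e.g., inside `plug_in_approx`):

```python
import numpy as np
from scipy.stats import norm

def mixture_density(mu_hat, sigma, z_grid):
    """f_bar(z) = (1/p) * sum_i (1/sigma_i) * phi((z - mu_hat_i) / sigma_i)."""
    u = (z_grid - mu_hat[:, None]) / sigma[:, None]
    return (norm.pdf(u) / sigma[:, None]).mean(axis=0)
```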

Recall that the approximation is done under the complete null hypothesis, i.e., that there is no effect. The validity of the complete null hypothesis is confirmed by a false discovery rate (FDR) analysis, which shows no significant voxels after applying the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995) at an FDR level of 0.2. Nevertheless, the z-scores exhibit a strong shift toward positive values, indicating a possible positive correlation between cortical thickness and behavior. Our approximation indicates that the reason that many Z's are large may be the correlation between them and not a true effect. To illustrate this point we simulated $Y_1, \dots, Y_n \sim N(0, 1)$ i.i.d. and independent of the X's and repeated the same procedure as before. One such simulated instance is given in Figure 3(b). Without correlation, we would expect the histogram of the resulting z-scores to follow the theoretical null density (red). However, the correlation creates the impression of a strong positive effect, not unlike the one seen in panel (a).

6.3 The number of latent variables

We now discuss the choice of k, the number of latent variables. Recall that the asymptotic dimension K, if it exists, is the optimal choice, as discussed in Section 3.2, and equals the number of eigenvalues of order p. A scree plot of the 50 largest eigenvalues of $R_p$ is shown in Figure 4(a). Cattell's graphical test indicates an elbow at k = 3, and after 10–20 eigenvalues the values are almost constant.

Figure 4. (a) The 50 largest eigenvalues of $R_p$, ordered according to their size. (b) Histogram of $Z_1, Z_2, \dots, Z_{1000}$ and the approximation $\hat{\bar f}$ for k = 2 (blue dashed), 10 (red dotted), and 100 (solid green).

Theorem 2 states that under the decomposition $R_p = A_{k,p} + B_{k,p}$, the $L^2$ distance between $\hat F_p$ and $\bar F_p$ is bounded by $\frac{1}{4p} + C\,\|\mathrm{Cor}(B_{k,p})\|_1^{(p)}$. As k increases the bound decreases, and $\|\mathrm{Cor}(B_{k,p})\|_1^{(p)} = 0$ when k = p. However, as k increases, the dimension of the representation also increases, since the dimension is $\mathrm{rank}(A_{k,p})$ (Proposition 1). Furthermore, for large k the empirical distribution and the approximation $\hat{\bar f}$ are very close and the problem of overfitting arises. To illustrate this point, we plot $\hat{\bar f}$ for different choices of k in Figure 4(b). It can be seen that with k = 100 the approximation is much closer to the histogram, but such a high dimension is unnecessary to capture the global behavior of the histogram. The differences between k = 2 and k = 10 are rather small, suggesting that k = 2 suffices.
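One data-driven way to examine this tradeoff (our suggestion, not a procedure from the paper) is to monitor the residual term of the Theorem 2 bound as a function of k; the sketch assumes the diagonal of $B_{k,p}$ stays away from zero, as in condition (9).

```python
import numpy as np

def residual_corr_norm(R, k):
    """||Cor(B_{k,p})||_1^(p) for the eigendecomposition residual B_{k,p}."""
    p = R.shape[0]
    lam, gam = np.linalg.eigh(R)
    lam, gam = lam[::-1], gam[:, ::-1]         # decreasing order
    B = (gam[:, k:] * lam[k:]) @ gam[:, k:].T  # B_{k,p} of equation (7)
    d = np.sqrt(np.diag(B))                    # requires positive diagonal
    return (np.abs(B / np.outer(d, d))).sum() / p**2

# Plotting residual_corr_norm(R, k) against k shows the decreasing bound,
# to be weighed against the growing dimension rank(A_{k,p}) = k.
```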

7 Summary and extensions

In this work we have studied the limit of the ecdf of marginally standard normal variables when strong correlation is present. As predicted by Efron (2007a), we have shown that the limit is indeed not standard normal. Specifically, we have shown that under a regime that we call finite dimensional correlation (and some regularity conditions), the limit is a finite mixture of scaled normals with random means, which reduces to Efron's empirical null model when the correlation structure is exchangeable. Moreover, we have shown that if the correlation is not finite dimensional, then the limit can still be approximated by a mixture of normals, but an infinite number of components may be required.

The main technique for achieving these results has been a decomposition of the correlation matrix R into two matrices A and B, where A captures the strong correlation and B captures the weakly correlated residual noise. The form of the limiting distribution of the ecdf is determined by A, while the residual noise represented by B goes to zero asymptotically. The key to achieving the asymptotic representation of the ecdf with the smallest dimension is to choose B so that it contains the largest amount of variance while remaining weakly correlated.

For future work, we consider the following extensions. First, we assumed that all random variables have variance 1. If the variances are not 1, but are bounded above and below, then the random variables could be standardized and all our results follow. If they are not bounded, then still for each finite p one can standardize and obtain inequality (4) of Theorem 2.

In Theorem 2 we proved that the $L^2$ distance between $\hat F_p$ and $\bar F_p$ converges to zero if the residual correlation not captured by $\bar F_p$ is weak. Could the same be said of the $L^2$ distance between the moments of $\hat F_p$ and $\bar F_p$? Let $\hat m_{n,p} := \frac{1}{p}\sum_{i=1}^p Z_i^n$ and $\bar m_{n,p} := \int z^n\,d\bar F_p(z)$ be the n-th moments of $\hat F_p$ and $\bar F_p$. For the first moment and an appropriate decomposition $R_p = A_p + B_p$,

$$\bar m_{1,p} = \frac{1}{p}\sum_{i=1}^p \int z\,\frac{1}{\sigma_i^{(p)}}\,\varphi\bigg(\frac{z - \mu_i^{(p)}}{\sigma_i^{(p)}}\bigg)dz = \frac{1}{p}\sum_{i=1}^p \mu_i^{(p)}.$$

By the definition of $\mu_i^{(p)}$ and equation (6), we have that $Z_i - \mu_i^{(p)} = \xi_i$. Therefore,

$$E(\hat m_{1,p} - \bar m_{1,p})^2 = \frac{1}{p^2}E\bigg\{\sum_{i=1}^p (Z_i - \mu_i^{(p)})\bigg\}^2 = \frac{1}{p^2}E\bigg(\sum_{i=1}^p \xi_i\bigg)^2 = \frac{1}{p^2}\sum_{i,j}(B_p)_{ij},$$

and the distance converges to zero if and only if $\frac{1}{p^2}\sum_{i,j}(B_p)_{ij} \to 0$. This condition is weaker than that of Theorem 2, and thus convergence of the ecdf implies convergence of the first moment. Proving convergence of higher moments is more difficult and is left for future work.

About the choice of the number of components k in practice, the problem is not unlike that of determining the number of components in factor analysis or the effective dimension in principal components analysis. The connection with these techniques is worth exploring.

Finally, about the normality assumption: it is used in the proof of Theorem 2 by applying Mehler's expansion to the joint density. Similar expansions are available for other distributions, such as chi-square and gamma (e.g., Koudou, 1998; Schwartzman and Lin, 2011). Thus it may be possible to extend our results to those distributions as well.

Supplementary Material

Acknowledgments

The authors are grateful to Philip Reiss from the Department of Child and Adolescent Psychiatry, New York University School of Medicine, for providing the brain imaging data. This work was partially supported by NIH grant R01-CA157528.

Contributor Information

David Azriel, Email: davidazr@ie.technion.ac.il, Lecturer at the Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa 32000, Israel, and Postdoctoral Research Associate in the Department of Statistics of the Wharton School of the University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 19104.

Armin Schwartzman, Email: armin.schwartzman@ncsu.edu, Associate Professor at the Department of Statistics, North Carolina State University, Raleigh, NC 27695.

References

  1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57:289–300.
  2. Dedecker J, Merlevéde F. The empirical distribution function for dependent variables: asymptotic and nonasymptotic results in Lp. ESAIM: Probability and Statistics. 2007;11:102–114.
  3. Efron B. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association. 2004;99:96–104.
  4. Efron B. Correlation and large-scale simultaneous hypothesis testing. Journal of the American Statistical Association. 2007a;102:93–103.
  5. Efron B. Size, power and false discovery rates. The Annals of Statistics. 2007b;35:1351–1377.
  6. Efron B. Simultaneous inference: when should hypothesis testing problems be combined? The Annals of Applied Statistics. 2008;2:197–223.
  7. Efron B. Correlated z-values and the accuracy of large-scale statistical estimates. Journal of the American Statistical Association. 2010;105:1042–1055. doi: 10.1198/jasa.2010.tm09129.
  8. Fan J, Han X. Estimation of false discovery proportion with unknown dependence. 2014, submitted. http://arxiv.org/abs/1305.7007. doi: 10.1111/rssb.12204.
  9. Fan J, Han X, Gu W. Estimating false discovery proportion under arbitrary covariance dependence. Journal of the American Statistical Association. 2012;107:1019–1035. doi: 10.1080/01621459.2012.720478.
  10. Koudou AE. Lancaster bivariate probability distributions with Poisson, negative binomial and gamma margins. Test. 1998;7:95–110.
  11. Owen AB. Variance of the number of false discoveries. Journal of the Royal Statistical Society, Series B. 2005;67:411–426.
  12. Proal E, Reiss PT, Klein RG, Mannuzza S, Gotimer K, Ramos-Olazagasti MA, Lerch JP, He Y, Zijdenbos A, Kelly C, Milham MP, Castellanos FX. Brain gray matter deficits at 33-year follow-up in adults with attention-deficit/hyperactivity disorder established in childhood. Archives of General Psychiatry. 2011;68:1122–1134. doi: 10.1001/archgenpsychiatry.2011.117.
  13. Reiss PT, Schwartzman A, Lu F, Huang L, Proal E. Paradoxical results of adaptive false discovery rate procedures in neuroimaging studies. NeuroImage. 2012;63:1833–1840. doi: 10.1016/j.neuroimage.2012.07.040.
  14. Schwartzman A. Comment on "Correlated z-values and the accuracy of large-scale statistical estimates" by Bradley Efron. Journal of the American Statistical Association. 2010;105(491):1059–1063. doi: 10.1198/jasa.2010.tm10237.
  15. Schwartzman A, Lin X. The effect of correlation in false discovery rate estimation. Biometrika. 2011;98:199–214. doi: 10.1093/biomet/asq075.
  16. Wasserman L. All of Nonparametric Statistics: A Concise Course in Nonparametric Statistical Inference. New York: Springer; 2006.
  17. Wu WB. Oscillations of empirical distribution functions under dependence. IMS Lecture Notes–Monograph Series, High Dimensional Probability. 2006;51:53–61.
