Author manuscript; available in PMC: 2022 Sep 29.
Published in final edited form as: Ann Stat. 2021 Sep 29;49(4):1999–2020. doi: 10.1214/20-aos2024

ASYMPTOTIC DISTRIBUTIONS OF HIGH-DIMENSIONAL DISTANCE CORRELATION INFERENCE

Lan Gao 1, Yingying Fan 1, Jinchi Lv 1, Qi-Man Shao 2,3
PMCID: PMC8491772  NIHMSID: NIHMS1684707  PMID: 34621096

Abstract

Distance correlation has become an increasingly popular tool for detecting the nonlinear dependence between a pair of potentially high-dimensional random vectors. Most existing works have explored its asymptotic distributions under the null hypothesis of independence between the two random vectors when only the sample size or the dimensionality diverges. Yet its asymptotic null distribution for the more realistic setting when both sample size and dimensionality diverge in the full range remains largely underdeveloped. In this paper, we fill such a gap and develop central limit theorems and associated rates of convergence for a rescaled test statistic based on the bias-corrected distance correlation in high dimensions under some mild regularity conditions and the null hypothesis. Our new theoretical results reveal an interesting phenomenon of blessing of dimensionality for high-dimensional distance correlation inference in the sense that the accuracy of normal approximation can increase with dimensionality. Moreover, we provide a general theory on the power analysis under the alternative hypothesis of dependence, and further justify the capability of the rescaled distance correlation in capturing the pure nonlinear dependency under moderately high dimensionality for a certain type of alternative hypothesis. The theoretical results and finite-sample performance of the rescaled statistic are illustrated with several simulation examples and a blockchain application.

MSC2020 subject classifications: Primary 62E20, 62H20, secondary 62G10, 62G20

Keywords: Nonparametric inference, high dimensionality, distance correlation, test of independence, nonlinear dependence detection, central limit theorem, rate of convergence, power, blockchain

1. Introduction.

In many big data applications nowadays, we are often interested in measuring the level of association between a pair of potentially high-dimensional random vectors giving rise to a pair of large random matrices. There exist a wide spectrum of both linear and nonlinear dependency measures. Examples include the Pearson correlation (Pearson, 1895), rank correlation coefficients (Kendall, 1938; Spearman, 1904), coefficients based on the cumulative distribution functions or density functions (Hoeffding, 1948; Blum, Kiefer and Rosenblatt, 1961; Rosenblatt, 1975), measures based on the characteristic functions (Feuerverger, 1993; Székely, Rizzo and Bakirov, 2007; Székely and Rizzo, 2009), the kernel-based dependence measure (Gretton et al., 2005), and sign covariances (Bergsma and Dassios, 2014; Weihs, Drton and Meinshausen, 2018). See also Shah and Peters (2020); Berrett et al. (2020) for some recent developments on determining the conditional dependency through the test of conditional independence. In particular, nonlinear dependency measures have been popularly used since independence can be fully characterized by zero measures. Indeed test of independence between two random vectors is of fundamental importance in these applications.

Among all the nonlinear dependency measures, distance correlation introduced in Székely, Rizzo and Bakirov (2007) has gained growing popularity in recent years due to several appealing features. First, zero distance correlation completely characterizes the independence between two random vectors. Second, the pair of random vectors can be of possibly different dimensions and possibly different data types such as a mix of continuous and discrete components. Third, this nonparametric approach enjoys computationally fast implementation. In particular, distance-based nonlinear dependency measures have been applied to many high-dimensional problems. Such examples include dimension reduction (Vepakomma, Tonde and Elgammal, 2018), independent component analysis (Matteson and Tsay, 2017), interaction detection (Kong et al., 2017), feature screening (Li, Zhong and Zhu, 2012; Shao and Zhang, 2014), and variable selection (Kong, Wang and Wahba, 2015; Shao and Zhang, 2014). See also the various extensions for testing the mutual independence (Yao, Zhang and Shao, 2018), testing the multivariate mutual dependence (Jin and Matteson, 2018; Chakraborty and Zhang, 2019), testing the conditional mean and quantile independence (Zhang, Yao and Shao, 2018), the partial distance correlation (Székely and Rizzo, 2014), the conditional distance correlation (Wang et al., 2015), measuring the nonlinear dependence in time series (Zhou, 2012; Davis et al., 2018), and measuring the dependency between two stochastic processes (Matsui, Mikosch and Samorodnitsky, 2017; Davis et al., 2018).

To exploit the distance correlation for nonparametric testing of independence between two random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ with p, q ≥ 1, it is crucial to determine the significance threshold. Although the bootstrap or permutation methods can be used to obtain the empirical significance threshold, such approaches can be computationally expensive for large-scale data. Thus it is appealing to obtain its asymptotic distributions for easy practical use. There have been some recent developments along this line. For example, for the case of fixed dimensionality with independent X and Y, Székely, Rizzo and Bakirov (2007) showed that the standardized sample distance covariance by directly plugging in the empirical characteristic functions converges in distribution to a weighted sum of chi-square random variables as the sample size n tends to infinity. A bias-corrected version of the distance correlation was introduced later in Székely and Rizzo (2013, 2014) to address the bias issue in high dimensions. Huo and Székely (2016) proved that for fixed dimensionality and independent X and Y, the standardized unbiased sample distance covariance converges to a weighted sum of centralized chi-square random variables asymptotically. In contrast, Székely and Rizzo (2013) considered another scenario when the dimensionality diverges with sample size fixed and showed that for random vectors each with exchangeable components, the bias-corrected sample distance correlation converges to a suitable t-distribution. Recently Zhu et al. (2020) extended the result to more general assumptions and obtained the central limit theorem in the high-dimensional medium-sample-size setting.

Despite the aforementioned existing results, the asymptotic theory for sample distance correlation between X and Y under the null hypothesis of independence in the general case of n, p and q diverging in an arbitrary fashion remains largely unexplored. As the first major contribution of the paper, we provide a more complete picture of the precise limiting distribution in such a setting. In particular, under some mild regularity conditions and the independence of X and Y, we obtain central limit theorems for a rescaled test statistic based on the bias-corrected sample distance correlation in high dimensions (see Theorems 1 and 2). Moreover, we derive the explicit rates of convergence to the limiting distributions (see Theorems 3 and 4). To the best of our knowledge, the asymptotic theory built in Theorems 1–4 is new to the literature. Our theory requires no constraint on the relationship between sample size n and dimensionalities p and q. Our results show that the accuracy of normal approximation can increase with dimensionality, revealing an interesting phenomenon of blessing of dimensionality.

The second major contribution of our paper is to provide a general theory on the power analysis of the rescaled sample distance correlation. We show in Theorem 5 that as long as the population distance correlation and covariance do not decay too fast as sample size increases, the rescaled sample distance correlation diverges to infinity with asymptotic probability one, resulting in a test with asymptotic power one. We further consider in Theorem 6 a specific alternative hypothesis where X and Y have pure nonlinear dependency in the sense that their componentwise Pearson correlations are all zero, and show that the rescaled sample distance correlation achieves asymptotic power one when $p = q = o(\sqrt{n})$. This reveals an interesting message that in the moderately high-dimensional setting, the rescaled sample distance correlation is capable of detecting pure nonlinear dependence with high power.

Among the existing literature, the most closely related paper to ours is the one by Zhu et al. (2020). Yet, our results are significantly different from theirs. For clarity we discuss the differences under the null and alternative hypotheses separately. Under the null hypothesis of X and Y being independent, our results differ from theirs in four important aspects: 1) Zhu et al. (2020) considered the scenario where sample size n grows at a slower rate compared to dimensionalities p and q, while our results make no assumption on the relationship between n and p, q; 2) Zhu et al. (2020) assumed that min{p, q} → ∞, whereas our theory relies on a more relaxed assumption of p + q → ∞; 3) there is no rate of convergence provided in the work of Zhu et al. (2020), while explicit rates of convergence are developed in our theory; 4) the proof in Zhu et al. (2020) is based on the componentwise analysis, whereas our technical proof is based on the joint analysis by treating the high-dimensional random vectors as a whole. See Table 1 in Section 3.4 for a summary of these key differences under the illustrative example of m-dependent components.

Table 1.

Comparison under the assumptions of Proposition 2: conditions for asymptotic normality

Method | p → ∞, q → ∞ | p → ∞, q fixed (similarly for p fixed, q → ∞)
Zhu et al. (2020) | $m_1^3/p \to 0$, $m_2^3/q \to 0$, $m_1/n^{1/4} \to 0$, $m_2/n^{1/4} \to 0$, $n m_1 m_2/q \to 0$, $n m_1 m_2/p \to 0$ | No result
Our work | $m_1^3/p \to 0$, $m_2^3/q \to 0$, $m_1 m_2/\sqrt{n} \to 0$ | $m_1^3/p \to 0$, $m_1/\sqrt{n} \to 0$

The difference under the alternative hypothesis of dependence is even more interesting. Zhu et al. (2020) showed that under the alternative hypothesis of dependence, when both dimensionalities p and q grow much faster than sample size n, the sample distance covariance asymptotically measures the linear dependence between two random vectors satisfying certain moment conditions, and fails to capture the nonlinear dependence in high dimensions. To address this issue, a marginally aggregated distance correlation statistic was introduced therein to deal with high-dimensional independence testing. However, as discussed above, we provide a specific alternative hypothesis under which the rescaled sample distance correlation is capable of identifying the pure nonlinear relationship when $p = q = o(\sqrt{n})$. These two results complement each other and indicate that the sample distance correlation can have rich asymptotic behavior in different diverging regimes of (n, p, q). See Table 2 in Section 3.5 for a summary of the differences in power analysis. The complete spectrum of the alternative distribution as a function of (n, p, q) is still largely open and can be challenging to study. In simulation Example 6 in Section 4.3, we give an example showing that the marginally aggregated distance correlation statistic can suffer from power loss if the true dependence in data is much more than just marginal.

Table 2.

Comparison of power analysis in detecting pure nonlinear dependency

Method | Power in detecting pure nonlinear dependency
Zhu et al. (2020) | Asymptotically no power when p and q grow much faster than n (in particular, it requires $\min\{p, q\} \gg n^2$ when X, Y consist of i.i.d. components)
Our work | Asymptotically can achieve power one when $p = q = o(\sqrt{n})$ (under the conditions of Theorem 6)

It is also worth mentioning that our Propositions A.1–A.3 (see Section A.4 of Supplementary Material), which serve as the crucial ingredient of the proofs for Theorems 2 and 4, provide some explicit bounds on certain key moments identified in our theory under fairly general conditions, which can be of independent interest.

The rest of the paper is organized as follows. Section 2 introduces the distance correlation and reviews the existing limiting distributions. We present a rescaled test statistic, its asymptotic distributions, and a power analysis for high-dimensional distance correlation inference in Section 3. Sections 4 and 5 provide several simulation examples and a blockchain application justifying our theoretical results and illustrating the finite-sample performance of the rescaled test statistic. We discuss some implications and extensions of our work in Section 6. All the proofs and technical details are provided in the Supplementary Material.

2. Distance correlation and distributional properties.

2.1. Bias-corrected distance correlation.

Let us consider a pair of random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ with integers p, q ≥ 1 that are of possibly different dimensions and possibly mixed data types such as continuous or discrete components. For any vectors $t \in \mathbb{R}^p$ and $s \in \mathbb{R}^q$, denote by $\langle t, X \rangle$ and $\langle s, Y \rangle$ the corresponding inner products. Let $f_X(t) = \mathbb{E} e^{i\langle t, X \rangle}$, $f_Y(s) = \mathbb{E} e^{i\langle s, Y \rangle}$, and $f_{X,Y}(t,s) = \mathbb{E} e^{i\langle t, X \rangle + i\langle s, Y \rangle}$ be the characteristic functions of X, Y, and the joint distribution (X, Y), respectively, where i associated with the expectations represents the imaginary unit $(-1)^{1/2}$. Székely, Rizzo and Bakirov (2007) defined the squared distance covariance $\mathcal{V}^2(X,Y)$ as

$$\mathcal{V}^2(X,Y) = \int_{\mathbb{R}^{p+q}} \frac{|f_{X,Y}(t,s) - f_X(t) f_Y(s)|^2}{c_p c_q \|t\|^{p+1} \|s\|^{q+1}} \, dt \, ds, \tag{1}$$

where

$$c_p = \frac{\pi^{(p+1)/2}}{\Gamma((p+1)/2)}$$

with Γ(·) the gamma function and ‖ · ‖ the Euclidean norm of a vector. Observe that $2c_p$ and $2c_q$ are simply the volumes of p-dimensional and q-dimensional unit spheres in the Euclidean spaces, respectively. In view of the above definition, it is easy to see that X and Y are independent if and only if $\mathcal{V}^2(X,Y) = 0$. Thus distance covariance completely characterizes independence.

The specific weight in (1) gives us an explicit form of the squared distance covariance (see Székely, Rizzo and Bakirov (2007))

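$$\mathcal{V}^2(X,Y) = \mathbb{E}[\|X_1 - X_2\| \, \|Y_1 - Y_2\|] - 2\,\mathbb{E}[\|X_1 - X_2\| \, \|Y_1 - Y_3\|] + \mathbb{E}[\|X_1 - X_2\|] \, \mathbb{E}[\|Y_1 - Y_2\|], \tag{2}$$

where $(X_1, Y_1)$, $(X_2, Y_2)$, and $(X_3, Y_3)$ are independent copies of (X, Y). Moreover, Lyons (2013) showed that

$$\mathcal{V}^2(X,Y) = \mathbb{E}[d(X_1, X_2) \, d(Y_1, Y_2)] \tag{3}$$

with the double-centered distance

$$d(X_1, X_2) = \|X_1 - X_2\| - \mathbb{E}[\|X_1 - X_2\| \mid X_1] - \mathbb{E}[\|X_1 - X_2\| \mid X_2] + \mathbb{E}[\|X_1 - X_2\|] \tag{4}$$

and $d(Y_1, Y_2)$ defined similarly. Let $\mathcal{V}^2(X) = \mathcal{V}^2(X,X)$ and $\mathcal{V}^2(Y) = \mathcal{V}^2(Y,Y)$ be the squared distance variances of X and Y, respectively. Then the squared distance correlation $\mathcal{R}^2(X,Y)$ is defined as

$$\mathcal{R}^2(X,Y) = \begin{cases} \dfrac{\mathcal{V}^2(X,Y)}{\sqrt{\mathcal{V}^2(X)\,\mathcal{V}^2(Y)}} & \text{if } \mathcal{V}^2(X)\,\mathcal{V}^2(Y) > 0, \\ 0 & \text{if } \mathcal{V}^2(X)\,\mathcal{V}^2(Y) = 0. \end{cases} \tag{5}$$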

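Now assume that we are given a sample of n independent and identically distributed (i.i.d.) observations $\{(X_i, Y_i), 1 \le i \le n\}$ from the joint distribution (X, Y). In Székely, Rizzo and Bakirov (2007), the squared sample distance covariance $\mathcal{V}_n^2(X,Y)$ was constructed by directly plugging in the empirical characteristic functions as

$$\mathcal{V}_n^2(X,Y) = \int_{\mathbb{R}^{p+q}} \frac{|f_{X,Y}^n(t,s) - f_X^n(t) f_Y^n(s)|^2}{c_p c_q \|t\|^{p+1} \|s\|^{q+1}} \, dt \, ds, \tag{6}$$

where $f_X^n(t)$, $f_Y^n(s)$, and $f_{X,Y}^n(t,s)$ are the corresponding empirical characteristic functions. Thus the squared sample distance correlation is given by

$$\mathcal{R}_n^2(X,Y) = \begin{cases} \dfrac{\mathcal{V}_n^2(X,Y)}{\sqrt{\mathcal{V}_n^2(X)\,\mathcal{V}_n^2(Y)}} & \text{if } \mathcal{V}_n^2(X)\,\mathcal{V}_n^2(Y) > 0, \\ 0 & \text{if } \mathcal{V}_n^2(X)\,\mathcal{V}_n^2(Y) = 0. \end{cases} \tag{7}$$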

Similar to (2) and (3), the squared sample distance covariance admits the following explicit form

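$$\mathcal{V}_n^2(X,Y) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{k,l} B_{k,l}, \tag{8}$$

where $A_{k,l}$ and $B_{k,l}$ are the double-centered distances defined as

$$A_{k,l} = a_{k,l} - \frac{1}{n} \sum_{i=1}^{n} a_{i,l} - \frac{1}{n} \sum_{j=1}^{n} a_{k,j} + \frac{1}{n^2} \sum_{i,j=1}^{n} a_{i,j},$$
$$B_{k,l} = b_{k,l} - \frac{1}{n} \sum_{i=1}^{n} b_{i,l} - \frac{1}{n} \sum_{j=1}^{n} b_{k,j} + \frac{1}{n^2} \sum_{i,j=1}^{n} b_{i,j}$$

with $a_{k,l} = \|X_k - X_l\|$ and $b_{k,l} = \|Y_k - Y_l\|$. It is easy to see that the above estimator is an empirical version of the right hand side of (3). The double-centered population distance $d(X_k, X_l)$ is estimated by the double-centered sample distance $A_{k,l}$ (and similarly $d(Y_k, Y_l)$ by $B_{k,l}$), and then $\mathbb{E}[d(X_1, X_2)\, d(Y_1, Y_2)]$ is estimated by the mean, over all pairs, of the products of the double-centered sample distances.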
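The plug-in estimator in (8) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names (`pairwise_dist`, `double_center`, `dcov2_plugin`, `dcor2_plugin`) are assumptions introduced here, and the inputs are assumed to be arrays of shape (n, p) and (n, q) whose rows are the observations.

```python
import numpy as np

def pairwise_dist(z):
    """Euclidean distance matrix a[k, l] = ||z_k - z_l|| for the rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(a):
    """Double centering: subtract column means and row means, add the grand mean."""
    return (a - a.mean(axis=0, keepdims=True)
              - a.mean(axis=1, keepdims=True)
              + a.mean())

def dcov2_plugin(x, y):
    """Squared sample distance covariance as in (8): (1/n^2) * sum_{k,l} A_kl * B_kl."""
    A = double_center(pairwise_dist(x))
    B = double_center(pairwise_dist(y))
    return (A * B).mean()

def dcor2_plugin(x, y):
    """Squared sample distance correlation as in (7), with the zero-denominator convention."""
    denom = dcov2_plugin(x, x) * dcov2_plugin(y, y)
    return dcov2_plugin(x, y) / np.sqrt(denom) if denom > 0 else 0.0

if __name__ == "__main__":
    # Two points on the line: all double-centered entries are +-0.5.
    x = np.array([[0.0], [1.0]])
    print(dcov2_plugin(x, x), dcor2_plugin(x, x))
```

Forming the full n × n distance matrices costs O(n²) memory, which is the simplest choice for a sketch; it is an assumption of convenience rather than a recommendation for very large n.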

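Although it is natural to define the sample distance covariance in (6), Székely and Rizzo (2013) later demonstrated that such an estimator is biased and can lead to interpretation issues in high dimensions. They revealed that for independent random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ with i.i.d. components and finite second moments, it holds that

$$\mathcal{R}_n^2(X,Y) \xrightarrow[p,q \to \infty]{} 1$$

when sample size n is fixed, but we naturally have $\mathcal{R}^2(X,Y) = 0$ in this scenario. To address this issue, Székely and Rizzo (2013, 2014) introduced a modified unbiased estimator of the squared distance covariance and the bias-corrected sample distance correlation given by

$$\mathcal{V}_n^*(X,Y) = \frac{1}{n(n-3)} \sum_{k \ne l} A_{k,l}^* B_{k,l}^* \tag{9}$$

and

$$\mathcal{R}_n^*(X,Y) = \begin{cases} \dfrac{\mathcal{V}_n^*(X,Y)}{\sqrt{\mathcal{V}_n^*(X)\,\mathcal{V}_n^*(Y)}} & \text{if } \mathcal{V}_n^*(X)\,\mathcal{V}_n^*(Y) > 0, \\ 0 & \text{if } \mathcal{V}_n^*(X)\,\mathcal{V}_n^*(Y) = 0, \end{cases} \tag{10}$$

respectively, where the U-centered distances $A_{k,l}^*$ and $B_{k,l}^*$ are defined as

$$A_{k,l}^* = a_{k,l} - \frac{1}{n-2} \sum_{i=1}^{n} a_{i,l} - \frac{1}{n-2} \sum_{j=1}^{n} a_{k,j} + \frac{1}{(n-1)(n-2)} \sum_{i,j=1}^{n} a_{i,j},$$
$$B_{k,l}^* = b_{k,l} - \frac{1}{n-2} \sum_{i=1}^{n} b_{i,l} - \frac{1}{n-2} \sum_{j=1}^{n} b_{k,j} + \frac{1}{(n-1)(n-2)} \sum_{i,j=1}^{n} b_{i,j}.$$

Our work will focus on the bias-corrected distance-based statistics $\mathcal{V}_n^*(X,Y)$ and $\mathcal{R}_n^*(X,Y)$ given in (9) and (10), respectively.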
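As a hedged illustration of (9) and (10), the following NumPy sketch computes the U-centered distances and the bias-corrected statistics. The function names (`u_center`, `dcov_star`, `dcor_star`) are hypothetical, inputs are assumed to be (n, p) and (n, q) arrays with n ≥ 4, and the diagonal of the U-centered matrix is set to zero so that summing over all entries matches the sum over k ≠ l.

```python
import numpy as np

def pairwise_dist(z):
    """Euclidean distance matrix of the rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def u_center(a):
    """U-centered distances A*_{k,l} for k != l; the diagonal is set to zero."""
    n = a.shape[0]
    out = (a - a.sum(axis=0, keepdims=True) / (n - 2)
             - a.sum(axis=1, keepdims=True) / (n - 2)
             + a.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(out, 0.0)
    return out

def dcov_star(x, y):
    """Unbiased squared distance covariance as in (9); requires n >= 4."""
    n = x.shape[0]
    A = u_center(pairwise_dist(x))
    B = u_center(pairwise_dist(y))
    return (A * B).sum() / (n * (n - 3))

def dcor_star(x, y):
    """Bias-corrected distance correlation as in (10)."""
    denom = dcov_star(x, x) * dcov_star(y, y)
    return dcov_star(x, y) / np.sqrt(denom) if denom > 0 else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((30, 5))
    y = rng.standard_normal((30, 4))
    # Unlike the plug-in version, this estimate can be negative under independence.
    print(dcor_star(x, y))
```

One consequence of U-centering that is easy to check numerically is that every row of the U-centered matrix sums to zero once the diagonal is excluded.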

2.2. Distributional properties.

In general, the exact distributions of the distance covariance and distance correlation are intractable. Thus it is essential to investigate the asymptotic surrogates in order to apply the distance-based statistics for the test of independence. With dimensionalities p, q fixed and sample size n → ∞, Huo and Székely (2016) validated that $\mathcal{V}_n^*(X,Y)$ is a U-statistic and then under the independence of X and Y, it admits the following asymptotic distribution

$$n \mathcal{V}_n^*(X,Y) \xrightarrow[n \to \infty]{D} \sum_{i=1}^{\infty} \lambda_i (Z_i^2 - 1), \tag{11}$$

where $\{Z_i, i \ge 1\}$ are i.i.d. standard normal random variables and $\{\lambda_i, i \ge 1\}$ are the eigenvalues of some operator.

On the other hand, Székely and Rizzo (2013) showed that when the dimensionalities p and q tend to infinity and sample size n ≥ 4 is fixed, if X and Y both consist of i.i.d. components, then under the independence of X and Y we have

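$$T_R := \sqrt{n(n-3)/2 - 1} \, \frac{\mathcal{R}_n^*(X,Y)}{\sqrt{1 - (\mathcal{R}_n^*(X,Y))^2}} \xrightarrow[p,q \to \infty]{D} t_{n(n-3)/2 - 1}. \tag{12}$$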
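The transformation in (12) is a simple function of the bias-corrected distance correlation and the sample size. The sketch below takes a value of $\mathcal{R}_n^*(X,Y)$ as given and returns $T_R$ together with the degrees of freedom n(n − 3)/2 − 1; the function name `studentized_stat` is an assumption made for illustration, and computing a p-value from the t-distribution is left out because it would need a library beyond the standard one.

```python
import math

def studentized_stat(r_star, n):
    """Return (T_R, degrees of freedom) following (12).

    r_star: bias-corrected sample distance correlation, assumed to satisfy |r_star| < 1.
    n: sample size, assumed to be at least 4 so that the degrees of freedom are >= 1.
    """
    dof = n * (n - 3) / 2 - 1
    t_r = math.sqrt(dof) * r_star / math.sqrt(1.0 - r_star ** 2)
    return t_r, dof

if __name__ == "__main__":
    # With n = 10 the reference distribution has 10 * 7 / 2 - 1 = 34 degrees of freedom.
    print(studentized_stat(0.5, 10))
```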

However, it still remains to investigate the limiting distributions of distance correlation when both sample size and dimensionality are diverging simultaneously. It is common to encounter datasets that are of both high dimensions and large sample size such as in biology, ecology, medical science, and networks. When min{p, q} → ∞ and n → ∞ at a slower rate compared to p, q, under the independence of X and Y and some conditions on the moments Zhu et al. (2020) showed that

$$T_R \xrightarrow{D} N(0,1). \tag{13}$$

Their result was obtained by approximating the unbiased sample distance covariance with the aggregated marginal distance covariance, which can incur stronger assumptions including n → ∞ at a slower rate compared to p, q and min{p, q} → ∞.

The main goal of our paper is to fill such a gap and make the asymptotic theory of distance correlation more complete. Specifically, we will prove central limit theorems for $\mathcal{R}_n^*(X,Y)$ when n → ∞ and p + q → ∞. In contrast to the work of Zhu et al. (2020), we analyze the unbiased sample distance covariance directly by treating the random vectors as a whole. Our work will also complement the recent power analysis in Zhu et al. (2020), where distance correlation was shown to asymptotically measure only linear dependency in the regime of fast growing dimensionality ($\min\{p, q\}/n^2 \to \infty$) and thus the marginally aggregated distance correlation statistic was introduced. However, as shown in Example 6 in Section 4.3, the marginally aggregated statistic can be less powerful than the joint distance correlation statistic when the dependency between the two random vectors far exceeds the marginal contributions. To understand such a phenomenon, we will develop a general theory on the power analysis for the rescaled distance correlation statistic in Theorem 5 and further justify its capability of detecting nonlinear dependency in Theorem 6 for the regime of moderately high dimensionality.

3. High-dimensional distance correlation inference.

3.1. A rescaled test statistic.

To simplify the technical presentation, we assume that $\mathbb{E}[X] = 0$ and $\mathbb{E}[Y] = 0$ since otherwise we can first subtract the means in our technical analysis. Let $\mathbb{E}[XX^T] = \Sigma_x$ and $\mathbb{E}[YY^T] = \Sigma_y$ be the covariance matrices of random vectors X and Y, respectively. To test the null hypothesis that X and Y are independent, in this paper we consider a rescaled test statistic defined as a rescaled distance correlation

$$T_n := \sqrt{\frac{n(n-1)}{2}} \, \mathcal{R}_n^*(X,Y) = \sqrt{\frac{n(n-1)}{2}} \, \frac{\mathcal{V}_n^*(X,Y)}{\sqrt{\mathcal{V}_n^*(X)\,\mathcal{V}_n^*(Y)}}. \tag{14}$$

It has been shown in Huo and Székely (2016) that $\mathcal{V}_n^*(X,Y)$ is a U-statistic. A key observation is that by the Hoeffding decomposition for U-statistics, the dominating part is a martingale array under the independence of X and Y. Then we can apply the martingale central limit theorem and calculate the specific moments involved.
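A minimal sketch of how the rescaled statistic in (14) might be used as a test is given below. It assumes the standard normal limit applies and uses a one-sided rejection region (large values of $T_n$), which is the form of rejection rule appearing in the power analysis of Section 3.5. The names `rescaled_stat` and `independence_test` are hypothetical, and inputs are assumed to be (n, p) and (n, q) arrays with n ≥ 4.

```python
import numpy as np
from statistics import NormalDist

def pairwise_dist(z):
    """Euclidean distance matrix of the rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def u_center(a):
    """U-centered distance matrix with zero diagonal."""
    n = a.shape[0]
    out = (a - a.sum(axis=0, keepdims=True) / (n - 2)
             - a.sum(axis=1, keepdims=True) / (n - 2)
             + a.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(out, 0.0)
    return out

def rescaled_stat(x, y):
    """T_n = sqrt(n(n-1)/2) * R_n^*(X, Y) as in (14)."""
    n = x.shape[0]
    A = u_center(pairwise_dist(x))
    B = u_center(pairwise_dist(y))
    scale = n * (n - 3)
    vxy, vx, vy = (A * B).sum() / scale, (A * A).sum() / scale, (B * B).sum() / scale
    r_star = vxy / np.sqrt(vx * vy) if vx * vy > 0 else 0.0
    return float(np.sqrt(n * (n - 1) / 2) * r_star)

def independence_test(x, y, alpha=0.05):
    """One-sided test: reject independence when T_n exceeds the (1 - alpha) normal quantile."""
    t_n = rescaled_stat(x, y)
    p_value = 1.0 - NormalDist().cdf(t_n)
    return t_n, p_value, t_n > NormalDist().inv_cdf(1.0 - alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((100, 20))
    y = rng.standard_normal((100, 30))
    print(independence_test(x, y))
```

Compared with a permutation test, this avoids recomputing the statistic many times, which is the computational motivation given in the introduction.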

More specifically, Huo and Székely (2016) showed that

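$$\mathcal{V}_n^*(X,Y) = \binom{n}{4}^{-1} \sum_{1 \le i_1 < i_2 < i_3 < i_4 \le n} h\big((X_{i_1}, Y_{i_1}), \ldots, (X_{i_4}, Y_{i_4})\big), \tag{15}$$

where the kernel function is given by

$$h\big((X_1,Y_1),(X_2,Y_2),(X_3,Y_3),(X_4,Y_4)\big) = \frac{1}{4} \sum_{\substack{1 \le i,j \le 4 \\ i \ne j}} \|X_i - X_j\| \, \|Y_i - Y_j\| - \frac{1}{4} \sum_{i=1}^{4} \Big( \sum_{\substack{1 \le j \le 4 \\ j \ne i}} \|X_i - X_j\| \sum_{\substack{1 \le j \le 4 \\ j \ne i}} \|Y_i - Y_j\| \Big) + \frac{1}{24} \sum_{\substack{1 \le i,j \le 4 \\ i \ne j}} \|X_i - X_j\| \sum_{\substack{1 \le i,j \le 4 \\ i \ne j}} \|Y_i - Y_j\|. \tag{16}$$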
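The U-statistic representation in (15) and (16) can be checked numerically on a small sample by brute force: average the kernel h over all 4-subsets and compare with the U-centered formula (9). The sketch below does this; the function names are illustrative assumptions, and the enumeration over all subsets is only practical for small n.

```python
import numpy as np
from itertools import combinations

def pairwise_dist(z):
    """Euclidean distance matrix of the rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kernel_h(a, b):
    """Kernel (16) evaluated on 4 x 4 distance matrices a and b (zero diagonals)."""
    term1 = (a * b).sum() / 4.0                              # sum over i != j of a_ij * b_ij
    term2 = (a.sum(axis=1) * b.sum(axis=1)).sum() / 4.0      # sum over i of row-sum products
    term3 = a.sum() * b.sum() / 24.0                         # product of the two total sums
    return term1 - term2 + term3

def dcov_star_ustat(x, y):
    """Average of the kernel over all 4-subsets, as in (15)."""
    a, b = pairwise_dist(x), pairwise_dist(y)
    n = a.shape[0]
    vals = [kernel_h(a[np.ix_(idx, idx)], b[np.ix_(idx, idx)])
            for idx in combinations(range(n), 4)]
    return float(np.mean(vals))

def dcov_star(x, y):
    """U-centered formula (9), used here as the reference value."""
    def u_center(a):
        n = a.shape[0]
        out = (a - a.sum(axis=0, keepdims=True) / (n - 2)
                 - a.sum(axis=1, keepdims=True) / (n - 2)
                 + a.sum() / ((n - 1) * (n - 2)))
        np.fill_diagonal(out, 0.0)
        return out
    n = x.shape[0]
    A, B = u_center(pairwise_dist(x)), u_center(pairwise_dist(y))
    return float((A * B).sum() / (n * (n - 3)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.standard_normal((7, 3)), rng.standard_normal((7, 2))
    print(dcov_star_ustat(x, y), dcov_star(x, y))  # the two values should agree
```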

Let us define another functional

$$g(X_1, X_2, X_3, X_4) := d(X_1, X_2)\, d(X_1, X_3)\, d(X_2, X_4)\, d(X_3, X_4), \tag{17}$$

where d(·,·) is the double-centered distance defined in (4). The above technical preparation enables us to derive the main theoretical results.

3.2. Asymptotic distributions.

THEOREM 1.

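Assume that $\mathbb{E}\|X\|^{2+2\tau} + \mathbb{E}\|Y\|^{2+2\tau} < \infty$ for some constant 0 < τ ≤ 1. If

$$\frac{\mathbb{E}(|d(X_1,X_2)|^{2+2\tau}) \, \mathbb{E}(|d(Y_1,Y_2)|^{2+2\tau})}{n^{\tau} [\mathcal{V}^2(X)\,\mathcal{V}^2(Y)]^{1+\tau}} \to 0 \tag{18}$$

and

$$\frac{\mathbb{E}[g(X_1,X_2,X_3,X_4)] \, \mathbb{E}[g(Y_1,Y_2,Y_3,Y_4)]}{[\mathcal{V}^2(X)\,\mathcal{V}^2(Y)]^{2}} \to 0 \tag{19}$$

as n → ∞ and p+q → ∞, then under the independence of X and Y we have $T_n \xrightarrow{D} N(0,1)$.

Theorem 1 presents a general theory and relies on the martingale central limit theorem. In fact, when X and Y are independent, via the Hoeffding decomposition we can find that the dominating part of $\mathcal{V}_n^*(X,Y)$ forms a martingale array which admits asymptotic normality under conditions (18) and (19). Moreover, it also follows from (18) that

$$\frac{\mathcal{V}_n^*(X)}{\mathcal{V}^2(X)} \to 1 \quad \text{and} \quad \frac{\mathcal{V}_n^*(Y)}{\mathcal{V}^2(Y)} \to 1 \quad \text{in probability}.$$

Thus an application of Slutsky’s lemma yields the desired result.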
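The normal limit in Theorem 1 can be illustrated with a small Monte Carlo experiment. The sketch below draws independent standard normal X and Y and records $T_n$ over repeated samples; under the null one would expect the empirical mean to be near 0 and the empirical standard deviation near 1. The sample size, dimensions, number of replications, and function names are arbitrary assumptions chosen for speed, not settings taken from the paper's simulations.

```python
import numpy as np

def pairwise_dist(z):
    """Euclidean distance matrix of the rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def u_center(a):
    """U-centered distance matrix with zero diagonal."""
    n = a.shape[0]
    out = (a - a.sum(axis=0, keepdims=True) / (n - 2)
             - a.sum(axis=1, keepdims=True) / (n - 2)
             + a.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(out, 0.0)
    return out

def rescaled_stat(x, y):
    """T_n = sqrt(n(n-1)/2) * R_n^*(X, Y)."""
    n = x.shape[0]
    A, B = u_center(pairwise_dist(x)), u_center(pairwise_dist(y))
    scale = n * (n - 3)
    vxy, vx, vy = (A * B).sum() / scale, (A * A).sum() / scale, (B * B).sum() / scale
    r_star = vxy / np.sqrt(vx * vy) if vx * vy > 0 else 0.0
    return float(np.sqrt(n * (n - 1) / 2) * r_star)

def simulate_null(n=50, p=20, q=20, reps=200, seed=0):
    """Draw T_n repeatedly with X and Y independent standard normal vectors."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal((n, p))
        y = rng.standard_normal((n, q))
        stats[r] = rescaled_stat(x, y)
    return stats

if __name__ == "__main__":
    stats = simulate_null()
    print("mean:", stats.mean(), "std:", stats.std())
    print("fraction above 1.645:", (stats > 1.645).mean())
```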

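Although Theorem 1 is for the general case, the calculation of the moments involved such as $\mathbb{E}[g(X_1,X_2,X_3,X_4)]$, $\mathcal{V}^2(X)$, and $\mathbb{E}(|d(X_1,X_2)|^{2+2\tau})$ for the general underlying distribution can be challenging. To this end, we provide in Propositions A.1–A.3 in Section A.4 some bounds or exact orders of those moments. These results together with Theorem 1 enable us to obtain Theorem 2 on an explicit and useful central limit theorem with more specific conditions. Let us define quantities

$$B_X = \mathbb{E}[\|X_1 - X_2\|^2] = 2\,\mathbb{E}[\|X\|^2], \quad B_Y = \mathbb{E}[\|Y_1 - Y_2\|^2] = 2\,\mathbb{E}[\|Y\|^2],$$
$$L_{x,\tau} = \mathbb{E}\big(\big|\|X\|^2 - \mathbb{E}\|X\|^2\big|^{2+2\tau}\big) + \mathbb{E}\big(|X_1^T X_2|^{2+2\tau}\big),$$
$$L_{y,\tau} = \mathbb{E}\big(\big|\|Y\|^2 - \mathbb{E}\|Y\|^2\big|^{2+2\tau}\big) + \mathbb{E}\big(|Y_1^T Y_2|^{2+2\tau}\big),$$

and

$$\mathcal{E}_x = \frac{\mathbb{E}[(X_1^T \Sigma_x X_2)^2] + B_X^{-2\tau} L_{x,\tau}^{(2+\tau)/(1+\tau)}}{(\mathbb{E}[(X_1^T X_2)^2])^2},$$
$$\mathcal{E}_y = \frac{\mathbb{E}[(Y_1^T \Sigma_y Y_2)^2] + B_Y^{-2\tau} L_{y,\tau}^{(2+\tau)/(1+\tau)}}{(\mathbb{E}[(Y_1^T Y_2)^2])^2}.$$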

THEOREM 2.

Assume that $\mathbb{E}[\|X\|^{4+4\tau}] + \mathbb{E}[\|Y\|^{4+4\tau}] < \infty$ for some constant 0 < τ ≤ 1/2 and as n → ∞ and p+q → ∞,

$$\frac{n^{-\tau} L_{x,\tau} L_{y,\tau}}{(\mathbb{E}[(X_1^T X_2)^2] \, \mathbb{E}[(Y_1^T Y_2)^2])^{1+\tau}} \to 0. \tag{20}$$

In addition, assume that $\mathcal{E}_x \to 0$ if p → ∞, and $\mathcal{E}_y \to 0$ if q → ∞. Then under the independence of X and Y, we have $T_n \xrightarrow{D} N(0,1)$.

Theorem 2 provides a user-friendly central limit theorem with mild regularity conditions that are easy to verify and can be satisfied by a large class of distributions. To get some insights into the orders of the moments $B_X$, $L_{x,\tau}$, $\mathbb{E}[(X_1^T X_2)^2]$, and $\mathbb{E}[(X_1^T \Sigma_x X_2)^2]$, one can refer to Section 3.4 for detailed explanations by examining some specific examples. In Theorem 2, we show the results only under the scenario of 0 < τ ≤ 1/2. In fact, similar results also hold for the case of 1/2 < τ ≤ 1; see Section D of Supplementary Material for more details.

3.3. Rates of convergence.

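Thanks to the martingale structure of the dominating term of $\mathcal{V}_n^*(X,Y)$ under the independence of X and Y, we can obtain explicitly the rates of convergence for the normal approximation.

THEOREM 3.

Assume that $\mathbb{E}\|X\|^{2+2\tau} + \mathbb{E}\|Y\|^{2+2\tau} < \infty$ for some constant 0 < τ ≤ 1. Then under the independence of X and Y, we have

$$\sup_{x \in \mathbb{R}} |\mathbb{P}(T_n \le x) - \Phi(x)| \le C \left\{ \left( \frac{\mathbb{E}[g(X_1,X_2,X_3,X_4)] \, \mathbb{E}[g(Y_1,Y_2,Y_3,Y_4)]}{[\mathcal{V}^2(X)\,\mathcal{V}^2(Y)]^2} \right)^{\frac{1+\tau}{2}} + \frac{\mathbb{E}[|d(X_1,X_2)|^{2+2\tau}] \, \mathbb{E}[|d(Y_1,Y_2)|^{2+2\tau}]}{n^{\tau} [\mathcal{V}^2(X)\,\mathcal{V}^2(Y)]^{1+\tau}} \right\}^{\frac{1}{3+2\tau}}, \tag{21}$$

where C is an absolute positive constant and Φ(x) is the standard normal distribution function.

In view of the evaluation of the moments in Propositions A.1–A.3, we can obtain the following theorem as a consequence of Theorem 3.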

THEOREM 4.

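Assume that $\mathbb{E}[\|X\|^{4+4\tau}] + \mathbb{E}[\|Y\|^{4+4\tau}] < \infty$ for some constant 0 < τ ≤ 1/2,

$$B_X^{-2\tau} L_{x,\tau} / \mathbb{E}[(X_1^T X_2)^2] \le 1/18, \quad \text{and} \quad B_Y^{-2\tau} L_{y,\tau} / \mathbb{E}[(Y_1^T Y_2)^2] \le 1/18. \tag{22}$$

Then under the independence of X and Y, we have

$$\sup_{x \in \mathbb{R}} |\mathbb{P}(T_n \le x) - \Phi(x)| \le C \left\{ (\mathcal{E}_x \mathcal{E}_y)^{\frac{1+\tau}{2}} + \frac{n^{-\tau} L_{x,\tau} L_{y,\tau}}{(\mathbb{E}[(X_1^T X_2)^2] \, \mathbb{E}[(Y_1^T Y_2)^2])^{1+\tau}} \right\}^{\frac{1}{3+2\tau}}, \tag{23}$$

where C is an absolute positive constant.

The counterpart theory for the case of 1/2 < τ ≤ 1 is presented in Section D of Supplementary Material. In general, a larger value of τ will lead to better convergence rates and weaker conditions, which will be elucidated by the example of m-dependent components in Proposition 2 (see Section 3.4).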

Let us now consider the case when only one of p and q is diverging, say, p is fixed and q → ∞. Then by the moment assumption $\mathbb{E}[\|X\|^{4+4\tau}] < \infty$, all the moments related to X on the right hand side of (21) are of bounded values. Thus in light of the proof of Theorem 4, we can see that if $\mathbb{E}[\|X\|^{4+4\tau}] + \mathbb{E}[\|Y\|^{4+4\tau}] < \infty$ for some constant 0 < τ ≤ 1/2, then there exists some positive constant $C_X$ depending on the underlying distribution of X such that under the independence of X and Y, we have

supx|(Tn<x)Φ(x)|CX{(Ey118)1+τ2+nτLy,τ(E[(Y1TY2)2])1+τ}13+2τ. (24)

It is worth mentioning that the bounds obtained in (21) and (23) are nonasymptotic results that quantify the accuracy of the normal approximation and reveal how the rate of convergence depends on the sample size and dimensionalities. Since we exploit the rate of convergence in the central limit theorem for general martingales (Haeusler, 1988) under the assumption of 0 < τ ≤ 1, the result may not necessarily be optimal. It is possible that better convergence rate can be obtained for the case of τ > 1, which is beyond the scope of the current paper.

An anonymous referee asked a great question on whether similar results as in Theorems 1 and 3 apply to the studentized statistic TR defined in (12). The answer is affirmative. Combining our Theorem 1 with Lemma 1 and (A.50), it can be shown that TR enjoys the same asymptotic normality as Tn presented in Theorem 1. Moreover, the rates of convergence in Theorem 3 also apply to TR. See Section F of Supplementary Material for the proof of these results for TR. These results suggest that the studentized statistic TR can be a good choice in both small and large samples. Yet the exact phase transition theory for the asymptotic null distribution of TR in the full diverging spectrum of (n, p, q) remains to be developed.

3.4. Some specific examples.

To better illustrate the results obtained in the previous theorems, let us consider several concrete examples now. To simplify the technical presentation, we assume in this section that both p and q tend to infinity as n increases. Our technical analysis also applies to the case when only one of p and q diverges.

PROPOSITION 1.

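Assume that $\mathbb{E}(\|X\|^{4+4\tau}) + \mathbb{E}(\|Y\|^{4+4\tau}) < \infty$ for some constant 0 < τ ≤ 1/2 and there exist some positive constants $c_1, c_2$ such that

$$L_{x,\tau} \le c_1 p^{1+\tau}, \quad \mathbb{E}[(X_1^T \Sigma_x X_2)^2] \le c_1 p, \tag{25}$$
$$\mathbb{E}[(X_1^T X_2)^2] \ge c_2 p, \quad \mathbb{E}[\|X\|^2] \ge c_2 p, \tag{26}$$

and

$$L_{y,\tau} \le c_1 q^{1+\tau}, \quad \mathbb{E}[(Y_1^T \Sigma_y Y_2)^2] \le c_1 q, \tag{27}$$
$$\mathbb{E}[(Y_1^T Y_2)^2] \ge c_2 q, \quad \mathbb{E}[\|Y\|^2] \ge c_2 q. \tag{28}$$

Then under the independence of X and Y, there exists some positive constant A depending upon $c_1$ and $c_2$ such that for sufficiently large p and q, we have

$$\sup_{x \in \mathbb{R}} |\mathbb{P}(T_n \le x) - \Phi(x)| \le A \big[ (pq)^{-\tau(1+\tau)/2} + n^{-\tau} \big]^{1/(3+2\tau)}.$$

Hence as n → ∞ and p, q → ∞, it holds that $T_n \xrightarrow{D} N(0,1)$.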

The first example considered in Proposition 1 is motivated by the case of independent components. Indeed, by Rosenthal’s inequality for the sum of independent random variables, (25) and (26) are automatically satisfied when X consists of independent nondegenerate components with zero mean and uniformly bounded (4+ 4τ)th moment.
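To get a feel for how the bound in Proposition 1 behaves, the sketch below evaluates the bracketed rate $[(pq)^{-\tau(1+\tau)/2} + n^{-\tau}]^{1/(3+2\tau)}$ for a few values of (n, p, q). The constant A is unknown and is simply omitted, so only the relative behavior is meaningful; the function name `rate_bound` is an assumption made for illustration.

```python
def rate_bound(n, p, q, tau):
    """Rate term of Proposition 1 without the unknown constant A.

    n: sample size; p, q: dimensionalities; tau: moment parameter in (0, 1/2].
    """
    dim_term = (p * q) ** (-tau * (1 + tau) / 2)
    sample_term = n ** (-tau)
    return (dim_term + sample_term) ** (1 / (3 + 2 * tau))

if __name__ == "__main__":
    tau = 0.5
    # Holding n fixed, the bound shrinks as the dimensions grow,
    # which is the blessing of dimensionality discussed in the text.
    for p in (10, 100, 1000):
        print(p, rate_bound(n=200, p=p, q=p, tau=tau))
```

Once p and q are large, the $n^{-\tau}$ term dominates, so further growth in dimensionality no longer improves the bound for a fixed sample size.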

We next consider the second example of m-dependent components. For an integer m ≥ 1, a sequence $\{U_i\}_{0}^{\infty}$ is m-dependent if $\{U_i\}_{0}^{n}$ and $\{U_i\}_{n+m+1}^{\infty}$ are independent for every n ≥ 0. We now focus on a special but commonly used scenario in which X consists of $m_1$-dependent components and Y consists of $m_2$-dependent components for some integers $m_1 \ge 1$ and $m_2 \ge 1$. Assume that $(X_1, Y_1)$ and $(X_2, Y_2)$ are independent copies of (X, Y) and denote by

$$X_1 = (X_{1,1}, X_{1,2}, \ldots, X_{1,p})^T, \quad X_2 = (X_{2,1}, X_{2,2}, \ldots, X_{2,p})^T,$$
$$Y_1 = (Y_{1,1}, Y_{1,2}, \ldots, Y_{1,q})^T, \quad Y_2 = (Y_{2,1}, Y_{2,2}, \ldots, Y_{2,q})^T.$$
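One common way to produce m-dependent components, used here purely as an illustrative assumption and not as the paper's simulation design, is a moving average of order m over independent innovations: components whose indices differ by more than m share no innovation and are therefore independent. The function name `moving_average_components` is hypothetical.

```python
import numpy as np

def moving_average_components(eps, m):
    """Build m-dependent components from independent innovations.

    eps: array of shape (n, p + m) with independent entries along each row.
    Returns an (n, p) array whose j-th column is
    (eps[:, j] + ... + eps[:, j + m]) / sqrt(m + 1).
    """
    eps = np.asarray(eps, dtype=float)
    p = eps.shape[1] - m
    windows = np.stack([eps[:, k:k + p] for k in range(m + 1)], axis=0)
    return windows.sum(axis=0) / np.sqrt(m + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, m = 500, 6, 2
    x = moving_average_components(rng.standard_normal((n, p + m)), m)
    # Sample correlations should be clearly nonzero within lag m and near zero beyond it.
    print(np.round(np.corrcoef(x, rowvar=False)[0], 2))
```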

We can develop the following proposition by resorting to Theorem 4 for the case of 0 < τ ≤ 1/2 and Theorem D.1 in Section D.1 of Supplementary Material for the case of 1/2 < τ ≤ 1.

PROPOSITION 2.

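Assume that $\mathbb{E}(|X_{1,i}|^{4+4\tau}) < \infty$ and $\mathbb{E}(|Y_{1,j}|^{4+4\tau}) < \infty$ for any 1 ≤ i ≤ p, 1 ≤ j ≤ q with some constant 0 < τ ≤ 1, and there exist some positive constants $\kappa_1, \kappa_2, \kappa_3, \kappa_4$ such that

$$\max\Big\{ p^{-1} \sum_{i=1}^{p} \mathbb{E}[|X_{1,i}|^{4+4\tau}], \; q^{-1} \sum_{j=1}^{q} \mathbb{E}[|Y_{1,j}|^{4+4\tau}] \Big\} \le \kappa_1, \tag{29}$$
$$\min\big\{ p^{-1} \mathbb{E}[(X_1^T X_2)^2], \; q^{-1} \mathbb{E}[(Y_1^T Y_2)^2] \big\} \ge \kappa_2, \tag{30}$$
$$\min\{ p^{-1} B_X, \; q^{-1} B_Y \} \ge \kappa_3, \tag{31}$$
$$\max_{1 \le i \le p} \mathbb{E}[X_{1,i}^2] \le \kappa_4, \quad \max_{1 \le j \le q} \mathbb{E}[Y_{1,j}^2] \le \kappa_4. \tag{32}$$

In addition, assume that X consists of $m_1$-dependent components, Y consists of $m_2$-dependent components, and

$$m_1 = o(p^{\tau/(2+\tau)}), \quad m_2 = o(q^{\tau/(2+\tau)}), \quad m_1 m_2 = o(n^{\tau/(1+\tau)}). \tag{33}$$

Then under the independence of X and Y, there exists some positive constant A depending upon $\kappa_1, \ldots, \kappa_4$ such that

$$\sup_{x \in \mathbb{R}} |\mathbb{P}(T_n \le x) - \Phi(x)| \le A \left[ \left( \frac{[(m_1+1)(m_2+1)]^{2+\tau}}{(pq)^{\tau}} \right)^{\frac{1+\tau}{2}} + \frac{[(m_1+1)(m_2+1)]^{1+\tau}}{n^{\tau}} \right]^{\frac{1}{3+2\tau}}. \tag{34}$$

Hence under condition (33), we have $T_n \xrightarrow{D} N(0,1)$ as n → ∞ and p, q → ∞.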

Zhu et al. (2020) also established the asymptotic normality of the rescaled distance correlation. For clear comparison, we summarize in Table 1 the key differences between our results and theirs under the assumptions of Proposition 2 and the existence of the eighth moments (τ = 1).

We further consider the third example of multivariate normal random variables. For such a case, we can obtain a concise result in the following proposition.

PROPOSITION 3.

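Assume that $X \sim N(0, \Sigma_x)$, $Y \sim N(0, \Sigma_y)$, and the eigenvalues of $\Sigma_x$ and $\Sigma_y$ satisfy that $a_1 \le \lambda_1^X \le \lambda_2^X \le \cdots \le \lambda_p^X \le a_2$ and $a_1 \le \lambda_1^Y \le \lambda_2^Y \le \cdots \le \lambda_q^Y \le a_2$ for some positive constants $a_1$ and $a_2$. Then under the independence of X and Y, there exists some positive constant C depending upon $a_1, a_2$ such that

$$\sup_{x \in \mathbb{R}} |\mathbb{P}(T_n \le x) - \Phi(x)| \le C \big[ (pq)^{-1/5} + n^{-1/5} \big].$$

Hence we have $T_n \xrightarrow{D} N(0,1)$ as n → ∞ and p, q → ∞.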

We would like to point out that the rate of convergence obtained in Proposition 3 can be suboptimal since the error rate $n^{-1/5}$ is slower than the classical convergence rate with order $n^{-1/2}$ of the CLT for the sum of independent random variables. Our results are derived by exploiting the convergence rate of CLT for general martingales (Haeusler, 1988). It may be possible to improve the rate of convergence if one takes into account the specific intrinsic structure of distance covariance, which is beyond the scope of the current paper.

3.5. Power analysis.

We now turn to the power analysis for the rescaled distance correlation. We start with presenting a general theory on power in Theorem 5 below. Let us define two quantities

Lx=E(|X2EX2|4)+E(|X1TX2|4),Ly=E(|Y2EY2|4)+E(|Y1TY2|4). (35)

THEOREM 5.

Assume that E(X8)+E(Y8)< and (18) holds with τ = 1. If nR2(X,Y) and nV2(X,Y)/(BX1/2BY1/2Lx1/4Ly1/4), then for any arbitrarily large constant C > 0, (Tn>C)1 as n → ∞. Thus, for any significance level α, (Tn>Φ1(1α))1 as n → ∞, where Φ−1(1 − α) represents the (1 − α)th quantile of the standard normal distribution.

Theorem 5 provides a general result on the power of the rescaled distance correlation statistic. It reveals that as long as the signal strength, measured by R2(X,Y) and V2(X,Y), is not too weak, the power of testing independence with the rescaled sample distance correlation can be asymptotically one. In most cases, the population distance variances V2(X) and V2(Y) are of constant order by Proposition A.2. Therefore, if BX1/2BY1/2Lx1/4Ly1/4 also of constant order, then the conditions in Theorem 5 will reduce to nR2(X,Y), which indicates that the signal strength should not decay faster than n−1/2. To gain some insights, assume that both Xp and Yq consist of independent components with uniformly upper bounded eighth moments and uniformly lower bounded second moments. Then it holds that BX = O(p), BY = O(q), Lx = O(p2), Ly = O(q2), V2(X)=O(1), and V2(Y)=O(1). Thus the conditions in Theorem 5 above reduce to E(X8)+E(Y8)< and nR2(X,Y). In general, R2(X,Y) and V2(X,Y) depend on the dimensionalities and hence the conditions of Theorem 5 impose certain relationship between n and p.

Recently Zhu et al. (2020) showed that in the asymptotic sense, the distance covariance detects only componentwise linear dependence in the high-dimensional setting when both dimensionalities p and q grow much faster than sample size n (see Theorems 2.1.1 and 3.1.1 therein). In particular, when X and Y both consist of i.i.d. components with certain bounded moments, distance covariance was shown to asymptotically measure linear dependence if $\min\{p, q\}/n^2 \to \infty$. However, in view of (1) and (5), the population distance covariance and distance correlation indeed characterize completely the independence between two random vectors in arbitrary dimensions. Therefore, it is natural to ask whether the sample distance correlation can detect nonlinear dependence in some other diverging regime of (n, p, q). The answer turns out to be affirmative in the regime of moderately high dimensionality. We formally present this result in the following theorem on the asymptotic power and compare with the results in Zhu et al. (2020) in Table 2.

THEOREM 6.

Assume that we have i.i.d. observations $\{(X_i, Y_i), 1 \le i \le n\}$ with $X_i \in \mathbb{R}^p$ and $Y_i \in \mathbb{R}^p$, $X_1 = (X_{1,1}, \ldots, X_{1,p})$ with X having a symmetric distribution, and $\{X_{1,i}, 1 \le i \le p\}$ are m-dependent for some fixed positive integer m. Let $Y_1 = (Y_{1,1}, \ldots, Y_{1,p})$ be given by $Y_{1,j} = g_j(X_{1,j})$ for each $1 \le j \le p$, where $\{g_j, 1 \le j \le p\}$ are symmetric functions satisfying $g_j(x) = g_j(-x)$ for $x \in \mathbb{R}$ and $1 \le j \le p$. Assume further that $E(X_{1,j}^{12}) + E(Y_{1,j}^{12}) \le c_1^{12}$, $\operatorname{var}(X_{1,j}) \ge c_2^2$, and $\operatorname{var}(Y_{1,j}) \ge c_2^2$ for some positive constants $c_1$, $c_2$. Then there exists some positive constant A depending on $c_1$, $c_2$, and m such that

$$\mathcal{V}^2(X,Y) \ge A p^{-1} + O(p^{-3/2}) \quad \text{and} \quad \mathcal{R}^2(X,Y) \ge A p^{-1} + O(p^{-3/2}).$$

Consequently, if $p = o(\sqrt{n})$, then for any arbitrarily large constant C > 0, $\mathbb{P}(T_n > C) \to 1$ as n → ∞, and thus the test of independence between X and Y based on the rescaled sample distance correlation $T_n$ has asymptotic power one.

Under the symmetry assumptions in Theorem 6, we can show that there is no linear dependence between X and Y by noting that $\operatorname{cov}(X_{1,i}, Y_{1,j}) = 0$ for each $1 \le i, j \le p$. It is worth mentioning that we have assumed the m-dependence for some fixed integer m ≥ 1 to simplify the technical analysis. In fact, m can be allowed to grow slowly with sample size n and our technical arguments are still applicable.
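The zero-covariance claim can be checked in two lines. The sketch below assumes that "symmetric distribution" means $X_1$ and $-X_1$ have the same joint distribution (so that $E X_{1,i} = 0$) and uses only the evenness of $g_j$ stated in Theorem 6.

```latex
\begin{align*}
\operatorname{cov}(X_{1,i}, Y_{1,j})
  &= E\big[X_{1,i}\, g_j(X_{1,j})\big] - E[X_{1,i}]\, E\big[g_j(X_{1,j})\big]
   = E\big[X_{1,i}\, g_j(X_{1,j})\big] \\
  &= E\big[(-X_{1,i})\, g_j(-X_{1,j})\big]
   = -E\big[X_{1,i}\, g_j(X_{1,j})\big].
\end{align*}
```

The first equality in the second line replaces $X_1$ by $-X_1$, and the second uses $g_j(-x) = g_j(x)$. A quantity equal to its own negative is zero, so every cross-covariance vanishes.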

4. Simulation studies.

In this section, we conduct several simulation studies to verify our theoretical results on sample distance correlation and illustrate the finite-sample performance of our rescaled test statistic for the test of independence.

4.1. Normal approximation accuracy.

We generate two independent multivariate normal random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^p$ in the following simulated example and calculate the rescaled distance correlation $T_n$ defined in (14).

EXAMPLE 1.

Let $\Sigma = (\sigma_{i,j}) \in \mathbb{R}^{p \times p}$ with $\sigma_{i,j} = 0.7^{|i-j|}$, and let X ~ N(0, Σ) and Y ~ N(0, Σ) be independent. We consider the settings of n = 100 and p = 10, 50, 200, 500.

We conduct 5000 Monte Carlo simulations and generate the histograms of the rescaled test statistic Tn to investigate its empirical distribution. Histograms with a comparison of the kernel density estimate (KDE) and the standard normal density function are shown in Figure 1. From the histograms, we can see that the distribution of Tn mimics very closely the standard normal distribution under different settings of dimensionalities. Moreover, for more refined comparison, the maximum pointwise distances between the KDE and the standard normal density function under different settings are presented in Table 3. It is evident that the accuracy of the normal approximation increases with dimensionality, which is in line with our theoretical results.
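A simulation of this kind could be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes that the bias-corrected sample distance covariance is the U-centered estimator of Székely and Rizzo (2014), that $T_n = \sqrt{n(n-1)/2}\, R_n^*(X,Y)$ as in (36), and that $\sigma_{i,j} = 0.7^{|i-j|}$. The function names are hypothetical.

```python
import numpy as np


def u_centered_distances(Z):
    """U-centered Euclidean distance matrix of the rows of Z (requires n >= 4)."""
    n = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    row = D.sum(axis=1, keepdims=True) / (n - 2)
    col = D.sum(axis=0, keepdims=True) / (n - 2)
    total = D.sum() / ((n - 1) * (n - 2))
    U = D - row - col + total
    np.fill_diagonal(U, 0.0)  # U-centering sets the diagonal to zero
    return U


def rescaled_dcor(X, Y):
    """Assumed form of T_n: sqrt(n(n-1)/2) times the bias-corrected distance correlation."""
    n = X.shape[0]
    A, B = u_centered_distances(X), u_centered_distances(Y)
    scale = n * (n - 3)
    v_xy = (A * B).sum() / scale
    v_x = (A * A).sum() / scale
    v_y = (B * B).sum() / scale
    return np.sqrt(n * (n - 1) / 2.0) * v_xy / np.sqrt(v_x * v_y)


def ar_covariance(p, rho):
    """Covariance matrix with entries rho^|i-j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def simulate_example1(n, p, reps, seed=0, rho=0.7):
    """Draw `reps` values of T_n for independent X, Y ~ N(0, Sigma)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(ar_covariance(p, rho))
    out = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p)) @ L.T
        Y = rng.standard_normal((n, p)) @ L.T
        out[r] = rescaled_dcor(X, Y)
    return out


if __name__ == "__main__":
    t = simulate_example1(n=100, p=10, reps=200)
    print("mean:", t.mean(), "std:", t.std())
```

The pairwise-distance step stores an n × n × p array, which is fine at n = 100 but would need a Gram-matrix formulation for much larger samples. The study above uses 5000 replicates; the 200 here only keep the sketch quick to run.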

Fig 1:

Histograms of the rescaled test statistic Tn in Example 1. The blue curve represents the kernel density estimate and the red curve represents the standard normal density.

Table 3.

Distances between the KDE and standard normal density function in Example 1.

n      p      Distance
100    10     0.0955
100    50     0.0357
100    200    0.0288
100    500    0.0181

4.2. Test of independence.

To test the independence of random vectors X and Y in high dimensions, based on the asymptotic normality developed for the rescaled distance correlation statistic Tn, under significance level α we can reject the null hypothesis when

$$T_n = \sqrt{\frac{n(n-1)}{2}}\, R_n^*(X,Y) > \Phi^{-1}(1-\alpha), \tag{36}$$

since the distance correlation is positive under the alternative hypothesis. To assess the performance of our normal approximation test, we also include the gamma-based approximation test (Huang and Huo, 2017) and normal approximation for studentized sample distance correlation TR defined in (12) (Zhu et al., 2020) in the numerical comparisons.
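The one-sided rejection rule in (36) can be sketched with the Python standard library alone. The function name and the returned dictionary are hypothetical; the value of $T_n$ is assumed to have been computed elsewhere.

```python
from statistics import NormalDist


def normal_approx_test(t_n, alpha=0.05):
    """One-sided test: reject independence when T_n exceeds the (1 - alpha) normal quantile."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    critical = std_normal.inv_cdf(1.0 - alpha)
    p_value = 1.0 - std_normal.cdf(t_n)
    return {"critical": critical, "p_value": p_value, "reject": t_n > critical}


if __name__ == "__main__":
    print(normal_approx_test(2.3, alpha=0.05))
```

Only large positive values of $T_n$ count as evidence against independence, which is why the test uses the upper tail rather than a two-sided region.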

The gamma-based approximation test assumes that the linear combination $\sum_{i=1}^{\infty} \lambda_i Z_i^2$ involved in the limiting distribution of the standardized sample distance covariance $n \mathcal{V}_n^*(X,Y)$ under fixed dimensionality (see (11)) can be approximated heuristically by a gamma distribution $\Gamma(\beta_1, \beta_2)$ with matched first two moments. In particular, the shape and rate parameters are determined as

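$$\beta_1 = \frac{\big(\sum_{i=1}^{\infty} \lambda_i\big)^2}{2 \sum_{i=1}^{\infty} \lambda_i^2} = \frac{\big(E\|X - X'\|\, E\|Y - Y'\|\big)^2}{2\, \mathcal{V}^2(X)\, \mathcal{V}^2(Y)}$$

and

$$\beta_2 = \frac{\sum_{i=1}^{\infty} \lambda_i}{2 \sum_{i=1}^{\infty} \lambda_i^2} = \frac{E\|X - X'\|\, E\|Y - Y'\|}{2\, \mathcal{V}^2(X)\, \mathcal{V}^2(Y)}.$$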

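Thus given observations $(X_1, Y_1), \ldots, (X_n, Y_n)$, $\beta_1$ and $\beta_2$ can be estimated by their empirical versions

$$\hat\beta_1 = \frac{\mu^2}{2\, \mathcal{V}_n^*(X)\, \mathcal{V}_n^*(Y)} \quad \text{and} \quad \hat\beta_2 = \frac{\mu}{2\, \mathcal{V}_n^*(X)\, \mathcal{V}_n^*(Y)},$$

where $\mu = \frac{1}{n^2 (n-1)^2} \sum_{i \ne j} \|X_i - X_j\| \sum_{i \ne j} \|Y_i - Y_j\|$. Then the null hypothesis is rejected at the significance level α if $n \mathcal{V}_n^*(X,Y) > \Gamma_{1-\alpha}(\hat\beta_1, \hat\beta_2) - \mu$, where $\Gamma_{1-\alpha}(\hat\beta_1, \hat\beta_2)$ is the (1 − α)th quantile of the distribution $\Gamma(\hat\beta_1, \hat\beta_2)$. The gamma-based approximation test still lacks rigorous theoretical justification.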
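The moment matching behind the shape and rate parameters can be spelled out as follows. The sketch assumes that the $Z_i$ in the fixed-dimensional limit are i.i.d. standard normal, so that $E(Z_i^2) = 1$ and $\operatorname{var}(Z_i^2) = 2$, and it uses the fact that a gamma distribution with shape $\beta_1$ and rate $\beta_2$ has mean $\beta_1/\beta_2$ and variance $\beta_1/\beta_2^2$.

```latex
\begin{align*}
\frac{\beta_1}{\beta_2}
  &= E\Big(\sum_{i=1}^{\infty} \lambda_i Z_i^2\Big) = \sum_{i=1}^{\infty} \lambda_i,
&
\frac{\beta_1}{\beta_2^2}
  &= \operatorname{var}\Big(\sum_{i=1}^{\infty} \lambda_i Z_i^2\Big) = 2 \sum_{i=1}^{\infty} \lambda_i^2, \\
\beta_2
  &= \frac{\sum_{i} \lambda_i}{2 \sum_{i} \lambda_i^2},
&
\beta_1
  &= \frac{\big(\sum_{i} \lambda_i\big)^2}{2 \sum_{i} \lambda_i^2}.
\end{align*}
```

Dividing the mean by the variance gives the rate, and multiplying the rate by the mean gives the shape. The empirical versions then substitute $\mu$ for $\sum_i \lambda_i$ and $\mathcal{V}_n^*(X)\mathcal{V}_n^*(Y)$ for $\sum_i \lambda_i^2$.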

When the sample size and dimensionalities tend to infinity simultaneously, in view of our main result in Theorem 2 and the consistency of $R_n^*(X,Y)$ (recall Lemma 1 and (A.50) in Section C.1 of Supplementary Material), one can see that under the null hypothesis, $T_R \xrightarrow{D} N(0,1)$. Therefore, we can reject the null hypothesis at significance level α if $T_R > \Phi^{-1}(1-\alpha)$.

We consider two simulated examples to compare the aforementioned three approaches for testing the independence between two random vectors in high dimensions. The significance level is set as α = 0.05 and 2000 Monte Carlo replicates are carried out to compute the empirical rejection rates.

EXAMPLE 2.

Let $\Sigma = (\sigma_{i,j}) \in \mathbb{R}^{p \times p}$ with $\sigma_{i,j} = 0.5^{|i-j|}$. Let X and Y be independent and X ~ N(0, Σ), Y ~ N(0, Σ).

EXAMPLE 3.

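Let $\Sigma = (\sigma_{i,j}) \in \mathbb{R}^{p \times p}$ with $\sigma_{i,j} = 0.5^{|i-j|}$. Let $X = (X^{(1)}, \ldots, X^{(p)})$ ~ N(0, Σ) and $Y = (Y^{(1)}, \ldots, Y^{(p)})$ with $Y^{(i)} = 0.2\big(X^{(i)} + (X^{(i)})^2\big) + \varepsilon_i$ and $\varepsilon_i \overset{\text{i.i.d.}}{\sim} t_4$.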
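Data for the two examples could be generated along the following lines. The sketch reads Example 3 as $Y^{(i)} = 0.2(X^{(i)} + (X^{(i)})^2) + \varepsilon_i$ with $t_4$ noise and $\sigma_{i,j} = 0.5^{|i-j|}$, which is an interpretation of the statement above; the function names are hypothetical.

```python
import numpy as np


def _correlated_normal(rng, n, p, rho):
    """Rows are i.i.d. N(0, Sigma) with Sigma[i, j] = rho^|i-j|."""
    idx = np.arange(p)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(sigma)  # sigma = chol @ chol.T
    return rng.standard_normal((n, p)) @ chol.T


def generate_example2(n, p, seed=0, rho=0.5):
    """Null case: X and Y are independent draws from N(0, Sigma)."""
    rng = np.random.default_rng(seed)
    return _correlated_normal(rng, n, p, rho), _correlated_normal(rng, n, p, rho)


def generate_example3(n, p, seed=0, rho=0.5):
    """Alternative case: Y depends on X through a linear plus quadratic term."""
    rng = np.random.default_rng(seed)
    X = _correlated_normal(rng, n, p, rho)
    eps = rng.standard_t(df=4, size=(n, p))  # heavy-tailed t_4 noise
    Y = 0.2 * (X + X ** 2) + eps
    return X, Y, eps


if __name__ == "__main__":
    X, Y, _ = generate_example3(n=100, p=50)
    print(X.shape, Y.shape)
```

Empirical rejection rates would then come from applying each test to repeated draws, 2000 times per setting in the study above.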

Type-I error rates in Example 2 under different settings of n and p are presented in Figure 2. From Figure 2, it is easy to see that the rejection rates of the normal approximation test for $T_n$ tend to be closer and closer to the preselected significance level as the dimensionalities and the sample size grow. The same trend applies to the other two approaches too. The empirical powers of the three tests in Example 3 are shown in Figure 3. We can observe from the simulation results in Figures 2 and 3 that these three tests perform asymptotically almost the same, which is sensible. Empirically, the gamma approximation for $n \mathcal{V}_n^*(X,Y)$ and the normal approximation may be asymptotically equivalent to some extent, and more details on their connections are discussed in Section E of Supplementary Material. However, the theoretical foundation of the gamma approximation for $n \mathcal{V}_n^*(X,Y)$ remains undeveloped. As for the asymptotic equivalence between $T_n$ and the studentized sample distance correlation $T_R$, Lemma 1 and (A.50) imply that under the null hypothesis and some general conditions, $R_n^*(X,Y) \to 0$ in probability and hence $T_R$ can be asymptotically equivalent to $T_n$ when n → ∞.

Fig 2:

Rejection rates of the three approaches under different settings of n and p in Example 2.

Fig 3:

Power of the three approaches under different settings of n and p in Example 3.

4.3. Detecting nonlinear dependence.

We further provide several examples to justify the power of the rescaled distance correlation statistic in detecting nonlinear dependence in the regime of moderately high dimensionality. In the following simulation examples, the significance level of test is set as 0.05 and 2000 Monte Carlo replicates are conducted to compute the rejection rates.

EXAMPLE 4.

Let $X = (X^{(1)}, \ldots, X^{(p)})^T$ ~ N(0, $I_p$) and $Y = (Y^{(1)}, \ldots, Y^{(p)})^T$ satisfying $Y^{(i)} = (X^{(i)})^2$.

EXAMPLE 5.

Set $\Sigma = (\sigma_{i,j}) \in \mathbb{R}^{p \times p}$ with $\sigma_{i,j} = 0.5^{|i-j|}$. Let $X = (X^{(1)}, \ldots, X^{(p)})^T$ ~ N(0, Σ) and $Y = (Y^{(1)}, \ldots, Y^{(p)})^T$ with $Y^{(i)} = (X^{(i)})^2$.

For the above two examples, it holds that $\operatorname{cov}(X^{(i)}, Y^{(j)}) = 0$ for each $1 \le i, j \le p$. Simulation results on the power under Examples 4 and 5 for different settings of n and p are summarized in Table 4. Guided by Theorem 6, we set $p = 2[\sqrt{n}]$ with [·] denoting the integer part of a given number. From Table 4, we can see that even though there is only nonlinear dependency between X and Y, the power of rescaled distance correlation can still approach one when the dimensionality p is moderately high. One interesting phenomenon is that the power in Example 5 is higher than that in Example 4, which suggests that the dependence between components may strengthen the dependency between X and Y.

Table 4.

Power of our rescaled test statistic with $p = 2[\sqrt{n}]$ in Examples 4 and 5 (with standard errors in parentheses).

n      p     Power, Example 4     Power, Example 5
10     6     0.2765 (0.0100)      0.3060 (0.0103)
40     12    0.5165 (0.0112)      0.7005 (0.0102)
70     16    0.6970 (0.0103)      0.9380 (0.0054)
100    20    0.8220 (0.0086)      0.9885 (0.0024)
130    22    0.9270 (0.0058)      0.9995 (0.0005)
160    26    0.9550 (0.0046)      0.9990 (0.0007)

Moreover, we investigate the setting when one dimensionality is fixed and the other one tends to infinity.

EXAMPLE 6.

Set $\Sigma = (\sigma_{i,j}) \in \mathbb{R}^{p \times p}$ with $\sigma_{i,j} = 0.7^{|i-j|}$. Let $X = (X^{(1)}, \ldots, X^{(p)})$ ~ N(0, Σ) and $Y = \big(\sum_{i=1}^{p} X^{(i)}\big)^2 / p$.

For Example 6, it holds that $\operatorname{cov}(X^{(i)}, Y) = 0$ for each $1 \le i \le p$ and thus the dependency is purely nonlinear. We compare the power of our rescaled distance correlation statistic with the marginally aggregated distance correlation (mdCor) statistic (Zhu et al., 2020) and the linear measure of RV coefficient (Escoufier, 1973; Robert and Escoufier, 1976). The comparison under different settings of p and n is presented in Figure 4. We can observe from Figure 4 that under this scenario, the rescaled distance correlation statistic significantly outperforms the marginally aggregated distance correlation statistic. This is because the marginally aggregated statistic can detect only the marginal dependency between X and Y, while Y depends on the entire X jointly in this example. Since the RV coefficient measures the linear dependence, its power stays flat and low when the sample size increases.
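To make the linear benchmark concrete, the sketch below generates data as in Example 6 and evaluates the sample RV coefficient in its standard form, $\operatorname{tr}(S_{XY} S_{YX}) / \sqrt{\operatorname{tr}(S_{XX}^2)\, \operatorname{tr}(S_{YY}^2)}$. It computes the coefficient only, not the associated test, and the function names are hypothetical.

```python
import numpy as np


def rv_coefficient(X, Y):
    """Sample RV coefficient between the column spaces of X (n x p) and Y (n x q)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    s_xy = Xc.T @ Yc  # cross-covariance up to a constant that cancels in the ratio
    s_xx = Xc.T @ Xc
    s_yy = Yc.T @ Yc
    # tr(S_xy S_yx) is the squared Frobenius norm of S_xy; same for the denominators.
    return np.sum(s_xy ** 2) / np.sqrt(np.sum(s_xx ** 2) * np.sum(s_yy ** 2))


def generate_example6(n, p, seed=0, rho=0.7):
    """X ~ N(0, Sigma) with Sigma[i, j] = rho^|i-j|; Y is the scaled squared row sum."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.standard_normal((n, p)) @ np.linalg.cholesky(sigma).T
    Y = X.sum(axis=1, keepdims=True) ** 2 / p  # kept as an n x 1 matrix
    return X, Y


if __name__ == "__main__":
    X, Y = generate_example6(n=200, p=20)
    print("RV coefficient:", rv_coefficient(X, Y))
```

Because Y is an even function of X here, the population cross-covariance is zero and the sample RV coefficient reflects only sampling noise, which is consistent with the flat power reported for it.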

Fig 4:

Comparison of power under different settings of n and p in Example 6.

These simulation examples demonstrate the capability of distance correlation in detecting nonlinear dependence in the regime of moderately high dimensionality, which is in line with our theoretical results on the power analysis in Theorem 6. Moreover, when X and Y depend on each other far from marginally, the marginally aggregated distance correlation statistic can indeed be less powerful than the rescaled distance correlation statistic.

5. Real data application.

We further demonstrate the practical utility of our normal approximation test for bias-corrected distance correlation on a blockchain application, an area that has gained increasing public attention in recent years. Specifically, we would like to understand the nonlinear dependency between the cryptocurrency market and the stock market through the test of independence. Indeed, investors are interested in testing whether there is any nonlinear association between these two markets since they want to diversify their portfolios and reduce the risks. In particular, we collected the historical daily returns over the three years from 08/01/2016 to 07/31/2019 for both stocks in the Standard & Poor's 500 (S&P 500) list (from https://finance.yahoo.com) and the top 100 cryptocurrencies (from https://coinmarketcap.com). As a result, we obtained a data matrix of dimensions 755 × 505 for stock daily returns and a data matrix of dimensions 1095 × 100 for cryptocurrency daily returns, where the rows correspond to the trading dates and the columns represent the stocks or cryptocurrencies. Since stocks are traded only on Mondays through Fridays excluding holidays, we adapted the cryptocurrency data to this restriction and picked a submatrix of the cryptocurrency data matrix to match the dates. Moreover, because some stocks and cryptocurrencies were launched after 08/01/2016, there are some missing values in the corresponding columns. We removed those columns containing missing values. Finally, we obtained a data matrix $X_{T \times N_1}$ for stock daily returns and a data matrix $Y_{T \times N_2}$ for cryptocurrency daily returns, where T = 755, $N_1$ = 496, and $N_2$ = 22. Although the number of cryptocurrencies drops to 22 after removing the missing values, the remaining ones are still very representative in terms of market capitalization and include the major cryptocurrencies such as Bitcoin, Ethereum, Litecoin, Ripple, Monero, and Dash.

To test the independence of the cryptocurrency market and the stock market, we choose three-month rolling windows (66 days). Specifically, for each trading date t from 11/01/2016 to 07/31/2019, we set $X_{F_t \times N_1}$ as a submatrix of $X_{T \times N_1}$ that contains the most recent three months before date t, where $F_t$ is the set of 66 rows right before date t (including date t). The data submatrix $Y_{F_t \times N_2}$ is defined similarly. Then we apply the rescaled test statistic $T_n$ defined in (14) to $X_{F_t \times N_1}$ and $Y_{F_t \times N_2}$. Thus the sample size is n = 66 and the dimensions of the two random vectors are $N_1$ = 496 and $N_2$ = 22, respectively. For each trading date, we obtain a p-value calculated by $1 - \Phi(T_n^{(t)})$, where $T_n^{(t)}$ is the value of the test statistic based on $X_{F_t \times N_1}$ and $Y_{F_t \times N_2}$ and Φ(·) is the standard normal distribution function. As a result, we end up with a vector of p-values $1 - \Phi(T_n^{(t)})$ for trading dates t from 11/01/2016 to 07/31/2019. In addition, we use the “fdr.control” function in R package “fdrtool,” which applies the algorithms in Benjamini and Hochberg (1995) and Storey (2002) to calculate the p-value cut-off for controlling the false discovery rate (FDR) at the 10% level. Based on the p-value vector, we obtain the p-value cut-off of 0.0061. The time series plot of the p-values is shown in Figure 5 (the red curve).
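The rolling-window bookkeeping and the FDR cut-off can be sketched in plain Python. This is an illustration only: it implements the basic Benjamini and Hochberg (1995) step-up rule rather than the “fdr.control” routine used above, so its cut-offs need not match the reported ones, and both function names are hypothetical.

```python
def rolling_windows(num_dates, window=66):
    """Yield (t, start, stop): rows start..stop-1 are the `window` dates ending at t inclusive."""
    for t in range(window - 1, num_dates):
        yield t, t - window + 1, t + 1


def bh_cutoff(p_values, q=0.10):
    """Largest sorted p-value p_(k) with p_(k) <= k*q/m, or 0.0 if none qualifies."""
    m = len(p_values)
    cutoff = 0.0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k * q / m:
            cutoff = p  # keep overwriting: the step-up rule takes the largest such k
    return cutoff


if __name__ == "__main__":
    windows = list(rolling_windows(num_dates=755, window=66))
    print("number of windows:", len(windows))
    print("cut-off:", bh_cutoff([0.5, 0.03, 0.01, 0.02], q=0.10))
```

A cut-off of 0.0 in this sketch means that no p-value passes the step-up rule, which corresponds to the case where nothing is declared significant.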

Fig 5:

Time series plots of p-values from 11/01/2016 to 07/31/2019 using three-month, four-month, and six-month rolling windows, respectively.

The red curve in Figure 5 indicates that most of the time the cryptocurrency market and the stock market tend to move independently. There are apparently two periods during which the p-values are below the cut-off point 0.0061, roughly March 2017 and April 2018. Since we use the three-month rolling window right before each date to calculate the p-values, the significantly low p-values in the aforementioned two periods might suggest some nonlinear association between the two markets during the time intervals 12/01/2016–03/31/2017 and 01/01/2018–04/30/2018, respectively. To verify our findings, noticing that Bitcoin is the most representative cryptocurrency and the S&P 500 Index measures the overall performance of the 500 stocks on its list, we present in the two plots in Figure 6 the trend of closing prices of Bitcoin and that of the S&P 500 Index during the periods 12/01/2016–03/31/2017 and 01/01/2018–04/30/2018, respectively. The first plot in Figure 6 shows that the trends of the two prices shared striking similarity starting from the middle of January 2017 and both peaked around early March 2017. From the second plot in Figure 6, we see that the prices of both the S&P 500 Index and Bitcoin dropped sharply to a bottom around early February 2018 and then rose to two renewed peaks, followed by a continued fall to another bottom. Therefore, Figure 6 indicates some strong dependency between the two markets in the aforementioned two time intervals and hence demonstrates the effective discoveries of dependence by our normal approximation test for bias-corrected distance correlation.

Fig 6:

Closing prices of Standard & Poors 500 Index and Bitcoin during the time periods 12/01/2016–03/31/2017 and 01/01/2018–04/30/2018, respectively. The black curve is for Standard & Poors 500 Index and the red one is for Bitcoin.

In addition, to show the robustness of our procedure and choose a reasonable length of rolling window, we also apply four-month and six-month rolling windows before each date t to test the independence between the cryptocurrency market and the stock market. The time series plots of the resulting p-values are presented as the blue curve and the green curve in Figure 5, respectively. From Figure 5, we see that the p-values from using the three different rolling windows (three-month, four-month, and six-month) move in a similar fashion. For the four-month rolling window, the p-value cut-off for FDR control at the 10% level is 0.0053. We observe that the time periods with significantly small p-values under the four-month rolling window are almost consistent with those under the three-month rolling window. However, when the six-month rolling window is applied, the p-value cut-off for FDR control at the 10% level is 0 and hence there is no significant evidence for dependence identified at any time point. This suggests that the long-run dependency between the cryptocurrency market and the stock market might be limited, but there could be some strong association between them in certain special periods. These results show that to test the short-term dependence, the three-month rolling window seems to be a good choice.

As a comparison, we conduct the analysis with the rescaled sample distance correlation statistic $T_n$ replaced by the RV coefficient, which measures only the linear dependence between two random vectors. The three-month rolling window is utilized as before. We apply the function ‘coeffRV’ in the R package ‘FactoMineR’ to calculate the p-values of the independence test based on the RV coefficient. The time series plot of the resulting p-values is depicted in Figure 7. From Figure 7, we see that there are three periods in which the p-values are below the significance level 0.05, while there are four such periods in Figure 5 for p-values based on the rescaled sample distance correlation $T_n$ with the three-month rolling window. Moreover, the four periods detected by $T_n$ roughly cover the three periods detected by the RV coefficient. On the other hand, for the p-values based on the RV coefficient, the p-value cut-off for the Benjamini–Hochberg FDR control at the 10% level is 0, which implies that no significant periods can be discovered with FDR controlled at the 10% level. However, as mentioned previously, if we use $T_n$, the corresponding p-value cut-off with the three-month rolling window is 0.0061 and two periods, roughly March 2017 and April 2018, are still significant. The validity of these two periods is demonstrated in Figure 6. Therefore, compared to the linear measure of RV coefficient, the nonlinear dependency measure of rescaled distance correlation is indeed more powerful in this real data application.

Fig 7:

Time series plot of p-values based on RV coefficient from 11/01/2016 to 07/31/2019 using three-month rolling window.

6. Discussions.

The major contributions of this paper are twofold. First, we have obtained central limit theorems for a rescaled distance correlation statistic for a pair of high-dimensional random vectors and the associated rates of convergence under independence when both sample size and dimensionality are diverging. Second, we have also developed a general power theory for the sample distance correlation and demonstrated its ability to detect nonlinear dependence in the regime of moderately high dimensionality. These new results shed light on the precise limiting distributions of distance correlation in high dimensions and provide a more complete picture of the asymptotic theory for distance correlation. To prove our main results, Propositions A.1–A.3 in Section A.4 of Supplementary Material have been developed to help us better understand the moments therein in the high-dimensional setting, which are of independent interest.

In particular, Theorem 6 unveils that the sample distance correlation is capable of measuring the nonlinear dependence when the dimensionalities of X and Y are diverging. It would be interesting to further investigate the scenario when only one of the dimensionalities tends to infinity and the other one is fixed. Moreover, it would also be interesting to extend our asymptotic theory to the conditional or partial distance correlation and investigate more scalable high-dimensional nonparametric inference with theoretical guarantees, for both i.i.d. and time series data settings. These problems are beyond the scope of the current paper and will be interesting topics for future research.


Acknowledgements.

The authors would like to thank the Co-Editor, Associate Editor, and referees for their constructive comments that have helped improve the paper significantly.

* Fan, Gao and Lv’s research was supported by NIH Grant 1R01GM131407-01, NSF Grant DMS-1953356, a grant from the Simons Foundation, and Adobe Data Science Research Award. Shao’s research was partially supported by NSFC12031005.

Footnotes

SUPPLEMENTARY MATERIAL

Supplement to “Asymptotic Distributions of High-Dimensional Distance Correlation Inference”. The supplement Gao et al. (2020) contains all the proofs and technical details.

REFERENCES

  1. BENJAMINI Y and HOCHBERG Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300. MR1325392
  2. BERGSMA W and DASSIOS A (2014). A consistent test of independence based on a sign covariance related to Kendall’s tau. Bernoulli 20 1006–1028. MR3178526
  3. BERRETT TB, WANG Y, BARBER RF and SAMWORTH RJ (2020). The conditional permutation test for independence while controlling for confounders. J. Roy. Statist. Soc. Ser. B, to appear.
  4. BLUM JR, KIEFER J and ROSENBLATT M (1961). Distribution free tests of independence based on the sample distribution function. Ann. Math. Statist 32 485–498.
  5. CHAKRABORTY S and ZHANG X (2019). Distance Metrics for Measuring Joint Dependence with Application to Causal Inference. J. Amer. Statist. Assoc, to appear.
  6. DAVIS RA, MATSUI M, MIKOSCH T and WAN P (2018). Applications of distance correlation to time series. Bernoulli 24 3087–3116. MR3779711
  7. ESCOUFIER Y (1973). Le Traitement des Variables Vectorielles. Biometrics 29 751–760.
  8. FEUERVERGER A (1993). A Consistent Test for Bivariate Dependence. International Statistical Review 61 419–433.
  9. GAO L, FAN Y, LV J and SHAO QM (2020). Supplement to “Asymptotic Distributions of High-Dimensional Distance Correlation Inference”.
  10. GRETTON A, HERBRICH R, SMOLA A, BOUSQUET O and SCHÖLKOPF B (2005). Kernel methods for measuring independence. J. Mach. Learn. Res 6 2075–2129.
  11. HAEUSLER E (1988). On the rate of convergence in the central limit theorem for martingales with discrete and continuous time. Ann. Probab 16 275–299. MR920271
  12. HOEFFDING W (1948). A non-parametric test of independence. Ann. Math. Statistics 19 546–557.
  13. HUANG C and HUO X (2017). A statistically and numerically efficient independence test based on random projections and distance covariance. arXiv preprint arXiv:1701.06054.
  14. HUO X and SZÉKELY GJ (2016). Fast computing for distance covariance. Technometrics 58 435–447. MR3556612
  15. JIN Z and MATTESON DS (2018). Generalizing distance covariance to measure and test multivariate mutual dependence via complete and incomplete V-statistics. J. Multivariate Anal 168 304–322. MR3858367
  16. KENDALL MG (1938). A new measure of rank correlation. Biometrika 30 81–93.
  17. KONG J, WANG S and WAHBA G (2015). Using distance covariance for improved variable selection with application to learning genetic risk models. Stat. Med 34 1708–1720. MR3334686
  18. KONG Y, LI D, FAN Y and LV J (2017). Interaction pursuit in high-dimensional multi-response regression via distance correlation. Ann. Statist 45 897–922. MR3650404
  19. LI R, ZHONG W and ZHU L (2012). Feature screening via distance correlation learning. J. Amer. Statist. Assoc 107 1129–1139. MR3010900
  20. LYONS R (2013). Distance covariance in metric spaces. Ann. Probab 41 3284–3305. MR3127883
  21. MATSUI M, MIKOSCH T and SAMORODNITSKY G (2017). Distance covariance for stochastic processes. Probab. Math. Statist 37 355–372. MR3745391
  22. MATTESON DS and TSAY RS (2017). Independent component analysis via distance covariance. J. Amer. Statist. Assoc 112 623–637. MR3671757
  23. PEARSON K (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London 58 240–242.
  24. ROBERT P and ESCOUFIER Y (1976). A Unifying Tool for Linear Multivariate Statistical Methods: The RV-Coefficient. Applied Statistics 25 257–265.
  25. ROSENBLATT M (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist 3 1–14.
  26. SHAH RD and PETERS J (2020). The hardness of conditional independence testing and the generalised covariance measure. Ann. Statist, to appear.
  27. SHAO X and ZHANG J (2014). Martingale difference correlation and its use in high-dimensional variable screening. J. Amer. Statist. Assoc 109 1302–1318. MR3265698
  28. SPEARMAN C (1904). The proof and measurement of association between two things. The American Journal of Psychology 15 72–101.
  29. STOREY JD (2002). A direct approach to false discovery rates. J. Roy. Statist. Soc. Ser. B 64 479–498. MR1924302
  30. SZÉKELY GJ, RIZZO ML and BAKIROV NK (2007). Measuring and testing dependence by correlation of distances. Ann. Statist 35 2769–2794. MR2382665
  31. SZÉKELY GJ and RIZZO ML (2009). Brownian distance covariance. Ann. Appl. Stat 3 1236–1265. MR2752127
  32. SZÉKELY GJ and RIZZO ML (2013). The distance correlation t-test of independence in high dimension. J. Multivariate Anal 117 193–213. MR3053543
  33. SZÉKELY GJ and RIZZO ML (2014). Partial distance correlation with methods for dissimilarities. Ann. Statist 42 2382–2412. MR3269983
  34. VEPAKOMMA P, TONDE C and ELGAMMAL A (2018). Supervised dimensionality reduction via distance correlation maximization. Electron. J. Stat 12 960–984. MR3772810
  35. WANG X, PAN W, HU W, TIAN Y and ZHANG H (2015). Conditional distance correlation. J. Amer. Statist. Assoc 110 1726–1734. MR3449068
  36. WEIHS L, DRTON M and MEINSHAUSEN N (2018). Symmetric rank covariances: a generalized framework for nonparametric measures of dependence. Biometrika 105 547–562. MR3842884
  37. YAO S, ZHANG X and SHAO X (2018). Testing mutual independence in high dimension via distance covariance. J. Roy. Statist. Soc. Ser. B 80 455–480. MR3798874
  38. ZHANG X, YAO S and SHAO X (2018). Conditional mean and quantile dependence testing in high dimension. Ann. Statist 46 219–246. MR3766951
  39. ZHOU Z (2012). Measuring nonlinear dependence in time-series, a distance correlation approach. J. Time Series Anal 33 438–457. MR2915095
  40. ZHU C, YAO S, ZHANG X and SHAO X (2020). Distance-based and RKHS-based dependence metrics in high dimension. Ann. Statist, to appear.
