Biometrika. 2017 Sep 4;104(4):829–843. doi: 10.1093/biomet/asx043

Projection correlation between two random vectors

Liping Zhu, Kai Xu, Runze Li, Wei Zhong
PMCID: PMC5793497  PMID: 29430040

Abstract

We propose the use of projection correlation to characterize dependence between two random vectors. Projection correlation has several appealing properties. It equals zero if and only if the two random vectors are independent, it is not sensitive to the dimensions of the two random vectors, it is invariant with respect to the group of orthogonal transformations, and its estimation is free of tuning parameters and does not require moment conditions on the random vectors. We show that the sample estimate of the projection correlation is n-consistent if the two random vectors are independent and root-n-consistent otherwise. Monte Carlo simulation studies indicate that the projection correlation has higher power than the distance correlation and the ranks of distances in tests of independence, especially when the dimensions are relatively large or the moment conditions required by the distance correlation are violated.

Keywords: Distance correlation, Projection correlation, Ranks of distance

1. Introduction

Let Inline graphic and Inline graphic be two random vectors. In this paper, we aim to test

graphic file with name M5.gif

Measuring and testing dependence between Inline graphic and Inline graphic is a fundamental problem in statistics. The Pearson correlation is perhaps the first and the best-known quantity to measure the degree of linear dependence between two univariate random variables. Extensions including Spearman’s (1904) rho, Kendall’s (1938) tau, and those due to Hoeffding (1948) and Blum (1961) can be used to measure nonlinear dependence without moment conditions.

Testing independence has important applications. Two examples from genomics research are testing whether two groups of genes are associated and examining whether certain phenotypes are determined by particular genotypes. In social science research, scientists are interested in understanding potential associations between psychological and physiological characteristics. Wilks (1935) introduced a parametric test based on Inline graphic, where Inline graphic, Inline graphic and Inline graphic. Throughout Inline graphic stands for the covariance matrix of Inline graphic and Inline graphic stands for the determinant of Inline graphic. Hotelling (1936) suggested the canonical correlation coefficient, which seeks Inline graphic and Inline graphic such that the Pearson correlation between Inline graphic and Inline graphic is maximized. Both Wilks’s test and the canonical correlation can be used to test for independence between Inline graphic and Inline graphic when they follow normal distributions. Nonparametric extensions of Wilks’s test were proposed by Puri & Sen (1971), Hettmansperger & Oja (1994), Gieser & Randles (1997), Taskinen et al. (2003) and Taskinen et al. (2005). These tests can be used to test for independence between Inline graphic and Inline graphic when they follow elliptically symmetric distributions, but they are inapplicable when the normality or ellipticity assumptions are violated or when the dimensions of Inline graphic and Inline graphic exceed the sample size. In addition, multivariate rank-based tests of independence are ineffective for testing nonmonotone dependence (Székely et al., 2007).

The distance correlation (Székely et al., 2007) can be used to measure and test dependence between Inline graphic and Inline graphic in arbitrary dimensions without assuming normality or ellipticity. Provided that Inline graphic, the distance correlation between Inline graphic and Inline graphic, denoted by Inline graphic, is nonnegative, and it equals zero if and only if Inline graphic and Inline graphic are independent. Throughout, we define Inline graphic for a vector Inline graphic. Székely & Rizzo (2013) observed that the distance correlation may be adversely affected by the dimensions of Inline graphic and Inline graphic, and proposed an unbiased estimator of it when Inline graphic and Inline graphic are high-dimensional. In this paper, we shall demonstrate that the distance correlation may be less efficient in detecting nonlinear dependence when the assumption Inline graphic is violated. To remove this moment condition, Benjamini et al. (2013) suggested using ranks of distances, but this involves the selection of several tuning parameters, the choice of which is an open problem. The asymptotic properties of a test based on ranks of distances also need further investigation.

We propose using projection correlation to characterize dependence between Inline graphic and Inline graphic. Projection correlation first projects the multivariate random vectors into a series of univariate random variables, then detects nonlinear dependence by calculating the Pearson correlation between the dichotomized univariate random variables. The projection correlation between Inline graphic and Inline graphic, denoted by Inline graphic, is nonnegative and equals zero if and only if Inline graphic and Inline graphic are independent, so it is generally applicable as an index for measuring the degree of nonlinear dependence without moment conditions, normality or ellipticity (Tracz et al., 1992). The projection correlation test for independence is consistent against all dependence alternatives. The projection correlation is free of tuning parameters and is invariant to orthogonal transformation. We shall show that the sample estimator of projection correlation is Inline graphic-consistent if Inline graphic and Inline graphic are independent and root-Inline graphic-consistent otherwise. We conduct Monte Carlo studies to evaluate the finite-sample performance of the projection correlation test. The results indicate that the projection correlation is less sensitive to the dimensions of Inline graphic and Inline graphic than the distance correlation and even its improved version (Székely & Rizzo, 2013), and is more powerful than both the distance correlation and ranks of distances, especially when the dimensions of Inline graphic and Inline graphic are relatively large or the moment conditions required by the distance correlation are violated.

2. Projection correlation

2.1. Motivation

In this section, we propose a new measure of dependence between two random vectors. Testing that Inline graphic and Inline graphic are independent is equivalent to testing whether Inline graphic and Inline graphic are independent for all unit vectors Inline graphic and Inline graphic. Let Inline graphic denote the joint distribution of Inline graphic, and let Inline graphic and Inline graphic denote the marginal distributions of Inline graphic and Inline graphic. Given Inline graphic and Inline graphic, Inline graphic and Inline graphic are independent if and only if Inline graphic, for all Inline graphic. Therefore, testing whether Inline graphic and Inline graphic are independent amounts to testing whether

graphic file with name M76.gif (1)

Suppose that Inline graphic is a random sample of Inline graphic. Using the first five independent copies of Inline graphic, we rewrite the left-hand side of (1) as

graphic file with name M80.gif

Consequently, by Fubini’s theorem, Inline graphic and Inline graphic are independent if and only if

graphic file with name M83.gif (2)

In general, integration over the Inline graphic-dimensional space Inline graphic is not straightforward. Lemma 1 enables us to derive an explicit form for (2).

Lemma 1

(Escanciano, 2006). For two arbitrary vectors Inline graphic, we have

[display equation of Lemma 1; graphic not reproduced]

where Inline graphic, Inline graphic is the gamma function and Inline graphic is the inverse cosine function.
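The display of Lemma 1 is not reproduced in this copy, so the following is only a numerical sketch under a stated assumption. It assumes the lemma has the classical form in which the integral of I(α'U1 ≤ 0) I(α'U2 ≤ 0) over the unit sphere in p dimensions equals c_p {π − arccos(U1'U2 / (‖U1‖ ‖U2‖))} with c_p = π^(p/2 − 1) / Γ(p/2). This follows from the fact that a uniformly distributed unit vector lies in both half-spaces with probability (π − angle)/(2π). The function names are hypothetical.

```python
import math
import random

def angle(u, v):
    """Angle in [0, pi] between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

def random_unit_vector(p, rng):
    """Uniform draw from the unit sphere in R^p via a normalised Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(a * a for a in z))
    return [a / norm for a in z]

def sphere_integral_mc(u, v, draws, rng):
    """Monte Carlo estimate of the sphere integral of I(a'u <= 0) I(a'v <= 0)."""
    p = len(u)
    hits = 0
    for _ in range(draws):
        a = random_unit_vector(p, rng)
        proj_u = sum(x * y for x, y in zip(a, u))
        proj_v = sum(x * y for x, y in zip(a, v))
        if proj_u <= 0 and proj_v <= 0:
            hits += 1
    # Surface area of the unit sphere in R^p times the hit frequency.
    surface_area = 2.0 * math.pi ** (p / 2) / math.gamma(p / 2)
    return surface_area * hits / draws

def sphere_integral_closed_form(u, v):
    """Assumed closed form: c_p * (pi - angle), c_p = pi^(p/2 - 1) / Gamma(p/2)."""
    p = len(u)
    c_p = math.pi ** (p / 2 - 1) / math.gamma(p / 2)
    return c_p * (math.pi - angle(u, v))

rng = random.Random(0)
u = [1.0, 0.5, -2.0, 0.3]
v = [-0.4, 1.5, 0.7, 1.1]
mc = sphere_integral_mc(u, v, 50000, rng)
exact = sphere_integral_closed_form(u, v)
print(mc, exact)  # the two values should agree up to Monte Carlo error
```

The point of the closed form is that an integral over all projection directions reduces to a single inverse cosine, which is what makes an explicit expression for (2) possible.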

Lemma 1 yields an explicit formula for the left-hand side of (2). Ignoring the constants irrelevant to the joint distribution of Inline graphic, we define the resultant explicit formula as the squared projection covariance between Inline graphic and Inline graphic. To be precise, define

graphic file with name M94.gif (3)

where Inline graphic, Inline graphic and Inline graphic are defined in an obvious manner. We provide details of the derivation of (3) in the Appendix. A distinctive feature of Inline graphic is that it uses only vectors of the form Inline graphic and Inline graphic, whose second moments always equal unity, regardless of the dimensions of the random vectors. This indicates that the projection covariance removes the moment restrictions on Inline graphic required by the distance correlation.

Define the projection correlation between Inline graphic and Inline graphic, denoted by Inline graphic, as the square root of

graphic file with name M105.gif

and set Inline graphic if Inline graphic or Inline graphic. Proposition 1 presents the appealing properties of the projection correlation at the population level.

Proposition 1.

  • (i) In general, Inline graphic. In particular, Inline graphic if and only if Inline graphic and Inline graphic are independent, and Inline graphic if and only if Inline graphic almost surely.

  • (ii) Let Inline graphic and Inline graphic be two orthonormal matrices, Inline graphic and Inline graphic be two vectors, and Inline graphic and Inline graphic be two scalars. Then Inline graphic.

The first statement indicates that the projection correlation is generally applicable as an index to measure dependence. The second statement implies that, although it is not affine-invariant, the projection correlation is invariant with respect to the group of orthogonal transformations.
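The invariance in the second statement can be illustrated numerically. The sketch below assumes that the building blocks of the projection covariance are angles between differences of observations, which is an assumption here because the defining displays are not reproduced in this copy. It checks that such an angle is unchanged by a shift, a nonzero scalar and an orthonormal matrix, and that it does change under a non-orthogonal stretch. All names are hypothetical.

```python
import math

def angle(u, v):
    """Angle in [0, pi] between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

def householder(w):
    """Orthonormal reflection matrix I - 2 w w' / (w'w) for a nonzero vector w."""
    s = sum(a * a for a in w)
    p = len(w)
    return [[(1.0 if i == j else 0.0) - 2.0 * w[i] * w[j] / s for j in range(p)]
            for i in range(p)]

def affine(point, matrix, shift, scale):
    """Return shift + scale * matrix @ point."""
    p = len(point)
    return [shift[i] + scale * sum(matrix[i][j] * point[j] for j in range(p))
            for i in range(p)]

def diff(a, b):
    return [x - y for x, y in zip(a, b)]

# Three arbitrary observations in R^3.
x_k, x_l, x_r = [1.0, 2.0, -1.0], [0.5, -0.3, 2.0], [-1.0, 0.0, 0.7]
before = angle(diff(x_k, x_r), diff(x_l, x_r))

# Orthonormal matrix, shift vector and nonzero scalar.
q = householder([1.0, -2.0, 0.5])
shift, scale = [3.0, -1.0, 4.0], -2.5
t_k, t_l, t_r = (affine(pt, q, shift, scale) for pt in (x_k, x_l, x_r))
after = angle(diff(t_k, t_r), diff(t_l, t_r))

# A non-orthogonal linear map (stretching the second coordinate) for contrast.
stretch = [[1.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 1.0]]
zero = [0.0, 0.0, 0.0]
s_k, s_l, s_r = (affine(pt, stretch, zero, 1.0) for pt in (x_k, x_l, x_r))
stretched = angle(diff(s_k, s_r), diff(s_l, s_r))

print(before, after, stretched)
```

The first two printed angles coincide up to rounding, while the third differs, which matches the remark that the measure is orthogonally invariant but not affine-invariant.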

2.2. Asymptotic properties

We give two equivalent estimators for Inline graphic and study their asymptotic properties. The first estimate is built upon the Inline graphic-statistic (Serfling, 1980), given by the square root of

graphic file with name M124.gif

Here Inline graphic, Inline graphic and Inline graphic are defined in an obvious fashion and are the estimates of Inline graphic, Inline graphic and Inline graphic, respectively. The Inline graphic-statistic estimate appears natural, yet it is difficult to calculate (Székely & Rizzo, 2010). Therefore, we give an equivalent form below. Define, for Inline graphic,

graphic file with name M133.gif

To avoid possible confusion, we define Inline graphic if Inline graphic or Inline graphic. The second sample estimate of Inline graphic is defined by

graphic file with name M138.gif

Accordingly, the sample estimate of Inline graphic is defined by the square root of

graphic file with name M140.gif

In general, Inline graphic is easier to compute than Inline graphic. Although it may not be immediately obvious that Inline graphic, this fact will become clear from Theorem 1.
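Because the displays defining the second estimator are not reproduced in this copy, the following is a hedged sketch rather than the authors' code. It assumes that a_klr is the inverse cosine of the normalised inner product of X_k − X_r and X_l − X_r (zero when k = r or l = r), that A_klr is obtained by double centring a_klr over k and l for each fixed r, that the squared projection covariance estimate is n^(-3) times the sum of A_klr B_klr, and that the squared projection correlation divides this by the square root of the product of the two self-covariance estimates. The function names, and the convention of a zero angle for tied observations, are assumptions of this sketch.

```python
import math
import random

def _angle_array(points, r):
    """n x n array of angles between (X_k - X_r) and (X_l - X_r)."""
    n = len(points)
    diffs = [[a - b for a, b in zip(points[k], points[r])] for k in range(n)]
    norms = [math.sqrt(sum(d * d for d in row)) for row in diffs]
    out = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            # Zero when k == r or l == r; identical vectors (k == l) have zero angle.
            # Assumption: tied observations (zero norm) are also given a zero angle.
            if k == r or l == r or k == l or norms[k] == 0.0 or norms[l] == 0.0:
                continue
            cos = sum(a * b for a, b in zip(diffs[k], diffs[l])) / (norms[k] * norms[l])
            out[k][l] = math.acos(max(-1.0, min(1.0, cos)))
    return out

def _double_centre(a):
    """Subtract row and column means and add back the grand mean."""
    n = len(a)
    row = [sum(a[k]) / n for k in range(n)]
    col = [sum(a[k][l] for k in range(n)) / n for l in range(n)]
    grand = sum(row) / n
    return [[a[k][l] - row[k] - col[l] + grand for l in range(n)] for k in range(n)]

def pcov_sq(x, y):
    """Assumed form of the squared projection covariance estimate."""
    n = len(x)
    total = 0.0
    for r in range(n):
        a = _double_centre(_angle_array(x, r))
        b = _double_centre(_angle_array(y, r))
        total += sum(a[k][l] * b[k][l] for k in range(n) for l in range(n))
    return total / n ** 3

def projection_correlation(x, y):
    """Assumed form of the sample projection correlation; zero if a denominator vanishes."""
    sxy, sxx, syy = pcov_sq(x, y), pcov_sq(x, x), pcov_sq(y, y)
    if sxx <= 0.0 or syy <= 0.0:
        return 0.0
    return math.sqrt(max(sxy, 0.0) / math.sqrt(sxx * syy))

rng = random.Random(1)
x = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(12)]
y_dep = [[a * a + b] for a, b in x]                   # depends on x
y_ind = [[rng.gauss(0.0, 1.0)] for _ in range(12)]    # drawn independently of x
print(projection_correlation(x, y_dep), projection_correlation(x, y_ind))
```

This direct version costs on the order of n^3 operations per evaluation and is meant only to make the structure of the estimator concrete. Only angles enter the computation, so no moments of the observations are required.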

Theorem 1.

For a given random sample Inline graphic,

[first display equation of Theorem 1; graphic not reproduced]

and both equal

[second display equation of Theorem 1; graphic not reproduced]

where Inline graphic, Inline graphic and Inline graphic stand for the empirical distributions of Inline graphic, Inline graphic and Inline graphic, respectively, Inline graphic, and Inline graphic.

The following theorems state the consistency of Inline graphic and Inline graphic.

Theorem 2.

For a given random sample Inline graphic, Inline graphic almost surely.

Theorem 3.

  • (i) If Inline graphic and Inline graphic are independent, then as Inline graphic, Inline graphic converges in distribution to Inline graphic where the Inline graphic depend on the distribution of Inline graphic and are nonnegative with sum equal to one, and the Inline graphic are independent standard normal random variables.

  • (ii) If Inline graphic and Inline graphic are not independent, then Inline graphic converges in distribution to a normal distribution with mean zero and variance Inline graphic, where the random variable Inline graphic is defined in (A2). Consequently, Inline graphic diverges to Inline graphic.

The projection correlation test is built upon the test statistic Inline graphic, which converges in distribution to the quadratic form if Inline graphic and Inline graphic are independent and diverges to Inline graphic otherwise. Theorem 3 suggests that the projection correlation test is consistent against all dependence alternatives without requiring any moment conditions. Because the weights Inline graphic in the quadratic form are unknown, the asymptotic null distribution is intractable. To put the projection correlation test into practice, we approximate the asymptotic null distribution of Inline graphic through a random permutation method. Specifically, we calculate replicates of the test statistic under random permutations of the indices of the Inline graphic sample or, equivalently, the Inline graphic sample. The Inline graphic-value obtained from this permutation procedure is defined as the fraction of replicates of the test statistic under random permutations that are at least as large as the observed test statistic. Throughout our simulations, we use 2000 replications and obtain very good control of the Type I error rates. The permutation procedure is computationally feasible owing to the simple form of the test statistic. Computer code for implementing the projection correlation test and the permutation procedure is available from the authors upon request.
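The permutation procedure described above can be sketched generically as follows. The function takes any test statistic as an argument; the covariance-based statistic in the demonstration is a stand-in chosen only to keep the example short, and is not the projection correlation statistic. The names `permutation_p_value` and `abs_covariance` are hypothetical.

```python
import random

def permutation_p_value(x, y, statistic, replicates=2000, seed=None):
    """Fraction of permutation replicates at least as large as the observed statistic."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    indices = list(range(len(y)))
    count = 0
    for _ in range(replicates):
        # Permute the indices of the y sample while keeping the x sample fixed.
        rng.shuffle(indices)
        permuted_y = [y[i] for i in indices]
        if statistic(x, permuted_y) >= observed:
            count += 1
    return count / replicates

def abs_covariance(x, y):
    """Stand-in statistic for the demonstration only: absolute sample covariance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n)

# Strongly dependent toy data: y is a linear function of x.
x = [float(i) for i in range(30)]
y = [2.0 * v + 1.0 for v in x]
p = permutation_p_value(x, y, abs_covariance, replicates=500, seed=0)
print(p)
```

The p-value here follows the definition in the text, namely the plain fraction of replicates at least as large as the observed value. Permuting one sample breaks any dependence between the two samples while preserving both marginal distributions, which is why the replicates approximate the null distribution.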

3. Simulations

In this section, we conduct simulations to compare the performance of independence tests based on the projection correlation, the distance correlation and the ranks of distances (Benjamini et al., 2013). These three tests are consistent and suitable for arbitrary dimensions. Because the distance-correlation-based test is sensitive to the dimensions of random vectors, throughout our simulations we use its improved version recommended by Székely & Rizzo (2013).

We consider three simulated examples in which the dimensions of both Inline graphic and Inline graphic, denoted by Inline graphic and Inline graphic, respectively, are relatively large for the sample size Inline graphic. We set Inline graphic for simplicity. In Example 1, we set Inline graphic and vary Inline graphic from 15 to 30. In Example 2, we set Inline graphic and vary Inline graphic from 10 to 30. We also vary Inline graphic from 30 to 60 and Inline graphic from 10 to 30. In Example 3, we set Inline graphic and vary Inline graphic from 20 to 40. The dependence structure is monotone in Example 1 and nonmonotone in Example 2. The dependence structure is much more complicated in Example 3, where the random vectors are drawn from a mixture of distributions.

All simulations are implemented in R (R Development Core Team, 2017). We implement the test based on distance correlation by calling the dcor.ttest function in the energy package and the test based on ranks of distances by calling the hhg.test function in the HHG package. We repeat each setting 2000 times and report the size and power of the respective tests at significance levels Inline graphic 0·01 and 0·05.

Example 1. We consider three scenarios in this example.

  • (1a) Draw Inline graphic independently from a Cauchy distribution. Let Inline graphic (Inline graphic) and draw Inline graphic (Inline graphic) independently from a standard normal distribution.

  • (1b) This is identical to scenario (1a), except that Inline graphic are sampled independently from the Cauchy distribution.

  • (1c) This is identical to scenario (1a), except that Inline graphic, for Inline graphic are sampled independently from a standard normal distribution.

In the above scenarios, we set Inline graphic, 2, 4, 6, 8 and 10, where Inline graphic indicates that Inline graphic and Inline graphic are independent. Table 1 charts the empirical size and power of the tests based on projection correlation, ranks of distances, and distance correlation at significance levels Inline graphic 0·01 and 0·05. In all scenarios, the empirical sizes are very close to the significance levels, even when Inline graphic 0·01. The test based on projection correlation has higher power than those based on distance correlation or ranks of distances, especially in scenarios (1a) and (1b), where the distributions of the random vectors are all heavy-tailed. The test based on distance correlation fails in the first two scenarios, partly because the moment restrictions required by this test are violated. The test based on ranks of distances is slightly worse than our test based on projection correlation.

Table 1.

Empirical size and power Inline graphic of the tests based on projection correlation, ranks of distances and distance correlation for different Inline graphic in Example 1 with Inline graphic and Inline graphic. All numbers reported in this table are multiplied by Inline graphic

Scenario Inline graphic Test Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
(1a) Inline graphic Projection correlation 1·1 28·4 53·4 70·0 80·1 88·1
    Ranks of distances 0·9 2·7 8·1 20·1 36·2 52·3
    Distance correlation 1·3 2·9 3·4 4·1 4·2 4·4
  Inline graphic Projection correlation 4·9 52·9 75·1 87·7 93·0 96·0
    Ranks of distances 4·9 11·2 25·1 42·2 60·4 75·2
    Distance correlation 6·1 5·3 5·4 6·3 6·4 6·1
(1b) Inline graphic Projection correlation 1·2 20·8 41·7 61·5 76·3 84·8
    Ranks of distances 0·9 2·3 8·0 19·5 35·3 51·3
    Distance correlation 1·4 3·0 4·0 4·2 4·5 5·0
  Inline graphic Projection correlation 5·2 39·9 64·8 81·4 89·5 94·6
    Ranks of distances 5·2 8·3 21·1 38·8 60·4 76·1
    Distance correlation 4·5 4·8 5·9 5·3 5·9 6·6
(1c) Inline graphic Projection correlation 1·0 65·1 98·8 100 100 100
    Ranks of distances 0·9 20·3 58·9 87·1 96·6 99·5
    Distance correlation 0·9 67·0 98·3 100 100 100
  Inline graphic Projection correlation 4·5 82·0 99·6 100 100 100
    Ranks of distances 4·7 37·7 79·6 96·2 99·1 100
    Distance correlation 5·1 82·3 99·5 100 100 100

We also vary the dimensions of Inline graphic and Inline graphic from 15 to 30 and fix Inline graphic in scenarios (1a) and (1b) and Inline graphic in scenario (1c). The simulation results are summarized in Table 2. The power of all tests diminishes quickly as Inline graphic increases. Table 2 indicates that the test based on projection correlation is much less sensitive to the increase of dimensions than the other two tests.

Table 2.

Power Inline graphic of the tests based on projection correlation, ranks of distances and distance correlation for different Inline graphic in Example 1 with Inline graphic and Inline graphic. All numbers reported in this table are multiplied by Inline graphic

Scenario Inline graphic Inline graphic Test Inline graphic Inline graphic Inline graphic Inline graphic
(1a) 10 Inline graphic Projection correlation 98·2 88·1 74·1 59·5
      Ranks of distances 81·6 52·3 35·6 24·2
      Distance correlation 7·9 4·4 4·6 2·7
    Inline graphic Projection correlation 99·8 96·0 89·4 79·3
      Ranks of distances 93·7 75·2 60·6 46·8
      Distance correlation 10·5 6·1 6·0 4·0
(1b) 10 Inline graphic Projection correlation 97·0 84·8 70·7 54·7
      Ranks of distances 79·8 51·3 34·9 24·8
      Distance correlation 7·6 5·0 3·1 2·5
    Inline graphic Projection correlation 99·5 94·6 87·4 77·6
      Ranks of distances 93·7 76·1 60·3 45·5
      Distance correlation 9·7 6·6 4·5 3·7
(1c) 2 Inline graphic Projection correlation 78·1 65·1 53·1 43·1
      Ranks of distances 31·7 20·3 13·7 9·2
      Distance correlation 79·5 67·0 54·1 43·4
    Inline graphic Projection correlation 89·7 82·0 72·2 63·3
      Ranks of distances 54·2 37·7 30·4 24·6
      Distance correlation 90·2 82·3 73·0 63·5

Example 2. We draw Inline graphic independently from the uniform distribution on Inline graphic. We generate Inline graphic (Inline graphic), where the Inline graphic are generated from the Cauchy distribution, and generate Inline graphic (Inline graphic) independently from the standard normal distribution. This model was also used in Escanciano (2006) for different purposes. In this example, Inline graphic indicates that Inline graphic and Inline graphic are independent.

We first fix Inline graphic and vary Inline graphic from 10 to 30. The empirical size and power are displayed in Table 3 for Inline graphic 0, 5, 15 and 25 and Inline graphic 10, 20 and 30. All empirical sizes are close to the significance level. In this example, the moment conditions required by the test based on distance correlation are satisfied. The tests based on projection correlation and on distance correlation are more powerful than that based on ranks of distances, which appears to be very ineffective in this example, partly because the dependence structure is nonmonotone and the dependence strength is very weak.

Table 3.

Empirical size and power Inline graphic of the tests based on projection correlation, ranks of distances and distance correlation for different Inline graphic in Example 2 with Inline graphic

Inline graphic Inline graphic Test Inline graphic Inline graphic Inline graphic Inline graphic
10 Inline graphic Projection correlation 1·3 6·2 7·1 7·1
    Ranks of distances 1·2 1·7 2·2 1·9
    Distance correlation 1·3 6·2 6·5 5·7
  Inline graphic Projection correlation 5·4 18·8 20·7 20·4
    Ranks of distances 4·3 7·6 7·3 7·2
    Distance correlation 5·5 16·5 18·6 17·0
20 Inline graphic Projection correlation 1·2 21·5 26·3 28·2
    Ranks of distances 0·7 4·3 4·2 4·8
    Distance correlation 1·0 18·7 20·2 23·2
  Inline graphic Projection correlation 5·4 46·4 52·4 55·8
    Ranks of distances 4·4 13·4 13·1 12·5
    Distance correlation 5·2 40·5 43·3 45·2
30 Inline graphic Projection correlation 1·4 50·6 63·5 63·4
    Ranks of distances 0·9 8·2 9·3 9·5
    Distance correlation 1·5 42·4 51·3 51·5
  Inline graphic Projection correlation 5·1 78·4 87·0 85·9
    Ranks of distances 4·7 21·0 24·1 23·0
    Distance correlation 5·2 69·3 75·7 76·1

Next we fix Inline graphic and vary Inline graphic from 10 to 30 and Inline graphic from 30 to 60. Table 4 shows that, provided Inline graphic, say, the test based on projection correlation results in much less power loss across almost all scenarios as the dimensions of Inline graphic and Inline graphic increase.

Table 4.

The power Inline graphic of the tests based on projection correlation, ranks of distances and distance correlation for different Inline graphic in Example 2 with Inline graphic

Inline graphic Inline graphic Test Inline graphic Inline graphic Inline graphic Inline graphic
10 Inline graphic Projection correlation 18·5 13·2 11·1 9·3
    Ranks of distances 4·1 2·7 2·6 2·6
    Distance correlation 14·9 11·1 8·5 7·5
  Inline graphic Projection correlation 42·8 34·4 28·4 26·2
    Ranks of distances 13·1 10·6 8·8 8·7
    Distance correlation 34·9 27·6 23·7 21·7
20 Inline graphic Projection correlation 79·7 67·7 53·6 47·0
    Ranks of distances 20·1 15·9 10·8 8·7
    Distance correlation 66·7 55·8 42·4 37·6
  Inline graphic Projection correlation 95·2 88·9 80·2 74·3
    Ranks of distances 39·6 31·8 24·5 22·3
    Distance correlation 86·9 79·5 69·0 62·2
30 Inline graphic Projection correlation 99·6 97·5 93·8 86·9
    Ranks of distances 49·3 33·9 25·3 19·9
    Distance correlation 97·6 92·1 85·6 75·9
  Inline graphic Projection correlation 100 99·9 99·4 97·4
    Ranks of distances 72·6 55·9 45·8 38·6
    Distance correlation 99·7 98·1 96·1 92·2

Example 3. This example was used in Benjamini et al. (2013). We fix Inline graphic and vary Inline graphic from 20 to 40. We draw Inline graphic from a mixture distribution with 10 equally likely components. In the Inline graphicth component, for Inline graphic, Inline graphic are random vectors Inline graphic, where Inline graphic and Inline graphic are sampled independently from a multivariate standard normal distribution, and Inline graphic are sampled independently from a multivariate Cauchy or multivariate Inline graphic distribution with three degrees of freedom and the identity correlation matrix. The dependence of Inline graphic and Inline graphic is through the fixed pairs Inline graphic (Inline graphic), such that the data consist of ten clouds around these pairs.

The simulations are summarized in Table 5. The dependence of Inline graphic and Inline graphic is through the ten equally likely components. The test based on projection correlation performs better than that based on distance correlation, especially when the moment requirements are not satisfied. The improved version of the test based on distance correlation is designed for high dimensions, and its performance appears satisfactory when the moments exist. Again, for the multivariate Cauchy distribution, the test based on projection correlation outperforms that based on distance correlation significantly in this example.

Table 5.

The power Inline graphic of the tests based on projection correlation, ranks of distances and distance correlation for different Inline graphic in Example 3 with Inline graphic

Inline graphic Test Inline graphic Inline graphic Inline graphic
Inline graphic Projection correlation 17·1 34·0 100 100
  Ranks of distances 1·5 6·3 5·9 16·4
  Distance correlation 10·2 18·0 100 100
Inline graphic Projection correlation 36·7 58·7 100 100
  Ranks of distances 1·9 8·8 10·0 30·5
  Distance correlation 10·3 18·2 100 100
Inline graphic Projection correlation 59·5 81·6 100 100
  Ranks of distances 2·1 9·2 15·3 42·4
  Distance correlation 9·5 17·1 100 100

In our simulations, the test based on projection correlation exhibits a good capability for testing monotone and nonmonotone dependence. Our limited experience indicates that it is very effective even when the second moments are large or infinite, that it is useful for limiting the power loss as the dimensions of the random vectors increase, and that it is suitable even in high-dimensional cases.

Acknowledgement

The authors would like to thank Ms Amanda Applegate, the associate editor and two reviewers for their constructive comments. Li and Zhong are the corresponding authors. This research was supported by the National Natural Science Foundation of China, National Science Foundation of USA, Chinese Ministry of Education Project of Key Research Institute of Humanities and Social Sciences at Universities, National Institute of Drug Abuse and National Institutes of Health of USA, and National Youth Top-notch Talent Support Program of China.

Appendix

Proofs

We first show that by invoking Lemma 1 repeatedly, the squared projection covariance Inline graphic has an explicit form. In other words, we aim to show that

graphic file with name M331.gif

For notational clarity, we define

graphic file with name M332.gif

All the indices in Inline graphic and Inline graphic may take value 1, 2, 3, 4 or 5. Invoking Lemma 1 repeatedly, we obtain

graphic file with name M335.gif

The last equality follows from Inline graphic and Inline graphic.

Proof of Proposition 1

We prove the first assertion. The statement that Inline graphic is a direct consequence of the Cauchy–Schwarz inequality. By definition, Inline graphic indicates that Inline graphic and Inline graphic are independent for any Inline graphic and Inline graphic. In other words, Inline graphic if and only if Inline graphic and Inline graphic are statistically independent. In addition, Inline graphic indicates that Inline graphic must be a constant vector, because otherwise Inline graphic would not be independent of itself.

By definition, Inline graphic, and all the Inline graphic involve quantities of the form Inline graphic and Inline graphic. It is easy to verify that both Inline graphic and Inline graphic are invariant with respect to orthogonal transformations, which completes the proof of the second assertion.

Proof of Theorem 1

We first prove that Inline graphic. Recall the definitions of Inline graphic and Inline graphic. Define

graphic file with name M359.gif

We further define

graphic file with name M360.gif

It can be verified that Inline graphic and Inline graphic. It follows that

graphic file with name M363.gif

which completes the proof of the first part.

Next we prove that Inline graphic is equal to

graphic file with name M365.gif

Invoking Lemma 1, we have

graphic file with name M366.gif

Following similar arguments, we obtain

graphic file with name M367.gif

The above two results yield

graphic file with name M368.gif

The proof of Theorem 1 is complete.

Proof of Theorem 2

By definition, Inline graphic. By the strong law of large numbers for Inline graphic-statistics (Serfling, 1980), it follows that almost surely Inline graphic, Inline graphic and Inline graphic. Therefore, Inline graphic almost surely. This completes the proof.

Proof of Theorem 3

Define the empirical process

graphic file with name M375.gif

where Inline graphic and Inline graphic. When Inline graphic and Inline graphic are independent, Inline graphic converges in distribution to a zero-mean Gaussian random process Inline graphic with covariance function

graphic file with name M382.gif

Next we define an approximation of Inline graphic, denoted by Inline graphic, as follows:

graphic file with name M385.gif

We first prove that Inline graphic holds uniformly for Inline graphic with Inline graphic and Inline graphic. It is easy to verify that

graphic file with name M390.gif

Invoking the uniform law of large numbers of Jennrich (1969) or the generalization by Wolfowitz (1954) of the Glivenko–Cantelli theorem, we know that Inline graphic uniformly for Inline graphic with Inline graphic. Using Theorem 2.5.2 in van der Vaart & Wellner (1996), we can show that Inline graphic converges to a Gaussian process with zero mean and variance-covariance function Inline graphic. Therefore, Inline graphic holds uniformly for Inline graphic.

Using Theorem 2.5.2 in van der Vaart & Wellner (1996) again, we can show that the finite-dimensional distributions of Inline graphic converge to Inline graphic which implies that Inline graphic is asymptotically tight. Therefore, for a random continuous functional, Lemma 3.1 in Chang (1990) yields

graphic file with name M401.gif

and converges in distribution to Inline graphic When Inline graphic and Inline graphic are independent, Inline graphic is a zero-mean process. According to Kuo (1975, Ch. 1, § 2),

graphic file with name M406.gif (A1)

follows the same distribution as Inline graphic, where the Inline graphic are independent standard normal random variables, and in general, the nonnegative constants Inline graphic depend on the distribution of Inline graphic.

Next we derive the sum of the Inline graphic. In view of (A1), we easily find that

graphic file with name M412.gif

Next we calculate the sum of Inline graphic. If Inline graphic and Inline graphic are independent, then

graphic file with name M416.gif

Using Lemma 1, the right-hand side of the above equation is equal to

graphic file with name M417.gif

By the strong law of large numbers for Inline graphic-statistics, we complete the proof of the first part.

Next, we deal with the second part. We approximate Inline graphic with the Inline graphic-statistics Inline graphic, which can be approximated with their projections. The projections of the Inline graphic-statistics are averages of independent and identically distributed random variables, and thus the asymptotic normality follows. Define Inline graphic to be the number of Inline graphic combinations from a set of Inline graphic elements. Define the Inline graphic-statistic

graphic file with name M427.gif

with the kernel Inline graphic, where Inline graphic is the permutation of three distinct elements Inline graphic. Define the Inline graphic-statistic

graphic file with name M432.gif

with the kernel Inline graphic, where Inline graphic is the permutation of three distinct elements Inline graphic. Define the Inline graphic-statistic

graphic file with name M437.gif

with the kernel Inline graphic, where Inline graphic is the permutation of three distinct elements Inline graphic. Using standard Inline graphic- and Inline graphic-statistic theory, we have

graphic file with name M443.gif

where the Inline graphic are the centralized projections of the U-statistics Inline graphic, which are defined as

graphic file with name M447.gif (A2)

All the Inline graphic are independent and identically distributed. The second part of Theorem 3 can be proved with the classical central limit theorem.
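The construction used in this part of the proof, a statistic that averages a kernel over all triples of distinct observations with the kernel itself averaged over the permutations of its three arguments, can be sketched as follows. This is a minimal sketch under stated assumptions: the kernels of the proof are not recoverable from the text, so `toy_kernel` is a hypothetical stand-in, and `symmetrize` and `u_statistic` are illustrative names rather than the authors' implementation.

```python
from itertools import combinations, permutations
from math import comb


def symmetrize(kernel):
    """Average a three-argument kernel over all 3! orderings of its arguments."""
    def sym(a, b, c):
        vals = [kernel(*p) for p in permutations((a, b, c))]
        return sum(vals) / len(vals)
    return sym


def u_statistic(sample, kernel):
    """Order-3 U-statistic: mean of the symmetrized kernel over all 3-subsets."""
    sym = symmetrize(kernel)
    n = len(sample)
    total = sum(
        sym(sample[i], sample[j], sample[k])
        for i, j, k in combinations(range(n), 3)
    )
    # comb(n, 3) is the number of 3-combinations from a set of n elements.
    return total / comb(n, 3)


def toy_kernel(a, b, c):
    """Hypothetical kernel for illustration; its expectation is the variance."""
    return (a - b) * (a - c)


sample = [0.0, 1.0, 3.0, 4.0]
value = u_statistic(sample, toy_kernel)
print(value)
```

With this toy kernel the statistic coincides with the unbiased sample variance, which gives a simple check that the averaging over triples and permutations is implemented as intended.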

References

  1. Benjamini, Y., Madar, V. & Stark, P. B. (2013). A consistent multivariate test of association based on ranks of distances. Biometrika 100, 503–10.
  2. Blum, J. R. (1961). Distribution free tests of independence based on the sample distribution function. Ann. Math. Statist. 32, 485–98.
  3. Chang, M. N. (1990). Weak convergence of a self-consistent estimator of the survival function with doubly censored data. Ann. Statist. 18, 391–404.
  4. Escanciano, J. C. (2006). A consistent diagnostic test for regression models using projections. Economet. Theory 22, 1030–51.
  5. Gieser, P. W. & Randles, R. H. (1997). A nonparametric test of independence between two vectors. J. Am. Statist. Assoc. 92, 561–7.
  6. Hettmansperger, T. P. & Oja, H. (1994). Affine invariant multivariate multisample sign tests. J. R. Statist. Soc., B 56, 235–49.
  7. Hoeffding, W. (1948). A non-parametric test of independence. Ann. Math. Statist. 19, 546–57.
  8. Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28, 321–77.
  9. Jennrich, R. I. (1969). Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist. 40, 633–43.
  10. Kendall, M. G. (1938). A new measure of rank correlation. Biometrika 30, 81–93.
  11. Kuo, H. H. (1975). Gaussian Measures in Banach Spaces. Lecture Notes in Mathematics 463. Berlin: Springer.
  12. Puri, M. & Sen, P. (1971). Nonparametric Methods in Multivariate Analysis. New York: Wiley.
  13. R Development Core Team (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0, http://www.R-project.org.
  14. Serfling, R. L. (1980). Approximation Theorems in Mathematical Statistics. New York: Wiley.
  15. Spearman, C. (1904). The proof and measurement of association between two things. Am. J. Psychol. 15, 72–101.
  16. Székely, G. J. & Rizzo, M. L. (2010). Brownian distance covariance. Ann. Appl. Statist. 3, 1236–65.
  17. Székely, G. J. & Rizzo, M. L. (2013). The distance correlation t-test of independence in high dimension. J. Mult. Anal. 117, 193–213.
  18. Székely, G. J., Rizzo, M. L. & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35, 2769–94.
  19. Taskinen, S., Kankainen, A. & Oja, H. (2003). Sign test of independence between two random vectors. Statist. Prob. Lett. 62, 9–21.
  20. Taskinen, S., Oja, H. & Randles, R. H. (2005). Multivariate nonparametric tests of independence. J. Am. Statist. Assoc. 100, 916–25.
  21. Tracz, S. M., Elmore, P. B. & Pohlmann, J. T. (1992). Correlational meta-analysis: Independent and nonindependent cases. Educ. Psychol. Meas. 52, 879–88.
  22. van der Vaart, A. W. & Wellner, J. A. (1996). Weak Convergence and Empirical Processes. New York: Springer.
  23. Wilks, S. S. (1935). On the independence of k sets of normally distributed statistical variables. Econometrica 3, 309–26.
  24. Wolfowitz, J. (1954). Generalization of the theorem of Glivenko–Cantelli. Ann. Math. Statist. 25, 131–8.

Articles from Biometrika are provided here courtesy of Oxford University Press