On Brownian Distance Covariance and High Dimensional Data

Michael R Kosorok

doi:10.1214/09-AOAS312

Author manuscript; available in PMC: 2010 Jun 22.

Published in final edited form as: Ann Appl Stat. 2009 Jan 1;3(4):1266–1269. doi: 10.1214/09-AOAS312

PMCID: PMC2889501 NIHMSID: NIHMS195766 PMID: 20574547

Abstract

We discuss briefly the very interesting concept of Brownian distance covariance developed by Székely and Rizzo (2009) and describe two possible extensions. The first extension is for high dimensional data that can be coerced into a Hilbert space, including certain high throughput screening and functional data settings. The second extension involves very simple modifications that may yield increased power in some settings. We commend Székely and Rizzo for their very interesting work and recognize that this general idea has potential to have a large impact on the way in which statisticians evaluate dependency in data.

Keywords: Brownian distance covariance, Correlation, Hilbert spaces, U-statistics

1 Introduction and Assessment

The Brownian distance covariance and correlation proposed by Székely and Rizzo (2009) (abbreviated SR hereafter) is a very useful and elegant alternative to the standard measures of correlation and is based on several deep and non-trivial theoretical calculations developed earlier in Székely, Rizzo and Bakirov (2007) (abbreviated SRB hereafter). We congratulate the group on this very original and elegant work. The main result is that a single, simple statistic V_n(X, Y) can be used to assess whether two random vectors X and Y, of possibly different respective dimensions p and q, are dependent based on an i.i.d. sample.

The proposed statistic V_n(X, Y) estimates an interesting population parameter V₀(X, Y) that the authors demonstrate can also be expressed as the covariance between independent Brownian motions W and W′, with p and q dimensional indices, evaluated at X and Y, respectively. Specifically, let W: ℝ^p ↦ ℝ be a real valued, tight, mean-zero Gaussian process with covariance |s|_p + |t|_p − |s − t|_p, for s, t ∈ ℝ^p, where |·|_r is the standard Euclidean norm in ℝ^r. Let W′ be similarly defined but for indices s, t ∈ ℝ^q and norm |·|_q. It can be shown that V₀(X, Y) = E[W(X)W(X′)W′(Y)W′(Y′)], where (X′, Y′) is an independent copy of (X, Y), and where W and W′ are independent of both (X, Y) and (X′, Y′). This justifies the designation “Brownian distance covariance.”

By replacing Brownian motion with other stochastic processes, a very wide array of alternative forms of correlation between vectors X and Y can be generated. In the special case where p = q = 1 and the stochastic processes W and W′ are the non-random identity functions centered respectively at E(X) and E(Y), V₀(X, Y) = E[W(X)W(X′)W′(Y)W′(Y′)] = Cov²(X, Y), which is the standard Pearson product-moment covariance squared. Thus the results obtained by SR not only have a profound connection to Brownian motion, but also include traditional measures of dependence as special cases, while at the same time having the potential to generate many useful new measures of dependence through the use of other stochastic processes besides Brownian motion. This raises the very real possibility that a broadly applicable and unified theoretical and methodological framework for testing dependence could be developed.
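The reduction to the squared Pearson covariance in this special case can be checked in one line. The shorthand μ_X = E(X) and μ_Y = E(Y) is introduced here only for the derivation; with W(x) = x − μ_X and W′(y) = y − μ_Y non-random, the independence of (X, Y) and (X′, Y′) lets the expectation factor:

$$\begin{array}{l} E [W (X) W (X^{\prime}) W^{\prime} (Y) W^{\prime} (Y^{\prime})] = E [(X - \mu_{X}) (Y - \mu_{Y}) (X^{\prime} - \mu_{X}) (Y^{\prime} - \mu_{Y})] \\ \quad = E [(X - \mu_{X}) (Y - \mu_{Y})] \, E [(X^{\prime} - \mu_{X}) (Y^{\prime} - \mu_{Y})] = \mathrm{Cov}^{2} (X, Y) . \end{array}$$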

The SR paper is therefore not only important for the specific results contained therein but also for the possibly far reaching consequences for future statistical research in both theory and applications. For the remainder of the paper, we describe two possible extensions of these results. The first extension is for high dimensional data that can be coerced into a Hilbert space, including certain high throughput screening and functional data settings. The second extension involves very simple modifications that may yield increased power in some settings. We first present some initial results and consequences of SR and SRB that will prove useful in later developments. We then present the Hilbert space extension with a few example applications. Some modifications leading to potential variations in power will then be described. The paper will then conclude with a brief discussion.

2 Some Initial Results

We now present a few initial results which will be useful in later sections. For a paired sample of size n, (X₁, Y₁), …, (X_n, Y_n), of realizations of (X, Y), where X and Y are random variables from arbitrary normed spaces with respective norms ||·||_X and ||·||_Y, define, analogously to SR,

$$\begin{array}{l} T_{1} = \frac{1}{n^{2}} \sum_{k, l = 1}^{n} \| X_{k} - X_{l} \|_{X} \, \| Y_{k} - Y_{l} \|_{Y}, \\ T_{2} = \frac{1}{n^{2}} \sum_{k, l = 1}^{n} \| X_{k} - X_{l} \|_{X} \times \frac{1}{n^{2}} \sum_{k, l = 1}^{n} \| Y_{k} - Y_{l} \|_{Y}, \\ T_{3} = \frac{1}{n^{3}} \sum_{k = 1}^{n} \sum_{l, m = 1}^{n} \| X_{k} - X_{l} \|_{X} \, \| Y_{k} - Y_{m} \|_{Y}, \end{array}$$

and V_n(X, Y) = T₁ + T₂ − 2T₃. Also define

$$\begin{array}{l} T_{10} = E [\| X_{1} - X_{2} \|_{X} \, \| Y_{1} - Y_{2} \|_{Y}], \\ T_{20} = E [\| X_{1} - X_{2} \|_{X}] \times E [\| Y_{1} - Y_{2} \|_{Y}], \\ T_{30} = E [\| X_{1} - X_{2} \|_{X} \, \| Y_{1} - Y_{3} \|_{Y}], \end{array}$$

and V₀(X, Y) = T₁₀+T₂₀−2T₃₀. Also let V_n(X) = V_n(X, X) and V₀(X) = V₀(X, X); and let V_n(Y) = V_n(Y, Y) and V₀(Y) = V₀(Y, Y). This allows us to define also $R_{n} (X, Y) = V_{n} (X, Y) / \sqrt{V_{n} (X) V_{n} (Y)}$ and $R_{0} (X, Y) = V_{0} (X, Y) / \sqrt{V_{0} (X) V_{0} (Y)}$ , provided the denominators are non-zero (and defined to be zero otherwise). The main distinction between this and the definitions in SR is the use of arbitrary normed spaces.

Because this has a standard U-statistic structure, we have the following general result, the proof of which follows from standard theory for U-statistics (see, e.g., Chapter 12 of van der Vaart, 1998):

Lemma 1

Provided $E \| X \|_{X}^{4} < \infty$ and $E \| Y \|_{Y}^{4} < \infty$ , then $V_{n} (X, Y) \overset{P}{\to} V_{0} (X, Y), V_{n} (X) \overset{P}{\to} V_{0} (X)$ and $V_{n} (Y) \overset{P}{\to} V_{0} (Y)$ .

Remark 1

In the special case where X and Y are from finite-dimensional Euclidean spaces, we know from Theorems 1–4 of SR that V_n(X, Y), V_n(X), V_n(Y), V₀(X, Y), V₀(X) and V₀(Y) are all non-negative; that $V_{n} (X, Y) \leq \sqrt{V_{n} (X) V_{n} (Y)}$ and $V_{0} (X, Y) \leq \sqrt{V_{0} (X) V_{0} (Y)}$ ; that V₀(X) = 0 or V₀(Y) = 0 only when X or Y is trivial; that V_n(X) = 0 or V_n(Y) = 0 only when the X’s or Y’s in the sample are all identical; that 0 ≤ R_n(X, Y), R₀(X, Y) ≤ 1; and that V₀(X, Y) = 0 only when X and Y are independent.
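The sample quantities T₁, T₂, T₃, V_n and R_n defined in this section can be computed directly from pairwise distance matrices. The following is a minimal NumPy sketch for the Euclidean case; the function names (`pairwise_dist`, `v_n`, `r_n`) and the simulated data are illustrative assumptions, not part of SR or of this paper.

```python
import numpy as np

def pairwise_dist(Z):
    """n x n matrix of Euclidean distances between the rows of Z."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def v_n(a, b):
    """V_n = T1 + T2 - 2*T3 computed from two n x n distance matrices."""
    t1 = (a * b).mean()                             # (1/n^2) sum_{k,l} a_kl b_kl
    t2 = a.mean() * b.mean()                        # product of the two grand means
    t3 = (a.mean(axis=1) * b.mean(axis=1)).mean()   # (1/n^3) sum_k sum_{l,m} a_kl b_km
    return t1 + t2 - 2.0 * t3

def r_n(X, Y):
    """R_n(X, Y), set to zero when the denominator vanishes."""
    a, b = pairwise_dist(X), pairwise_dist(Y)
    denom = np.sqrt(max(v_n(a, a) * v_n(b, b), 0.0))
    return v_n(a, b) / denom if denom > 0 else 0.0

# Illustrative data (assumption): Y depends nonlinearly on part of X.
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))
Y = X[:, :2] ** 2 + 0.5 * rng.normal(size=(n, 2))

a, b = pairwise_dist(X), pairwise_dist(Y)
v_xy, v_x, v_y = v_n(a, b), v_n(a, a), v_n(b, b)
r_xy = r_n(X, Y)
```

On such a sample, the properties listed in Remark 1 (non-negativity, the Cauchy-Schwarz-type bound, and 0 ≤ R_n ≤ 1) can be checked numerically up to floating-point error.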

We now wish to generalize the above results in the finite-dimensional context to a class of norms broader than Euclidean norms. These results will be useful for later sections. Let A and B be respectively p × p and q × q symmetric, positive definite matrices. Let a “tilde” placed over T₁, T₂, T₃, V_n, V₀, etc., denote the quantity obtained by replacing |x|_p with $\| x \|_{A, p} = \sqrt{x^{\prime} A x}$ and |y|_q with $\| y \|_{B, q} = \sqrt{y^{\prime} B y}$ in V_n, V₀, etc. For example, ${\tilde{T}}_{1} = n^{- 2} \sum_{k, l = 1}^{n} \| X_{k} - X_{l} \|_{A, p} \, \| Y_{k} - Y_{l} \|_{B, q}$ . We now have the following very simple extension:

Lemma 2

Let A and B be symmetric and positive definite. Then Ṽ_n(X, Y), Ṽ_n(X), Ṽ_n(Y), Ṽ₀(X, Y), Ṽ₀(X) and Ṽ₀(Y) are all non-negative; and all of the other results in Remark 1 remain true with a “tilde” placed over the given quantities. Moreover, Ṽ₀(X, Y) = 0 if and only if V₀(X, Y) = 0.

Proof

For a symmetric, positive definite matrix C, let $C^{1/2}$ denote the symmetric square root of C, i.e., $C^{1/2} C^{1/2} = C$. Note that such a square root always exists and, moreover, is always positive definite. Now define $U = A^{1/2} X$ and $V = B^{1/2} Y$, and note that $|U|_{p} = \| X \|_{A, p}$ and $|V|_{q} = \| Y \|_{B, q}$. Now replace X and Y in the quantities listed in Remark 1 with U and V. By the symmetry properties of these norms, the first part of the lemma up to just before the last sentence is proved. The last sentence follows from the simple observation that U and V are independent if and only if X and Y are independent by the positive definiteness of $A^{1/2}$ and $B^{1/2}$. Since V₀(X, Y) = 0 if and only if X and Y are independent, we now conclude that Ṽ₀(X, Y) = 0 if and only if X and Y are independent. The entire lemma now follows.
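The change of variables used in this proof can be checked numerically: the statistic computed with the norms ||·||_A,p and ||·||_B,q on (X, Y) should agree with the Euclidean statistic on the transformed sample. The sketch below assumes NumPy; the helper names (`sym_sqrt`, `quad_dist`, `euclid_dist`, `v_n`) and the randomly generated A, B and data are illustrative assumptions.

```python
import numpy as np

def sym_sqrt(C):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(C)
    return (Q * np.sqrt(w)) @ Q.T

def euclid_dist(Z):
    """Pairwise Euclidean distances between rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def quad_dist(Z, C):
    """Pairwise distances sqrt((z_k - z_l)' C (z_k - z_l)) between rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    sq = np.einsum("kli,ij,klj->kl", diff, C, diff)
    return np.sqrt(np.maximum(sq, 0.0))

def v_n(a, b):
    """V_n = T1 + T2 - 2*T3 from two n x n distance matrices."""
    t3 = (a.mean(axis=1) * b.mean(axis=1)).mean()
    return (a * b).mean() + a.mean() * b.mean() - 2.0 * t3

# Illustrative data and matrices (assumptions).
rng = np.random.default_rng(1)
n, p, q = 40, 3, 2
X = rng.normal(size=(n, p))
Y = X[:, :q] ** 2 + 0.5 * rng.normal(size=(n, q))
G, H = rng.normal(size=(p, p)), rng.normal(size=(q, q))
A = G @ G.T + np.eye(p)   # symmetric positive definite
B = H @ H.T + np.eye(q)

# "Tilde" statistic with the A- and B-norms on the original sample.
v_tilde = v_n(quad_dist(X, A), quad_dist(Y, B))

# Euclidean statistic on U = A^{1/2} X, V = B^{1/2} Y (rows transformed;
# right-multiplication works because the square roots are symmetric).
U = X @ sym_sqrt(A)
V = Y @ sym_sqrt(B)
v_plain = v_n(euclid_dist(U), euclid_dist(V))
```

The two values `v_tilde` and `v_plain` should coincide up to rounding, which is the identity the proof relies on.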

The third initial result involves some non-trivial properties of independent components in the finite dimensional setting. Suppose for X ∈ ℝ^p and Y ∈ ℝ^q, where p = p₁ + p₂ and q = q₁ + q₂, we have

$$X = \begin{pmatrix} X^{(1)} + X^{(2)} \\ X^{(3)} \end{pmatrix}, \quad \text{and} \quad Y = \begin{pmatrix} Y^{(1)} + Y^{(2)} \\ Y^{(3)} \end{pmatrix},$$

where X⁽¹⁾, X⁽²⁾ ∈ ℝ^p₁, X⁽³⁾ ∈ ℝ^p₂, Y⁽¹⁾, Y⁽²⁾ ∈ ℝ^q₁, Y⁽³⁾ ∈ ℝ^q₂; and suppose also that the two vectors X̃ = ([X⁽²⁾]^T, [X⁽³⁾]^T)^T and Ỹ = ([Y⁽²⁾]^T, [Y⁽³⁾]^T)^T are mutually independent and also independent of X⁽¹⁾ and Y⁽¹⁾. We have the following somewhat surprising result:

Lemma 3

V₀(X, Y) = V₀(X⁽¹⁾, Y⁽¹⁾).

Proof

For any t ∈ ℝ^p and s ∈ ℝ^q, with $t = {(t_{1}^{T}, t_{2}^{T})}^{T}, s = {(s_{1}^{T}, s_{2}^{T})}^{T}$ , t₁ ∈ ℝ^p₁, t₂ ∈ ℝ^p₂, s₁ ∈ ℝ^q₁, and s₂ ∈ ℝ^q₂, the independence assumptions and standard characteristic function properties yield

$$\begin{array}{l} \left| E \exp (i [t^{T} X + s^{T} Y]) - E \exp (i t^{T} X) \, E \exp (i s^{T} Y) \right| \\ \quad = \left| f_{\tilde{X}} (t) f_{\tilde{Y}} (s) \left\{ E \exp (i [t_{1}^{T} X^{(1)} + s_{1}^{T} Y^{(1)}]) - E \exp (i t_{1}^{T} X^{(1)}) \, E \exp (i s_{1}^{T} Y^{(1)}) \right\} \right| \\ \quad = \left| E \exp (i [t_{1}^{T} X^{(1)} + s_{1}^{T} Y^{(1)}]) - E \exp (i t_{1}^{T} X^{(1)}) \, E \exp (i s_{1}^{T} Y^{(1)}) \right| \\ \quad = \left| f_{X^{(1)}, Y^{(1)}} (t_{1}, s_{1}) - f_{X^{(1)}} (t_{1}) f_{Y^{(1)}} (s_{1}) \right| . \end{array}$$

Combining this with Theorems 1 and 2 of SR, we obtain that

$$V_{0} (X, Y) = \frac{1}{c_{p} c_{q}} \int_{\mathbb{R}^{p + q}} \frac{| f_{X^{(1)}, Y^{(1)}} (t_{1}, s_{1}) - f_{X^{(1)}} (t_{1}) f_{Y^{(1)}} (s_{1}) |^{2}}{| t |_{p}^{p + 1} \, | s |_{q}^{q + 1}} \, dt \, ds .$$

Note that the right-hand side is invariant with respect to the distributions of X̃ and Ỹ, and thus we can replace X̃ and Ỹ with degenerate random variables fixed at zero. Doing the same on the left-hand side yields the desired result.

3 High Dimensional Extensions

The basic idea we propose is to extend the results to Hilbert spaces which can be approximated by sequences of finite-dimensional Euclidean spaces. We will give a few examples shortly. First, we give the conditions for our results. Assume X is a random variable in a Hilbert space H_X with inner product 〈·, ·〉_X and norm ||·||_X. A superscript * will be used to denote adjoint. Say that X is “finitely approximable” if there exists a sequence X_m ∈ H_X such that for each m ≥ 1, there exists a linear map M_m: ℝ^{p_m} ↦ H_X for which $M_{m}^{*} M_{m}$ is symmetric and positive definite on ℝ^{p_m}, p_m is non-decreasing, X_m = M_m(U_m) for some sequence of Euclidean random variables U_m ∈ ℝ^{p_m}, and $E \| X_{m} - X \|_{X}^{2} \to 0$ as m → ∞. Note that we can assume that $M_{m}^{*} M_{m}$ is the identity without loss of generality. This follows since we can always replace U_m with Ũ_m = A_mU_m and M_m with ${\tilde{M}}_{m} = M_{m} A_{m}^{- 1}$ , where $A_{m} = {(M_{m}^{*} M_{m})}^{1 / 2}$ , to yield X_m = M̃_mŨ_m with ${\tilde{M}}_{m}^{*} {\tilde{M}}_{m} = A_{m}^{- 1} (M_{m}^{*} M_{m}) A_{m}^{- 1}$ being the identity.
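The normalization argument at the end of this paragraph can be illustrated in a finite-dimensional stand-in for H_X. In the sketch below, ℝ^d plays the role of H_X and the map M_m is a d × p matrix, so the adjoint is the transpose; these choices, the dimensions, and the variable names are assumptions made only for illustration.

```python
import numpy as np

def sym_sqrt(C):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(C)
    return (Q * np.sqrt(w)) @ Q.T

rng = np.random.default_rng(0)
d, p, n = 12, 4, 5             # R^d stands in for H_X (assumption)
M = rng.normal(size=(d, p))    # full column rank almost surely, so M'M is positive definite
U = rng.normal(size=(p, n))    # columns are draws of U_m

A = sym_sqrt(M.T @ M)          # A_m = (M_m^* M_m)^{1/2}
M_tilde = M @ np.linalg.inv(A) # M~_m = M_m A_m^{-1}
U_tilde = A @ U                # U~_m = A_m U_m

gram = M_tilde.T @ M_tilde     # should be the p x p identity
same_X = np.allclose(M_tilde @ U_tilde, M @ U)  # X_m is unchanged
```

After the substitution the Gram matrix `gram` is the identity while the approximating variable X_m itself is unaffected, which is why the identity can be assumed without loss of generality.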

Example 1

Let X be functional data with realizations that are functions in the Hilbert space H_X = L₂[0, 1] consisting of functions f: [0, 1] ↦ ℝ satisfying $\| f \|_{X}^{2} = \int_{0}^{1} f^{2} (t) \, dt < \infty$ . Specifically, we will assume that

$$X (t) = \sum_{i = 1}^{\infty} λ_{i} Z_{i} φ_{i} (t), \tag{1}$$

where Z₁, Z₂, … are independent random variables with mean zero and variance 1; φ₁, φ₂, … form an orthonormal basis in L₂[0, 1]; and λ₁, λ₂, … are fixed constants satisfying $\sum_{i = 1}^{\infty} λ_{i}^{2} < \infty$ . This formulation can yield a large variety of tight stochastic processes and can be a realistic model for some kinds of functional data.

Let p_m = m, U_m = (λ₁Z₁, …, λ_mZ_m)^T, and, for any vector a ∈ ℝ^{p_m}, $M_{m} (a) = \sum_{i = 1}^{m} a_{i} φ_{i} (t)$ . Clearly, X_m = M_m(U_m) is in H_X almost surely, since $\| X_{m} \|_{X}^{2} = \sum_{i = 1}^{m} λ_{i}^{2} Z_{i}^{2}$ is finite almost surely. Moreover, for any f ∈ L₂[0, 1], it can be shown that

$$M_{m}^{*} (f) = \begin{pmatrix} \int_{0}^{1} φ_{1} (s) f (s) \, ds \\ \vdots \\ \int_{0}^{1} φ_{m} (s) f (s) \, ds \end{pmatrix},$$

and thus $M_{m}^{*} M_{m}$ is the identity by the orthonormality of the basis and is therefore positive definite. Since $\sum_{i = 1}^{\infty} λ_{i}^{2} < \infty$ ,

$$\begin{array}{l} E \| X - X_{m} \|_{X}^{2} = E \left\| \sum_{i = m + 1}^{\infty} λ_{i} Z_{i} φ_{i} (t) \right\|_{X}^{2} \\ \quad = \sum_{i = m + 1}^{\infty} λ_{i}^{2} \to 0, \end{array}$$

as m → ∞. Thus X is finitely approximable.
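The truncation error in Example 1 can be illustrated numerically. Because the basis is orthonormal, the squared norm of X − X_m depends only on the coefficients, so no basis functions need to be evaluated. The sketch below makes several assumptions that are not in the text: λ_i = 1/i, standard normal Z_i, and the infinite series cut off at N = 500 terms.

```python
import numpy as np

def tail_sum(lam, m):
    """Exact value of sum_{i > m} lambda_i^2 for the (truncated) sequence lam."""
    return float(np.sum(lam[m:] ** 2))

def mc_truncation_error(lam, m, reps, rng):
    """Monte Carlo estimate of E||X - X_m||^2 = E sum_{i > m} lambda_i^2 Z_i^2."""
    Z = rng.normal(size=(reps, lam.size - m))
    return float(np.mean((Z ** 2) @ (lam[m:] ** 2)))

N = 500                                   # cutoff for the infinite sum (assumption)
lam = 1.0 / np.arange(1, N + 1)           # lambda_i = 1/i (assumption)
rng = np.random.default_rng(2)

ms = (5, 10, 50, 200)
tails = [tail_sum(lam, m) for m in ms]    # decreases toward zero as m grows
mc_10 = mc_truncation_error(lam, 10, 5000, rng)
```

The simulated mean squared error for m = 10 should be close to the corresponding tail sum, and the tail sums shrink as m increases, in line with finite approximability.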

Example 2

This is basically the same as Example 1, except that we will not require the basis functions to be orthogonal. Specifically, let X(t) be as given in (1), with the basis functions satisfying $\int_{0}^{1} φ_{i}^{2} (s) \, ds = 1$ , for all i ≥ 1, but not necessarily being mutually orthogonal. Let $a_{i, j} = \int_{0}^{1} φ_{i} (s) φ_{j} (s) \, ds$ , for i, j ≥ 1, and define A_m to be the m × m matrix with entry a_i,j for row i and column j for 1 ≤ i, j ≤ m. Assume that A_m is positive definite for each m ≥ 1 and also assume that ${\lim}_{m \to \infty} \sum_{i, j = m + 1}^{\infty} λ_{i} λ_{j} a_{i, j} = 0$ . If we now follow parallel calculations to those done in Example 1, we can readily deduce that with $X_{m} = \sum_{i = 1}^{m} λ_{i} Z_{i} φ_{i} (t)$ , we have M_m and $M_{m}^{*}$ defined as before, but with $M_{m}^{*} M_{m} = A_{m}$ instead of the identity, while $E \| X - X_{m} \|_{X}^{2} \to 0$ also as before. The increased flexibility enlarges the scope of stochastic processes achievable to include, for example, Brownian motion.

Example 3

Let X = (X⁽¹⁾, X⁽²⁾, …)^T be an infinitely long Euclidean vector in ℓ₂, i.e., $\sum_{i = 1}^{\infty} {[X^{(i)}]}^{2} < \infty$ almost surely; and assume that, after permuting the indices if necessary,

\sum_{i = m + 1}^{\infty} E {[X^{(i)}]}^{2} \to 0,

as m → ∞. It is fairly easy to see that if we let X_m be a vector with the first m elements being identical to the first m elements of X but with all remaining elements equal to zero, then $E \| X - X_{m} \|_{X}^{2} \to 0$ , as m → ∞, and all of the remaining conditions for finite approximability are satisfied. This example may be applicable to certain high throughput screening settings where the vector of measurements may be arbitrarily high-dimensional.

The following lemma tells us that the range-related properties of Brownian distance covariance are preserved for finitely approximable random variables:

Lemma 4

Assume that X and Y are both finitely approximable random variables in Hilbert spaces. Then V_n(X, Y), V_n(X), V_n(Y), V₀(X, Y), V₀(X) and V₀(Y) are all non-negative; $V_{n} (X, Y) \leq \sqrt{V_{n} (X) V_{n} (Y)}; V_{0} (X, Y) \leq \sqrt{V_{0} (X) V_{0} (Y)}$ ; and 0 ≤ R_n(X, Y), R₀(X, Y) ≤ 1.

Proof

Let X_m and Y_m be sequences such that $E \| X - X_{m} \|_{X}^{2} \to 0$ and $E \| Y - Y_{m} \|_{Y}^{2} \to 0$ as m → ∞. Using simple algebra, we can verify that V₀(X_m, Y_m) → V₀(X, Y) which implies V₀(X, Y) ≥ 0. Similar arguments verify the desired results for V₀(X), V₀(Y) and R₀(X, Y). Now, for a sample of size n, (X₁, Y₁), …, (X_n, Y_n), we can create a sequence of samples (X_{1m}, Y_{1m}), …, (X_{nm}, Y_{nm}), such that $\sum_{i = 1}^{n} (E \| X_{i} - X_{i m} \|_{X}^{2} + E \| Y_{i} - Y_{i m} \|_{Y}^{2}) \to 0$ by finite approximability. Let $V_{n}^{(m)} (X, Y)$ be the same as V_n(X, Y) but with the m’th approximating sample replacing the sample observations. Since convergence in mean implies convergence in probability, we can apply basic algebra to verify that $V_{n}^{(m)} (X, Y) \overset{P}{\to} V_{n} (X, Y)$ as m → ∞. Similar arguments verify the desired results for V_n(X), V_n(Y) and R_n(X, Y), and this completes the proof.

Our ultimate goal in this section, however, is to show that R₀(X, Y) has the same implications for assessing dependence for finitely approximable Hilbert spaces as it does for finite dimensional settings. This is actually quite challenging, and we are only able to achieve part of the goal in this paper. The following is our first result in this direction:

Lemma 5

Suppose X and Y are random variables in finitely approximable Hilbert spaces. Then R₀(X, Y) > 0 implies that X and Y are dependent.

Proof

Assume that R₀(X, Y) > 0 but that X and Y are independent. By finite approximability, there exists a sequence of paired random variables (X_m, Y_m) such that X_m and Y_m are independent for each m ≥ 0, $E \| X - X_{m} \|_{X}^{2} \to 0$ , and $E \| Y - Y_{m} \|_{Y}^{2} \to 0$ . This implies that R₀(X_m, Y_m) = 0 for all m ≥ 0. Since also R₀(X_m, Y_m) → R₀(X, Y), we have a contradiction. Hence X and Y are dependent.

If we could also show that R₀(X, Y) = 0 implies independence, we would have essentially full homology with the finite dimensional case. It is unclear how to show this in general, and it may not even be true in general. However, it is certainly true for an interesting special case which we now present.

Let X and Y be random variables in finitely approximable Hilbert spaces. Suppose there exist linear maps M: H_X ↦ H_X and N: H_Y ↦ H_Y with adjoints for which both M*M and N*N are identities, and that MX = X₁ + X₂ and NY = Y₁ + Y₂, where $X_{1} \in H_{X}^{(1)}$ and $Y_{1} \in H_{Y}^{(1)}$ , $H_{X}^{(1)}$ and $H_{Y}^{(1)}$ are finite-dimensional subspaces of H_X and H_Y, respectively, and that X₂ and Y₂ are mutually independent and independent of (X₁, Y₁). We will call a random pair (X, Y) that satisfies these conditions “at most finitely dependent.” For example, paired functional data (X, Y) could be at most finitely dependent if all possible dependencies between the two populations X and Y are attributable to at most a few principal functions (or principal components) in each population and the remaining components are independent noise.

Example 4

Suppose that we are interested in determining whether X and Y are independent, where X is either a functional observation or some other very high dimensional observation and Y is a continuous outcome of interest such as a time to an event. Suppose also that X is finitely approximable and that any potential dependence of Y on X is solely due to a finite set of latent principal components of X. Such a pair (X, Y) would be at most finitely dependent.

The following lemma on finitely dependent data is the final result of this section:

Lemma 6

Suppose that X and Y are finitely approximable random variables in Hilbert spaces and that (X, Y) is at most finitely dependent. Then R₀(X, Y) ≥ 0 and the inequality is strict if and only if X and Y are dependent.

Proof

Note first that $\| M X \|_{X}^{2} = {〈 M X, M X 〉}_{X} = {〈 M^{*} M X, X 〉}_{X} = {〈 X, X 〉}_{X} = \| X \|_{X}^{2}$ and, similarly, ||NY||_Y = ||Y||_Y. Since R₀(X, Y) is a function involving only the norms of X and Y, we can assume without loss of generality that N and M are identities. Thus we will simply assume that X = X₁ + X₂ and Y = Y₁ + Y₂ hereafter. Let (X_{2m}, Y_{2m}) be a sequence of paired random variables in H_X × H_Y such that $E \| X_{2} - X_{2 m} \|_{X}^{2} \to 0$ and $E \| Y_{2} - Y_{2 m} \|_{Y}^{2} \to 0$ , and where, for each m ≥ 1, X_{2m} and Y_{2m} are mutually independent and also independent of (X₁, Y₁).

Now let X̂_m = X₁ + X_{2m} and Ŷ_m = Y₁ + Y_{2m}, and note that both X̂_m and Ŷ_m are finite dimensional with R₀(X̂_m, Ŷ_m) → R₀(X, Y). Let p₁ and q₁ be the respective dimensions of X₁ and Y₁, p_{2m} and q_{2m} be the respective dimensions of X_{2m} and Y_{2m}, and let p_m = p₁ + p_{2m} and q_m = q₁ + q_{2m}. Let $X_{2 m}^{(1)}$ be the projection of X_{2m} onto $H_{X}^{(1)}$ , $Y_{2 m}^{(1)}$ be the projection of Y_{2m} onto $H_{Y}^{(1)}$ , and let $X_{2 m}^{(2)} = X_{2 m} - X_{2 m}^{(1)}$ and $Y_{2 m}^{(2)} = Y_{2 m} - Y_{2 m}^{(1)}$ . By the finite-dimensionality of X₁, X_{2m}, Y₁ and Y_{2m}, there exist linear maps $A_{1} : \mathbb{R}^{p_{1}} \mapsto H_{X}^{(1)}$ , $A_{2 m} : \mathbb{R}^{p_{2 m}} \mapsto H_{X}$ , $B_{1} : \mathbb{R}^{q_{1}} \mapsto H_{Y}^{(1)}$ , and $B_{2 m} : \mathbb{R}^{q_{2 m}} \mapsto H_{Y}$ , such that $A_{1}^{*} A_{1}, A_{2 m}^{*} A_{2 m}, B_{1}^{*} B_{1}$ and $B_{2 m}^{*} B_{2 m}$ are all identities and that X₁ = A₁U₁, $X_{2 m}^{(1)} = A_{1} U_{2 m}^{(1)}$ , $X_{2 m}^{(2)} = A_{2 m} U_{2 m}^{(2)}$ , Y₁ = B₁Z₁, $Y_{2 m}^{(1)} = B_{1} Z_{2 m}^{(1)}$ , and $Y_{2 m}^{(2)} = B_{2 m} Z_{2 m}^{(2)}$ , for random vectors U₁, $U_{2 m}^{(1)} \in \mathbb{R}^{p_{1}}$ , $U_{2 m}^{(2)} \in \mathbb{R}^{p_{2 m}}$ , Z₁, $Z_{2 m}^{(1)} \in \mathbb{R}^{q_{1}}$ , and $Z_{2 m}^{(2)} \in \mathbb{R}^{q_{2 m}}$ , where $U_{2 m} = {({[U_{2 m}^{(1)}]}^{T}, {[U_{2 m}^{(2)}]}^{T})}^{T}$ and $Z_{2 m} = {({[Z_{2 m}^{(1)}]}^{T}, {[Z_{2 m}^{(2)}]}^{T})}^{T}$ are mutually independent and independent of (U₁, Z₁).

If we let ${\hat{U}}_{m} = {({[U_{1} + U_{2 m}^{(1)}]}^{T}, {[U_{2 m}^{(2)}]}^{T})}^{T}$ and ${\hat{Z}}_{m} = {({[Z_{1} + Z_{2 m}^{(1)}]}^{T}, {[Z_{2 m}^{(2)}]}^{T})}^{T}$ , the above formulation yields that ||X̂_m||_X = |Û_m|_{p_m} and ||Ŷ_m||_Y = |Ẑ_m|_{q_m}. By Lemma 3, we now have that R₀(Û_m, Ẑ_m) = R₀(U₁, Z₁), which does not depend on m. Since $A_{1}^{*} A_{1}$ and $B_{1}^{*} B_{1}$ are both identities, we also have that R₀(U₁, Z₁) = R₀(X₁, Y₁), and thus R₀(X̂_m, Ŷ_m) = R₀(Û_m, Ẑ_m) → R₀(X₁, Y₁), as m → ∞. This now implies that R₀(X, Y) = R₀(X₁, Y₁), which yields the desired result.

4 Increasing Power

We now briefly discuss the issue of power of tests based on R_n(X, Y). By Lemma 2, we observe that there are many different versions of the statistic R_n(X, Y), based on different choices of matrices A and B in the norms ||·||_A,p and ||·||_B,q, that all have the ability to assess general dependence. Is it possible to choose A and B in a way that provides optimal power for certain fixed or contiguous alternatives? The answer should be yes since it appears that A and B could potentially be selected to emphasize dependence for certain subcomponents of X and Y while deemphasizing dependence for other subcomponents. The answer to this question, unfortunately, seems to be very hard to pin down rigorously. We do not pursue this further here, but it does seem to be a potentially important issue that deserves further attention.
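To make the role of A and B concrete, the sketch below computes R_n under two choices of A and attaches a permutation p-value to each. The permutation scheme, the diagonal weighting, the simulated data, and all function names are assumptions introduced for illustration; the text above does not prescribe a testing procedure or a way to choose A and B.

```python
import numpy as np

def quad_dist(Z, C):
    """Pairwise distances sqrt((z_k - z_l)' C (z_k - z_l)) between rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    sq = np.einsum("kli,ij,klj->kl", diff, C, diff)
    return np.sqrt(np.maximum(sq, 0.0))

def v_n(a, b):
    """V_n = T1 + T2 - 2*T3 from two n x n distance matrices."""
    t3 = (a.mean(axis=1) * b.mean(axis=1)).mean()
    return (a * b).mean() + a.mean() * b.mean() - 2.0 * t3

def r_n(a, b):
    """R_n from distance matrices, set to zero when the denominator vanishes."""
    denom = np.sqrt(max(v_n(a, a) * v_n(b, b), 0.0))
    return v_n(a, b) / denom if denom > 0 else 0.0

def perm_pvalue(X, Y, A, B, n_perm, rng):
    """Permutation p-value for R_n with norms ||.||_A and ||.||_B (illustrative)."""
    a, b = quad_dist(X, A), quad_dist(Y, B)
    observed = r_n(a, b)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(b.shape[0])
        # Permuting the Y sample permutes rows and columns of its distance matrix.
        count += r_n(a, b[np.ix_(idx, idx)]) >= observed
    return observed, (1 + count) / (1 + n_perm)

# Illustrative data (assumption): Y depends only on the first coordinate of X.
rng = np.random.default_rng(3)
n = 60
X = rng.normal(size=(n, 2))
Y = (X[:, 0] ** 2 + 0.1 * rng.normal(size=n))[:, None]

B = np.eye(1)
r_id, p_id = perm_pvalue(X, Y, np.eye(2), B, 199, rng)
r_wt, p_wt = perm_pvalue(X, Y, np.diag([1.0, 0.01]), B, 199, rng)
```

Here the weighted choice of A downweights the coordinate of X that carries no information about Y. Whether such a choice improves power in a given problem is exactly the open question raised in this section, so the sketch should be read as a way to experiment rather than as a recommendation.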

5 Discussion

We have briefly proposed two generalizations of the Brownian distance covariance, one based on alternative norms to Euclidean norms, and the other based on infinite dimensional data. The first generalization raises the possibility of fine-tuning the statistics proposed in SR to increase power, and the second generalization opens the door for applicability of the results in SR to a broader array of data types, including infinite dimensional data and data with dimension increasing with sample size. However, for both of these generalizations, there remain many open questions that could lead to important further improvements. In either case, the results of SR are very important both practically and theoretically and should result in many important future developments in both the application and theory of statistics.

Acknowledgments

This research was supported in part by U.S. National Institutes of Health grant CA075142.

References

Székely GJ, Rizzo ML. Brownian distance covariance. Annals of Applied Statistics. 2009. doi: 10.1214/09-AOAS312. In press.
Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Annals of Statistics. 2007;35:2769–2794.
van der Vaart AW. Asymptotic Statistics. Cambridge University Press; New York: 1998.
