Biometrika. 2017 Feb 22;104(1):129–139. doi: 10.1093/biomet/asw071

Generalized R-squared for detecting dependence

X. Wang, B. Jiang & J. S. Liu

SUMMARY

Detecting dependence between two random variables is a fundamental problem. Although the Pearson correlation coefficient is effective for capturing linear dependence, it can be entirely powerless for detecting nonlinear and/or heteroscedastic patterns. We introduce a new measure, G-squared, to test whether two univariate random variables are independent and to measure the strength of their relationship. The G-squared statistic is almost identical to the square of the Pearson correlation coefficient, R-squared, for linear relationships with constant error variance, and has the intuitive meaning of the piecewise R-squared between the variables. It is particularly effective in handling nonlinearity and heteroscedastic errors. We propose two estimators of G-squared and show their consistency. Simulations demonstrate that G-squared estimators are among the most powerful test statistics compared with several state-of-the-art methods.

Keywords: Bayes factor, Coefficient of determination, Hypothesis test, Likelihood ratio

1. INTRODUCTION

The Pearson correlation coefficient is widely used to detect and measure the dependence between two random quantities. The square of its least-squares estimate, popularly known as R-squared, is often used to quantify how linearly related two random variables are. However, as discussed by Reshef et al. (2011), the R-squared statistic has significant shortcomings as a measure of the strength of dependence, and that discussion has inspired the development of many new methods for detecting dependence.

The Spearman correlation calculates the Pearson correlation coefficient between rank statistics. Although more robust than the Pearson correlation, it still cannot capture nonmonotone relationships. The alternating conditional expectation method was introduced by Breiman & Friedman (1985) to approximate the maximal correlation between $X$ and $Y$, i.e., to find optimal transformations of the data, $\theta(Y)$ and $\phi(X)$, such that their correlation is maximized. The implementation of this method has limitations, because it is infeasible to search through all possible transformations. Estimating mutual information is another popular approach, motivated by the fact that the mutual information is zero if and only if $X$ and $Y$ are independent. Kraskov et al. (2004) proposed a method that involves estimating the entropies of $X$, $Y$ and $(X, Y)$ separately; the method was claimed to be numerically exact for independent cases and effective for high-dimensional variables. An energy distance-based method (Székely et al., 2007; Székely & Rizzo, 2009) and a kernel-based method (Gretton et al., 2005, 2012) for solving the two-sample test problem appeared separately in the statistics and machine learning literatures, and have corresponding uses in independence testing. The two methods were recently shown to be equivalent (Sejdinovic et al., 2013). Methods based on empirical cumulative distribution functions (Hoeffding, 1948), the empirical copula (Genest & Rémillard, 2004) and empirical characteristic functions (Kankainen & Ushakov, 1998; Hušková & Meintanis, 2008) have also been proposed for detecting dependence.

Another set of approaches is based on discretization of the random variables. Known as grid-based methods, they are primarily designed to test for independence between univariate random variables. Reshef et al. (2011) introduced the maximal information coefficient, which focuses on the generality and equitability of a dependence statistic; two more powerful estimators of this quantity were suggested by Reshef et al. (arXiv:1505.02213). Equitability requires that the same value of the statistic imply the same amount of dependence regardless of the type of the underlying relationship, but it is not a well-defined mathematical concept. We show in the Supplementary Material that the equitability of G-squared is superior to that of the other independence testing statistics considered, for a wide range of functional relationships. Heller et al. (2016) proposed a grid-based method which utilizes the $\chi^2$ statistic to test independence and is distribution-free. Blyth (1994) and Doksum et al. (1994) discussed using the correlation curve to measure the strength of a relationship. However, a direct use of nonparametric curve estimation may rely too heavily on the smoothness of the relationship; furthermore, it cannot deal with heteroscedastic noise.

The $G^2$ statistic proposed in this paper is derived from a regularized likelihood ratio test for piecewise-linear relationships and can be viewed as an integration of continuous and discrete methods. It is a function of both the conditional mean and the conditional variance of one variable given the other, so it is capable of detecting general functional relationships with heteroscedastic error variances. An estimate of $G^2$ can be derived via the same likelihood ratio approach as $R^2$ when the true underlying relationship is linear; thus, it is reasonable that $G^2$ is almost identical to $R^2$ for linear relationships. Efficient estimates of $G^2$ can be computed quickly using a dynamic programming method, whereas the methods of Reshef et al. (2011) and Heller et al. (2016) consider grids on two variables simultaneously and hence require longer computation times. We will also show that, in terms of power, $G^2$ is one of the best statistics for independence testing when considering a wide range of functional relationships.

2. MEASURING DEPENDENCE WITH G-SQUARED

2.1. Defining $G^2$ as a generalization of $R^2$

The R-squared statistic measures how well the data fit a linear regression model. Given independent observations $(x_i, y_i)$ $(i = 1, \ldots, n)$ with $y_i = \alpha + \beta x_i + \epsilon_i$ and $\epsilon_i \sim N(0, \sigma^2)$, the standard estimate of R-squared can be derived from a likelihood ratio test statistic for testing $H_0: \beta = 0$ against $H_1: \beta \neq 0$, i.e.,

\[
R^2 = 1 - \left\{ \frac{L(\hat{\theta})}{L_0(\hat{\theta}_0)} \right\}^{-2/n},
\]

where $L(\hat{\theta})$ and $L_0(\hat{\theta}_0)$ are the maximized likelihoods under $H_1$ and $H_0$, respectively.
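
As a sanity check (ours, not part of the original derivation), the following Python sketch verifies numerically that $1 - \{L(\hat\theta)/L_0(\hat\theta_0)\}^{-2/n}$ coincides with the usual least-squares R-squared for simulated linear data; the simulation settings are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

# Maximized Gaussian log-likelihoods under H1 (linear mean) and H0 (constant mean).
# With the variance profiled out, the maximized log-likelihood is
# -(n/2) * {log(2*pi*sigma_hat^2) + 1}.
slope, intercept = np.polyfit(x, y, 1)
resid_var = np.mean((y - (intercept + slope * x)) ** 2)   # sigma_hat^2 under H1
total_var = np.mean((y - y.mean()) ** 2)                  # sigma_hat^2 under H0
loglik1 = -0.5 * n * (np.log(2 * np.pi * resid_var) + 1)
loglik0 = -0.5 * n * (np.log(2 * np.pi * total_var) + 1)

r2_from_lr = 1 - np.exp(-2.0 / n * (loglik1 - loglik0))   # 1 - LR^{-2/n}
r2_classic = 1 - resid_var / total_var                    # the usual R-squared
print(round(r2_from_lr, 10), round(r2_classic, 10))       # identical values
```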

Throughout the paper, we let $X$ and $Y$ be univariate continuous random variables. As a working model, we assume that the relationship between $X$ and $Y$ can be characterized as $Y = f(X) + \epsilon$, with $E(\epsilon \mid X) = 0$ and $\mathrm{var}(\epsilon \mid X) = \sigma_X^2$. If $X$ and $Y$ are independent, then $f(X)$ and $\sigma_X^2$ are both constant. Now let us look at the piecewise-linear relationship

\[
f(X) = \mu_h + \beta_h X, \qquad \sigma_X^2 = \sigma_h^2, \qquad c_{h-1} < X \leq c_h,
\]

where $-\infty = c_0 < c_1 < \cdots < c_K = \infty$ are called the breakpoints. While this working model allows for heteroscedasticity, it requires a constant variance within each segment between two consecutive breakpoints. Testing whether $X$ and $Y$ are independent is equivalent to testing whether $\mu_h + \beta_h X$ and $\sigma_h^2$ are constant across all the slices. Given the breakpoints, the likelihood ratio test statistic can be written as

\[
\mathrm{LR} = \exp\left( \frac{n}{2} \log \hat{\nu}^2 - \sum_{h=1}^{K} \frac{n_h}{2} \log \hat{\sigma}_h^2 \right),
\]

where $\hat{\nu}^2$ is the overall sample variance of $Y$, and $\hat{\sigma}_h^2$ is the residual variance after regressing $Y$ on $X$ within the $h$th slice, $c_{h-1} < X \leq c_h$, which contains $n_h$ observations. Because $R^2$ is a transformation of the likelihood ratio and converges to the square of the Pearson correlation coefficient, we perform the same transformation on $\mathrm{LR}$. The resulting test statistic converges to a quantity related to the conditional mean and the conditional variance of $Y$ given $X$. It is easy to show that as $n \to \infty$,

\[
1 - (\mathrm{LR})^{-2/n} \to 1 - \exp\left[ E\{\log \mathrm{var}(Y \mid X)\} - \log \mathrm{var}(Y) \right]. \tag{1}
\]

When $K = 1$, the relationship degenerates to a simple linear relation with constant error variance; since $\mathrm{var}(Y \mid X)$ is then constant, the limit in (1) reduces to $1 - \mathrm{var}(Y \mid X)/\mathrm{var}(Y)$, which is exactly the population R-squared.

More generally, because a piecewise-linear function can approximate any almost everywhere continuous function, we can employ the same hypothesis testing framework as above to derive (1) for any such approximation. Thus, for any pair of random variables $(X, Y)$, the following concept is a natural generalization of R-squared:

\[
G^2_{Y \mid X} = 1 - \exp\left[ E\{\log \mathrm{var}(Y \mid X)\} - \log \mathrm{var}(Y) \right],
\]

in which we require that $0 < \mathrm{var}(Y) < \infty$. Evidently, $G^2_{Y \mid X}$ lies between 0 and 1, and is equal to zero if and only if both $E(Y \mid X)$ and $\mathrm{var}(Y \mid X)$ are constant. The definition of $G^2_{Y \mid X}$ is closely related to the R-squared defined by segmented regression (Oosterbaan & Ritzema, 2006), as discussed in the Supplementary Material. We symmetrize $G^2_{Y \mid X}$ to arrive at the following quantity as the definition of the G-squared statistic:

\[
G^2 = \max\left( G^2_{Y \mid X},\ G^2_{X \mid Y} \right),
\]

provided that $\mathrm{var}(X)$ and $\mathrm{var}(Y)$ are both finite and positive. Thus, $G^2 = 0$ if and only if $E(Y \mid X)$, $\mathrm{var}(Y \mid X)$, $E(X \mid Y)$ and $\mathrm{var}(X \mid Y)$ are all constant, which is not equivalent to independence of $X$ and $Y$. In practice, however, dependent cases with $G^2 = 0$ are rare.
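
To make the definition concrete, here is a small Monte Carlo sketch (our own illustration; the model, sample size and seed are arbitrary) that approximates $G^2_{Y \mid X}$ for a heteroscedastic relationship in which both the conditional mean and the conditional variance of $Y$ depend on $X$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative heteroscedastic model:
# E(Y | X) = sin(2*pi*X) and var(Y | X) = (0.2 + X)^2, with X ~ Un(0, 1).
def cond_mean(x):
    return np.sin(2 * np.pi * x)

def cond_var(x):
    return (0.2 + x) ** 2

# Monte Carlo approximation of
# G^2_{Y|X} = 1 - exp[ E{log var(Y | X)} - log var(Y) ].
x = rng.uniform(0, 1, 1_000_000)
y = cond_mean(x) + np.sqrt(cond_var(x)) * rng.normal(0, 1, x.size)

e_log_cond_var = np.mean(np.log(cond_var(x)))  # E{log var(Y | X)}
log_var_y = np.log(np.var(y))                  # log var(Y)
g2_y_given_x = 1 - np.exp(e_log_cond_var - log_var_y)
print(round(g2_y_given_x, 3))
```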

2.2. Estimation of $G^2$

Without loss of generality, we focus on the estimation of $G^2_{Y \mid X}$; $G^2_{X \mid Y}$ can be estimated in the same way by interchanging $X$ and $Y$. When $Y = f(X) + \epsilon$ with $\epsilon \sim N(0, \sigma_X^2)$ for an almost everywhere continuous function $f$, we can use a piecewise-linear function to approximate $f$ and estimate $G^2_{Y \mid X}$. However, in practice the number and locations of the breakpoints are unknown. We propose two estimators of $G^2_{Y \mid X}$: the first aims to find the maximum penalized likelihood ratio among all possible piecewise-linear approximations, and the second focuses on a Bayesian average over all approximations.

Suppose that we have $n$ sorted independent observations $(x_i, y_i)$ $(i = 1, \ldots, n)$ such that $x_1 < \cdots < x_n$. For the set of breakpoints, we only need to consider the observed values $x_1, \ldots, x_n$. Each interval between two consecutive breakpoints is called a slice of the observations, so that $K - 1$ breakpoints divide the range of $X$ into $K$ non-overlapping slices. Let $n_h$ denote the number of observations in slice $h$, and let $S(X)$ denote a slicing scheme of $X$, i.e., $S(x_i) = h$ if $x_i$ belongs to slice $h$, which is abbreviated as $S$ whenever the meaning is clear. Let $|S|$ be the number of slices in $S$ and let $m_S$ denote the minimum size of all the slices.

To avoid overfitting when maximizing loglikelihood ratios both over unknown parameters and over all possible slicing schemes, we restrict the minimum size of each slice and maximize the loglikelihood ratio with a penalty on the number of slices. For simplicity, let $m$ denote this minimum slice size. Thus, we focus on the penalized loglikelihood ratio

\[
n D(Y \mid S, \lambda_0) = 2 \log \mathrm{LR}_S - \lambda_0 (|S| - 1) \log n, \tag{2}
\]

where $\mathrm{LR}_S$ is the likelihood ratio for the slicing scheme $S$ and $\lambda_0 \log n$ is the penalty incurred for one additional slice. From a Bayesian perspective, this is equivalent to assigning the prior distribution for the number of slices to be proportional to $n^{-\lambda_0 (|S| - 1)/2}$. Suppose that each observation $x_i$ has probability $p_n$, with $p_n/(1 - p_n) = n^{-\lambda_0/2}$, of being a breakpoint independently. Then the probability of a slicing scheme $S$ is

\[
p_n^{|S| - 1} (1 - p_n)^{n - |S|} \propto \left( \frac{p_n}{1 - p_n} \right)^{|S| - 1} = n^{-\lambda_0 (|S| - 1)/2}.
\]

When $\lambda_0 = 3$, the statistic $D(Y \mid S, \lambda_0)$ is equivalent to the Bayesian information criterion (Schwarz, 1978) up to a constant.
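
For a fixed slicing scheme, (2) is straightforward to evaluate. The Python sketch below (our own code, with hypothetical function and variable names) computes $\mathrm{LR}_S$ from within-slice least-squares fits and then $D(Y \mid S, \lambda_0)$; with lambda0 = 3 the penalty matches the BIC-type penalty just mentioned.

```python
import numpy as np

def penalized_loglik_ratio(x, y, cut_indices, lambda0=3.0):
    """D(Y | S, lambda0) of (2) for the slicing scheme whose slices are
    delimited by `cut_indices` after sorting the data by x."""
    n = len(y)
    order = np.argsort(x)
    x, y = x[order], y[order]
    edges = [0, *sorted(cut_indices), n]
    overall_var = np.mean((y - y.mean()) ** 2)            # \hat{nu}^2
    two_log_lr = n * np.log(overall_var)
    for lo, hi in zip(edges[:-1], edges[1:]):
        slope, intercept = np.polyfit(x[lo:hi], y[lo:hi], 1)
        resid_var = np.mean((y[lo:hi] - (intercept + slope * x[lo:hi])) ** 2)
        two_log_lr -= (hi - lo) * np.log(resid_var)       # subtract n_h log sigma_h^2
    n_slices = len(edges) - 1                             # |S|
    return (two_log_lr - lambda0 * (n_slices - 1) * np.log(n)) / n

# Example: one breakpoint splitting 100 observations into two slices.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-1, 1, 100))
y = np.abs(x) + rng.normal(0, 0.1, 100)
print(penalized_loglik_ratio(x, y, cut_indices=[50]))
```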

Treating the slicing scheme as a nuisance parameter, we can maximize over all allowable slicing schemes to obtain that

\[
D(Y \mid X, \lambda_0) = \max_{S:\, m_S \geq m} D(Y \mid S, \lambda_0).
\]

Our first estimator of $G^2_{Y \mid X}$, which we call $G^2_m$, with m standing for the maximum likelihood ratio, can be defined as

\[
G^2_m(Y \mid X, \lambda_0) = 1 - \exp\{ -D(Y \mid X, \lambda_0) \}.
\]

Hence, the overall G-squared can be estimated as

\[
G^2_m(\lambda_0) = \max\{ G^2_m(Y \mid X, \lambda_0),\ G^2_m(X \mid Y, \lambda_0) \}.
\]

By definition, $G^2_m(\lambda_0)$ lies between 0 and 1, and $G^2_m(\lambda_0) = R^2$ when the optimal slicing schemes for both directions have only one slice. Later, we will show that when $X$ and $Y$ follow a bivariate normal distribution, $G^2_m(\lambda_0) = R^2$ almost surely for large $n$.

Another attractive way to estimate $G^2_{Y \mid X}$ is to integrate out the nuisance slicing scheme parameter. A full Bayesian approach would require us to compute the Bayes factor (Kass & Raftery, 1995), which may be undesirable since we do not wish to impose too strong a modelling assumption. On the other hand, the Bayesian formalism may guide us to a desirable integration strategy for the slicing scheme. We therefore put the problem into a Bayes framework and compute the Bayes factor for comparing the null and alternative models. The null model is a single model, while the alternative comprises all piecewise-linear models, possibly with a countably infinite number of pieces. Let $p_0(y_1, \ldots, y_n)$ be the marginal probability of the data under the null. Let $\omega_S$ be the prior probability for slicing scheme $S$ and let $p_S(y_1, \ldots, y_n)$ denote the marginal probability of the data under $S$. The Bayes factor can be written as

\[
\mathrm{BF} = \sum_{S:\, m_S \geq m} \omega_S \times \frac{p_S(y_1, \ldots, y_n)}{p_0(y_1, \ldots, y_n)}, \tag{3}
\]

where $m_S$ is the minimum size of all the slices of $S$. The marginal probabilities are not easy to compute even with proper priors. Schwarz (1978) states that if the data distribution is in the exponential family and the parameter is of dimension $k$, the marginal probability of the data can be approximated as

\[
p(y_1, \ldots, y_n) \approx L \exp\{ -k (\log n - \log 2\pi)/2 \}, \tag{4}
\]

where $L$ is the maximized likelihood. In our set-up, the number of parameters $k$ for the null model is 2, and for an alternative model with slicing scheme $S$ it is $3|S|$. Inserting expression (4) into both the numerator and the denominator of (3), we obtain

\[
\mathrm{BF} \approx \sum_{S:\, m_S \geq m} \omega_S\, \mathrm{LR}_S \exp\{ -(3|S| - 2)(\log n - \log 2\pi)/2 \}. \tag{5}
\]

If we take $\omega_S \propto n^{-\lambda_0 (|S| - 1)/2}$, which corresponds to the penalty term in (2) and is involved in defining $D(Y \mid S, \lambda_0)$, the approximated Bayes factor can be restated as

\[
\mathrm{BF}(\lambda_0) = \Bigl[ \sum_{S:\, m_S \geq m} n^{-\lambda_0 (|S| - 1)/2} \Bigr]^{-1} \sum_{S:\, m_S \geq m} \left( \frac{2\pi}{n} \right)^{(3|S| - 2)/2} \exp\Bigl\{ \frac{n}{2} D(Y \mid S, \lambda_0) \Bigr\}. \tag{6}
\]

As we will discuss in §2.5, $\mathrm{BF}(\lambda_0)$ can serve as a marginal likelihood function for $\lambda_0$ and be used to find an optimal $\lambda_0$ suitable for a particular dataset. This quantity also looks like an average version of the likelihood ratios $\mathrm{LR}_S$, but with an additional penalty. Since $\mathrm{BF}(\lambda_0)$ can take values below 1, its transformation $1 - \mathrm{BF}(\lambda_0)^{-2/n}$, as in the case where we derived $R^2$ via the likelihood ratio test, can take negative values, especially when $X$ and $Y$ are independent. It is therefore not an ideal estimator of $G^2_{Y \mid X}$.

By removing the model size penalty term in (5), we obtain a modified version, which is simply a weighted average of the likelihood ratios and is guaranteed to be greater than or equal to 1:

\[
\mathrm{BF}^*(\lambda_0) = \Bigl[ \sum_{S:\, m_S \geq m} n^{-\lambda_0 (|S| - 1)/2} \Bigr]^{-1} \sum_{S:\, m_S \geq m} \exp\Bigl\{ \frac{n}{2} D(Y \mid S, \lambda_0) \Bigr\}.
\]

We can thus define a quantity similar to our likelihood formulation of R-squared,

\[
G^2_t(Y \mid X, \lambda_0) = 1 - \mathrm{BF}^*(\lambda_0)^{-2/n},
\]

which we call the total G-squared, and define

\[
G^2_t(\lambda_0) = \max\{ G^2_t(Y \mid X, \lambda_0),\ G^2_t(X \mid Y, \lambda_0) \}.
\]

We show later that $G^2_m$ and $G^2_t$ are both consistent estimators of $G^2$.
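
The two estimators can be computed directly from their definitions when $n$ is small enough to enumerate every admissible slicing scheme. The following brute-force sketch (ours; the minimum slice size, model and seed are arbitrary choices, and it is feasible only for tiny samples) returns $G^2_m(Y \mid X, \lambda_0)$ as the maximum of $1 - \exp\{-D\}$ and $G^2_t(Y \mid X, \lambda_0)$ via the weighted average of likelihood ratios.

```python
import numpy as np
from itertools import combinations

def d_stat(x, y, cuts, lambda0):
    """D(Y | S, lambda0) of (2); `cuts` are interior cut indices of sorted data."""
    n = len(y)
    edges = [0, *cuts, n]
    two_log_lr = n * np.log(np.mean((y - y.mean()) ** 2))
    for lo, hi in zip(edges[:-1], edges[1:]):
        slope, intercept = np.polyfit(x[lo:hi], y[lo:hi], 1)
        resid_var = np.mean((y[lo:hi] - (intercept + slope * x[lo:hi])) ** 2)
        two_log_lr -= (hi - lo) * np.log(resid_var)
    return (two_log_lr - lambda0 * (len(edges) - 2) * np.log(n)) / n

def brute_force_g2(x, y, lambda0=3.0, m=5):
    """Enumerate all slicing schemes with minimum slice size m and return
    (G^2_m, G^2_t) in the Y-given-X direction."""
    n = len(y)
    order = np.argsort(x)
    x, y = x[order], y[order]
    d_values, prior_weights = [], []
    for k in range(n // m):                            # k = |S| - 1 interior cuts
        for cuts in combinations(range(m, n - m + 1), k):
            if min(np.diff([0, *cuts, n])) < m:        # enforce minimum slice size
                continue
            d_values.append(d_stat(x, y, list(cuts), lambda0))
            prior_weights.append(n ** (-lambda0 * k / 2.0))
    d_values = np.array(d_values)
    g2_m = 1 - np.exp(-d_values.max())
    bf_star = np.exp(n / 2.0 * d_values).sum() / np.sum(prior_weights)
    g2_t = 1 - bf_star ** (-2.0 / n)
    return g2_m, g2_t

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 30)
y = x ** 2 + rng.normal(0, 0.1, 30)
print(brute_force_g2(x, y))
```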

2.3. Theoretical properties of the $G^2$ estimators

In order to show that $G^2_m$ and $G^2_t$ converge to $G^2$ as the sample size goes to infinity, we introduce the notation $\mu_Y(x) = E(Y \mid X = x)$, $\nu_Y(x) = \mathrm{var}(Y \mid X = x)$, $\mu_X(y) = E(X \mid Y = y)$ and $\nu_X(y) = \mathrm{var}(X \mid Y = y)$, and assume the following regularity conditions.

Condition 1.

The random variables $X$ and $Y$ are bounded continuous random variables with finite variances, and the conditional variances $\nu_Y(x)$ and $\nu_X(y)$ are bounded below by a positive constant almost everywhere.

Condition 2.

The functions $\mu_Y(x)$, $\nu_Y(x)$, $\mu_X(y)$ and $\nu_X(y)$ have continuous derivatives almost everywhere.

Condition 3.

There exists a constant $C > 0$ such that

\[
\max\{ |\mu_X'(y)|,\ |\nu_X'(y)| \} \leq C \nu_X(y), \qquad \max\{ |\mu_Y'(x)|,\ |\nu_Y'(x)| \} \leq C \nu_Y(x)
\]

almost surely.

With these preparations, we can state our main results.

Theorem 1.

Under Conditions 1–3, for all $\lambda_0 > 0$,

\[
G^2_m(Y \mid X, \lambda_0) \to G^2_{Y \mid X}, \qquad G^2_t(Y \mid X, \lambda_0) \to G^2_{Y \mid X}
\]

almost surely as $n \to \infty$. Thus, $G^2_m(\lambda_0)$ and $G^2_t(\lambda_0)$ are consistent estimators of $G^2$.

A proof of the theorem and numerical studies of the estimators' consistency are provided in the Supplementary Material. It is expected that $G^2_m$ should converge to $G^2$ because of the way it is constructed. It is surprising that $G^2_t$ also converges to $G^2$. The result, which links the estimation of $G^2$ with the likelihood ratio and the Bayesian formalism, suggests that most of the information up to the second moment has been fully utilized in the two test statistics. The theorem thus supports the use of $G^2_m$ and $G^2_t$ for testing whether $X$ and $Y$ are independent. The null distributions of the two statistics depend on the marginal distributions of $X$ and $Y$, and can be generated empirically using permutation. One can also perform a quantile-based transformation on $X$ and $Y$ so that their marginal distributions become standard normal; however, the test based on the transformed data tends to lose some power.

When $X$ and $Y$ are bivariate normal, the G-squared statistic is almost the same as the R-squared statistic when $n$ is large enough.

Theorem 2.

If $X$ and $Y$ follow a bivariate normal distribution, then for $n$ large enough,

\[
\mathrm{pr}\{ G^2_m(\lambda_0) = R^2 \} > 1 - 3 n^{-\lambda_0/3 + 5}.
\]

So, for $\lambda_0 > 18$ and $n$ large enough, we have $G^2_m(\lambda_0) = R^2$ almost surely.

The lower bound on $\lambda_0$ is not tight and can be relaxed in practice. Empirically, we have observed that $\lambda_0 = 3$ is large enough for $G^2_m(\lambda_0)$ to be very close to $R^2$ in the bivariate normal setting.

2.4. Dynamic programming algorithm for computing $G^2_m$ and $G^2_t$

The brute-force calculation of either $G^2_m$ or $G^2_t$ requires enumerating all allowable slicing schemes, so its computational complexity grows exponentially with $n$ and is prohibitive in practice. Fortunately, we have found a dynamic programming scheme that computes both quantities with a time complexity of only $O(n^2)$. The algorithms for computing $G^2_m$ and $G^2_t$ are roughly the same except for one operation, namely maximization versus summation, and can be summarized by the following steps.

Step 1.

(Data preparation). Arrange the observed pairs $(x_i, y_i)$ according to the $x$ values, sorted from low to high. Then normalize the $y_i$ so that their sample mean is 0 and their sample variance is 1.

Step 2.

(Main algorithm). Define $m$ as the smallest allowed slice size, and let $\lambda$ and $\alpha$ denote, respectively, the per-slice penalty and the per-slice prior weight determined by $\lambda_0$ and $n$, as in (2) and (6). Initialize three sequences $M_i$, $B_i$ and $T_i$ with appropriate starting values at $i = 0$. For $i = 1, \ldots, n$, recursively fill in the entries of the tables with

\[
M_i = \max_{k \in K_i} \left( \lambda + M_k + l_{k:i} \right), \qquad B_i = \sum_{k \in K_i} \alpha B_k, \qquad T_i = \sum_{k \in K_i} \alpha T_k L_{k:i},
\]

where $K_i$ is the set of indices $k$ at which a slice ending at observation $i$ may start, so that every slice contains at least $m$ observations, and $l_{k:i}$ and $L_{k:i}$ are, respectively, the loglikelihood-ratio and likelihood-ratio contributions of that slice, computed from $\hat{\sigma}^2_{k:i}$, the residual variance of regressing $y$ on $x$ for observations $k + 1, \ldots, i$.

Step 3.

The final result is

\[
G^2_m = 1 - \exp\{ M_n - \lambda \}, \qquad G^2_t = 1 - (T_n / B_n)^{-2/n}.
\]

Here, $M_i$ stores the partial maximized likelihood ratio up to the ordered observation $x_i$; $B_i$ stores the partial normalizing constant; and $T_i$ stores the partial sum of the likelihood ratios. When $n$ is extremely large, we can speed up the algorithm by considering fewer slicing schemes. For example, we can divide the ordered observations into chunks by rank and consider only slicing schemes whose breakpoints fall at chunk boundaries, which reduces the computational complexity further. We can compute the corresponding quantities with the roles of $X$ and $Y$ interchanged to obtain $G^2_m(X \mid Y, \lambda_0)$ and $G^2_t(X \mid Y, \lambda_0)$. Empirically, the algorithm is faster than many other powerful methods, as shown in the Supplementary Material.
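
Below is a compact Python sketch of the maximization branch of the algorithm (our own bookkeeping, so the tables and constants are not the paper's $M_i$, $B_i$, $T_i$, $\lambda$ and $\alpha$, and the default minimum slice size is our assumption). It maximizes criterion (2) over all slicing schemes, scoring each candidate slice in constant time with prefix sums, which gives the quadratic cost discussed above, and returns $G^2_m(Y \mid X, \lambda_0) = 1 - \exp\{-D(Y \mid X, \lambda_0)\}$.

```python
import numpy as np

def g2_m_dp(x, y, lambda0=3.0, m=None):
    """Dynamic-programming evaluation of G^2_m(Y | X, lambda0)."""
    n = len(y)
    if m is None:
        m = max(2, int(np.sqrt(n)))           # assumed default minimum slice size
    order = np.argsort(x)
    x = x[order]
    y = (y[order] - y[order].mean()) / y[order].std()   # overall variance becomes 1

    # Prefix sums: each within-slice regression is then scored in O(1).
    z = np.zeros(1)
    cx, cy = np.concatenate((z, np.cumsum(x))), np.concatenate((z, np.cumsum(y)))
    cxx = np.concatenate((z, np.cumsum(x * x)))
    cyy = np.concatenate((z, np.cumsum(y * y)))
    cxy = np.concatenate((z, np.cumsum(x * y)))

    def slice_cost(j, i):
        """n_h * log(residual variance) for the slice of observations j..i-1."""
        nh = i - j
        sx, sy = cx[i] - cx[j], cy[i] - cy[j]
        sxx, syy, sxy = cxx[i] - cxx[j], cyy[i] - cyy[j], cxy[i] - cxy[j]
        vxx, vyy, vxy = sxx - sx * sx / nh, syy - sy * sy / nh, sxy - sx * sy / nh
        rss = vyy - (vxy * vxy / vxx if vxx > 1e-12 else 0.0)
        return nh * np.log(max(rss / nh, 1e-12))

    penalty = lambda0 * np.log(n)
    # best[i] = max over slicings of the first i observations of
    #           sum over slices of { -n_h log(sigma_h^2) - penalty }.
    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    for i in range(m, n + 1):
        for j in range(0, i - m + 1):
            if best[j] > -np.inf:
                best[i] = max(best[i], best[j] - slice_cost(j, i) - penalty)
    # With overall variance 1, 2 log LR_S = -sum_h n_h log(sigma_h^2), so
    # n * D(Y | X, lambda0) = best[n] + penalty (one penalty charge refunded).
    return 1 - np.exp(-(best[n] + penalty) / n)

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 300)
y = np.sin(3 * np.pi * x) + rng.normal(0, 0.3, 300)
print(round(g2_m_dp(x, y), 3))
```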

2.5. An empirical Bayes strategy for selecting $\lambda_0$

Although the choice of the penalty parameter $\lambda_0$ is not critical for the general use of the G-squared statistics, we typically take $\lambda_0 = 3$ for $G^2_m$ and $G^2_t$ because this value is equivalent to the Bayesian information criterion. Fine-tuning $\lambda_0$ can improve the estimation of $G^2$; we therefore propose a data-driven strategy for choosing $\lambda_0$ adaptively. The quantity $\mathrm{BF}(\lambda_0)$ in (6) can be viewed as an approximation to the marginal likelihood of $\lambda_0$ up to a normalizing constant. Hence we can use the maximum likelihood principle to choose the $\lambda_0$ that maximizes $\mathrm{BF}(\lambda_0)$. We then use the chosen $\lambda_0$ to compute $G^2_m$ and $G^2_t$ as estimators of $G^2$. In practice, we evaluate $\mathrm{BF}(\lambda_0)$ for a finite set of $\lambda_0$ values, such as $\{0{\cdot}5, 1{\cdot}5, 2{\cdot}5, 3{\cdot}5\}$, and pick the value that maximizes $\mathrm{BF}(\lambda_0)$; $\mathrm{BF}(\lambda_0)$ can be computed efficiently via a dynamic programming algorithm similar to that described in §2.4. As an illustration, we consider the sampling distributions of $G^2_m$ and $G^2_t$, both with $\lambda_0$ fixed at each of these values and with $\lambda_0$ selected by the empirical Bayes strategy, for the following two scenarios:

Example 1.

A relationship between $X$ and $Y$ whose regression function is well approximated by a linear function of $X$, observed with additive noise of level $\sigma$.

Example 2.

A relationship between $X$ and $Y$ whose regression function is trigonometric in $X$, observed with additive noise of level $\sigma$.

We simulated independent data points from each model, choosing $\sigma$ so that the true $G^2$ was fixed at a target value, and performed 1000 replications. Figure 1 shows histograms of $G^2_m$ and $G^2_t$ for different $\lambda_0$ values. The results demonstrate that for relationships which can be approximated well by a linear function, a larger $\lambda_0$ is preferred because it penalizes the number of slices more heavily, so that the resulting sampling distributions are less biased. On the other hand, for complicated relationships such as trigonometric functions, a smaller $\lambda_0$ is preferable because it allows more slices, which can help to capture fluctuations in the functional relationship. The figure also shows that the empirical Bayes selection of $\lambda_0$ worked very well, leading to a proper choice of $\lambda_0$ for each simulated dataset from both examples and resulting in the most accurate estimates of $G^2$. Additional simulation studies and discussion of the consistency of the data-driven strategy can be found in the Supplementary Material.

Fig. 1.

Sampling distributions of $G^2_m$ and $G^2_t$ under the two models described in §2.5, for $\lambda_0 = 0{\cdot}5$ (dashed), 1·5 (dotted), 2·5 (dot-dash) and 3·5 (solid). The density function in each case is estimated by the histogram. The sampling distributions of $G^2_m$ and $G^2_t$ with the empirical Bayes selection of $\lambda_0$ are shaded grey and overlaid on top of the other density functions.
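
The selection rule itself is easy to illustrate by brute force on a tiny sample (our own code; the paper instead evaluates $\mathrm{BF}(\lambda_0)$ with the dynamic programming algorithm of §2.4, and the grid, sample size and minimum slice size below are our assumptions). For each candidate $\lambda_0$ we evaluate the logarithm of the approximate Bayes factor (6) over all admissible slicing schemes and keep the maximizer.

```python
import numpy as np
from itertools import combinations

def log_sum_exp(a):
    a = np.asarray(a, dtype=float)
    return a.max() + np.log(np.exp(a - a.max()).sum())

def log_bayes_factor(x, y, lambda0, m=6):
    """Brute-force log BF(lambda0) of (6) for a small sample."""
    n = len(y)
    order = np.argsort(x)
    x, y = x[order], y[order]
    overall_var = np.mean((y - y.mean()) ** 2)
    log_terms, log_weights = [], []
    for k in range(n // m):                                 # k = |S| - 1 cuts
        for cuts in combinations(range(m, n - m + 1), k):
            edges = [0, *cuts, n]
            if min(np.diff(edges)) < m:
                continue
            two_log_lr = n * np.log(overall_var)
            for lo, hi in zip(edges[:-1], edges[1:]):
                slope, intercept = np.polyfit(x[lo:hi], y[lo:hi], 1)
                rvar = np.mean((y[lo:hi] - (intercept + slope * x[lo:hi])) ** 2)
                two_log_lr -= (hi - lo) * np.log(rvar)
            n_d = two_log_lr - lambda0 * k * np.log(n)      # n * D(Y | S, lambda0)
            n_slices = k + 1
            log_terms.append(0.5 * (3 * n_slices - 2) * np.log(2 * np.pi / n) + 0.5 * n_d)
            log_weights.append(-0.5 * lambda0 * k * np.log(n))
    return log_sum_exp(log_terms) - log_sum_exp(log_weights)

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 24)
y = np.cos(4 * np.pi * x) + rng.normal(0, 0.2, 24)
grid = [0.5, 1.5, 2.5, 3.5]
print("selected lambda0:", grid[int(np.argmax([log_bayes_factor(x, y, l) for l in grid]))])
```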

3. POWER ANALYSIS

Next, we compare the power of different independence testing methods for various relationships. Here we again fixed $\lambda_0 = 3$ for both $G^2_m$ and $G^2_t$. The other methods we tested include alternating conditional expectation (Breiman & Friedman, 1985), Genest's test (Genest & Rémillard, 2004), the Pearson correlation, distance correlation (Székely et al., 2007), the method of Heller et al. (2016), the characteristic function method (Kankainen & Ushakov, 1998), Hoeffding's test (Hoeffding, 1948), the mutual information method (Kraskov et al., 2004), and two estimators based on the maximal information coefficient (Reshef et al., 2011). We followed the procedure for computing the power of the different methods described by Reshef et al. (arXiv:1505.02214) and in a 2012 online note by N. Simon and R. J. Tibshirani.

For different functional relationships $f$ and different noise levels $\sigma$, we let

\[
X \sim \mathrm{Un}(0, 1), \qquad Y = f(X) + \epsilon \sigma, \qquad \epsilon \sim N(0, 1),
\]

where $\epsilon$ is independent of $X$. Thus $\sigma$ is a monotone function of the signal-to-noise ratio, and it is of interest to observe how the performance of the different methods deteriorates as the signal strength weakens for various functional relationships. We used permutation to generate the null distribution and to set the rejection region in all cases.
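
A sketch of this power computation is given below (our own code). For brevity it uses the squared Pearson correlation as a stand-in statistic; any of the statistics compared here, including $G^2_m$ and $G^2_t$, can be plugged in instead. The sample size, level and replication counts are arbitrary.

```python
import numpy as np

def permutation_power(stat, f, sigma, n=100, n_rep=200, n_perm=200, alpha=0.05, seed=0):
    """Estimate the power of the permutation test based on `stat` at level alpha
    under the model X ~ Un(0, 1), Y = f(X) + sigma * N(0, 1)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x = rng.uniform(0, 1, n)
        y = f(x) + sigma * rng.normal(0, 1, n)
        observed = stat(x, y)
        # Permuting y breaks the dependence while preserving both marginals.
        null = np.array([stat(x, rng.permutation(y)) for _ in range(n_perm)])
        rejections += observed > np.quantile(null, 1 - alpha)
    return rejections / n_rep

# Stand-in statistic: squared Pearson correlation (low power for non-monotone f).
r_squared = lambda x, y: np.corrcoef(x, y)[0, 1] ** 2

for sigma in (0.1, 0.5, 1.0):
    power = permutation_power(r_squared, lambda x: np.sin(4 * np.pi * x), sigma)
    print(sigma, power)
```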

Figure 2 shows power comparisons for eight functional relationships. We fixed the sample size and performed 1000 replications for each relationship and each $\sigma$ value. For the sake of clarity, here we plot only the Pearson correlation, distance correlation, the method of Heller et al. (2016), the maximal information coefficient-based method, $G^2_m$ and $G^2_t$. For any method with tuning parameters, we chose the parameter values that resulted in the highest average power over all the examples. Due to computational concerns, we chose a relatively small value of the tuning parameter for the method of Heller et al. (2016). It can be seen that $G^2_m$ and $G^2_t$ performed robustly and were always among the most powerful methods, with $G^2_t$ slightly more powerful than $G^2_m$ in nearly all the examples. They outperformed the other methods in cases such as the high-frequency sine, triangle and piecewise-constant functions, where piecewise-linear approximation is more appropriate than other approaches. For monotonic examples such as the linear and radical relationships, $G^2_m$ and $G^2_t$ had slightly lower power than the Pearson correlation, distance correlation and the method of Heller et al. (2016), but were still highly competitive.

Fig. 2.

The power of $G^2_m$ and $G^2_t$ (black and grey solid), the Pearson correlation (grey circles), distance correlation (black dashed), the method of Heller et al. (2016) (black dotted) and the maximal information coefficient-based method (black circles) for testing independence between $X$ and $Y$ when the underlying true functional relationships are linear, quadratic, cubic, radical, low-frequency sine, triangle, high-frequency sine and piecewise constant. The horizontal axis represents $\sigma$, a monotone function of the signal-to-noise ratio, and the vertical axis is the power. We performed 1000 replications for each relationship and each $\sigma$ value.

We also studied the performance of these methods at other sample sizes, including $n = 100$ and 400, and found that $G^2_m$ and $G^2_t$ still had high power regardless of $n$, although their advantages were much less obvious when $n$ was small. More details can be found in the Supplementary Material.

4. DISCUSSION

The proposed G-squared statistic can be viewed as a direct generalization of the R-squared statistic. While maintaining the same interpretability as the R-squared statistic, the G-squared statistic is also a powerful measure of dependence for general relationships. Instead of resorting to curve-fitting methods to estimate the underlying relationship and the G-squared statistic, we employed piecewise-linear approximations with penalties and dynamic programming algorithms. Although we have considered only piecewise-linear functions, one could potentially approximate a relationship between two variables using piecewise polynomials or other flexible basis functions, with perhaps additional penalty terms to control the complexity. Furthermore, it would be worthwhile to generalize the slicing idea to testing dependence between two multivariate random variables.


ACKNOWLEDGEMENT

We are grateful to the two referees for helpful comments and suggestions. This research was supported in part by the U.S. National Science Foundation and National Institutes of Health. We thank Ashley Wang for her proofreading of the paper. The views expressed herein are the authors’ alone and are not necessarily the views of Two Sigma Investments, Limited Partnership, or any of its affiliates.

SUPPLEMENTARY MATERIAL

Supplementary material available at Biometrika online includes proofs of the theorems, software implementation details, discussions on segmented regression, a study of equitability, and more simulation results.

References

1. Blyth, S. (1994). Local divergence and association. Biometrika 81, 579–84.
2. Breiman, L. & Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Am. Statist. Assoc. 80, 580–98.
3. Doksum, K., Blyth, S., Bradlow, E., Meng, X. & Zhao, H. (1994). Correlation curves as local measures of variance explained by regression. J. Am. Statist. Assoc. 89, 571–82.
4. Genest, C. & Rémillard, B. (2004). Tests of independence and randomness based on the empirical copula process. Test 13, 335–69.
5. Gretton, A., Bousquet, O., Smola, A. & Schölkopf, B. (2005). Measuring statistical dependence with Hilbert–Schmidt norms. Algor. Learn. Theory 3734, 63–77.
6. Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B. & Smola, A. (2012). A kernel two-sample test. J. Mach. Learn. Res. 13, 723–73.
7. Heller, R., Heller, Y., Kaufman, S., Brill, B. & Gorfine, M. (2016). Consistent distribution-free K-sample and independence tests for univariate random variables. J. Mach. Learn. Res. 17, 1–54.
8. Hoeffding, W. (1948). A non-parametric test of independence. Ann. Math. Statist. 19, 546–57.
9. Hušková, M. & Meintanis, S. (2008). Testing procedures based on the empirical characteristic functions I: Goodness-of-fit, testing for symmetry and independence. Tatra Mt. Math. Publ. 39, 225–33.
10. Kankainen, A. & Ushakov, N. G. (1998). A consistent modification of a test for independence based on the empirical characteristic function. J. Math. Sci. 89, 1486–94.
11. Kass, R. E. & Raftery, A. E. (1995). Bayes factors. J. Am. Statist. Assoc. 90, 773–95.
12. Kraskov, A., Stögbauer, H. & Grassberger, P. (2004). Estimating mutual information. Phys. Rev. E 69, 066138.
13. Oosterbaan, R. J. & Ritzema, H. P. (2006). Drainage Principles and Applications. Wageningen, Netherlands: International Institute for Land Reclamation and Improvement, pp. 217–20.
14. Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M. & Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science 334, 1518–24.
15. Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–4.
16. Sejdinovic, D., Sriperumbudur, B., Gretton, A. & Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41, 2263–91.
17. Székely, G. J. & Rizzo, M. L. (2009). Brownian distance covariance. Ann. Appl. Statist. 3, 1236–65.
18. Székely, G. J., Rizzo, M. L. & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35, 2769–94.
