Proceedings of the National Academy of Sciences of the United States of America
letter
2014 Apr 29;111(21):E2160. doi: 10.1073/pnas.1403623111

$R^2$-equitability is satisfiable

Ben Murrell, Daniel Murrell, Hugh Murrell
PMCID: PMC4040619  PMID: 24782547

Kinney and Atwal (1) make excellent points about mutual information, the maximal information coefficient (2, 3), and “equitability.” One of their central claims, however, is that, “No nontrivial dependence measure can satisfy $R^2$-equitability.” We argue that this is the result of a poorly constructed definition, which we quote:

“A dependence measure $D[X;Y]$ is $R^2$-equitable if and only if, when evaluated on a joint probability distribution $p(X,Y)$ that corresponds to a noisy functional relationship between two real random variables $X$ and $Y$, the following relation holds:

$$D[X;Y] = g(R^2[f(X);Y]).$$

Here, $g$ is a function that does not depend on $p(X,Y)$ and $f$ is the function defining the noisy functional relationship, i.e.,

$$Y = f(X) + \eta,$$

for some random variable $\eta$. The noise term $\eta$ may depend on $f(X)$ as long as $\eta$ has no additional dependence on $X$….”

This definition is undone by the unconventional specification of the noise term. Specifically, allowing $\eta$ to depend arbitrarily on $f(X)$ lets many different combinations of $f$ and $\eta$ result in the same $p(X,Y)$. For example, consider $f_1(X)=X^2$ and $\eta_1=N(0,1)$, against $f_2(X)=X$ and $\eta_2=-f_2(X)+f_2(X)^2+N(0,1)$. In both cases $Y=X^2+N(0,1)$, so the resulting $p(X,Y)$ distributions are identical, but $R^2[f_1(X);Y]\neq R^2[f_2(X);Y]$, a consequence of the deterministic trend embedded in $\eta_2$.
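The ambiguity can be checked numerically. The following is a minimal sketch, not part of the original letter. It assumes $X \sim N(0,1)$ (the letter does not fix a distribution for $X$), takes $R^2$ to be the squared Pearson correlation, and writes $\eta_2 = -f_2(X) + f_2(X)^2 + N(0,1)$, the form for which both decompositions yield $Y = X^2 + N(0,1)$. The names `r_squared` and `simulate` are illustrative. Both decompositions reuse the same noise draw, so the two $Y$ samples agree point by point, which is stronger than agreeing in distribution.

```python
import numpy as np


def r_squared(a, b):
    """Squared Pearson correlation between two samples."""
    return np.corrcoef(a, b)[0, 1] ** 2


def simulate(n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)      # assumption: X ~ N(0, 1)
    noise = rng.standard_normal(n)  # one N(0, 1) draw, reused by both decompositions

    # Decomposition 1: f1(X) = X^2, eta1 = N(0, 1)
    f1 = x ** 2
    y1 = f1 + noise

    # Decomposition 2: f2(X) = X, eta2 = -f2(X) + f2(X)^2 + N(0, 1)
    f2 = x
    eta2 = -f2 + f2 ** 2 + noise
    y2 = f2 + eta2

    return {
        "same_y": bool(np.allclose(y1, y2)),  # identical (X, Y) samples
        "r2_f1": r_squared(f1, y1),           # R^2[f1(X); Y]
        "r2_f2": r_squared(f2, y2),           # R^2[f2(X); Y]
    }


result = simulate()
print(result)
```

Under these assumptions the two $R^2$ values differ sharply although the $(X, Y)$ samples coincide, so no single function $g$ can map both to the same $D[X;Y]$.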

We emphasize the cause of the definitional deficiency (which the authors exploit to demonstrate unsatisfiability) because it suggests an immediate fix: make $\eta$ trendless. By constraining the expectation $E[\eta \mid f(X)]=0$, the identifiability issue is resolved without limiting expressive power: any trend removed from $\eta$ can, and should, be included in $f(X)$ instead. Under this formulation, we also see no reason to restrict the dependence of $\eta$ to $f(X)$ alone; it can depend arbitrarily on $X$, as long as $E[\eta \mid X]=0$.
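As a worked illustration (our own spelling-out, using the example pair $f_2(X)=X$, $\eta_2=-f_2(X)+f_2(X)^2+N(0,1)$ with the noise independent of $X$), the trendless constraint pins down $f$ and rules out the second decomposition:

```latex
% Trendless noise identifies f as the conditional mean:
\begin{align*}
  Y &= f(X) + \eta, \qquad E[\eta \mid X] = 0 \\
  \Rightarrow\; E[Y \mid X] &= f(X) + E[\eta \mid X] = f(X).
\end{align*}

% The second decomposition violates the constraint:
\begin{align*}
  E[\eta_2 \mid X] &= -X + X^2 \neq 0 \quad \text{(for general } X\text{)}.
\end{align*}

% Moving the trend from the noise into f recovers the first decomposition:
\begin{align*}
  f(X) &= f_2(X) + E[\eta_2 \mid X] = X + (-X + X^2) = X^2, \\
  \eta &= \eta_2 - E[\eta_2 \mid X] = N(0,1).
\end{align*}
```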

Without a trend in $\eta$, not only does the resulting definition of $R^2$-equitability escape Kinney and Atwal’s reductio, but it is demonstrably satisfiable. Because $E[\eta \mid X]=0 \Rightarrow f(X)=E[Y \mid X]$, $R^2[f(X);Y]$ is determined by $p(X,Y)$, satisfying the modified definition with $g$ as the identity function. Further, in the large sample limit (for nonpathological functions), $\hat{f}(X)\approx f(X)$ is estimable from $X$, $Y$, yielding increasingly accurate approximations $R^2[\hat{f}(X);Y]\approx R^2[f(X);Y]$, suggesting a family of schemes for nonparametric estimation of $D[X;Y]$ that satisfy $R^2$-equitability.
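One possible member of such a family can be sketched as follows. This is an illustration under our own assumptions, not a scheme specified in the letter: $\hat{f}$ is taken to be the mean of $Y$ within equal-count bins of $X$ (any consistent regression estimator could be substituted), and $R^2$ is the squared Pearson correlation. The function names, the bin count, and the test relationship $Y=\sin(3X)+\text{noise}$ are all hypothetical choices.

```python
import numpy as np


def binned_conditional_mean(x, y, n_bins=50):
    """Estimate f_hat(x) ~ E[Y | X] by averaging y within equal-count bins of x."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # Interior edges map each x to a bin index in 0 .. n_bins - 1.
    idx = np.searchsorted(edges[1:-1], x, side="right")
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = sums / np.maximum(counts, 1)  # guard against empty bins
    return means[idx]


def r2_dependence(x, y, n_bins=50):
    """D[X;Y] estimated as the squared correlation between f_hat(X) and Y."""
    f_hat = binned_conditional_mean(x, y, n_bins)
    return np.corrcoef(f_hat, y)[0, 1] ** 2


# Hypothetical test case: a nonlinear trend with trendless Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, 100_000)
f = np.sin(3.0 * x)
y = f + 0.5 * rng.standard_normal(x.size)

estimate = r2_dependence(x, y)         # uses only the (x, y) sample
oracle = np.corrcoef(f, y)[0, 1] ** 2  # uses the true f, unknown in practice
print(estimate, oracle)
```

With a large sample and a smooth $f$, the estimate should track the oracle value closely. In-sample binned means carry a small upward bias that shrinks as the number of points per bin grows, so a held-out split would be a reasonable refinement.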

$R^2$-equitable measures of dependence care only about how accurately $Y$ can be predicted by $X$ under a quadratic loss function, and are thus sensitive to nonlinear transformations of $Y$ and not symmetric ($D[X;Y]\neq D[Y;X]$), in contrast to any dependence measure satisfying Kinney and Atwal’s self-equitability (1). These two distinct notions of equitability are useful in different circumstances: $R^2$-equitability should be preferred when quantifying how well you can predict an outcome in expectation (measuring your least-squares predictive accuracy), and measures satisfying self-equitability (exemplified by mutual information) may be more appropriate when quantifying how well you can predict $Y$ in probability, being sensitive to how the distribution $p(Y \mid X)$ varies with $X$.
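Both properties can be illustrated with a small simulation. The sketch below is our own, under stated assumptions: $D$ is estimated by the squared correlation between a binned-mean estimate of the conditional expectation and the target, $X$ is uniform on $[-2, 2]$, and $Y = X^2 + \text{noise}$. The helper `r2_dep` and all parameter values are hypothetical.

```python
import numpy as np


def r2_dep(x, y, n_bins=50):
    """Squared correlation between a binned-mean estimate of E[y | x] and y."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    idx = np.searchsorted(edges, x, side="right")
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = sums / np.maximum(counts, 1)
    return np.corrcoef(means[idx], y)[0, 1] ** 2


rng = np.random.default_rng(2)
x = rng.uniform(-2.0, 2.0, 100_000)
y = x ** 2 + 0.5 * rng.standard_normal(x.size)

d_xy = r2_dep(x, y)               # predicting Y from X
d_yx = r2_dep(y, x)               # predicting X from Y
d_x_expy = r2_dep(x, np.exp(y))   # same X, nonlinearly transformed Y
print(d_xy, d_yx, d_x_expy)
```

Because $X$ is symmetric about zero, $E[X \mid Y]$ is flat and `d_yx` should be near zero even though `d_xy` is large. Exponentiating $Y$ makes the noise multiplicative, which lowers the least-squares predictability and hence the estimated $D$.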

Thus, a simple modification of Kinney and Atwal’s definition renders a satisfiable notion of $R^2$-equitability that is usefully distinct from the notion of self-equitability the authors propose (1). Both can coexist.

Acknowledgments

B.M. is supported by Center for AIDS Research Translational Virology Core Grant P30 AI036214 and Molecular Epidemiology Avant Garde Grant DP1 DA034978.

Footnotes

The authors declare no conflict of interest.

References

  1. Kinney JB, Atwal GS. Equitability, mutual information, and the maximal information coefficient. Proc Natl Acad Sci USA. 2014;111(9):3354–3359. doi: 10.1073/pnas.1309933111.
  2. Reshef DN, et al. Detecting novel associations in large data sets. Science. 2011;334(6062):1518–1524. doi: 10.1126/science.1205438.
  3. Reshef DN, Reshef Y, Mitzenmacher M, Sabeti P. 2013. Equitability analysis of the maximal information coefficient with comparisons. arXiv:1301.6314v1 [cs.LG].
