Kinney and Atwal (1) make excellent points about mutual information, the maximal information coefficient (2, 3), and “equitability.” One of their central claims, however, is that “No nontrivial dependence measure can satisfy $R^2$-equitability.” We argue that this is the result of a poorly constructed definition, which we quote:
“A dependence measure $D[X;Y]$ is $R^2$-equitable if and only if, when evaluated on a joint probability distribution $p(X,Y)$ that corresponds to a noisy functional relationship between two real random variables X and Y, the following relation holds:
$$D[X;Y] = g(R^2[f(X);Y]).$$
Here, $g$ is a function that does not depend on $p(X,Y)$ and $f$ is the function defining the noisy functional relationship, i.e.,
$$Y = f(X) + \eta,$$
for some random variable η. The noise term η may depend on $f(X)$ as long as η has no additional dependence on X….”
This definition is undone by the unconventional specification of the noise term. Specifically, allowing η to depend arbitrarily on $f(X)$ lets many different combinations of $f$ and η result in the same $p(X,Y)$. For example, consider one choice of $f$ and $\eta$ against a second choice, $f'$ and $\eta'$. The resulting distributions are identical, but $R^2[f(X);Y] \neq R^2[f'(X);Y]$, a consequence of the deterministic trend embedded in the noise term.
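The identifiability problem can be made concrete with a small simulation. This is only a sketch under stated assumptions: the two decompositions below (identity versus cube, with uniform X and Gaussian noise) are illustrative choices and are not claimed to be the pair used in the letter, and $R^2$ is taken to be the squared Pearson correlation between $f(X)$ and Y.

```python
import math
import random

def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov * cov / (va * vb)

def cbrt(v):
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

rng = random.Random(0)           # fixed seed so the sketch is repeatable
n_samples = 20000
xs = [rng.uniform(-2.0, 2.0) for _ in range(n_samples)]
eps = [rng.gauss(0.0, 0.5) for _ in range(n_samples)]

# Decomposition A (assumed for illustration): f(X) = X, trendless noise.
f_a = xs
y_a = [f + e for f, e in zip(f_a, eps)]

# Decomposition B (assumed for illustration): f'(X) = X**3, with noise
# eta' = cbrt(f') - f' + eps. eta' depends on X only through f'(X),
# as the quoted definition allows, but it carries a deterministic trend.
f_b = [x ** 3 for x in xs]
eta_b = [cbrt(f) - f + e for f, e in zip(f_b, eps)]
y_b = [f + eta for f, eta in zip(f_b, eta_b)]

# Both decompositions generate the same Y (up to floating-point rounding),
# hence the same joint distribution p(X, Y).
max_gap = max(abs(a - b) for a, b in zip(y_a, y_b))

r2_a = pearson_r2(f_a, y_a)
r2_b = pearson_r2(f_b, y_b)
print(f"largest |Y_A - Y_B|: {max_gap:.2e}")
print(f"R^2[f(X);Y]  = {r2_a:.3f}")
print(f"R^2[f'(X);Y] = {r2_b:.3f}")
```

Because the same data admit both decompositions, no function $g$ of $R^2[f(X);Y]$ alone can assign them a single value of $D[X;Y]$ unless $g$ is constant.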
We emphasize the cause of the definitional deficiency (which the authors exploit to demonstrate unsatisfiability) because it suggests an immediate fix: make η trendless. By constraining the expectation $E[\eta \mid X] = 0$, the identifiability issue is resolved without limiting expressive power: any trend removed from η can, and should, be included in $f(X)$ instead. Under this formulation, we also see no reason to restrict the dependence of η to $f(X)$ alone; it can depend arbitrarily on X, as long as $E[\eta \mid X] = 0$.
Without a trend in η, not only does the resulting definition of $R^2$-equitability escape Kinney and Atwal’s reductio, but it is demonstrably satisfiable. Because $f(X) = E[Y \mid X]$, $R^2[f(X);Y]$ is determined by $p(X,Y)$, satisfying the modified definition with $g$ as the identity function. Further, in the large sample limit (for nonpathological functions), $f(X)$ is estimable from X, Y, yielding increasingly accurate approximations of $R^2[f(X);Y]$ and suggesting a family of nonparametric estimation schemes that satisfy $R^2$-equitability.
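One member of such a family can be sketched as follows. This is a minimal illustration, not the authors' scheme: it swaps in a binned conditional mean (a regressogram) as the nonparametric estimate of $E[Y \mid X]$, and the function names, the cube-root rule for the number of bins, and the simulated quadratic relationship are all assumptions made for the example. The in-sample estimate is slightly biased at finite sample sizes.

```python
import math
import random

def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov * cov / (va * vb)

def binned_conditional_mean(xs, ys, n_bins):
    """Estimate E[Y|X] at each sample as the mean of Y in its equal-count X bin."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    size = math.ceil(len(xs) / n_bins)
    fitted = [0.0] * len(xs)
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        mean_y = sum(ys[i] for i in idx) / len(idx)
        for i in idx:
            fitted[i] = mean_y
    return fitted

def r2_dependence(xs, ys, n_bins=None):
    """Hypothetical dependence measure: R^2 between the estimated E[Y|X] and Y."""
    if n_bins is None:
        # Assumed rule of thumb: let the bin count grow slowly with sample size.
        n_bins = max(2, round(len(xs) ** (1.0 / 3.0)))
    f_hat = binned_conditional_mean(xs, ys, n_bins)
    return pearson_r2(f_hat, ys)

# Simulated check: Y = X^2 + noise with X uniform on [-1, 1].
# Var(X^2) = 1/5 - 1/9 = 4/45, so noise variance 4/45 gives a true R^2 of 0.5.
rng = random.Random(1)
n_samples = 20000
sigma = math.sqrt(4.0 / 45.0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
ys = [x * x + rng.gauss(0.0, sigma) for x in xs]

estimate = r2_dependence(xs, ys)
linear_r2 = pearson_r2(xs, ys)
print(f"estimated R^2[f(X);Y]: {estimate:.3f}")
print(f"squared Pearson correlation of X and Y: {linear_r2:.3f}")
```

In this setup the estimate should land near the true value of 0.5 even though the linear correlation between X and Y is close to zero, since the measure depends on the noise level rather than on the shape of $f$.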
$R^2$-equitable measures of dependence care only about how accurately Y can be predicted by X under a quadratic loss function. They are thus sensitive to nonlinear transformations of Y and are not symmetric ($D[X;Y] \neq D[Y;X]$ in general), in contrast to any dependence measure satisfying Kinney and Atwal’s self-equitability (1). These two distinct notions of equitability are useful in different circumstances: $R^2$-equitability should be preferred when quantifying how well you can predict an outcome in expectation (measuring your least-squares predictive accuracy), and measures satisfying self-equitability (exemplified by mutual information) may be more appropriate when quantifying how well you can predict Y in probability, being sensitive to how the conditional distribution $p(Y \mid X)$ varies with X.
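The link to quadratic loss can be written out in a few lines. The derivation below is a sketch under stated assumptions: $R^2$ is the squared Pearson correlation, $Y = f(X) + \eta$ with $E[\eta \mid X] = 0$ (so $f(X) = E[Y \mid X]$), all variances are finite, and $\operatorname{Var}(f(X)) > 0$.

```latex
\begin{align*}
\operatorname{Cov}(f(X), \eta) &= E\bigl[f(X)\, E[\eta \mid X]\bigr] - E[f(X)]\, E[\eta] = 0, \\
\operatorname{Cov}(f(X), Y) &= \operatorname{Var}(f(X)) + \operatorname{Cov}(f(X), \eta) = \operatorname{Var}(f(X)), \\
\operatorname{Var}(Y) &= \operatorname{Var}(f(X)) + \operatorname{Var}(\eta), \\
R^2[f(X);Y] &= \frac{\operatorname{Cov}(f(X), Y)^2}{\operatorname{Var}(f(X))\,\operatorname{Var}(Y)}
  = \frac{\operatorname{Var}(f(X))}{\operatorname{Var}(Y)}
  = 1 - \frac{E\bigl[(Y - f(X))^2\bigr]}{\operatorname{Var}(Y)}.
\end{align*}
```

Since $E[Y \mid X]$ minimizes the expected squared error among all functions of X, the last expression is the largest fraction of the variance of Y that any predictor based on X can explain. Swapping the roles of X and Y, or transforming Y nonlinearly, changes both the numerator and the denominator, which is why the measure is neither symmetric nor invariant.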
Thus, a simple modification of Kinney and Atwal’s definition yields a satisfiable notion of $R^2$-equitability that is usefully distinct from the notion of self-equitability the authors propose (1). Both can coexist.
Acknowledgments
B.M. is supported by Center for AIDS Research Translational Virology Core Grant P30 AI036214 and Molecular Epidemiology Avant Garde Grant DP1 DA034978.
Footnotes
The authors declare no conflict of interest.
References
1. Kinney JB, Atwal GS. Equitability, mutual information, and the maximal information coefficient. Proc Natl Acad Sci USA. 2014;111(9):3354–3359. doi: 10.1073/pnas.1309933111.
2. Reshef DN, et al. Detecting novel associations in large data sets. Science. 2011;334(6062):1518–1524. doi: 10.1126/science.1205438.
3. Reshef DN, Reshef Y, Mitzenmacher M, Sabeti P. 2013. Equitability analysis of the maximal information coefficient with comparisons. arXiv:1301.6314v1 [cs.LG]