Wilson (1) proposes a multiple testing procedure based on the harmonic mean p-value (HMP). While this is a potentially useful method, he makes several claims that are not supported by the theory. Herein we identify 4 errors, for clarity described in terms of the version with equal weights , so that .
First, Wilson claims strong familywise error rate (FWER) control for HMP using a closed testing argument. Indeed, if HMP rejects the intersection hypothesis for some set of hypotheses, then it also rejects the intersection hypothesis for every superset of that set, implying that HMP is a closed testing procedure (2). However, from this argument Wilson may only claim that, for each set on which HMP is significant, at least one of the genetic variants in that set has signal. Strong FWER control, that is, the claim that all such genetic variants have signal, may be claimed only for the elementary hypotheses rejected by a closed testing procedure. Rejecting nonsingleton sets, as Wilson does (ref. 1, p. 1198), gives no more than weak FWER control on these sets.
Second, Wilson claims, without proof, that HMP is valid without any dependence assumptions. However, Vovk and Wang (3) showed that the critical value for HMP under general dependence is smaller than , much smaller than from table 1 in ref. 1, proving that Wilson’s HMP loses error guarantees under general dependence. A simulation shows lack of control even under moderate positive dependence. With , simulating standard normals with common correlation , and testing with 1-sided Z-tests, we obtain a type I error of 0.164 for , and still an excessive rate of 0.091 with in table 1 in ref. 1.
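A simulation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the rates quoted above: the number of tests, the correlation, and the number of replicates are placeholder values, and the rejection threshold is a plain significance level rather than an entry of table 1 in ref. 1. Equicorrelated standard normals are generated through a shared factor, and the equal-weights HMP is the ordinary harmonic mean of the p-values.

```python
import math
import random
from statistics import NormalDist


def harmonic_mean_p(pvals):
    """Equal-weights harmonic mean of a list of p-values."""
    return len(pvals) / sum(1.0 / p for p in pvals)


def simulate_type1(n_tests, rho, threshold, n_sim=2000, seed=1):
    """Estimate how often the HMP falls at or below `threshold` under the global null.

    Test statistics are standard normal with common correlation `rho`
    (0 <= rho < 1), built as sqrt(rho) * shared + sqrt(1 - rho) * own.
    All arguments are illustrative placeholders (assumptions).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rejections = 0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)
        pvals = []
        for _ in range(n_tests):
            z = a * shared + b * rng.gauss(0.0, 1.0)
            # One-sided (upper-tail) Z-test p-value; guard against underflow to 0.
            pvals.append(max(std_normal.cdf(-z), 1e-300))
        if harmonic_mean_p(pvals) <= threshold:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Placeholder settings, not the settings used in the text.
    print(simulate_type1(n_tests=100, rho=0.5, threshold=0.05))
```

Replacing `threshold` with a stricter value shows how the estimated error rate responds to the critical value; the estimate carries Monte Carlo error that shrinks as `n_sim` grows.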
Third, Wilson claims (ref. 1, p. 1197) that HMP is more powerful than Bonferroni: When Bonferroni rejects we have for some , so also HMP has when . However, this argument is not fair. Using table 1 in ref. 1, rejection with HMP requires , so HMP cannot be claimed to be more powerful than Bonferroni. Wilson’s argument even reverses for proper strong FWER control: HMP must reject some elementary , which requires , with from table 1 in ref. 1. Because , after leveling the playing field Bonferroni is more powerful than HMP, and, unlike HMP, robust to dependence.
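The inequality behind this comparison can be checked numerically. The sketch below assumes equal weights, so that the HMP is the plain harmonic mean, which can never exceed the number of tests times the smallest p-value. The p-values and the stricter threshold 0.03 are made-up illustrations; 0.03 is a stand-in for a critical value below the nominal level and is not taken from table 1 in ref. 1.

```python
def harmonic_mean_p(pvals):
    """Equal-weights harmonic mean of a list of p-values."""
    return len(pvals) / sum(1.0 / p for p in pvals)


def bonferroni_rejects(pvals, alpha):
    """True if the smallest p-value is at most alpha divided by the number of tests."""
    return min(pvals) <= alpha / len(pvals)


def hmp_rejects(pvals, threshold):
    """True if the harmonic mean p-value is at most the given threshold."""
    return harmonic_mean_p(pvals) <= threshold


# Hypothetical data: one small p-value among ten tests.
pvals = [0.004] + [0.5] * 9
alpha = 0.05
stricter_threshold = 0.03  # assumption: placeholder for a threshold below alpha

print(bonferroni_rejects(pvals, alpha))         # Bonferroni at level alpha
print(hmp_rejects(pvals, alpha))                # HMP compared against alpha itself
print(hmp_rejects(pvals, stricter_threshold))   # HMP compared against a stricter threshold
```

With these numbers the harmonic mean is 10/268, roughly 0.037: below the nominal 0.05, so the comparison against the nominal level agrees with Bonferroni, but above the stricter placeholder threshold, so HMP no longer rejects where Bonferroni does.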
Fourth, Wilson claims (ref. 1, p. 1197) that HMP “produces significant results whenever the Simes-based BH [Benjamini–Hochberg] procedure does, although BH only controls the less stringent FDR [false discovery rate].” Indeed, HMP is smaller than the Simes/BH p-value. However, the proper critical value for Simes/BH is , while being for HMP, so the comparison is not fair. If all p-values are in , BH rejects all hypotheses, and HMP none. Moreover, BH in fact controls a more stringent criterion than HMP: Both methods control FWER weakly, but BH additionally controls FDR. Finally, unlike HMP, Simes/BH is robust to positively dependent p-values (4).
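Both halves of this comparison can be illustrated with a short sketch, again assuming equal weights. The Simes p-value is taken here as the minimum over ranks of (number of tests × ordered p-value / rank), and BH as the usual step-up rule. The example p-values (all equal to 0.04), the level 0.05, and the stricter HMP threshold 0.03 are hypothetical choices made for illustration; they are not the interval or the critical values from the text or from table 1 in ref. 1.

```python
def harmonic_mean_p(pvals):
    """Equal-weights harmonic mean of a list of p-values."""
    return len(pvals) / sum(1.0 / p for p in pvals)


def simes_p(pvals):
    """Simes combined p-value: min over ranks i of m * p_(i) / i."""
    m = len(pvals)
    return min(m * p / i for i, p in enumerate(sorted(pvals), start=1))


def bh_num_rejections(pvals, alpha):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= alpha * i / m:
            k = i  # keep the largest rank that passes its threshold
    return k


# Hypothetical case: every p-value sits just below the nominal level.
pvals = [0.04] * 20
alpha = 0.05
stricter_threshold = 0.03  # assumption: placeholder for a stricter HMP threshold

print(bh_num_rejections(pvals, alpha))                # how many BH rejects
print(harmonic_mean_p(pvals) <= simes_p(pvals) + 1e-12)  # HMP does not exceed Simes
print(harmonic_mean_p(pvals) <= stricter_threshold)   # HMP against the stricter threshold
```

In this constructed case BH rejects all 20 hypotheses, while the harmonic mean equals 0.04 and stays above the stricter placeholder threshold, so the ordering of the p-values alone does not settle which procedure rejects more.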
Despite these concerns, we acknowledge that HMP is of interest. If HMP finds multiple significant regions, HMP’s weak control is simultaneous over these regions. HMP could be extended to control false-discovery proportions post hoc (5, 6). It is thus worth exploring further, but only with a realistic assessment of its properties.
Footnotes
The authors declare no competing interest.
References
- 1. Wilson D. J., The harmonic mean p-value for combining dependent tests. Proc. Natl. Acad. Sci. U.S.A. 116, 1195–1200 (2019).
- 2. Marcus R., Peritz E., Gabriel K., Closed testing procedures with special reference to ordered analysis of variance. Biometrika 63, 655–660 (1976).
- 3. Vovk V., Wang R., Combining p-values via averaging. arXiv:1212.4966 (20 December 2012).
- 4. Benjamini Y., Yekutieli D., The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188 (2001).
- 5. Goeman J. J., Solari A., Multiple testing for exploratory research. Stat. Sci. 26, 584–597 (2011).
- 6. Rosenblatt J. D., Finos L., Weeda W. D., Solari A., Goeman J. J., All-resolutions inference for brain imaging. Neuroimage 181, 786–796 (2018).