I read with much interest the article by Wilson (1) on the harmonic mean $p$-value (HMP) for combining statistical significance tests. I congratulate the author on a thorough discussion of this proposal. However, I would like to point out that Good (2) had already suggested the HMP in 1958 (see also refs. 3 and 4 and references therein). Good (2) distinguishes tests in parallel from tests in series: the latter lead to Fisher’s method, while the former lead to the HMP. Both Good (ref. 2, p. 804) and Wilson (section 5 of supporting information for ref. 1) use the theorem of weighted averages of Bayes factors (5) to derive the HMP. Good’s argument is based on his empirical observation that the Bayes factor against the null hypothesis is approximately equal to $1/(\gamma p)$, where $p$ is the $p$-value and $\gamma$ is between 3⅓ and 30. The weighted average of these Bayes factors then leads to the weighted HMP. However, this approximation is quite crude, since the Bayes factor is not necessarily monotonically related to the $p$-value (section 3 of ref. 6) and can even support the null hypothesis when a $p$-value would lead to its rejection (section 4.4 of ref. 7).
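As a minimal illustration, assuming the usual definition of the weighted HMP as $\sum_i w_i / \sum_i (w_i/p_i)$ and reading Good’s rule as a Bayes factor of roughly $1/(\gamma p)$ with $\gamma$ between 3⅓ and 30, the following Python sketch computes both quantities. The function names are hypothetical and chosen only for this example.

```python
def weighted_hmp(p_values, weights):
    """Weighted harmonic mean p-value: sum(w) / sum(w / p).

    Assumption: weights are non-negative; they are normalized by their sum here.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("each p-value must lie in (0, 1]")
    total_weight = sum(weights)
    return total_weight / sum(w / p for w, p in zip(weights, p_values))


def good_bayes_factor_range(p, gamma_low=10.0 / 3.0, gamma_high=30.0):
    """Rough range for the Bayes factor against the null under Good's rule.

    Assumption: the rule is read as BF ~ 1 / (gamma * p) with
    gamma between 3 1/3 and 30, so a larger gamma gives a smaller Bayes factor.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 1.0 / (gamma_high * p), 1.0 / (gamma_low * p)
```

For example, two equally weighted $p$-values of 0.01 and 0.04 give an HMP of 0.016, and Good’s rule at $p = 0.01$ gives a Bayes factor between about 3.3 and 30, which illustrates how wide the range is.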
Wilson uses a Beta($\xi$, 1) distribution for the $p$-value under the alternative to derive optimal weights for the HMP. This class is suitable for local alternatives and leads to the popular upper bound $1/(-e\,p \log p)$ on the Bayes factor against the null (8). However, the derivation in Wilson (1) requires that all tests have good power, in which case Bayes factors based on simple alternatives are more appropriate and may give larger Bayes factors than $1/(-e\,p \log p)$ (6) (Fig. 1). Similar results can be obtained for one-sided $p$-values using the pCalibrate R package.
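The bound of ref. 8 is easy to evaluate directly. The sketch below assumes it takes the familiar form $1/(-e\,p \log p)$ for $p < 1/e$ and equals 1 otherwise; the function name is hypothetical, and the code does not use the pCalibrate package.

```python
import math


def max_bayes_factor_local(p):
    """Upper bound on the Bayes factor against the null for local alternatives.

    Assumption: the bound is 1 / (-e * p * log(p)) for p < 1/e and 1 otherwise,
    i.e. the reciprocal of the -e p log p calibration.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        # Outside the range where the calibration applies, the data cannot
        # favour the alternative under this class, so the bound is 1.
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))
```

Under this assumption, $p = 0.05$ gives a bound of about 2.5 and $p = 0.01$ a bound of about 8, which shows how modest the maximal evidence against the null is for this class.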
A promising alternative for well-powered studies is the class of Beta(1, $\kappa$) distributions (section 2.3 of ref. 6). The Bayes factor against the null is then bounded by $1/(-e\,q \log q)$, where $q = 1 - p$. This bound is also shown in Fig. 1 and is always above the bounds for the Bayes factors based on simple alternatives. For small $p$ the bound can be well approximated by $1/(e\,p)$, where $e \approx 2.72$ is remarkably close to Good’s lower limit 3⅓ for his constant $\gamma$. This bound is a monotone function of the $p$-value and suggests a modification of Good’s argument to justify the HMP: If $1/(e\,p_i)$ is an upper bound for the Bayes factor for all $p$-values $p_i$ considered, then

$$\sum_i \frac{w_i}{e\,p_i}$$

is an upper bound for the model-averaged Bayes factor, where $w_i$ are the prior probabilities of the alternatives. Direct transformation of the HMP with weights $w_i$ to $1/(e\,p)$ would give the same bound, which shows that the HMP with weights $w_i$ is compatible with an evidential interpretation of $p$-values using bounds on the Bayes factor.
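The argument can be checked numerically. The following Python sketch assumes the Beta(1, $\kappa$) bound has the form $1/(-e\,q \log q)$ with $q = 1 - p$ (taken to be 1 once $p \geq 1 - 1/e$), its small-$p$ approximation is $1/(e\,p)$, and the weights sum to one; all function names are hypothetical. It compares the weighted average of the approximate bounds with the approximate bound evaluated at the weighted HMP.

```python
import math


def max_bf_beta_1_kappa(p):
    """Bound 1 / (-e * q * log(q)), q = 1 - p, on the Bayes factor against the null.

    Assumption: the bound is set to 1 for p >= 1 - 1/e.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 - 1.0 / math.e:
        return 1.0
    q = 1.0 - p
    # log1p(-p) computes log(1 - p) accurately for small p.
    return 1.0 / (-math.e * q * math.log1p(-p))


def approx_bound(p):
    """Small-p approximation 1 / (e * p) to the bound above."""
    return 1.0 / (math.e * p)


def weighted_hmp(p_values, weights):
    """Weighted harmonic mean p-value with weights normalized by their sum."""
    total_weight = sum(weights)
    return total_weight / sum(w / p for w, p in zip(weights, p_values))


def averaged_bound(p_values, weights):
    """Weighted average of the bounds 1 / (e * p_i), weights normalized."""
    total_weight = sum(weights)
    return sum(w / (math.e * p) for w, p in zip(weights, p_values)) / total_weight


# Illustrative inputs only, not taken from refs. 1 or 6.
p_values = [0.01, 0.04]
weights = [0.5, 0.5]
lhs = averaged_bound(p_values, weights)               # average of the bounds
rhs = approx_bound(weighted_hmp(p_values, weights))   # bound at the HMP
```

With these illustrative inputs, `lhs` and `rhs` agree (both about 23), as the algebra implies. At $p = 0.05$ the bound is about 7.5 and its approximation about 7.4, so the approximation is already close at conventional significance levels.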
Footnotes
The author declares no conflict of interest.
References
- 1. Wilson DJ. The harmonic mean p-value for combining dependent tests. Proc Natl Acad Sci USA. 2019;116:1195–1200. doi: 10.1073/pnas.1814092116.
- 2. Good IJ. Significance tests in parallel and in series. J Am Stat Assoc. 1958;53:799–813.
- 3. Good IJ. Good Thinking: The Foundations of Probability and Its Applications. Univ of Minnesota Press; Minneapolis: 1983.
- 4. Good IJ. The Bayes/non-Bayes compromise: A brief review. J Am Stat Assoc. 1992;87:597–606.
- 5. Good IJ. Probability and the Weighing of Evidence. Griffin; London: 1950.
- 6. Held L, Ott M. On p-values and Bayes factors. Annu Rev Stat Appl. 2018;5:393–419.
- 7. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley; New York: 2004.
- 8. Sellke T, Bayarri MJ, Berger JO. Calibration of p values for testing precise null hypotheses. Am Stat. 2001;55:62–71.