BMJ. 1999 May 8;318(7193):1288. doi: 10.1136/bmj.318.7193.1288a

Adjusting for multiple testing in studies is less important than other concerns

Thomas V Perneger 1
PMCID: PMC1115668  PMID: 10231278

Editor—In a paper in the American Journal of Public Health Aickin stated that “there is substantial debate ... concerning when (if ever) adjustment for multiple testing is warranted.”1 I am glad that he has joined the debate in the BMJ over Bonferroni adjustments but find his arguments unconvincing.2 Yes, “researchers who adjust P values almost always present them for their individual hypotheses,” as he says. This is precisely why they should not worry about unrelated tests and should renounce a statistical technique that focuses on the largely irrelevant universal null hypothesis.

Statistical tests were developed for repeated testing, such as industrial quality control. The α and β error rates are valid in the long run, as asymptotic averages. Hence multiple testing is not a violation of test theory; it is a prerequisite. Industrialists know that over time they will reject a proportion of good lots (α) and market a proportion of bad lots (β) in error; researchers know that in their career they will reject a proportion of true null hypotheses (α) and miss a proportion of true alternative hypotheses (β). These proportions do not vary with the number of tests.
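This long-run property can be illustrated with a small simulation (a sketch, not part of the letter): repeatedly test true null hypotheses with an ordinary two-sided z test and count rejections at α = 0.05. The proportion of false rejections per test hovers around α whether one performs 10 tests or 1000.

```python
import random
from statistics import NormalDist

random.seed(42)
alpha = 0.05
norm = NormalDist()

def one_test(n=30):
    """Draw a sample under a true null (mean 0, sd 1) and return the
    two-sided z-test p value for H0: mean = 0."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(xs) / n) / (1 / n ** 0.5)  # sample mean / standard error
    return 2 * (1 - norm.cdf(abs(z)))

# The per-test false rejection rate stays near alpha regardless of
# how many tests a "researcher" performs in total.
for n_tests in (10, 100, 1000):
    rejections = sum(one_test() < alpha for _ in range(n_tests))
    print(n_tests, rejections / n_tests)
```

The observed proportions fluctuate (especially with few tests) but converge on 0.05; multiplying the number of tests does not inflate the per-test rate, which is the letter's point.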

Dredging data is a smart and cost effective way of doing research. “Dredge your data and [do] not tell” are Aickin’s words; my advice was to “describe what was done.”3 Besides, Bonferroni adjustments make no distinction between data dredging and multiple planned tests.

Which tests to include in the adjustment is a serious issue, since it will determine whether a result is “significant” or not. Aickin does not seem to have an answer; nor does anyone else. Pushing a line of reasoning to the point of absurdity is a rhetorical device to show that the reasoning does not hold.

The only advantage of Holm adjustments over Bonferroni adjustments is that they inflate β errors less, but the procedure is complex.1 I doubt that Holm-adjusted P values are understandable to anyone but a few statisticians. As for the American Journal of Public Health—its instructions for authors in the December 1998 issue include no statement about multiple test adjustments, and the first three articles contain respectively 70, 110, and 114 tests, in tables only, without any multiple test adjustment.
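For readers unfamiliar with the two procedures being compared, here is a minimal sketch (illustrative p values, not from any study cited here). Bonferroni multiplies every p value by the number of tests m; Holm is a step-down procedure that multiplies the smallest p value by m, the next by m − 1, and so on, enforcing monotonicity — which is why its adjusted values are never larger than Bonferroni's.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment: the k-th smallest p value is multiplied
    by (m - k + 1), with a running maximum to keep the results monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, pvals[i] * (m - rank))
        adjusted[i] = min(1.0, running_max)
    return adjusted

pvals = [0.01, 0.02, 0.04, 0.20]
print(bonferroni(pvals))  # each value multiplied by 4
print(holm(pvals))        # smaller adjusted values than Bonferroni's
```

With these four p values Bonferroni yields 0.04, 0.08, 0.16, 0.8 while Holm yields 0.04, 0.06, 0.08, 0.2 — less inflation of β errors, at the cost of a procedure that is harder to explain, as the letter argues.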

I am baffled that “whether a given study should be statistically analysed at all” should concern anyone. We should worry whether to do the study in the first place and, when it is done, how best to interpret the data. Multiple test adjustments help with neither of these.

References

1. Aickin M, Gensler H. Adjusting for multiple testing when reporting research results: the Bonferroni vs Holm methods. Am J Public Health 1996;86:726-728. doi:10.2105/ajph.86.5.726.
2. Aickin M. Other method for adjustment of multiple testing exists. BMJ 1999;318:127. (9 January.) doi:10.1136/bmj.318.7176.127a.
3. Perneger TV. What’s wrong with Bonferroni adjustments. BMJ 1998;316:1236-1238. doi:10.1136/bmj.316.7139.1236.

