Editor—Recently, Perneger tried to establish that adjustments for multiple testing are unnecessary.1 However, the main arguments against multiplicity adjustments are based on misunderstanding of and a lack of knowledge about simultaneous statistical inference.
Firstly, Perneger equated multiple test adjustments with Bonferroni corrections. The Bonferroni procedure ignores dependencies among the data and is therefore much too conservative if the number of tests is large.2 Hence, we agree with Perneger that the Bonferroni method should not be routinely used. This is, however, no argument against the use of multiplicity adjustments in general, as there are several alternative multiple test procedures which were totally ignored by Perneger.3
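The conservativeness of the Bonferroni procedure under dependency can be illustrated with a small calculation. The sketch below is an illustration only, not taken from the cited references: it assumes all m null hypotheses are true and compares two idealised cases, fully independent tests and tests whose statistics are identical (perfect positive dependence). The function names are hypothetical.

```python
def bonferroni_fwer_independent(alpha: float, m: int) -> float:
    """Probability of at least one false rejection when all m nulls are true,
    the m tests are independent, and each is tested at level alpha / m."""
    return 1.0 - (1.0 - alpha / m) ** m


def bonferroni_fwer_identical(alpha: float, m: int) -> float:
    """Same probability when all m test statistics are identical, so that
    the m tests behave like a single test at level alpha / m."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (2, 10, 100):
        indep = bonferroni_fwer_independent(alpha, m)
        ident = bonferroni_fwer_identical(alpha, m)
        print(f"m={m:4d}  independent: {indep:.4f}  identical: {ident:.4f}")
```

With independent tests the actual error rate stays close to the nominal 0.05, whereas with strongly dependent tests it can fall far below it, which is the sense in which the procedure becomes too conservative.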
Secondly, Perneger argued that multiple test adjustments are concerned only with the global null hypothesis that all individual null hypotheses are true simultaneously. This is not true. The best multiple test procedures control the multiple level (also called experimentwise error rate in the strong sense), which is the probability of rejecting falsely at least one true individual null hypothesis, irrespective of which and how many of the other individual null hypotheses are true. The control of the multiple level is the best protection against wrong conclusions and leads to the strongest statistical inference.3
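The distinction between the global null hypothesis and the multiple level can be written out as follows. The notation is an assumption introduced here for illustration: $H_1,\dots,H_m$ are the individual null hypotheses, $I_0$ is the (unknown) index set of those that are true, and $V$ is the number of true null hypotheses that are rejected.

```latex
% Control under the global null hypothesis only (weak sense):
\[
  P_{I_0 = \{1,\dots,m\}}\,(V \ge 1) \;\le\; \alpha
\]
% Control of the multiple level (strong sense): the bound must hold
% for every possible configuration of true and false null hypotheses.
\[
  P_{I_0}\,(V \ge 1) \;\le\; \alpha
  \qquad \text{for every } I_0 \subseteq \{1,\dots,m\}
\]
```

The second condition contains the first as the special case $I_0 = \{1,\dots,m\}$, so a procedure that controls the multiple level also protects the global null hypothesis, but not the other way round.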
Thirdly, Perneger claimed that a multiple test procedure can only lead to the rejection of the global null hypothesis without possibility of concluding which tests are significant and which are not. In fact, the contrary is true. Multiple test procedures were developed with the aim of concluding which tests are significant and which are not, but with control of the appropriate error rate.
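A minimal sketch of a procedure that yields a decision for each individual hypothesis is given below. Holm's step-down procedure is used as one familiar example; the letter itself does not name a specific alternative, so this choice, the function names, and the p values are assumptions made for illustration.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i if p_i <= alpha / m (single-step Bonferroni)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure: compare the k-th smallest p value
    (k = 0, 1, ...) with alpha / (m - k) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all remaining (larger) p values are retained
    return reject


if __name__ == "__main__":
    p = [0.010, 0.013, 0.020, 0.300]  # hypothetical p values
    print("Bonferroni:", bonferroni_reject(p))
    print("Holm:      ", holm_reject(p))
```

Both functions return one decision per hypothesis, so the result is a statement about which individual tests are significant rather than only about the global null hypothesis.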
Fourthly, Perneger said that Bonferroni adjustments should be made in studies without prespecified hypotheses. As the number of tests in such studies is often large and the Bonferroni procedure has low power, following this rule would mean that many true effects, if not all, would be overlooked. Moreover, in exploratory studies without prespecified hypotheses there is typically no clear structure in the multiple tests, so an appropriate multiple test adjustment is difficult or even impossible. Hence, we prefer that data from exploratory studies be analysed without multiplicity adjustment. However, “significant” results based on exploratory analyses should be clearly labelled as exploratory. To confirm these results, the corresponding hypotheses have to be tested in confirmatory studies.
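The loss of power with a large number of tests can be made concrete with a rough calculation. The sketch below assumes a two-sided z test whose statistic is normally distributed with unit variance, and a hypothetical standardised effect of 2.8, chosen because it gives roughly 80% power at an unadjusted level of 0.05. None of these numbers come from the cited studies.

```python
from statistics import NormalDist


def power_two_sided_z(delta: float, alpha: float) -> float:
    """Power of a two-sided z test at level alpha when the test
    statistic is distributed as N(delta, 1)."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the statistic falls above +crit or below -crit
    return (1.0 - z.cdf(crit - delta)) + z.cdf(-crit - delta)


if __name__ == "__main__":
    delta = 2.8   # hypothetical standardised effect
    alpha = 0.05
    for m in (1, 10, 100, 1000):
        # Bonferroni tests each hypothesis at level alpha / m
        print(f"m={m:5d}  power per test: {power_two_sided_z(delta, alpha / m):.3f}")
```

Under these assumptions the power for a single true effect drops from about 80% without adjustment to well below one half once a hundred tests are adjusted by the Bonferroni method.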
In confirmatory studies with a prespecified goal represented by multiple hypotheses, in which significance tests are used as statistical evaluation tools for final decision making, the use of multiple test procedures is mandatory.4 For this purpose, several multiple test procedures beyond the Bonferroni method have been developed,3–5 and these deserve wider use in biomedical research.
References
- 1. Perneger TV. What’s wrong with Bonferroni adjustments. BMJ. 1998;316:1236–1238. (18 April.) doi: 10.1136/bmj.316.7139.1236.
- 2. Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. BMJ. 1995;310:170. doi: 10.1136/bmj.310.6973.170.
- 3. Bauer P. Multiple testing in clinical trials. Stat Med. 1991;10:871–890. doi: 10.1002/sim.4780100609.
- 4. Sankoh AJ, Huque MF, Dubin N. Some comments on frequently used multiple endpoint adjustment methods in clinical trials. Stat Med. 1997;16:2529–2542. doi: 10.1002/(sici)1097-0258(19971130)16:22<2529::aid-sim692>3.0.co;2-j.
- 5. Westfall PH, Young SS. Resampling-based multiple testing. New York, NY: Wiley; 1993.