Abstract
Multi-arm, parallel-group clinical trials are an efficient way of testing several new treatments, treatment regimens or doses. However, guidance is unclear on whether statistical adjustment for multiple comparisons (to control the type 1 error rate) is required when several arms share a control group. We argue, based on current evidence, that adjustment is not always necessary in such situations. We propose that adjustment should not be a requirement in multi-arm, parallel-group trials testing distinct treatments and sharing a control group, and we call for clearer guidance from stakeholders, such as regulators and scientific journals, on the appropriate settings for adjustment of multiplicity.
Keywords: Multi-arm, parallel-group clinical trials; Multiplicity; Type 1 error; Family-wise error rate (FWER)
Multiplicity is a major consideration in the analysis of clinical trials. It occurs when multiple significance tests are carried out, increasing the family-wise error rate (FWER): the probability of at least one “false positive” statistically significant result (type 1 error) across the family of tests. Multiplicity can arise for various reasons including use of multiple outcomes, repeated measures, interim analyses, multiple sub-groups, factorial designs and, the focus of this viewpoint, multi-arm clinical trials. It is widely accepted that control of multiplicity is essential in many situations, for example, early stopping of a trial based on interim analyses, or where more than one hypothesis is being tested within a family of hypotheses [1,2]. If multiplicity is not handled correctly, unsubstantiated claims for the effectiveness of a drug may be made due to an inflated rate of false positive conclusions, and this is especially important in confirmatory trials for licensed medications [1,3]. Various methods of control have been developed including hierarchical procedures, the Bonferroni method and Dunnett's test [4]. However, if such adjustments are applied unnecessarily, potentially effective treatments may be discarded prematurely [5].
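As a rough illustration of how the FWER grows with the number of tests, the following minimal Python sketch assumes k independent tests, each carried out at level alpha, with every null hypothesis true. The function names are illustrative only, and the independence assumption does not hold exactly for comparisons that share a control group.

```python
def fwer_independent(k, alpha=0.05):
    """Probability of at least one false positive across k independent
    tests, each at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_level(k, alpha=0.05):
    """Per-test significance level under the Bonferroni method, which
    keeps the FWER at or below alpha."""
    return alpha / k


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        unadjusted = fwer_independent(k)
        adjusted = fwer_independent(k, bonferroni_level(k))
        print(f"k={k}: unadjusted FWER={unadjusted:.4f}, "
              f"Bonferroni-adjusted FWER={adjusted:.4f}")
```

Under these assumptions, the unadjusted FWER rises with each additional comparison, while testing each comparison at alpha/k keeps it at or just below the nominal level.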
Multi-arm trial designs are valuable in clinical research. They allow a number of new treatments, or varying treatment regimens/doses, to be tested within a single trial, increasing efficiency and reducing costs associated with conducting several independent trials. A three-arm trial, for instance, requires 25% fewer participants than two independent two-arm trials with the same arm size, because the control group is shared. Some 20% of superiority trials registered in 2010–2012 had more than two groups [6]. However, there appears to be no consensus, across stakeholders such as regulators and scientific journals, on the necessity to control for a potentially inflated type 1 error rate when comparing distinct treatments to a shared control group in confirmatory parallel-group multi-arm trials [7], and this has become the subject of much recent debate among statisticians and trialists [5,8–11]. Many guidelines on multiplicity do not refer specifically to multi-arm trials [1–3], resulting in inconsistencies in the application of adjustment methods in published articles. A 2014 review found that 49% of published multi-arm trials reported using a multiple-testing adjustment, with adjustment more common in trials evaluating multiple doses or regimens of the same treatment (67%), but surprisingly there was little evidence of a difference in adjustment between exploratory and confirmatory trials [10].
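The 25% figure can be checked with simple arithmetic. The sketch below is illustrative only: it assumes equal allocation, the same number of participants in every arm, and no change in arm size to accommodate a multiplicity adjustment. The function names are hypothetical.

```python
def total_sample_size(n_per_arm, n_treatments, shared_control=True):
    """Total participants needed to compare n_treatments with control.

    With a shared control there is one control arm in total; with
    separate two-arm trials each treatment needs its own control arm.
    """
    if shared_control:
        return n_per_arm * (n_treatments + 1)
    return n_per_arm * 2 * n_treatments


def relative_saving(n_treatments):
    """Proportional reduction in total sample size from sharing the control."""
    separate = total_sample_size(1, n_treatments, shared_control=False)
    shared = total_sample_size(1, n_treatments, shared_control=True)
    return 1 - shared / separate


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(f"{k} treatments: saving = {relative_saving(k):.1%}")
```

With two experimental treatments, the shared-control design needs three arms rather than four, which gives the 25% saving; the saving grows as more treatments share the same control.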
There is a general consensus that for multi-arm exploratory trials stringent multiple-testing adjustment is not required [10], as doing so may drop potentially effective treatments too early in the assessment process. Conversely, many authors agree with current guidance from the FDA and EMA that for confirmatory trials where arms represent several doses or regimens of the same treatment, adjustments for multiplicity should be applied [5,12–15]. However, the literature is unclear on the necessity of adjustment in confirmatory trials where the different arms represent distinct treatments which are compared with a shared control [10]. A number of authors argue that adjustment is not always necessary, particularly where the results are not combined into one final conclusion and decision [10,16–18], with Parker and Weir arguing that non-adjustment should be the default starting point in such situations [15]. By contrast, guidance from the New England Journal of Medicine requires adjustment in this scenario, even for exploratory analyses, though no rationale is given for applying the same rules to all types of multiplicity [2].
In many cases where adjustment for multiple testing is required, the tests are correlated: for example, when tests of multiple outcomes in the same trial are correlated. Correlation also arises between the multiple tests in a multi-arm trial when they share a common control group. This correlation has practical implications: it makes Dunnett's test the best way to control for multiplicity, it reduces the impact of multiplicity on the FWER [5,19,20], and it increases the probability of making multiple errors given that at least one error is made [11]. However, the correlation does not inform whether multiplicity should be controlled [5].
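The two effects of the shared control group can be illustrated by simulation. The following Python sketch is a simplified illustration, not the method of any cited paper: it assumes two experimental arms and one control arm of equal size, a normally distributed outcome with known variance, all null hypotheses true, and unadjusted two-sided tests at the 5% level. Under these assumptions the two test statistics have correlation 0.5.

```python
import random
from statistics import NormalDist


def simulate_shared_control(n_sims=200_000, alpha=0.05, seed=1):
    """Simulate two treatment-vs-control z-tests that share one control arm.

    Arm means are standardised so that each is N(0, 1) under the null;
    each z-statistic is then (treatment - control) / sqrt(2).
    Returns (FWER, P(both tests reject | at least one rejects)).
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    at_least_one = 0
    both = 0
    for _ in range(n_sims):
        control = rng.gauss(0, 1)  # shared by both comparisons
        z1 = (rng.gauss(0, 1) - control) / 2 ** 0.5
        z2 = (rng.gauss(0, 1) - control) / 2 ** 0.5
        reject1 = abs(z1) > crit
        reject2 = abs(z2) > crit
        at_least_one += reject1 or reject2
        both += reject1 and reject2
    return at_least_one / n_sims, both / at_least_one


# Reference values for two fully independent tests at the same level.
fwer_indep = 1 - (1 - 0.05) ** 2
p_both_given_any_indep = 0.05 ** 2 / fwer_indep

fwer_shared, p_both_given_any_shared = simulate_shared_control()

if __name__ == "__main__":
    print(f"FWER, independent tests:     {fwer_indep:.4f}")
    print(f"FWER, shared control:        {fwer_shared:.4f}")
    print(f"P(both | any), independent:  {p_both_given_any_indep:.4f}")
    print(f"P(both | any), shared:       {p_both_given_any_shared:.4f}")
```

In this setting the simulated FWER with a shared control is somewhat lower than for two independent tests, while the chance that both comparisons are false positives, given that at least one is, is higher.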
The key issue in determining whether to control for multiplicity is whether multiple tests are conceptually related: that is, how separate are the scientific questions or the claims to be made? If an issue of a journal publishes multiple clinical trials relating to different medical areas, then the overall type 1 error rate is increased, but no-one would suggest controlling for multiplicity. If multiple doses of the same drug are tested, on the other hand, a claim of efficacy of the drug could be made if any one dose shows benefit, so multiplicity should be controlled: this is true whether the doses are tested in the same trial or in separate trials. Similarly, if multiple treatments are investigated for the purpose of a single regulatory submission then this may be a reason to control the FWER [21]. If drugs with different mechanisms of action are evaluated in the same trial, we believe that control for multiplicity is not required, just as if they were evaluated in separate trials. Hence the “family” over which FWER should be controlled is usually a treatment, and the difficult question is whether closely related treatments should be included in the same family: for example, drugs of the same class, or similar multi-drug regimens [21].
An alternative way to account for multiple tests is to control the false discovery rate (FDR), the expected proportion of rejected null hypotheses that are actually true. The FDR is similar to the FWER in studies with small numbers of experimental treatments (e.g. up to three), but is less stringent than the FWER with larger numbers of experimental treatments (e.g. five or more) [22]. The FDR may be controlled using the Benjamini–Hochberg procedure [22,23]. When multiple drugs are successful, the FDR has the advantage that it represents the expected proportion of inefficacious drugs among the successful drugs. Wason and Robertson recommend that sponsors and trialists consider use of the FDR for multi-arm trials testing distinct treatment arms [22], while others suggest the FDR as an appropriate control measure in the context of trials with a large number of treatment arms [24]. However, trials with many treatments may be less likely to have distinct treatments.
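A minimal sketch of the Benjamini–Hochberg step-up procedure is given below, alongside the Bonferroni method for comparison. The p-values are invented purely for illustration and do not come from any trial, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k such that
    p_(k) <= k * q / m, and reject the k hypotheses with the
    smallest p-values. Returns a list of booleans in input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest rank meeting the condition
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Bonferroni method: reject when p <= alpha / m (controls the FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values for five experimental arms versus control.
example_p = [0.030, 0.004, 0.300, 0.011, 0.020]

if __name__ == "__main__":
    print("BH (FDR 5%):      ", benjamini_hochberg(example_p))
    print("Bonferroni (5%):  ", bonferroni(example_p))
```

With these illustrative values, the Benjamini–Hochberg procedure rejects four of the five null hypotheses whereas the Bonferroni method rejects only one, which reflects the point that FDR control is less stringent than FWER control when there are five or more experimental arms.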
We have focussed on the parallel-group multi-arm trial design, but various perspectives on the need for multiplicity adjustments in more complex designs such as basket, umbrella and platform trials are also being debated [14,24–28]. Such new and adaptive designs are increasingly important, especially in the case of COVID-19 where adaptive, cost-effective and rapid trials are critical [24]. However, as discussed by Collignon et al. [25], these complex designs have raised challenging statistical questions around the need for control of multiple testing when adding arms or drugs over time.
In conclusion, increasing trial complexity makes addressing multiplicity more complex but also more important. Clearer guidance for trialists from stakeholders, such as regulators and scientific journals, on the appropriate settings for adjustment of multiplicity is required. We agree with others that the decision to adjust or not should be well justified based on the complexity of the design and the specific setting and objectives of each trial [15,21], and that control of the FDR should be considered for trials testing a large number of treatments [22]. We propose that, for simple parallel-group multi-arm trials of distinct treatments with a shared control, adjustment should not be a requirement. However, further clarity is needed on what constitutes distinct treatments [21,25].
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
IW and AN were supported by the Medical Research Council [grant numbers MC_UU_00004/04 and MC_UU_00004/07]. RH receives funding from the UK Medical Research Council (MRC) and the UK Foreign, Commonwealth and Development Office (FCDO) under the MRC/FCDO Concordat agreement and is also part of the EDCTP2 programme supported by the European Union. Grant Ref: MR/R010161/1.
References
- 1. Food and Drug Administration. Multiple Endpoints in Clinical Trials: Guidance for Industry. 2017. 11 August 2020. Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/multiple-endpoints-clinical-trials-guidance-industry
- 2. Harrington D., D’Agostino R.B., Gatsonis C., Hogan J.W., Hunter D.J., Normand S.-L.T., et al. New guidelines for statistical reporting in the journal. N. Engl. J. Med. 2019;381(3):285–286. doi: 10.1056/NEJMe1906559.
- 3. European Medicines Agency. Guideline on Multiplicity Issues in Clinical Trials. 2017. 11 August 2020. Available from: https://www.ema.europa.eu/en/documents/scientific-guideline/draft-guideline-multiplicity-issues-clinical-trials_en.pdf
- 4. Food and Drug Administration. Adaptive Designs for Clinical Trials of Drugs and Biologics: Guidance for Industry. 2019. 11 August 2020. Available from: https://www.fda.gov/media/78495/download
- 5. Howard D.R., Brown J.M., Todd S., Gregory W.M. Recommendations on multiple testing adjustment in multi-arm trials with a shared control group. Stat. Methods Med. Res. 2018;27(5):1513–1530. doi: 10.1177/0962280216664759.
- 6. Parmar M.K., Carpenter J., Sydes M.R. More multiarm randomised trials of superiority are needed. Lancet. 2014;384(9940):283–284. doi: 10.1016/S0140-6736(14)61122-3.
- 7. Baron G., Perrodeau E., Boutron I., Ravaud P. Reporting of analyses from randomized controlled trials with multiple arms: a systematic review. BMC Med. 2013;11(1):84. doi: 10.1186/1741-7015-11-84.
- 8. Wason J., Magirr D., Law M., Jaki T. Some recommendations for multi-arm multi-stage trials. Stat. Methods Med. Res. 2016;25(2):716–727. doi: 10.1177/0962280212465498.
- 9. Collignon O., Burman C.F., Posch M., Schiel A. Collaborative platform trials to fight COVID-19: methodological and regulatory considerations for a better societal outcome. Clin. Pharmacol. Ther. 2021;110(2):311–320. doi: 10.1002/cpt.2183.
- 10. Wason J.M.S., Stecher L., Mander A.P. Correcting for multiple-testing in multi-arm trials: is it necessary and is it done? Trials. 2014;15(1):364. doi: 10.1186/1745-6215-15-364.
- 11. Proschan M.A., Follmann D.A. Multiple comparisons with control in a single experiment versus separate experiments: why do we feel differently? Am. Stat. 1995;49(2):144–149.
- 12. Bender R., Lange S. Adjusting for multiple testing—when and how? J. Clin. Epidemiol. 2001;54(4):343–349. doi: 10.1016/s0895-4356(00)00314-0.
- 13. Wason J.M.S., Jaki T., Stallard N. Planning multi-arm screening studies within the context of a drug development program. Stat. Med. 2013;32(20):3424–3435. doi: 10.1002/sim.5787.
- 14. Stallard N., Todd S., Parashar D., Kimani P.K., Renfro L.A. On the need to adjust for multiplicity in confirmatory clinical trials with master protocols. Ann. Oncol. 2019;30(4):506–509. doi: 10.1093/annonc/mdz038.
- 15. Parker R.A., Weir C.J. Non-adjustment for multiple testing in multi-arm trials of distinct treatments: rationale and justification. Clin. Trials. 2020;17(5):562–566. doi: 10.1177/1740774520941419.
- 16. Rothman K.J. No adjustments are needed for multiple comparisons. Epidemiol. (Cambridge, Mass). 1990;1(1):43–46.
- 17. Freidlin B., Korn E.L., Gray R., Martin A. Multi-arm clinical trials of new agents: some design considerations. Clin. Cancer Res. 2008;14(14):4368–4371. doi: 10.1158/1078-0432.CCR-08-0325.
- 18. Proschan M.A., Waclawiw M.A. Practical guidelines for multiplicity adjustment in clinical trials. Control. Clin. Trials. 2000;21(6):527–539. doi: 10.1016/s0197-2456(00)00106-9.
- 19. Fernandes N., Stone A. Multiplicity adjustments in trials with two correlated comparisons of interest. Stat. Methods Med. Res. 2011;20(6):579–594. doi: 10.1177/0962280210378943.
- 20. Dunnett C.W. New tables for multiple comparisons with a control. Biometrics. 1964;20(3):482–491.
- 21. Bretz F., Koenig F. Commentary on Parker and Weir. Clin. Trials. 2020;17(5):567–569. doi: 10.1177/1740774520941420.
- 22. Wason J.M.S., Robertson D.S. Controlling type I error rates in multi-arm clinical trials: a case for the false discovery rate. Pharm. Stat. 2021;20(1):109–116. doi: 10.1002/pst.2059.
- 23. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 1995;57(1):289–300.
- 24. Collignon O., Burman C.-F., Posch M., Schiel A. Collaborative platform trials to fight COVID-19: methodological and regulatory considerations for a better societal outcome. Clin. Pharmacol. Ther. 2021;110(2):311–320. doi: 10.1002/cpt.2183.
- 25. Collignon O., Gartner C., Haidich A.-B., James Hemmings R., Hofner B., Pétavy F., et al. Current statistical considerations and regulatory perspectives on the planning of confirmatory basket, umbrella, and platform trials. Clin. Pharmacol. Ther. 2020;107(5):1059–1067. doi: 10.1002/cpt.1804.
- 26. Berry S.M. Potential statistical issues between designers and regulators in confirmatory basket, umbrella, and platform trials. Clin. Pharmacol. Ther. 2020;108(3):444–446. doi: 10.1002/cpt.1908.
- 27. Lu C.C., Li X.N., Broglio K., Bycott P., Jiang Q., Li X., et al. Practical considerations and recommendations for master protocol framework: basket, umbrella and platform trials. Ther. Innov. Regul. Sci. 2021;55(6):1145–1154. doi: 10.1007/s43441-021-00315-7.
- 28. Bai X., Deng Q., Liu D. Multiplicity issues for platform trials with a shared control arm. J. Biopharm. Stat. 2020;30(6):1077–1090. doi: 10.1080/10543406.2020.1821703.
