The trial of the Parkinson’s Disease Research Group reported by Ben-Shlomo et al in this issue (p 1191), which updates the results of a previously curtailed randomised controlled trial,1 raises several methodological issues. The current results relate to an initial three arm trial in which 782 patients with early stage Parkinson’s disease were randomised to treatment with either levodopa alone (arm 1), levodopa and selegiline in combination (arm 2), or bromocriptine (arm 3). The first trial report, based on follow up to December 1991, showed no significant differences at the 5% level between arms 1 and 2 in disability levels, but both arms showed significant improvements over baseline.2 At this stage there were too few deaths to assess differences in mortality. The second report was based on follow up to December 1993, giving an average follow up of 5.6 years.3 As in the earlier report, there were no significant differences between arms 1 and 2 in disability levels. However, based on 44 deaths in 249 patients in arm 1 and 76 deaths in 271 patients in arm 2, a significant difference in all cause mortality was observed, yielding a hazard ratio of 1.57 (95% confidence interval 1.09 to 2.30) and a P value of 0.015. At this point the trial was terminated, and patients in arm 2 were advised to switch to levodopa alone, but follow up continued. The current analysis, based on more complete follow up to September 1995, when the trial was terminated, shows that on the basis of 73 deaths in arm 1 and 103 in arm 2, the difference in all cause mortality is no longer significant at the 5% level, yielding a hazard ratio of 1.32 (0.98 to 1.79) on an intention to treat analysis.
These results raise several questions, paramount being whether the trial should have been stopped when it was. The essential question is whether the eventual result could have shown a clinical benefit in favour of combination therapy if the trial had been continued. The original sample size calculation assumed a 30% reduction in all cause mortality in favour of combination therapy at 10 years. Such a reduction equates to a hazard ratio of 0.74. The figure of 30% was based on an earlier uncontrolled, retrospective survey,4 and it could be argued that such a minimum clinically worthwhile difference is over-optimistic.
Several approaches for estimating the minimum clinically worthwhile difference exist, including the use of elicitation techniques.5,6 These techniques elicit the beliefs and demands of clinicians, both those taking part in the trial and those not, about a possible treatment effect, and they can serve two functions. The first is to ensure that the trial is ethical in terms of equipoise: that is, given trial participants’ beliefs and demands, genuine uncertainty exists about the optimal treatment.7 The second is that the demands may be used to design and monitor the trial, as they represent the treatment difference required for clinical practice to change. If such an exercise had been conducted before the Parkinson’s disease trial, would the average reduction in mortality demanded have been as high as 30%?
Obviously such an exercise cannot be conducted retrospectively, but it may be instructive to explore the implications of a demand of less than 30%. When the trial was stopped the lower 95% confidence limit was 1.09, considerably greater than 0.74, so it was unlikely that continuing the trial would have produced a 95% confidence interval that contained 0.74. However, had a more modest 10% reduction been used, the corresponding hazard ratio would have been 0.90, and the argument for stopping would not have been quite so persuasive.
Though such an exercise can be enlightening, it does not provide a quantitative summary of the plausibility of observing a clinically worthwhile difference if the trial continues. The Parkinson’s disease trial did not use a formal stopping guideline to adjust the significance levels for the fact that interim analyses were being performed. Various classical methods have been advocated for such adjustments.8 For example, the trial anticipated 10 annual interim analyses, so a simple adjustment would have been to use 0.01 rather than 0.05 at each analysis so that the overall significance level did not exceed 0.05.
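The effect of repeated looks on the overall significance level can be illustrated by simulation. The following is a minimal sketch, not the trial's own procedure: it assumes 10 equally spaced interim analyses with equal information accruing between them, a normally distributed test statistic, and no true treatment difference. The function name and its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def overall_type1_error(nominal_alpha, n_looks=10, n_sims=20000, seed=1):
    """Monte Carlo estimate of the chance that at least one of n_looks
    interim analyses is 'significant' when there is no true difference.

    Assumption: each look adds an independent, equally informative
    standard normal increment to the cumulative test statistic.
    """
    rng = random.Random(seed)
    # Two sided critical value applied at every look.
    z_crit = NormalDist().inv_cdf(1 - nominal_alpha / 2)
    stopped = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / look ** 0.5  # standardised statistic at this look
            if abs(z) > z_crit:
                stopped += 1
                break  # trial would have been stopped at this look
    return stopped / n_sims


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(alpha, overall_type1_error(alpha))
```

Under these assumptions, testing at 0.05 on every occasion gives an overall false positive rate several times higher than 5%, whereas a nominal level of 0.01 keeps it at or just below 5%.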
An alternative strategy for assessing the level of evidence at each interim analysis would have been to adopt a Bayesian approach.9 This approach enables the accumulating information obtained at successive interim analyses to be summarised and probability statements to be made about future results. Information can be summarised by a credibility interval, within which a quantity of interest lies with a specified probability.10 The Parkinson’s disease trial was designed to follow up patients over 10 years, during which about 260 deaths could be assumed to occur. Thus at the interim analysis to December 1993,3 when 120 deaths had been observed, it could be predicted that the 95% credibility interval at the end of the trial would be 0.95 to 2.58. While this interval contains neither 0.74 nor 0.90, it does contain unity: that is, it suggests no treatment difference. Analogously, if a similar analysis were performed on the more complete follow up to September 1995, the corresponding 95% credibility interval would be 0.78 to 2.25, indicating a wide range of plausible outcomes.
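The text does not state how these predictive intervals were calculated. The sketch below shows one possible reading that gives figures close to those quoted; every modelling choice in it is an assumption. It takes the log hazard ratio to be normally distributed, recovers its standard error from the reported 95% confidence interval, uses a non-informative prior, approximates the variance of a log hazard ratio based on n deaths by 4/n, and predicts the hazard ratio that would be estimated from the deaths still to occur out of the 260 anticipated. The function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower, upper):
    """Standard error of the log hazard ratio implied by a 95% interval."""
    return (log(upper) - log(lower)) / (2 * Z)


def predictive_interval(hr, ci_lower, ci_upper, deaths_so_far, deaths_planned=260):
    """95% predictive interval for the hazard ratio estimated from the
    remaining deaths, given the interim estimate (assumed flat prior)."""
    remaining = deaths_planned - deaths_so_far
    # Uncertainty in the current estimate plus sampling variance of the
    # future estimate (approximated as 4 / number of future deaths).
    variance = se_from_ci(ci_lower, ci_upper) ** 2 + 4 / remaining
    half_width = Z * sqrt(variance)
    return exp(log(hr) - half_width), exp(log(hr) + half_width)


# Interim analysis to December 1993: 44 + 76 = 120 deaths.
dec_1993 = predictive_interval(1.57, 1.09, 2.30, deaths_so_far=120)
# Follow up to September 1995: 73 + 103 = 176 deaths.
sep_1995 = predictive_interval(1.32, 0.98, 1.79, deaths_so_far=176)

print("December 1993: %.2f to %.2f" % dec_1993)
print("September 1995: %.2f to %.2f" % sep_1995)
```

Under these assumptions the two intervals come out at roughly 0.95 to 2.59 and 0.78 to 2.23, close to but not identical with the figures quoted, so the sketch should be read as an approximation rather than a reconstruction of the original calculation.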
Though only statistical aspects of the monitoring of the UK Parkinson’s disease trial have been considered here, any decision to stop a trial is a complex one. The decision should not rely solely on statistical arguments, Bayesian or otherwise, but must be placed within a wider context, for example, by taking into account the balance between individual and collective ethics. This should be done by an independent data monitoring committee, which can assess all the available evidence relating to a trial, both internal and external.6,8
Papers p 1191
References
- 1. Ben-Shlomo Y, Churchyard A, Head J, Hurwitz B, Overstall P, Ockelford J, et al. Investigation by Parkinson’s Disease Research Group of the United Kingdom into excess mortality seen with combined levodopa and selegiline in patients with early, mild Parkinson’s disease. BMJ. 1998;316:1191–1196. doi: 10.1136/bmj.316.7139.1191.
- 2. Parkinson’s Disease Research Group in the United Kingdom. Comparisons of therapeutic effects of levodopa, levodopa and selegiline, and bromocriptine in patients with early, mild Parkinson’s disease: three year interim report. BMJ. 1993;307:469–472. doi: 10.1136/bmj.307.6902.469.
- 3. Lees AJ, on behalf of the Parkinson’s Disease Research Group of the United Kingdom. Comparison of therapeutic effects and mortality data of levodopa and levodopa combined with selegiline in patients with early, mild Parkinson’s disease. BMJ. 1995;311:1602–1607.
- 4. Birkmayer W, Knoll J, Riederer P, Youdim MBH, Hars V, Marton J. Increased life expectancy resulting from addition of l-deprenyl to Madopar treatment in Parkinson’s disease: a long term study. J Neural Transm. 1985;64:113–127. doi: 10.1007/BF01245973.
- 5. Freedman LS, Spiegelhalter DJ. The assessment of subjective opinion and its use in relation to stopping rules for clinical trials. Statistician. 1983;32:153–160.
- 6. Abrams KR, Jones DR. Bayesian interim analysis of randomised trials. BMJ. 1997;314:1911–1912. doi: 10.1016/S0140-6736(05)63909-8.
- 7. Freedman B. Equipoise and the ethics of clinical research. N Engl J Med. 1987;317:141–145. doi: 10.1056/NEJM198707163170304.
- 8. Pocock SJ. When to stop a trial. BMJ. 1992;305:235–240. doi: 10.1136/bmj.305.6847.235.
- 9. Fayers PM, Ashby D, Parmar MKB. Tutorial in biostatistics: Bayesian data monitoring in clinical trials. Stat Med. 1997;16:1413–1430. doi: 10.1002/(sici)1097-0258(19970630)16:12<1413::aid-sim578>3.0.co;2-u.
- 10. Lilford RJ, Braunholtz D. The statistical basis of public policy: a paradigm shift is overdue. BMJ. 1996;313:603–607. doi: 10.1136/bmj.313.7057.603.