Author manuscript; available in PMC: 2018 Apr 1.
Published in final edited form as: Ann Oncol. 2017 Oct 1;28(10):2327–2330. doi: 10.1093/annonc/mdx410

‘Thursday’s child has far to go’—interpreting subgroups and the STAMPEDE trial

M R Spears 1, N D James 2, M R Sydes 1,*
PMCID: PMC5777583  EMSID: EMS75685  PMID: 28961849

STAMPEDE is a multi-arm multi-stage randomised controlled trial protocol, recruiting men with locally advanced or metastatic prostate cancer who are commencing long-term androgen deprivation therapy. Having opened to recruitment with five research questions in 2005 and added a further five questions over the past 6 years, it has reported survival data on 6 of these 10 RCT questions over the past 2 years [1–3]. Some of these results have been of practice-changing magnitude [4, 5], but, in conversation, we have noticed some misinterpretation, both over-interpretation and under-interpretation, of subgroup analyses by the wider clinical community, which could impact negatively on practice. We suspect, therefore, that such problems in interpretation may be common. Our intention here is to comment on the interpretation of subgroup analyses in general, using examples from STAMPEDE. Specifically, we would like to highlight some possible implications of misinterpreting subgroups and how these might be avoided, particularly where the misinterpretation contravenes the very design of the research question. In this, we hope to contribute to the conversation on subgroup analyses [6–11].

For each comparison in STAMPEDE, or indeed virtually any trial, the interest is the effect of the research treatment under investigation on the primary outcome measure across the whole population. Upon reporting the primary outcome measure, the consistency of effect across pre-specified subgroups, including stratification factors at randomisation, is presented; these are planned analyses.

The forest plot is a valuable tool in displaying and assessing treatment effects in subgroups. Forest plots, first used in 1978, were initially used for illustration (in meta-analyses) of the treatment effect within studies, offering a sense of consistency of effect across studies [12, 13]. The lack of consistency, or heterogeneity, is formally calculated and taken into consideration when interpreting the pooled estimate. Forest plots traditionally offer two distinct vertical lines for reference: no effect [e.g. hazard ratio (HR) = 1.00] and the estimated, overall treatment effect. The line of no effect is an important consideration in assessing the overall treatment effect: the confidence interval for any one of the trials may cross this line if it is underpowered or it observed no effect on the outcome of interest, and, for each well-powered study on the forest plot, the confidence intervals should be fairly narrow. This line of no effect also helps in the interpretation of the overall, pooled effect. The second line presents the pooled effect and helps readers to consider how each individual study looks compared with this overall effect.

Illustration of treatment effect across subgroups within one trial is a more recent use of forest plots, and in this scenario the emphasis must undoubtedly be placed on the consistency of treatment effect, not the individual effect within each subgroup. Unlike the individual trials in a meta-analysis, each subgroup within one trial will usually not be well powered at the time of reporting the overall effect, and the confidence intervals will typically be wide as a result. Therefore, any assessment based solely on whether the confidence interval for the treatment effect within a subgroup crosses the line of no effect is unhelpful and potentially harmful. This is particularly the case where an interaction has been tested for and there is no evidence of any difference in treatment effect across the subgroups.
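The distinction between "does this subgroup's interval cross 1?" and "do the subgroups differ from each other?" can be made concrete with a minimal sketch. It assumes two independent subgroups, a normal approximation on the log hazard ratio scale, and a simple z-test for the difference between the two estimates; this is a stand-in for illustration and not necessarily the interaction test used in STAMPEDE. The subgroup values are hypothetical and are not trial data.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z_95)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def interaction_test(sub_a, sub_b):
    """z-test for a difference between two independent subgroup log HRs."""
    b_a, se_a = log_hr_and_se(*sub_a)
    b_b, se_b = log_hr_and_se(*sub_b)
    z = (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, two_sided_p(z)


# Hypothetical (HR, lower, upper) values, NOT taken from STAMPEDE.
small_subgroup = (0.80, 0.50, 1.28)  # few events: wide interval, crosses 1
large_subgroup = (0.65, 0.52, 0.81)  # many events: narrow interval, excludes 1

z, p_interaction = interaction_test(small_subgroup, large_subgroup)
print(f"z = {z:.2f}, interaction p = {p_interaction:.2f}")
```

In this hypothetical case one interval crosses the line of no effect and the other does not, yet the test gives no good evidence that the two effects differ. The wider interval reflects less information, not a different treatment effect.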

From the STAMPEDE perspective, we consider this to be an important distinction, specifically in relation to the translation into practice of trial results found to show overwhelming benefit in the entire, eligible trial population. This has been most apparent in relation to the subgroup of metastatic status at randomisation. Patients with metastatic disease (M1) at randomisation have substantially poorer prognosis and therefore, sadly, contribute the higher proportion of events sooner; conversely, patients with non-metastatic disease (M0) at randomisation live longer and, as such, subgroup analyses of survival in M0 patients at the time of first reporting can always be expected to appear relatively immature. For reported survival results relating to both the ‘docetaxel comparison’ and the ‘abiraterone comparison’, we observed no evidence of inconsistency in the treatment effect across the two subgroups of metastatic status at randomisation, with a compelling overall effect and the HR in each group pulling in the direction favouring the research treatment over the standard-of-care; there was no evidence of a lack of consistency by metastatic status for either additional treatment in survival or failure-free survival [1, 3].

The focus of commenters on both the ‘docetaxel comparison’ and the ‘abiraterone comparison’ from STAMPEDE has most often been on the beneficial effect in the M1 subgroup, despite the underlying design and results being positive for the broader population.

Figure 1 shows three example subgroup analyses from the ‘abiraterone comparison’ in STAMPEDE on overall survival, one already published [3] and two deliberately trawled for. The first section of the forest plot shows the subgroup analysis by metastatic status at randomisation. The interaction P-value of 0.37 shows no good evidence of heterogeneity of treatment effect across these subgroups. There is more evidence at this time in the M1 setting so the confidence intervals are narrower than for M0 but one should take the overall effect from the trial; the point estimate in the M0 setting (HR = 0.75) is exactly that targeted by the protocol. However, many in the clinical community, and some commissioners in the case of docetaxel, have only focused on the data from the M1 subgroup.

Figure 1. Forest plot from STAMPEDE data showing the effect on survival of adding abiraterone to SOC, within subgroups.

The second part of Figure 1 shows the impact of abiraterone on survival by day of birth. The confidence intervals are broad because the patients are split, fairly evenly, into 7 groups. As with metastatic status, there is no good evidence of heterogeneity (P = 0.33). There is no reasonable clinical hypothesis to underpin a different outcome by day of birth. Therefore, the variation in point estimates by weekday of birth must be due to chance. Yet, some of the confidence intervals include the null line and some do not. By reductio ad absurdum, people who are uncertain about the impact of abiraterone in M0 patients based on the first part of the graph should also be uncertain about the addition of abiraterone in patients who were born during the latter part of the week (Thursday to Sunday); after all, their confidence intervals also all include the null line, e.g. Thursday’s HR = 0.69 with 95% CI 0.42–1.15.
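As a rough worked example, the Thursday estimate quoted above can be converted back to a standard error and a nominal p-value. The sketch assumes a normal approximation on the log hazard ratio scale. The last step additionally assumes that information is split evenly across the 7 weekday groups, which is a simplification for illustration only and not a result reported by the trial.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def summarise(hr, lower, upper):
    """Return log(HR), its SE, the z statistic and two-sided p from a 95% CI."""
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_hr, se, z, p


# Thursday's subgroup as quoted in the text: HR = 0.69, 95% CI 0.42-1.15.
log_hr, se_thursday, z_thursday, p_thursday = summarise(0.69, 0.42, 1.15)

# Assumption: the standard error scales with 1/sqrt(information), so a group
# holding one seventh of the information has an SE about sqrt(7) times larger
# than the pooled estimate would have.
n_groups = 7
inflation = math.sqrt(n_groups)
se_if_pooled = se_thursday / inflation

print(f"Thursday: SE = {se_thursday:.3f}, z = {z_thursday:.2f}, p = {p_thursday:.2f}")
print(f"SE inflation from a 7-way split: about {inflation:.2f}x")
```

A point estimate of 0.69 is a sizeable apparent benefit; the interval includes 1 only because the standard error in a one-in-seven slice of the trial is large.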

However, we should also be cautious not to over-interpret subgroups. The final part of Figure 1 is based on weekday of diagnosis, where the point estimate of abiraterone is less favourable for people diagnosed on a Monday than those on other days with striking evidence of heterogeneity (P = 0.021). There is no reasonable clinical rationale in a multi-centre multinational clinical trial for this, and even with apparent statistical evidence, this must be a chance finding (note: it is a chance finding—we trawled through implausible subgroups before deliberately selecting this one).
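The ease of finding such a result by trawling can be sketched with a short calculation. It assumes that each implausible subgrouping yields an independent interaction test whose p-value is uniformly distributed when there is truly no interaction. The number of subgroupings examined, `K`, is a hypothetical value chosen for illustration; the text does not state how many were examined.

```python
import random


def prob_any_false_positive(k, alpha=0.05):
    """Chance that at least one of k independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_trawl(k, alpha=0.05, n_trawls=20000, seed=1):
    """Simulate trawling k null subgroupings and keeping the smallest p-value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trawls):
        smallest_p = min(rng.random() for _ in range(k))
        if smallest_p < alpha:
            hits += 1
    return hits / n_trawls


K = 10  # hypothetical number of implausible subgroupings examined
exact = prob_any_false_positive(K)
simulated = simulate_trawl(K)
print(f"P(at least one p < 0.05 among {K} null tests): "
      f"exact {exact:.3f}, simulated {simulated:.3f}")
```

Under these assumptions, roughly 4 in 10 such trawls would turn up at least one nominally significant interaction, even though no true interaction exists.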

Estimation of treatment effect within specific subgroups can of course be desirable, and stratification is the crucial step in moving forward towards ‘personalised’ medicine.

In summary, it is important to consider the reasons behind low event rates and whether these are independent of a treatment effect. For this example, the M0 subgroup is prognostically favourable and does better regardless of adding in the research treatment. However, this does not equate to there being no treatment effect; the evidence here is consistent across both populations. There are clearly circumstances in which over-interpretation of subgroups can be detrimental and access to treatments is arguably one of these. Moving forward we would provide the following recommendations when interpreting trial results:

  • Focus should be placed on the design of the trial; what is the primary hypothesis being tested?

  • When considering any difference in treatment effect across subgroups the primary assessment should be relative to the direction of overall effect and whether the subgroup effect contests this, i.e. is there consistency across subgroups?

  • Context can help: consider whether the treatment effect is consistent across other trial end points.

  • Where there is inconsistency the clinical plausibility of this should be clearly considered.

Context is particularly helpful for the ‘abiraterone comparison’ in STAMPEDE. The impact on failure-free survival (FFS), the intermediate primary outcome measure for the trial, is very positive for adding abiraterone across the board and positive in each of the subgroups by baseline metastatic status (Figure 2). There is some evidence of heterogeneity of the treatment effect in these groups (P = 0.085), with the effect more favourable in the M0 setting than in the M1 setting. Commentators who take the view that there is an impact on survival in M1 disease but not in M0 disease must also conclude that there is a larger impact on FFS in the M0 setting.

Figure 2. Scatter plot of the treatment effect of abiraterone on survival and failure-free survival by baseline metastatic status.

In conclusion, readers must protect themselves equally from over-interpretation and under-interpretation of subgroup effects. The onus for interpreting the results of subgroup analyses lies equally with the journal and reviewers, who see such details before publication and have a role to play in shaping the message of the publication.

Funding

Cancer Research UK (20942); Medical Research Council (MC_UU_12023/25).

Disclosure

MRSp reports grants and non-financial support from Janssen, during the conduct of the study; grants and non-financial support from Astellas, grants and non-financial support from Clovis Oncology, grants and non-financial support from Novartis, grants and non-financial support from Pfizer, grants and non-financial support from Sanofi-Aventis, outside the submitted work.

NDJ reports grants, personal fees and non-financial support from Janssen, during the conduct of the study; grants, personal fees and non-financial support from Astellas, grants and non-financial support from Clovis Oncology, grants, personal fees and non-financial support from Novartis, grants, personal fees and non-financial support from Pfizer, grants, personal fees and non-financial support from Sanofi-Aventis, outside the submitted work.

MRSy reports grants and non-financial support from Janssen, during the conduct of this comparison within STAMPEDE; grants and non-financial support from Astellas, grants and non-financial support from Clovis Oncology, grants and non-financial support from Novartis, grants and non-financial support from Pfizer, grants and non-financial support from Sanofi-Aventis, outside this comparison but within STAMPEDE.

References

  • 1. James ND, Sydes MR, Clarke NW, et al. Addition of docetaxel, zoledronic acid, or both to first-line long-term hormone therapy in prostate cancer (STAMPEDE): survival results from an adaptive, multiarm, multistage, platform randomised controlled trial. Lancet. 2016;387:1163–1177. doi: 10.1016/S0140-6736(15)01037-5.
  • 2. Mason MD, Clarke NW, James ND, et al. Adding celecoxib with or without zoledronic acid for hormone-naïve prostate cancer: long-term survival results from an adaptive, multiarm, multistage, platform, randomized controlled trial. J Clin Oncol. 2017;35:1530–1541. doi: 10.1200/JCO.2016.69.0677.
  • 3. James ND, de Bono JS, Spears MR, et al. Abiraterone for prostate cancer not previously treated with hormone therapy. N Engl J Med. 2017;377:338–351. doi: 10.1056/NEJMoa1702900.
  • 4. NHS England. Clinical Commissioning Policy Statement: Docetaxel in combination with androgen deprivation therapy for the treatment of hormone naïve metastatic prostate cancer. NHS England; 2016. [7 July 2017, date last accessed]. 3019.
  • 5. NICE. Hormone-sensitive metastatic prostate cancer: docetaxel. NICE; 2016. [7 July 2017, date last accessed].
  • 6. Fletcher J. Subgroup analyses: how to avoid being misled. Br Med J. 2007;335:96. doi: 10.1136/bmj.39265.596262.AD.
  • 7. Naggara O, Raymond J, Guilbert F, Altman DG. The problem of subgroup analyses: an example from a trial on ruptured intracranial aneurysms. Am J Neuroradiol. 2011;32:633–636. doi: 10.3174/ajnr.A2442.
  • 8. Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomised clinical trials. JAMA. 1991;266:93–98.
  • 9. Pocock SJ, Hughes MD, Lee RJ. Statistical problems in the reporting of clinical trials: a survey of three medical journals. N Engl J Med. 1987;317:426–432. doi: 10.1056/NEJM198708133170706.
  • 10. Oxman AD, Guyatt GH. A consumer’s guide to subgroup analyses. Ann Intern Med. 1992;116:78–84. doi: 10.7326/0003-4819-116-1-78.
  • 11. Schulz KF, Grimes DA. Multiplicity in randomised trials. II. Subgroup and interim analyses. Lancet. 2005;365:1657–1661. doi: 10.1016/S0140-6736(05)66516-6.
  • 12. Freiman JA, Chalmers TC, Smith H, Kuebler RR. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. N Engl J Med. 1978;299:690–694. doi: 10.1056/NEJM197809282991304.
  • 13. Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. Br Med J. 2001;322:1479–1480. doi: 10.1136/bmj.322.7300.1479.
