Key Points
Question
Among phase 3 oncology trials, how often do subgroup analyses support claims of differential treatment effects?
Findings
In this cross-sectional study of 379 published phase 3 randomized clinical trials with subgroup analyses, which enrolled 331 653 participants, most claims for differential treatment effects were rated as low or very low credibility according to the Instrument for Assessing the Credibility of Effect Modification Analyses.
Meaning
In this study, the differential treatment effect claims of most phase 3 randomized clinical trials in oncology were not well-supported.
This cross-sectional study uses the Instrument for Assessing the Credibility of Effect Modification Analyses to evaluate the credibility of differential treatment effect claims in phase 3 randomized clinical oncology trials published prior to 2021.
Abstract
Importance
Subgroup analyses are often performed in oncology to investigate differential treatment effects and may even constitute the basis for regulatory approvals. Current understanding of the features, results, and quality of subgroup analyses is limited.
Objective
To evaluate forest plot interpretability and credibility of differential treatment effect claims among oncology trials.
Design, Setting, and Participants
This cross-sectional study included randomized phase 3 clinical oncology trials published prior to 2021. Trials were screened from ClinicalTrials.gov.
Main Outcomes and Measures
Missing visual elements in forest plots were defined as a missing main effect point estimate or use of a linear x-axis scale for hazard and odds ratios. Multiplicity of testing control was recorded. Differential treatment effect claims were rated using the Instrument for Assessing the Credibility of Effect Modification Analyses. Linear and logistic regressions evaluated associations with outcomes.
Results
Among 785 trials, 379 studies (48%) enrolling 331 653 patients reported a subgroup analysis. The forest plots of 43% of trials (156 of 363) were missing visual elements impeding interpretability. While 4148 subgroup effects were evaluated, only 1 trial (0.3%) controlled for multiple testing. On average, trials that did not meet the primary end point conducted 2 more subgroup effect tests compared with trials meeting the primary end point (95% CI, 0.59-3.43 tests; P = .006). A total of 101 differential treatment effects were claimed across 15% of trials (55 of 379). Interaction testing was missing in 53% of trials (29 of 55) claiming differential treatment effects. Trials not meeting the primary end point were associated with greater odds of no interaction testing (odds ratio, 4.47; 95% CI, 1.42-15.55; P = .01). The credibility of differential treatment effect claims was rated as low or very low in 93% of cases (94 of 101).
Conclusions and Relevance
In this cross-sectional study of phase 3 oncology trials, nearly half of trials presented a subgroup analysis in their primary publication. However, forest plots of these subgroup analyses largely lacked essential features for interpretation, and most differential treatment effect claims were not supported. Oncology subgroup analyses should be interpreted with caution, and improvements to the quality of subgroup analyses are needed.
Introduction
Phase 3 randomized clinical trials in oncology primarily compare the overall outcomes of 2 or more groups of patients.1 Acknowledging heterogeneity in enrolled patient characteristics, phase 3 trials often include additional comparisons to evaluate for differential treatment effects (DTEs) within patient subgroups.2 The results of such subgroup analyses suggesting DTEs have been used as the basis for regulatory approvals, even when the subgroup findings are in conflict with the overall trial results.3
However, despite their popularity and potential impact, subgroup analyses may be beset by multiple challenges.4,5,6,7 Phase 3 trials are usually not powered for subgroup analyses, which increases the risk of type II error; furthermore, repeated inferential testing examining multiple effects increases the risk of type I error.8,9 Forest plots presenting subgroup analyses at American Society of Clinical Oncology (ASCO) annual meetings appear to be uninformative and inconclusive in most cases.10 When DTEs are suspected based on visual inspection of the forest plot, interaction testing is recommended for further evaluation.11,12,13,14,15 However, interaction testing is often missing from subgroup analyses, and the alternative focus on P values within each subgroup is less informative and further inflates the risk of type I error.6,16 A landmark systematic review17 of randomized clinical trials published in 2007 found that most claims of DTEs were not well-supported, highlighting the importance, problems, and consequences of DTE claims in subgroup analyses.
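To illustrate why an interaction test is more informative than separate within-subgroup P values, the Python sketch below compares two subgroup hazard ratios with a z test on the difference of their log HRs, recovering each standard error from a reported 95% CI. This is a minimal sketch under stated assumptions: the subgroup HRs and CIs are hypothetical, the two subgroups are assumed independent, the log HR is assumed approximately normal, and this is not necessarily the interaction test used by any trial in this study.

```python
import math

Z_975 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_se_from_ci(lower: float, upper: float) -> float:
    """Approximate SE of log(HR) from a 95% CI (assumes symmetry on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def subgroup_p(hr: float, lower: float, upper: float) -> float:
    """Two-sided Wald P value for HR = 1 within a single subgroup."""
    z = math.log(hr) / log_se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


def interaction_test(hr_a, ci_a, hr_b, ci_b):
    """Ratio of HRs and two-sided P value for the difference between 2 independent subgroups."""
    diff = math.log(hr_a) - math.log(hr_b)
    se = math.hypot(log_se_from_ci(*ci_a), log_se_from_ci(*ci_b))
    z = diff / se
    return math.exp(diff), math.erfc(abs(z) / math.sqrt(2))


# Hypothetical subgroups: A looks "significant" on its own, B does not
hr_a, ci_a = 0.70, (0.55, 0.89)
hr_b, ci_b = 0.85, (0.62, 1.17)

print(f"Subgroup A P = {subgroup_p(hr_a, *ci_a):.3f}")
print(f"Subgroup B P = {subgroup_p(hr_b, *ci_b):.3f}")
ratio, p_int = interaction_test(hr_a, ci_a, hr_b, ci_b)
print(f"Ratio of HRs = {ratio:.2f}, interaction P = {p_int:.2f}")
```

In this hypothetical case subgroup A has P < .05 and subgroup B does not, yet the interaction P is well above .05, so the contrast in subgroup P values alone would not support a DTE claim.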
While multiple consensus statements offer guidelines on mitigating the challenges of subgroup analysis, the quality of modern published subgroup analyses is poorly understood.13,18,19,20,21 Because of the particular importance placed on subgroup analyses in oncology for regulatory approvals, a large-scale empirical evaluation of the published contemporary phase 3 oncology literature was undertaken to provide a current understanding of DTE claim credibility in oncology.3 The quality of DTE claims among phase 3 trials published between 2004 and 2020 was evaluated using the Instrument for Assessing the Credibility of Effect Modification Analyses (ICEMAN).21
Methods
We performed a cross-sectional analysis of phase 3 randomized clinical trials in oncology. Trials were identified from the ClinicalTrials.gov registry without study date limitations using an advanced query through February 2020 with the search terms cancer; phase, phase 3; study results, with results; and status, excluded: not yet recruiting, as previously reported.22 For each trial, the subgroup analyses in the article reporting the formal results of the primary end point were evaluated. This study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines.23 Per the Common Rule, institutional review board approval was not required as the data were publicly available.
Multiplicity control in subgroup analyses by any established method was recorded, because larger numbers of subgroup analyses, even if prespecified, increase the risk of type I error.13,19 Credibility was rated for each DTE claim according to the ICEMAN guidelines.21 A DTE was defined as visually apparent based on crude inspection of the forest plot by the Cuzick method: if a subgroup confidence interval does not overlap with the point estimate of the main effect, a DTE may be present and merits further investigation (Figure 1).10,24 Forest plot missing visual elements (MVEs) were defined as either (1) use of a linear x-axis scale, as opposed to a logarithmic x-axis scale, for hazard ratios (HRs) or odds ratios (ORs) or (2) absence of the main effect from the plot (Figure 1A).10
Plotting HRs linearly along the x-axis, as in Figure 1A, distorts the apparent effect size in favor of the control arm: HRs favoring the experimental arm are compressed into the interval between 0 and 1, whereas HRs favoring the control arm extend without bound above 1. Instead, HR values should be spaced logarithmically along the x-axis, as in Figure 1B, to preserve symmetry above and below 1. Furthermore, without the main effect on the graph, as in Figure 1A, interpretation of the subgroup analysis is substantially limited. To facilitate interpretation, it is recommended to include the main effect and to draw a vertical line through its point estimate so that overlap with the subgroups can be easily visualized, as shown in Figure 1B. In this example, the 95% CI of subgroup D does not overlap with the point estimate of the main effect, suggesting a potential DTE by the Cuzick method. Although the 95% CI of subgroup B overlaps with 1, the interval crosses the point estimate of the main effect, so visual inspection provides no evidence of a DTE for subgroup B; nor should subgroup B be interpreted as lacking benefit from the experimental therapy. Finally, negative values on the x-axis, as in Figure 1A, should not be shown because HRs cannot be negative.
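The Cuzick visual screen described above reduces to a simple interval check. The Python sketch below assumes each subgroup is summarized by an HR and its 95% CI; the `Subgroup` class, the function names, and all numeric values are hypothetical and only loosely mirror the pattern described for Figure 1B (subgroup B crosses 1 yet contains the main effect; subgroup D does not). A flag is a prompt for formal interaction testing, not evidence of a DTE.

```python
from dataclasses import dataclass


@dataclass
class Subgroup:
    name: str
    hr: float
    lower: float  # lower 95% CI bound
    upper: float  # upper 95% CI bound


def cuzick_flags(main_hr: float, subgroups: list[Subgroup]) -> dict[str, bool]:
    """True when a subgroup CI does NOT contain the main-effect point estimate.

    A True flag only means a DTE *may* be present and merits interaction testing.
    """
    return {s.name: not (s.lower <= main_hr <= s.upper) for s in subgroups}


def crosses_null(s: Subgroup) -> bool:
    """True if the subgroup CI includes HR = 1 (the no-effect line)."""
    return s.lower <= 1.0 <= s.upper


main_hr = 0.75  # hypothetical main effect HR
subgroups = [
    Subgroup("A", 0.72, 0.60, 0.86),
    Subgroup("B", 0.80, 0.55, 1.16),  # crosses 1 but contains the main effect
    Subgroup("C", 0.70, 0.58, 0.85),
    Subgroup("D", 1.10, 0.90, 1.34),  # excludes the main effect
]

flags = cuzick_flags(main_hr, subgroups)
for s in subgroups:
    print(s.name, "possible DTE" if flags[s.name] else "no visual DTE",
          "| CI crosses 1" if crosses_null(s) else "")
```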
Trial-level variables, including eligibility for inclusion, were manually curated into a standardized database with predefined variables by 4 trained reviewers (A.D.S., J.A.J., R.K., and T.A.L.) with data adjudication by the senior author (E.B.L.); MVEs and DTEs were evaluated by a single reviewer (A.D.S.).25 Trial-level variables included disease stage (ie, solid tumor metastatic, solid tumor nonmetastatic, or hematologic), disease site defined by the histology of the primary tumor, and the treatment being tested in the trial. Cooperative groups were defined as national or international nonprofit organizations typically receiving funding from governmental agencies (eg, the National Cancer Institute).26 Industry was defined as for-profit companies (pharmaceutical, biotechnology, and medical device companies). Because the statistical expertise available to industry trials, as well as other factors such as regulatory considerations, may vary with the size of the sponsor, industry sponsors were further subcharacterized as larger vs smaller companies.27 As there is no standardized definition, larger companies were approximated as those with an estimated annual revenue greater than US $1 billion, according to the most recent financial reports in the public domain. The primary end point of the trial was defined as met if the prespecified primary end point was statistically achieved in the article.28
Statistical Analysis
Descriptive statistics were used to summarize variables: categorical variables as frequencies and continuous variables as the median and IQR.29 Trends over time were examined by ordinary least squares linear regression, in which the slope (m) of the regression represented the rate of change. The association of trial-level variables with the number of subgroup tests was evaluated with linear regression. Univariable and multivariable logistic regressions were used to evaluate associations of trial-level factors with binary outcomes. Structural causal models for each factor were depicted on directed acyclic graphs using DAGitty to identify confounders (Figure 2).30,31,32 Effect directionality for each variable pair was drawn according to the most plausible causal pathway based on the investigators’ understanding of trial-level covariates. Each variable was sequentially selected as the factor of interest to identify its confounders (eTable in Supplement 1). For each variable, a multivariable logistic regression model adjusting for the confounders unique to that factor examined the association of the variable of interest with MVEs using adjusted ORs (aORs). All tests were 2-sided. Significance (α) was set at .05, and confidence intervals were calculated at the 95% level. Missing data were not encountered. Analyses were performed using SAS version 9 (SAS Institute). The illustrative forest plots were created in R version 4.3.2 (R Project for Statistical Computing) using the forestploter package. Other plots were created using Prism version 9 (GraphPad).
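In the trend analysis, the MVE rate is the outcome and publication year is the predictor, so the OLS slope m is the estimated change in percentage points per year. The pure-Python sketch below computes only that slope; the yearly rates are hypothetical placeholders (the per-year data underlying Figure 3 are not reproduced here), and the sketch omits the CI and P value the study reports.

```python
def ols_slope(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = b + m*x; returns (m, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx  # slope: change in y per 1-unit change in x
    b = mean_y - m * mean_x
    return m, b


# Hypothetical annual MVE rates (%), NOT the study's data
years = [2015, 2016, 2017, 2018, 2019]
mve_pct = [45.0, 42.0, 38.0, 36.0, 33.0]
m, b = ols_slope(years, mve_pct)
print(f"slope m = {m:.2f} percentage points per year")
```

A negative m, as in this toy series, corresponds to a declining MVE rate over time.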
Results
Based on a search of ClinicalTrials.gov, 785 published phase 3 interventional randomized trials were identified (eFigure in Supplement 1). Of these, 379 trials (48%) enrolling 331 653 patients reported the findings of a subgroup analysis and were eligible for study inclusion (eAppendix in Supplement 1). Trial publication dates ranged from 2004 to 2020.
The median number of subgroup factors analyzed per trial was 8 (IQR, 6-10; range, 1-29). A median of 1 outcome (eg, progression-free survival) was evaluated in subgroup analysis per trial (IQR, 1-2; range, 1-8). A total of 4148 subgroup effects were statistically evaluated, with a median of 9 subgroup effects per trial (IQR, 7-13; range, 1-58). One trial (0.3%) accounted for multiplicity of testing (using any method).33 On average, trials that did not meet the primary end point seemed to perform 2 more subgroup effect analyses than trials that met the primary end point (95% CI, 0.59-3.43 analyses; P = .006). Interaction testing was reported in 17% of trials (65 of 379).
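To make the multiplicity problem concrete: if k subgroup effects are each tested at α = .05 and none of them is real, the probability of at least one spurious "significant" finding is 1 − (1 − α)^k. The sketch below assumes independent tests, which subgroup tests on the same patients are not, so it is only an approximation; it uses the median (9) and maximum (58) numbers of subgroup effects per trial reported above.

```python
def familywise_error(alpha: float, k: int) -> float:
    """P(at least 1 false positive) across k independent tests at level alpha,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 9, 58):
    print(f"{k:>2} tests: familywise error rate = {familywise_error(0.05, k):.2f}")
```

Under these assumptions, the median trial in this study would carry roughly a 37% chance of at least 1 false-positive subgroup finding if no multiplicity control were applied.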
Trial authors claimed DTEs in 15% of trials (55 of 379). Among the 55 trials claiming DTEs, 101 total DTEs were claimed, appearing in the abstract in 29 instances (29%), in the results in all cases, in the discussion in 87 instances (86%), and in the conclusion in 39 instances (39%). DTE claims included the phrase “statistically significant… effect modification” in 4 cases (4%). ICEMAN ratings were mostly low credibility (69 [68%]) or very low credibility (25 [25%]), with only 7 DTE claims (7%) rated as moderately credible. No DTE claims were rated as highly credible. In only 5 of 101 instances (5%) did interaction testing suggest that chance may not explain the apparent effect modification (interaction P value range, >.005 to ≤.01) or that chance was an unlikely explanation (interaction P ≤ .005). In 78 DTE claims (77%), more than 10 effect modifiers were tested without regard to multiplicity (or the analysis was explicitly exploratory). Limited or indirect prior evidence, defined as retrospective evidence, nonsignificant effect modification in a prior randomized clinical trial, or evidence from a different patient population, supported the DTE claim in 40 cases (40%). Strong prior evidence, defined as significant effect modification in a related randomized clinical trial, supported the DTE claim in 9 instances (9%). The direction of the effect modification was hypothesized a priori in 1 case (protocol was available) or probably hypothesized a priori in 6 cases (no protocol available).34,35,36,37,38,39 In only 2 cases were arbitrary cut points for continuous variables definitely or probably avoided.40,41 Among trials claiming DTEs, only 27 had visually apparent DTEs suggested by the forest plot; moreover, the main effect was missing from the forest plot in 21% of trials claiming DTEs (11 of 52).
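The two interaction P value bands cited above can be written as a small classification rule. This sketch follows the thresholds as worded in this section (P ≤ .005; P > .005 to ≤ .01) and lumps every other value into a single bucket, which simplifies the full ICEMAN item rather than reproducing the instrument; the function name is hypothetical.

```python
def interaction_p_category(p: float) -> str:
    """Map an interaction P value to the 2 favorable bands described in the text.

    All values above .01 are lumped together here (a simplification of ICEMAN).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must be in [0, 1]")
    if p <= 0.005:
        return "chance an unlikely explanation"
    if p <= 0.01:
        return "chance may not explain"
    return "does not meet either threshold"


for p in (0.003, 0.008, 0.04):
    print(p, "->", interaction_p_category(p))
```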
Interaction tests were missing in 53% of trials claiming DTEs (29 of 55). In unadjusted analysis, industry-funded trials were associated with higher odds of DTE claims lacking interaction testing (OR, 6.00; 95% CI, 1.32-42.88; P = .03); this association did not appear to be related to the size of the industry sponsor (larger vs smaller company: OR, 0.72; 95% CI, 0.09-4.16; P = .72). Trials that did not meet their primary end point were also more likely to claim DTEs without interaction testing (OR, 4.47; 95% CI, 1.42-15.55; P = .01). No trial-level factors were associated with greater credibility of DTE claims (moderate credibility vs low or very low credibility, based on the most credible DTE claim per trial).
Of 363 trials that included forest plots, 156 trials (43%) had forest plots with MVEs (Figure 1). Specifically, in 35% of trials (126 of 363), the x-axis scale representing HRs or ORs was linear rather than logarithmic. In 17% of trials (61 of 363), the main effect was not displayed in the forest plot. Over time, there appeared to be a significant decline in the rate of MVEs in forest plots (m = −3.47; 95% CI, −5.28 to −1.66; P = .001) (Figure 3). The rate of MVEs decreased from 62% (26 of 42 in 2007-2010) to 43% (50 of 117 in 2011-2014) and lastly to 37% (73 of 196 in 2015-2019).
Unadjusted associations between trial-level characteristics and forest plot MVEs are shown in Table 1, and adjusted associations are shown in Table 2. A later publication year of the article reporting the primary end point was associated with lower odds of MVEs on multivariable analysis (publication year as a continuous factor: aOR, 0.89; 95% CI, 0.83-0.95; P < .001).
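Because publication year entered the model as a continuous variable, the aOR of 0.89 applies per 1-year increase. Under the logistic model's assumption that log-odds change linearly with year, the implied OR over a longer span is the per-year OR raised to that number of years. This point-estimate-only sketch ignores the CI and is an illustration, not an additional result of the study.

```python
def odds_ratio_over_years(or_per_year: float, years: float) -> float:
    """Implied OR for a difference of `years`, assuming log-odds are linear in year."""
    return or_per_year ** years


for span in (1, 5, 10):
    print(f"{span:>2} years: OR = {odds_ratio_over_years(0.89, span):.2f}")
```

For example, an aOR of 0.89 per year implies odds of MVEs roughly 0.56 times as high after 5 years and roughly 0.31 times as high after 10 years, if the linearity assumption holds.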
Table 1. Characteristics of Phase 3 Randomized Clinical Trials in Oncology and Forest Plot MVEs.
Characteristic | Trials, No. | Forest plot MVE present, No. (%) | Forest plot MVE absent, No. (%) | Unadjusted OR (95% CI)a | P value |
---|---|---|---|---|---|
Total trials | 363 | 156 (43) | 207 (57) | NA | NA |
Disease stage | |||||
Solid M0 | 67 | 31 (46) | 36 (54) | 1.05 (0.52-2.12) | .89 |
Solid M1 | 236 | 98 (42) | 138 (58) | 0.87 (0.49-1.54) | .63 |
Hematologic | 60 | 27 (45) | 33 (55) | 1 [Reference] | NA |
Disease site | |||||
Breast | 71 | 35 (49) | 36 (51) | 1.42 (0.71-2.87) | .33 |
Gastrointestinal | 62 | 24 (39) | 38 (61) | 0.92 (0.44-1.91) | .82 |
Genitourinary | 43 | 24 (56) | 19 (44) | 1.84 (0.84-4.12) | .13 |
Hematologic | 60 | 27 (45) | 33 (55) | 1.19 (0.58-2.48) | .63 |
Thoracic | 68 | 22 (32) | 46 (68) | 0.70 (0.34-1.44) | .33 |
Otherb | 59 | 24 (41) | 35 (59) | 1 [Reference] | NA |
Treatment type | |||||
Systemic therapy | 350 | 151 (43) | 199 (57) | 1 [Reference] | NA |
Supportive care | 8 | 3 (38) | 5 (62) | 0.79 (0.16-3.27) | .75 |
Local therapy | 5 | 2 (40) | 3 (60) | 0.88 (0.12-5.36) | .89 |
Cooperative study | |||||
Yes | 62 | 36 (58) | 26 (42) | 2.09 (1.20-3.67) | .009 |
No | 301 | 120 (40) | 181 (60) | 1 [Reference] | NA |
Industry funded | |||||
Yes | 328 | 134 (41) | 194 (59) | 0.41 (0.19-0.83) | .01 |
No | 35 | 22 (63) | 13 (37) | 1 [Reference] | NA |
Size of industry sponsor | |||||
Larger | 295 | 119 (40) | 176 (60) | 0.81 (0.39-1.69) | .57 |
Smaller | 33 | 15 (45) | 18 (55) | 1 [Reference] | NA |
Enrollment | NA | NA | NA | 1.00 (1.00-1.00) | .49 |
Publication datec | NA | NA | NA | 0.89 (0.83-0.95) | <.001 |
Abbreviations: M0, nonmetastatic; M1, metastatic; MVE, missing visual element; NA, not applicable; OR, odds ratio.
a. Associations were evaluated by univariable binary logistic regression.
b. Includes central nervous system, skin, endocrine, gynecologic, head and neck, and sarcoma.
c. Publication date is analyzed as a continuous variable, and the findings are shown in Figure 3.
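Because each binary characteristic enters a univariable logistic model as a single indicator, the unadjusted ORs in Table 1 equal the cross-product ratio of the corresponding 2 × 2 counts. The sketch below reproduces 2 of them from the table. It computes Wald CIs as an assumption, so its intervals may differ slightly from the published ones (for cooperative group status it gives roughly 1.20 to 3.64 rather than 1.20 to 3.67), plausibly because a different interval method was used in SAS.

```python
import math

Z_975 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_wald(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """OR and Wald 95% CI from a 2x2 table.

    a, b: index group with / without MVE; c, d: reference group with / without MVE.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - Z_975 * se)
    hi = math.exp(math.log(or_) + Z_975 * se)
    return or_, lo, hi


# Counts from Table 1 (yes vs no for each characteristic)
table1_counts = {
    "Cooperative study": (36, 26, 120, 181),
    "Industry funded": (134, 194, 22, 13),
}
for label, counts in table1_counts.items():
    or_, lo, hi = odds_ratio_wald(*counts)
    print(f"{label}: OR {or_:.2f} (Wald 95% CI {lo:.2f}-{hi:.2f})")
```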
Table 2. Multivariable Binary Logistic Regression Models Examining the Association of Trial-Level Factors With Missing Visual Elements in Forest Plotsa.
Characteristic | aOR (95% CI) | P value |
---|---|---|
Disease stage | ||
Solid M0 | 0.89 (0.38-2.08) | .79 |
Solid M1 | 0.82 (0.39-1.72) | .60 |
Hematologic | 1 [Reference] | NA |
Disease siteb | ||
Breast | 1.42 (0.71-2.87) | .33 |
Gastrointestinal | 0.92 (0.44-1.91) | .82 |
Genitourinary | 1.84 (0.84-4.12) | .13 |
Hematologic | 1.19 (0.58-2.48) | .63 |
Thoracic | 0.70 (0.34-1.44) | .33 |
Otherc | 1 [Reference] | NA |
Treatment type | ||
Systemic therapy | 1 [Reference] | NA |
Supportive care | 0.68 (0.13-2.90) | .61 |
Local therapy | 0.40 (0.05-2.72) | .35 |
Cooperative study | ||
Yes | 1.67 (0.81-3.47) | .16 |
No | 1 [Reference] | NA |
Industry funded | ||
Yes | 0.60 (0.24-1.49) | .27 |
No | 1 [Reference] | NA |
Size of industry sponsor | ||
Larger | 0.81 (0.39-1.69) | .57 |
Smaller | 1 [Reference] | NA |
Enrollment | 1.00 (1.00-1.00) | .89 |
Publication date | 0.89 (0.83-0.95) | <.001 |
Abbreviations: aOR, adjusted odds ratio; M0, nonmetastatic; M1, metastatic; NA, not applicable.
a. Each trial-level factor was evaluated in a structural causal model depicted on a directed acyclic graph to identify confounders unique to its association with missing visual elements (see Figure 2 for an example). For each multivariable model, only the effect estimate of the primary factor of interest is reported. The effect estimates of confounders are not reported here, as they are not representative of their adjusted association with missing visual elements.
b. The structural causal model for disease site showed no confounding factors for the association with missing visual elements; therefore, the adjusted estimate for the association of disease site with missing visual elements is the same as the unadjusted estimate.
c. Includes central nervous system, skin, endocrine, gynecologic, head and neck, and sarcoma.
Discussion
In this cross-sectional study, nearly half of phase 3 oncology trials presented the results of subgroup analyses in their primary article. Despite a large number of subgroup effects examined per trial, less than 1% of trials controlled for type I error rate. Forest plots largely lacked essential visual features needed for interpretation, and most claims for DTEs did not appear to be highly or moderately credible. Improvement to the conduct, quality, and interpretation of subgroup analyses in future phase 3 oncology trials is needed, and the subgroup analyses of current trials should be interpreted with caution.
This study has considerable implications for the interpretation of current phase 3 oncology research and future trials. Most claims of DTEs were scored as having low or very low credibility according to the ICEMAN guidelines.21 The consequences of DTE claims in prominent phase 3 trials can be considerable. In past cases, regulatory approvals have been granted to entire study populations based on positive subgroup findings, and in other cases, approvals have been limited just to subgroups despite positive results in the entire study population.3 Subgroup analyses may also drive the investment of substantial resources toward new trials, which may be particularly problematic if these are largely based on DTE claims with low credibility, in which there is increased risk of spurious results. Most subgroup analyses are at best hypothesis generating (except under highly specific conditions).3,5,42,43 An example is the ARTIST trial,44 which found no difference in disease-free survival between adjuvant chemotherapy and chemoradiotherapy for patients with gastric cancer with a D2 margin-negative resection but did suggest a possible benefit in patients with lymph node–positive disease based on a subgroup P value (without control for multiplicity of testing or use of interaction testing). The subsequent ARTIST 2 trial45 ultimately found no efficacy differences between chemoradiotherapy and chemotherapy among patients with lymph node–positive disease. This example, among others, illustrates why prudence and caution are needed in subgroup analysis interpretation. 
As suggested by the ICEMAN framework, interpretation of DTEs should be based on a variety of considerations, including strength of the interaction test, whether the directionality of the result was hypothesized a priori, the plausibility of the DTE in light of the underlying biology, the strength of the prior literature supporting the DTE, and the number of effects assessed, among other factors.21 Innovative approaches to subgroup analysis, such as using bayesian models or effect score analyses, may improve subgroup analyses in future phase 3 trials.46,47,48 For these and other reasons, subgroup analyses should not have a strong influence on overall trial interpretation.
While other studies have found that approximately one-quarter to one-half of clinical trials correct for multiple testing in some capacity (such as for the primary or secondary end points), this practice was nearly absent from subgroup analyses in the present study.49,50,51 Only 1 of 379 trials33 controlled for type I error in subgroup analyses, even though numerous effects were often analyzed per trial. An important caveat to this finding is that some trials may have conducted subgroup analyses to check the overall consistency of effects within the sample rather than specifically to evaluate for DTEs. Interestingly, in this study, trials that did not meet the primary end point seemed to conduct more subgroup effect tests than trials meeting the primary end point. Trialists should consider limiting the number of subgroup effects tested to those that are prespecified with directionality, biologically plausible, and supported by prior evidence, or, if a larger number of effects is explored, controlling for type I error risk.13,19,42,52,53,54,55
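One established way to control type I error when many subgroup effects are examined is the Holm step-down procedure, which controls the familywise error rate and is uniformly at least as powerful as a plain Bonferroni correction. The sketch below is a generic implementation, not the method used by the 1 trial in this study that controlled for multiplicity; the 9 P values are hypothetical, chosen to match the median number of subgroup effects per trial.

```python
def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted P values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, adj)  # adjusted P values must not decrease
        adjusted[i] = running_max
    return adjusted


# Hypothetical subgroup P values from a single trial (9 effects tested)
raw = [0.004, 0.03, 0.04, 0.20, 0.45, 0.60, 0.70, 0.81, 0.95]
adj = holm_adjust(raw)
for p, q in zip(raw, adj):
    print(f"raw P = {p:.3f} -> Holm-adjusted P = {q:.3f}")
```

Here 3 raw P values fall below .05, but only 1 remains below .05 after Holm adjustment.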
While subgroup analyses were prevalent, most subgroup analyses in this study were not associated with DTE claims. This finding was similar to a prior study10 evaluating forest plots presented at ASCO annual meetings. Another notable observation of the current study was the discrepancy between authors’ claims of DTE and the use of interaction testing. This discrepancy appeared to be particularly evident in trials that did not meet their primary end point. Interaction testing is a crucial component of subgroup analysis when DTE is suspected and is preferred over subgroup P values.4,5,9,12,13,56 Two smaller studies16,57 similarly found low rates of interaction testing in subgroup analyses of metastatic solid tumor trials. Finally, accurate visual interpretation of subgroup analyses requires the presence of standard features in a forest plot.10 In this study, the forest plots of nearly half of trials lacked standard features needed for interpretation. The overall effect must be present on the graph to enable comparisons of the subgroups to the main effect for visual DTE interpretation (Figure 1). Furthermore, if a subgroup analysis entails ratios such as HRs or ORs, a logarithmic x-axis scale, rather than a linear x-axis scale, is needed to present effects symmetrically. Figure 1A shows a linear x-axis scale, and Figure 1B shows a corrected logarithmic scale. Encouragingly, the rate of MVEs seemed to decrease over time. This finding suggests the potential effectiveness of recent reporting guidelines, peer review, and editorial processes in mitigating visualization issues.6,19,58
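The need for a logarithmic axis follows from the multiplicative nature of ratio measures: an HR of 0.5 (hazard halved) and an HR of 2.0 (hazard doubled) are effects of equal magnitude in opposite directions. This short Python sketch shows that on a linear axis the HR of 2.0 sits twice as far from 1 as the HR of 0.5, whereas on a log axis the two are equidistant. It illustrates the geometry only and is not tied to any particular plotting library.

```python
import math


def linear_distance_from_null(hr: float) -> float:
    """Distance from HR = 1 on a linear axis."""
    return abs(hr - 1.0)


def log_distance_from_null(hr: float) -> float:
    """Distance from HR = 1 on a logarithmic axis (natural log units)."""
    return abs(math.log(hr))


# Reciprocal HRs: equal-sized effects in opposite directions
for hr in (0.5, 2.0):
    print(f"HR {hr}: linear distance {linear_distance_from_null(hr):.2f}, "
          f"log distance {log_distance_from_null(hr):.3f}")
```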
Limitations
There are several important limitations to this study. Trials were identified from ClinicalTrials.gov, and so our findings may not be generalizable to global trials. While ICEMAN offers a generalizable framework for evaluating DTE, ICEMAN credibility ratings were scored by a single reviewer. Multivariable adjustment was not performed in several analyses; their findings should be interpreted cautiously.
Conclusions
In summary, although nearly half of phase 3 oncology trials reported subgroup analyses, less than 1% controlled for type I error rate, most DTE claims did not appear to be supported, and many forest plots lacked essential visual features. Critical improvements to the quality of subgroup analyses in future phase 3 oncology trials are necessary, and current subgroup analyses should be interpreted cautiously.
References
- 1. Lin TA, Sherry AD, Ludmir EB. Challenges, complexities, and considerations in the design and interpretation of late-phase oncology trials. Semin Radiat Oncol. 2023;33(4):429-437. doi: 10.1016/j.semradonc.2023.06.007
- 2. Msaouel P, Lee J, Thall PF. Interpreting randomized controlled trials. Cancers (Basel). 2023;15(19):4674. doi: 10.3390/cancers15194674
- 3. Amatya AK, Fiero MH, Bloomquist EW, et al. Subgroup analyses in oncology trials: regulatory considerations and case examples. Clin Cancer Res. 2021;27(21):5753-5756. doi: 10.1158/1078-0432.CCR-20-4912
- 4. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet. 2000;355(9209):1064-1069. doi: 10.1016/S0140-6736(00)02039-0
- 5. Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med. 2002;21(19):2917-2930. doi: 10.1002/sim.1296
- 6. Altman DG. Clinical trials: subgroup analyses in randomized trials–more rigour needed. Nat Rev Clin Oncol. 2015;12(9):506-507. doi: 10.1038/nrclinonc.2015.133
- 7. Msaouel P. The big data paradox in clinical practice. Cancer Invest. 2022;40(7):567-576. doi: 10.1080/07357907.2022.2084621
- 8. Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337-350. doi: 10.1007/s10654-016-0149-3
- 9. Brookes ST, Whitley E, Peters TJ, Mulheran PA, Egger M, Davey Smith G. Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Health Technol Assess. 2001;5(33):1-56. doi: 10.3310/hta5330
- 10. Hahn AW, Dizman N, Msaouel P. Missing the trees for the forest: most subgroup analyses using forest plots at the ASCO annual meeting are inconclusive. Ther Adv Med Oncol. Published online June 1, 2022;14. doi: 10.1177/17588359221103199
- 11. Pocock S. Clinical Trials: A Practical Approach. Wiley; 2013. doi: 10.1002/9781118793916
- 12. White IR, Elbourne D. Assessing subgroup effects with binary data: can the use of different effect measures lead to different conclusions? BMC Med Res Methodol. 2005;5:15. doi: 10.1186/1471-2288-5-15
- 13. Wang R, Lagakos SW, Ware JH, Hunter DJ, Drazen JM. Statistics in medicine—reporting of subgroup analyses in clinical trials. N Engl J Med. 2007;357(21):2189-2194. doi: 10.1056/NEJMsr077003
- 14. Simon R. Patient subsets and variation in therapeutic efficacy. Br J Clin Pharmacol. 1982;14(4):473-482. doi: 10.1111/j.1365-2125.1982.tb02015.x
- 15. Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets. Biometrics. 1985;41(2):361-372. doi: 10.2307/2530862
- 16. Zhang S, Liang F, Li W, Hu X. Subgroup analyses in reporting of phase III clinical trials in solid tumors. J Clin Oncol. 2015;33(15):1697-1702. doi: 10.1200/JCO.2014.59.8862
- 17.Sun X, Briel M, Busse JW, et al. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ. 2012;344:e1553. doi: 10.1136/bmj.e1553 [DOI] [PubMed] [Google Scholar]
- 18. Sun X, Ioannidis JP, Agoritsas T, Alba AC, Guyatt G. How to use a subgroup analysis: users’ guide to the medical literature. JAMA. 2014;311(4):405-411. doi: 10.1001/jama.2013.285063
- 19. Harrington D, D’Agostino RB Sr, Gatsonis C, et al. New guidelines for statistical reporting in the journal. N Engl J Med. 2019;381(3):285-286. doi: 10.1056/NEJMe1906559
- 20. Schandelmaier S, Chang Y, Devasenapathy N, et al. A systematic survey identified 36 criteria for assessing effect modification claims in randomized trials or meta-analyses. J Clin Epidemiol. 2019;113:159-167. doi: 10.1016/j.jclinepi.2019.05.014
- 21. Schandelmaier S, Briel M, Varadhan R, et al. Development of the Instrument to Assess the Credibility of Effect Modification Analyses (ICEMAN) in randomized controlled trials and meta-analyses. CMAJ. 2020;192(32):E901-E906. doi: 10.1503/cmaj.200077
- 22. Sherry AD, Corrigan KL, Kouzy R, et al. Prevalence, trends, and characteristics of trials investigating local therapy in contemporary phase 3 clinical cancer research. Cancer. 2023;129(21):3430-3438. doi: 10.1002/cncr.34929
- 23. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453-1457. doi: 10.1016/S0140-6736(07)61602-X
- 24. Cuzick J. Forest plots and the interpretation of subgroups. Lancet. 2005;365(9467):1308. doi: 10.1016/S0140-6736(05)61026-4
- 25. Waffenschmidt S, Knelangen M, Sieben W, Bühn S, Pieper D. Single screening versus conventional double screening for study selection in systematic reviews: a methodological systematic review. BMC Med Res Methodol. 2019;19(1):132. doi: 10.1186/s12874-019-0782-0
- 26. Bertagnolli MM, Blanke CD, Curran WJ, et al. What happened to the US cancer cooperative groups? a status update ten years after the Institute of Medicine report. Cancer. 2020;126(23):5022-5029. doi: 10.1002/cncr.33209
- 27. Abi Jaoude J, Kouzy R, Ghabach M, et al. Food and Drug Administration approvals in phase 3 cancer clinical trials. BMC Cancer. 2021;21(1):695. doi: 10.1186/s12885-021-08457-5
- 28. Florez MA, Jaoude JA, Patel RR, et al. Incidence of primary end point changes among active cancer phase 3 randomized clinical trials. JAMA Netw Open. 2023;6(5):e2313819. doi: 10.1001/jamanetworkopen.2023.13819
- 29. Vickers AJ, Assel MJ, Sjoberg DD, et al. Guidelines for reporting of figures and tables for clinical research in urology. Urology. 2020;142:1-13. doi: 10.1016/j.urology.2020.05.002
- 30. Shapiro DD, Msaouel P. Causal diagram techniques for urologic oncology research. Clin Genitourin Cancer. 2021;19(3):271.e1-271.e7. doi: 10.1016/j.clgc.2020.08.003
- 31. Msaouel P, Lee J, Karam JA, Thall PF. A causal framework for making individualized treatment decisions in oncology. Cancers (Basel). 2022;14(16):3923. doi: 10.3390/cancers14163923
- 32. Textor J, van der Zander B, Gilthorpe MS, Liskiewicz M, Ellison GT. Robust causal inference using directed acyclic graphs: the R package ‘dagitty’. Int J Epidemiol. 2016;45(6):1887-1894.
- 33. Gillison ML, Trotti AM, Harris J, et al. Radiotherapy plus cetuximab or cisplatin in human papillomavirus-positive oropharyngeal cancer (NRG Oncology RTOG 1016): a randomised, multicentre, non-inferiority trial. Lancet. 2019;393(10166):40-50. doi: 10.1016/S0140-6736(18)32779-X
- 34. Ferris RL, Blumenschein G Jr, Fayette J, et al. Nivolumab for recurrent squamous-cell carcinoma of the head and neck. N Engl J Med. 2016;375(19):1856-1867. doi: 10.1056/NEJMoa1602252
- 35. Zapatero A, Guerrero A, Maldonado X, et al. High-dose radiotherapy with short-term or long-term androgen deprivation in localised prostate cancer (DART01/05 GICOR): a randomised, controlled, phase 3 trial. Lancet Oncol. 2015;16(3):320-327. doi: 10.1016/S1470-2045(15)70045-8
- 36. Butts C, Socinski MA, Mitchell PL, et al; START trial team. Tecemotide (L-BLP25) versus placebo after chemoradiotherapy for stage III non-small-cell lung cancer (START): a randomised, double-blind, phase 3 trial. Lancet Oncol. 2014;15(1):59-68. doi: 10.1016/S1470-2045(13)70510-2
- 37. Di Leo A, Johnston S, Lee KS, et al. Buparlisib plus fulvestrant in postmenopausal women with hormone-receptor-positive, HER2-negative, advanced breast cancer progressing on or after mTOR inhibition (BELLE-3): a randomised, double-blind, placebo-controlled, phase 3 trial. Lancet Oncol. 2018;19(1):87-100. doi: 10.1016/S1470-2045(17)30688-5
- 38. Aebi S, Gelber S, Anderson SJ, et al; CALOR investigators. Chemotherapy for isolated locoregional recurrence of breast cancer (CALOR): a randomised trial. Lancet Oncol. 2014;15(2):156-163. doi: 10.1016/S1470-2045(13)70589-8
- 39. Ang KK, Zhang Q, Rosenthal DI, et al. Randomized phase III trial of concurrent accelerated radiation plus cisplatin with or without cetuximab for stage III to IV head and neck carcinoma: RTOG 0522. J Clin Oncol. 2014;32(27):2940-2950. doi: 10.1200/JCO.2013.53.5633
- 40. Zhu AX, Park JO, Ryoo BY, et al; REACH Trial Investigators. Ramucirumab versus placebo as second-line treatment in patients with advanced hepatocellular carcinoma following first-line therapy with sorafenib (REACH): a randomised, double-blind, multicentre, phase 3 trial. Lancet Oncol. 2015;16(7):859-870. doi: 10.1016/S1470-2045(15)00050-9
- 41. Chamie K, Donin NM, Klöpfer P, et al. Adjuvant weekly girentuximab following nephrectomy for high-risk renal cell carcinoma: the ARISER Randomized Clinical Trial. JAMA Oncol. 2017;3(7):913-920. doi: 10.1001/jamaoncol.2016.4419
- 42. Wang X, Piantadosi S, Le-Rademacher J, Mandrekar SJ. Statistical considerations for subgroup analyses. J Thorac Oncol. 2021;16(3):375-380. doi: 10.1016/j.jtho.2020.12.008
- 43. Moyé LA, Deswal A. Trials within trials: confirmatory subgroup analyses in controlled clinical experiments. Control Clin Trials. 2001;22(6):605-619. doi: 10.1016/S0197-2456(01)00180-5
- 44. Lee J, Lim DH, Kim S, et al. Phase III trial comparing capecitabine plus cisplatin versus capecitabine plus cisplatin with concurrent capecitabine radiotherapy in completely resected gastric cancer with D2 lymph node dissection: the ARTIST trial. J Clin Oncol. 2012;30(3):268-273. doi: 10.1200/JCO.2011.39.1953
- 45. Park SH, Lim DH, Sohn TS, et al; ARTIST 2 investigators. A randomized phase III trial comparing adjuvant single-agent S1, S-1 with oxaliplatin, and postoperative chemoradiation with S-1 and oxaliplatin in patients with node-positive gastric cancer after D2 resection: the ARTIST 2 trial. Ann Oncol. 2021;32(3):368-374. doi: 10.1016/j.annonc.2020.11.017
- 46. Jones HE, Ohlssen DI, Neuenschwander B, Racine A, Branson M. Bayesian models for subgroup analysis in clinical trials. Clin Trials. 2011;8(2):129-143. doi: 10.1177/1740774510396933
- 47. Dahabreh IJ, Kazi DS. Toward personalizing care: assessing heterogeneity of treatment effects in randomized trials. JAMA. 2023;329(13):1063-1065. doi: 10.1001/jama.2023.3576
- 48. Goligher EC, Lawler PR, Jensen TP, et al; REMAP-CAP, ATTACC, and ACTIV-4a Investigators. Heterogeneous treatment effects of therapeutic-dose heparin in patients hospitalized for COVID-19. JAMA. 2023;329(13):1066-1077. doi: 10.1001/jama.2023.3651
- 49. Wason JM, Stecher L, Mander AP. Correcting for multiple-testing in multi-arm trials: is it necessary and is it done? Trials. 2014;15:364. doi: 10.1186/1745-6215-15-364
- 50. Khan MS, Khan MS, Ansari ZN, et al. Prevalence of multiplicity and appropriate adjustments among cardiovascular randomized clinical trials published in major medical journals. JAMA Netw Open. 2020;3(4):e203082. doi: 10.1001/jamanetworkopen.2020.3082
- 51. Pike K, Reeves BC, Rogers CA. Approaches to multiplicity in publicly funded pragmatic randomised controlled trials: a survey of clinical trials units and a rapid review of published trials. BMC Med Res Methodol. 2022;22(1):39. doi: 10.1186/s12874-022-01525-9
- 52. Haslam A, Gill J, Crain T, et al. The frequency of medical reversals in a cross-sectional analysis of high-impact oncology journals, 2009-2018. BMC Cancer. 2021;21(1):889. doi: 10.1186/s12885-021-08632-8
- 53. US Food and Drug Administration. Master protocols: efficient clinical trial design strategies to expedite development of oncology drugs and biologics: draft guidance for industry. March 2022. Accessed February 20, 2024. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/master-protocols-efficient-clinical-trial-design-strategies-expedite-development-oncology-drugs-and
- 54. Lipkovich I, Dmitrienko A, Muysers C, Ratitch B. Multiplicity issues in exploratory subgroup analysis. J Biopharm Stat. 2018;28(1):63-81. doi: 10.1080/10543406.2017.1397009
- 55. Wasserstein RL, Lazar NA. The ASA statement on P values: context, process, and purpose. Am Stat. 2016;70(2):129-133. doi: 10.1080/00031305.2016.1154108
- 56. Ou FS, Le-Rademacher JG, Ballman KV, Adjei AA, Mandrekar SJ. Guidelines for statistical reporting in medical journals. J Thorac Oncol. 2020;15(11):1722-1726. doi: 10.1016/j.jtho.2020.08.019
- 57. Paratore C, Zichi C, Audisio M, et al. Subgroup analyses in randomized phase III trials of systemic treatments in patients with advanced solid tumours: a systematic review of trials published between 2017 and 2020. ESMO Open. 2022;7(6):100593. doi: 10.1016/j.esmoop.2022.100593
- 58. Moher D, Hopewell S, Schulz KF, et al; Consolidated Standards of Reporting Trials Group. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol. 2010;63(8):e1-e37. doi: 10.1016/j.jclinepi.2010.03.004
Associated Data