Author manuscript; available in PMC: 2025 Jul 24.
Published in final edited form as: Cancer Invest. 2021 Dec 27;40(2):184–188. doi: 10.1080/07357907.2021.2020281

Fooled by randomness. The misleading effect of treatment crossover in randomized trials of therapies with marginal treatment benefit

Vicente Valenti 1, Paula Jimenez-Fonseca 2, Pavlos Msaouel 3, Ramón Salazar 4, Alberto Carmona-Bayonas 5
PMCID: PMC12289325  NIHMSID: NIHMS2095751  PMID: 34919008

Much has been written in recent years about the limitations of causal inference based on real-world evidence (1,2). However, much less attention has been paid to how causal inferences from randomized controlled trials (RCTs) can be skewed by confounders that impact the treatment outcome and baseline mediating covariates or post-randomization mediating events that often occur due to limitations in the trial design (3–5). Additional major sources of bias in RCTs include inadequate allocation concealment, attrition bias, competing interests, and treatment switch before progression (6). Among them, the crossover of therapies is one of the most challenging hurdles in interpreting parallel-group RCTs (7). Randomization removes all confounding influences on treatment assignment (3). However, in some situations clinicians and patients may desire guaranteed access to the experimental treatment for all trial patients due to prior evidence suggesting benefit with this therapy. Crossover is one strategy used to achieve this goal, whereby patients initially assigned to the standard of care arm cross over to the experimental therapy after first disease progression. This scenario occurs frequently, as up to 52% of oncology RCTs allow for crossover after progression on the treatment allocated by randomization (8). However, crossover can bias clinical outcomes by increasing the risk of both type I (false positive) and type II (false negative) error. As described below, although the risk of false negative assertions is commonly recognized, the inflation of false positive signals by crossover is underappreciated and needs to be accounted for when interpreting the results of such RCTs. To demonstrate how crossover can increase type I error, we provide a computer simulation and review illustrative examples of recently reported RCTs that demonstrated false positive treatment efficacy signals due to crossover.

The type of bias most frequently associated with crossover is attenuation of the overall survival (OS) benefit of the experimental arm, resulting in increased type II errors and false negative assertions (7). Complex statistical methods such as rank-preserving structural failure time (RPSFT) models or iterative parametric estimations (IPEs) have been developed to account for the mitigating effect of crossover on OS (8). One example of treatment effect attenuation due to crossover is the phase III ClarIDHy trial, which evaluated the effect of ivosidenib in patients with IDH1-mutant advanced cholangiocarcinoma. The trial showed a clear signal of progression-free survival (PFS) improvement but only a weak signal in favor of increased OS (HR 0.79; 95% CI 0.56–1.12) (9). However, the OS estimation was impacted by crossover from the placebo arm to ivosidenib. Subsequently, analysis of crossover-adjusted OS using an RPSFT model yielded a strong signal of substantial clinical OS benefit (HR for death with ivosidenib 0.49; 95% CI 0.34–0.70). Simpler approaches, such as excluding or censoring patients who cross over, do not properly utilize the information gained from the treatment switch and are subject to selection bias because such crossovers do not usually occur at random (10).
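For readers unfamiliar with RPSFT models, the core idea can be written in one line. The notation below is ours and is only a sketch of the standard formulation; it is not taken from the ClarIDHy analysis. Each patient's observed survival time is split into time spent off and on the experimental drug, and the time on drug is rescaled to give the counterfactual survival time U_i that would have been observed had the patient never received it:

```latex
% Sketch of the standard RPSFT counterfactual-time model (our notation).
% T_i^{off}, T_i^{on}: observed time off and on the experimental drug for patient i.
% \psi: treatment effect on the time scale; \psi < 0 means the drug extends survival.
U_i = T_i^{\mathrm{off}} + e^{\psi}\, T_i^{\mathrm{on}}
```

The parameter ψ is estimated as the value for which U_i is balanced between the randomized arms, which is what randomization guarantees for untreated survival. The approach assumes that the drug has the same effect whenever it is given, an assumption that may not hold when crossover occurs after progression.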

While the increase in type II error risk due to crossover is well-described and intuitive to clinicians and researchers, the impact of crossover on false positive assertions due to increased type I error is less frequently discussed and is often not considered as a potential source of bias when interpreting RCTs that report positive treatment effects. However, as we will show below, when the experimental agent is only marginally active and salvage therapies are effective, crossover can inflate the apparent benefit in OS under certain conditions. This bias must be considered when a large OS benefit occurs following a marginal or absent PFS benefit in a clinical trial that allowed crossover. Several empirical observations suggest that the occurrence of this phenomenon has negatively impacted clinical and regulatory decision-making. Fojo et al. suggested that such OS inflation influenced the results of an RCT testing the effect of the poly(ADP-ribose) polymerase (PARP) inhibitor iniparib in patients with advanced triple-negative breast cancer (11). The study randomized patients to receive carboplatin and gemcitabine plus iniparib versus carboplatin and gemcitabine alone, with crossover to carboplatin plus gemcitabine plus iniparib in the control arm following initial disease progression (12). In this trial, the addition of iniparib to the chemotherapy combination of carboplatin plus gemcitabine yielded a 52% overall response rate (ORR) that fell sharply to a 3% ORR when this triple combination was subsequently administered to the patients in the control arm after progression on the carboplatin plus gemcitabine doublet. Fifty-nine percent of the patients in the control arm crossed over to receive iniparib plus carboplatin and gemcitabine after progression on the carboplatin plus gemcitabine doublet, while patients in the experimental arm who initially progressed on carboplatin and gemcitabine plus iniparib went on to other potentially effective salvage regimens. The trial noted a median OS benefit of 4.6 months in favor of the experimental arm (12.3 vs. 7.7 months; HR 0.57, P=0.01), whereas the median PFS benefit was half that of OS (5.9 vs. 3.6 months; HR 0.59, P=0.01), suggestive of crossover bias (12).

More recently, crossover to a therapy with no intrinsic activity led to the provisional accelerated approval of the platelet-derived growth factor receptor-α-blocking antibody olaratumab. This could have been avoided if the relevant stakeholders had noticed the subtle warnings in the data suggestive of OS inflation. In a small randomized phase II trial with 133 participants, patients with advanced soft tissue sarcoma were randomized to receive doxorubicin plus olaratumab versus doxorubicin alone (13). The trial yielded a marginal median PFS difference of 6.6 vs. 4.1 months in the experimental and control arms, respectively (P=0.06), but a substantial increase in median OS in favor of the experimental arm (26.5 vs. 14.7 months; P=0.003). A key aspect of the study design was that crossover was allowed after progression, as a result of which 46% of the subjects in the control arm crossed over to receive olaratumab monotherapy after progression, while patients in the experimental arm sought out potentially effective second-line regimens. Regulatory agencies approved olaratumab in late 2016 based on the data of this trial. However, several subtle signals already suggested that the crossover may have influenced the results. More specifically, PFS-2 (PFS to second-line therapy after progression on the first-line regimen) was 19.9 months in the experimental arm, whose patients were treated with active second-line agents. In contrast, PFS-2 in the control arm was 10.6 months, with 46% of these patients being treated with olaratumab monotherapy. Subsequent results from the ANNOUNCE phase III RCT (n=509 patients) were consistent with a deleterious effect of the addition of olaratumab, as evidenced by the shorter median PFS with olaratumab plus doxorubicin versus doxorubicin plus placebo (5.4 vs. 6.8 months; P=0.04) and the lack of OS benefit (20.4 vs. 19.7 months; P=0.69) (14). Based on these new data, the initial approval was reversed in 2019.

The olaratumab story replicated point by point, five years later, the unfortunate events of the iniparib story: data from a small randomized phase II trial showed a large OS benefit but conflicted with the results of the subsequent phase III trial generated years after the FDA accelerated approval (olaratumab) or FDA expanded access authorization (iniparib), which then had to be hastily reversed to avoid patient harm. Such scenarios will become increasingly prevalent in upcoming years as the number of new experimental therapeutic options grows and regulatory agencies are pressured to provide quicker decisions.

To test the hypothesis that crossover can produce false positive inferences in small phase II RCTs, we simulated survival data from an RCT with 60 patients per arm (n=120) after prespecifying that the experimental new drug A has a null effect (β-coefficient of zero) when added to the standard of care therapy B. The simulation was implemented using a flexible-hazard method available in the “coxed” R package (15). Figure 1A shows the PFS curves of this simulated dataset. It is notable that despite the experimental drug A being clinically inert, the HR point estimate in this small simulated trial is 0.70 due to the expected random fluctuation around the null effect. After progression, the simulation proceeds by assigning 80% of the subjects in the control arm to cross over to the experimental drug A. In contrast, all patients in the experimental arm, who had initially received drug A, were switched to another salvage therapy with a hazard ratio of 0.50 for OS compared with either the experimental or control arms. The Kaplan-Meier curves for OS are shown in Figure 1B and are consistent with the hypothesis that crossover can result in false positive inferences clearly favoring a clinically inert experimental treatment when the subsequent therapies for the treatment arm are different and more effective than the subsequent therapy used in the control arm.

Figure 1. (A) PFS curves of the simulated dataset. (B) OS curves of the simulated dataset.
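The mechanism behind the simulation can be illustrated with a short, self-contained sketch. This is not the code used for Figure 1, which relied on the flexible-hazard method of the “coxed” R package. It is a simplified Python stand-in under stated assumptions: exponential hazards, no censoring, hypothetical median PFS and post-progression survival of 4 and 8 months, and control patients who do not cross over receiving the same effective salvage therapy as the experimental arm. All function and parameter names are ours.

```python
import math
import random


def simulate_trial(rng, n_per_arm=60, median_pfs=4.0, median_pps=8.0,
                   crossover_rate=0.8, salvage_hr=0.5):
    """Simulate OS times (months) for one trial in which drug A is inert."""
    pfs_rate = math.log(2) / median_pfs  # identical in both arms: null effect of A
    pps_rate = math.log(2) / median_pps  # baseline post-progression hazard
    control, experimental = [], []
    for _ in range(n_per_arm):
        # Control arm: a fraction crosses over to inert drug A (hazard unchanged);
        # the rest receive the effective salvage therapy (our assumption).
        crossed = rng.random() < crossover_rate
        rate = pps_rate if crossed else pps_rate * salvage_hr
        control.append(rng.expovariate(pfs_rate) + rng.expovariate(rate))
        # Experimental arm: everyone moves on to the effective salvage therapy.
        experimental.append(rng.expovariate(pfs_rate)
                            + rng.expovariate(pps_rate * salvage_hr))
    return control, experimental


def logrank_z(control, experimental):
    """Log-rank Z for uncensored data; Z > 0 means excess deaths in control."""
    events = sorted([(t, 0) for t in control] + [(t, 1) for t in experimental])
    at_risk = [len(control), len(experimental)]
    obs_minus_exp = 0.0
    variance = 0.0
    for _, arm in events:
        n = at_risk[0] + at_risk[1]
        p_control = at_risk[0] / n
        # One death per event time (continuous times, so no ties are expected).
        obs_minus_exp += (1.0 if arm == 0 else 0.0) - p_control
        variance += p_control * (1.0 - p_control)
        at_risk[arm] -= 1
    return obs_minus_exp / math.sqrt(variance)


def false_positive_rate(n_trials=500, seed=0, **trial_kwargs):
    """Share of simulated null trials with two-sided P < 0.05 favoring drug A."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        z = logrank_z(*simulate_trial(rng, **trial_kwargs))
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
        if p_value < 0.05 and z > 0:
            hits += 1
    return hits / n_trials
```

Calling `false_positive_rate(crossover_rate=0.0)` and `false_positive_rate(crossover_rate=0.8)` contrasts a trial without crossover against one with 80% crossover. In this sketch drug A never changes any hazard, so any excess of "significant" OS results in its favor arises only because crossover displaces the effective salvage therapy in the control arm.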

In summary, crossover is a strategy that aims to benefit the greatest number of patients but, when its limitations are not accounted for, can produce spurious results that can harm an even greater number of patients. The empirical examples of olaratumab and iniparib, supported by our simulation, showcase how crossover can inflate OS treatment effect estimates even when the experimental therapies produce only marginal or no clinical benefit. This highlights the need for careful consideration and statistical modeling of the effect of subsequent therapies on OS estimates. It is possible that this OS inflation may not be as prominent in trials of heavily pretreated patients with few available effective subsequent therapy options. However, even in these scenarios, if crossover is allowed then the trial results should be carefully interpreted.

The utilitarian ethics that legitimize crossover seek to maximize benefit for the greatest number of patients. This is a noble goal that has benefited thousands of patients in cases when the experimental arm is truly superior to the control. However, the utility of crossover is reduced in scenarios when the experimental arm is inert or harmful compared to the control therapy. Proper statistical modeling can mitigate this harm and facilitate clinical inferences and regulatory decisions. This serves as an example of how nuances in trial design and statistical modeling can subtly influence the ethical status of clinical research (16,17). “The age we live in is a busy age; in which knowledge is rapidly advancing towards perfection,” said the utilitarian philosopher Jeremy Bentham in the 18th century. His prediction of perfect knowledge has yet to come true in the 21st century. We should remain vigilant about the many biases and imperfections that can obscure the generation of knowledge in our research.

FUNDING

Pavlos Msaouel was supported by a Career Development Award by the American Society of Clinical Oncology, a Research Award by KCCure, the MD Anderson Khalifa Scholar Award, the Andrew Sabin Family Foundation Fellowship, a Translational Research Partnership Award (KC200096P1) by the United States Department of Defense, an Advanced Discovery Award by the Kidney Cancer Association, the MD Anderson Physician-Scientist Award, and philanthropic donations by Mike and Mary Allen.

Footnotes

DISCLOSURES

Pavlos Msaouel has received honoraria for service on a Scientific Advisory Board for Mirati Therapeutics, Bristol Myers Squibb, and Exelixis; consulting for Axiom Healthcare Strategies; non-branded educational programs supported by Exelixis and Pfizer; and research funding for clinical trials from Takeda, Bristol Myers Squibb, Mirati Therapeutics, Gateway for Cancer Research, and UT MD Anderson Cancer Center.

References

1. Booth CM, Karim S, Mackillop WJ. Real-world data: towards achieving the achievable in cancer care. Nat Rev Clin Oncol. 2019;16(5):312–325. doi: 10.1038/s41571-019-0167-7.
2. Carmona-Bayonas A, Jimenez-Fonseca P, Gallego J, et al. Causal Considerations Can Inform the Interpretation of Surprising Associations in Medical Registries. Cancer Invest. 2021:1–13. doi: 10.1080/07357907.2021.1999971.
3. Msaouel P. Impervious to Randomness: Confounding and Selection Biases in Randomized Clinical Trials. Cancer Invest. 2021;39(10):783–788. doi: 10.1080/07357907.2021.1974030.
4. Msaouel P, Lee J, Thall PF. Making Patient-Specific Treatment Decisions Using Prognostic Variables and Utilities of Clinical Outcomes. Cancers (Basel). 2021;13(11). doi: 10.3390/cancers13112741.
5. Msaouel P, Grivas P, Zhang T. Adjuvant Systemic Therapies for Patients with Renal Cell Carcinoma: Choosing Treatment Based on Patient-level Characteristics. Eur Urol Oncol. 2021. doi: 10.1016/j.euo.2021.09.003.
6. Gluud LL. Bias in clinical intervention research. Am J Epidemiol. 2006;163(6):493–501. doi: 10.1093/aje/kwj069.
7. Isbary G, Staab TR, Amelung VE, et al. Effect of Crossover in Oncology Clinical Trials on Evidence Levels in Early Benefit Assessment in Germany. Value Health. 2018;21(6):698–706. doi: 10.1016/j.jval.2017.09.010.
8. Zhang J, Chen C. Correcting treatment effect for treatment switching in randomized oncology trials with a modified iterative parametric estimation method. Stat Med. 2016;35(21):3690–703. doi: 10.1002/sim.6923.
9. Abou-Alfa GK, Macarulla T, Javle MM, et al. Ivosidenib in IDH1-mutant, chemotherapy-refractory cholangiocarcinoma (ClarIDHy): a multicentre, randomised, double-blind, placebo-controlled, phase 3 study. Lancet Oncol. 2020;21(6):796–807. doi: 10.1016/S1470-2045(20)30157-1.
10. Watkins C, Huang X, Latimer N, et al. Adjusting overall survival for treatment switches: commonly used methods and practical application. Pharm Stat. 2013;12(6):348–57. doi: 10.1002/pst.1602.
11. Fojo T, Amiri-Kordestani L, Bates SE. Potential pitfalls of crossover and thoughts on iniparib in triple-negative breast cancer. J Natl Cancer Inst. 2011;103(23):1738–40. doi: 10.1093/jnci/djr386.
12. O’Shaughnessy J, Osborne C, Pippen JE, et al. Iniparib plus chemotherapy in metastatic triple-negative breast cancer. N Engl J Med. 2011;364(3):205–14. doi: 10.1056/NEJMoa1011418.
13. Tap WD, Jones RL, Van Tine BA, et al. Olaratumab and doxorubicin versus doxorubicin alone for treatment of soft-tissue sarcoma: an open-label phase 1b and randomised phase 2 trial. Lancet. 2016;388(10043):488–97. doi: 10.1016/S0140-6736(16)30587-6.
14. Tap WD, Wagner AJ, Schoffski P, et al. Effect of Doxorubicin Plus Olaratumab vs Doxorubicin Plus Placebo on Survival in Patients With Advanced Soft Tissue Sarcomas: The ANNOUNCE Randomized Clinical Trial. JAMA. 2020;323(13):1266–1276. doi: 10.1001/jama.2020.1707.
15. Harden JJ, Kropko J. Simulating Duration Data for the Cox Model. Political Science Research and Methods. 2019;7(4):921–928. doi: 10.1017/psrm.2018.19.
16. Altman DG. Statistics and ethics in medical research. Misuse of statistics is unethical. Br Med J. 1980;281(6249):1182–4. doi: 10.1136/bmj.281.6249.1182.
17. Berry DA. Bayesian Statistics and the Efficiency and Ethics of Clinical Trials. Statistical Science. 2004;19(1):175–187, 13.
