. Author manuscript; available in PMC: 2025 Aug 25.
Published in final edited form as: Clin Trials. 2025 Aug 21;22(6):667–675. doi: 10.1177/17407745251358233

Incorporating data from multiple ongoing trials for Bayesian two-stage Phase-II single-arm studies

Susan Halabi 1,2,*, Taehwa Choi 1,3,*, Elizabeth Garrett-Mayer 4, Richard L Schilsky 4, Lorenzo Trippa 5
PMCID: PMC12373003  NIHMSID: NIHMS2093791  PMID: 40838464

Abstract

Background/Aim:

Basket designs have been utilized in recent oncology clinical trials due to an increased interest in precision medicine. One current successful basket trial is the American Society of Clinical Oncology Targeted Agent and Profiling Utilization Registry (TAPUR) study, a pragmatic phase II trial in which patients are matched, based on their tumor genomic profile, to treatments that target specific genomic alterations. Despite its success, recruiting patients with rare genomic alterations remains challenging. This study aims to introduce and evaluate a Bayesian approach for integrating data from ongoing independent basket trials that share similar primary aims, to improve interim decisions and final analyses and to reduce the time necessary to evaluate treatments.

Methods:

We introduce a Bayesian two-stage phase II single-arm trial design, specifically for rare cancers, utilizing a hierarchical Bayesian random-effects model that incorporates data from ongoing trials. We compare this approach to the standard Simon two-stage design through extensive numerical simulations and apply it to real-world scenarios.

Results:

Simulation results demonstrate that in rare populations our Bayesian approach has attractive operating characteristics. The simulations show that our approach performs well across a broad set of scenarios with fixed and variable numbers of trials.

Conclusion:

Our proposed Bayesian two-stage approach effectively integrates data from multiple ongoing basket trials, enhancing the ability to recruit and analyze patients with rare genomic alterations. This approach improves the timing of interim decision-making and final analysis, making it a valuable tool for trials with slow accrual rates.

Keywords: Basket trials, clinical trials, hierarchical Bayesian model, random effect, Simon two-stage design

Introduction

The number of precision medicine trials using basket designs has increased substantially.1–3 In these trials, patients with specific genomic alterations are enrolled regardless of the tumor’s organ of origin or histology, an approach known as histology-agnostic.4 This approach allows for the inclusion of patients with rare cancers that are difficult to enroll in randomized trials, as these studies typically require screening large numbers of patients. Basket trials provide an efficient framework for trial conduct and enrollment through a master protocol and allow the addition or closure of patient cohorts as the study progresses.5 One example of an ongoing successful exploratory basket trial is the American Society of Clinical Oncology Targeted Agent and Profiling Utilization Registry (TAPUR) study, a pragmatic phase II trial that matches patients to Food and Drug Administration approved treatments based on the genomic profiles of their tumors.2 These treatments target specific genomic alterations that have been treated effectively in other tumor types.

In TAPUR (NCT02693535), study cohorts are defined by the combination of the targeted therapy administered (D), genomic alteration (G), and tumor type (T), indicated by D/G/T. The primary endpoint for each cohort is disease control, defined as either an objective response (partial or complete) per RECIST v1.1 or stable disease for at least 16 weeks from the start of treatment. Each cohort follows a Simon two-stage design with a null hypothesis of a disease control rate equal to or below 15% and an alternative hypothesis rate of 35%.6 With a power of 0.85 and a type I error rate of 0.078, 10 patients are enrolled in the first stage. If fewer than two patients experience disease control, the cohort is closed. Otherwise, the cohort expands to stage 2, enrolling 18 more patients. The null hypothesis is rejected if at least 7 of the 28 patients experience disease control by the end of the trial.2
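The stage-1 stopping rule and the final rejection rule above reduce to binomial tail probabilities, so the design's operating characteristics can be computed directly. A minimal sketch in Python (function and argument names are ours; exact boundary conventions may differ slightly from the published design):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_sf(k, n, p):
    """P(X > k) for X ~ Binomial(n, p); equals 1 when k < 0."""
    return sum(binom_pmf(x, n, p) for x in range(max(k + 1, 0), n + 1))

def simon_oc(p, n1=10, r1=1, n=28, r=6):
    """Operating characteristics of a Simon two-stage design:
    stop after n1 patients if X1 <= r1; reject H0 if X1 + X2 > r overall.
    Defaults mirror the TAPUR cohort design described in the text.
    Returns (probability of early termination, rejection probability)."""
    n2 = n - n1
    pet = sum(binom_pmf(x, n1, p) for x in range(0, r1 + 1))
    reject = sum(
        binom_pmf(x1, n1, p) * binom_sf(r - x1, n2, p)  # P(X2 > r - x1)
        for x1 in range(r1 + 1, n1 + 1)
    )
    return pet, reject
```

For example, `simon_oc(0.15)` gives the probability of early termination and the type I error rate under the null disease control rate, and `simon_oc(0.35)` gives the corresponding quantities under the alternative.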

Although the TAPUR trial has successfully enrolled thousands of patients to hundreds of cohorts,7–10 recruiting patients with rare genomic alterations or tumor types remains challenging.10 Many genomic alterations targeted by TAPUR have a prevalence of 5% or less among adult solid tumor patients, making it difficult to develop therapies for these patients. This challenge led to a creative strategy to combine response data across three ongoing exploratory basket trials of similar design: TAPUR, the Dutch Drug Rediscovery Protocol (DRUP; NCT02925234), and the Canadian Profiling and Targeted Agent Utilization Trial (CAPTUR; NCT03297606). These trials share similar designs and test the same treatments in patients with the same genomic alterations. This strategy culminated in the development of the TADRUCA protocol, which aims to share data across ongoing studies to obtain more timely results.11 The TADRUCA protocol utilizes a Simon optimal two-stage design, testing a null hypothesis objective response rate of 10% against an alternative hypothesis rate of 30%, with a power of 0.85 and a one-sided type I error rate of 0.10. In the first stage, eight patients are enrolled across the three trials. If at least one patient responds, 16 more patients are included in the second stage. A treatment is considered promising if at least five of the 24 patients respond.

Recent studies have proposed methods to integrate external data sources in clinical trials.12,13 However, the concept of combining data from ongoing phase II trials for interim decisions and for reporting on experimental treatment efficacy has not been fully developed. Indeed, most methodologies focus on published trials, which pose challenges such as calendar time bias.14 The primary objective of this article is to introduce and evaluate a methodology for combining data from ongoing independent basket trials for interim decisions and final analyses. Our approach is based on hierarchical Bayesian random-effects models. We consider two scenarios: one with a fixed number of trials sharing data and another with a number of trials that varies over time. The rest of this article is structured as follows. In the Methods section, we describe the approach (applicable in scenarios with a fixed number of trials) and the simulation set-up. In the Results section, we summarize the simulations and provide real-world examples, and in the Discussion section, we discuss the implications of the results for future trials involving rare cancers.

Methods

Model and design

Our goal is to provide an efficient two-stage design to combine data from multiple ongoing trials for rare genomic alterations. Let $p$ be the common response rate of the treatment across the different trials and consider the one-sided hypotheses $H_0: p < p_0$ and $H_1: p > p_1$, where $p_0$ and $p_1 > p_0$ represent the clinically undesirable and desirable target response rates, respectively. Define $n_{ik}$ as the number of accumulated patients in the $i$th trial at the $k$th stage ($i = 1, \ldots, m$; $k = 1, 2$) and $\pi_i$ as the true response rate in the population eligible for the $i$th study.

We suspend enrollment of the $m$ trials after enrolling a targeted number of patients (first stage) and use a Bayesian model both for interim analyses and to report results. The observed data at the $k$th stage consist of $\mathcal{O}_k = \{(Y_{ik}, n_{ik}),\, i = 1, \ldots, m\}$, where $Y_{ik} \sim B(n_{ik}, \pi_i)$ is a random variable representing the number of responses in the $i$th study at the $k$th stage with trial-specific response rate $\pi_i$. Note that $\mathcal{O}_k$ accumulates over stages, so $\mathcal{O}_2$ contains the information in stage 1. To estimate the study-specific response rate $\pi_i$, we adopt a hierarchical Bayesian approach and let $\pi_i$ be generated from a mixed-effects model

$\mathrm{logit}(\pi_i) = \mu + v_i,$ (1)

where $\mathrm{logit}(u) = \log\{u/(1-u)\}$ is the logit link function, $\mu$ represents the logit of the common response rate $p$, and $v_i \sim N(0, \tau^{-2})$ is a random effect capturing the heterogeneity across the trials, with variance $\tau^{-2}$. That is, the prior distribution of $\mathrm{logit}(\pi_i)$ is normal conditional on $(\mu, \tau^{-2})$: $\mathrm{logit}(\pi_i) \mid \mu, \tau^{-2} \sim N(\mu, \tau^{-2})$, where the hyper-prior distributions for $\mu$ and $\tau^2$ are

$\mu \sim N(\mu_0, 10) \quad \text{and} \quad \tau^2 \sim TN(0, 1000, 0, \infty),$

where $TN(0, 1000, a, b)$ is a truncated normal distribution with mean 0 and variance 1000 defined on the interval $(a, b)$. Here, $\mu_0$ represents the prior belief about the response rate. This hierarchical Bayesian procedure can be implemented by Markov chain Monte Carlo using readily available statistical packages such as JAGS and Stan. After obtaining the posterior estimate of $\mu$, denoted by $\hat{\mu}$, we compute $\hat{p} = \mathrm{logit}^{-1}(\hat{\mu})$. Note that our focus is on leveraging data from studies that were initially independent.
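As a concrete illustration, model (1) with the priors above can be fit with a short, self-contained random-walk Metropolis sampler; in practice one would use JAGS or Stan as noted. All function names, step sizes, and iteration counts below are ours and purely illustrative:

```python
import math
import random

def log_post(mu, v, t, data, mu0=0.0):
    """Unnormalised log posterior for the hierarchical logit model:
    Y_i ~ Binomial(n_i, pi_i), logit(pi_i) = mu + v_i, v_i ~ N(0, 1/t),
    with mu ~ N(mu0, 10) and t = tau^2 ~ TN(0, 1000, 0, inf)."""
    if t <= 0:
        return -math.inf
    lp = -(mu - mu0) ** 2 / 20.0          # N(mu0, 10) prior on mu
    lp += -t ** 2 / 2000.0                # half-normal TN(0, 1000) prior on t
    for vi, (y, n) in zip(v, data):
        lp += 0.5 * math.log(t) - 0.5 * t * vi ** 2  # v_i | t ~ N(0, 1/t)
        eta = mu + vi
        # binomial log-likelihood: y*eta - n*log(1 + exp(eta)), overflow-safe
        lp += y * eta - n * (eta if eta > 30 else math.log1p(math.exp(eta)))
    return lp

def sample_mu(data, n_iter=3000, burn=1000, seed=1):
    """Componentwise random-walk Metropolis; returns post-burn-in draws of mu.
    data is a list of (responses, patients) pairs, one per trial."""
    rng = random.Random(seed)
    m = len(data)
    mu, v, t = 0.0, [0.0] * m, 1.0
    draws = []
    for it in range(n_iter):
        for j in range(m + 2):            # update mu, each v_i, then t
            mu_p, v_p, t_p = mu, list(v), t
            step = rng.gauss(0.0, 0.5)
            if j == 0:
                mu_p += step
            elif j <= m:
                v_p[j - 1] += step
            else:
                t_p += step
            lr = log_post(mu_p, v_p, t_p, data) - log_post(mu, v, t, data)
            if lr >= 0 or rng.random() < math.exp(lr):
                mu, v, t = mu_p, v_p, t_p
        if it >= burn:
            draws.append(mu)
    return draws
```

Averaging $\mathrm{logit}^{-1}(\mu)$ over the returned draws gives the estimate $\hat{p}$.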

In our two-stage framework, we begin by enrolling $n_1 = \sum_{i=1}^{m} n_{i1}$ patients across the $m$ ($\ge 2$) studies; in Figure 1, $m = 3$. Enrollment is paused while the activity of the drug is evaluated. The futility analysis is based on the posterior distribution of the treatment efficacy $p = \mathrm{logit}^{-1}(\mu)$ given $\mathcal{O}_1$. Let $\delta \in [0, 1]$ denote a pre-specified cutoff value. If there is insufficient evidence that the common response rate $p$ exceeds the historical rate $p_0$, i.e., $P(p > p_0 \mid \mathcal{O}_1) < \delta$, a no-go decision is made. Otherwise, the trials proceed to the second stage of enrollment and enroll $n_2 = \sum_{i=1}^{m} n_{i2}$ additional patients. In addition, we will include other trials if there are other phase II studies that generate relevant data, and use an additional hyper-prior $\tau_2^2 \sim TN(0, 1000, 0, \infty)$. Once $n_2 = \sum_{i=1}^{m} n_{i2} + n_{42}$ patients are accrued (with $n_{42}$ denoting stage-2 patients from an added trial, if any), we test $H_0$ and decide whether the treatment is worth studying in a phase III clinical trial. Relevant efficacy of the treatment is reported if $P(p > p_0 \mid \mathcal{O}_2) > 1 - \delta$. For simplicity, we employ the same $\delta$ for both stage 1 and stage 2; however, it is possible to use different values for the two stages.
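Given posterior draws of $\mu$ from any such fit, the stage-1 futility rule is a one-line posterior probability calculation. A sketch (function name is ours):

```python
import math
import random

def go_no_go(mu_draws, p0, delta):
    """Stage-1 futility rule: no-go if P(p > p0 | O1) < delta, where
    p = logit^{-1}(mu) and mu_draws are posterior draws of mu obtained
    from any MCMC fit of the hierarchical model."""
    prob = sum(1.0 / (1.0 + math.exp(-m)) > p0 for m in mu_draws) / len(mu_draws)
    return ("go" if prob >= delta else "no-go"), prob
```

For example, draws concentrated near $\mathrm{logit}(0.3)$ yield a "go" at $p_0 = 0.1$, $\delta = 0.15$, while draws concentrated near $\mathrm{logit}(0.02)$ yield a "no-go".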

Figure 1.

Figure 1.

Schematic diagram of the proposed two-stage Bayesian design.

The expected sample size (ESS)

$\mathrm{ESS}(p_0) = n_1 + E\big[I\{P(p < p_0 \mid \mathcal{O}_1) \le 1 - \delta\}\big]\, n_2,$

can be calculated, where $I(\cdot)$ is an indicator function and the expectation is taken with respect to the scenario of interest, for example when $H_0$ holds. In this paper, we set $1 - \delta$ as the power of the design.
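The ESS can be approximated by Monte Carlo: simulate stage-1 data, apply the continuation rule $P(p > p_0 \mid \mathcal{O}_1) \ge \delta$, and average. In the sketch below a pooled conjugate Beta(1,1)-binomial posterior replaces the hierarchical model purely to keep the example fast and dependency-free; this substitution is an assumption of the illustration, not the paper's method:

```python
import random
from math import comb

def beta_cdf(x, a, b):
    """P(Beta(a, b) <= x) for integer a, b, via the binomial identity
    P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(a, n + 1))

def ess_mc(p_true, n1, n2, p0, delta, reps=2000, seed=7):
    """Monte Carlo estimate of ESS(p0) = n1 + P(continue) * n2 when the
    true response rate is p_true. A pooled Beta(1,1)-binomial posterior
    stands in for the hierarchical fit (illustrative assumption)."""
    rng = random.Random(seed)
    cont = 0
    for _ in range(reps):
        y = sum(rng.random() < p_true for _ in range(n1))  # stage-1 responses
        p_gt_p0 = 1.0 - beta_cdf(p0, 1 + y, 1 + n1 - y)    # P(p > p0 | O1)
        cont += p_gt_p0 >= delta   # continue unless P(p > p0 | O1) < delta
    return n1 + cont / reps * n2
```

A low true response rate stops more often and hence yields a smaller estimated ESS than a high one.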

Simulation studies

We conduct extensive numerical studies to provide empirical evidence that the proposed approach controls the target error rates and is more flexible than Simon's design, as it allows us to leverage data from multiple studies. We set the hypotheses as $H_0: p < p_0$ vs. $H_1: p > p_1$ for $p_0 < p_1$, and use a type I error rate of 0.078 and a power of 0.85, similar to the TADRUCA design. In general, the data are generated from the random-effects model

$\mathrm{logit}(\pi_i) = \mu + v_i, \quad i = 1, 2, 3.$

We consider two scenarios: (i) a fixed design, where the number of trials is the same at the first and second stages and the overall number of patients is fixed in stages 1 and 2, and (ii) a scenario where new trials are included in the second stage. In scenario 1 (fixed design), we consider low and moderate response rates. For the low response rate (Tables 1 and 2), we set $\mu = \mathrm{logit}(0.1)$ or $\mathrm{logit}(0.3)$ and $v_i \sim N(0, 0.1^2)$ for $i = 1, 2, 3$, with $p_0 = 0.1$ and $p_1 = 0.3$. For the moderate response rate (Tables 3 and 4), we set $\mu = \mathrm{logit}(0.2)$ or $\mathrm{logit}(0.4)$ and $v_i \sim N(0, 0.1^2)$ for $i = 1, 2, 3$, with $p_0 = 0.2$ and $p_1 = 0.4$.
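This data-generating mechanism can be sketched directly (function and argument names are ours; `sd` is the random-effect standard deviation, 0.1 in the scenarios above):

```python
import math
import random

def simulate_stage(p, n_per_trial, sd=0.1, seed=None):
    """Draw one stage of data from the random-effects model:
    logit(pi_i) = logit(p) + v_i, v_i ~ N(0, sd^2),
    Y_i ~ Binomial(n_i, pi_i). Returns a list of (responses, n) pairs."""
    rng = random.Random(seed)
    data = []
    for n in n_per_trial:
        eta = math.log(p / (1 - p)) + rng.gauss(0.0, sd)  # logit(p) + v_i
        pi = 1.0 / (1.0 + math.exp(-eta))                 # trial-specific rate
        y = sum(rng.random() < pi for _ in range(n))      # binomial draw
        data.append((y, n))
    return data
```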

Table 1.

Simulation results for $H_0: p = 0.1$ and $H_1: p = 0.3$ with Simon's optimal sample size.

Data (n1, n2) Par Simon logit(0.1) logit(0.2) logit(0.3)
H0 (3, 21) p^ 0.13 (0.07) 0.10 (0.06) 0.11 (0.06) 0.10 (0.06)
FPR 0.04 0.08 0.08 0.08
PET 0.73 0.63 0.59 0.56
ESS 8.69 10.75 11.64 12.26
(6, 18) p^ 0.13 (0.07) 0.13 (0.07) 0.11 (0.06) 0.11 (0.06)
FPR 0.08 0.07 0.08 0.08
PET 0.53 0.63 0.60 0.58
ESS 14.43 12.65 13.21 13.60
(8, 16) p^ 0.13 (0.07) 0.13 (0.07) 0.13 (0.07) 0.13 (0.07)
FPR 0.08 0.08 0.08 0.08
PET 0.43 0.62 0.59 0.57
ESS 17.11 14.09 14.53 14.82
(9, 15) p^ 0.12 (0.06) 0.12 (0.07) 0.13 (0.07) 0.13 (0.07)
FPR 0.08 0.08 0.08 0.08
PET 0.39 0.62 0.59 0.58
ESS 18.18 14.72 15.10 15.37
H1 (3, 21) p^ 0.32 (0.10) 0.29 (0.10) 0.30 (0.10) 0.30 (0.10)
Power 0.63 0.88 0.89 0.89
(8, 16) p^ 0.31 (0.09) 0.30 (0.10) 0.30 (0.10) 0.30 (0.10)
Power 0.80 0.80 0.83 0.86
(6, 18) p^ 0.31 (0.09) 0.30 (0.10) 0.30 (0.10) 0.31 (0.10)
Power 0.85 0.85 0.85 0.86
(9, 15) p^ 0.31 (0.09) 0.30 (0.10) 0.30 (0.10) 0.31 (0.10)
Power 0.87 0.87 0.87 0.87

FPR = false positive rate; PET = probability of early termination; ESS = expected sample size.

Table 2.

Simulation results for $H_0: p = 0.1$ and $H_1: p = 0.3$ with increased sample sizes.

Data (n1, n2) Par logit(0.1) logit(0.2) logit(0.3)
H0 (8, 24) p^ 0.12 (0.06) 0.12 (0.06) 0.12 (0.06)
FPR 0.08 0.08 0.08
PET 0.62 0.59 0.57
ESS 17.13 17.79 18.22
(10, 30) p^ 0.11 (0.05) 0.12 (0.05) 0.12 (0.05)
FPR 0.09 0.09 0.09
PET 0.62 0.60 0.58
ESS 21.41 22.13 22.62
(16, 48) p^ 0.11 (0.04) 0.11 (0.04) 0.11 (0.04)
FPR 0.11 0.11 0.11
PET 0.60 0.59 0.57
ESS 34.99 35.94 36.57
H1 (8, 24) p^ 0.30 (0.09) 0.30 (0.09) 0.30 (0.09)
Power 0.89 0.89 0.90
(10, 30) p^ 0.30 (0.08) 0.30 (0.08) 0.30 (0.08)
Power 0.96 0.96 0.96
(16, 48) p^ 0.30 (0.07) 0.30 (0.07) 0.30 (0.07)
Power 1.00 1.00 1.00

FPR = false positive rate; PET = probability of early termination; ESS = expected sample size.

Table 3.

Simulation results for $H_0: p = 0.2$ and $H_1: p = 0.4$ with Simon's optimal sample size.

Data (n1, n2) Par Simon logit(0.2) logit(0.3) logit(0.4)
H0 (5, 31) p^ 0.32 (0.08) 0.21 (0.07) 0.22 (0.08) 0.22 (0.08)
FPR 0.01 0.08 0.08 0.08
PET 0.99 0.59 0.57 0.56
ESS 5.21 17.79 18.32 18.79
(15, 21) p^ 0.24 (0.07) 0.21 (0.07) 0.21 (0.07) 0.21 (0.07)
FPR 0.07 0.08 0.08 0.08
PET 0.65 0.56 0.55 0.54
ESS 22.39 24.28 24.51 24.69
(20, 16) p^ 0.23 (0.07) 0.22 (0.07) 0.22 (0.07) 0.22 (0.07)
FPR 0.09 0.09 0.09 0.09
PET 0.41 0.56 0.55 0.55
ESS 29.42 26.98 27.14 27.25
(10, 26) p^ 0.26 (0.07) 0.21 (0.07) 0.21 (0.07) 0.21 (0.07)
FPR 0.03 0.08 0.08 0.08
PET 0.88 0.58 0.57 0.56
ESS 13.14 20.95 21.27 21.57
H1 (5, 31) p^ 0.47 (0.08) 0.40 (0.09) 0.41 (0.09) 0.41 (0.09)
Power 0.09 0.84 0.84 0.85
(15, 21) p^ 0.43 (0.08) 0.40 (0.09) 0.40 (0.09) 0.40 (0.09)
Power 0.60 0.89 0.89 0.90
(20, 16) p^ 0.41 (0.08) 0.40 (0.09) 0.40 (0.09) 0.40 (0.09)
Power 0.85 0.90 0.90 0.90
(10, 26) p^ 0.40 (0.08) 0.40 (0.09) 0.40 (0.09) 0.40 (0.09)
Power 0.89 0.90 0.90 0.90

FPR = false positive rate; PET = probability of early termination; ESS = expected sample size.

Table 4.

Simulation results for $H_0: p = 0.1$ and $H_1: p = 0.3$ with an additional trial in the second stage.

Data (n1, n2) Par Simon logit(0.1) logit(0.2) logit(0.3)
H0 (3, 21) p^ 0.14 (0.07) 0.10 (0.06) 0.11 (0.06) 0.11 (0.06)
FPR 0.05 0.09 0.09 0.09
PET 0.73 0.64 0.60 0.57
ESS 8.69 10.55 11.46 12.09
(6, 18) p^ 0.13 (0.07) 0.12 (0.07) 0.11 (0.06) 0.11 (0.06)
FPR 0.06 0.07 0.07 0.08
PET 0.53 0.63 0.60 0.58
ESS 14.43 12.70 13.25 13.64
(8, 16) p^ 0.12 (0.07) 0.12 (0.07) 0.13 (0.07) 0.12 (0.07)
FPR 0.07 0.07 0.07 0.07
PET 0.43 0.63 0.60 0.58
ESS 17.11 13.97 14.41 14.72
(9, 15) p^ 0.12 (0.06) 0.12 (0.07) 0.12 (0.07) 0.13 (0.07)
FPR 0.07 0.08 0.08 0.08
PET 0.39 0.62 0.59 0.58
ESS 18.19 14.72 15.11 15.36
H1 (3, 21) p^ 0.32 (0.09) 0.30 (0.10) 0.30 (0.10) 0.30 (0.10)
Power 0.63 0.88 0.89 0.89
(6, 18) p^ 0.31 (0.09) 0.30 (0.10) 0.30 (0.10) 0.30 (0.10)
Power 0.82 0.80 0.83 0.86
(8, 16) p^ 0.30 (0.09) 0.30 (0.10) 0.30 (0.10) 0.30 (0.10)
Power 0.86 0.84 0.84 0.85
(9, 15) p^ 0.30 (0.09) 0.30 (0.10) 0.30 (0.10) 0.30 (0.10)
Power 0.88 0.86 0.86 0.86

Note: $\mathrm{logit}(\pi_i) = \mu + v_i$, where $v_i \sim I(i \ne 4)\, N(0, 0.1^2) + I(i = 4)\, N(0, 0.3^2)$.

FPR = false positive rate; PET = probability of early termination; ESS = expected sample size.

For scenario 2, we include an additional trial at the second stage, one not incorporated in the first-stage analysis, with random effect $v_4 \sim N(0, 0.3^2)$. Various values of $\mu_0$ are considered: $\mathrm{logit}(p_0)$, $\mathrm{logit}((p_0 + p_1)/2)$, or $\mathrm{logit}(p_1)$, corresponding to the three columns of Tables 1–4. To evaluate the performance of the methods, we use several characteristics: (i) the average estimated pooled probability $\hat{p} = \mathrm{logit}^{-1}(\hat{\mu})$ and its average posterior standard deviation, (ii) the rejection probability of $H_0$, (iii) the probability of early termination under the null, and (iv) the average expected sample size under the null hypothesis. Each set of simulations is summarized across 2000 replications.
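The four summary characteristics can be computed from per-replicate outputs with a small helper; the replicate record fields below (`p_hat`, `reject`, `stopped`) are hypothetical names for illustration:

```python
def summarize(results, n1, n2):
    """Summarise simulation replicates. Each replicate is a dict with keys
    'p_hat' (posterior mean), 'reject' (was H0 rejected?) and 'stopped'
    (did the trial stop at stage 1?). Returns the average estimate, the
    rejection probability (FPR under H0, power under H1), PET, and ESS."""
    r = len(results)
    pet = sum(x["stopped"] for x in results) / r
    return {
        "p_hat": sum(x["p_hat"] for x in results) / r,    # average estimate
        "reject": sum(x["reject"] for x in results) / r,  # rejection rate
        "PET": pet,                                       # early termination
        "ESS": n1 + (1 - pet) * n2,                       # expected sample size
    }
```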

Results

Simulation Scenario 1: Time to results, a key operating characteristic in rare populations

We explore the time to final results by comparing the proposed framework with Simon's design. Assume that the accrual rate is two patients per year and that the interim and final analyses occur three and six months, respectively, after accrual in the corresponding stage is completed. We simulate the low response rate $H_0: p = 0.1$ with stage-specific sample sizes $n_1 = 8$ and $n_1 + n_2 = 16$. The simulation results are summarized in Figure S2.

Consider first the closure of a study in stage 1 due to insufficient evidence for a go decision. A standard Simon design utilizes a single trial, so it takes four years to accrue eight patients (time to final result: 4.25 years). Under the proposed framework, in contrast, patients are accrued from three different trials, and the accrual period (stage 1 only) and the time to final results (i.e., early discontinuation) are approximately 2.04 and 2.29 years, respectively. This is 1.85 times faster than Simon's design.

If instead the study completes accrual at the end of stage 2, the time to final results is defined as the accrual periods in stages 1 and 2 plus the analysis time, which is assumed to take six months. Simon's design requires eight years to accrue 16 patients, with a time to final results of 12.75 years, while the proposed framework ($m = 3$) requires 6.35 years, which is two times faster. If we consider combining another trial, as in scenario 2 of the simulations, the time to final results is 5.69 years, 2.24 times faster than Simon's approach.

Simulation Scenario 2: fixed number of trials

Table 1 summarizes the simulation results testing $H_0: p < 0.1$ vs. $H_1: p > 0.3$, with Simon's two-stage approach, which merges the data from the different trials naively without considering random effects, serving as the reference. Simon's two-stage design does not account for study-to-study variation in the response probabilities. Under Simon's optimal design, the sample size for this set of hypotheses, with the alpha and power stated above, is 24 ($n_1 = 8$, $n_2 = 16$). Note that under Simon's design, a no-go decision is made at the first stage if a trial fails to accrue at least one response, and $H_0$ is not rejected if a trial fails to accrue five responses overall. To assess the flexibility of our approach, we consider various first- and second-stage sample sizes, that is, stage 1 and 2 sample sizes that are not optimal by Simon's definition. Furthermore, the trial-specific accruals varied across simulation replicates, with the restriction that at least two studies must be involved.

For the $(n_1 = 8, n_2 = 16)$ combination, Simon's and the proposed methods have comparable operating characteristics. However, Simon's method underperforms in the other scenarios. For example, for the (3, 21) pair, Simon's method yields a smaller rejection probability under $H_0$ than the original design, but the rejection probability also decreases severely under $H_1$. Note that the accrual rates vary across the $m$ studies although the stage 1 and 2 sample sizes are fixed.

Figure 2 depicts the average expected sample size under the null hypothesis over different sample size pairs and priors. As expected, a larger expected sample size is obtained with a larger prior mean for $\mu$ and a larger first-stage sample size. In these scenarios our method has a smaller ESS under $H_0$ than Simon's approach.

Figure 2.

Figure 2.

Expected sample size (ESS) for (a) $p = 0.1$ and (b) $p = 0.2$ with three different priors; the red solid line represents the ESS of Simon's design.

We investigate how larger first- and second-stage sample size pairs impact the performance of the design (Table 2). We use the sample size pairs (8, 24), (10, 30), and (16, 48), so that the futility analysis is conducted when 25% of the target number of patients has been accrued in the first stage.

We also explore how smaller sample sizes affect the performance of the proposed method (Table S2), considering the pairs (3, 13), (4, 14), (5, 15), and (6, 16). Statistical power improves with increasing sample size, while the type I error rate increases up to 0.13 before stabilizing at the target rate for the (6, 16) pair.

Next, we evaluate the scenario of a moderate response rate, i.e., $H_0: p = 0.2$ vs. $H_1: p = 0.4$. We use the same data-generating model (1) but consider $\mu = \mathrm{logit}(0.2)$ or $\mathrm{logit}(0.4)$. The simulation results are presented in Table 3. In this scenario, pairs with small first-stage enrollment are much less effective than in scenario 1. For example, for the (5, 31) pair, testing with Simon's method is nearly impossible, as the operating characteristics are far from the target nominal values; this is not the case with the proposed method, which provides reasonable results across most cases, demonstrating that it is sufficiently flexible to cover various sample size pairs. Simon's two-stage method is much more sensitive to the sample ratio, that is, the number of patients evaluated in the first versus the second stage.

Real Life Example 1: Interim analysis at the first stage

This analysis was approved by the Duke Institutional Review Board. In the TAPUR trial, we consider two patients with pancreatic cancer and ERBB2 amplification who were enrolled to be treated with pertuzumab and trastuzumab, one of whom had an objective response. From the other two trials, we identified three patients with the same tumor type and alteration who were treated with the same treatment, none of whom responded. We are interested in determining whether to expand enrollment to the next stage. That is, $\mathcal{O}_1 = \{(1, 2), (0, 3), (0, 3)\}$. Based on the pooled data, the estimated probability of response is 0.13 and the probability of early termination under the null hypothesis, $PET = P(p < 0.1 \mid \mathcal{O}_1)$, is 0.467, where $p$ is estimated from the pooled observations at the first stage. The posterior probability of exceeding the null rate given the data ($1 - PET$) is greater than $\delta = 0.15$. Based on this evidence, it is recommended that additional patients be enrolled in the next stage.

Real Life Example 2: Decision making at the second stage based on lower number of patients

Now suppose six patients with uterine cancer and an AKT1 mutation were treated with temsirolimus in the TAPUR trial and all responded. Hypothetically, in trials B (DRUP) and C (CAPTUR), the numbers of patients with uterine cancer and an AKT1 mutation treated with temsirolimus were 7 (1 response) and 5 (2 responses), respectively, i.e., $\mathcal{O}_2 = \{(6, 6), (1, 7), (2, 5)\}$. The total number of patients from the three trials is 18, whereas the target number is 24. The main question is whether we can combine the data and perform the test based on only 18 patients, without any knowledge of the outcome (objective response rate) of the remaining patients. Based on the pooled data from the three basket trials, the estimated probability of response is 0.495 and the probability of early termination under the null hypothesis is 0.01, which is much lower than in Example 1. The posterior probability of exceeding the null rate given the data is greater than $\delta$; hence, evidence of the efficacy of the drug is established.

Real Life Example 3: Decision making at the second stage with a variable number of trials

In the TAPUR trial, assume that six patients with uterine cancer and an AKT1 mutation were treated with temsirolimus and five of them experienced a response. The numbers of patients in trial B (DRUP) and trial C (CAPTUR) with uterine cancer and an AKT1 mutation who were treated with temsirolimus were 7 (1 response) and 5 (2 responses), respectively. The total number of patients from the three trials is 18, whereas the target sample size is 24 patients. While we have access to two independent trials (B and C), more patients are needed to determine whether the drug is effective, so we add data from two additional external trials to evaluate the efficacy of temsirolimus in treating uterine cancer patients with AKT1 mutations. Thus two independent basket trials were added to the analysis at the second stage, enrolling 2 (0 responses) and 4 (1 response) patients on the same treatment, i.e., $\mathcal{O}_2 = \{(5, 6), (1, 7), (2, 5), (0, 2), (1, 4)\}$. Based on the pooled data from the five basket trials, the estimated probability of response is 0.361 and the probability of early termination under the null hypothesis is 0.003, much lower than in Examples 1 and 2. The posterior probability of exceeding the null rate given the data is greater than $\delta$, providing evidence that the drug is efficacious and worth further testing in a phase III setting.

Discussion

In this manuscript, we utilize a hierarchical Bayesian random-effects model to incorporate external data into the analysis of two-stage phase II basket trials. In particular, we consider incorporating data from different ongoing trials into go/no-go decisions so that decisions about study treatments for patients with rare cancers, including rare cancer types and/or cancers with rare genomic targets, can be reached within a reasonable timeframe. Our first scenario centered on a fixed number of ongoing independent trials with low and moderate response rates typical of phase II trials in oncology. Our simulation results indicate that our proposed approach achieves operating characteristics closer to the nominal levels than the Simon two-stage approach. In addition, the results suggest that our proposed method is highly flexible, as it performs well across several first- or second-stage designs with different sample sizes while requiring a smaller sample size.

We have also considered increasing the number of trials at the second stage. This may be necessary because of the larger number of patients required at the second stage and the low prevalence of the alterations. Thus, an adaptive rule can be specified in the protocol to increase the number of trials if the number originally planned is not sufficient to fulfill the required number of patients for testing the efficacy of the drug. Our simulations show that even if the futility analysis in the first stage occurs earlier (that is, with fewer patients) or later than the Simon two-stage pre-specified stage 1 sample size, the Bayesian approach maintains excellent operating characteristics compared with the Simon two-stage design.

Unlike the Simon two-stage design, the heterogeneity across ongoing trials can be effectively modelled and explained by the hierarchical Bayesian random-effects model. While we assumed normal and truncated normal priors for the response parameters, other priors give similar results, which are summarized in the supplement. Our approach is efficient, and the R scripts implementing the proposed method are available on GitHub (https://github.com/taehwa015/TwoStageTrials).

While we focused on study cohorts defined by genomic alteration, tumor type, and treatment across the trials, this approach can easily be extended to histology-agnostic cohorts. Pooling data within histology-specific cohorts across multiple trials might be helpful, although there is no guarantee that this strategy will work for rare alterations, and it might be necessary to perform a histology-agnostic analysis at either the first or second stage. There have been several efforts to utilize external data from clinical trials; however, most have focused on pooling data from published trials or on trials with large numbers of patients. Our method is innovative in that it combines data from ongoing trials with relatively small sample sizes while the trials are still enrolling patients. In addition, the common response rate can be effectively estimated via inverse-variance weighting, taking a weighted average of the trial-specific rates based on the posterior samples. This addresses a drawback of the Simon design, which does not account for trial-specific effects.

Thus far, we have focused on statistical issues, but there are several logistical issues to consider when combining data across ongoing trials.2 First, in our motivating example, several trials were designed similarly with the goal of ultimately pooling the data, although the strategy for combining was not formalized until all trials were already enrolling patients. Combining trials requires careful consideration of similarities to determine whether integration is feasible, for instance comparable covariate distributions, patient populations, or outcome measures. The studies in question enroll patients within the same time period, have similar eligibility criteria, use similar genomic testing strategies and matching rules, and offer similar or identical treatments to patients. However, these trials enroll patients in three different geographic regions, each governed by different rules and regulations, making it challenging to conduct a single trial that satisfies the regulatory authorities in all three regions.

Second, buy-in from all stakeholders, including members of the data monitoring committee(s), the principal investigators, the sponsors, and the study steering committees, is critical. A protocol that prospectively details the statistical analysis plan, the study governance (forming a steering committee and reporting to the independent data safety and monitoring boards), the plans for reporting the different cohorts, and the guidelines for authorship is critical. Data sharing agreements need to be executed prior to any initiation of analysis, which can be challenging for international collaborations. Third, there are the challenges of aligning data elements and data dictionaries across trials, precisely defining genomic alterations across trials (particularly when multiple genomic tests are employed), and constructing a common data platform where all the data are housed for analysis, including whether that platform is transportable or can only be accessed in the cloud. Lastly, for our trials, what is considered an alteration is assumed to be consistent across the different trials. However, we recognize that there is diversity in referencing genomic alterations and in variant interpretation across the different assays, even though these assays are FDA approved.15

In summary, the low prevalence of rare diseases and/or mutations requires collaboration across ongoing trials to obtain results quickly and efficiently. We have considered fixed and adaptive designs. Compared to the Simon two-stage design, the simulation results suggest that our proposed method is highly flexible: it performs well, as evidenced by the smaller expected sample size, while achieving the target operating characteristics. It is our aspiration that data sharing during trial enrollment will be applied to other rare diseases for more rapid translation of results into benefits for patients in the era of precision medicine.

Supplementary Material


Acknowledgement

We gratefully acknowledge the efforts of Pam Mangat, Michael Rothe, Gina Grantham, and the TAPUR team for their exceptional support in data preparation, logistics, and coordination across the TAPUR, DRUP, and CAPTUR teams. Their dedication was instrumental to the success of this study and the implementation of the TADRUCA data-sharing protocol.

Funding

This research was supported in part by the American Society of Clinical Oncology, the National Institutes of Health grants R01 LM013352, R01CA256157, R01CA249279, and 1R21CA263950-01A1, the US Food and Drug Administration grant 1U01FD007857-01, the United States Army Medical Research Materiel Command grant HT9425-23-1-0393, and the Prostate Cancer Foundation Challenge Award. Taehwa Choi acknowledges partial support from the Sungshin Women’s University Research Grant 2024.

Footnotes

Declaration of conflicting interests

The authors declare no conflicts of interest.

References

  • 1. Hyman DM, Puzanov I, Subbiah V, et al. Vemurafenib in multiple nonmelanoma cancers with BRAF V600 mutations. N Engl J Med 2015; 373(8): 726–736.
  • 2. Mangat PK, Halabi S, Bruinooge SS, et al. Rationale and design of the Targeted Agent and Profiling Utilization Registry study. JCO Precis Oncol 2018; 2: 1–19.
  • 3. Flaherty KT, Gray RJ, Chen A, et al. The Molecular Analysis for Therapy Choice (NCI-MATCH) trial: lessons for genomic trial design. J Natl Cancer Inst 2020; 112(10): 1021–1029.
  • 4. Adashek JJ, Kato S, Sicklick JK, et al. If it’s a target, it’s a pan-cancer target: tissue is not the issue. Cancer Treat Rev 2024; 125: e102721.
  • 5. Kaizer AM, Koopmeiners JS, Kane MJ, et al. Basket designs: statistical considerations for oncology trials. JCO Precis Oncol 2019; 3: 1–9.
  • 6. Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials 1989; 10(1): 1–10.
  • 7. Ahn ER, Rothe M, Mangat PK, et al. Pertuzumab plus trastuzumab in patients with endometrial cancer with ERBB2/3 amplification, overexpression, or mutation: results from the TAPUR study. JCO Precis Oncol 2023; 7: e2200609.
  • 8. Duvivier HL, Rothe M, Mangat PK, et al. Pembrolizumab in patients with tumors with high tumor mutational burden: results from the Targeted Agent and Profiling Utilization Registry study. J Clin Oncol 2023; 41(33): 5140–5150.
  • 9. Sahai V, Rothe M, Mangat PK, et al. Regorafenib in patients with solid tumors with BRAF alterations: results from the Targeted Agent and Profiling Utilization Registry (TAPUR) study. JCO Precis Oncol 2024; 8: e2300527.
  • 10. Klute KA, Rothe M, Garrett-Mayer E, et al. Cobimetinib plus vemurafenib in patients with colorectal cancer with BRAF mutations: results from the Targeted Agent and Profiling Utilization Registry (TAPUR) study. JCO Precis Oncol 2022; 6: e2200191.
  • 11. Halabi S, Mangat PK, Garrett-Mayer E, et al. Advancing precision oncology: TADRUCA, a model for global collaboration. [Online]; 2021, https://globalforum.diaglobal.org/issue/april-2021/advancing-precision-oncology-tadruca-a-model-for-global-collaboration/.
  • 12. Zheng H and Wason JM. Borrowing of information across patient subgroups in a basket trial based on distributional discrepancy. Biostatistics 2022; 23(1): 120–135.
  • 13. Zhou H, Liu F, Wu C, et al. Optimal two-stage designs for exploratory basket trials. Contemp Clin Trials 2019; 85: e105807.
  • 14. Berger HU, Gerlinger C, Harbron C, et al. The use of external controls: to what extent can it currently be recommended? Pharm Stat 2021; 20(6): 1002–1016.
  • 15. Vandekerkhove G, Giri VN, Halabi S, et al. Toward informed selection and interpretation of clinical genomic tests in prostate cancer. JCO Precis Oncol 2024; 8: e2300654.
