Author manuscript; available in PMC: 2014 Jan 1.
Published in final edited form as: Clin Trials. 2013;10(3). doi: 10.1177/1740774513483934

Adaptive adjustment of the randomization ratio using historical control data

Brian P Hobbs a, Bradley P Carlin b, Daniel J Sargent c
PMCID: PMC3856641  NIHMSID: NIHMS528545  PMID: 23690095

Abstract

Background

Prospective trial design often occurs in the presence of “acceptable” [1] historical control data. Typically, these data are used only for retrospective, a posteriori treatment comparisons that estimate population-averaged effects in a random-effects meta-analysis.

Purpose

We propose and investigate an adaptive trial design in the context of an actual randomized controlled colorectal cancer trial. This trial, originally reported by Goldberg et al. [2], succeeded a similar trial reported by Saltz et al. [3], and used a control therapy identical to that tested (and found beneficial) in the Saltz trial.

Methods

The proposed trial implements an adaptive randomization procedure for allocating patients, aimed at balancing total information (concurrent and historical) among the study arms. This is accomplished by assigning more patients to receive the novel therapy in the absence of strong evidence for heterogeneity among the concurrent and historical controls. Allocation probabilities adapt as a function of the effective historical sample size (EHSS), which characterizes the relative informativeness of the historical data in the context of a piecewise exponential model for evaluating time to disease progression. Commensurate priors [4] are utilized to assess historical and concurrent heterogeneity at interim analyses and to borrow strength from the historical data in the final analysis. The adaptive trial’s frequentist properties are simulated using the actual patient-level historical control data from the Saltz trial and the actual enrollment dates for patients enrolled into the Goldberg trial.

Results

Assessing concurrent and historical heterogeneity at interim analyses and balancing total information with the adaptive randomization procedure leads to trials that on average assign more new patients to the novel treatment when the historical controls are unbiased or slightly biased compared to the concurrent controls. Large magnitudes of bias lead to approximately equal allocation of patients among the treatment arms. Using the proposed commensurate prior model to borrow strength from the historical data, after balancing total information with the adaptive randomization procedure, provides admissible estimators of the novel treatment effect with desirable bias-variance trade-offs.

Limitations

Adaptive randomization methods in general are sensitive to population drift and more suitable for trials that initiate with gradual enrollment. Balancing information among study arms in time-to-event analyses is difficult in the presence of informative right-censoring.

Conclusions

The proposed design could prove important in trials that follow recent evaluations of a control therapy. Efficient use of the historical controls is especially important in contexts where reliance on pre-existing information is unavoidable because the control therapy is exceptionally hazardous, expensive, or the disease is rare.

Keywords: adaptive designs, Bayesian analysis, historical controls

1 Introduction

Adaptive designs of clinical trials facilitate mid-trial modifications based on interim information from internal or external sources. Recently, methods have been proposed to facilitate prospective modification of many trial features, including randomization [5, 6]. Friedman, Furberg, and DeMets [7] broadly refer to randomization methods that facilitate mid-trial adjustments to the allocation ratios as adaptive. Two types of adaptive randomization (AR) procedures are commonly implemented in clinical trials. Baseline AR designs are used to balance the study arms with respect to prognostic factors that are available at baseline [8, 9]. By contrast, response-adaptive or outcome-adaptive designs were developed for the purpose of assigning more patients to more effective or safer treatment regimens based on interim data from an ongoing trial [10–14]. While Korn and Freidlin [15] have questioned the usefulness of outcome-adaptive randomization for this purpose in the two-armed setting, the potential remains to benefit from adaptive designs that assign more new patients to newer, less studied therapies in the presence of historical controls that satisfy Pocock’s six “acceptability” criteria [1, p.177]. Such designs promise to enhance efficiency when implementing controlled clinical trials that follow recent evaluations of a control therapy by facilitating more precise estimates of the treatment effect. However, assumptions of exchangeability among the historical and concurrent control data may lead to poor frequentist operating characteristics and highly biased results.

In this article we propose an adaptive analog of an actual randomized controlled colorectal cancer trial originally reported by Goldberg et al. [2], which succeeded a similar trial reported by Saltz et al. [3]. The Goldberg trial used as its control arm a treatment identical to that found superior in the Saltz trial. The proposed adaptive design is based on a randomization method that adapts as a function of the relative informativeness of the historical data for evaluating the endpoint of interest (in this case, time to disease progression). Actual patient-level data from the Saltz (historical) trial and enrollment dates from the Goldberg (current) trial are used to simulate the proposed adaptive design’s frequentist properties. Historical data are formally incorporated into the Bayesian analysis of the Goldberg trial using commensurate priors [4, 16] in the context of a piecewise exponential model [17].

In the absence of strong evidence for heterogeneity among the historical and accumulating current controls, the proposed adaptive design assigns more new patients to the novel therapy, thereby learning more about the less studied regimen while attempting to balance the total (concurrent plus historical) information between arms. After an initial stage of equal allocation, the allocation probability is adjusted as a function of the effective historical sample size (EHSS). EHSS is defined as a function of the degree to which estimates of model parameters are “shrunk” toward their historical counterparts.

Designs such as this promise to be important in contexts where reliance on pre-existing information is unavoidable, as occurs when the control therapy is exceptionally hazardous or expensive, or when the disease is rare. There are many sub-areas of oncology that could benefit from applications of these methodologies, including rare subtypes of sarcomas, rare progressions of renal cell carcinoma such as advanced sarcomatoid, and pediatric brain tumors such as choroid plexus carcinoma, medulloblastoma, and pontine glioma, to mention a few.

This article proceeds as follows. Section 2 describes the colorectal cancer clinical trials that motivated the research. Section 3 introduces the concept of effective historical sample size in the context of Bayesian analysis. Section 4 introduces the probability models used to evaluate time to disease progression (TTP) in the proposed, adaptive trial. In Section 5 we formulate our proposed alternative, adaptive design. Section 6 discusses the adaptive design’s frequentist properties. Finally, Section 7 concludes and suggests avenues for further development.

2 Colon cancer trials

The proposed adaptive design is motivated by two successive randomized controlled colorectal cancer clinical trials originally reported by [2] and [3]. The initial trial [3] randomized 683 patients with previously untreated metastatic colorectal cancer between May 1996 and May 1998 to one of three regimens: Irinotecan alone; Irinotecan and bolus Fluorouracil plus Leucovorin (IFL); or a regimen of Fluorouracil and Leucovorin (5FU/LV) (“standard therapy”). IFL resulted in significantly longer TTP and overall survival than both Irinotecan alone and 5FU/LV, and thus became the standard of care therapy. The subsequent trial [2] compared two new (at the time) drug combinations in 795 patients with previously untreated metastatic colorectal cancer, randomized between May 1999 and April 2001. Patients in the first drug group received the current “standard therapy,” the IFL regimen identical to that used in the historical study. The second group received Oxaliplatin and infused Fluorouracil plus Leucovorin (abbreviated FOLFOX), while the third group received Irinotecan and Oxaliplatin (abbreviated IROX); both of these latter two regimens were new as of the beginning of the second trial.

Historical control data from the Saltz trial appear to satisfy Pocock’s acceptability criteria [1]. Less than one year elapsed between the last patient enrolling into the Saltz trial and the first patient enrolling into the Goldberg trial. Both trials used identically defined IFL therapies, similar inclusion and exclusion criteria, and identical criteria to assess TTP. For example, inclusion required an Eastern Cooperative Oncology Group performance status (PS) of 2 or less. Exploratory data analysis suggests that the trials enrolled patients from comparable populations. The first, second, and third quartiles of the sum of the longest diameters (in cm) of up to 9 tumors at baseline are 5.0, 8.5, and 12.8 in the Saltz trial and 4.7, 7.9, and 12.7 in the Goldberg trial; for age they are 54, 62, and 69 in the Saltz trial and 53, 61, and 69 in the Goldberg trial.

Our proposed adaptive trial compares TTP between the FOLFOX and IFL regimens. The historical controls consist of patients randomized to IFL in the Saltz study; the current data consist of patients randomized to IFL or FOLFOX in the Goldberg trial. For simplicity, we omit data from the Irinotecan alone and 5FU/LV arms in the Saltz study, and the IROX arm in the Goldberg study. We consider only patients who had measurable tumors at baseline, bringing the total sample size to 643: 224 historical and 419 current observations. Among the current patients, there are 208 controls (IFL) and 211 patients treated with the new regimen (FOLFOX).

3 Effective historical sample size

Balancing information among treatment arms as data accrue in an adaptive controlled trial requires interim assessment of the relative informativeness of the historical data. Before formulating our adaptive trial design, we briefly introduce the concept of effective historical sample size (EHSS) in the context of a hierarchical Bayesian analysis; this concept will be used in Section 5 to adjust the allocation probabilities among study arms for newly enrolled patients. The concept of EHSS is related to the work of Morita et al. [18], who considered effective sample sizes of parametric prior distributions for non-hierarchical models.

Let y0 denote the vector of outcomes for n0 patients assigned to the control therapy in the historical trial. Similarly, let y denote the vector of outcomes for n patients in the current trial. Let θ0 and θ denote analogous model parameters, and let L(θ|y) and L0(θ0|y0) denote the likelihood functions corresponding to the current and historical data, respectively, where θ is the parameter of interest. Let p*(θ) denote a suitable non-informative prior distribution for θ. If the historical information is ignored, inference proceeds with respect to the posterior distribution of θ|y:

q(θ|y) ∝ p*(θ) L(θ|y).  (1)

We refer to (1) as the “reference” model.

Now consider borrowing strength from the historical data. We may model a conditional relationship between θ and θ0 by assuming a conditional prior distribution, p(θ|θ0, η), that is dependent upon a hyperparameter, η, controlling the amount of cross-study borrowing. Let p0(θ0) denote a non-informative prior distribution for θ0. Inference proceeds with respect to the posterior distribution of θ|y0, y, η induced by the Bayesian model:

q(θ|y0, y, η) ∝ L(θ|y) ∫θ0 p(θ|θ0, η) p0(θ0) L0(θ0|y0) dθ0.  (2)

Model (2) facilitates borrowing of strength via joint modeling of the historical and current data. Thus, we refer to (2) as the “joint” model.

Let P(y) denote the posterior precision of θ|y corresponding to inference under the reference model, P(y) = [Eθ|y{θ − Eθ|y(θ)}²]⁻¹. Let P(y0, y, η) denote the posterior precision of θ|y0, y, η corresponding to inference under the joint model, P(y0, y, η) = [Eθ|y0,y,η{θ − Eθ|y0,y,η(θ)}²]⁻¹. If the relationship between sample size and precision is reasonably linear under the reference model, then the rate of precision per patient may be approximated by P(y)/n. Therefore, the effective sample size of the joint model’s posterior is approximately n P(y0, y, η)/P(y), suggesting that a sensible functional relationship between joint posterior precision and EHSS follows as,

EHSS ≈ n {P(y0, y, η)/P(y) − 1}.  (3)

In this formulation, EHSS is approximately the effective sample size of the joint posterior minus the current sample size n. If the joint model leads to little gain in precision, then the EHSS is small. In contrast, relatively large gains in precision will produce large values of EHSS. Of note, the above formulation considers the case where θ is a scalar. Mapping relative gains in precision among multiple parameters into a single EHSS presents additional complexity. Our example adaptive colon cancer trial presents one approach for time-to-event analysis using a piecewise exponential likelihood with multiple baseline hazards.
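As a concrete sketch, the scalar-parameter EHSS in (3) can be estimated directly from posterior draws under the two models. This is a minimal illustration assuming MCMC output is already available; the function name `ehss` and the synthetic draws are hypothetical, and posterior precision is approximated by the reciprocal sample variance of the draws.

```python
import numpy as np

def ehss(theta_ref, theta_joint, n):
    """Approximate EHSS of equation (3) for a scalar parameter from MCMC
    draws under the reference model (current data only) and the joint
    model (current + historical). Posterior precision is estimated by the
    reciprocal of the sample variance of the draws."""
    prec_ref = 1.0 / np.var(theta_ref, ddof=1)      # P(y)
    prec_joint = 1.0 / np.var(theta_joint, ddof=1)  # P(y0, y, eta)
    return n * (prec_joint / prec_ref - 1.0)        # n {P(y0,y,eta)/P(y) - 1}

# Synthetic check: halving the posterior variance doubles the precision,
# so with n = 100 current patients the EHSS should be close to 100.
rng = np.random.default_rng(0)
draws_ref = rng.normal(0.0, 1.0, size=200_000)             # variance 1
draws_joint = rng.normal(0.0, np.sqrt(0.5), size=200_000)  # variance 0.5
print(f"EHSS ~ {ehss(draws_ref, draws_joint, 100):.0f}")
```

If borrowing adds no precision, the ratio is 1 and EHSS is 0, matching the interpretation in the text.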

4 Probability models

Pocock [1] proposed Bayesian models for borrowing of strength from historical controls for time-to-event data in the context of parametric exponential models. He suggested that inference should proceed with fixed “magnitudes” of the historical bias, noting that one may wish to repeat the analysis with several alternative values expressing varying degrees of trust in the historical controls. For the case of one historical study, the general commensurate prior approach [4] extends the approach of Pocock and proposes a set of prior distributions for estimating the magnitude of historical bias from the observed data in the context of fully hierarchical and empirical Bayesian analysis for parametric inference with exponential families.

4.1 Piecewise exponential likelihood

Let Y and c denote vectors containing time-to-event variables and right-censoring indicators for n patients, respectively. We use a piecewise exponential model that assumes constant baseline hazards within finite partitions of the time axis. Such a flexible model accommodates numerous shapes of the baseline hazard over the partition intervals. Following the notation of Ibrahim, Chen, and Sinha [17, p.48], we partition the time axis into J + 1 intervals, (0, s1], (s1, s2], …, (sJ−1, sJ], (sJ, ∞), where 0 < s1 < s2 < … < sJ < ∞, and sJ denotes the end of patient follow-up, the maximum possible time that a patient could be followed during the trial. For y in the jth interval, sj−1 < y ≤ sj, the model assumes a constant baseline hazard λj > 0, for j = 1, …, J. Suppose xi is a vector of p patient-specific baseline covariates, where i = 1, …, n indexes patients, and β = (β1, …, βp) is the corresponding vector of regression coefficients. Let λ denote the vector of baseline hazards, λ = (λ1, …, λJ). Within the jth time partition interval, the hazard for the ith patient is assumed to be h(λj, xi, β) = λj exp(xi′β). Let θ denote the vector of model parameters, θ = (λ, β). The ith patient’s full contribution to the likelihood follows as,

LY(θ|Yi) = ∏_{j=1}^{J+1} [λj exp(xi′β)]^{I(sj−1 < Yi ≤ sj)(1 − ci)} × exp[ −I(sj−1 < Yi ≤ sj) {λj(Yi − sj−1) + Σ_{l=1}^{j−1} λl(sl − sl−1)} exp(xi′β) ].  (4)

The likelihood in (4) assumes that the ratio of hazards for two individuals is constant over time, and thus represents a proportional hazards model.
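For readers who want to check (4) numerically, the per-patient log-likelihood contribution can be coded directly from the definitions above. This is a hedged sketch; `pwe_loglik_i` is a hypothetical helper written for illustration, not code from the trial analysis.

```python
import numpy as np

def pwe_loglik_i(y, censored, x, beta, lam, s):
    """Log of patient i's likelihood contribution in (4): a piecewise
    exponential (proportional hazards) model with constant baseline hazards
    lam[j] on the intervals defined by interior boundaries s = [s1, ..., sJ].

    y        : observed time (event or right-censoring time)
    censored : 1 if right-censored, 0 if the event was observed
    """
    bounds = np.concatenate(([0.0], np.asarray(s, dtype=float), [np.inf]))
    j = np.searchsorted(bounds, y, side="left") - 1   # interval containing y
    lp = float(x @ beta)                              # linear predictor x'beta
    # log hazard at y, contributed only when the event is observed
    log_haz = (1 - censored) * (np.log(lam[j]) + lp)
    # cumulative baseline hazard to y: full earlier intervals + partial j-th
    cum0 = lam[j] * (y - bounds[j]) + np.sum(lam[:j] * np.diff(bounds[: j + 1]))
    return log_haz - cum0 * np.exp(lp)
```

With a single interval this reduces to the ordinary exponential log-likelihood, (1 − ci){log λ + xi′β} − λ y exp(xi′β), which is a convenient unit check.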

4.2 Commensurate prior model

In this subsection we formulate a commensurate prior model [4] for the scenario presented by the motivating successive colon cancer trials, where historical data are available only for the control group. The general commensurate prior approach facilitates estimation of the extent to which analogous parameters from distinct data sources have similar (“commensurate”) posteriors through the specification of a particular hierarchical model. Commensurate prior distributions center model parameters defined in the context of the likelihood for the current data at their historical counterparts. Hobbs, Sargent, and Carlin [4] use sparsity-inducing spike-and-slab hyperpriors for the dispersion parameters to favor the concurrent information in the presence of heterogeneity.

Let Y0 and c0 denote vectors of length n0 containing time-to-event variables and right-censoring indicators for patients assigned to the current control therapy in the historical study. Let Y and c denote data from n patients in the current trial. Furthermore, let di denote an indicator of the novel treatment for the ith patient in the current trial, i = 1, …, n. In this context, the hazard for a historical patient in the jth time partition interval is identical to the jth historical baseline hazard, h(λ0,j) = λ0,j. For the ith patient in the current trial, the jth baseline hazard, λj, is modified by novel treatment status, di: h(λj, di, ξ) = λj exp(diξ), where ξ is the log acceleration factor corresponding to the novel treatment. Both historical and current baseline hazard parameters are defined with respect to a single partition of the time axis characterized by boundary points 0 < s1 < s2 < … < sJ < ∞.

To borrow strength from the historical controls, the commensurate prior approach can be applied by assuming a normal prior for the jth log baseline hazard, log(λj), centered at its historical counterpart, log(λ0,j), with precision τj: log(λj) ~ N(log(λ0,j), 1/τj). Given no additional prior information about the historical baseline hazards or novel treatment effect, inference may proceed with non-informative Gaussian prior distributions for the log(λ0,j)s and ξ. Let D and D0 denote the current and historical data, D = (Y, d, c) and D0 = (Y0, c0), respectively. Let θ denote the parameter vector θ = {ξ, log(λ1), …, log(λJ)}, let θ0 denote the parameter vector θ0 = {log(λ0,1), …, log(λ0,J)}, and let τ denote the vector τ = (τ1, …, τJ). Following from (4), the joint posterior distribution of θ|τ, D, D0 is proportional to

q(θ|τ, D, D0) ∝ N(ξ|0, 100) ∏_{i=1}^{n} LY(θ|Yi) ∫θ0 ∏_{j=1}^{J} N(log(λj) | log(λ0,j), 1/τj) × N(log(λ0,j) | 0, 100) ∏_{k=1}^{n0} LY0(θ0|Y0,k) dθ0,  (5)

where LY0(θ0|·) denotes the likelihood function corresponding to the piecewise exponential model for the historical data.

Following Hobbs et al. [4], we assume “spike and slab” prior distributions [19] for the τjs. This sparsity-inducing prior distribution is locally uniform between two limits, 0 ≤ Sl < Su, except for a bit of probability mass concentrated at a point Sp > Su:

Pr(τj < u) = p0 (u − Sl)/(Su − Sl)  for  Sl ≤ u ≤ Su,  and  Pr(τj > Su) = 1 − p0.  (6)

The additional hyperparameter, p0, denotes the prior probability that τj lies within the slab, i.e., Sl ≤ τj ≤ Su. Given Sl and Su, smaller values of p0 impose more borrowing of strength from the historical information, requiring more evidence for heterogeneity to overcome the prior probability mass assigned to the spike, 1 − p0. The aforementioned authors [4] show that this prior, when properly calibrated, leads to desirable bias-variance trade-offs for estimating concurrent effects in the context of commensurate prior models for exponential families. The results of our simulation study discussed in Section 6 illustrate the advantages of the spike and slab commensurate prior model in this context. Markov chain Monte Carlo (MCMC) methods can be used to sample from the joint posterior (5).
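A draw from prior (6) is easy to simulate, which is useful for prior-predictive checks of how strongly a given (Sl, Su, Sp, p0) configuration favors borrowing. This is an illustrative sketch; the function name is hypothetical, and the default values mirror the simulation settings reported in Section 6.

```python
import numpy as np

def sample_tau(rng, size, s_l=0.1, s_u=2.0, s_p=5000.0, p0=0.5):
    """Draw commensurability precisions tau_j from the spike-and-slab
    prior (6): with probability p0, tau is uniform on the slab [s_l, s_u];
    with probability 1 - p0 it sits at the spike s_p (> s_u), which forces
    log(lambda_j) to track log(lambda_{0,j}) very closely."""
    in_slab = rng.random(size) < p0
    slab = rng.uniform(s_l, s_u, size)
    return np.where(in_slab, slab, s_p)
```

Mixing such draws into the conditional prior for log(λj) shows the implied spread around log(λ0,j): at the spike the conditional prior standard deviation is 1/√5000 ≈ 0.014, versus roughly 0.7 to 3.2 across the slab.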

5 Adaptive trial design

In this section we formulate our proposed alternative design of the Goldberg trial, to follow completion of the Saltz trial; the design uses permuted-block randomization in both its non-adaptive and adaptive phases. During the initial, non-adaptive phase, new patients are allocated equally between the control (IFL) and novel (FOLFOX) treatments until a targeted number of events is observed, chosen to ensure that sufficient information has accrued to assess heterogeneity among the concurrent and historical controls. Thereafter, the treatment allocation probability is adjusted as a function of the EHSS and the numbers of patients assigned to each regimen. We use the piecewise exponential commensurate prior model (5) to jointly model the concurrent and historical data at each interim analysis. Given large EHSS, the adaptive design will randomize more new patients to the newer, less studied therapy.

5.1 Historical data

We use likelihood inference based on (4) to select the time axis partition for the full Bayesian analysis of the historical control data alone. The Akaike information criterion (AIC)-optimal partition contains two intervals with a boundary point at s = 243 days. Thus, the piecewise exponential model contains two parameters characterizing baseline hazards within the intervals [0, 243) and [243, max(Y0)], respectively. Figure 1 contains the associated TTP curves. The dashed line represents the Kaplan-Meier curve, while the dotted lines correspond to the associated 95% log-transformed [20, see e.g. p.105] pointwise confidence intervals. Results for the piecewise exponential analysis are plotted with a solid line. Posterior summaries corresponding to the Bayesian analysis are provided in Table 1. The posterior mean of median survival is approximately 205 days.
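The partition search can be sketched as a grid search over candidate boundary points, scoring each with the AIC of the fitted piecewise exponential model. A minimal illustration assuming a no-covariate model, in which the MLE of each baseline hazard is events divided by exposure; `pwe_aic` and the synthetic data are hypothetical stand-ins, not the trial data or the authors’ code.

```python
import numpy as np

def pwe_aic(y, cens, s):
    """AIC of a no-covariate piecewise exponential model with interior
    boundaries s. The MLE of each baseline hazard is (events in interval)
    divided by (total time at risk in the interval)."""
    bounds = np.concatenate(([0.0], np.asarray(s, dtype=float), [np.inf]))
    loglik, k = 0.0, 0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        exposure = (np.clip(y, lo, hi) - lo).sum()         # time at risk in (lo, hi]
        events = ((y > lo) & (y <= hi) & (cens == 0)).sum()
        if events > 0:
            lam = events / exposure                        # interval-specific MLE
            loglik += events * np.log(lam) - lam * exposure
        k += 1                                             # one hazard per interval
    return -2.0 * loglik + 2.0 * k

# Hypothetical grid search for a single boundary point on synthetic data
# (the AIC-optimal boundary for the Saltz IFL arm was 243 days).
rng = np.random.default_rng(2)
y = rng.exponential(200.0, 224)                 # synthetic TTP stand-in, days
cens = (rng.random(224) < 0.2).astype(int)      # ~20% right-censoring
best = min(np.arange(60.0, 600.0, 10.0), key=lambda b: pwe_aic(y, cens, [b]))
```

With one interval, `pwe_aic` reduces to the exponential model’s AIC, which makes it easy to verify against the closed form.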

Figure 1.

Time to disease progression for patients assigned to IFL in the Saltz trial: Kaplan-Meier curve (dashed), piecewise exponential curve (solid), 95% log-transformed pointwise confidence intervals (dotted). Right-censored observations are marked by +.

Table 1.

Posterior means and standard deviations derived from the piecewise exponential analysis of time to disease progression for patients randomized to IFL in the Saltz trial, n0 = 224.

                         mean     sd
log baseline hazard 1   −5.689   0.094
log baseline hazard 2   −4.858   0.133

5.2 New trial

The Goldberg trial actually enrolled a total of n = 419 patients with measurable tumors assigned to either the IFL (control) or FOLFOX (novel treatment) regimens. Figure 2 plots actual enrollment over time; each dot represents the trial’s cumulative sample size (y-axis) in calendar time (x-axis). The first patient was enrolled on May 20, 1999; the last patient was enrolled on April 25, 2001. We use the Goldberg et al. [2] article’s submission date as the time of the final analysis, which occurred approximately 28 months after the close of enrollment.

Figure 2.

Enrollment over time for patients randomized to the FOLFOX or IFL regimens in the Goldberg trial. The y-axis represents cumulative sample size; the x-axis represents calendar time.

Our proposed trial consists of two phases. During the first, non-adaptive phase, patients are randomized 1:1 to the treatment regimens until O1 events are observed. In order to ensure periodic balance in the numbers of patients assigned to each regimen, permuted-block randomization [21] is used with blocks of size 2. An initial interim analysis to assess historical and concurrent control heterogeneity and to compute EHSS will occur at that time. Thereafter, an adaptive permuted-block randomization procedure that is a function of EHSS will be used to assign new patients to treatment arms in blocks of size B, and EHSS will be re-assessed after every Bth enrollment. The final analysis, using the joint model, occurs 51 months after the trial’s initiation. O1 must be large enough that sufficient information to assess control heterogeneity has accrued, yet small enough to justify the adaptive treatment allocation. In practice, O1 and B should be selected in the context of a comprehensive assessment of the trial’s operating characteristics in the presence of the historical data and expected patient enrollment. Our simulation study, discussed in Section 6, considers frequentist properties of the adaptive design under three different values of O1.

5.2.1 Data structure

Let T = 1552 denote the time of the trial’s final analysis in days. Let t denote “trial time,” 0 < t < T, and let ei, 0 < ei < 706, represent the time at which the ith patient enrolls. Let Yi(t) denote the random variable containing the value of the ith patient’s TTP process at time t. Note that Yi(t) is recorded from ei, so that 0 < Yi(t) < T − ei. Let Y(t) = {Y1(t), …, Yn(t)(t)} denote the vector of values of the TTP processes for the n(t) patients enrolled prior to time t, where n(t) = Σ_{i=1}^{n} I(ei < t) and I is the indicator function. Let ci(t) denote an indicator of the ith patient’s right-censoring status at time t, and let c(t) = {c1(t), …, cn(t)(t)} denote the vector of values of the censoring processes for the n(t) patients enrolled prior to time t. Finally, let di denote a treatment indicator for the ith patient, with 0 indicating IFL and 1 FOLFOX, and let d(t) = {d1, …, dn(t)}. Denote the concurrent data at time t by D(t) = {Y(t), c(t), d(t)}, and the historical data by D0 = (Y0, c0).

5.2.2 Posterior inference

The piecewise exponential models used for analysis at time t contain two baseline hazard parameters, λ1 and λ2, corresponding to intervals [0, m(t)) and [m(t), T], where m(t) denotes the median TTP among the concurrent events observed by time t. This balances the available concurrent information between the two partition intervals at each analysis. Following the notation of Section 4, let ξ denote the FOLFOX effect, and denote the current and historical parameter vectors by θ and θ0, where θ = {ξ, log(λ1), log(λ2)}, and θ0 = {log(λ0,1), log(λ0,2)}, respectively.

The reference model (1) in this context consists of the Bayesian piecewise exponential model that estimates θ in the absence of the historical controls. It assumes non-informative Gaussian prior distributions (with small precision w = 0.01) for the FOLFOX effect, ξ, and the log baseline hazards, log(λ1) and log(λ2). At trial time t, the posterior distribution of θ|D(t) under the reference model follows from (4) as

q(θ|D(t)) ∝ N(ξ|0, 1/w) ∏_{j=1}^{2} N(log(λj) | 0, 1/w) ∏_{i=1}^{n(t)} LY(θ|Yi(t)).  (7)

The joint model (2) consists of a commensurate prior model (5) that borrows strength from the historical control data for estimating θ. Following from (4), (5), and (6), the posterior distribution of θ|D(t), D0 under the joint model at trial time t is

q(θ|D(t), D0) ∝ N(ξ|0, 1/w) ∏_{i=1}^{n(t)} LY(θ|Yi(t)) × ∫τ ∫θ0 ∏_{j=1}^{2} N(log(λj) | log(λ0,j), 1/τj) N(log(λ0,j) | 0, 1/w) pτ(τj) ∏_{k=1}^{n0} LY0(θ0|Y0,k) dθ0 dτ,  (8)

where pτ(u) denotes the spike-and-slab prior distribution in (6) evaluated at u.

5.2.3 Effective number of historical controls and adaptive randomization

We compute EHSS at interim analyses and use it thereafter to guide the randomization procedure. In the Appendix we investigate the linear association between the number of events and the posterior precision of log(λ1)|D(t) under the reference model (7) in the presence of the historical data. EHSS in this context maps the joint model’s (8) relative gain in precision for estimating the current baseline hazard parameters, λ, which assume commensurate prior distributions (defined conditionally upon their historical counterparts), to a number of additional events effectively observed for control. The first interim assessment of EHSS will occur at the trial time t1 at which at least O1 events have been observed, t1 = arg min_t [Σ_{i=1}^{n(t)} {1 − ci(t)} = O1]. EHSS will then be re-assessed after every B additional patients enroll. Thus, the hth re-assessment of EHSS will occur at trial time th = arg min_t {n(t) = n(th−1) + B}, for h = 2, 3, …. Let Oh denote the number of events that have occurred by trial time th, Oh = Σ_{i=1}^{n(th)} {1 − ci(th)}. Following Section 3, let Pj{D(th)} denote the posterior precision of log(λj)|D(th) at the hth interim analysis under the reference model (7). Similarly, let Pj{D(th), D0} denote the posterior precision of log(λj)|D(th), D0 at the hth interim analysis under the joint model (8). Recall that we have defined the time axis partition for analysis at time th such that approximately Oh/2 of the observed events occur within each time axis interval. The effective historical sample size at the hth interim analysis follows from (3) as the sum of parameter-specific EHSSs (each based upon Oh/2 events) for the current baseline hazards,

EHSSh = max[ Σ_{j=1}^{2} (Oh/2) { Pj{D(th), D0} / Pj{D(th)} − 1 }, 0 ].  (9)

We introduced EHSS (3) in the context of a Bayesian analysis with fixed hyperparameter η. Because we now average over the uncertainty in estimating the hyperparameters τj, sufficient heterogeneity may result in less precision under the joint posterior, Pj{D(th), D0} < Pj{D(th)}. Thus, we define EHSSh to be non-negative valued.
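Once the four interim posterior precisions are in hand, equation (9) is a one-line computation. A sketch; `ehss_h` is a hypothetical helper, and the numbers in the example are invented purely for illustration.

```python
import numpy as np

def ehss_h(o_h, prec_ref, prec_joint):
    """Interim effective historical sample size of equation (9).

    o_h        : number of events observed by the h-th interim analysis
    prec_ref   : posterior precisions of log(lambda_1), log(lambda_2)
                 under the reference model (7)
    prec_joint : the corresponding precisions under the joint model (8)

    Each baseline hazard is informed by roughly o_h / 2 events, so the
    parameter-specific relative gains are scaled by o_h / 2 and summed."""
    gains = (o_h / 2.0) * (np.asarray(prec_joint) / np.asarray(prec_ref) - 1.0)
    return max(float(gains.sum()), 0.0)  # truncate at 0, per the text

# e.g. 120 events; joint model 40% / 25% more precise on the two hazards
print(round(ehss_h(120, [4.0, 2.0], [5.6, 2.5]), 6))  # -> 39.0
```

When the joint model is less precise than the reference model for both hazards, the sum is negative and the truncation returns 0, so no historical patients are credited.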

After the hth interim analysis, we seek to obtain balance, at the end of the trial, between the numbers of patients assigned to each regimen relative to EHSSh, in the context of a permuted-block randomization procedure with blocks of size B. Let nFOX(t) and nIFL(t) denote the numbers of patients assigned to FOLFOX and IFL at trial time t, respectively. Let R(t) denote the number of remaining patients at trial time t, R(t) = n − nIFL(t) − nFOX(t). Following the hth interim analysis, approximately πhB of the next B patients to enroll into the trial will be randomly assigned to FOLFOX and (1 − πh)B assigned to IFL, where πh is defined to satisfy the following equation

nFOX(th) + πh R(th) = EHSSh + nIFL(th) + (1 − πh) R(th).  (10)

Therefore, πh is

πh = min[ (1/2) { (EHSSh + nIFL(th) − nFOX(th)) / R(th) + 1 }, 1 ].  (11)

The design can accommodate alternative default allocation ratios (besides 1 : 1) through modification of (10) to reflect the desired sample size imbalance at the trial’s completion.
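The allocation update in (10)-(11) amounts to a short computation at each interim analysis, sketched below with a hypothetical function name. The additional floor at 0 is not part of (11); it is added here only to guard against a negative probability when FOLFOX is already over-enrolled relative to EHSSh.

```python
def folfox_alloc_prob(ehss_h, n_ifl, n_fox, remaining):
    """Allocation probability pi_h of equation (11): choose pi_h so that,
    were EHSS to stay fixed, final FOLFOX enrollment would balance IFL
    enrollment plus the effective historical controls (equation (10)).

    ehss_h    : effective historical sample size at the h-th interim analysis
    n_ifl     : patients assigned to IFL so far, n_IFL(t_h)
    n_fox     : patients assigned to FOLFOX so far, n_FOX(t_h)
    remaining : patients yet to enroll, R(t_h)"""
    pi = 0.5 * ((ehss_h + n_ifl - n_fox) / remaining + 1.0)
    return min(max(pi, 0.0), 1.0)  # (11) caps at 1; floor at 0 added here

# 150 per arm so far, 119 patients remaining, 60 effective historical controls
print(round(folfox_alloc_prob(60.0, 150, 150, 119), 3))  # -> 0.752
```

With EHSSh = 0 and equal arm sizes, the formula returns 0.5, recovering equal allocation.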

6 Simulations

We use simulation to evaluate the frequentist properties of the proposed alternative, adaptive design of the Goldberg trial. The simulation uses the actual patient-level data from the n0 = 224 historical controls assigned to IFL in the Saltz trial. Patient accrual in the simulated trials follows the actual enrollment dates (shown in Figure 2) for the n = 419 patients randomized to FOLFOX or IFL during the Goldberg trial. Following Subsection 5.2, each simulated trial proceeds as follows. Patients are randomized 1:1 to IFL and FOLFOX using permuted-block randomization with blocks of size 2 until O1 total events are observed at time t1. Thereafter, the probability of assigning a new patient to FOLFOX (11) adjusts in blocks of size B = 30 as a function of EHSS and the numbers of patients already assigned to each treatment arm. The joint commensurate prior model (8) used to compute EHSS at each interim analysis assumes spike and slab priors (6) for τj, j = 1, 2, with the following hyperparameters: Sl = 0.1, Su = 2, Sp = 5000, and p0 = 0.5. A final analysis occurs at time T, 51 months following initial enrollment, using the joint commensurate prior model with Su = 5 and p0 = 0.9. This approach provides for more borrowing of strength from the historical data when computing EHSS, while facilitating slightly less borrowing when estimating the FOLFOX treatment effect, ξ, at the final analysis.

Frequentist properties are considered under varying magnitudes of assumed historical bias. Let θ̃ = {ξ̃, log(λ̃1), log(λ̃2)} denote a set of fixed parameters characterizing a true state of the current model, where the baseline hazards, λ̃1 and λ̃2, are defined over [0, 243) and [243, T] (the AIC-optimal time axis partition for inference on the historical data alone). TTP values for current patients are generated from θ̃ by shifting the log baseline hazards, log(λ̃), from their historical estimates in Table 1 by Δ100%, where λ̃(Δ) = (exp{−5.689(1 − Δ)}, exp{−4.858(1 − Δ)}). The historical data are assumed to be unbiased in the case Δ = 0. Without loss of generality, we fix ξ̃ = 0.
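The data-generating step above can be sketched by inverting the two-piece cumulative hazard. The helper names are hypothetical; the historical log-hazard estimates are those in Table 1, and under Δ = 0 the implied median TTP is about 205 days, consistent with Section 5.1.

```python
import numpy as np

def sample_pwe(rng, n, lam, s):
    """Draw n event times from a two-piece exponential model by inversion:
    hazard lam[0] on [0, s) and lam[1] on [s, inf). A unit-rate exponential
    draw is mapped back through the inverse cumulative hazard."""
    u = rng.exponential(1.0, n)                 # unit-rate cumulative hazards
    return np.where(u < lam[0] * s,             # event within first interval?
                    u / lam[0],
                    s + (u - lam[0] * s) / lam[1])

def shifted_hazards(delta, hist_log_lam=(-5.689, -4.858)):
    """Baseline hazards under historical bias Delta, as in Section 6:
    log(lambda_tilde_j) = (1 - Delta) * (historical estimate from Table 1)."""
    return np.exp((1.0 - delta) * np.asarray(hist_log_lam))

# Simulated TTP for one current-trial cohort under a 5% shift
rng = np.random.default_rng(3)
ttp = sample_pwe(rng, 419, shifted_hazards(0.05), s=243.0)
```

Positive Δ shrinks the negative log hazards toward zero, raising the hazards and producing shorter simulated TTP, which matches the earlier interim analyses described in the text.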

Figure 3 plots results for mean EHSS and mean proportion of patients (out of n = 419) assigned to FOLFOX as functions of trial time. Each column corresponds to a different value of Δ100%, indicating the relative magnitude of assumed historical bias. Each graph in the bottom row of plots shows results for designs that initiate adaptive randomization after O1 = 60, 90, and 120 observed events; results in the top row of plots are shown for O1 = 60. The plots show that increasing (decreasing) the baseline hazards leads to designs for which the initial computation of EHSS (determined by O1), and the initiation of the AR procedure thereafter, occurs earlier (later), on average. The top row of graphs shows that the actual historical data is most commensurate, on average, with the concurrent data when Δ100% = 0% (no change) and 5% (a 5% increase). The joint model borrows more strength from the historical data in these two scenarios, which leads to larger initial EHSS and increasing trends over time as more events facilitate larger relative gains in posterior precision. This in turn induces monotonically increasing trends over time in the mean proportion of patients assigned to FOLFOX. Note that because increasing the baseline hazards yields fewer right-censored observations, EHSS is ultimately slightly larger for Δ100% = 5% when compared to Δ100% = 0%. In addition, AR initiates earlier for Δ100% = 5%; thus slightly more patients, on average, are assigned to FOLFOX when Δ100% = 5%. In contrast, the graphs show smaller initial EHSS with decreasing trends over time for the scenarios where Δ100% = −10%, −5%, and 10%, leading to less allocation to FOLFOX, on average. For these three scenarios, the trends over time in the mean proportion of patients assigned to FOLFOX ultimately converge across the three choices of O1. This suggests that our adaptive design is robust to reasonable choices in the initial timing of the interim assessments of EHSS.

Figure 3

Simulated mean EHSS (top row) and mean proportion of patients assigned to FOLFOX (bottom row) as functions of trial time corresponding to designs that initiate adaptive randomization after 60 (solid), 90 (dashed), and 120 (dotted) observed events. Results are shown for five magnitudes of assumed historical bias, indicating the amounts of relative change in the concurrent log baseline hazards, log(λ1) and log(λ2), from their historical estimates in Table 1.

Because our simulation study uses the actual historical data (as opposed to generating data from a piecewise exponential model with parameters fixed at their estimates in Table 1), the results lack symmetry around the case where the historical data is assumed to be unbiased, Δ = 0. The actual historical information appears to be most commensurate with accumulating concurrent information when current data is generated from the piecewise exponential model with 0 < Δ < 0.05.

In addition, we evaluated the adaptive design’s frequentist properties for point estimation of the FOLFOX effect, ξ, using its posterior mean at the final analysis. Specifically, we simulated expected bias and risk under squared error loss with respect to the conditional distribution of the concurrent data, D(T), given Δ and the actual historical data, D0:

$$\mathrm{E}_{\,\mathcal{D}(T)\mid \Delta,\mathcal{D}_0}\!\left\{\mathrm{E}_{\,\xi\mid \mathcal{D}(T),\mathcal{D}_0}(\xi)\right\}-\tilde{\xi}, \quad \text{and} \quad \mathrm{E}_{\,\mathcal{D}(T)\mid \Delta,\mathcal{D}_0}\!\left[\left\{\mathrm{E}_{\,\xi\mid \mathcal{D}(T),\mathcal{D}_0}(\xi)-\tilde{\xi}\right\}^{2}\right]. \tag{12}$$
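In practice the expectations in (12) are approximated by Monte Carlo over the replicated trials: with one posterior-mean estimate of ξ per simulated trial, bias and risk are simply the sample average of the errors and squared errors. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def mc_bias_and_risk(posterior_means, xi_true=0.0):
    """Monte Carlo approximations of the expectations in (12): average
    the posterior-mean estimates of xi across replicated trials to
    estimate bias, and average the squared errors to estimate risk
    under squared error loss."""
    est = np.asarray(posterior_means, dtype=float)
    bias = est.mean() - xi_true
    risk = np.mean((est - xi_true) ** 2)
    return bias, risk
```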

Results for the adaptive designs are compared to a fixed 1:1 allocation design with analysis under the associated reference (“no borrowing”) model and a “pooled” model, which simply pools the historical and current controls and randomizes new patients to FOLFOX with probability 0.77, so that the final analysis has approximately 321 total patients (historical and current) assigned to each regimen.
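The 0.77 allocation probability for the pooled comparator follows from arithmetic on the stated sample sizes, as the short check below illustrates:

```python
# Why the pooled comparator randomizes new patients to FOLFOX with
# probability ~0.77: the 224 historical controls are pooled with the
# current IFL arm, so for roughly equal totals per regimen, nearly all
# FOLFOX patients must come from the 419 current enrollees.
n0, n = 224, 419                  # historical controls, current trial size
target_per_arm = (n0 + n) / 2     # 321.5 patients per regimen overall
p_folfox = target_per_arm / n     # share of current patients given FOLFOX
print(round(p_folfox, 2))         # prints 0.77
```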

Figure 4 presents expected bias and risk under SEL for estimating ξ as functions of Δ100%. Each column corresponds to a different value of O1. A total of 1000 replicated trials were simulated for each of seven scenarios corresponding to values of Δ100% ranging from −15% to 15%. Each approach leads to an admissible estimator. Under no borrowing, the historical data D0 is ignored; thus, the posterior mean of ξ|D(T) is unbiased. Increasing the baseline hazards results in fewer right-censored patients, which decreases risk. In contrast, the pooled design offers maximal variance reduction, and thus is associated with the largest reductions in risk at Δ100% = 0%. However, this approach precludes learning about intra-control heterogeneity, which leads to prohibitively biased estimators with sharply, uniformly increasing risk that exceeds no borrowing for |Δ100%| > 2%. The adaptive design offers a compromise. For |Δ100%| < 3.5%, it leads to estimators that dominate no borrowing with respect to risk. Bias is attenuated for large values of |Δ100%|, leading to near replication of the no-borrowing risk for |Δ100%| > 10%. No borrowing dominates for intermediate intra-control effects, (−10% < Δ100% < −3.5%) ∪ (3.5% < Δ100% < 10%). However, adaptive borrowing leads to considerable reductions in risk and bias when compared to naive pooling over this portion of the parameter space. The hierarchical model’s spike-and-slab hyperparameters can be adjusted to modulate risk tolerance in the presence of intermediate intra-control effects. In addition, the simulations suggest the estimator is relatively robust to the choice of O1.

Figure 4

Expected bias (top row) and risk under squared error loss (bottom row) for estimating the FOLFOX effect, ξ, as functions of percent change in the log baseline hazards from their historical estimates in Table 1. Results are shown for three values of the number of observed events required before initiating the adaptive randomization procedure.

7 Discussion

In this article we proposed an adaptive trial design that implements a randomization procedure aimed at balancing total information (concurrent and historical) among the study arms, assigning more patients to receive the novel therapy in the absence of evidence for heterogeneity among the concurrent and historical controls. The design yields an estimator of the novel treatment effect with desirable frequentist properties. It promises to enhance efficiency in the prospective evaluation of new therapies in controlled clinical trials that follow recent evaluations of the control therapy, borrowing strength when there is little intra-control heterogeneity without risking highly biased estimation when intra-control heterogeneity is substantial.

The proposed adaptive trial was considered in a “non-sequential” setting involving a single treatment comparison at the end of the trial. That is, the design, while sequential in assessing evidence for heterogeneity among the historical and concurrent controls, did not incorporate decision rules for early stopping for efficacy or futility. Extending the proposed adaptive trial to allow for early stopping is an area for potential future development. Indeed, other areas for future development are many, since they include any setting where we wish to adjust trial enrollment adaptively based on an updated estimate of how much strength may be sensibly borrowed from external data sources. For instance, when historical information is also available on the new treatment, we may have two commensurability calculations, each having its own impact on the randomization ratio (11). Alternatively, in a one-arm device safety trial we may use commensurate priors to decide when to stop accrual based on the predictive probability of a favorable end result given the current estimate of the total effective sample size. While our approach is exclusively Bayesian, Freedman and Spiegelhalter [22] have considered clinical trials that use Bayesian methods for design and frequentist methods for final analysis.

Acknowledgments

The work of the first author was supported in part by The University of Texas M.D. Anderson Cancer Center Support Grant (NIH P30 CA016672). The work of the second author was supported in part by National Cancer Institute grant 1-R01-CA157458-01A1. The work of the third author was supported in part by NCI grant CA25224.

Appendix

Here we investigate the association between the number of events and the posterior precision of log(λ1)|D(t) in the context of the reference model (7). Specifically, we simulated 7000 datasets from a piecewise exponential probability distribution. For each simulated dataset, the sample size, ñ, and right-censoring probability, πcens, were randomly generated from uniform distributions covering broad ranges of the relevant sample spaces: ñ ~ Uniform(50, 600) and πcens ~ Uniform(0.1, 0.8). Time-to-event observations were generated for each of the ñ patients from θ̃, with baseline hazards fixed at their corresponding historical estimates in Table 1, log(λ̃1) = −5.689 and log(λ̃2) = −4.858, and treatment effect ξ̃ = 0. For each dataset we recorded the number of events observed in the initial partition interval and fit the reference model (7) to obtain the associated posterior precision of log(λ1). The results in Figure 5 reveal a strong linear association (Pearson correlation of 0.996) between posterior precision and the number of observed events over the considered portion of the parameter space.
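The intuition behind this near-linearity can be illustrated with a much simpler stand-in for the reference model (7): for a single-interval exponential model with a conjugate Gamma prior, the posterior of the rate λ is Gamma with shape a + d (d = number of events), and Var{log λ} equals the trigamma function at that shape, which behaves like 1/(shape) so that precision grows almost exactly linearly in d. The sketch below (illustrative names; Gamma(0.1, 0.1) prior assumed) checks the resulting correlation; it is not the paper's piecewise exponential simulation.

```python
import numpy as np
from scipy.special import polygamma

def simulate_precision_vs_events(n_datasets=500, rate=np.exp(-5.689), seed=7):
    """Simplified analogue of the appendix check: simulate exponential
    datasets with random size and censoring, compute the posterior
    precision of log(lambda) under a conjugate Gamma(0.1, 0.1) prior
    (for a Gamma(shape, .) posterior, Var{log(lambda)} = trigamma(shape)),
    and correlate that precision with the number of observed events."""
    rng = np.random.default_rng(seed)
    events, precision = [], []
    for _ in range(n_datasets):
        n = int(rng.integers(50, 601))
        t = rng.exponential(1.0 / rate, size=n)
        # Random exponential censoring times with varying intensity
        c = rng.exponential(1.0 / (rate * rng.uniform(0.25, 4.0)), size=n)
        d = int((t <= c).sum())          # number of observed events
        shape = 0.1 + d                  # Gamma posterior shape parameter
        events.append(d)
        precision.append(1.0 / polygamma(1, shape))
    return float(np.corrcoef(events, precision)[0, 1])
```

Running this yields a Pearson correlation extremely close to 1, consistent with the near-perfect linear association (0.996) reported above for the full piecewise exponential model.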

Figure 5

Scatterplot of simulation results. Each dot depicts the number of events (y-axis) as a function of posterior precision of log(λ1)| Inline graphic(t) (x-axis) under the reference model (7).

References

1. Pocock SJ. The combination of randomized and historical controls in clinical trials. Journal of Chronic Diseases. 1976;29:175–188. doi:10.1016/0021-9681(76)90044-8.
2. Goldberg RM, Sargent DJ, Morton RF, Fuchs CS, Ramanathan RK, Williamson SK, et al. A randomized controlled trial of fluorouracil plus leucovorin, irinotecan, and oxaliplatin combinations in patients with previously untreated metastatic colorectal cancer. Journal of Clinical Oncology. 2004;22:23–30. doi:10.1200/JCO.2004.09.046.
3. Saltz LB, Cox JV, Blanke C, et al. Irinotecan plus fluorouracil and leucovorin for metastatic colorectal cancer. The New England Journal of Medicine. 2000;343:905–914. doi:10.1056/NEJM200009283431302.
4. Hobbs BP, Sargent DJ, Carlin BP. Commensurate priors for incorporating historical information in clinical trials using general and generalized linear models. Bayesian Analysis. 2012;7:639–674. doi:10.1214/12-BA722.
5. Bretz F, Koenig F, Brannath W, Glimm E, Posch M. Adaptive designs for confirmatory clinical trials. Statistics in Medicine. 2009;28:1181–1217. doi:10.1002/sim.3538.
6. Berry SM, Carlin BP, Lee JJ, Müller P. Bayesian Adaptive Methods for Clinical Trials. Boca Raton, FL: Chapman and Hall/CRC Press; 2010.
7. Friedman L, Furberg C, DeMets DL. Fundamentals of Clinical Trials. 3rd ed. New York: Springer-Verlag; 1998.
8. Pocock SJ, Simon R. Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics. 1975;31:103–115.
9. Anderson G, LeBlanc M, Liu PY, Crowley J. On use of covariates in randomization and analysis of clinical trials. In: Crowley J, Hoering A, editors. Handbook of Statistics in Clinical Oncology. 3rd ed. Boca Raton, FL: Chapman and Hall/CRC Press; 2011.
10. Berry DA, Eick S. Adaptive assignment versus balanced randomization in clinical trials: a decision analysis. Statistics in Medicine. 1995;14:231–246. doi:10.1002/sim.4780140302.
11. Thall PF, Wathen JK. Practical Bayesian adaptive randomisation in clinical trials. European Journal of Cancer. 2007;43:859–866. doi:10.1016/j.ejca.2007.01.006.
12. Zhou X, Liu S, Kim ES, Herbst RS, Lee JJ. Bayesian adaptive design for targeted therapy development in lung cancer – a step toward personalized medicine. Clinical Trials. 2008;5:181–193. doi:10.1177/1740774508091815.
13. Huang X, Ning J, Li Y, Estey E, Issa JP, Berry DA. Using short-term response information to facilitate adaptive randomization for survival clinical trials. Statistics in Medicine. 2009;28:1680–1689. doi:10.1002/sim.3578.
14. Lee JJ, Gu X, Liu S. Bayesian adaptive randomization designs for targeted agent development. Clinical Trials. 2010;7:584–597. doi:10.1177/1740774510373120.
15. Korn EL, Freidlin B. Outcome-adaptive randomization: is it useful? Journal of Clinical Oncology. 2011;29:771–776. doi:10.1200/JCO.2010.31.1423.
16. Hobbs BP, Carlin BP, Mandrekar SJ, Sargent DJ. Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics. 2011;67:1047–1056. doi:10.1111/j.1541-0420.2011.01564.x.
17. Ibrahim JG, Chen MH, Sinha D. Bayesian Survival Analysis. New York: Springer-Verlag; 2001.
18. Morita S, Thall PF, Müller P. Determining the effective sample size of a parametric prior. Biometrics. 2008;64:595–602. doi:10.1111/j.1541-0420.2007.00888.x.
19. Mitchell TJ, Beauchamp JJ. Bayesian variable selection in linear regression. Journal of the American Statistical Association. 1988;83:1023–1032.
20. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. 2nd ed. New York: Springer-Verlag; 2003.
21. Matts JP, Lachin JM. Properties of permuted-block randomization in clinical trials. Controlled Clinical Trials. 1988;9:327–344. doi:10.1016/0197-2456(88)90047-5.
22. Freedman LS, Spiegelhalter DJ. The assessment of subjective opinion and its use in relation to stopping rules for clinical trials. The Statistician. 1983;32:153–160.
