Author manuscript; available in PMC: 2023 Feb 20.
Published in final edited form as: Stat Med. 2021 Nov 9;41(4):698–718. doi: 10.1002/sim.9249

A Group-Sequential Randomized Trial Design Utilizing Supplemental Trial Data

Ales Kotalik 1, David M Vock 2, Brian P Hobbs 3, Joseph S Koopmeiners 2
PMCID: PMC8795487  NIHMSID: NIHMS1751034  PMID: 34755388

Abstract

Definitive clinical trials are resource intensive, often requiring a large number of participants over several years. One approach to improving the efficiency of clinical trials is to incorporate historical information into the primary trial analysis. This approach has tremendous potential in the areas of pediatric or rare disease trials, where achieving reasonable power is difficult. In this manuscript, we introduce a novel Bayesian group-sequential trial design based on Multisource Exchangeability Models, which allows for dynamic borrowing of historical information at the interim analyses. Our approach achieves synergy between group sequential and adaptive borrowing methodology to attain improved power and reduced sample size. We explore the frequentist operating characteristics of our design through simulation and compare our method to a traditional group-sequential design. Our method achieves earlier stopping of the primary study while increasing power under the alternative hypothesis but has a potential for type I error inflation under some null scenarios. We discuss the issues of decision boundary determination, power and sample size calculations, and the issue of information accrual. We present our method for a continuous and binary outcome, as well as in a linear regression setting.

Keywords: Group-sequential Trial, Borrowing Strength, Data Aggregation, Exchangeability

1 |. INTRODUCTION

Implementing an adaptive clinical trial can reduce the required sample size and the duration of a trial1;2;3. The reductions in sample size and duration can also reduce the time to approval of new therapies and provide faster access to the new treatment4. These benefits are among the main goals of the 21st Century Cures Act5. A closely related problem in biomedical research is the rising cost of clinical trials6;7;8. It has been noted that the costs of clinical trials rise faster than the inflation rate7;9, making the real cost of drug development higher over time. Some argue that rising costs could stifle innovation as well as increase the out-of-pocket costs to patients8;6. In addition, shorter trials conducted in a rapidly changing environment are less likely to be outdated by the time the results are published. Lastly, there are compelling ethical reasons for using adaptive designs. For example, a clinical trial should be terminated if one arm is clearly superior to the others, as it is no longer ethical to randomize participants to the inferior arm(s)10.

Group-sequential designs are a widely used adaptive design method that can address many of these issues. Such designs allow us to perform interim analyses to compare the treatment arms and to stop the trial early if there is strong evidence for or against the alternative hypothesis. Allowing for early stopping can reduce the expected sample size, reducing cost and trial duration.

An alternative approach to reducing required sample size is to incorporate supplemental or historical information. There is existing literature discussing borrowing of supplemental data11;12;13;14;15;16. Static borrowing methods are the simplest approach, but they are sub-optimal as they do not allow for any re-evaluation of the initial assumption about exchangeability13. Dynamic approaches are data-driven and do not require that the extent of borrowing be pre-specified. Common dynamic approaches use a hierarchical structure17;18;19;20;15;13;11;21, but are sensitive to the choice of hyperpriors that control the amount of borrowing14;22;23. Multisource Exchangeability Models (MEMs)24;25;26 are an approach to dynamic borrowing that enumerate all the possible exchangeability patterns and aggregate the posteriors for each exchangeability pattern using Bayesian Model Averaging (BMA)27. Unlike other existing methods, MEMs can incorporate multiple supplemental sources to a different degree into the primary study analysis.

Supplemental data may take many forms. For example, the same intervention could have been studied previously in another part of the world, in a different population, or even in the earlier phases of the development process (i.e., using phase II data to augment phase III data). We consider one such situation, in which we supplement a clinical trial with data from a pilot study. We will show that under the alternative hypothesis, our method allows the ongoing trial both to stop earlier (saving costs by reducing the necessary sample size) and to achieve greater power to detect a treatment effect.

In this manuscript, we propose a group-sequential design utilizing dynamic borrowing at the interim analyses of a randomized controlled trial (RCT) to increase efficiency. Our design utilizes the MEM framework24;28. It combines the group-sequential and dynamic borrowing methods to achieve synergy and attain even greater savings in terms of sample size. We consider the case where the supplemental trial has already concluded, as well as the case where the two trials are running concurrently. We investigate the operating characteristics of our design assuming a continuous and a binary outcome, as well as an extension to a regression setting. We show that our method yields greater power to detect a treatment difference in the current trial compared to a traditional group-sequential design, albeit with minimal type I error probability inflation under the null hypothesis. Our method also tends to achieve earlier stopping and a smaller sample size in the primary study. Methods for the planning of the primary trial (power / sample size calculations) are discussed. We also address the issue of potentially non-monotone information accrual.

As a motivating example, we consider a sequence of randomized trials in tobacco regulatory science, which focuses on evaluating potential strategies for regulating the sale, marketing, and content of tobacco products. This field is currently undergoing major and rapid changes in light of the emerging research on the dangers of vaping and e-cigarettes29;30;31.

The Impact of Very Low Nicotine Content Cigarettes in a Complex Marketplace (VLNC-CM) study (ClinicalTrials.gov Identifier NCT03272685) is an ongoing randomized trial which compares measures of smoking behavior when smokers have access to a tobacco marketplace with normal nicotine content (NNC; 16 mg nicotine per gram of tobacco) cigarettes to outcomes when smokers have access to a tobacco marketplace where NNC cigarettes are replaced by very low nicotine content (VLNC; 0.4 mg nicotine per gram of tobacco) cigarettes. Participants, who are smokers not currently interested in quitting, go through 3 stages of the trial: In the first stage they are followed for a 2 week baseline period, during which they continue their usual smoking behaviors. In the second stage, they are introduced for 2 weeks to an experimental marketplace where they can exchange vouchers for their usual brand of cigarettes as well as non-combustible tobacco and nicotine products (smokeless tobacco, snus, electronic cigarettes, medicinal nicotine replacement). After the 2-week second stage, participants are randomized to a marketplace that includes either VLNC or NNC cigarettes along with the other tobacco and nicotine products, as in stage 2 of the trial. At the end of the study, they are reimbursed for unspent vouchers. The primary outcomes are the number of cigarettes smoked per day (CPD) and cigarette-free days. As such, this study examines the effect of VLNC cigarettes in a complex marketplace where participants randomized to VLNC can supplement nicotine from alternative sources. This scenario mimics a hypothetical regulatory environment in which the FDA lowers the nicotine content of cigarettes sold in the US to very low levels, under the authority granted to the FDA by the 2009 Family Smoking Prevention and Tobacco Control Act. The ongoing VLNC-CM study will be considered the primary data source.

A pilot study titled Reduced Nicotine Content Cigarettes and Use of Alternative Nicotine Products32 compared 3 experimental conditions: NNC: NNC cigarettes with access to non-cigarette combusted and non-combusted tobacco/nicotine products, VLNC 1: VLNC cigarettes with access to non-cigarette combusted and non-combusted tobacco/nicotine products, or VLNC 2: VLNC cigarettes with access to only non-combusted products. To consider only interventions consistent with the primary study, we will focus only on the NNC and VLNC 1 groups. The inclusion-exclusion criteria and study conduct were similar to those described above for the primary study, with the exception that there was no second phase (introduction to the marketplace). We will consider trial designs that integrate data from the pilot study to improve efficiency.

2 |. METHODS

2.1 |. Notation

We first present MEMs in generality and discuss MEMs for normal and binary outcomes, as well as for linear regression. Assume that the primary study is a two-arm randomized, controlled trial and the secondary sources contain data on either one or both of the trial arms, with the same endpoint as the primary study.

Assume that data are available from the current primary study P and from H supplemental sources 1, …, H, with sample sizes $n_P, n_1, \ldots, n_H$. Let $g \in \{1, 2\}$ index the experimental and control arms, respectively. Similarly, let $\theta^1$ denote the parameter of interest in the experimental trial arm and $\theta^2$ the parameter of interest in the control arm (mean, binomial proportion, etc.). Throughout, we assume that the summary measure of the treatment effect can be expressed as the difference between the two (i.e., $\theta = \theta^1 - \theta^2$), but other contrasts could be considered. Following the notation from previous literature24, let Y denote the outcome of interest, T the treatment indicator, and $S \in \{P, 1, 2, \ldots, H\}$ the study to which the patient belongs; let $D^g$ denote all the data collected in trial arm g and D all the data collected. Finally, let $Z_h$, $h = 1, \ldots, H$, denote an indicator of whether the data observed in supplementary study h are exchangeable with the primary study. MEMs enumerate all possible exchangeability patterns, which can be expressed in terms of $Z_h$, $h = 1, \ldots, H$. Each exchangeability pattern is denoted $\Omega_m = (Z_1 = z_{1,m}, Z_2 = z_{2,m}, \ldots, Z_H = z_{H,m})$, where $m = 1, \ldots, 2^H$ and $z_{h,m}$ indicates whether source h is assumed exchangeable with the primary source in pattern $\Omega_m$.
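To make the pattern space concrete, the $2^H$ exchangeability patterns can be enumerated directly. A minimal sketch (the function name and the dictionary representation are illustrative, not from the manuscript):

```python
from itertools import product

def exchangeability_patterns(H):
    """Enumerate all 2^H exchangeability patterns Omega_m = (z_1, ..., z_H),
    where z_h = 1 indicates supplemental source h is assumed exchangeable
    with the primary study."""
    return [dict(zip(range(1, H + 1), zs)) for zs in product([0, 1], repeat=H)]

# With H = 2 supplemental sources there are 2^2 = 4 patterns
patterns = exchangeability_patterns(2)
```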

2.2 |. Multisource Exchangeability Models

2.2.1 |. MEMs for normally distributed outcomes

We begin with a review of MEMs for a normally distributed response and known variance24. The parameter of interest is the mean of Y in a given arm $g \in \{1, 2\}$ of the primary study. In arm g of the primary source, $y_{1,P}^g, y_{2,P}^g, \ldots, y_{n_P,P}^g$ are independent and identically distributed (i.i.d.) $N(\mu_P^g, \sigma_P^g)$. Similarly, in arm g of the secondary source h, $y_{1,h}^g, y_{2,h}^g, \ldots, y_{n_h,h}^g$ are i.i.d. $N(\mu_h^g, \sigma_h^g)$. The parameter of interest $\theta^g$ is the mean response $\mu_P^g$ for a given trial arm in the primary study. The exchangeability patterns are defined as follows: if $\mu_P^g \neq \mu_h^g$, then arms g of sources P and h are not exchangeable; if $\mu_P^g = \mu_h^g$, then the arms of the sources are exchangeable (two possibilities per source, $2^H$ patterns in total). To model the treatment effect, we fit independent MEMs for the two treatment arms and base our inference on the posterior distribution of $\theta = \mu_P^1 - \mu_P^2$, $P(\theta \mid D)$. In other words, the MEMs are applied twice, once in the active and once in the control arm. Each yields a posterior for the parameter of interest within the trial arm, from which we derive the posterior distribution for the treatment effect via MCMC.

Under each assumed exchangeability pattern Ωk, let P(μPg|Ωk,Dg) be the posterior for the parameter of interest in arm g. The posterior distribution for μPg is a mixture, obtained by Bayesian Model Averaging of the sub-distributions P(μPg|Ωk,Dg):

$$P(\mu_P^g \mid D^g) = \sum_{k=1}^{2^H} P(\mu_P^g \mid \Omega_k, D^g)\, P(\Omega_k \mid D^g) = \sum_{k=1}^{2^H} w_k^g\, P(\mu_P^g \mid \Omega_k, D^g) \qquad (1)$$

where P (Ωk |D g ) are the posterior model weights defined below. For a normally distributed response with known variances and flat prior on μPg, P(μPg|Dg) is:

$$p(\mu_P^g \mid D^g) = \sum_{k=1}^{2^H} w_k^g\, N\!\left(\mu_P^g \;\middle|\; \frac{\bar y_P^g / v_P^g + \sum_{h=1}^{H} z_{h,k}\, \bar y_h^g / v_h^g}{1/v_P^g + \sum_{h=1}^{H} z_{h,k} / v_h^g},\; \left(\frac{1}{v_P^g} + \sum_{h=1}^{H} \frac{z_{h,k}}{v_h^g}\right)^{-1}\right) \qquad (2)$$

where $\bar y_h^g$ is the sample mean in arm g of source h, $z_{h,k}$ is the indicator of whether source h is assumed exchangeable with the primary source in pattern k, and $v_h^g = (\sigma_h^g)^2 / n_h$.

The posterior weight of pattern $\Omega_k$ in arm g is $w_k^g = P(\Omega_k \mid D^g) = \frac{\pi(\Omega_k)\, P(D^g \mid \Omega_k)}{\sum_{j=1}^{2^H} \pi(\Omega_j)\, P(D^g \mid \Omega_j)}$, where $\pi(\Omega_k)$ is the prior probability of pattern $\Omega_k$ and can be expressed as $\pi(\Omega_k) = P(Z_1 = z_{1,k}) \times \cdots \times P(Z_H = z_{H,k})$. In our manuscript, we let $\pi_e = P(Z_h = 1)$ for all h, i.e., we use equal prior probabilities of exchangeability for all sources $h = 1, \ldots, H$. The calculation of the weights also requires the marginal likelihood $P(D^g \mid \Omega_k) = \int P(D^g \mid \mu_P^g, \Omega_k)\, \pi(\mu_P^g \mid \Omega_k)\, d\mu_P^g$. Assuming a flat prior for $\mu_P^g$, $\pi(\mu_P^g) \propto 1$, the marginal distribution $p(D^g \mid \Omega_k)$ is24:

$$\begin{aligned} p(D^g \mid \Omega_k) = {}& (2\pi)^{-\frac{1}{2}\left[(H+1) - \sum_{h=1}^{H} z_{h,k}\right]} \left(\frac{1}{v_P^g} + \sum_{i=1}^{H} \frac{z_{i,k}}{v_i^g}\right)^{-\frac{1}{2}} \left(\prod_{j=1}^{H} \left[\frac{1}{v_j^g}\right]^{1 - z_{j,k}}\right)^{\frac{1}{2}} \\ & \times \exp\left(-\frac{1}{2}\left[\sum_{l=1}^{H} \frac{z_{l,k}\left(\bar y_P^g - \bar y_l^g\right)^2}{v_P^g + v_l^g + v_P^g v_l^g \sum_{m \neq l} z_{m,k} \left(v_m^g\right)^{-1}} + \sum_{l<r}^{H} \frac{z_{l,k}\, z_{r,k} \left(\bar y_l^g - \bar y_r^g\right)^2}{v_l^g + v_r^g + v_l^g v_r^g \left(\left(v_P^g\right)^{-1} + \sum_{e \neq l,r} z_{e,k} \left(v_e^g\right)^{-1}\right)}\right]\right) \end{aligned} \qquad (3)$$

Previous literature24 took an empirical Bayesian approach and plugged the maximum likelihood estimate of $v_h^g$ into Equations (2) and (3). Below we introduce an extension that treats $\sigma_h^g$ as unknown. Similar results can be derived when the outcome is binary; these are shown elsewhere33.
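For intuition, the weight calculation for one trial arm can be sketched numerically. The sketch below works with sample means and their known variances $v = \sigma^2/n$ and computes each pattern's marginal likelihood by integrating a flat prior over the common mean (sources with $z_h = 0$ contribute only a constant under the flat prior). It is a simplified illustration of Equations (1)-(3), not the authors' implementation:

```python
import numpy as np
from itertools import product

def mem_weights_normal(ybar_P, v_P, ybar_h, v_h, prior_e=0.5):
    """Posterior MEM pattern weights for one trial arm (normal outcome,
    known variances). ybar_P, v_P: primary sample mean and its variance
    sigma_P^2 / n_P; ybar_h, v_h: the same quantities for the H
    supplemental sources."""
    ybar_h, v_h = np.asarray(ybar_h, float), np.asarray(v_h, float)
    H = len(ybar_h)
    patterns, log_post = [], []
    for z in product([0, 1], repeat=H):
        keep = np.array(z) == 1
        means = np.r_[ybar_P, ybar_h[keep]]
        varis = np.r_[v_P, v_h[keep]]
        prec = np.sum(1.0 / varis)
        # log of  integral prod_s N(ybar_s | mu, v_s) d mu  under a flat prior
        lm = (-0.5 * np.sum(np.log(2 * np.pi * varis))
              + 0.5 * np.log(2 * np.pi / prec)
              - 0.5 * (np.sum(means**2 / varis) - np.sum(means / varis)**2 / prec))
        log_prior = sum(z) * np.log(prior_e) + (H - sum(z)) * np.log(1 - prior_e)
        patterns.append(z)
        log_post.append(lm + log_prior)
    log_post = np.array(log_post)
    w = np.exp(log_post - log_post.max())  # stabilized softmax over patterns
    return patterns, w / w.sum()
```

With a supplemental mean close to the primary mean, the exchangeable pattern dominates; with a distant mean, its weight collapses toward zero, illustrating the dynamic nature of the borrowing.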

MEMs have the desirable property that the model weights are consistent, as described in previous literature24, where Theorem 3.1 shows that for normally distributed data with known variance, as $n_P, \ldots, n_H \to \infty$, $w_{k^*} \to 1$ for some model $k^*$ defined by $(Z_1 = z_{1,k^*}, Z_2 = z_{2,k^*}, \ldots, Z_H = z_{H,k^*})$ and $w_k \to 0$ for all $k \neq k^*$. In other words, as the primary and supplemental sample sizes approach infinity, the MEM model weights converge to 1 for a single model and to 0 for all other models.

The parameter $\pi_e$ is crucial when using any MEM method, as it can have a substantial impact on the operating characteristics, as seen in the simulation results presented below. In general, a researcher would use simulations to select a value of $\pi_e$ that achieves the preferred trial operating characteristics. A selection procedure for this parameter is beyond the scope of this paper, but the issue is explored in other literature33;28, where optimization criteria for calibrating and evaluating different hyperprior choices are discussed.

2.2.2 |. Extension to Regression

Using the model described above, we can estimate the difference in means (or proportions) between trial arms, but cannot adjust for covariates. Additionally, we assumed known variances. In this section, we describe MEMs for a linear model, which allows us to both control for covariates and to relax the known variances assumption.

Assume we have only one supplemental source, with the same two arms as the primary source. The extension to more than one supplemental source is straightforward. Let X be a collection of covariates. In the primary source, assume $y_{1,P}, y_{2,P}, \ldots, y_{n_P,P}$ are independently distributed as $N(\beta_{0,P} + X\gamma + T\beta_{T,P},\; \sigma_P^2 I)$, where T is again the treatment indicator. Similarly, in the secondary source, $y_{1,1}, y_{2,1}, \ldots, y_{n_1,1}$ are independently distributed as $N(\beta_{0,1} + X\gamma + T\beta_{T,1},\; \sigma_1^2 I)$. In this case, the parameter of interest is the treatment effect in the primary study, i.e., $\theta = \beta_{T,P}$.

Previously, we evaluated exchangeability separately for each trial arm and calculated the difference between the estimated means of treatment groups in the primary data source. In this case, we borrow directly on the treatment effect. That is, if βT,P = βT,1 we consider the two sources exchangeable. As such, there are only two exchangeability patterns: either the sources are exchangeable or not, which correspond to the following regression equations:

1. $Y = (\beta_{0,P} + T\beta_{T,P} + X\gamma)\, I(S = P) + (\beta_{0,1} + T\beta_{T,1} + X\gamma_1)\, I(S = 1) + \epsilon$
2. $Y = (\beta_{0,P} + T\beta_{T,P} + X\gamma)\, I(S = P) + (\beta_{0,1} + T\beta_{T,P} + X\gamma_1)\, I(S = 1) + \epsilon$

where $\epsilon \sim N(0, E)$. The two regression equations correspond to two linear models. In the first regression model, corresponding to the first exchangeability pattern, the sources are assumed to be non-exchangeable. As such, a regression model with an intercept, treatment effect, and covariate X is fit independently for each of the two data sources; the treatment effect $\beta_{T,P}$ is therefore estimated using primary data only. Notice that the only difference introduced in the second regression model is that the main effect of treatment $\beta_{T,P}$ is forced to be equal across the primary and secondary source. As such, in model 2, $\beta_{T,P}$ is estimated using both primary and supplemental data. All other nuisance parameters are allowed to vary between studies. We assume equal variance of the response Y within a study but unequal variance across studies, so we take E to be block diagonal with blocks $\sigma_l^2 I$, $l = P, 1$.

In this manuscript, we use vague independent normal priors $N(0, 100^2)$ on each regression coefficient and vague independent inverse-gamma priors on the variance parameters, $\pi(\sigma_l^2) \sim IG(0.001, 0.001)$, $l = P, 1$. The two models yield different posterior distributions, samples from which can be obtained using standard Markov chain Monte Carlo (MCMC) software. The final posterior is $P(\beta_{T,P} \mid D) = \sum_{k=1}^{2} w_k\, P(\beta_{T,P} \mid \Omega_k, D)$, where again $w_k = P(\Omega_k \mid D) = \frac{\pi(\Omega_k)\, P(D \mid \Omega_k)}{\sum_{j=1}^{2} \pi(\Omega_j)\, P(D \mid \Omega_j)}$. An additional challenge that arises in this setting is that there is no analytical solution for $P(D \mid \Omega_j)$. We approximate the posterior weight of model k using the BIC approximation, as described in previous literature34, as $w_k = \frac{\pi(\Omega_k) \exp(-0.5 \Delta_k)}{\sum_{j=1}^{2} \pi(\Omega_j) \exp(-0.5 \Delta_j)}$, where $\Delta_k = BIC_k - \min(BIC_1, BIC_2)$. An alternative linear-model-based MEM, which can utilize supplemental sources containing information on only one of the trial arms, is outlined in the Supporting Web Materials.
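The BIC approximation can be sketched with ordinary least squares fits of the two regression models. The data below are simulated purely for illustration (no extra covariates; deliberately non-exchangeable treatment effects), and the Gaussian BIC with the residual variance profiled out is an assumption of this sketch:

```python
import numpy as np

def bic_linear(y, X):
    """BIC of a Gaussian linear model fit by least squares,
    with the residual variance profiled out (counted as +1 parameter)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + (k + 1) * np.log(n)

def bic_model_weights(bics, priors=(0.5, 0.5)):
    """BMA weights via the BIC approximation: w_k ~ pi(Omega_k) exp(-Delta_k / 2)."""
    delta = np.asarray(bics) - np.min(bics)
    w = np.asarray(priors) * np.exp(-0.5 * delta)
    return w / w.sum()

rng = np.random.default_rng(0)
n = 200
T = rng.integers(0, 2, size=2 * n)        # treatment indicator
S = np.repeat([0, 1], n)                  # 0 = primary source, 1 = supplemental
true_effect = np.where(S == 0, 0.5, 5.0)  # clearly non-exchangeable effects
y = 1.0 + true_effect * T + rng.normal(size=2 * n)

iP, i1 = (S == 0).astype(float), (S == 1).astype(float)
X1 = np.column_stack([iP, i1, T * iP, T * i1])  # model 1: separate effects
X2 = np.column_stack([iP, i1, T])               # model 2: shared effect
w = bic_model_weights([bic_linear(y, X1), bic_linear(y, X2)])
```

Here nearly all posterior weight should fall on the no-borrowing model 1; had the two treatment effects been similar, the shared-effect model 2 would dominate instead.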

2.3 |. Group-Sequential Design Based on MEMs

In this section, we discuss design considerations for a group-sequential trial using MEMs. In a group-sequential design, we complete a hypothesis test at a set of pre-defined time points and terminate the trial early based on a pre-defined stopping rule. In this paper we utilize a flat decision boundary, similar to the Pocock boundary, and evaluate its frequentist operating characteristics. We note, though, that other boundary shapes could also be considered.

2.3.1 |. Decision Boundary Selection

Assume $a$ equally spaced analyses in the primary study, with the final analysis performed when the maximum sample size in the primary study, $n_P^{max}$, is reached. We set the overall type I error level to $\alpha = 2.5\%$. We compare our design, which allows borrowing, to a standard frequentist design with Pocock stopping boundaries, which assumes a constant critical value $c_i = c$ across all analyses $i = 1, \ldots, a$, performs a statistical test for superiority at each analysis, yielding a p-value $p_i$, and stops and declares superiority of the experimental arm if $p_i < c$.

MEMs are fit in the Bayesian paradigm; as with any Bayesian method, all inference is based on the posterior distribution (in our case, the posterior distribution of θ). We consider a stopping rule based on the posterior probability of the treatment effect being greater than 0 at analysis i

$$p_i^* = P(\theta > 0 \mid D_i) = \int_0^{\infty} P(\theta \mid D_i)\, d\theta \qquad (4)$$

where $D_i$ denotes the data collected in both the primary and supplementary studies up to analysis i, and $P(\theta \mid D_i)$ denotes the posterior distribution of the treatment effect after observing data $D_i$. Our Pocock-style boundary is: if $p_i^* > c^*$ at any analysis $i = 1, \ldots, a$, stop and declare superiority of the treatment arm. Otherwise, continue to the next analysis or, if the final analysis has been reached, terminate the trial and fail to conclude superiority of the treatment arm.
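The decision rule in Equation (4) reduces to a tail probability of the posterior, which in practice can be estimated from MCMC draws. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def stop_for_superiority(theta_draws, c_star):
    """Evaluate the Pocock-style Bayesian stopping rule: estimate
    p_i* = P(theta > 0 | D_i) from posterior draws of the treatment
    effect and stop if it exceeds the boundary c*."""
    p_star = float(np.mean(np.asarray(theta_draws) > 0))
    return p_star, p_star > c_star

# e.g., posterior draws concentrated above zero
rng = np.random.default_rng(1)
p, stop = stop_for_superiority(rng.normal(0.3, 0.1, 100_000), c_star=0.99)
```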

Although MEMs are implemented in the Bayesian paradigm, we wish to select the boundary value $c^*$ that controls the frequentist overall type I error rate at the desired $\alpha = 2.5\%$ level. We define the overall type I error rate $\alpha$ as:

$$\alpha = P(p_1^* > c^* \mid \theta^1 = \theta^2) + P\left(p_2^* > c^*,\, p_1^* \le c^* \mid \theta^1 = \theta^2\right) + \cdots + P\left(p_a^* > c^*,\, p_1^* \le c^*, \ldots, p_{a-1}^* \le c^* \mid \theta^1 = \theta^2\right)$$

The selection of $c^*$ is not straightforward in the Bayesian setting. Bayesian designs suffer from the same issue of testing multiplicity when frequentist operating characteristics are considered35;36. The multiplicity arises because the posterior distribution is paired with a decision rule, and the decision rule is evaluated multiple times.

The boundary value $c^*$ that controls the overall type I error rate is selected using a modification of the theoretical approach of Shi and Yin35, who describe an asymptotic connection between $c$ and $c^*$. First, notice that $p_i^*$ is the posterior probability of a half space. A half space $H$ is a set satisfying $H = \{\delta : a^T \delta \ge b\}$, where $\delta \in \Delta$, $\Delta$ is an open subset of $\mathbb{R}^d$, $a \in \mathbb{R}^d$, and b is a scalar. As such, $p_i^*$ is the posterior probability of the half space defined by $\{(\theta^1, \theta^2) : \theta^1 > \theta^2\}$. Dudley and Haughton37 studied the asymptotic normality of posterior probabilities of half spaces. Using Theorem 1 from Shi and Yin35, under the 7 regularity assumptions described in Dudley and Haughton37, we have:

Theorem 1

When no borrowing is allowed, for cumulative sample sizes $n_P^{max}/a, \ldots, n_P^{max}$, as $n_P^{max} \to \infty$ the joint statistics $\{\Phi^{-1}(p_1^*), \ldots, \Phi^{-1}(p_a^*)\}$ converge in distribution to $(S_{n_P^{max}/a}, \ldots, S_{n_P^{max}})$, which converges in distribution to a multivariate normal distribution of the form described in Jennison and Turnbull36, Chapter 11.2.

where $\Phi(\cdot)$ is the standard normal cumulative distribution function and $S_{n_P}$ is the signed root likelihood ratio statistic based on a sample size of $n_P$. In other words, the posterior probability of $H$ at each interim analysis converges in distribution to a normal distribution as $n_P^{max} \to \infty$ when no borrowing from supplemental sources takes place. The proof of Theorem 1 follows from Shi and Yin35. This theorem only concerns the selection of the decision boundary $c^*$ under no borrowing. As shown in later results and in previous literature24;38, borrowing under non-exchangeability can lead to bias and an inflated type I error rate. However, we believe that $c^*$ should be determined assuming no borrowing. This approach may actually lead to conservative analyses, as borrowing methods often reduce type I error rates below $\alpha$ in a "global" null scenario, i.e., where both the primary and supplemental studies are generated under the null hypothesis39. The "global" null setting is usually only relevant when the studies run concurrently. When using historical data, borrowing often only takes place if the historical data yielded positive results. Calculating $c^*$ assuming no borrowing controls type I error at $\alpha$ in a "local" null situation, where historical data are generated under the alternative hypothesis, primary data are generated under the null hypothesis, and no borrowing takes place. However, calibration of $c^*$ across all null scenarios may be desirable and may yield a more attractive type I error / power tradeoff.

Using Theorem 1, frequentist group-sequential boundaries can be translated to the posterior probability scale to approximately control the overall type I error rate for a Bayesian design. To take advantage of asymptotic normality, we can set $c^* = 1 - c$, as seen in previous literature35. Such an approach yields decision boundaries that control type I error asymptotically. Otherwise, one could use simulations to select decision boundaries that control type I error in finite samples. In our manuscript, we use the simulation-based approach to select our decision boundaries. However, when constructing the simulations to investigate which decision boundary achieves the desired type I error, we construct a grid search over values near the asymptotic boundary. Such an approach can cut down on computation time.
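The simulation-based calibration in the no-borrowing case can be sketched as follows. With a flat prior and known $\sigma$, the posterior probability at analysis i is $p_i^* = \Phi(\hat\theta_i / se_i)$, so under the null we can estimate the overall type I error for candidate boundaries near the asymptotic value $c^* = 1 - c$; all settings below are illustrative:

```python
import numpy as np
from scipy.stats import norm

def overall_type1(c_star, n_max, a, sigma=1.0, n_sims=2000, seed=0):
    """Monte Carlo estimate of the overall type I error of the rule
    'stop if p_i* > c*' with no borrowing, where p_i* = Phi(thetahat_i / se_i)
    under a flat prior with known sigma."""
    rng = np.random.default_rng(seed)
    per_stage = n_max // a
    rejections = 0
    for _ in range(n_sims):
        y1 = rng.normal(0.0, sigma, n_max)  # experimental arm, null is true
        y2 = rng.normal(0.0, sigma, n_max)  # control arm
        for i in range(1, a + 1):
            n_i = per_stage * i
            theta_hat = y1[:n_i].mean() - y2[:n_i].mean()
            se = sigma * np.sqrt(2.0 / n_i)
            if norm.cdf(theta_hat / se) > c_star:
                rejections += 1
                break
    return rejections / n_sims

# grid search over boundaries near the asymptotic Pocock-style value
grid = [0.986, 0.988, 0.990, 0.992, 0.994, 0.996]
alphas = [overall_type1(c, n_max=300, a=3) for c in grid]
```

One would then pick the smallest $c^*$ in the grid whose estimated type I error is at or below the target $\alpha$. Because the same seed is reused across candidates, the rejection regions are nested and the estimated error is monotone in $c^*$.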

2.3.2 |. Power and Sample Size Calculations

We consider a power and sample size calculation to facilitate the planning of the primary trial utilizing our method. The researcher has two options for planning the primary trial. She can assume the worst-case scenario (the secondary data will be clearly non-exchangeable with the primary data and no borrowing will take place) and simply perform a traditional power / sample size calculation to obtain the sample size required to achieve the desired power. Such an approach is conservative and may result in over-powering, but early stopping due to group-sequential testing would likely partially offset this disadvantage. Alternatively, she can assume a certain amount of similarity between sources and perform a power / sample size calculation specific to our method. This would be appropriate in settings where the sample size required to achieve the desired power with no borrowing is unrealistic (e.g., rare diseases), in which case borrowing is required to achieve adequate power regardless of sequential testing.

For a fixed-sample trial, we can perform the power and sample size calculation for normal outcomes analytically using the results in Equations (2) and (3). Given sample sizes $n_P$ and $n_1$, let $\bar y_1^g$, $g \in \{1, 2\}$, be the secondary-study response means in the treatment and control groups, with response standard deviation $\sigma_1$. Next, assume a control group mean $\bar y_P^2$ in the primary study and response standard deviation $\sigma_P$. Then use a numerical search to find the critical value (denoted $c_r$) of the decision rule based on Equations (2) and (3) in terms of $\bar y_P^1$ (there is no closed-form solution for $c_r$). To calculate the power to detect a treatment effect $\omega$ in the primary study, note that under the alternative hypothesis, the estimated treatment effect in the primary study is distributed as $N(\omega, 2\sigma_P^2 / n_P)$. From that distribution, the probability of concluding superiority is $1 - \Phi\!\left(\frac{(c_r - \bar y_P^2) - \omega}{\sqrt{2\sigma_P^2 / n_P}}\right)$.
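This calculation can be sketched generically: a root search for the critical value $c_r$ of a monotone decision rule, followed by the normal-tail power formula. The simple no-borrowing rule used for illustration below is a stand-in assumption for the MEM-based rule of Equations (2) and (3), which has no closed form:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def critical_mean(decision_prob, c_star, lo, hi):
    """Numerically locate c_r: the experimental-arm mean at which a
    monotone decision probability crosses the boundary c*."""
    return brentq(lambda ybar1: decision_prob(ybar1) - c_star, lo, hi)

def power_fixed_sample(c_r, ybar_P2, omega, sigma_P, n_P):
    """Power to detect effect omega: under H_A the estimated treatment
    effect is N(omega, 2 sigma_P^2 / n_P)."""
    se = sigma_P * np.sqrt(2.0 / n_P)
    return 1.0 - norm.cdf(((c_r - ybar_P2) - omega) / se)

# Illustration with the no-borrowing rule p* = Phi((ybar1 - ybar2) / se)
sigma_P, n_P, ybar_P2, c_star, omega = 1.0, 100, 0.0, 0.975, 0.4
se = sigma_P * np.sqrt(2.0 / n_P)
c_r = critical_mean(lambda y1: norm.cdf((y1 - ybar_P2) / se), c_star, -5.0, 5.0)
power = power_fixed_sample(c_r, ybar_P2, omega, sigma_P, n_P)
```

For the no-borrowing rule the search reproduces the closed form $c_r = \bar y_P^2 + \Phi^{-1}(c^*)\,\sigma_P\sqrt{2/n_P}$, which serves as a check before substituting the MEM-based decision probability.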

The maximum sample size of a group sequential design must be inflated relative to a fixed-sample design to preserve power (due to multiple testing). Typically, we would determine the maximum sample size by simulations, but below we present a quick approximation of the inflation factor, which can be computed by using the asymptotic connection between frequentist and Bayesian methods in Theorem 1.

The calculation of an inflation factor $s$ that preserves power in a frequentist group-sequential design is well known36. An implication of Theorem 1 is that if a sample size $n$ achieves the desired power in a fixed-sample setting for a Bayesian method satisfying the conditions of Theorem 1, then a maximum group-sequential sample size of $n^{max} = s \times n$ achieves approximately the same overall power in a group-sequential setting. However, Theorem 1 only applies when no borrowing is allowed.

To obtain a power calculation for a group-sequential design using MEMs, combine Theorem 1 with Theorem 3.1 in Kaizer et al.24, assuming the supplemental studies have already concluded, to get:

Theorem 2

For cumulative sample sizes $n_P^{max}/a + n_1 + \cdots + n_H, \ldots, n_P^{max} + n_1 + \cdots + n_H$, as $n_P^{max}, \ldots, n_H \to \infty$, the joint statistics $\{\Phi^{-1}(p_1^*), \ldots, \Phi^{-1}(p_a^*)\}$ from MEMs converge in distribution to $(S_{n_{s,1}}, \ldots, S_{n_s})$, which converges in distribution to a multivariate normal distribution of the form described in Jennison and Turnbull36, Chapter 11.2.

where the quantities $p_i^*$ are posterior probabilities of $H = \{(\theta^1, \theta^2) : \theta^1 > \theta^2\}$, $n_{s,1} = n_P^{max}/a + \sum_{h=1}^{H} n_h z_{h,k^*}$, and $n_s = n_P^{max} + \sum_{h=1}^{H} n_h z_{h,k^*}$. Theorem 2 follows from Theorem 1, Theorem 3.1 in Kaizer et al.24, and Slutsky's theorem. Theorem 3.1 in Kaizer et al.24 shows that the MEM weights converge to 1 for a particular model $k^*$ and to 0 for all other models. As such, MEMs asymptotically pool source h with the primary study if $z_{h,k^*} = 1$ and ignore source h if $z_{h,k^*} = 0$. Combining this fact with Theorem 1, it follows that the joint statistics $\{\Phi^{-1}(p_1^*), \ldots, \Phi^{-1}(p_a^*)\}$ converge in distribution to a multivariate normal distribution.

The power to detect treatment effect ω is:

$$\begin{aligned} 1 - \beta &= P\left(\bigcup_{i=1}^{a} \{p_i^* > c^*\} \,\middle|\, \theta = \omega\right) = E\big(I(p_1^* > c^*) \mid \theta = \omega\big) + \sum_{i=2}^{a} E\left(\prod_{i'=1}^{i-1} I(p_{i'}^* \le c^*)\, I(p_i^* > c^*) \,\middle|\, \theta = \omega\right) \\ &= P(p_1^* > c^* \mid \theta = \omega) + P(p_1^* \le c^* \mid \theta = \omega)\, P(p_2^* > c^* \mid p_1^* \le c^*, \theta = \omega) + \cdots \\ &\quad + P(p_1^* \le c^* \mid \theta = \omega) \times \cdots \times P(p_{a-1}^* \le c^* \mid p_1^* \le c^*, \ldots, p_{a-2}^* \le c^*, \theta = \omega) \\ &\qquad \times P(p_a^* > c^* \mid p_1^* \le c^*, \ldots, p_{a-1}^* \le c^*, \theta = \omega) \end{aligned} \qquad (5)$$

where $I(\cdot)$ is the indicator function. Each of the quantities in Equation (5) can be calculated by integrating a multivariate normal distribution36. For more detail on group-sequential designs and boundary calculations, see also the work by Yin40. To take advantage of Theorem 2 and the implication of Theorem 1 discussed above, one can calculate the sample sizes required to achieve the desired power using MEMs for a fixed-sample design, and then apply an inflation factor $s$ from standard frequentist software, for example the R package gsDesign41. However, this approach only preserves power approximately.
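The multivariate normal integration behind Equation (5) can be sketched directly. Under the asymptotics of Theorems 1 and 2, the stage-wise statistics $\Phi^{-1}(p_i^*)$ are jointly normal with $Corr(Z_i, Z_j) = \sqrt{t_i/t_j}$ for information fractions $t_i \le t_j$, so with a constant one-sided boundary $z^*$ the overall power is one minus the probability of never crossing; all settings below are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gs_power(z_star, a, theta, sigma, n_max):
    """Overall power of a one-sided group-sequential test with a constant
    boundary z* and equally spaced analyses: 1 - P(all Z_i <= z*), where
    Z is multivariate normal with mean theta / (sigma * sqrt(2 / n_i))
    and Corr(Z_i, Z_j) = sqrt(t_i / t_j)."""
    t = np.arange(1, a + 1) / a                 # information fractions
    n_i = n_max * t                             # cumulative per-arm sample sizes
    drift = theta / (sigma * np.sqrt(2.0 / n_i))
    corr = np.sqrt(np.minimum.outer(t, t) / np.maximum.outer(t, t))
    never = multivariate_normal(mean=drift, cov=corr).cdf(np.full(a, z_star))
    return 1.0 - never
```

With $\theta = 0$ the same routine returns the overall type I error, so it can verify a chosen boundary; power at a candidate $n^{max}$ then follows by plugging in the alternative.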

2.3.3 |. Truncated MEMs

Information (or the information fraction)36 is an important quantity in the group-sequential design literature. In a standard design, there is a direct relationship between information growth and the proportion of data observed: if, for example, 50% of the data are collected, the information fraction is 0.5. When borrowing is allowed, information growth is more complicated because some information comes from supplemental sources and the extent of borrowing changes throughout the course of the trial. Above we made the common assumption of equally spaced analyses. This assumption is sufficient in the absence of borrowing, but linear information accrual cannot be assumed when implementing dynamic borrowing (nor, indeed, can monotone information accrual). For example, if our method borrows heavily at the first analysis but borrows substantially less at the second analysis, the amount of information available at the second interim analysis could actually be lower than the information at the first analysis, even though the total number of participants in the primary trial increases from the first to the second interim analysis.

In this section, we introduce a modified approach that constrains the maximum amount of borrowing allowed during interim analyses. While the prior source inclusion parameter π (Ωk ) can be thought of as a tuning parameter that can be adjusted to achieve a desired tradeoff between efficiency and bias, it affects all interim analyses equally and changing π (Ωk ) is not sufficient to address the aforementioned information issue.

To address this issue, one can utilize a constraint based on the effective supplementary sample size (ESSS), which extends the concept of effective sample size42 to the context of dynamic borrowing. Consistent with previous literature14;24, define the ESSS at analysis i for treatment arm g and model k as $ESSS_{i,k}^g = n_{P,i}^g \left(\frac{P_k^g}{P_1^g} - 1\right)$, where $P_1^g$ is the posterior precision for $\theta^g$ under the no-borrowing model, $P_k^g$ is the equivalent quantity for a model k that allows some borrowing at analysis i, and $n_{P,i}^g$ is the sample size in arm g of the primary study at analysis i. The ESSS can be interpreted as the additional primary-study sample size that would be needed to achieve the same precision achieved by borrowing.

At analysis i, $ESSS_i^g = n_{P,i}^g \sum_k w_k^g \left(\frac{P_k^g}{P_1^g} - 1\right)$, $k = 1, \ldots, 2^H$ 24, where $w_k^g$ are again the model weights defined above and $P_1^g$ is the posterior precision of the no-borrowing model, i.e., the one of the $2^H$ models that utilizes primary data only.
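A direct transcription of the ESSS formula; the inputs below are illustrative placeholders:

```python
import numpy as np

def esss(n_P_i, weights, precisions, no_borrow=0):
    """Effective supplementary sample size at an interim analysis:
    ESSS_i = n_P,i * sum_k w_k (P_k / P_1 - 1), with P_1 the posterior
    precision of the no-borrowing model (index `no_borrow`)."""
    w = np.asarray(weights, float)
    P = np.asarray(precisions, float)
    return n_P_i * np.sum(w * (P / P[no_borrow] - 1.0))

# e.g., 50 patients per arm; the borrowing model doubles the posterior precision
value = esss(50, weights=[0.5, 0.5], precisions=[10.0, 20.0])  # -> 25.0
```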

An extension to MEMs, which we refer to as truncated MEMs, constrains the maximum amount of borrowing at each interim analysis and is implemented as follows:

  1. If $ESSS_i^g \le \zeta_i$, where $\zeta_i$ is the maximum amount of borrowing allowed at analysis i, continue to the next stage as planned and leave the method unmodified. Otherwise, continue to the following steps.

  2. Let $s = \zeta_i / ESSS_i^g$ be a shrinkage factor.

  3. Update the posterior weight of the no-borrowing model from $w_1^g$ to $\tilde w_1^g = w_1^g + (1 - s)(1 - w_1^g)$, where $\tilde w_1^g$ is the new weight of the no-borrowing model in arm g and $w_1^g$ is the original posterior weight as calculated by the MEMs.

  4. For all models k = 2, …, 2^H, all of which utilize some amount of borrowing, let w̃_k^g = s·w_k^g, where w̃_k^g and w_k^g are as defined in the previous step.

As such, we shrink the weights of all models that allow for borrowing and inflate the weight of the no-borrowing model. This retains the relative weights among the models that allow borrowing while increasing the weight given to the no-borrowing model so as to cap the ESSS. Truncated MEMs introduce a second tuning parameter, ζi. This new tuning parameter is useful in the sense that it allows the trial operating characteristics over time to be more detached from the global tuning parameter πe (which does not change over time). One would traditionally calibrate πe to achieve desired trial operating characteristics, which may be difficult if the aim is to modify the operating characteristics over time. When implementing truncated MEMs to control the information behavior, a natural choice is ζ_i = n_{P,i+1}^g for all i = 1, …, a − 1, where n_{P,i+1}^g is the number of patients who will be enrolled in the next stage of the primary study in study arm g. In other words, if in the next stage of the trial we are about to enroll 25 participants, but the ESSS would have been 50 without a constraint, the proposed method would borrow less aggressively and scale the ESSS down to 25. This is the constraint we will use throughout this manuscript. No constraint is imposed at the final analysis, as there is no concern about decreasing information in the future. The parameter ζi could also be set to other values, or it could vary over time. Our choice was motivated by controlling the behavior of the information fraction, but ζi could be used for other purposes. Briefly, the choice of ζi could be influenced by the overall size of the trial, the frequency and timing of interim analyses, or even the amount of information collected in the trial to date. The operating characteristics of these different choices would then be explored using simulations. A thorough investigation of this tuning parameter is outside the scope of this manuscript and would be a good topic for future work.
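The truncation steps above amount to a single weight-adjustment function; index 0 plays the role of the no-borrowing model, and the inputs below are hypothetical. A minimal sketch:

```python
import numpy as np

def truncate_mem_weights(weights, esss_value, zeta):
    """Truncated-MEM weight adjustment for one arm at one interim analysis.

    weights   : MEM posterior model weights; index 0 is the no-borrowing model
    esss_value: current effective supplementary sample size for this arm
    zeta      : maximum amount of borrowing allowed at this analysis
    """
    w = np.asarray(weights, dtype=float).copy()
    if esss_value <= zeta:                      # step 1: within the cap, no change
        return w
    s = zeta / esss_value                       # step 2: shrinkage factor
    w0_new = w[0] + (1.0 - s) * (1.0 - w[0])    # step 3: inflate no-borrowing weight
    w[1:] *= s                                  # step 4: shrink all borrowing models
    w[0] = w0_new
    return w

# Hypothetical interim state: an ESSS of 50 must be capped at zeta = 25,
# so s = 0.5 and the weights (0.2, 0.8) become (0.6, 0.4).
new_w = truncate_mem_weights([0.2, 0.8], esss_value=50.0, zeta=25.0)
```

Because each borrowing model's ESSS contribution scales linearly in its weight, this adjustment rescales the arm's ESSS to exactly the cap while preserving the relative weights among the borrowing models, and the adjusted weights still sum to 1.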

Another reason to utilize the truncated MEMs method is that, especially when a is large, there is a very limited amount of data at early analyses, which can lead to considerable uncertainty as to whether the sources are exchangeable. A possible consequence is that MEMs borrow too much data too early. Tuning πe again does not address this issue, but truncated MEMs can be used to achieve the desired trial operating characteristics. However, it is important to note that when utilizing the truncated MEMs method in a group-sequential setting, Theorem 2 no longer applies, and the asymptotic power and sample size calculations introduced in Section 2.3.2 cannot be used. In this setting, power must be explored using simulations.

3 |. APPLICATION

In this section, we consider various possible scenarios for the VLNC-CM trial via simulation and explore the operating characteristics of the proposed design. As described above, we use the pilot study as the supplemental source; the pilot study has already concluded, and therefore its data are fixed. The primary outcome, CPD, is assumed to be normally distributed, and the linear-model-based MEM described in Section 2.2.2 is used. In the pilot study, the mean CPD in the treatment and control groups was 11.9 and 19.2, respectively (a treatment effect of 7.3), with standard deviations of 7.4 and 8. The sample sizes were 53 in the treatment group and 27 in the control group. The primary study is simulated under 6 different data-generating scenarios in which the primary study treatment effect is varied.

There are a = 4 equally spaced analyses assumed in the primary study. For a = 4 and a target overall type I error rate of 0.025, the frequentist Pocock boundary uses a critical value of c = 0.0091 and our method uses the boundary c = 0.9909, selected using the approach outlined above. We set the primary study maximum sample size to 320, 4 times larger than that of the secondary study, with a 1:1 randomization ratio. We present results for a group-sequential design that allows dynamic borrowing with a prior probability of exchangeability of πe = 0.05, and another using πe = 0.1. Although these choices of πe may seem overly conservative, they provide an attractive tradeoff between efficiency due to borrowing and bias. Results for constrained MEMs are presented for the higher values πe = 0.2 and πe = 0.5, as for low πe the constrained and unconstrained results are very similar. A higher πe coupled with the MEM constraint introduced above offers a different benefit-cost tradeoff: the method is a priori more optimistic towards borrowing, but this is counteracted by the truncation.
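As a rough check of the boundary values quoted above, the sketch below verifies by Monte Carlo that a constant one-sided nominal level of 0.0091 at a = 4 equally spaced analyses yields an overall type I error rate near 0.025. It uses the standard joint distribution of sequential Z statistics in a simplified frequentist setting (no borrowing), not the MEM posterior.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)
a = 4
nominal_one_sided = 0.0091                            # Pocock per-analysis level
z_crit = NormalDist().inv_cdf(1 - nominal_one_sided)  # constant Z boundary, ~2.36

# Under H0 with equally spaced looks, Z_k = S_k / sqrt(k), where S_k is the
# cumulative sum of k iid N(0, 1) stage increments.
n_sim = 200_000
rejections = 0
for _ in range(n_sim):
    s = 0.0
    for k in range(1, a + 1):
        s += random.gauss(0.0, 1.0)
        if s / sqrt(k) > z_crit:
            rejections += 1
            break
overall_type1 = rejections / n_sim
# overall_type1 should land near the 0.025 target
```

With 200,000 replications the Monte Carlo standard error is roughly 0.0004, so the estimate should sit well within a percentage point of 0.025.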

Table 1 shows the power, type I error rate, expected sample size and the distribution of stopping times in the primary study for the 5 methods considered over 10,000 simulations.

TABLE 1.

Simulation of VLNC-CM trial using pilot study as supplemental source.

Method Primary study treatment effect

7.3 5 4 3 2 0

Power or type I error Pocock >0.99 0.984 0.909 0.677 0.352 0.025
MEMs(0.05) >0.99 0.988 0.914 0.682 0.371 0.027
MEMs(0.1) >0.99 0.988 0.917 0.703 0.380 0.034
MEMs*(0.2) >0.99 0.989 0.927 0.721 0.406 0.039
MEMs*(0.5) >0.99 >0.99 0.943 0.777 0.466 0.060

Expected sample size Pocock 97.6 148.0 188.0 237.6 280.8 316.0
MEMs(0.05) 93.6 139.2 179.2 232.0 275.2 315.2
MEMs(0.1) 92.0 133.6 172.8 225.6 272.0 314.4
MEMs*(0.2) 90.4 128.8 167.2 217.6 265.6 313.6
MEMs*(0.5) 89.6 124.0 158.4 204.0 255.2 309.6

P(Stop at 1st analysis) Pocock 0.79 0.43 0.26 0.15 0.07 0.01
MEMs(0.05) 0.84 0.49 0.32 0.18 0.09 0.01
MEMs(0.1) 0.86 0.53 0.36 0.21 0.11 0.02
MEMs*(0.2) 0.87 0.57 0.39 0.24 0.13 0.02
MEMs*(0.5) 0.88 0.57 0.40 0.26 0.14 0.03

P(Stop at 2nd analysis) Pocock 0.20 0.36 0.33 0.20 0.09 0.01
MEMs(0.05) 0.15 0.33 0.30 0.20 0.09 0.01
MEMs(0.1) 0.13 0.30 0.29 0.20 0.10 0.01
MEMs*(0.2) 0.12 0.29 0.29 0.21 0.11 0.01
MEMs*(0.5) 0.12 0.33 0.34 0.27 0.15 0.02

P(Stop at 3rd analysis) Pocock 0.01 0.15 0.21 0.18 0.09 0.01
MEMs(0.05) 0.01 0.13 0.19 0.17 0.10 0.01
MEMs(0.1) 0.01 0.12 0.18 0.17 0.09 0.01
MEMs*(0.2) 0.01 0.10 0.16 0.16 0.09 0.01
MEMs*(0.5) <0.01 0.07 0.13 0.15 0.10 0.01

P(Stop at 4th analysis) Pocock <0.01 0.06 0.20 0.47 0.75 0.97
MEMs(0.05) <0.01 0.05 0.19 0.45 0.72 0.97
MEMs(0.1) <0.01 0.05 0.17 0.42 0.70 0.96
MEMs*(0.2) <0.01 0.04 0.16 0.39 0.67 0.96
MEMs*(0.5) <0.01 0.03 0.13 0.32 0.61 0.94

MEMs(x) denotes MEMs with x prior probability of exchangeability. Asterisk denotes truncated MEMs, ζi = 40 for both arms at each interim.

We see that with a prior probability of exchangeability of 0.05, we can control the overall type I error rate very close to the desired level of 2.5% while still realizing improvements in power and expected sample size under the alternative hypothesis. Our method has a higher chance of stopping the trial early and also attains improved power to detect a treatment effect. The difference is clearest when considering the probability of stopping the trial at the first interim analysis, where the trial has a considerably higher chance of stopping when dynamic borrowing is allowed (for example, 84% vs. 79% probability of declaring superiority at the first analysis under a treatment effect of 7.3). The earlier stopping also results in a smaller expected sample size. The design that allows dynamic borrowing realizes these improvements under a wide variety of treatment effects, even when the treatment effect is smaller than the anticipated effect of approximately 7. For example, under a treatment effect of only 2, our method improves power by approximately 2 percentage points while also stopping the trial earlier on average. Larger gains are observed with a less conservative prior probability of exchangeability, but this results in an inflated type I error rate. The constrained MEM results also demonstrate this trend.

A natural question is which of the benefits of borrowing presented in Table 1 (increased power, decreased expected sample size) are due to type I error inflation, and which are due to increased efficiency and effective sample size. We explore this distinction in Figure 1, where we use the same 4 MEMs-based methods as above but also introduce a calibrated traditional no-borrowing analysis, calibrated to achieve the type I error rate reported in Table 1 for the matched MEMs method. This allows us to separate benefits due to type I error inflation from those achieved in other ways. In Figure 1 we present power, expected sample size, and mean square error (MSE) as functions of the treatment effect size. It is important to note that the MSE is simply the MSE of the final point estimate yielded at whichever analysis the study stops. In each plot we present a given MEMs method, the corresponding calibrated traditional method, and an uncalibrated traditional method that achieves the nominal type I error rate of 0.025. As we can see in Figure 1, the MEMs methods generally achieve virtually identical power and expected sample size curves to the corresponding calibrated traditional method. As such, the improvements in these metrics seen in Table 1 were mainly due to type I error inflation. We believe this is because the supplemental source consists of only 80 patients randomized in a 2:1 ratio with a large treatment effect of 7.3, while the simulated primary study is much larger. The studies are truly exchangeable only when the treatment effect in the primary study is also at or near 7.3, which is where data borrowing is most appropriate; but at that point the study is already over-powered and it is difficult to yield improvements. Typically, the primary study would be smaller than the secondary study and not over-powered at the point of exchangeability; this setting is explored in the simulations below.

FIGURE 1.

FIGURE 1

Simulation of VLNC-CM trial using pilot study as supplemental source. MEMs(x) denotes MEMs method with x prior probability of exchangeability. Asterisk denotes truncated MEMs with ζi =25 for both arms at each interim analysis. Traditional (calib.) denotes a traditional no-borrowing analysis that is calibrated to achieve the same type I error rate as the corresponding MEM-based method. MSE=mean square error. Expected sample size denotes the expected sample size in the primary trial. In all plots the grey dashed line denotes the traditional method that controls type I error at the nominal rate of 0.025. Solid black vertical line denotes the point where sources are exchangeable

As discussed above, the choice of πe is quite important and throughout our manuscript, we considered πe =0.05 as our preferred choice, as it often provided a good trade-off between the benefits and drawbacks of the proposed method. However, the optimal choice can be quite subjective, which is why we present a variety of values for πe.

4 |. CONCURRENT SUPPLEMENTAL SOURCES SIMULATIONS

In Section 3, we considered the case where the supplemental trial was completed before the primary study began. Alternatively, we could consider a scenario where the two trials run concurrently. It is common practice to incorporate smaller substudies into larger trials as a means to answer additional questions4, for instance in pediatric settings, where a pediatric substudy can be built into an adult trial. As an example, consider the Phase III trials of Besivance, a drug for the treatment of bacterial conjunctivitis43;44. In both trials, patients 1 year of age or older were enrolled, resulting in a very wide age range (approximately 1 to 90 years). Consider the problem of seeking regulatory approval for the pediatric population. Traditionally, this approval would be based on pediatric data alone, but the sample size may be small. In addition, it seems reasonable to assume that the drug may work similarly in adults. One could also consider different age groups as separate sources (e.g., adolescents, adults, seniors), which naturally calls for a method like MEMs, as it can account for the fact that adolescents are more likely to be exchangeable with children than adults or seniors are. In such a trial, one could implement a group-sequential design where the pediatric population is considered the primary source for the purposes of pediatric regulatory approval while the other age groups are supplemental. In that case, the sources would enroll patients concurrently. Additionally, the primary study will often be smaller than the secondary source. In the following simulations, we explore this scenario.

There is another compelling reason to share infrastructure between the primary and secondary studies. Whenever we incorporate supplemental data, we should be concerned about inter-trial effects45;24 which could manifest themselves as differences in standards of care, event definitions, protocols etc. Many of these concerns are resolved when we allow the primary and secondary sources to be a part of one larger study, as recruitment happens in the same places and at the same time, the exclusion and inclusion criteria, study protocols, event definitions are all likely very similar. Furthermore, a similar approach has recently been proposed in order to expedite the generation of child-specific evidence46.

We again assume a = 4 equally spaced analyses and a single supplemental two-arm trial enrolling concurrently with the primary study. We use the same critical values for early termination as in Section 3. The maximum sample sizes are 200 in the primary study and 400 in the secondary study. The simulations explore type I error, power, and the distribution of stopping times. Our simulations assume that the control-group mean is the same in the two studies, and we vary the treatment effect. We consider 4 scenarios: 1) an equal treatment effect in the primary and secondary data sources (where the MEMs method stands to gain the most, as the sources truly are exchangeable); 2) no treatment effect in the primary study, with the same secondary-study treatment effect as in scenario 1; 3) a small effect in the supplemental source but no effect in the primary source; and 4) no treatment effect in either study (the global null setting). In scenario 2 the sources should be noticeably different, so little data borrowing should occur, but there is considerable potential for bias when borrowing does happen. In scenario 3 there is no treatment effect in the primary study but a small treatment effect in the secondary study, which could make it more difficult for MEMs to recognize that the sources differ. As such, we would expect more data borrowing between sources, but the potential for bias is lower than in scenario 2. In scenario 4 the sources are again truly exchangeable. Therefore, scenario 1 investigates power, while scenarios 2, 3, and 4 investigate the type I error rate, with scenarios 2 and 3 exploring a "local null" setting and scenario 4 a "global null" setting. All simulations use 10,000 replications. We consider the same 5 methods as above (no borrowing, two unconstrained MEMs, two truncated MEMs).

4.1 |. Normal Response

In this simulation, the response Y in each trial arm is normally distributed, and the treatment effect is the difference in the mean of Y between the treatment and control arms. In the first scenario, the control mean of Y is 5 and the treatment mean is 6 in the primary study, so there is a true treatment effect of 1; the parameters are the same in the secondary study. In the second scenario, the mean of Y is 5 in both arms of the primary study (i.e., there is no treatment effect), but the treatment effect equals 1 in the secondary study. The third scenario is the same as the second, except that the treatment effect in the secondary study is 0.5 instead of 1. In all scenarios, the standard deviations of the outcome Y are 3 and 4 in the primary and secondary studies, respectively.
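A minimal sketch of the scenario-1 data-generating mechanism, assuming equal per-stage accrual and 1:1 randomization as described above (the function name and stage bookkeeping are illustrative, not the authors' implementation):

```python
import random

random.seed(1)

def generate_stage(n_per_stage, mean_ctrl, effect, sd):
    """One stage of accrual: 1:1 randomization with normal outcomes."""
    half = n_per_stage // 2
    ctrl = [random.gauss(mean_ctrl, sd) for _ in range(half)]
    trt = [random.gauss(mean_ctrl + effect, sd) for _ in range(half)]
    return ctrl, trt

a = 4  # equally spaced analyses
# Scenario 1: treatment effect of 1 in both sources; SDs 3 (primary), 4 (secondary).
primary = [generate_stage(200 // a, 5.0, 1.0, 3.0) for _ in range(a)]
secondary = [generate_stage(400 // a, 5.0, 1.0, 4.0) for _ in range(a)]
```

At interim analysis i, the data available would be the first i stages of each list.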

In scenario 1 in Table 2, the group sequential design using MEMs with πe =0.05 achieves a considerable gain in power over a frequentist design with Pocock boundaries (approximately 56% vs. 63%) along with an expected sample size that is slightly lower than the frequentist method. Our design stops more often for superiority at all the interim analyses and reaches the maximum sample size less often than the model that does not allow borrowing. There is a slight type I error rate inflation in scenario 2 which gives us an idea of the price one pays for the considerable benefits demonstrated in scenario 1. In scenario 3, the methods perform similarly. In scenario 4 we see the borrowing methods can lead to a rather considerable reduction in type I error rate under the “global null” setting. The constrained MEMs demonstrate a different yet interesting benefit-cost tradeoff, with much larger power gain and smaller savings in terms of expected sample size under the alternative. Under the “local” null, type I error inflation is now slightly larger using the constrained MEMs, but lower under the “global” null.

TABLE 2.

Simulation results: Normal response.

Method Scenario 1 (1,1) Scenario 2 (0,1) Scenario 3 (0,.5) Scenario 4 (0,0)

Power or type I error Pocock 0.563 0.027 0.027 0.026
MEMs(0.05) 0.635 0.030 0.027 0.023
MEMs(0.1) 0.675 0.036 0.029 0.022
MEMs*(0.2) 0.717 0.040 0.030 0.019
MEMs*(0.5) 0.793 0.058 0.040 0.017

Expected sample size Pocock 159.5 197.5 197.5 197.5
MEMs(0.05) 154.5 197.5 197.5 198.0
MEMs(0.1) 152.0 197.0 197.5 198.0
MEMs*(0.2) 154.0 197.5 197.5 198.0
MEMs*(0.5) 153.5 197.0 197.5 198.5

P(Stop at 1st analysis) Pocock 0.11 0.01 0.01 0.01
MEMs(0.05) 0.13 0.01 0.01 0.01
MEMs(0.1) 0.13 0.01 0.01 0.01
MEMs*(0.2) 0.14 0.01 0.01 0.01
MEMs*(0.5) 0.15 0.01 0.01 0.01

P(Stop at 2nd analysis) Pocock 0.16 0.01 0.01 0.01
MEMs(0.05) 0.17 0.01 0.01 0.01
MEMs(0.1) 0.19 0.01 0.01 0.01
MEMs*(0.2) 0.18 0.01 0.01 0.01
MEMs*(0.5) 0.16 0.01 0.01 0.00

P(Stop at 3rd analysis) Pocock 0.16 0.01 0.01 0.01
MEMs(0.05) 0.18 0.01 0.01 0.00
MEMs(0.1) 0.19 0.01 0.01 0.00
MEMs*(0.2) 0.16 0.01 0.01 0.00
MEMs*(0.5) 0.16 0.01 0.01 0.00

P(Stop at 4th analysis) Pocock 0.57 0.97 0.97 0.97
MEMs(0.05) 0.52 0.97 0.97 0.98
MEMs(0.1) 0.49 0.97 0.97 0.98
MEMs*(0.2) 0.52 0.97 0.97 0.98
MEMs*(0.5) 0.53 0.97 0.97 0.99

True treatment effects for primary, secondary study are in parentheses for each scenario. Asterisk denotes truncated MEMs with ζi = 25 for both arms at each interim analysis.

We again produce a figure comparing the MEMs methods, a matched calibrated traditional method and an uncalibrated traditional method in terms of power, expected sample size and MSE in Figure 2. In this Figure, we see that the MEMs methods yield some improvement in power even over the traditional methods calibrated to achieve the same type I error. The MEMs methods also achieve generally the same or higher expected sample size, along with a generally lower MSE of the point estimate. Therefore, unlike the above results in Figure 1, in this more common setting we now demonstrate advantages even over the calibrated traditional method.

FIGURE 2.

FIGURE 2

Simulation results: Normal response. MEMs(x) denotes MEMs method with x prior probability of exchangeability. Asterisk denotes truncated MEMs with ζi =25 for both arms at each interim analysis. Traditional (calib.) denotes a traditional no-borrowing analysis that is calibrated to achieve the same type I error rate as the corresponding MEM-based method. MSE=mean square error. In all plots the grey dashed line denotes the traditional method that controls type I error at the nominal rate of 0.025. Solid black vertical line denotes the point where sources are exchangeable

4.2 |. Binary Response

In this simulation, the response Y in each trial arm is Bernoulli distributed, and the treatment effect is the difference in the probability that Y = 1 between the treatment and control arms. In the first scenario, the control probability of success is 0.4 and the treatment probability is 0.6 in the primary study, so there is a true treatment effect of 0.2; the parameters are the same in the secondary study. In the second scenario, the probability of success is 0.4 in both arms of the primary study (i.e., there is no treatment effect), but in the secondary study the treatment effect equals 0.2. Lastly, the third scenario is the same as the second, except that the treatment effect in the secondary study is smaller, 0.1. The prior used for the probability of success is beta(1, 1), i.e., a uniform prior.
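Under the beta(1, 1) prior, each arm's success probability has a Beta(1 + successes, 1 + failures) posterior, whose precision is the kind of quantity that feeds into the ESSS calculations of Section 2. A small sketch with hypothetical interim counts:

```python
def beta_binomial_posterior(successes, failures, a0=1.0, b0=1.0):
    """Posterior mean and precision of a success probability under a
    beta(a0, b0) prior; a0 = b0 = 1 is the uniform prior used here."""
    a, b = a0 + successes, b0 + failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, 1.0 / var  # precision is the reciprocal posterior variance

# Hypothetical interim counts for one primary-study arm: 12 of 25 responded.
post_mean, post_prec = beta_binomial_posterior(12, 13)
# post_mean = 13/27; borrowing models would combine counts across sources
```

Adding supplemental counts to a and b raises the posterior precision, which is what the ESSS then converts into an equivalent number of primary-study participants.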

In scenario 1 in Table 3, the group-sequential design using MEMs has a noticeable gain in power over a frequentist design with Pocock boundaries. Furthermore, the expected sample size is lower under dynamic borrowing. Our design again stops more often for superiority at all the interim analyses and reaches the maximum sample size much less often than a model that does not allow borrowing. In these simulation results, the MEMs with πe = 0.05 and πe = 0.1 appear to have an advantageous benefit/cost tradeoff; using these methods, there would likely be considerable savings in trial length along with gains in power. The constrained MEMs are also an appealing option, with a similar tradeoff versus unconstrained MEMs as seen above (smaller savings in expected sample size, but a larger gain in power). The results in scenarios 2 and 3 are similar and mostly suggest little type I error rate inflation. The results in scenario 4 show some reduction in the type I error rate under the "global" null scenario.

TABLE 3.

Simulation results: Binary response.

Method Scenario 1 (.2,.2) Scenario 2 (0,.2) Scenario 3 (0,.1) Scenario 4 (0,0)

Power or type I error Pocock 0.738 0.025 0.025 0.025
MEMs(0.05) 0.766 0.027 0.025 0.023
MEMs(0.1) 0.800 0.028 0.026 0.022
MEMs*(0.2) 0.829 0.027 0.028 0.019
MEMs*(0.5) 0.899 0.032 0.034 0.017

Expected sample size Pocock 143.0 197.5 197.5 197.5
MEMs(0.05) 141.5 197.5 198.0 198.0
MEMs(0.1) 136.5 197.5 198.0 198.0
MEMs*(0.2) 139.0 197.5 198.0 198.0
MEMs*(0.5) 139.0 197.5 198.0 198.5

P(Stop at 1st analysis) Pocock 0.15 0.01 0.01 0.01
MEMs(0.05) 0.16 0.01 0.01 0.01
MEMs(0.1) 0.18 0.01 0.01 0.01
MEMs*(0.2) 0.19 0.01 0.01 0.01
MEMs*(0.5) 0.18 0.01 0.01 0.01

P(Stop at 2nd analysis) Pocock 0.25 0.01 0.01 0.01
MEMs(0.05) 0.24 0.01 0.01 0.01
MEMs(0.1) 0.27 0.01 0.01 0.01
MEMs*(0.2) 0.23 0.01 0.01 0.01
MEMs*(0.5) 0.23 0.01 0.01 0.00

P(Stop at 3rd analysis) Pocock 0.18 0.01 0.01 0.01
MEMs(0.05) 0.21 0.01 0.01 0.00
MEMs(0.1) 0.21 0.01 0.01 0.00
MEMs*(0.2) 0.20 0.01 0.01 0.00
MEMs*(0.5) 0.21 0.01 0.01 0.00

P(Stop at 4th analysis) Pocock 0.42 0.97 0.97 0.97
MEMs(0.05) 0.39 0.97 0.97 0.98
MEMs(0.1) 0.34 0.97 0.97 0.98
MEMs*(0.2) 0.38 0.97 0.97 0.98
MEMs*(0.5) 0.38 0.97 0.97 0.99

True treatment effects for primary, secondary study are in parentheses for each scenario. Asterisk denotes truncated MEMs with ζi = 25 for both arms at each interim analysis.

We also produce a figure comparing the MEMs methods, a matched calibrated traditional method and an uncalibrated traditional method in terms of power, expected sample size and MSE in Figure 3. In this Figure, we see that the MEMs methods again yield encouraging results with possible large improvement in power even over the traditional methods calibrated to achieve the same type I error, along with generally the same or higher expected sample size and considerably lower MSE of the point estimate close to the point of exchangeability. We also note the impact of borrowing on MSE can vary depending on how close to the point of exchangeability the two studies are, which is a behavior consistent with results in previous literature24;38.

FIGURE 3.

FIGURE 3

Simulation results: binary response. MEMs(x) denotes MEMs method with x prior probability of exchangeability. Asterisk denotes truncated MEMs with ζi =25 for both arms at each interim analysis. Traditional (calib.) denotes a traditional no-borrowing analysis that is calibrated to achieve the same type I error rate as the corresponding MEM-based method. MSE=mean square error. In all plots the grey dashed line denotes the traditional method that controls type I error at the nominal rate of 0.025. Solid black vertical line denotes the point where sources are exchangeable

4.3 |. Linear Model Based MEMs

In this simulation, the response Y in each trial arm is again normally distributed, but we are interested in estimating the treatment effect as the difference in means of Y between the treatment and control groups after controlling for one important continuous predictor. We present 4 scenarios similar to those used above.

The simulation parameters described in Section 2.2.2 are set to β_{0,P} = β_{0,1} = 2, γ = 0.8, σ_P^2 = 5^2, and σ_1^2 = 4^2, and the covariate X is distributed as N(14, 2^2) in the primary source and N(12, 3^2) in the supplemental source. The regression parameter β_{T,P} is varied across simulation scenarios. The proportion of the variance in Y explained by the covariates is approximately 26%. In scenario 1, there is a treatment effect of 2 in both sources; in scenario 2, the treatment effects are 0 and 2 in the primary and secondary sources, respectively; and in scenario 3, the treatment effects are 0 and 1 in the primary and secondary sources, respectively.
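A minimal sketch of this data-generating mechanism, using the scenario-1 effects shown in Table 4 and a simple alternating 1:1 treatment assignment (the function and assignment scheme are illustrative, not the authors' implementation):

```python
import random

random.seed(2)

def generate_regression_data(n, beta0, beta_trt, gamma, sigma, x_mean, x_sd):
    """Rows (T, X, Y) with Y = beta0 + beta_trt*T + gamma*X + N(0, sigma^2);
    the treatment indicator T alternates to give a 1:1 allocation."""
    rows = []
    for i in range(n):
        t = i % 2
        x = random.gauss(x_mean, x_sd)
        y = beta0 + beta_trt * t + gamma * x + random.gauss(0.0, sigma)
        rows.append((t, x, y))
    return rows

# Scenario 1 per Table 4: treatment effect of 2 in both sources.
primary = generate_regression_data(200, 2.0, 2.0, 0.8, 5.0, 14.0, 2.0)
secondary = generate_regression_data(400, 2.0, 2.0, 0.8, 4.0, 12.0, 3.0)
```

The differing covariate distributions across sources are what the regression-based MEM must account for when assessing exchangeability of the treatment effects.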

As we can see from Table 4, we gain a large amount of power in scenario 1, with considerably lower expected sample size for all MEMs. In scenario 2, the type I error inflation of our method is more considerable for almost all MEMs methods presented, which means that the large benefits in scenario 1 come at a considerable price. This price could be diminished by tuning the MEM hyperparameters, but the benefits would be diminished as well. In scenario 3 we also see considerable type I error inflation. The unconstrained MEMs with πe = 0.05 seem to offer the most advantageous benefit-cost tradeoff overall. Truncated MEMs could again be an interesting option, with a different benefit-cost tradeoff. However, we can see that the rather large increases in “local” type I error rates in scenarios 2 and 3 go hand in hand with similarly noticeable reductions in type I error rates under the “global” null setting in scenario 4 when data borrowing is used.

TABLE 4.

Simulation results: regression-based MEMs.

Method Scenario 1 (2, 2) Scenario 2 (0, 2) Scenario 3 (0, 1) Scenario 4 (0, 0)

Power or type I error Pocock 0.708 0.026 0.026 0.026
MEMs(0.05) 0.810 0.043 0.038 0.017
MEMs(0.1) 0.852 0.065 0.050 0.015
MEMs*(0.2) 0.875 0.060 0.056 0.010
MEMs*(0.5) 0.937 0.109 0.115 0.010

Expected sample size Pocock 146.0 197.5 197.5 197.5
MEMs(0.05) 134.5 196.0 197.0 198.5
MEMs(0.1) 126.5 194.5 196.5 198.5
MEMs*(0.2) 138.0 196.5 197.5 199.0
MEMs*(0.5) 138.0 196.5 197.5 199.0

P(Stop at 1st analysis) Pocock 0.15 0.01 0.01 0.01
MEMs(0.05) 0.19 0.01 0.01 0.01
MEMs(0.1) 0.21 0.02 0.01 0.01
MEMs*(0.2) 0.18 0.01 0.01 0.01
MEMs*(0.5) 0.18 0.01 0.01 0.01

P(Stop at 2nd analysis) Pocock 0.22 0.01 0.01 0.01
MEMs(0.05) 0.27 0.01 0.01 0.01
MEMs(0.1) 0.31 0.02 0.01 0.01
MEMs*(0.2) 0.25 0.01 0.01 0.00
MEMs*(0.5) 0.25 0.02 0.01 0.00

P(Stop at 3rd analysis) Pocock 0.19 0.01 0.01 0.01
MEMs(0.05) 0.21 0.01 0.01 0.00
MEMs(0.1) 0.21 0.01 0.01 0.00
MEMs*(0.2) 0.20 0.01 0.01 0.00
MEMs*(0.5) 0.20 0.02 0.01 0.00

P(Stop at 4th analysis) Pocock 0.44 0.97 0.97 0.97
MEMs(0.05) 0.33 0.97 0.97 0.98
MEMs(0.1) 0.27 0.95 0.97 0.98
MEMs*(0.2) 0.37 0.97 0.97 0.99
MEMs*(0.5) 0.37 0.97 0.97 0.99

True treatment effects for primary, secondary study are in parentheses for each scenario. Asterisk denotes truncated MEMs with ζi = 25 for both arms at each interim analysis.

Lastly, we again produce a figure comparing the MEMs methods, a matched calibrated traditional method and an uncalibrated traditional method in terms of power, expected sample size and MSE in Figure 4. In this Figure, we see that the MEMs methods again yield encouraging results with improvement in power even over the traditional methods calibrated to achieve the same type I error, along with generally higher expected sample size and considerably lower MSE of the point estimate close to the point of exchangeability.

FIGURE 4.

FIGURE 4

Simulation results: regression-based MEMs. MEMs(x) denotes MEMs method with x prior probability of exchangeability. Asterisk denotes truncated MEMs with ζi =25 for both arms at each interim analysis. Traditional (calib.) denotes a traditional no-borrowing analysis that is calibrated to achieve the same type I error rate as the corresponding MEM-based method. MSE=mean square error. In all plots the grey dashed line represents the traditional method that controls type I error at the nominal rate of 0.025. Solid black vertical line denotes the point where sources are exchangeable

5 |. DISCUSSION

Clinical investigation of pediatric or rare diseases is challenging using conventional methods for trial design because such trials can be difficult to enroll47;48;49, meaning that an adequately powered stand-alone trial may not be feasible50. Databases facilitating real-world and clinical-trial evidence sharing are becoming more accessible, which presents a unique opportunity for approaches that augment trial data using data from other sources. Innovations in adaptive trial design are needed to take full advantage of these opportunities. Our method allows for adaptive borrowing in both the control and treatment arms at each interim analysis of a primary trial of interest. We illustrated sequential testing using MEMs to facilitate dynamic borrowing for a normally distributed endpoint, a binary endpoint, and linear regression. Our method attained a lower expected sample size than a frequentist analysis that does not allow borrowing, with greater power under the alternative. A lower expected sample size could lead to substantial cost savings in the primary trial. We also note that running the primary and secondary studies concurrently would allow the trials to share infrastructure, study personnel, and support, resulting in further cost savings above and beyond the reduced sample size. The drawbacks of our method manifest themselves under the null hypothesis (specifically the "local" null), where hyperparameters can be tuned to control the type I error rate inflation. However, type I error rates were consistently reduced below the nominal level under a "global" null setting, which is in line with previous literature39. The benefits and drawbacks are scenario-specific, which emphasizes the importance of extensive simulations to understand the operating characteristics in the setting at hand.

Borrowing can lead to inflated type I error rates under the “local” null setting, but our motivating example does not fit into the traditional drug approval setting. Instead, it seems prudent to combine data sources when answering the question of a policy impact. We showed that using our method could improve power, save sample size, as well as costs of the VLNC-CM trial as compared to the traditional analysis. Completing the VLNC-CM trial earlier would be desirable, considering the tobacco and nicotine replacement market is rapidly evolving and cutting edge research is needed to inform regulatory decisions.

We utilized results and theorems from previous literature to draw a parallel between frequentist group-sequential theory and Bayesian group-sequential methodology, which can be used to select decision boundaries and to perform power and sample size calculations in Bayesian designs that borrow supplemental data. We also proposed a truncated modification of the MEM method that can control the operating characteristics of the design over time. Traditionally, Bayesian models (especially model priors) must be calibrated to achieve desired frequentist trial operating characteristics, often by modifying the prior and running simulations across a variety of priors. Such "calibrated priors" to some extent attenuate the purpose of a Bayesian analysis, which is meant to accommodate subjective prior beliefs. Our truncated MEM method may allow the prior specification to be more detached from the frequentist operating characteristics, as the tuning parameter ζi adds another layer of flexibility. In our simulations, we demonstrated how truncated MEMs could help researchers align trial operating characteristics with their preferences (for example, when increased overall power is paramount and earlier stopping of the trial is less important, the truncated MEMs presented could be a better option).

We considered normally distributed and binary endpoints, as well as a linear regression analysis. Future work may extend the regression-based approach to other outcome types, such as time-to-event. Although we described the regression-based method for a linear model, it extends readily, for example, to generalized linear models. We only considered early stopping for superiority; a futility rule could be developed based on the conditional power to detect a treatment effect at full enrollment given all data (both primary and supplemental) available at the time of the interim analysis. Finally, one could consider an adaptive randomization extension that shifts the randomization ratio in favor of the treatment group with the smaller ESSS, as less information is available for that group51.
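The futility rule suggested above could be based on the standard group-sequential conditional power formula36: the probability that the final statistic crosses the boundary, given the interim z-statistic, the information accrued, and an assumed drift θ. The sketch below implements the generic formula; in our setting the interim statistic and information levels would be those of the borrowing analysis (primary plus supplemental data), which this sketch does not model.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_power(z_k, info_k, info_K, theta, z_crit=1.96):
    """Probability that the final z-statistic exceeds z_crit, given the
    interim statistic z_k observed at information info_k (< info_K),
    assuming the true standardized effect (drift) is theta."""
    delta_info = info_K - info_k
    # The final score statistic equals z_k * sqrt(info_k) plus an
    # independent N(theta * delta_info, delta_info) increment.
    num = z_k * math.sqrt(info_k) + theta * delta_info - z_crit * math.sqrt(info_K)
    return normal_cdf(num / math.sqrt(delta_info))
```

A futility rule would stop enrollment when this probability, evaluated at the design alternative, falls below a small threshold such as 0.10.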

In conclusion, our method showed several benefits over the standard method under the alternative hypothesis. Borrowing supplemental information is worth considering in settings such as pediatric or rare disease trials, where the primary trial will often have a small sample size that naturally calls for methods incorporating supplemental data. In these cases, our method can provide a favorable tradeoff between precision and type I error rate, and could be used to address both the cost and ethical challenges of such trials.

Supplementary Material

Supinfo

Acknowledgements

This research was partially funded by NIH under grants T32-HL129956 from the National Heart, Lung, and Blood Institute, P30-CA077598, R01-CA214824, and R01-CA225190 from the National Cancer Institute, and R03-DA041870, R01-DA046320 and U54-DA031659 from the National Institute on Drug Abuse and FDA Center for Tobacco Products (CTP). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or Food and Drug Administration Center for Tobacco Products.

Funding information

Research reported in this publication was supported in part by the National Heart, Lung, and Blood Institute (Award Number T32HL129956), National Cancer Institute (Awards Numbers P30-CA077598, R01-CA214824, and R01-CA225190), and the National Institute on Drug Abuse (R03-DA041870, R01-DA046320 and U54-DA031659). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or Food and Drug Administration Center for Tobacco Products.

Footnotes

Supporting Information

Data sharing not applicable to this article as no datasets were generated or analysed during the current study. Code used to yield the presented simulation results can be found at https://github.com/akotalik/groupseqMEM. Additional supporting information may be found online in the Supporting Information section at the end of this article.

References

1. Berry DA. Adaptive clinical trials in oncology. Nature Reviews Clinical Oncology 2012; 9(4): 199–207.
2. Chow SC, Chang M. Adaptive Design Methods in Clinical Trials. CRC Press. 2011.
3. Food and Drug Administration. Adaptive Design Clinical Trials for Drugs and Biologics. Draft Guidance 2010.
4. Food and Drug Administration. Master Protocols: Efficient Clinical Trial Design Strategies to Expedite Development of Oncology Drugs and Biologics. Guidance for Industry (Draft Guidance) 2018.
5. Kesselheim AS, Avorn J. New "21st Century Cures" legislation: speed and ease vs science. JAMA 2017; 317(6): 581–582.
6. Dickson M, Gagnon JP. Key factors in the rising cost of new drug discovery and development. Nature Reviews Drug Discovery 2004; 3(5): 417–429.
7. DiMasi JA, Grabowski HG, Hansen RW. Innovation in the pharmaceutical industry: New estimates of R&D costs. Journal of Health Economics 2016; 47: 20–33.
8. Collier R. Rapidly rising clinical trial costs worry researchers. CMAJ: Canadian Medical Association Journal 2009; 180(3): 277–278.
9. DiMasi JA, Hansen RW, Grabowski HG. The price of innovation: New estimates of drug development costs. Journal of Health Economics 2003; 22(2): 151–185.
10. Freedman B. Equipoise and the ethics of clinical research. New England Journal of Medicine 1987; 317(3): 141–145.
11. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons. 2004.
12. Berry S, Carlin B, Lee J, Müller P. Bayesian Adaptive Methods for Clinical Trials. Vol. 38 of Chapman & Hall/CRC Biostatistics Series. CRC Press. 2010.
13. Viele K, Berry S, Neuenschwander B, et al. Use of historical control data for assessing treatment effects in clinical trials. Pharmaceutical Statistics 2014; 13(1): 41–54.
14. Hobbs BP, Carlin BP, Mandrekar SJ, Sargent DJ. Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics 2011; 67(3): 1047–1056.
15. Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter DJ. Summarizing historical information on controls in clinical trials. Clinical Trials: Journal of the Society for Clinical Trials 2010; 7(1): 5–18.
16. Jin H, Yin G. Unit information prior for adaptive information borrowing from multiple historical datasets. arXiv preprint arXiv:2102.00796 2021.
17. Han B, Zhan J, John Zhong Z, Liu D, Lindborg S. Covariate-adjusted borrowing of historical control data in randomized clinical trials. Pharmaceutical Statistics 2017; 16(4): 296–308.
18. Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: a comparative study. Statistics in Medicine 1995; 14(24): 2685–2699.
19. Browne WJ, Draper D. A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis 2006; 1(3): 473–514.
20. Chen MH, Ibrahim JG, Lam P, Yu A, Zhang Y. Bayesian design of noninferiority trials for medical devices using historical data. Biometrics 2011; 67(3): 1163–1170.
21. Chen N, Carlin BP, Hobbs BP. Web-based statistical tools for the analysis and design of clinical trials that incorporate historical controls. Computational Statistics & Data Analysis 2018; 127: 50–68.
22. Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 2006; 1(3): 515–533.
23. Spiegelhalter DJ. Bayesian methods for cluster randomized trials with continuous responses. Statistics in Medicine 2001; 20(3): 435–452.
24. Kaizer AM, Koopmeiners JS, Hobbs BP. Bayesian hierarchical modeling based on multisource exchangeability. Biostatistics 2018; 19(2): 169–184.
25. Hobbs BP, Landin R. Bayesian basket trial design with exchangeability monitoring. Statistics in Medicine 2018; 37(25): 3557–3572.
26. Kane MJ, Chen N, Kaizer AM, Jiang X, Xia HA, Hobbs BP. Analyzing basket trials under multisource exchangeability assumptions. arXiv preprint arXiv:1908.00618 2019.
27. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: A tutorial. Statistical Science 1999; 14(4): 382–417.
28. Kaizer AM, Koopmeiners JS, Chen N, Hobbs BP. Statistical design considerations for trials that study multiple indications. arXiv preprint arXiv:2007.03792 2020.
29. Spindel ER, McEvoy CT. The role of nicotine in the effects of maternal smoking during pregnancy on lung development and childhood respiratory disease: Implications for dangers of e-cigarettes. American Journal of Respiratory and Critical Care Medicine 2016; 193(5): 486–494.
30. Abouk R, Adams S. Bans on electronic cigarette sales to minors and smoking among high school students. Journal of Health Economics 2017; 54: 17–24.
31. Korfei M. The underestimated danger of e-cigarettes - also in the absence of nicotine. Respiratory Research 2018; 19(1).
32. Hatsukami DK, Luo X, Dick L, et al. Reduced nicotine content cigarettes and use of alternative nicotine products: exploratory trial. Addiction 2017; 112(1): 156–167.
33. Kaizer AM, Hobbs BP, Koopmeiners JS. A multi-source adaptive platform design for testing sequential combinatorial therapeutic strategies. Biometrics 2018; 74: 1082–1094.
34. Neath AA, Cavanaugh JE. The Bayesian information criterion: Background, derivation, and applications. Wiley Interdisciplinary Reviews: Computational Statistics 2012; 4(2): 199–203.
35. Shi H, Yin G. Control of type I error rates in Bayesian sequential designs. Bayesian Analysis 2019; 14(2): 399–425.
36. Jennison C, Turnbull B. Group Sequential Methods with Applications to Clinical Trials. CRC Press. 1999.
37. Dudley RM, Haughton D. Asymptotic normality with small relative errors of posterior probabilities of half-spaces. Annals of Statistics 2002; 30(5): 1311–1344.
38. Kotalik A, Vock DM, Donny EC, Hatsukami DK, Koopmeiners JS. Dynamic borrowing in the presence of treatment effect heterogeneity. Biostatistics 2020.
39. Kaizer AM, Koopmeiners JS, Kane MJ, Roychoudhury S, Hong DS, Hobbs BP. Basket designs: Statistical considerations for oncology trials. JCO Precision Oncology 2019; 3: 1–9.
40. Yin G. Clinical Trial Design: Bayesian and Frequentist Adaptive Methods. Vol. 876. John Wiley & Sons. 2012.
41. Anderson KM. gsDesign: An R Package for Designing Group Sequential Clinical Trials, Version 2.3 Manual. 2011.
42. Morita S, Thall PF, Müller P. Determining the effective sample size of a parametric prior. Biometrics 2008; 64(2): 595–602.
43. Karpecki P, DePaolis M, Hunter JA, et al. Besifloxacin ophthalmic suspension 0.6% in patients with bacterial conjunctivitis: A multicenter, prospective, randomized, double-masked, vehicle-controlled, 5-day efficacy and safety study. Clinical Therapeutics 2009; 31(3): 514–526.
44. Silverstein BE, Allaire C, Bateman KM, Gearinger LS, Morris TW, Comstock TL. Efficacy and tolerability of besifloxacin ophthalmic suspension 0.6% administered twice daily for 3 days in the treatment of bacterial conjunctivitis: A multicenter, randomized, double-masked, vehicle-controlled, parallel-group study in adults and children. Clinical Therapeutics 2011; 33(1): 13–26.
45. Hobbs BP, Chen N, Lee JJ. Controlled multi-arm platform design using predictive probability. Statistical Methods in Medical Research 2018; 27(1): 65–78.
46. Murthy S, Fontela P, Berry S. Incorporating adult evidence into pediatric research and practice: Bayesian designs to expedite obtaining child-specific evidence. JAMA 2021; 325(19): 1937–1938.
47. Augustine EF, Adams HR, Mink JW. Clinical trials in rare disease: Challenges and opportunities. Journal of Child Neurology 2013; 28(9): 1142–1150.
48. Bavdekar SB. Pediatric clinical trials. Perspectives in Clinical Research 2013; 4(1): 89.
49. Tishler CL, Reiss NS. Pediatric drug-trial recruitment: Enticement without coercion. Pediatrics 2011; 127(5): 949–954.
50. Lilford RJ, Thornton JG, Braunholtz D. Clinical trials and rare diseases: A way out of a conundrum. BMJ 1995; 311(7020): 1621.
51. Hobbs BP, Carlin BP, Sargent DJ. Adaptive adjustment of the randomization ratio using historical control data. Clinical Trials 2013; 10(3): 430–440.
