Comparative Effectiveness Research using Bayesian Adaptive Designs for Rare Diseases: Response Adaptive Randomization Reusing Participants

Fengming Tang; Byron J Gajewski

doi:10.1080/19466315.2021.1961854

Author manuscript; available in PMC: 2024 Jan 1.

Published in final edited form as: Stat Biopharm Res. 2021 Aug 31;15(1):154–163. doi: 10.1080/19466315.2021.1961854


PMCID: PMC9979780 NIHMSID: NIHMS1872894 PMID: 36875290

Abstract

Slow accrual is a major challenge in clinical trials for rare diseases and has been identified as the most frequent reason for clinical trials to fail. This challenge is amplified in comparative effectiveness research, where multiple treatments are compared to identify the best one. Novel, efficient clinical trial designs are urgently needed in these areas. Our proposed response adaptive randomization (RAR) design that reuses participants mimics real-world clinical practice, in which patients switch treatments when the desired outcome is not achieved. The proposed design increases efficiency through two strategies: 1) allowing participants to switch treatments, so that each participant can contribute more than one observation and participant-specific variability can be controlled for to increase statistical power; and 2) using RAR to allocate more participants to the promising arms, making the study both more ethical and more efficient. Extensive simulations showed that, compared with trials where each participant receives one treatment, the proposed RAR design that reuses participants can achieve comparable power with a smaller sample size and a shorter trial duration, especially when the accrual rate is low. The efficiency gain decreases as the accrual rate increases.

Keywords: Comparative Effectiveness Research, Bayesian Adaptive Model, hierarchical models, response adaptive randomization

1. Introduction

One of the biggest challenges in designing clinical trials for rare diseases is a slow accrual rate. Recent studies show that slow accrual is a significant hurdle in advancing the translation of clinical discoveries (Tang et al., 2017), and poor patient accrual has been identified as the most frequent reason for clinical trials to be classified as “fail to complete” (Stensland et al., 2014). This challenge is amplified in comparative effectiveness research, where multiple treatments are compared to identify the one that works best for improving health. Investigators frequently have to reduce the number of arms because sufficient patients cannot be enrolled within a reasonable study duration. Novel, efficient clinical trial designs are urgently needed in these areas.

In conventional parallel randomized clinical trial designs, participants are randomized to one study treatment and each participant contributes one observation regardless of the participant’s outcome. However, in real world clinical practice, patients often switch therapies if the desired outcome is not achieved. This motivated our proposal of reusing participants in a clinical trial design. In our design, participants are randomized to one study drug as the initial treatment. If the participant responds to the initial treatment, then the participant completes the study and no more treatment will be assigned to the participant. On the other hand, if the participant does not respond to the initial treatment, the participant will be assigned a new treatment from the remaining therapies. This process is repeated until either the desired treatment outcome is achieved, or all study treatments are given to the participant. The advantage of the proposed design is that it mimics the real-world clinical practice and it can achieve the desired power with fewer participants.

The proposed reusing participants design is an extension of the two-arm crossover trial for an absorbing binary endpoint proposed by Nason and Follmann (2010). An absorbing binary endpoint is an outcome that cannot be repeated in the second period if it occurs in the first period, such as mortality, or pregnancy in infertility studies. In our proposed design, responding to a treatment is analyzed as an absorbing binary endpoint, and participants do not switch to a new treatment once the desired outcome is achieved. It is more ethical than the conventional crossover design, which requires participants to receive all candidate treatments in sequence (Wellek et al., 2012) and therefore forces participants who receive an effective treatment first to cross over to ineffective treatments.

With efficiency in mind, we further improve the reusing participants design by employing a Bayesian adaptive design. Bayesian adaptive designs are broadly accepted as a way to increase efficiency, reduce duration, and provide more ethical clinical trials (Berry et al., 2011). The basic idea is to use response adaptive randomization (RAR) to assign more participants to the most promising arms by updating the randomization probabilities at interim analyses. The reusing participants RAR design starts by randomizing participants with equal probability to one of the study treatments. As enrollment continues, interim analyses are performed according to a pre-specified schedule. The data available at each interim analysis are used to calculate the posterior probability that each treatment is the most effective, and these probabilities are then used to update the RAR allocation rates for future participants. It is worth noting that, to avoid overly complicating trial operations, RAR applies only to the initial treatment of each participant. Once the initial treatment is determined for a participant, the order of subsequent treatments is determined by sampling without replacement from the remaining study treatments. Participants receive treatments until they achieve the desired outcome or until they have gone through all the study treatments.

We compare this design, called Reuse-RAR, with a conventional adaptive clinical trial design (Conventional-RAR) in which each participant is randomized to one treatment only, such as the design described by Gajewski et al. (2015). In addition, we compare the Reuse-RAR design with a non-adaptive design that reuses participants (Reuse-noRAR) and a non-adaptive design that does not reuse participants (Conventional-noRAR). The remainder of this article is arranged as follows. In section 2.1, we describe the motivating study and give an overall summary of each of the four designs in the context of that study. In section 2.2, we describe the statistical models for designs that reuse participants (i.e., Reuse-noRAR and Reuse-RAR) and designs that do not reuse participants (i.e., Conventional-RAR and Conventional-noRAR) separately. Sections 2.3–2.9 cover accrual rate patterns, the interim analysis schedule, response adaptive randomization, virtual response rates, success criteria and model calibration, carryover and period effects, and simulations. They are applied to both the Reuse-RAR and Conventional-RAR designs to ensure a fair comparison. Extensive simulations are used to compare operating characteristics of the designs, including power, study duration, and number of participants required. The results are summarized in section 3. In section 4, we draw conclusions from our analysis and discuss the advantages and limitations of our proposed Reuse-RAR design. Section 5 covers discussion and future work.

2. Methods

2.1. Trial summary

To illustrate the method, we use the setting of Patient Assisted Intervention for Neuropathy: comparison of Treatment in Real Life Situations (PAIN-CONTRoLS) (Gajewski et al., 2015; Brown et al., 2016), a comparative effectiveness clinical trial studying four treatments for cryptogenic sensory polyneuropathy (CSPN). CSPN, also known as idiopathic polyneuropathy, is a diagnosis made when all known causes of neuropathy have been ruled out. Although CSPN accounts for 10–30% of all polyneuropathy cases (Pasnoor et al., 2013), very few trials have been conducted to study treatments for CSPN. There is an urgent need for evidence-generating trials to guide physicians treating CSPN patients (Brown et al., 2016). PAIN-CONTRoLS is one of the first such trials. The primary endpoint is evaluated using the visual analog scale (VAS) pain score (Burchhardt et al., 2003). A subject is considered a responder if the VAS score drops by 50% or more after being on a treatment for 12 weeks. The goal of the study is to identify which drug is the most effective in reducing pain with the fewest side effects. Although the actual PAIN-CONTRoLS trial had four arms, we consider a “what-if” trial with five arms, and for simplicity we assume a binary endpoint rather than the trinary endpoint used in PAIN-CONTRoLS. Below is a summary of the four designs (Conventional-noRAR, Conventional-RAR, Reuse-noRAR, and Reuse-RAR) in the context of the PAIN-CONTRoLS trial.

In the Reuse-RAR design, participants are randomized to one of the five treatments as their initial treatment. At first the study uses equal randomization, which is then updated using RAR after the first interim analysis. The order of the subsequent treatment assignments is determined by sampling without replacement from the four remaining treatments. After 12 weeks, depending on the VAS score measurements, a participant is either given the next treatment in line, if the desired effect is not achieved, or considered a responder and completes the trial. Each participant can have multiple observations (between one and five).

In the Conventional-RAR design, participants are randomized to one of the five treatments using the same randomization scheme as Reuse-RAR; however, no additional treatments are assigned beyond the first treatment. Each participant can have only one observation.

In both Reuse-RAR design and Conventional-RAR design, interim analyses will be performed according to a pre-specified schedule. At each interim, all current data will be analyzed, and the treatment allocation rates will be updated so that more participants will be allocated to the arm with the maximal effect.

In the Reuse-noRAR design, participants are randomized to one of the five treatments as their initial treatment using equal randomization. The order of the subsequent treatment assignments is determined by sampling without replacement from the four remaining treatments. No interim analyses will be performed, and the allocation rates will stay the same for the whole study. Like the Reuse-RAR design, each participant can have one to five observations.

In the Conventional-noRAR design, participants are randomized to one of the five treatments using equal allocation rates. Each participant will have one observation. No interim analysis will be performed.

2.2. Statistical models

In all four designs, we assume the five treatments are not ordered in any explicit manner.

In the Conventional-RAR and Conventional-noRAR designs, each participant has exactly one observation. These designs use an independent logistic model, described in section 2.2.1.

In the Reuse-RAR and Reuse-noRAR designs, each participant can have more than one observation. Observations from the same participant are correlated due to participant variation (or participant disease severity), which is modeled by including participant as a random effect in a hierarchical logistic model (also known as a generalized linear mixed model). A normal, hierarchical prior on the logit scale is used for the participant effect. This approach is similar to that of Nason and Follmann (2010), where participant variation was modeled using a Beta distribution. We further assume that the carryover effect is consistent across participants and treatments, so a single carryover factor is used to model the amount of effect carried over from the previous period. A period effect can also be incorporated. Model details are described in section 2.2.2.

2.2.1. Independent logistic model (for Conventional-RAR design and Conventional-noRAR design)

Let x_i be a 5-element binary vector indicating the treatment participant i received. For example, x_i = (0,0,1,0,0) indicates that participant i received the 3rd treatment. Let y_i be the binary outcome variable (0 for non-responder, 1 for responder). We assume y_i follows a Bernoulli(p_i) distribution, where

$$\mathrm{logit}(p_{i}) = x_{i} β$$

β = {β₁, β₂, β₃, β₄, β₅} is a 5-element vector denoting the treatment effects on the logit scale, and $θ_{j} = \frac{\exp(β_{j})}{1 + \exp(β_{j})}$ is the probability of being a responder for a participant who received treatment j. A vague normal prior, N(0, 5²), is assigned to each β_j. When transformed back to the probability scale using the anti-logit function, the vague prior gives a 95% equal-tailed interval of (0.001, 0.999). Hamiltonian Monte Carlo (Betancourt; Gelman et al., 2014) is used to obtain the posterior distribution for {β₁, β₂, β₃, β₄, β₅|y}. The best arm is defined as j_max = arg max_{j∈{1,2,3,4,5}}(β_j). The probability of being the best arm for arm j is denoted as prob(j = j_max|y).
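As a minimal sketch of how prob(j = j_max|y) can be estimated once posterior draws of β are available, the Python snippet below counts how often each arm has the largest β_j across draws. The function name `prob_best_arm` and the toy draws are illustrative assumptions; in the paper the draws come from Hamiltonian Monte Carlo in Stan.

```python
from typing import List, Sequence


def prob_best_arm(draws: Sequence[Sequence[float]]) -> List[float]:
    """Estimate Pr(j = j_max | y) as the share of posterior draws in which
    arm j has the largest treatment effect beta_j (ties go to the lower index)."""
    n_arms = len(draws[0])
    wins = [0] * n_arms
    for beta in draws:
        # index of the arm with the largest effect in this draw
        best = max(range(n_arms), key=lambda j: beta[j])
        wins[best] += 1
    return [w / len(draws) for w in wins]


# Toy posterior draws of (beta_1, ..., beta_5); hypothetical values, not trial data.
toy_draws = [
    (-0.9, -0.8, -0.7, -0.4, 0.1),
    (-0.8, -0.9, -0.6, 0.0, -0.1),
    (-1.0, -0.7, -0.8, -0.3, 0.2),
    (-0.7, -0.8, -0.9, -0.5, 0.0),
]
print(prob_best_arm(toy_draws))
```

Because the logit transform is monotone, ranking arms by β_j gives the same best arm as ranking by θ_j.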

2.2.2. Hierarchical logistic model (for Reuse-RAR design and Reuse-noRAR design)

Let T_i be the number of periods for participant i and let t ∈ {1, 2, …, T_i} denote the period index. Let x_it be a 5-element binary vector indicating the treatment participant i received during period t. For example, x_it = (0,0,1,0,0) indicates that participant i received the 3rd treatment during period t. y_it is the binary outcome variable for participant i during period t (0 for non-responder and 1 for responder), and follows a Bernoulli(p_it) distribution, where

$$\mathrm{logit}(p_{i t}) = x_{i t} β + ϵ_{i}$$

β = {β₁, β₂, β₃, β₄, β₅} is a 5-element vector denoting the treatment effects for the 5 treatments on the logit scale. ϵ_i denotes the participant-specific effect (i.e., participant disease severity) on the logit scale and follows a normal distribution: $ϵ_{i} \sim N(0, σ_{ϵ}^{2})$. For priors, an independent normal distribution N(0,5²) is used for β_j and a truncated normal distribution N(0,3²) is used for $σ_{ϵ}^{2}$.

To accommodate the carryover effect, the model can be expanded as follows,

$$\mathrm{logit}(p_{i t}) = \begin{cases} x_{i t} β + ϵ_{i}, & t = 1 \\ x_{i t} β + π \, x_{i (t - 1)} β + ϵ_{i}, & t > 1 \end{cases}$$

where π is the carryover factor, which models the proportion of treatment effect that persists from one treatment to the next. A prior of N(0,0.5²) is used for π.

In cases where study period has important impact on treatment outcome, we can further expand the model as follows,

$$\mathrm{logit}(p_{i t}) = \begin{cases} x_{i t} β + ϵ_{i}, & t = 1 \\ x_{i t} β + π \, x_{i (t - 1)} β + f(t) + ϵ_{i}, & t > 1 \end{cases}$$

where f(t) is a function of t, which can be chosen to model a potential period effect. It can be a polynomial function or a function representing flexible cubic splines. For example, a linear function is a reasonable choice for the PAIN-CONTRoLS study due to the small number of periods,

$$f(t) = γ_{1} t$$

where γ₁ is the regression coefficient. A vague N(0,5²) prior is used for γ₁.

Hamiltonian Monte Carlo is used to obtain the posterior distribution for the model parameters: {β₁, β₂, β₃, β₄, β₅, π, γ₁ | y₁, y₂, …, y_n}. The best arm is defined as j_max = arg max_{j∈{1,2,3,4,5}}(β_j). The probability of being the best arm for arm j is denoted as prob(j = j_max | y).
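To make the expanded model concrete, the following Python sketch evaluates p_it for a given participant and period, including the carryover term and the linear period effect f(t) = γ₁t. The function name, argument names, and 0-based arm indexing are assumptions made for illustration; this only evaluates the linear predictor and is not the Stan model used for inference.

```python
import math
from typing import Sequence


def response_probability(
    beta: Sequence[float],
    treatments: Sequence[int],
    t: int,
    eps_i: float,
    carryover: float = 0.0,
    gamma1: float = 0.0,
) -> float:
    """Return p_it for period t (1-based) under the expanded model.

    treatments[k] is the 0-based arm index received in period k + 1.
    """
    # treatment effect of the current arm plus the participant effect
    eta = beta[treatments[t - 1]] + eps_i
    if t > 1:
        # a fraction pi of the previous period's treatment effect carries over,
        # and the linear period effect f(t) = gamma_1 * t is added
        eta += carryover * beta[treatments[t - 2]] + gamma1 * t
    return 1.0 / (1.0 + math.exp(-eta))
```

Setting `carryover=0.0` and `gamma1=0.0` recovers the basic hierarchical model logit(p_it) = x_it β + ϵ_i.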

2.3. Accrual rate patterns

We assume participant accrual follows a Poisson distribution. To investigate the operating characteristics of the trial designs, we run simulations using four different accrual rates (λ): 1.5, 3, 4.5, and 6 participants per week. If a participant is reused in a trial, we assume no waiting time between their last visit and randomization to the next study drug.
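One way to simulate such accrual, sketched below in Python, is to draw exponential gaps between consecutive enrollments, which is equivalent to Poisson-distributed weekly counts. The function name and the seed are illustrative assumptions; the paper's own simulations were written in R.

```python
import random
from typing import List


def simulate_arrival_times(rate_per_week: float, n_participants: int,
                           rng: random.Random) -> List[float]:
    """Enrollment times (in weeks) for a Poisson accrual process.

    Weekly counts are Poisson(rate_per_week) exactly when the gaps between
    consecutive enrollments are independent Exponential(rate_per_week).
    """
    times, now = [], 0.0
    for _ in range(n_participants):
        now += rng.expovariate(rate_per_week)
        times.append(now)
    return times


rng = random.Random(2021)  # arbitrary seed, for reproducibility only
for lam in (1.5, 3.0, 4.5, 6.0):
    last = simulate_arrival_times(lam, 900, rng)[-1]
    print(f"rate {lam}/week: 900th participant enrolled at week {last:.0f}")
```

On average the 900th participant enrolls at about 900/λ weeks, which is why enrollment dominates trial duration at low accrual rates.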

2.4. Interim analysis schedule

For the Reuse-RAR design, each participant can have multiple observations (between 1 and 5) with each observation corresponding to a treatment the participant received. We will use number of observations initiated instead of number of participants enrolled to describe sample size. The interim analyses will be conducted when 300, 500, and 700 observations are initiated. Only observations with assessable endpoints (being on a treatment for 12 weeks and with an additional 4 weeks of lag for collecting and observing endpoints) will be included in the interim analyses. A final analysis will be conducted when 900 observations are assessable. Conventional-RAR design will have the same interim and final analysis schedule but in terms of participants enrolled. For the Reuse-noRAR and Conventional-noRAR design, no interim analysis will be performed. A final analysis will be conducted after assessable endpoint is available for 900 observations in Reuse-noRAR design and 900 participants in Conventional-noRAR design.
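The schedule above can be expressed as two small helper functions, shown here as a hedged Python sketch. The constants restate the numbers in this section; the function names and the convention of tracking observation start times in weeks are assumptions for illustration.

```python
from typing import Iterable, List

INTERIM_LOOKS = (300, 500, 700)  # observations initiated at each interim
TREATMENT_WEEKS = 12             # time on treatment before the endpoint
LAG_WEEKS = 4                    # extra lag for collecting the endpoint


def assessable(start_weeks: Iterable[float], now: float) -> List[float]:
    """Start times of observations whose endpoint is available at time `now`."""
    return [s for s in start_weeks if s + TREATMENT_WEEKS + LAG_WEEKS <= now]


def interim_due(n_initiated: int, looks_done: int) -> bool:
    """True when the next scheduled interim look has been reached."""
    return (looks_done < len(INTERIM_LOOKS)
            and n_initiated >= INTERIM_LOOKS[looks_done])
```

At an interim look, only the observations returned by `assessable` would enter the model, so the analysis always lags enrollment by at least 16 weeks.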

2.5. Response Adaptive Randomization (RAR)

For Conventional-RAR and Reuse-RAR, the randomization probabilities are updated at each interim analysis to allocate more future participants to the most promising arms. There are many choices of formula for the randomization probabilities. For example, one choice is to make them proportional to the posterior probabilities that the arms have the maximum effect (i.e., Pr(j = j_max)). We use the information formula for RAR allocations (Gajewski et al., 2015), $V_{j} = \sqrt{\frac{\Pr(j = j_{max}) \, Var(θ_{j})}{n_{j} + 1}}$, where n_j is the number of participants whose initial treatment is drug j, Var(θ_j) is the sample variance of θ_j|y_j, and $\frac{Var(θ_{j})}{n_{j} + 1}$ is the expected change in variance (a proxy for information gained). This approach balances the goal of randomizing to the arm with the maximum effect against the desire to gain new information by allocating to underexplored arms.
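A minimal Python sketch of this allocation rule follows. It computes V_j for each arm and rescales the values to sum to one; that rescaling, the equal-allocation fallback when all V_j are zero, and the function name are assumptions made here for illustration. The inputs would come from the posterior at an interim analysis.

```python
import math
from typing import List, Sequence


def rar_allocation(prob_best: Sequence[float], var_theta: Sequence[float],
                   n_initial: Sequence[int]) -> List[float]:
    """Allocation probabilities proportional to
    V_j = sqrt(Pr(j = j_max) * Var(theta_j) / (n_j + 1))."""
    v = [math.sqrt(p * s2 / (n + 1))
         for p, s2, n in zip(prob_best, var_theta, n_initial)]
    total = sum(v)
    if total == 0.0:
        # degenerate case (assumption): fall back to equal allocation
        return [1.0 / len(v)] * len(v)
    return [vj / total for vj in v]


# Hypothetical interim summaries for five arms, not results from the paper.
print(rar_allocation([0.05, 0.05, 0.10, 0.20, 0.60], [0.002] * 5, [60] * 5))
```

The square root tempers the adaptation: an arm with three times the probability of being best receives only about 1.7 times the allocation when variances and sample sizes are equal.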

2.6. Virtual response rate

The virtual response rate is the true efficacy rate used to generate participant responses in simulations. We label $θ^{T} = (θ_{1}^{T}, θ_{2}^{T}, θ_{3}^{T}, θ_{4}^{T}, θ_{5}^{T})$ the virtual response rates for the five study treatments. If we assume there is no participant variation, which means the participant outcome is determined solely by the treatment received, the sampling distribution of the participant outcome is $y_{i t} \sim Bernoulli(θ_{j}^{T})$, where j represents the treatment participant i received during period t. For the purpose of the study, we investigated three scenarios. The first scenario assumes θ^T = (0.2, 0.2, 0.2, 0.2, 0.2), where all treatments are equivalent. In a comparative effectiveness setting, the scenario in which all treatments are equivalent is the null scenario, denoted by H₀. The second scenario assumes θ^T = (0.3, 0.3, 0.3, 0.4, 0.5), where treatment 5 is the most effective and treatment 4 is the second most effective. We denote it by H₁. The third scenario assumes θ^T = (0.3, 0.3, 0.3, 0.5, 0.5), where treatment 5 and treatment 4 are equally effective. We denote it by H₂. H₁ and H₂ are the two alternative scenarios.

In the real world, it is not realistic to assume there is no participant variation. Observations from the same participant are usually more alike than those from different participants. We assume that, on the logit scale, participant variation follows a normal distribution: $ϵ_{i}^{T} \sim N(0, σ_{ϵ^{T}}^{2})$. The sampling distribution of the participant outcome then becomes $y_{i t} \sim Bernoulli(\mathrm{logit}^{-1}(\mathrm{logit}(θ_{j}^{T}) + ϵ_{i}^{T}))$. We use $σ_{ϵ^{T}}^{2} = 0.25$ in simulations, which translates to an intraclass correlation coefficient (ICC) of 0.07.
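The quoted ICC is consistent with the latent-variable formulation commonly used for random-intercept logistic models, in which the residual variance on the logit scale is fixed at π²/3 ≈ 3.29. Here π denotes the mathematical constant, not the carryover factor. Assuming that this is the definition intended, the arithmetic is:

```latex
\mathrm{ICC}
  = \frac{\sigma_{\epsilon^{T}}^{2}}{\sigma_{\epsilon^{T}}^{2} + \pi^{2}/3}
  = \frac{0.25}{0.25 + 3.29}
  \approx 0.07
```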

2.7. Success criteria and Model calibration

At the final analysis, an arm may be declared superior if its posterior probability of being the best arm meets a pre-specified success criterion, i.e. Pr(j = j_max) > δ. Type I error is the proportion of simulations that meet the success threshold in null scenarios (U.S. Department of Health and Human Services Food and Drug Administration, 2019). Power is the proportion of simulations that meet the success threshold in alternative scenarios. Below we will discuss how to prespecify the success thresholds (δ).

To make the designs comparable, success thresholds (δ) are chosen to achieve similar type I error rates across designs using simulations in null scenarios. For example, Figure 1 plots the proportion of successes (i.e., type I error) against the threshold (δ) based on simulations using the Conventional-noRAR design in the null scenario when λ = 1.5. As the threshold increases, the proportion of simulations that meet the success criterion (i.e., type I error) decreases. When the threshold is 0.829, the type I error rate is roughly 4.9%. Using the same method, we identified δ to be 0.829, 0.794, 0.832, and 0.827 for Conventional-noRAR, Conventional-RAR, Reuse-noRAR, and Reuse-RAR, respectively. It is worth noting that Conventional-RAR has a lower cutoff than Conventional-noRAR (0.794 vs. 0.829) and Reuse-RAR has a lower cutoff than Reuse-noRAR (0.827 vs. 0.832). The reason is that, in RAR designs, when one arm has a high response rate early in the study due to random variation, more participants are assigned to that arm and its observed response rate regresses toward the actual rate; hence it is less likely to observe simulations with an extremely high prob(j = j_max). Along the same lines, Conventional-RAR has a much lower cutoff than Reuse-RAR (0.794 vs. 0.827) because Reuse-RAR adapts much less aggressively than Conventional-RAR in two respects: (1) Reuse-RAR runs much faster than Conventional-RAR and has much less time to adapt; (2) RAR applies only to the first treatment of each participant in Reuse-RAR, whereas it applies to all observations in Conventional-RAR.
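The calibration step can be sketched in Python as a grid search over δ. The sketch assumes each null-scenario simulation has been summarized by its largest posterior probability of being best, max_j prob(j = j_max|y), so a simulated trial declares success exactly when that value exceeds δ. The function names, the 0.001 grid, and the 5% target are illustrative assumptions.

```python
from typing import Sequence


def type_i_error(max_probs: Sequence[float], delta: float) -> float:
    """Share of null-scenario simulations in which some arm is declared
    superior, i.e. max_j Pr(j = j_max | y) exceeds delta."""
    return sum(p > delta for p in max_probs) / len(max_probs)


def calibrate_threshold(max_probs: Sequence[float], alpha: float = 0.05,
                        n_grid: int = 1000) -> float:
    """Smallest delta on an evenly spaced grid over [0, 1] whose simulated
    type I error is at most alpha."""
    for k in range(n_grid + 1):
        delta = k / n_grid
        if type_i_error(max_probs, delta) <= alpha:
            return delta
    return 1.0  # not reached: at delta = 1 no simulation can exceed the threshold
```

Repeating this search separately for each design is what makes the resulting power comparison fair, since every design is held to roughly the same type I error rate.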

Figure 1. Proportion of success (Type I error) by threshold (δ) based on simulations for the Conventional-noRAR design in the null scenario when λ = 1.5

The success rates (Type I error rates) for each scenario under the null hypothesis are given in Table 1. All the Type I error rates are controlled at around 5%, with a range between 4.8% and 5.1%.

Table 1.

Type I error under H₀

		Accrual rate (participants per week)
Design	Threshold (δ)	1.5	3	4.5	6
Conventional-noRAR	0.829	4.9%	5.0%	5.0%	5.0%
Conventional-RAR	0.794	4.9%	5.1%	5.0%	4.9%
Reuse-noRAR	0.832	5.0%	5.0%	4.9%	5.1%
Reuse-RAR	0.827	5.1%	4.8%	4.8%	4.8%


2.8. Carryover effect and period effect

For Reuse-noRAR and Reuse-RAR, we assume there is a 20% carryover effect and no period effect. The sampling distribution is

$$y_{i t} \sim \begin{cases} Bernoulli(\mathrm{logit}^{-1}(\mathrm{logit}(θ_{j}^{T}) + ϵ_{i}^{T})), & t = 1 \\ Bernoulli(\mathrm{logit}^{-1}(\mathrm{logit}(θ_{j}^{T} + π^{T} θ_{j'}^{T}) + ϵ_{i}^{T})), & t > 1 \end{cases}$$

where $θ_{j'}^{T}$ is the virtual response rate of the treatment the participant received during period t − 1, and π^T = 20%.
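Putting the sampling distribution together with the reuse rule, the Python sketch below generates the treatment history of a single reused participant: the first arm is given, later arms are drawn without replacement, and the participant stops at the first response. The function and argument names, 0-based arm indices, and the seed are illustrative assumptions, and the sketch requires θ_j^T + π^T θ_j'^T < 1 so that the logit is defined.

```python
import math
import random
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate_participant(theta: Sequence[float], first_arm: int,
                         sigma2: float, carryover: float,
                         rng: random.Random) -> List[Tuple[int, int]]:
    """Return [(arm, outcome), ...] for one reused participant.

    Arms are 0-based. The first arm is supplied by the caller (equal or RAR
    randomization); later arms are drawn without replacement from the rest.
    """
    rest = [j for j in range(len(theta)) if j != first_arm]
    rng.shuffle(rest)  # random order = sampling without replacement
    eps = rng.gauss(0.0, math.sqrt(sigma2))  # participant effect, logit scale
    history, prev = [], None
    for arm in [first_arm] + rest:
        # from period 2 on, add the carryover share of the previous arm's rate
        rate = theta[arm] if prev is None else theta[arm] + carryover * theta[prev]
        y = 1 if rng.random() < inv_logit(logit(rate) + eps) else 0
        history.append((arm, y))
        if y == 1:  # absorbing endpoint: responders complete the study
            break
        prev = arm
    return history


rng = random.Random(7)  # arbitrary seed
print(simulate_participant((0.3, 0.3, 0.3, 0.4, 0.5), 4, 0.25, 0.2, rng))
```

Each returned pair is one observation, so a participant contributes between one and five observations depending on when, or whether, a response occurs.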

2.9. Simulations

In total, we investigated 12 scenarios: the combinations of 4 accrual rates (λ: 1.5, 3, 4.5, and 6), 1 participant variation ($σ_{ϵ^{T}}^{2} = 0.25$), and 3 sets of virtual response rates (θ^T = (0.2, 0.2, 0.2, 0.2, 0.2), (0.3, 0.3, 0.3, 0.4, 0.5), (0.3, 0.3, 0.3, 0.5, 0.5)). Each scenario is simulated under 4 designs: Conventional-noRAR, Conventional-RAR, Reuse-noRAR, and Reuse-RAR.

For each scenario, we run 10,000 simulations. The maximum 95% margin of error is $1.96 \sqrt{0.5 \times 0.5 / 10000} < 0.01$. With a Type I error of 0.05 or a power of 0.90, the margin of error is smaller: $1.96 \sqrt{0.05 \times 0.95 / 10000} \approx 0.004$ and $1.96 \sqrt{0.1 \times 0.9 / 10000} \approx 0.006$, respectively.
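The margins of error can be checked with a few lines of Python using the normal-approximation half-width for a proportion estimated from 10,000 simulations. The function name is an illustrative assumption.

```python
import math


def margin_of_error(p: float, n_sims: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n_sims)


for p in (0.5, 0.05, 0.90):
    print(f"p = {p:.2f}: margin of error = {margin_of_error(p, 10_000):.4f}")
```

The half-width is largest at p = 0.5, which is why that case bounds the simulation error for every operating characteristic reported as a proportion.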

The simulations are implemented in R (R Core Team, 2017; Stan Development Team, 2017) and Stan (Stan Modeling Language User’s Guide and Reference Manual). R is used to generate participant response data, and Stan is used to perform the interim and final analyses.

3. Results

In this section, we report the simulation results comparing the four designs in terms of the following operating characteristics: power, number of participants enrolled, trial duration, and proportion of observations that received the best treatment. We also explored the performance of the Reuse-RAR design when allowing RAR for subsequent treatments and the impact of participant dropouts on the Reuse-RAR design.

3.1. Power

The power for different scenarios under H₁: θ^T = (0.3, 0.3, 0.3, 0.4, 0.5) and H₂: θ^T = (0.3, 0.3, 0.3, 0.5, 0.5) is given in Figure 2. For both H₁ and H₂, the Conventional-noRAR design had the lowest power and the Conventional-RAR design had the highest power. The power of the two Reuse designs fell between that of the two Conventional designs, with Reuse-RAR higher than Reuse-noRAR. RAR increased power in both the Conventional designs and the Reuse designs.

Under H₁, when a single drug was better than the others, Conventional-noRAR had notably lower power than Reuse-noRAR. Given that both designs had exactly 900 observations, the reason for such a large difference in power was that the independent logistic model used by the Conventional-noRAR design did not control for patient-specific variation, while the hierarchical logistic model used by the Reuse-noRAR design did. On the other hand, we did not see a large difference between the Conventional-RAR and Reuse-RAR designs. Two factors reduced the power advantage of the hierarchical logistic model used by the Reuse-RAR design: (1) the Reuse-RAR design ran faster than the Conventional-RAR design and had less time to adapt; (2) RAR applied only to the first treatment of each participant in Reuse-RAR, whereas it applied to all observations in Conventional-RAR. As a result, a smaller proportion of observations was assigned to the better treatments (Figure 5), and the efficiency gain due to RAR was smaller in the Reuse-RAR design than in the Conventional-RAR design.

Figure 5. Proportion of observations that received treatment 5

Under H₂, when there are two equally effective treatments, the power is much lower than under H₁ across the designs and scenarios. In section 2.7, we define power as the proportion of simulations that meet the success threshold: Pr(j = j_max) > δ. Treatment 4 and treatment 5 compete against each other under H₂, so the probability of meeting the success threshold is much lower than under H₁. The success criterion we chose is not appropriate when there is no single winning arm. We will discuss some other options in the discussion section.

3.2. Number of participants enrolled

The numbers of participants enrolled in different scenarios are shown in Figure 3. The Conventional-noRAR and Conventional-RAR designs enrolled 900 participants in all scenarios. The Reuse-noRAR and Reuse-RAR designs enrolled far fewer participants than the two conventional designs (fewer than 450) owing to their ability to reuse participants. The reduction in the number of participants enrolled in the Reuse designs is greatest when the accrual rate is low and decreases gradually as the accrual rate increases. Reuse-RAR enrolled slightly more participants than Reuse-noRAR. This is because the Reuse-RAR design assigns more participants to the better drugs as their initial treatment, which in turn decreases the average number of periods per participant. Consequently, with a fixed number of observations (900) for all designs, Reuse-RAR enrolls more participants than Reuse-noRAR.

3.3. Trial duration

Trial durations for different scenarios are presented in Figure 4. Trial duration is directly related to the number of participants enrolled. Conventional-noRAR and Conventional-RAR had roughly the same trial duration because they enrolled the same number of participants (900). The two Reuse designs had a much shorter trial duration than the two Conventional designs. The reduction in trial duration was greatest when the accrual rate was low. Reuse-RAR had a slightly longer trial duration than Reuse-noRAR.

3.4. Proportion of observations that received treatment 5

The proportions of observations that received treatment 5 (the best treatment under H₁) in different scenarios are shown in Figure 5. Under H₀, 20% of observations received treatment 5 regardless of scenario and design. Under H₁ and H₂, the rates were about 20% for both Conventional-noRAR and Reuse-noRAR, with Reuse-noRAR slightly higher. Conventional-RAR had the highest proportion of observations receiving treatment 5, followed by Reuse-RAR.

3.5. Compare Reuse-RAR(complete) and Reuse-RAR

In the introduction, we pointed out that RAR randomization applies only to the initial treatment of each participant, in order to avoid overly complicating trial conduct. We call this approach Reuse-RAR. We can further improve the efficiency of the Reuse-RAR design by making the adaptation more aggressive, allowing RAR randomization for the subsequent treatment assignments. Specifically, instead of using sampling without replacement to determine the order of subsequent treatments, the V_j of the remaining treatments are normalized and used to randomly assign the treatment for the next period. This approach is called Reuse-RAR(complete). Simulations were conducted to assess the impact of Reuse-RAR(complete) on operating characteristics. Figure 6 compares Reuse-RAR(complete) and Reuse-RAR under H₁ when the accrual rate is 3 and the participant variation is 0.25. Reuse-RAR(complete) assigned slightly more observations to the best arm and increased power slightly. The number of participants enrolled and the trial duration were almost identical.
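The Reuse-RAR(complete) assignment step can be sketched in Python as follows: the weights V_j from section 2.5 are renormalized over the arms a participant has not yet received, and the next arm is drawn from those probabilities. The function names, 0-based arm indices, and the equal-probability fallback are illustrative assumptions, and the caller is assumed to stop once no arms remain.

```python
import random
from typing import Dict, Sequence


def next_treatment_probs(v: Sequence[float],
                         received: Sequence[int]) -> Dict[int, float]:
    """Renormalize V_j over the arms a participant has not yet received."""
    remaining = [j for j in range(len(v)) if j not in received]
    total = sum(v[j] for j in remaining)
    if total == 0.0:
        # degenerate case (assumption): equal probabilities over remaining arms
        return {j: 1.0 / len(remaining) for j in remaining}
    return {j: v[j] / total for j in remaining}


def draw_next_treatment(v: Sequence[float], received: Sequence[int],
                        rng: random.Random) -> int:
    """Randomly pick the next arm using the renormalized weights."""
    probs = next_treatment_probs(v, received)
    arms = list(probs)
    return rng.choices(arms, weights=[probs[a] for a in arms], k=1)[0]
```

Compared with a uniformly shuffled order, this pushes non-responders toward the currently promising arms in later periods as well, at the cost of requiring up-to-date weights at every reassignment.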

Figure 6. Comparison of Reuse-RAR(complete) and Reuse-RAR(initial)

3.7. Participant dropouts

In this section, we explore the impact of participant dropouts on operating characteristics. Figure 7 shows simulation results comparing scenarios with 10% dropout and scenarios with no dropout when the accrual rate is 3 and the participant variation is 0.25. Overall, in scenarios with 10% dropout, the number of assessable observations decreased by around 10% and the power decreased by around 4% across all designs. While trial duration and number of participants enrolled were not affected by dropouts in the Conventional-RAR and Conventional-noRAR designs, they were slightly higher in the Reuse-RAR and Reuse-noRAR designs when there was 10% dropout. Nonetheless, with or without dropouts, the Reuse-RAR and Reuse-noRAR designs were much more efficient in terms of trial duration and number of participants enrolled.

Figure 7. Comparison of scenarios with 10% dropout and scenarios with no dropout

4. Conclusion

Our simulations showed that, of the four designs, Reuse-RAR is the most efficient design which can achieve a higher power with a shorter trial duration and a smaller number of participants. Conventional-noRAR is the least efficient design. RAR does improve efficiency in both Conventional designs and Reuse designs.

When compared with Reuse-noRAR design, Reuse-RAR has a slightly higher power with a comparable number of participants and trial duration. This efficiency improvement is achieved by assigning more participants to the promising treatments.

When compared with the Conventional-RAR design, the Reuse-RAR design achieves slightly lower power with a much smaller number of participants and a much shorter trial duration, especially when the accrual rate is low. This efficiency improvement is achieved by reusing participants, and it goes beyond the efficiency gain that the Conventional-RAR design obtains by assigning more participants to the promising arms. However, as the accrual rate increases, the efficiency gain decreases. This is because the Reuse-RAR design runs much faster than the Conventional-RAR design and has less time to adapt.

5. Discussion

The proposed Reuse-RAR design belongs to the large class of RAR designs. The aggressiveness and timing of adaptation have a significant impact on RAR performance (Jang et al., 2017). Reuse-RAR(complete) can further improve the efficiency of the Reuse-RAR design by increasing the aggressiveness of adaptation, allowing RAR for the subsequent treatment assignments as well. Our simulations showed that, compared with Reuse-RAR, Reuse-RAR(complete) assigned slightly more observations to the best arm and increased the power slightly. However, the efficiency gain came at the cost of increased complexity in trial conduct, which has been one of the major critiques of RAR designs. Whether to employ Reuse-RAR(complete) should be decided case by case by balancing the efficiency gain against the increased complexity in trial conduct.
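
As a rough illustration of what "aggressiveness" means, one widely used family of RAR rules sets the allocation probability of each arm proportional to its posterior probability of being the best arm, raised to a power c. The sketch below assumes that form. The function name `rar_allocation`, the exponent `c`, and the example probabilities are illustrative assumptions and are not necessarily the allocation rule used in the simulations.

```python
def rar_allocation(prob_best, c=1.0):
    """Return allocation probabilities proportional to prob_best[j] ** c.

    prob_best: posterior probabilities that each arm is the best (sum to 1).
    c: hypothetical tuning exponent; c = 0 gives equal allocation and
       larger c shifts more allocation to the leading arm.
    """
    weights = [p ** c for p in prob_best]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical interim posterior probabilities of being best for 5 arms.
example = [0.05, 0.05, 0.10, 0.30, 0.50]
for c in (0.0, 0.5, 1.0, 2.0):
    print(c, [round(a, 3) for a in rar_allocation(example, c)])
```

A larger exponent concentrates allocation on the leading arm sooner, which is the trade-off between faster adaptation and the risk of reacting to noisy early data.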

The two designs that reuse participants (Reuse-RAR and Reuse-noRAR) require participants to be engaged in the study longer than Conventional-RAR and Conventional-noRAR. In the extreme case where participants do not respond to any treatment, the engagement time required in Reuse-RAR and Reuse-noRAR could be as long as 5 times that of Conventional-RAR and Conventional-noRAR. As a result, Reuse-RAR and Reuse-noRAR are more susceptible to dropouts. Simulations in Section 3.7 showed that participant dropouts slightly increase the trial duration and number of participants for the designs that reuse participants, but not for the conventional designs. However, simulations also showed that, even in the presence of participant dropouts, Reuse-RAR and Reuse-noRAR performed better than the conventional designs by achieving a similar power with a much shorter trial duration and fewer participants. Another related concern caused by the long engagement time required in the Reuse designs is that participant characteristics, including disease stage, drug exposure, and treatment resistance, can evolve during the course of treatment. Bias can be introduced because participants observed in later treatment periods may have more severe conditions or higher drug resistance. To mitigate the bias, we can expand the model to adjust for the related participant characteristics. Furthermore, covariate-adjusted adaptive randomization (Rosenberger et al., 2001), which allows allocation rules to consider both patient response and patient characteristics, can be used to further improve RAR.
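
The engagement-time argument can be made concrete with a small calculation. The sketch below assumes, for illustration only, that a participant moves to the next treatment after each non-response, stops at the first response, and never drops out. The response rates are hypothetical and are not taken from the simulation scenarios.

```python
def expected_periods(response_rates):
    """Expected number of treatment periods for one participant.

    response_rates: hypothetical response probability of each treatment,
    in the order the participant would receive them.
    """
    expected = 0.0
    still_on_study = 1.0  # probability of reaching the current period
    for p in response_rates:
        expected += still_on_study
        still_on_study *= (1.0 - p)
    return expected


print(expected_periods([0.0] * 5))  # no response to any of 5 treatments
print(expected_periods([0.3] * 5))  # hypothetical 30% response rate per arm
```

When no treatment works, the participant stays for all 5 periods, which is the 5-fold upper bound on engagement time relative to a single-treatment design.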

In the simulations, we assumed there was a carryover effect that was consistent across different participants and different treatments. This assumption may not be true in general. The model can be modified to better capture the carryover effect according to substantive subject matter knowledge. Another frequently used approach is to include a washout period between two treatments. Including a washout period will increase the trial duration for Reuse-RAR and Reuse-noRAR. The extent to which the washout period will affect the trial duration is determined by the length of the washout period.

In Section 3.1, simulations showed that the power was much lower under H₂, when there were two equally effective arms, than under H₁, when there was a single treatment that was better than the rest of the treatments. Power was defined as the proportion of simulations that met the success threshold Pr(j = j_max) > δ. Under H₂, treatment 4 and treatment 5 compete against each other, and the likelihood that either arm meets the success threshold Pr(j = j_max) > δ is very low. For scenarios with multiple arms that are equally effective, the success criterion we chose is therefore not appropriate. An option is to assess the probability that an arm is better than each of the other arms. The success criterion can be defined as Pr(θ_j > θ_j′) > δ′ for any j ∈ {1,2,3,4,5} and j′ ∈ {1,2,3,4,5} with j′ ≠ j.
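
Both criteria can be estimated directly from posterior draws of the arm-level response rates. The following is a minimal sketch, assuming the draws are available as a list of length-K lists (for example, draws of the `p` vector from the Stan models in the Appendix). The function names and the toy draws are illustrative.

```python
def prob_best(draws):
    """Estimate Pr(j = j_max) for each arm from posterior draws.

    draws: list of posterior draws; each draw is a list of K response rates.
    """
    k = len(draws[0])
    counts = [0] * k
    for draw in draws:
        best = max(range(k), key=lambda j: draw[j])
        counts[best] += 1
    return [count / len(draws) for count in counts]


def prob_better(draws, j, j_prime):
    """Estimate Pr(theta_j > theta_j') from the same posterior draws."""
    return sum(draw[j] > draw[j_prime] for draw in draws) / len(draws)


def meets_success(draws, delta):
    """True if any arm satisfies Pr(j = j_max) > delta."""
    return max(prob_best(draws)) > delta


# Toy draws for 3 arms in which the last two arms are nearly tied.
toy = [
    [0.20, 0.51, 0.50],
    [0.25, 0.48, 0.52],
    [0.22, 0.55, 0.49],
    [0.18, 0.47, 0.53],
]
print(prob_best(toy))
print(prob_better(toy, 1, 0), prob_better(toy, 2, 0))
print(meets_success(toy, delta=0.9))
```

In the toy draws the two tied arms split the probability of being best, so neither can exceed a high δ, while the pairwise comparison against the weaker arm remains decisive for both.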

Freidlin and Korn (2013) pointed out that, for studies with only 2 arms, RAR performs poorly and results in a lower power due to the deviation from the optimal 1:1 randomization. We should use fixed 1:1 randomization in two-arm studies if optimizing power is of primary concern (a caveat would be if one is willing to sacrifice a bit of power for placing participants on the better arm; see Wick et al., 2017). However, the reusing participants scheme is still relevant, and it may result in a smaller and shorter clinical trial. The benefit may be small to moderate because each participant will contribute a maximum of 2 observations. More research is needed to evaluate the performance of the reusing participants scheme in two-arm studies. The reusing participants scheme is best suited for studies with multiple arms and a slow accrual rate.

Acknowledgement

This study was supported in part by an NIH Clinical and Translational Science Award (UL1TR002366) to the University of Kansas, and KUMC Biostatistics & Data Science Department, as well as The University of Kansas Cancer Center (P30 CA168524).

Appendix. R code and Stan code

Stan code for Conventional-RAR design:

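data {
        int<lower=0> N;  //number of observations
        int<lower=0> K;  //number of treatments
        matrix[N,K] X;   //treatment indicators
        int<lower=0, upper=1> Success[N];  //binary response
}
parameters {
        vector[K] b;  //treatment effects on the logit scale
}
model {
        b ~ normal(0,5);
        Success ~ bernoulli_logit(X*b);
}
generated quantities {
        vector[K] p = exp(b)./(1+exp(b));  //response rate of each treatment
}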

Stan code for Reuse-RAR and Reuse-noRAR design:

data {
        int<lower=0> N;  //number of observations
        int<lower=0> J;  //number of participants
        int<lower=0> K;  //number of treatments
        matrix[N,K] X;   //treatment indicators
        matrix[N,K] X1;  //prior treatment indicators
        int<lower=1, upper=J> ptid[N];  //participant index of each observation
        int<lower=0, upper=1> Success[N];
}
parameters {
        vector[K] b;
        vector[J] theta;
        real<lower=0> sigma;
        real<lower=0> pi1;
}
model {
        theta ~ normal(0,1);
        sigma ~ normal(0,3);
        b ~ normal(0,5);
        pi1 ~ normal(0,0.5);
        Success ~ bernoulli_logit(X*b+X1*b*pi1+theta[ptid]*sigma);
}
generated quantities{
        vector[K] p =exp(b)./(1+exp(b));
}
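
To make the linear predictor in the hierarchical model easier to read, the following sketch evaluates it for a single observation. It assumes that each row of X and X1 is a one-hot indicator (with an all-zero X1 row in the first treatment period), so that X*b picks out the current treatment effect and X1*b picks out the prior treatment effect. The function name and the numeric values are hypothetical.

```python
import math


def response_probability(b, current, previous, theta_j, sigma, pi1):
    """Success probability for one observation under the hierarchical model.

    b: treatment effects on the logit scale (list of length K).
    current: 0-based index of the current treatment.
    previous: 0-based index of the prior treatment, or None in period 1.
    theta_j: standardized random effect of the participant.
    sigma: scale of the participant random effect.
    pi1: carryover multiplier applied to the prior treatment effect.
    """
    eta = b[current] + sigma * theta_j
    if previous is not None:
        eta += pi1 * b[previous]
    return 1.0 / (1.0 + math.exp(-eta))


b = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical effects for 5 treatments
print(response_probability(b, current=4, previous=None, theta_j=0.0, sigma=0.25, pi1=0.1))
print(response_probability(b, current=4, previous=3, theta_j=0.0, sigma=0.25, pi1=0.1))
```

Note that Stan indexes from 1 while this sketch indexes from 0; the arithmetic is otherwise the same as `X*b + X1*b*pi1 + theta[ptid]*sigma` evaluated row by row.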

References

1. Berry SM, Carlin BP, Lee JJ, Muller P. Bayesian Adaptive Methods for Clinical Trials. New York, NY: CRC Press; 2011.
2. Betancourt M. A conceptual introduction to Hamiltonian Monte Carlo. Preprint arXiv:1701.02434. Columbia University, New York.
3. Brown AR, Gajewski BJ, Aaronson LS, Mudaranthakam DP, Hunt SL, Berry SM, Quintana M, Pasnoor M, Dimachkie MM, Jawdat O, Herbelin L, Barohn RJ. A Bayesian comparative effectiveness trial in action: developing a platform for multisite study adaptive randomization. Trials. 2016;17(1):428.
4. Burckhardt CS, Jones KD. Adult measures of pain: The McGill Pain Questionnaire (MPQ), Rheumatoid Arthritis Pain Scale (RAPS), Short-Form McGill Pain Questionnaire (SF-MPQ), Verbal Descriptive Scale (VDS), Visual Analog Scale (VAS), and West Haven-Yale Multidisciplinary Pain Inventory (WHYMPI). Arthritis Care Res. 2003;49(S5):S96–S104.
5. Freidlin B, Korn EL. Adaptive randomization versus interim monitoring. J Clin Oncol. 2013;31(7):969–970.
6. Gajewski BJ, Berry SM, Quintana M, Pasnoor M, Dimachkie M, Herbelin L, Barohn R. Building efficient comparative effectiveness trials through adaptive designs, utility functions, and accrual rate optimization: finding the sweet spot. Stat Med. 2015;34(7):1134–49.
7. Barohn Gajewski, Pasnoor Brown, Herbelin Kimminau, Mudaranthakam Jawdat, Dimachkie, PAIN-CONTRoLS Study Team (in press), “Patient Assisted Intervention for Neuropathy: Comparison of Treatment in Real Life Situations (PAIN-CONTRoLS): Bayesian Adaptive Comparative Effectiveness Trial,” JAMA Neurology.
8. U.S. Department of Health and Human Services Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER). Adaptive Designs for Clinical Trials of Drugs and Biologics Guidance for Industry. Nov. 2019.
9. Gelman A, Carlin J, Stern H, Dunson D, Vehtari A, Rubin D. Bayesian Data Analysis. New York, NY: CRC Press; 2014.
10. Jang Y, Zhao W, Durkalski-Mauldin V. Impact of adaption algorithm, timing, and stopping boundaries on the performance of Bayesian response adaptive randomization in confirmative trials with binary end-point. Contemp Clin Trials. 2017;62:114–120.
11. Nason M, Follmann D. Design and Analysis of Crossover Trials for Absorbing Binary Endpoints. Biometrics. 2010;7:958–965.
12. Pasnoor M, Dimachkie MM, Barohn RJ. Cryptogenic sensory polyneuropathy. Neurol Clin. 2013;31:463–76.
13. R Core Team (2017). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
14. Stan Development Team (2017). RStan: the R interface to Stan, version 2.16.1. (Available from http://mc-stan.org.)
15. Stan Development Team. Stan Modeling Language User’s Guide and Reference Manual, Version 2.16.0. (Available from http://mc-stan.org.)
16. Stensland KD, McBride RB, Latif A, Wisnivesky J, Hendricks R, Roper N, et al. Adult cancer clinical trials that fail to complete: an epidemic. J Natl Cancer Inst. 2014.
17. Tang C, Sherman SI, Price M, Weng J, Davis SE, Hong DS, Yao JC, Buzdar A, Wilding J, Lee JJ. Clinical Trial Characteristics and Barriers to Participant Accrual: The MD Anderson Cancer Center Experience over 30 years, a Historical Foundation for Trial Improvement. Cancer Therapy: Clinical. March 2017.
18. Rosenberger WF, Vidyashankar AN, Agarwal DK. Covariate-adjusted response-adaptive designs for binary response. Journal of Biopharmaceutical Statistics. 2001;11(4):227–236.
19. Wellek S, Blettner M. On the proper use of the crossover design in clinical trials: part 18 of a series on evaluation of scientific publications. Dtsch Arztebl Int. 2012;109(15):276–281.
20. Wick J, Berry SM, Yeh H, Choi W, Pacheco CM, Daley C, Gajewski BJ. A Novel Evaluation of Optimality for Randomized Controlled Trials. Journal of Biopharmaceutical Statistics. 2017;27(4):659–672.
