Author manuscript; available in PMC: 2025 May 19.
Published in final edited form as: Stat Biopharm Res. 2024 Feb 26;16(4):547–557. doi: 10.1080/19466315.2024.2308877

A Two-Stage Covariate-Adjusted Response-Adaptive Enrichment Design

Li Yang 1, Guoqing Diao 2, William F Rosenberger 3
PMCID: PMC12087802  NIHMSID: NIHMS1992064  PMID: 40391207

Abstract

In the precision medicine paradigm, it is of interest to identify the subgroups that benefit most from a treatment. However, such subgroups often cannot be identified until after a large-scale clinical trial. Clinical trials are often designed under the assumption of no treatment-by-covariate interaction effect and enroll all comers. Consequently, many patients undergo unnecessary treatment, and the efficiency of the trial may decrease. We propose a two-stage enrichment design that uses covariate-adjusted response-adaptive (CARA) allocation and a novel interaction pseudo-randomization test to evaluate the interaction effect in the interim analysis for binary and continuous outcomes. A pre-defined alpha level is used as the threshold to decide whether a subgroup will be identified and recruited in the second stage. If a below-threshold interaction effect is found, a regression model is fitted and the stratum with the largest treatment effect is chosen as the best stratum; the trial then continues to the second stage with patients from the best stratum only. If the p-value from the interim analysis is above the threshold, the trial continues with all patients. The primary aim is to test the treatment effect between treatment groups. Different CARA procedures are compared in terms of type I error rates, power, and ethical considerations, and the procedure that best balances efficiency and ethics is used in the proposed two-stage enrichment design.

Keywords: Adaptive designs, Precision medicine, Permutation test, Randomization test, Threshold

1. Introduction

With the rapid development of genomic and genetic research, precision medicine has gained more attention in modern clinical trials. Molecularly targeted therapies are likely to work only in a subgroup of biomarker-positive patients, and many clinical trial designs have been developed to incorporate biomarkers. An enrichment design, also called a targeted design, was first studied by Simon and Maitournam (2004) and Maitournam and Simon (2005). In this single-stage design, patients are screened and selected by their biomarker status such that only biomarker-positive patients are enrolled and randomized to treatment groups. Only one null hypothesis, of no treatment effect in the biomarker-positive subgroup, can be tested.

An adaptive design is defined as a multistage study design that uses accumulating data to decide how to modify aspects of the study without undermining the validity and integrity of the trial (Dragalin, 2006). In order to expand the testable hypotheses and/or deal with no clearly defined subgroup at the beginning of the trial, several adaptive enrichment designs have been proposed. One of the first biomarker-based, adaptive enrichment designs was introduced by Wang et al. (2007). The proposed two-stage adaptive enrichment design randomizes all subjects to treatment or control groups in stage I. If the treatment effect reaches a futility boundary in the biomarker-negative group in the interim analysis, the recruitment of the biomarker-negative subjects is terminated at the second stage, and the remaining sample size is reallocated to biomarker-positive patients. In this case, the primary hypothesis is to test the treatment effect in the biomarker-positive subgroup. Otherwise, if the futility threshold is not reached in the biomarker-negative group in the interim analysis, the trial continues for all patients, and both overall and subgroup-specific tests are performed. The sample size is calculated based on the non-adaptive approach and kept unchanged in the interim analysis.

However, in practice, there may not be a clearly defined subgroup at the beginning of a phase III trial: the biomarker may be continuous with no known cutpoint, or no single biomarker may clearly define a subgroup. Simon and Simon (2013) proposed a phase III adaptive enrichment design that begins with all patients in the trial and sequentially restricts entry in an adaptive manner. This enrichment approach does not require a pre-defined subgroup; the primary null hypothesis is that no subgroup benefits more from treatment than from control. Adaptive enrichment designs may increase power, especially when only a small subset of patients benefits from the treatment.

Simon (2015) discussed the application of and challenges for adaptive enrichment designs under three common scenarios: a single categorical biomarker, a single continuous biomarker with an unknown cut point, and multidimensional biomarkers or combinations of multiple candidate biomarkers. Strata-based designs are effective for simple categorical biomarkers, but they often cannot control the type I error rate well. For univariate continuous biomarkers, strata-based designs need pre-defined strata and do not leverage the ordering of the categories; model-based designs do not require pre-defined strata, but they can test only the single null hypothesis that no subgroup benefits more from treatment than from control. Model-based designs are more effective for multivariate biomarkers.

Strata-based adaptive enrichment designs have clearly defined subgroup characteristics at the design stage. Follmann (1997) was one of the first to explore adaptively changing stratum proportions, concluding that the adaptation does not affect the type I error rate provided that either all observations are i.i.d. under the null hypothesis for the unstratified test, or observations are i.i.d. within each stratum under the null hypothesis for the stratified test statistic. Mehta and Gao (2011) proposed an adaptive enrichment design in which an ongoing group sequential trial is modified based on an interim analysis. The modifications include adaptations in the number, spacing, and information times of subsequent interim analyses, as well as population enrichment. In order to evaluate the overall treatment effect in both biomarker-positive and biomarker-negative patients in enrichment designs, Yang et al. (2015) presented a novel design that enrolls biomarker-negative patients only after enough biomarker-positive patients have been enrolled to adequately power the subgroup analysis. They derive an unbiased weighted statistic to assess the overall treatment effect, which is tested only if the treatment benefit is first established in biomarker-positive patients. Chiu et al. (2018) proposed a subgroup selection design for normally distributed endpoints in which selection is based on the maximum test statistic. They assess the bias of the maximum likelihood estimator under different scenarios and find that the bias is almost always positive, leading to overestimation of the true treatment effect.

Model-based adaptive enrichment designs are two-stage designs where the complete specification of subgroup characteristics is only available at the end of the first stage, based on the interim analysis. Simon and Simon (2013) developed a general model and statistical significance tests for eligibility modification. They illustrate the framework in the setting of adaptive threshold enrichment of a single continuous biomarker with no known cut-off at the beginning of the trial. Renfro et al. (2014) proposed a two-stage adaptive enrichment phase II design incorporating prospective continuous marker threshold selection, possible early futility stopping, possible mid-trial accrual restriction, and final marker and treatment evaluation in marker-positive patients. Simulation studies demonstrated that type I error rates are in the acceptable range. The power is highly dependent on the successful classification of the true predictive biomarker during the interim analysis. This critical classification of the biomarker depends not only on the biomarker prevalence and effect size, but also on the timing of the interim analysis. Fridlyand et al. (2013) discussed the difficulties and lack of widely accepted statistical approaches to estimate a relevant threshold for a continuous biomarker. They provide recommendations for continuous biomarkers without a clear threshold in Phase III clinical trials.

Spencer et al. (2016) proposed a continuous biomarker-adaptive threshold trial design, which both selectively recruits from the start of the trial and modifies the eligibility criteria to target a subgroup that will have a statistically significant response rate. All subjects are used in the final test for efficacy, even if the eligibility criteria are changed at the interim. The total and stage-specific sample sizes are fixed before the study begins. A preliminary threshold is chosen based on prior knowledge at stage I and is updated at stage II based on the results from the first stage. A binomial exact test using all subjects recruited to the trial assesses whether the response rate exceeds a pre-defined reference rate. The adaptive design has higher power in both overall and completed studies when the true threshold value is above the 0.3 quantile. Zhao and LeBlanc (2020) defined the ultimate impact of a study as the expected outcome for patients if they were in the standard therapy group, plus the expected treatment improvement for the fraction of patients in the study target group times the study power. A higher ultimate impact benefits more individuals in the whole disease population. They propose a Max-Impact design that optimizes the ultimate impact: the design selects the most appropriate subpopulation for a future study and considers the benefits that would be obtained for both on- and off-study patients. They focus on continuous biomarkers, and the design applies to continuous, binary, and survival endpoints.

Despite the existence of a treatment-by-covariate interaction effect in many situations, clinical trials are often designed under the assumption of no such effect. Ayanlowo and Redden’s (2008) two-stage adaptive design examines the interaction effect, and a trial moves to the second stage only if a significant interaction effect is detected in the interim analysis. The second stage is non-enriched but stratified by covariate strata. Simon and Simon’s (2013) design selects the responsive patients in the second stage: if some but not all subgroups respond to the treatment, it is an enrichment design; if all cases benefit from the treatment, it becomes a non-enrichment design. Zhu et al. (2013) proposed a non-enrichment optimal allocation to two treatment groups that maximizes the power of testing the interaction between treatment and a dichotomized covariate. Freidlin et al. (2010) summarized the characteristics of randomized clinical trials with biomarkers. They noted that one of the main limitations of the classical enrichment design is that one must be confident that the biomarker can identify, with reasonable accuracy, the subgroup that benefits from the treatment. Otherwise, the design may slow accrual, increase expense, and even lead to abandoning a therapy that benefits all patients.

Adaptive designs mentioned above mostly use equal allocation procedures in the first stage and traditional population model-based analyses during the interim analyses. Compared to equal allocation procedures, covariate-adjusted response-adaptive (CARA) procedures assign more patients to the superior treatment group based on characteristics of patients. They take into consideration patient heterogeneity to achieve both ethical and efficiency goals (Hu and Rosenberger, 2006).

Population model-based analyses rely on certain model assumptions and large-sample theory. Randomization tests have been studied as assumption-free alternatives or complements to traditional population model-based analyses. Rosenberger et al. (2019) discussed the advantages of randomization tests and their application to testing the primary outcome and to sequential monitoring. Other studies have tested the treatment effect among groups using randomization tests (Galbete and Rosenberger, 2016; Parhat et al., 2014; Plamadeala et al., 2012). Parhat et al. (2014) show that, under model misspecification, randomization tests preserve the size and power well for generalized linear regression, survival, and longitudinal models, while population model-based tests have inflated type I error rates and reduced power. Still and White (1981) showed the application of the randomization test to assessing the interaction effect in the analysis-of-variance setting. However, the effect and application of randomization-based tests for examining the interaction effect in the generalized linear model setting, when there are multiple covariate strata or categorical outcomes, remain unstudied.

In this paper, we propose a two-stage CARA enrichment design, which uses a CARA procedure in the first stage to allocate patients, a pseudo-randomization test, developed in the spirit of randomization tests, in the interim analysis to test the treatment-by-covariate interaction effect, and a pre-defined alpha level as the threshold to decide whether a subgroup will be identified and exclusively recruited in the second stage. We compare different CARA procedures in terms of efficiency and ethical considerations; efficiency is measured by the testing power, and ethics by the overall success rate. Section 2 introduces the basic layout of the two-stage enrichment design and the simulation protocol. In Section 3, we compare the performance of different designs for binary and continuous outcomes. The Simon and Simon (2013) design is used as the enrichment benchmark since it covers both binary and continuous outcomes and biomarkers with or without a pre-defined threshold. In Section 4, a real data example from an NSABP trial is used to evaluate the performance of the proposed design compared with the original non-adaptive design. We draw conclusions and recommend future work in Section 5.

2. Methods

2.1. Basic Layout of the Two-stage Enrichment Design

Here, we consider a generalized linear model (GLM) setting for two groups. Suppose for a given d×1 vector of covariates Z, the response Yk of the treatment k=1,2 has a distribution in the exponential family under the GLM:

$$f(Y_k \mid Z, \theta_k) = \exp\left\{\frac{Y_k \mu_k - \alpha(\mu_k)}{\psi_k} + b(Y_k, \psi_k)\right\},$$

with the link function $\mu_k = h(\theta_k^{T} Z)$, where $\theta_k = (\theta_{k1}, \ldots, \theta_{kd})$, $k = 1, 2$, are the unknown parameters. Assume the scale parameter $\psi_k$ is fixed. Under this model, $E(Y_k \mid Z) = \alpha'(\mu_k)$ and $\operatorname{Var}(Y_k \mid Z) = \alpha''(\mu_k)\,\psi_k$.

For binary outcomes, let p1(Z) be the success rate for the treatment group, with q1(Z) = 1 − p1(Z), and p2(Z) the success rate for the control group, with q2(Z) = 1 − p2(Z). The treatment effect size Δ is defined by the relative risk p1(Z)/p2(Z). For continuous outcomes, let μ1(Z) be the mean response of the primary efficacy outcome for the treatment group and μ2(Z) the mean response for the control group. Assume that the response variables in the treatment and control groups have a common variance σ². The treatment effect size is defined by Δ = μ1(Z) − μ2(Z).

For binary outcomes, let $\pi_{m+1,1}(z_{m+1})$ be the probability of allocating the (m+1)-th patient with covariate $z_{m+1}$ to the treatment group. Three target allocation rules are considered in the first stage of our design:

The CARA1 target allocation is proportional to the covariate-adjusted odds ratio. The CARA2 minimizes the expected number of treatment failures subject to the fixed asymptotic variance of the test statistic. The CARA3 is the optimal allocation that minimizes expected treatment failures subject to the fixed asymptotic variance of the log-odds ratio.

To decrease the variability and preserve the randomness of adaptive procedures that depend on the unknown parameters p1(z) and p2(z), the doubly-adaptive biased coin design (DBCD) (Hu et al., 2004) and the efficient randomized adaptive design (ERADE) (Hu et al., 2009) are used for each target allocation. The covariate-adjusted DBCD allocation rule is defined as:

$$\phi_{m+1,1}(z_{m+1}) = \frac{\hat{\pi}_{m+1,1}(z_{m+1})\left(\dfrac{\hat{\pi}_{m+1,1}(z_{m+1})}{N_{m+1,1}(z_{m+1})/m(z_{m+1})}\right)^{\gamma}}{\hat{\pi}_{m+1,1}(z_{m+1})\left(\dfrac{\hat{\pi}_{m+1,1}(z_{m+1})}{N_{m+1,1}(z_{m+1})/m(z_{m+1})}\right)^{\gamma} + \left(1 - \hat{\pi}_{m+1,1}(z_{m+1})\right)\left(\dfrac{1 - \hat{\pi}_{m+1,1}(z_{m+1})}{N_{m+1,2}(z_{m+1})/m(z_{m+1})}\right)^{\gamma}},$$

where $\phi_{m+1,1}(z_{m+1})$ is the probability of allocating the (m+1)-th patient with covariate $z_{m+1}$ to the treatment group, $\hat{\pi}_{m+1,1}(z_{m+1})$ is the target allocation probability estimated from a CARA procedure, $N_{m+1,k}(z)$ is the number of patients with covariate $z$ already assigned to group $k$, $m(z_{m+1})$ is the number of previously enrolled patients with covariate $z_{m+1}$, and $\gamma \geq 0$ is a tuning parameter that controls the variability of the procedure. When $\gamma = 0$, this procedure reduces to the sequential maximum likelihood estimation CARA procedure.
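As an illustration, the DBCD allocation rule can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function and argument names are hypothetical, with `pi_hat` standing for the estimated target allocation and `n1_z`, `n2_z` for the stratum-specific counts in the treatment and control groups. The default γ = 2 matches the value used later in the simulations.

```python
def dbcd_probability(pi_hat, n1_z, n2_z, gamma=2.0):
    """Covariate-adjusted DBCD allocation probability (a sketch).

    pi_hat : estimated target allocation for the treatment arm,
             given the incoming patient's covariate value z
    n1_z, n2_z : earlier patients with covariate z already assigned
                 to treatment and control, respectively
    gamma : tuning parameter; gamma = 0 recovers the sequential
            maximum likelihood estimation CARA procedure
    """
    m_z = n1_z + n2_z
    if m_z == 0:
        return pi_hat  # no allocation history in this stratum yet
    # current allocation proportion in the stratum, kept away from 0 and 1
    x = min(max(n1_z / m_z, 1e-8), 1.0 - 1e-8)
    num = pi_hat * (pi_hat / x) ** gamma
    den = num + (1.0 - pi_hat) * ((1.0 - pi_hat) / (1.0 - x)) ** gamma
    return num / den
```

When the stratum is under-allocated relative to the target (x below `pi_hat`), the rule pushes the assignment probability above the target, and when the stratum sits exactly at the target, it returns the target itself.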

The ERADE allocation rule is defined as:

$$\phi_{m+1,1}(z_{m+1}) = \begin{cases} \alpha\,\hat{\pi}_{m+1,1}(z_{m+1}), & \text{if } N_{m+1,1}(z_{m+1})/m(z_{m+1}) > \hat{\pi}_{m+1,1}(z_{m+1}), \\ \hat{\pi}_{m+1,1}(z_{m+1}), & \text{if } N_{m+1,1}(z_{m+1})/m(z_{m+1}) = \hat{\pi}_{m+1,1}(z_{m+1}), \\ 1 - \alpha\left(1 - \hat{\pi}_{m+1,1}(z_{m+1})\right), & \text{if } N_{m+1,1}(z_{m+1})/m(z_{m+1}) < \hat{\pi}_{m+1,1}(z_{m+1}), \end{cases}$$

where $\alpha \in [0, 1]$. The authors recommend choosing $\alpha$ between 0.4 and 0.7.
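The ERADE rule above reduces to a three-way comparison between the current stratum-specific allocation proportion and the estimated target. A minimal Python sketch with hypothetical helper names (α = 0.5 lies in the recommended range):

```python
def erade_probability(pi_hat, n1_z, m_z, alpha=0.5):
    """ERADE allocation probability (a sketch).

    pi_hat : estimated target allocation for the treatment arm
    n1_z : earlier patients with this covariate value in the treatment arm
    m_z : all earlier patients with this covariate value
    alpha : discretization parameter in [0, 1]; 0.4-0.7 recommended
    """
    if m_z == 0 or n1_z / m_z == pi_hat:
        return pi_hat                        # on target: use the target itself
    if n1_z / m_z > pi_hat:
        return alpha * pi_hat                # over-allocated: damp toward control
    return 1.0 - alpha * (1.0 - pi_hat)      # under-allocated: push toward treatment
```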

For continuous outcomes, assuming that larger responses are desirable, the target allocation which is based on Zhang et al.’s (2007) CARA design is calculated as:

$$\pi_{m+1,1}(z_{m+1}) = \Phi\left(\frac{z_{m+1}^{T}\hat{\theta}_1 - z_{m+1}^{T}\hat{\theta}_2}{G}\right), \quad (2)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, $\hat{\theta}_k$ is the parameter estimator for the $k$-th group ($k = 1, 2$), and $G$ is a tuning parameter. The smaller the value of $G$, the more skewed the allocation.
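The target allocation (2) can be sketched using only the standard library, evaluating Φ via the error function (the function and argument names are illustrative):

```python
import math

def cara_target_continuous(z, theta1_hat, theta2_hat, G=6.0):
    """Target allocation (2) for continuous outcomes (a sketch).

    z : covariate vector for the incoming patient
    theta1_hat, theta2_hat : current parameter estimates for the two arms
    G : tuning parameter; a smaller G skews the allocation more
    """
    diff = sum(zi * (t1 - t2) for zi, t1, t2 in zip(z, theta1_hat, theta2_hat))
    x = diff / G
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

With equal parameter estimates in the two arms the allocation is 1/2, and shrinking G pushes the probability further from 1/2 for the same estimated difference.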

At the interim analysis stage, when testing the interaction effect, under the null hypothesis, there still might be overall treatment effects. Randomization tests are computed under the null hypothesis of no treatment effect, and consequently, outcomes and treatment assignments are independent. By permuting all possible treatment assignments and re-calculating the test statistic, we obtain an assumption-free p-value as the sum of the probabilities of all sequences giving a more extreme test statistic. However, in the presence of a treatment-covariate interaction, we cannot conduct a randomization test under the null hypothesis. Permutation tests are also not appropriate, because the CARA design induces non-exchangeable randomization sequences. Consequently, we describe an analog to a Monte Carlo randomization test in such a setting.

In our design, after a CARA procedure allocates all n patients to the two treatment groups and the responses are collected, a generalized linear regression model is fit. The score test statistic for the interaction effect, computed on the observed dataset, serves as the observed test statistic. The score test involves only restricted maximum likelihood estimation, and the test statistic is calculated as:

$$S_n = \dot{l}(\theta)^{T}\left[-\ddot{l}(\theta)\right]^{-1}\dot{l}(\theta)\Big|_{\theta = \tilde{\theta}},$$

where $\dot{l}(\theta)$ and $\ddot{l}(\theta)$ are the first and second derivatives of the log-likelihood function, respectively, and $\tilde{\theta}$ is the restricted maximum likelihood estimator under the null hypothesis.

While keeping the treatment allocation and the covariate status fixed, we regenerate $L_s$ sequences of the interaction terms using Monte Carlo simulation and calculate the score test statistic for each regenerated sequence. The two-sided p-value is calculated as

$$\hat{p} = \frac{\sum_{l=1}^{L_s} I\left(S_l \geq S_{\text{obs}}\right)}{L_s}.$$

Setting $L_s = 2{,}500$ bounds the mean squared error of the p-value at 0.0001. To estimate very small p-values accurately, Rosenberger and Lachin (2015) suggest 20,000 sequences; Galbete and Rosenberger (2016) demonstrated that 15,000 sequences produce tests that are almost identical to exact tests. This test is not valid under the randomization-based inference or permutation-testing paradigms; however, as shown in Proschan and Dodd (2019), it preserves the type I error rate. We refer to this test as an “interaction pseudo-randomization test.”
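The Monte Carlo p-value computation can be sketched as below. The `regenerate_stat` callable is a hypothetical stand-in for the step that regenerates the interaction terms (with treatment allocation and covariates held fixed) and recomputes the score statistic; for illustration only, the toy example draws from a chi-square(1) reference distribution rather than refitting a model.

```python
import numpy as np

rng = np.random.default_rng(2024)

def mc_pvalue(s_obs, regenerate_stat, n_seq=15_000):
    """Monte Carlo p-value (a sketch): the fraction of regenerated
    score statistics at least as extreme as the observed one,
    p-hat = sum_l I(S_l >= S_obs) / L_s.
    """
    stats = np.array([regenerate_stat() for _ in range(n_seq)])
    return float(np.mean(stats >= s_obs))

# Toy illustration: if S_l ~ chi-square(1) under H0, an observed
# statistic of 3.84 should give a p-hat near 0.05.
p_hat = mc_pvalue(3.84, lambda: rng.standard_normal() ** 2)
```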

We use a pre-defined alpha level as the threshold to decide whether a subgroup will be identified and recruited in the second stage. If a below-threshold interaction effect is found, a regression model will be fitted and the subgroup with the largest treatment effect will be chosen as the best subgroup. The trial will continue to the second stage with patients from the best subgroup only. If the p-value from the interim analysis is above the threshold, the trial continues with all patients. The primary aim is to test the treatment effect between treatment groups.
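The interim decision rule just described can be summarized in a short sketch (a simplified illustration with hypothetical names; `effects_by_stratum` holds the estimated stratum-specific treatment effects from the fitted regression model):

```python
def enrichment_decision(p_interaction, effects_by_stratum, threshold=0.4):
    """Interim enrichment decision (a sketch).

    p_interaction : p-value of the interaction pseudo-randomization test
    effects_by_stratum : dict mapping stratum label to its estimated
        treatment effect from the fitted regression model
    threshold : pre-defined alpha level for the enrichment decision
    Returns the list of strata to recruit in the second stage.
    """
    if p_interaction < threshold:
        # below-threshold interaction: enrich with the best stratum only
        best = max(effects_by_stratum, key=effects_by_stratum.get)
        return [best]
    # otherwise continue with all strata
    return list(effects_by_stratum)
```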

2.2. Simulation Protocol

We consider different covariate profiles, including binary and categorical covariates with equal and unequal probabilities among the categories. Given the covariates, we consider both binary and continuous outcomes.

2.2.1. Binary Outcomes in Logistic Regression Models

First, we consider a binary outcome with two treatment groups and one categorical covariate. Let $Y_i = 1$ if a patient’s response is a success, and $Y_i = 0$ otherwise. Let $p_i = \Pr(Y_i = 1 \mid Z_i)$ be the probability of success given the covariate vector $Z_i$ (which includes the intercept term $z_{i1} = 1$, the treatment indicator $z_{i2} = T_i$, the biomarker $X_i$ or dummy variables defined based on $X_i$, and the interactions between them), and let $q_i = 1 - p_i$. The logistic regression model is given as

$$\operatorname{logit}(p_i) = Z_i^{T}\theta, \quad i = 1, \ldots, n, \quad (3)$$

where θ is a d×1 vector of model parameters. Notice that d=4 when there are two covariate strata and d=8 when there are four covariate strata. Our objective in the interim analysis is to identify the subgroup that responds to the treatment better by testing the treatment-by-covariate interaction effect. For a logistic regression model with a two-strata covariate:

$$\operatorname{logit}(p_i) = \theta_1 + \theta_2 T_i + \theta_3 X_i + \theta_4 T_i X_i,$$

where θ4 is the interaction effect, the following hypotheses are tested:

$$H_0: \theta_4 = 0 \quad \text{versus} \quad H_1: \theta_4 \neq 0.$$

If the covariate has four categories (levels 1,2,3,4), then the model takes the form of:

$$\operatorname{logit}(p_i) = \theta_1 + \theta_2 T_i + \theta_3 I(X_i = 2) + \theta_4 I(X_i = 3) + \theta_5 I(X_i = 4) + \theta_6 T_i I(X_i = 2) + \theta_7 T_i I(X_i = 3) + \theta_8 T_i I(X_i = 4),$$

where θ6,θ7,θ8 are the interaction effects. We are testing

$$H_0: \theta_6 = \theta_7 = \theta_8 = 0 \quad \text{versus} \quad H_1: \text{not all three equal to } 0.$$

The main treatment effect is tested in the final analysis.

For a binary outcome with two treatment groups in a logistic regression model, simulations are run to compare the proposed design with the traditional non-enrichment design and Simon and Simon’s enrichment design (Simon and Simon, 2013). The Wald test and the interaction pseudo-randomization test are used in the final analysis for the proposed design. The parameters used in the binary covariate scenarios are θ1 = 0.5, θ2 = 0, θ3 = 0.5, and θ4 ranging from 0 to 1.1. The sample size of 1,000 is chosen so that the non-enrichment design achieves at least 80% power when θ4 = 1.1. The parameter values used in the four-strata categorical covariate scenarios are θ1 = 0.5, θ2 = 0, θ3 = 1, θ4 = 0.5, θ5 = 0, and θ6–θ8 ranging from 0 to 0.7. The sample size of 1,500 is chosen so that the non-enrichment design achieves at least 80% power when θ6 = θ7 = θ8 = 0.5. We use $\operatorname{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$ as a performance measure for estimation assessment.

2.2.2. Continuous Outcomes in Linear Regression Models

We then consider continuous normal outcomes with two treatment groups and one categorical covariate. Suppose responses follow a linear regression model with homoscedastic variance

$$Y_i = Z_i^{T}\theta + \epsilon_i, \quad (4)$$

where the errors $\epsilon_i \sim N(0, \sigma^2)$ are independent, $Y_i$ is the response for patient $i$, $Z_i$ is the $d \times 1$ covariate vector (including the intercept $z_{i1} = 1$ and the treatment indicator $z_{i2} = T_i$), and $\theta$ is a $d \times 1$ vector of model parameters. Similar tests are considered as described in Section 2.2.1. The main treatment effect is tested in the final analysis.

Simulations are conducted to compare non-enrichment designs, Simon and Simon’s enrichment design, Zhu’s design (Zhu et al., 2013), and our CARA enrichment designs. Since Simon and Simon (2013) give no explicit test statistic for continuous outcomes in a sequential enrollment scenario, the group sequential analysis method with block size two is used. The test statistic is calculated as

$$\frac{1}{\sqrt{n}} \sum_{k=1}^{K} \sqrt{n_k}\, \frac{\bar{x}_{(T,k)} - \bar{x}_{(C,k)}}{\sqrt{\hat{\sigma}_{(T,k)}^{2}/(n_{T,k} - 1) + \hat{\sigma}_{(C,k)}^{2}/(n_{C,k} - 1)}}, \quad (5)$$

where $\bar{x}_{(T,k)}$, $\bar{x}_{(C,k)}$, $\hat{\sigma}_{(T,k)}^{2}$, $\hat{\sigma}_{(C,k)}^{2}$, $n_{T,k}$, and $n_{C,k}$ denote the sample means, sample variances, and sample sizes for the treatment and control groups in the $k$-th block, respectively, and $n_k = n_{T,k} + n_{C,k}$.
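Under the assumption that $\hat\sigma^2$ denotes the usual sample variance, the block statistic in the form given above can be sketched as follows (a hypothetical helper; `blocks` holds the per-block treatment and control responses):

```python
import numpy as np

def block_test_statistic(blocks):
    """Weighted combination of per-block standardized mean differences
    (a sketch of the statistic in the form above).

    blocks : list of (x_treat, x_ctrl) pairs, each a 1-D array of
             responses from one enrollment block
    """
    n = sum(len(t) + len(c) for t, c in blocks)
    total = 0.0
    for t, c in blocks:
        n_k = len(t) + len(c)
        diff = np.mean(t) - np.mean(c)
        # sample variances (ddof=1) divided by (group size - 1)
        se = np.sqrt(np.var(t, ddof=1) / (len(t) - 1)
                     + np.var(c, ddof=1) / (len(c) - 1))
        total += np.sqrt(n_k) * diff / se
    return float(total / np.sqrt(n))
```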

The parameters used in the binary covariate scenarios are θ1 = 0.5, θ2 = 0, θ3 = 0.5, and θ4 ranging from 0 to 0.7. The sample size of 400 is chosen so that the non-enrichment design achieves at least 80% power when θ4 = 0.7. The parameter values used in the four-strata categorical covariate scenarios are θ1 = 0.5, θ2 = 0, θ3 = 1, θ4 = 0.5, θ5 = 0, and θ6–θ8 ranging from 0 to 0.8. The sample size of 400 is chosen so that the non-enrichment design reaches at least 80% power when θ6 = θ7 = θ8 = 0.8 with 10% outliers. The two weight parameters and the allocation probability parameter in Zhu et al.’s (2013) design are chosen based on the authors’ recommendation, with (w1, w2) = (1, 1) and p = 0.8.

3. Results

3.1. Binary Responses

DBCD (γ=2) targeting CARA3 (1) is used in the first stage of our design for binary outcomes since it achieves a better balance between power and the overall success rate. Table 1 shows the results for the binary covariate with two equally-distributed strata. All four designs preserve the type I error rates. Under the null hypothesis (θ4 = 0) or when the effect size is small (θ4 = 0.3), Simon and Simon’s enrichment design can end up with a smaller sample size since neither covariate stratum meets the inclusion criterion; therefore, it can be less powerful than the standard non-enrichment design. As the effect size gets larger (θ4 = 0.5, 0.7), Simon and Simon’s enrichment design becomes more powerful than the non-enrichment design. The overall success rates in Simon and Simon’s enrichment designs are always higher than in non-enrichment designs due to the biased sampling in the second stage. The two proposed CARA enrichment designs consistently have higher power than the non-enrichment and Simon and Simon’s enrichment designs under different effect sizes. CARA enrichment designs with non-enriched second stages have lower overall success rates than Simon and Simon’s enrichment designs, but those with enriched second stages have higher overall success rates. In the two proposed CARA enrichment designs, a threshold of 0.2 leads to a non-enriched second stage and a threshold of 0.4 leads to an enriched second stage (θ4 = 0.3, 0.5). When only a small proportion of patients would benefit from the treatment (Table 2), the two proposed CARA enrichment designs with enriched second stages (θ4 = 0.7) have much higher power than the non-enrichment and Simon and Simon’s enrichment designs. The two CARA enrichment designs have similar power under all scenarios. The variances of the success rates under Simon and Simon’s enrichment designs are higher than under the other three designs. The bias is small under the null hypothesis for binary outcomes with equally-distributed covariates and increases slightly under the alternatives (Table 3). Similar results are found for the unequally-distributed covariate strata scenario.

Table 1:

Power and overall success rates from different designs for two equally-distributed covariate strata and binary outcomes using all data from both stages

Theta Design Threshold Rejection Rate SR1(var*N)
(0.5, 0, 0.5, 0) NED2 0.051 0.677(0.222)
SED3 0.041 0.677(0.578)
MCED14 0.2 & 0.4 0.051 0.677(0.222)
MCED25 0.2 & 0.4 0.048 0.677(0.225)
(0.5, 0, 0.5, 0.3) NED 0.159 0.690(0.215)
SED 0.151 0.705(0.628)
MCED1 0.2 0.215 0.692(0.221)
MCED2 0.2 0.217 0.692(0.216)
MCED1 0.4 0.374 0.726(0.203)
MCED2 0.4 0.371 0.726(0.204)
(0.5, 0, 0.5, 0.5) NED 0.327 0.698(0.210)
SED 0.401 0.721(0.575)
MCED1 0.2 0.491 0.702(0.218)
MCED2 0.2 0.485 0.702(0.216)
MCED1 0.4 0.750 0.742(0.198)
MCED2 0.4 0.757 0.742(0.201)
(0.5, 0, 0.5, 0.7) NED 0.517 0.705(0.207)
SED 0.715 0.732(0.555)
MCED1 0.2 & 0.4 0.947 0.757(0.193)
MCED2 0.2 & 0.4 0.956 0.757(0.197)
1 success rate
2 non-enrichment design
3 Simon and Simon’s enrichment design
4 CARA enrichment design using Wald test
5 CARA enrichment design using pseudo-randomization test

Table 2:

Power and overall success rates from different designs for two unequally (2 : 8) distributed covariate strata and binary outcomes using all data from both stages

Theta Design Threshold Rejection Rate SR1(var*N)
(0.5, 0, 0.5, 0) NED2 0.052 0.645(0.223)
SED3 0.048 0.649(0.585)
MCED14 0.2 & 0.4 0.050 0.644(0.231)
MCED25 0.2 & 0.4 0.052 0.644(0.231)
(0.5, 0, 0.5, 0.3) NED 0.069 0.650(0.231)
SED 0.062 0.668(0.983)
MCED1 0.2 0.087 0.650(0.231)
MCED2 0.2 0.082 0.650(0.230)
MCED1 0.4 0.283 0.705(0.207)
MCED2 0.4 0.276 0.705(0.209)
(0.5, 0, 0.5, 0.5) NED 0.091 0.653(0.230)
SED 0.156 0.681(1.129)
MCED1 0.2 & 0.4 0.141 0.654(0.232)
MCED2 0.2 & 0.4 0.135 0.654(0.231)
(0.5, 0, 0.5, 0.7) NED 0.121 0.656(0.229)
SED 0.332 0.695(1.217)
MCED1 0.2 & 0.4 0.855 0.730(0.201)
MCED2 0.2 & 0.4 0.859 0.730(0.203)
1 success rate
2 non-enrichment design
3 Simon and Simon’s enrichment design
4 CARA enrichment design using Wald test
5 CARA enrichment design using pseudo-randomization test

Table 3:

Parameter estimation bias from the proposed design for two equally-distributed covariate strata and binary outcomes using all data from both stages

Theta Bias
(0.5, 0, 0.5, 0) (−0.0012, −0.0002, −0.0020, 0.0006)
(0.5, 0, 0.5, 0.3) (0.0004, −0.0067, −0.0052, 0.0124)
(0.5, 0, 0.5, 0.5) (0.0004, −0.0067, −0.0048, 0.0113)
(0.5, 0, 0.5, 0.7) (0.0004, −0.0067, −0.0050, 0.0129)

When there are four equally-distributed covariate strata and three of the four strata have higher success rates in the treatment group (Table 4), all four designs preserve the type I error rates. Non-enrichment designs have higher power than Simon and Simon’s enrichment designs when the effect sizes are small. As the effect sizes increase, Simon and Simon’s enrichment designs become more powerful than non-enrichment designs. The CARA enrichment designs have higher power than non-enrichment designs; however, the overall success rates are lower since the third covariate group is chosen and enriched based on the interim analysis results. Although Diao et al. (2018) note that using only the best group in the final analysis can lead to an inflated type I error rate, no such inflation is observed in our simulations. Since the power of a test depends on both the sample size in the final analysis and the effect size, for the selected parameters under equal allocation, using all data is more powerful than using the second-stage data only. On the other hand, using second-stage data only is more powerful than using all data when only a small proportion of the patients respond better to the treatment (Figure 1). Additional simulation results comparing different CARA procedures in the first stage and evaluating the performance of the pseudo-randomization test in the interim analysis for binary outcomes are included in the Supplementary Materials.

Table 4:

Power and overall success rates from different designs for four equally-distributed covariate strata and binary outcomes using all data from both stages

Theta Design Threshold Rejection Rate SR1(var*N)
(0.5, 0, 1, 0.5, 0, 0, 0, 0) NED2 0.049 0.698(0.219)
SED3 0.034 0.698(1.172)
MCED14 0.2 & 0.4 0.049 0.698(0.209)
MCED25 0.2 & 0.4 0.050 0.698(0.215)
(0.5, 0, 1, 0.5, 0, 0.3, 0.3, 0.3) NED 0.409 0.719(0.206)
SED 0.380 0.725(0.816)
MCED1 0.2 0.514 0.720(0.203)
MCED2 0.2 0.513 0.720(0.200)
MCED1 0.4 0.653 0.689(0.215)
MCED2 0.4 0.656 0.689(0.218)
(0.5, 0, 1, 0.5, 0, 0.5, 0.5, 0.5) NED 0.806 0.731(0.200)
SED 0.831 0.740(0.509)
MCED1 0.2 0.892 0.735(0.203)
MCED2 0.2 0.891 0.735(0.198)
MCED1 0.4 0.966 0.709(0.216)
MCED2 0.4 0.964 0.709(0.213)
(0.5, 0, 1, 0.5, 0, 0.7, 0.7, 0.7) NED 0.970 0.741(0.195)
SED 0.983 0.753(0.345)
MCED1 0.2 & 0.4 0.999 0.730(0.215)
MCED2 0.2 & 0.4 0.999 0.730(0.213)
1 success rate
2 non-enrichment design
3 Simon and Simon’s enrichment design
4 CARA enrichment design using Wald test
5 CARA enrichment design using pseudo-randomization test

Figure 1: Power curve for binary outcome and binary covariate

3.2. Continuous Responses

DBCD (γ=2) targeting the CARA procedure (2) with a tuning parameter of 6 is chosen in the first stage since it has higher power with moderately low allocation variability. For a continuous outcome with two treatment groups in the linear regression model (4), simulations are run to compare the non-enrichment design, the CARA enrichment designs, Simon and Simon's enrichment design, and Zhu's adaptive design (Zhu et al., 2013). The test statistic is calculated based on Equation (5) with K=2. All designs preserve type I error rates when there is one binary covariate (Table 5 and Table 6). Outliers reduce testing power under all designs. Simon and Simon's enrichment designs are consistently more powerful than non-enrichment designs.
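The DBCD allocation rule of Hu and Zhang (2004) used in the first stage maps the current allocation proportion and the estimated target allocation to the next patient's assignment probability. A minimal sketch in Python, assuming the target ρ̂ is supplied externally (in the paper it comes from the covariate-adjusted response estimates of the CARA procedure, which this illustration does not model):

```python
def dbcd_prob(x, rho, gamma=2.0):
    """Hu and Zhang's (2004) DBCD allocation probability for treatment 1.

    x:     current observed proportion of patients on treatment 1 (0 < x < 1)
    rho:   current estimated target allocation proportion (0 < rho < 1)
    gamma: tuning parameter controlling how aggressively the rule corrects
           deviations from the target; gamma = 2 is the value used here
    """
    num = rho * (rho / x) ** gamma
    den = num + (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return num / den

# Over-allocation relative to the target pushes the next assignment
# probability below the target, and under-allocation pushes it above.
print(dbcd_prob(0.55, 0.50))  # ≈ 0.401
print(dbcd_prob(0.45, 0.50))  # ≈ 0.599
```

Larger γ corrects imbalance more aggressively at the cost of higher allocation variability; γ=0 reduces to simply assigning with probability ρ̂.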

Table 5:

Type I error rates and power from different designs for two equally-distributed covariate strata and continuous outcomes using all data from both stages

Theta | Design | Threshold | Rejection rate
(0.5, 0, 0.5, 0) | NED¹ | — | 0.048
SED² | — | 0.050
ZHU³ | — | 0.044
MCED1⁴ | 0.2 & 0.4 | 0.047
MCED2⁵ | 0.2 & 0.4 | 0.050
(0.5, 0, 0.5, 0.2) | NED | — | 0.160
SED | — | 0.201
ZHU | — | 0.142
MCED1 | 0.2 | 0.173
MCED2 | 0.2 | 0.172
MCED1 | 0.4 | 0.324
MCED2 | 0.4 | 0.316
(0.5, 0, 0.5, 0.4) | NED | — | 0.462
SED | — | 0.673
ZHU | — | 0.460
MCED1 | 0.2 & 0.4 | 0.854
MCED2 | 0.2 & 0.4 | 0.853
(0.5, 0, 0.5, 0.6) | NED | — | 0.778
SED | — | 0.945
ZHU | — | 0.807
MCED1 | 0.2 & 0.4 | 0.994
MCED2 | 0.2 & 0.4 | 0.994
¹ non-enrichment design; ² Simon and Simon's enrichment design; ³ Zhu's adaptive design; ⁴ CARA enrichment design using Wald test; ⁵ CARA enrichment design using pseudo-randomization test

Table 6:

Type I error rates and power from different designs for two equally-distributed covariate strata and continuous outcomes with 10% outliers using all data from both stages

Theta | Design | Threshold | Rejection rate
(0.5, 0, 0.5, 0) | NED¹ | — | 0.047
SED² | — | 0.047
ZHU³ | — | 0.046
MCED1⁴ | 0.2 | 0.051
MCED2⁵ | 0.2 | 0.048
MCED1 | 0.4 | 0.050
MCED2 | 0.4 | 0.047
(0.5, 0, 0.5, 0.1) | NED | — | 0.053
SED | — | 0.064
ZHU | — | 0.055
MCED1 | 0.2 & 0.4 | 0.068
MCED2 | 0.2 & 0.4 | 0.068
(0.5, 0, 0.5, 0.3) | NED | — | 0.110
SED | — | 0.161
ZHU | — | 0.116
MCED1 | 0.2 & 0.4 | 0.222
MCED2 | 0.2 & 0.4 | 0.237
(0.5, 0, 0.5, 0.5) | NED | — | 0.248
SED | — | 0.373
ZHU | — | 0.245
MCED1 | 0.2 & 0.4 | 0.535
MCED2 | 0.2 & 0.4 | 0.544
(0.5, 0, 0.5, 0.7) | NED | — | 0.431
SED | — | 0.630
ZHU | — | 0.437
MCED1 | 0.2 & 0.4 | 0.816
MCED2 | 0.2 & 0.4 | 0.826
¹ non-enrichment design; ² Simon and Simon's enrichment design; ³ Zhu's adaptive design; ⁴ CARA enrichment design using Wald test; ⁵ CARA enrichment design using pseudo-randomization test

Both CARA enrichment designs are generally more powerful than Simon and Simon's enrichment design, except when the effect size is relatively small and the CARA enrichment designs use the tighter threshold of 0.2. In that case the CARA enrichment designs continue to the second stage with all comers, so Simon and Simon's enrichment design is more powerful. The looser threshold of 0.4 leads to an enriched second stage and thus higher power. The results are similar to those for binary responses described in Section 3.1 (Table 7).

Table 7:

Parameter estimation bias from the proposed design for two equally-distributed covariate strata and continuous outcomes using all data from both stages

Theta | Bias
(0.5, 0, 0.5, 0) | (−0.0019, 0.0005, 0.0003, −0.0001)
(0.5, 0, 0.5, 0.2) | (−0.0024, 0.0018, 0.0011, 0.0025)
(0.5, 0, 0.5, 0.4) | (−0.0024, 0.0018, 0.0010, 0.0027)
(0.5, 0, 0.5, 0.6) | (−0.0024, 0.0018, 0.0009, 0.0029)

All designs preserve the type I error rates when there is one categorical covariate with four equally-distributed strata (Table 8 and Table 9). The non-enrichment design is consistently less powerful than the other three designs, even when the CARA enrichment designs do not enrich in the second stage, since CARA randomization assigns more patients from the responsive strata to the treatment group. Simon and Simon's enrichment design has higher power when the CARA enrichment designs do not enrich in the second stage and there are no outliers; an enriched second stage in a CARA enrichment design guarantees higher power. The two CARA enrichment designs have similar power under all scenarios. Zhu et al.'s (2013) design has power similar to that of the non-enrichment design. As with binary outcomes, using any of the three types of data preserves type I error rates (Figure 2). Using all data yields higher power than the other two types of data when three out of four strata benefit from the treatment; the power differences are greater when there are outliers and larger effect sizes. Additional simulation results comparing different CARA procedures in the first stage and evaluating the performance of the pseudo-randomization test in the interim analysis for continuous outcomes are included in the Supplementary Materials. Zhang et al.'s (2007) test has inflated type I error rates when outliers are present. Although both the Wald and the interaction pseudo-randomization tests preserve type I error rates under model misspecification, the interaction pseudo-randomization test consistently has higher power than the Wald test across effect size scenarios.

Table 8:

Type I error rates and power from different designs for four equally-distributed covariate strata and continuous outcomes using all data from both stages

Theta | Design | Threshold | Rejection rate
(0.5, 0, 1, 0.5, 0, 0, 0, 0) | NED¹ | — | 0.050
SED² | — | 0.058
MCED1³ | 0.2 & 0.4 | 0.040
MCED2⁴ | 0.2 & 0.4 | 0.043
(0.5, 0, 1, 0.5, 0, 0.2, 0.2, 0.2) | NED | — | 0.275
SED | — | 0.314
MCED1 | 0.2 & 0.4 | 0.285
MCED2 | 0.2 & 0.4 | 0.280
(0.5, 0, 1, 0.5, 0, 0.4, 0.4, 0.4) | NED | — | 0.766
SED | — | 0.845
MCED1 | 0.2 | 0.827
MCED2 | 0.2 | 0.826
MCED1 | 0.4 | 0.916
MCED2 | 0.4 | 0.923
(0.5, 0, 1, 0.5, 0, 0.6, 0.6, 0.6) | NED | — | 0.978
SED | — | 0.993
MCED1 | 0.2 & 0.4 | 0.999
MCED2 | 0.2 & 0.4 | 0.999
¹ non-enrichment design; ² Simon and Simon's enrichment design; ³ CARA enrichment design using Wald test; ⁴ CARA enrichment design using pseudo-randomization test

Table 9:

Type I error rates and power from different designs for four equally-distributed covariate strata and continuous outcomes with 10% outliers using all data from both stages

Theta | Design | Threshold | Rejection rate
(0.5, 0, 1, 0.5, 0, 0, 0, 0) | NED¹ | — | 0.048
SED² | — | 0.053
MCED1³ | 0.2 & 0.4 | 0.048
MCED2⁴ | 0.2 & 0.4 | 0.042
(0.5, 0, 1, 0.5, 0, 0.2, 0.2, 0.2) | NED | — | 0.108
SED | — | 0.127
MCED1 | 0.2 & 0.4 | 0.143
MCED2 | 0.2 & 0.4 | 0.152
(0.5, 0, 1, 0.5, 0, 0.4, 0.4, 0.4) | NED | — | 0.334
SED | — | 0.377
MCED1 | 0.2 & 0.4 | 0.450
MCED2 | 0.2 & 0.4 | 0.472
(0.5, 0, 1, 0.5, 0, 0.6, 0.6, 0.6) | NED | — | 0.627
SED | — | 0.713
MCED1 | 0.2 & 0.4 | 0.790
MCED2 | 0.2 & 0.4 | 0.808
(0.5, 0, 1, 0.5, 0, 0.8, 0.8, 0.8) | NED | — | 0.856
SED | — | 0.915
MCED1 | 0.2 & 0.4 | 0.959
MCED2 | 0.2 & 0.4 | 0.965
¹ non-enrichment design; ² Simon and Simon's enrichment design; ³ CARA enrichment design using Wald test; ⁴ CARA enrichment design using pseudo-randomization test

Figure 2: Power curve for binary and categorical covariates in linear regression

4. Redesigning the NSABP Trial

The National Surgical Adjuvant Breast and Bowel Project (NSABP) B-35 is a phase III trial comparing anastrozole versus tamoxifen in postmenopausal women with hormone (estrogen and/or progesterone) receptor-positive ductal carcinoma in situ undergoing lumpectomy plus radiotherapy. Qualified patients were enrolled and randomly assigned (1:1) to receive either oral tamoxifen 20 mg per day or oral anastrozole 1 mg per day for 5 years. Margolese et al. (2016) reported the primary results from this study. A total of 3104 patients were enrolled between Jan 1, 2003 and Jun 15, 2006, and 3077 patients had disease-free endpoint data by Feb 28, 2015. Anastrozole was found superior to tamoxifen in the group younger than 60 years, but not in the group 60 and older. Among those younger than 60, 34 of 724 (4.7%) in the anastrozole group and 63 of 723 (8.7%) in the tamoxifen group had recurrent breast cancer events. Among those 60 and older, 56 of 815 (6.9%) in the anastrozole group and 59 of 815 (7.2%) in the tamoxifen group had recurrent breast cancer events. Consider a logistic regression model for two treatment groups and one binary covariate with an interaction effect:

logit(pᵢ) = θ₁ + θ₂Tᵢ + θ₃Xᵢ + θ₄TᵢXᵢ,  i = 1, …, n,

where Tᵢ=1 when a patient is in the anastrozole group, Tᵢ=0 when a patient is in the tamoxifen group, Xᵢ=1 when a patient is younger than 60 years old (47%), Xᵢ=0 when a patient is 60 or older (53%), the outcome Yᵢ=1 when a patient has no recurrent breast cancer events, and Yᵢ=0 otherwise. The parameters used in the simulation are θ₁=2.5564, θ₂=0.0458, θ₃=−0.2055, and θ₄=0.6129. Under the non-enrichment design, all 3104 patients are equally allocated to the two treatment groups using permuted block randomization with a block size of 16. Under Simon and Simon's enrichment design, the first 1552 patients are equally allocated using permuted block randomization with a block size of 16; enrollment and allocation in the second stage are performed sequentially based on the selection criterion. Under the CARA enrichment design, patients are allocated using DBCD (γ=2) with CARA3. An interaction pseudo-randomization test with 15,000 permutations is applied after the first 1552 patients are enrolled. Two thresholds (0.2 and 0.4) are tested, and a Wald test is used in the final analysis. The results are presented in Table 10. The CARA enrichment design with a threshold of 0.4 leads to an enriched second stage with the highest overall success rate and detects a significant treatment effect between the two drug groups (p=0.0176). Since both age groups have higher success rates in the anastrozole group, Simon and Simon's enrichment design is enriched after 2070 patients, whereas the CARA enrichment design (0.4) is enriched after 1552 patients. CARA randomization and the enriched second stage lead to 60.3% of patients younger than 60 and 54.6% of patients 60 and older being allocated to the anastrozole group.
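As a consistency check, the simulation parameters above reproduce, up to rounding, the cell-wise success rates implied by the event counts quoted from Margolese et al. (2016). A short Python sketch (the cell layout and variable names are ours, not from the paper's code):

```python
import math

# Simulation parameters quoted above for
# logit(p) = th1 + th2*T + th3*X + th4*T*X,
# where T = 1 for anastrozole, X = 1 for age < 60,
# and Y = 1 means no recurrent breast cancer event.
th1, th2, th3, th4 = 2.5564, 0.0458, -0.2055, 0.6129

def success_prob(T, X):
    """Model-based probability of no recurrence in a (treatment, age) cell."""
    eta = th1 + th2 * T + th3 * X + th4 * T * X
    return 1.0 / (1.0 + math.exp(-eta))

# Empirical success rates implied by the event counts reported in the trial.
empirical = {
    (0, 0): 1 - 59 / 815,  # tamoxifen,   age >= 60
    (1, 0): 1 - 56 / 815,  # anastrozole, age >= 60
    (0, 1): 1 - 63 / 723,  # tamoxifen,   age < 60
    (1, 1): 1 - 34 / 724,  # anastrozole, age < 60
}
for (T, X), obs in empirical.items():
    # Model and data agree to three decimal places in every cell.
    print(f"T={T}, X={X}: model {success_prob(T, X):.3f} vs observed {obs:.3f}")
```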

Table 10:

Success rates and p-value from different designs based on NSABP trial

Design | p-value | Overall SR¹ | SR (<60) | SR (≥60)
NED² | 0.053 | 0.933 | 0.931 | 0.935
SED³ | 0.082 | 0.927 | 0.924 | 0.932
MCED⁴ (0.2) | 0.208 | 0.939 | 0.943 | 0.935
MCED (0.4) | 0.018 | 0.939 | 0.941 | 0.936
¹ success rate; ² non-enrichment design; ³ Simon and Simon's enrichment design; ⁴ CARA enrichment design

5. Conclusions

In this paper, a two-stage enrichment design is proposed. First, we use numerical studies to evaluate the performance of different CARA allocation rules for testing the interaction effect, measured in terms of testing power and overall success rate. DBCD (γ=2) targeting CARA2 is the most powerful but has the lowest overall success rate among all the CARA procedures compared. DBCD (γ=2) targeting CARA1 skews the allocation the most, at the price of reduced power. DBCD (γ=2) targeting CARA3 is chosen since it better balances power and overall success rate.

We then propose an interaction pseudo-randomization test that uses Monte Carlo resampling of the interaction terms based on regression models to examine the interaction effect. Although the observed and generated score test statistics are used in calculating the p-values, the procedure itself is non-parametric since it is based on the randomization distribution induced by the particular allocation sequence. Population model-based tests and the proposed test perform equally well when there are two strata and no misspecified data. The proposed test, however, is robust to outliers: it preserves the type I error rate under all scenarios for both binary and continuous outcomes, while inflated type I error rates are observed for the population model-based tests in some situations, and it has higher power than the population model-based tests when outliers are present. The sample size is calculated based on the non-adaptive design and kept unchanged during the interim analysis since there may be no closed-form solution incorporating enrichment in the second stage. If enrichment occurs at the second stage, the remaining sample size is reallocated to the best subgroup.
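The flavor of such a randomization test can be conveyed with a deliberately simplified sketch: an interaction statistic is computed on the observed data and compared with its distribution over re-generated treatment assignments. For illustration we use complete re-randomization and a difference-of-differences statistic; the paper's actual test instead regenerates the CARA allocation sequence and uses score statistics, which this sketch does not attempt to reproduce:

```python
import random

def interaction_stat(y, t, x):
    """Difference-of-differences interaction statistic for a binary
    covariate: (treatment effect in stratum x=1) - (effect in x=0)."""
    def mean(v):
        return sum(v) / len(v) if v else 0.0
    cell = lambda ti, xi: [yi for yi, tj, xj in zip(y, t, x)
                           if tj == ti and xj == xi]
    return ((mean(cell(1, 1)) - mean(cell(0, 1)))
            - (mean(cell(1, 0)) - mean(cell(0, 0))))

def randomization_pvalue(y, t, x, n_perm=1000, seed=1):
    """Monte Carlo randomization p-value for the interaction effect.

    Simplified: treatment labels are re-drawn by complete randomization;
    a faithful version would re-draw them from the CARA rule actually used.
    """
    rng = random.Random(seed)
    obs = abs(interaction_stat(y, t, x))
    count = 0
    for _ in range(n_perm):
        t_new = t[:]
        rng.shuffle(t_new)  # re-generate the assignment sequence
        if abs(interaction_stat(y, t_new, x)) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)

# Toy data: the treatment helps only in stratum x=1, so the
# randomization p-value for the interaction is very small.
y = [1.0] * 10 + [0.0] * 10 + [0.0] * 20
t = [1] * 10 + [0] * 10 + [1] * 10 + [0] * 10
x = [1] * 20 + [0] * 20
print(randomization_pvalue(y, t, x, n_perm=999))  # very small
```

The `(count + 1) / (n_perm + 1)` form guarantees a valid (slightly conservative) p-value under the null regardless of the number of Monte Carlo draws.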

The proposed design is compared with the equal-allocation non-enrichment design, Simon and Simon's design (Simon and Simon, 2013), and Zhu's design (Zhu et al., 2013). The proposed design using all data from both stages is more powerful than the traditional non-enrichment design and Simon and Simon's design when enrichment is implemented in the second stage; Simon and Simon's design is more powerful than the proposed design in scenarios where enrichment is not implemented. A less stringent threshold increases the chance of an enriched second stage; based on our simulation results, the threshold of 0.4 is recommended. Simon and Simon's design is less powerful than the traditional non-enrichment design when the majority of the patients benefit from the treatment and the effect size is relatively small, as sequential allocation in the Simon and Simon binary outcome scenarios potentially leads to reduced sample size and power. Using the pseudo-randomization test or a Wald test in the final analysis of our design yields similar power levels. The proposed design is robust to outliers compared to the traditional non-enrichment design and Simon and Simon's design. Although our simulations did not find inflated type I error rates when using the data from the best subgroup only, as discussed in Diao et al. (2018), selecting the best subgroup in the second stage induces biased sampling and leads to an inflated type I error rate if only the best subgroup data are used in the final analysis. Moreover, when multiple subgroups benefit from the treatment, using all data from both stages increases the power. A further modification could select more than one subgroup if other subgroups show smaller but clinically significant benefits from the treatment.

The proposed design sequentially allocates patients based on each patient's covariate profile and all previous patients' responses. In real clinical trials, this could be costly and challenging since the outcome must be observed relatively early and reliably. Furthermore, potential bias may arise from possible temporal trends in accruing outcome data, and the sample size probably needs to be large for the design to be useful in practice. In the context of an enrichment design based on estimating the threshold of a continuous biomarker, Frieri et al. (2022) found that the first-stage sample size required to identify the benefitting subgroup is prohibitively large. Given finite patient and financial resources, we need to consider whether the improvement in power with a large adaptive design is truly more efficient (Antonijevic, 2016; Chen and Beckman, 2014). Chen and Beckman (2009, 2014) suggested using the benefit-cost ratio as an efficiency function to measure the cost-effectiveness of Phase II proof-of-concept trials. The net present value approach in Antonijevic (2016) requires detailed financial information, whereas the benefit-cost ratio approach in Chen and Beckman (2009, 2014) does not, as the benefit-cost ratio essentially involves only the expected sample sizes. While the current paper does not consider the monetary aspects of a trial, it would be interesting to incorporate the benefit-cost ratio approach in our setting; future research is warranted. Additionally, a group sequential approach might be considered to enhance practicality, reduce trial length, and minimize potential time-trend bias.

Different numbers of covariate strata can be incorporated into the C code and similar analyses can be performed. However, multiple treatment arms, continuous covariates, and survival and longitudinal outcomes are not addressed. In practice, we can often re-categorize continuous covariates into categorical variables.

Supplementary Material

Acknowledgments

This manuscript was prepared using data from NCT00053898-D1-Dataset from the NCTN Data Archive of the National Cancer Institute's (NCI's) National Clinical Trials Network (NCTN). Data were originally collected from clinical trial NCT00053898, The National Surgical Adjuvant Breast and Bowel Project (NSABP) B-35. All analyses and conclusions in this manuscript are the sole responsibility of the authors and do not necessarily reflect the opinions or views of the clinical trial investigators, the NCTN, or the NCI. This research was supported in part by the NIH Clinical Center Intramural Program.

The authors thank the Editor, the Associate Editor, and two anonymous referees for their valuable comments that have improved the presentation of the paper.

Disclosure Statement

The authors report there are no competing interests to declare. This research was supported in part by the NIH Clinical Center Intramural Program.

Footnotes

Supplementary Materials

Supplementary materials contain additional simulation results of comparing different CARA procedures in the first stage and evaluating the performance of the pseudo-randomization test in the interim analysis for binary and continuous outcomes.

Contributor Information

Li Yang, Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD.

Guoqing Diao, Department of Biostatistics and Bioinformatics, The George Washington University, Washington, D.C.

William F. Rosenberger, Department of Statistics, George Mason University, Fairfax, VA

References

  1. Antonijevic Z (2016). The impact of adaptive design on portfolio optimization. Therapeutic Innovation & Regulatory Science, 50(5):615–619. [DOI] [PubMed] [Google Scholar]
  2. Ayanlowo A and Redden D (2008). A two stage conditional power adaptive design adjusting for treatment by covariate interaction. Contemporary Clinical Trials, 29(3):428–438. [DOI] [PubMed] [Google Scholar]
  3. Chen C and Beckman RA (2014). Maximizing return on socioeconomic investment in Phase II proof-of-concept trials. Clinical Cancer Research, 20(7):1730–1734. [DOI] [PubMed] [Google Scholar]
  4. Chen C and Beckman RA (2009). Optimal cost-effective designs of Phase II proof-of-concept trials and associated go/no-go decisions. Journal of Biopharmaceutical Statistics, 19(3):424–436. [DOI] [PubMed] [Google Scholar]
  5. Chiu Y-D, Koenig F, Posch M, and Jaki T (2018). Design and estimation in clinical trials with subpopulation selection. Statistics in Medicine, 37(29):4335–4352. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Diao G, Dong J, Zeng D, Ke C, Rong A, and Ibrahim JG (2018). Biomarker threshold adaptive designs for survival endpoints. Journal of Biopharmaceutical Statistics, 28(6):1038–1054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Dragalin V (2006). Adaptive designs: terminology and classification. Therapeutic Innovation & Regulatory Science, 40(4):425. [Google Scholar]
  8. Follmann D (1997). Adaptively changing subgroup proportions in clinical trials. Statistica Sinica, pages 1085–1102. [Google Scholar]
  9. Freidlin B, McShane LM, and Korn EL (2010). Randomized clinical trials with biomarkers: design issues. Journal of the National Cancer Institute, 102(3):152–160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Fridlyand J, Simon RM, Walrath JC, Roach N, Buller R, Schenkein DP, Flaherty KT, Allen JD, Sigal EV, and Scher HI (2013). Considerations for the successful co-development of targeted cancer therapies and companion diagnostics. Nature Reviews Drug Discovery, 12(10):743–755. [DOI] [PubMed] [Google Scholar]
  11. Frieri R, Rosenberger WF, Flournoy N, and Lin Z (2022). Design considerations for two-stage enrichment clinical trials. Biometrics. [DOI] [PubMed] [Google Scholar]
  12. Galbete A and Rosenberger WF (2016). On the use of randomization tests following adaptive designs. Journal of Biopharmaceutical Statistics, 26(3):466–474. [DOI] [PubMed] [Google Scholar]
  13. Hu F and Rosenberger WF (2006). The Theory of Response-Adaptive Randomization in Clinical Trials. John Wiley & Sons, New York. [Google Scholar]
  14. Hu F, Zhang L-X, et al. (2004). Asymptotic properties of doubly adaptive biased coin designs for multitreatment clinical trials. The Annals of Statistics, 32(1):268–301. [Google Scholar]
  15. Hu F, Zhang L-X, He X, et al. (2009). Efficient randomized-adaptive designs. The Annals of Statistics, 37(5A):2543–2560. [Google Scholar]
  16. Maitournam A and Simon R (2005). On the efficiency of targeted clinical trials. Statistics in Medicine, 24(3):329–339. [DOI] [PubMed] [Google Scholar]
  17. Margolese RG, Cecchini RS, Julian TB, Ganz PA, Costantino JP, Vallow LA, Albain KS, Whitworth PW, Cianfrocca ME, Brufsky AM, et al. (2016). Anastrozole versus tamoxifen in postmenopausal women with ductal carcinoma in situ undergoing lumpectomy plus radiotherapy (NSABP B-35): a randomised, double-blind, phase 3 clinical trial. The Lancet, 387(10021):849–856. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Mehta CR and Gao P (2011). Population enrichment designs: case study of a large multinational trial. Journal of Biopharmaceutical Statistics, 21(4):831–845. [DOI] [PubMed] [Google Scholar]
  19. Parhat P, Rosenberger WF, and Diao G (2014). Conditional monte carlo randomization tests for regression models. Statistics in Medicine, 33(18):3078–3088. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Plamadeala V, Rosenberger WF, et al. (2012). Sequential monitoring with conditional randomization tests. The Annals of Statistics, 40(1):30–44. [Google Scholar]
  21. Proschan MA and Dodd LE (2019). Re-randomization tests in clinical trials. Statistics in Medicine, 38(12):2292–2302. [DOI] [PubMed] [Google Scholar]
  22. Renfro LA, Coughlin CM, Grothey AM, and Sargent DJ (2014). Adaptive randomized phase ii design for biomarker threshold selection and independent evaluation. Chinese Clinical Oncology, 3(1):3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Rosenberger WF and Lachin JM (2015). Randomization in Clinical Trials: Theory and Practice. John Wiley & Sons, New York. [Google Scholar]
  24. Rosenberger WF, Stallard N, Ivanova A, Harper CN, and Ricks ML (2001a). Optimal adaptive designs for binary response trials. Biometrics, 57(3):909–913. [DOI] [PubMed] [Google Scholar]
  25. Rosenberger WF and Sverdlov O (2008). Handling covariates in the design of clinical trials. Statistical Science, 23(3):404–419. [Google Scholar]
  26. Rosenberger WF, Uschner D, and Wang Y (2019). Randomization: The forgotten component of the randomized clinical trial. Statistics in Medicine, 38(1):1–12. [DOI] [PubMed] [Google Scholar]
  27. Rosenberger WF, Vidyashankar A, and Agarwal DK (2001b). Covariate-adjusted response-adaptive designs for binary response. Journal of Biopharmaceutical Statistics, 11(4):227–236. [PubMed] [Google Scholar]
  28. Simon N (2015). Adaptive enrichment designs: applications and challenges. Clinical Investigation, 5(4):383–391. [Google Scholar]
  29. Simon N and Simon R (2013). Adaptive enrichment designs for clinical trials. Biostatistics, 14(4):613–625. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Simon R and Maitournam A (2004). Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research, 10(20):6759–6763. [DOI] [PubMed] [Google Scholar]
  31. Spencer AV, Harbron C, Mander A, Wason J, and Peers I (2016). An adaptive design for updating the threshold value of a continuous biomarker. Statistics in Medicine, 35(27):4909–4923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Still A and White A (1981). The approximate randomization test as an alternative to the f test in analysis of variance. British Journal of Mathematical and Statistical Psychology, 34(2):243–252. [Google Scholar]
  33. Wang S-J, O’Neill RT, and Hung H (2007). Approaches to evaluation of treatment effect in randomized clinical trials with genomic subset. Pharmaceutical Statistics, 6(3):227–244. [DOI] [PubMed] [Google Scholar]
  34. Yang B, Zhou Y, Zhang L, and Cui L (2015). Enrichment design with patient population augmentation. Contemporary Clinical Trials, 42:60–67. [DOI] [PubMed] [Google Scholar]
  35. Zhang L-X, Hu F, Cheung SH, and Chan WS (2007). Asymptotic properties of covariate-adjusted response-adaptive designs. The Annals of Statistics, 35(3):1166–1182. [Google Scholar]
  36. Zhao Y-Q and LeBlanc ML (2020). Designing precision medicine trials to yield a greater population impact. Biometrics, 76(2):643–653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Zhu H, Hu F, and Zhao H (2013). Adaptive clinical trial designs to detect interaction between treatment and a dichotomous biomarker. Canadian Journal of Statistics, 41(3):525–539. [Google Scholar]
