Abstract
Background
Stepped‐wedge cluster randomized trial (SW‐CRT) designs are often used when there is a desire to provide an intervention to all enrolled clusters, because of a belief that it will be effective. However, given there should be equipoise at trial commencement, there has been discussion around whether a pre‐trial decision to provide the intervention to all clusters is appropriate. In pharmaceutical drug development, a solution to a similar desire to provide more patients with an effective treatment is to use a response adaptive (RA) design.
Methods
We introduce a way in which RA design could be incorporated in an SW‐CRT, permitting modification of the intervention allocation during the trial. The proposed framework explicitly permits a balance to be sought between power and patient benefit considerations. A simulation study evaluates the methodology.
Results
In one scenario, for one particular RA design, the proportion of cluster‐periods spent in the intervention condition was observed to increase from 32.2% to 67.9% as the intervention effect was increased. A cost of this was a 6.2% power drop compared to a design that maximized power by fixing the proportion of time in the intervention condition at 45.0%, regardless of the intervention effect.
Conclusions
An RA approach may be most applicable to settings for which the intervention has substantial individual or societal benefit considerations, potentially in combination with notable safety concerns. In such a setting, the proposed methodology may routinely provide the desired adaptability of the roll‐out speed, with only a small cost to the study's power.
Keywords: adaptive design, clinical trial, interim analysis, multi‐stage, sequential allocation
1. INTRODUCTION
Stepped‐wedge cluster randomized trials (SW‐CRTs) roll an intervention out over several time periods, with all clusters typically ending the trial in the intervention condition. 1 SW‐CRTs have been favored for several reasons, including that sequential roll‐out may assist with logistical constraints. However, SW‐CRTs have not been without criticism. In particular, there has been much discussion of another reason commonly given for using an SW‐CRT: a strong belief that the intervention will do more good than harm, which implies its allocation to all clusters is advantageous. Kotz et al 2 argued this makes SW‐CRTs troubling, because a decision to provide the intervention to all clusters should not be made when its effectiveness remains unproven.
It has been pointed out, however, that the design has typically been used when the intervention has been “shown to be effective in more controlled…settings.” 3 This raises a further important issue, though: whether there can be equipoise in an SW‐CRT if there is a strong belief, perhaps emboldened by previous studies, that the intervention will be effective. Given that it has been argued “genuine uncertainty…about the preferred treatment” is a prerequisite for conducting a randomized trial, 4 this calls into question when an SW‐CRT could be conducted.
Prost et al 5 suggested a constructive solution to this question is to consider whether the evidence in favor of the intervention is sufficient to suggest equipoise is truly disturbed. While there may be a consensus that the intervention will be beneficial, there may still be true uncertainty about its effectiveness in a given context. Thus equipoise may still apply. Ultimately, it has been argued SW‐CRTs in which equipoise is disturbed should not be undertaken. 6 Given equipoise, though, we return to the scenario above where there may then be concern around a decision to provide the intervention to all clusters. This could be particularly true of closed‐cohort SW‐CRT designs, where all participants would then receive the intervention, or when the intervention is associated with substantial safety considerations.
In drug development, response adaptive (RA) design has been suggested as a way to address deviations from equipoise that could arise from data collected during a trial. To introduce RA design, consider a parallel two‐arm individually randomized trial. With RA design, the trial incorporates interim analyses at which the allocation ratio can be modified, the standard approach being to increase allocation to the best‐performing treatment. The number of patients expected to receive the best treatment is then increased. If the endpoint used to evaluate the treatments is related to patient benefit, then on average this provides an advantage to patients enrolled on the trial compared to fixed 1:1 randomization. Importantly, any decision to increase allocation to a particular treatment is made using concurrent study data, unlike in an SW‐CRT, where this decision is made pre‐trial. For an overview of RA trial design, see one of several recent monographs 7 , 8 , 9 or the recent review by Robertson et al. 10
It is interesting therefore to ask whether and/or how a conventional SW‐CRT could be modified to incorporate RA intervention allocation, enabling an intervention to be provided to more participants when it is effective, but its roll‐out slowed, or stopped, when ineffective. In this article, we describe a flexible framework for modifying an SW‐CRT allocation matrix at a series of interim analyses. To evaluate the framework, we present the results of an extensive simulation study. To conclude, we describe several practical issues associated with utilizing an RA SW‐CRT design and discuss when it may be useful.
2. METHODS
2.1. Design setting
We suppose an SW‐CRT will be used to compare an intervention to a control; our aim is to contrast an RA SW‐CRT with its conventional fixed‐sample analog. We suppose this fixed‐sample SW‐CRT has been designed, omitting discussion of how this can be achieved as it has been covered elsewhere. 11 , 12 , 13 , 14 Thus, we assume the number of clusters C, the number of time periods P, and the number of measurements per cluster‐period m have been specified. We consider designs where the m measurements from each cluster‐period are from the same (closed‐cohort design) or from different (cross‐sectional design) participants. We comment on application to open‐cohort designs in Section 4. We also suppose a treatment allocation matrix X has been nominated, with binary elements X_ij for i = 1, …, C and j = 1, …, P, where X_ij = 1 implies cluster i receives the intervention in time period j, and X_ij = 0 otherwise. We refer to this as the initially planned allocation matrix.
We denote the responses to be accrued up to time period p by Y_p. Specifically, denoting measurement k from cluster i in period j by y_ijk, Y_p stacks the y_ijk with j ≤ p.
We suppose that at the design stage a particular linear mixed model has been designated for data analysis, and thus it has been assumed Y_p ~ N{D_p(X)β, Σ_p(X)}, for known nonsingular covariance matrix Σ_p(X), design matrix D_p(X), and fixed effects β. As we see in our simulation study later, β would typically be expected to include an intercept term, factors to adjust for time effects, and an effect for the intervention relative to the control. Furthermore, we note that a large number of possible analysis models have been proposed for SW‐CRTs; see Li et al 15 for an overview of many of these. To emphasize, our designations above allow for any of these that work within a linear mixed model framework, including those assuming a decaying correlation structure. Finally, we note that we explicitly state the dependence of D_p(X) and Σ_p(X) upon X since X will later be treated as a variable.
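To make this designation concrete, the sketch below constructs a design matrix and covariance matrix at the cluster‐period‐mean level for the Hussey and Hughes model used later in the simulation study (an intercept, period effects, an intervention effect, a random cluster effect with variance `tau2`, and residual variance `sigma2`). The function name and parameterization are illustrative assumptions, not part of the formal framework.

```python
import numpy as np

def hh_matrices(X, m, tau2, sigma2):
    """Design matrix D and covariance Sigma for cluster-period means under
    a Hussey-Hughes-type model: mean = intercept + period effect + theta * X[i, j].
    X: (C, P) binary allocation matrix; m: measurements per cluster-period."""
    C, P = X.shape
    rows = []
    for i in range(C):
        for j in range(P):
            row = np.zeros(1 + (P - 1) + 1)
            row[0] = 1.0                      # intercept
            if j > 0:
                row[j] = 1.0                  # period-j effect (period 1 as reference)
            row[-1] = X[i, j]                 # intervention indicator
            rows.append(row)
    D = np.array(rows)
    # Within-cluster covariance of the P cluster-period means:
    V = np.full((P, P), tau2) + np.eye(P) * sigma2 / m
    Sigma = np.kron(np.eye(C), V)             # clusters are independent
    return D, Sigma
```

The same construction restricted to periods 1, …, p gives the interim quantities D_p(X) and Σ_p(X).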
We assume the goal is to make inference on the intervention's effect relative to the control, θ, the final element of β. We suppose this is estimated through generalized least squares and refer to the estimate as θ̂ for brevity. We assume that the one‐sided hypothesis H_0 : θ ≤ 0 will be tested, with a type‐I error‐rate of at most α desired when θ = 0. Later, we also compare designs in terms of their power when θ = δ, for specified δ > 0, with the target to achieve a specified power level.
Note that the generalized least squares estimate of β after time period p is β̂_p = {D_p(X)ᵀ Σ_p(X)⁻¹ D_p(X)}⁻¹ D_p(X)ᵀ Σ_p(X)⁻¹ Y_p. Extracting the last element, θ̂_p, the following Wald test statistic can be calculated: Z_p = θ̂_p / √Var(θ̂_p), where Var(θ̂_p) is the final diagonal element of {D_p(X)ᵀ Σ_p(X)⁻¹ D_p(X)}⁻¹.
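Given the design matrix, covariance matrix, and stacked responses, the generalized least squares estimate and Wald statistic can be computed directly; a minimal sketch, with `gls_wald` an assumed helper name:

```python
import numpy as np

def gls_wald(D, Sigma, y):
    """Generalized least squares fit of a linear mixed model with known
    covariance, and the Wald statistic for the final coefficient (here,
    the intervention effect)."""
    Si = np.linalg.inv(Sigma)
    info = D.T @ Si @ D                         # Fisher information for beta
    beta_hat = np.linalg.solve(info, D.T @ Si @ y)
    cov = np.linalg.inv(info)                   # covariance of beta_hat
    theta_hat = beta_hat[-1]
    Z = theta_hat / np.sqrt(cov[-1, -1])
    return theta_hat, Z
```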
A conventional SW‐CRT would proceed by enrolling C clusters, accruing m measurements per cluster in time periods 1, …, P, and allocating treatments according to X. Its final analysis could be conducted by assessing whether Z_P exceeds the critical value z_{1−α} of the standard normal distribution. Our aim, as discussed, is to describe methodology through which X may be altered mid‐trial. Note that it is only X we modify; to provide a fairer comparison to the corresponding conventional fixed‐sample SW‐CRT, we assume the initial values of m, C, and P are not altered at the interim analyses.
2.2. Response adaptive stepped‐wedge cluster randomized trials
First, a set of integers t_1 < ⋯ < t_L, with t_l ∈ {1, …, P − 1} for l = 1, …, L, is specified. Then, L interim analyses at which the allocation matrix may be altered are conducted, after time periods t_1, …, t_L. Accordingly, we denote by X_(l), l = 1, …, L, the matrix containing the allocations used in time periods 1, …, t_l and those planned for time periods t_l + 1, …, P. We set X_(0) = X, the initially planned allocation matrix.
Next, sets 𝒳_p are specified, giving the possible allocation matrices to be chosen from at the analysis following time period p, dependent on the value of Z_p. That is, X_(l) ∈ 𝒳_{t_l}. Arbitrary restrictions can be placed on the 𝒳_p as are desired. In all instances, though, 𝒳_p must consist of binary matrices whose elements in columns 1, …, p match those from X_(l−1) (as past allocations cannot be changed), and whose elements are such that if X_ij = 1 then X_{i(j+1)} = 1 (as clusters cannot switch back to the control). Thus, formally, we must always have that 𝒳_p is a subset of {X′ ∈ {0, 1}^{C×P} : X′_ij = (X_(l−1))_ij for j ≤ p; X′_ij ≤ X′_{i(j+1)} for j < P}.
Note that X_(l−1) ∈ 𝒳_{t_l}, so it is always possible to ensure X_(l) = X_(l−1), that is, to leave the planned roll‐out unchanged.
To illustrate the possible specification of more clearly, consider an example with and an interim analysis conducted after time period 2. Suppose that
Placing no restrictions on beyond those which are always required (ie, )
If we wished to ensure that all clusters receive the intervention by the trial's completion, we would modify the above to
Note that we order the sequences in the allocation matrices such that a nonincreasing proportion of time is spent in the intervention condition. This removes any degeneracy in the choice of possible allocation matrices.
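A sketch of how an admissible set could be enumerated under only the two universal restrictions (past allocations fixed; no switching back to control) follows. The function name is an assumption, and no deduplication of roll‐outs that are equivalent up to row ordering is attempted.

```python
import itertools
import numpy as np

def admissible_matrices(X_curr, p):
    """All binary allocation matrices whose first p columns match X_curr and
    in which no cluster switches back from intervention to control."""
    C, P = X_curr.shape
    tails = []
    for i in range(C):
        if X_curr[i, p - 1] == 1:
            # Already on the intervention: the remaining periods are forced.
            tails.append([np.ones(P - p, dtype=int)])
        else:
            # Still on control: may switch after 0, 1, ..., or P - p further
            # control periods (the last option meaning "never switch").
            opts = [np.array([0] * s + [1] * (P - p - s), dtype=int)
                    for s in range(P - p + 1)]
            tails.append(opts)
    out = []
    for combo in itertools.product(*tails):
        Xn = X_curr.copy()
        for i, tail in enumerate(combo):
            Xn[i, p:] = tail
        out.append(Xn)
    return out
```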
The remaining component required is a function S such that S(X′ | Z_{t_l}) provides a score associated with a choice of X′ ∈ 𝒳_{t_l}. Our approach is then to set X_(l) to the X′ ∈ 𝒳_{t_l} that maximizes S(X′ | Z_{t_l}).
In practice, S could be defined in any way that reasonably evaluates the suitability of X′. Our approach is to specify S to permit a balance to be sought between desires to (i) maximize allocation to the most effective arm and (ii) maximize power. We set S(X′ | Z_{t_l}) = w g*(X′ | Z_{t_l}) + (1 − w) I*(X′), for chosen w ∈ [0, 1], where g* and I* denote a patient‐benefit score g and the information level for θ, each rescaled across the candidate set.
Here, g assesses the performance of X′ in terms of whether it allocates clusters to the most effective arm (ie, it monitors patient benefit considerations). The term involving the information levels evaluates X′ in terms of the power it likely provides. Thus, w is an explicit weight balancing (i) and (ii) above. Note the two factors are rescaled because they exist on different scales. Furthermore, the rescaling should usually be avoided as a means of breaking ties between designs with identical values for g or the information level.
The above formulation has been used previously in RA design, for example, for sequence specification in individually randomized crossover trials. 16 Nonetheless, specifying is complex for SW‐CRTs because allocation is to be adapted for clusters already in the trial. In practice, there may be good reason to make a complex function that, for example, incorporates penalties for the speed or cost of the intervention roll‐out if its availability is limited. Here, we consider a function of arguably more general utility, using only current evidence of effectiveness to guide allocation.
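The weighted‐score selection can be sketched as follows, assuming each candidate matrix has already been assigned a patient‐benefit score and an information level; the min-max rescaling to [0, 1] is one plausible choice of rescaling, not necessarily the paper's exact specification.

```python
import numpy as np

def choose_allocation(candidates, g_scores, info_levels, w):
    """Pick the candidate allocation matrix maximizing the weighted score
    S = w * g_rescaled + (1 - w) * info_rescaled, with each component
    rescaled to [0, 1] across the candidate set so the two criteria are
    comparable."""
    g = np.asarray(g_scores, dtype=float)
    info = np.asarray(info_levels, dtype=float)

    def rescale(v):
        rng = v.max() - v.min()
        return np.zeros_like(v) if rng == 0 else (v - v.min()) / rng

    S = w * rescale(g) + (1 - w) * rescale(info)
    return candidates[int(np.argmax(S))]
```

Setting w = 1 recovers a purely patient‐benefit‐driven choice, and w = 0 a purely power‐driven one.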
It is logical to insist that as Z_{t_l} increases, g should score designs switching a larger number of clusters to the intervention more highly. It is thus desirable to ensure that as Z_{t_l} → ∞ the allocation matrix that switches all clusters to the intervention immediately is recommended. Similarly, as Z_{t_l} → −∞, the matrix that switches no additional clusters to the intervention should be recommended. Many functions will have these properties. In the Supplementary Material, we describe a form for g that could be useful if the desire is to only alter the design for extreme intervention effects. To more clearly describe the benefits of RA SW‐CRTs, we focus here on a probabilistic form for g that can recommend a broader range of designs, taking
To understand this formula, note that the number of clusters in the control condition after time period p, multiplied by P − p, gives the number of cluster‐periods for which the roll‐out could be modified. Similarly, the candidate matrix X′ spends some number of these modifiable cluster‐periods in the intervention condition. The form for the success probability (q, say) is chosen to provide the sought‐after qualities of the function g and to provide flexibility such that a search can be conducted for an RA design that has desirable operating characteristics. First, the standard normal distribution function Φ is used to map the continuous Wald test statistic to (0, 1), enabling its value to serve as a probability that controls the speed of the roll‐out conditional on the interim effectiveness. In addition, a parameter (a, say) can be chosen to influence the value of q; larger values of a result in smaller values of q, favoring designs slowing the roll‐out of the intervention. Similarly, a second parameter (b, say) influences how extreme the values of q are, with larger b shifting the success probability toward 0.5, which should translate to a more balanced intervention roll‐out. Finally, the denominator includes a factor to scale the success probabilities, allowing them to be more extreme for larger p (ie, when more information is available to base the decision upon). This form for g is also discussed further in the Supplementary Material.
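As an illustration, the sketch below implements a binomial‐type score of the kind described, with the success probability built from Φ. The precise formula for q, and the parameter names a and b, are assumptions constructed to have the qualitative properties listed above, not the paper's exact specification.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g_score(X_new, X_curr, p, Z, a, b, P):
    """Binomial-type patient-benefit score for a candidate allocation matrix
    X_new, given the current matrix X_curr after p of P periods and interim
    Wald statistic Z. The q formula is an illustrative assumption: a shifts
    q down, b tempers q toward 0.5, and later p gives more extreme q."""
    C = len(X_curr)
    control = [i for i in range(C) if X_curr[i][p - 1] == 0]
    n = len(control) * (P - p)                      # modifiable cluster-periods
    k = sum(X_new[i][j] for i in control for j in range(p, P))
    q = 0.5 + (phi(Z - a) - 0.5) / (1.0 + b * (P - p) / p)
    return math.comb(n, k) * q**k * (1.0 - q) ** (n - k)
```

For fixed q this is a binomial probability mass function in k, so the scores of candidates with distinct numbers of modified cluster‐periods behave like probabilities of a roll‐out speed.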
The above fully describes the proposed framework for incorporating RA intervention allocation in an SW‐CRT, with the final analysis conducted here analogously to a conventional SW‐CRT by assessing whether Z_P > z_{1−α}. We comment in the discussion on potential alternatives to this rejection rule that may be useful in practice. An algorithm describing the conduct of an RA SW‐CRT is provided in the Supplementary Material.
2.3. Simulation study
We assess the performance of the proposed framework through an extensive simulation study that considers three trial design scenarios (TDSs). Each TDS assumes the following model for data generation and analysis 14 , 17 , 18 : y_ijk = β_j + θ X_ij + c_i + d_ij + s_ik + e_ijk.
Here, y_ijk is the response from individual k, in cluster i, in period j, β_j is a fixed effect for time period j, c_i is a random cluster effect, d_ij is a random cluster‐period effect, s_ik is a random individual effect, and e_ijk is the residual error. Thus, in this case, the fixed effects comprise the period effects and θ. Primary results for TDS1 are presented here, where TDS2 is also used to provide a simple illustration of the method's use. Additional findings for TDS1 are given in the Supplementary Material, where the results for TDS2 and TDS3 are also presented.
TDS1 is a cross‐sectional SW‐CRT () that has been considered previously. 19 , 20 , 21 It is based on the average characteristics of SW‐CRTs according to Grayling et al, 22 setting and . In X, three clusters switch to the intervention in each of time periods 2 to 5, and two clusters switch in each of time periods 6 to 9. To give a larger value for the intra‐cluster correlation than TDS2, it has and . Additionally, , , and . Using the sample size calculation method from Hussey and Hughes 11 (ie, ), is chosen. For the RA designs, we consider conducting a single interim analysis after time period , , or , and conducting two interim analyses after time periods .
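For a given allocation matrix, the power of the one‐sided Wald test under a Hussey and Hughes type analysis can be approximated analytically, which is how m can be fixed at the design stage. This sketch works with cluster‐period means and assumes known variance components; the function name and signature are illustrative.

```python
import numpy as np
from math import sqrt
from statistics import NormalDist

def sw_power(X, m, tau2, sigma2, delta, alpha=0.05):
    """Approximate power of a one-sided Wald test for the intervention effect
    in a cross-sectional SW-CRT analysed with a Hussey-Hughes-type model."""
    C, P = X.shape
    D = np.zeros((C * P, 1 + (P - 1) + 1))
    for i in range(C):
        for j in range(P):
            r = i * P + j
            D[r, 0] = 1.0                     # intercept
            if j > 0:
                D[r, j] = 1.0                 # period effect
            D[r, -1] = X[i, j]                # intervention indicator
    V = np.full((P, P), tau2) + np.eye(P) * sigma2 / m
    Sigma = np.kron(np.eye(C), V)
    info = D.T @ np.linalg.inv(Sigma) @ D
    se = sqrt(np.linalg.inv(info)[-1, -1])    # SE of the effect estimate
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(delta / se - z_alpha)
```

Raising m until the returned power reaches the target mirrors a standard fixed‐sample SW‐CRT sample size calculation.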
TDS2 is a cross‐sectional SW‐CRT () based upon the trial presented in Bashour et al; 23 a study assessing the effect of training doctors in communication skills on women's satisfaction with the doctor‐woman relationship during labor and delivery. In this case, and , with X switching one cluster to the intervention in each of time periods 2 to 5. The final analysis estimated that and . We use these values in all simulations. Following the approach of Hussey and Hughes 11 (), for these variance components the trial would have required 70 patients per cluster‐period for its desired type‐I error‐rate of 5% and its desired type‐II error‐rate of 10% when . Thus, we fix , , , and . We consider conducting interim analyses after time periods and .
TDS3 is a closed‐cohort SW‐CRT scenario, based on the “Girls on the Go!” program to improve self‐esteem in young women in Australia, 24 following the calculation in Hooper et al. 14 Thus, we consider a case where and , with X switching four clusters to the intervention in time periods 2 to 4. Measurements from individuals are assumed to be collected in each cluster and the primary outcome measure (Rosenberg Self‐esteem Scale) is assumed to have , , , and . The conventional design achieves for with . We consider conducting interim analyses after time periods and .
In all three TDSs, we consider performance when , , , and . These values were chosen by factoring in what was computationally feasible and through an initial grid search to identify a range for the parameters beyond which the operating characteristics did not appear to vary substantially. In all cases, we place no restrictions on the beyond those required (ie, we always set ).
For each combination of design parameters, 100 000 replicate simulations are used to estimate several key quantities. These are
The empirical rejection probability (ERP) for , with the values for and referred to as the empirical type‐I error‐rate and power.
The empirical average, standard deviation, and probability mass function of the proportion of cluster‐periods spent in the intervention condition. For brevity, we refer to the average and standard deviation of this quantity as the EACP and ESDCP, respectively. The EACP and ESDCP together evaluate patient benefit: for example, larger (smaller) values of the EACP are desired for larger (smaller) treatment effects, while we would likely always prefer a small ESDCP. Note that when evaluating these quantities, one must account for the fact that the choice of the adaptation sets imparts particular minimal and maximal values for the time that can be spent in the intervention condition; these will be indicated on all relevant plots.
The empirical average value of , denoted .
The empirical bias (EB) and root‐mean‐square error (ERMSE) of the final point estimate of the intervention effect, θ̂. Previous work for individually randomized trials has explored the negative impacts of RA design on point estimation when it is performed in a manner that does not take into account the interim analyses. 25
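The EACP and ESDCP reduce to simple summaries of the final allocation matrices across simulation replicates; a minimal sketch, with hypothetical function names:

```python
from statistics import mean, stdev

def intervention_proportion(X):
    """Proportion of cluster-periods an allocation matrix spends in the
    intervention condition."""
    return sum(map(sum, X)) / (len(X) * len(X[0]))

def eacp_esdcp(final_matrices):
    """Empirical average (EACP) and standard deviation (ESDCP) of the
    intervention proportion over replicate final allocation matrices."""
    props = [intervention_proportion(X) for X in final_matrices]
    return mean(props), stdev(props)
```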
Code to reproduce our results is available from https://github.com/mjg211/article_code.
3. RESULTS
3.1. Illustrative description: Trial design scenario 2
To make the proposed methodology more tangible, we illustrate its application to TDS2, where the low number of clusters () and time periods () makes the possible allocation matrices limited. As discussed, Bashour et al 23 utilized the following allocation matrix
Suppose that there was concern around use of this allocation matrix, such that RA design was to be utilized. In practice, this could happen for one of numerous reasons. Principally, it may often be because investigators wish to provide a larger number of participants with the intervention if it is effective (often especially true for disease settings in which the condition under investigation can be particularly harmful), or because downsides (eg, cost or harm/safety concerns) mean that they would want to limit roll‐out if the intervention were ineffective. As discussed, the first step is then to specify the time periods after which interim analyses will be conducted. As a basic example, we suppose that this is after period 3, such that a single interim analysis is performed.
Thus, the RA trial would proceed by conducting periods 1 to 3 and then computing using the interim data. Placing no constraints on beyond those required, we would have
For the assumed Hussey and Hughes model and variance parameters (, , , ), it can be shown that
To determine the choice of the interim specified allocation matrix, , we then must also calculate the values of the . Suppose that and , and as an example assume . Using our definition of S, we have
That is, . Using our definition of , this gives
Finally, supposing that , we can use the above to show that
Thus, this is the matrix that maximizes S, and so we set the interim allocation matrix accordingly and conduct periods 4 to 5 of the trial using its roll‐out. At the end of the study, we have that the proportion of cluster‐periods spent in the intervention condition is 55%, while the value of Z_5 determines whether H_0 is rejected.
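For this TDS2 walk‐through, the candidate allocation matrices available at the interim analysis after period 3 can be enumerated directly. The initial matrix below is taken from the switching pattern described above (one cluster switching in each of periods 2 to 5); the maximal candidate reproduces the 55% figure from the walk‐through.

```python
import itertools

# Allocation after period 3 (columns are periods 1..5; rows are clusters 1..4):
X_curr = [
    [0, 1, 1, 1, 1],   # switched in period 2
    [0, 0, 1, 1, 1],   # switched in period 3
    [0, 0, 0, 0, 0],   # still control; planned switch in period 4
    [0, 0, 0, 0, 0],   # still control; planned switch in period 5
]
p, P = 3, 5
# Each control cluster may switch in period 4, in period 5, or never:
tails = [(1, 1), (0, 1), (0, 0)]

candidates = []
for t2, t3 in itertools.product(tails, tails):
    Xn = [row[:] for row in X_curr]
    Xn[2][p:], Xn[3][p:] = list(t2), list(t3)
    candidates.append(Xn)

props = sorted(sum(map(sum, X)) / 20 for X in candidates)
print(len(candidates), props[0], props[-1])   # 9 candidates; 35% up to 55%
```

Switching both remaining clusters immediately gives 11 of 20 cluster‐periods on the intervention, the 55% realized in the example.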
This is, of course, a description of one possible realization of carrying out an RA trial. Our key concerns revolve around the expected performance of this approach, in terms of our metrics: the ERP, EACP, ESDCP, EB, and ERMSE. We present these evaluations in the Supplementary Materials, where we also consider conducting interim analyses after time periods .
3.2. Trial design scenario 1
Switching to TDS1, we commence our investigation of the expected performance of RA procedures. Note that additional results for TDS1 are given in the Supplementary Materials.
3.2.1. Operating characteristics for and
Figure 1 displays the ERP, EACP, ESDCP, EB, and ERMSE of several RA SW‐CRT designs as a function of w and when . As an example, results for and are displayed. Increasing the value of w results in increased power as would be expected, though the difference between the power curves for is small. For the priority given to maximizing power results in an empirical power of 83.0%; above the desired level. The EB is observed to be small, relative to the value of , regardless of the value of w. However, only for is the final point estimate unbiased. A slightly larger impact on the ERMSE is seen for compared to the impact on the EB, though arguably performance is surprisingly strong considering results in the design that minimizes the ERMSE.
FIGURE 1.

The empirical rejection probability (ERP) and empirical average proportion of cluster‐periods spent in the intervention condition (EACP), as functions of w and , of the response adaptive (RA) stepped‐wedge cluster randomized trial (SW‐CRT) designs with , , and , in trial design scenario 1 (TDS1). The dashed lines in the ERP plot indicate the desired type‐I and type‐II error‐rates. In the EACP plot they indicate the minimal, initially planned, and maximal values of the EACP based on
For the EACP is almost identical and increases monotonically in . For the EACP is constant, indicative of the same design being chosen to maximize power no matter the value of . For , the EACP initially increases in , but the competing factors in eventually result in decreases for larger . The ESDCP is maximized for each w when . The precise values of the ESDCP are arguably small when considered in unison with the EACP. For example, for , the ESDCP for together with the corresponding EACP indicates that in the majority of cases we would expect the roll‐out to be sped up, as would be desired.
For , the EACP ranges from 32.2% when to 67.9% when . Under the null and alternative hypotheses the corresponding figures are 48.0% and 61.8%, respectively. This contrasts to 54.4% for the fixed (initially‐planned) design and 45.0% for . Figure 2 displays this pictorially, giving the average value of when . Similarly, Figure 3 presents the probability mass function of the proportion of time spent in the intervention condition. The probability of making an “incorrect” decision (eg, decreasing the roll‐out speed for a large true intervention effect) is evidently small when the absolute value of is large. A potential downside of RA design is observed for, for example, , where the precise variation in the final proportion of participants who received the intervention is evident, when in this case we may prefer some (fixed) value close to 50%. The empirical type‐I error‐rate and power are 5.6% and 76.8%, respectively, in this case.
FIGURE 2.

The empirical average final allocation matrix (), as a function of , of the response adaptive (RA) stepped‐wedge cluster randomized trial (SW‐CRT) design with , , , and , in trial design scenario 1 (TDS1). The dashed lines indicate the timing of the interim analyses
FIGURE 3.

The empirical probability mass function of the proportion of cluster‐periods spent in the intervention condition, as a function of , of the response adaptive (RA) stepped‐wedge cluster randomized trial (SW‐CRT) design with , , , and , in trial design scenario 1 (TDS1)
3.2.2. Operating characteristics as a function of and
Figures 4, 5, 6, respectively, present the ERP, EACP, and ESDCP of the RA SW‐CRT designs, as functions of w and , for different combinations of and . Corresponding presentations for the EB and ERMSE are given in the Supplementary Materials. For several combinations of and the power curves are similar across for multiple values of w, attaining approximately the desired type‐I error‐rate and power. Larger differences are observed in some instances, however, typically for more extreme values of and . For fixed , increasing generally results in an increase in power. This should be anticipated as larger promotes a more steady roll‐out, which will often correspond to allocation matrices with power closer to the desired level. Similarly, for fixed , increasing initially results in power gains, but in many cases eventually leads to power loss as the procedure recommends those designs that terminate the roll‐out.
FIGURE 4.

The empirical rejection probability (ERP), as a function of w and , of the response adaptive (RA) stepped‐wedge cluster randomized trial (SW‐CRT) designs with for different combinations of and , in trial design scenario 1 (TDS1). The dashed lines indicate the desired type‐I and type‐II error‐rates
FIGURE 5.

The empirical average proportion of cluster‐periods spent in the intervention condition (EACP), as a function of w and , of the response adaptive (RA) stepped‐wedge cluster randomized trial (SW‐CRT) designs with for different combinations of and , in trial design scenario 1 (TDS1). The dashed lines indicate the minimal, initially planned, and maximal values of the EACP based on
FIGURE 6.

The empirical standard deviation of the proportion of cluster‐periods spent in the intervention condition (ESDCP), as a function of w and , of the response adaptive (RA) stepped‐wedge cluster randomized trial (SW‐CRT) designs with for different combinations of and , in trial design scenario 1 (TDS1)
These comments match the plots in Figure 5, with for example those designs with having very low values for the EACP. Furthermore, it can be seen that for , for example, increasing generally results in a flattening of the EACP curve as a function of , as the more extreme roll‐outs attain lower values for .
Qualitatively different findings are observed in Figure 6, however. For , the ESDCP is similar for all and varies little as a function of or . This is a consequence of large placing a high preference on approximately 50% of cluster‐periods being spent in the intervention condition. For , the ESDCP again varies little across values of , but now varies substantially as a function of and . The maximal values of the ESDCP for can often be considered low when viewed in combination with the corresponding EACP. This is not always the case for , though, where for certain w (eg, ) the ESDCP indicates variation in the roll‐out speed such that performance may often be considered poor (eg, an increase in roll‐out from that initially planned when ).
3.2.3. Operating characteristics as a function of
Figure 7 presents the ERP, EACP, ESDCP, EB, and ERMSE of the RA SW‐CRT designs as functions of for and . It can be seen that there is little evidence that changing the timing of a single interim analysis from to carries a cost to the ERP. However, delaying the timing of a single interim analysis inhibits the ability of the RA designs to offer a wider range of values for the EACP. In addition, the EACP curves are similar for an increasing number of values of w the later the timing of the interim analysis; this is a consequence of both the decrease in the number of possible allocation matrices that can be chosen from and the increasing precision of the interim test statistic. Another consequence of this is that the ESDCP is smaller when the first interim analysis is conducted later in the trial, though the difference is only pronounced when is contrasted with .
FIGURE 7.

The empirical rejection probability (ERP), average proportion of cluster‐periods spent in the intervention condition (EACP), standard deviation of the proportion of cluster‐periods spent in the intervention condition (ESDCP), bias (EB), and root‐mean‐square error (ERMSE), as functions of w and , of the response adaptive (RA) stepped‐wedge cluster randomized trial (SW‐CRT) designs with and , for different values of , in trial design scenario 1 (TDS1). The dashed lines in the ERP plot indicate the desired type‐I and type‐II error‐rates. In the EACP plot they indicate the minimal, initially planned, and maximal values of the EACP based on
There is a larger cost to the EB for certain w when an interim analysis is conducted earlier in the trial (ie, for {3} and {3,6}). However, the actual cost remains small relative to the value of . Similar statements are true for the ERMSE.
Compared to the designs with , those with incur a small cost to their empirical power. However, this is counterbalanced by them achieving a wider range of values for the EACP when .
4. DISCUSSION
Concerns have been expressed over the pre‐trial decision in SW‐CRTs to provide the intervention to all clusters. It may therefore be advantageous to allow the intervention roll‐out to be sped up or slowed down according to information accrued during the trial. Accordingly, we have presented methodology through which this can be achieved. Our framework is flexible, allowing the design to be constructed to balance considerations of power and ethical allocation. Furthermore, while we focused on data analysis via a linear mixed model, the framework depends only on the availability of an interim estimate of effectiveness. It could therefore be readily modified, for example, for a generalized estimating equation analysis of noncontinuous data (see, eg, Li et al 26 or Ford and Westgate 27 for relevant methodology in the nonadaptive setting).
To examine the performance of the framework, we conducted a large simulation study. From this, several important observations can be made. Principally, it should not be assumed that any choice of values for the design parameters will provide desirable operating characteristics. However, in all three TDSs it was possible to find combinations that provided monotonically increasing values for the EACP without major inflation of the type‐I or type‐II error‐rate (eg, in TDS1 , , and provided such performance). Our recommendation, therefore, is that these parameters be chosen carefully in practice, via a comprehensive simulation study. Nonetheless, it was clear that some small impact on the error‐rates may be unavoidable if one is to attain a design with large variation in the EACP as a function of the intervention effect. The small power loss may be resolved in practice through a small increase to the sample size computed for the corresponding fixed‐sample design.
Addressing the observed type‐I error‐rate inflation poses an interesting question as to whether methodology developed to help attain a desired test size in small fixed‐sample CRTs could find additional utility in adaptive design scenarios. Such methodology has been a topic of much recent interest. For example, Leyrat et al 28 considered the performance of numerous analysis methods (eg, weighted and unweighted cluster‐level analyses, mixed‐effects models with different degree‐of‐freedom corrections, GEEs with and without a small‐sample correction) for parallel‐group CRTs with a low number of clusters and a continuous outcome. Scott et al 29 and Ford and Westgate 27 examined possible correction methods for GEE analyses of SW‐CRTs. Thompson et al 30 recently provided an extensive comparison of such small‐sample correction methods and degrees‐of‐freedom corrections for GEE analyses of binary data in SW‐CRTs, with Ren et al 31 previously conducting similar work in a continuous outcome setting. While the type‐I error‐rate inflation observed in our RA SW‐CRTs was often small, if addressing such inflation was a priority then it is likely such methodology would offer a potential, albeit heuristic, solution. We note though that simulation would be required to ascertain which approach may be most appropriate, as there is no guarantee results in a fixed‐sample setting would be directly transferable to RA design.
The advantageous performance of the RA designs is particularly noteworthy since only designs with a small number of interim analyses were evaluated: one might have anticipated that more interim analyses would be required to realize the benefits of RA randomization. A small number of interim analyses may be important in practice to reduce their logistical burden. It is also more computationally feasible to evaluate performance in this setting, and fewer analyses may be expected to be associated with smaller inflation of the type‐I error‐rate, as the data are assessed less frequently.
The findings should perhaps not be surprising: the large number of alternatives to the initially planned allocation matrix that have similar power means there are often other choices available that can at least slightly alter the intervention's allocation without compromising power. Furthermore, the timing of the first interim analysis provides a natural and effective means of protecting a degree of data accrual in the intervention and control conditions; this is similar to the typical use of a burn‐in period for RA designs in individually randomized trials. The timing of the first interim analysis is also crucial to enabling a wider range of EACP values: the one‐directional switching of SW‐CRTs means that RA design can offer far less later in a trial, as the number of possible allocation schemes decreases. However, we note that even when only small changes in the EACP are achieved, this can have a substantial impact on the number of patients who receive the intervention, depending on the total trial sample size. Finally, there was substantial degeneracy in the operating characteristics for different values of w, particularly in those designs in which the first interim analysis was timed later in the trial. In practice, only a small number of values for w may need to be considered, and in many instances a single choice of w worked well.
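The role of w in trading off power against patient benefit, and the shrinking choice set implied by one‐directional switching, can be sketched with a toy criterion. Here `allocation_score` and its crude power surrogate are hypothetical stand‐ins for the paper's actual weighted objective: information is approximated by the balance of the allocation, and benefit by the treated proportion scaled by the current effect estimate.

```python
import itertools
import numpy as np

def allocation_score(matrix, w, effect_estimate):
    # Hypothetical weighted criterion: w weights a crude power
    # surrogate (p * (1 - p), maximized by a balanced allocation in an
    # independent-observations approximation) against patient benefit
    # (proportion of cluster-periods treated, scaled by the current
    # intervention-effect estimate, floored at zero).
    p = matrix.mean()
    info = p * (1 - p)
    benefit = p * max(effect_estimate, 0.0)
    return w * info + (1 - w) * benefit

def best_allocation(n_clusters, n_periods, w, effect_estimate):
    # Enumerate monotone roll-outs: each cluster switches to the
    # intervention at some period and remains in it thereafter,
    # reflecting the one-directional switching of an SW-CRT.
    best, best_score = None, -np.inf
    for switches in itertools.product(range(n_periods + 1),
                                      repeat=n_clusters):
        mat = np.array([[int(t >= s) for t in range(n_periods)]
                        for s in switches])
        score = allocation_score(mat, w, effect_estimate)
        if score > best_score:
            best, best_score = mat, score
    return best

# No evidence of benefit: the balanced, power-maximizing roll-out wins
print(best_allocation(4, 3, w=0.5, effect_estimate=0.0).mean())
# Strong estimated benefit: the criterion pushes toward full roll-out
print(best_allocation(4, 3, w=0.5, effect_estimate=2.0).mean())
```

Even this toy version shows the key behavior: as the interim effect estimate grows, the selected allocation shifts more cluster‐periods into the intervention condition, at the cost of a less balanced (lower‐information) design.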
It is important to acknowledge some limitations of our work. First, while our investigations reveal limited impact of utilizing an RA design on the bias in the final point estimate, we have not addressed potentially important asymptotic properties of the estimator (eg, consistency), nor provided a way to remove any bias; we leave extending bias‐removal methodology for individually randomized RA trials 25 to the SW‐CRT setting for future work. Nor have we examined the potential implications of model mis‐specification for the utility of the proposed RA procedure. Recent work has, as discussed, highlighted a range of possible analysis methods that make, for example, differing assumptions about the correlation between outcome measurements. 15 It is possible that model mis‐specification may impact RA design more starkly than it does a fixed‐sample SW‐CRT. While there is potential in an adaptive setting to adaptively update the chosen analysis model, which could help overcome such a problem, we have not addressed this here, and no work to date indicates whether this would be a fruitful approach. Each of these considerations may, in particular, impact the applicability of the proposed methodology in a regulated trial setting.
In addition, while we have provided examples of cross‐sectional and closed‐cohort designs, we have not directly addressed RA design of an open‐cohort SW‐CRT. Our methods could be applied to an open‐cohort SW‐CRT under the assumption of some particular sampling scheme. 32 However, the degree to which the assumed sampling scheme is "correct" would then likely influence the usefulness of RA design. Consequently, an approach to RA design for an open‐cohort trial should arguably also attempt to re‐estimate the "true" sampling scheme at the interim analyses, for which we have not presented methodology here. Regardless of the approach used, thorough investigation of the utility of RA design for open‐cohort designs would require simulations under a variety of open‐cohort sampling schemes, with exploration of the impact of these being correctly or incorrectly specified.
The practical considerations in relation to utilizing an RA SW‐CRT design should also be recognized. Many of these are similar to those described in Grayling et al 22 within the context of early termination in SW‐CRTs. In particular, while the time period structure of SW‐CRTs may appear to lend itself naturally to sequential methodology, the interim analyses would be highly dependent upon the efficient collection, storage, and processing of data. Arguably the largest issue for RA intervention allocation, though, is whether logistical or practical constraints may inhibit the ability to modify the roll‐out. While a roll‐out could likely often be slowed down, it may be challenging to speed it up. Furthermore, allowing slow‐down could be argued to disincentivize cluster participation.
The above limitations aside, our results indicate that RA allocation of the intervention could provide notable advantages. It is therefore important to discuss when such a design may be useful. In practice, RA design could prove useful in a wide variety of settings, even where this is not immediately apparent; a number of SW‐CRTs have now incorporated interim evaluations of efficacy/futility, 33 , 34 , 35 , 36 , 37 , 38 , 39 and it is not always clear from published information why such adaptations were included. However, we note that RA could be particularly helpful when either the intervention itself or its evaluation is highly expensive, such that investigators would not wish to complete the roll‐out unless the intervention was effective. Most likely, in our opinion, it may be helpful when there are substantial patient benefit considerations associated with the intervention, potentially in combination with notable safety concerns. This could be true, for example, of vaccine development during an epidemic.
Following the Ebola outbreak of 2014 to 2015, many authors discussed the applicability of SW‐CRTs to evaluating vaccine effectiveness. 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 Importantly, this was a setting in which a short time was expected between intervention delivery and outcome accrual, 52 which is important for RA design. Furthermore, there was little data available on the safety or immunogenicity of the vaccine candidates. 44 Consequently, proposals to use SW‐CRT designs were not based on preliminary data suggesting a vaccine would do more good than harm, and the safety considerations arguably amplified the need to prevent roll‐out of an ineffective vaccine. Indeed, van der Tweel and van der Graaf 56 noted their concern that many clusters could end up being exposed to an inferior treatment, while Doussau and Grady 44 went as far as to state that interim analyses may be needed. It also seems reasonable to assume that, in such a setting, resources would be made available to carry out interim adaptations efficiently, owing to the degree of the public health emergency.
The main limitation to utilizing an RA SW‐CRT design of the type considered here would be the aforementioned resource availability to speed up a vaccine's roll‐out: it would be important to ensure that, at the epidemic's onset, manufacturing processes were put in place to scale up the development of any vaccine for which preliminary evidence of effectiveness was obtained. The other principal limitation, discussed extensively by Bellan et al, 40 is that SW‐CRTs are not well equipped to handle spatiotemporal variation in a virus outbreak; much power can often be gained from prioritizing where to administer a vaccine. This issue cannot be handled by the type of RA SW‐CRT proposed here. However, it indicates that an adaptive incomplete‐block CRT may be worth considering in future work on the efficient evaluation of a vaccine. Such a design could add new clusters during the course of the study, constraining the randomization to prioritize the speed of delivery to specific hot‐spots. We note it may also be important to consider incorporating other types of adaptation into this type of design, including stopping rules 19 , 22 or sample size re‐estimation, 21 in order to identify the most suitable CRT design.
In conclusion, when it is feasible to modify an intervention's allocation in an SW‐CRT, RA design theory could help improve the trial's patient benefit characteristics. This may be particularly relevant to settings in which the intervention is expensive or could be associated with significant harm.
Supporting information
Data S1 Supplementary material
ACKNOWLEDGEMENTS
This work was supported by the Medical Research Council (grant number MC_UU_00002/6 to JMSW and grant number MC_UU_00002/15 to SSV).
Grayling MJ, Wason JMS, Villar SS. Response adaptive intervention allocation in stepped‐wedge cluster randomized trials. Statistics in Medicine. 2022;41(6):1081–1099. doi: 10.1002/sim.9317
DATA AVAILABILITY STATEMENT
Code to reproduce our results is available from https://github.com/mjg211/article_code.
REFERENCES
- 1. Hemming K, Taljaard M, McKenzie J, et al. Reporting of stepped wedge cluster randomised trials: extension of the CONSORT 2010 statement with explanation and elaboration. BMJ. 2018;363:k1614.
- 2. Kotz D, Spigt M, Arts I, Crutzen R, Viechtbauer W. Use of the stepped wedge design cannot be recommended: a critical appraisal and comparison with the classic cluster randomized controlled trial design. J Clin Epidemiol. 2012;65:1249‐1252.
- 3. Mdege N, Man MS, Taylor nee Brown C, Torgerson D. There are some circumstances where the stepped‐wedge cluster randomized trial is preferable to the alternative: no randomized trial at all. Response to the commentary by Kotz and colleagues. J Clin Epidemiol. 2012;65:1253‐1254.
- 4. Freedman B. Equipoise and the ethics of clinical research. N Engl J Med. 1987;317:141‐145.
- 5. Prost A, Binik A, Abubakar I, et al. Logistic, ethical, and political dimensions of stepped wedge trials: critical review and case studies. Trials. 2015;16:351.
- 6. de Hoop E, van der Tweel I, van der Graaf R, et al. The need to balance merits and limitations from different disciplines when considering the stepped wedge cluster randomized trial design. BMC Med Res Methodol. 2015;15:93.
- 7. Hu F, Rosenberger W. The Theory of Response‐Adaptive Randomization in Clinical Trials. Hoboken, NJ: John Wiley & Sons; 2006.
- 8. Atkinson A, Biswas A. Randomised Response‐Adaptive Designs in Clinical Trials. Boca Raton, FL: Chapman & Hall/CRC Press; 2014.
- 9. Antognini A, Giovagnoli A. Adaptive Designs for Sequential Treatment Allocation. Boca Raton, FL: Chapman & Hall/CRC Press; 2015.
- 10. Robertson D, Lee K, Lopez‐Kolkovska B, Villar S. Response‐adaptive randomization in clinical trials: from myths to practical considerations; 2020. arXiv:2005.00564.
- 11. Hussey M, Hughes J. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28:182‐191.
- 12. Woertman W, de Hoop E, Moerbeek M, Zuidema S, Gerritsen D, Teerenstra S. Stepped wedge designs could reduce the required sample size in cluster randomized trials. J Clin Epidemiol. 2013;66:752‐758.
- 13. Hemming K, Taljaard M. Sample size calculations for stepped wedge and cluster randomised trials: a unified approach. J Clin Epidemiol. 2016;69:137‐146.
- 14. Hooper R, Teerenstra S, de Hoop E, Eldridge S. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med. 2016;35:4718‐4728.
- 15. Li F, Hughes J, Hemming K, Taljaard M, Melnick E, Heagerty P. Mixed‐effects models for the design and analysis of stepped wedge cluster randomized trials: an overview. Stat Meth Med Res. 2021;30:612‐639.
- 16. Liang Y, Li Y, Wang J, Carriere K. Multiple‐objective response‐adaptive repeated measurement designs in clinical trials for binary responses. Stat Med. 2014;33:607‐617.
- 17. Girling A, Hemming K. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med. 2016;35:2149‐2166.
- 18. Li F, Turner E, Preisser J. Optimal allocation of clusters in cohort stepped wedge designs. Stat Probab Lett. 2018;137:257‐263.
- 19. Grayling M, Robertson D, Wason J, Mander A. Design optimisation and post‐trial analysis in group sequential stepped‐wedge cluster randomised trials; 2018. arXiv:1803.09691v1.
- 20. Grayling M, Wason J, Mander A. Stepped wedge cluster randomized controlled trial designs: a review of reporting quality and design features. Trials. 2017;18:33.
- 21. Grayling M, Mander A, Wason J. Blinded and unblinded sample size re‐estimation procedures for stepped‐wedge cluster randomized trials. Biom J. 2018;60:903‐916.
- 22. Grayling M, Wason J, Mander A. Group sequential designs for stepped‐wedge cluster randomised trials. Clin Trials. 2017;14:507‐517.
- 23. Bashour H, Kanaan M, Kharouf M, Abdulsalam A, Tabbaa M, Cheikha SA. The effect of training doctors in communication skills on women's satisfaction with doctor‐woman relationship during labour and delivery: a stepped wedge cluster randomised trial in Damascus. BMJ Open. 2013;3:e002674.
- 24. Tirlea L, Truby H, Haines T. Investigation of the effectiveness of the "Girls on the Go!" program for building self‐esteem in young women: trial protocol. Springerplus. 2013;2:683.
- 25. Bowden J, Trippa L. Unbiased estimation for response adaptive clinical trials. Stat Meth Med Res. 2017;26:2376‐2388.
- 26. Li F, Turner E, Preisser J. Sample size determination for GEE analyses of stepped wedge cluster randomized trials. Biometrics. 2018;74:1450‐1458.
- 27. Ford W, Westgate P. Maintaining the validity of inference in small‐sample stepped wedge cluster randomized trials with binary outcomes when using generalized estimating equations. Stat Med. 2020;39:2779‐2792.
- 28. Leyrat C, Morgan K, Leurent B, Kahan B. Cluster randomized trials with a small number of clusters: which analyses should be used? Int J Epidemiol. 2018;47:321‐331.
- 29. Scott J, deCamp A, Juraska M, Fay M, Gilbert P. Finite‐sample corrected generalized estimating equation of population average treatment effects in stepped wedge cluster randomized trials. Stat Meth Med Res. 2017;26:583‐597.
- 30. Thompson J, Hemming K, Forbes A, Fielding K, Hayes R. Comparison of small‐sample standard‐error corrections for generalised estimating equations in stepped wedge cluster randomised trials with a binary outcome: a simulation study. Stat Meth Med Res. 2021;30:425‐439.
- 31. Ren Y, Hughes J, Heagerty P. A simulation study of statistical approaches to data analysis in the stepped wedge design. Stat Biosci. 2019;12:399‐415.
- 32. Kasza J, Hooper R, Copas A, Forbes A. Sample size and power calculations for open cohort longitudinal cluster randomized trials. Stat Med. 2020;39:1871‐1883.
- 33. Curley M, Gedeit R, Dodson B, et al. Methods in the design and implementation of the randomized evaluation of sedation titration for respiratory failure (RESTORE) clinical trial. Trials. 2018;19:687.
- 34. Dias M, De Oliveira L, Jeyabalan A, et al. PREPARE: protocol for a stepped wedge trial to evaluate whether a risk stratification model can reduce preterm deliveries among women with suspected or confirmed preterm pre‐eclampsia. BMC Preg Child. 2019;19:343.
- 35. Hayes‐Ryan D, Hemming K, Breathnach F, et al. PARROT Ireland: placental growth factor in assessment of women with suspected pre‐eclampsia to reduce maternal morbidity: a stepped wedge cluster randomised control trial research study protocol. BMJ Open. 2019;9:e023562.
- 36. Huffman M, Mohanan P, Devarajan R, et al. Effect of a quality improvement intervention on clinical outcomes in patients in India with acute myocardial infarction: the ACS QUIK randomized clinical trial. JAMA. 2018;319:567‐578.
- 37. Lundström E, Isaksson E, Wester P, Laska AC, Näsman P. Enhancing recruitment using teleconference and commitment contract (ERUTECC): study protocol for a randomised, stepped‐wedge cluster trial within the EFFECTS trial. Trials. 2018;19:14.
- 38. Newman K, Rogers J, McCulloch D, et al. Point‐of‐care molecular testing and antiviral treatment of influenza in residents of homeless shelters in Seattle, WA: study protocol for a stepped‐wedge cluster‐randomized controlled trial. Trials. 2020;21:956.
- 39. Reeder R, Girling A, Wolfe H, et al. Improving outcomes after pediatric cardiac arrest – the ICU‐resuscitation project: study protocol for a randomized controlled trial. Trials. 2018;19:213.
- 40. Bellan S, Pulliam J, Pearson C, et al. The statistical power and validity of Ebola vaccine trials in Sierra Leone: a simulation study of trial design and analysis. Lancet Infect Dis. 2015;15:703‐710.
- 41. Chowell G, Viboud C. Ebola vaccine trials: a race against the clock. Lancet Infect Dis. 2015;15:624‐626.
- 42. Dean N, Gsell P, Brookmeyer R, et al. Considerations for the design of vaccine efficacy trials during public health emergencies. Sci Transl Med. 2019;11:eaat0360.
- 43. Diakite I, Mooring E, Velasquez G, Murray M. Novel ordered stepped‐wedge cluster trial designs for detecting Ebola vaccine efficacy using a spatially structured mathematical model. PLoS Negl Trop Dis. 2016;10:e0004866.
- 44. Doussau A, Grady C. Deciphering assumptions about stepped wedge designs: the case of Ebola vaccine research. J Med Ethics. 2016;42:797‐804.
- 45. Edwards S. Response to open peer commentaries on "Ethics of clinical science in a public health emergency: drug discovery at the bedside". Am J Bioethics. 2013;13:9.
- 46. Eyal N, Lipsitch M. Vaccine testing for emerging infections: the case for individual randomisation. J Med Ethics. 2017;43:625‐631.
- 47. Halloran M, Auranen K, Baird S, et al. Simulations for designing and interpreting intervention trials in infectious diseases. BMC Med. 2017;15:223.
- 48. Hitchings M, Grais R, Lipsitch M. Using simulation to aid trial design: ring vaccination trials. PLoS Negl Trop Dis. 2017;11:e0005470.
- 49. Kahn R, Rid A, Smith P, Eyal N, Lipsitch M. Choices in vaccine trial design in epidemics of emerging infections. PLoS Med. 2018;15:e1002632.
- 50. Lipsitch M, Eyal N. Improving vaccine trials in infectious disease emergencies. Science. 2018;357:153‐156.
- 51. Nason M. Statistics and logistics: design of Ebola vaccine trials in West Africa. Clin Trials. 2016;13:87‐91.
- 52. Piszczek J, Partlow E. Stepped‐wedge trial design to evaluate Ebola treatments. Lancet Infect Dis. 2015;15:762‐763.
- 53. Pulliam J, Bellan S, Gambhir M, Ancel Meyers L, Dushoff J. Evaluating Ebola vaccine trials: insights from simulation. Lancet Infect Dis. 2015;15:1134.
- 54. Tully C, Lambe T, Gilbert S, Hill A. Emergency Ebola response: a new approach to the rapid design and development of vaccines against emerging diseases. Lancet Infect Dis. 2015;15:1356‐1359.
- 55. Vandebosch A, Mogg R, Goeyvaerts N, et al. Simulation‐guided phase 3 trial design to evaluate vaccine effectiveness to prevent Ebola virus disease infection: statistical considerations, design rationale, and challenges. Clin Trials. 2016;13:57‐65.
- 56. van der Tweel I, van der Graaf R. Issues in the use of stepped wedge cluster and alternative designs in the case of pandemics. Am J Bioethics. 2013;13:W1‐W3.
