Biom J. 2013 Dec 18;56(2):332–349. doi: 10.1002/bimj.201200245

Conditionally unbiased and near unbiased estimation of the selected treatment mean for multistage drop-the-losers trials

Jack Bowden 1,*, Ekkehard Glimm 2
PMCID: PMC4034592  PMID: 24353149

Abstract

The two-stage drop-the-losers design provides a framework for selecting the most promising of K experimental treatments in stage one, in order to test it against a control in a confirmatory analysis at stage two. The multistage drop-the-losers design is both a natural extension of the original two-stage design, and a special case of the more general framework of Stallard & Friede (2008) (Stat. Med. 27, 6209–6227). It may be a useful strategy if deselecting all but the best performing treatment after one interim analysis is thought to pose an unacceptable risk of dropping the truly best treatment. However, estimation has yet to be considered for this design. Building on the work of Cohen & Sackrowitz (1989) (Stat. Prob. Lett. 8, 273–278), we derive unbiased and near-unbiased estimates in the multistage setting. Complications caused by the multistage selection process are shown to hinder a simple identification of the multistage uniform minimum variance conditionally unbiased estimate (UMVCUE); two separate but related estimators are therefore proposed, each containing some of the UMVCUE's theoretical characteristics. For a specific example of a three-stage drop-the-losers trial, we compare their performance against several alternative estimators in terms of bias, mean squared error, confidence interval width and coverage.

Keywords: Bias-adjusted estimation, Drop-the-losers design, Treatment selection, UMVCUE

1 Introduction

The maximum likelihood estimate (MLE) of the treatment effect is often reported as standard at the end of a multistage trial. It is of course a precise and readily available estimator, but since it ignores the trial's sequential nature it is generally biased (Whitehead, 1986), and considerable research has been conducted into estimation methods that address this fact. Although many bias-adjusted estimation procedures have been proposed, and unbiasedness is certainly not the only characteristic by which an estimator can be judged, the only way to achieve an efficient and “purely” unbiased estimate is to execute the following procedure: (i) identify an unbiased estimate based on part of the data (Y, say), (ii) identify complete, sufficient statistics for the parameter in question (Z, say), and (iii) employ the Rao-Blackwell improvement formula to obtain $E[Y \mid Z]$, the uniform minimum variance unbiased estimate (UMVUE).

There are two distinct applications of the Rao-Blackwell approach to estimation in multistage trials. The first approach, as pioneered by Emerson & Fleming (1990) and further clarified by Liu & Hall (1999), applies to a group sequential trial with one active treatment and one control arm that stops when conclusive evidence (for or against the efficacy of the treatment) is first observed. The stage at which the trial stops (M, say) is a random variable. Given M, a sufficient statistic of the data, Z, and a truncation adaptive stopping rule (Liu & Hall, 1999) one calculates the expectation of the first stage data, Y1 say, given the pair $(M, Z)$ to obtain the truncation adaptable UMVUE. We refer to this approach as unconditional because, in a three-stage trial, for example, it produces an estimate of the treatment effect regardless of whether the trial stops at stage one, two, or three and it is therefore unbiased by definition when one averages across all possible realizations of the sequential trial.

However, in some circumstances one may feel it more appropriate to develop estimators that are UMVUE conditional on the occurrence of a particular subset of trial realizations. For example, in two-stage “drop-the-losers” trials, the best performing of K experimental treatments is selected after the first stage before being tested in isolation against a control group in a confirmatory analysis in stage two. The stage two estimate of the selected treatment, Y2 say, is unbiased, and Cohen & Sackrowitz (1989) obtain the UMVUE of the selected treatment, conditional on the order of the stage one treatment arm estimates, by calculating $E[Y_2 \mid Z, Q]$, Q denoting the stage 1 order statistic condition. We call this an example of a UMVCUE (C for conditional). UMVCUEs have also been proposed for use more generally in two-stage trials evaluating a single treatment that allow early stopping for futility only. This is because a strong argument can be made that estimation of the treatment's effect is only important when the trial does in fact continue to the final stage; see Pepe et al. (2009), Koopmeiners et al. (2012), and Kimani et al. (2013) for recent examples.

The two-stage drop-the-losers design has been the focus of much attention in the research literature. Sampson & Sill (2005) and Wu et al. (2010) consider hypothesis testing methodology for this design whereas Sill & Sampson (2007), Bowden & Glimm (2008), and Bebu et al. (2010) target point estimation. More generally, the vast majority of adaptive trial designs have also followed a two-stage strategy in the following sense: design adaptations (such as subgroup selection or sample size adjustment) are made at the first interim analysis. The trial then proceeds under a fixed design (possibly with additional interim analyses) for its remaining duration. However, it is questionable whether a two-stage approach is always the best strategy. As an example, an adaptive trial was recently conducted into a treatment for maintaining lung function in patients with chronic obstructive pulmonary disease (COPD) (Barnes et al., 2010). The trial aimed to use its first stage data to select the most promising doses of a new drug, before testing them against a placebo in a confirmatory analysis at stage two. In the event, two out of the four doses were selected for continuation to the second stage.

By selecting two doses (instead of one), the study decreased its chances of accidentally discarding a dose that would ultimately be successful. Nonetheless, evaluating two experimental treatments in the confirmatory analysis is more challenging than for one, since multiple testing corrections must be applied. Furthermore, if the best performing of the two experimental treatments ends up being recommended, then its estimate may subsequently be queried as “biased”. There may, therefore, have been an advantage in allowing selection of the single most promising treatment–dose to occur after several interim analyses. With this in mind, a clear multistage analog for the two-stage drop-the-losers design exists: rather than selecting the best of K treatments at a single interim analysis, selection could be achieved by dropping a predetermined number of treatments at each stage due to (relative) poor performance until only one remained. The downside of inserting additional interim analyses into any clinical trial is clearly an increased administrative burden. Yet, in the context of a drop-the-losers trial, doing so can markedly increase the probability of selecting the truly best treatment for a given number of patients, as is shown in Section 2.

The multistage drop-the-losers design is actually a special case of a more general design framework for testing multiple treatments proposed by Stallard & Friede (2008). In their paper, the decision to drop a treatment need not be dictated by a predetermined rule based on efficacy data but, if it is, the family-wise error rate of the trial can be controlled in the strong sense. Stallard & Friede (2008) do not touch on the issue of estimation for their general design, although it is highlighted by them as an area for future methods development. In this paper we focus on estimation, and attempt to derive the UMVCUE for the specific drop-the-losers case. A different derivation to that of Cohen & Sackrowitz (1989) is used. It is perhaps less intuitive than the original, but generalizes to an arbitrary number of stages far more easily. Furthermore, allowing additional stages of selection requires an increasingly strong and unexpected form of conditioning to be employed. So in order to best elucidate the approach we start in Section 2 by considering the extension to the three-stage case, and the general J-stage formula is left as an appendix. In Section 4, we apply our estimation proposals to some specific three-stage trial examples, and compare their performance against several other estimation strategies. Interval estimation for the selected treatment is also considered. We discuss the issues raised and point to future research in Section 5.

2 The three-stage drop-the-losers design

Imagine a three-stage trial initially involving K experimental treatments and a control group. The purpose of the first two stages is to identify which treatment has the most beneficial true effect, as reflected via an appropriate outcome measure. Throughout this paper we will assume that higher values of the outcome are more desirable. At the end of stage one, $K - L$ experimental treatments are dropped and then the best of the L remaining experimental treatments is selected at the end of stage two for confirmatory testing against the control group in stage three. We will refer to such a three-stage design as a “K:L:1” trial. Let $k \in \{0, 1, \ldots, K\}$ index the initial full set of treatments with $k = 0$ referring to the control group. Let Inline graphic denote the response of the ith patient on treatment k, which is normally distributed with mean Inline graphic and variance Inline graphic. At stage j ($j = 1, 2, 3$), n subjects are randomized to each treatment arm still active in the trial and the experimental treatments are evaluated according to a test statistic of the form:

graphic file with name bimj1460-math-0011.jpg (1)

Without loss of generality we will refer to the true mean of the selected treatment as μ1; the reasons for this subscript notation are given in due course. At the conclusion of the trial it would be natural to primarily focus on testing the null hypothesis Inline graphic. In this paper, we focus on the task of estimating the contrast μ1 − μ0. In the next subsection, we show empirically that whilst K:L:1 trials can provide more power to select the truly best treatment for confirmatory testing (compared to an analogous two-stage drop-the-losers trial) the MLE for μ1 can be substantially biased. Section 2.1 serves merely to illustrate the problem. Further details regarding the notation, as well as the design and analysis of drop-the-losers trials, are covered from Section 2.2 onwards.

2.1 Initial motivation

Trial data are simulated under a 3:2:1 design. Each treatment arm at each stage is allotted n = 60 patients and the variance of each patient's individual treatment effect, Inline graphic, is 50. At the end of stage 2, 300 patients have been allocated to the experimental treatments. This is contrasted with a traditional two-stage “3:1” trial, where the same number of patients (100 per arm) are used to select the best performing treatment after stage one. We assume that the vector of true mean effects for the three treatments is (1, Δ, 2). Figure 1 shows the proportion of simulations for which the truly best treatment (with mean equal to 2) is selected, as Δ is varied between 0 and 2. Each point is the average of 50,000 simulations. The 3:2:1 design gives a marginally higher power of selecting the best treatment than the 3:1 design. The results of two further simulation scenarios are also shown in Fig. 1 (left), namely:

  • A 5:3:1 trial (75 per arm per stage) compared with a 5:1 trial (120 per arm). True vector of treatment means (1, 1, 1, Δ, 2). 600 patients used to select best treatment in total.

  • A 6:2:1 trial (75 per arm per stage) compared with a 6:1 trial (100 per arm). True vector of treatment means (1, 1, 1, 1, Δ, 2). 600 patients used to select best treatment in total.

Figure 1. Left: power to select the truly best treatment for various two- and three-stage drop-the-losers designs as a function of Δ. Right: bias and coverage of the MLE in a K:2:1 design, as a function of K.

For these scenarios, the difference in power between the two- and three-stage designs is now much more pronounced, and the case for switching to a three-stage design is stronger. Increasing the sample size of the trial (i.e., reducing the variance of the estimates) will always increase the probability of selecting the best treatment (assuming that one is truly better than the others). In our example the treatment with mean effect 2 is always best. For increasing sample sizes, the power curves approach 1.
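The selection step compared in this subsection can be sketched in a few lines of simulation code. The following is a minimal illustration rather than the code used for Fig. 1: it assumes a common per-patient variance of 50, works directly with normally distributed stage-wise sample means, and uses far fewer replications than the 50,000 per point reported above. The function names and the choice Δ = 1.5 are ours.

```python
import random

def select_two_stage(means, n, var, rng):
    """K:1 design: pick the arm with the largest stage-one sample mean."""
    sd = (var / n) ** 0.5
    stage1 = [rng.gauss(m, sd) for m in means]
    return max(range(len(means)), key=stage1.__getitem__)

def select_three_stage(means, keep, n, var, rng):
    """K:L:1 design with n patients per arm per stage (L = keep)."""
    sd = (var / n) ** 0.5
    stage1 = [rng.gauss(m, sd) for m in means]
    # Stage one: keep the `keep` arms with the largest sample means.
    survivors = sorted(range(len(means)), key=stage1.__getitem__, reverse=True)[:keep]
    # Stage two: with equal n per stage, ranking by the cumulative MLE is the
    # same as ranking by the sum of the two stage-wise means.
    cumulative = {k: stage1[k] + rng.gauss(means[k], sd) for k in survivors}
    return max(cumulative, key=cumulative.get)

def power_to_select_best(selector, means, n_sims=20000, seed=1):
    """Proportion of simulated trials in which the truly best arm is selected."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=means.__getitem__)
    hits = sum(selector(means, rng) == best for _ in range(n_sims))
    return hits / n_sims

delta = 1.5  # illustrative value of the varying mean
means = [1.0, 1.0, 1.0, 1.0, delta, 2.0]
p3 = power_to_select_best(lambda m, r: select_three_stage(m, 2, 75, 50.0, r), means)
p2 = power_to_select_best(lambda m, r: select_two_stage(m, 100, 50.0, r), means)
print(f"6:2:1 power: {p3:.3f}, 6:1 power: {p2:.3f}")
```

Both designs use 600 patients for selection in this configuration (6 × 75 + 2 × 75 versus 6 × 100), so any difference between the two printed proportions reflects the selection rule rather than the sample size.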

Note that, if equal numbers of patients were randomized to the control group as are to the experimental arms at each stage, the two and three-stage designs featured above would have different numbers of controls at different stages. However, since we have assumed that the trials always continue to the final stage, the control group data only impacts on the final stage analysis; it therefore does not affect the probability of selecting the best treatment.

Trial data are now simulated under a K:2:1 design for Inline graphic and with n = Inline graphic = 50 for all k. All treatments are assumed to have no (zero) effect. Figure 1 (right) plots the bias and coverage of the MLE for μ1 as a function of K. One can see that as the number of experimental treatments increases, the bias in the MLE increases and the coverage of its 95% confidence interval starts to fall well below its nominal level. In the context of a two-stage drop-the-losers trial, the bias in the MLE of the selected treatment is maximized when all treatments have the same effect (see Carreras & Brannath, 2013, for a proof). We conjecture that this is true under three-stage drop-the-losers selection as well. Despite the fact that Fig. 1 (right) most probably represents a worst case scenario in terms of the MLE's performance, there is certainly room for alternative estimation strategies to be developed.

2.2 Notation for the estimation problem

For simplicity we will now assume that the within treatment variance term Inline graphic is constant across treatments, and is hence referred to as v². This means that ranking the treatments via the test statistic in Eq. (1) over the first two stages is equivalent to ranking by the values of the experimental treatment terms only. That is, the cumulative control group data at stage j, Inline graphic, and the common square root term Inline graphic can be ignored, leaving the experimental treatment MLEs to be directly compared head-to-head.

Let $Y_{kj}$ (Inline graphic) represent the estimate for the mean effect of treatment k using only those subjects recruited at stage j. In order to add more flexibility, let the originally fixed number of subjects per-arm per-stage, n, now equal $n_j$, so this number can vary across stages if required (as in Bowden & Glimm (2008)). Letting Inline graphic, and Inline graphic, then:

graphic file with name bimj1460-math-0025.jpg

We base all subsequent mathematical derivation on the Inline graphic random variables (or transformations of them), leaving the individual patient data Inline graphic notation behind. Let Inline graphic represent the vector (ψ1, …,Inline graphic) where Inline graphic is the ranking of treatment k using our design framework. Denote the event: Inline graphic, Inline graphic, …, Inline graphic by the letter Q. It is useful to label the Inline graphics so that event Q is satisfied. That is, so that Inline graphic refers to the treatment that is ultimately ranked as kth best at the end of the trial. However, it is important not to confuse or equate this convenient labeling with explicitly conditioning on event Q—this is done in Sections 3 and 3.1.

At the end of stage 1, the top L treatments are kept and the remainder are dropped. So, Inline graphic (which are identical to the stage one MLEs) satisfy:

graphic file with name bimj1460-math-0037.jpg (2)

This enables Inline graphic and Inline graphic to be associated with treatments Inline graphic after stage one. At stage two, the remaining L treatments are ranked according to their cumulative MLEs. So, assume that the remaining stage one statistics Inline graphic and Inline graphic satisfy:

graphic file with name bimj1460-math-0043.jpg (3)

This enables Inline graphic and Inline graphic to be associated with treatments Inline graphic at stage 2. A schematic diagram of this selection process is shown in Fig. 2. Ultimately, the selected experimental treatment is therefore associated with stage-wise statistics Inline graphic and additionally a Y13 ∼ N(μ1,Inline graphic), the sole experimental treatment tested at stage 3. Note that although Inline graphic fulfills the inequalities in Eqs. (2) and (3), this does not directly imply that Inline graphic or Inline graphic. Note further that although the truly best treatment has the highest chance of being selected, μ1 is not equivalent to maxInline graphic.

Figure 2. Schematic diagram showing the selection process of the three-stage trial.

At the third and final stage of the trial, we seek an efficient unbiased estimate of μ1 − μ0, where μ0 represents the mean parameter of the control group. As previously mentioned, the control group always progresses to the final stage of the trial, so μ0 can be trivially and unbiasedly estimated using all of the relevant data via its MLE. By contrast, the parameter μ1 is much more elusive; at the trial outset it is a discrete random variable with K possible values and, conditional on Q, it is not unbiasedly estimated by its final stage three MLE:

graphic file with name bimj1460-math-0053.jpg (4)

since Y11 and Y12 are conditionally biased with respect to μ1. We therefore focus on bias-adjusted estimation of μ1 for the remainder of the paper.
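For orientation, the pooled MLE referred to above can be written out under the assumption that the three stage-wise statistics of the selected treatment are independent with known variances, here denoted $\sigma_1^2, \sigma_2^2, \sigma_3^2$ (our notation for the variance of a stage-j statistic; the symbol $\hat{\mu}_1^{\mathrm{MLE}}$ is likewise ours). This is the standard inverse-variance weighted mean and is offered as a sketch, not as a transcription of Eq. (4):

```latex
\hat{\mu}_1^{\mathrm{MLE}}
  = \frac{Y_{11}/\sigma_1^2 + Y_{12}/\sigma_2^2 + Y_{13}/\sigma_3^2}
         {1/\sigma_1^2 + 1/\sigma_2^2 + 1/\sigma_3^2},
\qquad
\text{which reduces to }
\frac{n_1 Y_{11} + n_2 Y_{12} + n_3 Y_{13}}{n_1 + n_2 + n_3}
\text{ when } \sigma_j^2 = v^2 / n_j .
```

Written this way, the source of the bias is visible term by term: conditional on Q, only the Y13 term has expectation μ1, while the Y11 and Y12 terms have been through the selection rule.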

3 Unbiased and bias-adjusted estimation of μ1

Let Inline graphic and Inline graphic represent the vector of mutually independent unselected treatment estimates at stages one and two. Further, let Y1 and Y2 represent the complete set of normally distributed statistics at stage one and two so that, for example, Inline graphic. The joint distribution, f(.), of the complete data Inline graphic:

graphic file with name bimj1460-math-0058.jpg (5)

where Inline graphic, Inline graphic and Π equals all terms involving Inline graphic. The trio Inline graphic, Inline graphic and Inline graphic are therefore complete sufficient statistics for all mean parameters unconditionally. We note that the conditional density Inline graphic is essentially the same as Eq. (5), except that its support is restricted by Q. It must therefore be scaled up by a factor representing the probability of Q, in order to integrate to one over the restricted space.

In the spirit of Cohen & Sackrowitz (1989) who investigated two-stage UMVCUEs, a three-stage UMVCUE for μ1 would be a Rao-Blackwellization of the unbiased final stage estimate Y13 conditional on the complete sufficient statistic Inline graphic and under Q. We now clarify precisely how Y13 is restricted in this setting. At stage one, we can state from Eq. (2) that (a): Inline graphic and from Eq. (3) (by removing a common term Inline graphic) that:

graphic file with name bimj1460-math-0069.jpg

The set of Y13 values that satisfy conditions (a) and (b) is the sampling distribution of Inline graphic. From the definition of Z1, condition (b) implies that:

graphic file with name bimj1460-math-0071.jpg

and condition (a) implies that:

graphic file with name bimj1460-math-0072.jpg

Therefore, Inline graphic must be less than

graphic file with name bimj1460-math-0074.jpg

Since T depends explicitly on the value of Y12 (as well as implicitly through Z1) it is natural to condition on (Inline graphic) rather than on (Inline graphic) in order to calculate Inline graphic. We will refer to this as the “RB1” estimator, and denote it by the symbol Inline graphic.

Lehmann & Scheffé (1950) have shown that a sufficient and complete statistic is also minimally sufficient (the converse is not true). Therefore, since (Inline graphic) is sufficient but not minimal, it cannot be complete. Thus, the RB1 estimator will be unbiased and have a smaller variance than Y13 by the Rao–Blackwell theorem, but it is not UMVCU. We now calculate Inline graphic, returning to a discussion of what properties can and cannot be claimed by it in Section 3.3.

3.1 Deriving the RB1 estimator

We will derive the RB1 estimator in the following manner. We start by transforming:

graphic file with name bimj1460-math-0081.jpg

then show that Y13 is independent of (Inline graphic) given (Inline graphic), and finish by going from:

graphic file with name bimj1460-math-0084.jpg

Let the vector Inline graphic and diagonal matrix Σ be the mean and variance of the Inline graphic variables Inline graphic written as Inline graphic. They follow the ordering described in Section 2.2 and Fig. 2 so that, if Inline graphic is the lth variable, the l-th entry of Inline graphic is Inline graphic and the (Inline graphic)-th element of Σ is Inline graphic. Using a standard result (e.g., Srivastava, 2002, Theorem 2.5.1), Inline graphic follows the (Inline graphic)-dimensional multivariate normal (MVN) distribution: Inline graphic, where

graphic file with name bimj1460-math-0097.jpg

For convenience we have labeled the mean and variance parameters of Inline graphic using the generic symbols Inline graphic, not because they are all equal, but because they are equally irrelevant to subsequent development. Next, we apply another well known theorem on conditional multivariate normal distributions (e.g., Srivastava, 2002, Theorem 2.5.5). Let

graphic file with name bimj1460-math-0100.jpg

where Inline graphic= Inline graphic and Inline graphic is equal to the Inline graphic vector Inline graphic = Inline graphic. The inverse of the remaining Inline graphic × Inline graphic submatrix, Inline graphic, equals

graphic file with name bimj1460-math-0110.jpg

where Inline graphic. Defining Inline graphic, and Inline graphic = Inline graphic, we can write the conditional distribution Inline graphic as Inline graphic where:

graphic file with name bimj1460-math-0117.jpg

and Inline graphic = Inline graphic = Inline graphic. The conditional density of Y13 given Z1 and Y12 does not depend on Inline graphic, Inline graphic or μ. We therefore drop irrelevant terms by writing it as Inline graphic. Only now do we condition on event Q, which acts to restrict Inline graphic to be Inline graphic—the value of T being fixed by the observed values of Inline graphic and Inline graphic (Inline graphic). This yields the density

graphic file with name bimj1460-math-0129.jpg

where Inline graphic is the indicator function for event Q and

graphic file with name bimj1460-math-0131.jpg

Taking the expectation of Inline graphic yields

graphic file with name bimj1460-math-0133.jpg (6)
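The calculation behind an expectation of this kind is the mean of a normal variable restricted to lie below an upper bound, which needs only the standard normal density and distribution function. The sketch below shows that generic building block. The names mu_c, sigma_c, and t are placeholders for the conditional mean, conditional standard deviation, and truncation point; it is not a transcription of Eq. (6), whose specific arguments are defined in the derivation above.

```python
from math import erf, exp, pi, sqrt

def std_normal_pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def upper_truncated_normal_mean(mu_c, sigma_c, t):
    """E[Y | Y < t] for Y ~ N(mu_c, sigma_c^2).

    Placeholder names: mu_c and sigma_c stand for a conditional mean and
    standard deviation, t for the upper truncation point.
    Note: for t far below mu_c the distribution function underflows to 0,
    so this naive form is only suitable for moderate (t - mu_c) / sigma_c.
    """
    w = (t - mu_c) / sigma_c
    return mu_c - sigma_c * std_normal_pdf(w) / std_normal_cdf(w)

# The correction vanishes as the bound becomes irrelevant (t large) ...
print(upper_truncated_normal_mean(0.0, 1.0, 10.0))
# ... and pulls the mean below the bound when the bound is binding.
print(upper_truncated_normal_mean(0.0, 1.0, 0.0))
```

The result always lies below t, and it approaches mu_c as t grows, which is the behaviour one would expect of a selection correction that matters only when the restriction is close to being violated.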

3.2 Deriving the RB2 estimator

The Inline graphic estimator in Eq. (6) does not look like a direct analog of the two-stage UMVCUE of Bowden and Glimm (2008), which is a correction to the two-stage MLE, because of the extra conditioning on Y12. One way of avoiding this extra conditioning would be to calculate Inline graphic, for a Inline graphic that did not depend on Y12. This can be achieved by defining the condition Inline graphic simply as the portion of Q coming from the stage two selection rule (b). We can then state that Inline graphic where:

graphic file with name bimj1460-math-0139.jpg

The quantity Inline graphic can then be obtained following the procedure as in Section 3.1, to yield an alternative estimator for μ1, which we will denote as Inline graphic and refer to as the “RB2” estimator. It has the same form as Eq. (6) but with t replaced by the observed value Inline graphic of Inline graphic, with Inline graphic equal to the MLE Inline graphic and with Inline graphic equal to Inline graphic.

3.3 Summarizing the RB1 and RB2 estimators

The RB2 estimator is the expected value of Y13 given the complete sufficient statistic and under condition Inline graphic. So, if Inline graphic truly represented the condition restricting Inline graphic under drop-the-losers selection, then Inline graphic would be the UMVUE given Inline graphic. However, no such claim can be made because Inline graphic is not the correct condition (the correct condition is Q), so E[Inline graphic] is conditionally biased with respect to μ1. Likewise, the RB1 estimator cannot claim to be the UMVUE, conditional on Q, because the statistic it conditions on is not minimal sufficient, and hence not complete.

One could argue, however, that if we simply condition on Y12 and Q first, then (Inline graphic) is a minimal sufficient statistic and therefore the RB1 estimator is a UMVCUE of sorts. This might be perceived as simply a semantic trick, and it is clear that conditioning on Y12 is not as natural as conditioning on Q alone. For this reason we will refer to Inline graphic and Inline graphic as simply Rao-Blackwellized estimators.

3.3.1 A general formula for the RB1 estimator

In the Appendix, we derive the Inline graphic estimator for a J-stage drop-the-losers trial. This estimator will yield a more efficient, unbiased estimate of μ1 than using the stage J data alone. For the J-stage case, one is forced to condition on Z1 and Inline graphic additional variables corresponding to the individual treatment effect estimates of the ultimately selected treatment at stages 2 to Inline graphic. Although the precise values of Inline graphic, Inline graphic and t change, the J-stage estimator is identical in form to Eq. (6). Since it only ever requires the evaluation of a standard normal density and distribution function, it remains trivial to evaluate whatever the value of J.

3.4 Strength of unbiasedness of the RB1 estimator

Since μ1 is a random variable, it is convenient to consider the extended definition of bias due to Posch et al. (2005). That is, for a generic estimator Inline graphic of μ1, the bias is given by:

graphic file with name bimj1460-math-0164.jpg (7)

Here Inline graphic refers to the true mean of treatment k, the k-th element of μ. Inline graphic is the condition that treatment k is selected under the design and Inline graphic represents the bias of estimator Inline graphic conditional on treatment k being selected. Although Inline graphic = 0, it actually fulfills a stronger form of unbiasedness, namely that Inline graphic = 0 Inline graphic.

3.5 An alternative near-unbiased estimator

In Section 4, we show the price paid by the RB1 estimator for unbiasedness is a substantial increase in mean squared error (MSE). Part of the motivation for developing the RB2 estimator was to see if it could trade off small amounts of bias for an MSE reduction. An additional bias-adjusted (but not unbiased) estimator is now considered. Bebu et al. (2010) proposed a likelihood based procedure for obtaining a bias corrected MLE (which we will refer to as the BC-MLE) and confidence intervals for the selected treatment in a two-stage drop-the-losers trial. We now adapt their general approach to the specific example of a 3:2:1 drop-the-losers design setting in order to complement the simulations in Section 4. Extensions to the general three-stage case are obvious, but when more treatment arms are added the computational effort in obtaining the BC-MLE increases markedly.

Assuming that the variances of all stage-wise statistics are known, the log-likelihood of the parameter vector Inline graphic conditional on Q:ψ=(1,2,3) is proportional to

graphic file with name bimj1460-math-0173.jpg (8)

Here, Inline graphic and Z1 are as defined in Section 3 after Eq. (5), Inline graphic being equal to the selected treatment's MLE with variance Inline graphic. Inline graphic is defined here as the MLE of the second best treatment at stage 2, Inline graphic, with mean μ2 and variance Inline graphic = Inline graphic = Inline graphic. Y31 is the sole statistic on the treatment that ranked last at stage one, with mean μ3 and variance Inline graphic. The penalizing term Inline graphic represents the probability of event Q given μ. This is equivalent to the probability that all three elements of the multivariate density:

graphic file with name bimj1460-math-0184.jpg

are positive, and it can be approximated to a high degree of accuracy in R using the pmvnorm() function (Genz & Bretz, 2009). The conditional log-likelihood in Eq. (8) can then be maximized to yield joint estimates for μ1, μ2, and μ3.
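To make the penalizing term concrete, the probability of event Q for a 3:2:1 design can also be approximated by brute-force simulation. The sketch below is a Monte Carlo stand-in for the pmvnorm() evaluation, not a reimplementation of it. It assumes Q: ψ = (1, 2, 3) means that treatment 3 has the smallest stage-one estimate and that treatment 1 has a larger cumulative stage-two MLE than treatment 2. The function name prob_q and the argument sigma2 (stage-one and stage-two variances of the stage-wise statistics) are ours.

```python
import random

def prob_q(mu, sigma2, n_sims=50000, seed=7):
    """Monte Carlo estimate of P(Q | mu) for a 3:2:1 design with psi = (1, 2, 3).

    mu     : true means (mu1, mu2, mu3), labelled by final rank
    sigma2 : variances of the stage-one and stage-two statistics (assumed known)
    """
    rng = random.Random(seed)
    s1, s2 = sigma2[0] ** 0.5, sigma2[1] ** 0.5
    w1, w2 = 1.0 / sigma2[0], 1.0 / sigma2[1]
    hits = 0
    for _ in range(n_sims):
        y11, y21, y31 = (rng.gauss(m, s1) for m in mu)
        # Stage one: treatment 3 must be the one that is dropped.
        if y31 >= min(y11, y21):
            continue
        y12 = rng.gauss(mu[0], s2)
        y22 = rng.gauss(mu[1], s2)
        # Stage two: compare inverse-variance weighted cumulative MLEs.
        mle1 = (w1 * y11 + w2 * y12) / (w1 + w2)
        mle2 = (w1 * y21 + w2 * y22) / (w1 + w2)
        if mle1 > mle2:
            hits += 1
    return hits / n_sims

# With three identical treatments every ordering is equally likely, so the
# estimate should be close to (1/3) * (1/2) = 1/6.
print(prob_q((0.0, 0.0, 0.0), (1.0, 1.0)))
```

A simulation estimate of this kind is too noisy to sit inside a likelihood maximization, which is one reason a deterministic multivariate normal routine is the natural choice in practice; the sketch is useful mainly as a check on what the term represents.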

4 Simulation studies

We now conduct a simulation study to compare the performance of the MLE, RB1, RB2, BC-MLE, and stage three estimator Y13 in estimating μ1 under drop-the-losers selection. Various parameter constellations for the true treatment means and the stage-wise variances are considered. We note that in a real trial setting the quantity of interest would be the treatment versus control group comparison Inline graphic, but for reasons already discussed we can ignore estimation of μ0. By averaging over the results of all simulations, we obtain a Monte Carlo estimate for the estimators' bias as given in Eq. (7). By averaging the estimators' squared errors across all simulations, we can approximate their MSE, when defined analogously to Eq. (7) as well.

4.1 Point estimation for a 3:2:1 trial

In a simple initial simulation, all three treatments are assumed to have a true mean effect of 0, so that μ1 ≡ 0. The number of patients recruited to each remaining treatment arm at each stage (Inline graphic) is 100, 50, and 25, respectively. v² = 50 so that Inline graphic = Inline graphic. Figure 3 shows the distribution of 100,000 realizations of the MLE, RB1 estimator, RB2 estimator, BC-MLE, and Y13. From this simulation their empirical biases and MSEs were (0.41, 0.00, 0.04, −0.26, 0.00) and (0.35, 0.85, 0.69, 0.80, 2.00), respectively. Y13 is unbiased but inefficient and at the other extreme the MLE is efficient but biased. The RB1 estimator is unbiased and has a substantially lower MSE than Y13 because it utilizes data from the first two stages. Incorrect conditioning induces a small amount of bias into the RB2 estimator but is accompanied by a substantial decrease in MSE. The BC-MLE reduces the magnitude of the bias in the MLE but is shown to overcorrect. Its modal value is, however, close to the true value of 0.
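The Monte Carlo bias and MSE calculations for the two simplest estimators in this comparison, the MLE and Y13, can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code behind Fig. 3: stage-wise statistics are drawn directly as normal variables with variance v²/n for the relevant stage, the cumulative MLEs are inverse-variance weighted means, errors are measured against the true mean of whichever arm is selected, and only 20,000 replications are used. The function name simulate_321 is ours, and the RB1, RB2, and BC-MLE estimators are not included.

```python
import random

def simulate_321(mu, n=(100, 50, 25), v2=50.0, n_sims=20000, seed=3):
    """Monte Carlo bias and MSE of the MLE and of Y13 in a 3:2:1 trial."""
    rng = random.Random(seed)
    var = [v2 / nj for nj in n]      # variance of a stage-j statistic
    sd = [v ** 0.5 for v in var]
    w = [1.0 / v for v in var]       # inverse-variance weights
    err_mle, err_y13 = [], []
    for _ in range(n_sims):
        stage1 = [rng.gauss(m, sd[0]) for m in mu]
        # Stage one: drop the arm with the smallest estimate.
        survivors = sorted(range(3), key=stage1.__getitem__, reverse=True)[:2]
        stage2 = {k: rng.gauss(mu[k], sd[1]) for k in survivors}
        # Stage two: select the arm with the larger cumulative MLE.
        cum = {k: (w[0] * stage1[k] + w[1] * stage2[k]) / (w[0] + w[1])
               for k in survivors}
        sel = max(cum, key=cum.get)
        y13 = rng.gauss(mu[sel], sd[2])
        mle = (w[0] * stage1[sel] + w[1] * stage2[sel] + w[2] * y13) / sum(w)
        err_mle.append(mle - mu[sel])
        err_y13.append(y13 - mu[sel])

    def summary(errs):
        """Return (bias, MSE) from a list of estimation errors."""
        return sum(errs) / len(errs), sum(e * e for e in errs) / len(errs)

    return {"MLE": summary(err_mle), "Y13": summary(err_y13)}

for name, (bias, mse) in simulate_321((0.0, 0.0, 0.0)).items():
    print(f"{name}: bias = {bias:.3f}, MSE = {mse:.3f}")
```

Up to simulation error, the printed values should be comparable with the MLE and Y13 entries quoted in the text for this scenario.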

Figure 3. Distribution of the 5 estimators' estimates for Inline graphic from a three-stage 3:2:1 drop-the-losers design, given stage-wise variances Inline graphic = Inline graphic. Each distribution is based on 100,000 simulations.

In Table 1, we show the bias and MSE of the estimators for this choice of parameter values and three additional true parameter constellations. Trial data were simulated with 50 patients per-arm per-stage so that Inline graphic = 1. The same general pattern is observed across all scenarios, except the magnitudes of the biases and MSEs change.

Table 1. Bias and MSE of the various estimators over the four scenarios (50,000 simulations per scenario). In each case Inline graphic = (1, 1, 1).

Parameter values                      MLE     RB1      RB2     BC-MLE   Stage three Y13
Bias
(0, 0, 0)                             0.377   −0.005   0.038   −0.142   −0.004
(0, Inline graphic, Inline graphic)   0.357   0.007    0.043   −0.132   0.007
(0, Inline graphic, 1)                0.310   0.004    0.041   −0.122   0.012
(0, Inline graphic, 2)                0.111   −0.005   0.010   −0.092   0.012
MSE
(0, 0, 0)                             0.388   0.690    0.563   0.579    0.982
(0, Inline graphic, Inline graphic)   0.384   0.686    0.550   0.570    1.002
(0, Inline graphic, 1)                0.372   0.670    0.524   0.553    1.017
(0, Inline graphic, 2)                0.347   0.587    0.413   0.471    1.019

4.2 Interval estimation for a 3:2:1 trial

It is possible to derive an expression for the variance of the RB1 estimator using the delta method, as, for example, Koopmeiners et al. (2012) do in the context of a single arm two-stage trial with a binary endpoint. However, from Fig. 3 we see that, in the context of a three-stage drop-the-losers trial, the distribution of Inline graphic is highly skewed. Therefore, even if we were to derive analogous expressions for the variance of these quantities, it would not appear sensible to use them to furnish symmetric confidence intervals around the point estimate. For this reason, we adapt the nonparametric bootstrap procedure, originally proposed by Pepe et al. (2009) for a single arm two-stage trial, to the three-stage drop-the-losers setting. Specifically, we apply the following resampling scheme to the trial data Inline graphic:

  1. Produce bootstrap sample of first stage data, with mean Inline graphic.

  2. If Inline graphic is ≥ original observed value Inline graphic:
    1. Produce bootstrap samples of second stage data, with mean Inline graphic.
    2. If stage two MLE Inline graphic is ≥ original observed value Inline graphic:
      1. Produce bootstrap samples of third stage data, with mean Inline graphic.
      2. Calculate the RB1 estimator Inline graphic from Eq. (6) given Inline graphic, and original observed value t.

These steps should be repeated until enough values of Inline graphic have been obtained to assess its sampling distribution accurately. Empirical quantiles of this distribution can then be read off to give confidence intervals for Inline graphic. Once this procedure is implemented, bootstrapped confidence intervals for the RB2 estimator can be calculated at the same time with no extra effort.
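
The resampling scheme can be sketched as a rejection loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the two acceptance conditions and the estimator are passed in as callables (`stage1_ok`, `stage2_ok`, `estimator`, all hypothetical names) because the exact statistics compared in steps 2 and 2.2 and the RB1 formula of Eq. (6) are not reproduced here. For brevity it also resamples a single arm's patient-level data at each stage.

```python
import math
import random
from statistics import fmean


def resample_mean(data, rng):
    """Mean of one nonparametric bootstrap resample (drawn with replacement)."""
    return fmean(rng.choices(data, k=len(data)))


def conditional_bootstrap(stage1, stage2, stage3, stage1_ok, stage2_ok,
                          estimator, n_boot=1000, seed=None, max_tries=10**6):
    """Collect n_boot estimator values from resamples that pass both checks.

    stage1_ok(m1) and stage2_ok(m1, m2) stand in for the acceptance
    conditions of the scheme; estimator(m1, m2, m3) stands in for RB1/RB2.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(max_tries):
        if len(draws) == n_boot:
            break
        m1 = resample_mean(stage1, rng)
        if not stage1_ok(m1):
            continue  # resample fails the stage-one condition: start again
        m2 = resample_mean(stage2, rng)
        if not stage2_ok(m1, m2):
            continue  # resample fails the stage-two condition: start again
        m3 = resample_mean(stage3, rng)
        draws.append(estimator(m1, m2, m3))
    if len(draws) < n_boot:
        raise RuntimeError("too few accepted resamples; conditions too strict")
    return draws


def percentile_interval(draws, alpha=0.05):
    """Empirical (alpha/2, 1 - alpha/2) quantiles of the bootstrap draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    lower = ordered[math.floor(alpha / 2 * last)]
    upper = ordered[math.ceil((1 - alpha / 2) * last)]
    return lower, upper
```

A call such as `percentile_interval(conditional_bootstrap(y1, y2, y3, ok1, ok2, est))` then returns a percentile interval. Passing a second estimator over the same accepted resamples is the natural way to obtain the RB2 interval alongside the RB1 interval.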

Confidence intervals for the BC-MLE of μ1 are obtained by the profile likelihood approach described in Bebu et al. (2010). That is, we calculate the statistic Inline graphic as twice the difference between the log-likelihood evaluated at the joint BC-MLEs for Inline graphic and at the BC-MLEs for Inline graphic given the constraint Inline graphic. A (1 − α) level confidence interval for μ1 is then the set of values for which Inline graphicInline graphic.
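
To illustrate the mechanics of a profile likelihood interval, the sketch below inverts a likelihood ratio statistic for the mean of an ordinary normal sample with unknown variance. It is a toy model only: it does not implement the selection-adjusted conditional likelihood of Bebu et al. (2010), and the chi-squared (1 degree of freedom) cut-off is the usual asymptotic calibration, assumed here rather than taken from the text. All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def profile_loglik(mu0, x):
    """Normal log-likelihood at mean mu0, maximised over the variance."""
    n = len(x)
    s2 = fmean((xi - mu0) ** 2 for xi in x)  # variance MLE given mean mu0
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)


def lr_statistic(mu0, x):
    """Twice the log-likelihood difference: unconstrained vs. mean fixed at mu0."""
    return 2.0 * (profile_loglik(fmean(x), x) - profile_loglik(mu0, x))


def profile_ci(x, alpha=0.05, tol=1e-10):
    """Set of mu0 whose statistic is at most the chi-squared(1) quantile."""
    # The chi-squared(1) quantile equals the squared normal quantile.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    mu_hat = fmean(x)

    def boundary(direction):
        inside, step = mu_hat, 1.0
        outside = mu_hat + direction * step
        while lr_statistic(outside, x) <= cutoff:  # expand until rejected
            step *= 2
            outside = mu_hat + direction * step
        while abs(outside - inside) > tol:  # bisect onto the boundary
            mid = (inside + outside) / 2
            if lr_statistic(mid, x) <= cutoff:
                inside = mid
            else:
                outside = mid
        return inside

    return boundary(-1), boundary(+1)
```

Each evaluation of the statistic requires a constrained maximisation. In this toy model that step has a closed form, but for a conditional likelihood it would be numerical, which is consistent with the higher computational cost reported for the BC-MLE intervals.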

Table 2 shows, for Inline graphic, the average confidence interval width and the resulting coverage of the BC-MLE, RB1, and RB2 estimators for the four scenarios already introduced and with 50 patients per treatment arm per stage as before. Within each simulation, confidence intervals and coverage were assessed with respect to the true (fixed) value of μ1, and then averaged across simulations. Each bootstrapped confidence interval was based on 1000 simulated values of Inline graphic. The overall figures are based on only 10,000 simulations because obtaining a confidence interval for the BC-MLE using the profile likelihood method required substantially more computational effort than the bootstrap procedure, and so was the limiting factor.

Table 2.

Coverage and mean confidence interval width for the BC-MLE, RB1, and RB2 estimators (10,000 simulations per scenario). In each case Inline graphic = (1, 1, 1)

Parameter values | RB1 CovInline graphic | RB1 CI width | BC-MLE CovInline graphic | BC-MLE CI width | RB2 CovInline graphic | RB2 CI width
(0, 0, 0) | 0.963 | 3.31 | 0.949 | 2.92 | 0.951 | 2.91
(0, Inline graphic, Inline graphic) | 0.958 | 3.27 | 0.950 | 2.90 | 0.949 | 2.87
(0, Inline graphic, 1) | 0.954 | 3.20 | 0.948 | 2.83 | 0.946 | 2.80
(0, Inline graphic, 2) | 0.956 | 2.96 | 0.945 | 2.57 | 0.956 | 2.49

Both the bootstrap and profile likelihood approaches appear to provide confidence intervals with nominal coverage. The RB1 estimator's confidence interval is by far the widest. The narrowest interval is obtained from the RB2 estimator, but it is only marginally narrower than that of the BC-MLE.

4.3 Further results for K:2:1 trials

Trial data are now simulated under a K:2:1 design with all treatment means equal to 0 (so that Inline graphic), as in Section 2.1. Figure 4 (top-left to bottom-right) plots the bias, MSE, coverage and confidence interval width of the MLE, RB1, and RB2 estimators as a function of K. The properties of the stage three estimator Y13 are shown where informative (it is unbiased, with a known variance, so a simple symmetric confidence interval around it will achieve its nominal coverage). Each point is based on 50,000 simulations. The BC-MLE is not evaluated as it becomes too computationally demanding. However, there is no reason to suspect that its performance significantly worsens as more treatment arms are added. Standard symmetric confidence intervals are used for the MLE, assuming a normal distribution, and ignoring selection. While this could be termed a “standard” analysis, there is of course no reason to believe that this confidence interval will achieve its nominal coverage probability. As K increases, the bias of the RB2 estimator increases but stays at modest levels compared to the MLE. The RB1 estimator and Y13 are unbiased. The MSE of the MLE is substantially lower than that of the other two estimators when K = 3, but rises more quickly than theirs as K increases. The MSE of Y13 is 1, by definition.

Figure 4.

Bias, MSE, coverage (w.r.t. 95% confidence intervals), and average 95% confidence interval width of the MLE, RB1, and RB2 estimators as a function of K, for Inline graphic 0, σ1 = σ2 = σ3 = 1. The stage three estimate Y13 is also shown where informative.

The coverage of the RB1 estimator's bootstrapped 95% confidence interval stays relatively constant over the range of K, but is always slightly conservative. The RB2 estimator's coverage starts to worsen as K increases, but not to the same extent as the MLE. Table 3 shows, for these three estimators, the proportion of times that their 95% confidence intervals are above or below the true value of μ1. In each case, being above the true value is far more likely. This can be understood by the fact that all three are more likely to overestimate than underestimate the true effect. For example, in the simulations shown in Fig. 3 the MLE, RB1, and RB2 estimators overestimate μ1 83%, 57%, and 61% of the time, respectively. Of course, in the case of the RB1 estimator, this tendency to overestimate is perfectly cancelled out by less frequent, but larger, underestimation so that Inline graphic + Inline graphic. While the above/below ratio of the RB1 estimator stays fairly constant (around 10:1), the above/below ratios of the RB2 estimator and the MLE increase rapidly with increasing K. This reflects their increasing positive bias.

Table 3.

Proportion of times the 95% confidence interval for the MLE, RB1, and RB2 estimators are above or below μ1

K | RB1 % above μ1 | RB1 % below μ1 | RB2 % above μ1 | RB2 % below μ1 | MLE % above μ1 | MLE % below μ1
3 | 3.9 | 0.48 | 5.3 | 0.130 | 6.4 | 0.084
4 | 4.0 | 0.34 | 6.2 | 0.052 | 7.9 | 0.044
5 | 4.2 | 0.32 | 6.9 | 0.044 | 9.6 | 0.012
6 | 4.1 | 0.31 | 7.1 | 0.044 | 10.0 | 0.016
7 | 4.3 | 0.30 | 7.7 | 0.024 | 12.0 | 0.016
8 | 4.3 | 0.32 | 8.0 | 0.032 | 13.0 | 0.012
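
The MLE columns of Table 3 can be approximated by simulation. The sketch below assumes all K true means are 0, unit stage-wise variances, selection by cumulative MLE, and a "standard" interval of the form MLE ± z·σ/√3 that ignores selection. The σ/√3 standard error is the variance of an average of three unit-variance estimates and is an assumption of this sketch, as are the function names.

```python
import random
from statistics import NormalDist


def selected_mle(k, rng, sigma=1.0):
    """MLE of the selected arm in one K:2:1 trial with all true means 0."""
    stage1 = [rng.gauss(0.0, sigma) for _ in range(k)]
    # Stage one: the two largest estimates go forward.
    top_two = sorted(stage1, reverse=True)[:2]
    # Stage two: add fresh data and keep the larger cumulative sum.
    cumulative = [y + rng.gauss(0.0, sigma) for y in top_two]
    # Stage three: one more estimate for the selected arm, then average.
    return (max(cumulative) + rng.gauss(0.0, sigma)) / 3.0


def naive_ci_miss_rates(k, n_sims=20000, level=0.95, sigma=1.0, seed=1):
    """Proportion of naive intervals lying wholly above / below the truth (0)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / 3 ** 0.5
    above = below = 0
    for _ in range(n_sims):
        est = selected_mle(k, rng, sigma)
        if est - half_width > 0.0:
            above += 1
        elif est + half_width < 0.0:
            below += 1
    return above / n_sims, below / n_sims


if __name__ == "__main__":
    for k in range(3, 9):
        above, below = naive_ci_miss_rates(k)
        print(f"K={k}: above={100 * above:.1f}%, below={100 * below:.2f}%")
```

Under these assumptions the "above" proportion should grow with K while the "below" proportion stays close to zero, which is the asymmetry the table reports for the MLE.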

The MLE's average confidence interval width (a constant value of 3.92Inline graphic) is far smaller than that of the other two estimators: the RB1 estimator's confidence interval is on average 60% wider than the MLE's when K = 8. However, the MLE's narrow interval comes at the cost of suboptimal coverage. The confidence interval width of Y13 is a constant value of 3.92, which gives an idea of the additional gain from using the RB1 estimator if unbiased estimation is required.

5 Discussion

In this paper, we have explored the issue of estimation for a multistage analog of the two-stage drop-the-loser design. Our main focus was to generalize the work of Cohen & Sackrowitz (1989) to enable efficient unbiased estimation of the selected treatment mean. In this regard, we can only claim to have been partially successful. The RB1 estimator is unbiased and has a lower variance than the final-stage estimator, but it is derived using an additional condition that is needed to overcome technical difficulties. Further work may reveal that this condition can be relaxed to yield more efficient unbiased estimators. Perhaps unsurprisingly, the RB1 estimator was shown to have a large MSE, due to its unbiasedness. The RB2 estimator was derived as an alternative; its less stringent conditioning (on the minimal sufficient statistic) resulted in an estimator with a small amount of bias but a greatly reduced MSE in the context of a 3:2:1 trial. However, its performance (in particular the coverage of its bootstrapped confidence intervals) worsened for K:2:1 trials as K increased.

Our derivation of the RB1 and RB2 estimators assumes that the within treatment arm variances (the Inline graphics) and the number of patients randomized to each treatment arm at stage j (Inline graphic) are equal across treatments. This means that ranking via the test statistic in Eq. (1) is equivalent to ranking by the mutually independent experimental treatment MLEs at each stage. Indeed, the independence property gained by ignoring the common control group data in the selection process is key to the proof. If these two conditions are not met then ranking via test statistic and MLE will not be equivalent, so the RB1 and RB2 estimators as stated here will be invalid. Our development also assumes that the Inline graphic terms are known, and a very different approach would be required if they were assumed unknown (Cohen & Sackrowitz, 1989). The BC-MLE approach of Bebu et al. (2010) is, in contrast, much better suited to the unknown variance case, since the variances can simply be included as additional parameters in the conditional likelihood.

The multistage drop-the-losers design assumes that the trial always proceeds to the final stage. Whilst this could be criticized as nonsensical and inefficient when strong evidence exists to stop the trial early, it gives the trial a fixed sample size that may make it attractive to practitioners and funding bodies alike (Kairalla et al., 2012). One may, however, wish to augment this design with an efficacy/futility stopping rule, as do Stallard & Friede (2008) and Wu et al. (2010). Our approach can easily be adapted to yield an RB1 or RB2 estimator in this context, by recalculating (in the three-stage case) exactly how the sampling space of Y13 is restricted conditional on Inline graphic and given that the trial reached the final stage. However, from an estimation perspective, conditioning on reaching the final stage only really makes sense when the trial can stop early for futility and not efficacy (Pepe et al., 2009). Moreover, in this case, whilst estimation of μ1 is still trivial, unbiased estimation of the treatment-control comparison Inline graphic does not immediately follow, because the control group data must itself be corrected for some selection bias induced by the stopping rules. This extra complication has, however, been successfully addressed in recent work by Kimani et al. (2013).

Further research is needed to explore and understand how best to design and analyze J-stage drop-the-losers trials from an operational planning and hypothesis testing perspective. Open questions include how to calculate the critical value for testing the selected treatment against the control at the final stage, whether it is possible to control the type I error rate of this test in the strong sense, and how to find multistage designs (e.g., the choice of K and L in the three-stage context) that are optimal in terms of maximal power and minimal size. Some preliminary work on this subject can be found in a technical report (Wason & Bowden, 2012) available at https://sites.google.com/site/jmswason/supplementary-material. Software (in the form of R code) to reproduce our results can be found in the supplementary material accompanying this paper.

Acknowledgments

This work was funded by the Medical Research Council (grant numbers G0800860 and MR/J004979/1). The authors would like to thank the reviewers for their helpful comments which greatly improved the quality of this paper.

Appendix

The RB1 estimator for a J-stage drop-the-losers trial

In this section, we introduce a more general notation. Let Inline graphic represent the number of experimental treatment arms active in the trial at stage j, for Inline graphic. Clearly Inline graphic and Inline graphic. Define Inline graphic to be the set of all treatment arms in the trial at stage j, so that:

graphic file with name bimj1460-math-0246.jpg

Let Inline graphic represent the effect estimate of treatment u at stage j for Inline graphic. Let Inline graphic be the standard deviation of the estimates at stage j. Further define:

graphic file with name bimj1460-math-0250.jpg

Let Inline graphic represent the J effect estimates associated with the (ultimately) selected treatment with true mean effect μ1. Let Inline graphic represent the vector of the unselected treatment effect estimates at stage j. The vector of all treatment estimates across all stages can be written as Inline graphic and has length N = Inline graphic. The MLE of μ1 at stage j, Inline graphic, is equal to

graphic file with name bimj1460-math-0256.jpg (A1)

Define Inline graphic and rewrite Inline graphic in terms of Z1 and Inline graphic as

graphic file with name bimj1460-math-0260.jpg

As for the three-stage example in Section 2012, in the first Inline graphic stages of a J-stage trial, we sequentially rank the treatment arms by the order of their cumulative MLEs, defined for each treatment remaining in the trial at stage j as in Eq. (A1). The j-th stage imposes the restriction:

graphic file with name bimj1460-math-0262.jpg

Given Z1, when Inline graphic satisfies all of Inline graphic selection conditions required by event Q, it is restricted to be less than or equal to

graphic file with name bimj1460-math-0265.jpg (A2)

where Inline graphic represents the treatment effect estimate from stage u associated with the (Inline graphic)-th largest cumulative MLE at stage j and the second summation is only defined and evaluated if Inline graphic. For example, in Section 2012 we set J=3 and conditions (a) and (b) on Y13 correspond to setting j equal to 1 and 2 in formula (A2) respectively. This suggests that when conditioning Inline graphic on Q, we additionally need to condition on Inline graphic when calculating the RB1 estimator. Let Y11, Y1, Inline graphic and Inline graphic represent the complete data from the J-stage trial. Following the previous development in Section 3, we shall transform the multivariate normal densities as follows:

graphic file with name bimj1460-math-0273.jpg

Upon demonstrating that Inline graphic is independent of Inline graphic given Inline graphic we finish by going from

graphic file with name bimj1460-math-0277.jpg

The density Inline graphic is Inline graphic where:

graphic file with name bimj1460-math-0280.jpg

Letting Inline graphic we define:

graphic file with name bimj1460-math-0282.jpg

where Inline graphic= Inline graphic, Inline graphic is equal to the Inline graphic vector Inline graphic = Inline graphic and Inline graphic is the remaining Inline graphic × Inline graphic matrix. The conditional distribution Inline graphic is Inline graphic, where:

graphic file with name bimj1460-math-0294.jpg

for Inline graphic, Inline graphic equal to mean parameter vector of a and Inline graphic. Using block-wise inversion techniques (e.g. Srivastava, 2002, Corollary A.5.2) Inline graphic can be expressed as

graphic file with name bimj1460-math-0299.jpg

where M1 is the scalar Inline graphic, Inline graphic and the values of M3 and M4 are unimportant. We now see that

graphic file with name bimj1460-math-0302.jpg

and Inline graphic = Inline graphic. Noting that this distribution does not depend on Inline graphic or Inline graphic we write Inline graphic as

graphic file with name bimj1460-math-0308.jpg

and Inline graphic equals

graphic file with name bimj1460-math-0310.jpg (A3)
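
For reference, the two standard multivariate normal facts that this derivation relies on can be stated in generic form. The symbols below ($x_1$, $x_2$, $m_i$, $S_{ij}$, $T$) are generic textbook notation chosen for this note and are not the notation of the paper; the statements are the usual conditional-distribution and block-wise inversion identities, not a reconstruction of the displayed equations above.

```latex
% Conditional distribution of a partitioned multivariate normal vector
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} m_1 \\ m_2 \end{pmatrix},
\begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}
\right)
\;\Longrightarrow\;
x_1 \mid x_2 \sim N\!\left(
m_1 + S_{12} S_{22}^{-1} (x_2 - m_2),\;
S_{11} - S_{12} S_{22}^{-1} S_{21}
\right)

% Block-wise inverse, with Schur complement T = S_{11} - S_{12} S_{22}^{-1} S_{21}
\begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}^{-1}
=
\begin{pmatrix}
T^{-1} & -T^{-1} S_{12} S_{22}^{-1} \\
-S_{22}^{-1} S_{21} T^{-1} & S_{22}^{-1} + S_{22}^{-1} S_{21} T^{-1} S_{12} S_{22}^{-1}
\end{pmatrix}
```

The top-left block of the inverse is the inverse of the conditional covariance of $x_1$ given $x_2$. When $x_1$ is one-dimensional that block is a scalar, and a block-wise inverse then yields the conditional variance directly.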

The J-stage RB2 estimator could be derived as in Section 3.2, by ignoring the first Inline graphic selection steps that depend on Inline graphic. However, the bias of the RB2 estimator is likely to be increasing as a function of J.

Conflict of interest

The authors have declared no conflict of interest.

References

  1. Barnes PJ, Pocock SJ, Magnussen H, Iqbal A, Kramer B, Higgins M, Lawrence D. Integrating indacaterol dose selection in a clinical study in COPD using an adaptive seamless design. Pulmonary Pharmacology & Therapeutics. 2010;23:165–171. doi: 10.1016/j.pupt.2010.01.003.
  2. Bebu I, Luta G, Dragalin V. Likelihood inference for a two-stage design with treatment selection. Biometrical Journal. 2010;52:811–822. doi: 10.1002/bimj.200900170.
  3. Bowden J, Glimm E. Unbiased estimation of selected treatment means in two-stage trials. Biometrical Journal. 2008;50:515–527. doi: 10.1002/bimj.200810442.
  4. Carreras M, Brannath W. Shrinkage estimation in two-stage adaptive designs with mid-trial treatment selection. Statistics in Medicine. 2013;32:1677–1690. doi: 10.1002/sim.5463.
  5. Cohen A, Sackrowitz H. Two stage conditionally unbiased estimators of the selected mean. Statistics and Probability Letters. 1989;8:273–278.
  6. Emerson S, Fleming T. Estimation following group sequential hypothesis testing. Biometrika. 1990;77:875–892.
  7. Genz A, Bretz F. Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Vol 195. Springer-Verlag, Berlin, DE; 2009.
  8. Kairalla J, Coffey C, Thomann M, Muller K. Adaptive trial designs: a review of barriers and opportunities. Trials. 2012;13:145. doi: 10.1186/1745-6215-13-145.
  9. Kimani P, Stallard N, Todd S. Conditionally unbiased estimation in phase II/III clinical trials with early stopping for futility. Statistics in Medicine. 2013;32:2893–2910. doi: 10.1002/sim.5757.
  10. Koopmeiners J, Feng Z, Pepe M. Conditional estimation after a two-stage diagnostic biomarker study that allows early termination for futility. Statistics in Medicine. 2012;31:420–435. doi: 10.1002/sim.4430.
  11. Lehmann EL, Scheffé H. Completeness, similar regions, and unbiased estimation. Sankhya. 1950;10:305–340.
  12. Liu A, Hall W. Unbiased estimation following a group sequential test. Biometrika. 1999;86:71–78.
  13. Pepe M, Feng Z, Longton G, Koopmeiners J. Conditional estimation of sensitivity and specificity from a phase 2 biomarker study allowing early termination for futility. Statistics in Medicine. 2009;28:762–779. doi: 10.1002/sim.3506.
  14. Posch M, Koenig F, Branson M, Brannath W, Dunger-Baldauf C, Bauer P. Testing and estimation in flexible group sequential designs with adaptive treatment selection. Statistics in Medicine. 2005;24:3697–3714. doi: 10.1002/sim.2389.
  15. Sampson A, Sill M. Drop-the-losers design: Normal case. Biometrical Journal. 2005;47:257–268. doi: 10.1002/bimj.200410119.
  16. Sill M, Sampson A. Extension of a two-stage conditionally unbiased estimator of the selected population to the bivariate normal case. Communications in Statistics-Theory and Methods. 2007;36:801–813.
  17. Srivastava M. Methods of Multivariate Statistics. John Wiley & Sons, New York, NY; 2002.
  18. Stallard N, Friede T. A group-sequential design for clinical trials with treatment selection. Statistics in Medicine. 2008;27:6209–6227. doi: 10.1002/sim.3436.
  19. Wason J, Bowden J. 2012. Design issues in multi-stage drop-the-losers trials. Technical Report, MRC Biostatistics Unit, Cambridge, UK.
  20. Whitehead J. On the bias of maximum likelihood estimation following a sequential trial. Biometrika. 1986;73:573–581.
  21. Wu S, Wang W, Yang M. Interval estimation for drop-the-losers designs. Biometrika. 2010;97:406–418.

Articles from Biometrical Journal. Biometrische Zeitschrift are provided here courtesy of Wiley
