Skip to main content
Wiley Open Access Collection logoLink to Wiley Open Access Collection
. 2018 May 31;37(22):3179–3196. doi: 10.1002/sim.7831

Point estimation following two‐stage adaptive threshold enrichment clinical trials

Peter K Kimani 1,, Susan Todd 2, Lindsay A Renfro 3, Nigel Stallard 1
PMCID: PMC6175016  PMID: 29855066

Abstract

Recently, several study designs incorporating treatment effect assessment in biomarker‐based subpopulations have been proposed. Most statistical methodologies for such designs focus on the control of type I error rate and power. In this paper, we have developed point estimators for clinical trials that use the two‐stage adaptive enrichment threshold design. The design consists of two stages, where in stage 1, patients are recruited in the full population. Stage 1 outcome data are then used to perform interim analysis to decide whether the trial continues to stage 2 with the full population or a subpopulation. The subpopulation is defined based on one of the candidate threshold values of a numerical predictive biomarker. To estimate treatment effect in the selected subpopulation, we have derived unbiased estimators, shrinkage estimators, and estimators that estimate bias and subtract it from the naive estimate. We have recommended one of the unbiased estimators. However, since none of the estimators dominated in all simulation scenarios based on both bias and mean squared error, an alternative strategy would be to use a hybrid estimator where the estimator used depends on the subpopulation selected. This would require a simulation study of plausible scenarios before the trial.

Keywords: biomarker, multistage, personalized medicine, subgroup or subpopulation selection, targeted therapy

1. INTRODUCTION

An area of recent interest in development of new therapies is stratified medicine, which involves using a biomarker to stratify patients into subgroups to distinguish those with the best likelihood of responding to particular treatments. If a biomarker has two levels, it is common to refer to one level as biomarker negative and the other as biomarker positive. We consider predictive biomarkers that allow the possibility of differences in treatment effects in different subpopulations, that is, a treatment by biomarker interaction effect.1

Advances in genetics have played a key role in stratified medicine, where biomarkers are based on genes. This has led to targeted therapies, where investigators determine a target subset of patients (subpopulation) and develop a drug (a targeted therapy) expected to be more efficacious than the control for these patients, and is possibly not beneficial to others. The target subpopulation may consist of patients with a certain gene (specifically a gene containing a certain allele) or platform of genes (specifically certain alleles corresponding to multiple genes). However, genes are not the only characteristics that are used to define a subset of patients. Examples of other biomarkers in the cancer setting include the size of tumor, protein level in the blood, and graded scores. When the clinical utility of the biomarker is not very strong or clear from previous studies, the biomarker stratified design may be used to test the effect of an experimental treatment. In this design, a trial enrolls patients from the full population but with provision for analyses of outcomes from the subpopulation.

One methodological challenge in stratified medicine is how to design and analyze efficient clinical trials that incorporate identification of the subpopulation that will benefit from the experimental treatment. An efficient design in late phase clinical trials is the two‐stage adaptive enrichment design.2 In stage 1, patients are recruited from the full population and data are used to perform an interim analysis to decide whether, in stage 2, enrollment will be from the full population or the subpopulation. The final confirmatory analysis uses data from both stages. Although the design is efficient because stage 1 data are used for subpopulation selection and confirmatory analysis, the latter is complex because of inclusion of subpopulation selection data.

We consider the case of a continuous (or a graded score) biomarker where the cut‐off value to distinguish between biomarker positive and negative patients is not definite from previous trials. Consequently, several candidate cut‐off values are possible, with trial data used to determine the cut‐off value. Simon and Simon2 refer to such a design that includes threshold determination as an adaptive threshold enrichment design. We give examples of clinical trials where this design can be used in Section 2.1.

Subpopulation selection based on the treatment effect can be advantageous because using an appropriate rule, the subgroup is selected in the case where there is apparent benefit in the subgroup and not in its complement (qualitative interaction) such as was observed by Mok et al.3 The full population is selected if there is apparent benefit in the full population including when the drug benefits the subgroup and its complement with different magnitudes (quantitative interaction) such as was observed in Tran et al.4 A subpopulation selection based on a hypothesis test for interaction only would not be able to distinguish between the two types of interactions.

Previous research that considers analysis of adaptive threshold enrichment trials focuses on control of type I error rate and power with less emphases on point estimation.2, 5 Recently, Li et al6 have derived expressions for the biases of estimators that ignore the adaptation but do not propose point estimators that account for subpopulation selection. Kimani et al7 and Kunzmann et al8 have developed estimators for a setting analogous to a single fixed cut‐off value. However, these estimators do not allow for using stage 1 data to determine the cut‐off value in an adaptive threshold enrichment trial.

A setting similar to an adaptive threshold enrichment design is that of treatment selection, where a control is compared to multiple experimental treatments, with stage 1 data used to select the experimental treatment to test further in stage 2.9, 10, 11, 12, 13, 14, 15, 16 Although several point estimators for this setting exist, they cannot be applied directly in adaptive threshold enrichment clinical trials because the correlation structure of the stage 1 sample means used for selection is different.

In this paper, we develop estimators that account for subpopulation selection following adaptive threshold enrichment trials using the principles that have been used to obtain point estimators that account for treatment selection. Two unbiased estimators build on the works by Kimani et al7 and Robertson et al.17 Two estimators build on the works by Whitehead18 and Stallard and Todd10 and involve deriving the bias function to calculate bias and subtracting bias from the naive estimator. The last is a shrinkage estimator and builds on the works by Hwang19 and Carreras and Brannath.14

2. DESCRIPTION OF THE SETTING AND NAIVE ESTIMATION

2.1. Motivation and notation

A condition where continuous biomarkers are tested and so the adaptive threshold design may be used is depression. Examples of continuous predictive biomarkers in depression are protein levels in the blood and an electrophysiological measure.20 While introducing notation, we describe features of clinical trials that are key in our methodology based on the setting of depression.

Patients' outcomes will be assumed to be normally distributed with a known standard deviation σ. In the context of depression, Uher et al20 perform simulations to give a guidance of the treatment effect size to be sought when predictive biomarkers are evaluated. One outcome measure they consider that is widely used in trials is the Hamilton Rating Scale for Depression (HRSD) score and is usually assumed to be normally distributed. For a trial of a prespecified duration of treatment, the aim may be to estimate the mean difference (experimental arm minus control arm) in HRSD scores between two interventions at the final follow‐up visit. Based on two trials,21, 22 the standard deviation of HRSD scores may be taken to be 7, that is, σ = 7.

We will consider trials that allow stopping for futility at an interim analysis if the observed treatment difference is less than some value b that we refer to as the futility boundary. The UK NICE guidelines recommend that an intervention for depression should demonstrate a difference of at least 3 HRSD points20 to be considered superior to its comparator. Therefore, at an interim analysis, the treatment may be deemed not to warrant further testing if the observed mean difference <2 (slightly less than the recommended value of 3), that is, b = 2.

We assume that a single continuous biomarker is used to identify the patients who benefit from a new intervention. We assume that in regard to biomarker values, there is monotonicity in treatment effect so that a higher biomarker value leads to a bigger treatment effect or a higher biomarker value leads to a smaller treatment effect. For ease of notation, we use the latter to develop methodology. Note that, if a higher biomarker value leads to a bigger treatment effect, the biomarker values can be transformed by multiplying by −1.

Using some biomarker threshold values, the full population (F) is partitioned into distinct partitions. For example, if F is subdivided into four partitions, the candidate threshold values c 1, c 2, c 3, and c 4 are such that patients in partitions 1, 2, 3, and 4 have biomarker values less than c 1, between c 1 and c 2, between c 2 and c 3, and between c 3 and c 4, respectively. The true mean differences in partitions 1 to 4 are denoted by δ 1, δ 2, δ 3, and δ 4, respectively. We denote the number of partitions by K so that, in this case, K = 4. We refer to the parts of F below threshold values c 1, c 2, c 3, and c 4 as subpopulations S 1, S 2, S 3, and S 4. Note for K = 4, S K = S 4 = F, and S 1, S 2, S 3, and S 4 consist of partition 1, partitions 1 and 2, partitions 1 to 3, and partitions 1 to 4, respectively. The true mean differences in S 1, S 2, S 3, and S 4 are denoted by θ 1, θ 2, θ 3, and θ 4, respectively. If, as expected, a higher biomarker value leads to a smaller treatment effect, then δ 1 ≥ δ 2 ≥ δ 3 ≥ δ 4 and θ 1 ≥ θ 2 ≥ θ 3 ≥ θ 4.

We assume that the threshold values c 1,…,c K are prespecified. There are different ways for the choice of the thresholds values. For K = 4, quartiles may be used so that the prevalences for S 1 to S 4 are p 1 = 0.25, p 2 = 0.50, p 3 = 0.75, and p 4 = 1, respectively. Consequently, the partitions have equal prevalence (0.25) since if we set p 0 = 0, p i − p i − 1 = 0.25(i = 1,…,4). In some instances, the threshold values are chosen based on aspects such as biological activity so that the prevalences for partitions are not equal. Figure 1 summarizes the partitioning of F for any K ≥ 3.

Figure 1.

sim7831-fig-0001

Partitioning of the full population. Partitions to the left are expected to have bigger treatment effects. The pairs in the brackets are true mean differences and prevalences for partitions and candidate subpopulations

2.2. Hypothetical two‐stage adaptive threshold enrichment clinical trial

Predictive assessment of continuous biomarkers can been done in single‐stage clinical trials.23, 24 The alternative is to use the two‐stage adaptive threshold enrichment design, which is more efficient as more resources can be focused on the subpopulation that is most likely to benefit from the new treatment.24 The design has been used in recent trials with time‐to‐event (progression‐free survival) outcome data.5, 25, 26 As we propose in this paper, the design can be similarly used in trials with normally distributed outcome data. We note in Section 6 that the methods developed in this paper can be adapted for time‐to‐event outcome data.

We describe the form of the adaptive threshold enrichment design that we consider based on a hypothetical trial for depression, where for example protein level is used to partition F into quartiles. In stage 1, the trial recruits n 11 = 90, n 12 = 90, n 13 = 90, and n 1K = n 14 = 90 patients in partitions 1 to 4. The number of patients in S 1 to S 4 are m 11 = 90, m 12 = 180, m 13 = 270, and m 1K = n 14 = 360, respectively, since m1i=i=1in1i (i = 1,…,4). For simplicity, we assume that, in each partition, the 90 patients are equally split between the control and the experimental treatment. The outcome of interest is HRSD score and is assumed to be normally distributed with σ = 7. Let τ112=4σ2/n11, τ122=4σ2/n12, τ132=4σ2/n13, and τ142=4σ2/n14, the stage 1 sample mean differences in partitions 1 to 4 are X¯11N(δ1,τ112), X¯12N(δ2,τ122), X¯13N(δ2,τ132), and X¯14N(δ4,τ142), respectively. Let σ112=4σ2/m11, σ122=4σ2/m12, σ132=4σ2/m13, and σ142=4σ2/m14, the stage 1 sample means in S 1 to S 4 are Y¯11N(θ1,σ112), Y¯12N(θ2,σ122), Y¯13N(θ3,σ132), and Y¯14N(θ4,σ142), respectively. If the number of patients in a partition is not equally split between the control and the experimental treatment, the expressions for τ112 to τ142 and σ112 to σ142 are different. Note that, in this hypothetical trial, τ112==τ142=σ112=2.178, σ122=1.089, σ132=0.726, and σ142=0.544. The random vectors X¯1=(X¯11,X¯12,X¯13,X¯14) and Y¯1=(Y¯11,Y¯12,Y¯13,Y¯14) have a linear relationship and are multivariate normal with mean vectors δ=(δ 1,δ 2,δ 3,δ 4) and θ=(θ 1,θ 2,θ 3,θ 4), respectively (see supplementary material). Hence, selection rules based on observed values for X¯1 can be restated using the observed values for Y¯1 and vice versa.

Since a higher biomarker value is expected to lead to lower treatment effect, the largest subpopulation for which the observed stage 1 sample mean difference (in HRSD scores) is ≥b is selected to continue to stage 2. If the observed stage 1 sample mean differences in S 1, S 2, S 3, and S 4 = F are all less than b, the trial stops for futility. Note that the selected subpopulation is a random variable determined by observed stage 1 data. We use lower case s s{1,,4} as the index for the “observed” selected subpopulation, with S s s{1,,4} denoting the selected subpopulation. At the end of stage 2, the primary objective is to obtain an estimate for θ s, using an estimator that has good properties such as being mean unbiased and having small mean squared error (MSE).

Suppose that the stage 1 observed sample mean differences in partitions 1 to 4 are x¯11=3, x¯12=2, x¯13=0.8, and x¯14=0 so that S 1 to S 4 stage 1 observed sample mean differences are y¯11=3, y¯12=2.5, y¯13=1.93, and y¯14=1.45. Subpopulation 2 would be selected, that is, S s = S 2, since it is the largest subpopulation with observed mean difference of at least 2 points, so that θ s = θ 2.

In stage 2, the trial recruits n 21 = 120 and n 22 = 120 patients in partitions 1 and 2, respectively. The number of patients in S 1 and S 2 are m 21 = 120 and m 22 = 240, respectively, since m2i=i=1in2i (i = 1,…,s). The sample sizes n 21 and n 22 and, hence, m 21 and m 22, should be prespecified in advance for example by fixing the total stage 2 sample size and the ratio of allocation to the selected partitions. Let τ212=4σ2/n21 and τ222=4σ2/n22, the stage 2 sample mean differences in partitions 1 and 2 are X¯21N(δ1,τ212) and X¯22N(δ2,τ222), respectively. Let σ212=4σ2/m21 and σ222=4σ2/m22, the stage 2 sample means in S 1 and S 2 are Y¯21N(θ1,σ212) and Y¯22N(θ2,σ222), respectively. For this hypothetical trial, τ212=τ222=σ212=1.633 and σ222=0.817. Table 1 summarizes the notation we have introduced for any K ≥ 3. When a subscript in a notation includes two indices, the first corresponds to stage and the second to partition or subpopulation.

Table 1.

Summary of notation

Stage 1 Partitions Stage 2 Partitions
Measure Subgroup 1 2 K − 1 K 1 s{1,…,K }
Upper threshold c 1 c 2 c K − 1 c K c 1 c s
Sample size Partition n 11 n 12 n 1,K − 1 n 1K n 21 n 2S
Subpopulation m 11 m 12 m 1,K − 1 m 1K m 21 m 2s
Sample variance Partition
τ112
τ122
τ1,K12
τ1K2
τ212
τ2s2
Subpopulation
σ112
σ122
σ1,K12
σ1K2
σ212
σ2s2
True mean Partition δ 1 δ 2 δ K − 1 δ K δ 1 δ s
Subpopulation θ 1 θ 2 θ K − 1 θ K θ 1 θ s
Sample mean Partition
X¯11
X¯12
X¯1,K1
X¯1K
X¯21
X¯2s
Subpopulation
Y¯11
Y¯12
Y¯1,K1
Y¯1K
Y¯21
Y¯2s

Suppose that, in stage 2, the observed sample mean differences in partitions 1 and 2 are x¯21=3.0 and x¯22=2.4. Consequently, the stage 2 observed sample mean difference for S 2 is y¯22=2.7. The naive estimate for θ 2 is the two‐stage sample mean difference for S 2 given by θ^2,N=(m12y¯12+m22y¯22)/(m21+m22)=2.614. We describe in Section 2.4 that the naive estimates are biased because they ignore subpopulation selection. The aim of this paper is to develop estimators that adjust for subpopulation selection. The estimators are based on the selection rule described for the hypothetical trial, which we state for any K ≥ 3 partitions in the next section, and are conditional on the observed ordering of stage 1 data.

2.3. Selection rule

We derive estimators that are unbiased or with small bias conditional on the following specific selection rule. Other selection rules are considered in the discussion. Let b denote a futility boundary. The trial stops after stage 1 if y¯1i<b for all i(i = 1,…,K). The trial continues to stage 2 with the full population (S K) if y¯1Kb and with subpopulation S s ∈ {1,…,K − 1} if y¯1sb and y¯1i<b for all i ∈ {s + 1,…,K}. Thus, as shown in the supplementary material, subpopulation S s s{1,,K1} is selected if by¯1s<u, where

u=minps+1b(ps+1ps)x¯1,s+1ps,ps+2bi=s+1s+2(pipi1)x¯1ips,,pKbi=s+1K(pipi1)x¯1ips.

Equivalently, subpopulation S s s{1,,K1} is selected if for all i  ∈ {1,…,s}, vix¯1i<wi, where vi=1pipi1ps·bi=1iis(pipi1)x¯1i and

wi=minps+1bi=1iis+1(pipi1)x¯1ipipi1,ps+2bi=1iis+2(pipi1)x¯1ipipi1,,pKbi=1iiK(pipi1)x¯1ipipi1.

2.4. Naive estimation

For the selected subpopulation S s s{1,,K}, define t s = m 1s/(m 1s + m 2s). The naive estimator for θ s that ignores subpopulation selection is

θ^s,N=tsY¯1s+(1ts)Y¯2s. (1)

This is biased because the first term in (1) includes data used in the selection. Let 1[Ss] and Prob(S s) denote the indicator and probability of selecting S s, respectively. The conditional bias is

Bias(θ^s,N)=tsi=1s(pipi1)EX¯1i1[Ss]ps·Prob(Ss)θs. (2)

Using the joint density for X¯1 or Y¯1 to compute Prob(S s) and i=1s(pipi1)EX¯1i1[Ss] is computationally time consuming because the limits of integration for each element in the vector depend on the values of the other elements. To overcome this, we use Z = (Z 1,…,Z K), where Z1=X¯11 and Zi=i=1i(pipi1)X¯1i (i  = 2,…,K). The density for Z and the expressions for Prob(S s) and i=1s(pipi1)EX¯1i1[Ss] are provided in the supplementary material.

3. ESTIMATORS THAT ACCOUNT FOR SUBPOPULATION SELECTION

3.1. Unbiased estimators

3.1.1. General principles of obtaining unbiased estimators

One technique to account for subpopulation selection is Rao‐Blackwellization. By the Rao‐Blackwell theorem, conditional on a sufficient and complete statistic based on stages 1 and 2 data, the expected value of a conditionally unbiased estimator from the stage 2 data is the uniformly minimum variance conditional unbiased estimator (UMVCUE). We consider two methods for obtaining unbiased estimators for θ s: deriving an UMVCUE for θ s directly or, because the relationship between θ and δ is linear, deriving the UMVCUE for each δ i(i = 1,…,s) and using a linear function to obtain an unbiased (though not necessarily minimum variance) estimator for θ s. The latter builds on the work by Kimani et al.7 The former would involve correlated stage 1 statistics in the vector Y¯1 and builds on the work by Robertson et al.17

3.1.2. Uniformly minimum variance unbiased estimator following the work of Robertson et al (2016a)

The UMVCUE for θ s is the expected value of Y¯2s conditional on a sufficient and complete statistic. As before, let θ^s,N denote the naive estimator for θ s given by expression (1) and U be as u in Section 2.3 with x¯1,s+1,,x¯1K replaced with X¯1,s+1,,X¯1K. Following the work of Robertson et al,17 the UMVCUE for θ s is

θ^s,UMV=θ^s,Nσ2s2σ1s2+σ2s2φf(b)φf(U)Φf(b)Φf(U), (3)

where f(b)=σ1s2+σ2s2σ1s2θ^s,Nb, f(U)=σ1s2+σ2s2σ1s2θ^s,NU, and φ(.) and Φ(.) denote the density and distribution functions of a standard normal, respectively.

3.1.3. Unbiased estimator following the work of Kimani et al (2015)

The UMVCUE for δi (i  = 1,…,s) is the expected value of X¯2i conditional on a sufficient and complete statistic. Let δ^i,N=(n1iX¯1i+n2iX¯2i)/(n1i+n2i) (i  = 1,…,s) denote the naive estimator for δi. Furthermore, let Vi and Wi be as vi and wi in Section 2.3 with x¯11,,x¯1K replaced with X¯11,,X¯1K. Following the work of Kimani et al,7 the UMVCUE for δi (i  = 1,…,s) is

δ^i,UMVCUE=δ^i,Nτ2i2τ1i2+τ2i2φf(Vi)φf(Wi)Φf(Vi)Φf(Wi),

where f(Vi)=τ1i2+τ2i2τ1i2δ^i,NVi and f(Wi)=τ1i2+τ2i2τ1i2δ^i,NWi. Consequently, the unbiased estimator for θ s is

θ^s,U=i=1s(pipi1)δ^i,UMVCUEps. (4)

3.2. Bias‐adjusted estimators

3.2.1. An overview of bias‐adjusted estimation

Another technique to account for subpopulation selection would be to utilize the fact that we can calculate bias of the naive estimate using expression (2). The naive estimate is then adjusted by subtracting the bias. However, expression (2) is a function of δ (or equivalently θ), the vector of the unknown treatment effects. To overcome this, we estimate bias, and hence, bias‐adjusted estimators obtained in this way are not necessarily mean unbiased.

3.2.2. Single‐iteration bias‐adjusted estimator

We consider two bias‐adjusted estimators. For the first one, the bias is estimated based on the observed sample mean differences δ^i,N=n1ix¯1i+n2ix¯2i1[is]/n1i+n2i1[is] (i = 1,…,K). Let δ^=(δ^1,N,,δ^K,N) and bθs(δ^) denote the bias estimator for θ s obtained by replacing δ with δ^ in expression (2) to get an adjusted estimator for θ s of

θ^s,SI=θ^s,MLEbθs(δ^). (5)

We will refer to this estimator as the single‐iteration bias‐adjusted estimator.

3.2.3. Multiple‐iteration bias‐adjusted estimator

For the second bias‐adjusted estimator, the bias is estimated iteratively.10, 13, 18 Let θ^i (i = 1,…,K) denote the naive estimator for θ i and θ^=(θ^1,,θ^K). The biases for the naive estimators depend on θ and we denote bias for θ^i (i = 1,…,K) by b i(θ) and the vector b1(θ),,bK(θ) by b(θ). The second adjusted estimator, which we refer to as multiple‐iteration bias‐adjusted estimator is obtained by solving θ˜=θ^b(θ˜) iteratively. Using similar notation, alternatively, one could solve δ˜=δ^b(δ˜) and then use the relationship between θ and δ to obtain a bias‐adjusted estimate for θ s. For the simulations in Section 5, we solve δ˜=δ^b(δ˜) and with an accuracy of 0.001, convergence was achieved in almost all simulated trials. Suppose that the solution is obtained at iteration r and let bi(δ˜r) denote the bias for δ^i when δ is taken to be δ˜r, then the multiple‐iteration adjusted estimate for δ i is δ^i,MI=δ^ibi(δ˜r) and the multiple‐iteration bias‐adjusted estimator for θ s is

θ^s,MI=i=1s(pipi1)δ^i,MIps. (6)

The details of calculating bi(δ˜r) are given in the supplementary materials.

3.3. Shrinkage estimators

3.3.1. General principles for shrinkage estimation

A third technique for accounting for subpopulation selection is to use shrinkage methods. Hwang19 considered the case of estimating a treatment mean after ordering independent sample means in a single‐stage trial for K ≥ 4. A subpopulation selection rule that corresponds to Hwang's case is that of selecting only one partition based on some ordering of x¯11,,x¯1K. We initially consider Hwang's selection rule and denote the selected partition by s H sH{1,,K}. Hwang assigns a common normal prior distribution N(μ,ν 2) to each δ i(i = 1,…,K). The posterior mean for δsH, its Bayes estimator, is CX¯1sH+(1C)μ, where C = 1 − 2σ 2/(2σ 2 + n ν 2) and n is stage 1 sample size in each intervention in each partition. Replacing the unknown μ and C with their unbiased estimators Y¯1K=i=1KX¯1j/K and Ĉ=12(K3)σ2/[nj=1K(X¯1jY¯1K)2], respectively, gives the empirical Bayes estimator. Let Ĉ+=max{0,Ĉ}, Hwang indicates that a better estimator, which we refer to as the shrinkage estimator, is δ^sH,B1=Ĉ+X¯1sH+(1Ĉ+)Y¯1K.

Carreras and Brannath14 extended the work to two‐stage trials. Define tsH=n1sH/(n1sH+n2sH) to be the proportion of stage 1 data. The two‐stage shrinkage estimator for δsH is δ^sH,B=tsHδ^sH,B1+(1tsH)X¯2S. For K < 4, Carreras and Brannath propose defining Ĉ=12(K1)σ2/[ni=1K(X¯1iY¯1K)2]. Using the fact that the estimator of Hwang19 applies for all parameters δ i(i = 1,…,K) and that its examination by Carreras and Brannath showed that it works for any rule used to pick the parameters on which to make inference, in Sections 3.3.2 and 3.3.3, we extend this work to give two shrinkage estimators for the subpopulation selection rule in Section 2.3.

3.3.2. First shrinkage estimator

As in unbiased estimation, we consider both combining shrinkage estimators for treatment effects in partitions to obtain an estimator for θ s and directly obtaining a shrinkage estimator for θ s. From Section 3.3.1, the shrinkage estimator for δ i(i = 1,…,s) is δ^i,L=tsĈ+X¯1i+(1Ĉ+)Y¯1K+(1ts)X¯2i, where Ĉ+=max{0,Ĉ} and for K ≥ 4, Ĉ=12(K3)σ2/[nj=1K(X¯1jY¯1K)2], whereas for K < 4, Ĉ=12(K1)σ2/[nj=1K(X¯1jY¯1K)2]. The first shrinkage estimator for θ s is

θ^s,L1=i=1s(pipi1)δ^i,Lps. (7)

3.3.3. Second shrinkage estimator

The second shrinkage estimator, which we denote by θ^s,L2, involves using the entire parameter vector θ. A multivariate normal prior for θ is specified and updated with the data Y¯1. The resulting posterior is multivariate normal with nonzero covariance, and hence, the iterative procedure of Morris27 and Brüncker et al28 is utilized to obtain θ^s,L2 (see supplementary material).

4. WORKED EXAMPLE

We use data from the hypothetical trial for depression in Section 2.2 to demonstrate how to compute the naive ( θ^2,N), the UMVCUE ( θ^2,UMV), the unbiased ( θ^2,U), the single‐iteration bias‐adjusted ( θ^2,SI), the multiple‐iteration bias‐adjusted ( θ^2,MI), the first shrinkage ( θ^2,L1), and the second shrinkage ( θ^2,L2) estimates. We also use the example to demonstrate differences among the various estimates in a single trial. The data and the various estimates are summarized in Table 2. The explicit computations for the various estimates and the R program used are provided in the supplementary material. Here, we only give explicit details of computing θ^2,UMV and θ^2,U as they are easier to compute, and since based on the simulations in the next section, we recommend θ^2,UMV.

Table 2.

Worked example data and estimates

Data and Summary Measures
Stage 1 Partitions Stage 2 Partitions Estimating θ 2
Measure Subgroup 1 2 3 4 1 s = 2 Estimator Estimate
Sample Partition n 11 = 90 n 12 = 90 n 13 = 90 n 14 = 90 n 21 = 120 n 22 = 120
θ^2,N
2.614
size Subgroup m 11 = 90 m 12 = 180 m 13 = 270 m 14 = 360 m 21 = 120 m 22 = 240
θ^2,UMV
2.839
Sample Partition
τ112=2.178
τ122=2.178
τ132=2.178
τ142=2.178
τ212=1.633
τ222=1.633
θ^2,U
2.965
variance Subgroup
σ112=2.178
σ122=1.089
σ132=0.726
σ142=0.544
σ212=1.633
σ222=0.817
θ^2,SI
2.633
Sample Partition
x¯11=3
x¯12=2
x¯13=0.8
x¯14=0
x¯21=3
x¯22=2.4
θ^2,MI
2.666
mean Subgroup
y¯11=3
y¯12=2.5
y¯13=1.93
y¯14=1.45
y¯21=3
y¯22=2.7
θ^2,L1
2.164
θ^2,L2
2.194

For the UMVCUE ( θ^2,UMV) given by expression (3), the first term θ^s,N=θ^2,N=2.614. Furthermore, σ1s2=σ122=1.089 and σ2s2=σ222=0.817 so that σ2s2σ1s2+σ2s2=σ222σ122+σ222=0.592 and σ1s2+σ2s2σ1s2=σ122+σ222σ122=1.268. Since ( p i − p i − 1) = 0.25 for all i = 1,…,4, then u (the observed value for U) is given by

u=minp3b(p3p2)x¯13p2,p4bi=34(pipi1)x¯1ip2=min(0.75×2)(0.25×0.8)0.5,2[0.25×(0.8+0)]0.5=2.6.

Note that f (b) = 1.268 × (2.614 − 2) = 0.779 and f (u) = 1.268 × (2.614 − 2.6) = 0.018, so that substituting into expression (3), θ^2,UMV=2.839.

For the unbiased estimator ( θ^2,U) given by expression (4), we make the following calculations. The naive estimates for partitions 1 and 2 are δ^1,N=[(90×3)+(120×3)]/210=3 and δ^2,N=[(90×2)+(120×2.4)]/210=2.229, respectively. Note that τ212τ112+τ212=τ222τ212+τ222=0.837 and τ112+τ212τ112=τ212+τ222τ212=0.896. Since p i − p i − 1 = 0.25(i = 1,…,4), v1=10.25p2b0.25i=1i12x¯1i = 4 × [(0.5 × 0.2) − (0.25 × 2)] = 2 and v2=10.25p2b0.25i=1i22x¯1i=4×[(0.5×0.2)(0.25×3)]=1, respectively. For partition 1,

w1=minp3b0.25i=1i13x¯1i0.25,p4b0.25i=1i14x¯1i0.25=min(0.75×2)0.25×(2+0.8)0.25,20.25×(2+0.8+0)0.25=3.2.

Similarly, for partition 2, w 2 = 2.2. Then, for partition 1, f (v 1) = 0.896 × (3 − 2) = 0.896 and f (w 1) = 0.896 × (3 − 3.2) = −0.179, and for partition 2, f (w 2) = 0.896 × (2.229 − 1) = 1.101 and f (w 2) = 0.896 × (2.229 − 2.2) = 0.026. Now, we have all components required to obtain UMVCUEs for the effects in partitions 1 and 2, which give δ^1,UMVCUE=3.272 and δ^2,UMVCUE=2.657, respectively. The unbiased estimate is the weighted sum of the UMVCUEs in the partitions giving θ^2,U=2.965.

The estimates θ^2,UMV, θ^2,U, θ^2,SI, and θ^2,MI are greater than θ^2,N (see Table 2). This may be explained by the observation in Section 5.2 that, in some scenarios, the naive estimator is negatively biased. The estimate θ^2,SI is slightly smaller than θ^2,MI. Again, this may be explained by an observation in Section 5.2 that, for all scenarios in the simulation study, on average, the single‐iteration bias‐adjusted estimator gives a smaller estimate than the multiple‐iteration estimator.

5. SIMULATIONS TO COMPARE THE VARIOUS ESTIMATORS

5.1. Simulations setting

To evaluate the properties of the various estimators, we conducted simulations with σ 2 = 1 and b = 0. We initially consider the case of K = 4 and p i − p i − 1 = 0.25(i = 1,…,4). In all simulations, if the trial continues to stage 2, the combined stages 1 and 2 sample size is set to be 800. For example, if the stage 1 sample size is 400 patients, the stage 2 sample size is 400. The available patients in stage 1 are equally split among the four partitions and treatment arms. For example, with 400 patients in stage 1, in each partition, 50 patients are randomly allocated to each of the control and experimental treatment. Similarly, the patients available for testing in stage 2 are equally split among the partitions that continue to stage 2 and among the treatment arms. Hence, with 400 patients available in stage 2, if F is selected, the patient allocation in stage 2 is as in stage 1 with 400 patients. If S 2 is selected so that two partitions are tested in stage 2, in each partition, 100 patients are randomly allocated to each of the control and experimental treatment. We perform simulations for three cases of stage 1 sample size (200, 400, and 600 patients). Taking the combined stages 1 and 2 to be 800 patients is justified in the supplementary material.

We consider seven scenarios with true treatment effects as summarized in Table 3. The selection rule and estimators developed are aimed at identifying predictive effects, but since we are estimating mean differences, the methods are valid with or without prognostic effects. If the biomarker has no predictive effect but has a prognostic effect, we are in a scenario of equal treatment effects in all partitions. Scenarios 1, 3, and 7 could be such cases. If there are prognostic and predictive effects, we are in a scenario of unequal treatment effects in partitions. Scenarios 2, 4, 5, and 6 could be such cases. In Scenarios 1 to 3, the right decision is to continue to stage 2 with F, but with decreasing probability of selecting F. The right decisions for Scenarios 4 to 6 are to continue with S 3, S 2, and S 1, respectively. The ideal decision for Scenario 7 is to stop at stage 1. The probabilities for various decisions for different scenarios when stage 1 includes 200 patients (25 in each treatment arm in each partition) are also given in Table 3. These have been calculated using expressions in Section 2.4 and in the supplementary material. As expected, the probability of stopping the trial at stage 1 (last column) increases as the treatment effects in partitions become less than b in more partitions (from 0.007 for Scenario 1 to 0.482 for Scenario 7). In each of Scenarios 4 to 6, the probability of continuing with F is substantially larger than the probability of making the right decision, demonstrating that, in some configurations, decision making is challenging. In Section 5.2.1, simulations show that incorrect decisions tend to be made when observed means are substantially different from the true means and hence lead to bias.

Table 3.

Treatment effects and probabilities of different decisions for the various scenarios in the simulation study (probabilities of correct decisions are in bold)

Treatment Effect Probability of a Decision (n 1 = 200)
Scenario δ 1 δ 2 δ 3 δ 4 Ideal Selection F S 3 S 2 S 1 Stop
1  0.3  0.3  0.3  0.3 F 0.983 0.005 0.003 0.002 0.007
2  0.2  0.1  0.1  0.1 F 0.812 0.049 0.035 0.034 0.070
3  0.0  0.0  0.0  0.0 F 0.500 0.083 0.070 0.073 0.274
4  0.1  0.0  0.0 −0.2 S 3 0.430 0.179 0.093 0.093 0.205
5  0.1  0.0 −0.2 −0.1 S 2 0.362 0.112 0.179 0.115 0.232
6  0.1 −0.2 −0.1 −0.1 S 1 0.298 0.098 0.104 0.214 0.286
7 −0.1 −0.1 −0.1 −0.1 Stop 0.240 0.083 0.087 0.108 0.482

Table 4 gives probabilities of various decisions when the stage 1 sample sizes are 400 and 600. For scenario 3, where treatment effects are equal in all partitions and equal to the futility boundary, the probabilities of various decisions are approximately equal for different stage 1 sample sizes. For the other scenarios, by comparing the probabilities in bold, the probability of making a correct decision increases with stage 1 sample size.

Table 4.

Probabilities of different decisions for different stage 1 sample sizes for various scenarios in the simulation study (probabilities of correct decisions are in bold)

Ideal Probability of a Decision (n 1=400) Probability of a Decision (n 1=600)
Scenario Selection F S 3 S 2 S 1 Stop F S 3 S 2 S 1 Stop
1 F 0.9987 0.0004 0.0002 0.0002 0.0005 0.99988 0.00004 0.00002 0.00002 0.00004
2 F 0.8944 0.0312 0.0212 0.0200 0.0332 0.93711 0.02033 0.01326 0.01213 0.01717
3 F 0.5000 0.0833 0.0698 0.0734 0.2735 0.50000 0.08333 0.06981 0.07342 0.27344
4 S 3 0.4013 0.2286 0.0983 0.0971 0.1747 0.37973 0.26859 0.10095 0.09838 0.15235
5 S 2 0.3085 0.1220 0.2386 0.1261 0.2048 0.27015 0.12853 0.28802 0.13147 0.18183
6 S 1 0.2266 0.0977 0.1156 0.2939 0.2662 0.17916 0.09454 0.12250 0.35893 0.24487
7 Stop 0.1587 0.0724 0.0842 0.1157 0.5690 0.11034 0.06193 0.07895 0.11756 0.63122

For each of the seven scenarios and three different stage 1 sample sizes, we simulated stage 1 data for N = 1 000 000 trials. For each trial, the subpopulation with the largest simulated sample mean difference ≥0 continues to stage 2. If no subpopulation fulfills this, the trial stops. We consider estimation conditional on continuing to stage 2 and so bias and MSE for each estimator are evaluated based on simulated trials that continue to stage 2. Using θ^s,SI for illustration, for each s s=1,,4, bias and MSE are calculated as bias(θ^s,SI)=i=1N(θ^i,SIθi)1[i=s]/i=1N1[i=s] and MSE(θ^s,SI)=i=1N(θ^i,SIθi)21[i=s]/i=1N1[i=s].

5.2. Simulation results

5.2.1. Comparing biases for the various estimators

Figure 2 summarizes biases when the stage 1 sample size is 200 (top plots) and 600 patients (bottom plots). Plots for the case where the stage 1 sample size is 400 patients are provided in the supplementary material. Plots in Columns 1 to 4 correspond to the cases of selecting F, S 3, S 2, and S 1, respectively. The y‐axes correspond to biases divided by approximate standard errors (SEs). The approximate SE=4/(m1s+m2s) and so SEs are only equal when F is selected (Column 1). Although SEs are not equal, we will later observe from the boxplots of the estimates that the trend for bias is the same when bias is not divided by SE. The x‐axes correspond to the seven scenarios. As per the legend, biases for different estimators are distinguished by different line types. Estimators θ^s,UMV and θ^s,U are not included in Figure 2 because they are mean unbiased. For Scenario 1, the probabilities for selecting S 3, S 2, and S 1 are low and so simulations results are highly variable when S 3, S 2, or S 1 is selected but this does not change the general findings in this paper.

Figure 2.

sim7831-fig-0002

Biases in units of approximate standard error for different configurations. The dotted line is the point of no bias. Other line types correspond to different estimators. SE, standard error

We first describe the results for the case where the stage 1 sample size is 200 (top row). When F is selected, the naive estimator ( θ^s,N) and the first shrinkage estimator ( θ^s,L1) are the same and correspond to the line showing the largest biases. Focusing on the naive estimator, the bias when F is selected (Column 1) is positive in all scenarios. For scenarios where the right decision is to continue with F (Scenarios 1 to 3, see Table 3), bias when F is selected is attributable to the futility rule with the bias negligible when the effect in F is substantially larger than the futility boundary (Scenario 1). When the right decision is not to continue with F (Scenarios 4 to 7) but F is selected, the impact of selection and futility on bias would increase and consequently give a larger bias. Still focusing on the top row, when S 3 is selected (Column 2), the naive estimator for θ 3 is negatively biased for some scenarios and positively biased for other scenarios. The explanation for this pattern is given in the supplementary material. Comparing the bias when F, S 3, S 2, and S 1 are selected (Columns 1 to 4), the bias is smallest when S 1 is selected. This can be attributed partly to the enrichment, where the stage 2 sample size is fixed regardless of the size of the population selected so that when S 1 is selected, proportionally, there are more unbiased stage 2 data to estimate θ 1 compared to when F, S 3, or S 2 is selected. In summary, note that, in some scenarios, the bias of the naive estimator is substantial and so it is essential to use an estimator that corrects for subpopulation selection.

Still focusing on the top row, when F is selected, practically, the single‐iteration bias corrected estimator θ^s,SI is mean unbiased, especially for Scenarios 1 to 3 where the correct decision is to select F. When S 3 is selected, θ^s,SI almost eradicates bias in Scenarios 3 to 7 and is better than the naive estimator in Scenarios 1 and 2. When S 2 or S 1 is selected, θ^s,SI eradicates almost all bias in Scenarios 2 to 7 but does not do so in Scenario 1. In all scenarios, the line for the multiple‐iteration bias‐adjusted estimator ( θ^s,MI) is always slightly above that of θ^s,SI. Hence, comparing θ^s,SI and θ^s,MI, when θ^s,SI is negatively biased, θ^s,MI is preferable, whereas θ^s,SI is preferable when it is positively biased.

Comparing biases for the naive estimator for different stage 1 sample sizes (top versus bottom plots), as also indicated by expression (2), the bias increases with the proportion of stage 1 data. Increase in bias is also seen for both the single‐iteration ( θ^s,SI) and multiple‐iteration ( θ^s,MI) bias‐adjusted estimators. From the bottom row, θ^s,SI and θ^s,MI perform worst when some partitions that should be dropped at stage 1 continue to stage 2 or when some partitions that should continue to stage 2 are dropped. As before, the line for θ^s,MI is above that of θ^s,SI with the distances between the lines increasing with stage 1 sample size.

The pattern of the shrinkage estimators is best understood by considering all results in Figure 2. In all cases, the line for the first shrinkage estimator ( θ^s,L1) overlaps or is above that of the second shrinkage estimator ( θ^s,L2). Estimator θ^s,L1 performs similar to or better than θ^s,L2 when the selected subpopulation consists of partitions that should continue to stage 2 such as when F is selected in Scenarios 1 to 3 and such as when S 3 is selected in Scenarios 1 to 4. Estimator θ^s,L2 performs better than θ^s,L1 when the selected subpopulation consists of partitions that should not continue to stage 2 such as when F is selected in Scenarios 4 to 7 and such as when S 3 is selected in Scenarios 5 to 7.

In almost all scenarios, the two shrinkage estimators perform worse than the other estimators that account for adaptation. One reason for this may be the fact that the shrinkage estimators do not account for stopping for futility. When F is selected, the naive estimator is the same as the first shrinkage estimator. This is because the stage 1 estimate in partition i is Ĉ+X¯1i+(1Ĉ+)Y¯1K so that the shrinkage estimator shrinks to the effect in the full population, that is, to Y¯1K=(X¯11++X¯1K)/K. A reasonable alternative would be to use a weighted mean of Y¯11, Y¯12, …, Y¯1K. For example, if we shrink to (Y¯11++Y¯1K)/K, in terms of sample means in partitions, we are shrinking to a weighted sum such that for i < i ′, X¯1i has more weight than X¯1i. In such a case, shrinkage estimators will be closer to the naive estimators when fewer partitions are selected (see additional simulations in the supplementary material).

5.2.2. Comparing MSEs for the various estimators

Mean squared errors for the various estimators are given in Figure 3. The y‐axes are root mean squares ( RMSE=MSE) divided by approximate SEs. The best shrinkage estimator in terms of bias (either θ^s,L1 or θ^s,L2 depending on the scenario) has smaller or practically the same MSEs as the naive estimator. Hence, the best shrinkage estimators may be considered to be better than the naive estimator in terms of MSE. The challenge, however, is determining the best shrinkage estimator since the true treatment means are unknown.

Figure 3.

sim7831-fig-0003

Root mean squares in units of approximate standard error for different configurations. Different line types correspond to different estimators. MSE, mean squared error; SE, standard error

Since estimators that extend the works of Kimani et al ( θ^s,U) and of Robertson et al ( θ^s,UMV) are mean unbiased, their MSEs are variances. When S 1 is selected, by derivation, the two estimators are the same and, hence, have equal MSE. For any other selection, as expected, θ^s,UMV has smaller MSE than θ^s,U. The differences increase with stage 1 sample size (top versus bottom plots) and the size of the selected subpopulation (right to left panels). The MSEs of θ^s,U and θ^s,UMV are mostly larger than the MSEs for all the other estimators with the differences substantial when selection is performed later in the trial.

In general, the MSEs for the single‐iteration ( θ^s,SI) and multiple‐iteration ( θ^s,MI) bias‐adjusted estimators are practically the same. Hence, since their biases are also similar, the two estimators are approximately equivalent and so it is sufficient to compare one of them to the other estimators. The MSE for θ^s,SI is larger than that of the naive estimator ( θ^s,N) in most cases while it is always smaller than the MSEs for the unbiased estimators ( θ^s,U and θ^s,UMV).

5.2.3. Comparing the estimators using both bias and MSE

Comparing the shrinkage estimators ( θ^s,L1 and θ^s,L2) to the naive estimator ( θ^s,N), we prefer θ^s,N. This is because although a shrinkage estimator sometimes has a smaller MSE, it can have substantially higher bias than θ^s,N (for example, compare Columns 4 in Figures 2 and 3).

Comparing the single‐iteration bias‐adjusted estimator ( θ^s,SI) and the naive estimator ( θ^s,N), when F is selected, θ^s,SI is preferable as it reduces bias substantially and has smaller MSE. However, when S 1 is selected, θ^s,N is better as it has smaller MSE and it does not differ from θ^s,SI in terms of bias. When S 3 or S 2 is selected, θ^s,N is better when bias is not substantial (Scenarios 3 and 4), whereas for Scenarios 5 to 7, θ^s,SI is better as it reduces bias and its MSE is better or only slightly higher than that of θ^s,N. Overall, we consider θ^s,SI as a better estimator than θ^s,N as it performs better in cases with substantial bias.

When F is selected, the bias of the naive estimator ( θ^s,N) is substantial and compared to the UMVCUE ( θ^s,UMV), we prefer the latter since the difference in RMSE between the two estimators is smaller than the bias eradicated. When S 1 is selected, we would also recommend θ^s,UMV over θ^s,N as the former is mean unbiased in all scenarios, with the only case where it is not clearly superior due to high RMSE being when n 1 = 600. The conclusion when S 3 or S 2 is selected is the same as when S 1 is selected, that is, θ^s,UMV is better than θ^s,N.

Comparing the single‐iteration bias‐adjusted estimator θ^s,SI to the UMVCUE θ^s,UMV, we recommend the latter since, when F is selected, θ^s,SI has substantial bias that is larger than the difference in RMSE between it and θ^s,UMV. In addition, when S 1 is selected, the difference in RMSE between the two estimators is smaller than the bias of θ^s,SI. Consequently, based on the performance across the scenarios in the simulation study, we recommend θ^s,UMV when an adaptive threshold enrichment design is used.

For a more detailed comparison of the estimators, Figures 4 and 5 give boxplots of simulated estimates for Scenarios 1 (top plots), 4 (middle plots), and 6 (bottom plots) described in Table 3 when F and S 3 are selected. The boxplots emphasize the findings summarized above. As an example, when n 1 = 600(Figure 5), for Scenario 6 (bottom left panel), almost all naive estimates are above the true value and θ^s,UMV performs well in that case. From the left panels, we note the unbiased estimators ( θ^s,UMV and θ^s,U) have substantially higher variances compared to the other estimators.

Figure 4.

sim7831-fig-0004

Boxplots of estimates for different estimators when n 1 = 200. Results have been chosen when F and S 3 were selected and for Scenarios 1 (top panels), 4 (middle panels), and 6 (bottom panels). The dashed lines correspond to the true means in the selected subpopulation

Figure 5.

sim7831-fig-0005

Boxplots of estimates for different estimators when n 1 = 600. Results have been chosen when F and S 3 were selected and for Scenario 1 (top panels), 4 (middle panels), and 6 (bottom panels). The dashed lines correspond to the true means in the selected subpopulation

5.2.4. Summary findings and recommendations from the simulation study

The bias of the naive estimator can be substantial, and so it is essential to use an estimator that corrects for the decision made using stage 1 data. We recommend the estimator that follows the work of Robertson et al ( θ^s,UMV) since it is mean unbiased. Although it has larger MSE than some estimators, the bias eradicated in most cases was larger than the difference in RMSEs. Although the simulation study was based on four partitions and specific treatment effect scenarios, we expect similar findings for other configurations (that is, more candidate partitions and/or different effect sizes). Simulations for the case of 8 partitions are in the supplementary material.

We have recommended one estimator for all scenarios. An alternative is a hybrid estimator where the recommended estimator ( θ^s,SI or θ^s,UMV) depends on the subpopulation selected. This is suitable if investigators are willing to sacrifice unbiasedness for more precision. In this case, before the trial, a simulation study based on plausible scenarios would be required to compare bias and MSE conditional on the selected subpopulation.

6. DISCUSSION

Acknowledging that different patients may require different care has led to trial designs that incorporate assessment of treatment effects in different subsets of the population. Most statistical methodologies for such designs focus on hypothesis testing.2, 5, 24, 26, 29, 30, 31, 32, 33 In this paper, we have considered point estimation following an adaptive threshold enrichment clinical trial. We have assessed bias for the naive estimator when different subpopulations are selected. Depending on the scenario, the bias of the naive estimator of the treatment effect in the selected subpopulation is substantial and can be negative or positive. There is thus a need for new estimators. Building on estimators that have been proposed for treatment selection, we have derived several estimators that account for subpopulation selection. By derivation, two estimators are mean unbiased. In this paper, we have recommended the best among these two, that is, the UMVCUE. An alternative is a hybrid estimator where different estimators are recommended based on the selected subpopulation. This would require a simulation study before the trial and is suitable if investigators can accept some unbiasedness for a more precise estimator.

We have considered a specific selection rule but the proposed estimators can be modified for other selection rules. For example, it may be desired that different subpopulations have different futility boundaries. Futility boundaries may be based on factors such as subpopulation prevalence, and sponsor and public health gains.34 Another factor is safety where the futility boundary may be chosen to reflect investigators' willingness to accept moderate efficacy if the new treatment is substantially safer than the control. The selection rule we have used specifies that a higher biomarker value leads to a smaller treatment effect. If this is a misspecification of the relationship between the biomarker and treatment effect, the unbiased estimators will remain so because we condition on the selection rule. However, the probability of making the right decision will be low and we anticipate that the naive estimator will have more bias and that the unbiased estimators will have higher MSE.

In the derivations, we have not required the prevalences in different partitions to be equal. If the biomarker values are approximately continuous, then it is reasonable to subdivide the full population into equal partitions as we have done in the example and the simulations. Other numerical biomarker values may be discrete with few possible values, leading to partitions with varying sizes.

We have assumed the number of patients in each partition, and hence prevalence, is known. For the case of two partitions and a fixed cut‐off value, taking the stage 1 number of patients in a partition to have a binomial distribution, Kimani et al7 showed that using stage 1 prevalence estimates in the expressions for the unbiased estimators provides unbiased estimates for the treatment effects. This extends to the case of more than two partitions, where numbers of patients in partitions are taken to have a multinomial distribution. The proof is based on the fact that the estimator in a partition is unbiased conditional on the number of patients in an interval and that the proportion of patients in a partition is unbiased for the prevalence in the partition. The proof for the case of estimating the cut‐off values using stage 1 data is similar.

Conditional on continuing to stage 2, we have derived estimators for the effect in the selected subpopulation. Continuing to stage 2 is necessary for the unbiased estimators. This is not the case for the other estimators as they involve obtaining stage 1 estimates in all partitions that correct for the subpopulation selection and then combine them with the stage 2 unbiased estimates. Hence, estimates for effects in the dropped partitions that correct for subpopulation selection can be obtained using the shrinkage and bias‐adjusted estimators. However, they are not necessarily mean unbiased.

Methods developed for normally distributed data following treatment selection have been adapted for time‐to‐event data.28 Even after assuming asymptotic normality of the log hazard ratio, some of the estimators we have derived such as the UMVCUE may not be valid for time‐to‐event data. For example, if there is a quantitative interaction with hazard ratios in different partitions being unequal, a model that accounts for this is required. In this case, obtaining separate estimates for each partition is the valid approach.

Finally, since in all simulations, the combined stages 1 and 2 sample size was 800, for the different stage 1 sample sizes considered, there would be no savings or losses in terms of the cost of treating patients. The saving/loss is only made in terms of costs associated with biomarker testing. Hence, the case for performing subpopulation selection with a small proportion of patients can be justified if the biomarker is expensive, leading to savings if F is selected. The case for performing subpopulation selection with a large proportion of patients is justifiable if the biomarker is not expensive. In this case, the resources loss is not substantial if F is selected and yet, if only a part of the population will benefit, there is a higher probability of making the right decision that may improve power. The setting of fixed total sample size is sometimes referred to as enrichment because if some partitions are dropped in stage 2, the number of patients recruited from partitions in stage 2 is higher than if more partitions were selected. To save money on treatment costs or reduce the total sample size, subpopulation selection could be performed early, with no enrichment in stage 2. With no enrichment, the number of patients in a partition in stage 2 is fixed. The statistical properties of the estimators for the setting with no enrichment can be evaluated as in the case of enrichment.

Supporting information

SIM7831‐sup‐0001‐supplementary information.pdf

ACKNOWLEDGEMENTS

This work was funded by the UK Medical Research Council (grant MR/N028309/1). We thank the editor, the associate editor, and two reviewers for comments that greatly improved this paper.

Kimani PK, Todd S, Renfro LA, Stallard N. Point estimation following two‐stage adaptive threshold enrichment clinical trials. Statistics in Medicine. 2018;37:3179–3196. 10.1002/sim.7831

REFERENCES

  • 1. Ballman KV. Biomarker: predictive or prognostic? J Clin Oncol. 2015;33:3968‐3971. [DOI] [PubMed] [Google Scholar]
  • 2. Simon N, Simon R. Adaptive enrichment designs for clinical trials. Biostatistics. 2013;14(4):613‐625. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Mok TS, Wu Y‐L, Thongprasert S, et al. Gefitinib or carboplatin‐paclitaxel in pulmonary adenocarcinoma. N Engl J Med. 2009;361(10):947‐957. [DOI] [PubMed] [Google Scholar]
  • 4. Tran HT, Liu Y, Zurita AJ, et al. Prognostic or predictive plasma cytokines and angiogenic factors for patients treated with pazopanib for metastatic renal‐cell cancer: a retrospective analysis of phase 2 and phase 3 trials. Lancet Oncol. 2012;13(8):827‐837. [DOI] [PubMed] [Google Scholar]
  • 5. Renfro LA, Coughlin CM, Grothey AM, Sargent DJ. Adaptive randomized phase II design for biomarker threshold selection and independent evaluation. Chin Clin Oncol. 2014;3(1):3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Li W, Chen C, Lia X, Beckmanb RA. Estimation of treatment effect in two‐stage confirmatory oncology trials of personalized medicines. Statist Med. 2017;36:1843‐1861. [DOI] [PubMed] [Google Scholar]
  • 7. Kimani PK, Todd S, Stallard N. Estimation after subpopulation selection in adaptive seamless trials. Statist Med. 2015;34:2581‐2601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Kunzmann K, Benner L, Kieser M. Point estimation in adaptive enrichment designs. Statist Med. 2017;36(25):3935‐3947. [DOI] [PubMed] [Google Scholar]
  • 9. Cohen A, Sackrowitz HB. Two stage conditionally unbiased estimators of the selected mean. Stat Probab Lett. 1989;8:273‐278. [Google Scholar]
  • 10. Stallard N, Todd S. Point estimates and confidence regions for sequential trials involving selection. J Stat Plan Infer. 2005;135:402‐419. [Google Scholar]
  • 11. Bowden J, Glimm E. Unbiased estimation of selected treatment means in two‐stage trials. Biom J. 2008;50(4):515‐527. [DOI] [PubMed] [Google Scholar]
  • 12. Bauer P, Koenig F, Brannath W, Posch M. Selection and bias—two hostile brothers. Statist Med. 2010;29:1‐13. [DOI] [PubMed] [Google Scholar]
  • 13. Kimani PK, Todd S, Stallard N. Conditionally unbiased estimation in phase II/III clinical trials with early stopping for futility. Statist Med. 2013;32(17):2893‐2910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Carreras M, Brannath W. Shrinkage estimation in two‐stage adaptive designs with midtrial treatment selection. Statist Med. 2013;32(10):1677‐1690. [DOI] [PubMed] [Google Scholar]
  • 15. Robertson DS, Prevost AT, Bowden J. Unbiased estimation in seamless phase II/III trials with unequal treatment effect variances and hypothesis‐driven selection rules. Statist Med. 2016a;35(22):3907‐3922. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Stallard N, Kimani PK. Uniformly minimum variance conditionally unbiased estimation in multi‐arm multi‐stage clinical trials. Biometrika. 2018;105:495‐501. [Google Scholar]
  • 17. Robertson DS, Prevost AT, Bowden J. Accounting for selection and correlation in the analysis of two‐stage genome‐wide association studies. Biostatistics. 2016b;17(4):634‐649. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Whitehead J. On the bias of maximum likelihood estimation following a sequential test. Biometrika. 1986;73(3):573‐581. [Google Scholar]
  • 19. Hwang JT. Empirical Bayes estimation for the means of the selected populations. Indian J Stat A. 1993;55:285‐311. [Google Scholar]
  • 20. Uher R, Tansey KE, Malki K, Perlis RH. Biomarkers predicting treatment outcome in depression: what is clinically significant? Pharmacogenomics. 2012;13(2):233‐240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Rush AJ, Fava M, Wisniewski SR, et al. Sequenced treatment alternatives to relieve depression (STAR*D): rationale and design. Control Clin Trials. 2004;25(1):119‐142. [DOI] [PubMed] [Google Scholar]
  • 22. Uher R, Maier W, Hauser J, et al. Differential efficacy of escitalopram and nortriptyline on dimensional measures of depression. Br J Psychiatry. 2009;194:252‐259. [DOI] [PubMed] [Google Scholar]
  • 23. Tsao AS, Liu S, Lee JJ, et al. Clinical outcomes and biomarker profiles of elderly pretreated NSCLC patients from the BATTLE trial. J Thorac Oncol. 2012;7(11):1645‐1652. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Jiang W, Freidlin B, Simon R. Biomarker‐adaptive threshold design: a procedure for evaluating treatment with possible biomarker‐defined subset effect. J Natl Cancer Inst. 2007;99(13):1036‐1043. [DOI] [PubMed] [Google Scholar]
  • 25. Grothey A, Strosberg JR, Renfro LA. A randomized, double‐blind, placebo‐controlled phase II study of the efficacy and safety of Monotherapy Ontuxizumab (MORAb‐004) plus best supportive care in patients with chemorefractory metastatic colorectal cancer. Clin Cancer Res. 2018;24(2):316‐325. [DOI] [PubMed] [Google Scholar]
  • 26. Joshi A, Zhang J, Fang L. Statistical design for a confirmatory trial with a continuous predictive biomarker: a case study. Contemp Clin Trials. 2017;63:19‐29. [DOI] [PubMed] [Google Scholar]
  • 27. Morris CN. Parametric empirical Bayes inference: theory and applications. J Am Stat Assoc. 1983;78:47‐55. [Google Scholar]
  • 28. Brückner M, Titman A, Jaki T. Estimation in multi‐arm two‐stage trials with treatment selection and time‐to‐event endpoint. Statist Med. 2017;36:3137‐3153. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Liu A, Liu C, Li Q, Yu KF, Yuan VW. A threshold sample‐enrichment approach in a clinical trial with heterogeneous subpopulations. Clin Trials. 2010;7:537‐545. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Wason J, Marshall A, Dunn J, Stein RC, Stallard N. Adaptive designs for clinical trials assessing biomarker‐guided treatment strategies. Br J Cancer. 2014;110:1950‐1957. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Spencer AV, Harbron C, Mander A, Wason J, Peers I. An adaptive design for updating the threshold value of a continuous biomarker. Statist Med. 2016;35:4909‐4923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Antoniou M, Jorgensen AL, Kolamunnage‐Dona R. Biomarker‐guided adaptive trial designs in phase II and phase III: a methodological review. PLOS ONE. 2016;11(2):e0149803. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Simon N, Simon R. Using Bayesian modeling in frequentist adaptive enrichment designs. Biostatistics. 2018;19(1):27‐41. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Ondra T, Jobjörnsson S, Beckman RA, et al. Optimizing trial designs for targeted therapies. PLOS ONE. 2016;11(9):e0163726. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

SIM7831‐sup‐0001‐supplementary information.pdf


Articles from Statistics in Medicine are provided here courtesy of Wiley

RESOURCES