Author manuscript; available in PMC: 2013 Oct 15.
Published in final edited form as: Am Stat. 2012 Oct 15;66(3):155–162. doi: 10.1080/00031305.2012.720900

Estimation of Progression-Free Survival for All Treated Patients in the Randomized Discontinuation Trial Design

Theodore G Karrison, Mark J Ratain, Walter M Stadler, Gary L Rosner
PMCID: PMC3769804  NIHMSID: NIHMS493558  PMID: 24039273

Abstract

The randomized discontinuation trial (RDT) design is an enrichment-type design that has been used in a variety of diseases to evaluate the efficacy of new treatments. The RDT design seeks to select a more homogeneous group of patients, consisting of those who are more likely to show a treatment benefit if one exists. In oncology, the RDT design has been applied to evaluate the effects of cytostatic agents, that is, drugs that act primarily by slowing tumor growth rather than shrinking tumors. In the RDT design, all patients receive treatment during an initial, open-label run-in period of duration T. Patients with objective response (substantial tumor shrinkage) remain on therapy while those with early progressive disease are removed from the trial. Patients with stable disease (SD) are then randomized either to continue active treatment or to switch to placebo. The main analysis compares outcomes, for example, progression-free survival (PFS), between the two randomized arms. As a secondary objective, investigators may seek to estimate PFS for all treated patients, measured from the time of entry into the study, by combining information from the run-in and post run-in periods. For t ≤ T, PFS is estimated by the observed proportion of patients who are progression-free among all patients enrolled. For t > T, the estimate can be expressed as Ŝ(t) = p̂OR × ŜOR(t − T) + p̂SD × ŜSD(t − T), where p̂OR is the estimated probability of response during the run-in period, p̂SD is the estimated probability of SD, and ŜOR(t − T) and ŜSD(t − T) are the Kaplan–Meier estimates of subsequent PFS in the responders and patients with SD randomized to continue treatment, respectively. In this article, we derive the variance of Ŝ(t), enabling the construction of confidence intervals for both S(t) and the median survival time. Simulation results indicate that the method provides accurate coverage rates.
An interesting aspect of the design is that outcomes during the run-in phase have a negative multinomial distribution, something not frequently encountered in practice.

Keywords: Confidence limits, Enrichment design, Negative multinomial distribution, Phase II clinical trials

1. INTRODUCTION

In phase II oncology clinical trials, the primary endpoint has traditionally been objective response (OR), defined as a 50% or greater, or 30% or greater, reduction in tumor size depending upon whether the sum of bidimensional or unidimensional measurements is used (Therasse et al. 2000). Although this endpoint may be appropriate when the experimental agent is cytotoxic in nature (i.e., kills tumor cells, leading to measurable tumor shrinkage), it is not suitable for cytostatic drugs (i.e., drugs that act primarily by slowing tumor growth rather than shrinking tumors; Korn et al. 2001). While these agents may produce a certain number of objective responders, their primary mode of action is to delay tumor growth. Ratain et al. (2006) reported the results of a randomized discontinuation trial (RDT) of the drug sorafenib in patients with metastatic renal cell cancer. Sorafenib targets both tumor cells and the tumor vasculature and, if active, was believed more likely to delay growth than to cause significant tumor regression. The data from this trial are analyzed in Section 4. Another important issue in the era of molecularly targeted therapy is the need to identify which patients are most likely to benefit from a given treatment regimen. As noted by Stadler (2007), however, this can be quite difficult, particularly during early-phase drug development. The RDT design (Rosner, Stadler, and Ratain 2002) offers a means to “select” a cohort most likely to benefit from a given treatment and to rigorously evaluate its disease-stabilizing activity.

2. THE RANDOMIZED DISCONTINUATION TRIAL DESIGN IN ONCOLOGY

The RDT is an enrichment design in which all patients receive the experimental treatment during an initial, open-label run-in period of length T. Enrichment designs select subjects for participation in a subsequent randomized comparison phase of a study on the basis of their response during the run-in phase. The objective is to select a subset that is relatively homogeneous and more likely to show a treatment benefit, thereby increasing statistical power. In the RDT design as typically applied in oncology, patients with an OR during the run-in phase remain on therapy while those with early progressive disease (PD) are removed from the trial. Patients with stable disease (SD) are randomized either to continue active treatment or to switch to placebo. The main analysis compares outcomes, for example, progression-free survival (PFS), between the two randomized arms, measured from the point of randomization. The design is shown schematically in Figure 1.

Figure 1. Randomized discontinuation trial design. The online version of this figure is in color.

The efficiency or power of the RDT design relative to standard, up-front randomization in which patients are assigned to drug or placebo at initial entry into the trial depends on the degree to which enrichment is achieved by eliminating early disease progressors. It must also be assumed that there is no lasting carryover effect from the run-in period for those patients subsequently randomized to placebo. Note that one aspect of the RDT that differs from usual enrichment-type designs is that patients with the most favorable initial outcome, OR, are not randomized, but remain on therapy for ethical reasons. Evaluations of the RDT compared with other designs can be found in Kopec, Abrahamowicz, and Esdaile (1993), Leber and Davis (1998), Capra (2004), Freidlin and Simon (2005), and Fu, Dowlati, and Schluchter (2009). RDT designs are employed much less frequently than standard (“up-front”) randomized designs, but their use is increasing. An Ovid MEDLINE search using the search phrases “Randomized Discontinuation” or “Randomized Withdrawal” found a total of 37 published trials employing the RDT design from 2002 to 2012, eight of which were cancer trials.

As with any enrichment design, a limitation of the RDT is that it does not provide an estimate of the overall efficacy of the agent in the unselected patient population. If an RDT trial is positive, it is likely that any follow-up phase III trial will need to be performed in an unselected population and will employ up-front randomization, which in turn generally requires assumptions about PFS rates for trial design purposes. While it would appear that the data from an RDT are not helpful in this regard, at least one side of the coin can be recovered, namely, the PFS rate among all treated patients measured from the time of enrollment into the study. This, together with historical data on outcomes following standard therapy, could then be used to postulate an effect size (hazard ratio [HR], say) in an unselected population. One can also estimate the HR in an unselected population based on the HR in the RDT and the randomization rate, together with assumptions regarding the HR in the nonrandomized patients. In the phase II RDT trial of sorafenib, Ratain et al. (2006) provided such a PFS estimate as a secondary analysis. The procedure requires combining information from the run-in and post run-in periods. The purpose of this article is to show that the method provides an unbiased estimate and to derive point-wise confidence intervals (CIs) for the PFS curve. Once this is achieved, CIs for the median survival time can be obtained using, for example, the procedure described in Brookmeyer and Crowley (1982).

The next section describes the methodology. The data from the published renal cell cancer trial of Ratain et al. are then used to illustrate the approach. Simulation results are presented in Section 5 to verify the validity of the proposed technique, followed by a summary and discussion.

3. METHODS

As discussed above, following the main analysis from an RDT it may be useful to estimate PFS for all treated patients, measured from the time of entry into the study. This can be performed by combining information from the run-in and post run-in periods. For our purposes, it is useful to conceive of the true PFS curve as S(t) for t ≤ T and S(t) = Pr(OR) SOR(t − T) + Pr(SD) SSD(t − T) for t > T, where Pr(OR) is the probability of being a responder at time T, Pr(SD) is the probability of having SD at time T, and SOR(t − T) and SSD(t − T) are the subsequent PFS probabilities in the responders and patients with SD at time T, respectively. More succinctly, for t > T, S(t) is the probability of not progressing during the run-in period (duration T) and surviving progression free a further t − T units. We assume that any patients who were eligible for randomization but dropped out did so for reasons unrelated to outcome.

For t ≤ T, PFS can be estimated simply by the observed proportion of patients progression-free at time t among all patients enrolled, with CIs obtained using the standard normal approximation to the binomial. For t > T, an estimate can be obtained as

$$\hat{S}(t) = \hat{p}_{OR} \times \hat{S}_{OR}(t - T) + \hat{p}_{SD} \times \hat{S}_{SD}(t - T), \tag{1}$$

where p̂OR is the estimated probability of response during the run-in period, p̂SD is the estimated probability of SD, and ŜOR(t − T) and ŜSD(t − T) are Kaplan and Meier (1958) estimates of PFS in the responders and patients with SD randomized to continue treatment, respectively. One would expect ŜOR(t − T) to exceed ŜSD(t − T), that is, patients who achieve an early response should have better subsequent outcomes than those who exhibit only SD during the run-in period.

Let X1 denote the number of objective responders during the run-in period, X2 the number of patients with disease progression, and nSD the number with SD. In planning RDT studies, the primary focus is on the comparison of the two randomized groups; therefore, nSD is generally fixed by design, chosen to provide a desired level of power to detect a true difference between the two arms. Consequently, X = (X1, X2) has a negative multinomial distribution with parameters k = nSD, m1, and m2 (see, e.g., Bishop, Fienberg, and Holland 1975, BFH). Using Equation (13.8-3) of BFH for the mixed factorial moments of the negative multinomial distribution, we have

$$E(X_i) = m_i, \qquad V(X_i) = \frac{m_i (m_i + k)}{k}, \qquad \operatorname{cov}(X_1, X_2) = \frac{m_1 m_2}{k}.$$

Unbiased estimates of pOR, the probability of response during the run-in period, and pSD, the probability of SD, are p̂OR = X1/(n − 1) and p̂SD = (nSD − 1)/(n − 1), where n = X1 + X2 + nSD, with variances

$$V(\hat{p}_{OR}) = \frac{X_1 (n - X_1)}{n^2 (n - 1)} \tag{2}$$

and

$$V(\hat{p}_{SD}) = \frac{n_{SD} (n - n_{SD})}{n^2 (n - 1)}, \tag{3}$$

respectively (Haldane 1945). Note that replacing (n − 1) by n in Equations (2) and (3) results in the familiar variance estimates obtained in the multinomial case when n, rather than nSD, is fixed.
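As a minimal sketch of these calculations, the following Python function evaluates the two probability estimates and Equations (2) and (3) as written above. The function name and return structure are illustrative assumptions; the counts in the usage line are the run-in counts reported for the sorafenib trial in Section 4.

```python
def run_in_estimates(x1, x2, n_sd):
    """Estimates of p_OR and p_SD and their variance estimates, Eqs. (2)-(3).

    x1: objective responders, x2: early progressors, n_sd: patients with SD
    (n_sd is treated as fixed by design, as in the text).
    """
    n = x1 + x2 + n_sd                      # total number enrolled
    p_or = x1 / (n - 1)                     # estimate of p_OR
    p_sd = (n_sd - 1) / (n - 1)             # estimate of p_SD
    var_or = x1 * (n - x1) / (n**2 * (n - 1))        # Eq. (2)
    var_sd = n_sd * (n - n_sd) / (n**2 * (n - 1))    # Eq. (3)
    return {"n": n, "p_or": p_or, "p_sd": p_sd,
            "var_or": var_or, "var_sd": var_sd}


# Run-in counts from the sorafenib example (Section 4): X1 = 79, X2 = 44, nSD = 65.
est = run_in_estimates(x1=79, x2=44, n_sd=65)
print(est)
```

Because n − 1 replaces n in the estimators, the results differ only slightly from the simple proportions X1/n and nSD/n at this sample size.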

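Letting f1(X1, X2) = p̂OR, f2(X1, X2) = p̂SD, and σij = cov(Xi, Xj), the δ-method gives

$$
\begin{aligned}
\operatorname{cov}(\hat{p}_{OR}, \hat{p}_{SD}) &\approx \sum_{i,j=1}^{2} \left(\frac{\partial f_1}{\partial X_i}\right) \left(\frac{\partial f_2}{\partial X_j}\right) \sigma_{ij} \\
&= \left(\frac{n - 1 - X_1}{(n-1)^2}\right) \left(\frac{-(n_{SD} - 1)}{(n-1)^2}\right) \frac{m_1 (m_1 + n_{SD})}{n_{SD}} \\
&\quad + \left(\frac{n - 1 - X_1}{(n-1)^2}\right) \left(\frac{-(n_{SD} - 1)}{(n-1)^2}\right) \frac{m_1 m_2}{n_{SD}} \\
&\quad + \left(\frac{-X_1}{(n-1)^2}\right) \left(\frac{-(n_{SD} - 1)}{(n-1)^2}\right) \frac{m_1 m_2}{n_{SD}} \\
&\quad + \left(\frac{-X_1}{(n-1)^2}\right) \left(\frac{-(n_{SD} - 1)}{(n-1)^2}\right) \frac{m_2 (m_2 + n_{SD})}{n_{SD}},
\end{aligned}
$$

which after reduction and substitution of Xi for mi becomes

$$\operatorname{cov}(\hat{p}_{OR}, \hat{p}_{SD}) \approx -\frac{X_1 \cdot n_{SD}}{n^3}, \tag{4}$$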

again a familiar expression. The expectation and variance of Ŝ(t) can now be obtained straightforwardly by first conditioning on (X1, X2). For the mean,

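$$
\begin{aligned}
E[\hat{S}(t)] &= E\left[E(\hat{S}(t) \mid X_1, X_2)\right] \\
&= E\left[\hat{p}_{OR} S_{OR}(t - T) + \hat{p}_{SD} S_{SD}(t - T)\right] \\
&= p_{OR} S_{OR}(t - T) + p_{SD} S_{SD}(t - T) = S(t).
\end{aligned}
$$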

And for the variance,

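$$
\begin{aligned}
V(\hat{S}(t)) &= E\{V(\hat{S}(t) \mid X_1, X_2)\} + V\{E(\hat{S}(t) \mid X_1, X_2)\} \\
&= E\{\hat{p}_{OR}^2 V(\hat{S}_{OR}(t - T)) + \hat{p}_{SD}^2 V(\hat{S}_{SD}(t - T))\} + V\{\hat{p}_{OR} S_{OR}(t - T) + \hat{p}_{SD} S_{SD}(t - T)\} \\
&= V(\hat{S}_{OR}(t - T)) E(\hat{p}_{OR}^2) + V(\hat{S}_{SD}(t - T)) E(\hat{p}_{SD}^2) + (S_{OR}(t - T))^2 V(\hat{p}_{OR}) \\
&\quad + (S_{SD}(t - T))^2 V(\hat{p}_{SD}) + 2 S_{OR}(t - T) S_{SD}(t - T) \operatorname{cov}(\hat{p}_{OR}, \hat{p}_{SD}) \\
&= V(\hat{S}_{OR}(t - T)) \left[V(\hat{p}_{OR}) + p_{OR}^2\right] + V(\hat{S}_{SD}(t - T)) \left[V(\hat{p}_{SD}) + p_{SD}^2\right] + (S_{OR}(t - T))^2 V(\hat{p}_{OR}) \\
&\quad + (S_{SD}(t - T))^2 V(\hat{p}_{SD}) + 2 S_{OR}(t - T) S_{SD}(t - T) \operatorname{cov}(\hat{p}_{OR}, \hat{p}_{SD}),
\end{aligned} \tag{5}
$$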

where we have used the fact that the Kaplan–Meier estimators are unbiased.

The variance can be calculated from (5) by replacing true values with their respective estimates, using Greenwood’s formula for V (ŜOR(t − T)) and V (ŜSD(t − T)), and applying Equations (2)-(4).
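The plug-in calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' SAS code: the function name is hypothetical, and the Kaplan–Meier estimates and their Greenwood variances at time t − T are assumed to have been computed elsewhere and are passed in as numbers. The values used in the usage lines are made up for illustration; only the run-in counts come from Section 4.

```python
import math


def pfs_all_treated(x1, x2, n_sd, s_or, v_or, s_sd, v_sd):
    """Point estimate from Eq. (1) and plug-in variance from Eq. (5).

    s_or, s_sd: Kaplan-Meier estimates at t - T for responders and for
                SD patients randomized to continue treatment.
    v_or, v_sd: their Greenwood variance estimates.
    """
    n = x1 + x2 + n_sd
    p_or = x1 / (n - 1)
    p_sd = (n_sd - 1) / (n - 1)
    var_p_or = x1 * (n - x1) / (n**2 * (n - 1))      # Eq. (2)
    var_p_sd = n_sd * (n - n_sd) / (n**2 * (n - 1))  # Eq. (3)
    cov_p = -x1 * n_sd / n**3                        # Eq. (4)

    s_hat = p_or * s_or + p_sd * s_sd                # Eq. (1)
    var_s = (v_or * (var_p_or + p_or**2)             # Eq. (5), term by term
             + v_sd * (var_p_sd + p_sd**2)
             + s_or**2 * var_p_or
             + s_sd**2 * var_p_sd
             + 2 * s_or * s_sd * cov_p)
    return s_hat, var_s


# Hypothetical KM inputs (NOT trial results); run-in counts from Section 4.
s_hat, var_s = pfs_all_treated(79, 44, 65, s_or=0.80, v_or=0.002,
                               s_sd=0.50, v_sd=0.008)
print(round(s_hat, 4), round(math.sqrt(var_s), 4))
```

The covariance term enters with a negative sign, so it reduces the variance relative to treating p̂OR and p̂SD as uncorrelated.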

As an aside, an alternative way of estimating S(t) is to multiply the probability of remaining progression-free during the run-in period by the weighted average of the PFS curves for the two subgroups treated beyond the run-in period. Specifically, using the above notation,

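$$\hat{S}(t) = \frac{X_1 + n_{SD}}{X_1 + X_2 + n_{SD}} \times \left[\frac{X_1}{X_1 + n_{SD}} \hat{S}_{OR}(t - T) + \frac{n_{SD}}{X_1 + n_{SD}} \hat{S}_{SD}(t - T)\right]. \tag{6}$$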

This was the approach taken by Ratain et al. (2006) in analyzing the data from the renal cell cancer study (although no standard error was provided). Equation (6) is equivalent to Equation (1) except for minor adjustments.
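The "minor adjustments" can be made explicit by expanding Equation (6). Writing n = X1 + X2 + nSD as in the definitions above, the factor X1 + nSD cancels:

```latex
\hat{S}(t)
  = \frac{X_1 + n_{SD}}{n}
    \left[ \frac{X_1}{X_1 + n_{SD}} \hat{S}_{OR}(t - T)
         + \frac{n_{SD}}{X_1 + n_{SD}} \hat{S}_{SD}(t - T) \right]
  = \frac{X_1}{n} \hat{S}_{OR}(t - T) + \frac{n_{SD}}{n} \hat{S}_{SD}(t - T).
```

Equation (6) therefore weights the two Kaplan–Meier curves by X1/n and nSD/n, whereas Equation (1) uses X1/(n − 1) and (nSD − 1)/(n − 1). The two forms differ only through these denominators and the −1 in the SD numerator, which matters little for moderate n.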

4. EXAMPLE

Ratain et al. (2006) reported the results of an RDT in renal cell carcinoma. A total of 202 patients were enrolled in the trial; 14 patients discontinued treatment before the 12-week assessment (12 due to adverse events, 1 withdrew consent, and 1 was lost to follow-up), so the analysis is based on n = 188 patients. Seventy-three patients had tumor shrinkage of 25% or more during the 12-week run-in phase and, per protocol, remained on open-label sorafenib, while 6 other patients were also continued on open-label drug at the discretion of the treating physician; 44 patients progressed or died during the run-in phase; 65 patients had SD at 12 weeks (with the exception of one patient with PD who was inadvertently randomized), of whom 32 were randomly assigned to continue sorafenib and 33 to receive placebo. Thus, using the above notation, nSD = 65, X1 = 79, and X2 = 44. Of note, the dataset analyzed here is not identical to the dataset analyzed in the 2006 publication; the latter includes updated follow-up in a handful of cases. PFS from week 12 differed significantly between the two randomized arms (p = 0.0087), with a median of 24 weeks in the sorafenib arm compared with 6 weeks in the placebo arm.

Our interest is in estimating PFS following treatment with sorafenib, measured from the date of initial entry into the trial. Figure 2 shows the estimated curve along with 95% CIs using the method described in Section 3. Six-month and one-year PFS rates were 49.0% (95% CI: 40.9%–58.7%) and 11.7% (95% CI: 5.5%–25.0%), respectively. The median PFS time was 176 days.

Figure 2. Estimated progression-free survival (solid line) and pointwise 95% confidence intervals (dotted lines) following treatment with sorafenib, measured from start of treatment. Confidence intervals are based on the log transformation. The online version of this figure is in color.

A good way to see how the procedure works graphically is to consider the alternative form of the estimator shown in Equation (6). The proportion of patients alive and without progression at the end of the run-in period is (X1 + nSD)/(X1 + X2 + nSD) = (79 + 65)/(79 + 44 + 65) = 0.766. This quantity, multiplied by the weighted average of the curves for the OR and SD subgroups, yields the overall estimate for t > T. The two components and the final estimate are shown in Figure 3.
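The arithmetic behind this construction can be checked directly from the counts given above (X1 = 79, X2 = 44, nSD = 65). The short Python sketch below reproduces the progression-free proportion at the end of the run-in and the two weights quoted in the Figure 3 caption; the helper function name is an assumption made for illustration.

```python
x1, x2, n_sd = 79, 44, 65          # responders, early progressors, SD

# Proportion progression-free at the end of the run-in period.
prop_free = (x1 + n_sd) / (x1 + x2 + n_sd)

# Weights applied to the OR and SD Kaplan-Meier curves in Eq. (6).
w_or = x1 / (x1 + n_sd)
w_sd = n_sd / (x1 + n_sd)


def combined_pfs(s_or, s_sd):
    """Eq. (6): overall PFS for t > T from the two subgroup KM estimates."""
    return prop_free * (w_or * s_or + w_sd * s_sd)


print(round(prop_free, 3), round(w_or, 3), round(w_sd, 3))
```

At the end of the run-in itself, both subgroup curves equal 1, so combined_pfs(1.0, 1.0) returns the progression-free proportion.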

Figure 3. Estimated progression-free survival (solid line) and OR and SD components (dashed lines). For t > 84 days, the solid line is the weighted average of the dashed lines (weights = 0.549 and 0.451, respectively).

From the pointwise confidence limits for the survival curve, a CI for the median PFS time can be obtained as the set of all time points t such that 0.50 lies within the 95% CI generated at time t (Brookmeyer and Crowley 1982). This gives an asymmetrical interval ranging from 163 to 244 days. Of interest, the RDT trial was followed by a phase III, up-front randomized clinical trial (Escudier et al. 2007) that also found a statistically significant benefit for sorafenib (p < 0.001) with a median PFS of 5.5 months (168 days) in the sorafenib arm, consistent with that calculated above from the RDT.
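The inversion described above can be sketched as follows, assuming the pointwise limits have already been evaluated on a grid of time points. The function name is hypothetical, and the grid values in the usage lines are invented purely to illustrate the mechanics; they are not the trial's confidence limits.

```python
def median_ci(times, lower, upper, level=0.5):
    """Time points t whose pointwise CI [lower(t), upper(t)] contains `level`.

    Returns the smallest and largest such time point, or None if no grid
    point qualifies. Reporting only the endpoints assumes the qualifying
    set is (or is summarized as) an interval.
    """
    inside = [t for t, lo, hi in zip(times, lower, upper) if lo <= level <= hi]
    if not inside:
        return None
    return min(inside), max(inside)


# Hypothetical grid (days) and pointwise limits, for illustration only.
times = [150, 160, 170, 180, 200, 240, 250]
lower = [0.55, 0.51, 0.46, 0.42, 0.38, 0.33, 0.28]
upper = [0.70, 0.66, 0.62, 0.58, 0.54, 0.50, 0.46]
print(median_ci(times, lower, upper))
```

Because the survival curve and its limits are step functions, the resulting interval is generally asymmetric about the median estimate, as in the example above.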

5. SIMULATION STUDY

A simulation study was undertaken to confirm the accuracy of (5) and to assess coverage rates of CIs. We simulated a typical phase II trial in which patients are accrued over an interval [0, a] and follow-up is continued beyond the accrual period for an additional time f. The number of patients entering the randomized phase, nSD, was fixed and outcomes (OR, SD, PD) during the run-in period were drawn randomly until nSD SDs occurred. More specifically, the entry time for the ith patient, ei, was drawn from a uniform distribution over [0, a]. The run-in period, T, was added to the entry time (thus, it was assumed that no patients would drop out or be lost to follow-up during the run-in phase) and, for patients with outcomes of OR or SD, the subsequent time to disease progression/death, ui, was drawn from one of two Weibull distributions:

$$S_{OR}(u) = \exp\left(-(\lambda_1 u)^{p_1}\right), \qquad S_{SD}(u) = \exp\left(-(\lambda_2 u)^{p_2}\right).$$
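The data-generating step described above can be sketched in Python using only the standard library. This is an assumption-laden illustration rather than the authors' SAS program: the function name, the tuple layout, and the seed are arbitrary, and time is taken to be in days. The parameter values are those listed for the first scenario in the footnote to Table 1. Python's `weibullvariate(alpha, beta)` uses scale alpha and shape beta, so alpha = 1/λ yields the survival function exp(−(λu)^p).

```python
import random


def simulate_run_in(n_sd, p_or, p_sd, lam_or, shape_or, lam_sd, shape_sd,
                    accrual_days, rng):
    """Draw run-in outcomes until n_sd patients with SD have occurred.

    Returns a list of (entry_time, outcome, u), where u is the time from the
    end of the run-in to progression/death (None for early progressors).
    """
    patients = []
    sd_count = 0
    while sd_count < n_sd:                      # negative multinomial sampling
        entry = rng.uniform(0, accrual_days)    # entry time on [0, a]
        r = rng.random()
        if r < p_or:
            outcome = "OR"
            u = rng.weibullvariate(1.0 / lam_or, shape_or)
        elif r < p_or + p_sd:
            outcome = "SD"
            sd_count += 1
            u = rng.weibullvariate(1.0 / lam_sd, shape_sd)
        else:
            outcome = "PD"                      # removed from the trial
            u = None
        patients.append((entry, outcome, u))
    return patients


rng = random.Random(2012)                       # arbitrary seed
trial = simulate_run_in(60, 0.15, 0.60, 0.003, 1.7, 0.009, 1.2,
                        accrual_days=48 * 7, rng=rng)
print(len(trial), sum(1 for _, o, _ in trial if o == "OR"))
```

The total sample size n = len(trial) is random here, which is what gives the counts of responders and early progressors their negative multinomial distribution. Allocation of SD patients to arms and censoring are omitted from this sketch.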

An equal number of patients, nSD/2, were allocated to the treatment and placebo arms in the SD subgroup and only the former entered into the subsequent calculations. If, for the ith patient, ei + T + ui ≤ a + f, the event was observed, whereas if ei + T + ui > a + f, the patient was censored at a + f. An additional random censoring mechanism following an exponential distribution with rate parameter λc was also incorporated to allow for losses to follow-up postrandomization. All calculations were performed in SAS, version 9.2 (Cary, NC). After generation of the data for a simulated trial, Equation (1) was used to estimate the overall PFS curve and the variance was calculated via (5). Pointwise 95% CIs were then determined based on four different transformations g (Klein and Moeschberger 1997):

$$g(\hat{S}(t_j)) \pm 1.96 \sqrt{\left(g'(\hat{S}(t_j))\right)^2 V(\hat{S}(t_j))}, \qquad j = 1, 2, \ldots, k. \tag{7}$$

Transformations examined were identity (untransformed), log, complementary log–log, and logit. Note that if the Kaplan–Meier estimate of the survival rate at some time point becomes zero, the standard error from Greenwood’s formula is also zero. In these cases, the standard error from the previous time point was used in Equation (5) when calculating V (Ŝ(t)), and for the log, complementary log–log, and logit based CIs, the interval obtained from the identity transformation was employed instead. Finally, when the Kaplan–Meier estimate was undefined due to censoring (last observation censored prior to the time point in question), the replicate was excluded from the calculations for that time point.
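For concreteness, the four intervals can be written out as below. This is a sketch under the usual δ-method reading of Equation (7), with the interval computed on the g scale and then back-transformed. The function name is hypothetical, the inputs in the usage lines are illustrative rather than trial results, and the code assumes 0 < Ŝ(t) < 1, so the special handling of Ŝ(t) = 0 described above is not included.

```python
import math


def transformed_cis(s, se, z=1.96):
    """Pointwise CIs for S(t) under four transformations g; requires 0 < s < 1.

    s: estimate of S(t); se: its standard error, sqrt of Eq. (5).
    """
    cis = {}

    # Identity: g(S) = S, g'(S) = 1.
    cis["identity"] = (s - z * se, s + z * se)

    # Log: g(S) = ln S, g'(S) = 1/S.
    w = z * se / s
    cis["log"] = (s * math.exp(-w), s * math.exp(w))

    # Complementary log-log: g(S) = ln(-ln S), |g'(S)| = 1 / (S |ln S|).
    # g is decreasing in S, so the upper g-limit maps to the lower S-limit.
    g = math.log(-math.log(s))
    w = z * se / (s * abs(math.log(s)))
    cis["cloglog"] = (math.exp(-math.exp(g + w)), math.exp(-math.exp(g - w)))

    # Logit: g(S) = ln(S / (1 - S)), g'(S) = 1 / (S (1 - S)).
    g = math.log(s / (1 - s))
    w = z * se / (s * (1 - s))
    cis["logit"] = (1 / (1 + math.exp(-(g - w))), 1 / (1 + math.exp(-(g + w))))

    return cis


# Illustrative inputs only.
for name, (lo, hi) in transformed_cis(0.49, 0.045).items():
    print(f"{name:9s} {lo:.3f} {hi:.3f}")
```

The complementary log–log and logit intervals always stay inside (0, 1), whereas the identity interval can fall below 0 and the log interval can exceed 1 when Ŝ(t) is near the boundary.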

Tables 1-3 show results based on 10,000 replications for three different scenarios and sample sizes nSD = 60, 80, and 100. The three scenarios correspond to SD rates of 60%, 70%, and 45% and response rates during the run-in period of 15%, 10%, and 5%, respectively. In all three scenarios, the accrual period, a, was set to 48 weeks, the follow-up period, f, to 36 weeks, and the run-in period, T, to 12 weeks. The true overall survival rate, S(t), for t = 18, 24, …, 48 weeks is displayed in column 4 of the tables.
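The true S(t) values in column 4 can be reproduced from the scenario parameters. The sketch below assumes Weibull survival functions of the form exp(−(λu)^p) and assumes that u is measured in days; the time unit is an inference, adopted because it matches the tabulated values, and is not stated explicitly in the text. The function name is hypothetical, and the parameters are those given in the footnote to Table 1.

```python
import math


def true_pfs(t_weeks, p_or, p_sd, lam_or, shape_or, lam_sd, shape_sd,
             run_in_weeks=12):
    """True S(t) = p_OR * S_OR(t - T) + p_SD * S_SD(t - T) for t > T."""
    u = (t_weeks - run_in_weeks) * 7            # days since end of run-in
    s_or = math.exp(-((lam_or * u) ** shape_or))
    s_sd = math.exp(-((lam_sd * u) ** shape_sd))
    return p_or * s_or + p_sd * s_sd


# First scenario: pOR = 0.15, pSD = 0.60, λ1 = 0.003, p1 = 1.7, λ2 = 0.009, p2 = 1.2.
scenario1 = dict(p_or=0.15, p_sd=0.60, lam_or=0.003, shape_or=1.7,
                 lam_sd=0.009, shape_sd=1.2)
for t in (18, 24, 30, 36, 42, 48):
    print(t, round(true_pfs(t, **scenario1), 4))
```

Under these assumptions the computed values at 18, 24, and 48 weeks agree with the tabulated 0.5852, 0.4298, and 0.1221 to within rounding.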

Table 1. Simulation results, first scenario (a). R = 10,000 replications, except as indicated in the footnotes.

| nSD | Mean n | t (weeks) | S(t) | Mean Ŝ(t) | Empirical Std Dev | Mean √V(Ŝ(t)) | Coverage (%): Un | Log | CLL | Logit |
|---|---|---|---|---|---|---|---|---|---|---|
| 60 | 99.9 | 18 | 0.5852 | 0.5849 | 0.0608 | 0.0599 | 94.2 | 94.1 | 95.0 | 95.0 |
| | | 24 | 0.4298 | 0.4295 | 0.0644 | 0.0631 | 93.9 | 94.7 | 94.6 | 94.9 |
| | | 30 | 0.3114 | 0.3109 | 0.0560 | 0.0591 | 93.8 | 94.9 | 94.7 | 95.2 |
| | | 36 | 0.2259 | 0.2262 (b) | 0.0531 | 0.0526 | 93.5 | 95.0 | 94.9 | 95.3 |
| | | 42 | 0.1652 | 0.1661 (c) | 0.0471 | 0.0467 | 94.0 | 96.1 | 96.0 | 96.5 |
| | | 48 | 0.1221 | 0.1232 (d) | 0.0423 | 0.0427 | 95.0 | 96.0 | 97.7 | 96.8 |
| 80 | 133.4 | 18 | 0.5852 | 0.5858 | 0.0524 | 0.0519 | 94.3 | 94.4 | 95.0 | 95.2 |
| | | 24 | 0.4298 | 0.4301 | 0.0555 | 0.0547 | 94.4 | 94.7 | 94.9 | 95.0 |
| | | 30 | 0.3114 | 0.3120 | 0.0525 | 0.0513 | 94.0 | 94.7 | 94.7 | 95.1 |
| | | 36 | 0.2259 | 0.2270 | 0.0467 | 0.0456 | 93.5 | 94.4 | 94.7 | 94.8 |
| | | 42 | 0.1652 | 0.1667 (e) | 0.0414 | 0.0403 | 93.6 | 94.9 | 94.8 | 95.2 |
| | | 48 | 0.1221 | 0.1244 (f) | 0.0372 | 0.0364 | 94.2 | 95.4 | 96.6 | 96.3 |
| 100 | 166.7 | 18 | 0.5852 | 0.5848 | 0.0465 | 0.0465 | 94.6 | 95.0 | 94.9 | 95.1 |
| | | 24 | 0.4298 | 0.4296 | 0.0490 | 0.0490 | 94.6 | 94.8 | 95.2 | 95.1 |
| | | 30 | 0.3114 | 0.3113 | 0.0457 | 0.0459 | 94.5 | 95.0 | 95.0 | 95.1 |
| | | 36 | 0.2259 | 0.2261 | 0.0409 | 0.0409 | 94.1 | 95.2 | 94.8 | 95.3 |
| | | 42 | 0.1652 | 0.1655 (g) | 0.0368 | 0.0359 | 93.6 | 94.9 | 94.7 | 95.1 |
| | | 48 | 0.1221 | 0.1229 (h) | 0.0326 | 0.0322 | 94.2 | 95.8 | 95.9 | 96.3 |

Means are mean values and Empirical Std Dev is Monte Carlo standard deviation over 10,000 replications. Un: untransformed scale, CLL: complementary log–log.

(a) pOR = 0.15, pSD = 0.60, λ1 = 0.003, p1 = 1.7, λ2 = 0.009, p2 = 1.2, λc = 0.0005.
(b) R = 9999. (c) R = 9877. (d) R = 9405. (e) R = 9948. (f) R = 9600. (g) R = 9987. (h) R = 9754.

Table 3. Simulation results, third scenario (a). R = 10,000 replications, except as indicated in the footnotes.

| nSD | Mean n | t (weeks) | S(t) | Mean Ŝ(t) | Empirical Std Dev | Mean √V(Ŝ(t)) | Coverage (%): Un | Log | CLL | Logit |
|---|---|---|---|---|---|---|---|---|---|---|
| 60 | 133.4 | 18 | 0.4391 | 0.4385 (b) | 0.0482 | 0.0474 | 94.5 | 94.7 | 95.0 | 95.1 |
| | | 24 | 0.3563 | 0.3559 (c) | 0.0510 | 0.0497 | 94.4 | 94.7 | 94.8 | 94.9 |
| | | 30 | 0.2779 | 0.2776 (c) | 0.0496 | 0.0490 | 94.1 | 95.4 | 95.0 | 95.5 |
| | | 36 | 0.2120 | 0.2118 (d) | 0.0464 | 0.0457 | 93.6 | 95.0 | 94.8 | 95.4 |
| | | 42 | 0.1605 | 0.1607 (e) | 0.0427 | 0.0418 | 93.3 | 95.0 | 94.8 | 95.2 |
| | | 48 | 0.1221 | 0.1230 (f) | 0.0388 | 0.0381 | 93.1 | 95.7 | 95.5 | 96.0 |
| 80 | 177.9 | 18 | 0.4391 | 0.4392 (g) | 0.0409 | 0.0411 | 94.9 | 95.0 | 95.2 | 95.4 |
| | | 24 | 0.3563 | 0.3566 (g) | 0.0429 | 0.0432 | 95.0 | 94.9 | 95.4 | 95.4 |
| | | 30 | 0.2779 | 0.2784 (h) | 0.0423 | 0.0424 | 94.7 | 95.0 | 95.2 | 95.3 |
| | | 36 | 0.2120 | 0.2125 (h) | 0.0399 | 0.0397 | 94.0 | 94.9 | 94.8 | 95.1 |
| | | 42 | 0.1605 | 0.1611 (i) | 0.0368 | 0.0363 | 93.7 | 94.8 | 94.7 | 95.0 |
| | | 48 | 0.1221 | 0.1227 (j) | 0.0338 | 0.0330 | 92.7 | 95.2 | 94.8 | 95.2 |
| 100 | 222.3 | 18 | 0.4391 | 0.4384 | 0.0370 | 0.0367 | 94.6 | 94.8 | 94.8 | 94.9 |
| | | 24 | 0.3563 | 0.3560 | 0.0389 | 0.0386 | 94.6 | 94.8 | 95.0 | 94.9 |
| | | 30 | 0.2779 | 0.2775 | 0.0383 | 0.0379 | 94.5 | 95.2 | 95.0 | 95.3 |
| | | 36 | 0.2120 | 0.2116 | 0.0357 | 0.0355 | 94.4 | 95.1 | 95.1 | 95.3 |
| | | 42 | 0.1605 | 0.1602 (k) | 0.0328 | 0.0325 | 93.7 | 95.2 | 94.9 | 95.3 |
| | | 48 | 0.1221 | 0.1218 (l) | 0.0303 | 0.0296 | 92.8 | 95.1 | 94.5 | 95.2 |

Means are mean values and Empirical Std Dev is Monte Carlo standard deviation over 10,000 replications. Un: untransformed scale, CLL: complementary log–log.

(a) pOR = 0.05, pSD = 0.45, λ1 = 0.001, p1 = 1.9, λ2 = 0.006, p2 = 1.4, λc = 0.0005.
(b) R = 9983. (c) R = 9980. (d) R = 9975. (e) R = 9931. (f) R = 9771. (g) R = 9997. (h) R = 9996. (i) R = 9988. (j) R = 9934. (k) R = 9998. (l) R = 9979.

Overall the simulation results support the validity of the approach. The PFS estimator (1) is seen to have minimal bias. The estimated standard deviation based on Equation (5) was quite close on average to the empirical standard deviation in all cases. The complementary log–log and logit transformations appear to provide the best coverage rates, although some have observed that the latter may be slightly conservative (Link 1984).

6. DISCUSSION

The RDT is conducted in two stages. The first stage treats all patients with the new agent and potentially identifies a sensitive subpopulation. This subpopulation is then evaluated in the second stage in a randomized manner. The efficiency of the RDT relative to a standard, two-arm-randomized trial depends on its ability to identify a sensitive subpopulation (if one exists) and this, in turn, depends on the total sample size and the duration of the run-in period. Stadler (2007) pointed out, for example, that too short a run-in period may reduce the ability to detect an active agent, due to lack of enrichment.

The primary focus of the RDT is on the randomized comparison. Following this analysis, interest may turn to estimating survival in the population of all treated patients measured from the time of initial entry into the trial. This estimate, together with historical control data, would be useful for planning a follow-on phase III trial. At first glance, it would appear that no such estimate is afforded by the RDT design; however, as described above, an estimator can be recovered from the data by piecing together information from the two phases of the trial. While we have focused on estimation of PFS, the method is also applicable to the estimation of overall survival. This would be accomplished by estimating S(t) for t ≤ T by the proportion of survivors at time t (here S(t) denotes overall survival) and extending Equation (1) to

$$\hat{S}(t) = \hat{p}_{OR} \times \hat{S}_{OR}(t - T) + \hat{p}_{SD} \times \hat{S}_{SD}(t - T) + \hat{p}_{PD} \times \hat{S}_{PD}(t - T), \qquad t > T,$$

where p̂PD is the estimated probability of PD during the run-in period and ŜPD(t − T) is the Kaplan–Meier estimate of subsequent survival in patients with disease progression during the run-in. A caveat in oncology trials, however, is that once a patient’s disease progresses they are usually administered additional therapies that may affect their subsequent survival course.

The RDT design is generally set up by fixing the number of patients randomized to obtain enough power for the comparison of the randomized arms. Thus, the method described here assumes that the number of patients with SD, nSD, is fixed by design. This leads to the negative multinomial distribution for the number of responders and early progressors during the run-in period. The differences, though, are minor relative to the results that would be obtained if the total sample size, n, were treated as fixed. As discussed in Section 3, fixing the total sample size instead of the number of SDs changes the variances of the respective probability estimates by a factor of (n−1)/n. That is, the variances for p̂OR and p̂SD in Equations (2) and (3) have n in the denominator instead of (n−1). One could then carry that through in the calculations of the final variance in Equation (5). In fact, another approach to designing RDTs considers n, T, and T2, where T2 is the duration of the postrandomization follow-up period, as “tuning parameters” (Trippa, Rosner, and Muller 2012). Trippa et al. described a Bayesian decision-theoretic approach for choosing these parameters in an optimal manner. For practical sample sizes, the methods presented here should provide appropriate estimates under either set of assumptions.

The method should work well with reasonable sample sizes. Additional simulations run under the first scenario with nSD equal to 40 and 20 (mean n = 67 and 33) continued to have minimal bias and good coverage rates. We did employ a δ-method approximation to obtain the covariance of p̂OR and p̂SD in Equation (4). In addition, Greenwood’s formula is used in Equation (5) to obtain the variance of the Kaplan–Meier estimators. It is well known that when censoring is heavy, Greenwood’s formula can underestimate the variance in the tails of the survival distribution (Peto et al. 1977).

We have not addressed other approaches to predict phase III results from a positive RDT. It would also be desirable to estimate the HR for a standard randomized controlled trial, on the basis of an RDT. This cannot be precisely estimated, since the HR for nonrandomized patients is unknown. However, given the increasing use of the RDT design in oncology, additional statistical work will be important to maximize the value of the ensuing data.

Table 2. Simulation results, second scenario (a). R = 10,000 replications, except as indicated in the footnotes.

| nSD | Mean n | t (weeks) | S(t) | Mean Ŝ(t) | Empirical Std Dev | Mean √V(Ŝ(t)) | Coverage (%): Un | Log | CLL | Logit |
|---|---|---|---|---|---|---|---|---|---|---|
| 60 | 85.7 | 18 | 0.6699 | 0.6693 (b) | 0.0618 | 0.0614 | 93.8 | 93.9 | 95.0 | 95.2 |
| | | 24 | 0.5200 | 0.5190 (b) | 0.0707 | 0.0699 | 94.0 | 94.5 | 95.1 | 95.4 |
| | | 30 | 0.3913 | 0.3901 (c) | 0.0708 | 0.0696 | 93.7 | 94.6 | 94.8 | 95.2 |
| | | 36 | 0.2906 | 0.2896 (d) | 0.0652 | 0.0647 | 93.7 | 95.2 | 95.1 | 95.7 |
| | | 42 | 0.2155 | 0.2145 (e) | 0.0589 | 0.0585 | 93.2 | 95.6 | 95.5 | 96.3 |
| | | 48 | 0.1612 | 0.1616 (f) | 0.0542 | 0.0542 | 93.3 | 96.2 | 97.1 | 97.1 |
| 80 | 114.3 | 18 | 0.6699 | 0.6701 | 0.0534 | 0.0532 | 94.0 | 94.2 | 94.8 | 95.0 |
| | | 24 | 0.5200 | 0.5204 | 0.0614 | 0.0606 | 94.3 | 94.4 | 94.9 | 95.2 |
| | | 30 | 0.3913 | 0.3922 | 0.0614 | 0.0605 | 94.2 | 94.8 | 95.0 | 95.2 |
| | | 36 | 0.2906 | 0.2911 | 0.0568 | 0.0563 | 93.6 | 94.8 | 94.8 | 95.2 |
| | | 42 | 0.2155 | 0.2158 (g) | 0.0519 | 0.0510 | 93.2 | 94.7 | 94.6 | 95.1 |
| | | 48 | 0.1612 | 0.1620 (h) | 0.0472 | 0.0462 | 93.1 | 95.5 | 95.4 | 96.0 |
| 100 | 142.8 | 18 | 0.6699 | 0.6707 | 0.0484 | 0.0477 | 94.2 | 93.9 | 94.8 | 95.0 |
| | | 24 | 0.5200 | 0.5202 | 0.0546 | 0.0543 | 94.5 | 94.5 | 95.0 | 95.3 |
| | | 30 | 0.3913 | 0.3913 | 0.0542 | 0.0542 | 94.7 | 95.1 | 95.3 | 95.5 |
| | | 36 | 0.2906 | 0.2902 | 0.0508 | 0.0504 | 94.3 | 95.0 | 95.0 | 95.3 |
| | | 42 | 0.2155 | 0.2149 (i) | 0.0456 | 0.0456 | 93.8 | 95.2 | 94.9 | 95.3 |
| | | 48 | 0.1612 | 0.1611 (j) | 0.0422 | 0.0412 | 92.9 | 95.1 | 94.8 | 95.3 |

Means are mean values and Empirical Std Dev is Monte Carlo standard deviation over 10,000 replications. Un: untransformed scale, CLL: complementary log–log.

(a) pOR = 0.10, pSD = 0.70, λ1 = 0.002, p1 = 1.8, λ2 = 0.007, p2 = 1.3, λc = 0.0005.
(b) R = 9996. (c) R = 9993. (d) R = 9991. (e) R = 9948. (f) R = 9696. (g) R = 9989. (h) R = 9874. (i) R = 9998. (j) R = 9952.

Contributor Information

Theodore G. Karrison, Email: tkarrison@health.bsd.uchicago.edu, Department of Health Studies, University of Chicago, 5841 S. Maryland Ave., MC2007, Chicago, IL 60637.

Mark J. Ratain, Email: mratain@medicine.bsd.uchicago.edu, Department of Medicine, University of Chicago, 5841 S. Maryland Ave., MC2115, Chicago, IL 60637.

Walter M. Stadler, Email: wmstadler@medicine.bsd.uchicago.edu, Department of Medicine, University of Chicago, 5841 S. Maryland Ave., MC2115, Chicago, IL 60637.

Gary L. Rosner, Email: grosner1@johnshopkins.edu, Department of Medicine, Johns Hopkins University, 550 N. Broadway, Suite 1103, Baltimore, MD 21205.

References

  1. Bishop YMM, Fienberg SE, Holland PW. Discrete Multivariate Analysis. Cambridge: MIT Press; 1975.
  2. Brookmeyer R, Crowley J. A Confidence Interval for the Median Survival Time. Biometrics. 1982;38:29–41.
  3. Capra W. Comparing the Power of the Discontinuation Design to That of the Classic Randomized Design on Time-to-Event Endpoints. Controlled Clinical Trials. 2004;25:168–177. doi: 10.1016/j.cct.2003.11.005.
  4. Escudier B, Eisen T, Stadler WM, Szczylik C, Oudard S, Siebels M, Negrier S, Chevreau C, Solska E, Desai AA, Rolland F, Demkow T, Hutson TE, Gore M, Freeman S, Schwartz B, Shan M, Simantov R, Bukowski RM; TARGET Study Group. Sorafenib in Advanced Clear-Cell Renal-Cell Carcinoma. New England Journal of Medicine. 2007;356:125–134. doi: 10.1056/NEJMoa060655.
  5. Freidlin B, Simon R. Evaluation of Randomized Discontinuation Design. Journal of Clinical Oncology. 2005;23:5094–5098. doi: 10.1200/JCO.2005.02.520.
  6. Fu P, Dowlati A, Schluchter M. Comparison of Power Between Randomized Discontinuation Design and Upfront Randomization Design on Progression-Free Survival. Journal of Clinical Oncology. 2009;27:4135–4141. doi: 10.1200/JCO.2008.19.6709.
  7. Haldane JBS. On a Method of Estimating Frequencies. Biometrika. 1945;33:222–225. doi: 10.1093/biomet/33.3.222.
  8. Kaplan EL, Meier P. Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association. 1958;53:457–481.
  9. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer-Verlag; 1997.
  10. Kopec JA, Abrahamowicz M, Esdaile JM. Randomized Discontinuation Trials: Utility and Efficiency. Journal of Clinical Epidemiology. 1993;46:959–971. doi: 10.1016/0895-4356(93)90163-u.
  11. Korn EL, Arbuck SG, Pluda JM, Simon R, Kaplan RS, Christian MC. Clinical Trial Designs for Cytostatic Agents: Are New Approaches Needed? Journal of Clinical Oncology. 2001;19:265–272. doi: 10.1200/JCO.2001.19.1.265.
  12. Leber PD, Davis CS. Threats to the Validity of Clinical Trials Employing Enrichment Strategies for Sample Selection. Controlled Clinical Trials. 1998;19:178–187. doi: 10.1016/s0197-2456(97)00118-9.
  13. Link C. Confidence Intervals for the Survival Function using Cox’s Proportional-Hazard Model With Covariates. Biometrics. 1984;40:601–610.
  14. Peto R, Pike MC, Armitage P, Breslow NE, Cox DR, Howard SV, Mantel N, McPherson K, Peto J, Smith PG. Design and Analysis of Randomized Clinical Trials Requiring Prolonged Observation of Each Patient, II: Analysis and Examples. British Journal of Cancer. 1977;35:1–39. doi: 10.1038/bjc.1977.1. Statistical Note 6.
  15. Ratain MJ, Eisen T, Stadler WM, Flaherty KT, Kaye SB, Rosner GL, Gore M, Desai AA, Patnaik A, Xiong HQ, Rowinsky E, Abbruzzese JL, Xia C, Simantov R, Schwartz B, O’Dwyer PJ. Phase II Placebo-Controlled Randomized Discontinuation Trial of Sorafenib in Patients With Metastatic Renal Cell Carcinoma. Journal of Clinical Oncology. 2006;24:2505–2512. doi: 10.1200/JCO.2005.03.6723.
  16. Rosner GL, Stadler W, Ratain MJ. Randomized Discontinuation Design: Application to Cytostatic Antineoplastic Agents. Journal of Clinical Oncology. 2002;20:4478–4484. doi: 10.1200/JCO.2002.11.126.
  17. Stadler WS. The Randomized Discontinuation Trial: A Phase II Design to Assess Growth-Inhibitory Agents. Molecular Cancer Therapeutics. 2007;6:1180–1184. doi: 10.1158/1535-7163.MCT-06-0249.
  18. Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, Verweij J, Van Glabbeke M, van Oosterom AT, Christian MC, Gwyther SG. New Guidelines to Evaluate the Response to Treatment in Solid Tumors. Journal of the National Cancer Institute. 2000;92:205–216. doi: 10.1093/jnci/92.3.205.
  19. Trippa L, Rosner GL, Muller P. Bayesian Enrichment Strategies for Randomized Discontinuation Trials. Biometrics. 2012;68:203–212. doi: 10.1111/j.1541-0420.2011.01623.x.
