Author manuscript; available in PMC: 2015 Feb 1.
Published in final edited form as: Clin Trials. 2013 Sep 30;11(1):38–48. doi: 10.1177/1740774513500589

A Bayesian Adaptive Phase I-II Clinical Trial for Evaluating Efficacy and Toxicity with Delayed Outcomes

Joseph S Koopmeiners 1,3, Jaime Modiano 2,3
PMCID: PMC3946437  NIHMSID: NIHMS507175  PMID: 24082004

Abstract

Background

In traditional phase I oncology trials, the safety of a new chemotherapeutic agent is tested in a dose escalation study to identify the maximum tolerated dose, which is defined as the highest dose with acceptable toxicity. An alternate approach is to jointly model toxicity and efficacy and allow dose finding to be directed by a pre-specified trade-off between efficacy and toxicity. With this goal in mind, several designs have been proposed to jointly model toxicity and efficacy in a phase I-II dose escalation study. A factor limiting the use of these designs is that toxicity and efficacy must be observed in a timely manner.

Purpose

One approach to overcoming this problem is to model toxicity and efficacy as time-to-event outcomes. This would allow new subjects to be enrolled before full information is available for previous subjects while incorporating partial information when adaptively assigning new subjects to a dose level.

Methods

We propose a phase I-II dose escalation study for evaluating toxicity and efficacy with delayed outcomes by jointly modeling toxicity and efficacy as time-to-event outcomes. We apply our proposed design to a phase I-II clinical trial of a novel targeted toxin for canine hemangiosarcoma.

Results

Our simulation results show that our design identifies the optimal dose at a similar rate to dose finding that treats toxicity and efficacy as binary outcomes but with a substantial savings in study duration.

Limitations

Our proposed design has acceptable operating characteristics and dramatically reduces the trial duration compared to a design that considers toxicity and efficacy as binary outcomes but comes at the cost of enrolling additional subjects when all dose levels are unacceptable.

Conclusions

We developed a novel phase I-II design that accounts for delayed outcomes by modeling toxicity and efficacy as time-to-event outcomes. Our design has similar operating characteristics to efficacy/toxicity trade-off designs that consider efficacy and toxicity as binary outcomes but with a dramatically shorter study duration.

Keywords: Bayesian adaptive design, Dose finding, Phase 1 clinical trial

1 Introduction

In traditional phase I oncology trials, the safety of a new chemotherapeutic agent is tested in a dose escalation study to identify the maximum tolerated dose (MTD), which is defined as the highest dose with acceptable toxicity. Phase I dose escalation studies can be broadly classified into two groups: rule-based designs and model-based designs. Rule-based designs, of which the 3+3 design is the most common example [1], rely on a simple algorithm to guide dose escalation and identify the MTD. In contrast, model-based designs utilize a simple, typically parametric, model for the probability of toxicity, which guides dose escalation and identifies the MTD. The most common model-based design is the continual reassessment method (CRM) [2]. The CRM uses a one-parameter model for the probability of toxicity and assigns subjects to dose levels based on the current estimate of the maximum tolerated dose using the outcomes for all previous subjects. The CRM has since been adapted to a variety of scenarios to address its shortcomings and meet the needs of clinicians who run phase I oncology trials [3].

An alternate approach is to jointly model toxicity and efficacy and allow dose finding to be directed by a pre-specified trade-off between efficacy and toxicity. With this goal in mind, several phase I-II designs have been proposed to jointly model toxicity and efficacy in phase I-II dose escalation studies. Braun [4] uses a copula model to jointly model toxicity and efficacy. Thall and Cook [5] take a similar approach to jointly modeling efficacy and toxicity and introduce a contour for considering the trade-off between toxicity and efficacy. Zhang et al. [6] propose a trivariate CRM that models toxicity and efficacy as an ordered categorical variable with three levels: no toxicity or efficacy, efficacy without toxicity and toxicity with or without efficacy. A factor limiting the use of these designs is that toxicity and efficacy must be observed in a timely manner. This is particularly problematic for the efficacy outcome, which is often measured over a longer time-frame than the toxicity outcome. In this case, toxicity and efficacy must be modeled as time-to-event outcomes in order to avoid unnecessary delays in enrolling subjects due to the delay in observing outcomes for the previous cohorts.

Consider the following motivating example. Researchers at the University of Minnesota College of Veterinary Medicine would like to complete a phase I-II clinical trial of a novel targeted toxin for canine hemangiosarcoma. Ideally, dose finding would be guided by a toxicity/efficacy trade-off because it is anticipated that there is a dose where further escalation will increase toxicity without increasing efficacy. Toxicity will be recorded and graded according to the Veterinary Cooperative Oncology Group (VCOG) criteria [7]. Adverse events are recorded on a 1-5 scale where severe, unacceptable adverse events are graded as 4 and lethal events are graded as 5. Dose limiting toxicities will be defined in this trial as any grade 4 or 5 toxicity in the first 28 days. Clinical response, measured by diagnostic imaging, is usually used in phase I-II designs that consider a toxicity/efficacy trade-off, but imaging of hemangiosarcoma is complicated by the vascularity of the tumor and the frequent occurrence of bleeding/clotting episodes that can add “mass” to a tumor without reflecting tumor growth. Instead, efficacy for this disease is defined as 6-month overall survival, which was chosen for comparison to the median overall survival of six months achievable with the standard of care [8]. This poses an obvious problem. A 6-month delay between cohorts is unacceptably long and would render the trial impractical. Modeling efficacy as a time-to-event outcome would allow additional cohorts to be enrolled during the 6-month follow-up and allow dose finding to be based on all information available at the time of enrollment.

Several approaches have been proposed for adapting the CRM to accommodate time-to-event outcomes for late-onset toxicities. Cheung and Chappell [9] introduce the time-to-event CRM (TITE-CRM), which models the probability of toxicity over time using a weight function and allows partial information to be incorporated into dose finding. Braun [10] takes a cure-rate model approach and shows that the cure-model likelihood is identical to the TITE-CRM likelihood derived by Cheung and Chappell [9]. Thall et al. [11] discuss a “look-ahead” approach and compare this to an approach where dose escalation decisions are based on all available data at subject enrollment. Finally, Bekele et al. [12] use predictive probabilities to control subject accrual in order to avoid excess toxicities in the event of late-onset toxicities.

In this manuscript, we present a phase I-II clinical trial for evaluating efficacy and toxicity with delayed outcomes. We model the time-to-toxicity with a cure-rate model, following the approach of Braun [10], and model the time-to-death with a mixture distribution that results in a similar parameterization to the cure-rate model for toxicity. A joint probability model for the time-to-toxicity and the time-to-death is developed using a copula model. Dose finding based on a trade-off between toxicity and efficacy is completed using the algorithm proposed by Thall and Cook [5]. Our simulation results show that our design identifies the optimal dose at a similar rate to dose finding that treats toxicity and efficacy as binary endpoints but with a substantial savings in study duration.

The remainder of our manuscript proceeds as follows. In Section 2, we present a joint model for toxicity and efficacy with delayed outcomes and describe our dose-finding algorithm. In Section 3, we apply our proposed design to a phase I-II clinical trial of a novel targeted toxin for hemangiosarcoma and evaluate the operating characteristics of our design by simulation. Finally, we conclude with a brief discussion in Section 4.

2 Modeling Efficacy and Toxicity with Delayed Outcomes

2.1 Joint Probability Model

In this section, we propose a joint probability model for the time-to-toxicity and the time-to-death. To accomplish this, we specify univariate models for toxicity and death and derive a joint probability model using a copula approach. We begin by specifying a univariate model for toxicity.

Let XT be the time-to-toxicity and let HT be the horizon for measuring toxicity. In our motivating example, HT is 28 days and we are interested in estimating the probability of XT ≤ HT. Following the approach of Braun [10], we assume a cure rate model for XT,

$$F_T(x_T \mid d) = \theta_T(d)\, F_{T,1}(x_T),$$

where FT,1(xT) = 0 for xT ≤ 0 and FT,1(xT) = 1 for xT ≥ HT. In a standard cure model, 1 − θT(d) is the fraction of the population that is cured. In our case, θT(d) is the probability of experiencing toxicity before HT at dose level d. This is our pre-determined parameter of interest and dose escalation will be, in part, based on estimates of θT(d). The relationship between dose and the probability of toxicity is often modeled using a simple, one-parameter model in early phase clinical trials but, in this case, we use a two-parameter logistic regression model,

$$\operatorname{logit}(\theta_T(d)) = \beta_{0,T} + \beta_{1,T}(d - 1).$$

We note that subtracting one from the dose results in β0,T equaling the log odds of toxicity at dose one, which facilitates prior specification for β0,T. At this point, we choose to leave FT,1(xT) unspecified and will discuss potential options for parametric forms of FT,1(xT) below.
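As a minimal sketch of the cure-rate model and the logistic dose-toxicity model above, the following Python evaluates θT(d) and FT(x | d). The coefficient values and the uniform choice for FT,1 are illustrative assumptions only; they are not estimates or settings from the trial.

```python
import math

def expit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))

def theta_T(d, beta0, beta1):
    """P(toxicity before H_T) at dose level d: logit(theta_T) = beta0 + beta1 * (d - 1)."""
    return expit(beta0 + beta1 * (d - 1))

def F_T1_uniform(x, H_T):
    """Assumed form of F_{T,1}: uniform on (0, H_T), so 0 at x <= 0 and 1 at x >= H_T."""
    return min(max(x / H_T, 0.0), 1.0)

def F_T(x, d, beta0, beta1, H_T=28.0):
    """Cure-rate CDF: F_T(x | d) = theta_T(d) * F_{T,1}(x)."""
    return theta_T(d, beta0, beta1) * F_T1_uniform(x, H_T)

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -3.0, 1.0
for d in range(1, 5):
    print(d, round(theta_T(d, beta0, beta1), 3), round(F_T(14.0, d, beta0, beta1), 3))
```

Because the dose enters as d − 1, `theta_T(1, beta0, beta1)` depends on `beta0` alone, which is the property the text uses to simplify prior specification.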

Ideally, we would follow the same approach for modeling the time-to-death because the cure model results in a useful parameterization for dose finding. Unfortunately, the cure model is inappropriate for modeling survival in our case. Instead, we take a mixture approach, which results in a similar parameterization that easily facilitates dose finding. Let XS be time-to-death and let HS be the horizon for measuring survival (6-months in our motivating example). Our primary interest lies in estimating the probability of XS>HS. We assume the following mixture model for XS,

$$F_S(x_S \mid d) = \theta_S(d)\, F_{S,1}(x_S) + (1 - \theta_S(d))\, F_{S,2}(x_S),$$

where FS,1(xS) = 0 for xS ≤ 0, FS,1(xS) = 1 for xS ≥ HS, FS,2(xS) = 0 for xS ≤ HS and FS,2(xS) → 1 as xS → ∞. In this case, θS(d) is the probability of XS ≤ HS at dose level d. This distribution can be alternately expressed as a conditional distribution where 1{XS ≤ HS} is a Bernoulli random variable with success probability θS(d), XS follows FS,1(xS) if XS ≤ HS and XS follows FS,2(xS) if XS > HS.

The advantage of this mixture approach is that we can directly model the probability of XS ≤ HS and base dose finding on current estimates of θS(d). We also model θS(d) using a logistic regression model but this time include a quadratic term,

$$\operatorname{logit}(1 - \theta_S(d)) = \beta_{0,S} + \beta_{1,S}(d - 1) + \beta_{2,S}(d - 1)^2.$$

We include a quadratic term for the probability of 6-month survival to account for the possibility that a dose exists such that further escalation would increase the probability of toxicity but would not result in a survival benefit. We note that our efficacy outcome, XS>HS, occurs when an event is not observed before HS, which is why we specify our logistic regression model on (1 − θS (d)). This implies that we can not observe our efficacy outcome until a subject has survived beyond HS. This is in contrast to the typical phase I-II setting where efficacy is treatment response, in which case all information for efficacy is fully observed once response occurs. This does not impact our ability to model efficacy statistically but does result in longer trial durations.
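To illustrate why the quadratic term matters, the short Python sketch below evaluates the 6-month survival probability 1 − θS(d) under the quadratic logistic model. The coefficients are hypothetical and were picked only so that the survival gain shrinks as the dose increases.

```python
import math

def expit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))

def survival_prob(d, b0, b1, b2):
    """P(X_S > H_S) = 1 - theta_S(d), with logit(1 - theta_S) = b0 + b1*(d-1) + b2*(d-1)^2."""
    u = d - 1
    return expit(b0 + b1 * u + b2 * u * u)

# Hypothetical coefficients: a negative quadratic term flattens the dose-survival curve.
b0, b1, b2 = -0.5, 1.2, -0.2
probs = [survival_prob(d, b0, b1, b2) for d in range(1, 5)]
gains = [probs[j + 1] - probs[j] for j in range(3)]
print([round(p, 3) for p in probs])
print([round(g, 3) for g in gains])
```

With these illustrative values each dose step adds less survival benefit than the previous one, which is the plateau behavior the quadratic term is meant to accommodate.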

We model the joint behavior of XT and XS using the Gumbel Copula [13]. Briefly, a copula model specifies a joint distribution on the unit square, from which a joint distribution of any two random variables can be derived using an inverse transformation. The joint CDF for the Gumbel Copula is,

$$F(x_T, x_S) = F_T(x_T)\, F_S(x_S) + \kappa\, F_T(x_T)\,(1 - F_T(x_T))\, F_S(x_S)\,(1 - F_S(x_S)),$$

where κ quantifies the correlation between XT and XS. κ takes values between -1 and 1, with κ = 0 implying independence, κ> 0 implying positive correlation and κ< 0 implying negative correlation. The joint PDF for the Gumbel Copula is

$$f(x_T, x_S) = f_T(x_T)\, f_S(x_S) + \kappa\, f_T(x_T)\,(1 - 2F_T(x_T))\, f_S(x_S)\,(1 - 2F_S(x_S)),$$

where fT(xT) and fS(xS) are the PDFs of FT(xT) and FS(xS), respectively.
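The copula formulas above can be sketched in a few lines of Python. Here u and v stand for the marginal CDF values FT(xT) and FS(xS); the function names are ours and purely illustrative. This functional form is also commonly referred to as the Farlie-Gumbel-Morgenstern copula.

```python
def copula_cdf(u, v, kappa):
    """Joint CDF on the unit square: u*v + kappa*u*(1-u)*v*(1-v)."""
    return u * v + kappa * u * (1.0 - u) * v * (1.0 - v)

def copula_density(u, v, kappa):
    """Mixed partial derivative of copula_cdf: 1 + kappa*(1-2u)*(1-2v)."""
    return 1.0 + kappa * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)

def joint_pdf(f_T, f_S, F_T, F_S, kappa):
    """Joint PDF of (X_T, X_S): f_T * f_S * copula density at (F_T, F_S)."""
    return f_T * f_S * copula_density(F_T, F_S, kappa)

# kappa = 0 recovers independence; the margins are unchanged for any kappa.
print(copula_cdf(0.3, 0.6, 0.0), copula_cdf(0.3, 1.0, 0.8))
```

Since |(1 − 2u)(1 − 2v)| ≤ 1 on the unit square, the density stays non-negative whenever κ lies in [−1, 1], which matches the stated range for κ.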

Let (xT,1, xS,1) , (xT,2, xS,2) ,…,(xT,n, xS,n) be pairs of time-to-toxicity and time-to-death, y1, y2,…,yn be the length of time each subject has been followed on the study and d1, d2,…,dn be the dose-level assigned to each subject. Subjects will be followed until HS and subjects still alive at HS will be considered censored. There are four potential likelihood contributions that depend on xT,i, xS,i and yi:

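  • Scenario 1: xT,i ≤ yi and xS,i ≤ yi
    $$L_{1,i} = \theta_T(d_i) f_{T,1}(x_{T,i})\, \theta_S(d_i) f_{S,1}(x_{S,i}) \left(1 + \kappa \left(1 - 2\theta_T(d_i) F_{T,1}(x_{T,i})\right)\left(1 - 2\theta_S(d_i) F_{S,1}(x_{S,i})\right)\right)$$
  • Scenario 2: xT,i > yi and xS,i ≤ yi
    $$L_{2,i} = \left(1 - \theta_T(d_i) F_{T,1}(y_i)\right) \theta_S(d_i) f_{S,1}(x_{S,i}) \left(1 - \kappa\, \theta_T(d_i) F_{T,1}(y_i) \left(1 - 2\theta_S(d_i) F_{S,1}(x_{S,i})\right)\right)$$
  • Scenario 3: xT,i ≤ yi and xS,i > yi
    $$L_{3,i} = \theta_T(d_i) f_{T,1}(x_{T,i}) \left(1 - \theta_S(d_i) F_{S,1}(y_i)\right) \left(1 - \kappa \left(1 - 2\theta_T(d_i) F_{T,1}(x_{T,i})\right) \theta_S(d_i) F_{S,1}(y_i)\right)$$
  • Scenario 4: xT,i > yi and xS,i > yi
    $$L_{4,i} = \left(1 - \theta_T(d_i) F_{T,1}(y_i)\right)\left(1 - \theta_S(d_i) F_{S,1}(y_i)\right) \left(1 + \kappa\, \theta_T(d_i) F_{T,1}(y_i)\, \theta_S(d_i) F_{S,1}(y_i)\right)$$

The full likelihood can be expressed as:

$$L(\beta \mid x_T, x_S, y) = \prod_{i=1}^{n} L_{1,i}^{1[x_{T,i} \le y_i,\, x_{S,i} \le y_i]}\; L_{2,i}^{1[x_{T,i} > y_i,\, x_{S,i} \le y_i]}\; L_{3,i}^{1[x_{T,i} \le y_i,\, x_{S,i} > y_i]}\; L_{4,i}^{1[x_{T,i} > y_i,\, x_{S,i} > y_i]}.$$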
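The four likelihood contributions can be sketched as a single Python function. This is an illustrative sketch, not the authors' implementation: the argument layout and the example records are our own assumptions. Each conditional CDF is evaluated at the event time when the event has been observed and at the follow-up time yi otherwise, and the densities are ignored for censored outcomes.

```python
import math

def contribution(tox_seen, death_seen, th_T, th_S, F_T1, F_S1, f_T1, f_S1, kappa):
    """Likelihood contribution of one subject under the copula model.

    F_T1, F_S1: F_{T,1}, F_{S,1} at the event time if observed, else at follow-up y_i.
    f_T1, f_S1: conditional densities at the event time (unused if censored).
    """
    u = th_T * F_T1  # marginal CDF of time-to-toxicity
    v = th_S * F_S1  # marginal CDF of time-to-death
    if tox_seen and death_seen:   # Scenario 1
        return th_T * f_T1 * th_S * f_S1 * (1 + kappa * (1 - 2 * u) * (1 - 2 * v))
    if death_seen:                # Scenario 2: toxicity censored
        return (1 - u) * th_S * f_S1 * (1 - kappa * u * (1 - 2 * v))
    if tox_seen:                  # Scenario 3: death censored
        return th_T * f_T1 * (1 - v) * (1 - kappa * (1 - 2 * u) * v)
    return (1 - u) * (1 - v) * (1 + kappa * u * v)  # Scenario 4: both censored

def log_likelihood(records, kappa):
    """Sum of log contributions over subjects."""
    return sum(math.log(contribution(*r, kappa)) for r in records)

# Hypothetical records with uniform F_{T,1}, F_{S,1} (H_T = 28 days, H_S = 182 days).
records = [
    (False, False, 0.2, 0.4, 1.0, 60 / 182, 0.0, 0.0),          # followed 60 days, no events
    (True, False, 0.2, 0.4, 10 / 28, 90 / 182, 1 / 28, 0.0),    # toxicity on day 10
    (True, True, 0.2, 0.4, 10 / 28, 120 / 182, 1 / 28, 1 / 182) # toxicity day 10, death day 120
]
print(log_likelihood(records, kappa=0.3))
```

Setting `kappa=0` makes each contribution factor into a toxicity term times a survival term, which is the independence case noted in the text.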

There are several important characteristics of this likelihood that are worth noting. First, by censoring subjects at time HS, we need not specify the form of FS,2(xS) as the likelihood does not depend on FS,2(xS). As a result, the likelihood contribution for toxicity and the likelihood contribution for survival have the same form. By specifying a mixture model for XS, we are able to derive a model that is functionally equivalent to the cure-rate model specified for toxicity but without the unrealistic assumption that a fraction of the population has been cured and is not at risk. Second, Braun [10] showed that, for the TITE-CRM, θT(di) and fT(xT,i) can be factored completely, which implies that the full conditional for β does not depend on fT(xT,i). That is no longer the case in our joint model unless κ = 0, implying independence between XT and XS. Finally, to this point, we have implicitly assumed that all subjects would survive past HT and that all subjects would eventually be fully observed for the toxicity outcome. It is also possible that a subject could die before HT. This is most likely to occur as a result of treatment, in which case the subject would have already experienced toxicity, but it is also possible that the subject could die from the disease or some other cause not related to treatment. In this case, we can simply censor the toxicity endpoint at the time of death and use the likelihood contribution for Scenario 2.

We must specify priors for all model parameters in order to complete a Bayesian analysis. For our motivating example, we assume normal priors for β0,T and β0,S (N(3, sd = 3) and N (1, 3), respectively), gamma(1/4,1/4) priors for β1,T and β1,S (corresponding to a mean of 1 and a variance of 4) and a N (0, 0.5) prior for β2,S. Our prior distributions for β0,T, β0,S, β1,T and β1,S represent moderately informative priors. In all cases, the prior mean is set equal to our prior expectation for each parameter and the prior variance is set large enough to provide support for all reasonable parameter values. Our prior for β2,S represents strong prior belief against the inclusion of a quadratic term but should allow enough flexibility to accommodate strong departures from linearity in the logistic regression for 6-month survival. Finally, we put a non-informative, uniform(-1,1) prior on κ. We will provide simulation results evaluating the operating characteristics of our study using these prior distributions in Section 3. In general, the prior distributions can have a strong impact on the operating characteristics of a Phase I-II study and we recommend that investigators complete a thorough simulation study to evaluate the sensitivity of the operating characteristics to various prior specifications before implementing our design.
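The stated moments of the gamma(1/4, 1/4) prior follow from the shape-rate parameterization, which the sketch below checks both analytically and by simulation. Reading the second argument as a rate is our assumption, made because it reproduces the stated mean of 1 and variance of 4. The seed and sample size are arbitrary, and only the slope and κ priors are drawn here.

```python
import random

shape, rate = 0.25, 0.25
prior_mean = shape / rate        # shape / rate
prior_var = shape / rate ** 2    # shape / rate^2

rng = random.Random(2013)        # arbitrary seed for reproducibility
n = 100_000
# random.gammavariate takes (shape, scale), where scale = 1 / rate.
slopes = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n)]
kappas = [rng.uniform(-1.0, 1.0) for _ in range(n)]

sample_mean = sum(slopes) / n
sample_var = sum((s - sample_mean) ** 2 for s in slopes) / n
print(prior_mean, prior_var)
print(round(sample_mean, 2), round(sample_var, 2))
```

A gamma prior keeps the slope coefficients non-negative, so the modeled probability of toxicity cannot decrease with dose.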

To this point, we have not specified the form of FT,1(xT) and FS,1(xS). We will consider two parametric forms for FT,1(xT) and FS,1(xS) and compare their performance in a variety of scenarios through simulation. The first approach is to assume uniform distributions for FT,1(xT) and FS,1(xS): XT ∼ Unif(0, HT) and XS ∼ Unif(0, HS). The weight function investigated by Cheung and Chappell [9] results from assuming a uniform distribution for the time-to-toxicity. The second is to follow the approach of Braun [10] and assume Beta(aT, 1) and Beta(aS, 1) distributions for XT/HT and XS/HS, respectively. This should provide more flexibility than the uniform distribution, which is particularly important for the case of late-onset toxicity. We place an Exponential(1) prior on both aT and aS when fitting this model.
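The two candidate forms can be compared directly, since a Beta(a, 1) variable has CDF x^a on [0, 1]. The sketch below, with function names of our own choosing, evaluates both conditional CDFs at a given follow-up time; a = 1 recovers the uniform case.

```python
def cdf_uniform(y, H):
    """F_1(y) for a uniform event time on (0, H), clipped to [0, 1]."""
    return min(max(y / H, 0.0), 1.0)

def cdf_beta(y, H, a):
    """F_1(y) when X / H ~ Beta(a, 1): (y / H) ** a on [0, H]."""
    return min(max(y / H, 0.0), 1.0) ** a

# A subject followed for 14 of the 28 toxicity days without an event.
print(cdf_uniform(14.0, 28.0))    # half of the risk period has elapsed
print(cdf_beta(14.0, 28.0, 3.0))  # far less of the risk has elapsed if events occur late
```

With a > 1 the conditional CDF rises slowly at first, so a partially followed subject contributes less evidence of safety than under the uniform form. This is why the beta form is better suited to late-onset toxicity.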

2.2 Dose Finding Algorithm

Several approaches to evaluating the trade-off between efficacy and toxicity have been proposed in the literature [4, 14, 5]. Braun [4] proposed a weighted Euclidean distance between the estimated probabilities of efficacy and toxicity and a set of target probabilities. Yin et al. [14] evaluate the trade-off between efficacy and toxicity using odds ratios. Finally, Thall and Cook [5] propose an efficacy/toxicity contour for evaluating the trade-off between efficacy and toxicity. The efficacy/toxicity contour proposed by Thall and Cook [5] is appealing in that it provides an intuitive framework for considering the trade-off between efficacy and toxicity and allows for great flexibility when designing a study. We will follow their approach in designing our study.

Let θT,max be the maximum acceptable probability of toxicity assuming all subjects survive past HS and θS,min be the minimum acceptable probability of survival assuming no toxicity. Thall and Cook [5] use a weighted Lp norm to evaluate the toxicity/efficacy trade-off for a dose, dj,

$$\delta_j = 1 - \left(\left(\frac{\theta_T(d_j)}{\theta_{T,\max}}\right)^p + \left(\frac{1 - \theta_S(d_j)}{1 - \theta_{S,\min}}\right)^p\right)^{1/p},$$

where p is determined by identifying a probability of toxicity and probability of survival combination, (θT,θS), that is equally desirable to (0, θS,min) and (θT,max, 1). p is determined by setting

$$\left(\left(\frac{\theta_T}{\theta_{T,\max}}\right)^p + \left(\frac{1 - \theta_S}{1 - \theta_{S,\min}}\right)^p\right)^{1/p} = 1$$

and solving for p. In our example, we set θT,max = 0.50, θS,min = 0.55 and (θT,θS) = (0.40, 0.70), which results in p = 2.27.
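The value of p and the resulting desirability δ can be reproduced numerically. The sketch below solves for p by bisection, which is our choice of root finder and not necessarily the one used by the authors. As in the targets quoted above, the second argument is the probability of 6-month survival.

```python
def lp_norm(tox, surv, p, tox_max=0.50, surv_min=0.55):
    """Weighted L^p norm of (tox / tox_max, (1 - surv) / (1 - surv_min))."""
    return ((tox / tox_max) ** p + ((1 - surv) / (1 - surv_min)) ** p) ** (1 / p)

def solve_p(tox, surv, lo=1.0, hi=10.0, tol=1e-10):
    """Bisection for the p at which (tox, surv) lies on the contour (norm = 1).

    The L^p norm is non-increasing in p, so the norm exceeds 1 below the root.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lp_norm(tox, surv, mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def desirability(tox, surv, p):
    """delta = 1 - norm; positive values lie on the acceptable side of the contour."""
    return 1.0 - lp_norm(tox, surv, p)

p = solve_p(0.40, 0.70)
print(round(p, 2))  # approximately 2.27, matching the text
print(round(desirability(0.27, 0.71, p), 2))
```

The last line evaluates a dose with a 0.27 probability of toxicity and a 0.71 probability of survival, which falls on the acceptable side of the contour (δ > 0).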

In the context of our trial, we update the posterior whenever a new cohort is ready to be enrolled using all data available at that time. A dose level, dj, is considered acceptable if the posterior probability of δj> 0 exceeds a pre-specified threshold, π1,

$$p(\delta_j > 0 \mid y_S, y_T, z, d) > \pi_1. \qquad (1)$$

The study terminates for futility if no dose levels are considered acceptable. Otherwise, the acceptable dose level with maximum δj is considered optimal and the next cohort will be treated at that dose level under the restriction that untried dose levels may not be skipped when escalating. The trial continues until the maximum sample size has been reached and the dose with maximum δj at study completion is declared the optimal dose.

In conclusion, our proposed dose-finding algorithm is as follows:

  1. Treat the first cohort of m patients at the lowest dose level.

  2. Update the posterior distribution using all available data when the next cohort is to be enrolled.

  3. Identify acceptable dose levels using criterion (1). The trial terminates for futility if no dose levels are acceptable.

  4. Calculate δj for dose levels j = 1,…,J.

  5. Treat the next cohort at the current estimate of the optimal dose under the restriction that untried dose levels may not be skipped when escalating.

  6. The study continues until termination or the maximum sample size is reached. The acceptable dose with maximum δj at study completion is declared the optimal dose.
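Steps 3 to 5 of the algorithm can be sketched as follows, given posterior draws of δj for each dose level. This is a simplified illustration under stated assumptions: the draws are taken as given (no MCMC is shown), the posterior mean of δj is used as its point estimate, and a dose beyond the next untried level is simply capped at that level. The function name and inputs are hypothetical.

```python
def choose_next_dose(delta_draws, highest_tried, pi1=0.05):
    """Pick the dose index (0-based) for the next cohort, or None to stop for futility.

    delta_draws: one list of posterior draws of delta_j per dose level.
    highest_tried: index of the highest dose level tried so far.
    """
    # Step 3: a dose is acceptable if P(delta_j > 0 | data) exceeds pi1.
    prob_pos = [sum(d > 0 for d in draws) / len(draws) for draws in delta_draws]
    acceptable = [j for j, pr in enumerate(prob_pos) if pr > pi1]
    if not acceptable:
        return None
    # Step 4: point estimate of delta_j (posterior mean, an assumption of this sketch).
    post_mean = [sum(draws) / len(draws) for draws in delta_draws]
    best = max(acceptable, key=lambda j: post_mean[j])
    # Step 5: do not skip untried dose levels when escalating.
    return min(best, highest_tried + 1)

# Hypothetical posterior draws for four dose levels.
draws = [
    [-0.5, -0.4, -0.6, -0.3],
    [0.1, -0.1, 0.2, 0.1],
    [0.3, 0.2, 0.4, 0.1],
    [-0.2, -0.1, -0.3, -0.2],
]
print(choose_next_dose(draws, highest_tried=0))
```

In this example the third dose level has the largest estimated δ, but only the first level has been tried, so the next cohort is assigned to the second level.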

3 Application

We now return to our motivating example and discuss the application of our proposed design to a phase I-II clinical trial of a novel treatment for canine hemangiosarcoma. Hemangiosarcoma (HSA) is a rapidly fatal disease that can arise in dogs of any age. However, it is more often a disease of middle to old age dogs with alarmingly high predilection for some breeds. Unlike other tumors where there have been recent incremental gains in survival, the optimism voiced at the turn of this century regarding HSA treatments [15] now seems premature. More than 50% of dogs with HSA disease still die within 4 to 6 months of diagnosis, and clinical trials using new combinations of old drugs [16], and new drugs [17] have shown no benefit over the standard of care. Recent data suggest that resistance to conventional and targeted therapies may be due, at least in part, to the fact that the tumors include normal cells in an abnormal niche: the hemangiosarcoma cells seem to direct stromal constituents to undergo proliferation and establish a hypoxic and inflammatory tumor microenvironment that allows them to thrive [18]. We believe that attacking the mechanisms that orchestrate these interactions through targeted therapies will improve the outcomes for dogs with this disease.

Researchers at the University of Minnesota College of Veterinary Medicine would like to complete a Phase I-II clinical trial for canine hemangiosarcoma using a new targeted toxin that considers a trade-off between toxicity and efficacy during dose finding. Dose limiting toxicity is defined as any grade 4 or 5 toxicity within the first 28 days and efficacy will be evaluated by considering 6-month survival. The researchers would like to evaluate four dose levels and are limited to a maximum sample size of thirty dogs. In order to apply our proposed design we must specify the following parameters: θT,max, θS,min, (θT,θS), π1 and the cohort size. θT,max, θS,min and (θT,θS) were set equal to 0.50, 0.55 and (0.40, 0.70), respectively, resulting in p = 2.27 as described in Section 2.2. The cohort size and π1 were chosen by considering several values and evaluating the impact of these parameters on the operating characteristics of our study. We considered cohort sizes of 2 and 3 and π1 equal to 0.05 and 0.10. At first glance, 0.05 and 0.10 may appear small for π1 if π1 represents the minimum probability that a dose level is acceptable for the trial to continue. It may be best to consider π1 from the opposite perspective. A dose level is assumed acceptable unless the posterior probability that it has an unacceptable efficacy/toxicity trade-off is greater than 1 − π1. Therefore, we would only terminate the trial for futility if the posterior probability of an unacceptable trade-off exceeds 0.90 or 0.95 for every dose.

We completed a small simulation study to evaluate the operating characteristics of our study. We considered several hypothetical scenarios in order to identify design parameters that result in good operating characteristics in a variety of settings. For each scenario, data were simulated assuming that either FT,1(xT) and FS,1(xS) were Uniform(0, HT) and Uniform(0, HS), respectively, or that XT/HT and XS/HS follow a Beta(3, 1) distribution conditional on XT ≤ HT or XS ≤ HS, respectively. The latter is the more challenging case where the hazards for XT and XS increase over time. The waiting time between subjects was simulated from an exponential distribution with an average waiting time of one, two or four weeks. We anticipate an average waiting time of two weeks between dogs but consider average waiting times of one and four weeks to evaluate the impact of the waiting time on the operating characteristics of our study. It should be noted that the average waiting time between dogs is only relevant relative to the horizon for evaluating the two outcomes. For example, two weeks corresponds to half of the time required to evaluate the toxicity outcome but only 12.6% of the time required to evaluate the survival outcome. Therefore, the three average waiting times being considered differ substantially relative to the time needed to evaluate toxicity but vary only a small amount relative to the time needed to evaluate the survival outcome. In addition, we also considered the setting where toxicity and 6-month survival were treated as binary outcomes using the design proposed by Thall and Cook [5] in combination with a “look-ahead” method. In the “look-ahead” method, we determine if changing the unobserved outcomes would alter the dose assignment for the next cohort and enroll the next cohort immediately if the dose would be the same regardless of the unobserved outcomes. Otherwise, enrollment of the next cohort is delayed until all unobserved outcomes have been observed. The operating characteristics of the different designs were summarized by calculating the simulated probability of selecting each dose or stopping for futility, the average number of subjects treated at each dose level and the average study duration in weeks. 1,000 simulated studies were completed for each scenario.
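The data-generating steps described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' simulation code: the function names, seed and parameter values are hypothetical, times are in weeks, and toxicity and death are drawn independently here (κ = 0) for simplicity.

```python
import random

def simulate_arrivals(n, mean_wait_weeks, rng):
    """Cumulative enrollment times with exponential waiting times between subjects."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_wait_weeks)  # expovariate takes the rate
        times.append(t)
    return times

def simulate_event_time(prob, horizon, a, rng):
    """Event time before `horizon` with probability `prob`, else None.

    Conditional on an event, time / horizon ~ Beta(a, 1), drawn as U ** (1 / a);
    a = 1 gives the uniform case and a = 3 the late-onset case.
    """
    if rng.random() >= prob:
        return None
    return horizon * rng.random() ** (1.0 / a)

rng = random.Random(1)  # arbitrary seed
arrivals = simulate_arrivals(30, mean_wait_weeks=2.0, rng=rng)
# Hypothetical dose with P(toxicity) = 0.27 and P(death by 6 months) = 0.29.
tox_time = simulate_event_time(0.27, horizon=4.0, a=3.0, rng=rng)
death_time = simulate_event_time(0.29, horizon=26.0, a=3.0, rng=rng)
print(round(arrivals[-1], 1), tox_time, death_time)
```

At each enrollment time, a subject's follow-up is the elapsed time since that subject's arrival, and an event counts as observed only if its simulated time is no later than that follow-up.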

Table 1 presents the five scenarios considered in our simulation study. For each scenario, we present the probability of DLT, the probability of 6-month survival and the corresponding δ for each dose level. In Scenarios 1 - 3, there is only one acceptable dose, which varies from dose 1 to dose 4, in Scenario 4, all doses are excessively toxic and in Scenario 5, all doses have acceptable toxicity but none have acceptable 6-month survival.

Table 1.

Scenarios considered in our simulation studies. Presented for each scenario are the probability of DLT within 28 days, the probability of 6-month survival and the corresponding δ for each dose level.

| Scenario | Measure | Dose 1 | Dose 2 | Dose 3 | Dose 4 |
|---|---|---|---|---|---|
| 1 | P(Toxicity) | 0.05 | 0.12 | 0.27 | 0.50 |
| 1 | P(Efficacy) | 0.38 | 0.55 | 0.71 | 0.83 |
| 1 | δ | -0.38 | -0.22 | 0.19 | -0.05 |
| 2 | P(Toxicity) | 0.38 | 0.52 | 0.67 | 0.79 |
| 2 | P(Efficacy) | 0.77 | 0.82 | 0.86 | 0.89 |
| 2 | δ | 0.12 | -0.09 | -0.36 | -0.59 |
| 3 | P(Toxicity) | 0.02 | 0.07 | 0.15 | 0.31 |
| 3 | P(Efficacy) | 0.12 | 0.25 | 0.45 | 0.67 |
| 3 | δ | -0.96 | -0.67 | -0.24 | 0.08 |
| 4 | P(Toxicity) | 0.62 | 0.75 | 0.85 | 0.91 |
| 4 | P(Efficacy) | 0.27 | 0.40 | 0.55 | 0.69 |
| 4 | δ | -0.96 | -0.93 | -0.91 | -0.91 |
| 5 | P(Toxicity) | 0.03 | 0.08 | 0.18 | 0.38 |
| 5 | P(Efficacy) | 0.18 | 0.25 | 0.33 | 0.43 |
| 5 | δ | -0.82 | -0.67 | -0.51 | -0.43 |

Table 2 presents the simulated operating characteristics of our study assuming π1 = 0.05, cohorts of size 3, an average waiting time of 2 weeks and XT and XS simulated from uniform distributions conditional on experiencing toxicity or death before HT or HS, respectively. Treating efficacy and toxicity as time-to-event outcomes dramatically decreases the duration of the trial with only a moderate decrease in the probability of correctly identifying the optimal dose and the average number of subjects treated at the optimal dose. This is true regardless of whether FT,1(xT) and FS,1(xS) are modeled as a uniform or beta distribution. In Scenario 1, the average study duration decreased from 175.8 weeks for the “look-ahead” method to approximately 80 weeks for the two time-to-event methods with no change in the probability of correctly identifying the optimal dose or the average number of subjects treated at the optimal dose. We observe a similar decrease in the study duration in Scenarios 2 and 3 with little change in the probability of correctly identifying the optimal dose but a modest decrease in the average number of subjects treated at the optimal dose. In Scenarios 4 and 5, there are no acceptable doses and the correct decision is to stop for futility. In Scenario 4, where all doses are over-toxic, the trial stops for futility almost 100% of the time regardless of method. In Scenario 5, where all doses are safe but do not have acceptable efficacy, the time-to-event methods stop for futility less often than the “look-ahead” method, with a more dramatic decrease observed when we assume beta distributions for FT,1(xT) and FS,1(xS). The results for Scenarios 4 and 5 illustrate one limitation of the proposed method. Although the proposed design results in shorter trial duration and terminates for futility at an acceptable rate, the average number of subjects enrolled in the study is actually higher than for the “look-ahead” method because subjects are enrolled immediately upon availability. While we feel that this trade-off is acceptable for our current application, it may not be in all cases and researchers should carefully weigh this trade-off when designing future trials.

Table 2.

Simulated selection probability and average number of subjects treated at each dose with π1 = 0.05, a cohort size of 3, an average waiting time of two weeks and XT and XS simulated from uniform distributions. 1000 simulations were completed for each scenario.

| Scenario | Method | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|---|
| 1 | Uniform | Selection Probability | 0.04 | 0.04 | 0.19 | 0.50 | 0.24 | 80.6 |
| 1 | Uniform | Avg # Pts Treated | | 5.13 | 8.48 | 10.36 | 5.40 | |
| 1 | Beta | Selection Probability | 0.01 | 0.04 | 0.20 | 0.51 | 0.24 | 81.2 |
| 1 | Beta | Avg # Pts Treated | | 4.79 | 9.29 | 11.51 | 4.37 | |
| 1 | Look Ahead | Selection Probability | 0.03 | 0.06 | 0.22 | 0.48 | 0.21 | 177.7 |
| 1 | Look Ahead | Avg # Pts Treated | | 5.11 | 9.26 | 10.22 | 5.01 | |
| 2 | Uniform | Selection Probability | 0.13 | 0.70 | 0.16 | 0.02 | 0.0 | 76.8 |
| 2 | Uniform | Avg # Pts Treated | | 18.8 | 7.07 | 1.30 | 0.20 | |
| 2 | Beta | Selection Probability | 0.08 | 0.76 | 0.14 | 0.02 | 0.0 | 78.9 |
| 2 | Beta | Avg # Pts Treated | | 20.94 | 6.51 | 1.09 | 0.08 | |
| 2 | Look Ahead | Selection Probability | 0.12 | 0.74 | 0.13 | 0.01 | 0.0 | 157.8 |
| 2 | Look Ahead | Avg # Pts Treated | | 22.22 | 4.78 | 0.50 | 0.03 | |
| 3 | Uniform | Selection Probability | 0.16 | 0.0 | 0.01 | 0.06 | 0.77 | 76.9 |
| 3 | Uniform | Avg # Pts Treated | | 3.86 | 4.26 | 5.03 | 14.82 | |
| 3 | Beta | Selection Probability | 0.04 | 0.0 | 0.02 | 0.05 | 0.88 | 81.0 |
| 3 | Beta | Avg # Pts Treated | | 3.64 | 4.52 | 6.32 | 15.31 | |
| 3 | Look Ahead | Selection Probability | 0.12 | 0.0 | 0.0 | 0.06 | 0.82 | 153.3 |
| 3 | Look Ahead | Avg # Pts Treated | | 3.30 | 3.50 | 4.66 | 16.86 | |
| 4 | Uniform | Selection Probability | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.7 |
| 4 | Uniform | Avg # Pts Treated | | 5.66 | 2.81 | 1.09 | 0.35 | |
| 4 | Beta | Selection Probability | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 43.5 |
| 4 | Beta | Avg # Pts Treated | | 7.20 | 3.52 | 1.49 | 0.34 | |
| 4 | Look Ahead | Selection Probability | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 58.8 |
| 4 | Look Ahead | Avg # Pts Treated | | 3.99 | 2.40 | 1.19 | 0.38 | |
| 5 | Uniform | Selection Probability | 0.62 | 0.01 | 0.02 | 0.04 | 0.3 | 68 |
| 5 | Uniform | Avg # Pts Treated | | 4.77 | 4.63 | 4.72 | 10.36 | |
| 5 | Beta | Selection Probability | 0.50 | 0.03 | 0.06 | 0.07 | 0.35 | 77.5 |
| 5 | Beta | Avg # Pts Treated | | 4.66 | 5.45 | 6.45 | 12.46 | |
| 5 | Look Ahead | Selection Probability | 0.69 | 0.0 | 0.02 | 0.04 | 0.25 | 130.2 |
| 5 | Look Ahead | Avg # Pts Treated | | 3.93 | 4.24 | 4.22 | 9.86 | |

Table 3 presents similar results to Table 2 except that XT/HT and XS/HS are now simulated from Beta(3, 1) distributions conditional on experiencing toxicity or death before HT or HS, respectively. This represents the more challenging scenario where the hazards for XT and XS increase over time. We expected that specifying a more flexible beta distribution for FT,1(xT) and FS,1(xS) would result in better performance in this scenario. Our results, though, show little difference between the two time-to-event methods and suggest that assuming a uniform distribution for FT,1(xT) and FS,1(xS) may provide adequate performance even when the model is misspecified.

Table 3.

Simulated selection probability and average number of subjects treated at each dose with π1 = 0.05, a cohort size of 3, an average waiting time of two weeks, and XT and XS simulated from beta distributions. 1000 simulations were completed for each scenario.

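Scenario 1

| Method | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| Uniform | Selection Probability | 0.01 | 0.04 | 0.20 | 0.52 | 0.24 | 81.3 |
| Uniform | Avg # Pts Treated | | 5.18 | 10.56 | 10.81 | 3.40 | |
| Beta | Selection Probability | 0.0 | 0.04 | 0.19 | 0.53 | 0.24 | 81.4 |
| Beta | Avg # Pts Treated | | 5.05 | 10.25 | 11.72 | 2.97 | |
| Look Ahead | Selection Probability | 0.03 | 0.04 | 0.24 | 0.47 | 0.22 | 191.1 |
| Look Ahead | Avg # Pts Treated | | 5.00 | 9.40 | 10.23 | 5.02 | |

Scenario 2

| Method | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| Uniform | Selection Probability | 0.06 | 0.79 | 0.14 | 0.01 | 0.0 | 79.2 |
| Uniform | Avg # Pts Treated | | 21.86 | 6.28 | 0.67 | 0.03 | |
| Beta | Selection Probability | 0.06 | 0.81 | 0.12 | 0.01 | 0.0 | 79.0 |
| Beta | Avg # Pts Treated | | 22.15 | 5.92 | 0.75 | 0.01 | |
| Look Ahead | Selection Probability | 0.11 | 0.76 | 0.12 | 0.01 | 0.0 | 166.4 |
| Look Ahead | Avg # Pts Treated | | 22.57 | 4.54 | 0.54 | 0.07 | |

Scenario 3

| Method | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| Uniform | Selection Probability | 0.04 | 0.0 | 0.01 | 0.07 | 0.88 | 80.7 |
| Uniform | Avg # Pts Treated | | 3.69 | 5.98 | 7.03 | 13.17 | |
| Beta | Selection Probability | 0.04 | 0.0 | 0.01 | 0.06 | 0.89 | 81.7 |
| Beta | Avg # Pts Treated | | 3.66 | 5.44 | 7.70 | 13.15 | |
| Look Ahead | Selection Probability | 0.12 | 0.0 | 0.01 | 0.05 | 0.82 | 174.2 |
| Look Ahead | Avg # Pts Treated | | 3.33 | 3.62 | 4.66 | 16.81 | |

Scenario 4

| Method | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| Uniform | Selection Probability | 0.99 | 0.0 | 0.0 | 0.01 | 0.01 | 49.3 |
| Uniform | Avg # Pts Treated | | 9.47 | 3.75 | 1.28 | 0.28 | |
| Beta | Selection Probability | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 50.6 |
| Beta | Avg # Pts Treated | | 10.11 | 3.95 | 1.28 | 0.28 | |
| Look Ahead | Selection Probability | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 62.7 |
| Look Ahead | Avg # Pts Treated | | 3.83 | 2.50 | 1.24 | 0.44 | |

Scenario 5

| Method | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| Uniform | Selection Probability | 0.42 | 0.02 | 0.05 | 0.07 | 0.43 | 78.8 |
| Uniform | Avg # Pts Treated | | 4.37 | 6.51 | 6.88 | 11.31 | |
| Beta | Selection Probability | 0.42 | 0.03 | 0.06 | 0.06 | 0.43 | 80.1 |
| Beta | Avg # Pts Treated | | 4.12 | 6.15 | 8.07 | 11.33 | |
| Look Ahead | Selection Probability | 0.71 | 0.01 | 0.02 | 0.04 | 0.22 | 147.3 |
| Look Ahead | Avg # Pts Treated | | 3.99 | 4.28 | 4.27 | 9.25 | |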

Additional simulations to evaluate the impact of varying π1, the cohort size and the average waiting time between subjects were also completed. Increasing π1 to 0.10 results in an increased probability of early termination for futility in Scenarios 4 and 5 but reduces the probability of correctly identifying the optimal dose in Scenarios 1 through 3 (results not shown). Using cohorts of two resulted in a decrease in the probability of correctly identifying the optimal dose in Scenarios 1 through 3 and increased the probability of terminating for futility in all cases (results not shown). In general, though, using cohorts of size two instead of cohorts of size three resulted in only modest differences in the operating characteristics of our study. Finally, Table 4 illustrates that the average waiting time between subject enrollment has little impact on the probability of correctly identifying the optimal dose but the average number of subjects treated at the optimal dose increases as the waiting time between subjects increases. This is encouraging because it suggests that our study would still have acceptable operating characteristics if enrollment were twice as fast as expected, although it is still possible that a dramatic increase in the rate of enrollment could negatively impact the operating characteristics of our study.
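The direction of the π1 effect can be illustrated with a small Python sketch. It assumes, as a simplification that this section does not spell out, that the futility rule stops the trial when every dose has a posterior probability of being acceptable below π1. The function name and the posterior probabilities are hypothetical and are not taken from the simulations.

```python
def stop_for_futility(acceptability_probs, pi1):
    """Assumed rule: stop when no dose reaches the threshold pi1.

    acceptability_probs holds one posterior probability per dose that
    the dose is acceptable (hypothetical inputs for this sketch).
    """
    return all(p < pi1 for p in acceptability_probs)


# Hypothetical posterior probabilities for Doses 1 through 4.
probs = [0.08, 0.06, 0.03, 0.02]

print(stop_for_futility(probs, 0.05))  # False: Doses 1 and 2 still clear 0.05
print(stop_for_futility(probs, 0.10))  # True: no dose clears 0.10
```

Under this assumed rule a larger π1 is a stricter bar for continuing, which would explain both the higher futility rate when no dose is acceptable and the lower rate of correct selection when an acceptable dose exists.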

Table 4.

Simulated selection probability and average number of subjects treated at each dose when the average waiting time (WT) between subject enrollment is varied from one week to four weeks, assuming π1 = 0.05, cohorts of size 3 and XT and XS simulated from uniform distributions. 1000 simulations were completed for each scenario.

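Scenario 1

| Waiting Time | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| 1 Week WT | Selection Probability | 0.05 | 0.02 | 0.15 | 0.54 | 0.24 | 51.8 |
| 1 Week WT | Avg # Pts Treated | | 5.04 | 8.23 | 10.17 | 5.92 | |
| 2 Weeks WT | Selection Probability | 0.04 | 0.04 | 0.19 | 0.50 | 0.24 | 80.6 |
| 2 Weeks WT | Avg # Pts Treated | | 5.13 | 8.48 | 10.36 | 5.40 | |
| 4 Weeks WT | Selection Probability | 0.02 | 0.05 | 0.22 | 0.48 | 0.22 | 137.0 |
| 4 Weeks WT | Avg # Pts Treated | | 5.17 | 8.93 | 10.69 | 4.82 | |

Scenario 2

| Waiting Time | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| 1 Week WT | Selection Probability | 0.14 | 0.71 | 0.12 | 0.02 | 0.0 | 50.1 |
| 1 Week WT | Avg # Pts Treated | | 17.7 | 7.36 | 1.90 | 0.40 | |
| 2 Weeks WT | Selection Probability | 0.13 | 0.70 | 0.16 | 0.02 | 0.0 | 76.8 |
| 2 Weeks WT | Avg # Pts Treated | | 18.8 | 7.07 | 1.30 | 0.20 | |
| 4 Weeks WT | Selection Probability | 0.10 | 0.77 | 0.11 | 0.02 | 0.0 | 130.0 |
| 4 Weeks WT | Avg # Pts Treated | | 21.52 | 5.57 | 0.72 | 0.05 | |

Scenario 3

| Waiting Time | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| 1 Week WT | Selection Probability | 0.19 | 0.0 | 0.0 | 0.06 | 0.75 | 50.3 |
| 1 Week WT | Avg # Pts Treated | | 4.32 | 4.31 | 5.43 | 14.04 | |
| 2 Weeks WT | Selection Probability | 0.16 | 0.0 | 0.01 | 0.06 | 0.77 | 76.9 |
| 2 Weeks WT | Avg # Pts Treated | | 3.86 | 4.26 | 5.03 | 14.82 | |
| 4 Weeks WT | Selection Probability | 0.15 | 0.0 | 0.01 | 0.05 | 0.79 | 130.5 |
| 4 Weeks WT | Avg # Pts Treated | | 3.53 | 3.74 | 4.73 | 15.96 | |

Scenario 4

| Waiting Time | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| 1 Week WT | Selection Probability | 0.99 | 0.0 | 0.0 | 0.0 | 0.0 | 32.9 |
| 1 Week WT | Avg # Pts Treated | | 6.65 | 3.28 | 1.42 | 0.58 | |
| 2 Weeks WT | Selection Probability | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.7 |
| 2 Weeks WT | Avg # Pts Treated | | 5.66 | 2.81 | 1.09 | 0.35 | |
| 4 Weeks WT | Selection Probability | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 50.5 |
| 4 Weeks WT | Avg # Pts Treated | | 4.81 | 2.50 | 1.18 | 0.37 | |

Scenario 5

| Waiting Time | Measure | Futility | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Duration (weeks) |
|---|---|---|---|---|---|---|---|
| 1 Week WT | Selection Probability | 0.57 | 0.02 | 0.04 | 0.06 | 0.31 | 47.3 |
| 1 Week WT | Avg # Pts Treated | | 5.40 | 5.18 | 5.40 | 10.06 | |
| 2 Weeks WT | Selection Probability | 0.62 | 0.01 | 0.02 | 0.04 | 0.30 | 68.0 |
| 2 Weeks WT | Avg # Pts Treated | | 4.77 | 4.63 | 4.72 | 10.36 | |
| 4 Weeks WT | Selection Probability | 0.64 | 0.01 | 0.03 | 0.05 | 0.27 | 110.1 |
| 4 Weeks WT | Avg # Pts Treated | | 4.61 | 4.46 | 4.36 | 10.06 | |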

4 Discussion

In this manuscript, we propose a phase I-II clinical trial for evaluating efficacy and toxicity with delayed outcomes and discuss how our proposed design can be applied to a phase I clinical trial of a novel targeted toxin for canine hemangiosarcoma. We model toxicity using a cure-rate model and take a mixture approach to modeling overall survival. A joint model for the two outcomes was developed using the Gumbel copula. This model allows us to directly model our pre-determined parameters of interest and allows us to implement a dose finding algorithm that is identical to what would be used if efficacy and toxicity were binary outcomes. Our simulation results illustrate that our proposed design dramatically shortens study duration and identifies the optimal dose only slightly less often than a design that considers efficacy and toxicity as binary outcomes.

A limitation of existing phase I-II designs is that efficacy and toxicity must be observed in a timely manner. Our proposed design addresses this limitation by modeling efficacy and toxicity as time-to-event outcomes. This allows new subjects to be enrolled before the outcomes for previous cohorts have been fully observed and uses partial information to assign new subjects to a dose level. In our motivating example, there is a 6-month delay between enrollment and observation of the efficacy outcome but new subjects are expected to be ready for enrollment every two weeks. Our proposed design allows this study to be completed in a little over a year and a half. For comparison, the study would take approximately four years to complete if we were forced to wait until full information is available for all subjects before enrolling new cohorts, and it is unlikely that the trial would be run at all if that were the case.
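The "little over a year and a half" figure can be checked with a back-of-envelope calculation, sketched here in Python. It assumes a maximum of 30 subjects (an assumption suggested by the average numbers treated in the tables, which sum to roughly 30 when the trial does not stop early), a 26-week follow-up for the final subject, and enrollment that never pauses. The function name is hypothetical, and the calculation ignores randomness in arrival times and early stopping.

```python
def approx_duration_weeks(n_subjects, mean_wait_weeks, followup_weeks):
    """Rough trial duration when enrollment never waits for outcomes.

    The last subject enrolls after about (n_subjects - 1) waiting
    intervals and is then followed for the full outcome window.
    """
    last_enrollment = (n_subjects - 1) * mean_wait_weeks
    return last_enrollment + followup_weeks


N_SUBJECTS = 30      # assumed maximum sample size
FOLLOWUP_WEEKS = 26  # the 6-month efficacy window, taken as 26 weeks

for wait in (1, 2, 4):
    weeks = approx_duration_weeks(N_SUBJECTS, wait, FOLLOWUP_WEEKS)
    print(f"{wait}-week wait: {weeks} weeks ({weeks / 52:.1f} years)")
```

With a two-week wait this gives 84 weeks, or about 1.6 years. The values for one-, two- and four-week waits (55, 84 and 142 weeks) sit within about five weeks of the simulated Scenario 1 durations in Table 4 (51.8, 80.6 and 137.0 weeks).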

Our proposed design includes a stopping rule for futility if at any point in the study there is substantial evidence that none of the dose levels are acceptable. We could also include a stopping rule that stops the study if it is clear that we have identified the optimal dose before reaching the maximum sample size. Stopping for efficacy is quite common in phase II and phase III clinical trials and results in a substantial reduction in sample size. The sample size of phase I clinical trials is often quite small (usually no more than 30 or 40) and it is unlikely that stopping early for efficacy would save more than a few patients. Furthermore, a more precise estimate of the probability of efficacy and toxicity for the new drug could be very helpful when designing future phase II clinical trials. Considering the limited benefit of early termination and the benefit of a more thorough understanding of the new drug, we chose not to include a stopping rule for efficacy.

Our proposed design has acceptable operating characteristics assuming an average waiting time of one week between subjects, but it is possible that a much faster enrollment rate could result in a decreased probability of correctly identifying the optimal dose. Polley [19] recently investigated several approaches to modifying the TITE-CRM to accommodate fast accrual. These methods could easily be adapted to our setting to protect against fast accrual. Further work is needed to ensure that implementing these methods does not negate the advantage of our proposed methods (i.e. that subjects can be enrolled as they become available) and to determine how these methods would be implemented in practice.

Acknowledgments

This work was supported by the University of Minnesota Cancer Center Support Grant [NIH P30 CA077598].

References

  • 1. Storer Barry E. Design and analysis of phase I clinical trials. Biometrics. 1989;45(3):925–937.
  • 2. O’Quigley John, Pepe Margaret, Fisher Lloyd. Continual reassessment method: A practical design for phase 1 clinical trials in cancer. Biometrics. 1990;46(1):33–48.
  • 3. Goodman Steven N, Zahurak Marianna L, Piantadosi Steven. Some practical improvements in the continual reassessment method for phase I studies. Statistics in Medicine. 1995;14(11):1149–1161. doi: 10.1002/sim.4780141102.
  • 4. Braun Thomas M. The bivariate continual reassessment method: extending the CRM to phase I trials of two competing outcomes. Controlled Clinical Trials. 2002;23(3):240–256. doi: 10.1016/s0197-2456(01)00205-7.
  • 5. Thall Peter F, Cook John D. Dose-finding based on efficacy/toxicity trade-offs. Biometrics. 2004;60(3):684–693. doi: 10.1111/j.0006-341X.2004.00218.x.
  • 6. Zhang Wei, Sargent Daniel J, Mandrekar Sumithra. An adaptive dose-finding design incorporating both toxicity and efficacy. Statistics in Medicine. 2006;25(14):2365–2383. doi: 10.1002/sim.2325.
  • 7. Vail David M. Veterinary co-operative oncology group common terminology criteria for adverse events (VCOG-CTCAE) following chemotherapy or biological antineoplastic therapy in dogs and cats v1.0. Veterinary and Comparative Oncology. 2004;2(4):195–213. doi: 10.1111/j.1476-5810.2004.0053b.x.
  • 8. Helfand Stuart C. Canine Hemangiosarcoma: A Tumor of Contemporary Interest. Cancer Therapy. 2008;6:457–462.
  • 9. Cheung Ying Kuen, Chappell Rick. Sequential designs for phase I clinical trials with late-onset toxicities. Biometrics. 2000;56(4):1177–1182. doi: 10.1111/j.0006-341x.2000.01177.x.
  • 10. Braun Thomas M. Generalizing the TITE-CRM to adapt for early- and late-onset toxicities. Statistics in Medicine. 2006;25(12):2071–2083. doi: 10.1002/sim.2337.
  • 11. Thall Peter F, Lee J Jack, Tseng Chi-Hong, Estey Elihu H. Accrual strategies for phase I trials with delayed patient outcome. Statistics in Medicine. 1999;18(10):1155–1169. doi: 10.1002/(sici)1097-0258(19990530)18:10<1155::aid-sim114>3.0.co;2-h.
  • 12. Bekele B Nebiyou, Ji Yuan, Shen Yu, Thall Peter F. Monitoring late-onset toxicities in phase I trials using predicted risks. Biostatistics. 2008;9(3):442–457. doi: 10.1093/biostatistics/kxm044.
  • 13. Murtaugh Paul A, Fisher Lloyd D. Bivariate binary models of efficacy and toxicity in dose-ranging trials. Communications in Statistics - Theory and Methods. 1990;19(6):2003–2020.
  • 14. Yin Guosheng, Li Yisheng, Ji Yuan. Bayesian dose-finding in phase I/II clinical trials using toxicity and efficacy odds ratios. Biometrics. 2006;62(3):777–787. doi: 10.1111/j.1541-0420.2006.00534.x.
  • 15. Clifford Craig A, Mackin Andrew J, Henry Carolyn J. Treatment of canine hemangiosarcoma: 2000 and beyond. Journal of Veterinary Internal Medicine. 2000;14(5):479–485. doi: 10.1892/0891-6640(2000)014<0479:tochab>2.3.co;2.
  • 16. Lana Susan, U’ren Lance, Plaza Susan, Elmslie Robyn, Gustafson Daniel, Morley Paul, Dow Steven. Continuous low-dose oral chemotherapy for adjuvant therapy of splenic hemangiosarcoma in dogs. Journal of Veterinary Internal Medicine. 2007;21(4):764–769. doi: 10.1892/0891-6640(2007)21[764:clocfa]2.0.co;2.
  • 17. Chon E, McCartan L, Kubicek LN, Vail DM. Safety evaluation of combination toceranib phosphate (Palladia) and piroxicam in tumour-bearing dogs (excluding mast cell tumours): a phase I dose-finding study. Veterinary and Comparative Oncology. 2011. doi: 10.1111/j.1476-5829.2011.00265.x.
  • 18. Tamburini BA, Phang TL, Fosmire SP, Scott MC, Trapp SC, Duckett MM, Robinson SR, Slansky JE, Sharkey LC, Cutter GR, Wojcieszyn JW, Bellgrau D, Gemmill RM, Hunter LE, Modiano JF. Gene expression profiling identifies inflammation and angiogenesis as distinguishing features of canine hemangiosarcoma. BMC Cancer. 2010;10:619. doi: 10.1186/1471-2407-10-619.
  • 19. Polley Mei-Yin C. Practical modifications to the time-to-event continual reassessment method for phase I cancer trials with fast patient accrual and late-onset toxicities. Statistics in Medicine. 2011;30(17):2130–2143. doi: 10.1002/sim.4255.
