Author manuscript; available in PMC: 2016 Jun 1.
Published in final edited form as: J Am Stat Assoc. 2014 Jun 27;110(510):711–722. doi: 10.1080/01621459.2014.926815

Bayesian Dose-Finding in Two Treatment Cycles Based on the Joint Utility of Efficacy and Toxicity

Juhee Lee 1,*, Peter F Thall 2, Yuan Ji 3, Peter Müller 4
PMCID: PMC4562700  NIHMSID: NIHMS610172  PMID: 26366026

Abstract

A phase I/II clinical trial design is proposed for adaptively and dynamically optimizing each patient's dose in each of two cycles of therapy based on the joint binary efficacy and toxicity outcomes in each cycle. A dose-outcome model is assumed that includes a Bayesian hierarchical latent variable structure to induce association among the outcomes and also facilitate posterior computation. Doses are chosen in each cycle based on posteriors of a model-based objective function, similar to a reinforcement learning or Q-learning function, defined in terms of numerical utilities of the joint outcomes in each cycle. For each patient, the procedure outputs a sequence of two actions, one for each cycle, with each action being the decision to either treat the patient at a chosen dose or not to treat. The cycle 2 action depends on the individual patient's cycle 1 dose and outcomes. In addition, decisions are based on posterior inference using other patients’ data, and therefore the proposed method is adaptive both within and between patients. A simulation study of the method is presented, including comparison to two-cycle extensions of the conventional 3+3 algorithm, continual reassessment method, and a Bayesian model-based design, and evaluation of robustness.

Keywords: Adaptive Design, Bayesian Design, Dynamic Treatment Regime, Phase I-II Clinical Trial, Q-Learning, Latent Probit Model

1 Introduction

Medical treatment often involves multiple cycles of therapy. Physicians routinely choose a patient's treatment in each cycle adaptively based on the patient's history of treatments and clinical outcomes. In such settings, a patient's therapy is not one treatment, but rather a sequence of treatments, each chosen using an adaptive algorithm of the general form “observe → treat → observe → treat → ...” etc. This paradigm is known as a dynamic treatment regime (DTR) (Murphy, et al., 2001; Lavori and Dawson, 2001; Murphy, 2003; Moodie, et al., 2007), multi-stage treatment strategy (Thall, Millikan and Sung, 2000; Thall, Sung and Estey, 2002) or treatment policy (Lunceford, Davidian and Tsiatis, 2002; Wahed and Tsiatis, 2004). In oncology, treatment in each cycle may be a chemical or biological agent, radiation therapy, or some combination of these. DTRs also are used for chronic diseases, including behavioral disorders (Collins, et al. 2005; Almirall, et al., 2010) and drug or alcohol addiction (Murphy, et al., 2007). Unfortunately, most clinical trial designs ignore the actual DTRs being used, and instead evaluate the treatments given initially as if patient outcome were due to them alone, rather than the entire DTR.

There is an extensive literature on adaptive dose-finding designs for phase I and phase I/II clinical trials (cf. Chevret, 2006; Yin, 2012). In actual conduct of such trials, the attending physician uses a DTR to make multi-cycle decisions for each patient. Depending on the patient's history of doses and outcomes, the dose given in each cycle may be above, below, or the same as the dose given previously, or therapy may be terminated due to excessive toxicity or poor efficacy. Since typical early-phase trial designs ignore such within-patient multi-cycle decision making, the “optimal” dose chosen by such a design actually pertains only to the first cycle of therapy.

While statistical methods for DTRs have seen limited application in actual clinical trials (Rush, et al., 2003; Thall, et al., 2007; Wang, et al., 2012), recently there has been extensive research to develop or optimize DTRs in medicine, including semiparametric methods (Wahed and Tsiatis, 2006), reinforcement learning (Zhao, et al., 2011), and sequential multiple assignment randomized trials (Murphy and Bingham, 2009). The aim is to better reflect the intrinsically multi-stage, adaptive structure of what physicians actually do, in both trial design and analysis of observational data. This methodology had its origins in research to define and estimate causal parameters in complex longitudinal data, pioneered by Robins (1986, 1993, 1997, 1998), and applied to the analysis of AIDS data (Hernan, Brumback, and Robins, 2000; Robins, Hernan and Brumback, 2000).

The problem of optimizing each patient's doses given in multiple cycles based on efficacy and toxicity in phase I/II trials has not been addressed formally. Phase I/II designs typically optimize the initial dose using between-patient adaptive rules. A review is given by Zohar and Chevret (2007). For phase I trials involving multiple cycles of therapy, Braun, Yuan, and Thall (2005) proposed a Bayesian design with between-patient adaptive rules based on time-to-toxicity to optimize the number of cycles (“schedule”) given a fixed dose. Braun et al. (2007) extended this to allow per-administration dose to vary, and jointly optimized dose and schedule, using a criterion similar to that of the time-to-event continual reassessment method (TiTE CRM, Cheung and Chappell, 2000). Li, et al. (2008) proposed an approach to optimizing dose and schedule for two nested schedules and bivariate binary outcomes, using an isotonic transformation to obtain matrix ordered toxicity probabilities with order-restricted inferences. While few of these methods include within-patient adaptive rules applied after the first cycle, the phase I design proposed by Zhang and Braun (2013) to optimize dose and schedule accounts for multiple within-patient administrations.

Here, we address the problem of adaptively optimizing each patient's dose in each of two cycles of therapy in a phase I/II trial based on binary efficacy and toxicity. This is the simplest case of the general multi-cycle phase I/II trial design problem, which may be formulated with ordinal or time-to-event outcomes and an arbitrary number of cycles. We address the simpler two-cycle problem because it still is much more complicated than the one-cycle case. Our goals are to provide a practical trial design and establish a basis for subsequently developing methods for more complex settings. We employ a model-based Bayesian objective function, defined in terms of (efficacy, toxicity) utilities, structurally similar to reinforcement learning (Sutton and Barto, 1998) or Q-learning functions (Watkins, 1989). Our method chooses a dose in each cycle to maximize the posterior mean of the objective function, applying a modified recursive Bellman (1957) equation that assumes, for the decision in cycle 1, that one will behave optimally in cycle 2. At the end of the trial, the method provides an optimal two-stage regime consisting of an optimal cycle 1 dose, and an optimal function of the patient's cycle 1 dose and outcomes that either chooses a cycle 2 dose or says to not treat the patient in cycle 2. This is very different from simply choosing two “optimal” doses, one for each cycle, with the “optimal” cycle 2 dose ignoring each patient's cycle 1 data. Because all decisions are based on posterior quantities computed using all patients’ data, the method is adaptive both within and between patients.

Section 2 describes the proposed decision-theoretic two-cycle method, DTM2, including the Bayesian probability model, an algorithm for prior calibration, and posterior computation. Utility-based decision criteria are presented in Section 3. A simulation study is summarized in Section 4. We close with a discussion in Section 5.

2 Dose-Outcome Model

The model used by DTM2 exploits the idea underlying the multivariate probit model, introduced by Ashford and Sowden (1970). A vector of unobserved, correlated latent multivariate normal variables is defined to induce association among a vector of observed binary variables, by defining each observed variable as the indicator that its corresponding latent variable is greater than 0. The DTM2 model is an elaboration of a multivariate probit model that includes hierarchical structures. It provides a computationally feasible basis for the task at hand. We will exploit the MCMC methods for computing posteriors for latent variable models provided by Albert and Chib (1993) and developed further by Chib and Greenberg (1998) for posterior computation via Gibbs sampling.

Let $n_t$ denote the number of patients accrued and given at least one cycle of treatment up to trial (calendar) time $t$, and index patients by $i = 1, \ldots, n_t$. Our dose-outcome model does not depend on numerical dose values, and we identify the doses under consideration by the indexes $1, \ldots, m$. For treatment cycle $c = 1, 2$, denote the $i$th patient's dose by $d_{i,c}$, outcome indicators $Y_{i,c} \in \{0, 1\}$ for toxicity and $Z_{i,c} \in \{0, 1\}$ for efficacy, and the 2-cycle vectors $d_i = (d_{i,1}, d_{i,2})$, $Y_i = (Y_{i,1}, Y_{i,2})$, and $Z_i = (Z_{i,1}, Z_{i,2})$. Let $X_t = \{(Y_i, Z_i, d_i) : i = 1, \ldots, n_t\}$ denote the observed data from all patients at $t$. Although the doses $d_i$ are actions rather than parameters or random outcomes, throughout the manuscript we will abuse probability notation slightly by including them to the right of the conditioning bar. Since actual clinical decision rules must allow a given patient's therapy to be terminated, e.g. if the patient is cured, has progressive disease, or unacceptable toxicity (cf. Wang, et al., 2012), here possible actions in cycle $c$ may be either a dose, $d_{i,c}$, or the decision to give no treatment, which we index by 0. We denote the possible actions in either cycle by $D = \{0, 1, \ldots, m\}$.

We construct a joint distribution for $[Y_i, Z_i \mid d_i]$ by defining these binary outcomes in terms of four real-valued latent variables, $\xi_i = (\xi_{i,1}, \xi_{i,2})$ for $Y_i$ and $\eta_i = (\eta_{i,1}, \eta_{i,2})$ for $Z_i$, with $(\xi_i, \eta_i)$ following a multivariate normal distribution having means that vary with $d_i$. Denoting the indicator of event $A$ by $I(A)$, we assume $Y_{i,c} = I(\xi_{i,c} > 0)$ and $Z_{i,c} = I(\eta_{i,c} > 0)$, so the distribution of $[Y_i, Z_i \mid d_i]$ is induced by that of $[\xi_i, \eta_i \mid d_i]$. The structure of our hierarchical model for two cycles is similar to the non-hierarchical model for multiple toxicities in one cycle of therapy used by Bekele and Thall (2004). To construct the model, we first define a conditional likelihood for the cycle-specific latent variable pairs $[\xi_{i,c}, \eta_{i,c} \mid d_{i,c}]$, for $c = 1, 2$, by using patient-specific random effects $(u_i, v_i)$ that characterize dependence among the outcomes between and within cycles. Denote the univariate normal distribution with mean $\mu$ and variance $\sigma^2$ by $N(\mu, \sigma^2)$, with pdf $\phi(\cdot \mid \mu, \sigma^2)$.

We begin the construction by assuming the following Level 1 and Level 2 priors:

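Level 1 Priors on the Latent Variables. For patient $i$ in cycle $c$ given dose $d_{i,c} = d$,

$$\xi_{i,c} \mid u_i, \xi_{c,d}, \sigma_\xi^2 \sim N(\xi_{c,d} + u_i, \sigma_\xi^2) \quad \text{and} \quad \eta_{i,c} \mid v_i, \eta_{c,d}, \sigma_\eta^2 \sim N(\eta_{c,d} + v_i, \sigma_\eta^2), \tag{1}$$

with $\xi_i$ and $\eta_i$ conditionally independent given $(u_i, v_i)$ and fixed $\sigma_\xi^2$ and $\sigma_\eta^2$. Level 2 priors of the patient effects, $(u_i, v_i)$, and mean cycle-specific dose effects, $(\xi_{c,d}, \eta_{c,d})$, are as follows: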

Level 2 Priors on $(u_i, v_i)$. For patients $i = 1, \ldots, n$,

$$u_i, v_i \mid \rho, \tau^2 \overset{iid}{\sim} \mathrm{MVN}_2(0_2, \Sigma_{u,v}), \tag{2}$$

where $\mathrm{MVN}_k$ denotes a $k$-variate normal distribution, $0_2 = (0, 0)$ and $\Sigma_{u,v}$ is the 2×2 matrix with both diagonal elements $\tau^2$ and both off-diagonal elements $\rho\tau^2$. The hyperparameters, $\rho \in (-1, 1)$ and $\tau^2$, are fixed. This Level 2 prior induces association, parameterized by $(\rho, \tau^2)$, among $(\xi_{i,1}, \eta_{i,1}, \xi_{i,2}, \eta_{i,2})$ via the latent variable model (1), and thus among the corresponding toxicity and efficacy outcomes, $(Y_{i,1}, Z_{i,1}, Y_{i,2}, Z_{i,2})$.

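Level 2 Priors on $(\xi_{c,d}, \eta_{c,d})$. Let $\xi_c = (\xi_{c,1}, \ldots, \xi_{c,m})$ and $\eta_c = (\eta_{c,1}, \ldots, \eta_{c,m})$. Denote by $\xi_{c,-d}$ the vector $\xi_c$ with $\xi_{c,d}$ deleted, and let $\eta_{c,-d}$ denote $\eta_c$ with $\eta_{c,d}$ deleted. We assume

$$\begin{aligned}
p(\xi_{c,d} \mid \xi_{c,-d}) &\propto \phi(\xi_{c,d} \mid \xi_{c,0}, \sigma^2_{\xi_{c,0}}) \, 1(\xi_{c,d-1} < \xi_{c,d} < \xi_{c,d+1}), \\
p(\eta_{c,d} \mid \eta_{c,-d}) &\propto \phi(\eta_{c,d} \mid \eta_{c,0}, \sigma^2_{\eta_{c,0}}) \, 1(\eta_{c,d-1} < \eta_{c,d} < \eta_{c,d+1}).
\end{aligned} \tag{3}$$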

The order constraints ensure that $\xi_{i,c}$ and $\eta_{i,c}$ increase stochastically in dose, hence the per-cycle probabilities of toxicity and efficacy both increase with dose. If this assumption is not appropriate, as may be the case in trials of biologic agents, these constraints may be dropped.

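Collecting terms from (1), (2), and (3), the 12 fixed parameters that determine all of the Level 1 and Level 2 priors are $\tilde\theta = (\xi_0, \eta_0, \sigma^2_{\xi_0}, \sigma^2_{\eta_0}, \sigma^2_\xi, \sigma^2_\eta, \tau^2, \rho)$, where $\xi_0 = (\xi_{1,0}, \xi_{2,0})$, $\eta_0 = (\eta_{1,0}, \eta_{2,0})$, $\sigma^2_{\xi_0} = (\sigma^2_{\xi_{1,0}}, \sigma^2_{\xi_{2,0}})$ and $\sigma^2_{\eta_0} = (\sigma^2_{\eta_{1,0}}, \sigma^2_{\eta_{2,0}})$. Denote $\xi = (\xi_1, \xi_2)$, $\eta = (\eta_1, \eta_2)$, $\mu_{d_i} = (\xi_{1,d_{i,1}}, \xi_{2,d_{i,2}}, \eta_{1,d_{i,1}}, \eta_{2,d_{i,2}})$, and the covariance matrix

$$\Sigma_{\xi,\eta} = \begin{bmatrix}
\sigma_\xi^2 + \tau^2 & \tau^2 & \rho\tau^2 & \rho\tau^2 \\
\tau^2 & \sigma_\xi^2 + \tau^2 & \rho\tau^2 & \rho\tau^2 \\
\rho\tau^2 & \rho\tau^2 & \sigma_\eta^2 + \tau^2 & \tau^2 \\
\rho\tau^2 & \rho\tau^2 & \tau^2 & \sigma_\eta^2 + \tau^2
\end{bmatrix}.$$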

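The joint distribution of $[\xi_i, \eta_i \mid d_i, \xi, \eta, \tilde\theta]$ is computed by integrating over $(u_i, v_i)$, yielding

$$\xi_i, \eta_i \mid d_i, \xi, \eta, \tilde\theta \overset{iid}{\sim} \mathrm{MVN}_4(\mu_{d_i}, \Sigma_{\xi,\eta}). \tag{4}$$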

The mean vector $\mu_d$ is a function of the dose levels, and does not depend on numerical dose values. The hyperparameters, $\tau^2$ and $\rho$, induce associations between cycle 1 and cycle 2 and between efficacy outcomes and toxicity outcomes. For example, if $-1 < \rho < 0$ ($0 < \rho < 1$), this model implies that efficacy and toxicity are negatively (positively) associated, that is, higher toxicity is associated with lower (higher) efficacy.
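The covariance structure in (4) can be checked numerically. The following is a minimal sketch, not part of the original design software: it builds the covariance of $(\xi_{i,1}, \xi_{i,2}, \eta_{i,1}, \eta_{i,2})$ as the Level 1 noise plus the contribution of the patient effects $(u_i, v_i)$. The function name `latent_covariance`, the loading matrix, and the numerical inputs are illustrative assumptions.

```python
import numpy as np


def latent_covariance(sigma_xi2, sigma_eta2, tau2, rho):
    """Covariance of (xi_1, xi_2, eta_1, eta_2) after integrating out (u, v)."""
    # Level 2 covariance of the patient effects (u, v), as in (2).
    sigma_uv = tau2 * np.array([[1.0, rho], [rho, 1.0]])
    # Loading matrix: u is added to both xi's, v is added to both eta's, as in (1).
    load = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    # Independent Level 1 noise for each latent variable.
    noise = np.diag([sigma_xi2, sigma_xi2, sigma_eta2, sigma_eta2])
    return load @ sigma_uv @ load.T + noise


# Illustrative inputs only (hypothetical values chosen for this sketch).
cov = latent_covariance(sigma_xi2=0.25, sigma_eta2=0.25, tau2=0.09, rho=0.2)
print(np.round(cov, 3))
```

Under these inputs each latent variable has variance $\sigma^2 + \tau^2$, the two cycles of the same outcome share covariance $\tau^2$, and every toxicity-efficacy pair shares covariance $\rho\tau^2$, which is the pattern of the matrix $\Sigma_{\xi,\eta}$.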

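Denote $\theta = (\xi, \eta)$. Integrating over $(u_i, v_i)$ and suppressing $\tilde\theta$ and patient index $i$, the joint likelihood for the observables of a patient is given by

$$\begin{aligned}
p(y, z \mid d, \theta) &= \Pr(Y_1 = y_1, Y_2 = y_2, Z_1 = z_1, Z_2 = z_2 \mid d, \theta) \\
&= \Pr(\gamma_{1,y_1} \le \xi_1 < \gamma_{1,y_1+1},\ \gamma_{1,y_2} \le \xi_2 < \gamma_{1,y_2+1},\ \gamma_{2,z_1} \le \eta_1 < \gamma_{2,z_1+1},\ \gamma_{2,z_2} \le \eta_2 < \gamma_{2,z_2+1} \mid d, \theta) \\
&= \int_{\gamma_{1,y_1}}^{\gamma_{1,y_1+1}} \int_{\gamma_{1,y_2}}^{\gamma_{1,y_2+1}} \int_{\gamma_{2,z_1}}^{\gamma_{2,z_1+1}} \int_{\gamma_{2,z_2}}^{\gamma_{2,z_2+1}} \phi(\xi, \eta \mid \mu_d, \Sigma_{\xi,\eta}) \, d\eta_2 \, d\eta_1 \, d\xi_2 \, d\xi_1,
\end{aligned}$$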

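where the cutoff vectors $(\gamma_{1,0}, \gamma_{1,1}, \gamma_{1,2})$ for $Y_c$ and $(\gamma_{2,0}, \gamma_{2,1}, \gamma_{2,2})$ for $Z_c$ both are $(-\infty, 0, \infty)$, for $c = 1, 2$. The conditional distribution of the cycle 2 outcomes $(Y_2, Z_2)$ given the cycle 1 outcomes $(Y_1 = y_1, Z_1 = z_1)$ is

$$\begin{aligned}
p(y_2, z_2 \mid y_1, z_1, d, \theta) &= \Pr(Y_2 = y_2, Z_2 = z_2 \mid Y_1 = y_1, Z_1 = z_1, d, \theta) \\
&= \Pr(\gamma_{1,y_2} \le \xi_2 < \gamma_{1,y_2+1},\ \gamma_{2,z_2} \le \eta_2 < \gamma_{2,z_2+1} \mid \gamma_{1,y_1} \le \xi_1 < \gamma_{1,y_1+1},\ \gamma_{2,z_1} \le \eta_1 < \gamma_{2,z_1+1},\ d, \theta) \\
&= \frac{p(y, z \mid d, \theta)}{p(y_1, z_1 \mid d_1, \theta)},
\end{aligned} \tag{5}$$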

where the cycle 1 bivariate marginal is computed as the double integral

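$$p(y_1, z_1 \mid d_1, \theta) = \int_{\gamma_{1,y_1}}^{\gamma_{1,y_1+1}} \int_{\gamma_{2,z_1}}^{\gamma_{2,z_1+1}} \phi([\xi_1, \eta_1] \mid \mu^1_{d_1}, \Sigma^1_{\xi,\eta}) \, d\eta_1 \, d\xi_1 \tag{6}$$

with

$$\mu^1_{d_1} = \begin{bmatrix} \xi_{1,d_1} \\ \eta_{1,d_1} \end{bmatrix} \quad \text{and} \quad \Sigma^1_{\xi,\eta} = \begin{bmatrix} \sigma_\xi^2 + \tau^2 & \rho\tau^2 \\ \rho\tau^2 & \sigma_\eta^2 + \tau^2 \end{bmatrix}.$$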
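One way to evaluate the joint likelihood and the conditional distribution (5) numerically is sketched below. It does not compute the four-dimensional normal integral directly. Instead it uses the fact that, given $(u_i, v_i)$, the four latent variables are independent under (1), so each outcome probability is a two-dimensional expectation over $(u_i, v_i)$, approximated here by Gauss-Hermite quadrature. This is a swapped-in numerical technique chosen for the sketch, not the posterior computation used in the paper, and the function names and input values are hypothetical.

```python
import math
from itertools import product

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def norm_cdf(x):
    """Standard normal cdf, applied elementwise."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def joint_outcome_probs(mu, sigma_xi, sigma_eta, tau, rho, n_nodes=40):
    """Return {(y1, y2, z1, z2): probability} for mu = (xi_1, xi_2, eta_1, eta_2)."""
    x, w = hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)  # rescale so the weights sum to 1
    e1, e2 = np.meshgrid(x, x, indexing="ij")
    weight = np.outer(w, w)
    # (u, v) with variances tau^2 and correlation rho, built from independent e1, e2.
    u = tau * e1
    v = tau * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
    # Given (u, v), Pr(Y_c = 1) = Pr(xi_c > 0) and Pr(Z_c = 1) = Pr(eta_c > 0).
    p_tox = [norm_cdf((mu[c] + u) / sigma_xi) for c in (0, 1)]
    p_eff = [norm_cdf((mu[2 + c] + v) / sigma_eta) for c in (0, 1)]
    probs = {}
    for y1, y2, z1, z2 in product((0, 1), repeat=4):
        term = weight.copy()
        for p, outcome in ((p_tox[0], y1), (p_tox[1], y2), (p_eff[0], z1), (p_eff[1], z2)):
            term = term * (p if outcome == 1 else 1.0 - p)
        probs[(y1, y2, z1, z2)] = float(term.sum())
    return probs


def cycle2_conditional(probs, y1, z1):
    """Conditional distribution of (Y2, Z2) given (Y1, Z1) = (y1, z1), as in (5)."""
    marginal = sum(p for k, p in probs.items() if k[0] == y1 and k[2] == z1)
    return {(y2, z2): probs[(y1, y2, z1, z2)] / marginal
            for y2 in (0, 1) for z2 in (0, 1)}


# Hypothetical latent means and scale parameters, for illustration only.
probs = joint_outcome_probs(mu=(-0.5, -0.3, 0.2, 0.1),
                            sigma_xi=0.5, sigma_eta=0.5, tau=0.3, rho=0.2)
cond = cycle2_conditional(probs, y1=1, z1=0)
```

The dictionary keys follow the ordering $(y_1, y_2, z_1, z_2)$ of the mean vector $\mu_d$, and the denominator in `cycle2_conditional` is the cycle 1 bivariate marginal (6) obtained by summing the joint probabilities over $(y_2, z_2)$.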

3 Decision Criteria

3.1 Adaptive Dose Selection

To define our decision rules, we distinguish between doses and actions. The action in cycle 1 either chooses a dose from the set $\{1, \ldots, m\}$ of doses under consideration or makes the decision to not give the patient any treatment. Recall that we denote this decision by 0 for convenience, so that the set of possible actions is $D = \{0, 1, \ldots, m\}$. If the optimal cycle 1 action is $d_1 = 0$ at any point in the trial then the study is terminated. Otherwise, the patient receives $d_1$ for cycle 1 and $d_2 \in D$ for cycle 2, where $d_2$ is a function of the cycle 1 dose and outcomes, $(d_1, Y_1, Z_1)$, and the current data, $X$, from all patients. For example, if the cycle 1 dose $d_1$ produced toxicity, $Y_1 = 1$, then a possible cycle 2 action is $d_2(d_1, 1, 1, X) = d_1 - 1$ if $Z_1 = 1$, and $d_2(d_1, 1, 0, X) = 0$ if $Z_1 = 0$. Similarly, if $d_1 = 1$, the lowest dose level, and $Y_1 = 1$ was observed, then it may be that $d_2(d_1, 1, Z_1, X) = 0$ regardless of whether $Z_1 = 0$ or 1. A two-cycle regime is far more general than a dose pair chosen from $D \times D$, and a regime for which $d_2$ ignores the patient's cycle 1 dose and outcomes, $(d_1, Y_1, Z_1)$, is unlikely to be optimal. In the DTR literature, $(d_1, Y_1, Z_1)$ would be called “tailoring variables.” Optimizing $d = (d_1, d_2)$ is the focus of our design.

3.2 Objective Function

We construct an objective function by using the basic ideas in Bellman (1957), starting in cycle 2 and working backwards. Our method relies on per-cycle utilities $U(y, z)$ that quantify the desirability of outcome $(Y_c, Z_c) = (y, z)$ in cycle $c = 1$ or 2. Depending on the level of marginalization and aggregation over cycles and patients, many variations of the objective function defined below may be obtained. We will generically refer to all of these as “utility” or “objective function” when we want to highlight that a particular expected utility is a function of known quantities and the action only, and thus can be used to select the optimal action. For convenience, one may fix $U(0, 1) = 100$ and $U(1, 0) = 0$, which are the respective utilities for the best and worst possible outcomes, and elicit the intermediate values $U(0, 0)$ and $U(1, 1)$ from the physicians planning the trial, although any function with $U(1, 0) < U(1, 1), U(0, 0) < U(0, 1)$ may be used. In our simulations, we will use the numerical utilities $U(1, 0) = 0$, $U(0, 0) = 35$, $U(1, 1) = 65$, $U(0, 1) = 100$.

In the language of Q-learning (Watkins, 1989; Murphy, 2005; Zhao, et al., 2011), for cycle c, dc is the “action” and U(Yc, Zc) is the “reward,” with (d1, Y1, Z1) the “state” prior to taking action d2 in cycle 2. Ideally, baseline covariates such as age, disease severity, or performance status would comprise the patient's state for c = 1, although in practice even in the single-cycle phase I-II setting choosing covariate-specific doses is quite complicated (cf. Thall, Nguyen, and Estey, 2008).

Given a patient's cycle 1 data (d1, Y1, Z1), the mean utility of action d2 in cycle 2 is

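$$Q_2(d_2, d_1, Y_1, Z_1, \theta) = E\{U(Y_2, Z_2) \mid d_2, d_1, Y_1, Z_1, \theta\} = \sum_{y_2=0}^{1} \sum_{z_2=0}^{1} U(y_2, z_2) \, p(y_2, z_2 \mid d_2, d_1, Y_1, Z_1, \theta), \tag{7}$$

and we define the cycle 2 objective function

$$q_2(d_2, d_1, Y_1, Z_1, X) = E\{Q_2(d_2, d_1, Y_1, Z_1, \theta) \mid d_2, d_1, Y_1, Z_1, X\}. \tag{8}$$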

If $d_2 = 0$, i.e., no treatment in cycle 2, then $p(Y_2 = 0, Z_2 = 0 \mid d_2 = 0, d_1, Y_1, Z_1, \theta) = 1$ and $q_2(d_2 = 0, d_1, Y_1, Z_1, X) = U(0, 0)$, the utility of having neither toxicity nor efficacy. If $d_2 \ne 0$, then $q_2(d_2, d_1, Y_1, Z_1, X)$ is a posterior expected utility of giving dose $d_2$ in cycle 2 given $(d_1, Y_1, Z_1)$. This underscores the importance of requiring $U(0, 0) > U(1, 0)$, that is, that it is more desirable to have neither toxicity nor efficacy than to have toxicity and no efficacy. Given $(d_1, Y_1, Z_1)$ and current data $X$, the optimal cycle 2 action is $d_2^{opt}(d_1, Y_1, Z_1, X) = \arg\max_{d_2} q_2(d_2, d_1, Y_1, Z_1, X)$, subject to the dose acceptability rules discussed in Section 3.3.
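The relationship between (7) and (8) can be sketched in code as follows. The sketch assumes that the posterior expectation in (8) is approximated by averaging $Q_2$ over posterior draws of $\theta$, each draw supplying a conditional distribution of $(Y_2, Z_2)$; this averaging scheme, the function names, and the example probabilities are assumptions made for illustration. The utilities are those stated for the simulations.

```python
# Numerical utilities used in the simulations, keyed by (y, z) = (toxicity, efficacy).
UTILITY = {(1, 0): 0.0, (0, 0): 35.0, (1, 1): 65.0, (0, 1): 100.0}


def mean_utility(outcome_probs, utility=UTILITY):
    """Q-type mean utility: sum over (y, z) of U(y, z) * p(y, z), as in (7)."""
    return sum(utility[yz] * p for yz, p in outcome_probs.items())


def q2_value(d2, prob_draws, utility=UTILITY):
    """Cycle 2 objective (8), approximated by averaging (7) over posterior draws.

    prob_draws is a list with one dict {(y2, z2): probability} per posterior draw.
    """
    if d2 == 0:
        # No treatment: (Y2, Z2) = (0, 0) with probability 1.
        return utility[(0, 0)]
    values = [mean_utility(p, utility) for p in prob_draws]
    return sum(values) / len(values)


# Two hypothetical posterior draws of the conditional distribution of (Y2, Z2).
draws = [
    {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1},
    {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
]
q2_treated = q2_value(3, draws)
q2_untreated = q2_value(0, draws)
```

In this toy input the first draw has mean utility 50.5 and the second 50.0, so the averaged value for the treated action is 50.25, compared with $U(0, 0) = 35$ for no treatment.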

Next, we move backward to the cycle 1 optimization assuming that $q_2(d_2^{opt}, d_1, Y_1, Z_1, X)$ is known for all $(d_1, Y_1, Z_1)$. The expected utility of giving dose $d_1$ given $\theta$ is

$$Q_1(d_1, \theta) = E\{U(Y_1, Z_1) \mid d_1, \theta\} = \sum_{y_1=0}^{1} \sum_{z_1=0}^{1} U(y_1, z_1) \, p(y_1, z_1 \mid d_1, \theta).$$

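To define the overall objective function, we discount the cycle 2 payoff using the fixed parameter $0 < \lambda < 1$, as is done traditionally in Q-learning. The expected entire future utility of giving dose $d_1$ in cycle 1, assuming that $d_2^{opt}$ will be taken in cycle 2, is

$$\begin{aligned}
q_1(d_1, X) &= E\big[E\{U(Y_1, Z_1) + \lambda q_2(d_2^{opt}(d_1, Y_1, Z_1, X), d_1, Y_1, Z_1, X) \mid \theta, d_1\} \mid d_1, X\big] \\
&= E\{Q_1(d_1, \theta) \mid d_1, X\} + \lambda \sum_{y_1=0}^{1} \sum_{z_1=0}^{1} q_2(d_2^{opt}(d_1, y_1, z_1, X), d_1, y_1, z_1, X) \, p(y_1, z_1 \mid d_1, X),
\end{aligned} \tag{9}$$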

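where $p(y_1, z_1 \mid d_1, X)$ is the posterior expected density for $(y_1, z_1)$, that is, the posterior mean of $p(y_1, z_1 \mid d_1, \theta)$. Letting $q_1(d_1, X) = (1 + \lambda) U(0, 0)$ for $d_1 = 0$, the optimal cycle 1 action, $d_1^{opt}$, maximizes this quantity over $D$.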
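The backward step in (9) reduces to a short computation once its ingredients are available. The sketch below assumes the three inputs have already been obtained from the posterior: the posterior mean of $Q_1$, the posterior expected probabilities of the four cycle 1 outcomes, and the value of $q_2$ at the optimal cycle 2 action for each outcome. The function names and the toy numbers are hypothetical; $\lambda = 0.8$ and $U(0,0) = 35$ are the values used in the simulations.

```python
def q1_value(q1_immediate, p_cycle1, q2_opt, lam=0.8):
    """Two-cycle objective (9).

    q1_immediate: posterior mean of Q1(d1, theta)
    p_cycle1:     {(y1, z1): posterior expected probability of the cycle 1 outcome}
    q2_opt:       {(y1, z1): q2 evaluated at the optimal cycle 2 action}
    lam:          discount applied to the cycle 2 payoff
    """
    future = sum(p_cycle1[k] * q2_opt[k] for k in p_cycle1)
    return q1_immediate + lam * future


def no_treatment_value(lam=0.8, u00=35.0):
    """Value assigned to d1 = 0, namely (1 + lambda) * U(0, 0)."""
    return (1.0 + lam) * u00


# Toy inputs, for illustration only.
p1 = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
q2_best = {(0, 0): 40.0, (0, 1): 50.0, (1, 0): 35.0, (1, 1): 42.0}
q1 = q1_value(50.0, p1, q2_best)   # 50 + 0.8 * 41.75
baseline = no_treatment_value()    # (1 + 0.8) * 35
```

With these toy inputs the expected optimal cycle 2 value is 41.75, so the two-cycle objective is 83.4, which exceeds the no-treatment value of 63.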

Maximizing $q_1$ and $q_2$ yields the optimal actions $d^{opt} = (d_1^{opt}, d_2^{opt})$, where $d_1^{opt}$ is either a dose or 0, $d_2^{opt}$ is applicable only when $d_1^{opt} \ne 0$, $d_2^{opt}$ is a function of $(d_1^{opt}, Y_1, Z_1)$, and both are functions of $X$. If new data from other patients are obtained between administration of $d_1^{opt}$ and optimization of $q_2(d_2(d_1), X)$, so that $X$ changes while waiting to evaluate the patient's cycle 1 outcomes $(Y_1, Z_1)$, then the posterior and hence the patient's $d_2^{opt}$ might change. This may be made precise by elaborating the notation to account for relationships between timing of the patient's cycles and calendar time. We avoid this complexity since the point is clear.

3.3 Dose Acceptability

We include dose acceptability criteria, motivated by ethical considerations, since maximizing a posterior utility-based objective function, per se, is not enough to allow a dose to be administered. The problem is that, while the optimal policy under a given utility function is mathematically well-defined, it is only an indirect solution of an optimization in expectation. An important case is that where no dose is acceptably safe and efficacious in either cycle 1 or cycle 2, so that it is not ethical to treat a patient using any dose and the trial must be stopped. Moreover, in some applications, the decision-theoretic solution might turn out to have undesirable features not anticipated when specifying the outcomes, model, and utility function. This problem is one reason why many physicians are reluctant to use formal decision-theoretic methods for clinical decision making. Spiegelhalter et al. (2004, chapter 3.14) discuss this issue. We mitigate these concerns by adding three dose acceptability criteria that restrict the set of solutions when maximizing (8) and (9).

The first constraint is that an untried dose level may not be skipped when escalating. This says that one does not fully trust decisions based on any assumed model and decision criteria, especially with the small amounts of data available early in the trial. Let $d_1^M$ denote the highest dose among those that have been tried in cycle 1 and $d_2^M$ the highest dose among those that have been tried in either cycle. The search for optimal actions is constrained so that $1 \le d_1 \le \min(d_1^M + 1, m)$ and $1 \le d_2 \le \min(d_2^M + 1, m)$. The second constraint does not allow escalating a patient's dose in cycle 2 if toxicity was observed in cycle 1, $Y_1 = 1$. The third criterion, defined in terms of expected utility, is to avoid giving undesirable dose pairs. For cycle 2, we say that action $d_2$ is unacceptable if it violates the no-skipping rule, escalates after $Y_1 = 1$, or $q_2(d_2, d_1, Y_1, Z_1, X) < U(0, 0)$, that is, the posterior expected utility of treating the patient with $d_2$ given $(d_1, Y_1, Z_1, X)$ is smaller than that obtained by not treating the patient at all. We denote the set of acceptable cycle 2 doses for a patient with cycle 1 data $(d_1, Y_1, Z_1)$ by $A_2(d_1, Y_1, Z_1, X)$. Thus, a given $d_2$ may be acceptable for some $(d_1, Y_1, Z_1)$ but not acceptable for others.

Table 1 illustrates true expected cycle 2 utilities of $d_2$ conditional on $(d_1, Y_1, Z_1)$ using simulation scenario 4, discussed below in Section 4. Assume that $\theta^{true}$ and $\tilde\theta^{true}$ are known, and suppress $\tilde\theta^{true}$. The $Q_2(d_2, d_1, Y_1, Z_1, \theta^{true})$ in Table 1 is similar to (7). For example, the values of $Q_2(d_2, d_1 = 3, Y_1 = 0, Z_1 = 0, \theta^{true})$ given in the first row of the third box from the top are (35.98, 39.49, 39.70, 36.30, 27.17) for $d_2 = (1, 2, 3, 4, 5)$, respectively. Since $Q_2(5, 3, 0, 0, \theta^{true}) < U(0, 0)$, $d_2 = 5$ is not acceptable. The other dose levels are acceptable, so $A_2(3, 0, 0, \theta^{true}) = \{1, 2, 3, 4\}$, with $d_2^{opt}(d_1 = 3, Y_1 = 0, Z_1 = 0, \theta^{true}) = 3$. When $(d_1, Y_1, Z_1) = (3, 1, 0)$, no $d_2 \in \{1, \ldots, m\}$ produces expected utility greater than $U(0, 0)$, and $d_2 = 3, 4, 5$ are not allowed due to the no-escalation-after-toxicity rule. Thus, $A_2(3, 1, 0, \theta^{true})$ is the empty set and $d_2^{opt}(d_1 = 3, Y_1 = 1, Z_1 = 0, \theta^{true}) = 0$. The last column of the table lists $d_2^{opt}(d_1, Y_1, Z_1, \theta^{true})$ for all combinations of $(d_1, Y_1, Z_1)$.
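The worked example above can be reproduced with a short sketch of the cycle 2 acceptability rules. The function names and the flag `allow_repeat_after_tox` are hypothetical. The flag reflects an assumption: the stated rule forbids escalation after $Y_1 = 1$, while the worked example also excludes repeating $d_1 = 3$, so the default here follows the worked example and setting the flag to `True` gives the weaker reading. The utilities are the two rows quoted in the text for $d_1 = 3$.

```python
def acceptable_cycle2(q2_by_dose, d1, y1, highest_tried, u00=35.0,
                      allow_repeat_after_tox=False):
    """Doses d2 in {1, ..., m} that pass the three cycle 2 acceptability checks.

    q2_by_dose[d - 1] is the expected cycle 2 utility of dose d.
    """
    m = len(q2_by_dose)
    cap = min(highest_tried + 1, m)  # no skipping of untried dose levels
    if y1 == 1:
        # After cycle 1 toxicity: no escalation (assumption: the default also
        # disallows repeating d1, matching the worked example in the text).
        cap = min(cap, d1 if allow_repeat_after_tox else d1 - 1)
    # Utility-based check: a dose is unacceptable if its utility is below U(0, 0).
    return [d for d in range(1, cap + 1) if q2_by_dose[d - 1] >= u00]


def optimal_cycle2(q2_by_dose, acceptable):
    """Best acceptable dose, or action 0 (no treatment) if none is acceptable."""
    if not acceptable:
        return 0
    return max(acceptable, key=lambda d: q2_by_dose[d - 1])


# Rows of Table 1 for d1 = 3 with (Y1, Z1) = (0, 0) and (1, 0).
row_300 = [35.98, 39.49, 39.70, 36.30, 27.17]
row_310 = [30.29, 32.82, 32.12, 28.43, 21.53]

acc_300 = acceptable_cycle2(row_300, d1=3, y1=0, highest_tried=5)
acc_310 = acceptable_cycle2(row_310, d1=3, y1=1, highest_tried=5)
best_300 = optimal_cycle2(row_300, acc_300)
best_310 = optimal_cycle2(row_310, acc_310)
```

Here `highest_tried=5` is chosen so that the no-skipping rule is not binding, as is implicit when the true parameter values are treated as known.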

Table 1.

True expected utilities in Scenario 4 assuming that $\theta^{true}$ is known. $q_1(d_1, \theta^{true})$ = the true expected total utility. Entries under $d_2$ in columns 4–8 are the true expected cycle 2 utilities, $q_2(d_2, \theta^{true})$. Expected utilities in grey are those for $d_2$ violating the no-escalation-after-$Y_1 = 1$ rule. Expected utilities in italics are those for unacceptable $d_2$ based on the utility-based criterion.

| $d_1$ | $q_1(d_1, \theta^{true})$ | $(Y_1, Z_1)$ | $d_2 = 1$ | $d_2 = 2$ | $d_2 = 3$ | $d_2 = 4$ | $d_2 = 5$ | $d_2^{opt}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 66.19 | (0,0) | 37.32 | 41.40 | 41.85 | 38.54 | 29.66 | 3 |
|   |       | (0,1) | 48.18 | 55.07 | 56.80 | 54.00 | 45.04 | 3 |
|   |       | (1,0) | 30.60 | 33.58 | 33.07 | 29.57 | 23.27 | NT |
|   |       | (1,1) | 41.20 | 47.16 | 47.96 | 44.80 | 38.23 | NT |
| 2 | 73.38 | (0,0) | 36.53 | 40.33 | 40.66 | 37.30 | 28.38 | 3 |
|   |       | (0,1) | 45.06 | 51.39 | 52.92 | 50.08 | 41.15 | 3 |
|   |       | (1,0) | 30.13 | 32.88 | 32.26 | 28.70 | 22.28 | NT |
|   |       | (1,1) | 38.36 | 43.71 | 44.31 | 41.12 | 34.54 | 1 |
| 3 | 80.61 | (0,0) | 35.98 | 39.49 | 39.70 | 36.30 | 27.17 | 3 |
|   |       | (0,1) | 43.31 | 49.20 | 50.62 | 47.73 | 38.69 | 3 |
|   |       | (1,0) | 30.29 | 32.82 | 32.12 | 28.43 | 21.53 | NT |
|   |       | (1,1) | 37.34 | 42.27 | 42.75 | 39.46 | 32.48 | 2 |
| 4 | 69.71 | (0,0) | 37.17 | 40.90 | 41.40 | 38.19 | 28.65 | 3 |
|   |       | (0,1) | 44.37 | 50.46 | 52.14 | 49.47 | 40.09 | 3 |
|   |       | (1,0) | 32.13 | 34.89 | 34.39 | 30.68 | 22.91 | NT |
|   |       | (1,1) | 39.17 | 44.32 | 45.02 | 41.74 | 33.91 | 3 |
| 5 | 68.16 | (0,0) | 38.08 | 42.02 | 42.80 | 39.83 | 30.06 | 3 |
|   |       | (0,1) | 45.26 | 51.52 | 53.46 | 51.04 | 41.51 | 3 |
|   |       | (1,0) | 33.00 | 35.88 | 35.52 | 31.85 | 23.69 | 2 |
|   |       | (1,1) | 40.05 | 45.32 | 46.16 | 42.95 | 34.75 | 3 |

To identify acceptable cycle 1 dose levels, we assume that $d_2^{opt}(d_1, Y_1, Z_1, X)$ is chosen from $A_2(d_1, Y_1, Z_1, X)$. For cycle 1, we say that action $d_1$ is unacceptable if it violates the no-skipping rule or satisfies the utility-based criterion

$$q_1(d_1, X) < U(0, 0) + \lambda U(0, 0). \tag{10}$$

This says that $d_1$ is unacceptable in cycle 1 if it yields a smaller posterior expected utility than not treating the patient. We denote the set of acceptable cycle 1 doses by $A_1(X) \subset D$. Note that, while $A_1(X)$ is adaptive between patients since it is a function of other patients’ data, $A_2(d_1, Y_1, Z_1, X)$ is adaptive both between and within patients.

The second column of Table 1 illustrates true expected total utilities over two cycles under simulation scenario 4. Assuming that $\theta^{true}$ and $\tilde\theta^{true}$ are known, the column gives values of

$$E\{U(Y_1, Z_1) + \lambda Q_2(d_2^{opt}(d_1, Y_1, Z_1, \theta^{true}), d_1, Y_1, Z_1, \theta^{true}) \mid \theta^{true}, d_1\},$$

where $d_2^{opt}(d_1, Y_1, Z_1, \theta^{true})$ can be read from the last column of the table. For all $d_1 \in D$ the true expected total utility is not below the bound $U(0, 0) + \lambda U(0, 0)$ in (10), that is, the unacceptability criterion (10) does not hold, hence all $d_1$ are acceptable. From the table, the optimal pair of actions is $d_1^{opt} = 3$ and $d_2^{opt} = 3, 3, 0$ and 2 for $(Y_1, Z_1) = (0, 0), (0, 1), (1, 0)$ and $(1, 1)$, respectively, listed in the fourth row of Table 3.
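The cycle 1 check can be verified by direct arithmetic on the second column of Table 1. The sketch below applies only the utility-based criterion (10), using $\lambda = 0.8$ and $U(0, 0) = 35$ from the simulation settings; the no-skipping rule is omitted because the true parameter values are treated as known. Variable names are illustrative.

```python
# True expected total utilities q1(d1, theta_true) from Table 1, Scenario 4.
q1_true = {1: 66.19, 2: 73.38, 3: 80.61, 4: 69.71, 5: 68.16}

lam = 0.8    # discount for the cycle 2 payoff
u00 = 35.0   # U(0, 0), utility of neither toxicity nor efficacy

# Right-hand side of (10): the value of not treating in either cycle.
threshold = u00 + lam * u00

# A dose is acceptable when criterion (10) does not hold.
acceptable = [d for d, q in q1_true.items() if q >= threshold]

# The optimal cycle 1 action maximizes q1 over acceptable doses (0 if none).
d1_opt = max(acceptable, key=q1_true.get) if acceptable else 0
```

The bound equals 63, every tabulated value exceeds it, and the maximum, 80.61, is attained at dose level 3.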

Table 3.

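Optimal treatment sequences under the eight simulation scenarios using the simulation truth, $\theta^{true}$. Assuming that $d_1^{opt}$ is given, $d_2^{opt}$ is searched for each cycle 1 outcome combination.

| Scenario | $d_1^{opt}$ | $d_2^{opt}$ for $(Y_1, Z_1) = (0,0)$ | $(0,1)$ | $(1,0)$ | $(1,1)$ |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 5 | 5 | 5 | 4 | 4 |
| 3 | 3 | 4 | 4 | 2 | 2 |
| 4 | 3 | 3 | 3 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 |
| 6 | 5 | 4 | 4 | 4 | 4 |
| 7 | 4 | 4 | 4 | 3 | 3 |
| 8 | 5 | 5 | 3 | 4 | 4 |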

3.4 Adaptive Randomization

While dopt yields the best clinical outcomes, the reliability of the process over the entire trial can be improved by including adaptive randomization (AR) among d giving values of the objective function near the maximum at dopt. While this may seem counter-intuitive, using AR decreases the probability of getting stuck at a suboptimal d and also has the effect of treating more patients at doses having larger utilities, on average. The problem that a “greedy” search algorithm may get stuck at suboptimal actions, and the simple solution of introducing some additional randomness into the search process, have been known for years in the optimization literature (cf. Tokic, 2010). However, this has been dealt with only very recently in dose-finding (Bartroff and Lai, 2010; Azriel, Mandel and Rinott, 2011; Thall and Nguyen, 2012; Braun, Kang and Taylor, 2012).

To implement AR, we first define $\epsilon_i$ to be a function decreasing in patient index $i$, and denote $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$. We define the set of $\epsilon_i$-optimal doses for cycle 1 to be

$$D_{i,1} = \{d_1 : q_1(d_{i,1}^{opt}, X) - q_1(d_1, X) < \epsilon_i,\ d_1 \in A_{i,1}(X)\}.$$

The set $D_{i,1}(X)$ contains $d_1$ in $A_{i,1}(X)$ having posterior mean utility within $\epsilon_i$ of the maximum posterior mean utility. Similarly, we define the set of $(\epsilon_i/2)$-optimal doses for cycle 2 given $(d_{i,1}, Y_{i,1}, Z_{i,1})$ to be

$$D_{i,2} = \{d_2 : q_2(d_{i,2}^{opt}(d_{i,1}, Y_{i,1}, Z_{i,1}, X), d_{i,1}, Y_{i,1}, Z_{i,1}, X) - q_2(d_2, d_{i,1}, Y_{i,1}, Z_{i,1}, X) < \epsilon_i/2,\ d_2 \in A_{i,2}(d_{i,1}, Y_{i,1}, Z_{i,1}, X)\}.$$

We use $\epsilon_i/2$ because $q_2(d_2, d_1, Y_1, Z_1, X)$ is the posterior expected utility for cycle 2 only. For cycles $c = 1, 2$, patients are randomized fairly among the doses in $D_{i,c}$, which we call AR($\epsilon$). In practice, the numerical values of $\epsilon_i$ depend on the numerical range of $U(y, z)$, and must be determined by preliminary trial simulations.
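A minimal sketch of AR($\epsilon$) is given below. It assumes that "randomized fairly" means equal probability for every dose in the $\epsilon$-optimal set, and that an empty acceptable set is represented by the single action 0; both are assumptions of this sketch, as are the function names and the choice of $\epsilon$ values. The utilities reuse the Table 1 row for $(d_1, Y_1, Z_1) = (3, 0, 0)$.

```python
import random


def epsilon_optimal(q_by_dose, acceptable, eps):
    """Acceptable doses whose objective value is within eps of the best one.

    q_by_dose maps a dose level to its objective value. Returns [0] (no
    treatment) when no dose is acceptable.
    """
    if not acceptable:
        return [0]
    best = max(q_by_dose[d] for d in acceptable)
    return [d for d in acceptable if best - q_by_dose[d] < eps]


def adaptive_randomize(candidates, rng=random):
    """Pick one candidate action with equal probability (assumed 'fair')."""
    return rng.choice(candidates)


q2 = {1: 35.98, 2: 39.49, 3: 39.70, 4: 36.30, 5: 27.17}
acceptable = [1, 2, 3, 4]

wide = epsilon_optimal(q2, acceptable, eps=10.0 / 2)   # cycle 2 uses eps_i / 2
narrow = epsilon_optimal(q2, acceptable, eps=1.0)
chosen = adaptive_randomize(wide, rng=random.Random(0))
```

With $\epsilon_i/2 = 5$ all four acceptable doses are candidates, whereas a tolerance of 1 keeps only doses 2 and 3, which illustrates how a decreasing $\epsilon_i$ concentrates assignment near the current optimum as data accumulate.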

3.5 Trial Design

Our illustrative trial studied in the simulations is constructed to mimic a typical phase I/II chemotherapy trial with five dose levels, but accounting for two cycles of therapy. The maximum sample size is $n = 60$ patients with a cohort size of 2. Based on preliminary simulations, we set $\epsilon_i = 20$ for the first 10 patients, $\epsilon_i = 15$ for the next 10 patients and $\epsilon_i = 10$ for the remaining 40 patients. An initial cohort of 2 patients is treated at the lowest dose level in cycle 1, their cycle 1 toxicity and efficacy outcomes are observed, the posterior of $\theta$ is computed, and actions are taken for cycle 2 of the initial cohort. Posterior computations are described in the supplementary material. If $D_{i,2} = \{0\}$ then patient $i$ does not receive a second cycle of treatment. If $D_{i,2} \ne \{0\}$, then AR($\epsilon$) is used to choose an action for cycle 2 from $D_{i,2}$. When $(Y_2, Z_2)$ are observed from cycle 2, the posterior of $\theta$ is updated. The second cohort is not enrolled until the first cohort has been evaluated for cycle 1. For all subsequent cohorts, the posterior is updated after the outcomes of all previous cohorts are observed, and the posterior expected utility, $q_{i,1}(d_1, X)$, is computed using $\lambda = 0.8$. If $D_{i,1}(X) = \emptyset$ for any interim $X$ then $d_{i,1}(X) = 0$, and the trial is terminated. If $D_{i,1} \ne \emptyset$, then a cycle 1 dose is chosen from $D_{i,1}$ using AR($\epsilon$). Once the outcomes in cycle 1 are observed, the posterior is updated. Using $(d_{i,1}, Y_{i,1}, Z_{i,1}, X)$ and $\epsilon_i$, $D_{i,2}$ is searched. If $D_{i,2}$ contains 0 only, then $d_{i,2} = 0$ and a cycle 2 dose is not given to patient $i$. Otherwise, $d_{i,2}$ is selected from $D_{i,2}(d_{i,1}, Y_{i,1}, Z_{i,1}, X)$ using AR($\epsilon$). All adaptive decisions are made based on the most recent data $X$; hence a new $d^{select}$ may be chosen using partial data from recent patients for whom $(Y_1, Z_1)$ but not $(Y_2, Z_2)$ have been evaluated. The above steps are repeated until either the trial has been stopped early or $n = 60$ has been reached, and in the latter case a final optimal two-cycle regime $d^{select}$ is chosen. The aim is that $d^{select}$ should be the true $d_1^{opt} \in \{1, \ldots, m\}$ and $d_2^{opt}(d_1^{opt}, y_1, z_1) \in D$. The recommendation for phase III is $d^{select}$, rather than a single “optimal” dose as is done conventionally.

We compared the DTM2 design with four other designs: two-cycle extensions of the continual reassessment method (CRM, O'Quigley, et al., 1990), a Bayesian phase I-II method using toxicity and efficacy odds ratios (TEOR, Yin et al., 2006), and two (3+3) methods. One (3+3) method implicitly targets a dose with $P(Y_1 = 1) \le 0.17$, called (3+3)a, and the other implicitly targets a dose with $P(Y_1 = 1) \le 0.33$, called (3+3)b. We extended each one-cycle method to account for a second cycle. For both (3+3) methods, we used the deterministic rule in cycle 2 that if $Y_1 = 1$ then the dose is lowered by 1 level ($d_2 = d_1 - 1$) and if $Y_1 = 0$ then the first dose is repeated ($d_2 = d_1$). The (3+3)a method, coupled with this deterministic rule for cycle 2, is a very commonly used method in actual phase I clinical trials.

For cycle 1 in the extended CRM (ECRM), we assumed the usual model $\Pr(Y_1 = 1 \mid d_1) = p_{d_1}^{\exp(\alpha)}$ and $\alpha \sim N(0, 2)$, where $0 < p_1 < \cdots < p_5 < 1$ are fixed values, sometimes called the model's “skeleton.” We calibrated the skeleton using the “getprior” subroutine in the package “dfcrm”, setting the target toxicity probability to be 0.30, the prior guess of the maximum tolerated dose to be dose level 4, and the desired halfwidth of the indifference intervals to be 0.05 (Cheung, 2011). The resulting skeleton is $(p_1, \ldots, p_5) = (0.063, 0.123, 0.204, 0.300, 0.402)$. Using this model, each patient's cycle 1 dose is that with posterior mean toxicity probability closest to 0.30. We implemented this using the R function “crm” in dfcrm, while also imposing the no-skipping rule for cycle 1. To determine a cycle 2 dose, we used the same deterministic rule as for the extended (3+3) methods, with one more safety requirement. For ECRM, a cycle 2 dose is not given if $\Pr\{\Pr(Y_1 = 1 \text{ or } Y_2 = 1) > p_T \mid X, d\} > \psi_T$, with $p_T = 0.5$ and $\psi_T = 0.9$, assuming independence of $P(Y_1 = 1)$ and $P(Y_2 = 1)$ for simplicity. For example, following the deterministic rule, a patient treated in cycle 1 at $d_1$ may be treated at $d_2 \in \{d_1 - 1, d_1\}$ depending on the cycle 1 toxicity outcome. In particular, we repeat $d_1$ in cycle 2 if $Y_1 = 0$. If $(d_1, d_1)$ does not satisfy the safety requirement, then the cycle 2 treatment is not given to a patient with $d_1$ and $Y_1 = 0$. In addition, if the cycle 2 treatment is not allowed for any $d_1$ regardless of $Y_1$, that is, no $(d_1, d_2)$ with $d_2 \in \{d_1 - 1, d_1\}$ satisfies the safety rule, then we lower $d_1$ until the cycle 2 treatment is safe for either of $Y_1 = 0$ or $Y_1 = 1$.
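The cycle 1 step of the CRM comparator can be illustrated with a from-scratch sketch. This is not the dfcrm implementation: it evaluates the posterior of $\alpha$ on a grid under the power model and the $N(0, 2)$ prior (read here as variance 2, an assumption), then picks the dose whose posterior mean toxicity probability is closest to the target. The grid limits, function names, and example data are assumptions of this sketch; the skeleton and target are those quoted in the text.

```python
import numpy as np

SKELETON = np.array([0.063, 0.123, 0.204, 0.300, 0.402])  # skeleton quoted in the text


def crm_posterior_tox(n_treated, n_tox, skeleton=SKELETON, prior_var=2.0):
    """Posterior mean toxicity probability per dose under p_d ** exp(alpha)."""
    alpha = np.linspace(-8.0, 8.0, 3201)          # grid for numerical integration
    log_prior = -alpha ** 2 / (2.0 * prior_var)   # N(0, prior_var) up to a constant
    # log p[k, j] = exp(alpha[k]) * log(skeleton[j]), kept on the log scale.
    log_p = np.exp(alpha)[:, None] * np.log(skeleton)[None, :]
    p = np.exp(log_p)
    n = np.asarray(n_treated, dtype=float)
    x = np.asarray(n_tox, dtype=float)
    # Binomial log likelihood summed over dose levels.
    log_lik = (x * log_p + (n - x) * np.log1p(-p)).sum(axis=1)
    log_post = log_prior + log_lik
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    return weights @ p


def recommend_dose(post_tox, target=0.30, highest_tried=None):
    """Dose (1-based) closest to the target, optionally without skipping."""
    dose = int(np.argmin(np.abs(post_tox - target))) + 1
    if highest_tried is not None:
        dose = min(dose, highest_tried + 1)
    return dose


# Hypothetical interim data: 3 patients at dose 1, all with toxicity.
post_tox = crm_posterior_tox(n_treated=[3, 0, 0, 0, 0], n_tox=[3, 0, 0, 0, 0])
next_dose = recommend_dose(post_tox, highest_tried=1)
```

With three toxicities in three patients at the lowest dose, the posterior mean toxicity probabilities all lie above the target, so the lowest dose is the closest and is recommended.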

We extended TEOR to 2-cycles similarly to ECRM, and named this ETEOR. For ETEOR, $d_2 = 0$ if $\Pr\{\Pr(Y_1 = 1 \text{ or } Y_2 = 1) > p_T \mid X, d\} > \psi_T$ or $\Pr\{\Pr(Z_1 = 1 \text{ and } Z_2 = 0) > p_E \mid X, d\} > \psi_E$ with $p_T = 0.6$, $p_E = 0.8$, and $\psi_T = \psi_E = 0.9$, assuming independence of the two cycles for simplicity. In addition, we calibrated the priors of Yin et al. (2006) using the concept of prior effective sample size (see the supplementary material for details), resulting in their $\sigma_\phi^2 = 20$, $\sigma_\psi^2 = 5$ and $\sigma_\theta^2 = 10$. We set $\pi_T = 0.35$, $\bar\pi_E = 0.5$, $p_{escl} = 0.5$, $p^* = 0.25$ and $q^* = 0.1$, and used $\omega_d^{(3)}$ to select a dose for the next patient.

4 Simulation Study

4.1 Simulation Design

We simulated trials under each of eight dose-outcome scenarios using each of the five designs: DTM2, the two extended 3+3 methods (3+3)a and (3+3)b, ECRM, and ETEOR. The first seven scenarios were obtained using the model underlying DTM2, with the eighth obtained from a very different model to study robustness. To specify 2-cycle simulation scenarios, one must specify a joint distribution of (Y1, Z1) for each d1 and a joint distribution of (Y2, Z2) as a function of (d1, d2, Y1, Z1). For Scenarios 1 – 7, the marginal probabilities of toxicity and efficacy in each cycle are given in Table 2, and we simulated data using (4), with assumed values $\sigma_\xi^{2,true} = \sigma_\eta^{2,true} = 0.5^2$, $\tau^{2,true} = 0.3^2$ and $\rho^{true} = 0.2$. We determined $\xi^{true}$ and $\eta^{true}$ by matching $\Pr(Y_c < 0) = \Phi(0 \mid \xi_{d_c}^{true}, \sigma_\xi^{2,true} + \tau^{2,true})$ and $\Pr(Z_c < 0) = \Phi(0 \mid \eta_{d_c}^{true}, \sigma_\eta^{2,true} + \tau^{2,true})$. We used $(\xi^{true}, \eta^{true})$ and the assumed nuisance parameters to simulate (Y, Z), generated (Y1, Z1) from (6) using the true values of $\sigma_\xi^2$, $\sigma_\eta^2$, $\tau^2$, $\rho$, and used (5) to generate (Y2, Z2) conditional on (Y1, Z1).
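The matching step can be made concrete under one added assumption: that a binary outcome equals 1 exactly when its latent normal variable is positive, so that Pr(latent < 0) is one minus the marginal probability in Table 2. With that reading, and taking the stated variances as 0.5² and 0.3², the latent mean has a closed form. The function names below are hypothetical and the sketch uses only the Python standard library.

```python
import math
from statistics import NormalDist


def latent_mean(p, sigma2=0.5**2, tau2=0.3**2):
    """Mean m of a N(m, sigma2 + tau2) latent variable with Pr(latent > 0) = p.

    Solves Phi(0 | m, sigma2 + tau2) = 1 - p, which gives m = s * Phi^{-1}(p)
    with s = sqrt(sigma2 + tau2). Assumes outcome = 1 when the latent is positive.
    """
    s = math.sqrt(sigma2 + tau2)
    return s * NormalDist().inv_cdf(p)


def implied_probability(m, sigma2=0.5**2, tau2=0.3**2):
    """Marginal Pr(outcome = 1) implied by latent mean m (inverse of latent_mean)."""
    s = math.sqrt(sigma2 + tau2)
    return 1.0 - NormalDist(mu=m, sigma=s).cdf(0.0)


# Scenario 1, cycle 1 toxicity probabilities from Table 2.
scenario1_tox = [0.10, 0.15, 0.30, 0.45, 0.55]
xi_true = [latent_mean(p) for p in scenario1_tox]
```

Because the map is monotone, increasing marginal probabilities across doses translate into increasing latent means.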

Table 2.

True marginal probabilities of toxicity and efficacy under the first seven scenarios for the simulation studies, (pT, pE)true for cycles 1 and 2. The true marginal probabilities of Scenario 8 are identical to those of Scenario 5.

| Scenario | Cycle | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Dose 5 |
|---|---|---|---|---|---|---|
| 1 | 1 | (0.10, 0.02) | (0.15, 0.03) | (0.30, 0.05) | (0.45, 0.08) | (0.55, 0.10) |
| 1 | 2 | (0.13, 0.01) | (0.18, 0.02) | (0.33, 0.04) | (0.48, 0.07) | (0.58, 0.09) |
| 2 | 1 | (0.30, 0.50) | (0.32, 0.60) | (0.35, 0.70) | (0.38, 0.80) | (0.40, 0.90) |
| 2 | 2 | (0.33, 0.45) | (0.35, 0.55) | (0.38, 0.65) | (0.41, 0.75) | (0.43, 0.85) |
| 3 | 1 | (0.05, 0.10) | (0.18, 0.13) | (0.20, 0.25) | (0.40, 0.26) | (0.50, 0.27) |
| 3 | 2 | (0.30, 0.20) | (0.31, 0.35) | (0.32, 0.45) | (0.45, 0.65) | (0.65, 0.70) |
| 4 | 1 | (0.13, 0.06) | (0.15, 0.18) | (0.25, 0.35) | (0.55, 0.38) | (0.75, 0.40) |
| 4 | 2 | (0.20, 0.14) | (0.25, 0.23) | (0.35, 0.29) | (0.50, 0.32) | (0.80, 0.35) |
| 5 | 1 | (0.52, 0.01) | (0.61, 0.15) | (0.71, 0.20) | (0.82, 0.25) | (0.90, 0.30) |
| 5 | 2 | (0.53, 0.04) | (0.55, 0.20) | (0.62, 0.25) | (0.85, 0.27) | (0.95, 0.33) |
| 6 | 1 | (0.25, 0.10) | (0.28, 0.13) | (0.30, 0.25) | (0.40, 0.35) | (0.50, 0.45) |
| 6 | 2 | (0.30, 0.20) | (0.31, 0.35) | (0.32, 0.45) | (0.43, 0.65) | (0.56, 0.70) |
| 7 | 1 | (0.25, 0.10) | (0.28, 0.13) | (0.30, 0.25) | (0.40, 0.38) | (0.65, 0.40) |
| 7 | 2 | (0.30, 0.20) | (0.31, 0.35) | (0.32, 0.45) | (0.43, 0.65) | (0.66, 0.67) |

To apply DTM2, we first calibrated the hyperparameters, $\tilde{\theta}$, using the notion of effective sample size (ESS) as described in the supplementary material. We simulated 1,000 pseudo samples of θ, setting $\sigma_{\xi_{c0}}^2 = \sigma_{\eta_{c0}}^2 = 6^2$, and computed probabilities of interest, such as P(Yc = 0 | dc) and P(Zc = 0 | dc), based on the pseudo samples, setting $\sigma_\xi^2 = \sigma_\eta^2 = 2^2$, $\tau^2 = 1$ and $\rho = 0.5$. We determined the $\tilde{\theta}$ that gave ESS values ranging from 0.5 to 2 for the quantities of interest, and used this $\tilde{\theta}$ to determine the prior for all simulations.

To study robustness, in Scenario 8 we simulated data using the following logistic regression model. The cycle 1 marginal probabilities (pT(d1), pE(d1)) are the same as those of Scenario 5, with outcomes generated using true probabilities

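$$
\begin{aligned}
\Pr(Y_1 = 1 \mid d_1) &= p_T(d_1),\\
\Pr(Z_1 = 1 \mid d_1, Y_1) &= \operatorname{logit}^{-1}\{\operatorname{logit}(p_E(d_1)) - 0.34(Y_1 - 0.5)\},\\
\Pr(Y_2 = 1 \mid d_1, d_2, Y_1, Z_1) &= \operatorname{logit}^{-1}\{\operatorname{logit}(p_T(d_1)) + 0.33 d_2 + 0.4(Y_1 - 0.5) - 0.3(Z_1 - 0.5)\},\\
\Pr(Z_2 = 1 \mid d_1, d_2, Y_1, Z_1, Y_2) &= \operatorname{logit}^{-1}\{\operatorname{logit}(p_E(d_1)) + 0.76 d_2 - 0.22(Y_1 - 0.5) + 2.4(Z_1 - 0.5) - 1.8(Y_2 - 0.5)\}.
\end{aligned}
$$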
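A short simulation sketch of this robustness model may help fix ideas. It uses the Scenario 5 cycle 1 marginals from Table 2 and reads the coefficient signs as minus wherever an operator is missing between terms, which is an assumption; the function names are hypothetical and the code is illustrative rather than the authors' simulation program.

```python
import numpy as np

# Cycle 1 marginals of Scenario 5 (Table 2), shared by Scenario 8.
P_T = np.array([0.52, 0.61, 0.71, 0.82, 0.90])
P_E = np.array([0.01, 0.15, 0.20, 0.25, 0.30])


def logit(p):
    return np.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def prob_z1(d1, y1):
    """Pr(Z1 = 1 | d1, Y1); the minus sign on 0.34 is an assumed reading."""
    return expit(logit(P_E[d1 - 1]) - 0.34 * (y1 - 0.5))


def prob_y2(d1, d2, y1, z1):
    """Pr(Y2 = 1 | d1, d2, Y1, Z1)."""
    return expit(logit(P_T[d1 - 1]) + 0.33 * d2
                 + 0.4 * (y1 - 0.5) - 0.3 * (z1 - 0.5))


def prob_z2(d1, d2, y1, z1, y2):
    """Pr(Z2 = 1 | d1, d2, Y1, Z1, Y2)."""
    return expit(logit(P_E[d1 - 1]) + 0.76 * d2 - 0.22 * (y1 - 0.5)
                 + 2.4 * (z1 - 0.5) - 1.8 * (y2 - 0.5))


def simulate_patient(d1, d2, rng):
    """Draw (Y1, Z1, Y2, Z2) sequentially for dose levels d1, d2 in 1..5."""
    y1 = int(rng.binomial(1, P_T[d1 - 1]))
    z1 = int(rng.binomial(1, prob_z1(d1, y1)))
    y2 = int(rng.binomial(1, prob_y2(d1, d2, y1, z1)))
    z2 = int(rng.binomial(1, prob_z2(d1, d2, y1, z1, y2)))
    return y1, z1, y2, z2
```

A call such as `simulate_patient(3, 2, np.random.default_rng(1))` returns one patient's four binary outcomes.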

Table 3 shows the optimal actions, $d_1^{opt}$ and $d_2^{opt}(d_1^{opt}, Y_1, Z_1)$, under each scenario. For example, in Scenario 3, the optimal cycle 1 action is $d_1^{opt} = 3$, and the optimal cycle 2 action is $d_2^{opt}(d_1 = 3, Y_1 = 0, Z_1) = 4$ and $d_2^{opt}(d_1 = 3, Y_1 = 1, Z_1) = 2$, regardless of Z1.

4.2 Evaluation Criteria

We used the following summary statistics to evaluate each method's performance. Denote the outcomes of the n patients in a given trial who received at least one cycle of therapy by $\{(Y_{i,1}, Z_{i,1}), (Y_{i,2}, Z_{i,2}),\ i = 1, \ldots, n\}$, where n < 60 if the trial was stopped early. The empirical mean total utility for the n patients is $U = \sum_{i=1}^{n} \{U(Y_{i,1}, Z_{i,1}) + U(Y_{i,2}, Z_{i,2})\} / n$, where we set $U(Y_{i,2}, Z_{i,2}) = U(0, 0)$ for patients who did not receive a second cycle of therapy. Indexing the N simulated replications of the trial by r = 1, ..., N, the empirical mean total payoff for all patients in the trial is $\bar{U} = N^{-1} \sum_{r=1}^{N} U^{(r)}$. One may regard $\bar{U}$ as an index of the ethical desirability of the method for the patients in the trial, given the utility U(y, z).
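As an illustration of how the per-trial mean total utility and its average over replications might be computed, the sketch below uses a hypothetical 2×2 utility table; the numerical utilities are placeholders chosen for the example and are not the elicited values used in the paper. Function names are likewise hypothetical.

```python
import numpy as np

# Hypothetical utilities U(y, z), indexed as UTILITY[y, z]; placeholders only.
UTILITY = np.array([[30.0, 100.0],
                    [0.0, 60.0]])


def trial_mean_utility(cycle1, cycle2):
    """Mean total utility over the n patients of one simulated trial.

    cycle1: list of (y, z) pairs; cycle2: list of (y, z) pairs, with None
    for a patient who did not receive a second cycle.
    """
    total = 0.0
    for (y1, z1), c2 in zip(cycle1, cycle2):
        y2, z2 = c2 if c2 is not None else (0, 0)   # no cycle 2 is scored as U(0, 0)
        total += UTILITY[y1, z1] + UTILITY[y2, z2]
    return total / len(cycle1)


def mean_over_trials(trial_utilities):
    """Average of the per-trial values over the N simulated replications."""
    return float(np.mean(trial_utilities))
```

For example, two patients with outcomes (0, 1) then (0, 0), and (1, 0) with no second cycle, contribute 130 and 30 under these placeholder utilities.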

To evaluate performance in terms of future patient benefit, recall that DTM2 selects an optimal dose d1,select for cycle 1, and an optimal function d2,select for use in cycle 2 assuming that d1,select is given, with d2,select not defined if d1,select = 0. Let θtrue be the true parameter vector assumed for a simulation scenario. Under θtrue, the expected payoff in cycle 1 of giving action d1,select to a future patient is $Q_{1,select}(d_{1,select}) = E\{U(Y_1, Z_1) \mid d_{1,select}, \theta^{true}\}$, for d1,select ≠ 0. If the rule d2,select is used, the expected payoff in cycle 2 is

$$
Q_{2,select}(d_{2,select}) = \sum_{(y_1, z_1) \in \{0,1\}^2} E\{U(Y_2, Z_2) \mid d_{1,select}, d_{2,select}(y_1, z_1), y_1, z_1, \theta^{true}\} \times p(y_1, z_1 \mid d_{1,select}, \theta^{true}),
$$

where $E\{U(Y_2, Z_2) \mid d_{1,select}, d_{2,select}(y_1, z_1), y_1, z_1, \theta^{true}\}$ becomes U(0, 0) if d2 = 0. The total expected payoff to a future patient treated using the selected regime dselect = (d1,select, d2,select) is defined to be Qselect(dselect) = Q1,select(d1,select) + λQ2,select(d2,select).
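The total expected payoff of a selected regime can be assembled from lookup tables of true expected utilities and cycle 1 outcome probabilities. The sketch below assumes such tables are available as dictionaries; their layout, the function name, and the numbers in the usage example are all hypothetical.

```python
def q_select(d1, rule, exp_u1, exp_u2, p_joint, u00, lam):
    """Total expected payoff Q1 + lam * Q2 of the regime (d1, rule).

    d1:      selected cycle 1 dose (nonzero)
    rule:    dict (y1, z1) -> cycle 2 action, with 0 meaning "do not treat"
    exp_u1:  dict d1 -> E{U(Y1, Z1) | d1} under the true parameters
    exp_u2:  dict (d1, d2, y1, z1) -> E{U(Y2, Z2) | d1, d2, y1, z1}
    p_joint: dict (d1, y1, z1) -> p(y1, z1 | d1)
    u00:     utility U(0, 0), used when cycle 2 is not given
    lam:     weight on the cycle 2 payoff
    """
    q1 = exp_u1[d1]
    q2 = 0.0
    for y1 in (0, 1):
        for z1 in (0, 1):
            d2 = rule[(y1, z1)]
            u2 = u00 if d2 == 0 else exp_u2[(d1, d2, y1, z1)]
            q2 += u2 * p_joint[(d1, y1, z1)]
    return q1 + lam * q2


# Hypothetical inputs for a regime that starts at dose 3.
rule = {(0, 0): 4, (0, 1): 4, (1, 0): 0, (1, 1): 2}
exp_u1 = {3: 50.0}
exp_u2 = {(3, 4, 0, 0): 40.0, (3, 4, 0, 1): 60.0, (3, 2, 1, 1): 20.0}
p_joint = {(3, 0, 0): 0.4, (3, 0, 1): 0.3, (3, 1, 0): 0.2, (3, 1, 1): 0.1}
total = q_select(3, rule, exp_u1, exp_u2, p_joint, u00=30.0, lam=0.8)
```

Keeping the rule as a mapping from (y1, z1) to an action mirrors the fact that the cycle 2 decision is a function of the cycle 1 outcomes rather than a single dose.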

In addition to the criteria U¯ and Qselect, we evaluated the empirical toxicity and efficacy rates, defined as follows. Let δi,2 = 1 if patient i was treated in cycle 2. For each simulated trial with each method, for patients who received at least one cycle of therapy, we computed

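$$
\Pr(\mathrm{Tox}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1(Y_{i,1} = 1) + \delta_{i,2}\, 1(Y_{i,2} = 1)}{1 + \delta_{i,2}}
$$

and

$$
\Pr(\mathrm{Eff}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1(Z_{i,1} = 1) + \delta_{i,2}\, 1(Z_{i,2} = 1)}{1 + \delta_{i,2}}.
$$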
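Both empirical rates have the same form, so a single helper suffices. The following sketch (with a hypothetical function name) averages each patient's outcome indicator over the cycles that patient actually received, and then averages over patients.

```python
def per_cycle_rate(first, second):
    """Empirical per-cycle event rate for one simulated trial.

    first:  0/1 outcomes in cycle 1, one per patient
    second: 0/1 outcomes in cycle 2, or None if the patient had no cycle 2
    """
    total = 0.0
    for o1, o2 in zip(first, second):
        delta = 0 if o2 is None else 1              # delta_{i,2}
        events = o1 + (o2 if delta else 0)
        total += events / (1 + delta)               # average over cycles received
    return total / len(first)
```

Passing the toxicity indicators gives Pr(Tox) and passing the efficacy indicators gives Pr(Eff); for instance, `per_cycle_rate([1, 0, 1], [0, None, 1])` averages the per-patient values 0.5, 0 and 1.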

4.3 Simulation Results

A total of N = 1,000 trials were simulated under each scenario for each of the five designs studied. The simulation results are summarized in Table 4. For each of the five trial designs, Table 4 gives Ū, Qselect, the empirical per-cycle toxicity and efficacy probabilities, and the percent of trials completed with d1,select ∈ {1, ..., m}. A difference in Ū or Qselect of about 5 may be considered “large,” since this translates to, on average, a difference of about 0.13 in Pr(Tox), while a difference of 1 may be considered small.

Table 4.

Simulation results for the proposed method DTM2, for the 2-cycle extensions (3+3)a, (3+3)b, and ECRM of standard phase I methods, and for the 2-cycle extension ETEOR of the phase I-II method of Yin et al. (2006). Ū = mean empirical utility, Qselect = mean payoff of dselect. Empirical percentages Pr(Tox) and Pr(Eff) include patients who received at least cycle 1 of treatment.

| Scenario | Criterion | DTM2 | (3+3)a | (3+3)b | ECRM | ETEOR |
|---|---|---|---|---|---|---|
| 1 | Ū | 66.48 | 59.27 | 58.81 | 56.56 | 61.90 |
|  | Qselect | 57.77 | 54.36 | 52.30 | 51.75 | 52.43 |
|  | Pr(Tox) | 0.25 | 0.22 | 0.23 | 0.27 | 0.25 |
|  | Pr(Eff) | 0.07 | 0.03 | 0.03 | 0.05 | 0.07 |
|  | % completed trials | 2.3 | 88.6 | 96.5 | 99.6 | 4.4 |
| 2 | Ū | 136.35 | 124.36 | 118.32 | 115.86 | 122.13 |
|  | Qselect | 135.76 | 103.85 | 104.48 | 102.43 | 108.47 |
|  | Pr(Tox) | 0.39 | 0.30 | 0.33 | 0.36 | 0.35 |
|  | Pr(Eff) | 0.72 | 0.58 | 0.55 | 0.56 | 0.60 |
|  | % completed trials | 99.4 | 39.2 | 64.7 | 95.6 | 78.2 |
| 3 | Ū | 94.23 | 85.95 | 85.75 | 89.93 | 88.04 |
|  | Qselect | 84.39 | 77.98 | 80.14 | 78.43 | 78.47 |
|  | Pr(Tox) | 0.38 | 0.27 | 0.27 | 0.30 | 0.26 |
|  | Pr(Eff) | 0.38 | 0.27 | 0.27 | 0.33 | 0.28 |
|  | % completed trials | 79.4 | 96.6 | 99.2 | 100.0 | 78.5 |
| 4 | Ū | 75.84 | 81.81 | 80.12 | 85.40 | 84.94 |
|  | Qselect | 69.49 | 74.92 | 75.76 | 78.67 | 78.87 |
|  | Pr(Tox) | 0.51 | 0.25 | 0.26 | 0.29 | 0.28 |
|  | Pr(Eff) | 0.29 | 0.22 | 0.21 | 0.29 | 0.27 |
|  | % completed trials | 96.7 | 83.2 | 94.7 | 99.4 | 81.7 |
| 5 | Ū | 66.65 | 52.87 | 52.72 | 50.41 | NA |
|  | Qselect | 50.64 | 40.66 | 40.70 | 40.61 | NA |
|  | Pr(Tox) | 0.84 | 0.43 | 0.44 | 0.53 | NA |
|  | Pr(Eff) | 0.35 | 0.08 | 0.04 | 0.03 | NA |
|  | % completed trials | 0.4 | 6.8 | 20.0 | 9.0 | 0.0 |
| 6 | Ū | 96.43 | 82.82 | 79.27 | 81.50 | 86.14 |
|  | Qselect | 92.78 | 70.30 | 71.28 | 71.29 | 76.24 |
|  | Pr(Tox) | 0.45 | 0.28 | 0.32 | 0.32 | 0.32 |
|  | Pr(Eff) | 0.41 | 0.24 | 0.23 | 0.25 | 0.29 |
|  | % completed trials | 90.9 | 51.5 | 74.7 | 97.6 | 58.3 |
| 7 | Ū | 91.88 | 82.66 | 79.31 | 80.99 | 86.32 |
|  | Qselect | 84.91 | 70.28 | 71.27 | 71.16 | 76.34 |
|  | Pr(Tox) | 0.47 | 0.28 | 0.32 | 0.32 | 0.32 |
|  | Pr(Eff) | 0.38 | 0.24 | 0.22 | 0.25 | 0.29 |
|  | % completed trials | 90.3 | 51.4 | 73.6 | 97.5 | 58.7 |
| 8 | Ū | 95.92 | 80.24 | 76.09 | 79.83 | 80.75 |
|  | Qselect | 93.22 | 68.23 | 69.26 | 69.28 | 70.73 |
|  | Pr(Tox) | 0.54 | 0.34 | 0.36 | 0.37 | 0.34 |
|  | Pr(Eff) | 0.45 | 0.25 | 0.22 | 0.27 | 0.27 |
|  | % completed trials | 84.7 | 49.4 | 73.2 | 97.6 | 57.9 |

In Scenario 1, Table 2 shows that doses d = 1, 2, 3 are safe, d = 4, 5 are overly toxic, and all doses have very low efficacy. In this case there is little benefit from any dose. The value Ū = 66.48 for DTM2 in Table 4 is close to the utility U(0, 0) + 0.8U(0, 0) = 66 of (d1 = 0, d2 = 0). The utility-based stopping rule of DTM2 correctly terminates the trial 97.7% of the time. Similarly, ETEOR terminates 95.6% of the trials before reaching the maximum number of patients due to the low efficacy rates. In contrast, the extended versions of the 3+3 and ECRM are very likely to run the trial to completion, essentially because they ignore efficacy. This provides a stark illustration of the fact that there is little benefit in exploring the safety of an agent if it is inefficacious, and methods that ignore efficacy are very likely to make this mistake. This has little to do with the 2-cycle structure, and it also can be seen when comparing one-cycle phase I-II (efficacy and toxicity) to phase I (toxicity only) methods. Thus, DTM2 and ETEOR are the only reasonable designs in Scenario 1, and DTM2 is superior in terms of both Ū and Qselect.

In Scenario 2, Table 2 shows that the toxicity probabilities increase with dose from 0.30 to 0.40 in cycle 1 and from 0.33 to 0.43 in cycle 2, while the efficacy probabilities are quite high in both cycles, increasing with dose from 0.50 to 0.90 in cycle 1 and from 0.45 to 0.85 in cycle 2. Thus, if one considers toxicity probabilities around 0.40 to be acceptable trade-offs for these very high efficacy rates, then there is a substantial payoff for escalating to higher doses. The utility function reflects this, with the optimal actions $d_1^{opt} = 5$ and $d_2^{opt}(5, Y_1, Z_1) = 4$ or 5 (Table 3). DTM2 obtains larger values of Ū and Qselect than the other four methods, due to its much larger Pr(Eff) and slightly larger Pr(Tox).

In Scenario 3, $d_1^{opt} = 3$, with $d_2^{opt} = 4$ if Y1 = 0 in cycle 1 and $d_2^{opt} = 2$ if Y1 = 1 (Table 3). This illustrates the within-patient adaptation of DTM2. The (3+3)a, (3+3)b, and ECRM methods often select $d_1^{opt} = 3$ since the toxicity probability of d1 = 3 is close to 0.30, but they never select $d_2^{opt} = 4$ for patients with (d1, Y1) = (3, 0) because their deterministic rules ignore Z1 and do not allow escalation of dose levels for cycle 2 even with Y1 = 0. Again, DTM2 achieves the largest Ū, Qselect, and Pr(Eff), with slightly larger Pr(Tox).

Scenario 4 is a challenging scenario for DTM2, and is favorable for the other four designs. In Scenario 4, $d_1^{opt} = 3$ since its toxicity probability 0.25 is closest to 0.30. In addition, $d_2^{opt}(d_1^{opt}, Y_1, Z_1)$ is exactly the same as the cycle 2 dose levels chosen by the deterministic rules of (3+3)a, (3+3)b and ECRM, except for (Y1, Z1) = (0, 1), which only occurs about 5% of the time. From Table 1, the true expected utility of d2 = 2 given (d1, Y1, Z1) = (3, 0, 1) is 32.82, which is very close to U(0, 0). Thus, the three methods, (3+3)a, (3+3)b and ECRM, are likely to select $d_1^{opt}$ by considering only toxicity outcomes and to select $d_2^{opt}$ by following their deterministic rules. ECRM selects $d_1^{opt} = 3$ most of the time, leading to the largest Ū and one of the largest Qselect values. Similar performance is observed for ETEOR as well, because $d_1^{opt}$ is considered optimal by ETEOR and it uses the same deterministic rule for cycle 2. The smaller values of Ū and Qselect for DTM2 are due to the fact that it does a stochastic search to determine the optimal actions, using much more general criteria than the other methods. Table 1 shows that, for (d1, Y1) = (3, 1), the expected cycle 2 utilities are smaller than or very close to U(0, 0) for all the cycle 2 doses, so all cycle 2 doses are barely acceptable or not acceptable. However, d1 = 5 is acceptable and, given d1 = 5, many cycle 2 doses are acceptable, so DTM2 often explores cycle 1 doses higher than $d_1^{opt}$. This scenario illustrates the price one may pay for using more of the available information to explore the dose domain more extensively based on an efficacy-toxicity utility-based objective function.

In Scenario 5, the lowest dose is too toxic and therefore even d1 = 1 is unacceptable. As expected, all methods terminate the trial early most of the time, with DTM2 stopping trials due to the low posterior expected utilities caused by the high toxicity rate at d1. Scenarios 6 and 7 have identical true toxicity and efficacy rates for doses 1, 2 and 3, while for doses 4 and 5, Scenario 7 has higher toxicity rates and lower efficacy rates, so that its $d_1^{opt}$ is a lower dose than the $d_1^{opt}$ of Scenario 6. Since dose 3 has a toxicity rate closest to 0.30 in both scenarios, the other four methods perform very similarly in the two scenarios. However, DTM2 again has much higher Ū and Qselect values than all of the other methods in these scenarios.

Recall that Scenario 8 is included to evaluate robustness, with joint probabilities generated using a model very different from that underlying DTM2. It thus is remarkable that, in terms of both Ū and Qselect, DTM2 has far superior performance compared to all four other methods. Essentially, this is because DTM2 allows a higher rate of toxicity as a trade-off for much higher efficacy, while the phase I methods (3+3)a, (3+3)b, and ECRM all ignore efficacy, and the other phase I-II method, ETEOR, terminates the trial early much more frequently. The superior performance of DTM2 in Scenario 8 may be attributed to its use of a 2-cycle utility function to account for efficacy-toxicity trade-offs and also as a basis for its early stopping rule. More generally, it appears that DTM2 is quite robust to the actual probability mechanism that generates the outcomes.

To assess sensitivity to association among the outcomes Y1, Z1, Y2, Z2 in the two cycles, we evaluated each method's performance with and without association in Scenarios 3, 6, and 7. We let the true $(\sigma_\xi^2, \sigma_\eta^2, \tau^2, \rho)$ be either (0.2, 0.05, 1, 0.5) or ($0.5^2$, $0.5^2$, 0, 0). The first set of values induces high association between outcomes both within and across cycles, while the second set of values induces no association. This leads to different true expected utilities in each cycle and thus to different optimal decisions, as shown in Table 5. The results are summarized in Table 6. While performance changes depending on the assumed true values, in all cases DTM2 is again superior to all four other methods.

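Table 5.

Optimal sequence of treatments under scenarios 3, 6 and 7, assuming different values of $(\sigma_\xi^{2,true}, \sigma_\eta^{2,true}, \tau^{2,true}, \rho^{true})$ to induce either high association or no association between outcomes.

| Scenario | d1opt | d2opt, (Y1, Z1) = (0, 0) | d2opt, (0, 1) | d2opt, (1, 0) | d2opt, (1, 1) |
|---|---|---|---|---|---|
| 3 - High Assoc. | 3 | 4 | 3 | NT | 2 |
| 3 - No Assoc. | 3 | 4 | 4 | 2 | 2 |
| 6 - High Assoc. | 5 | 5 | 4 | NT | 3 |
| 6 - No Assoc. | 5 | 4 | 4 | 4 | 4 |
| 7 - High Assoc. | 4 | 4 | 4 | NT | 3 |
| 7 - No Assoc. | 4 | 4 | 4 | 3 | 3 |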

Table 6.

Simulation results under scenarios 3, 6 and 7, assuming different values of $(\sigma_\xi^{2,true}, \sigma_\eta^{2,true}, \tau^{2,true}, \rho^{true})$ to induce either high association or no association between outcomes.

| Scenario | Criterion | DTM2 | (3+3)a | (3+3)b | ECRM | ETEOR |
|---|---|---|---|---|---|---|
| 3, High Assoc. | Ū | 97.06 | 85.68 | 85.24 | 88.56 | 89.85 |
|  | Qselect | 86.18 | 78.53 | 80.53 | 76.58 | 79.27 |
|  | Pr(Tox) | 0.37 | 0.27 | 0.28 | 0.31 | 0.26 |
|  | Pr(Eff) | 0.38 | 0.26 | 0.26 | 0.33 | 0.29 |
|  | % completed trials | 97.7 | 96.6 | 99.2 | 99.9 | 77.5 |
| 3, No Assoc. | Ū | 92.05 | 85.96 | 85.44 | 90.06 | 87.88 |
|  | Qselect | 82.22 | 77.83 | 80.04 | 79.35 | 78.75 |
|  | Pr(Tox) | 0.41 | 0.27 | 0.27 | 0.30 | 0.26 |
|  | Pr(Eff) | 0.36 | 0.26 | 0.26 | 0.33 | 0.28 |
|  | % completed trials | 98.2 | 96.6 | 99.2 | 99.9 | 77.5 |
| 6, High Assoc. | Ū | 101.37 | 85.54 | 81.74 | 83.20 | 89.87 |
|  | Qselect | 95.18 | 72.43 | 73.35 | 71.40 | 77.67 |
|  | Pr(Tox) | 0.42 | 0.26 | 0.29 | 0.31 | 0.31 |
|  | Pr(Eff) | 0.43 | 0.25 | 0.22 | 0.26 | 0.31 |
|  | % completed trials | 91.1 | 51.5 | 74.7 | 97.5 | 59.9 |
| 6, No Assoc. | Ū | 94.63 | 82.45 | 78.73 | 81.51 | 85.11 |
|  | Qselect | 90.85 | 69.76 | 70.75 | 71.53 | 76.11 |
|  | Pr(Tox) | 0.46 | 0.29 | 0.32 | 0.33 | 0.32 |
|  | Pr(Eff) | 0.40 | 0.24 | 0.22 | 0.26 | 0.28 |
|  | % completed trials | 91.6 | 51.5 | 74.7 | 98.2 | 57.0 |
| 7, High Assoc. | Ū | 96.67 | 85.40 | 81.63 | 82.94 | 90.00 |
|  | Qselect | 87.63 | 72.44 | 73.35 | 71.28 | 77.84 |
|  | Pr(Tox) | 0.44 | 0.26 | 0.29 | 0.31 | 0.31 |
|  | Pr(Eff) | 0.41 | 0.25 | 0.22 | 0.26 | 0.31 |
|  | % completed trials | 90.7 | 51.4 | 73.6 | 97.9 | 60.2 |
| 7, No Assoc. | Ū | 89.91 | 82.25 | 78.64 | 81.34 | 85.29 |
|  | Qselect | 82.89 | 69.74 | 70.74 | 71.23 | 76.20 |
|  | Pr(Tox) | 0.48 | 0.29 | 0.32 | 0.32 | 0.32 |
|  | Pr(Eff) | 0.37 | 0.24 | 0.22 | 0.25 | 0.28 |
|  | % completed trials | 90.8 | 51.4 | 73.6 | 97.5 | 57.3 |

5 Discussion

Practical application of DTM2 requires substantial input from the physicians, including specification of outcomes, doses, prior values, and numerical utilities. Such key input from the physicians, and preliminary validation by computer simulation, have provided a practical basis for use of model-based outcome-adaptive methods in many actual phase I-II dose-finding trials (cf. de Lima, Champlin, Thall, et al., 2008). In the design process, computer simulation also may be used to conduct sensitivity analyses in the prior or the numerical utilities so that the physicians may adjust their values. For trial conduct, a database and data entry procedure must be established, with the database updated in real time as patients are treated and evaluated in each cycle. The actual data used by DTM2 are simple, however, consisting of (d1, Y1, Z1, d2, Y2, Z2). Accounting for two cycles rather than only one is not a substantial complication compared to usual adaptive trials, since all clinical protocols contain rules for adaptive within-patient decision making.

DTM2 provides the 2-cycle regime dselect for phase III evaluation, rather than only a selected d1 or 2-cycle pair (d1, d2). This is an important improvement, since it more accurately reflects actual clinical practice and is likely to improve the chance of success in phase III. This is because phase I methods based on toxicity alone are likely to fail to identify higher doses having higher efficacy and acceptable toxicity, and thus are more likely to select an ineffective dose for phase II evaluation. Moreover, our comparisons to the 2-cycle extension ETEOR of the phase I-II design of Yin et al. (2006), which also uses efficacy, show the advantage of optimizing a utility-based Q-function for decision making.

Several important practical extensions should be noted. While DTM2 uses recent patients’ partial data if only their cycle 1 outcomes have been evaluated, this may be refined by using event time data to enhance inferences. A useful extension would be to use toxicity or efficacy follow-up times from patients who have been treated but not fully evaluated, employing predictive probabilities similarly to Bekele et al. (2008), or taking the approach of Zhao et al. (2011). Bivariate ordinal (Yc, Zc) outcomes with more than two levels may be accommodated by extending the model to include more cutoffs in the latent variables, and eliciting corresponding utilities, as in Thall and Nguyen (2012). Extension to accommodate this case is complex, however, since there would be many more elementary outcomes and thus many more model parameters. For example, if Y and Z each have four levels, then for two cycles there would be 16 elementary outcomes, rather than 4, (ξ, η) would be 8-dimensional, and Σξ,η would be an 8×8 matrix. Numerous ad hoc adaptive methods for choosing a patient's doses in cycles after the first actually are used in clinical practice, and many actual regimes involve more than two cycles. While in theory the decision criterion can be generalized to accommodate this in a straightforward manner, the dimensions of the outcomes and decisions become much larger. This strongly suggests that, to deal with the general multi-cycle case in a practical way, a more parsimonious model will be needed.

Supplementary Material

Supplementary Materials.

Acknowledgments

Yuan Ji's research is supported by NIH R01/NCI CA132897. Peter Thall's research was supported by NIH/NCI R01 CA 83932 and NIH/NCI 5 P50 CA140388. Peter Müller's research was supported by NIH/NCI R01 CA157458-01A1.

Footnotes

Supplementary Materials

Supplementary materials are available under the Paper Information link at the JASA website.

References

  1. Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 1993;88:669–679.
  2. Almirall D, Ten Have T, Murphy SA. Structural nested mean models for assessing time-varying effect moderation. Biometrics. 2010;66:131–139. doi: 10.1111/j.1541-0420.2009.01238.x.
  3. Ashford JR, Sowden RR. Multi-variate probit analysis. Biometrics. 1970;26:535–546.
  4. Azriel D, Mandel M, Rinott Y. The treatment versus experiment dilemma in dose-finding studies. J. Statistical Planning and Inference. 2011;141:2759–68.
  5. Bartroff J, Lai TL. Approximate dynamic programming and its applications to the design of phase I cancer trials. Statistical Science. 2010;25:255–257.
  6. Bekele BN, Thall PF. Dose-finding based on multiple toxicities in a soft tissue sarcoma trial. J American Statistical Assoc. 2004;99:26–35.
  7. Bekele BN, Ji Y, Shen Y, Thall PF. Monitoring late onset toxicities in phase I trials using predicted risks. Biostatistics. 2008;9:442–457. doi: 10.1093/biostatistics/kxm044.
  8. Bellman RE. Dynamic Programming. Princeton University Press; 1957.
  9. Braun TM, Kang S, Taylor JMG. A Phase I/II trial design when response is unobserved in subjects with dose-limiting toxicity. Statistical Methods in Medical Research. 2012. doi: 10.1177/0962280212464541. Published online 1 November 2012.
  10. Braun TM, Yuan Z, Thall PF. Determining a maximum tolerated schedule of a cytotoxic agent. Biometrics. 2005;61:335–343. doi: 10.1111/j.1541-0420.2005.00312.x.
  11. Braun TM, Thall PF, Nguyen H, de Lima M. Simultaneously optimizing dose and schedule of a new cytotoxic agent. Clinical Trials. 2007;4:113–124. doi: 10.1177/1740774507076934.
  12. Cheung Y-K, Chappell R. Sequential designs for phase I clinical trials with late-onset toxicities. Biometrics. 2000;56:1177–1182. doi: 10.1111/j.0006-341x.2000.01177.x.
  13. Cheung Y-K. Dose Finding by the Continual Reassessment Method. Chapman & Hall/CRC Biostatistics series; 2011.
  14. Chevret SC. Statistical methods for Dose Finding Experiments. John Wiley and Sons; Chichester, UK: 2006.
  15. Chib S, Greenberg E. Analysis of multivariate probit models. Biometrika. 1998;85:347–361.
  16. Collins LM, Murphy SA, Nair VN, Strecher VJ. A strategy for optimizing and evaluating behavioral interventions. Ann Behav Med. 2005;30(1):65–73. doi: 10.1207/s15324796abm3001_8.
  17. de Lima M, Champlin RE, Thall PF, Wang X, Cook JD, Martin TG, McCormick G, Qazilbash M, Kebriaei P, Couriel D, Shpall EJ, Khouri I, Anderlini P, Hosing C, Chan KE, Patah PA, Caldera Z, Jabbour E, Giralt S. Phase I/II study of gemtuzumab ozogamicin added to fludarabine, melphalan and allogeneic hematopoietic stem cell transplantation for high-risk CD33 positive myeloid leukemias and myelodysplastic syndrome. Leukemia. 2008;22:258–264. doi: 10.1038/sj.leu.2405014.
  18. Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 1990;85:398–409.
  19. Hernan M, Brumback B, Robins J. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11:561–570. doi: 10.1097/00001648-200009000-00012.
  20. Lavori PW, Dawson R. Dynamic treatment regimes: Practical design considerations. Statistics in Medicine. 2001;20:1487–98. doi: 10.1191/1740774s04cn002oa.
  21. Li Y, Bekele BN, Ji Y, Cook JD. Dose-schedule finding in phase I/II clinical trials using a Bayesian isotonic transformation. Statistics In Medicine. 2008;27:4895–4913. doi: 10.1002/sim.3329.
  22. Lunceford J, Davidian M, Tsiatis AA. Estimation of the survival distribution of treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2002;58:48–57. doi: 10.1111/j.0006-341x.2002.00048.x.
  23. Moodie EEM, Richardson TS, Stephens DA. Demystifying optimal dynamic treatment regimes. Biometrics. 2007;63:447–455. doi: 10.1111/j.1541-0420.2006.00686.x.
  24. Morita S, Thall PF, Mueller P. Evaluating the impact of prior assumptions in Bayesian biostatistics. Statistics in Biosciences. 2010;2:1–17. doi: 10.1007/s12561-010-9018-x.
  25. Murphy S. Optimal dynamic treatment regimes (with discussion). Journal of the Royal Statistical Society. Series B. 2003;65:331–366.
  26. Murphy S, Bingham D. Screening experiments for developing dynamic treatment regimes. J American Statistical Assoc. 2009;104:391–408. doi: 10.1198/jasa.2009.0119.
  27. Murphy SA, Collins LM, Rush AJ. Customizing treatment to the patient: Adaptive treatment strategies. Drug and Alcohol Dependence. 2007;88:S1–S3. doi: 10.1016/j.drugalcdep.2007.02.001.
  28. Murphy SA, Lynch KG, Oslin D, Mckay JR, TenHave T. Developing adaptive treatment strategies in substance abuse research. Drug and Alcohol Dependence. 2007;88s:s24–s30. doi: 10.1016/j.drugalcdep.2006.09.008.
  29. Murphy SA, van der Laan MJ, Robins JM. Marginal mean models for dynamic regimes. J American Statistical Assoc. 2001;96:1410–1423. doi: 10.1198/016214501753382327.
  30. O'Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics. 1990;46:33–48.
  31. Robert CP, Casella G. Monte Carlo Statistical Methods. Springer; New York: 1999.
  32. Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy survivor effect. Mathematical Modeling. 1986;7:1393–1512.
  33. Robins JM. Analytic methods for estimating HIV treatment and cofactor effects. In: Ostrow DG, Kessler R, editors. Methodological issues of AIDS Mental Health Research. Plenum Publishing; New York: 1993. pp. 213–290.
  34. Robins JM. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent Variable Modeling and Applications to Causality. Lecture Notes in Statistics 120. Springer Verlag; NY: 1997. pp. 69–117.
  35. Robins JM. Marginal Structural Models. Proceedings of the American Statistical Association. Section on Bayesian Statistics. 1998:1–10.
  36. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–60. doi: 10.1097/00001648-200009000-00011.
  37. Rush AJ, Trivedi M, Fava M. Depression IV: STAR*D treatment trial for depression. American Journal of Psychiatry. 2003;160(2):237. doi: 10.1176/appi.ajp.160.2.237.
  38. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; Cambridge, MA: 1998.
  39. Thall PF, Millikan R, Sung H-G. Evaluating multiple treatment courses in clinical trials. Statistics in Medicine. 2000;19:1011–1028. doi: 10.1002/(sici)1097-0258(20000430)19:8<1011::aid-sim414>3.0.co;2-m.
  40. Thall PF, Nguyen HQ. Adaptive randomization to improve utility-based dose-finding with bivariate ordinal outcomes. J Biopharmaceutical Statistics. 2012;22:785–801. doi: 10.1080/10543406.2012.676586.
  41. Thall PF, Nguyen HQ, Estey EH. Patient-specific dose-finding based on bivariate outcomes and covariates. Biometrics. 2008;64:1126–1136. doi: 10.1111/j.1541-0420.2008.01009.x.
  42. Thall PF, Sung H-G, Estey EH. Selecting therapeutic strategies based on efficacy and death in multi-course clinical trials. J American Statistical Assoc. 2002;97:29–39.
  43. Thall PF, Wooten LH, Logothetis CJ, Millikan R, Tannir NM. Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring. Statistics in Medicine. 2007;26:4687–4702. doi: 10.1002/sim.2894.
  44. Tokic M. Adaptive ε-greedy exploration in reinforcement learning based on value differences. In: Advances in Artificial Intelligence. Springer Verlag; Heidelberg, Germany: 2010. pp. 203–210.
  45. Tierney L. Markov chains for exploring posterior distributions (with Discussion). Annals of Statistics. 1994;22:1701–1762.
  46. Wahed AS, Tsiatis AA. Optimal estimator for the survival distribution and related quantities for treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2004;60:124–133. doi: 10.1111/j.0006-341X.2004.00160.x.
  47. Wang L, Rotnitzky A, Lin X, Millikan R, Thall PF. Evaluation of viable dynamic treatment regimes in a sequentially randomized trial of advanced prostate cancer. J American Statistical Assoc. 2012;107:493–508 (with discussion, pages 509–517; rejoinder, pages 518–520). doi: 10.1080/01621459.2011.641416.
  48. Watkins CJCH. Learning from delayed rewards. PhD thesis. Cambridge University; 1989.
  49. Yin G. Clinical Trial Design: Bayesian and Frequentist Adaptive Methods. John Wiley & Sons; 2012.
  50. Yin G, Li Y, Ji Y. Bayesian dose-finding in phase I/II clinical trials using toxicity and efficacy odds ratios. Biometrics. 2006;62:777–789. doi: 10.1111/j.1541-0420.2006.00534.x.
  51. Zhang J, Braun TM. A phase I Bayesian adaptive design to simultaneously optimize dose and schedule assignments both between and within patients. J. of the American Statistical Association. 2013, to appear. doi: 10.1080/01621459.2013.806927.
  52. Zhang W, Sargent DJ, Mandrekar S. An adaptive dose-finding design incorporating both toxicity and efficacy. Statistics in Medicine. 2005;25:2365–2383. doi: 10.1002/sim.2325.
  53. Zhao Y, Zheng D, Socinski MA, Kosorok MR. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. Biometrics. 2011;67:1422–1433. doi: 10.1111/j.1541-0420.2011.01572.x.
  54. Zhang J, Braun TM. A phase I Bayesian adaptive design to simultaneously optimize dose and schedule assignments both between and within patients. J American Statistical Assoc. 2013;108:892–901. doi: 10.1080/01621459.2013.806927.
  55. Zohar S, Chevret S. Recent developments in adaptive designs for phase I/II dose-finding studies. Journal of Biopharmaceutical Statistics. 2007;17:1071–1083. doi: 10.1080/10543400701645116.
