Bayesian Dose-Finding in Two Treatment Cycles Based on the Joint Utility of Efficacy and Toxicity

Juhee Lee; Peter F Thall; Yuan Ji; Peter Müller

doi:10.1080/01621459.2014.926815

Author manuscript; available in PMC: 2016 Jun 1.

Published in final edited form as: J Am Stat Assoc. 2014 Jun 27;110(510):711–722. doi: 10.1080/01621459.2014.926815


PMCID: PMC4562700 NIHMSID: NIHMS610172 PMID: 26366026

Abstract

A phase I/II clinical trial design is proposed for adaptively and dynamically optimizing each patient's dose in each of two cycles of therapy based on the joint binary efficacy and toxicity outcomes in each cycle. A dose-outcome model is assumed that includes a Bayesian hierarchical latent variable structure to induce association among the outcomes and also facilitate posterior computation. Doses are chosen in each cycle based on posteriors of a model-based objective function, similar to a reinforcement learning or Q-learning function, defined in terms of numerical utilities of the joint outcomes in each cycle. For each patient, the procedure outputs a sequence of two actions, one for each cycle, with each action being the decision to either treat the patient at a chosen dose or not to treat. The cycle 2 action depends on the individual patient's cycle 1 dose and outcomes. In addition, decisions are based on posterior inference using other patients’ data, and therefore the proposed method is adaptive both within and between patients. A simulation study of the method is presented, including comparison to two-cycle extensions of the conventional 3+3 algorithm, continual reassessment method, and a Bayesian model-based design, and evaluation of robustness.

Keywords: Adaptive Design, Bayesian Design, Dynamic Treatment Regime, Phase I-II Clinical Trial, Q-Learning, Latent Probit Model

1 Introduction

Medical treatment often involves multiple cycles of therapy. Physicians routinely choose a patient's treatment in each cycle adaptively based on the patient's history of treatments and clinical outcomes. In such settings, a patient's therapy is not one treatment, but rather a sequence of treatments, each chosen using an adaptive algorithm of the general form “observe → treat → observe → treat → ...” etc. This paradigm is known as a dynamic treatment regime (DTR) (Murphy, et al., 2001; Lavori and Dawson, 2001; Murphy, 2003; Moodie, et al., 2007), multi-stage treatment strategy (Thall, Millikan and Sung, 2000; Thall, Sung and Estey, 2002) or treatment policy (Lunceford, Davidian and Tsiatis, 2002; Wahed and Tsiatis, 2004). In oncology, treatment in each cycle may be a chemical or biological agent, radiation therapy, or some combination of these. DTRs also are used for chronic diseases, including behavioral disorders (Collins, et al. 2005; Almirall, et al., 2010) and drug or alcohol addiction (Murphy, et al., 2007). Unfortunately, most clinical trial designs ignore the actual DTRs being used, and instead evaluate the treatments given initially as if patient outcome were due to them alone, rather than the entire DTR.

There is an extensive literature on adaptive dose-finding designs for phase I and phase I/II clinical trials (cf. Chevret, 2006; Yin, 2012). In actual conduct of such trials, the attending physician uses a DTR to make multi-cycle decisions for each patient. Depending on the patient's history of doses and outcomes, the dose given in each cycle may be above, below, or the same as the dose given previously, or therapy may be terminated due to excessive toxicity or poor efficacy. Since typical early-phase trial designs ignore such within-patient multi-cycle decision making, the “optimal” dose chosen by such a design actually pertains only to the first cycle of therapy.

While statistical methods for DTRs have seen limited application in actual clinical trials (Rush, et al., 2003; Thall, et al., 2007; Wang, et al., 2012), recently there has been extensive research to develop or optimize DTRs in medicine, including semiparametric methods (Wahed and Tsiatis, 2006), reinforcement learning (Zhao, et al., 2011), and sequential multiple assignment randomized trials (Murphy and Bingham, 2009). The aim is to better reflect the intrinsically multi-stage, adaptive structure of what physicians actually do, in both trial design and analysis of observational data. This methodology had its origins in research to define and estimate causal parameters in complex longitudinal data, pioneered by Robins (1986, 1993, 1997, 1998), and applied to the analysis of AIDS data (Hernan, Brumback, and Robins, 2000; Robins, Hernan and Brumback, 2000).

The problem of optimizing each patient's doses given in multiple cycles based on efficacy and toxicity in phase I/II trials has not been addressed formally. Phase I/II designs typically optimize the initial dose using between-patient adaptive rules. A review is given by Zohar and Chevret (2007). For phase I trials involving multiple cycles of therapy, Braun, Yuan, and Thall (2005) proposed a Bayesian design with between-patient adaptive rules based on time-to-toxicity to optimize the number of cycles (“schedule”) given a fixed dose. Braun et al. (2007) extended this to allow per-administration dose to vary, and jointly optimized dose and schedule, using a criterion similar to that of the time-to-event continual reassessment method (TiTE CRM, Cheung and Chappell, 2000). Li, et al. (2008) proposed an approach to optimizing dose and schedule for two nested schedules and bivariate binary outcomes, using an isotonic transformation to obtain matrix ordered toxicity probabilities with order-restricted inferences. While few of these methods include within-patient adaptive rules applied after the first cycle, the phase I design proposed by Zhang and Braun (2013) to optimize dose and schedule accounts for multiple within-patient administrations.

Here, we address the problem of adaptively optimizing each patient's dose in each of two cycles of therapy in a phase I/II trial based on binary efficacy and toxicity. This is the simplest case of the general multi-cycle phase I/II trial design problem, which may be formulated with ordinal or time-to-event outcomes and an arbitrary number of cycles. We address the simpler two-cycle problem because it still is much more complicated than the one-cycle case. Our goals are to provide a practical trial design and establish a basis for subsequently developing methods for more complex settings. We employ a model-based Bayesian objective function, defined in terms of (efficacy, toxicity) utilities, structurally similar to reinforcement learning (Sutton and Barto, 1998) or Q-learning functions (Watkins, 1989). Our method chooses a dose in each cycle to maximize the posterior mean of the objective function, applying a modified recursive Bellman equation (1957) that assumes, for the decision in cycle 1, that one will behave optimally in cycle 2. At the end of the trial, the method provides an optimal two-stage regime consisting of an optimal cycle 1 dose, and an optimal function of the patient's cycle 1 dose and outcomes that either chooses a cycle 2 dose or says to not treat the patient in cycle 2. This is very different from simply choosing two “optimal” doses, one for each cycle, with the “optimal” cycle 2 dose ignoring each patient's cycle 1 data. Because all decisions are based on posterior quantities computed using all patients’ data, the method is adaptive both within and between patients.

Section 2 describes the proposed decision-theoretic two-cycle method, DTM2, including the Bayesian probability model, an algorithm for prior calibration, and posterior computation. Utility-based decision criteria are presented in Section 3. A simulation study is summarized in Section 4. We close with a discussion in Section 5.

2 Dose-Outcome Model

The model used by DTM2 exploits the idea underlying the multivariate probit model, introduced by Ashford and Sowden (1970). A vector of unobserved, correlated latent multivariate normal variables is defined to induce association among a vector of observed binary variables, by defining each observed variable as the indicator that its corresponding latent variable is greater than 0. The DTM2 model is an elaboration of a multivariate probit model that includes hierarchical structures. It provides a computationally feasible basis for the task at hand. We will exploit the MCMC methods for computing posteriors for latent variable models provided by Albert and Chib (1993) and developed further by Chib and Greenberg (1998) for posterior computation via Gibbs sampling.

Let n_t denote the number of patients accrued and given at least one cycle of treatment up to trial (calendar) time t, and index patients by i = 1, ..., n_t. Our dose-outcome model does not depend on numerical dose values, and we identify the doses under consideration by the indexes 1, ..., m. For treatment cycle c = 1, 2, denote the i^th patient's dose by d_i,c, outcome indicators Y_i,c ∈ {0, 1} for toxicity and Z_i,c ∈ {0, 1} for efficacy, and the 2-cycle vectors d_i = (d_i,₁, d_i,₂), Y_i = (Y_i,₁, Y_i,₂), and Z_i = (Z_i,₁, Z_i,₂). Let $X_{t} = \{(Y_{i}, Z_{i}, d_{i}) : i = 1, \dots, n_{t}\}$ denote the observed data from all patients at t. Although the doses d_i are actions rather than parameters or random outcomes, throughout the manuscript we will abuse probability notation slightly by including them to the right of the conditioning bar. Since actual clinical decision rules must allow a given patient's therapy to be terminated, e.g. if the patient is cured, has progressive disease, or unacceptable toxicity (cf. Wang, et al., 2012), here possible actions in cycle c may be either a dose, d_i,c, or the decision to give no treatment, which we index by 0. We denote the possible actions in either cycle by $D = \{0, 1, \dots, m\}$.

We construct a joint distribution for [Y_i, Z_i | d_i] by defining these binary outcomes in terms of four real-valued latent variables, ξ_i = (ξ_i,₁, ξ_i,₂) for Y_i and η_i = (η_i,₁, η_i,₂) for Z_i, with (ξ_i, η_i) following a multivariate normal distribution having means that vary with d_i. Denoting the indicator of event A by I(A), we assume Y_i,c = I(ξ_i,c > 0) and Z_i,c = I(η_i,c > 0), so the distribution of [Y_i, Z_i | d_i] is induced by that of [ξ_i, η_i | d_i]. The structure of our hierarchical model for two cycles is similar to the non-hierarchical model for multiple toxicities in one cycle of therapy used by Bekele and Thall (2004). To construct the model, we first define a conditional likelihood for the cycle-specific latent variable pairs [ξ_i,c, η_i,c | d_i,c], for c = 1, 2, by using patient-specific random effects (u_i, v_i) that characterize dependence among the outcomes between and within cycles. Denote the univariate normal distribution with mean μ and variance σ² by N(μ, σ²), with pdf ϕ(· | μ, σ²).

We begin the construction by assuming the following Level 1 and Level 2 priors:

Level 1 Priors on the Latent Variables. For patient i in cycle c given dose d_i,c = d,

ξ_{i, c} ∣ u_{i}, {\overset{‒}{ξ}}_{c, d}, σ_{ξ}^{2} \sim N ({\overset{‒}{ξ}}_{c, d} + u_{i}, σ_{ξ}^{2}) and η_{i, c} ∣ v_{i}, {\overset{‒}{η}}_{c, d}, σ_{η}^{2} \sim N ({\overset{‒}{η}}_{c, d} + v_{i}, σ_{η}^{2}),

(1)

with ξ_i and η_i conditionally independent given (u_i, v_i) and fixed $σ_{ξ}^{2}$ and $σ_{η}^{2}$. Level 2 priors of the patient effects, (u_i, v_i), and mean cycle-specific dose effects, $({\overset{‒}{ξ}}_{c, d}, {\overset{‒}{η}}_{c, d})$, are as follows:

Level 2 Priors on (u_i, v_i). For patients i = 1, ..., n,

u_{i}, v_{i} ∣ ρ, τ^{2} \overset{i i d}{\sim} {MVN}_{2} (0_{2}, Σ_{u, v}),

(2)

where MVN_k denotes a k-variate normal distribution, 0₂ = (0, 0) and Σ_u,v is the 2×2 matrix with all diagonal elements τ² and all off-diagonal elements ρτ². The hyperparameters, ρ ∈ (−1, 1) and τ², are fixed. This Level 2 prior induces association, parameterized by (ρ, τ²), among (ξ_i,₁, η_i,₁, ξ_i,₂, η_i,₂) via the latent variable model (1), and thus among the corresponding toxicity and efficacy outcomes, (Y_i,₁, Z_i,₁, Y_i,₂, Z_i,₂).

Level 2 Priors on $({\overset{‒}{ξ}}_{c, d}, {\overset{‒}{η}}_{c, d})$ . Let ${\overset{‒}{ξ}}_{c} = ({\overset{‒}{ξ}}_{c, 1}, \dots, {\overset{‒}{ξ}}_{c, m})$ and ${\overset{‒}{η}}_{c} = ({\overset{‒}{η}}_{c, 1}, \dots, {\overset{‒}{η}}_{c, m})$ . Denote by ${\overset{‒}{ξ}}_{c, - d}$ the vector ${\overset{‒}{ξ}}_{c}$ with ${\overset{‒}{ξ}}_{c, d}$ deleted, and let ${\overset{‒}{η}}_{c, - d}$ denote ${\overset{‒}{η}}_{c}$ with ${\overset{‒}{η}}_{c, d}$ deleted. We assume

\begin{matrix} p ({\overset{‒}{ξ}}_{c, d} ∣ {\overset{‒}{ξ}}_{c, - d}) & \propto ϕ ({\overset{‒}{ξ}}_{c, d} ∣ ξ_{c, 0}, σ_{ξ_{c, 0}}^{2}) 1 ({\overset{‒}{ξ}}_{c, d - 1} < {\overset{‒}{ξ}}_{c, d} < {\overset{‒}{ξ}}_{c, d + 1}) \\ p ({\overset{‒}{η}}_{c, d} ∣ {\overset{‒}{η}}_{c, - d}) & \propto ϕ ({\overset{‒}{η}}_{c, d} ∣ η_{c, 0}, σ_{η_{c, 0}}^{2}) 1 ({\overset{‒}{η}}_{c, d - 1} < {\overset{‒}{η}}_{c, d} < {\overset{‒}{η}}_{c, d + 1}) . \end{matrix}

(3)

The order constraints ensure that ξ_i,c and η_i,c increase stochastically in dose, hence the per-cycle probabilities of toxicity and efficacy both increase with dose. If this assumption is not appropriate, for example in trials of biologic agents, these constraints may be dropped.

Collecting terms from (1), (2), and (3), the 12 fixed parameters that determine all of the Level 1 and Level 2 priors are $\tilde{θ} = (ξ_{0}, η_{0}, σ_{ξ_{0}}^{2}, σ_{η_{0}}^{2}, σ_{ξ}^{2}, σ_{η}^{2}, τ^{2}, ρ)$ where ξ₀ = (ξ_1,0, ξ_2,0), η₀ = (η_1,0, η_2,0), $σ_{ξ_{0}}^{2} = (σ_{ξ_{1, 0}}^{2}, σ_{ξ_{2, 0}}^{2})$ and $σ_{η_{0}}^{2} = (σ_{η_{1, 0}}^{2}, σ_{η_{2, 0}}^{2})$. Denote $\overset{‒}{ξ} = ({\overset{‒}{ξ}}_{1}, {\overset{‒}{ξ}}_{2})$, $\overset{‒}{η} = ({\overset{‒}{η}}_{1}, {\overset{‒}{η}}_{2})$, $μ_{d_{i}} = ({\overset{‒}{ξ}}_{1, d_{i, 1}}, {\overset{‒}{ξ}}_{2, d_{i, 2}}, {\overset{‒}{η}}_{1, d_{i, 1}}, {\overset{‒}{η}}_{2, d_{i, 2}})$, and the covariance matrix

Σ_{ξ, η} = [\begin{matrix} σ_{ξ}^{2} + τ^{2} & τ^{2} & ρ τ^{2} & ρ τ^{2} \\ τ^{2} & σ_{ξ}^{2} + τ^{2} & ρ τ^{2} & ρ τ^{2} \\ ρ τ^{2} & ρ τ^{2} & σ_{η}^{2} + τ^{2} & τ^{2} \\ ρ τ^{2} & ρ τ^{2} & τ^{2} & σ_{η}^{2} + τ^{2} \end{matrix}] .

The joint distribution of $[ξ_{i}, η_{i} ∣ d_{i}, \overset{‒}{ξ}, \overset{‒}{η}, \tilde{θ}]$ is computed by integrating over (u_i, v_i), yielding

ξ_{i}, η_{i} ∣ d_{i}, \overset{‒}{ξ}, \overset{‒}{η}, \tilde{θ} \overset{i i d}{\sim} {MVN}_{4} (μ_{d_{i}}, Σ_{ξ, η}) .

(4)

The mean vector μ_d is a function of the dose levels, and does not depend on numerical dose values. The hyperparameters, τ² and ρ, induce associations between cycle 1 and cycle 2 and between efficacy outcomes and toxicity outcomes. For example, if −1 < ρ < 0 (0 < ρ < 1), this model implies that efficacy and toxicity are negatively (positively) associated, that is, higher (lower) toxicity is associated with lower efficacy.
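The hierarchical construction in (1), (2), and (4) can be illustrated by simulating one patient's outcomes directly from the latent variables. The following is a minimal sketch, not the authors' code: the function name `simulate_patient` and all numerical values (dose effects, variances, τ, ρ) are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def simulate_patient(xi_bar, eta_bar, sigma_xi, sigma_eta, tau, rho, rng):
    """Draw (Y1, Y2, Z1, Z2) for one patient from the latent probit model.

    xi_bar, eta_bar: length-2 sequences holding the mean latent toxicity and
    efficacy effects at the doses given in cycles 1 and 2.
    """
    # Level 2: patient effects (u, v) with variances tau^2 and covariance rho * tau^2.
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u = tau * e1
    v = tau * (rho * e1 + (1.0 - rho ** 2) ** 0.5 * e2)
    # Level 1: latent variables around (mean + patient effect), thresholded at 0.
    y = [int(rng.gauss(xi_bar[c] + u, sigma_xi) > 0) for c in range(2)]
    z = [int(rng.gauss(eta_bar[c] + v, sigma_eta) > 0) for c in range(2)]
    return y[0], y[1], z[0], z[1]


rng = random.Random(1)
xi_bar, eta_bar = (-0.5, -0.2), (0.3, 0.6)  # hypothetical dose effects
sigma_xi = sigma_eta = 1.0                  # hypothetical fixed standard deviations
tau, rho = 0.8, -0.3                        # hypothetical hyperparameters
draws = [simulate_patient(xi_bar, eta_bar, sigma_xi, sigma_eta, tau, rho, rng)
         for _ in range(20000)]
tox_rate_cycle1 = sum(d[0] for d in draws) / len(draws)
# Marginally xi_1 ~ N(xi_bar_1, sigma_xi^2 + tau^2), which gives Pr(Y1 = 1):
tox_prob_cycle1 = NormalDist().cdf(xi_bar[0] / (sigma_xi ** 2 + tau ** 2) ** 0.5)
```

The simulated cycle 1 toxicity rate should be close to the marginal probit probability implied by the diagonal of Σ_{ξ, η}, which is one quick check that the random effects have been integrated out correctly.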

Denote $θ = (\overset{‒}{ξ}, \overset{‒}{η})$ . Integrating over (u_i, v_i) and suppressing $\tilde{θ}$ and patient index i, the joint likelihood for the observables of a patient is given by

\begin{matrix} p (y, z ∣ d, θ) & = \Pr (Y_{1} = y_{1}, Y_{2} = y_{2}, Z_{1} = z_{1}, Z_{2} = z_{2} ∣ d, θ) \\ & = \Pr (γ_{1, y_{1}} \leq ξ_{1} < γ_{1, y_{1} + 1}, γ_{1, y_{2}} \leq ξ_{2} < γ_{1, y_{2} + 1}, γ_{2, z_{1}} \leq η_{1} < γ_{2, z_{1} + 1}, γ_{2, z_{2}} \leq η_{2} < γ_{2, z_{2} + 1} ∣ d, θ) \\ & = \int_{γ_{1, y_{1}}}^{γ_{1, y_{1} + 1}} \int_{γ_{1, y_{2}}}^{γ_{1, y_{2} + 1}} \int_{γ_{2, z_{1}}}^{γ_{2, z_{1} + 1}} \int_{γ_{2, z_{2}}}^{γ_{2, z_{2} + 1}} ϕ (ξ, η ∣ μ_{d}, Σ_{ξ, η}) d η_{2} d η_{1} d ξ_{2} d ξ_{1}, \end{matrix}

where the cutoff vectors (γ₁₀, γ₁₁, γ₁₂) for Y_c and (γ₂₀, γ₂₁, γ₂₂) for Z_c both are (−∞, 0, ∞), for c = 1, 2. The conditional distribution of the cycle 2 outcomes (Y₂, Z₂) given the cycle 1 outcomes (Y₁ = y₁, Z₁ = z₁) is

p (y_{2}, z_{2} ∣ y_{1}, z_{1}, d, θ) = \Pr (Y_{2} = y_{2}, Z_{2} = z_{2} ∣ Y_{1} = y_{1}, Z_{1} = z_{1}, d, θ) = \Pr (γ_{1, y_{2}} \leq ξ_{2} < γ_{1, y_{2} + 1}, γ_{2, z_{2}} \leq η_{2} < γ_{2, z_{2} + 1} ∣ γ_{1, y_{1}} \leq ξ_{1} < γ_{1, y_{1} + 1}, γ_{2, z_{1}} \leq η_{1} < γ_{2, z_{1} + 1}, d, θ) = \frac{p (y, z ∣ d, θ)}{p (y_{1}, z_{1} ∣ d_{1}, θ)},

(5)

where the cycle 1 bivariate marginal is computed as the double integral

p (y_{1}, z_{1} ∣ d_{1}, θ) = \int_{γ_{1 y_{1}}}^{γ_{1, y_{1} + 1}} \int_{γ_{2 z_{1}}}^{γ_{2, z_{1} + 1}} ϕ ([ξ_{1}, η_{1}] ∣ μ_{d_{1}}^{1}, Σ_{ξ, η}^{1}) d η_{1} d ξ_{1}

(6)

with

μ_{d_{1}}^{1} = [\begin{matrix} {\overset{‒}{ξ}}_{1, d_{1}} \\ {\overset{‒}{η}}_{1, d_{1}} \end{matrix}] and Σ_{ξ, η}^{1} = [\begin{matrix} σ_{ξ}^{2} + τ^{2} & ρ τ^{2} \\ ρ τ^{2} & σ_{η}^{2} + τ^{2} \end{matrix}] .
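The cycle 1 marginal in (6) is a bivariate normal rectangle probability, and for binary outcomes it reduces to orthant probabilities. As a rough illustration only, the sketch below evaluates the four cell probabilities by one-dimensional midpoint integration over the standardized toxicity latent variable; the function name `cycle1_joint_probs`, the grid size, and the parameter values are assumptions made for this example and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for its pdf and cdf


def cycle1_joint_probs(xi_bar, eta_bar, sigma_xi2, sigma_eta2, tau2, rho, n=4000):
    """Return {(y1, z1): probability} under the bivariate normal in (6)."""
    s1 = sqrt(sigma_xi2 + tau2)   # sd of xi_1
    s2 = sqrt(sigma_eta2 + tau2)  # sd of eta_1
    r = rho * tau2 / (s1 * s2)    # correlation of (xi_1, eta_1)
    # xi_1 > 0 corresponds to t > -xi_bar / s1 on the standardized scale.
    lo, hi = max(-xi_bar / s1, -10.0), 10.0
    h = max(hi - lo, 0.0) / n
    p11 = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        # Pr(eta_1 > 0 | standardized xi_1 = t) from the conditional normal.
        cond = STD.cdf((eta_bar + r * s2 * t) / (s2 * sqrt(1.0 - r * r)))
        p11 += STD.pdf(t) * cond * h
    p_tox = STD.cdf(xi_bar / s1)
    p_eff = STD.cdf(eta_bar / s2)
    return {(0, 0): 1.0 - p_tox - p_eff + p11, (0, 1): p_eff - p11,
            (1, 0): p_tox - p11, (1, 1): p11}


# Hypothetical values: negative rho lowers Pr(Y1 = 1, Z1 = 1) relative to rho = 0.
probs = cycle1_joint_probs(-0.5, 0.3, 1.0, 1.0, 0.64, -0.3)
indep = cycle1_joint_probs(-0.5, 0.3, 1.0, 1.0, 0.64, 0.0)
```

With ρ = 0 the joint probability factors into the product of the two marginal probit probabilities, which gives a simple check on the numerical integration.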

3 Decision Criteria

3.1 Adaptive Dose Selection

To define our decision rules, we distinguish between doses and actions. The action in cycle 1 either chooses a dose from the set {1, ..., m} of doses under consideration or makes the decision to not give the patient any treatment. Recall that we denote this decision by 0 for convenience, and we denote the possible actions by $D = \{0, 1, \dots, m\}$. If the optimal cycle 1 action is d₁ = 0 at any point in the trial, then the study is terminated. Otherwise, the patient receives d₁ for cycle 1 and $d_{2} \in D$ for cycle 2, where d₂ is a function of the cycle 1 dose and outcomes, (d₁, Y₁, Z₁), and the current data $X$ from all patients. For example, if the cycle 1 dose d₁ produced toxicity, Y₁ = 1, then a possible cycle 2 action is $d_{2} (d_{1}, 1, 1, X) = d_{1} - 1$ if Z₁ = 1, and $d_{2} (d_{1}, 1, 0, X) = 0$ if Z₁ = 0. Similarly, if d₁ = 1, the lowest dose level, and Y₁ = 1 was observed, then it may be that $d_{2} (d_{1}, 1, Z_{1}, X) = 0$ regardless of whether Z₁ = 0 or 1. A two-cycle regime is far more general than a dose pair chosen from $D \times D$, and a regime for which d₂ ignores the patient's cycle 1 dose and outcomes, (d₁, Y₁, Z₁), is unlikely to be optimal. In the DTR literature, (d₁, Y₁, Z₁) would be called “tailoring variables.” Optimizing d = (d₁, d₂) is the focus of our design.

3.2 Objective Function

We construct an objective function by using the basic ideas in Bellman (1957), starting in cycle 2 and working backwards. Our method relies on per-cycle utilities U(y, z) that quantify the desirability of outcome (Y_c, Z_c) = (y, z) in cycle c = 1 or 2. Depending on the level of marginalization and aggregation over cycles and patients, many variations of the objective function defined below may be obtained. We will generically refer to all of these as “utility” or “objective function” when we want to highlight that a particular expected utility is a function of known quantities and the action only, and thus can be used to select the optimal action. For convenience, one may fix U(0, 1) = 100 and U(1, 0) = 0, which are the respective utilities for the best and worst possible outcomes, and elicit the intermediate values U(0, 0) and U(1, 1) from the physicians planning the trial, although any function with U(1, 0) < U(1, 1), U(0, 0) < U(0, 1) may be used. In our simulations, we will use the numerical utilities U(1, 0) = 0, U(0, 0) = 35, U(1, 1) = 65, U(0, 1) = 100.

In the language of Q-learning (Watkins, 1989; Murphy, 2005; Zhao, et al., 2011), for cycle c, d_c is the “action” and U(Y_c, Z_c) is the “reward,” with (d₁, Y₁, Z₁) the “state” prior to taking action d₂ in cycle 2. Ideally, baseline covariates such as age, disease severity, or performance status would comprise the patient's state for c = 1, although in practice even in the single-cycle phase I-II setting choosing covariate-specific doses is quite complicated (cf. Thall, Nguyen, and Estey, 2008).

Given a patient's cycle 1 data (d₁, Y₁, Z₁), the mean utility of action d₂ in cycle 2 is

\begin{matrix} Q_{2} (d_{2}, d_{1}, Y_{1}, Z_{1}, θ) & = E \{U (Y_{2}, Z_{2}) ∣ d_{2}, d_{1}, Y_{1}, Z_{1}, θ\} \\ & = \sum_{y_{2} = 0}^{1} \sum_{z_{2} = 0}^{1} U (y_{2}, z_{2}) p (y_{2}, z_{2} ∣ d_{2}, d_{1}, Y_{1}, Z_{1}, θ), \end{matrix}

(7)

and we define the cycle 2 objective function

q_{2} (d_{2}, d_{1}, Y_{1}, Z_{1}, X) = E \{Q_{2} (d_{2}, d_{1}, Y_{1}, Z_{1}, θ) ∣ d_{2}, d_{1}, Y_{1}, Z_{1}, X\} .

(8)

If d₂ = 0, i.e., no treatment in cycle 2, then p(Y₂ = 0, Z₂ = 0 | d₂ = 0, d₁, Y₁, Z₁, θ) = 1 and $q_{2} (d_{2} = 0, d_{1}, Y_{1}, Z_{1}, X) = U (0, 0)$, the utility of having neither toxicity nor efficacy. If d₂ ≠ 0, then $q_{2} (d_{2}, d_{1}, Y_{1}, Z_{1}, X)$ is the posterior expected utility of giving dose d₂ in cycle 2 given (d₁, Y₁, Z₁). This underscores the importance of requiring U(0, 0) > U(1, 0), that is, that it is more desirable to have neither toxicity nor efficacy than to have toxicity and no efficacy. Given (d₁, Y₁, Z₁) and current data $X$, the optimal cycle 2 action is $d_{2}^{o p t} (d_{1}, Y_{1}, Z_{1}, X) = {argmax}_{d_{2}} q_{2} (d_{2}, d_{1}, Y_{1}, Z_{1}, X)$, subject to the dose acceptability rules discussed in Section 3.3.
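The cycle 2 step can be sketched in a few lines once the conditional outcome probabilities are available. The example below is only an illustration under simplifying assumptions: it takes the probabilities p(y₂, z₂ | d₂, d₁, y₁, z₁) as given plug-in values (all hypothetical) rather than averaging Q₂ over posterior draws of θ as in (8), and the function names are not from the paper.

```python
# Utilities U(y, z) used in the simulations; keys are (toxicity, efficacy).
U = {(1, 0): 0.0, (0, 0): 35.0, (1, 1): 65.0, (0, 1): 100.0}


def expected_utility(probs):
    """Mean utility sum_y sum_z U(y, z) p(y, z) for one dose, as in (7)."""
    return sum(U[yz] * p for yz, p in probs.items())


def optimal_cycle2_action(probs_by_dose):
    """Return (action, utility); action 0 means no treatment, worth U(0, 0)."""
    best_action, best_value = 0, U[(0, 0)]
    for dose, probs in sorted(probs_by_dose.items()):
        value = expected_utility(probs)
        if value > best_value:
            best_action, best_value = dose, value
    return best_action, best_value


# Hypothetical conditional probabilities of (y2, z2) for two candidate doses.
probs_by_dose = {
    1: {(0, 0): 0.50, (0, 1): 0.30, (1, 0): 0.15, (1, 1): 0.05},
    2: {(0, 0): 0.30, (0, 1): 0.30, (1, 0): 0.30, (1, 1): 0.10},
}
action, value = optimal_cycle2_action(probs_by_dose)

# A dose that is mostly toxic without efficacy loses to not treating at all.
toxic_only = {1: {(0, 0): 0.20, (0, 1): 0.10, (1, 0): 0.60, (1, 1): 0.10}}
fallback_action, fallback_value = optimal_cycle2_action(toxic_only)
```

Starting the search from action 0 with value U(0, 0) makes the role of the no-treatment option explicit: a dose is chosen only if its expected utility exceeds that baseline.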

Next, we move backward to the cycle 1 optimization assuming that $q_{2} (d_{2}^{o p t}, d_{1}, Y_{1}, Z_{1}, X)$ is known for all (d₁, Y₁, Z₁). The expected utility of giving dose d₁ given θ is

Q_{1} (d_{1}, θ) = E \{U (Y_{1}, Z_{1}) ∣ d_{1}, θ\} = \sum_{y_{1} = 0}^{1} \sum_{z_{1} = 0}^{1} U (y_{1}, z_{1}) p (y_{1}, z_{1} ∣ d_{1}, θ) .

To define the overall objective function, we discount the cycle 2 payoff using the fixed parameter 0 < λ < 1, as is done traditionally in Q-learning. The expected entire future utility of giving dose d₁ in cycle 1, assuming that $d_{2}^{o p t}$ will be taken in cycle 2, is

\begin{matrix} q_{1} (d_{1}, X) & = E [E \{U (Y_{1}, Z_{1}) + λ q_{2} (d_{2}^{o p t} (d_{1}, Y_{1}, Z_{1}, X), d_{1}, Y_{1}, Z_{1}, X) ∣ θ, d_{1}\} ∣ d_{1}, X] \\ & = E \{Q_{1} (d_{1}, θ) ∣ d_{1}, X\} + λ \sum_{y_{1} = 0}^{1} \sum_{z_{1} = 0}^{1} q_{2} (d_{2}^{o p t} (d_{1}, y_{1}, z_{1}, X), d_{1}, y_{1}, z_{1}, X) p (y_{1}, z_{1} ∣ d_{1}, X), \end{matrix}

(9)

where $p (y_{1}, z_{1} ∣ d_{1}, X)$ is the posterior predictive probability of (Y₁, Z₁) = (y₁, z₁), that is, the posterior mean of $p (y_{1}, z_{1} ∣ d_{1}, θ)$. Letting $q_{1} (d_{1}, X) = (1 + λ) U (0, 0)$ for d₁ = 0, the optimal cycle 1 action, $d_{1}^{o p t}$, maximizes this quantity over $D$.
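The backward-induction step in (9) amounts to adding the immediate expected utility to the discounted value of acting optimally in cycle 2. The sketch below illustrates this arithmetic for a single cycle 1 dose; the predictive probabilities and the optimal cycle 2 values are hypothetical numbers, and the function name `q1` is used here only for illustration.

```python
# Utilities U(y, z) used in the simulations; keys are (toxicity, efficacy).
U = {(1, 0): 0.0, (0, 0): 35.0, (1, 1): 65.0, (0, 1): 100.0}
LAMBDA = 0.8  # discount factor; 0.8 is the value used later in the trial design


def q1(p1, q2_opt, lam=LAMBDA):
    """Two-cycle objective (9) for one cycle 1 dose.

    p1:     {(y1, z1): predictive probability of that cycle 1 outcome}
    q2_opt: {(y1, z1): q2 at the optimal cycle 2 action for that outcome}
    """
    immediate = sum(U[yz] * p for yz, p in p1.items())
    future = sum(q2_opt[yz] * p for yz, p in p1.items())
    return immediate + lam * future


# Hypothetical inputs for one candidate dose.
p1 = {(0, 0): 0.3, (0, 1): 0.4, (1, 0): 0.1, (1, 1): 0.2}
q2_opt = {(0, 0): 40.0, (0, 1): 50.0, (1, 0): 35.0, (1, 1): 42.0}
q1_value = q1(p1, q2_opt)

# Value assigned to the no-treatment action d1 = 0.
no_treatment_value = (1 + LAMBDA) * U[(0, 0)]
```

In this example the entry 35.0 for (1, 0) stands for a case where the best cycle 2 action after toxicity without efficacy is not to treat, so that branch contributes U(0, 0).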

Maximizing q₁ and q₂ yields the optimal actions $d^{o p t} = (d_{1}^{o p t}, d_{2}^{o p t})$, where $d_{1}^{o p t}$ is either a dose or 0, $d_{2}^{o p t}$ is applicable only when $d_{1}^{o p t} \neq 0$, $d_{2}^{o p t}$ is a function of $(d_{1}^{o p t}, Y_{1}, Z_{1})$, and both are functions of $X$. If new data from other patients are obtained between administration of $d_{1}^{o p t}$ and optimization of $q_{2} (d_{2} (d_{1}), X)$, so $X$ changes while waiting to evaluate the patient's cycle 1 outcomes (Y₁, Z₁), then the posterior and hence the patient's $d_{2}^{o p t}$ might change. This may be made precise by elaborating the notation to account for relationships between timing of the patient's cycles and calendar time. We avoid this complexity since the point is clear.

3.3 Dose Acceptability

We include dose acceptability criteria, motivated by ethical considerations, since maximizing a posterior utility-based objective function, per se, is not enough to allow a dose to be administered. The problem is that, while the optimal policy under a given utility function is mathematically well-defined, it is only an indirect solution of an optimization in expectation. An important case is that where no dose is acceptably safe and efficacious in either cycle 1 or cycle 2; consequently, it is not ethical to treat a patient using any dose and the trial must be stopped. Moreover, in some applications, the decision-theoretic solution might turn out to have undesirable features not anticipated when specifying the outcomes, model, and utility function. This problem is one reason why many physicians are reluctant to use formal decision-theoretic methods for clinical decision making. Spiegelhalter et al. (2004, chapter 3.14) discuss this issue. We mitigate these concerns by adding three dose acceptability criteria that restrict the set of solutions when maximizing (8) and (9).

The first constraint is that an untried dose level may not be skipped when escalating. This says that one does not fully trust decisions based on any assumed model and decision criteria, especially with the small amounts of data available early in the trial. Let $d_{1}^{M}$ denote the highest dose among those that have been tried in cycle 1 and $d_{2}^{M}$ the highest dose among those that have been tried in either cycle. The search for optimal actions is constrained so that $1 \leq d_{1} \leq \min (d_{1}^{M} + 1, m)$ and $1 \leq d_{2} \leq \min (d_{2}^{M} + 1, m)$ . The second constraint does not allow escalating a patient's dose in cycle 2 if toxicity was observed in cycle 1, Y₁ = 1. The third criterion, defined in terms of expected utility, is to avoid giving undesirable dose pairs. For cycle 2, we say that action d₂ is unacceptable if it violates the no-skipping rule, escalates after Y₁ = 1, or $q_{2} (d_{2}, d_{1}, Y_{1}, Z_{1}, X) < U (0, 0)$ , that is, the posterior expected utility of treating the patient with d₂ given $(d_{1}, Y_{1}, Z_{1}, X)$ is smaller than that obtained by not treating the patient at all. We denote the set of acceptable cycle 2 doses for a patient with cycle 1 data (d₁, Y₁, Z₁) by $A_{2} (d_{1}, Y_{1}, Z_{1}, X)$ . Thus, a given d₂ may be acceptable for some (d₁, Y₁, Z₁) but not acceptable for others.

Table 1 illustrates true expected cycle 2 utilities of d₂ conditional on (d₁, Y₁, Z₁) using simulation scenario 4, discussed below in Section 4. Assume that θ^true and ${\tilde{θ}}^{t r u e}$ are known, and suppress ${\tilde{θ}}^{t r u e}$. The quantity Q₂(d₂, d₁, Y₁, Z₁, θ^true) in Table 1 is defined as in (7), evaluated at θ^true. For example, the values of Q₂(d₂, d₁ = 3, Y₁ = 0, Z₁ = 0, θ^true) given in the first row of the third box from the top are (35.98, 39.49, 39.70, 36.30, 27.17) for d₂ = (1, 2, 3, 4, 5), respectively. Since Q₂(5, 3, 0, 0, θ^true) < U(0, 0), d₂ = 5 is not acceptable. The other dose levels are acceptable, so $A_{2} (3, 0, 0, θ^{t r u e}) = \{1, 2, 3, 4\}$, with $d_{2}^{o p t} (d_{1} = 3, Y_{1} = 0, Z_{1} = 0, θ^{t r u e}) = 3$. When (d₁, Y₁, Z₁) = (3, 1, 0), no d₂ ∈ {1, ..., m} produces expected utility greater than U(0, 0), and d₂ = 3, 4, 5 are not allowed due to the no-escalation-after-toxicity rule. Thus, $A_{2} (3, 1, 0, θ^{t r u e})$ is the empty set and $d_{2}^{o p t} (d_{1} = 3, Y_{1} = 1, Z_{1} = 0, θ^{t r u e}) = 0$. The last column of the table lists $d_{2}^{o p t} (d_{1}, Y_{1}, Z_{1}, θ^{t r u e})$ for all combinations of (d₁, Y₁, Z₁).
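The three cycle 2 acceptability checks can be combined into a small filter and applied to rows of Table 1. This is a sketch under one stated assumption: following the worked example, in which d₂ = 3, 4, 5 are all excluded after toxicity at d₁ = 3, the code requires d₂ < d₁ whenever Y₁ = 1. The function name `acceptable_cycle2` and the argument `highest_tried` (standing in for $d_{2}^{M}$) are hypothetical.

```python
U00 = 35.0  # U(0, 0), the utility of neither toxicity nor efficacy


def acceptable_cycle2(q2, d1, y1, highest_tried, m=5):
    """Return (A2, d2_opt) given q2 = {dose: expected cycle 2 utility}."""
    acceptable = []
    for d2 in range(1, m + 1):
        if d2 > min(highest_tried + 1, m):  # no skipping of untried doses
            continue
        if y1 == 1 and d2 >= d1:            # after toxicity: lower dose only (assumed)
            continue
        if q2[d2] < U00:                    # worse than not treating at all
            continue
        acceptable.append(d2)
    d2_opt = max(acceptable, key=lambda d: q2[d]) if acceptable else 0
    return acceptable, d2_opt


# Rows of Table 1 for d1 = 3 (scenario 4, true parameter values).
row_00 = {1: 35.98, 2: 39.49, 3: 39.70, 4: 36.30, 5: 27.17}
row_10 = {1: 30.29, 2: 32.82, 3: 32.12, 4: 28.43, 5: 21.53}
row_11 = {1: 37.34, 2: 42.27, 3: 42.75, 4: 39.46, 5: 32.48}

result_00 = acceptable_cycle2(row_00, d1=3, y1=0, highest_tried=5)
result_10 = acceptable_cycle2(row_10, d1=3, y1=1, highest_tried=5)
result_11 = acceptable_cycle2(row_11, d1=3, y1=1, highest_tried=5)
```

Under this reading the filter returns d₂ = 3, no treatment, and d₂ = 2 for the three rows, matching the last column of Table 1 for d₁ = 3.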

Table 1.

True expected utilities in Scenario 4 assuming that θ^true is known. q₁(d₁, θ^true) = the true expected total utility. Entries under d₂ in columns 4–8 are the true expected cycle 2 utilities, q₂(d₂, θ^true). Expected utilities in grey are those for d₂ violating the no-escalation-after-Y₁ = 1 rule. Expected utilities in italics are those for unacceptable d₂ based on the utility-based criterion. NT denotes no treatment in cycle 2 (action 0).

d₁	q₁(d₁, θ^true)	(Y₁,Z₁)	d₂ = 1	d₂ = 2	d₂ = 3	d₂ = 4	d₂ = 5	$d_{2}^{o p t}$
1	66.19	(0,0)	37.32	41.40	41.85	38.54	29.66	3
		(0,1)	48.18	55.07	56.80	54.00	45.04	3
		(1,0)	30.60	33.58	33.07	29.57	23.27	NT
		(1,1)	41.20	47.16	47.96	44.80	38.23	NT
2	73.38	(0,0)	36.53	40.33	40.66	37.30	28.38	3
		(0,1)	45.06	51.39	52.92	50.08	41.15	3
		(1,0)	30.13	32.88	32.26	28.70	22.28	NT
		(1,1)	38.36	43.71	44.31	41.12	34.54	1
3	80.61	(0,0)	35.98	39.49	39.70	36.30	27.17	3
		(0,1)	43.31	49.20	50.62	47.73	38.69	3
		(1,0)	30.29	32.82	32.12	28.43	21.53	NT
		(1,1)	37.34	42.27	42.75	39.46	32.48	2
4	69.71	(0,0)	37.17	40.90	41.40	38.19	28.65	3
		(0,1)	44.37	50.46	52.14	49.47	40.09	3
		(1,0)	32.13	34.89	34.39	30.68	22.91	NT
		(1,1)	39.17	44.32	45.02	41.74	33.91	3
5	68.16	(0,0)	38.08	42.02	42.80	39.83	30.06	3
		(0,1)	45.26	51.52	53.46	51.04	41.51	3
		(1,0)	33.00	35.88	35.52	31.85	23.69	2
		(1,1)	40.05	45.32	46.16	42.95	34.75	3


To identify acceptable cycle 1 dose levels, we assume that $d_{2}^{o p t} (d_{1}, Y_{1}, Z_{1}, X)$ is chosen from $A_{2} (d_{1}, Y_{1}, Z_{1}, X)$ . For cycle 1, we say that action d₁ is unacceptable if it violates the no-skipping rule or satisfies the utility-based criterion

q_{1} (d_{1}, X) < U (0, 0) + λ U (0, 0) .

(10)

This says that d₁ is unacceptable in cycle 1 if it yields a smaller posterior expected utility than not treating the patient. We denote the set of acceptable cycle 1 doses by $A_{1} \subset D$ . Note that, while $A_{1} (X)$ is adaptive between patients since it is a function of other patients’ data, $A_{2} (d_{1}, Y_{1}, Z_{1}, X)$ is adaptive both between and within patients.

The second column of Table 1 illustrates true expected total utilities over two cycles under simulation scenario 4. Assuming that θ^true is known, the column gives values of

E \{U (Y_{1}, Z_{1}) + λ Q_{2} (d_{2}^{o p t} (d_{1}, Y_{1}, Z_{1}, θ^{t r u e}), d_{1}, Y_{1}, Z_{1}, θ^{t r u e}) ∣ θ^{t r u e}, d_{1}\}

where $d_{2}^{o p t} (d_{1}, Y_{1}, Z_{1}, θ^{t r u e})$ is given in the last column of the table. The true expected total utility exceeds the bound U(0, 0) + λU(0, 0) in (10) for every dose d₁ ∈ {1, ..., m}, so no dose meets the unacceptability criterion and all d₁ are acceptable. From the table, the optimal pair of actions is $d_{1}^{o p t} = 3$ and $d_{2}^{o p t} = 3, 3, 0$ and 2 for (Y₁, Z₁) = (0, 0), (0, 1), (1, 0) and (1, 1), respectively, listed in the fourth row of Table 3.
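The cycle 1 acceptability check for Table 1 is a short calculation. With U(0, 0) = 35 and the discount λ = 0.8 used in the trial design of Section 3.5, the bound in (10) is 35 + 0.8 × 35 = 63. The snippet below, with variable names chosen only for illustration, applies it to the second column of Table 1.

```python
U00, LAMBDA = 35.0, 0.8
threshold = U00 + LAMBDA * U00  # right-hand side of (10)

# True expected total utilities q1(d1, theta_true) from column 2 of Table 1.
q1_true = {1: 66.19, 2: 73.38, 3: 80.61, 4: 69.71, 5: 68.16}

# A dose is unacceptable when q1 falls below the threshold.
acceptable = [d for d, q in q1_true.items() if q >= threshold]
d1_opt = max(acceptable, key=q1_true.get)
```

All five values lie above 63, so every dose remains in the acceptable set and the maximizer is d₁ = 3.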

Table 3.

Optimal treatment sequences under the eight simulation scenarios using the simulation truth, θ^true. Assuming that $d_{1}^{o p t}$ is given, $d_{2}^{o p t}$ is searched for each cycle 1 outcome combination.

Scenario	$d_{1}^{o p t}$	$d_{2}^{o p t}$ for (Y₁,Z₁) = (0,0)	(0,1)	(1,0)	(1,1)
1	0	0	0	0	0
2	5	5	5	4	4
3	3	4	4	2	2
4	3	3	3	0	2
5	0	0	0	0	0
6	5	4	4	4	4
7	4	4	4	3	3
8	5	5	3	4	4


3.4 Adaptive Randomization

While d^opt yields the best clinical outcomes, the reliability of the process over the entire trial can be improved by including adaptive randomization (AR) among d giving values of the objective function near the maximum at d^opt. While this may seem counter-intuitive, using AR decreases the probability of getting stuck at a suboptimal d and also has the effect of treating more patients at doses having larger utilities, on average. The problem that a “greedy” search algorithm may get stuck at suboptimal actions, and the simple solution of introducing some additional randomness into the search process, have been known for years in the optimization literature (cf. Tokic, 2010). However, this has been dealt with only very recently in dose-finding (Bartroff and Lai, 2010; Azriel, Mandel and Rinott, 2011; Thall and Nguyen, 2012; Braun, Kang and Taylor, 2012).

To implement AR, we first define ε_i to be a function decreasing in patient index i, and denote ε = (ε₁, ..., ε_n). We define the set of ε_i-optimal doses for cycle 1 to be

D_{i, 1} = \{d_{1} : ∣ q_{1} (d_{i, 1}^{o p t}, X) - q_{1} (d_{1}, X) ∣ < ε_{i}, d_{1} \in A_{i, 1} (X)\} .

The set $D_{i, 1} (X)$ contains d₁ in $A_{i, 1} (X)$ having posterior mean utility within ε_i of the maximum posterior mean utility. Similarly, we define the set of (ε_i/2)-optimal doses for cycle 2 given (d_i,₁, Y_i,₁, Z_i,₁) to be

D_{i, 2} = \{d_{2} : ∣ q_{2} (d_{i, 2}^{o p t} (d_{i, 1}, Y_{i, 1}, Z_{i, 1}, X), d_{i, 1}, Y_{i, 1}, Z_{i, 1}, X) - q_{2} (d_{2}, d_{i, 1}, Y_{i, 1}, Z_{i, 1}, X) ∣ < ε_{i} / 2, d_{2} \in A_{i, 2} (d_{i, 1}, Y_{i, 1}, Z_{i, 1}, X)\} .

We use ε_i/2 because $q_{2} (d_{2}, d_{1}, Y_{1}, Z_{1}, X)$ is the posterior expected utility for cycle 2 only. For cycles c = 1, 2, patients are randomized fairly among the doses in $D_{i, c}$ , which we call AR(ε). In practice, the numerical values of ε_i depend on the numerical range of U(y, z), and must be determined by preliminary trial simulations.
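The AR(ε) step can be sketched as a filter followed by a uniform draw. The code below is an illustration only: the function names are hypothetical, returning action 0 when no acceptable dose exists is an assumption consistent with the no-treatment convention, and the example reuses the Table 1 utilities for (d₁, Y₁, Z₁) = (3, 0, 0).

```python
import random


def epsilon_optimal(q, acceptable, eps):
    """Acceptable doses whose objective is within eps of the best acceptable one."""
    if not acceptable:
        return []
    best = max(q[d] for d in acceptable)
    return [d for d in acceptable if best - q[d] < eps]


def adaptive_randomize(q, acceptable, eps, rng):
    """Pick uniformly among the eps-optimal doses; return 0 if there are none."""
    candidates = epsilon_optimal(q, acceptable, eps)
    return rng.choice(candidates) if candidates else 0


# Cycle 2 utilities from Table 1 for (d1, Y1, Z1) = (3, 0, 0); dose 5 is unacceptable.
q2 = {1: 35.98, 2: 39.49, 3: 39.70, 4: 36.30, 5: 27.17}
acceptable = [1, 2, 3, 4]
wide = epsilon_optimal(q2, acceptable, 5.0)    # tolerance eps_i / 2 with eps_i = 10
narrow = epsilon_optimal(q2, acceptable, 0.5)  # a much tighter, hypothetical tolerance
chosen = adaptive_randomize(q2, acceptable, 0.5, random.Random(0))
```

With the wide tolerance all four acceptable doses are candidates, whereas the tight tolerance keeps only doses 2 and 3, which shows how shrinking ε_i over the course of the trial concentrates assignments near the current optimum.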

3.5 Trial Design

Our illustrative trial studied in the simulations is constructed to mimic a typical phase I-II chemotherapy trial with five dose levels, but accounting for two cycles of therapy. The maximum sample size is n = 60 patients with a cohort size of 2. Based on preliminary simulations, we set ε_i = 20 for the first 10 patients, ε_i = 15 for the next 10 patients and ε_i = 10 for the remaining 40 patients. An initial cohort of 2 patients is treated at the lowest dose level in cycle 1, their cycle 1 toxicity and efficacy outcomes are observed, the posterior of θ is computed, and actions are taken for cycle 2 of the initial cohort. Posterior computations are described in the supplementary material. If $D_{i, 2} = \{0\}$ then patient i does not receive a second cycle of treatment. If $D_{i, 2} \neq \{0\}$, then AR(ε) is used to choose an action for cycle 2 from $D_{i, 2}$. When (Y₂, Z₂) are observed from cycle 2, the posterior of θ is updated. The second cohort is not enrolled until the first cohort has been evaluated for cycle 1.

For all subsequent cohorts, the posterior is updated after the outcomes of all previous cohorts are observed, and the posterior expected utility, $q_{i, 1} (d_{1}, X)$, is computed using λ = 0.8. If $D_{i, 1} (X) = \emptyset$ for any interim $X$ then $d_{i, 1} (X) = 0$, and the trial is terminated. If $D_{i, 1} \neq \emptyset$, then a cycle 1 dose is chosen from $D_{i, 1}$ using AR(ε). Once the outcomes in cycle 1 are observed, the posterior is updated. Using $(d_{i, 1}, Y_{i, 1}, Z_{i, 1}, X)$ and ε_i, $D_{i, 2}$ is searched. If $D_{i, 2}$ contains 0 only, then $d_{i, 2} = 0$ and a cycle 2 dose is not given to patient i. Otherwise, $d_{i, 2}$ is selected from $D_{i, 2} (d_{i, 1}, Y_{i, 1}, Z_{i, 1}, X)$ using AR(ε). All adaptive decisions are made based on the most recent data $X$; hence a new d_select may be chosen using partial data from recent patients for whom (Y₁, Z₁) but not (Y₂, Z₂) have been evaluated. The above steps are repeated until either the trial has been stopped early or n = 60 has been reached, and in the latter case a final optimal two-cycle regime d_select is chosen. The aim is that d_select should be the true $d_{1}^{\mathrm{opt}} \in \{1, \dots, m\}$ and $d_{2}^{\mathrm{opt}} (d_{1}^{\mathrm{opt}}, y_{1}, z_{1}) \in D$. The recommendation for phase III is d_select, rather than a single “optimal” dose as is done conventionally.
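
The adaptive randomization step can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that the acceptable set contains every action whose posterior expected utility lies within ε_i of the best one, and that an action is then drawn uniformly from that set. The function names and the example utilities in the usage note are hypothetical.

```python
import random

def epsilon_for_patient(i):
    """Tolerance for the i-th enrolled patient (1-based), per the schedule in the text."""
    if i <= 10:
        return 20.0
    if i <= 20:
        return 15.0
    return 10.0

def acceptable_set(post_utils, eps):
    """Actions whose posterior expected utility is within eps of the maximum.

    post_utils maps an action (a dose level, or 0 for 'do not treat') to its
    posterior expected utility. The 'within eps of the best' definition is an
    assumption made for this sketch.
    """
    best = max(post_utils.values())
    return sorted(d for d, q in post_utils.items() if best - q <= eps)

def ar_epsilon(post_utils, eps, rng=random):
    """Draw one action uniformly at random from the acceptable set."""
    return rng.choice(acceptable_set(post_utils, eps))
```

For instance, with hypothetical posterior utilities `{1: 50.0, 2: 62.0, 3: 70.0, 4: 58.0}` and ε = 10, the acceptable set is doses 2 and 3. For a cycle 2 decision one would pass `eps / 2`, following the ε_i/2 convention stated above.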

We compared the DTM2 design with four other designs: two-cycle extensions of the continual reassessment method (CRM, O'Quigley, et al., 1990), a Bayesian phase I-II method using toxicity and efficacy odds ratios (TEOR, Yin et al., 2006), and two (3+3) methods. One (3+3) method implicitly targets a dose with P(Y₁ = 1) ≤ 0.17, called (3+3)a, and the other implicitly targets a dose with P(Y₁ = 1) ≤ 0.33, called (3+3)b. We extended each one-cycle method to account for a second cycle. For both (3+3) methods, we used the deterministic rule in cycle 2 that if Y₁ = 1 then the dose is lowered by 1 level (d₂ = d₁ − 1) and if Y₁ = 0 then the first dose is repeated (d₂ = d₁). The (3+3)a method, coupled with this deterministic rule for cycle 2, is very commonly used in actual phase I clinical trials.
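
The deterministic cycle 2 rule is simple enough to state in a few lines of Python. The function name is hypothetical, and reading a returned value of 0 as "no cycle 2 treatment" when d₁ = 1 and Y₁ = 1 is an assumption of this sketch, since the text does not spell out that case.

```python
def cycle2_dose_deterministic(d1, y1):
    """Cycle 2 dose under the rule used for the extended (3+3) methods.

    d1 is the cycle 1 dose level (1, 2, ...) and y1 is the cycle 1 toxicity
    indicator. A returned value of 0 is read here as 'no cycle 2 treatment';
    that reading of the d1 = 1, Y1 = 1 case is an assumption.
    """
    if y1 not in (0, 1):
        raise ValueError("y1 must be 0 or 1")
    return d1 - 1 if y1 == 1 else d1
```

Note that the rule never escalates and never uses the efficacy outcome Z₁.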

For cycle 1 in the extended CRM (ECRM), we assumed the usual model $\Pr (Y_{1} = 1 ∣ d_{1}) = p_{d_{1}}^{\exp (α)}$ and α ~ N(0, 2), where 0 < p₁ < ... < p₅ < 1 are fixed values, sometimes called the model's “skeleton.” We calibrated the skeleton using the “getprior” subroutine in the package “dfcrm”, setting the target toxicity probability to 0.30, the prior guess of the maximum tolerated dose to dose level 4, and the desired halfwidth of the indifference intervals to 0.05 (Cheung, 2011). The resulting skeleton is (p₁, ..., p₅) = (0.063, 0.123, 0.204, 0.300, 0.402). Using this model, each patient's cycle 1 dose is that with posterior mean toxicity probability closest to 0.30. We implemented this using the R function “crm” in dfcrm, while also imposing the no-skipping rule for cycle 1. To determine a cycle 2 dose, we used the same deterministic rule as for the extended (3+3) methods, with one more safety requirement. For ECRM, a cycle 2 dose is not given if $\Pr \{\Pr (Y_{1} = 1 \text{ or } Y_{2} = 1) > p_{T} ∣ X, d\} > ψ_{T}$, with p_T = 0.5 and ψ_T = 0.9, assuming independence of P(Y₁ = 1) and P(Y₂ = 1) for simplicity. For example, following the deterministic rule, a patient treated in cycle 1 at d₁ may be treated at d₂ ∈ {d₁ − 1, d₁} depending on the cycle 1 toxicity outcome. In particular, we repeat d₁ in cycle 2 if Y₁ = 0. If (d₁, d₁) does not satisfy the safety requirement, then the cycle 2 treatment is not given to a patient with d₁ and Y₁ = 0. In addition, if the cycle 2 treatment is not allowed for any d₁ regardless of Y₁, that is, no (d₁, d₂) with d₂ ∈ {d₁ − 1, d₁} satisfies the safety rule, then we lower d₁ until the cycle 2 treatment is safe for either of Y₁ = 0 or Y₁ = 1.
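
To make the cycle 1 step concrete, the following Python sketch evaluates the power model by numerical integration over a grid of α values. It is a stand-in for illustration only, not the dfcrm implementation used in the paper. It reads N(0, 2) as a normal prior with variance 2, which is an assumption, and the function names are hypothetical.

```python
import numpy as np

SKELETON = np.array([0.063, 0.123, 0.204, 0.300, 0.402])  # skeleton from the text

def posterior_mean_tox(doses, tox, skeleton=SKELETON, prior_var=2.0):
    """Posterior mean of Pr(Y1 = 1 | d) at each dose under p_d ** exp(alpha).

    doses: 1-based cycle 1 dose levels already given; tox: matching 0/1 outcomes.
    The posterior over alpha is approximated on a uniform grid.
    """
    alpha = np.linspace(-10.0, 10.0, 4001)
    p = skeleton[None, :] ** np.exp(alpha)[:, None]   # rows: alpha, cols: dose
    p = np.clip(p, 1e-12, 1.0 - 1e-12)                # guard the logs below
    log_post = -0.5 * alpha**2 / prior_var            # log prior, up to a constant
    for d, y in zip(doses, tox):
        p_d = p[:, d - 1]
        log_post = log_post + (np.log(p_d) if y == 1 else np.log1p(-p_d))
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    return w @ p

def next_cycle1_dose(doses, tox, target=0.30):
    """Dose with posterior mean toxicity closest to target, without skipping."""
    post = posterior_mean_tox(doses, tox)
    closest = int(np.argmin(np.abs(post - target))) + 1
    highest_tried = max(doses) if doses else 0
    return max(1, min(closest, highest_tried + 1))
```

With no data the function returns dose 1, matching the start at the lowest dose level. The no-skipping rule is implemented here as "at most one level above the highest dose tried so far", which is one common reading of that rule.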

We extended TEOR to two cycles similarly to ECRM, and named this ETEOR. For ETEOR, d₂ = 0 if $\Pr \{\Pr (Y_{1} = 1 \text{ or } Y_{2} = 1) > p_{T} ∣ X, d\} > ψ_{T}$ or $\Pr \{\Pr (Z_{1} = 0 \text{ and } Z_{2} = 0) > p_{E} ∣ X, d\} > ψ_{E}$, with p_T = 0.6, p_E = 0.8, and ψ_T = ψ_E = 0.9, assuming independence of the two cycles for simplicity. In addition, we calibrated the priors of Yin et al. (2006) using the concept of prior effective sample size (see the supplementary material for details), resulting in their $σ_{ϕ}^{2} = 20$, $σ_{ψ}^{2} = 5$ and $σ_{θ}^{2} = 10$. We set $\bar{π}_{T} = 0.35$, $\underline{π}_{E} = 0.5$, p^escl = 0.5, p* = 0.25 and q* = 0.1, and used $ω_{d}^{(3)}$ to select a dose for the next patient.
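
The toxicity safety requirement used by ECRM and ETEOR can be checked by Monte Carlo once posterior draws of the per-cycle toxicity probabilities are available. The sketch below assumes such draws are supplied as arrays (how they are obtained is outside its scope) and uses the stated independence simplification, under which Pr(Y₁ = 1 or Y₂ = 1) = 1 − (1 − p₁)(1 − p₂). The function name is hypothetical.

```python
import numpy as np

def cycle2_unsafe(p1_draws, p2_draws, p_T=0.5, psi_T=0.9):
    """True if Pr{ Pr(Y1 = 1 or Y2 = 1) > p_T | data } exceeds psi_T.

    p1_draws, p2_draws: posterior draws of the cycle 1 and cycle 2 toxicity
    probabilities for a candidate pair (d1, d2). Independence of the two
    cycles is assumed, as in the text.
    """
    p1 = np.asarray(p1_draws, dtype=float)
    p2 = np.asarray(p2_draws, dtype=float)
    p_any = 1.0 - (1.0 - p1) * (1.0 - p2)   # toxicity in at least one cycle
    return float(np.mean(p_any > p_T)) > psi_T
```

The defaults correspond to the ECRM values p_T = 0.5 and ψ_T = 0.9; for ETEOR one would pass `p_T=0.6`.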

4 Simulation Study

4.1 Simulation Design

We simulated trials under each of eight dose-outcome scenarios using each of the five designs: DTM2, and the extended 3+3, ECRM, and ETEOR methods. The first seven scenarios were obtained using the model underlying DTM2, with the eighth obtained from a very different model to study robustness. To specify 2-cycle simulation scenarios, one must specify a joint distribution of (Y₁, Z₁) for each d₁ and a joint distribution of (Y₂, Z₂) as a function of (d₁, d₂, Y₁, Z₁). For Scenarios 1 – 7, the marginal probabilities of toxicity and efficacy in each cycle are given in Table 2, and we simulated data using (4), with assumed values $σ_{ξ}^{2, \mathrm{true}} = σ_{η}^{2, \mathrm{true}} = 0.5^{2}$, $τ^{2, \mathrm{true}} = 0.3^{2}$ and $ρ^{\mathrm{true}} = - 0.2$. We determined $\bar{ξ}^{\mathrm{true}}$ and $\bar{η}^{\mathrm{true}}$ by matching $\Pr (Y_{c} < 0) = Φ (0 ∣ \bar{ξ}_{d_{c}}^{\mathrm{true}}, σ_{ξ}^{2, \mathrm{true}} + τ^{2, \mathrm{true}})$ and $\Pr (Z_{c} < 0) = Φ (0 ∣ \bar{η}_{d_{c}}^{\mathrm{true}}, σ_{η}^{2, \mathrm{true}} + τ^{2, \mathrm{true}})$. We used $(\bar{ξ}^{\mathrm{true}}, \bar{η}^{\mathrm{true}})$ and the assumed nuisance parameters to simulate (Y, Z): we generated (Y₁, Z₁) from (6) using the true values of $σ_{ξ}^{2}$, $σ_{η}^{2}$, τ², ρ, and used (5) to generate (Y₂, Z₂) conditional on (Y₁, Z₁).
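
The matching step amounts to inverting a normal distribution function. A minimal Python sketch follows. It assumes that a binary outcome equals 1 when its latent normal variable is positive; this sign convention belongs to the model section and is taken here as an assumption. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def latent_mean(p, var_total):
    """Mean m of a N(m, var_total) latent variable such that P(latent > 0) = p."""
    return sqrt(var_total) * STD_NORMAL.inv_cdf(p)

def marginal_prob(m, var_total):
    """P(latent > 0) for a N(m, var_total) latent variable."""
    return 1.0 - NormalDist(mu=m, sigma=sqrt(var_total)).cdf(0.0)

# Total latent variance for toxicity under the assumed true values in the text.
var_total = 0.5**2 + 0.3**2

# Latent means matching the Scenario 1, cycle 1 toxicity probabilities in Table 2.
xi_bar = [latent_mean(p, var_total) for p in (0.10, 0.15, 0.30, 0.45, 0.55)]
```

A marginal probability of 0.5 maps to a latent mean of 0, and probabilities below 0.5 map to negative means, so the latent means increase with dose whenever the marginal probabilities do.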

Table 2.

True marginal probabilities of toxicity and efficacy under the first seven scenarios for the simulation studies, (p_T, p_E)^true for cycles 1 and 2. The true marginal probabilities of Scenario 8 are identical to those of Scenario 5.

Scenario	Cycle	Dose 1	Dose 2	Dose 3	Dose 4	Dose 5
1	1	(0.10, 0.02)	(0.15, 0.03)	(0.30, 0.05)	(0.45, 0.08)	(0.55, 0.10)
1	2	(0.13, 0.01)	(0.18, 0.02)	(0.33, 0.04)	(0.48, 0.07)	(0.58, 0.09)
2	1	(0.30, 0.50)	(0.32, 0.60)	(0.35, 0.70)	(0.38, 0.80)	(0.40, 0.90)
2	2	(0.33, 0.45)	(0.35, 0.55)	(0.38, 0.65)	(0.41, 0.75)	(0.43, 0.85)
3	1	(0.05, 0.10)	(0.18, 0.13)	(0.20, 0.25)	(0.40, 0.26)	(0.50, 0.27)
3	2	(0.30, 0.20)	(0.31, 0.35)	(0.32, 0.45)	(0.45, 0.65)	(0.65, 0.70)
4	1	(0.13, 0.06)	(0.15, 0.18)	(0.25, 0.35)	(0.55, 0.38)	(0.75, 0.40)
4	2	(0.20, 0.14)	(0.25, 0.23)	(0.35, 0.29)	(0.50, 0.32)	(0.80, 0.35)
5	1	(0.52, 0.01)	(0.61, 0.15)	(0.71, 0.20)	(0.82, 0.25)	(0.90, 0.30)
5	2	(0.53, 0.04)	(0.55, 0.20)	(0.62, 0.25)	(0.85, 0.27)	(0.95, 0.33)
6	1	(0.25, 0.10)	(0.28, 0.13)	(0.30, 0.25)	(0.40, 0.35)	(0.50, 0.45)
6	2	(0.30, 0.20)	(0.31, 0.35)	(0.32, 0.45)	(0.43, 0.65)	(0.56, 0.70)
7	1	(0.25, 0.10)	(0.28, 0.13)	(0.30, 0.25)	(0.40, 0.38)	(0.65, 0.40)
7	2	(0.30, 0.20)	(0.31, 0.35)	(0.32, 0.45)	(0.43, 0.65)	(0.66, 0.67)

To apply DTM2, we first calibrated the hyperparameters, $\tilde{θ}$, using the notion of the effective sample size (ESS) as described in the supplementary material. We simulated 1,000 pseudo samples of θ, setting $σ_{ξ_{c 0}}^{2} = σ_{η_{c 0}}^{2} = 6^{2}$, and computed probabilities of interest, such as P(Y_c = 0|d_c) and P(Z_c = 0|d_c), based on the pseudo samples, setting $σ_{ξ}^{2} = σ_{η}^{2} = 2^{2}$, τ² = 1 and ρ = −0.5. We determined the $\tilde{θ}$ that gave ESS ranging from 0.5 to 2 for the quantities of interest, and used this $\tilde{θ}$ to determine the prior for all simulations.

To study robustness, in Scenario 8 we simulated data using the following logistic regression model. The cycle 1 marginal probabilities (p_T(d₁), p_E(d₁)) are the same as those of Scenario 5, with outcomes generated using true probabilities

\begin{matrix} \Pr (Y_{1} = 1 ∣ d_{1}) & = p_{T} (d_{1}), \\ \Pr (Z_{1} = 1 ∣ d_{1}, Y_{1}) & = \mathrm{logit}^{- 1} \{\mathrm{logit} (p_{E} (d_{1})) - 0.34 (Y_{1} - 0.5)\}, \\ \Pr (Y_{2} = 1 ∣ d_{1}, d_{2}, Y_{1}, Z_{1}) & = \mathrm{logit}^{- 1} \{\mathrm{logit} (p_{T} (d_{1})) + 0.33 d_{2} + 0.4 (Y_{1} - 0.5) - 0.3 (Z_{1} - 0.5)\}, \\ \Pr (Z_{2} = 1 ∣ d_{1}, d_{2}, Y_{1}, Z_{1}, Y_{2}) & = \mathrm{logit}^{- 1} \{\mathrm{logit} (p_{E} (d_{1})) + 0.76 d_{2} - 0.22 (Y_{1} - 0.5) + 2.4 (Z_{1} - 0.5) - 1.8 (Y_{2} - 0.5)\} . \end{matrix}
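
One way to draw a patient's outcomes from this logistic model is to sample the four binary variables in sequence, each conditional on those before it. The Python sketch below does this for a patient who receives cycle 2 at dose level d₂ ≥ 1. The function names are hypothetical, and the cycle 1 probabilities are passed in rather than looked up.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_patient(p_T, p_E, d2, rng):
    """Draw (Y1, Z1, Y2, Z2) from the displayed Scenario 8 model.

    p_T, p_E: cycle 1 marginal toxicity and efficacy probabilities at dose d1.
    d2: cycle 2 dose level (assumed >= 1, i.e. the patient is treated in cycle 2).
    rng: a random.Random instance.
    """
    y1 = int(rng.random() < p_T)
    z1 = int(rng.random() < expit(logit(p_E) - 0.34 * (y1 - 0.5)))
    y2 = int(rng.random() < expit(logit(p_T) + 0.33 * d2
                                  + 0.4 * (y1 - 0.5) - 0.3 * (z1 - 0.5)))
    z2 = int(rng.random() < expit(logit(p_E) + 0.76 * d2 - 0.22 * (y1 - 0.5)
                                  + 2.4 * (z1 - 0.5) - 1.8 * (y2 - 0.5)))
    return y1, z1, y2, z2
```

For example, `simulate_patient(0.71, 0.20, d2=2, rng=random.Random(1))` uses the dose 3, cycle 1 entries of the Scenario 5 row in Table 2 as inputs.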

Table 3 shows the optimal actions, $d_{1}^{\mathrm{opt}}$ and $d_{2}^{\mathrm{opt}} (d_{1}^{\mathrm{opt}}, Y_{1}, Z_{1})$, under each scenario. For example, in Scenario 3, the optimal cycle 1 action is $d_{1}^{\mathrm{opt}} = 3$, and the optimal cycle 2 action is $d_{2}^{\mathrm{opt}} (d_{1} = 3, Y_{1} = 0, Z_{1}) = 4$ and $d_{2}^{\mathrm{opt}} (d_{1} = 3, Y_{1} = 1, Z_{1}) = 2$, regardless of Z₁.

4.2 Evaluation Criteria

We used the following summary statistics to evaluate each method's performance. Denote the outcomes of the n patients in a given trial who received at least one cycle of therapy by $\{(Y_{i, 1}, Z_{i, 1}), (Y_{i, 2}, Z_{i, 2}), i = 1, \dots, n\}$, where n < 60 if the trial was stopped early. The empirical mean total utility for the n patients is $\bar{U} = \sum_{i = 1}^{n} \{U (Y_{i, 1}, Z_{i, 1}) + U (Y_{i, 2}, Z_{i, 2})\} / n$, where we set $U (Y_{i, 2}, Z_{i, 2}) = U (0, 0)$ for patients who did not receive a second cycle of therapy. Indexing the N simulated replications of the trial by r = 1, ..., N, the empirical mean total payoff for all patients in the trial is $\bar{\bar{U}} = N^{- 1} \sum_{r = 1}^{N} \bar{U}^{(r)}$. One may regard $\bar{\bar{U}}$ as an index of the ethical desirability of the method for the patients in the trial, given the utility U(y, z).

To evaluate performance in terms of future patient benefit, recall that DTM2 selects an optimal dose $d_{1, \mathrm{select}}$ for cycle 1, and an optimal function $d_{2, \mathrm{select}}$ for use in cycle 2 assuming that $d_{1, \mathrm{select}}$ is given, with $d_{2, \mathrm{select}}$ not defined if $d_{1, \mathrm{select}} = 0$. Let θ^true be the true parameter vector assumed for a simulation scenario. Under θ^true, the expected payoff in cycle 1 of giving action $d_{1, \mathrm{select}}$ to a future patient is $Q_{1, \mathrm{select}} (d_{1, \mathrm{select}}) = E \{U (Y_{1}, Z_{1}) ∣ d_{1, \mathrm{select}}, θ^{\mathrm{true}}\}$, for $d_{1, \mathrm{select}} \neq 0$. If the rule $d_{2, \mathrm{select}}$ is used, the expected payoff in cycle 2 is

Q_{2, \mathrm{select}} (d_{2, \mathrm{select}}) = \sum_{(y_{1}, z_{1}) \in \{0, 1\}^{2}} E \{U (Y_{2}, Z_{2}) ∣ d_{1, \mathrm{select}}, d_{2, \mathrm{select}} (y_{1}, z_{1}), y_{1}, z_{1}, θ^{\mathrm{true}}\} \times p (y_{1}, z_{1} ∣ d_{1, \mathrm{select}}, θ^{\mathrm{true}}),

where $E \{U (Y_{2}, Z_{2}) ∣ d_{1, \mathrm{select}}, d_{2, \mathrm{select}} (y_{1}, z_{1}), y_{1}, z_{1}, θ^{\mathrm{true}}\}$ becomes U(0, 0) if d₂ = 0. The total expected payoff to a future patient treated using the selected regime $d_{\mathrm{select}} = (d_{1, \mathrm{select}}, d_{2, \mathrm{select}})$ is defined to be $Q_{\mathrm{select}} (d_{\mathrm{select}}) = Q_{1, \mathrm{select}} (d_{1, \mathrm{select}}) + λ Q_{2, \mathrm{select}} (d_{2, \mathrm{select}})$.
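
The total expected payoff is a probability-weighted sum and can be computed directly once its ingredients are known. In the Python sketch below the inputs are supplied by the caller; the function and argument names are hypothetical, and the numbers in the usage note are made up purely to show the arithmetic.

```python
def q_select(q1, p_joint, eu2, u00, lam=0.8):
    """Total expected payoff Q1 + lam * Q2 for a selected two-cycle regime.

    q1      : expected cycle 1 utility at the selected cycle 1 dose
    p_joint : dict mapping (y1, z1) to the true probability of that cycle 1
              outcome at the selected cycle 1 dose
    eu2     : dict mapping (y1, z1) to the expected cycle 2 utility under the
              selected rule, or None where the rule says 'do not treat'
    u00     : U(0, 0), used when no cycle 2 treatment is given
    lam     : weight on cycle 2 (0.8 in the simulations described above)
    """
    q2 = sum(p * (u00 if eu2[k] is None else eu2[k]) for k, p in p_joint.items())
    return q1 + lam * q2
```

With the made-up inputs `q1 = 50`, `p_joint = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}`, `eu2 = {(0, 0): 60, (0, 1): 70, (1, 0): None, (1, 1): 40}` and `u00 = 30`, the cycle 2 term is 24 + 21 + 6 + 4 = 55 and the total is 50 + 0.8 × 55 = 94.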

In addition to the criteria $\bar{\bar{U}}$ and Q_select, we evaluated the empirical toxicity and efficacy rates, defined as follows. Let $δ_{i, 2} = 1$ if patient i was treated in cycle 2, and $δ_{i, 2} = 0$ otherwise. For each simulated trial with each method, for patients who received at least one cycle of therapy, we computed

\Pr (Tox) = \frac{1}{n} \sum_{i = 1}^{n} \frac{1 (Y_{i, 1} = 1) + δ_{i, 2} 1 (Y_{i, 2} = 1)}{1 + δ_{i, 2}}

and

\Pr (Eff) = \frac{1}{n} \sum_{i = 1}^{n} \frac{1 (Z_{i, 1} = 1) + δ_{i, 2} 1 (Z_{i, 2} = 1)}{1 + δ_{i, 2}}
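
The per-trial summaries can be computed in a single pass over the treated patients. The Python sketch below follows the definitions above: a patient without a second cycle contributes U(0, 0) for cycle 2 and is averaged over one cycle instead of two. The data layout and the utility values in the usage note are hypothetical placeholders, not the elicited utilities of the paper.

```python
def trial_summaries(patients, U):
    """Return (mean total utility, Pr(Tox), Pr(Eff)) for one simulated trial.

    patients: list of tuples (y1, z1, y2, z2), with y2 = z2 = None for a
              patient who did not receive cycle 2.
    U       : dict mapping (y, z) to the numerical utility of that outcome.
    """
    n = len(patients)
    total_u = tox = eff = 0.0
    for y1, z1, y2, z2 in patients:
        treated2 = y2 is not None                 # delta_{i,2}
        total_u += U[(y1, z1)] + (U[(y2, z2)] if treated2 else U[(0, 0)])
        cycles = 2 if treated2 else 1
        tox += (y1 + (y2 if treated2 else 0)) / cycles
        eff += (z1 + (z2 if treated2 else 0)) / cycles
    return total_u / n, tox / n, eff / n
```

For example, with placeholder utilities `U = {(0, 0): 30, (0, 1): 100, (1, 0): 0, (1, 1): 60}` and patients `[(0, 1, 0, 1), (1, 0, None, None)]`, the function returns a mean total utility of 115, Pr(Tox) = 0.5 and Pr(Eff) = 0.5. Averaging the first component over simulated trials gives the quantity reported in the results tables.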

4.3 Simulation Results

A total of N = 1,000 trials were simulated under each scenario for each of the five designs studied. The simulation results are summarized in Table 4. For each of the five trial designs, Table 4 gives $\bar{\bar{U}}$, Q_select, the empirical per-cycle toxicity and efficacy probabilities and the percent of trials completed with $d_{1, \mathrm{select}} \in \{1, \dots, m\}$. A difference in $\bar{\bar{U}}$ or Q_select that may be considered “large” is about 5, since this translates to, on average, a difference of about 0.13 in Pr(Tox), while a difference of 1 may be considered small.

Table 4.

Simulation results for the proposed method DTM2, and for 2-cycle extensions (3+3)a, (3+3)b, ECRM of standard phase I methods, and the 2-cycle extension ETEOR of the phase I-II method of Yin et al. (2006). Row Ū reports $\bar{\bar{U}}$, the mean empirical utility; Q_select = mean payoff of d_select. Empirical percentages Pr(Tox) and Pr(Eff) include patients who received at least cycle 1 of treatment.

Scenarios	Criterion	DTM2	(3+3)a	(3+3)b	ECRM	ETEOR
1	Ū	66.48	59.27	58.81	56.56	61.90
	Q_select	57.77	54.36	52.30	51.75	52.43
	Pr(Tox)	0.25	0.22	0.23	0.27	0.25
	Pr(Eff)	0.07	0.03	0.03	0.05	0.07
	% completed trials	2.3	88.6	96.5	99.6	4.4
2	Ū	136.35	124.36	118.32	115.86	122.13
	Q_select	135.76	103.85	104.48	102.43	108.47
	Pr(Tox)	0.39	0.30	0.33	0.36	0.35
	Pr(Eff)	0.72	0.58	0.55	0.56	0.60
	% completed trials	99.4	39.2	64.7	95.6	78.2
3	Ū	94.23	85.95	85.75	89.93	88.04
	Q_select	84.39	77.98	80.14	78.43	78.47
	Pr(Tox)	0.38	0.27	0.27	0.30	0.26
	Pr(Eff)	0.38	0.27	0.27	0.33	0.28
	% completed trials	79.4	96.6	99.2	100.0	78.50
4	Ū	75.84	81.81	80.12	85.40	84.94
	Q_select	69.49	74.92	75.76	78.67	78.87
	Pr(Tox)	0.51	0.25	0.26	0.29	0.28
	Pr(Eff)	0.29	0.22	0.21	0.29	0.27
	% completed trials	96.7	83.2	94.7	99.4	81.7
5	Ū	66.65	52.87	52.72	50.41	NA
	Q_select	50.64	40.66	40.70	40.61	NA
	Pr(Tox)	0.84	0.43	0.44	0.53	NA
	Pr(Eff)	0.35	0.08	0.04	0.03	NA
	% completed trials	0.4	6.8	20.0	9.0	0.0
6	Ū	96.43	82.82	79.27	81.50	86.14
	Q_select	92.78	70.30	71.28	71.29	76.24
	Pr(Tox)	0.45	0.28	0.32	0.32	0.32
	Pr(Eff)	0.41	0.24	0.23	0.25	0.29
	% completed trials	90.9	51.5	74.7	97.6	58.3
7	Ū	91.88	82.66	79.31	80.99	86.32
	Q_select	84.91	70.28	71.27	71.16	76.34
	Pr(Tox)	0.47	0.28	0.32	0.32	0.32
	Pr(Eff)	0.38	0.24	0.22	0.25	0.29
	% completed trials	90.3	51.4	73.6	97.5	58.7
8	Ū	95.92	80.24	76.09	79.83	80.75
	Q_select	93.22	68.23	69.26	69.28	70.73
	Pr(Tox)	0.54	0.34	0.36	0.37	0.34
	Pr(Eff)	0.45	0.25	0.22	0.27	0.27
	% completed trials	84.7	49.4	73.2	97.6	57.9

In Scenario 1, Table 2 shows that doses d = 1, 2, 3 are safe, d = 4, 5 are overly toxic, and all doses have very low efficacy. In this case there is little benefit from any dose. The value $\bar{\bar{U}} = 66.48$ for DTM2 in Table 4 is close to the utility U(0, 0) + 0.8U(0, 0) = 66 of (d₁ = 0, d₂ = 0). The utility-based stopping rule of DTM2 correctly terminates the trial 97.7% of the time. Similarly, ETEOR terminates 95.6% of the trials before reaching the maximum number of patients due to the low efficacy rates. In contrast, the extended versions of the 3+3 and ECRM are very likely to run the trial to completion, essentially because they ignore efficacy. This provides a stark illustration of the fact that there is little benefit in exploring the safety of an agent if it is inefficacious, and methods that ignore efficacy are very likely to make this mistake. This has little to do with the 2-cycle structure, and it also can be seen when comparing one-cycle phase I-II (efficacy and toxicity) to phase I (toxicity only) methods. Thus, DTM2 and ETEOR are the only reasonable designs in Scenario 1, and DTM2 is superior in terms of both $\bar{\bar{U}}$ and Q_select.

In Scenario 2, Table 2 shows that the toxicity probabilities increase with dose from 0.30 to 0.40 in cycle 1 and from 0.33 to 0.43 in cycle 2, while the efficacy probabilities are quite high in both cycles, increasing with dose from 0.50 to 0.90 in cycle 1 and from 0.45 to 0.85 in cycle 2. Thus, if one considers toxicity probabilities around 0.40 to be acceptable trade-offs for these very high efficacy rates, then there is a substantial payoff for escalating to higher doses. The utility function reflects this, with the optimal action $d_{1}^{\mathrm{opt}} = 5$ and $d_{2}^{\mathrm{opt}} (5, Y_{1}, Z_{1}) = 4$ or 5 (Table 3). DTM2 obtains larger values of $\bar{\bar{U}}$ and Q_select due to much larger Pr(Eff) and slightly larger Pr(Tox), compared to the other four methods.

In Scenario 3, $d_{1}^{\mathrm{opt}} = 3$, with $d_{2}^{\mathrm{opt}} = 4$ if Y₁ = 0 in cycle 1 and $d_{2}^{\mathrm{opt}} = 2$ if Y₁ = 1 (Table 3). This illustrates the within-patient adaptation of DTM2. The (3+3)a, (3+3)b, and ECRM methods select $d_{1}^{\mathrm{opt}} = 3$ often since the toxicity probability of d₁ = 3 is close to 0.30, but they never select $d_{2}^{\mathrm{opt}} = 4$ for patients with (d₁, Y₁) = (3, 0) because their deterministic rules ignore Z₁ and do not allow escalation of dose levels for cycle 2 even with Y₁ = 0. Again, DTM2 achieves the largest $\bar{\bar{U}}$, Q_select, and Pr(Eff), with slightly larger Pr(Tox).

Scenario 4 is a challenging scenario for DTM2, and is favorable for the other four designs. In Scenario 4, $d_{1}^{\mathrm{opt}} = 3$ since its toxicity probability 0.25 is closest to 0.30. In addition, $d_{2}^{\mathrm{opt}} (d_{1}^{\mathrm{opt}}, Y_{1}, Z_{1})$ is exactly the same as the cycle 2 dose levels chosen by the deterministic rules of (3+3)a, (3+3)b and ECRM, except for (Y₁, Z₁) = (0, 1), which only occurs about 5% of the time. From Table 1, the true expected utility of d₂ = 2 given (d₁, Y₁, Z₁) = (3, 0, 1) is 32.82, which is very close to U(0, 0). Thus, the three methods, (3+3)a, (3+3)b and ECRM, are likely to select $d_{1}^{\mathrm{opt}}$ by considering only toxicity outcomes and select $d_{2}^{\mathrm{opt}}$ following their deterministic rules. ECRM selects $d_{1}^{\mathrm{opt}} = 3$ most of the time, leading to the largest $\bar{\bar{U}}$ and Q_select. Similar performance is observed for ETEOR as well due to the fact that $d_{1}^{\mathrm{opt}}$ is considered optimal by ETEOR, and it uses the same deterministic rule for cycle 2. The smaller values of $\bar{\bar{U}}$ and Q_select for DTM2 are due to the fact that it does a stochastic search to determine the optimal actions, using much more general criteria than the other methods. Table 1 shows that, for (d₁, Y₁) = (3, 1), the expected cycle 2 utilities are smaller than or very close to U(0, 0) for all the cycle 2 doses, so all cycle 2 doses are barely acceptable or not acceptable. However, d₁ = 5 is acceptable and, given d₁ = 5, many cycle 2 doses are acceptable, and DTM2 often explores higher doses in cycle 1 than $d_{1}^{\mathrm{opt}}$. This scenario illustrates the price one may pay for using more of the available information to explore the dose domain more extensively based on an efficacy-toxicity utility-based objective function.

In Scenario 5, the lowest dose is too toxic and therefore even d₁ = 1 is unacceptable. As expected, all methods terminate the trial early most of the time, with DTM2 stopping trials due to the low posterior expected utilities caused by the high toxicity rate at d₁. Scenarios 6 and 7 have identical true toxicity and efficacy rates for doses 1, 2 and 3, while for doses 4 and 5, Scenario 7 has higher toxicity rates and lower efficacy rates so that its $d_{1}^{\mathrm{opt}}$ is one dose level lower than $d_{1}^{\mathrm{opt}}$ of Scenario 6. Since dose 3 has a toxicity rate closest to 0.3 in both scenarios, the other four methods perform very similarly in the two scenarios. However, DTM2 again has much higher $\bar{\bar{U}}$ and Q_select values compared to all of the other methods in these scenarios.

Recall that Scenario 8 is included to evaluate robustness, with joint probabilities generated using a model very different from that underlying DTM2. It thus is remarkable that, in terms of both $\bar{\bar{U}}$ and Q_select, DTM2 has far superior performance compared to all four other methods. Essentially, this is because DTM2 allows a higher rate of toxicity as a trade-off for much higher efficacy, while the phase I methods (3+3)a, (3+3)b, and ECRM all ignore efficacy, and the other phase I-II method, ETEOR, terminates the trial early much more frequently. The superior performance of DTM2 in Scenario 8 may be attributed to its use of a 2-cycle utility function to account for efficacy-toxicity trade-offs and also as a basis for its early stopping rule. More generally, it appears that DTM2 is quite robust to the actual probability mechanism that generates the outcomes.

To assess sensitivity to association among the outcomes Y₁, Z₁, Y₂, Z₂ in the two cycles, we evaluated each method's performance with and without association in Scenarios 3, 6, and 7. We let the true $(σ_{ξ}^{2}, σ_{η}^{2}, τ^{2}, ρ)$ be either (0.2, 0.05, 1, −0.5) or (0.5², 0.5², 0, 0). The first set of values induces high association between outcomes both within and across cycles, while the second set of values induces no association. This leads to different true expected utilities in each cycle and thus to different optimal decisions, as shown in Table 5. The results are summarized in Table 6. While performance changes depending on the assumed true values, in all cases DTM2 is again superior to all four other methods.

Table 5.

Optimal sequence of treatments under scenarios 3, 6 and 7, assuming different values of $(σ_{ξ}^{2, \mathrm{true}}, σ_{η}^{2, \mathrm{true}}, τ^{2, \mathrm{true}}, ρ^{\mathrm{true}})$ to induce either high association or no association between outcomes.

Scenario	$d_{1}^{\mathrm{opt}}$	$d_{2}^{\mathrm{opt}}$ for $(Y_{1}, Z_{1}) = (0, 0)$	(0, 1)	(1, 0)	(1, 1)
3 - High Assoc.	3	4	3	NT	2
3 - No Assoc.	3	4	4	2	2
6 - High Assoc.	5	5	4	NT	3
6 - No Assoc.	5	4	4	4	4
7 - High Assoc.	4	4	4	NT	3
7 - No Assoc.	4	4	4	3	3

Table 6.

Simulation results under scenarios 3, 6 and 7, assuming different values of $(σ_{ξ}^{2, \mathrm{true}}, σ_{η}^{2, \mathrm{true}}, τ^{2, \mathrm{true}}, ρ^{\mathrm{true}})$ to induce either high association or no association between outcomes.

Scenarios	Criterion	DTM2	(3+3)a	(3+3)b	ECRM	ETEOR
3 High Assoc.	Ū	97.06	85.68	85.24	88.56	89.85
	Q_select	86.18	78.53	80.53	76.58	79.27
	Pr(Tox)	0.37	0.27	0.28	0.31	0.26
	Pr(Eff)	0.38	0.26	0.26	0.33	0.29
	% completed trials	97.7	96.6	99.2	99.9	77.5
3 No Assoc.	Ū	92.05	85.96	85.44	90.06	87.88
	Q_select	82.22	77.83	80.04	79.35	78.75
	Pr(Tox)	0.41	0.27	0.27	0.30	0.26
	Pr(Eff)	0.36	0.26	0.26	0.33	0.28
	% completed trials	98.2	96.6	99.2	99.9	77.5
6 High Assoc.	Ū	101.37	85.54	81.74	83.20	89.87
	Q_select	95.18	72.43	73.35	71.40	77.67
	Pr(Tox)	0.42	0.26	0.29	0.31	0.31
	Pr(Eff)	0.43	0.25	0.22	0.26	0.31
	% completed trials	91.1	51.5	74.7	97.5	59.9
6 No Assoc.	Ū	94.63	82.45	78.73	81.51	85.11
	Q_select	90.85	69.76	70.75	71.53	76.11
	Pr(Tox)	0.46	0.29	0.32	0.33	0.32
	Pr(Eff)	0.40	0.24	0.22	0.26	0.28
	% completed trials	91.6	51.5	74.7	98.2	57.0
7 High Assoc.	Ū	96.67	85.40	81.63	82.94	90.00
	Q_select	87.63	72.44	73.35	71.28	77.84
	Pr(Tox)	0.44	0.26	0.29	0.31	0.31
	Pr(Eff)	0.41	0.25	0.22	0.26	0.31
	% completed trials	90.7	51.4	73.6	97.9	60.2
7 No Assoc.	Ū	89.91	82.25	78.64	81.34	85.29
	Q_select	82.89	69.74	70.74	71.23	76.20
	Pr(Tox)	0.48	0.29	0.32	0.32	0.32
	Pr(Eff)	0.37	0.24	0.22	0.25	0.28
	% completed trials	90.8	51.4	73.6	97.5	57.3

5 Discussion

Practical application of DTM2 requires substantial input from the physicians, including specification of outcomes, doses, prior values, and numerical utilities. Such key input from the physicians, and preliminary validation by computer simulation, have provided a practical basis for use of model-based outcome-adaptive methods in many actual phase I-II dose-finding trials (cf. de Lima, Champlin, Thall, et al., 2008). In the design process, computer simulation also may be used to conduct sensitivity analyses in the prior or the numerical utilities so that the physicians may adjust their values. For trial conduct, a database and data entry procedure must be established, with the database updated in real time as patients are treated and evaluated in each cycle. The actual data used by DTM2 are simple, however, consisting of (d₁, Y₁, Z₁, d₂, Y₂, Z₂). Accounting for two cycles rather than only one is not a substantial complication compared to usual adaptive trials, since all clinical protocols contain rules for adaptive within-patient decision making.

DTM2 provides the 2-cycle regime d_select for phase III evaluation, rather than only a selected d₁ or 2-cycle pair (d₁, d₂). This is an important improvement, since it more accurately reflects actual clinical practice and is likely to improve the chance of success in phase III. This is because phase I methods based on toxicity alone are likely to fail to identify higher doses having higher efficacy and acceptable toxicity, and thus are more likely to select an ineffective dose for phase III evaluation. Moreover, our comparisons to the 2-cycle extension ETEOR of the phase I-II design of Yin et al. (2006), which also uses efficacy, show the advantage of optimizing a utility-based Q-function for decision making.

Several important practical extensions should be noted. While DTM2 uses recent patients’ partial data if only their cycle 1 outcomes have been evaluated, this may be refined by using event time data to enhance inferences. A useful extension would be to use toxicity or efficacy follow-up times from patients treated but not fully evaluated, employing predictive probabilities similarly to Bekele et al. (2008), or taking the approach of Zhao et al. (2011). Bivariate ordinal (Y_c, Z_c) outcomes with more than two levels may be accommodated by extending the model to include more cutoffs in the latent variables, and eliciting corresponding utilities, as in Thall and Nguyen (2012). Extension to accommodate this case is complex, however, since there would be many more elementary outcomes and thus many more model parameters. For example, if Y and Z each have four levels, then for two cycles there would be 16 elementary outcomes, rather than 4, (ξ, η) would be 8-dimensional, and Σ_ξ,η would be an 8×8 matrix. Numerous ad hoc adaptive methods for choosing a patient's doses in cycles after the first actually are used in clinical practice. Since many actual regimes involve more than two cycles, while in theory the decision criterion can be generalized to accommodate this in a straightforward manner, the dimensions of the outcomes and decisions become much larger. This strongly suggests that, to deal with the general multi-cycle case in a practical way, a more parsimonious model will be needed.

Acknowledgments

Yuan Ji's research is supported by NIH R01/NCI CA132897. Peter Thall's research was supported by NIH/NCI R01 CA 83932 and NIH/NCI 5 P50 CA140388. Peter Müller's research was supported by NIH/NCI R01 CA157458-01A1.

Footnotes

Supplementary Materials

Supplementary materials are available under the Paper Information link at the JASA website.

References

Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 1993;88:6699. [Google Scholar]
Almirall D, Ten Have T, Murphy SA. Structural nested mean models for assessing time-varying effect moderation. Biometrics. 2010;66:131–139. doi: 10.1111/j.1541-0420.2009.01238.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ashford JR, Sowden RR. Multi-variate probit analysis. Biometrics. 1970;26:535–546. [PubMed] [Google Scholar]
Azriel D, Mandel M, Rinott Y. The treatment versus experiment dilemma in dose-finding studies. J. Statistical Planning and Inference. 2011;141:2759–68. [Google Scholar]
Bartroff J, Lai TL. Approximate dynamic programming and its applications to the design of phase I cancer trials. Statistical Science. 2010;25:255–257. [Google Scholar]
Bekele BN, Thall PF. Dose-finding based on multiple toxicities in a soft tissue sarcoma trial. J American Statistical Assoc. 2004;99:26–35. [Google Scholar]
Bekele BN, Ji Y, Shen Y, Thall PF. Monitoring late onset toxicities in phase I trials using predicted risks. Biostatistics. 2008;9:442–457. doi: 10.1093/biostatistics/kxm044. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bellman RE. Dynamic Programming. Princeton University Press; 1957. [Google Scholar]
Braun TM, Kang S, Taylor JMG. A Phase I/II trial design when response is unobserved in subjects with dose-limiting toxicity. Statistical Methods in Medical Research. 2012 doi: 10.1177/0962280212464541. Published online 1 November 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
Braun TM, Yuan Z, Thall PF. Determining a maximum tolerated schedule of a cytotoxic agent. Biometrics. 2005;61:335–343. doi: 10.1111/j.1541-0420.2005.00312.x. [DOI] [PubMed] [Google Scholar]
Braun TM, Thall PF, Nguyen H, de Lima M. Simultaneously optimizing dose and schedule of a new cytotoxic agent. Clinical Trials. 2007;4:113–124. doi: 10.1177/1740774507076934. [DOI] [PubMed] [Google Scholar]
Cheung Y-K, Chappell R. Sequential designs for phase I clinical trials with late-onset toxicities. Biometrics. 2000;56:1177–1182. doi: 10.1111/j.0006-341x.2000.01177.x. [DOI] [PubMed] [Google Scholar]
Cheung Y-K. Dose Finding by the Continual Reassessment Method. Chapman&Hall/CRC Biostatistics series; 2011. [Google Scholar]
Chevret SC. Statistical methods for Dose Finding Experiments. John Wiley and Sons; Chichester, UK.: 2006. [Google Scholar]
Chib S, Greenberg E. Analysis of multivariate probit models. Biometrika. 1998;85:3471. [Google Scholar]
Collins LM, Murphy SA, Nair VN, Strecher VJ. A strategy for optimizing and evaluating behavioral interventions. Ann Behav Med. 2005;30(1):65–73. doi: 10.1207/s15324796abm3001_8. [DOI] [PubMed] [Google Scholar]
deLima M, Champlin RE, Thall PF, Wang X, Cook JD, Martin TG, McCormick G, Qazilbash M, Kebriaei P, Couriel D, Shpall EJ, Khouri I, Anderlini P, Hosing C, Chan KE, Patah PA, Caldera Z, Jabbour E, Giralt S. Phase I/II study of gemtuzumab ozogamicin added to fludarabine, melphalan and allogeneic hematopoietic stem cell transplantation for high-risk CD33 positive myeloid leukemias and myelodysplastic syndrome. Leukemia. 2008;22:258–264. doi: 10.1038/sj.leu.2405014. [DOI] [PubMed] [Google Scholar]
Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 1990;85:398–409. [Google Scholar]
Hernan M, Brumback B, Robins J. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11:561–570. doi: 10.1097/00001648-200009000-00012. [DOI] [PubMed] [Google Scholar]
Lavori PW, Dawson R. Dynamic treatment regimes: Practical design considerations. Statistics in Medicine. 2001;20:1487–98. doi: 10.1191/1740774s04cn002oa. [DOI] [PubMed] [Google Scholar]
Li Y, Bekele BN, Ji Y, Cook JD. Dose-schedule finding in phase I/II clinical trials using a Bayesian isotonic transformation. Statistics In Medicine. 2008;27:4895–4913. doi: 10.1002/sim.3329. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lunceford J, Davidian M, Tsiatis AA. Estimation of the survival distribution of treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2002;58:48–57. doi: 10.1111/j.0006-341x.2002.00048.x. [DOI] [PubMed] [Google Scholar]
Moodie EEM, Richardson TS, Stephens DA. Demystifying optimal dynamic treatment regimes. Biometrics. 2007;63:447–455. doi: 10.1111/j.1541-0420.2006.00686.x. [DOI] [PubMed] [Google Scholar]
Morita S, Thall PF, Mueller P. Evaluating the impact of prior assumptions in Bayesian biostatistics. Statistics in Biosciences. 2010;2:1–17. doi: 10.1007/s12561-010-9018-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Murphy S. Optimal dynamic treatment regimes (with discussion). Journal of the Royal Statistical Society. Series B. 2003;65:331–366. [Google Scholar]
Murphy S, Bingham D. Screening experiments for developing dynamic treatment regimes. J American Statistical Assoc. 2009;104:391–408. doi: 10.1198/jasa.2009.0119. [DOI] [PMC free article] [PubMed] [Google Scholar]
Murphy SA, Collins LM, Rush AJ. Customizing treatment to the patient: Adaptive treatment strategies. Drug and Alcohol Dependence. 2007;88:S1–S3. doi: 10.1016/j.drugalcdep.2007.02.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
Murphy SA, Lynch KG, Oslin D, Mckay JR, TenHave T. Developing adaptive treatment strategies in substance abuse research. Drug and Alcohol Dependence. 2007;88s:s24–s30. doi: 10.1016/j.drugalcdep.2006.09.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Murphy SA, van der Laan MJ, Robins JM. Marginal mean models for dynamic regimes. J American Statistical Assoc. 2001;96:1410–1423. doi: 10.1198/016214501753382327. [DOI] [PMC free article] [PubMed] [Google Scholar]
O'Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics. 1990;46:33–48. [PubMed] [Google Scholar]
Robert CP, Casella G. Monte Carlo Statistical Methods. Springer; New York: 1999.
Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy survivor effect. Mathematical Modelling. 1986;7:1393–1512.
Robins JM. Analytic methods for estimating HIV treatment and cofactor effects. In: Ostrow DG, Kessler R, editors. Methodological Issues of AIDS Mental Health Research. Plenum Publishing; New York: 1993. pp. 213–290.
Robins JM. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent Variable Modeling and Applications to Causality. Lecture Notes in Statistics, 120. Springer Verlag; NY: 1997. pp. 69–117.
Robins JM. Marginal structural models. Proceedings of the American Statistical Association, Section on Bayesian Statistics. 1998:1–10.
Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–60. doi: 10.1097/00001648-200009000-00011.
Rush AJ, Trivedi M, Fava M. Depression IV: STAR*D treatment trial for depression. American Journal of Psychiatry. 2003;160(2):237. doi: 10.1176/appi.ajp.160.2.237.
Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; Cambridge, MA: 1998.
Thall PF, Millikan R, Sung H-G. Evaluating multiple treatment courses in clinical trials. Statistics in Medicine. 2000;19:1011–1028. doi: 10.1002/(sici)1097-0258(20000430)19:8<1011::aid-sim414>3.0.co;2-m.
Thall PF, Nguyen HQ. Adaptive randomization to improve utility-based dose-finding with bivariate ordinal outcomes. J Biopharmaceutical Statistics. 2012;22:785–801. doi: 10.1080/10543406.2012.676586.
Thall PF, Nguyen HQ, Estey EH. Patient-specific dose-finding based on bivariate outcomes and covariates. Biometrics. 2008;64:1126–1136. doi: 10.1111/j.1541-0420.2008.01009.x.
Thall PF, Sung H-G, Estey EH. Selecting therapeutic strategies based on efficacy and death in multi-course clinical trials. J American Statistical Assoc. 2002;97:29–39.
Thall PF, Wooten LH, Logothetis CJ, Millikan R, Tannir NM. Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring. Statistics in Medicine. 2007;26:4687–4702. doi: 10.1002/sim.2894.
Tierney L. Markov chains for exploring posterior distributions (with discussion). Annals of Statistics. 1994;22:1701–1762.
Tokic M. Adaptive ε-greedy exploration in reinforcement learning based on value differences. In: Advances in Artificial Intelligence. Springer Verlag; Heidelberg, Germany: 2010. pp. 203–210.
Wahed AS, Tsiatis AA. Optimal estimator for the survival distribution and related quantities for treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2004;60:124–133. doi: 10.1111/j.0006-341X.2004.00160.x.
Wang L, Rotnitzky A, Lin X, Millikan R, Thall PF. Evaluation of viable dynamic treatment regimes in a sequentially randomized trial of advanced prostate cancer. J American Statistical Assoc. 2012;107:493–508 (with discussion, pp. 509–517; rejoinder, pp. 518–520). doi: 10.1080/01621459.2011.641416.
Watkins CJCH. Learning from delayed rewards. PhD thesis. Cambridge University; 1989.
Yin G. Clinical Trial Design: Bayesian and Frequentist Adaptive Methods. John Wiley & Sons; 2012.
Yin G, Li Y, Ji Y. Bayesian dose-finding in phase I/II clinical trials using toxicity and efficacy odds ratios. Biometrics. 2006;62:777–789. doi: 10.1111/j.1541-0420.2006.00534.x.
Zhang J, Braun TM. A phase I Bayesian adaptive design to simultaneously optimize dose and schedule assignments both between and within patients. J American Statistical Assoc. 2013;108:892–901. doi: 10.1080/01621459.2013.806927.
Zhang W, Sargent DJ, Mandrekar S. An adaptive dose-finding design incorporating both toxicity and efficacy. Statistics in Medicine. 2005;25:2365–2383. doi: 10.1002/sim.2325.
Zhao Y, Zheng D, Socinski MA, Kosorok MR. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. Biometrics. 2011;67:1422–1433. doi: 10.1111/j.1541-0420.2011.01572.x.
Zohar S, Chevret S. Recent developments in adaptive designs for phase I/II dose-finding studies. Journal of Biopharmaceutical Statistics. 2007;17:1071–1083. doi: 10.1080/10543400701645116.
