Author manuscript; available in PMC 2013 Jul 1.
Published in final edited form as: J Biopharm Stat. 2012 Jul;22(4):785–801. doi: 10.1080/10543406.2012.676586

Adaptive Randomization to Improve Utility-Based Dose-Finding with Bivariate Ordinal Outcomes

Peter F. Thall, Hoang Q. Nguyen
PMCID: PMC3385658  NIHMSID: NIHMS384755  PMID: 22651115

Summary

A sequentially outcome-adaptive Bayesian design is proposed for choosing the dose of an experimental therapy based on elicited utilities of a bivariate ordinal (toxicity, efficacy) outcome. Subject to posterior acceptability criteria to control the risk of severe toxicity and exclude unpromising doses, patients are randomized adaptively among the doses having posterior mean utilities near the maximum. The utility increment used to define near-optimality is non-increasing with sample size. The adaptive randomization uses each dose’s posterior probability of a set of good outcomes, defined by a lower utility cut-off. Saturated parametric models are assumed for the marginal dose-toxicity and dose-efficacy distributions, allowing the possible requirement of monotonicity in dose, and a copula is used to obtain a joint distribution. Prior means are computed by simulation using elicited outcome probabilities, and prior variances are calibrated to control prior effective sample size and obtain a design with good operating characteristics. The method is illustrated by a phase I/II trial of radiation therapy for children with brain stem gliomas.

Keywords: Adaptive design, Bayesian design, Clinical trial, Dose-finding, Epsilon-greedy algorithm, Phase I/II clinical trial, Utility

1. Introduction

We propose a Bayesian phase I/II procedure for sequentially adaptive dose selection based on a bivariate ordinal (toxicity, efficacy) outcome. The method is based on elicited utilities of the possible outcome pairs (cf. Berger, 1985). Rather than choosing the dose that maximizes the posterior mean utility, we deal with the well-known “exploration versus exploitation” dilemma (cf. Sutton and Barto, 1998) by adaptively randomizing patients among the doses having posterior mean utilities that differ from the maximum by less than a specified increment. We require this increment to be non-increasing with sample size, similarly to an “epsilon decreasing” version of an “epsilon greedy” algorithm. The adaptive randomization (AR) uses each dose’s posterior probability of a set of good outcomes, defined by an elicited lower utility cut-off. We obtain a bivariate model by first constructing marginals and using a copula (Nelsen, 1999) to induce association. Aside from link functions, we do not assume functional forms for dose-toxicity or dose-efficacy curves, but rather use marginals with saturated parameterizations. We establish a prior by using elicited outcome probabilities to simulate many very large pseudo samples, with the mean parameter vector of the pseudo posteriors used as the prior mean parameter vector, and the prior variances calibrated to control prior effective sample size.

Because we consider bivariate ordinal outcomes, our methodology differs substantively from phase I/II designs based on trinary or bivariate binary outcomes (cf. Thall and Russell, 1998; O’Quigley, Hughes, and Fenton, 2001; Braun, 2002; Thall and Cook, 2004; Bekele and Shen, 2005; Zhang, Sargent and Mandrekar, 2005; Dragalin and Fedorov, 2006; Thall, Nguyen and Estey, 2008). Our design may be considered a phase I/II, utility-based generalization of several Bayesian phase I methods. Bekele and Thall (2004) use a continual reassessment method (CRM, O’Quigley et al., 1990) type criterion based on posterior means of summed severity scores of multivariate ordinal toxicities; however, their methodology does not incorporate efficacy, maximize a utility, or use AR. For the case of one ordinal toxicity, a method similar to that of Bekele and Thall, but using quasi-likelihood, is proposed by Yuan, Chappell and Bailey (2007), while Van Meter et al. (2011) assume a proportional odds model (McCullagh, 1980) and extend the CRM. Our methodology is similar to the phase I/II design of Houede et al. (2010) for choosing dose pairs of two agents, with the differences that we incorporate AR, require more complex dose admissibility criteria, and use a model with much weaker assumptions.

In Section 2, we describe the motivating trial. The probability model is presented in Section 3. Section 4 gives definitions of the utility function, dose admissibility criteria, AR probabilities, and the design. Application to the RT trial is described in Section 5, including simulation studies of the method’s sensitivity to dose admissibility requirements, the use of AR, maximum sample size, prior variability, and number of doses studied. We close with a discussion in Section 6.

2. A Radiation Therapy Trial

Diffuse intrinsic pontine gliomas (DIPGs) are aggressive brain tumors for which no treatment with substantive anti-disease effect currently exists. DIPGs account for about 75% of brain stem gliomas in children, and the median age of DIPG patients is 5 years. Radiation therapy (RT) is the standard treatment, but nearly all patients experience disease progression within eight months after RT and median survival is less than one year. However, the dose-toxicity and dose-efficacy profiles of RT for this disease are not well understood.

This paper was motivated by the problem of designing a phase I/II trial of RT for DIPG patients. The trial, which uses the dose-finding design described here and is ongoing at this writing, includes children with DIPGs who previously received RT with or without chemotherapy and currently have progressive disease. RT is administered by separating a total dose of absorbed radiation (in gray units, Gy) into fractions that are given serially. The biologically equivalent dose is BED = (total dose) × (1 + d/κ), where d = dose/fraction and κ is a constant corresponding to the type of tissue being irradiated. This model is based on the observation in animal studies that the empirical proportion of cells surviving radiation can be fit closely by a linear-quadratic function of per-fraction dose (Fowler, 1989), with the assumption that the cell killing of successive fractions is independent. For brain tissue, κ = 3, and the three combinations of (total dose, d) to be studied in the DIPG trial are (24, 2.0), (26.4, 2.2), and (30.8, 2.2), so the corresponding BEDs are 40.00, 45.76, and 53.39.

Toxicity is defined on a 4-level ordinal scale as Low, Moderate, High, or Severe, with each level defined in terms of fatigue, nausea/vomiting, headache, skin inflammation or desquamation, blindness, and brain edema or necrosis, each evaluated during 42 days from the start of therapy. Efficacy is scored at day 42, and is defined as the sum of three indicators of any improvement, compared to baseline, in (i) clinical symptoms, (ii) radiographic appearance, or (iii) quality of life. Thus, an efficacy score of 0 corresponds to no improvement, 1 to improvement in exactly one of the three categories, and so on.
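As a check on the dose arithmetic above, with κ = 3 the three study combinations give

$$\mathrm{BED} = 24\left(1 + \tfrac{2.0}{3}\right) = 40.00, \qquad 26.4\left(1 + \tfrac{2.2}{3}\right) = 45.76, \qquad 30.8\left(1 + \tfrac{2.2}{3}\right) = 53.39.$$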

For our general regime, index the outcomes by k = 1 for toxicity and k = 2 for efficacy, with Yk = 0, 1, ⋯, mk indexing the observed ordinal levels. For toxicity, Y1 = 0 denotes the least severe and m1 the most severe level, while Y2 = 0 denotes the worst and m2 the best level of efficacy. In the RT trial, m1 = m2 = 3. Our proposed dose-finding method requires elicited utilities of all possible values y = (y1, y2) of the outcome pair Y = (Y1, Y2). The elicited utilities U(y1, y2) for all 16 possible outcomes in the RT trial are given in Table 1. As required, U(y1, y2) increases as either the toxicity severity level decreases or the efficacy score increases. It may seem counterintuitive that outcomes with efficacy score Y2 = 0 are given positive utilities. However, because the prognosis of the patients in this trial is very poor and treatment is in large part palliative, the oncologists who organized the trial consider achieving a lower level of toxicity to be desirable at any level of efficacy. When we questioned the oncologists about this critical point during the utility elicitation process, they explained that, even when the efficacy score equals 0, there still is some palliative effect; that is, the treatment still is useful. Consequently, when the efficacy score is 0, lower toxicity severity levels are far more desirable, as the first row of Table 1 shows.

Table 1.

Elicited consensus utilities U(y1, y2) for all 16 possible patient outcomes (y1, y2) = (Toxicity severity, Efficacy score) in the radiation therapy trial. The gray-shaded utilities identify the set of “good” outcomes, defined as having utility ≥ 25.

Efficacy          Toxicity Severity
Score       Low    Moderate    High    Severe
0            50       25        10        0
1            85       50        15        5
2            92       60        20        7
3           100       75        25       10

3. Probability Models

Index the doses by x ∈ χ = {1, ⋯, J}. We construct a model for Y = (Y1, Y2) as a function of x ∈ χ by formulating marginals for [Y1 ∣ x] and [Y2 ∣ x] and using a copula (Nelsen, 1999) to obtain a joint distribution. Let θ denote the model parameter vector, with θk the subvector characterizing the marginal of [Yk ∣ x]. Denote

$$\lambda_{k,y,x} = \Pr(Y_k \ge y \mid Y_k \ge y-1,\, x,\, \theta_k) \qquad \text{and} \qquad \pi_{k,y,x} = \Pr(Y_k = y \mid x,\, \theta_k)$$

for y = 1, ⋯, mk and k = 1, 2. Given a monotone increasing link function g, our marginal model assumption is simply

$$g(\lambda_{k,y,x}) = \theta_{k,y,x}, \qquad y = 1, \dots, m_k,\; k = 1, 2, \tag{1}$$

with all θk,y,x real-valued. Thus, θk = (θk,1, ⋯, θk,mk), where θk,y = (θk,y,1, ⋯, θk,y,J). The marginal model (1) is saturated since it has mk parameters for each x, with dim(θk) = Jmk, which is the number of πk,y,x's needed to specify the J marginal distributions of Yk.

Equation (1) ensures that the distribution {πk,y,x : y = 0, 1, ⋯, mk} is well defined for each k and x. This follows from the fact that, denoting π̄k,y,x = Pr(Yk ≥ y ∣ x, θk), the unconditional and conditional probabilities are related by the recursive formula

$$\bar{\pi}_{k,y,x} = \prod_{r=1}^{y} \lambda_{k,r,x} = \prod_{r=1}^{y} g^{-1}(\theta_{k,r,x}), \qquad y = 1, \dots, m_k. \tag{2}$$

Consequently, π̄k,y,x is decreasing in y for each x, each Jmk-dimensional real-valued θk, and any monotone link g. The marginal probabilities are given by

$$\pi_{k,y,x} = (1 - \lambda_{k,y+1,x}) \prod_{r=1}^{y} \lambda_{k,r,x}, \quad y = 1, \dots, m_k - 1, \qquad \pi_{k,m_k,x} = \prod_{r=1}^{m_k} \lambda_{k,r,x}. \tag{3}$$
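To make (1)–(3) concrete, the following minimal Python sketch (our own illustration, assuming the logit link used later in Section 5; the function name and example values are hypothetical) maps the real-valued θk,y,x for one outcome and one dose to its marginal distribution.

```python
import numpy as np

def marginal_probs(theta_kx):
    """Marginal distribution (pi_{k,0,x}, ..., pi_{k,m_k,x}) implied by the
    real-valued theta_{k,y,x}, y = 1, ..., m_k, under a logit link: first the
    recursion (2) for the survivor probabilities, then differencing as in (3)."""
    lam = 1.0 / (1.0 + np.exp(-np.asarray(theta_kx, dtype=float)))  # g^{-1}(theta)
    pibar = np.concatenate(([1.0], np.cumprod(lam)))  # Pr(Y_k >= y | x), y = 0, ..., m_k
    return np.append(pibar[:-1] - pibar[1:], pibar[-1])  # Pr(Y_k = y | x)

# Example with m_k = 3 and arbitrary illustrative theta values:
print(marginal_probs([0.5, -0.2, -1.0]))  # four probabilities summing to 1
```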

Given the marginals, we obtain a joint distribution of [Y1, Y2 ∣ x] by using a bivariate Gaussian copula, Cρ(v1, v2) = Φρ{Φ⁻¹(v1), Φ⁻¹(v2)} for 0 ≤ v1, v2 ≤ 1, where Φρ is the bivariate standard normal cdf with correlation ρ and Φ is the univariate standard normal cdf. This is used to define the joint distribution as

$$\pi(\mathbf{y} \mid x, \theta) = \Pr(\mathbf{Y} = \mathbf{y} \mid x, \theta) = \sum_{a=1}^{2} \sum_{b=1}^{2} (-1)^{a+b}\, C_\rho(u_{1,a}, u_{2,b}),$$

where uk,1 = Pr(Yk ≤ yk ∣ x, θk) and uk,2 = Pr(Yk ≤ yk − 1 ∣ x, θk). We chose the Gaussian copula for its tractability; other copulas, such as the Gumbel or Clayton, may be used (Nelsen, 1999). Denoting the data from the first n patients in the trial by Dn = {(Y1, x[1]), ⋯, (Yn, x[n])} for n = 1, ⋯, Nmax, the likelihood is the product

$$\mathcal{L}(D_n \mid \theta) = \prod_{i=1}^{n} \prod_{y_1=0}^{m_1} \prod_{y_2=0}^{m_2} \left\{ \pi(y_1, y_2 \mid x_{[i]}, \theta) \right\}^{I(Y_{i,1} = y_1,\, Y_{i,2} = y_2)},$$

and the posterior is p(θ ∣ Dn) ∝ L(Dn ∣ θ) × prior(θ).
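As an illustration of the copula construction (not code from the paper), the sketch below evaluates one joint cell probability from the two marginal cdfs at a dose, using scipy for Φρ and Φ⁻¹; the function and argument names are our own.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def copula_cell_prob(y1, y2, cdf1, cdf2, rho):
    """Pr(Y1 = y1, Y2 = y2 | x, theta) from the two marginal cdfs at dose x,
    cdf_k[y] = Pr(Y_k <= y | x), joined by a Gaussian copula with correlation
    rho; this is the rectangle sum displayed above."""
    biv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

    def C(v1, v2):  # C_rho(v1, v2) = Phi_rho{Phi^{-1}(v1), Phi^{-1}(v2)}
        if min(v1, v2) <= 0.0:
            return 0.0               # copula boundary: C(0, v) = 0
        if v1 >= 1.0:
            return min(v2, 1.0)      # C(1, v) = v
        if v2 >= 1.0:
            return v1
        return float(biv.cdf([norm.ppf(v1), norm.ppf(v2)]))

    F1 = lambda y: cdf1[y] if y >= 0 else 0.0  # Pr(Y1 <= y), with Pr(Y1 <= -1) = 0
    F2 = lambda y: cdf2[y] if y >= 0 else 0.0
    return (C(F1(y1), F2(y2)) - C(F1(y1), F2(y2 - 1))
            - C(F1(y1 - 1), F2(y2)) + C(F1(y1 - 1), F2(y2 - 1)))
```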

The model parameter vector θ = (θ1, θ2, ρ) has dimension p = J(m1 + m2) + 1, and characterizes a total of J(m1+m2+m1m2) bivariate probabilities. This parameterization is feasible for many cases arising in practice, with (J, m1, m2) = (3,2,2), (3,3,3), (4,3,2), (4,3,3), (5,2,2), (5,3,2), (5,3,3) giving corresponding p = 13, 19, 21, 25, 21, 26, 31. For m1 = m2 = 1, which is the bivariate binary outcome case, p = 2J + 1 and θk = (θk,1, ⋯, θk,J), for k = 1, 2.

How π̄1,y,x and π̄2,y,x may vary with x depends on both the therapeutic modality and the definitions of toxicity and efficacy. An important case is that where it is necessary to assume that π̄k,y,x is increasing in x for one or both outcomes. This assumption is appropriate both for cytotoxic agents and for RT, but it may not be realistic for cytostatic or biologic agents. Imposing the constraints θk,y,1 ≤ θk,y,2 ≤ ⋯ ≤ θk,y,J implies, by equation (2) and the monotonicity of g, that π̄k,y,x increases in x for each y = 1, ⋯, mk. Rather than fitting the model (1) with real-valued θk,y,x's subject to these constraints, to reduce computation we obtain monotonicity of π̄k,y,x in x by re-parameterizing the model as

$$\theta_{k,y,x} = \mu_{k,y} + \sum_{z=2}^{x} \gamma_{k,y,z} \qquad \text{for all } x = 2, \dots, J, \tag{4}$$

with real-valued μk,y ≡ θk,y,1 and γk,y,x ≥ 0 for all k, y, and x = 2, ⋯, J. Thus, θk,y = (μk,y, γk,y,2, ⋯, γk,y,J), and collecting terms we denote θ = (μ, γ). The parameterization (4) borrows strength between doses quite strongly because, for distinct doses x and z, θk,y,x and θk,y,z share many parameters.
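A two-line sketch of the reparameterization (4), with illustrative numbers of our own choosing:

```python
import numpy as np

def theta_from_mu_gamma(mu_ky, gamma_ky):
    """theta_{k,y,x} = mu_{k,y} + sum_{z=2}^{x} gamma_{k,y,z} with gamma >= 0,
    so theta_{k,y,x} is nondecreasing in dose x, as in equation (4).
    gamma_ky = (gamma_{k,y,2}, ..., gamma_{k,y,J})."""
    return mu_ky + np.concatenate(([0.0], np.cumsum(gamma_ky)))

print(theta_from_mu_gamma(-0.4, [0.3, 0.1]))  # J = 3 doses: [-0.4, -0.1, 0.0]
```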

The model π(y ∣ x, θ) must do a good job of reflecting the way that the posterior mean utility, defined below in Section 4.1, changes as a function of dose. Intuitively, it may seem that a number of parameters p in the range 13 to 31 is impractically large for dose-finding trials with small sample sizes. This is not the case, essentially because the information in a bivariate ordinal outcome Y is much greater than that provided by a single ordinal Y or the binary outcome used conventionally in phase I trials. Since our goal is to find a dose with high mean utility, if the model is tractable then the value of p is not critical. For example, in the case (J, m1, m2) = (5, 3, 3), where p = 31, the algorithm for computing a prior from elicited values (Section 4.3, below) works quite well, implementing the MCMC algorithm for computing posteriors is not problematic, and the design performs well across a large set of different dose-outcome scenarios.

When using a utility U(Y) to quantify the desirability of a bivariate ordinal patient outcome Y, conventional generalized linear models (GLMs, McCullagh and Nelder, 1989) for the marginals may not be sufficiently refined to distinguish reliably between doses. This is especially problematic when a middle dose has the highest utility, which can easily be the case when both π̄1,y,x and π̄2,y,x increase with x. The family of GLMs for ordinal Yk given by g(π̄k,y,x) = αk,y + βk x with αk,y decreasing in y, which is the proportional odds model if g is the logit link, may not be sufficiently flexible for dose-finding because a single dose effect βk is assumed for all levels of Yk. The more general form g(λk,y,x) = αk,y + βk,y x, subject to appropriate monotonicity constraints, may provide more flexibility. An alternative model might replace λk,y,x in (1) by the unconditional probability π̄k,y,x, so that g(π̄k,y,x) = θk,y,x, which requires that θk,y,x decrease in y for k = 1, 2 and all x. In the case where π̄k,y,x must also increase in x, however, one must impose two sets of monotonicity constraints, one in x and the other in y, which limits tractability for adaptive dose-finding. Sensitivity analyses using different marginal distributions showed that, in terms of performance of the dose-finding method, no one model among those described above is uniformly better than the others. However, our simulations (Table 6) showed that, in terms of both dose selection and choosing desirable doses for patients during the trial, the worst performance of the model with saturated marginals was better than the worst performance of the model with proportional odds marginals.

Table 6.

Comparison of performance under the simplified 9-parameter bivariate proportional odds model (p = 9) versus the model with saturated marginals (p = 31) in the 5-dose trial. The numbers in parentheses after Rselect are the percentages of trials stopped with no dose selected.

            Prop. odds marginals      Saturated marginals
Scenario    Rselect       Rtreat      Rselect       Rtreat
1           0.55 (3)      0.79        0.87 (1)      0.84
2           0.92 (8)      0.55        0.71 (2)      0.45
3           0.67 (8)      0.82        0.82 (3)      0.72
4           0.32 (9)      0.47        0.74 (1)      0.68
5           0.84 (4)      0.66        0.64 (1)      0.47
6           0.53 (99)     0.92        0.94 (94)     0.94
7           0.50 (11)     0.30        0.70 (2)      0.52
8           0.66 (5)      0.78        0.75 (1)      0.62

4. Decision Criteria and Trial Design

4.1 Utilities

Denote the elicited utility of outcome y by U(y). The mean utility of dose x given θ is

$$u(x, \theta) = E\{U(\mathbf{Y}) \mid x, \theta\} = \sum_{y_1=0}^{m_1} \sum_{y_2=0}^{m_2} U(\mathbf{y})\, \pi(\mathbf{y} \mid x, \theta).$$

The posterior mean utility is

$$\phi(x, D_n) = E\{u(x, \theta) \mid D_n\} = \sum_{y_1=0}^{m_1} \sum_{y_2=0}^{m_2} U(\mathbf{y}) \int \pi(\mathbf{y} \mid x, \theta)\, p(\theta \mid D_n)\, d\theta. \tag{5}$$

Note that (5) reflects the physicians' utilities and the observed data by averaging over the posterior. We denote the dose that maximizes ϕ(x, Dn) by x_n^opt. Because U(y) is a patient utility, maximizing ϕ(x, Dn) is very different from the more common Bayesian approach of choosing x to optimize some posterior function of the Fisher information matrix (cf. Haines, Perevozskaya and Rosenberger, 2003; Dragalin and Fedorov, 2006). In the context of phase I trials, Bartroff and Lai (2010) address the problem of "individual versus collective ethics" by considering an objective function with components for both current and future patients. In our setting, always choosing x_n^opt based on ϕ(x, Dn) alone is an example of a "greedy algorithm," which in general is a sequentially adaptive decision rule that always takes the locally optimal action at each stage. Motivated by both ethical and practical considerations, we next introduce additional dose acceptability criteria, and an AR procedure, to improve on this greedy algorithm.
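In practice, (5) is approximated from MCMC output. A minimal sketch of this step, assuming posterior draws of the joint cell probabilities are already available (the array and function names are ours):

```python
import numpy as np

def posterior_mean_utility(U, prob_draws):
    """Monte Carlo approximation of phi(x, D_n) in (5). prob_draws has shape
    (S, m1 + 1, m2 + 1): the joint distribution pi(y | x, theta^(s)) evaluated
    at each posterior draw theta^(s). U is the (m1+1) x (m2+1) utility table."""
    u_draws = np.einsum('sij,ij->s', prob_draws, U)  # u(x, theta^(s)) for each draw
    return float(u_draws.mean())                     # average over the posterior
```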

4.2 Dose Acceptability and Adaptive Randomization

Simply maximizing ϕ(x, Dn) ignores the undesirable but important possibility that all doses are too toxic. To control the risk of toxicity, we elicit the toxicity level y* considered unacceptable by the physicians, and an accompanying fixed limit π̄1* on the probability π̄1,y*,x of toxicity at level y* or above. We say that a dose x is unacceptably toxic if

$$\Pr\left(\bar{\pi}_{1,y^*,x} > \bar{\pi}_1^* \mid D_n\right) > p_U, \tag{6}$$

where pU is an upper probability cut-off, usually in the range 0.80 to 0.95. In the simplest case, where Y1 is a binary indicator of toxicity at or above some specified severity level, so that y = 0 or 1 denotes the absence or presence of toxicity defined in this way, y* = 1 by default. The inequality (6) says that x has a high posterior probability of producing an unacceptably high level of toxicity. We will limit all dose assignments to the set of acceptably safe doses, A_n^safe, defined to be all x ∈ χ for which (6) does not hold. This is similar in spirit to the safety requirements used by Thall and Cook (2004) and Braun et al. (2007), and to the "escalation with overdose control" proposed by Babb, Rogatko and Zacks (1998) for phase I trials.

A very important practical problem when using ϕ(x, Dn) to select doses adaptively is that, in some cases, little or no information may be obtained for the dose that actually has the highest true mean utility, u(x, θtrue). This occurs when the algorithm that chooses x by maximizing ϕ(x, Dn) repeatedly assigns a dose that actually is suboptimal, and does not escalate to higher levels that include the true optimal dose. The general phenomenon of a greedy sequential decision procedure becoming stuck at a suboptimal treatment is well-known. A common solution for this problem is to randomly assign some patients to suboptimal treatments. This distributes patients more evenly among treatments and consequently more is learned about the design space, often with a resulting improvement in the method’s reliability. To do this in an ethical way for adaptive dose-finding, we use the following constrained AR procedure.

Let {δn, n = 1, ⋯, Nmax} be a non-increasing sequence of differences in the utility domain. We define the set of δn-optimal doses to be

$$A_n^{\delta} = \left\{ x \in \chi : \left| \phi(x, D_n) - \phi(x_n^{\mathrm{opt}}, D_n) \right| \le \delta_n \right\}. \tag{7}$$

This is the set of doses having posterior mean utility within δn of the maximum value. The sequence {δn} quantifies what is meant by posterior mean utilities being "close" in expression (7); we require it to be non-increasing with n to accommodate the decreasing variability in the posteriors of the u(x, θ)'s as n increases. Restricting the AR to doses in A_n^δ is motivated by both ethical considerations and the practical fact that the posteriors of the utilities {u(x, θ) : x ∈ A_n^safe} may be quite dispersed, especially for the small values of n in a dose-finding trial. Moreover, ϕ(x, Dn) may be nearly flat around its maximum, with the numerical superiority of ϕ(x_n^opt, Dn) over ϕ(x, Dn) for one or more x ≠ x_n^opt quite small. Our simulations, summarized below in Section 5, show that it sometimes is more ethical to treat some patients at suboptimal doses having ϕ(x, Dn) near ϕ(x_n^opt, Dn) because, on average and in many scenarios, this leads to more patients in the trial being treated at doses having higher utilities.

A third acceptability criterion that may be used is to require that a dose should not be unlikely to have the highest utility. We say that a dose x is unlikely to be best if

$$\Pr\left[ u(x, \theta) = \max_{z \in \chi} \{ u(z, \theta) \} \;\middle|\; D_n \right] < p_L \tag{8}$$

for a small lower probability cut-off pL. We denote the set of doses that do not have this property by A_n^min. A dose in A_n^min is admissible in the sense that it satisfies the minimality requirement of having at least a non-trivial probability of having the highest utility. Combining the three criteria (6), (7), and (8), we define the set of acceptable doses to be

$$A_n = A_n^{\mathrm{safe}} \cap A_n^{\delta} \cap A_n^{\min}. \tag{9}$$

Thus, a dose is acceptable if it (i) has acceptable toxicity, (ii) has posterior mean utility that is δn-close to the maximum, and (iii) is not unlikely to have the highest posterior utility.

Our design randomizes patients adaptively among the doses in A_n. This may be done using many different criteria. An ethically attractive approach is to define the AR probabilities in terms of a set of "good" outcomes, G = {y : U(y) ≥ U*}, where the lower utility limit U* is elicited from the physicians who provided the utilities. Given G, the probability of a good outcome for a patient treated with dose x is Pr(Y ∈ G ∣ x, θ). Denoting the posterior means μG(x, Dn) = E{Pr(Y ∈ G ∣ x, θ) ∣ Dn}, we randomize a patient to dose x ∈ A_n with probability

$$r(x, D_n) = \frac{\mu_G(x, D_n)}{\sum_{z \in A_n} \mu_G(z, D_n)}. \tag{10}$$

Thus, the dose-finding method assigns only doses that are acceptably safe and δn-optimal, as defined by (6) and (7), and that satisfy the minimality requirement (8), and it uses the good-outcome set G to determine the AR probabilities (10).
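Gathering the pieces, the sketch below (our own illustration; the posterior summaries are assumed to be precomputed from MCMC draws) forms the acceptable set (9) and the AR probabilities (10).

```python
import numpy as np

def acceptable_doses_and_ar(phi, pr_toxic, pr_best, pr_good, delta_n, p_U, p_L):
    """A_n of (9) and r(x, D_n) of (10) from length-J posterior summaries:
      phi[x]      posterior mean utility, equation (5)
      pr_toxic[x] Pr(pibar_{1,y*,x} > pibar_1* | D_n), the left side of (6)
      pr_best[x]  Pr(u(x, theta) = max_z u(z, theta) | D_n), the left side of (8)
      pr_good[x]  mu_G(x, D_n) = E{Pr(Y in G | x, theta) | D_n}."""
    safe = pr_toxic <= p_U                              # complement of (6)
    near_opt = np.abs(phi - phi.max()) <= delta_n       # (7)
    not_unlikely_best = pr_best >= p_L                  # complement of (8)
    A_n = safe & near_opt & not_unlikely_best           # (9)
    if not A_n.any():
        return A_n, None                                # trial stops: no acceptable dose
    r = np.where(A_n, pr_good, 0.0)
    return A_n, r / r.sum()                             # (10), normalized over A_n
```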

4.3 Establishing a Prior

Denote the normal distribution with mean μ and variance σ² by N(μ, σ²), and denote the normal distribution truncated below at 0 by N0(μ, σ²). For the prior p(θ ∣ θ̃), we assume μk,y ~ N(μ̃k,y, σ̃²μ,k,y) and γk,y,x ~ N0(γ̃k,y,x, σ̃²γ,k,y,x). The numerical values of the hyperparameters θ̃ = (μ̃, γ̃, σ̃²μ, σ̃²γ) may be established from elicited probabilities using the following algorithm, similar to that given in Thall et al. (2011).

  • Step 1. Assume a non-informative pseudo-prior on θ with all entries N(0, σo²) for large σo².

  • Step 2. Use the elicited prior probabilities to simulate a large pseudo-sample of size N balanced equally among the doses.

  • Step 3. Compute the pseudo-posterior from the pseudo-prior and pseudo-sample, and record the pseudo-posterior mean.

  • Step 4. Repeat steps 2 and 3 M times, and set the prior mean (μ̃, γ̃) of θ equal to the mean of the M pseudo-posterior means.

  • Step 5. Using the effective sample size (ESS) as a criterion, calibrate the values of (σ̃²μ, σ̃²γ) to obtain ESS values of the θk,y,x's in the range 0.20 to 1.0.

As a practical guideline, this algorithm may be applied effectively with N = 100J, i.e. 100 observations per dose, M = 1000 pseudo-samples, and pseudo-prior variances σo² in the range 10² to 100², with the particular value of σo² chosen to ensure that the pseudo-posteriors are insensitive to the pseudo-prior. In practice, one or two numerical values may be used for the entries of (σ̃²μ, σ̃²γ). If desired, an elaboration of Step 5 is to simulate the trial for each of several values of the hyper-variances to ensure that the trial will have good operating characteristics, as well as the prior being non-informative in terms of ESS. One overall ESS of p(θ ∣ θ̃) may be computed using the formal method of Morita, Thall, and Mueller (2008) or, alternatively, one may approximate the prior of each πk,y,x as a beta(a, b), set its ESS equal to a + b, and average these values to obtain a single summary ESS. For the association parameter ρ in the Gaussian copula, a uniform prior on (−1, +1) may be assumed.
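The beta approximation in the last step reduces to moment matching. A sketch, assuming prior draws of a single πk,y,x have been simulated from the candidate prior (the function name is ours):

```python
import numpy as np

def beta_ess(pi_samples):
    """Approximate ESS of the prior on one pi_{k,y,x}: match the sample mean m
    and variance v of prior draws of pi to a beta(a, b), for which
    m = a/(a+b) and v = m(1-m)/(a+b+1), so ESS = a + b = m(1-m)/v - 1."""
    m, v = np.mean(pi_samples), np.var(pi_samples)
    return m * (1.0 - m) / v - 1.0

# Tune the prior variances until the average of these ESS values over all
# (k, y, x) lands in the target range of roughly 0.20 to 1.0.
```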

4.4 Trial Conduct

The trial is conducted as follows. An initial cohort is treated at a starting dose chosen by the physicians. For all subsequent cohorts, once the posterior is updated based on the observed outcomes of previous patients, if A_n is empty then the trial is stopped and no dose is chosen; otherwise, patients are randomized among the doses in A_n using the updated AR probabilities given by (10). A rule superseding the above is that no untried dose may be skipped when escalating. At the end of the trial, if A_Nmax is not empty, the dose x_Nmax^opt is selected.
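A schematic sketch of this conduct loop (our own illustration, not the authors' software; the two callables stand in for the model-specific MCMC and outcome-generation steps, and r is assumed normalized over A_n with zeros elsewhere):

```python
import numpy as np

def run_trial(J, start_dose, n_start, N_max, posterior_summaries, observe, seed=0):
    """Schematic conduct loop for Section 4.4. posterior_summaries(data) must
    return (A_n, r, phi) as in equations (9), (10), and (5); observe(x)
    returns one patient's outcome (Y1, Y2). Both are placeholders."""
    rng = np.random.default_rng(seed)
    data, highest_tried = [], start_dose
    for i in range(N_max):
        if i < n_start:
            x = start_dose                     # initial cohort at the starting dose
        else:
            A_n, r, phi = posterior_summaries(data)
            if not A_n.any():
                return None, data              # stop early: no acceptable dose
            x = int(rng.choice(J, p=r))        # adaptive randomization, eq. (10)
            x = min(x, highest_tried + 1)      # no untried dose may be skipped
        highest_tried = max(highest_tried, x)
        data.append((x, observe(x)))           # treat the patient, record (dose, Y)
    A_n, r, phi = posterior_summaries(data)    # final analysis
    return (int(np.argmax(np.where(A_n, phi, -np.inf))) if A_n.any() else None), data
```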

To summarize, the design requires the following quantities to be elicited from the physicians: the prior means of all πk,y,x's, the utilities, the cut-off U* that determines G, the safety parameters y* and π̄1*, a starting dose, and the number of patients treated at the starting dose. Given this information, prior parameters must be determined from the elicited values, e.g. by using the simulation-based approach described earlier, and the additional design parameters pU, pL, {δn}, Nmax, and the cohort size must be specified. These parameters, along with the prior variances, should be calibrated by simulating the trial to obtain good operating characteristics.

5. Application to the Radiation Therapy Trial

5.1 Trial Design

For the RT trial, since m1 = m2 = 3, the distribution {π(y ∣ x, θ) : y1, y2 = 0, 1, 2, 3} is determined by 15 probabilities for each x. Since J = 3, there are J(m1m2 + m1 + m2) = 45 bivariate probabilities. For each k and x, the marginals (3) are given by

$$\begin{aligned}
\pi_{k,0,x} &= 1 - \lambda_{k,1,x},\\
\pi_{k,1,x} &= \lambda_{k,1,x} - \lambda_{k,1,x}\lambda_{k,2,x},\\
\pi_{k,2,x} &= \lambda_{k,1,x}\lambda_{k,2,x} - \lambda_{k,1,x}\lambda_{k,2,x}\lambda_{k,3,x},\\
\pi_{k,3,x} &= \lambda_{k,1,x}\lambda_{k,2,x}\lambda_{k,3,x}.
\end{aligned}$$

We assume a logit link, so λk,y,x = exp(θk,y,x)/{1 + exp(θk,y,x)}. For each k = 1, 2 and y = 1, 2, 3, θk,y = (μk,y, γk,y,2, γk,y,3), so dim(θk) = 9 and p = dim(θ) = 19.

The prior was obtained from the elicited prior mean probabilities in Table 2 using the algorithm described in Section 4.3. We simulated 1000 pseudo-samples, each of size 300, with 100 patients in each pseudo-sample assigned to each dose. For each simulated data set, a pseudo-posterior was computed starting with a pseudo-prior on θ having each θk,y,x ~ N(0, 60²), and the mean of the 1000 pseudo-posterior means was used as the prior means μ̃k,y and γ̃k,y,x. The standard deviations of the θk,y,x's were calibrated by approximating the prior of each πk,y,x with a beta(a, b) and using ESS = a + b. Setting all σ̃μ,k,y = σ̃γ,k,y,x = σ̃ = 6 gave ESS values ranging from 0.31 to 0.70, with mean ESS = 0.42.

For the safety criterion defining A_n^safe, the physicians specified y* = 3 (severe toxicity) and π̄1* = 0.10. Using the conservative upper cut-off pU = 0.80, a dose x is unacceptably toxic if Pr(π̄1,3,x > 0.10 ∣ Dn) > 0.80. The lower cut-off pL = 0.10 was used to define A_n^min, after studying the effects of the values pL = 0.05, 0.10, 0.15 in preliminary simulations. An initial cohort of 3 patients will be treated at x = 1 (BED = 40.00), with subsequent cohorts of size 1 and the AR started at the 4th patient. A maximum of Nmax = 30 patients will be treated, chosen in part because an accrual rate of 6 to 10 patients per year is anticipated, so it will require 3 to 5 years to complete the trial. The physicians specified the lower utility cut-off U* = 25 to determine the good-outcome set used to define the AR probabilities. The nine outcomes considered good by this criterion are shown by the gray-shaded values in Table 1. After a preliminary sensitivity analysis examining the design's behavior by simulation for various non-increasing sequences {δn} with 10 ≤ δn ≤ 30, it was decided to define A_n^δ using the step function δn = 20 for 4 ≤ n ≤ 15 and δn = 15 for 16 ≤ n ≤ 30. All posterior quantities were computed using MCMC with Gibbs sampling (Robert and Casella, 1999). All simulations are based on 5000 replications of each case studied.
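As a small illustration of how the trial's safety screen is evaluated from MCMC output (the array and function names are our own):

```python
import numpy as np

def unacceptably_toxic(pibar_draws, pibar_star=0.10, p_U=0.80):
    """RT-trial version of criterion (6): pibar_draws holds posterior draws of
    pibar_{1,3,x} = Pr(severe toxicity | dose x), one column per dose (S x J).
    Returns a boolean per dose; A_n^safe is the complement."""
    return (pibar_draws > pibar_star).mean(axis=0) > p_U
```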

Table 3.

Simulation results. The assumed true marginal probabilities {π^true_k,y,x} defining each scenario are given in Figure 1. Under each scenario, each tabled dose-selection % and sample size is the mean from 5000 simulated trials. "None" means that no dose was selected.

        Scenario 1                        Scenario 2
Dose    u^true(x)  % Sel  # Pats         u^true(x)  % Sel  # Pats
x = 1   64.6       50     15.7           51.1       13     11.3
x = 2   64.6       32     8.5            58.6       22     8.8
x = 3   57.0       17     5.6            63.7       61     9.5
None               1                                4

        Scenario 3                        Scenario 4
x = 1   65.2       63     17.2           61.7       31     13.5
x = 2   60.8       20     7.2            66.2       51     10.4
x = 3   54.6       14     5.0            53.1       15     5.7
None               4                                3

        Scenario 5                        Scenario 6
x = 1   60.7       35     14.7           37.6       6      10.3
x = 2   54.3       9      6.6            37.6       2      2.9
x = 3   65.7       54     8.4            33.4       1      1.4
None               3                                91

        Scenario 7                        Scenario 8
x = 1   52.4       14     11.5           65.4       64     17.4
x = 2   64.4       51     10.7           55.1       9      6.2
x = 3   59.7       32     7.4            59.9       25     6.2
None               4                                2

5.2. Simulation Results

Using this design, we simulated the trial under eight scenarios, each determined by a set of assumed true probabilities {π^true_k,y,x} given in Figure 1. Scenario 1 is based on the elicited prior means, which give equal utility 64.6 to x = 1 and x = 2 and utility 57.0 to x = 3. Scenario 2 has steeply increasing utility with x = 3 best. Scenario 3 has steeply decreasing utility with x = 1 best, the middle dose x = 2 is best in Scenario 4, and the utility is V-shaped in Scenario 5 with x = 3 best. No dose is acceptably safe in Scenario 6. Scenario 7 is similar to Scenario 4 in terms of utilities, but has slightly higher toxicity and efficacy probabilities. The utility is V-shaped in Scenario 8 with x = 1 best. In each simulation scenario, the true marginal probabilities {π^true_k,y,x} do not depend on the assumed model, although we use the copula to determine correlations, with the Gaussian copula correlation parameter ρ^true = 0.10 used throughout. The true utilities u^true(x) are determined by the assumed {π^true_k,y,x}, ρ^true, and the elicited utilities U(y) in Table 1.

Figure 1. Fixed probabilities of all outcomes for each of the eight scenarios in the simulation study.

The results of simulating the RT trial design under Scenarios 1 – 8 are summarized in Table 3. In Scenario 6, where no dose is acceptably safe and the true severe toxicity probabilities are 0.25, 0.28, 0.30 at the three doses, the method correctly stops early and chooses no dose 91% of the time, and treats on average 14.6 patients. Modifying Scenario 6 so that the true severe toxicity probabilities at the three doses are the slightly higher values .30, .35, .40, the stopping probability is 98% and the mean sample size drops to 11.6 patients. In all of the other 7 scenarios, the selection probabilities reflect the utilities quite closely. The algorithm’s selection reliability is quite striking in Scenarios 5 and 8, which have V-shaped utilities with the middle dose x = 2 least desirable. The sample size distributions are biased toward lower doses in all scenarios, reflecting the initial cohort of 3 patients at x = 1, the do-not-skip rule, and the fact that the prior was biased toward the lower doses, which favors AR to these doses early in the trial.

While Table 3 shows that our method has very desirable properties for the RT trial design, because the design relies on three particular admissibility criteria and AR it is important to assess each component's effect on the method's performance. We thus simulated four alternative versions of the procedure. Method 1 is our proposed procedure, with A_n given by (9). Method 2 drops the minimality requirement A_n^min from A_n. Method 3 drops both the minimality requirement A_n^min and the requirement A_n^δ that a dose have ϕ(x, Dn) δn-close to the optimum. Thus, Methods 1, 2, and 3 all use AR based on the posterior good-outcome probabilities r(x, Dn), but with different definitions of an acceptable dose. Method 4 drops the AR entirely and uses the greedy algorithm that simply chooses x to maximize ϕ(x, Dn), subject only to the safety constraint on toxicity. Thus, A_n = A_n^safe for both Methods 3 and 4, but Method 4 does not use AR. The results of this additional comparative simulation are given in Table 4. To present the results more compactly than by giving four versions of Table 3, one for each version of the method, we use the following two summary statistics to quantify each method's performance. For each scenario, let u^true(x_select) denote the true utility of the final selected dose, and u_max and u_min the largest and smallest possible true utilities among all x ∈ χ. A statistic that quantifies how well the method selects a final dose for future patients is

$$R_{\mathrm{select}} = \frac{u^{\mathrm{true}}(x_{\mathrm{select}}) - u_{\min}}{u_{\max} - u_{\min}},$$

the proportion of the difference between the utilities of the best and worst possible choices that is achieved by xselect. A similar statistic that quantifies how well the method assigns doses to patients throughout the trial is

$$R_{\mathrm{treat}} = \frac{1}{N} \sum_{i=1}^{N} \frac{u^{\mathrm{true}}(x_{[i]}) - u_{\min}}{u_{\max} - u_{\min}},$$

where u^true(x_[i]) is the true utility of the dose given to the ith patient and N is the achieved final sample size. Both statistics have range [0, 1], with a larger value corresponding to better overall performance. Thus, Rselect quantifies future patient benefit, while Rtreat quantifies the benefit to the patients in the trial.
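Both statistics are simple to compute for each simulated trial; a sketch with illustrative names:

```python
import numpy as np

def r_select_r_treat(u_true, x_select, doses_given):
    """Summary statistics of Section 5.2. u_true[x] is the true mean utility
    of dose x, x_select the final selected dose, and doses_given the doses
    x_[1], ..., x_[N] actually assigned during one simulated trial."""
    u_min, u_max = u_true.min(), u_true.max()
    scale = lambda u: (u - u_min) / (u_max - u_min)
    r_select = scale(u_true[x_select])
    r_treat = float(np.mean([scale(u_true[x]) for x in doses_given]))
    return r_select, r_treat
```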

Table 4.

Simulations comparing four alternative dose-finding methods. Method 1 is the proposed procedure. Methods 2, 3, and 4 successively drop the minimality requirement x ∈ A_n^min, the δn-close requirement x ∈ A_n^δ, and the use of adaptive randomization (AR). The numbers in parentheses after Rselect are the percentages of trials stopped with no dose selected.

            Method 1                        Method 2                 Method 3         Method 4
AR          Yes                             Yes                      Yes              No
A_n         A_n^safe ∩ A_n^δ ∩ A_n^min      A_n^safe ∩ A_n^δ         A_n^safe         A_n^safe

Scenario    Rselect    Rtreat    Rselect    Rtreat    Rselect    Rtreat    Rselect    Rtreat
1           0.83 (1)   0.81      0.82 (1)   0.82      0.81 (1)   0.77      1.00 (2)   1.00
2           0.77 (4)   0.50      0.77 (2)   0.44      0.79 (1)   0.42      0.12 (5)   0.10
3           0.77 (4)   0.73      0.76 (4)   0.71      0.76 (4)   0.65      0.99 (5)   0.99
4           0.74 (3)   0.65      0.73 (1)   0.65      0.72 (1)   0.62      0.68 (2)   0.67
5           0.75 (3)   0.56      0.74 (2)   0.50      0.76 (1)   0.50      0.54 (3)   0.53
6           0.92 (91)  0.90      0.91 (90)  0.90      0.89 (90)  0.90      1.00 (93)  1.00
7           0.73 (4)   0.51      0.74 (2)   0.49      0.73 (1)   0.45      0.25 (5)   0.20
8           0.77 (2)   0.68      0.75 (2)   0.64      0.75 (1)   0.58      0.98 (2)   0.97

Table 2.

Elicited prior mean marginal toxicity severity and efficacy score probabilities for the model used in the radiation therapy trial.

              Y1 = Toxicity Severity           Y2 = Efficacy Score
x    BED     Low    Moderate  High   Severe    0      1      2      3
1    40.00   0.65   0.20      0.12   0.03      0.20   0.40   0.35   0.05
2    45.76   0.55   0.25      0.15   0.05      0.10   0.30   0.45   0.15
3    53.39   0.40   0.30      0.23   0.07      0.10   0.20   0.50   0.20

To interpret the results in Table 4, we first note that all four methods are safe in that they stop and select no dose between 90% and 93% of the time in the toxic Scenario 6. This is as expected, since all of the methods include the safety requirement (6). For the other seven scenarios, Table 4 shows that Method 4, the greedy algorithm with the safety requirement but no AR, may perform extremely well (Scenarios 1, 3, and 8), moderately well (Scenarios 4 and 5), or extremely poorly (Scenarios 2 and 7). That is, Method 4 is inconsistent and sometimes yields disastrous results. In particular, Method 4 gives the extremely small values Rtreat = 0.10 in Scenario 2 and Rtreat = 0.20 in Scenario 7, compared to much larger values for Methods 1–3. Thus, depending on the actual state of nature, the greedy algorithm without AR may perform either very well or very poorly in terms of benefit to the patients in the trial. In contrast, across all scenarios considered, Methods 1–3 are much more reliable, with no scenario in which any of them performs extremely poorly. The addition of AR thus acts like an insurance policy against disaster, with the price being reduced Rselect and Rtreat in some cases. Comparing Methods 1, 2, and 3, the greatest effect of adding the admissibility constraints A_n^δ and A_n^min is that they both increase Rtreat substantially in almost all scenarios, making the dose-finding algorithm more ethically attractive for the patients in the trial. In contrast, in terms of future patient benefit, Table 4 shows that most of the advantage over Method 4 in Scenarios 2, 4, 5, and 7 is due to including AR, since the Rselect values for Methods 1, 2, and 3 are about the same in these scenarios.

Our motivating application is somewhat atypical in that most phase I/II trials have more than 3 dose levels. To more fully illustrate the methodology, we also include a simulation study with J = 5 doses. The model, which now has p = 31 parameters, and the design used for this simulation are very similar to those of the RT trial, but we assume Nmax = 40. With regard to choice of Nmax in practice, it should be kept in mind that a phase I/II trial replaces the more conventional approach of conducting phase I based on toxicity only and phase II based on efficacy only. Thus, Nmax = 40 is quite reasonable for a phase I/II trial. All numerical values in the prior and model corresponding to the new doses x′ ∈ {1, 2, 3, 4, 5} were obtained by matching values for x′ = 1, 3, 5 to x = 1, 2, 3, respectively, and interpolating to obtain values for x′ = 2 and x′ = 4. This simulation is summarized in Table 5, which shows that the qualitative behavior of the design for J = 5 dose levels is very similar to what is shown for J = 3 in Table 3.

Table 5.

Simulation results for the 5-dose version of the radiation therapy trial.

        Scenario 1                        Scenario 2
Dose    u^true(x)  % Sel  # Pats         u^true(x)  % Sel  # Pats
x = 1   64.6       44     15.4           51.1       10     12.0
x = 2   64.6       21     8.8            54.9       10     7.8
x = 3   64.6       16     6.7            58.6       15     7.3
x = 4   60.8       10     5.1            61.2       24     6.8
x = 5   57.0       7      3.9            63.7       39     5.8
None               1                                2

        Scenario 3                        Scenario 4
x = 1   65.2       58     16.9           61.7       27     13.9
x = 2   63.0       18     8.5            64.0       22     8.9
x = 3   60.8       10     6.1            66.2       30     7.7
x = 4   57.7       7      4.5            59.8       15     5.4
x = 5   54.6       5      3.4            53.1       6      3.8
None               3                                1

        Scenario 5                        Scenario 6
x = 1   60.7       36     15.3           37.6       4      10.5
x = 2   57.5       11     7.8            37.7       1      3.2
x = 3   54.3       5      5.7            37.6       1      1.6
x = 4   60.0       13     5.8            35.5       0      0.8
x = 5   65.7       34     5.2            33.4       0      0.4
None               1                                94

        Scenario 7                        Scenario 8
x = 1   52.4       11     11.9           65.4       60     17.8
x = 2   58.5       16     8.4            60.3       13     8.1
x = 3   64.4       33     8.2            55.1       5      5.4
x = 4   62.1       21     6.2            57.5       8      4.7
x = 5   59.7       17     4.9            59.9       14     3.8
None               2                                1

A natural question is how well the method works with a more conventional model having fewer parameters. To answer this, we also simulated the trial assuming a bivariate model with proportional odds (PO) marginals defined by logit(π̄k,y,x) = αk,y + βk x with αk,1 > αk,2 > αk,3, so θ = (α1,1, α1,2, α1,3, α2,1, α2,2, α2,3, β1, β2, ρ) and p = 9. The prior for this model was obtained similarly to that of the 19-parameter model, using the method described in Section 4.3. The results are summarized in Table 6, which shows that the simpler model gives a design with greatly inferior performance in Scenarios 1, 4, 6, and 7, superior performance in Scenarios 2 and 5, and mixed comparative results in Scenarios 3 and 8. This illustrates the point that, in the present setting, no model is uniformly best. For the model with saturated marginals, excluding Scenario 6 where no doses are acceptable, Rselect has range [0.64, 0.87] and Rtreat has range [0.45, 0.84], whereas the proportional odds model has corresponding ranges [0.32, 0.92] and [0.30, 0.82]. Thus, the saturated model performs much more consistently across a diverse set of scenarios, with far better worst-case results than the simple model.

6. Discussion

In our proposed design, we have modified the approach of maximizing the posterior mean utility by imposing several dose acceptability criteria and using AR among near-optimal doses. As shown in Table 4, each modification improved the method's performance in terms of both reliability and ethical desirability. While it may seem counterintuitive, in some scenarios randomizing patients to doses that appear suboptimal at an interim analysis actually increases the average utility of the doses assigned to patients throughout the trial. We also found that using a model with saturated marginals gives a design with more consistent overall performance than a model with conventional proportional odds marginals.

Several questions remain, and are areas for future investigation. In principle, the utility function might be generalized to include information, monetary costs, or a future patient horizon. Since these utilities have qualitatively different ranges, combining them to actually conduct a trial is a difficult practical and ethical problem (cf. Dragalin and Fedorov, 2006; Bartroff and Lai, 2010). Other issues include the use of bivariate event times as outcomes (cf. Yuan and Yin, 2009), and addressing the more general problems of optimizing a two-agent combination or both dose and schedule (Braun, et al. 2007).

Acknowledgments

The authors thank H. Fontanilla and A. Mahajan for presenting the problem of designing the radiation therapy trial, and for providing the elicited values, and Pat Fox for preparing Figure 1. This research was supported by NCI grant RO1-CA-83932.

References

  1. Babb J, Rogatko A, Zacks S. Cancer phase I clinical trials: efficient dose escalation with overdose control. Statistics in Medicine. 1998;17:1103–1120.
  2. Bartroff J, Lai TL. Approximate dynamic programming and its applications to the design of phase I cancer trials. Statistical Science. 2010;25:255–257.
  3. Bekele BN, Shen Y. A Bayesian approach to jointly modeling toxicity and biomarker expression in a phase I/II dose-finding trial. Biometrics. 2005;61:343–354.
  4. Bekele BN, Thall PF. Dose-finding based on multiple toxicities in a soft tissue sarcoma trial. Journal of the American Statistical Association. 2004;99:26–35.
  5. Berger JO. Statistical Decision Theory and Bayesian Analysis. 2nd ed. New York: Springer-Verlag; 1985.
  6. Braun TM. The bivariate continual reassessment method: extending the CRM to phase I trials of two competing outcomes. Controlled Clinical Trials. 2002;23:240–256.
  7. Braun TM, Thall PF, Nguyen H, de Lima M. Simultaneously optimizing dose and schedule of a new cytotoxic agent. Clinical Trials. 2007;4:113–124.
  8. Dragalin V, Fedorov V. Adaptive designs for dose-finding based on efficacy–toxicity response. Journal of Statistical Planning and Inference. 2006;136:1800–1823.
  9. Fowler J. The linear-quadratic formula and progress in fractionated radiotherapy. British Journal of Radiology. 1989;62:679–694.
  10. Haines LM, Perevozskaya I, Rosenberger WF. Bayesian optimal design for phase I clinical trials. Biometrics. 2003;59:591–600.
  11. Houede N, Thall PF, Nguyen H, Paoletti X, Kramar A. Utility-based optimization of combination therapy using ordinal toxicity and efficacy in phase I/II trials. Biometrics. 2010;66:532–540.
  12. McCullagh P. Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B. 1980;42:109–142.
  13. McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. New York: Chapman and Hall; 1989.
  14. Nelsen RB. An Introduction to Copulas. Lecture Notes in Statistics, Vol. 139. New York: Springer-Verlag; 1999.
  15. O'Quigley J, Hughes MD, Fenton T. Dose-finding designs for HIV studies. Biometrics. 2001;57:1018–1029.
  16. O'Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics. 1990;46:33–48.
  17. Robert CP, Casella G. Monte Carlo Statistical Methods. New York: Springer; 1999.
  18. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998.
  19. Thall PF, Cook JD. Dose-finding based on efficacy-toxicity trade-offs. Biometrics. 2004;60:684–693.
  20. Thall PF, Nguyen HQ, Estey EH. Patient-specific dose-finding based on bivariate outcomes and covariates. Biometrics. 2008;64:1126–1136.
  21. Thall PF, Russell KT. A strategy for dose finding and safety monitoring based on efficacy and adverse outcomes in phase I/II clinical trials. Biometrics. 1998;54:251–264.
  22. Van Meter EM, Garrett-Mayer E, Bandyopadhyay D. Proportional odds model for dose-finding clinical trial designs with ordinal toxicity grading. Statistics in Medicine. 2011;30:2070–2080.
  23. Yuan Z, Chappell R, Bailey H. The continual reassessment method for multiple toxicity grades: a Bayesian quasi-likelihood approach. Biometrics. 2007;63:173–179.
  24. Yuan Y, Yin G. Bayesian dose-finding by jointly modeling toxicity and efficacy as time-to-event outcomes. Journal of the Royal Statistical Society, Series C. 2009;58:954–968.
  25. Zhang W, Sargent DJ, Mandrekar S. An adaptive dose-finding design incorporating both toxicity and efficacy. Statistics in Medicine. 2005;25:2365–2383.
