Summary
We consider treatment regimes in which an agent is administered continuously at a specified concentration until either a response is achieved or a predetermined maximum infusion time is reached. Response is an event defined to characterize therapeutic efficacy. A portion of the maximum planned total amount administered is given as an initial bolus. For such regimes, the amount of the agent received by the patient depends on the time to response. An additional complication when response is evaluated periodically rather than continuously is that the response time is interval censored. We address the problem of designing a clinical trial in which such response time data and a binary indicator of toxicity are used together to jointly optimize the concentration and the size of the bolus. We propose a sequentially adaptive Bayesian design that chooses the optimal treatment for successive patients by maximizing the posterior mean utility of the joint efficacy-toxicity outcome. The methodology is illustrated by a trial in which tissue plasminogen activator is infused intra-arterially as rapid treatment for acute ischemic stroke.
Keywords: Adaptive design, Bayesian design, Clinical trial, Continuous infusion, Phase I/II clinical trial, Stroke
1. Introduction
Many phase I/II designs that choose an optimal dose based on efficacy and toxicity have been proposed. Most of these methods characterize clinical outcomes as discrete variables (cf. Gooley, et al. 1994; Thall and Russell, 1998; O’Quigley, Hughes and Fenton, 2001; Braun, 2002; Ivanova, 2003; Thall and Cook, 2004; Bekele and Shen, 2005; Zhang, Sargent, and Mandrekar, 2006; Thall, Nguyen and Estey, 2008). Phase I/II methods also have been proposed based on two time-to-event outcomes (Yuan and Yin, 2009) and two ordinal outcomes (Houede, et al., 2010). While the problem that motivated the present paper is to optimize a 2-dimensional treatment based on efficacy and toxicity, the specific structure of our setting does not fit any of the dose-finding paradigms noted above. To explain why this is the case, we first give some background on the medical setting and treatment regime.
Acute ischemic stroke (AIS) is a major cause of mortality and disability in adults (Johnson, Mendis and Mathers, 2009). A new therapeutic modality for AIS is intra-arterial (IA) fibrinolytic infusion, wherein a thrombolytic agent to dissolve the clot that caused the stroke is delivered via two telescoping catheters, one supportive in the carotid artery and a smaller microcatheter within it positioned directly into the clot. The catheters are introduced to the arterial system via a sheath placed into the femoral artery. Using live X-ray fluoroscopic guidance, the catheters are moved through the carotid artery leading to the site in the brain artery where the clot leading to AIS occurred, and the agent is infused via the microcatheter. A thrombolytic agent approved by the U.S. Food and Drug Administration for intravenous (IV) treatment of AIS in adults is tissue plasminogen activator (tPA). While effects of IV tPA in adult stroke patients are well understood, optimally safe and efficacious concentrations of IA tPA have not been established. The methodology described in this paper was motivated by the desire to design a clinical trial to optimize administration of IA tPA.
The treatment regime is as follows. For a given concentration c in mg/kg body weight and fixed maximum volume V, the maximum total dose is cV. Since V is fixed, hereafter we set V = 1 without loss of generality. A proportion q of the maximum volume is given as an initial bolus at t = 0, followed by continuous infusion (ci) of the remaining proportion 1 – q at a constant rate for a maximum time period t*. In the IA tPA trial, t* = 2 hours. Efficacy is the time to dissolve the clot, YE. This includes the possibility that the clot is dissolved immediately by the bolus (YE = 0). The ci is stopped at the time of response if it is observed by time t*, otherwise YE is right-censored at t*. Toxicity is the binary indicator YT of symptomatic intra-cerebral hemorrhage (SICH), characterized by neurological worsening compared to baseline in terms of a standardized stroke severity scale, and is associated with high rates of morbidity and mortality. SICH is evaluated by brain imaging, using head computerized tomography (CT) scan or magnetic resonance imaging (MRI), typically at a fixed time much later than t*, such as 24 or 48 hours. Patients for whom the infusion fails to dissolve the clot are believed to be at higher risk of toxicity.
If response is observed continuously, then each patient’s outcome data consist of YT and either the response time YE if the clot is dissolved before t* or the fixed right-censoring time t*. When response is evaluated periodically, which may be the only practical possibility in some settings, only the time interval in which YE occurred is known. In the IA tPA trial, response is evaluated at 15 minute intervals. For example, if a patient’s clot was not dissolved by the 30 minute evaluation but was found to have dissolved by the 45 minute evaluation, then it is only known that the response occurred during the interval (30, 45]. Thus, YE is interval censored from 0 to t* and administratively right-censored at t*. With either continuous or periodic observation of YE, because the ci is stopped before t* if the clot is dissolved, the amount of the agent that a patient actually receives depends on the patient’s response time as well as c and q. Consequently, since toxicity is scored after the ci is completed, the distribution of YT depends on YE.
The goal is to jointly optimize (c, q) over a design space consisting of a rectangular grid of the eight pairs obtained from the bolus proportions q = 0.10 and 0.20, and the concentrations c = 0.20, 0.30, 0.40, and 0.50 mg/kg. These (c, q) combinations were chosen by two co-authors of this paper (CMA and OOZ), expert stroke, vascular, and interventional neurologists. The main problems that we address here are to (1) specify tractable probability models for YE and [YT|YE] as functions of (c, q) reflecting the modes of administration and observation, and (2) construct an adaptive decision procedure utilizing YT and the response time data described above to choose (c, q) for successive patients. We model the marginal distribution of YE as a mixture of a discrete mass at t = 0 for the bolus and a continuous component on (0, ∞) for the ci. The conditional toxicity probability of [YT|YE] is a nonlinear function of c, q, and YE. Elicited utilities of joint (YE, YT) outcomes are used as a basis for adaptively assigning treatments to successive patients as they are enrolled during the trial. Each patient is given the treatment combination (c, q) maximizing the current posterior mean utility.
The probability model is given in Section 2, followed by descriptions of the utility function and design in Section 3. Application to the IA tPA trial is described in Section 4, and simulation studies are summarized in Section 5. We close with a discussion in Section 6.
2. Probability Models
2.1 Models for Response and Toxicity
While there are many parametric regression models for event time data (cf. Ibrahim, Chen and Sinha, 2001), with this treatment regime we model efficacy as a time-to-event outcome, as follows. Since efficacy is monitored on the interval 0 ≤ t ≤ t*, to simplify notation we define YE in terms of standardized time, s = t/t*. Denote the parameter vector of the marginal distribution of YE by α. We define the distribution of [YE | c, q, α] in terms of a hazard function with two components, a discrete component p0(c, q, α), the probability that the clot is dissolved instantaneously by the bolus at s = 0, and a continuous component, λ(s, c, q, α), for s ≥ 0. For integrated continuous hazard Λ(s, c, q, α) = ∫0^s λ(u, c, q, α) du, the cumulative hazard function is −log{1 − p0(c, q, α)} + Λ(s, c, q, α) for s ≥ 0. Consequently, denoting the indicator of an event A by 1(A), the probability distribution function (pdf) of YE is the discrete-continuous mixture

fE(y, c, q, α) = p0(c, q, α) 1(y = 0) + {1 − p0(c, q, α)} λ(y, c, q, α) exp{−Λ(y, c, q, α)} 1(y > 0),

and the cumulative distribution function (cdf) is

FE(y, c, q, α) = 1 − {1 − p0(c, q, α)} exp{−Λ(y, c, q, α)}  (1)
for y ≥ 0. In particular, FE(0, c, q, α) = p0(c, q, α) since Λ(0, c, q, α) = 0.
To apply this model, functional forms for p0 and λ must be specified. Any model used in sequential outcome-adaptive decision making based on small to moderate sized samples must balance flexibility to accurately reflect the observed data with tractability to facilitate computation. To allow p0 and λ to vary nonlinearly in both c and q, we will use c^α1 and q^α2 rather than c and q as arguments, where α1, α2 > 0 are model parameters. The additional flexibility provided by α1 and α2 provides a basis for distinguishing more reliably among different values of (c, q) in terms of their effects on p0 and λ when applying the adaptive decision scheme. Based on clinical experience, we assume that
(a) p0(c, q, α) increases in both c and q, and
(b) the clot cannot dissolve instantaneously at s = 0 without a bolus infusion of some tPA, hence p0(c, 0, α) = 0 for all c > 0 and p0(0, q, α) = 0 for all 0 ≤ q ≤ 1.
A simple, flexible function with properties (a) and (b) is
p0(c, q, α) = 1 − exp(−α0 c^α1 q^α2).  (2)
We require the hazard function λ(s, c, q, α) for the clot dissolving during the ci to have the following properties:
(i) λ(s, c, q, α) must be continuous in s;
(ii) λ(s, c, q, α) must be sufficiently flexible so that it may be monotone increasing, monotone decreasing, or non-monotone in s;
(iii) λ(s, 0, q, α) > 0, to allow a non-zero baseline hazard if no tPA is given, c = 0;
(iv) λ(s, c, 0, α) > 0, to allow the possibility that the clot is dissolved if no bolus is given;
(v) the integrated continuous hazard Λ(s, c, q, α) must be numerically tractable;
(vi) λ(s, c, q, α) must increase in both c and q, and may be nonlinear in either c or q.
An intuitive approach to constructing a function λ with these properties is to define it in terms of the cumulative delivered dose by standardized time s, which is given by

d(s, c, q) = cq + c(1 − q)s.
This function increases linearly in s with slope c(1 − q) from minimum value d(0, c, q) = cq at s = 0 to d(1, c, q) = c at the last possible observation time s = 1 for YE. While a hazard function with properties (i) – (v) may be obtained by using d(s, c, q) as an argument, to also obtain property (vi), we use the more general function

d(s, c^α1, q^α2) = c^α1 q^α2 + c^α1(1 − q^α2)s,

which may be considered the effective cumulative delivered dose by standardized time s. We define the hazard function to take the form
λ(s, c, q, α) = α3 + [α4 α5 d(s, c^α1, q^α2)^(α5−1)] / [1 + α4 d(s, c^α1, q^α2)^α5],  (3)
where αj > 0 for all j = 1, …, 5. The baseline hazard of the clot dissolving if no tPA is given is α3. The ratio added to α3 is a log logistic hazard function with argument d(s, c^α1, q^α2) and shape parameter α5. The parameter α1 allows λ to vary nonlinearly in c, while α2 determines the relative magnitude of the contribution c^α1(1 − q^α2)s of the ci versus the contribution c^α1 q^α2 of the bolus. Thus, α = (α0, …, α5) characterizes p0 and λ.
Integrating (3) gives the cumulative hazard function
Λ(s, c, q, α) = α3 s + [1 / {c^α1(1 − q^α2)}] log[{1 + α4 d(s, c^α1, q^α2)^α5} / {1 + α4 (c^α1 q^α2)^α5}].  (4)
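To make the efficacy model concrete, the following minimal Python sketch evaluates p0 in (2), the effective dose, the hazard (3), the closed-form cumulative hazard (4), and the mixture cdf (1). It assumes the log-logistic parameterization written above, and the values in `alpha` are purely hypothetical illustrations, not elicited or fitted quantities.

```python
import numpy as np

def p0(c, q, a):
    # Probability the bolus dissolves the clot instantly, equation (2).
    return 1.0 - np.exp(-a[0] * c**a[1] * q**a[2])

def eff_dose(s, c, q, a):
    # Effective cumulative delivered dose d(s, c^a1, q^a2).
    return c**a[1] * (q**a[2] + (1.0 - q**a[2]) * s)

def hazard(s, c, q, a):
    # Continuous hazard (3): baseline a3 plus a log-logistic term in the
    # effective dose, with rate a4 and shape a5.
    d = eff_dose(s, c, q, a)
    return a[3] + a[4] * a[5] * d**(a[5] - 1.0) / (1.0 + a[4] * d**a[5])

def cum_hazard(s, c, q, a):
    # Cumulative hazard (4), obtained by integrating (3) in closed form.
    d0, ds = eff_dose(0.0, c, q, a), eff_dose(s, c, q, a)
    slope = c**a[1] * (1.0 - q**a[2])
    return a[3] * s + np.log((1.0 + a[4] * ds**a[5]) / (1.0 + a[4] * d0**a[5])) / slope

def F_E(y, c, q, a):
    # Mixture cdf (1) of the response time on the standardized scale.
    return 1.0 - (1.0 - p0(c, q, a)) * np.exp(-cum_hazard(y, c, q, a))

alpha = np.array([0.4, 0.8, 0.9, 0.1, 2.0, 1.5])   # hypothetical values only
print(F_E(0.0, 0.3, 0.1, alpha), F_E(1.0, 0.3, 0.1, alpha))
```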
We define the distribution of YT conditional on YE because toxicity is scored by imaging at 48 hours, after YE has been observed. Our model must account for the possibilities that either larger YE, hence a larger amount of the continuously infused agent, or failure to dissolve the clot may increase the risk of toxicity. To ensure that the probability of toxicity is a function of c, q, and YE with these properties, denoting the minimum of a and b by a ∧ b and β = (β0, …, β4), we assume the model
πT(YE, c, q, β) = 1 − exp[−{β0 + β2 c^β1 q + β3 c^β1(1 − q)(YE ∧ 1) + β4 1(YE > 1)}].  (5)
Under this model, β2 c^β1 q is the effect of the bolus, β3 c^β1(1 − q)(YE ∧ 1) is the effect of the continuously infused portion, β4 is the effect of failing to dissolve the clot, and 1 − e^(−β0) is the baseline probability of toxicity if no tPA is given. The bolus size q is not exponentiated as in the definition of λ since this would result in an over-parameterized model for πT. Thus, the model parameter vector θ = (α, β) has dimension 11.
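A short sketch of the conditional toxicity model (5) may help fix ideas; the function below and the `beta` values are illustrative assumptions only, not the trial’s elicited or fitted quantities.

```python
import numpy as np

def pi_T(y_E, c, q, b):
    # Conditional toxicity probability (5); b = (b0, b1, b2, b3, b4).
    # y_E is on the standardized time scale; y_E > 1 indicates that the
    # clot was not dissolved by s = 1.
    b0, b1, b2, b3, b4 = b
    lin = (b0 + b2 * c**b1 * q
           + b3 * c**b1 * (1.0 - q) * min(y_E, 1.0)
           + b4 * float(y_E > 1.0))
    return 1.0 - np.exp(-lin)

beta = (0.02, 1.0, 0.1, 0.3, 0.4)   # hypothetical values only
print(pi_T(0.0, 0.3, 0.1, beta), pi_T(1.5, 0.3, 0.1, beta))
```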
Although treatment begins with a bolus in the IA tPA trial, if no bolus were given then q = 0 and the effective delivered dose at standardized time s would be d(s, c^α1, 0) = c^α1 s. In this case, α2 would be dropped from the model, the hazard function (3) would simplify to

λ(s, c, 0, α) = α3 + α4 α5 (c^α1 s)^(α5−1) / {1 + α4 (c^α1 s)^α5},
and the cumulative hazard function (4) would become

Λ(s, c, 0, α) = α3 s + (1/c^α1) log{1 + α4 (c^α1 s)^α5},
with dim(α) reduced from 6 to 5. The model for πT would be simplified by dropping β2, and the linear component in (5) would be reduced to β0 + β3 c^β1(YE ∧ 1) + β4 1(YE > 1), with dim(β) reduced from 5 to 4, so θ would have dimension 9.
2.2 Joint Distribution of Response and Toxicity
Given the conditional probability πT(yE, c, q, β) for YE = yE and the marginal distribution fE(yE, c, q, α) of YE, the joint distribution of Y = (YE, YT) is given in general by

fE,T(yE, yT | c, q, θ) = πT(yE, c, q, β)^yT {1 − πT(yE, c, q, β)}^(1−yT) fE(yE, c, q, α).
Since Pr(YT = 1 | yE, c, q, β) = 1 − Pr(YT = 0 | yE, c, q, β) = πT(yE, c, q, β), if YE is observed continuously then each patient’s likelihood contribution takes the form
[p0(c, q, α) πT(0, c, q, β)^YT {1 − πT(0, c, q, β)}^(1−YT)]^1(YE = 0)
× [{1 − p0(c, q, α)} λ(YE, c, q, α) exp{−Λ(YE, c, q, α)} πT(YE, c, q, β)^YT {1 − πT(YE, c, q, β)}^(1−YT)]^1(0 < YE ≤ 1)
× [{1 − p0(c, q, α)} exp{−Λ(1, c, q, α)} πT(YE, c, q, β)^YT {1 − πT(YE, c, q, β)}^(1−YT)]^1(YE > 1).  (6)
The first line of expression (6) is the probability that the clot is dissolved instantaneously by the bolus, the second line is the probability that the clot is dissolved during the ci, and the third line is the probability that the clot is not dissolved by standardized time s = 1, each computed either with or without toxicity, i.e. YT = 1 or 0.
When YE is evaluated at the ends of successive intervals, as in the IA tPA trial, the likelihood must account for interval censoring. Given an interval (a, b] ⊆ (0, 1], denote

Pr(a < YE ≤ b | c, q, α) = FE(b, c, q, α) − FE(a, c, q, α).  (7)

This is the relevant probability if the specific value of YE is not observed but, instead, it is only known that the efficacy event did not occur by time a and did occur by time b. In this case, the infusion is stopped at the end of the interval, b, and consequently the probability of toxicity is πT(b, c, q, β). It follows that

πE,T{(a, b], yT | c, q, θ} = {FE(b, c, q, α) − FE(a, c, q, α)} πT(b, c, q, β)^yT {1 − πT(b, c, q, β)}^(1−yT),  (8)

with FE computed by applying formulas (1), (2), and (4) and πT specified by (5).
Let {IE,1, …, IE,M} be a partition of (0, 1] into all possible subintervals where YE may be known to fall. In this case, the second term in the product that defines the likelihood (6) for continuous observation of YE is replaced by

∏m=1,…,M [πE,T(IE,m, YT | c, q, θ)]^1(YE ∈ IE,m).
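The following sketch shows how a single patient’s likelihood contribution might be computed under interval censoring, following (6)–(8). The model functions FE and πT are passed in as callables, and the toy functions at the end are placeholders used only to make the example run; they are not the trial’s model.

```python
def interval_likelihood(interval, y_T, c, q, F_E, pi_T):
    """interval = (a, b] in which Y_E is known to fall, or (1, inf) if the
    clot was not dissolved; the point mass at 0 is coded as (None, 0)."""
    a, b = interval
    if a is None:                       # instantaneous response at s = 0
        p_eff = F_E(0.0, c, q)
        tox_time = 0.0
    elif b <= 1.0:                      # response during the ci, as in (7)-(8)
        p_eff = F_E(b, c, q) - F_E(a, c, q)
        tox_time = b
    else:                               # no response by s = 1
        p_eff = 1.0 - F_E(1.0, c, q)
        tox_time = 1.5                  # any value > 1 triggers 1(Y_E > 1)
    p_tox = pi_T(tox_time, c, q)
    return p_eff * (p_tox if y_T == 1 else 1.0 - p_tox)

# Toy illustration with made-up model functions:
F_toy = lambda y, c, q: min(0.2 + 0.6 * y, 0.8)
pi_toy = lambda y, c, q: 0.05 + 0.05 * min(y, 1.0) + 0.03 * (y > 1.0)
print(interval_likelihood((0.25, 0.375), 0, 0.3, 0.1, F_toy, pi_toy))
```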
3. Decision Criteria and Trial Conduct
3.1 Utilities
Given one of the above likelihood formulations, tailored to the trial at hand, the problem is to construct an outcome-adaptive design for determining an optimal pair (c, q). To do this, we take an approach similar to that of Houede, et al. (2010), who sequentially choose dose pairs in a phase I/II trial with bivariate ordinal outcomes by maximizing the posterior mean of an elicited utility. A fundamental difference that we must address here when defining utilities is that the bivariate outcome consists of a binary toxicity and a continuous time to response, which may be interval censored. Denote the numerical utility of outcome y by U(y). In practice, U(y) is elicited from the physicians planning the trial. We will provide details of how this was done for the IA tPA trial below. Given θ, the mean over Y of the utility for a patient who receives the treatment combination (c, q) is
u(c, q, θ) = E{U(Y) | c, q, θ}.  (9)
For each new cohort of patients during the trial, we exploit the Bayesian model by adaptively selecting the (c, q) combination that is optimal in the sense that it maximizes the posterior mean of u(c, q, θ) based on the most recent data. Let Dn denote the data from the first n patients, so that the accumulating data may be represented by the nested sequence D1 ⊂ D2 ⊂ ⋯ ⊂ DN as patients are treated and their outcomes are observed during trial conduct. The optimal (c, q) maximizing the posterior mean utility given Dn (Berger, 1985) is defined by

(c, q)opt(Dn) = argmax(c, q) E{u(c, q, θ) | Dn}.  (10)
With interval censoring due to sequential evaluation of YE and Pr(YE = 0) > 0, given the resulting partition {IE,0, IE,1, …, IE,M} of [0, 1] with IE,0 = {0}, a practical approach is to elicit numerical utilities for each of the 2(M + 2) sets of Y values obtained from the cross product {IE,0, IE,1, …, IE,M, IE,M+1} × {0, 1}, where IE,M+1 = (1, ∞) is the outcome that the clot was not dissolved by the end of the infusion at standardized time 1. Denoting the utility of the observed outcome {YE ∈ IE,m, YT = yT} by U(IE,m, yT), given this structure, the objective function given by (10) takes the form

E{u(c, q, θ) | Dn} = Σm=0,…,M+1 ΣyT=0,1 U(IE,m, yT) E{πE,T(IE,m, yT | c, q, θ) | Dn}.
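As an illustration of this discretized objective, the sketch below averages elicited utilities over the joint interval probabilities computed at each posterior draw, and then over draws. The array shapes, the utility values, and the Dirichlet-generated "posterior draws" are hypothetical stand-ins for the quantities produced by the actual MCMC.

```python
import numpy as np

def post_mean_utility(U, p_joint):
    """U: array (M+2, 2) of elicited utilities U(I_m, y_T) for intervals
    I_0, ..., I_{M+1} and y_T = 0, 1.  p_joint: array (n_mcmc, M+2, 2) of
    pi_{E,T} values from (8) evaluated at each posterior draw."""
    per_draw = np.sum(p_joint * U[None, :, :], axis=(1, 2))   # u(c, q, theta^(i))
    return per_draw.mean()                                    # posterior mean utility

# Toy example: 3 outcome intervals, 2 toxicity outcomes, 1000 fake draws.
rng = np.random.default_rng(0)
U = np.array([[100.0, 7.0], [80.0, 4.0], [30.0, 0.0]])
p = rng.dirichlet(np.ones(6), size=1000).reshape(1000, 3, 2)
print(post_mean_utility(U, p))
```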
A possible alternative to the utility-based approach might be to use a linear combination such as FE(1 | c, q, α) − ξ πT(1 | c, q, β) as an objective function, where 0 < ξ < 1 is a design parameter quantifying the importance of achieving a response relative to suffering a toxicity. Another alternative might be a trade-off function based on FE(1 | c, q, α) and πT(1 | c, q, β), similar to that used by Thall and Cook (2004). These objective functions do not distinguish between responses achieved quickly or later during the ci period, and they fail to account for effects of YE < 1 on πT. In contrast, our utility function accounts for the desirability of all observable (YE, YT) pairs. Given the data structure and goals of the IA tPA trial, choosing (c, q) to optimize the posterior mean utility is a logical and practical approach.
3.2 Safety and Futility Constraints
It does no good to treat patients with the (c, q) that optimizes the posterior expected utility if all pairs being considered are either excessively toxic or inefficacious. To protect patients during the trial, we impose the following safety/futility rules. Given Dn, a pair (c, q) is unacceptable if either it is likely to be too toxic,
Pr{πT(1, c, q, θ) > π̄T | Dn} > pT,  (11)
or it is likely to be inefficacious,
Pr{FE(1, c, q, α) < π̲E | Dn} > pE,  (12)
where π̄T is the maximum allowed πT(1, c, q, θ) and π̲E is the minimum allowed FE(1, c, q, α). The fixed limits π̄T and π̲E are elicited from the physicians, while pT and pE are cut-offs, usually between .80 and .99, calibrated to obtain a design with good properties. Aside from the IA tPA trial’s data structure and the models underlying πT(1, c, q, θ) and FE(1, c, q, α), the criteria (11) and (12) are similar to the phase I/II acceptability rules used by Thall and Russell (1998), Thall and Cook (2004) and Thall, Nguyen and Estey (2008). The criterion (11) pertains to safety, and the efficacy criterion (12) is similar to Bayesian phase II futility stopping rules (Thall, Simon and Estey, 1995). Denote the set of acceptable (c, q) pairs determined by (11) and (12) based on the most recent data Dn by An. We impose the additional safety constraint that no untried concentration may be skipped when escalating, since πT(YE, c, q, β) increases in c but may not be monotone in q. For example, if the largest value of c in {0.20, 0.30, 0.40, 0.50} for which patients have been treated is 0.30, then regardless of q the next cohort may be treated only at (c, q) pairs for which c ≤ 0.40.
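A minimal sketch of how rules (11) and (12) might be applied to posterior samples is given below; the limit values `piT_max` and `FE_min` and the beta-distributed "posterior draws" are hypothetical placeholders, not the trial’s elicited limits or actual posterior.

```python
import numpy as np

def acceptable(piT_draws, FE_draws, piT_max, FE_min, pT=0.95, pE=0.95):
    """piT_draws, FE_draws: posterior samples of pi_T(1, c, q, theta) and
    F_E(1, c, q, theta) for one (c, q) pair."""
    too_toxic = np.mean(piT_draws > piT_max) > pT       # rule (11)
    inefficacious = np.mean(FE_draws < FE_min) > pE     # rule (12)
    return not (too_toxic or inefficacious)

rng = np.random.default_rng(1)
piT = rng.beta(2, 20, size=4000)    # fake posterior draws of pi_T(1, c, q)
FE = rng.beta(12, 8, size=4000)     # fake posterior draws of F_E(1, c, q)
print(acceptable(piT, FE, piT_max=0.15, FE_min=0.50))
```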
A possible alternative to using (11) and (12) might be a single criterion based on u(c, q, θ). One might specify a fixed lower bound U for the utility and declare a pair (c, q) to have unacceptably low utility if Pr{u(c, q, θ) < U | Dn} > pU for a fixed cut-off pU. If all (c, q) were unacceptable in this sense the trial would be stopped. While choosing a value of U might not be as intuitive as choosing π̄T and π̲E, since U(Y) jointly quantifies safety and efficacy the use of such a single stopping rule would provide a somewhat more unified design.
3.3 Trial Conduct
Once the design parameters and model are established, given a set of (c, q) pairs, maximum sample size N, and cohort size, the trial is conducted as follows. The first cohort is treated at a starting (c, q) combination chosen by the physicians, and the choice may be guided by the numerical utilities and prior means. While the usual fear in phase I, where only toxicity is considered, is overdosing the first few patients, in the present setting this fear may be counterbalanced when choosing the starting (c, q) pair by the concern that patients may be given too little tPA to dissolve their clots. A given c may be too low to dissolve the clot that caused the stroke but high enough to cause a variety of adverse effects that are not associated with observable hemorrhage (SICH) and are not easily detected, and thus such events cannot feasibly be included in an outcome-adaptive decision making procedure. Such adverse effects include cytotoxicity, degradation of extracellular matrix, and increased permeability of the neurovascular unit with the development of cerebral edema (Kaur, et al., 2004). For each cohort after the first, if An is empty then the trial is stopped early with the conclusion that all (c, q) pairs are unacceptable. If An is not empty then the next cohort is treated at the best acceptable pair (c, q)opt(Dn), subject to the do-not-skip rule. At the end of the trial, the pair (c, q)opt(DN) based on the final data is selected.
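The assignment step just described, restricting to acceptable pairs, applying the do-not-skip rule on c, and maximizing the posterior mean utility, might be sketched as follows. The inputs `post_util`, `acceptable_pairs`, and `tried_conc` are hypothetical stand-ins for the posterior summaries and trial history maintained by the actual design, and edge cases are handled only crudely.

```python
def next_assignment(post_util, acceptable_pairs, tried_conc, conc_grid):
    # post_util: dict mapping (c, q) -> current posterior mean utility.
    if not acceptable_pairs:
        return None                       # stop the trial early
    c_max_tried = max(tried_conc)
    # Do-not-skip rule: allow concentrations up to one grid step above the
    # highest concentration tried so far.
    allowed = [c for c in conc_grid if c <= c_max_tried]
    higher = [c for c in conc_grid if c > c_max_tried]
    if higher:
        allowed.append(min(higher))
    candidates = [cq for cq in acceptable_pairs if cq[0] in allowed]
    if not candidates:
        return None
    return max(candidates, key=lambda cq: post_util[cq])

conc_grid = [0.20, 0.30, 0.40, 0.50]
post_util = {(c, q): 50 + 20 * c + 10 * q            # hypothetical utilities
             for c in conc_grid for q in (0.10, 0.20)}
print(next_assignment(post_util, list(post_util),
                      tried_conc=[0.20, 0.30], conc_grid=conc_grid))
```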
4. Application to the IA tPA Trial
4.1 Utilities and Design Parameters
To evaluate YE, the clot is imaged at the start of infusion when the bolus is given, and thereafter every 15 minutes up to the maximum infusion duration of t* = 120 minutes. The planned observation intervals are thus [0, 15], …, (105, 120] and (120, ∞). These are given in Table 1 along with the corresponding intervals in standardized time. Since YE is observed only at the end of each observation interval up to the last, as a covariate in the linear term of πT it can take on only the values of the interval endpoints, also given in Table 1, unless it appears in the indicator 1(YE > 1). If the evaluation times deviate from this planned schedule for some patients, however, then the likelihood can easily be modified to accommodate this, so that the actual interval observation data on YE will be recorded during the trial.
Table 1.
Elicited utilities and values of YE, by the time required to dissolve the blood clot

| Minutes to dissolve the clot | 0 | 1-15 | 16-30 | 31-45 | 46-60 | 61-75 | 76-90 | 91-105 | 106-120 | > 120 |
|---|---|---|---|---|---|---|---|---|---|---|
| Elicited utility, no SICH | 100 | 95 | 90 | 85 | 80 | 75 | 70 | 60 | 50 | 30 |
| Elicited utility, SICH | 7 | 6.5 | 6 | 5 | 4.5 | 4 | 2 | 1 | 0 | 0 |
| Value of YE used to compute πT(YE, c, q, β) | 0 | .125 | .250 | .375 | .500 | .625 | .750 | .875 | 1.00 | 1.00 |
Up to N = 36 patients will be treated in cohorts of size 1, with the aim to choose the (c, q) pair that maximizes the posterior mean utility among the set of eight possible pairs obtained from q = 0.10, 0.20 and c = 0.20, 0.30, 0.40, 0.50. The maximum sample size was chosen based on an anticipated accrual rate of 1 patient/month/site, with 15 sites participating and 5% of accrued patients both eligible and consenting. This would give .75 patients per month, so a 36-patient trial would last 48 months; N thus reflects these accrual limitations and the desire to complete the trial within a reasonable timeframe. The admissibility limits π̄T and π̲E were elicited from the physicians as described in Section 3.2, and the probability cut-offs used in (11) and (12) were pE = pT = .95. The starting pair is (c, q) = (0.20, 0.10).
Numerical utilities, on a scale of 0 to 100, elicited for each combination of YT = 0 or 1 and observation interval for YE, are given in Table 1. CMA and OOZ provided the numerical utilities, the acceptability limits π̄T and π̲E, the maximum sample size N, and the means of the efficacy and toxicity events used to establish the prior, described below in Section 4.2. The elicited utilities are based on clinical experience and published data on IA therapy for AIS. The elicitation was carried out in two stages, with refinement in the second stage to ensure that the utilities decreased strictly with the time interval required to dissolve the clot. In a trial of interventional management of AIS, Khatri, et al. (2009) showed that the chance of a good clinical outcome decreased greatly with longer time from symptom onset to initiation of treatment. In Table 1, the large drop in utilities when SICH occurs quantifies the fact that SICH is associated with much worse clinical outcome, as close to 50% of all SICHs are fatal.
4.2 Establishing Priors
To establish priors, we used a three-step strategy. First, we elicited a large number of prior means of the probabilities p0(c, q, θ), FE(s, c, q, θ), and πT(yE, c, q, θ). Second, based on the elicited values, we repeatedly simulated a large pseudo data set and, starting with a non-informative pseudo prior, used the average of the means of the resulting pseudo posteriors as the prior means. Third, we calibrated the prior variances of the entries of θ to obtain a prior with desirable properties.
In step 1, for each of the eight (c, q) combinations, we elicited the prior means of the probability of dissolving the clot immediately with the bolus, p0(c, q, θ), within 60 minutes, FE(½, c, q, θ), or within 120 minutes, FE(1, c, q, θ). Similarly, we elicited the probability of toxicity, πT(yE, c, q, θ), if the clot was dissolved instantaneously (yE = 0), if it was dissolved within the 120 minute infusion (yE = 1), or if it was not dissolved during the infusion (yE > 1). The elicited values are given in Table 2a. The last line of Table 2a expresses the prior expectation that, for c = .50 and either q = .10 or .20, the mean of πT will increase from .12 (the value for yE = 1) to .15 if the clot is not dissolved by the end of the infusion.
Table 2.

a. Elicited Prior Mean Probabilities

|  |  | c = 0.20 | c = 0.30 | c = 0.40 | c = 0.50 |
|---|---|---|---|---|---|
| q = .10 | E{p0(c, q, θ)} | 0.10 | 0.15 | 0.15 | 0.25 |
|  | E{FE(½, c, q, θ)} | 0.25 | 0.30 | 0.45 | 0.50 |
|  | E{FE(1, c, q, θ)} | 0.35 | 0.45 | 0.60 | 0.70 |
|  | E{πT(0, c, q, θ)} | 0.02 | 0.03 | 0.03 | 0.03 |
|  | E{πT(1, c, q, θ)} | 0.04 | 0.06 | 0.08 | 0.12 |
| q = .20 | E{p0(c, q, θ)} | 0.15 | 0.20 | 0.25 | 0.30 |
|  | E{FE(½, c, q, θ)} | 0.40 | 0.45 | 0.50 | 0.60 |
|  | E{FE(1, c, q, θ)} | 0.50 | 0.60 | 0.70 | 0.80 |
|  | E{πT(0, c, q, θ)} | 0.02 | 0.03 | 0.03 | 0.03 |
|  | E{πT(1, c, q, θ)} | 0.04 | 0.06 | 0.08 | 0.12 |
| q = .10 or .20 | E{πT(yE, c, q, θ)} for yE > 1 |  |  |  | 0.15 |

b. Prior Mean of the Log of Each Model Parameter

| α0 | α1 | α2 | α3 | α4 | α5 | β0 | β1 | β2 | β3 | β4 |
|---|---|---|---|---|---|---|---|---|---|---|
| −1.04 | −1.60 | −7.25 | −4.74 | −2.85 | 2.37 | −6.10 | −3.79 | −7.05 | −5.42 | −7.88 |
For the second step, the elicited values were treated as the true state of nature and used to simulate 1000 large pseudo samples, each of size 400 with exactly 50 patients for each (c, q) combination. Starting with a very non-informative pseudo-prior on θ, in which the logarithm of each entry followed a normal distribution with mean 0 and standard deviation 20, we used each pseudo data set to compute a pseudo posterior. The averages of the 1000 pseudo posterior means were used as the prior means. These are given in Table 2b. The pseudo sample size of 400 was chosen to be large enough so that prior means obtained in this way would not change substantively with a larger pseudo sample size.
For the third step, we calibrated the variances of the entries of θ to ensure a prior that was suitably non-informative in terms of the prior effective sample sizes (ESSs) of πT(s, c, q, θ) and FE(s, c, q, θ), and also to obtain good simulated performance of the design in the actual trial across a diverse set of scenarios. Denoting πT(s, c, q, θ) or FE(s, c, q, θ) for s = 0 or 1 by p(θ), the ESS of the prior on p(θ) was approximated by matching its mean and variance with those of a beta(a, b) distribution, which has mean a/(a + b) and variance ab/{(a + b)^2(a + b + 1)}, and approximating the ESS as a + b ≈ E{p(θ)}[1 − E{p(θ)}]/var{p(θ)} − 1. On this basis, we set the variance σ^2 of log(θj) for each element θj of θ equal to σ^2 = 81, which gave ESS values of each p(θ) ranging from 0.17 to 0.22 with mean 0.19. It is inappropriate to specify arbitrarily large σ^2, since this may produce priors on FE(y, c, q, α) and πT(yE, c, q, β) with very large probability masses below .01 and above .99. For such a prior, the design behaves pathologically and makes erroneous decisions for patients enrolled early in the trial.
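The ESS approximation can be illustrated in a few lines of code: match the prior mean and variance of a probability to a beta(a, b) distribution and report a + b. The lognormal prior draws below are a hypothetical construction chosen only to mimic a vague prior; they are not the trial’s prior.

```python
import numpy as np

def prior_ess(p_draws):
    # Match mean and variance to a beta(a, b); a + b approximates the ESS.
    m, v = np.mean(p_draws), np.var(p_draws)
    return m * (1.0 - m) / v - 1.0

rng = np.random.default_rng(2)
# Hypothetical prior draws of a probability of the form 1 - exp(-X),
# with log X normal with a large standard deviation:
p = 1.0 - np.exp(-np.exp(rng.normal(-1.0, 9.0, size=20000)))
print(round(prior_ess(p), 2))   # a small value indicates a vague prior
```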
4.3 Posterior Computation
The computations for each interim decision include obtaining the posterior probabilities in the admissibility criteria (11) and (12) and the posterior mean utility (9) for all (c, q) combinations. We used Markov chain Monte Carlo (MCMC) with Gibbs sampling (Robert and Casella, 1999) to compute all posterior quantities, based on the full conditionals. Each series of sample parameters θ(1), …, θ(N), distributed proportionally to the posterior integrand, was initialized at the mode using the two-level algorithm described in Braun et al. (2007). Because each sample chain was initialized at the mode, which reliably identifies the region of highest posterior probability density, we did not require any burn-in, and a single chain was used for each posterior computation. We used MCMC sample size N = 2,000 for the dose assignments during the trial, and N = 16,000 for the dose selection at the end of the trial. For each sample θ(i) = (α(i), β(i)), we computed p0(c, q, α(i)), then Λ(YE, c, q, α(i)), FE(YE, c, q, α(i)) and πT(YE, c, q, β(i)) at every interval endpoint YE and for every (c, q), and computed πE,T(IE,m, yT | c, q, θ(i)) from (8) for each IE,m and yT ∈ {0, 1}. The elicited utilities were averaged over each of these distributions to obtain a utility for every (c, q) given θ(i). We computed the (c, q) posterior mean utilities by averaging over the samples θ(1), …, θ(N). The Monte Carlo standard error (MCSE) was computed using the batch-means method for FE(1, c, q), πT(1, c, q), and u(c, q) at the highest and lowest (c, q) combinations. The ratios of the MCSE to the posterior standard deviation of these quantities were small (< 3%), indicating MCMC convergence.
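For completeness, a sketch of the batch-means MCSE computation is given below, applied to a toy autocorrelated chain rather than the actual posterior samples; the batch count is an arbitrary choice.

```python
import numpy as np

def batch_means_mcse(chain, n_batches=40):
    # Split the chain into batches, average within batches, and use the
    # spread of the batch means to estimate the Monte Carlo standard error.
    n = len(chain) // n_batches * n_batches
    batches = np.asarray(chain[:n]).reshape(n_batches, -1).mean(axis=1)
    return batches.std(ddof=1) / np.sqrt(n_batches)

rng = np.random.default_rng(3)
x = np.zeros(16000)
for t in range(1, 16000):                # toy AR(1) chain
    x[t] = 0.6 * x[t - 1] + rng.normal()
mcse = batch_means_mcse(x)
print(mcse, mcse / x.std(ddof=1))        # MCSE and its ratio to the chain SD
```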
5. Simulation Studies
Each trial was simulated 10,000 times under each of a wide variety of scenarios. Since each scenario was specified in terms of fixed values of the marginal probabilities πT(s, c, q)true and FE(s, c, q)true for s = 0 and 1, to obtain fixed true probabilities for all s in [0, 1] we used several interpolation methods, allowing πT(s, c, q)true and FE(s, c, q)true to take various shapes as functions of s. For πT(s, c, q)true, the intermediate probabilities were interpolated as πT(s, c, q)true = πT(0, c, q) + {πT(1, c, q) − πT(0, c, q)}s^φ for various φ, with φ = 1 giving linear interpolation, φ = 2 giving values “below linear”, and φ = 0.5 giving values “above linear”. We also used a method that sets πT(0.5, c, q)true = 0.5{πT(0, c, q) + πT(1, c, q)} and interpolates using πT(s, c, q)true = πT(0, c, q) + {πT(0.5, c, q)true − πT(0, c, q)}(2s)^2 for 0 ≤ s ≤ .5 and πT(s, c, q)true = πT(.5, c, q)true + {πT(1, c, q) − πT(.5, c, q)true}(2s − 1)^0.5 for .5 ≤ s ≤ 1, to give an “S-shaped” curve. Interpolated values of FE(s, c, q)true were obtained similarly. The joint probabilities πE,T(s, c, q)true used to generate (YE, YT) given the assigned (c, q) were computed using (7) and (8), and the resulting true utility u(c, q)true was obtained from expression (9).
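The interpolation schemes just described can be written compactly as follows; the endpoint probabilities p0 and p1 in the example call are arbitrary illustrative values, not scenario values from the paper.

```python
import numpy as np

def power_interp(s, p0, p1, phi):
    # phi = 1: linear; phi = 2: "below linear"; phi = 0.5: "above linear".
    return p0 + (p1 - p0) * np.asarray(s, dtype=float) ** phi

def s_shaped_interp(s, p0, p1):
    # Convex on [0, .5], concave on [.5, 1], passing through (p0 + p1)/2.
    pmid = 0.5 * (p0 + p1)
    s = np.asarray(s, dtype=float)
    lower = p0 + (pmid - p0) * (2.0 * s) ** 2
    upper = pmid + (p1 - pmid) * np.sqrt(np.clip(2.0 * s - 1.0, 0.0, None))
    return np.where(s <= 0.5, lower, upper)

s_grid = np.linspace(0.0, 1.0, 9)
print(power_interp(s_grid, 0.04, 0.12, 2.0))
print(s_shaped_interp(s_grid, 0.04, 0.12))
```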
The simulation scenarios are given in Supplementary Tables 1 – 6. Scenario 1 uses the elicited prior means, with the utilities increasing in both c and q, so that (c, q) = (0.5, 0.2) is optimal. Scenario 2 has a similar pattern, but with a larger increase as (c, q) goes from (0.2, 0.1) to (0.5, 0.2). In Scenario 3, the middle values c = 0.3 and 0.4 have the highest utilities, also with u(c, 0.1)true > u(c, 0.2)true, so a smaller bolus is more desirable. In Scenario 4, smaller values of both c and q have higher u(c, q)true. Scenario 5 is unsafe, with unacceptably high values of all πT(s, c, q)true compared to the upper limit π̄T. In Scenario 6, all efficacy probabilities FE(s, c, q)true of dissolving the clot are unacceptably small compared to the lower limit π̲E = 0.50. Thus, in the last two scenarios, it is most desirable to stop the trial early and select no (c, q) pair.
To summarize the method’s overall behavior, we used the following statistic. For each scenario, let utrue,sel denote the true utility of the final selected (c, q), and let umax and umin denote the largest and smallest true utilities among all (c, q) pairs. The summary statistic R = (utrue,sel − umin)/(umax − umin) is the proportion of the difference between the best and worst possible choices achieved by the selected treatment. Thus, 0 ≤ R ≤ 1, with larger values corresponding to better performance of the method. In cases such as Scenarios 5 and 6 where no treatment is acceptable, the value of R is not relevant, since it is not useful to choose a treatment maximizing the utility if all treatments are unacceptable. The simulations are summarized in Table 3. Under Scenarios 1 – 4, searching among the eight possible (c, q) pairs, the method does a reliable job of selecting pairs with higher utilities, and subsample sizes are favorably balanced toward more desirable pairs. The results for Scenarios 5 and 6 show that if no pair is acceptably safe and efficacious the method is likely to stop early and select no pair. In Scenarios 1 – 4, where an acceptable (c, q) pair exists, R increases with N. The values of “% none” selected also increase with N; these are fixed at 2% for all N studied under Scenario 2, and reach a maximum of 100% by N = 240 in Scenarios 5 and 6, where no pairs are acceptable.
Table 3.
True utilities utrue(c, q), selection percentages, and average numbers of patients treated (No. Pats.) at each (c, q) pair, with the percentage of trials in which no pair was selected (% none)

| Scenario | q |  | c = 0.2 | c = 0.3 | c = 0.4 | c = 0.5 | % none |
|---|---|---|---|---|---|---|---|
| 1 Prior means | 0.1 | utrue(c, q) | 46.9 | 51.5 | 59.2 | 64.4 | 6 |
|  |  | % Sel (No. Pats.) | 2% (3.2) | 2% (1.6) | 7% (3.4) | 29% (8.7) |  |
|  | 0.2 | utrue(c, q) | 56.1 | 60.5 | 65.1 | 70.6 |  |
|  |  | % Sel (No. Pats.) | 2% (0.8) | 2% (0.8) | 11% (4.5) | 38% (11.5) |  |
| 2 Safe, high c and q = .2 best | 0.1 | utrue(c, q) | 49.0 | 54.9 | 62.4 | 71.5 | 2 |
|  |  | % Sel (No. Pats.) | 1% (2.6) | 1% (1.2) | 4% (2.3) | 17% (5.9) |  |
|  | 0.2 | utrue(c, q) | 52.6 | 58.4 | 65.9 | 73.8 |  |
|  |  | % Sel (No. Pats.) | 1% (0.5) | 1% (0.6) | 13% (5.3) | 60% (16.9) |  |
| 3 Safe, middle c and q = .1 best | 0.1 | utrue(c, q) | 57.3 | 68.9 | 70.1 | 64.0 | 9 |
|  |  | % Sel (No. Pats.) | 3% (2.4) | 8% (3.6) | 33% (9.0) | 33% (9.3) |  |
|  | 0.2 | utrue(c, q) | 57.1 | 68.6 | 69.7 | 63.5 |  |
|  |  | % Sel (No. Pats.) | 2% (0.5) | 2% (0.9) | 7% (5.4) | 3% (2.8) |  |
| 4 Safe, low c and q = .1 best | 0.1 | utrue(c, q) | 61.1 | 58.7 | 51.6 | 48.0 | 12 |
|  |  | % Sel (No. Pats.) | 43% (14.4) | 7% (3.3) | 6% (3.4) | 5% (2.8) |  |
|  | 0.2 | utrue(c, q) | 58.2 | 53.9 | 49.5 | 45.0 |  |
|  |  | % Sel (No. Pats.) | 22% (5.0) | 3% (1.5) | 2% (2.2) | 1% (0.9) |  |
| 5 Unsafe | 0.1 | utrue(c, q) | 44.8 | 45.2 | 45.2 | 45.0 | 91 |
|  |  | % Sel (No. Pats.) | 4% (6.5) | 1% (1.5) | 1% (2.0) | 1% (2.2) |  |
|  | 0.2 | utrue(c, q) | 45.2 | 45.2 | 45.0 | 44.3 |  |
|  |  | % Sel (No. Pats.) | 1% (0.9) | 0% (0.4) | 0% (0.8) | 0% (0.6) |  |
| 6 Safe, but no (c, q) acceptable | 0.1 | utrue(c, q) | 38.2 | 40.0 | 41.9 | 43.3 | 83 |
|  |  | % Sel (No. Pats.) | 0% (2.8) | 0% (1.1) | 1% (1.6) | 7% (5.2) |  |
|  | 0.2 | utrue(c, q) | 39.3 | 41.2 | 43.1 | 44.4 |  |
|  |  | % Sel (No. Pats.) | 0% (0.4) | 0% (0.4) | 1% (1.4) | 7% (4.3) |  |
We assessed sensitivity to the prior, N, cohort size, and σ, summarized in Supplementary Tables 7, 8, and 9. Supplementary Table 7 shows that, for N = 24 to 240, using a prior with mean 0 and σ2 = 20 for all log(θj) substantially degrades R under Scenarios 1 and 2, increases R under Scenarios 3 and 4, and increases the futility stopping probabilities under Scenario 6. For N = 36, there is no general pattern of R or stopping probability with cohort size 1, 2, 3, or with σ = 7 to 20. Sensitivity to the four interpolation methods for obtaining each scenario’s probabilities between successive evaluation times s = 0, 0.125, …, 1.0 is summarized in Supplementary Table 10. The stopping probabilities are insensitive to the interpolation method, but R may change very little or substantively, depending on the scenario and method. This is because each interpolation method gives different shapes of πT(s, c, q)true and FE(s, c, q)true and thus a different version of each scenario, which in turn changes u(c, q, θ)true and thus the method’s behavior and R. Supplementary Table 11 summarizes sensitivity to β4, the effect on πT of failure to dissolve the clot, over a range of values indexed by k = 1 to 9. In all scenarios, “% none” selected increases with k, a very desirable property.

To assess the design’s performance under a more parsimonious model, we simplified the mixture distribution given by (1) and (2) by fixing α1 = α2 = β1 = 1 and β0 = β4 = 0, so that πT(yE) = 1 − exp{−β2cq − β3c(1 − q)(yE ∧ 1)} and p0 = 1 − exp(−α0cq), and assumed the Weibull hazard λ(y, c, q, α) = {α3cq + α4c(1 − q)}ψ y^(ψ−1) in place of (3). This model has 6 parameters, (α0, α3, α4, ψ, β2, β3), compared to 11 for the original model. The simulations, in Supplementary Table 12, show that using this simpler model greatly degrades the method’s performance in Scenarios 3 and 4, where the highest concentration is not the best choice, and reduces the reliability of the safety and futility rules, with the early stopping probability reduced from .91 to .69 under Scenario 5 and from .83 to .70 under Scenario 6.
Table 4 gives a hypothetical case-by-case example to illustrate how a trial might play out in practice, under a scenario with the best c in the middle. For each patient, the treatment values and outcomes are given with the posterior mean utilities of the eight (c, q) combinations. To conserve space, results are given for the first 12 patients and for every sixth patient thereafter. The values for all 36 patients are given in Supplementary Table 13. For this example, posterior means and standard deviations of the elements of θ, and of FE(1, c, q, θ) and πT(1, c, q, θ), with prior values for comparison, are given in Supplementary Table 14.
Table 4.
Treatment, outcomes, and posterior mean utility for each (c, q) in a hypothetical trial

| Patient | c | q | YE | YT | (.2,.1) | (.3,.1) | (.4,.1) | (.5,.1) | (.2,.2) | (.3,.2) | (.4,.2) | (.5,.2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (Prior) | – | – | – | – | 42.4 | 42.4 | 42.4 | 42.6 | 42.0 | 42.0 | 41.9 | 42.0 |
| 1 | 0.2 | 0.1 | 45 | No | 67.7 | 66.6 | 66.3 | 64.1 | 67.3 | 66.2 | 65.9 | 63.6 |
| 2 | 0.2 | 0.1 | 45 | No | 75.6 | 73.2 | 71.1 | 66.6 | 74.6 | 72.6 | 70.6 | 66.3 |
| 3 | 0.2 | 0.1 | No | No | 59.6 | 59.0 | 59.3 | 58.17 | 59.4 | 58.8 | 59.0 | 57.8 |
| 4 | 0.2 | 0.1 | No | No | 52.3 | 52.2 | 52.7 | 52.8 | 52.1 | 52.0 | 52.5 | 52.5 |
| 5 | 0.2 | 0.1 | No | No | 47.8 | 47.9 | 48.7 | 49.2 | 47.7 | 47.8 | 48.6 | 49.0 |
| 6 | 0.3 | 0.1 | 75 | No | 52.3 | 52.8 | 53.3 | 53.1 | 52.2 | 52.7 | 53.2 | 52.9 |
| 7 | 0.4 | 0.1 | 90 | No | 54.8 | 55.2 | 55.6 | 55.3 | 54.8 | 55.2 | 55.5 | 55.1 |
| 8 | 0.4 | 0.1 | 105 | No | 56.2 | 56.6 | 57.1 | 56.6 | 56.2 | 56.7 | 57.2 | 56.5 |
| 9 | 0.4 | 0.2 | 0 | No | 59.4 | 60.3 | 62.5 | 63.5 | 60.9 | 61.9 | 64.3 | 65.1 |
| 10 | 0.5 | 0.2 | 90 | Yes | 60.1 | 60.8 | 61.1 | 26.9 | 61.0 | 61.7 | 61.9 | 26.7 |
| 11 | 0.4 | 0.2 | 120 | No | 59.9 | 60.6 | 61.3 | 26.0 | 61.0 | 61.8 | 62.4 | 26.1 |
| 12 | 0.4 | 0.2 | No | Yes | 54.8 | 54.8 | 53.0 | 33.2 | 54.8 | 54.8 | 52.5 | 32.2 |
| 18 | 0.3 | 0.1 | 120 | No | 61.3 | 61.8 | 59.6 | 35.0 | 61.6 | 61.9 | 59.1 | 33.7 |
| 24 | 0.3 | 0.1 | No | No | 57.6 | 57.9 | 57.0 | 39.4 | 57.3 | 57.5 | 56.1 | 37.6 |
| 30 | 0.3 | 0.1 | 105 | No | 56.4 | 56.8 | 56.3 | 37.5 | 56.2 | 56.5 | 55.8 | 36.0 |
| 36 | 0.3 | 0.2 | 45 | No | 58.8 | 59.5 | 59.2 | 32.4 | 59.4 | 60.1 | 59.6 | 31.4 |
6. Discussion
The methodology proposed here may be extended to oncology settings. For example, suitable modifications of the methodology may be used for a chemotherapeutic anti-cancer agent administered by ci, with a possible initial bolus, with the tumor imaged repeatedly and therapy stopped when tumor response is achieved. In such settings, the time frame likely would be much longer than the 120 minute schedule considered here, and the infusion typically would include successive cycles with interim rest periods. Additionally, toxicity might be a time-to-event variable, possibly occurring during infusion and causing treatment to be suspended or permanently stopped. Such differences are non-trivial, however, and would require substantive modifications of λ, πT, and the decision rules.
A simpler version of the method currently is being applied to plan a similar trial of IA tPA in pediatric stroke patients. Although pediatric AIS is rare, over 75% of children with acute AIS will die or suffer long-term neurological deficits (deVeber, et al., 2000). In this trial, it was decided to fix q ≡ 0.10, since a bolus of size q = 0.20 or larger was considered too risky for children. For the model, the response hazard is simplified by fixing α2 ≡ 1. The design space consists of the four concentrations {0.20, 0.30, 0.40, 0.50}, and c is chosen by maximizing the posterior mean utility over this set.
A computer program, named “CiBolus,” to implement this methodology is available from the website https://biostatistics.mdanderson.org/SoftwareDownload.
Acknowledgements
The authors thank the editor, associate editor, and a referee for their detailed and constructive comments that substantively improved the manuscript. The research of PT and HN was supported by NCI grant RO1-CA-83932.
7. Supplementary Materials
Supplementary Tables 1 – 14, referenced in Section 5, are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.
References
- Bekele BN, Shen Y. A Bayesian approach to jointly modeling toxicity and biomarker expression in a phase I/II dose-finding trial. Biometrics. 2005;60:343–354. doi: 10.1111/j.1541-0420.2005.00314.x.
- Berger JO. Statistical Decision Theory and Bayesian Analysis. 2nd Edition. Springer-Verlag; New York: 1985.
- Braun TM. The bivariate continual reassessment method: extending the CRM to phase I trials of two competing outcomes. Contemporary Clinical Trials. 2002;23:240–256. doi: 10.1016/s0197-2456(01)00205-7.
- Braun TM, Thall PF, Nguyen H, de Lima M. Simultaneously optimizing dose and schedule of a new cytotoxic agent. Clinical Trials. 2007;4:113–124. doi: 10.1177/1740774507076934.
- deVeber G, MacGregor D, Curtis R, Mayank S. Neurologic outcome in survivors of childhood arterial ischemic stroke and sinovenous thrombosis. Journal of Child Neurology. 2000;15(5):316–324. doi: 10.1177/088307380001500508.
- Gooley TA, Martin PJ, Fisher LD, Pettinger M. Simulation as a design tool for phase I/II clinical trials: An example from bone marrow transplantation. Controlled Clinical Trials. 1994;15:450–462. doi: 10.1016/0197-2456(94)90003-5.
- Houede N, Thall PF, Nguyen H, Paoletti X, Kramar A. Utility-based optimization of combination therapy using ordinal toxicity and efficacy in phase I/II trials. Biometrics. 2010;66:532–540. doi: 10.1111/j.1541-0420.2009.01302.x.
- Ibrahim JG, Chen M-H, Sinha D. Bayesian Survival Analysis. Springer; New York: 2001.
- Ivanova A. A new dose-finding design for bivariate outcomes. Biometrics. 2003;59:1001–1007. doi: 10.1111/j.0006-341x.2003.00115.x.
- Johnson SC, Mendis S, Mathers CD. Global variation in stroke burden and mortality: estimates from monitoring, surveillance, and modelling. The Lancet Neurology. 2009;8:345–354. doi: 10.1016/S1474-4422(09)70023-7.
- Kaur J, Zhao Z, Klein GM, Lo EH, Buchan AM. The neurotoxicity of tissue plasminogen activator? Journal of Cerebral Blood Flow and Metabolism. 2004;24:945–963. doi: 10.1097/01.WCB.0000137868.50767.E8.
- Khatri P, Abruzzo T, Yeatts SD, Nichols C, Broderick JP, Tomsick TA; IMS I and II Investigators. Good clinical outcome after ischemic stroke with successful revascularization is time-dependent. Neurology. 2009;73:1066–1072. doi: 10.1212/WNL.0b013e3181b9c847.
- O’Quigley J, Hughes MD, Fenton T. Dose-finding designs for HIV studies. Biometrics. 2001;57:1018–1029. doi: 10.1111/j.0006-341x.2001.01018.x.
- Robert CP, Casella G. Monte Carlo Statistical Methods. Springer; New York: 1999.
- Thall PF, Cook JD. Dose-finding based on efficacy-toxicity trade-offs. Biometrics. 2004;60:684–693. doi: 10.1111/j.0006-341X.2004.00218.x.
- Thall PF, Nguyen H, Estey EH. Patient-specific dose-finding based on bivariate outcomes and covariates. Biometrics. 2008;64:1126–1136. doi: 10.1111/j.1541-0420.2008.01009.x.
- Thall PF, Russell KT. A strategy for dose finding and safety monitoring based on efficacy and adverse outcomes in phase I/II clinical trials. Biometrics. 1998;54:251–264.
- Thall PF, Simon R, Estey EH. Bayesian sequential monitoring designs for single-arm clinical trials with multiple outcomes. Statistics in Medicine. 1995;14:357–379. doi: 10.1002/sim.4780140404.
- Yuan Y, Yin G. Bayesian dose-finding by jointly modeling toxicity and efficacy as time-to-event outcomes. Journal of the Royal Statistical Society, Series C. 2009;58:954–968.
- Zhang W, Sargent DJ, Mandrekar S. An adaptive dose-finding design incorporating both efficacy and toxicity. Statistics in Medicine. 2006;25:2365–2383. doi: 10.1002/sim.2325.