Abstract
The Bayesian model averaging continual reassessment method (BMA-CRM) is an extension of the continual reassessment method (CRM) for dose finding. The BMA-CRM improves the robustness and overall performance of the CRM by specifying multiple skeletons (or models) and then using Bayesian model averaging to automatically favor the best fitting model for robust decision making. Specifying multiple skeletons, however, can be challenging for practitioners. In this paper, we propose a default way to specify skeletons for the BMA-CRM. We show that skeletons that appear rather different may actually lead to equivalent models. Motivated by this, we define a nonequivalence measure to index the difference among skeletons. Using this measure, we extend the model calibration method of Lee and Cheung (2009) to choose the optimal skeletons that maximize the average percentage of correct selection of the maximum tolerated dose and ensure sufficient nonequivalence among the skeletons. Our simulation study shows that the proposed method has desirable operating characteristics. We provide software to implement the proposed method.
Keywords: Bayesian adaptive design, BMA-CRM, Continual reassessment method, Maximum tolerated dose, Skeleton specification
1 Introduction
The primary goal of a phase I clinical trial is to identify the maximum tolerated dose (MTD) of a new drug, which is defined as the dose with the toxicity probability closest to the target toxicity rate. Numerous statistical methods have been developed for phase I dose-finding studies, for example, the conventional 3+3 design [1], the continual reassessment method (CRM) [2], the decision-theoretic approach [3], the dose escalation method with overdose control [4], the improved up-and-down design [5], biased coin design [6], sequential testing approach [7], the modified toxicity probability interval design [8], and the newly developed Bayesian optimal interval design [9], among others.
The CRM is an important model-based phase I trial design. The CRM prespecifies an initial shape of the dose-toxicity curve, known as the skeleton, using a parametric model, and then continuously updates the estimate of the dose-toxicity curve based on accumulating data to guide dose assignment and selection. The CRM with a reasonable skeleton is generally robust [10]; however, a single skeleton cannot perform optimally or close to optimally under all scenarios. It is not uncommon that one skeleton (say skeleton 1) performs better than another skeleton (say skeleton 2) in one set of scenarios, whereas in another set of scenarios, skeleton 2 performs better than skeleton 1 [11, 12, 13]. To further improve the performance of the CRM, Yin and Yuan [11] proposed the Bayesian model averaging CRM (BMA-CRM), which specifies multiple skeletons, say three. By treating each skeleton as an independent model, the BMA-CRM uses a Bayesian model averaging or selection approach to automatically favor the best fitting skeleton, and thus the performance of the BMA-CRM is always close to the optimal one. Standalone software with a graphical user interface for the BMA-CRM is freely available at the MD Anderson Department of Biostatistics. The software has been downloaded more than 700 times since its completion. The BMA-CRM has been used for a number of ongoing phase I trials at MD Anderson Cancer Center and other institutions.
The most common question we have received from users of the BMA-CRM is how to specify the three skeletons. Substantial research has been done on model calibration for the standard CRM (with a single skeleton) [7, 14, 15, 16]. In particular, Lee and Cheung [14] proposed an indifference-interval based method that can be conveniently used to specify the skeleton for the CRM. However, these methods cannot be directly used to specify multiple skeletons. The rationale of the BMA-CRM is to use different skeletons to cover different possible shapes of the dose-toxicity curve, such that as long as one of them is close to the true dose-toxicity curve, the BMA-CRM will perform well. Therefore, the natural guidance is that we should choose the skeletons in such a way that each of them represents a different shape of the dose-toxicity curve [11]. For example, we can set one skeleton to represent a slowly increasing dose-toxicity curve with a high dose as the MTD, while another skeleton represents a quickly increasing dose-toxicity curve with a low dose as the MTD. However, as we show, the specification of multiple skeletons is actually more complicated than that because seemingly different skeletons can lead to an equivalent model.
In this paper, we propose an automatic method to help practitioners specify multiple skeletons for BMA-CRM. We first define the equivalence of multiple skeletons and then convert the problem of measuring the equivalence of skeletons into a collinearity problem. Combining the proposed nonequivalence measure of multiple skeletons with the calibration method proposed by Lee and Cheung [14], we devise an automatic way to specify the optimal multiple skeletons that maximize the average percentage of correct selection of the MTD and meanwhile ensure sufficient nonequivalence among the skeletons. Simulation studies show that the proposed method has desirable operating characteristics. Software to implement the proposed method is available for free downloading at http://odin.mdacc.tmc.edu/~yyuan/.
The remainder of the article is organized as follows. In Section 2, after briefly reviewing the BMA-CRM and the calibration method of Lee and Cheung, we introduce the concept of equivalency of skeletons and the procedure to choose the optimal skeletons for the BMA-CRM. In Section 3, we investigate the operating characteristics of the proposed approach and conclude with a brief discussion in Section 4.
2 Method
2.1 Bayesian model averaging CRM (BMA-CRM)
Let d1 < ⋯ < dJ denote a set of J prespecified doses of a new drug under investigation, and ϕ be the target toxicity rate. The standard CRM assumes a working dose-toxicity model, and then based on the accumulating data, continuously updates the estimate of the dose-toxicity model and makes the decision of dose escalation/de-escalation. A commonly used working model is the following power (or empirical) model,
$$\pi_j(\alpha) = p_j^{\exp(\alpha)}, \quad j = 1, \ldots, J, \qquad (1)$$
where α is an unknown parameter, and (p1, ⋯, pJ) is a set of prespecified constants, known as the skeleton. The skeleton can be interpreted as the prior guess of the toxicity probabilities at the J doses. A normal prior distribution N(0, σ²) is often assumed for α, e.g., α ~ N(0, 2).
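As a small illustration of the power model, the sketch below evaluates the dose-toxicity curve for a few values of α. It assumes the usual form of the working model, in which the toxicity probability at dose j is the skeleton value p_j raised to the power exp(α); the function name `power_model` and the α values are chosen here for illustration only.

```python
import math

def power_model(skeleton, alpha):
    """Toxicity probabilities p_j ** exp(alpha) under the power (empirical) model."""
    power = math.exp(alpha)
    return [p ** power for p in skeleton]

# An 8-dose skeleton with prior MTD guess at dose 5 (target 0.30).
skeleton = [0.15, 0.19, 0.22, 0.26, 0.30, 0.34, 0.38, 0.42]

for alpha in (-0.5, 0.0, 0.5):
    curve = power_model(skeleton, alpha)
    print(alpha, [round(x, 3) for x in curve])
```

At α = 0 the curve coincides with the skeleton; positive α lowers every toxicity probability and negative α raises it, which is how a single parameter shifts the whole curve.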
To improve the performance of the CRM, the BMA-CRM specifies multiple skeletons, each of them leading to a dose-toxicity model, and then uses Bayesian model averaging to automatically favor the best fitting model for decision making. As a result, the performance of the BMA-CRM is always close to the optimal one. Specifically, let {(p11, ⋯, p1J), ⋯, (pK1, ⋯, pKJ)} denote K prespecified skeletons, and (M1, ⋯, MK) be the corresponding models generated by these skeletons, with Mk given by $\pi_{kj}(\alpha_k) = p_{kj}^{\exp(\alpha_k)}$, j = 1, ⋯, J.
The BMA estimate for the toxicity probability at each dose level is given by
$$\bar{\pi}_j = \sum_{k=1}^{K} \hat{\pi}_{kj} \Pr(M_k \mid D), \qquad (2)$$
where π̂kj is the posterior mean of the toxicity probability of dose level j under model Mk, and Pr(Mk|D) is the posterior probability of model Mk given the observed data D. The details of calculating π̄j can be found in Yin and Yuan [11]. By assigning π̂kj a weight of Pr(Mk|D), the BMA method automatically identifies and favors the best fitting model, thus π̄j is always close to the best estimate. Based on π̄j, we can make the decision of dose escalation and de-escalation. The dose-finding algorithm for the BMA-CRM can be described as follows:
1. Patients in the first cohort are treated at the lowest dose d1, or the physician-specified dose.
2. At the current dose level jcurr, we obtain the BMA estimates for the toxicity probabilities, π̄j (j = 1, …, J), based on the cumulated data. We then find dose level j* that has a toxicity probability closest to ϕ, i.e., $j^* = \arg\min_{j \in \{1, \ldots, J\}} |\bar{\pi}_j - \phi|$. If jcurr > j*, we de-escalate the dose level to jcurr − 1; if jcurr < j*, we escalate the dose level to jcurr + 1; otherwise, the dose stays at the same level as jcurr for the next cohort of patients.
3. Once the maximum sample size is reached, the dose that has the toxicity probability closest to ϕ is selected as the MTD.
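In addition, we add a stopping rule to the algorithm: if the model-averaged posterior probability that the toxicity probability of the lowest dose exceeds ϕ, $\sum_{k=1}^{K} \Pr(\pi_{k1}(\alpha_k) > \phi \mid M_k, D) \Pr(M_k \mid D)$, is greater than a prespecified cutoff, the trial is terminated for safety.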
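The decision step of the algorithm can be sketched numerically as follows. This is a minimal sketch, not the authors' software: it assumes the power model p_kj^exp(α_k) for each skeleton, a N(0, 2) prior on each α_k, equal prior probabilities for the K models, and a simple grid approximation to the integrals over α. The function names, the grid settings and the interim data are hypothetical.

```python
import math

def bma_crm(skeletons, n_treated, n_tox, prior_var=2.0, half_range=8.0, n_grid=1601):
    """Return (posterior model probabilities, BMA toxicity estimates) via a grid over alpha."""
    step = 2.0 * half_range / (n_grid - 1)
    alphas = [-half_range + i * step for i in range(n_grid)]
    marginals, post_means = [], []
    for skeleton in skeletons:
        total = 0.0
        sums = [0.0] * len(skeleton)
        for a in alphas:
            power = math.exp(a)
            probs = [p ** power for p in skeleton]
            log_w = -a * a / (2.0 * prior_var)      # N(0, prior_var) prior, up to a constant
            for p, pi, n, y in zip(skeleton, probs, n_treated, n_tox):
                # Binomial log-likelihood; log(pi) is written as power * log(p) for stability.
                log_w += y * power * math.log(p) + (n - y) * math.log1p(-pi)
            w = math.exp(log_w)
            total += w
            for j, pi in enumerate(probs):
                sums[j] += w * pi
        marginals.append(total)                      # proportional to Pr(D | M_k)
        post_means.append([s / total for s in sums])
    norm = sum(marginals)
    model_probs = [m / norm for m in marginals]      # equal prior model probabilities assumed
    n_doses = len(skeletons[0])
    bma = [sum(w * pm[j] for w, pm in zip(model_probs, post_means)) for j in range(n_doses)]
    return model_probs, bma

def next_dose(current, bma, phi):
    """Move at most one level toward the dose whose BMA estimate is closest to phi (0-based)."""
    target = min(range(len(bma)), key=lambda j: abs(bma[j] - phi))
    if current > target:
        return current - 1
    if current < target:
        return current + 1
    return current

skeletons = [
    [0.30, 0.39, 0.48, 0.57, 0.64, 0.71, 0.76, 0.81],
    [0.15, 0.19, 0.22, 0.26, 0.30, 0.34, 0.38, 0.42],
    [0.0001, 0.002, 0.01, 0.038, 0.095, 0.19, 0.30, 0.42],
]
n_treated = [3, 3, 3, 0, 0, 0, 0, 0]   # hypothetical interim data
n_tox = [0, 0, 1, 0, 0, 0, 0, 0]
current = 2                             # dose level 3, 0-based

model_probs, bma = bma_crm(skeletons, n_treated, n_tox)
recommended = next_dose(current, bma, phi=0.3)
print([round(w, 3) for w in model_probs])
print([round(x, 3) for x in bma], "next dose level:", recommended + 1)
```

The grid sum stands in for numerical integration; because the same grid is used for every model, the grid spacing cancels when the marginal likelihoods are normalized into model probabilities.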
2.2 Lee and Cheung’s method for choosing a single skeleton
Lee and Cheung [14] proposed a practical method for choosing a single skeleton for the standard CRM based on the indifference interval, which is defined as an interval of toxicity probabilities associated with the neighboring doses of the true MTD such that these neighboring doses may be selected instead of the true MTD under large samples. In that approach, one specifies the target toxicity probability ϕ, the prior location of the MTD ν ∈ {1, ⋯, J}, and an acceptable half-width δ of the indifference interval. A skeleton is then uniquely determined as follows:
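$$
\begin{aligned}
p_\nu &= \phi, \\
p_{j-1} &= \exp\left\{ \frac{\log(\phi - \delta)\,\log(p_j)}{\log(\phi + \delta)} \right\}, \quad j = \nu, \nu - 1, \ldots, 2, \\
p_{j+1} &= \exp\left\{ \frac{\log(\phi + \delta)\,\log(p_j)}{\log(\phi - \delta)} \right\}, \quad j = \nu, \nu + 1, \ldots, J - 1.
\end{aligned} \qquad (3)
$$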
This skeleton guarantees that, under large samples, the toxicity probability of the selected dose falls within the specified indifference interval, and it can be conveniently obtained using the function getprior() in the R package “dfcrm”. As the indifference interval is a large-sample property, in order to ensure good performance in finite samples, Lee and Cheung [14] suggested numerically searching a range of acceptable indifference intervals, rather than prespecifying a fixed value of the indifference interval, to identify the optimal skeleton that yields the highest percentage of correct dose selection (PCS) based on a set of prespecified toxicity scenarios. The authors showed that this calibration method yields a skeleton with good operating characteristics. The method of Lee and Cheung is useful for selecting a single skeleton for the standard CRM, but cannot be used for selecting multiple skeletons. We will extend this method to the selection of multiple skeletons, which can be used, for example, with the BMA-CRM.
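For readers who want to see the construction outside R, the sketch below mirrors the recursion as we understand it for the power model: start from p_ν = ϕ, then fill in lower doses using the ratio log(ϕ − δ)/log(ϕ + δ) and higher doses using its inverse. The function name `lee_cheung_skeleton` is hypothetical, and the recursion is an assumption based on the power-model case; in practice getprior() in dfcrm remains the reference implementation.

```python
import math

def lee_cheung_skeleton(phi, delta, nu, n_doses):
    """Skeleton with prior MTD at dose nu (1-based) and half-width delta, power model assumed."""
    if not (0.0 < phi - delta and phi + delta < 1.0):
        raise ValueError("phi - delta and phi + delta must lie in (0, 1)")
    if not 1 <= nu <= n_doses:
        raise ValueError("nu must be between 1 and n_doses")
    log_lo = math.log(phi - delta)
    log_hi = math.log(phi + delta)
    p = [0.0] * n_doses
    p[nu - 1] = phi
    # Work downward from the prior MTD: p[j-1] is derived from p[j].
    for j in range(nu - 1, 0, -1):
        p[j - 1] = math.exp(log_lo * math.log(p[j]) / log_hi)
    # Work upward from the prior MTD: p[j+1] is derived from p[j].
    for j in range(nu - 1, n_doses - 1):
        p[j + 1] = math.exp(log_hi * math.log(p[j]) / log_lo)
    return p

# Setting used for the single-skeleton CRM in the simulations: phi = 0.3, delta = 0.05, nu = 5, J = 8.
skeleton = lee_cheung_skeleton(phi=0.3, delta=0.05, nu=5, n_doses=8)
print([round(p, 3) for p in skeleton])
```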
2.3 Equivalency of skeletons
As introduced previously, the rationale behind the BMA-CRM is to use multiple skeletons (or models) to represent different dose-toxicity relationships, such that as long as one of them is close to the truth, we will obtain good design performance, thanks to the property that the BMA automatically identifies and favors the best fitting model. Therefore, ideally, we would like these skeletons to be as different as possible to maximize the coverage of the model space (i.e., all possible shapes of the dose-response relationship). Achieving this goal, however, is trickier than it appears. For example, Figure 1 shows three skeletons, which represent rather different prior opinions on the dose-toxicity profile. Skeleton 1 represents an aggressive prior opinion that the first dose is the MTD (with target toxicity probability of 0.3) and the dose-toxicity curve takes a concave shape; skeleton 2 represents a neutral prior opinion that the middle (i.e., 3rd) dose level is the MTD and toxicity increases linearly with the dose; skeleton 3 represents a conservative opinion that the dose starts with a low toxicity and the highest dose is the MTD. Although the three skeletons appear to be rather different, they are actually equivalent (see below). We use the following definition to determine equivalency of multiple skeletons.
Figure 1.
Example of equivalent skeletons for the CRM
Definition 1
Two skeletons p = {p1, …, pJ} and p′ = {p′1, …, p′J} are equivalent if $p'_j = p_j^{\,c}$, for j = 1, ⋯, J, where c > 0 is a constant.
This definition matters because of the following result:
Theorem 1
Equivalent skeletons lead to equivalent likelihood (or dose-toxicity models).
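To see this, the model using skeleton p′ is
$$\pi_j(\alpha) = (p'_j)^{\exp(\alpha)}.$$
Plugging in $p'_j = p_j^{\,c}$, we obtain
$$\pi_j(\alpha) = p_j^{\,c \exp(\alpha)}. \qquad (4)$$
Applying the reparameterization γ = α + log(c), model (4) becomes
$$\pi_j(\gamma) = p_j^{\exp(\gamma)},$$
which is the same as the model that uses skeleton p.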
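The reparameterization argument can be checked numerically. The sketch below assumes the power model and a binomial likelihood (the binomial coefficient is dropped because it does not depend on the skeleton); the skeleton, the constant c and the data are hypothetical values picked for illustration.

```python
import math

def log_likelihood(skeleton, alpha, n_treated, n_tox):
    """Binomial log-likelihood (up to a constant) under the power model."""
    total = 0.0
    for p, n, y in zip(skeleton, n_treated, n_tox):
        pi = p ** math.exp(alpha)
        total += y * math.log(pi) + (n - y) * math.log(1.0 - pi)
    return total

p = [0.05, 0.12, 0.20, 0.30, 0.45]      # skeleton p
c = 2.5                                  # any positive constant
p_prime = [x ** c for x in p]            # equivalent skeleton p' = p ** c

n_treated = [3, 3, 6, 6, 3]
n_tox = [0, 0, 1, 2, 2]
alpha = -0.4

lhs = log_likelihood(p_prime, alpha, n_treated, n_tox)
rhs = log_likelihood(p, alpha + math.log(c), n_treated, n_tox)
print(lhs, rhs)
```

The two values agree up to floating-point error: evaluating skeleton p′ at α gives the same likelihood as evaluating skeleton p at α + log(c).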
Letting p ~ p′ denote that p is equivalent to p′, it is easy to see that equivalence has the property of transitivity.
Theorem 2
If p ~ p′ and p′ ~ p″, then p ~ p″.
The implication of the above results is that when the skeletons of the BMA-CRM are equivalent, the design is the same as one using a single skeleton, and thus Bayesian model averaging cannot serve its purpose, i.e., to improve the performance of the design by automatically favoring the best-fitting skeleton. In the next section, we discuss a way to measure the degree of equivalence among multiple skeletons and propose a method to optimize the choice of multiple skeletons.
An interesting application of Definition 1 and Theorem 1 is the following result:
Theorem 3
The skeleton generated by the method of Lee and Cheung (2009) is invariant to the specification of the prior location of the MTD, i.e., ν in equation (3), in the sense that skeletons obtained by using different values of ν are equivalent.
In other words, when we use the method of Lee and Cheung (2009) or the function getprior() in the R package dfcrm to obtain the skeleton, specifying different values of ν yields equivalent skeletons, given that the other parameters are the same. The proof is provided in the Appendix.
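A short sketch of why this holds, assuming the Lee and Cheung recursion for the power model multiplies log(p_j) by the constant r = log(ϕ + δ)/log(ϕ − δ) at each step up the dose ladder (and divides by r at each step down). The symbols r, ν′ and p′ are introduced here for illustration only; the formal proof is the one in the Appendix.

$$
\log p_j = r^{\,j-\nu}\log\phi, \qquad
\log p'_j = r^{\,j-\nu'}\log\phi = r^{\,\nu-\nu'}\log p_j
\;\Longrightarrow\; p'_j = p_j^{\,c}, \quad c = r^{\,\nu-\nu'} .
$$

Because c depends only on ϕ, δ and ν − ν′, and not on the dose index j, the two skeletons satisfy Definition 1.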
Here we focus on the power function model (1). Recently, Jia, Lee and Cheung (2014) developed a general framework to evaluate the equivalence between different types of models and calibrate them for the likelihood CRM. Specifically, they defined ψ-equivalent dose-toxicity functions based on whether these functions can be represented by the same function ψ, and further showed that the choice of ν is irrelevant to the performance of the likelihood CRM for ψ-equivalent functions.
2.4 Optimize the choice of skeletons
As shown in Figure 1, even with Definition 1 in hand, it is still difficult to evaluate the equivalency of multiple skeletons on the basis of a visual inspection. A quantitative measure that gauges the degree of equivalence among multiple skeletons is needed. The key observation is that the equivalence condition can be rewritten as
$$\log(p'_j) = c \log(p_j), \quad j = 1, \ldots, J.$$
Hence, if we view {log(p′j)} and {log(pj)} as observations from two random variables, the equivalency of two skeletons is the same as the logarithms of these two skeletons being in perfect collinearity. This result is simple but powerful because it converts the problem of determining the equivalence of skeletons into a well-studied, classical collinearity problem in linear regression analysis. Specifically, the problem of measuring the equivalence among K skeletons, p1 = (p11, ⋯, p1J), ⋯, pK = (pK1, ⋯, pKJ), can be converted into the problem of measuring the collinearity among K vectors q1 = (log(p11), ⋯, log(p1J)), ⋯, qK = (log(pK1), ⋯, log(pKJ)). Following Weinberg [17] (pages 214–216), a common way to measure the collinearity among the K vectors q1, ⋯, qK is given by
$$\bar{R}^2 = \frac{1}{K} \sum_{k=1}^{K} R_k^2, \qquad (5)$$
where $R_k^2$ is the R² obtained by regressing qk on the other K−1 vectors, i.e., {qk′, k′ ≠ k}. The value of R̄² is between 0 and 1. A small value indicates less collinearity, i.e., the K skeletons are less equivalent, and R̄² = 1 indicates perfect collinearity, i.e., the K skeletons are equivalent. Given a set of K skeletons, we define a measure for quantifying the nonequivalency of the skeletons, denoted as Q, as follows,
$$Q = 1 - \bar{R}^2.$$
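As a small worked case, take K = 2, take the nonequivalence measure to be one minus the average R², and assume each regression includes an intercept (the text does not say whether it does). Each R² then reduces to the squared sample correlation ρ² between q1 and q2, so that

$$
R_1^2 = R_2^2 = \rho^2, \qquad \bar{R}^2 = \rho^2, \qquad Q = 1 - \rho^2 .
$$

If the two skeletons are equivalent, then q2 = c q1 for some constant c > 0, which gives ρ² = 1 and Q = 0; any departure from exact proportionality on the log scale gives Q > 0.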
Our method of choosing K skeletons for BMA-CRM is based on the nonequivalence measure Q and the skeleton calibration method of Lee and Cheung [14], which can be described as follows.
1. For each of the K skeletons to be specified, generate a pool of candidate skeletons using Lee and Cheung’s method based on a sequence of half-width indifference intervals ranging from a to b with a step of c. This results in K skeleton pools, each of which contains S = (b − a)/c + 1 candidate skeletons.
2. Select one skeleton from each of the K skeleton pools to form a K-skeleton set. This results in a total of S^K possible K-skeleton sets.
3. Calculate the value of the nonequivalence measure Q for each of the K-skeleton sets and sort them by the value of Q from large to small.
4. Pick the top 20 K-skeleton sets with the largest values of Q, and simulate 1,000 trials with each of them using the BMA-CRM under a set of prespecified toxicity scenarios. We choose the K-skeleton set that maximizes the average PCS (across the scenarios) as the recommended skeletons to be used in the BMA-CRM for conducting the actual trial.
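The first three steps of the procedure can be sketched as follows; the simulation in the final step is omitted. This is an illustrative sketch under several assumptions that are not taken from the authors' software: candidate skeletons use a closed form of the Lee and Cheung recursion for the power model, each R² comes from a regression with an intercept (computed here by Gram-Schmidt orthogonalization so that exactly collinear predictors are handled), Q is one minus the average R², and the prior MTD location for the kth pool is rounded up. All function names are hypothetical.

```python
import itertools
import math

def lc_skeleton(phi, delta, nu, n_doses):
    """Closed form of the assumed recursion: log p_j = r ** (j - nu) * log(phi)."""
    r = math.log(phi + delta) / math.log(phi - delta)
    return [math.exp(math.log(phi) * r ** (j - nu)) for j in range(1, n_doses + 1)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centered(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def r_squared(y, predictors, tol=1e-8):
    """R^2 from regressing y on the predictors plus an intercept."""
    yc = centered(y)
    basis = []
    for x in predictors:
        v = centered(x)
        scale = math.sqrt(dot(v, v))
        for b in basis:
            coef = dot(v, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > tol * scale:               # skip predictors collinear with earlier ones
            basis.append([vi / norm for vi in v])
    return min(1.0, sum(dot(yc, b) ** 2 for b in basis) / dot(yc, yc))

def nonequivalence(skeletons):
    """Q = 1 - average R^2 across the log-transformed skeletons."""
    q = [[math.log(p) for p in s] for s in skeletons]
    r2 = [r_squared(q[k], q[:k] + q[k + 1:]) for k in range(len(q))]
    return 1.0 - sum(r2) / len(r2)

phi, n_doses, K = 0.3, 8, 3
deltas = [round(0.02 + 0.01 * i, 2) for i in range(14)]     # a = 0.02, b = 0.15, c = 0.01

# Step 1: one pool per skeleton, with prior MTD location ceil(k * J / K) (rounding assumed).
pools = [[lc_skeleton(phi, d, math.ceil(k * n_doses / K), n_doses) for d in deltas]
         for k in range(1, K + 1)]

# Step 2: every combination of one skeleton per pool, S ** K sets in total.
candidate_sets = list(itertools.product(*pools))

# Step 3: rank by Q from large to small and keep the 20 most diverse sets.
ranked = sorted(candidate_sets, key=nonequivalence, reverse=True)
top20 = ranked[:20]
print(len(deltas), len(candidate_sets), round(nonequivalence(top20[0]), 4))
```

Each of the retained sets would then be evaluated by simulation, and the set with the highest average PCS kept.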
Several remarks are warranted. In the above algorithm, we do not directly choose the K-skeleton set with the largest value of Q as the final recommended skeletons because, as pointed out by Lee and Cheung [14], the skeleton generated by the indifference interval only guarantees good performance in large samples. To ensure good finite-sample performance, following Lee and Cheung’s approach, we choose as the recommended skeletons the skeleton set that yields the highest average PCS among the top 20 skeleton sets (i.e., step 4 of the algorithm). Note that, rather than maximizing the average PCS, other criteria, e.g., maximizing the lowest PCS among the simulation scenarios (i.e., the minimax criterion), can also be used to select the final recommended skeletons. The numerical studies we present show that the use of the highest average PCS generally performs better than the minimax criterion. Last, in step 1, to generate a candidate skeleton, Lee and Cheung’s method (i.e., the function getprior() in the R package “dfcrm”) requires specifying three parameters: the target toxicity probability ϕ, the location of the MTD ν and a half-width indifference interval δ. The value of ϕ is known, and we have set the half-width of the indifference interval to a sequence of values from a to b with a step of c. As for the location of the MTD (i.e., ν), we have shown in Theorem 3 that the skeletons generated by getprior() under different values of ν are equivalent. In other words, the skeleton generated by the method of Lee and Cheung is invariant to the location of the MTD. Therefore, without loss of generality, we simply set ν = [(k/K)J] (i.e., the MTD is the k/K quantile of the investigational doses) when generating the candidate skeleton pool for the kth skeleton, k = 1, ⋯, K.
3 Simulation studies
3.1 Operating characteristics
We used the simulation setting previously used by Yin and Yuan [11]. We assumed J = 8 dose levels, the maximum sample size of 30 patients, the target toxicity rate ϕ = 0.3 and 8 toxicity scenarios (see Table 1). Following the recommendation of Yin and Yuan [11], we used 3 skeletons to run the BMA-CRM. To apply the proposed procedure to determine 3 skeletons, we set the half-width indifference interval range as [0.02, 0.15], with a step of 0.01, i.e., a = 0.02, b = 0.15, and c = 0.01 in step 1 of the algorithm. This generated 14 different indifference intervals. Cheung [15] recommended the half-width indifference interval range [0.04, 0.10] for the target toxicity rate of 0.33. We slightly expanded that range to [0.02, 0.15] to obtain a broader coverage of the skeleton space. We considered two versions of the proposed procedure, one maximizing the average PCS and one maximizing the lowest PCS. We refer to the two sets of resulting skeletons as optimal skeletons and minimax skeletons, respectively. We compared the performance of these two sets of skeletons with a set of “empirical skeletons” specified by varying the prior location of the MTD and shape of the dose curve (see Figure 2):
Table 1.
Selection percentage and the number of patients treated under 8 dose-toxicity scenarios.
| Skeletons | | Dose 1 | Dose 2 | Dose 3 | Dose 4 | Dose 5 | Dose 6 | Dose 7 | Dose 8 | Avg # of toxicities |
|---|---|---|---|---|---|---|---|---|---|---|
| Scenario 1 | ||||||||||
| Tox rate | | 0.06 | 0.15 | 0.30 | 0.55 | 0.60 | 0.65 | 0.68 | 0.70 | |
| Empirical | Sel % | 0.8 | 15.9 | 66.5 | 15.7 | 0.9 | 0.1 | 0 | 0 | 8.7 |
| | # Pts | 4.0 | 6.8 | 12.8 | 5.3 | 0.9 | 0.1 | 0 | 0 | |
| Minimax | Sel % | 0 | 15.4 | 70.3 | 13.5 | 0.6 | 0 | 0 | 0 | 8.5 |
| | # Pts | 3.9 | 6.4 | 13.5 | 5.4 | 0.7 | 0.1 | 0 | 0 | |
| Optimal | Sel % | 0.2 | 16.5 | 72.1 | 10.5 | 0.5 | 0 | 0 | 0.1 | 8.5 |
| | # Pts | 4.0 | 6.9 | 13.2 | 5.2 | 0.7 | 0 | 0 | 0 | |
| CRM | Sel % | 0 | 14 | 67.6 | 17.3 | 1.3 | 0 | 0 | 0 | 8.6 |
| | # Pts | 3.9 | 6.9 | 12.8 | 5.8 | 0.6 | 0 | 0 | 0 | |
| Scenario 2 | ||||||||||
| Tox rate | | 0.02 | 0.03 | 0.05 | 0.07 | 0.30 | 0.50 | 0.70 | 0.80 | |
| Empirical | Sel % | 0 | 0 | 0 | 10.1 | 61.5 | 26.3 | 1.9 | 0.2 | 7.2 |
| | # Pts | 3.2 | 3.0 | 3.1 | 4.4 | 9.1 | 6.1 | 1.0 | 0 | |
| Minimax | Sel % | 0 | 0 | 0.2 | 12.1 | 64.6 | 21.0 | 1.4 | 0.7 | 6.8 |
| | # Pts | 3.2 | 3.0 | 3.1 | 4.7 | 9.7 | 5.2 | 0.9 | 0.1 | |
| Optimal | Sel % | 0 | 0 | 0 | 9.5 | 71.7 | 17.1 | 1.1 | 0.6 | 6.7 |
| | # Pts | 3.2 | 3.0 | 3.2 | 4.8 | 10.1 | 4.9 | 0.8 | 0 | |
| CRM | Sel % | 0 | 0 | 0 | 6.9 | 59 | 32 | 0 | 0 | 6.6 |
| | # Pts | 3.2 | 3.3 | 3.5 | 4.4 | 9.8 | 5.3 | 0.4 | 0 | |
| Scenario 3 | ||||||||||
| Tox rate | | 0.02 | 0.03 | 0.05 | 0.06 | 0.07 | 0.09 | 0.10 | 0.30 | |
| Empirical | Sel % | 0 | 0 | 0 | 0.5 | 1.0 | 2.7 | 17.4 | 78.4 | 3.4 |
| | # Pts | 3.2 | 3.1 | 3.2 | 3.3 | 3.4 | 3.6 | 4 | 6.4 | |
| Minimax | Sel % | 0 | 0 | 0 | 0 | 1.0 | 3.7 | 14.3 | 81.0 | 3.3 |
| | # Pts | 3.2 | 3.0 | 3.1 | 3.3 | 3.4 | 3.6 | 3.9 | 6.4 | |
| Optimal | Sel % | 0 | 0 | 0 | 0 | 0.4 | 1.6 | 9.3 | 88.7 | 3.6 |
| | # Pts | 3.2 | 3.0 | 3.1 | 3.2 | 3.2 | 3.3 | 3.5 | 7.5 | |
| CRM | Sel % | 0 | 0 | 0 | 0 | 3.8 | 9.7 | 24.3 | 61.8 | 2.8 |
| | # Pts | 3.2 | 3.3 | 3.5 | 3.7 | 4.2 | 4.3 | 3.8 | 4.1 | |
| Scenario 4 | ||||||||||
| Tox rate | | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | |
| Empirical | Sel % | 2.7 | 21.8 | 41.4 | 27.3 | 5.8 | 0.6 | 0.1 | 0 | 8.3 |
| | # Pts | 5.6 | 7.5 | 8.9 | 5.6 | 1.9 | 0.4 | 0 | 0 | |
| Minimax | Sel % | 1.8 | 20.1 | 49.7 | 23.1 | 4.0 | 0.7 | 0 | 0.2 | 8.1 |
| | # Pts | 5.4 | 7.4 | 9.7 | 5.6 | 1.6 | 0.3 | 0 | 0 | |
| Optimal | Sel % | 1.8 | 20.1 | 49.7 | 23.1 | 4.0 | 0.7 | 0 | 0.2 | 8.1 |
| | # Pts | 5.4 | 7.4 | 9.7 | 5.6 | 1.6 | 0.3 | 0 | 0 | |
| CRM | Sel % | 1.4 | 18.6 | 46.6 | 28.6 | 0.4 | 0 | 0 | 0 | 8.0 |
| | # Pts | 5.0 | 1.6 | 9.9 | 5.5 | 1.2 | 0 | 0 | 0 | |
| Scenario 5 | ||||||||||
| Tox rate | | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.65 | 0.70 | 0.75 | |
| Empirical | Sel % | 22.7 | 40.6 | 26.7 | 4.9 | 0.5 | 0 | 0 | 0 | 8.9 |
| | # Pts | 10.4 | 9.4 | 6.8 | 2 | 0.3 | 0 | 0 | 0 | |
| Minimax | Sel % | 23.1 | 43.0 | 23.3 | 4.9 | 0.4 | 0.1 | 0 | 0 | 8.7 |
| | # Pts | 11.6 | 9.1 | 6.1 | 1.9 | 0.3 | 0 | 0 | 0 | |
| Optimal | Sel % | 21.3 | 46.4 | 20.7 | 5.1 | 0.4 | 0.5 | 0.2 | 0 | 8.7 |
| | # Pts | 11.0 | 9.6 | 5.9 | 1.9 | 0.4 | 0.1 | 0 | 0 | |
| CRM | Sel % | 20.7 | 45.5 | 29.4 | 4.3 | 0 | 0 | 0 | 0 | 8.9 |
| | # Pts | 10.5 | 11.0 | 6.7 | 1.6 | 0 | 0 | 0 | 0 | |
| Scenario 6 | ||||||||||
| Tox rate | | 0.02 | 0.06 | 0.08 | 0.12 | 0.20 | 0.30 | 0.40 | 0.50 | |
| Empirical | Sel % | 0 | 0 | 0.2 | 4.1 | 25.9 | 38.8 | 23.4 | 7.6 | 5.8 |
| | # Pts | 3.2 | 3.1 | 3.3 | 4.2 | 6.2 | 6.1 | 3.1 | 0.8 | |
| Minimax | Sel % | 0 | 0 | 0.4 | 5.3 | 29.5 | 37.0 | 15.6 | 12.2 | 5.6 |
| | # Pts | 3.2 | 3.1 | 3.4 | 4.6 | 6.6 | 5.5 | 2.7 | 0.9 | |
| Optimal | Sel % | 0 | 0 | 0.2 | 5.4 | 27.5 | 40.3 | 17.1 | 9.5 | 5.5 |
| | # Pts | 3.3 | 3.1 | 3.5 | 4.6 | 6.4 | 5.8 | 2.4 | 0.9 | |
| CRM | Sel % | 0 | 0 | 0.3 | 5.5 | 26.7 | 42.5 | 21.0 | 0.4 | 5.1 |
| | # Pts | 3.2 | 3.6 | 3.9 | 4.9 | 6.6 | 5.4 | 1.9 | 0.3 | |
| Scenario 7 | ||||||||||
| Tox rate | | 0.02 | 0.03 | 0.04 | 0.06 | 0.08 | 0.10 | 0.30 | 0.50 | |
| Empirical | Sel % | 0 | 0 | 0 | 0 | 1.1 | 17 | 50.9 | 31 | 4.9 |
| | # Pts | 3.2 | 3.0 | 3.1 | 3.2 | 3.5 | 4.4 | 6.1 | 3.5 | |
| Minimax | Sel % | 0 | 0 | 0 | 0.1 | 1.9 | 18.6 | 48.3 | 31.1 | 4.7 |
| | # Pts | 3.2 | 3.0 | 3.1 | 3.3 | 3.5 | 4.6 | 5.8 | 3.5 | |
| Optimal | Sel % | 0 | 0 | 0 | 0 | 2.3 | 16.5 | 53.4 | 27.8 | 4.6 |
| | # Pts | 3.2 | 3 | 3.1 | 3.3 | 3.6 | 4.9 | 6.5 | 2.4 | |
| CRM | Sel % | 0 | 0 | 0 | 0 | 3.9 | 17.7 | 50.4 | 27.6 | 4.0 |
| | # Pts | 3.2 | 3.3 | 3.4 | 3.7 | 4.3 | 4.6 | 5.5 | 2.1 | |
| Scenario 8 | ||||||||||
| Tox rate | | 0.03 | 0.07 | 0.10 | 0.15 | 0.20 | 0.30 | 0.50 | 0.70 | |
| Empirical | Sel % | 0 | 0 | 0.8 | 6.4 | 28 | 45.3 | 17.9 | 1.6 | 6.1 |
| | # Pts | 3.3 | 3.2 | 3.7 | 4.6 | 5.9 | 5.9 | 2.9 | 0.4 | |
| Minimax | Sel % | 0 | 0.1 | 1.7 | 8.8 | 33.4 | 37.0 | 11.9 | 7.1 | 5.8 |
| | # Pts | 3.3 | 3.2 | 3.7 | 4.9 | 6.4 | 5.9 | 2.2 | 0.4 | |
| Optimal | Sel % | 0 | 0 | 1.1 | 9.2 | 31.1 | 43.3 | 10.1 | 5.2 | 5.8 |
| | # Pts | 3.3 | 3.2 | 3.7 | 4.9 | 6.4 | 5.9 | 2.2 | 0.4 | |
| CRM | Sel % | 0 | 0 | 1.1 | 10.2 | 31.2 | 40.4 | 15.9 | 1.1 | 5.3 |
| | # Pts | 3.2 | 3.8 | 4.5 | 5.6 | 6.5 | 4.7 | 1.6 | 0.1 | |
Figure 2.
Three empirically chosen skeletons
Skeleton 1: 0.30, 0.39, 0.48, 0.57, 0.64, 0.71, 0.76, 0.81
Skeleton 2: 0.15, 0.19, 0.22, 0.26, 0.30, 0.34, 0.38, 0.42
Skeleton 3: 0.0001, 0.002, 0.01, 0.038, 0.095, 0.19, 0.30, 0.42
Although they appear to be very different, these three arbitrarily specified skeletons are actually close to being equivalent, with Q = 0.03. We also compared the proposed method to the standard CRM that uses one skeleton. For the single-skeleton CRM, the skeleton was generated by the function getprior() with the half-width indifference interval δ = 0.05 and prior MTD location ν = 5.
Table 1 shows the results based on 1,000 simulated trials, including the selection percentage of each dose as the MTD, the average number of patients treated at each dose, and the average number of toxicity events. We first compare the three approaches that use multiple skeletons. In scenario 1, the third dose is the MTD. The empirical skeletons yielded the lowest PCS of 66.5% and the optimal skeletons yielded the highest PCS of 72.1%. The minimax skeletons achieved a PCS of 70.3%, lower than that of the optimal skeletons. Although the number of patients treated at the third dose level using the empirical skeletons (12.8) was smaller than the numbers obtained with the optimal (13.2) and minimax (13.5) skeletons, the results are comparable. In scenario 2, the MTD is the fifth dose level, and the optimal skeletons performed best, with the highest PCS (71.7%) and the largest number of patients treated at the MTD (10.1). The minimax skeletons performed second best and the empirical skeletons performed worst. Similar results were observed in scenarios 3, 4 and 5. In scenario 6, the sixth dose is the MTD; the PCS obtained with the optimal skeletons was the highest (40.3%), whereas that obtained with the minimax skeletons was the lowest (37.0%). Similar results were observed in scenario 7. In scenario 8, the MTD is located at the sixth dose. In this case, the empirical skeletons yielded a PCS of 45.3%, the optimal skeletons 43.3%, and the minimax skeletons 37.0%. The average numbers of patients treated at the MTD were similar among the three sets of skeletons.
We now turn to the comparison between the standard CRM (using a single skeleton) and the BMA-CRM with three optimal skeletons. As shown in Table 1 and Figure 3, the BMA-CRM generally outperformed the CRM. The PCS of the BMA-CRM was higher than that of the CRM in 7 out of 8 scenarios. For example, in scenario 1, the PCS of the BMA-CRM using the optimal skeletons was 72.1%, while that of the CRM was 67.6%. This is consistent with the finding of Yin and Yuan (2009). That is, using multiple skeletons improves the performance of the CRM because the design automatically favors the best-performing skeleton.
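The comparison can be summarized with a few lines of arithmetic on the PCS values read from Table 1 (the selection percentage at the true MTD in each scenario). The sketch below only tabulates those published numbers; the variable names are chosen here for illustration.

```python
# PCS (%) at the true MTD in scenarios 1 to 8, read from Table 1.
pcs = {
    "Empirical": [66.5, 61.5, 78.4, 41.4, 40.6, 38.8, 50.9, 45.3],
    "Minimax":   [70.3, 64.6, 81.0, 49.7, 43.0, 37.0, 48.3, 37.0],
    "Optimal":   [72.1, 71.7, 88.7, 49.7, 46.4, 40.3, 53.4, 43.3],
    "CRM":       [67.6, 59.0, 61.8, 46.6, 45.5, 42.5, 50.4, 40.4],
}

# Average PCS across scenarios (the criterion behind the "optimal" skeletons).
average = {name: sum(values) / len(values) for name, values in pcs.items()}

# Lowest PCS across scenarios (the quantity targeted by the minimax criterion).
lowest = {name: min(values) for name, values in pcs.items()}

# Number of scenarios in which the BMA-CRM with optimal skeletons beats the single-skeleton CRM.
wins = sum(o > c for o, c in zip(pcs["Optimal"], pcs["CRM"]))

for name in pcs:
    print(f"{name:9s} average PCS = {average[name]:.1f}, lowest PCS = {lowest[name]:.1f}")
print("Scenarios where Optimal > CRM:", wins)
```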
Figure 3.
The percentage of correct dose selection (PCS) of the MTD in eight scenarios
In summary, the simulations show that the BMA-CRM equipped with the proposed skeletons has better overall performance than the CRM with a single skeleton. Between the two proposed sets of skeletons, the optimal skeletons perform better than the minimax skeletons, and thus we recommend the former for practical use with the BMA-CRM.
3.2 Sensitivity analysis
Our algorithm picks the skeleton set that maximizes the average PCS from the top 20 skeleton sets (with the largest values of Q) as the recommended optimal skeletons. To investigate the effect of the nonequivalence measure Q on the performance of the design, we considered two other ways to choose the optimal skeletons. Specifically, after calculating the value of the nonequivalence measure Q for all possible K-skeleton sets and sorting them by the value of Q from large to small (i.e., step 3 of the algorithm), rather than picking the top 20 skeleton sets, we picked the middle or bottom 20 skeleton sets, among which we chose the skeleton set that maximized the average PCS as the optimal skeletons. We compared the performance of these three ways of choosing the optimal skeletons under 8 scenarios. Figure 4 displays the PCS and the number of patients treated at the MTD. We can see that, in general, the optimal skeleton set selected from the sets with the top 20 Q values performed better than that based on the sets with the middle 20 Q values, which in turn performed better than that based on the sets with the bottom 20 Q values. This result shows that using skeletons with larger values of Q, i.e., using more diverse skeletons, generally improves the performance of the BMA-CRM. Across the 8 scenarios, although the optimal skeleton set based on the sets with the top 20 Q values has the best or near-best performance in terms of both the PCS and the number of patients treated at the MTD, it is not always the best. This is reasonable because, in the BMA-CRM, the objective of using multiple skeletons is to improve the robustness of the design and ensure that the design generally has good performance across various scenarios, not to guarantee that the design always performs best in every single scenario. The details of the simulation results are provided in Table 2.
Figure 4.
Performance of the BMA-CRM in eight scenarios based on the top, middle and bottom 20 skeleton sets
Table 2.
Simulation results when the recommended skeleton set is selected from the skeleton sets with the top, middle and bottom 20 Q values
| Q value | Dose level
|
Avg # of toxicity | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
| Scenario 1 | | | | | | | | | | |
| Tox rate | | 0.06 | 0.15 | 0.30 | 0.55 | 0.60 | 0.65 | 0.68 | 0.70 | |
| Top | Sel % | 0.2 | 16.5 | 72.1 | 10.5 | 0.5 | 0 | 0 | 0.1 | 8.5 |
| | # Pts | 4.0 | 6.9 | 13.2 | 5.2 | 0.7 | 0 | 0 | 0 | |
| Middle | Sel % | 0.1 | 15.3 | 68.3 | 15.1 | 0.9 | 0.2 | 0 | 0 | 8.6 |
| | # Pts | 4.0 | 6.5 | 13.0 | 5.6 | 0.8 | 0.1 | 0 | 0 | |
| Bottom | Sel % | 0.3 | 17.2 | 69.7 | 11.7 | 0.9 | 0 | 0 | 0 | 8.6 |
| | # Pts | 4.1 | 6.9 | 12.9 | 5.3 | 0.6 | 0.1 | 0 | 0 | |
| Scenario 2 | | | | | | | | | | |
| Tox rate | | 0.02 | 0.03 | 0.05 | 0.07 | 0.30 | 0.50 | 0.70 | 0.80 | |
| Top | Sel % | 0 | 0 | 0 | 9.5 | 71.7 | 17.1 | 1.1 | 0.6 | 6.7 |
| | # Pts | 3.2 | 3.0 | 3.2 | 4.8 | 10.1 | 4.9 | 0.8 | 0 | |
| Middle | Sel % | 0 | 0 | 0 | 8.2 | 67.4 | 23.5 | 0.9 | 0 | 6.7 |
| | # Pts | 3.3 | 3.1 | 3.2 | 4.7 | 9.8 | 5.1 | 0.8 | 0 | |
| Bottom | Sel % | 0 | 0 | 0.3 | 9.0 | 66.1 | 23.3 | 1.3 | 0 | 6.7 |
| | # Pts | 3.2 | 3.0 | 3.2 | 4.8 | 9.7 | 5.2 | 0.8 | 0 | |
| Scenario 3 | | | | | | | | | | |
| Tox rate | | 0.02 | 0.03 | 0.05 | 0.06 | 0.07 | 0.09 | 0.10 | 0.30 | |
| Top | Sel % | 0 | 0 | 0 | 0 | 0.4 | 1.6 | 9.3 | 88.7 | 3.6 |
| | # Pts | 3.2 | 3.0 | 3.1 | 3.2 | 3.2 | 3.3 | 3.5 | 7.5 | |
| Middle | Sel % | 0 | 0 | 0 | 0.6 | 1.2 | 5.9 | 25.5 | 66.8 | 3.2 |
| | # Pts | 3.2 | 3.0 | 3.2 | 3.4 | 3.6 | 3.9 | 4.4 | 5.3 | |
| Bottom | Sel % | 0 | 0 | 0.1 | 1.2 | 3.3 | 11.8 | 25.2 | 58.4 | 3.2 |
| | # Pts | 3.2 | 3.0 | 3.2 | 3.7 | 3.9 | 4.1 | 3.9 | 4.8 | |
| Scenario 4 | | | | | | | | | | |
| Tox rate | | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | |
| Top | Sel % | 1.8 | 20.1 | 49.7 | 23.1 | 4.0 | 0.7 | 0 | 0.2 | 8.1 |
| | # Pts | 5.4 | 7.4 | 9.7 | 5.6 | 1.6 | 0.3 | 0 | 0 | |
| Middle | Sel % | 1.1 | 19.8 | 51.8 | 22.4 | 4.3 | 0.3 | 0 | 0 | 8.0 |
| | # Pts | 5.1 | 7.5 | 10.6 | 5.3 | 1.3 | 0.2 | 0 | 0 | |
| Bottom | Sel % | 1.7 | 21.8 | 48.3 | 22.9 | 4.9 | 0.4 | 0 | 0 | 8.1 |
| | # Pts | 5.1 | 7.9 | 9.7 | 5.5 | 1.5 | 0.2 | 0 | 0 | |
| Scenario 5 | | | | | | | | | | |
| Tox rate | | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.65 | 0.70 | 0.75 | |
| Top | Sel % | 21.3 | 46.4 | 20.7 | 5.1 | 0.4 | 0.5 | 0.2 | 0 | 8.7 |
| | # Pts | 11.0 | 9.6 | 5.9 | 1.9 | 0.4 | 0.1 | 0 | 0 | |
| Middle | Sel % | 20.5 | 49.3 | 23.4 | 3.6 | 0.1 | 0 | 0 | 0 | 8.7 |
| | # Pts | 11.3 | 10.3 | 5.9 | 1.5 | 0.3 | 0 | 0 | 0 | |
| Bottom | Sel % | 21.7 | 47.2 | 22.7 | 3.7 | 0.3 | 0 | 0 | 0 | 8.7 |
| | # Pts | 10.6 | 10.1 | 6.4 | 1.7 | 0.3 | 0 | 0 | 0 | |
| Scenario 6 | | | | | | | | | | |
| Tox rate | | 0.02 | 0.06 | 0.08 | 0.12 | 0.20 | 0.30 | 0.40 | 0.50 | |
| Top | Sel % | 0 | 0 | 0.2 | 5.4 | 27.5 | 40.3 | 17.1 | 9.5 | 5.5 |
| | # Pts | 3.3 | 3.1 | 3.5 | 4.6 | 6.4 | 5.8 | 2.4 | 0.9 | |
| Middle | Sel % | 0 | 0.1 | 0.2 | 5.3 | 27.0 | 43.7 | 19.2 | 4.5 | 5.4 |
| | # Pts | 3.3 | 3.1 | 3.4 | 4.6 | 6.5 | 5.8 | 2.7 | 0.7 | |
| Bottom | Sel % | 0 | 0 | 0.3 | 6.2 | 32.9 | 41.9 | 15.3 | 3.4 | 5.3 |
| | # Pts | 3.2 | 3.1 | 3.5 | 4.9 | 7.0 | 5.5 | 2.2 | 0.5 | |
| Scenario 7 | | | | | | | | | | |
| Tox rate | | 0.02 | 0.03 | 0.04 | 0.06 | 0.08 | 0.10 | 0.30 | 0.50 | |
| Top | Sel % | 0 | 0 | 0 | 0 | 2.3 | 16.5 | 53.4 | 27.8 | 4.6 |
| | # Pts | 3.2 | 3.0 | 3.1 | 3.3 | 3.6 | 4.9 | 6.5 | 2.4 | |
| Middle | Sel % | 0 | 0 | 0 | 0.3 | 3.0 | 18.0 | 54.0 | 24.7 | 4.4 |
| | # Pts | 3.2 | 3.1 | 3.1 | 3.4 | 3.7 | 4.9 | 5.9 | 2.8 | |
| Bottom | Sel % | 0 | 0 | 0 | 0.6 | 2.9 | 22.2 | 53.3 | 21.0 | 4.2 |
| | # Pts | 3.2 | 3.0 | 3.1 | 3.5 | 4.0 | 4.9 | 5.7 | 2.6 | |
| Scenario 8 | | | | | | | | | | |
| Tox rate | | 0.03 | 0.07 | 0.10 | 0.15 | 0.20 | 0.30 | 0.50 | 0.70 | |
| Top | Sel % | 0 | 0 | 1.1 | 9.2 | 31.1 | 43.3 | 10.1 | 5.2 | 5.8 |
| | # Pts | 3.3 | 3.2 | 3.7 | 4.9 | 6.4 | 5.9 | 2.2 | 0.4 | |
| Middle | Sel % | 0 | 0 | 1.4 | 10.4 | 28.8 | 46.8 | 12.1 | 0.5 | 5.6 |
| | # Pts | 3.4 | 3.2 | 3.7 | 5.4 | 6.5 | 5.7 | 1.9 | 0.2 | |
| Bottom | Sel % | 0 | 0 | 1.8 | 11.9 | 32.6 | 42.0 | 11.2 | 0.5 | 5.5 |
| | # Pts | 3.4 | 3.2 | 4.0 | 5.6 | 6.7 | 5.2 | 1.7 | 0.2 | |
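The table above can be condensed into one number per skeleton set: the average percentage of correct selection (PCS) across the eight scenarios. The sketch below is illustrative only. It assumes a target toxicity rate of 0.30 (inferred from the scenarios, each of which contains exactly one dose with that rate), and the names `TOX`, `SEL`, `mtd_index`, `average_pcs`, and `expected_toxicities` are hypothetical helpers, not part of the authors' software. It also checks that the unlabeled last column of the table is numerically consistent with the expected number of toxicities, which is our reading of that column rather than something the table states.

```python
# Values transcribed from the simulation table above (8 scenarios, 8 dose levels).
TARGET = 0.30  # assumption: target toxicity rate, inferred from the scenarios

TOX = [  # true toxicity rates, one row per scenario
    [0.06, 0.15, 0.30, 0.55, 0.60, 0.65, 0.68, 0.70],
    [0.02, 0.03, 0.05, 0.07, 0.30, 0.50, 0.70, 0.80],
    [0.02, 0.03, 0.05, 0.06, 0.07, 0.09, 0.10, 0.30],
    [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80],
    [0.20, 0.30, 0.40, 0.50, 0.60, 0.65, 0.70, 0.75],
    [0.02, 0.06, 0.08, 0.12, 0.20, 0.30, 0.40, 0.50],
    [0.02, 0.03, 0.04, 0.06, 0.08, 0.10, 0.30, 0.50],
    [0.03, 0.07, 0.10, 0.15, 0.20, 0.30, 0.50, 0.70],
]

SEL = {  # selection percentages, one row per scenario
    "Top": [
        [0.2, 16.5, 72.1, 10.5, 0.5, 0, 0, 0.1],
        [0, 0, 0, 9.5, 71.7, 17.1, 1.1, 0.6],
        [0, 0, 0, 0, 0.4, 1.6, 9.3, 88.7],
        [1.8, 20.1, 49.7, 23.1, 4.0, 0.7, 0, 0.2],
        [21.3, 46.4, 20.7, 5.1, 0.4, 0.5, 0.2, 0],
        [0, 0, 0.2, 5.4, 27.5, 40.3, 17.1, 9.5],
        [0, 0, 0, 0, 2.3, 16.5, 53.4, 27.8],
        [0, 0, 1.1, 9.2, 31.1, 43.3, 10.1, 5.2],
    ],
    "Middle": [
        [0.1, 15.3, 68.3, 15.1, 0.9, 0.2, 0, 0],
        [0, 0, 0, 8.2, 67.4, 23.5, 0.9, 0],
        [0, 0, 0, 0.6, 1.2, 5.9, 25.5, 66.8],
        [1.1, 19.8, 51.8, 22.4, 4.3, 0.3, 0, 0],
        [20.5, 49.3, 23.4, 3.6, 0.1, 0, 0, 0],
        [0, 0.1, 0.2, 5.3, 27.0, 43.7, 19.2, 4.5],
        [0, 0, 0, 0.3, 3.0, 18.0, 54.0, 24.7],
        [0, 0, 1.4, 10.4, 28.8, 46.8, 12.1, 0.5],
    ],
    "Bottom": [
        [0.3, 17.2, 69.7, 11.7, 0.9, 0, 0, 0],
        [0, 0, 0.3, 9.0, 66.1, 23.3, 1.3, 0],
        [0, 0, 0.1, 1.2, 3.3, 11.8, 25.2, 58.4],
        [1.7, 21.8, 48.3, 22.9, 4.9, 0.4, 0, 0],
        [21.7, 47.2, 22.7, 3.7, 0.3, 0, 0, 0],
        [0, 0, 0.3, 6.2, 32.9, 41.9, 15.3, 3.4],
        [0, 0, 0, 0.6, 2.9, 22.2, 53.3, 21.0],
        [0, 0, 1.8, 11.9, 32.6, 42.0, 11.2, 0.5],
    ],
}

# Average patients per dose for the Top skeletons in Scenario 1.
PTS_TOP_S1 = [4.0, 6.9, 13.2, 5.2, 0.7, 0, 0, 0]


def mtd_index(rates, target=TARGET):
    """0-based index of the dose whose toxicity rate is closest to the target."""
    return min(range(len(rates)), key=lambda j: abs(rates[j] - target))


def average_pcs(sel_rows):
    """Mean selection percentage of the true MTD across all scenarios."""
    hits = [row[mtd_index(rates)] for row, rates in zip(sel_rows, TOX)]
    return sum(hits) / len(hits)


def expected_toxicities(patients, rates):
    """Expected number of toxicities given patients treated per dose."""
    return sum(n * p for n, p in zip(patients, rates))


if __name__ == "__main__":
    for name, rows in SEL.items():
        print(name, round(average_pcs(rows), 1))
    print(round(expected_toxicities(PTS_TOP_S1, TOX[0]), 1))
```

Under these assumptions the average PCS is about 58.2 for the top skeletons, 56.0 for the middle, and 53.4 for the bottom, with most of the gap arising in Scenario 3. The expected-toxicity check for the Top row of Scenario 1 gives about 8.5, matching the last column of that row.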
4 Conclusion
The BMA-CRM extends the CRM and improves the robustness of the design by specifying multiple skeletons and then using Bayesian model averaging to automatically favor the best-fitting model for dose finding. The major difficulty for practitioners using the BMA-CRM is the requirement to specify multiple skeletons. To overcome this difficulty, we propose a default, automatic method that helps practitioners specify multiple skeletons for the BMA-CRM. We define a measure to gauge the difference among skeletons and, based on that measure, develop a model calibration method to select the optimal skeletons. The simulation studies show that the proposed method yields robust operating characteristics.
To facilitate the use of the method in practice, we provide an R function that automatically generates the three skeletons to be used with the BMA-CRM. To obtain the optimal skeletons, users only need to provide a set of representative dose-toxicity scenarios. Ideally, these scenarios should be specified in consultation with physicians and should cover various cases in terms of the true location of the MTD and the shape of the dose-toxicity curve. We are in the process of incorporating the proposed method into the existing BMA-CRM software. Because the existing software already has a module for generating operating characteristics, which requires the user to enter representative dose-toxicity scenarios, the proposed method will not impose an extra burden on users and can be seamlessly incorporated into the existing software.
Acknowledgments
The authors thank the editor and two reviewers for very insightful and constructive comments that substantially improved the article. Pan’s research was partially supported by Research Grant 81302513 from the National Science Foundation of China, China Postdoctoral Science Foundation funded project of 2014M562601, Natural Science Basic Research Plan of Shaanxi Province of China (2015JM8405) and Scientific Research Project of Education Department of Shaanxi Province (Grant 15JK1275). Yuan’s research was partially supported by grants CA154591, CA016672, and 5P50CA098258 from the National Cancer Institute.
Appendix Proof of Theorem 3
Proof
Because of the property of transitivity of equivalency between skeletons (i.e., Theorem 2), we only need to show that the skeletons are equivalent for v = v1 and v = v1 + 1.
According to the algorithm by Lee & Cheung (2009) in their paper,
For v = v1,
For v = v1 + 1:
Given any s, if s ∈ [v1, ⋯, k−1], we have
Similarly, if s ∈ [2, v1], we have
Thus, for any two skeletons generated by the method of Lee and Cheung, the ratio of the logarithms of the two skeletons is constant across dose levels. That is, the two skeletons are equivalent according to Definition 1 in subsection 2.3, irrespective of the prior guess of the MTD location.
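The conclusion of the proof can be checked numerically. The sketch below assumes that the Lee and Cheung recursion for the power model takes the form log p_{j+1} = log p_j × log(θ + δ) / log(θ − δ), anchored at p_v = θ, where θ is the target toxicity rate, δ is the half-width of the indifference interval, and v is the prior guess of the MTD location. This form, the function names, and the values θ = 0.30 and δ = 0.05 are assumptions made for illustration and are not taken from the text above.

```python
import math


def lee_cheung_skeleton(target, delta, nu, n_doses):
    """Skeleton anchored at dose nu (1-based), built with the assumed recursion."""
    step = math.log(target + delta) / math.log(target - delta)
    log_p = [0.0] * n_doses
    log_p[nu - 1] = math.log(target)      # prior MTD guess gets the target rate
    for j in range(nu, n_doses):          # move up from the anchor dose
        log_p[j] = log_p[j - 1] * step
    for j in range(nu - 2, -1, -1):       # move down from the anchor dose
        log_p[j] = log_p[j + 1] / step
    return [math.exp(v) for v in log_p]


def log_ratios(skeleton_a, skeleton_b):
    """Dose-by-dose ratio of log toxicity probabilities of two skeletons."""
    return [math.log(a) / math.log(b) for a, b in zip(skeleton_a, skeleton_b)]


if __name__ == "__main__":
    theta, delta, k = 0.30, 0.05, 8       # hypothetical design values
    for v1 in range(1, k):
        s1 = lee_cheung_skeleton(theta, delta, v1, k)
        s2 = lee_cheung_skeleton(theta, delta, v1 + 1, k)
        ratios = log_ratios(s1, s2)
        # Equivalence in the sense of Definition 1: the ratio does not vary by dose.
        print(v1, max(ratios) - min(ratios) < 1e-9)
```

Under the assumed recursion the log ratio equals log(θ + δ) / log(θ − δ) at every dose level for any adjacent pair v1 and v1 + 1, so by transitivity all skeletons generated with the same δ are equivalent regardless of v.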
Footnotes
This research was performed in partial fulfillment of the requirements for the Ph.D degrees from The University of Texas Graduate School of Biomedical Sciences at Houston; The University of Texas MD Anderson Cancer Center, Houston, Texas 77030
Conflict of Interest
None declared.
References
1. Storer BE. Design and Analysis of Phase I Clinical Trials. Biometrics. 1989;45:925–937.
2. O’Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase 1 clinical trials in cancer. Biometrics. 1990;46:33–48.
3. Whitehead J, Brunier H. Bayesian Decision Procedures for Dose Determining Experiments. Statistics in Medicine. 1995;14:885–893. doi: 10.1002/sim.4780140904.
4. Babb J, Rogatko A, Zacks S. Cancer Phase I Clinical Trials: Efficient Dose Escalation With Overdose Control. Statistics in Medicine. 1998;17:1103–1120. doi: 10.1002/(sici)1097-0258(19980530)17:10<1103::aid-sim793>3.0.co;2-9.
5. Leung D, Wang YG. An improved up-and-down design for Phase I trials. Controlled Clinical Trials. 2001;22:126–138. doi: 10.1016/s0197-2456(00)00132-x.
6. Stylianou M, Flournoy N. Dose Finding Using the Biased Coin Up-and-down Design and Isotonic Regression. Biometrics. 2002;58:171–177. doi: 10.1111/j.0006-341x.2002.00171.x.
7. Cheung Y. Sequential implementation of stepwise procedures for identifying the maximum tolerated dose. Journal of the American Statistical Association. 2007;102:1448–1461.
8. Ji Y, Li Y, Bekele BN. Dose-finding in phase I clinical trials based on toxicity probability interval. Clinical Trials. 2007;4:235–244. doi: 10.1177/1740774507079442.
9. Liu S, Yuan Y. Bayesian Optimal Interval Designs for Phase I Clinical Trials. Journal of the Royal Statistical Society: Series C. 2015;64(3):507–523.
10. Shen L, O’Quigley J. Consistency of Continual Reassessment Method Under Model Misspecification. Biometrika. 1996;83:395–405.
11. Yin G, Yuan Y. Bayesian model averaging continual reassessment method in phase I clinical trials. Journal of the American Statistical Association. 2009;104:954–968.
12. Daimon T, Zohar S, O’Quigley J. Posterior maximization and averaging for Bayesian working model choice in the continual reassessment method. Statistics in Medicine. 2011;30:1563–1573. doi: 10.1002/sim.4054.
13. Asakawa T, Hirakawa A, Hamada C. Bayesian model averaging continual reassessment method for bivariate binary efficacy and toxicity outcomes in phase I oncology trials. Journal of Biopharmaceutical Statistics. 2014;24:310–325. doi: 10.1080/10543406.2013.863779.
14. Lee SM, Cheung YK. Model calibration in the continual reassessment method. Clinical Trials. 2009;6(3):227–238. doi: 10.1177/1740774509105076.
15. Cheung YK. Dose Finding by the Continual Reassessment Method. Boca Raton: Chapman & Hall; 2011.
16. Jia X, Lee SM, Cheung YK. Characterization of the likelihood continual reassessment method. Biometrika. 2014;101:599–612.
17. Weisberg S. Applied Linear Regression. 3rd ed. Hoboken, New Jersey: John Wiley & Sons, Inc; 2005.




