Abstract
The statistical problem of bridging is closely associated with the problem of heterogeneity in dose-finding studies. There are some distinctive features in the case of bridging which need to be considered if efficient estimation of the maximum tolerated dose (MTD) is to be accomplished. The case of two distinct populations is considered. Extensions to several populations are, at least in principle, straightforward although, in practice, likely to be awkward and infrequently encountered. The goal is to make efficient use of information gained in one study in the context of a second study. Since working models are typically misspecified it is not possible to just add a further parameter to deal with an added source of variability.
1. Dose finding for heterogeneous groups
Assume that we have available k ordered doses; d1 < d2 < ⋯ < dk. The probability of toxicity, R(di), is such that R(di) < R(dj) whenever i < j. The maximum tolerated dose (MTD) is that dose d0 ∈ {d1, …, dk } such that, for some given θ ∈ (0, 1),
$$|R(d_0) - \theta| < |R(d_i) - \theta|, \qquad i = 1, \ldots, k;\ d_i \neq d_0. \tag{1}$$
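As a small illustration of this definition, the sketch below picks, from a vector of toxicity probabilities, the level closest to the target θ. The probabilities and the name `mtd_index` are hypothetical and serve only to make the definition concrete.

```python
def mtd_index(tox_probs, theta):
    """Return the 0-based index of the dose whose toxicity probability is closest to theta."""
    distances = [abs(r - theta) for r in tox_probs]
    return distances.index(min(distances))


# Hypothetical true toxicity probabilities R(d1), ..., R(d6), increasing with dose.
R = [0.03, 0.08, 0.22, 0.35, 0.55, 0.70]
mtd = mtd_index(R, theta=0.20)  # index 2, i.e. dose d3
```

In practice R is unknown, which is why the estimates of R(di) developed in the rest of this section are needed.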
Yj takes the value 1 in the case of a toxic response for the jth entered subject (j = 1, …, n) and 0 otherwise. The dose for the jth entered subject, Xj, is viewed as random, taking values xj ∈ {d1, …, dk}; j = 1, …, n. Thus Pr(Yj = 1 | Xj = xj) = R(xj). We suppose that
$$R(x_j) = \psi(x_j, a) \tag{2}$$
for some one-parameter model ψ(xj, a) and a defined on the set 𝒜. For every a, ψ(x, a) should be monotone increasing in x and, for any x, ψ(x, a) should be monotone in a. For every di there exists some ai ∈ 𝒜 such that R(di) = ψ(di, ai). Here, we take $\psi(d_i, a) = \alpha_i^{\exp(a)}$, where i runs from 1 to k, and where 0 < α1 < ⋯ < αk < 1 and −∞ < a < ∞.
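The following sketch illustrates the requirement that, for each level, some value ai reproduces the true toxicity probability. It assumes the power form αi^exp(a) that is commonly used as a working model for the continual reassessment method. The α values are the six-level working model used in the simulations of [3]; the "true" probabilities and the function names are hypothetical.

```python
import math


def psi(alpha_i, a):
    """Assumed power working model: psi(d_i, a) = alpha_i ** exp(a)."""
    return alpha_i ** math.exp(a)


def matching_a(alpha_i, r_i):
    """Value a_i solving psi(d_i, a_i) = r_i, for 0 < alpha_i < 1 and 0 < r_i < 1."""
    return math.log(math.log(r_i) / math.log(alpha_i))


skeleton = [0.05, 0.10, 0.20, 0.30, 0.50, 0.70]   # alpha_1, ..., alpha_6 from [3]
true_R = [0.03, 0.08, 0.22, 0.35, 0.55, 0.70]     # hypothetical R(d_1), ..., R(d_6)

# One a_i per level; they generally differ, so the working model is misspecified
# overall even though it can match the truth at any single level.
a_values = [matching_a(alpha, r) for alpha, r in zip(skeleton, true_R)]
```

Because the ai differ from level to level, a single value of a cannot fit every level at once; the model only needs to be accurate near the target.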
The true mechanism generating the observations can be quite removed from our working model overall but, close to our target, the true situation and our working model coincide. For the six levels studied in the simulations by O’Quigley, Pepe and Fisher [3], the working model had α1 = 0.05, α2 = 0.10, α3 = 0.20, α4 = 0.30, α5 = 0.50 and α6 = 0.70. Once a model has been chosen and we have data in the form of the set Ωj = {y1, x1, …, yj, xj}, the outcomes of the first j experiments, we obtain estimates of the true unknown probabilities R(di) (i = 1, …, k) at the k dose levels. Given this information, a posterior distribution for a is denoted by f(a | Ωj), from which we induce a posterior distribution for ψ(di, a), i = 1, …, k, leading to summary estimates of the toxicity probabilities at each level. Specifically,
$$\hat{R}(d_i) = \int \psi(d_i, u)\, f(u \mid \Omega_j)\, du, \qquad i = 1, \ldots, k. \tag{3}$$
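We can take the starting dose x1 to satisfy $\int \psi(x_1, u)\, g(u)\, du = \theta$, where g(a) denotes a prior density for a; this is an integral equation providing the starting dose. Given the set Ωj and the log-likelihood $L_j(a)$, the posterior density for a is:
$$f(a \mid \Omega_j) = \frac{\exp\{L_j(a)\}\, g(a)}{\int \exp\{L_j(u)\}\, g(u)\, du}. \tag{4}$$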
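A minimal numerical sketch of the posterior computation is given below. It assumes the power working model with the α values quoted from [3], a standard normal prior for a (an assumption, since no particular prior is fixed here), and a simple Riemann sum on a grid in place of exact integration. The data and all function names are hypothetical.

```python
import math

# Six-level working model quoted in the text from [3].
SKELETON = [0.05, 0.10, 0.20, 0.30, 0.50, 0.70]


def psi(level, a):
    """Assumed power working model at 0-based dose level `level`."""
    return SKELETON[level] ** math.exp(a)


def log_lik(a, data):
    """Bernoulli log-likelihood; `data` is a list of (level, y) pairs with y in {0, 1}."""
    total = 0.0
    for level, y in data:
        p = psi(level, a)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def prior(a):
    """Standard normal prior density for a (an assumption, not taken from the text)."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def posterior_tox(data, lo=-5.0, hi=5.0, n=2001):
    """Posterior-mean toxicity at every level, by a Riemann sum over a grid of a."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    # Unnormalised posterior exp{log-likelihood} * prior; the grid step cancels in the ratio.
    weights = [math.exp(log_lik(a, data)) * prior(a) for a in grid]
    norm = sum(weights)
    return [sum(w * psi(i, a) for w, a in zip(weights, grid)) / norm
            for i in range(len(SKELETON))]


# Hypothetical data: two non-toxic responses at level 3, one toxicity at level 4.
observed = [(2, 0), (2, 0), (3, 1)]
estimates = posterior_tox(observed)
```

With no data the function returns the prior-mean toxicities, so the same routine can be used to inspect what a given prior implies for the starting dose.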
The dose xj+1 ∈ {d1, …, dk} assigned to the (j + 1)th included patient is the dose minimizing the Euclidean distance between θ and ∫ψ{xj+1, u} f (u|Ωj)du. We can extend the single model to a class of models ψm(xj, a) for m = 1, …, M where there are a total of M possible models. In particular, we might consider [2]
$$\psi_m(d_i, a) = \alpha_{mi}^{\exp(a)}, \qquad m = 1, \ldots, M;\ i = 1, \ldots, k, \tag{5}$$
where 0 < αm1 < ⋯ < αmk < 1 and −∞ < a < ∞, as an immediate generalization of the single model. Any prior information concerning the plausibility of each model is denoted π(m), m = 1, …, M, where π(m) ⩾ 0 and where Σm π(m) = 1. In the simplest case, where each model is weighted equally, we would take π(m) = 1/M. The logarithm of the likelihood is, up to a constant term:
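$$L_{mj}(a) = \sum_{\ell=1}^{j} y_\ell \log \psi_m(x_\ell, a) + \sum_{\ell=1}^{j} (1 - y_\ell) \log\{1 - \psi_m(x_\ell, a)\}. \tag{6}$$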
Under model m, we obtain a summary value of the parameter a, in particular the posterior mode, and we refer to this as $\hat{a}_{mj}$. Given the value of $\hat{a}_{mj}$ under model m, we have an estimate of the probability of toxicity at each dose level di via $\hat{R}(d_i) = \psi_m(d_i, \hat{a}_{mj})$ (i = 1, …, k). On the basis of this formula, and having taken some value for m, the dose to be given to the (j + 1)th patient, xj+1, is determined. Thus, we need some value for m, and we make use of the posterior probabilities of the models given Ωj. Denoting these posterior probabilities by π(m|Ωj), then:
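$$\pi(m \mid \Omega_j) = \frac{\pi(m) \int \exp\{L_{mj}(u)\}\, g(u)\, du}{\sum_{m'=1}^{M} \pi(m') \int \exp\{L_{m'j}(u)\}\, g(u)\, du}, \tag{7}$$
where g(a) denotes the prior density for a and $L_{mj}(a)$ the log-likelihood (6) under model m.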
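The steps above (posterior model probabilities, a summary value of a under a chosen model, and the next dose) can be sketched as follows. Several choices here are assumptions rather than statements from the text: the power working model, a standard normal prior for a, grid integration, the second set of α values, and the rule of taking the most probable model as the value of m.

```python
import math

SKELETONS = [
    [0.05, 0.10, 0.20, 0.30, 0.50, 0.70],  # model 1: working model quoted from [3]
    [0.10, 0.20, 0.30, 0.50, 0.70, 0.80],  # model 2: hypothetical alternative
]
GRID = [-5.0 + 10.0 * i / 2000 for i in range(2001)]  # grid of values for a


def prior(a):
    """Standard normal prior density for a (an assumption)."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def lik(a, skeleton, data):
    """Bernoulli likelihood; `data` is a list of (0-based level, y) pairs."""
    value = 1.0
    for level, y in data:
        p = skeleton[level] ** math.exp(a)
        value *= p if y == 1 else 1.0 - p
    return value


def next_dose(data, theta, model_prior):
    """Return (posterior model probabilities, chosen model, next dose index)."""
    # Unnormalised posterior of a on the grid, one list per model.
    post = [[lik(a, s, data) * prior(a) for a in GRID] for s in SKELETONS]
    # The common grid step cancels when the model probabilities are normalised.
    evidence = [pm * sum(p) for pm, p in zip(model_prior, post)]
    total = sum(evidence)
    model_post = [e / total for e in evidence]
    m = model_post.index(max(model_post))        # assumed rule: most probable model
    a_hat = GRID[post[m].index(max(post[m]))]    # posterior mode of a under model m
    est = [alpha ** math.exp(a_hat) for alpha in SKELETONS[m]]
    dist = [abs(p - theta) for p in est]
    return model_post, m, dist.index(min(dist))


probs0, m0, dose0 = next_dose([], 0.20, [0.5, 0.5])
probs1, m1, dose1 = next_dose([(2, 1), (2, 1), (1, 1)], 0.20, [0.5, 0.5])
```

Other ways of using π(m|Ωj), such as averaging the toxicity estimates over models, would fit the same skeleton of code; the text does not commit to one.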
One useful class of models, studied in [1] and [6] in the case that M = 2, takes the model for one reference group as ψ(di, a) and, for the second group, considers a shift in the index i, so that i′ for the second group can be expressed via i′ = i + t. When there are no group effects, t = 0; a value t = 1, for instance, means that the second group shares the same probabilities of toxicity as the first but “shifted” by one level. For this reason the model is sometimes called the shift model. More generally, we write:
| (8) |
where
| (9) |
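To make the idea of an index shift concrete, the sketch below builds the working-model values for the second group from those of the reference group by shifting the level index by t. How indices that fall outside 1, …, k are handled is not recoverable from the text; clamping them to the nearest end level is an assumption made here, as is the function name.

```python
def shifted_skeleton(skeleton, t):
    """Working-model values for the second group when its dose index is shifted by t levels.

    Indices falling outside the available levels are clamped to the nearest
    end level (an assumption made for this sketch).
    """
    k = len(skeleton)
    return [skeleton[min(max(i + t, 0), k - 1)] for i in range(k)]


reference = [0.05, 0.10, 0.20, 0.30, 0.50, 0.70]  # working model quoted from [3]
no_shift = shifted_skeleton(reference, 0)   # identical to the reference group
one_up = shifted_skeleton(reference, 1)     # second group behaves one level higher
one_down = shifted_skeleton(reference, -1)  # second group behaves one level lower
```

Each candidate shift t then defines one model in the class ψm, so the machinery for posterior model probabilities applies directly.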
In the case that M = 2 a straightforward extension of condition 7 of [1], applied to the groups separately, allows us to claim consistency of the model in terms of correct identification of the MTD for large samples.
2. Efficiency gains
Alternatively, in place of Δ, we include a regression coefficient b. We have n1 observations in the first group, n2 in the second, so that n = n1 + n2. The pair then maximizes:
Next, the pair maximizes separately:
We need ψ1(⋅) in the calculation of both and . Introduce:
and
Write their first-order partial derivatives as:
The pair satisfies the following estimating equations:
| (10) |
whereas satisfies:
| (11) |
The variances of and indicate relative design efficiency. Under the model restrictions given in [5], there exists a pair (a0, b0) such that:
| (12) |
Assuming the shift model to be correct then the dose-toxicity curve for the second group is given by ψ1(x, a0) whereas the actual dose-toxicity curve for the group is ψ2(x, a0, b0). However, our conclusion will not be altered, since we are comparing the performance of and only and the outcome for group 2 has no effect on the outcome for group 1 in this case. Under weak conditions we have that converges almost surely to (a0, b0) and almost surely to . The technical details follow those in [4]. For brevity we drop the arguments (xi, yi, a0) or (xi, yi, a0, b0) wherever it is possible.
Theorem 1
Assume condition 7 of [1] is satisfied. When both n1 and n2 become large, the asymptotic variances of and can then be approximated by:
| (13) |
and
| (14) |
The above formulas are evaluated at (a0, b0).
Corollary 1
Under the conditions of Theorem 1, the asymptotic variance of can be approximated by , where s11 is calculated at a0.
To compare the performance of and we apply Cauchy–Schwarz’s inequality:
Hence, it follows from (13) that . Moreover, a sufficient condition for their equality is that there is a scalar κ such that:
| (15) |
Therefore the variance of is smaller than that of unless (15) is satisfied. We could constrain the difference to be strictly greater than zero, corresponding to the case where we know that, should any difference exist, it can only be in a given direction; a well-known example being heavily pre-treated and lightly pre-treated patients. The MTD for the heavily pre-treated patients will be no higher than that for the lightly pre-treated patients. This amounts to incorporating into the design known features of the experiment and the idea is easily extended. For example, we may not only know the only possible direction any difference may take but we may also be able to say something about the size of the difference. In the case of heavily versus lightly pre-treated patients, a frequently encountered situation, the definition of the two groups is not usually very sharp so that, if a difference should exist, it may not be that great. The possible differences could be, say, either zero, one level or no more than two levels. Expressed in this way it immediately opens the possibility of bringing prior information, or prior opinion, on board and that is readily done without impact on the efficiency findings discussed here.
The estimating equation providing the estimate , is derived from the likelihood for n patients, and can be expressed equivalently as a sum over the dose levels as
| (16) |
where
and H(s) = I(s ≠ 0), i.e., a function taking the value 1 when s is not equal to 0, and is zero otherwise and 0/0 is defined as being equal to 1. We obtain as the zero of the above estimating equation and, at that value, the inverse of −∂Ui,n(a)/∂a provides an estimate of the variance denoted . Finally, it is worth noting that bridging is not only a question of efficiency. Sometimes there can be so few patients in the second study, when compared to the first, that the only way to make progress is to, in some sense, piggyback the second study on to the first. We then use as much information as needed to create enough structure to make inferences feasible in the smaller study.
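The likelihood-based estimation step described earlier in this section (an estimate obtained as the zero of an estimating equation, with a variance estimate from the inverse of minus its derivative) can be sketched for a single sample as follows. This is not the estimating equation (16) itself: it is the ordinary score of the Bernoulli log-likelihood under an assumed power working model, solved by bisection, with a numerical derivative. The data and function names are hypothetical.

```python
import math

SKELETON = [0.05, 0.10, 0.20, 0.30, 0.50, 0.70]  # working model quoted from [3]


def score(a, data):
    """Derivative in a of the Bernoulli log-likelihood under the power model."""
    total = 0.0
    for level, y in data:
        log_psi = math.exp(a) * math.log(SKELETON[level])  # log psi(d, a)
        p = math.exp(log_psi)
        # d/da log psi = log psi;  d/da log(1 - psi) = -psi * log psi / (1 - psi)
        total += log_psi if y == 1 else -p * log_psi / (1.0 - p)
    return total


def solve_score(data, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection for the zero of the score; needs at least one toxicity and one non-toxicity."""
    if score(lo, data) <= 0.0 or score(hi, data) >= 0.0:
        raise ValueError("score does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_estimate(a_hat, data, h=1e-5):
    """Inverse of minus the numerically differentiated score at a_hat."""
    slope = (score(a_hat + h, data) - score(a_hat - h, data)) / (2.0 * h)
    return -1.0 / slope


# Hypothetical responses as (0-based level, y) pairs.
data = [(2, 0), (2, 0), (3, 1), (2, 0), (3, 0), (3, 1)]
a_hat = solve_score(data)
var_hat = variance_estimate(a_hat, data)
```

The bracket check reflects the fact that the maximum likelihood estimate exists only once the data contain both a toxic and a non-toxic response.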
Acknowledgements
I would like to acknowledge the reviewers for picking up errors and the editor for suggestions in helping make a more sharply focused presentation.
References
- [1] O’Quigley J, A theoretical study of the continual reassessment method, J. Stat. Plan. Infer. 136 (2006) 1765–1780.
- [2] O’Quigley J, Conaway M, Continual reassessment and related dose-finding designs, Stat. Sci. 25 (2010) 202–216.
- [3] O’Quigley J, Pepe M, Fisher L, Continual reassessment method: a practical design for Phase 1 clinical studies in cancer, Biometrics 46 (1) (1990) 33–48.
- [4] O’Quigley J, Shen LZ, Continual reassessment method: a likelihood approach, Biometrics 52 (1996) 163–174.
- [5] O’Quigley J, Shen L, Gamst A, Two sample continual reassessment method, J. Biopharm. Stat. 9 (1999) 17–44.
- [6] Shu J, Continual reassessment designs in the presence of population heterogeneity, PhD thesis, Dept. of Statistics, University of Virginia, USA, 2012.
