Abstract
PURPOSE
Despite the fact that almost any sample of patients with a particular disease is heterogeneous, most clinical trial designs ignore the possibility that treatment or dose effects may differ between prognostic or biologically defined subgroups. This article reviews two clinical trial designs that make subgroup-specific decisions and compares each to a simpler design that ignores patient heterogeneity. The purpose is to illustrate the benefits of accounting prospectively for treatment-subgroup interactions and how utilities may be used to quantify risk-benefit trade-offs.
METHODS
Two Bayesian clinical trial designs that perform subgroup-specific decision making and inference based on elicited utilities of patient outcomes are reviewed. The first is a randomized comparative trial of nutritional prehabilitation for patients undergoing esophageal resection that has two prognostic subgroups and is based on postoperative morbidity score. The second is a sequentially adaptive trial of natural killer cells for treating hematologic malignancies that is based on five time-to-event outcomes and that performs safety monitoring and optimizes cell dose within six disease subgroups. Computer simulations under a range of different scenarios are presented for each design to establish its operating characteristics and compare it to a more conventional design that ignores patient heterogeneity.
RESULTS
Each design has attractive operating characteristics, is greatly superior to a simplified design that ignores patient subgroups, is robust to deviations from its assumed statistical model, and is feasible to use for conducting trials.
CONCLUSION
Bayesian designs that make subgroup-specific decisions in randomized comparative trials or sequentially adaptive early-phase dose-finding trials are superior to designs that ignore patient heterogeneity. Using elicited utilities of complex patient outcomes to quantify risk-benefit trade-offs provides a practical and ethical basis for decision making and treatment evaluation in clinical trials.
INTRODUCTION
By definition, good- and poor-prognosis patients have different response rates and mean survival times. Most clinical trial designs ignore this fact. Similarly, qualitatively different disease subtypes or biologically defined subgroups typically are ignored by the decision rules and hypotheses of conventional trial designs, aside from possibly stratifying on subgroups when randomly assigning patients. Ignoring the possibility of treatment-subgroup interactions by making one-size-fits-all decisions and inferences runs the risk of making incorrect conclusions within important patient subgroups. This problem persists in all phases of conventional trial designs, from early-phase dose-finding to randomized comparative trials.
To illustrate the problem, consider a prospective randomized trial in patients with relapsed or refractory leukemia, with the aim of evaluating the effect on overall survival (OS) of adding a new targeted agent, X, to standard chemotherapy. Suppose that the historical mean OS with chemotherapy is μold = 15 months in older patients (age 60 years or older) and μyoung = 24 months in younger patients (age younger than 60 years), with two thirds of patients being older than age 60 years and one third being younger. A group sequential design comparing chemotherapy + X with chemotherapy constructed on the basis of the average mean OS, μavg = (2/3) μold + (1/3) μyoung, would have null mean OS μavg0 = (2/3)15 + (1/3)24 = 18 months. Suppose a design is constructed to target an improvement of 33% from μavg0 = 18 to μavg* = 24 months, equivalently, a targeted hazard ratio of .75 in favor of chemotherapy + X. To see what could go wrong, notice that the average target μavg* = 24 months for chemotherapy + X equals the null historical mean with chemotherapy alone in younger patients. Suppose that, in fact, adding X to chemotherapy decreases mean OS in older patients by 10% to 13.5 months but increases mean OS in younger patients by 30% to 31.2 months, that is, there is a treatment-age subgroup interaction. The overall mean OS with chemotherapy + X then would be (2/3)13.5 + (1/3)31.2 = 19.4 months, well below the target μavg* = 24 months and close to the null value μavg0 = 18 months. Thus, a group sequential test based on average mean OS would be likely to conclude that adding X to chemotherapy fails to improve OS, thus making a false-negative conclusion in younger patients. Although post-trial data analyses accounting for age may show that chemotherapy + X is superior in younger patients, such an inference may be dismissed as post hoc data dredging, that is, searching the data set for subgroups in which the X effect is nominally significant. Arguing for the future use of chemotherapy + X in younger patients thus would be problematic, and it probably would be necessary to repeat the trial, if possible.
CONTEXT
Key Objective
An important question in designing and conducting clinical trials is whether and how a trial design should account explicitly and prospectively for patient heterogeneity when making treatment assignment decisions and reaching final conclusions. The two designs reviewed in this article illustrate how a Bayesian utility-based framework can be used to construct practical designs that account for patient heterogeneity.
Knowledge Generated
Each of the two designs—one is a randomized comparative trial and the other is a complex dose-finding trial—is a novel methodology that has been published in the statistical literature. An aim of this review is to make members of the medical community aware that such designs exist so that they may consider taking this sort of approach in their own trials.
Relevance
The Bayesian utility-based approach may be used to construct designs for a wide variety of clinical trials that account for patient heterogeneity by making subgroup-specific, personalized treatment decisions. In the presence of patient heterogeneity, this approach refines treatment development and evaluation and thus is more likely to protect safety and provide benefit for both the patients enrolled in the trial and future patients.
Similar problems arise in early-phase trials in which the goal is to identify an optimal dose, dose pair, or (dose, schedule) combination for a new agent or treatment combination at the start of clinical evaluation. It has been well established that choosing a dose in a phase I trial on the basis of toxicity alone is a highly flawed convention that can easily lead to choosing an unsafe or ineffective dose.1,2 As a first example of this problem, consider a trial to optimize the dose d among five levels (1, 2, 3, 4, 5). Suppose the respective toxicity probabilities are (.10, .15, .30, .40, .55) and the response probabilities are (.00, .01, .01, .02, .02). If the continual reassessment method (CRM)3 is used to choose a dose with Pr(toxicity) closest to .30, then d = 3 is most likely to be chosen, whereas a 3+3 algorithm is most likely to choose either d = 2 or d = 3. But because no dose has any substantive anti-disease effect, choosing a maximum tolerated dose (MTD) with either method is a severe error, simply because both phase I methods are based only on toxicity. As a second example, if the toxicity probabilities are (.10, .18, .30, .38, .45) and the response probabilities are (.10, .15, .30, .60, .65), then the CRM with target Pr(toxicity) = .30 will be most likely to choose d = 3, but d = 4 has toxicity probability only .08 larger than that of d = 3, whereas it doubles the response rate from .30 to .60. These examples show that the convention of choosing the dose of a new agent on the basis of toxicity while ignoring response is dysfunctional.
In contrast, phase I/II designs1,2,4 determine an optimal dose based on both R = response and T = toxicity. For example, if both outcomes are binary, then one may assign the numerical utilities U(R, No T) = 100, U(R, T) = 70, U(No R, No T) = 40, U(No R, T) = 0 to the four possible elementary outcomes. In practice, the best and worst outcomes are assigned respective utilities of 100 and 0, and numerical utilities between these two values for intermediate outcomes are elicited from the investigators planning the trial. Suppose that the probabilities of these four events at a dose d in G = good-prognosis patients are Pr(R, No T|d, G) = .30; Pr(R, T|d, G) = .20; Pr(No R, No T|d, G) = .40; Pr(No R, T|d, G) = .10. Then the marginal probabilities of R and T at dose d in subgroup G are Pr(R|d, G) = .50 and Pr(T|d, G) = .30, and the mean utility of d in G is the average Uavg(d, G) = .30 × 100 + .20 × 70 + .40 × 40 + .10 × 0 = 60. During trial conduct, the posterior mean of Uavg(d, G) is computed for each d, and the dose giving the largest value is selected for the next good-prognosis patient enrolled. Similarly, if the corresponding elementary outcome probabilities for subgroup P are .10, .30, .30, .30, then Pr(R|d, P) = .40 and Pr(T|d, P) = .60, and the mean utility of dose d in subgroup P is Uavg(d, P) = .10 × 100 + .30 × 70 + .30 × 40 + .30 × 0 = 43. But if the prognostic subgroups are ignored, assuming that Pr(Good) = Pr(Poor) = .50, then the average mean utility of dose d would be 51.5, which understates its benefit in good-prognosis patients and overstates its benefit in poor-prognosis patients.
EVALUATING PRESURGERY NUTRITIONAL PREHABILITATION
This section reviews a Bayesian design for a randomized comparative group sequential trial to compare nutritional prehabilitation (N) with standard of care (C) for patients with esophageal cancer undergoing chemoradiation and surgery.5,6 The trial enrolls both primary (P) patients who have not been treated previously and salvage (S) patients whose disease has recurred after previous therapy. The outcome is Y = Clavien-Dindo postoperative morbidity (POM),7-10 scored within 30 days of surgery, a six-level ordinal variable with possible integer values from 0 (normal recovery) to 5 (death). The design is based on elicited numerical utilities of these outcomes (Table 1), which show that lower POM values are much more desirable. Notice that dichotomizing POM score as high = [Y ≤ 3] versus low = [Y ≤ 2] for convenience would discard valuable information. Moreover, because differences in utilities between successive POM levels are far from being equal, using the raw scores 0, 1, 2, 3, 4, 5 would also be a misleading way to quantify POM.
TABLE 1.
The design accounts for patient heterogeneity by allowing possibly different conclusions within the subgroups P and S when comparing treatment N to C. A group sequential testing procedure is based on the posterior utilities for the POM score computed under an assumed Bayesian probability model for Y. A key property of the model is that all treatment and subgroup effect parameters vary with the level of POM score, so it is a robust generalization of the conventional proportional odds model.11 The model allows the magnitudes of the N-versus-C effects on Y to differ between subgroups and is formulated to borrow strength between subgroups, with prior location parameters determined from elicited information on POM score. Details are provided in Murray et al.6
The trial will enroll a maximum of 200 patients, with an interim comparative test of the N-versus-S effect on POM score after n = 100 patients have been treated and evaluated and possibly a final test at n = 200 within each subgroup P and S. The design allows different conclusions to be made within the prognostic subgroups. Denoting the utility of POM score y in Table 1 by U(y), the mean utility of treatment Z = N or S in subgroup g = P or S is the probability-weighted average
The subgroup-specific comparative tests, computed under the Bayesian model, conclude that N is superior to C in subgroup g = P or S if
and conclude that C is superior to N in subgroup g if
for n = 100 or n = 200. The numerical cutoffs pcut(100) = .997 and pcut(200) = .976 were chosen by conducting preliminary computer simulations to control the overall within-subgroup type I error probability of the group sequential test procedure to be ≤ .025. The design targets mean utility differences of 89 – 75.5 = 13.5 in primary patients and 81.3 – 60 = 21.3 in salvage patients. These differences correspond to a reduction in Pr(Y ≥ 3) from 0.20 to 0.08 for primary patients and from 0.35 to 0.14 for salvage patients. As a comparator, a conventional model-based design that ignores subgroups and makes one-size-fits-all treatment comparisons was also considered. To ensure balance and comparability with either design, patients were randomly assigned in blocks of four, so for each block of four patients within each prognostic subgroup, two receive nuprehab and two receive the control.
The operating characteristics of the two designs are summarized in Table 2. For example, under treatment-subgroup interaction scenario 3, nuprehab achieves an improvement of 13.5 in POM score mean utility in the primary patients, but there is no difference between the two treatments in the salvage patients.
TABLE 2.
Both designs control the type I error in scenario 1 Null/Null. In scenario 2, where N is truly superior to C in salvage patients but the two treatments are equivalent in primary patients, the subgroup-specific design has power .808 in salvage patients and type I error probability .029 in primary patients. In sharp contrast, the conventional design has power .440; equivalently, type II error probability = 1 – .440 = .560 in salvage patients, and type I error probability = .440 in primary patients. The same sort of thing is seen in scenario 3. Given these huge false-positive and false-negative rates in the mixed scenarios with treatment-subgroup interactions, one could do as well by not running a trial and instead deciding which treatment is better by flipping a coin. The conventional design has much larger power than the subgroup-specific design in scenario 4, however, in which N provides an improvement over C in both subgroups. This loss of power in scenario 4 seems to be the price that must be paid for using a comparative design that allows different decisions to be made within subgroups to accommodate treatment-subgroup interactions.
Given these computer simulation results, the choice of the conventional methodology, which may have extremely large within-subgroup type I and type II error probabilities in the range of .44 to .56, should reflect a strong belief that any N-versus-C effect will be the same in the two prognostic subgroups. A computer program to implement the design, biom12842-sup-0002-SuppSimulation.R, is available from the Wiley Biometrics Web site at https://onlinelibrary.wiley.com/doi/full/10.1111/biom.12842.
OPTIMIZING NATURAL KILLER CELL DOSE
This section reviews a Bayesian design for an early-phase trial of natural killer (NK) cells for treatment of patients with advanced hematologic malignancies,12,13 including chronic lymphocytic leukemia (CLL), acute lymphocytic leukemia (ALL), and non-Hodgkin lymphoma (NHL).14 Each patient is also classified as having low-bulk disease (LBD) or high-bulk disease (HBD), for a total of 3 × 2 = 6 (disease type, disease bulk) prognostic subgroups. The trial is designed to enroll a maximum of 60 patients. The goals are to perform safety monitoring and to identify an optimal NK cell dose from the set {105, 106, 107} cells per kg of body weight (hereafter dose levels 1, 2, 3) within each of the subgroups, while borrowing strength between subgroups. This is an example of a sequentially adaptive “precision” or “personalized” clinical trial.
The trial has five coprimary outcomes: the times from NK cell infusion to D = death, and the four nonfatal events P = disease progression, R = response, T = severe toxicity, and C = severe cytokine release syndrome. Each patient’s events are monitored for a 365-day follow-up period. Denoting the time to event j by Yj for j = D, P, R, T, C for each patient, each Yj is a potential outcome because it may or may not be observed; YP and YR are competing risks because, at most, one of these two events can occur; the five potential event times are highly associated with each other; D censors any events that have not occurred, but such censoring is not independent; and the distribution of Y = (YD, YP, YR, YT, YC) varies substantially with disease subgroup.
Three challenges in designing an NK cell dose-finding trial are (1) if one ignores the subgroups, then there are high probabilities of making incorrect decisions within subgroups; (2) on the basis of current knowledge about NK cell biology, the rates of the five outcomes are neither monotone increasing nor decreasing in NK cell dose; and (3) small to moderate sample sizes in early-phase trials limit the reliability of subgroup-specific decisions. Given the 60-patient maximum overall sample size and assuming that Pr(CLL) = Pr(ALL) = Pr(NHL) = 1/3 and that Pr(LBD) = 1/3 and Pr(HBD) = 2/3, the expected disease subsample sizes are 7 patients for each of the subgroups (LBD, CLL), (LBD, ALL), (LBD, NHL) and 13 patients for each of the subgroups (HBD, CLL), (HBD, ALL), (HBD, NHL). Consequently, to make reasonably reliable decisions, it is essential to assume a dose-subgroup-outcome probability model that borrows strength between subgroups. Thus the design is based on a Bayesian parametric multivariable Weibull regression model with the marginal distribution of each Yj a function of NK cell dose and disease subgroup. The model includes a hierarchical structure with latent (unobserved) patient frailties to induce association among the five outcomes and account for variability not explained by dose and subgroup. To establish a prior probability distribution, hyper parameters characterizing location were computed from elicited probabilities of particular combinations of events occurring during the first 100 days of follow-up, and prior hyper parameters characterizing dispersion were calibrated to reflect vague prior knowledge about dose effects. Details are given in Lee et al.14
First assigning utility 0 to any outcome for which the patient dies before day 100, utilities of the 12 possible combinations of nonfatal outcomes during the first 100 days following NK cell infusion were elicited from the principal investigator for the trial (Table 3).
TABLE 3.
Denote the vector of binary indicators of the 12 possible nonfatal outcomes within 100 days by Δ = (ΔP, ΔR, ΔT, ΔC). For example, the vector (0,1,1,0) indicates that a patient alive at day 100 experienced R and T, but not P or C. The mean utility of assigning a dose d to a patient with disease subgroup g, denoted by Uave(d, g), is the average of the 12 values of the utility U(Δ) from Table 3, weighted by the probability distribution of [Δ|d, g]. This may be denoted by
The final decision criterion for choosing an optimal dose within each of the six disease (type, bulk) subgroups is the posterior mean of Uavg(d, g) given the observed data, denoted by u(d,g|data), computed under the assumed Bayesian model.
To ensure that the trial is ethical, the design includes subgroup-specific safety monitoring rules that terminate assignment to a disease subgroup g of a dose found to have an unacceptably high probability pD100(g) of death within 100 days for that subgroup. Elicited subgroup-specific limits on pD100(g) were used to formulate these rules. During the trial, for each disease type, patients are randomly assigned among the three doses in order of entry to the trial by randomly permuting the integers (1, 2, 3), subject to being in accordance with the safety rule. At the end of the trial, for each disease subgroup g, the NK cell dose d that is safe and that maximizes u(d, g|data) is chosen, with no dose chosen if all three are unsafe within subgroup g.
To establish its operating characteristics, the design was simulated under each of six different possible dose-subgroup-outcome scenarios. Table 4 summarizes the operating characteristics of the design in one particular scenario that includes dose-subgroup interactions, with some doses unsafe for some subgroups but not for others. The results in Table 4 show that the design is likely to stop accrual to doses in subgroups in which they are unsafe and otherwise is likely to select the truly optimal dose in each subgroup.
TABLE 4.
Figure 1 compares the dose-finding design that makes subgroup-specific decisions with a similar design that ignores the six disease subgroups when performing safety monitoring and dose selection. Histograms are given for the differences between the empirical proportions of Pr(stop|subgroup, dose) and Pr(stop|dose), for truly safe doses (Fig 1A) and for truly unsafe doses (Fig 1B) on the basis of computer simulations under the six dose-subgroup-outcome scenarios. Because the figure shows that it is likely that Pr(stop|subgroup, dose) < Pr(stop|dose) for truly safe doses, the design that ignores subgroups has a much larger probability of incorrectly stopping treatment in subgroups in which doses are safe. For truly unsafe doses, the two designs have similar stopping probabilities, with the important exception that the design with subgroup-specific safety monitoring is about .40 more likely to stop accrual to truly unsafe doses within subgroups. Figure 1C shows that the subgroup-specific design is much more likely to select truly optimal doses within disease subgroups.
Additional computer simulations (not shown) evaluated the behavior of the NK cell dose-finding design in cases in which the true underlying time-to-event distributions were log-logistic and thus differed substantially from the assumed Weibull dose-subgroup-outcome probability model. The results show that the model and design are extremely robust to the true underlying event time distribution model and that the superior properties of the design persist. Details are provided in Lee et al.14 The NKcelldosefinding computer program for implementing this design is available at https://users.soe.ucsc.edu/∼juheelee/.
Utility-based clinical trial designs for a wide variety of other settings, assuming that patients are homogeneous, are given in Houede et al,15 Thall et al,4,16-18 Lee et al,19 Hobbs et al,20 Murray et al,21 and Xu et al.22 An explanation of how utility-based clinical trial design can be formulated as a decision-making problem is provided by Mueller et al.23
The simulation results show that the designs that make subgroup-specific decisions are greatly superior to competing conventional methodologies that ignore patient heterogeneity and treatment-subgroup or dose-subgroup interactions when such subgroup effects are present. In the randomized comparative trial, this is seen in terms of extremely large within-subgroup type I and type II error probabilities of the conventional design in cases of treatment-subgroup interactions. However, when significant treatment effects are homogeneous across subgroups, the conventional design has larger power. In the dose-finding trial, there are reduced safety and also smaller probabilities of selecting truly optimal doses within disease subgroups for the design that ignores subgroups. These results strongly suggest that if patient heterogeneity is known, conventional clinical trial designs should be replaced by designs that account explicitly and prospectively for patient subgroups and treatment-subgroup or dose-subgroup effects in trial design, conduct, and decision making. An important caveat is that, if enrollment to a trial is limited to a homogeneous patient population, then a design with subgroup-specific rules would be an inappropriate elaboration.
Footnotes
Supported by Core Grant No. CA016672 from the National Cancer Institute Cancer Center.
Preprint version available on bioRxiv.
AUTHOR’S DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST
The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/po/author-center
No potential conflicts of interest were reported.
REFERENCES
- 1.Yan F, Thall PF, Lu KH, et al. Phase I-II clinical trial design: A state-of-the-art paradigm for dose finding. Ann Oncol. 2018;29:694–699. doi: 10.1093/annonc/mdx795. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Yuan Y, Nguyen HQ, Thall PF. Bayesian Designs for Phase I–II Clinical Trials. New York, NY: Chapman and Hall/CRC; 2016. [Google Scholar]
- 3.Cheung YK. Dose Finding by the Continual Reassessment Method. New York, NY: Chapman and Hall/CRC; 2011. [Google Scholar]
- 4.Thall PF, Nguyen HQ. Adaptive randomization to improve utility-based dose-finding with bivariate ordinal outcomes. J Biopharm Stat. 2012;22:785–801. doi: 10.1080/10543406.2012.676586. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Braga M, Gianotti L, Nespoli L, et al. Nutritional approach in malnourished surgical patients: A prospective randomized study. Arch Surg. 2002;137:174–180. doi: 10.1001/archsurg.137.2.174. [DOI] [PubMed] [Google Scholar]
- 6.Murray TA, Yuan Y, Thall PF, et al. A utility-based design for randomized comparative trials with ordinal outcomes and prognostic subgroups. Biometrics. 2018;74:1095–1103. doi: 10.1111/biom.12842. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Parekh K, Iannettoni MD. Complications of esophageal resection and reconstruction. Semin Thorac Cardiovasc Surg. 2007;19:79–88. doi: 10.1053/j.semtcvs.2006.11.002. [DOI] [PubMed] [Google Scholar]
- 8.Clavien PA, Sanabria JR, Strasberg SM. Proposed classification of complications of surgery with examples of utility in cholecystectomy. Surgery. 1992;111:518–526. [PubMed] [Google Scholar]
- 9.Dindo D, Demartines N, Clavien PA. Classification of surgical complications: A new proposal with evaluation in a cohort of 6336 patients and results of a survey. Ann Surg. 2004;240:205–213. doi: 10.1097/01.sla.0000133083.54934.ae. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Clavien PA, Barkun J, de Oliveira ML, et al. The Clavien-Dindo classification of surgical complications: Five-year experience. Ann Surg. 2009;250:187–196. doi: 10.1097/SLA.0b013e3181b13ca2. [DOI] [PubMed] [Google Scholar]
- 11.McCullagh P. Regression models for ordinal data. J R Stat Soc B. 1980;42:109–142. [Google Scholar]
- 12.Rezvani K, Rouce RH. The application of natural killer cell immunotherapy for the treatment of cancer. Front Immun. 2015;6:578. doi: 10.3389/fimmu.2015.00578. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Hoyos V, Savoldo B, Quintarelli C, et al. Engineering CD19-specific T lymphocytes with interleukin-15 and a suicide gene to enhance their anti-lymphoma/leukemia effects and safety. Leukemia. 2010;24:1160–1170. doi: 10.1038/leu.2010.75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Lee J, Thall PF, Rezvani K. Optimizing natural killer cell doses for heterogeneous cancer patients on the basis of multiple event times. J R Stat Soc Ser C Appl Stat. 2019;6:461–474. doi: 10.1111/rssc.12271. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Houede N, Thall PF, Nguyen H, et al. Utility-based optimization of combination therapy using ordinal toxicity and efficacy in phase I/II trials. Biometrics. 2010;66:532–540. doi: 10.1111/j.1541-0420.2009.01302.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Thall PF, Szabo A, Nguyen HQ, et al. Optimizing the concentration and bolus of a drug delivered by continuous infusion. Biometrics. 2011;67:1638–1646. doi: 10.1111/j.1541-0420.2011.01580.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Thall PF, Nguyen HQ, Braun TM, et al. Using joint utilities of the times to response and toxicity to adaptively optimize schedule-dose regimes. Biometrics. 2013;69:673–682. doi: 10.1111/biom.12065. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Thall PF, Nguyen HQ, Zohar S, et al. Optimizing sedative dose in preterm infants undergoing treatment for respiratory distress syndrome. J Am Stat Assoc. 2014;109:931–943. doi: 10.1080/01621459.2014.904789. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Lee J, Thall PF, Ji Y, et al. Bayesian dose-finding in two treatment cycles based on the joint utility of efficacy and toxicity. J Am Stat Assoc. 2015;110:711–722. doi: 10.1080/01621459.2014.926815. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Hobbs BP, Thall PF, Lin SH. Bayesian group sequential clinical trial design using total toxicity burden and progression-free survival. J R Stat Soc Ser C Appl Stat. 2016;65:273–297. doi: 10.1111/rssc.12117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Murray TA, Thall PF, Yuan Y. Utility-based designs for randomized comparative trials with categorical outcomes. Stat Med. 2016;35:4285–4305. doi: 10.1002/sim.6989. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Xu Y, Thall PF, Müller P, et al. A decision-theoretic comparison of treatments to resolve air leaks after lung surgery based on nonparametric modeling. Bayesian Anal. 2017;12:639–652. doi: 10.1214/16-BA1016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Müller P, Xu Y, Thall PF. Clinical trial design as a decision problem. Appl Stochastic Models Data Anal. 2017;33:296–301. doi: 10.1002/asmb.2222. [DOI] [PMC free article] [PubMed] [Google Scholar]