Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2017 Oct 30.
Published in final edited form as: Stat Med. 2016 May 18;35(24):4285–4305. doi: 10.1002/sim.6989

Utility-based designs for randomized comparative trials with categorical outcomes

Thomas A Murray a,*, Peter F Thall a, Ying Yuan a
PMCID: PMC5048520  NIHMSID: NIHMS791215  PMID: 27189672

Abstract

A general utility-based testing methodology for design and conduct of randomized comparative clinical trials with categorical outcomes is presented. Numerical utilities of all elementary events are elicited to quantify their desirabilities. These numerical values are used to map the categorical outcome probability vector of each treatment to a mean utility, which is used as a one-dimensional criterion for constructing comparative tests. Bayesian tests are presented, including fixed sample and group sequential procedures, assuming Dirichlet-multinomial models for the priors and likelihoods. Guidelines are provided for establishing priors, eliciting utilities, and specifying hypotheses. Efficient posterior computation is discussed, and algorithms are provided for jointly calibrating test cutoffs and sample size to control overall type I error and achieve specified power. Asymptotic approximations for the power curve are used to initialize the algorithms. The methodology is applied to re-design a completed trial that compared two chemotherapy regimens for chronic lymphocytic leukemia, in which an ordinal efficacy outcome was dichotomized and toxicity was ignored to construct the trial’s design. The Bayesian tests also are illustrated by several types of categorical outcomes arising in common clinical settings. Freely available computer software for implementation is provided.

Keywords: Bayesian Methods, Dirichlet-multinomial, Multiple Outcomes, Oncology, Randomized Comparative Trials, Utility Elicitation

1. Introduction

Medical outcomes often are complex and multivariate. Physicians routinely select each patient’s treatment based on consideration of risk-benefit tradeoffs between desirable and undesirable clinical outcomes. Conventional designs for randomized comparative trials (RCTs) seldom reflect this aspect of medical practice. Rather, most designs in clinical trial protocols are based on one outcome, identified as “primary,” with all other outcomes given the nominal status of “secondary.” This dichotomy often is codified in institutionally required protocol formats. For example, in cancer studies of chemotherapies for solid tumors, the primary outcome may be objective response, defined as 30% or greater tumor shrinkage compared to baseline evaluation, while regimen-related adverse events, called “toxicities,” are listed as secondary outcomes [1]. This approach is convenient because it facilitates sample size and power computations in terms of the probabilities of a one-dimensional outcome in the treatment arms. It does not reflect the way that practicing physicians actually think and behave, however. Alternative design approaches include defining a composite outcome that treats efficacy and safety events equally [2, 3], using a test statistic that is a weighted average [4], or basing a test on a quadratic form, such as Hotelling’s T-squared statistic, with weights estimated to reflect variability [5]. These approaches ignore the relative clinical importance of beneficial and adverse outcomes, however.

Safety is never a secondary concern in a clinical trial. In actual trial conduct, if interim data from a randomized clinical trial (RCT) show that one treatment has a much higher adverse event rate than the other, or that both arms are unacceptably toxic in a trial comparing two experimental agents, the physicians conducting the trial will terminate accrual whether the protocol’s design includes a formal safety stopping rule or not. Such a decision shows that, due to their unwillingness to continue the trial, the physicians have decided that one treatment is inferior to the other in terms of safety. While stopping a trial due to an unacceptably high adverse event rate is an ethical decision, it also is part of the general consideration of how much risk of an adverse outcome is acceptable as a tradeoff for a given level of therapeutic benefit.

This paper is motivated by the consideration that, because clinical trial conduct must accommodate medical practice, a trial design should account formally for risk-benefit tradeoffs between all clinically relevant outcomes. That is, in actual trial design and conduct, scientific and ethical considerations should not be separated. We provide a practical framework for including such tradeoffs explicitly in the treatment comparison underlying the design of two-arm RCTs. We focus on settings where the clinically relevant events are categorical, and thus the outcome Y is a realization from a finite set of elementary patient outcomes. The clinically relevant events, and the resulting set of elementary outcomes, are determined in collaboration with the physician(s) planning the trial. The proposed framework accommodates most discrete outcome structures that occur in practice, including univariate ordinal, bivariate binary indicators of efficacy and safety, bivariate ordinal variables, and such bivariate variables with death as a separate event.

1.1. A Trial in Chronic Lymphocytic Leukemia

We illustrate the proposed methodology by applying it to re-design a RCT reported by [6] that compared two chemotherapy regimens for untreated chronic lymphocytic leukemia (CLL), FC = fludarabine plus cyclophosphamide versus F = fludarabine alone. Patients in this study were treated for up to six 28-day cycles. Following the recommended guidelines at the time of the trial [7], patients were monitored for clinical response, with categories CR = Complete response, PR = Partial response, SD = Stable disease, and PD = Progressive disease. Patients also were monitored for several adverse events (AEs), including infections, with severity grades {None, Minor, Major, Fatal}, hematological toxicities with severity grades 0–5, and non-hematological toxicities graded 0–5, according to the National Cancer Institute (NCI) Common Terminology Criteria for Adverse Events (CTCAE). Detailed definitions of the levels of clinical response and the AEs are given in [7].

In the CLL trial design, CR was designated as the primary outcome, with all other outcomes designated as secondary. Thus, the comparison of FC to F was based on the probabilities of CR in the two arms. For this comparison, since clinical response was not evaluable for patients that died during the observation period, these patients were counted as non-responders. This approach is sensible since it counts death during response evaluation as a treatment failure. In contrast, the non-fatal AEs were not included in the study design, despite the fact that the safety of FC was an important concern. Because the above approach to constructing the design for this trial is quite typical, it serves as a useful illustration of our proposed methodology.

To apply our methodology to design this trial would have required working with the physicians planning the trial to determine the clinically relevant outcomes and elicit their utilities. Thus, for the sake of illustration, we first assume that the physicians decided that the relevant outcomes were clinical response, specifically the ordinal variable with possible values {CR, PR, SD, PD}, and also the worst AE with levels {Minimal, Moderate, Severe, Fatal}. Here, “minimal” is defined as no AE requiring medical intervention, “moderate” as a non-life-threatening AE requiring medical intervention without hospitalization, “severe” as an imminently life-threatening AE requiring hospitalization, and “fatal” as an AE resulting in death. Using these definitions, a moderate AE includes grade 3 hematologic and non-hematologic toxicities and minor infections, and a severe AE includes grade 4 hematologic and non-hematologic toxicities and major infections. To define the values of Y, we denote the 12 = 4×3 non-fatal elementary patient outcomes by the pairs (r, s), for r = {CR, PR, SD, PD} and s = {Min, Mod, Sev}, with the 13th elementary event D = a fatal AE. Thus, for example, (PR, Mod) is the elementary outcome that the patient had a partial response and a moderate worst AE level. Our design requires numerical utilities for the 13 elementary outcomes, which in practice would be elicited from the physicians. Since we cannot do this retrospectively, we specify numerical utilities (Table 1) for the CLL trial’s 13 outcomes that may be considered a reasonable representation of what would be obtained in practice. In Section 6, once our methodology has been established, we will compare our proposed design to a design that compares the two regimens based on the probabilities of CR. Because the numerical utilities are a key component our methodology, we also include an analysis of the sensitivity of the final inferences to alternative utilities (Table 6).

Table 1.

Numerical utilities for the CLL trial’s 13 elementary outcomes.

Level of Worst
Adverse Event
Clinical Response
CR PR SD PD
Minimal 100 84 35 19 Death
Moderate 93 77 29 14 0
Severe 28 24 14 10

Table 6.

Three alternative utilities for the CLL trial’s outcomes.

Level of Worst
Adverse Event
Clinical Response Death
Original Utilities from Table 1
CR PR SD PD

Minimal 100 84 35 19
Moderate 93 77 29 14 0
Severe 28 24 14 10

Utilities Giving Better Efficacy Higher Value

Minimal 100 84 35 19
Moderate 98 81 31 14 0
Severe 82 68 24 10

Utilities Giving Lower Toxicity Higher Value

Minimal 100 93 71 64
Moderate 93 81 44 32 0
Severe 28 24 14 10

1.2. Mean Utilities

For the general development, we index the elementary outcomes by k = 1, …, K, and denote their numerical utilities by Uk = U(Y = k), with U = (U1, U2, ⋯, UK)′. These are elicited from the physician(s) planning the trial. For some specific examples, we will replace these integer indices with more descriptive indexing schemes. For convenience, we assign the most desirable outcome utility 100, the least desirable outcome utility 0, with all other outcomes assigned utilities between these two extremes. The domain [0, 100] is chosen to facilitate communication with the physician(s), although in general any compact domain will work. In Section 2, we provide practical strategies for utility elicitation, and illustrate them for the case of bivariate-ordinal outcomes that include the possibility of death.

For treatments j = A and B, we denote the patient response probabilities θj,k = Pr(Y = k | trt = j), with θj = (θj,1, θj,2, ⋯, θj,K)′, and θ = (θA, θB). The mean utility of treatment j is

U¯(θj)=Uθj=k=1KUkθj,k. (1)

Our testing methodology relies on the mean utilities Ū (θA) and Ū (θB) as one-dimensional criteria to compare overall treatment effects, since Ū (θA) > Ū (θB) corresponds to the mean clinical desirability of patient outcome being higher for A than B, and conversely. The Bayesian comparative test relies on the posterior of δU,AB(θ) = Ū (θA) − Ū (θB).

As a first illustration, suppose that the clinically relevant outcome is trinary where a treatment may result in response (R), failure (F), or neither response nor failure (N) so, temporarily suppressing j, for a single treatment θ = (θR, θN, θF). In particular, R and F are not complementary events. Since UR = 100 and UF = 0 in this case, only UN ∈ (0, 100) need be elicited, and the mean utility is Ū (θ) = θR × 100 + θN × UN, which increases with UN for any θ. If, for example, UN = 60 and the true outcome probabilities are (θR, θN, θF)′ = (0.30, 0.60, 0.10)′ then the mean utility is Ū (θ) = Uθ = 0.30 × 100 + 0.60 × 60 + 0.10 × 0 = 66. Next, consider a trial to compare two clot dissolving agents, A and B, for rapid treatment of stroke, with the outcome evaluated within 24 hours from the start of treatment. Response, R, is defined as the clot that caused the stroke being dissolved without a brain hemorrhage or death, failure, F, is defined as a brain hemorrhage or death, and N is the third event that no brain hemorrhage occurred, the patient did not die, but the clot was not dissolved. Suppose that the true outcome probabilities are θA = (θA,R, θA,N, θA,F)′ = (0.50, 0.30, 0.20)′ and θB = (θB,R, θB,N, θB,F)′ = (0.60, 0.30, 0.10)′. Since B has both a larger response probability and a smaller failure probability compared to A, it is clear that B is clinically superior to A. The mean utilities reflect this, since Ū (θB) = 60 + θB,NUN and Ū (θA) = 50 + θB,NUN, so Ū (θB) − Ū (θA) = 10 for all UN ∈ (0, 100).

If a third agent, C, has θC = (0.60, 0.10, 0.30)′, then C has a larger response probability than A but also a larger failure probability, so it is not obvious which of the treatments A or C is superior. If UN = 50, then Ū (θA) = 65 compared to Ū (θB) = 75, so B is superior to A for this utility. The large difference δU,BA(θ) = Ū (θB) − Ū (θA) = 75 − 65 = 10 is due to the fact that B increases θA,R by 0.10 and also decreases θA,F by 0.10. This might be described as a “win-win” scenario for B versus A. Comparing C to A, since Ū (θC) = Ū (θA) = 65, that is, A and C have identical mean utilities with δU,CB(θ) = 0, they are equally desirable despite the fact that θAθC. This is because the increases in both the response and failure probabilities with C compared to A, specifically θC,RθA,R = 0.60 − 0.50 = 0.10 and θC,FθA,F = 0.30 − 0.20 = 0.10, cancel each other out if UN = 50. If UN = 20 rather than 50, however, then δU,CA(θ) = 62 − 56 = 6, so for this utility C is slightly superior to A since the increase in failure probability with C versus A is considered a favorable tradeoff for the increase in response probability.

1.3. Utility-Based Design Framework

Given this general categorical outcome and utility structure, since θA and θB are not known they must be estimated, and data for doing this must be obtained. The statistical problem thus is how to design and conduct a clinical trial to obtain the necessary data. This requires specification of decision rules, a trial design, and a practical method for establishing a consensus among the investigators for the numerical values in U, since the methodology requires one utility and one utility only. This provides a transparent, formal structure that reflects what physicians actually do in practice, rather than constructing a trial design that focuses on a single primary outcome and then, formally or informally, also monitors secondary outcomes. For the Bayesian version, we call the methodology categorical outcome Bayesian utility-based (CAT-BUB) tests. To implement the proposed design framework, in cooperation with the physician(s) planning the trial, one should take the following steps:

  1. Specify the clinically relevant outcomes and resulting set of elementary patient responses.

  2. Elicit numerical utilities.

  3. Specify design parameters, including targeted alternative treatment differences that will be identified with a specified power, type I error, timing of interim analyses for a group sequential test, and test cut-offs.

  4. Implement the design algorithm, developed below, to determine maximum and interim sample sizes and operating characteristics.

  5. Repeat steps (a–d) until a design with satisfactory operating characteristics is identified.

1.4. Outline

In Section 2 we provide practical guidelines for utility elicitation, illustrated for bivariate categorical outcomes. The Dirichlet-multinomial model is reviewed in Section 3. In Section 4 we present the Bayesian utility-based comparative testing procedure. For the Bayesian test, we provide a scaled-beta approximation for the posterior distribution of the mean utility to facilitate calculation of the test statistic and derive frequentist properties, including an approximate sample size calculation that we use to initialize our computational algorithms. In Section 5, we discuss designs for a single test or a group sequential procedure, and provide guidelines for eliciting targeted alternatives, and computational algorithms to derive a CAT-BUB design having given overall type I error and power. In Section 6, we illustrate how to implement the CAT-BUB procedure in several settings and report simulation results, including comparison of the CAT-BUB design for the CLL trial to the design based on a binary indicator of CR. We conclude with a brief discussion in Section 7. The Web Supplement provides additional illustrations for several categorical outcome structures often encountered in practice. To facilitate application, freely available user-friendly software is provided (see Supplementary Materials).

2. Utility Elicitation

Since a utility function is required for implementing the proposed methods, we provide practical utility elicitation guidelines. In our experience, specifying U is an intuitive process for the physician(s) that they find to be quite natural. An extension of the previously discussed trinary outcome case with elementary events {R, N, F} is an ordinal Y with four or more categories. For example, in oncology trials it is very common to characterize solid tumor response from the start of chemotherapy as an ordinal variable. Following the RECIST tumor evaluation guidelines [1], the outcome may be defined using tumor size relative to baseline, with a 100% decrease a complete response (CR), a 30% to 99% decrease a partial response (PR), a 19% increase to 19% decrease stable disease (SD), and a 20% or greater increase progressive disease (PD). In this and similar contexts, the statistician can simply provide each physician a spreadsheet with the outcomes ordered by desirability and, given U(CR) = 100 and U(PD) = 0, the physicians can specify numerical utilities for the intermediate outcomes. When there are multiple physicians planning the trial, one approach to establish a consensus utility is the “Delphi” method [8, 9], wherein one asks each physician independently to specify their numerical utilities, then shows the mean of all elicited utilities to all physicians and allows them to adjust their utilities if desired on that basis, and if needed iterates the process until a consensus is reached.

Another common categorical structure is a bivariate binary (efficacy, toxicity) outcome. An example from chemotherapy for acute myelogenous leukemia (AML) defines efficacy as complete remission, C, in terms of recovery of circulating white cells, platelets, and blastic (undifferentiated) cells to normal levels, and toxicity, T, as severe (NCI grade 3 or 4) non-hematologic toxicity, both scored within 42 days. Denoting the respective complementary events by and , the statistician can again simply provide each physician with a spreadsheet that contains a 2 × 2 utility table with U(C, ) = 100, U(, T) = 0, so only the two intermediate utilities U(, ) and U(C, T) must be specified.

A refinement of the bivariate binary (efficacy, toxicity) outcomes is to define these events for patients who are alive, and include death as a fifth event. This is appropriate for treatment of rapidly fatal diseases, such as AML, where death during therapy has a non-trivial probability. In the AML example, the four elementary events determined by C and T are defined only for patients alive at day 42, and the fifth event is D = [death within 42 days]. This structure may motivate the question of whether assigning a finite utility to death is ethically appropriate, since the utilities will be the basis for medical decision making. If the value UD = − ∞ were assigned, however, the mean utility is − ∞ whenever the probability of D is non-zero, so in practice a single death would terminate the trial. Thus, when death has a non-trivial probability, if one wishes to actually do utility-based decision making then death must be assigned a finite numerical utility having magnitude comparable to the numerical utilities of the other possible patient outcomes. We recommend that the physician(s) first specify U(, T), i.e., the worst outcome for a patient who is alive, relative to U(C, ) = 100, i.e., the best outcome, and U(D) = 0, and then specify U(, ) and U(C, T) relative to the U(C, ) = 100 and the selected U(, T). To implement this, the statistician may ask the physician(s) to fill in the following two tables sequentially,

(C, T̄) (C̄, T) D
100 0
C
100
T U(, T)

where U(, T) in the right-hand table takes the specified value from the left-hand table. This sequence decomposes utility elicitation into two intuitive steps. It also provides a partial motivation for establishing utilities for our re-design of the CLL trial.

To establish or elicit utilities for the CLL trial outcomes, and in general for bivariate ordinal outcomes with death as a separate event, we propose the following two alternative strategies, one direct and the other indirect. The direct elicitation strategy simply requires the statistician to provide the physician(s) with a utility table and suggest a specification order. For the CLL outcomes, using the direct strategy one would provide the physician(s) the table below and tell them to fill the empty cells in alphabetical order.

CR PR SD PD
Min 100 C C B Death
Mod C D D C 0
Sev B C C A

The basic idea is to first specify the utility of the worst non-fatal outcome, then the two most extreme (efficacy, toxicity) trade-off outcomes, then the intermediate outcomes where either the best efficacy or worst toxicity event occurs, and finally the remaining outcomes in the interior portion of the table.

In contrast, the indirect strategy decomposes elicitation into a series of intuitive, mutually independent steps that induce numerical utilities. For the CLL outcome, we would implement the indirect strategy by having the physician(s) specify the following sub-tables,

(CR,Min) (PD,Sev) D
100 100 × ν 0
CR PD
Min 100 100 × ζ1
Sev 100 × ζ2 0
(CR,Min) (PR,Min) (SD,Min) (PD,Min)
100 100 × ϕ1,1 100 × ϕ1,2 0
(CR,Sev) (PR,Sev) (SD,Sev) (PD,Sev)
100 100 × ϕ2,1 100 × ϕ2,2 0
(CR,Min) (CR,Mod) (CR,Sev)
100 100 × ξ1 0
(PD,Min) (PD,Mod) (PD,Sev)
100 100 × ξ2 0

In the above sub-tables, we denote the proportions that will be specified by the physician with Greek symbols, e.g. ν and ζ1, which we use to determine the induced numerical utilities later. When the statistician provides the sub-tables to the physician(s), these entries will be left blank for the physician(s) to fill in, with the instruction that, for example, ν is the proportion quantifying the desirability of (PD,Sev) relative to (CR,Min), and so on. The sub-tables are mutually independent, i.e., the values in a particular sub-table are not restricted by, or dependent on the values from any other sub-table. Therefore, the sub-tables can be specified in whatever order the physicians prefer, and each can be revisited and adjusted during the specification process until the physicians are satisfied.

Based on the previous sub-tables, the induced numerical utilities can be determined sequentially as follows,

U(CR,Min)=100,U(D)=0,U(PD,Sev)=100ν,U(PD,Min)=ζ1[U(CR,Min)-U(PD,Sev)]+U(PD,Sev),U(CR,Sev)=ζ2[U(CR,Min)-U(PD,Sev)]+U(PD,Sev),U(PR,Min)=ϕ1,1[U(CR,Min)-U(PD,Min)]+U(PD,Min),U(SD,Min)=ϕ1,2[U(CR,Min)-U(PD,Min)]+U(PD,Min),U(PR,Sev)=ϕ2,1[U(CR,Sev)-U(PD,Sev)]+U(PD,Sev),U(SD,Sev)=ϕ2,2[U(CR,Sev)-U(PD,Sev)]+U(PD,Sev),U(CR,Mod)=ξ1[U(CR,Min)-U(CR,Sev)]+U(CR,Sev),U(PD,Mod)=ξ2[U(PR,Min)-U(PR,Sev)]+U(PR,Sev),U(PR,Mod)=[ξ2(ϕ1,1-ϕ2,1)+ϕ2,11-(ξ1-ξ2)(ϕ1,1-ϕ2,1)][U(CR,Mod)-U(PD,Mod)]+U(PD,Mod),andU(SD,Mod)=[ξ2(ϕ1,2-ϕ2,2)+ϕ2,21-(ξ1-ξ2)(ϕ1,2-ϕ2,2)][U(CR,Mod)-U(PD,Mod)]+U(PD,Mod).

To aid elicitation, we recommend that the statistician provide the physician(s) with a spreadsheet that contains the relevant sub-tables and a numerical utility table that automatically populates based on the physician’s specified values. As an example, we provide such a spreadsheet for the CLL outcome (see Supplementary Materials). In the Web Appendix A, we provide a generalization and detailed derivation of the induced numerical utilities for the indirect elicitation strategy with a K × L bivariate ordinal outcome plus death.

The proposed indirect strategy facilitates utility elicitation in several important ways. First, when an individual physician is selecting numerical utilities, they can adjust the values in any sub-table and the resulting numerical utilities will repopulate automatically while preserving the partial ordering constraints. In contrast, for the direct strategy, adjusting a single numerical utility may require changing several other values, perhaps even the entire table, which may become impractical if the elementary patient outcome set is large. Second, when the physicians convene to obtain consensus utilities, each sub-table can be addressed independently in turn. Therefore, should a disagreement arise, the physicians can focus on a specific low-dimensional sub-table rather than the entire numerical utility table. Third, the indirect strategy requires the physician(s) to specify fewer values than the direct strategy, which can be a great practical advantage when K and L are both moderately large, say ≥ 4. An advantage of the indirect approach for the statistician is that the sub-tables provide low-dimensional bases for conducting a utility sensitivity assessment, which we discuss below in Section 6.

For our re-design of the CLL trial, suppose that the physician(s) specified sub-table entries corresponding to the following parameters: ν = 0.10, ζ1 = 0.10, ζ2 = 0.20, ϕ1,1 = ϕ2,1 = 0.80, ϕ1,2 = ϕ2,2 = 0.20, ξ1 = 0.90, and ξ2 = 0.40. The numerical utilities induced by these values are given in Table 1. Our choice to specify ν = 0.10 in this illustration reflects the belief that (PD, Sev) is very undesirable relative to (CR, Min). Specifying ζ1 = 0.10 and ζ2 = 0.20 reflects that (CR, Sev) is more desirable than (PD, Min), yet both responses have desirabilities more similar to (PD, Sev) than (CR, Min), i.e., both are undesirable outcomes with utilities < 30. Specifying ϕ1,1 = ϕ2,1 = 0.80 and ϕ1,2 = ϕ2,2 = 0.20 reflects the belief that PR and SD have desirabilities similar to CR and PD, respectively, and moreover their desirabilities relative to CR and PD are invariant across the AE severity levels. In contrast, specifying ξ1 = 0.90 and ξ2 = 0.40 reflects the belief that a moderate AE is more tolerable given an efficacious clinical response. Conditional on CR, a moderate AE has similar desirability compared to a minimal AE, whereas, conditional on PD, it has desirability more similar to a severe AE than to a minimal AE. In summary, these choices reflect the general belief that (CR, Min), (CR, Mod), (PR, Min), and (PR, Mod) are all desirable patient outcomes with numerical utilities > 75, whereas all other patient outcomes are relatively undesirable with numerical utilities ≤ 35.

3. Dirichlet-Multinomial Model

Let Xj = (Xj,1 Xj,2Xj,K)′ denote the count vector of patient outcomes, with probabilities θj = (θj,1θj,K)′ and nj=k=1KXj,k the number of observations for treatment j = A, B. For the Bayesian tests presented in Section 4, we will assume the Dirichlet-multinomial model

Xjθj~Mult(nj,θj),(Likelihood)θj~Dir(njθj),(Prior) (2)

where θj=(θj,1θj,K) is the prior mean of θj and nj is the effective sample size (ESS) of the prior (cf. Morita, et al., 2008, 2010). This is well known for the important special case K = 2, which is the beta distribution, where the ESS of f(θjnj,θj) is nj=nj(θj,1+θj,2). The model (2) has a simple conjugate structure, with each θjXj~Dir(Xj+njθj), a posteriori, which greatly facilitates posterior computation. The Dirichlet-multinomial model is quite general, and accommodates any categorical outcome structure. The multinomial pdf is

f(Xjθj)=Γ(nj+1)k=1Kθj,kXj,kΓ(Xj,k+1), (3)

and the Dirichlet pdf is

f(θjnjθj)=Γ(nj)k=1Kθj,knjθj,k-1Γ(njθj,k), (4)

where Γ(·) is the gamma function. The Dirichlet has E(θjnj,θj)=θj,Var(θj,knj,θj)=θj,k(1-θj,k)/(nj+1), and Cov(θj,k,θj,nj,θj)=θj,kθj,/(nj+1), for k ≠ ℓ = 1, …, K. The posterior has a conjugate form with pdf

p(θjXj,nj,θj)=Γ(nj+nj)k=1Kθj,kXj,k+njθj,k-1Γ(Xj,k+njθj,k), (5)

and posterior mean E(θjXj,nj,θj)=(Xj+njθj)/(nj+nj).

For prior specification, the two priors for θj should match, i.e. nA=nB and θA=θB, so that any statistical comparisons are unbiased, and the priors should not include an inappropriate amount of information, which is quantified by ESS. For this reason, we drop the treatment subscript on these hyperparameters in the sequel. As a default prior, i.e., in the absence of prior information, we will assume n* = 1 and θk=K-1 so that each ESS = 1 and all elementary events are equally likely a priori. This default choice allows the accruing data to quickly overwhelm the prior while shrinking response probabilities away from 0 and 1 in small samples. When historical information is available, n* and θ* can be specified to reflect that experience and its relevance to the investigation. Alternatively, for a more robust use of historical information, power priors [10] or commensurate prior methods [11] could be applied. Because the use of historical data to construct informative priors for Bayesian models underlying RCTs is a complex and controversial issue, however, we will not use such priors here, and assume n* = 1 in the sequel.

4. Comparative Tests

Treatment differences are characterized by the mean utility difference, and we test the hypotheses

H0:δU,A-B(θ)=0versusH1:δU,A-B(θ)0. (6)

If desired, a one-sided version of (6) may be appropriate. For example, to test whether A is superior to B, the hypotheses would be H0 : δU,AB(θ) ≤ 0 versus H1 : δU,AB(θ) > 0. In what follows, we will focus on two-sided hypotheses, since the one-sided case is a straightforward modification.

Let X = (XA, XB) denote the observed elementary outcome count data. We conduct a CAT-BUB comparative test using the following symmetric decision criteria. If

TA>B(X;n,θ)=Pr{δU,A-B(θ)>0X,n,θ}>pcut (7)

then conclude superiority of A over B, denoted by A > B. If

TB>A(X;n,θ)=Pr{δU,A-B(θ)<0X,n,θ}>pcut, (8)

then conclude superiority of B over A, denoted by B > A. We select the probability cutoff, pcut, to ensure an approximate level α test for all θ = (θA, θB) with δU,AB(θ) = 0. We discuss technical details for doing this below.

4.1. Efficient Posterior Computation

While the posterior distributions of Ū(θA) and Ū(θB) are not analytically tractable, because mean utilities are linear combinations of Dirichlet random vectors there are several feasible numerical approximations. With a Monte Carlo (MC) approach, one would generate M samples from the posterior mean utility (PMU) distribution for treatment j, i.e. p{Ū(θj)|Xj, n*, θ*}, by drawing θj(m)~Dir(Xj+nθ), since p(θj |Xj, n*, θ*) ≡ Dir(Xj + n*θ*) (see (5)), and defining U¯(θj(m))=Uθj(m), for m = 1, …, M and j = A, B (see [12], Chapter 3.3). These samples provide estimates of TA>B(X; n*, θ*) and TB>A(X; n*, θ*) in (7) and (8). For data analysis, any desired level of accuracy can be obtained by increasing M, since it only needs to be conducted once. In contrast, the MC approach is computationally expensive for constructing a clinical trial design, since it requires iterative simulations to assess frequentist operating characteristics in a variety of scenarios, and thus a very large number of MC calculations.

For the CAT-BUB design, a more computationally efficient method for estimating the posterior quantities in (7) and (8) is a parametric approximation to the PMU distribution based on a scaled-beta distribution. To implement this approach, we exploit the following well known forms of the posterior moments of a Dirichlet:

E[θjXj,n,θ]=θj=Xj+nθ(nj+n),Var[θj,kXj,n,θ]=θj,k(1-θj,k)(nj+n+1)=(Xj,k+nθk)[(nj+n)-(Xj,k+nθk)](nj+n)2(nj+n+1),andCov[θj,k,θj,Xj,n,θ]=-θj,kθj,(nj+n+1)=-(Xj,k+nθk)(Xj,+nθ)(nj+n)2(nj+n+1),fork. (9)

It follows that

μj=E[U¯(θj)Xj,n,θ]=Uθj,andσj2=Var[U¯(θj)Xj,n,θ]=UjU, (10)

where Σ̃j = V ar[θj|Xj, n*, θ*] with entries defined in (9). Using (10), we match the support, mean, and variance of each PMU distribution with those of a scaled-beta distribution. Let Beta(λ,γ) denote a beta distribution with mean μ = λ/(λ + γ) and variance σ2 = μ (1 − μ)/(λ + γ + 1). We approximate p {Ū(θj)|Xj, n*, θ*} with 100 × Beta(λ̃j, γ̃j), where

λj=μj[μj(1-μj)σj2-1],γj=(1-μj)[μj(1-μj)σj2-1], (11)

and the mean μ̃j and variance σj2 are defined in (10). When K = 2 the PMU distribution is precisely this scaled-beta distribution. We provide the derivation for (11) in Web Appendix B.

Using this approximation, the posterior decision criterion is

TA>B(X;n,θ)01[1-B(xλA,γA)]b(xλB,γB)dx, (12)

where B(x|λ, γ) and b(x|λ, γ) denote the cdf and pdf of a Beta(λ,γ) distribution. The approximation for TB>A(X; n*, θ*) follows by symmetry. We use adaptive quadrature via the integrate() function in R to evaluate (12) efficiently [13]. In Web Appendix B, we confirm the validity of (12) by comparing it to the usual MC approach using simulation under a variety of settings. The scaled-beta approximation is 1,000 times faster than the usual MC approximation with M = 100,000, and it works well even with very small sample sizes, such as nA = nB = 10. We will use the scaled-beta approximation for the remainder of the paper, and recommend its use in practice.

4.2. Type I Error, Power, and Sample Size

We derive an expression for the approximate power function of the CAT-BUB procedure based on (7) and (8), and use this result to show control of type I error and to obtain a sample size formula. We first apply the Bayesian central limit theorem, and use the resulting posterior asymptotic normality to obtain tractable expressions for TA>B(X; n*, θ*) and TB>A(X; n*, θ *). We will show that the resulting approximate test statistics are tractable functions of the data, X. We then take the frequentist perspective, treating θ = (θA, θB) as a fixed quantity, and apply the classical central limit theorem to derive the asymptotic sampling distributions of TA>B(X; n*, θ*) and TB>A(X; n*, θ*), and an approximate power function.

Since Xj is multinomial with parameter θj, the MLE is θ̂j = Xj/nj and the estimated Fischer information is nj^j-1, where Σ̂j has k-th diagonal entry θ̂j,k(1 − θ̂j,k) and (k, ℓ)-th off-diagonal entry −θ̂j,k θ̂j,ℓ, k, ℓ = 1, …,K, j = A,B. Applying the Bayesian central limit theorem (see [14], Chapter 4)

θjXj,n,θ.NK(θ^j,nj-1^j),j=A,B.

Since XA and XB are independent, U′(θAθB)|X, n*, θ.N(δ^U,A-B,σ^+,n2), where δ̂U,AB = U′ (θ̂Aθ̂B) and σ^+,n2=U(^A/nA+^B/nB)U. It follows that

TA>B(X;n,θ)Φ(δ^U,A-Bσ^+,n)andTB>A(X;n,θ)Φ(-δ^U,A-Bσ^+,n), (13)

where Φ(·) denotes the standard normal cdf. We use the notation “≈” to mean that an approximation can be made arbitrarily accurate for sufficiently large sample size.

To derive an approximate power function, we treat the posterior quantities in (13) as functions of the data X given a fixed θ, and derive asymptotic approximations for their sampling distributions. First, the exact power function is the probability of rejecting the null for a fixed θ, i.e.

ψ(θ)=Pr{TA>B(X;n,θ)>pcutθ}+Pr{TB>A(X;n,θ)>pcutθ}. (14)

Applying the classical central limit theorem, (δ̂U,ABδU,AB(θ))/σ̂+,n ⩪ 𝒩(0, 1), so plugging (13) into (14) gives the approximate power function

ψ(θ)approx=Φ[(δU,A-B(θ)σ+,n(θ))-Φ-1(pcut)]+Φ[-(δU,A-B(θ)σ+,n(θ))-Φ-1(pcut)], (15)

where σ+,n(θ)2 = U′ (ΣA(θ)/nA + ΣB(θ)/nB)U. is a function of θ.

The type I error is sup{ψ(θ) : δU,AB(θ) = 0}, and since ψ(θ)approx = 2(1 − pcut) for all θ with δU,AB(θ) = 0, using pcut = 1 − α/2 provides an asymptotic level α test. To derive an approximate sample size formula, we set pcut = 1 − α/2 and define n = nA = nB. If desired, one could instead define n = nA and nB = η × nA, where η controls the randomization ratio. For a given fixed target alternative θ(Alt), e.g., the hypothesized outcome probabilities, we equate ψ(θ(Alt))approx = 1 − β and solve for n, which gives approximate sample size

nf(θ(Alt),α,β)=[Φ-1(1-β)+Φ-1(1-α/2)]2σ+2(θ(Alt))δU,A-B2(θ(Alt)). (16)

We discuss elicitation of θ(Alt) in Section 5.

5. Designing a CAT-BUB Trial

In this section, we derive design parameters that control overall type I error at level α and provide 1-β power for targeted alternatives, i.e., the set of treatment effects that we want to identify with the specified power. For this computation, we distinguish between fixed sample designs with one comparative test at the end of the trial, and group sequential designs with up to S comparative analyses over the course of the trial, allowing early termination with rejection of the null at each interim analysis. We first present guidelines for eliciting targeted alternatives, then discuss fixed sample CAT-BUB designs, followed by group sequential CAT-BUB designs. For each design setting, we provide a computational algorithm for deriving the probability cut-offs and sample size, given α, β and the targeted alternatives.

5.1. Eliciting Targeted Alternatives

Consider a fixed sample CAT-BUB test with type I error α for all θ for which δU,AB(θ) = 0, and power 1 − β for a set of fixed targeted alternatives with |δU,AB(θ)| > 0. The approximate power function in (15) shows that selecting pcut to control type I error for one fixed null response probability vector, say θ(Null)=(θA(Null),θB(Null)) with δU,A-B(Null)=0, will control type I error for all fixed θ with δU,AB(θ) = 0. In contrast, the power varies with both the targeted utility difference and the fixed θ from which this difference arises, via σ+,n(θ). Therefore, targeted alternatives must be elicited in the θ domain. We denote a fixed target by θ(Alt)=(θA(Alt),θB(Alt)) and its utility difference by |δU,A-B(Alt)|>0.

Since it may not be intuitively obvious how to specify θ(Alt), we provide the following guidelines, which require a discussion between the statistician and the physicians. For simplicity, we will treat A as the null or standard treatment, although the algorithm works if A and B are both experimental and considered to be symmetric. The statistician begins by eliciting an expected probability vector corresponding to historical experience with standard therapy, say θA(Alt), which may be based on the physician(s)’ experience or analysis of historical data. Given θA(Alt), the statistician asks the physician(s) to specify one or more alternative probability vectors, θB(Alt,1),,θB(Alt,m), that are considered equally desirable improvements over θA(Alt). In practice, m should be reasonably small, in the range 1 ≤ m ≤ K. Each elicited alternative θB(Alt,r) gives standardized utility difference s(Alt,r)=δU,B-A(Alt,r)/σ+(Alt,r), where δU,B-A(Alt,r) and σ+(Alt,r) are evaluated at θ(Alt,r)=(θA(Alt),θB(Alt,r)) for r = 1, …, m. For the sample size calculation in (16), one then selects the targeted alternative θB(Alt) giving smallest s(Alt,r), formally

θ(Alt)={(θA(Alt),θB(Alt,r)):s(Alt,r)=Min{s(Alt,1),,s(Alt,m)}}. (17)

This choice is conservative since it ensures the test will achieve the desired power for all elicited θ(Alt,r).

In practice, if this computation gives a sample size that is not feasible, then the physician(s) should be asked to re-consider their set of specified alternatives. This is not unlikely, since it may not be intuitively obvious, when specifying one or more fixed target probability vectors, how they translate into a required sample size. To help guide the physician(s) in this process, one should show them the numerical values of δU,B-A(Alt,r), s(Alt,r), and θB(Alt,r), for r = 1, ⋯, m, possibly as a table with m rows and three columns to facilitate comparison and interpretation. Since smaller values of s(Alt,r) and δU,B-A(Alt,r) require a larger sample size to detect the corresponding θB(Alt,r), this provides a quantitative index of the relative difficulty of detecting each target, and it also identifies the targeted alternative θB(Alt,r) having the smallest s(Alt,r) that produced the sample size. If a modified set of targets is specified, the sample size may be recomputed, with this process iterated if desired. This may be considered a multidimensional analog of a conventional power and sample size computation in terms of a one-dimensional parameter. If desired, the CAT-BUB test’s power function computed over a set of (θA, θB) values also may be examined.

Recall the trinary outcome example where U = (UR, UN, UF)′ = (100, 60, 0)′. Given standard vector θA(Alt)=(0.30,0.50,0.20), suppose that the three equally desirable targets θB(Alt,1)=(0.40,0.50,0.10),θB(Alt,2)=(0.50,0.35,0.15), and θB(Alt,3)=(0.35,0.60,0.05) are elicited. Then s(Alt,1) = 10/45.8 = 0.218, s(Alt,2) = 11/49.2 = 0.224, and s(Alt,3) = 11/42.6 = 0.258, so we would take θ(Alt)=(θA(Alt),θB(Alt,1)). If the standardized utility differences, s(Alt,r), differ substantially, then the physician(s) may instead select the a priori most likely alternative, perhaps sacrificing power for some alternatives as a trade-off for a smaller sample size. If the utility differences, δU,B-A(Alt,r), differ substantially, then the physician(s) may wish to reconsider the choices of equally desirable targets, or possibly may decide to modify some entries of the numerical utility vector U.

5.2. Computational Algorithm for Fixed Sample CAT-BUB Design

Given α, β and the targeted alternative θ(Alt) defined in (17), we jointly select a sample size and cutoff pcut for a fixed sample CAT-BUB design using the following algorithm:

  • Step 0. Set = nf (θ(Alt), α, β), where nf (·) is defined by (16).

  • Step 1. Generate G0 null datasets as follows. For g0 = 1, …,G0,

    1. generate Xj(g0)~Mult(n^,θA(Alt)) for j = A,B.

    2. store X(Null,g0)=(XA(g0),XB(g0)).

    3. calculate and store
      T(Null,g0)=max{TA>B(X(Null,g0);n,θ),TB>A(X(Null,g0);n,θ)}.
  • Step 2. Set cut to the empirical (1 − α)%-tile of {T(Null,1), ⋯, T(Null,G0<)}.

  • Step 3. Generate G1 alternative datasets as follows. For g1 = 1, …,G1,

    1. generate Xj(g1)~Mult(n^,θj(Alt)) for j = A,B.

    2. store X(Alt,g1)=(XA(g1),XB(g1)).

    3. If δU,A-B(Alt)>0, calculate and store T(Alt,g1) = TA>B (X(Alt,g1); n*, θ*).

      Otherwise, calculate and store T(Alt,g1) = TB>A (X(Alt,g1); n*, θ*).

  • Step 4. Set β^=G1-1g1=1G1[T(Alt,g1)p^cut], where [E] = 1 if E is true, and 0 otherwise.

  • Step 5. If β̂ ∈ [βε, β + ε], stop and select n = and pcut = cut.

    Otherwise, update n^=n^(Φ-1(1-β)+Φ-1(p^cut)Φ-1(1-β^)+Φ-1(p^cut))2 and return to Step 1.

In practice, is rounded to its nearest integer value and cut is rounded to its nearest larger thousandth. We use default values ε = 0.005, G0 = 50, 000 and G1 = 25, 000. Since choosing G0 and G1 is non-intuitive, detailed guidelines are given in Web Appendix C. Briefly, these default values allow us to estimate pcut accurately to three digits, and be certain that the power for θ(Alt) at the selected n is within 2 × ε = 0.01 of 1 − β. The sample size adjustment in step 5 is motivated by (16), and allows to be increased or decreased by a magnitude proportional to the current discrepancy between the estimated and desired power.

5.3. Computational Algorithm for Group Sequential CAT-BUB Design

In typical practice, RCTs require group sequential tests [15]. Here, we discuss implementation of the CAT-BUB test in this context, denoting the sample sizes where an analysis occurs by ns, s = 1, …, S. We take use the α-spending approach proposed by [16] and extended by [17], with an α-spending function f(ns; α, ρ, nS) = α(ns/nS)ρ suggested by [18]. The design parameter ρ ≥ 0 controls the α-spending rate, with larger values spending less α at early looks. This approach is appealing in practice because the actual analysis schedule need not follow the planned schedule. At the first interim look with n1 of the planned nS samples, we calibrate the probability threshold, pcut,1, to spend f(n1; α, ρ, nS) of the overall type I error. Similarly, at s-th interim look, we calibrate pcut,s to spend f(ns; α, ρ, nS) − f(ns−1; α, ρ, nS) of the overall type I error. So if the trial reaches a final analysis at nS samples, the overall type I error is exactly α.

To determine a maximum sample size, nS, for a group sequential CAT-BUB design with up to S tests, power 1 − β for the elicited alternative θ(Alt), we specify a complete analysis schedule using the proportions of nS, denoted by ts, s = 1, …, S. We determine nS using the following algorithm:

  • Step 0. Set S = nf (θ(Alt), α, β), where nf (·) is defined by (16), and s = ts × n̂S, for s = 1, …, S − 1.

  • Step 1. Generate G0 null sequential datasets as follows. For g0 = 1, …,G0,

    1. generate Xj,s(g0)~Mult(n^s,θA(Alt)) for j = A,B and s = 1, …, S.

    2. store Xs,+(Null,g0)=(XA,s,+(g0),XB,s,+(g0)), where Xj,s,+(g0)=m=1sXj,m(g0)

      for j = A,B and s = 1, …, S.

    3. calculate and store, for s = 1, …, S,
      Ts(Null,g0)=max{TA>B(Xs,+(Null,g0);n,θ),TB>A(Xs,+(Null,g0);n,θ)}.
  • Step 2. Calculate cut,1, …, cut,S as follows.

    1. Set cut,1 to the empirical {1 − f(n1; α, ρ, nS)}%-tile of {T1(Null,1),,T1(Null,G0)}.

    2. Set cut,s to the empirical [{1 − f(ns; α, ρ, nS)}/{1 − f(ns−1; α, ρ, nS)}]%-tile of {Ts(Null,g0):T1(Null,g0)p^cut,1,,Ts-1(Null,g0)p^cut,s-1,g0=1,,G0} for s = 2, …, S.

  • Step 3. Generate G1 alternative sequential datasets as follows. For g1 = 1, …,G1,

    1. generate Xj,s(g1)~Mult(n^s,θj(Alt)) for j = A,B and s = 1, …, S.

    2. store Xs,+(Alt,g1)=(XA,s,+(g1),XB,s,+(g1)), where Xj,s,+(g1)=m=1sXj,m(g1) for j = A,B and s = 1, …, S.

    3. If δU,A-B(Alt)>0, calculate and store Ts(Alt,g1)=TA>B(Xs,+(Alt,g1);n,θ), for s = 1, …, S.

      Otherwise, calculate and store Ts(Alt,g1)=TB>A(Xs,+(Alt,g1);n,θ), for s = 1, …, S.

  • Step 4. Set β^=G1-1g1=1G1[T1(Alt,g1)p^cut,1,,TS(Alt,g1)p^cut,S], where [E] = 1, if E is true, and 0, otherwise.

  • Step 5. If β̂ ∈ [βε, β + ε], stop and select nS = S.

    Otherwise, update n^S=n^S(Φ-1(1-β)+Φ-1(p^cut,S)Φ-1(1-β^)+Φ-1(p^cut,S))2, s = ts × n̂S for s = 1, …, S − 1, and return to Step 1.

We use the same default values as the fixed sample algorithm, that is ε = 0.005, G0 = 50, 000 and G1 = 25, 000. Using the planned analysis schedule, we can assess the operating characteristics at a variety of alternatives. The actual analysis schedule may differ from the planned schedule, so the realized power may differ from 1 − β; however, [15] show that the realized power is quite robust to deviations from the planned analysis schedule. During an actual trial, we can follow steps 1–2 to re-estimate pcut,s for the actual ns being used, given the previous interim analysis sample sizes n1, …, ns−1 and their corresponding pcut,1, …, pcut,s−1 values.

6. Illustrations

In this section, we illustrate CAT-BUB tests and report results of various simulation studies comparing both fixed sample and group sequential CAT-BUB designs with beta-binomial designs. We investigate the proposed procedure in the contexts of a trinary outcome, a bivariate-binary outcome, and the CLL trial, which actually had a bivariate ordinal outcome including death. We also report the results of utility sensitivity analyses.

6.1. Trinary Outcomes

6.1.1. Fixed Sample Tests

Returning to the example involving clot dissolving agents for rapid treatment of stroke with a trinary outcome and utility U = (100, 50, 0)′, we investigate the frequentist operating characteristics of the proposed CAT-BUB approach for a variety of fixed response probability vectors. We consider a CAT-BUB test with type I error α = 0.05, and power 1 − β = 0.80 for targeted alternative θ(Alt)=(θA(Alt),θB(Alt))=((0.50,0.30,0.20),(0.60,0.30,0.10)) with δU,B-A(Alt)=10. In this context, the fixed sample CAT-BUB design algorithm, given in Section 5.2, gives pcut = 0.976 and n = 208.

We compare the CAT-BUB approach for trinary outcomes {R,N, F} with a Bayesian design that follows the more common approach of combining the events N and F so that outcome may be considered binary, specifically R = “success,” versus NF = “failure,” and compares therapies in terms of the probabilities πj = Pr(Y = R| j) for j=A,B. For this design, we assume a Bayesian beta-binomial model with common beta priors πj | qj ~ Beta(qj,1 = 0.50, qj,2 = 0.50) for j = A,B, which has ESS = 1. The posterior is Beta(Sj + 0.50, njSj + 0.50), where Sj is the number of successes out of nj in arm j. Denoting W = (SA, nASA, SB, nBSB) and q = (qA,1, qA,2, qB,1, qB,2), we use the test statistic SA>B(W; q) = Pr(πA > πB|W, q), which we calculate similarly to (12). This is the special case of the Dirichlet-multinomial model and CAT-BUB test with K = 2, since the mean utility for treatment j is 100 × πj, so the utility is superfluous. To ensure comparability, for the binary test we also use n = 208, and set pcut = 0.975 to obtain a 0.05-level test when πA = πB.

Operating characteristics of the fixed sample CAT-BUB and beta-binomial tests are given in Table 2. Scenario 1.0 is the null case used to calibrate pcut for each design, so the type I error for both designs is 0.05, with equal probabilities for concluding A > B or B > A. Scenario 2.4 is the alternative used to select a sample size that provides power 0.80, so the estimated power is in the interval [0.80 − ε, 0.80 + ε]. For Scenarios 2.1–2.5, πB is fixed at 0.60 versus πA = 0.50, so the beta-binomial design always has power 0.55, despite obvious differences between these four scenarios. For example, in Scenarios 2.1 and 2.2, the beta-binomial design fails by concluding B > A 55% of the time even though A is clinically superior or equal to B in terms of δU,BA(θ). In contrast, the CAT-BUB test distinguishes between these scenarios, correctly concluding A > B 21% of the time in Scenario 2.1 and controlling type I error at 0.05 in Scenario 2.2. Scenarios 2.3–2.5 exhibit various tradeoffs that favor B over A in an increasing manner in terms of δU,BA(θ) = 5, 10, 15 and the CAT-BUB test reflects this with increasing power figures 0.246, 0.798, 0.997. In particular, the CAT-BUB test has substantially more power than the beta-binomial test for the “win-win” Scenarios 2.4 and 2.5. Scenarios 3.1–3.4 and 4.1–4.4 respectively fix the probability of response at 0.65 or 0.70, for which the beta-binomial test has 0.88 and 0.99 power figures. In contrast, the power of the CAT-BUB test increases as the true utility difference δU,BA(θ) increases, and equals or exceeds that of the beta-binomial design in “win-win” scenarios where the probability of failure is also reduced (Scenarios 3.3–3.4 and 4.3–4.4). The failure of the beta-binomial design is due to B providing an unfavorable trade-off between the probability of response and failure versus A. Such tradeoffs cannot be identified by the naive binary outcome design, which is used very commonly. The price of the CAT-BUB approach is potentially less power for “tradeoff” scenarios when the treatment redistributes probability away from N to R and/or F (Scenarios 2.3, 3.2, 4.1 and 4.2). However, we feel that this price is well worth being able to distinguish between, for example, Scenarios 2.1–2.5 in practice. Lastly, the CAT-BUB test has varying power over the set of θ with the same utility difference. For example, Scenarios 2.3 and 4.1 have utility difference 5, yet power figures 0.25 and 0.22, respectively. The CAT-BUB design’s power varies more substantially with δU,AB(θ) than over the set of θ with the same utility difference.

Table 2.

Power of a fixed sample CAT-BUB design for trinary outcome {R,N,F}versus a beta-binomial design based on “success” probability πj=θj,R for j = A,B In all scenarios, θA = (0.50, 0.30, 0.20)′ and nA, nB = 208. Results in the first row are based on 50,000 simulations (std.err. ≈0.001), whereas all other results are based on 25,000 simulations (std.err. 0.0032).

Scenario CAT-BUB Design Beta-Bin Design
θB δU,B–A(θ) B>A A>B B>A A>B
1.0: (0.50, 0.30, 0.20) 0 0.025 0.025 0.025 0.025

2.1: (0.60, 0.00, 0.40) −5 0.001 0.206 0.552 0.000
2.2: (0.60, 0.10, 0.30) 0 0.024 0.025
2.3: (0.60, 0.20, 0.20) 5 0.246 0.001
2.4: (0.60, 0.30, 0.10) 10 0.798 0.000
2.5: (0.60, 0.40, 0.00) 15 0.997 0.000

3.1: (0.65, 0.05, 0.30) 2.5 0.088 0.006 0.877 0.000
3.2: (0.65, 0.15, 0.20) 7.5 0.485 0.000
3.3: (0.65, 0.25, 0.10) 0.936 0.000
3.4: (0.65, 0.35, 0.00) 1.000 0.000

4.1: (0.70, 0.00, 0.30) 5 0.217 0.001 0.989 0.000
4.2: (0.70, 0.10, 0.20) 10 0.720 0.000
4.3: (0.70, 0.20, 0.10) 15 0.987 0.000
4.4: (0.70, 0.30, 0.00) 20 1.000 0.000

6.1.2. Sensitivity to Elicited Utilities

The power of the CAT-BUB design at the targeted alternatives, and thus the required sample size, depend on the particular elicited utilities. The sensitivity of the CAT-BUB test’s power to the elicited utilities can be assessed by fixing the sample size and calculating the power for targeted alternatives using other numerical utilities. Continuing with the example involving clot dissolving agents, we fix n = 208, pcut = 0.976 and θA = (0.50, 0.30, 0.20)′, and calculate the CAT-BUB test’s power for alternative Scenarios 2.1, 2.2, 2.4 and 2.5 in Table 2 over the entire domain UN ∈ [0, 100]. We calculate power using (15), which is quite accurate when n = 208.

Figure 1 plots the overall power as a function of UN at each alternative; that is, we do not explicitly distinguish between the decisions A > B and B > A. Scenarios 2.1 and 2.2 are trade-off scenarios, wherein B relative to A has a higher probability of R and F, and lower probability of N, so δU,A–B(θ) varies substantially with UN and the power is thus quite sensitive. This sensitivity is a desirable property, because the numerical value of UN determines whether a particular trade-off favors B > A or A < B. For Scenarios 2.1 and 2.2, although not explicitly depicted, the CAT-BUB test has power primarily for A > B (B > A) to the left (right) of the numerical utility with minimum power. In Scenario 2.4, the probability of N is equal for both A and B, so δU,B–A = 10 for all UN, and the sensitivity merely reflects the relationship between UN and δ+. In win-win Scenario 2.5, power increases with UN because δU,B–A(θ) increases with UN. Lastly, each scenario in Figure 1 fixes θR,B = 0.60 and θR,A = 0.50, and for UN = 0 the CAT-BUB test is based exclusively on 100 × θR, so it is identical to the usual beta-binomial test, providing 54% power. Therefore, even in settings where selecting a particular UN may be challenging, the proposed CAT-BUB test obviously is more sensible than the usual beta-binomial test, which implicitly sets UN = 0.

Figure 1.

Figure 1

Sensitivity to UN of the fixed sample CAT-BUB design’s power for various alternative θBs, fixing n = 208, pcut = 0.976, and θA = (0.50, 0.30, 0.20)′. The thick dot denotes power for the elicited utilities, i.e. UN = 50.

It is useful to consider sensitivity of inferences to the elicited utilities. We illustrate how this may be done for bivariate binary outcomes. Figure 2 depicts the posterior probability for B > A, defined by (8), for the AML bivariate binary example from Section 1, given three different realizations of XB while fixing XA = (XA,[C,T̄], XA,[C,T ], XA,[C̄,T̄], XA,[C̄,T ])′ = (15, 20, 25, 40)′ and enumerating over (UC,T, UC̄,T̄ ) ∈ [0, 100]2, i.e. all possible intermediate utilities. In Scenario 1, XB = (10, 40, 20, 30)′ and XA = (15, 20, 25, 40)′. For these data, inference is more sensitive to UC,T than UC̄,T̄, because the two treatments appear to differ greatly for the probability of [C, T] (40 versus 20 observations) and little for the probability of [C̄, T̄] (20 versus 25 observations). The data in Scenario 2 reflect a similar yet smaller treatment difference, and inference is less sensitive to the utilities. In Scenario 3, the data suggest that B is a win-win relative to A in that B has both higher marginal probability of C and lower marginal probability of T, and the CAT-BUB design’s inference always supports the conclusion B > A. For these data, posterior evidence supporting B > A becomes stronger as UC,T is increased.

Figure 2.

Figure 2

Posterior probability of B > Awhile varying (UC,T, UC̄,T̄ ) for three different realizations of XB and XA = (XA,[C,T̄], XA,[C,T ], XA,[C̄,T̄], XA,[C̄,T ]) ′ = (15, 20, 25, 40) ′. The thick dot denotes our inferential result at the elicited utilities, i.e. (UC,T = 80, UC̄,T̄ = 40).

6.1.3. Group Sequential Tests

To assess the operating characteristics of the group sequential tests, we continue with the trinary versus binary example. We assume the analysis schedule has S = 3 equally spaced looks at t1 = 0.33, t2 = 0.66 and t3 = 1. We use the same targeted alternative as the fixed sample design for calibration, and compare the operating characteristics of the group sequential versions of the CAT-BUB design and beta-binomial design for Scenarios 1.0 and 2.1–2.5 used for the fixed sample simulation. We applied the group sequential CAT-BUB design algorithm, given in Section 5.3, to maintain α ≤ 0.05 with ρ = 3. This gave nS = 213, pcut,1 = 0.999, pcut,2 = 0.993 and pcut,3 = 0.978. Scenarios 1.0 and 2.4 were used to jointly calibrate the planned sample size and probability thresholds to provide type I error of 0.05 and overall power of 0.80, respectively. For the beta-binomial design, to maintain size 0.05 we used pcut,1 = 0.999, pcut,2 = 0.992 and pcut,3 = 0.979. We used nS = 213 for both designs to ensure comparability.

The results of the group sequential simulations are reported in Table 3. For the null Scenario 1.0, the operating characteristics of the CAT-BUB and beta-binomial designs are practically identical: both have an average sample size of 212 and overall type I error 0.05. In contrast, for Scenarios 2.1–2.5, the operating characteristics of the two designs differ dramatically. The beta-binomial design does not distinguish between these scenarios because πB,R = 0.60 and πA,R = 0.50 in all five, whereas the CAT-BUB test distinguishes between them quite well. In Scenario 2.1, A is preferred over B due to an unfavorable tradeoff between R and F. The beta-binomial design incorrectly selects B over A 54% of the time with an average sample size of 193, whereas the CAT-BUB design correctly selects A over B 21% of the time with an average sample size of 208. In Scenario 2.2, B and A are equivalent because the increase in the response probability is canceled out by the increase in the failure probability, and the CAT-BUB design controls the type I error at level 0.05. Scenarios 2.3–2.5 have increasing magnitudes of the benefit of B over A, and the CAT-BUB design has correspondingly increasing power for concluding B > A. As the true benefit of B over A increases, the average sample size of the CAT-BUB design decreases because the probability of early termination increases. In the most favorable Scenario 2.5, the CAT-BUB design has power 0.998 and terminates early nearly 95% of the time, with an average sample size of 124, which is 42% smaller than the planned maximum of 213. In contrast, the beta-binomial design has 54% power and average sample size 193 in this case, as in all of Scenarios 2.1–2.5, essentially because it ignores the distinction between N and F.

Table 3.

Power figures of a group sequential CAT-BUB design for a trinary outcome {R, N, F} versus a beta-binomial design based on "success" probabilities πj = θj,R, for j = A, B. In each scenario, θA = (0.50, 0.30, 0.20)′, n1 = 71, n2 = 142, n3 = 213, and ρ = 3. Beta-binomial entries for Scenarios 2.2–2.5 are essentially identical to those for Scenario 2.1, since πR,B = 0.60 and πR,A = 0.50 in all five of these scenarios.

Scenario  θB                  δU,B–A(θ)   CAT-BUB Design             Beta-Binomial Design
                                          Ave SS   B > A   A > B     Ave SS   B > A   A > B
1.0:  (0.50, 0.30, 0.20)        0         211.9    0.025   0.025     211.8    0.026   0.024

2.1:  (0.60, 0.00, 0.40)       −5         207.7    0.001   0.214     192.8    0.541   0.000
2.2:  (0.60, 0.10, 0.30)        0         211.8    0.026   0.025
2.3:  (0.60, 0.20, 0.20)        5         206.6    0.250   0.001
2.4:  (0.60, 0.30, 0.10)       10         177.8    0.800   0.000
2.5:  (0.60, 0.40, 0.00)       15         123.8    0.998   0.000

6.2. Redesigning the CLL Trial

Returning to the CLL trial, we illustrate how to implement the CAT-BUB design in this context. We assume that the elicited numerical utilities are those in Table 1. Recall that, as explained in Section 1, because utilities cannot be elicited retrospectively for this trial, the utilities in Table 1 were specified to be a reasonable representation of what one actually would elicit in practice. We compare the CAT-BUB design with a beta-binomial design based on an efficacy test, which we denote by BB-EO. Like the actual trial, the BB-EO design defines efficacy using a binary indicator of CR, with the comparative test based on the targeted alternative CR probability πCR,FC = 0.45 versus the null πCR,F = 0.25 [6]. We also compare the CAT-BUB design to an alternative approach based on a hierarchical testing procedure. This design first compares the probabilities of efficacy (here, CR) as the primary endpoint and, if this test fails to reject the null, then compares the probabilities of toxicity (here, severe or fatal AE) in a second test. This design, which we denote by BB-ET, assumes independent beta-binomial models for the two outcomes. Under this hierarchical testing procedure, the BB-ET design recommends a treatment if it is found to have either better efficacy or, failing that, lower toxicity than the other treatment.
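A minimal R sketch of the BB-ET decision rule just described is given below, assuming independent Beta(0.5, 0.5) priors for the CR and severe-or-fatal AE probabilities in each arm; the function bb_et, the prior, and the cutoff shown are ours for illustration, and in practice the cutoff would be calibrated by simulation to control the overall type I error at 0.05, as in the comparisons reported below.

## Hierarchical efficacy-then-toxicity comparison (BB-ET), sketched under independent
## Beta(0.5, 0.5) priors. x_cr and x_tox are named counts of CR and of severe or fatal
## AEs; n gives the per-arm sample sizes. The cutoff p_cut is illustrative only.
bb_et <- function(x_cr, x_tox, n, p_cut = 0.99, prior = c(0.5, 0.5), M = 1e5) {
  draw <- function(x, n) rbeta(M, prior[1] + x, prior[2] + n - x)
  ## Stage 1: compare CR probabilities
  pE <- mean(draw(x_cr["FC"], n["FC"]) > draw(x_cr["F"], n["F"]))
  if (pE > p_cut)     return("FC > F (efficacy)")
  if (pE < 1 - p_cut) return("F > FC (efficacy)")
  ## Stage 2: compare severe/fatal AE probabilities (lower is better)
  pT <- mean(draw(x_tox["FC"], n["FC"]) < draw(x_tox["F"], n["F"]))
  if (pT > p_cut)     return("FC > F (toxicity)")
  if (pT < 1 - p_cut) return("F > FC (toxicity)")
  "no recommendation"
}

## Example call with hypothetical counts:
bb_et(x_cr = c(F = 31, FC = 56), x_tox = c(F = 10, FC = 37), n = c(F = 128, FC = 128))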

Because the actual CLL trial outcome is bivariate ordinal plus death, a practical approach for eliciting the targeted alternative(s) needed to implement the CAT-BUB design is as follows. First, ask the physicians to hypothesize the marginal probabilities of the AE levels, {Min, Mod, Sev, Fatal}, in each treatment group. Denote these probabilities by

θj,T = (θj,Min, θj,Mod, θj,Sev, θj,Fatal),  where  θj,Min + θj,Mod + θj,Sev + θj,Fatal = 1,  j = F, FC.

Next, ask the physicians to hypothesize probabilities of the clinical response events, {CR, PR, SD, PD}, conditional on being alive. Denote these conditional probabilities by

θj,E = (θj,CR, θj,PR, θj,SD, θj,PD),  where  θj,CR + θj,PR + θj,SD + θj,PD = 1,  j = F, FC.

Assuming independence for simplicity, set θj,Fatal(Alt) = θj,Fatal and θj,[ℓ,k](Alt) = θj,ℓ θj,k for j = F, FC, k ∈ {Min, Mod, Sev}, and ℓ ∈ {CR, PR, SD, PD}. We assume that the targeted alternative arises from θF,T = θFC,T = (0.67, 0.25, 0.05, 0.03), i.e., FC and F have equivalent toxicity, and θF,E = (0.25, 0.35, 0.20, 0.20) versus θFC,E = (0.45, 0.35, 0.10, 0.10), i.e., FC has higher efficacy than F. This alternative implies marginal CR probabilities πCR,FC = 0.4365 versus πCR,F = 0.2425, similar to those specified for the actual trial design, and it yields a large mean utility difference δU,FC–F(Alt) = 13.5 for the utilities given in Table 1. Specifying n* = 1 and θ* = θF for the Dirichlet priors, a fixed sample CAT-BUB test requires slightly more patients than a beta-binomial test to achieve 90% power, nF = nFC = 127 versus 120. To ensure comparability, we determine the power figures for all three designs using the larger sample size 128. We compare the designs for 12 scenarios covering a wide range of possibilities. The response probabilities for F and FC in each scenario are reported in Table 4. The probabilities for F are fixed at the targeted alternative values throughout, whereas the probabilities for FC vary over combinations of the ordinal efficacy and toxicity outcomes. For each outcome, we characterize these probability vector pairs nominally as being "equivalent" (=), or having "moderate" (>), "large" (≫), or "very large" (≫>) differences.
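As a concrete check of this construction, the R sketch below builds the joint targeted alternative from the hypothesized marginal toxicity and conditional efficacy probabilities under the independence assumption and recovers the marginal CR probabilities quoted above; the helper make_alt is ours, not part of the released software.

## Joint targeted alternative for the bivariate ordinal plus death outcome, built from
## theta_T = P(Min, Mod, Sev, Fatal) and theta_E = P(CR, PR, SD, PD | alive) under
## independence: theta_[l,k] = theta_l * theta_k for k in {Min, Mod, Sev}, plus P(Fatal).
make_alt <- function(theta_T, theta_E) {
  joint <- outer(theta_E, theta_T[1:3])   # 4 x 3 matrix of P(efficacy = l, toxicity = k)
  dimnames(joint) <- list(c("CR", "PR", "SD", "PD"), c("Min", "Mod", "Sev"))
  list(joint = joint, Fatal = theta_T[4]) # 12 live-patient events plus death; sums to 1
}

theta_T_alt <- c(0.67, 0.25, 0.05, 0.03)                    # same toxicity in both arms
alt_F  <- make_alt(theta_T_alt, c(0.25, 0.35, 0.20, 0.20))
alt_FC <- make_alt(theta_T_alt, c(0.45, 0.35, 0.10, 0.10))

## Implied marginal CR probabilities, matching the values quoted in the text:
sum(alt_F$joint["CR", ])    # 0.25 * 0.97 = 0.2425
sum(alt_FC$joint["CR", ])   # 0.45 * 0.97 = 0.4365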

Table 4.

Response probabilities for the scenarios considered in our CLL trial simulation study. Toxicity probabilities correspond to {Min, Mod, Sev, Fatal}, and efficacy probabilities correspond to {CR, PR, SD, PD}, given that the patient is alive.

Scenarios              Abbreviation   Response Probabilities
All                    NA             θF,T  = (0.67, 0.25, 0.05, 0.03)
1.0, 2.0, 3.0, 4.0     =              θFC,T = (0.67, 0.25, 0.05, 0.03)
1.1, 2.1, 3.1, 4.1     >              θFC,T = (0.44, 0.40, 0.10, 0.06)
1.2, 2.2, 3.2, 4.2     ≫              θFC,T = (0.26, 0.45, 0.20, 0.09)

All                    NA             θF,E  = (0.25, 0.35, 0.20, 0.20)
1.0, 1.1, 1.2          =              θFC,E = (0.25, 0.35, 0.20, 0.20)
2.0, 2.1, 2.2          >              θFC,E = (0.35, 0.35, 0.15, 0.15)
3.0, 3.1, 3.2          ≫              θFC,E = (0.45, 0.35, 0.10, 0.10)
4.0, 4.1, 4.2          ≫>             θFC,E = (0.60, 0.30, 0.05, 0.05)

The simulation results reported in Table 5 show that, in general, the CAT-BUB design is sensitive to the efficacy–toxicity tradeoffs characterized by the utilities, whereas the beta-binomial design with an efficacy test (BB-EO) is not. In contrast, the hierarchical beta-binomial design (BB-ET) is sensitive to toxicity only if there is low power for detecting an efficacy difference between the two treatments; otherwise it is not. Because the BB-ET design is based on two tests, rather than one test like the BB-EO design, it requires a more stringent cut-off to control the type I error, and thus has lower power than the BB-EO design for selecting the treatment with superior efficacy, which is FC in all the scenarios we considered. Scenario 1.0 is the null, i.e., θFC = (θFC,E, θFC,T) ≡ (θF,E, θF,T) = θF, and the cut-off for each of the three designs was calibrated to control type I error at the α = 0.05 level, with F and FC each selected with probability 0.025. In Scenarios 1.1 and 1.2, FC has equivalent efficacy with moderately higher and much higher toxicity, respectively, and both the CAT-BUB design and the BB-ET design are increasingly likely to select F, whereas the BB-EO design cannot distinguish between these clinically very different scenarios because it ignores toxicity. The BB-ET design is more likely than the CAT-BUB design to correctly select F, 0.39 versus 0.22 in Scenario 1.1 and 0.98 versus 0.78 in Scenario 1.2. In Scenario 2.0, FC has a moderate efficacy advantage with equivalent toxicity, and the BB-EO, CAT-BUB, and BB-ET designs select FC with probabilities 0.40, 0.35, and 0.29, respectively. In Scenario 2.1, FC has a moderate efficacy advantage and a moderate toxicity disadvantage, a tradeoff that slightly favors FC for the assumed utilities. The CAT-BUB design is unlikely to select either treatment, whereas the BB-EO design selects FC with probability 0.33, and the BB-ET design selects FC and F with probabilities 0.23 and 0.31, respectively. In Scenario 2.2, the toxicity disadvantage for FC increases, so the tradeoff moderately favors F for the assumed utilities. The CAT-BUB design selects F with probability 0.31 and does not select FC, whereas the BB-EO design selects FC with probability 0.28 and does not select F, and the BB-ET design selects F with probability 0.82 and FC with probability 0.17. Scenario 3.0 is the targeted alternative, in which FC has higher efficacy and equivalent toxicity compared to F. In this ideal case, the CAT-BUB design has 90% power, compared with 91% for the BB-EO design and 85% for the BB-ET design. Scenarios 3.1, 3.2, 4.1, and 4.2 are tradeoff settings in which FC has a large or very large efficacy advantage and either a moderate or large toxicity disadvantage. In these cases, the CAT-BUB design selects treatments with probabilities that are sensitive to the assumed utilities, whereas both beta-binomial designs consistently select FC with high probability, regardless of the toxicity burden of FC.

Table 5.

Power figures for the CLL trial based on the CAT-BUB design, the beta-binomial design with an efficacy test only (BB-EO), and the hierarchical beta-binomial design with an efficacy test followed by a toxicity test (BB-ET). Comparisons of θFC,E vs θF,E and θFC,T vs θF,T are characterized as being "equivalent" (=), or having "moderate" (>), "large" (≫), or "very large" (≫>) differences.

Scenario                                  Probability of Final Conclusion
Efficacy   Toxicity   δU,FC–F   CAT-BUB Design        BB-EO (Efficacy Only)   BB-ET (Efficacy then Toxicity)
FC vs F    FC vs F              FC > F    F > FC      FC > F    F > FC        FC > F    F > FC
1.0:  =    =            0.0     0.025     0.025       0.026     0.024         0.024     0.025
1.1:  =    >           −5.2     0.001     0.222       0.019     0.035         0.007     0.388
1.2:  =    ≫          −11.9     0.000     0.782       0.012     0.047         0.005     0.982

2.0:  >    =            6.8     0.352     0.000       0.402     0.000         0.289     0.010
2.1:  >    >            1.1     0.041     0.012       0.331     0.000         0.226     0.308
2.2:  >    ≫           −6.5     0.000     0.314       0.278     0.000         0.173     0.818

3.0:  ≫   =           13.5     0.903     0.000       0.910     0.000         0.846     0.002
3.1:  ≫   >            7.3     0.397     0.000       0.873     0.000         0.778     0.097
3.2:  ≫   ≫           −1.1     0.041     0.015       0.816     0.000         0.716     0.281

4.0:  ≫>  =           21.0     1.000     0.000       1.000     0.000         1.000     0.000
4.1:  ≫>  >           14.2     0.917     0.000       1.000     0.000         0.999     0.001
4.2:  ≫>  ≫            5.0     0.201     0.001       0.999     0.000         0.997     0.003

Scenarios 1.2, 2.2, and 3.2 show very undesirable potential consequences of using the BB-EO design, which completely ignores toxicity. The BB-EO design, based on the probability of CR, treats Scenario 1.2 like a null case since πCR,FC = 0.2275 versus πCR,F = 0.2425, while in fact the toxicity probability vectors are very different, with θFC,T = (0.26, 0.45, 0.20, 0.09) versus θF,T = (0.67, 0.25, 0.05, 0.03), so that FC has a much lower probability of minor toxicity but much higher probabilities of moderate, severe, and fatal AEs compared to F. The CAT-BUB design recognizes this, concluding that F > FC with probability 0.78, compared to 0.05 for the BB-EO design. Scenario 2.2 is an intermediate case, since FC has moderately better efficacy, with θFC,E = (0.35, 0.35, 0.15, 0.15) versus θF,E = (0.25, 0.35, 0.20, 0.20), but also much higher toxicity, with θFC,T = (0.26, 0.45, 0.20, 0.09) versus θF,T = (0.67, 0.25, 0.05, 0.03). The BB-EO design detects the 0.3185 − 0.2425 = 0.076 difference in CR probabilities in favor of FC with power 0.28, but since the probability of severe toxicity or death is 0.29 with FC versus 0.08 with F, the CAT-BUB design concludes that F is superior to FC with probability 0.314 and never concludes that FC is superior to F. In Scenario 3.2, FC has a large efficacy advantage but also a high toxicity burden, with θFC,E = (0.45, 0.35, 0.10, 0.10) versus θF,E = (0.25, 0.35, 0.20, 0.20) but, as in Scenario 2.2, θFC,T = (0.26, 0.45, 0.20, 0.09) versus θF,T = (0.67, 0.25, 0.05, 0.03). The BB-EO design concludes that FC is superior to F with power 0.82, whereas the CAT-BUB design recognizes both the much better efficacy and the much worse toxicity of FC compared to F, and based on the assumed utilities does not recommend either treatment over the other with probability 1 − (0.041 + 0.015) = 0.94. Scenario 4.2 shows that the BB-ET design can behave as undesirably as the BB-EO design. If the efficacy advantage of FC is very large, the efficacy test detects a difference with high probability, so the BB-ET design effectively ignores toxicity because the toxicity test is rarely applied. In Scenarios 2.1 and 3.2, which have less extreme tradeoffs than Scenario 4.2, the BB-ET design is likely to recommend a particular treatment even though neither treatment may be strongly preferred under the assumed utilities. For example, in Scenario 3.2, the BB-ET design recommends FC and F with probabilities 0.72 and 0.28, respectively, and thus recommends one treatment or the other with probability 0.99. Lastly, the BB-ET design has lower power than the CAT-BUB design for the targeted alternative of the CLL trial, i.e., Scenario 3.0.

There are several key points in these comparisons. First, basing a test on the probability of CR is equivalent to using a degenerate utility-based test that assigns utility 100 to CR and 0 to its complement, while completely ignoring toxicity. An elaboration of this that accounts for the ordinal categories of efficacy is the two-sample test of [19], although that test still ignores toxicity. Considering Scenarios 3.0, 3.1, and 3.2 together shows how the CAT-BUB design adjusts its conclusions as θFC,T varies, essentially agreeing with the BB-EO design when toxicity is equivalent but very likely reaching the opposite conclusion when FC has much higher toxicity than F. The same pattern is seen for Scenarios 4.0, 4.1, and 4.2. This also illustrates the benefit of considering the ordinal level of each outcome rather than dichotomizing it, since the probability of concluding that FC is superior to F varies from 1 to 0.20 as the probabilities of the levels of each outcome change across scenarios. In practice, if a conventional design based on efficacy alone is used, one might hope that at some point during actual trial conduct someone would notice an excessively higher toxicity rate in one arm and ask the Principal Investigator or Institutional Review Board to halt accrual. Hope is not a strategy, however. Moreover, if a trial designed on the basis of efficacy alone will in fact be stopped due to such a toxicity difference, then the nominal size and power of the design are incorrect; they are conditional on the assumption that no toxicity difference large enough to stop the trial early will occur. All of these concerns are handled automatically by the structure of the group sequential CAT-BUB test. For the group sequential design of the CLL trial (see Web Supplement), the interim decision rules stop the trial early with high probability when there is a large difference in either efficacy or toxicity, as quantified by the joint utilities of the elementary (efficacy, toxicity) outcomes.

The utilities in Table 1 used for the CAT-BUB re-design of the CLL trial emphasize both avoiding severe toxicity or death and achieving a good clinical response, by specifying the relative utility parameters (Section 2) to be near zero, ζ1 = 0.1 and ζ2 = 0.2, respectively. To assess the CAT-BUB design's sensitivity to the utilities, we also considered the two alternative utility sets given in Table 6. One places greater importance on achieving better efficacy, and the other places greater importance on achieving lower toxicity. We obtained the first (second) set of alternative utilities by changing ζ2 from 0.20 to 0.80 (ζ1 from 0.10 to 0.60), while retaining all other indirect elicitation parameter values as detailed in Section 2. For the first alternative, the utilities of moderate or severe toxicity at the CR or PR levels of efficacy are substantially larger than the original values; for example, U(CR, Sev) increases from 28 to 82. For the second alternative, the utilities of minimal or moderate toxicity at the SD or PD levels of efficacy are substantially larger; for example, U(SD, Min) increases from 35 to 71. The required sample sizes for the two alternative utility sets are nF = nFC = 110 and 271, respectively, compared with nF = nFC = 127 for the elicited numerical utilities. The CAT-BUB design based on the utilities that place greater importance on higher efficacy thus requires a smaller sample size to achieve 90% power for detecting the targeted alternative than the beta-binomial design used for the actual trial. The numerical utilities that place greater importance on higher efficacy (lower toxicity) provide more (less) power for detecting the targeted alternative, which has a large treatment difference in the marginal probabilities of clinical response and no difference in the AE probabilities.

Table 7 reports the power figures for the same scenarios as in Table 5 for the CAT-BUB design based on the two alternative utility sets, with the sample sizes given above. These power figures illustrate that the numerical utilities affect the power for detecting specific treatment differences and determine which treatment is preferred in tradeoff scenarios. Scenario 4.2 is an extreme tradeoff, wherein FC has much higher efficacy but also much higher toxicity than F. As shown by the mean utility differences and power figures, this tradeoff slightly favors FC under the original utilities, strongly favors FC under the alternative utilities that place greater importance on efficacy, and, conversely, favors F under the alternative utilities that place greater importance on lower toxicity. The dependence of the proposed CAT-BUB design on the numerical utilities underscores the importance of eliciting values that actually reflect the clinical desirability of each elementary patient outcome.

Table 7.

Power figures for the CLL trial based on a fixed sample CAT-BUB design using two alternative sets of numerical utilities that place greater importance on either improving efficacy or reducing toxicity, compared to the original utilities.

Scenario              Alternative Utility Giving             Alternative Utility Giving
                      Efficacy Higher Value                  Lower Toxicity Higher Value
Efficacy   Toxicity   δU,FC–F   FC > F   F > FC              δU,FC–F   FC > F   F > FC
1.0:  =    =            0.0     0.025    0.025                  0.0    0.024    0.026
1.1:  =    >           −3.2     0.003    0.100                 −8.4    0.000    0.931
1.2:  =    ≫           −6.7     0.000    0.287                −18.3    0.000    1.000

2.0:  >    =            7.1     0.349    0.000                  3.6    0.365    0.000
2.1:  >    >            3.7     0.122    0.003                 −4.6    0.000    0.470
2.2:  >    ≫           −0.1     0.026    0.025                −14.6    0.000    1.000

3.0:  ≫   =           14.2     0.897    0.000                  7.3    0.902    0.000
3.1:  ≫   >           10.6     0.647    0.000                 −0.8    0.011    0.049
3.2:  ≫   ≫            6.5     0.288    0.007                −11.0    0.000    0.986

4.0:  ≫>  =           22.1     0.999    0.000                 11.3    0.999    0.000
4.1:  ≫>  >           18.2     0.985    0.000                  3.4    0.290    0.000
4.2:  ≫>  ≫           13.8     0.859    0.000                 −7.0    0.000    0.748

6.3. Additional Illustrations

In Section 1, we introduced several categorical outcome structures, including trinary, bivariate binary, bivariate binary plus death, ordinal, and bivariate ordinal. The computational algorithms given previously readily accommodate all of these cases. In Web Appendix D, we provide detailed illustrations of both fixed sample and group sequential CAT-BUB designs for a bivariate binary outcome, and for a bivariate ordinal outcome having 16 = 4 × 4 elementary events. We also include a group sequential CAT-BUB re-design of the CLL trial.

7. Discussion

Because clinical trial conduct must accommodate medical practice, a trial design should account formally for risk-benefit tradeoffs between all clinically relevant outcomes. The utility-based tests that we have proposed provide a practical approach for comparing treatments based on categorical outcomes in an RCT. We have provided both fixed sample and group sequential procedures, computational algorithms to derive design parameters, and freely available, user-friendly software. The CAT-BUB test directly addresses the problem of comparing treatments with respect to all clinically relevant differences. It deals with the problem of deciding whether one therapy is clinically superior to another, based on its outcome probability vector, by exploiting the elicited utilities of the elementary outcomes to reduce the multidimensional outcome to a one-dimensional mean utility, which then is used to construct comparative tests. The elicited utilities provide a rigorous framework for treatment comparison that makes explicit any subjective tradeoffs between outcomes.

We have demonstrated that designs based on a single binary efficacy outcome often are unsafe and do not reflect medical practice. Because safety is never a secondary concern in any clinical trial, conventional designs based only on an efficacy test presumably rely on informal stopping criteria for safety, which makes their operating characteristics difficult to determine. We also have demonstrated that designs based on a hierarchical testing procedure may be unsafe in scenarios where one treatment is more efficacious but also more toxic than the other. Such designs also do not reflect medical practice: if an efficacy difference is detected, it is naive to believe that physicians will not consider the toxicity profiles of the available treatments when deciding which one actually is clinically preferable. The proposed method provides a practical tool that explicitly accounts for the tradeoffs between disparate outcomes that physicians routinely assess, and thus for designing RCTs that better reflect medical practice.


Acknowledgments

Contract/grant sponsor: NIH/NCI grant 5-R01-CA083932

Footnotes

Supplementary Materials

The Web Appendices referenced in Sections 2, 4, 5, and 6 are available with this paper at the journal's website. An example spreadsheet mentioned in Section 2 for utility elicitation in the context of the CLL example, and a suite of R functions for implementing the computational algorithms in Sections 5.2 and 5.3, along with annotated example R programs for replicating each illustration, are available at:

https://biostatistics.mdanderson.org/SoftwareDownload/

Contributor Information

Thomas A. Murray, Email: TAMurray@MDAnderson.org.

Peter F. Thall, Email: Rex@MDAnderson.org.

Ying Yuan, Email: YYuan@MDAnderson.org.

References
