Optimizing Trial Designs for Targeted Therapies

Thomas Ondra; Sebastian Jobjörnsson; Robert A Beckman; Carl-Fredrik Burman; Franz König; Nigel Stallard; Martin Posch

doi:10.1371/journal.pone.0163726

PLoS One. 2016 Sep 29;11(9):e0163726. doi: 10.1371/journal.pone.0163726

Editor: Robert K Hills

PMCID: PMC5042421 PMID: 27684573

Abstract

An important objective in the development of targeted therapies is to identify the populations where the treatment under consideration has a positive benefit-risk balance. We consider pivotal clinical trials, where the efficacy of a treatment is tested in an overall population and/or in a pre-specified subpopulation. Based on a decision theoretic framework we derive optimized trial designs by maximizing utility functions. Features to be optimized include the sample size and the population in which the trial is performed (the full population or the targeted subgroup only) as well as the underlying multiple test procedure. The approach accounts for prior knowledge of the efficacy of the drug in the considered populations using a two-dimensional prior distribution. The considered utility functions account for the costs of the clinical trial as well as the expected benefit when demonstrating efficacy in the different subpopulations. We model utility functions from a sponsor’s as well as from a public health perspective, reflecting actual civil interests. Examples of optimized trial designs obtained by numerical optimization are presented for both perspectives.

1 Introduction

In the development of targeted therapies the investigation of potentially predictive biomarkers is critical. If efficacy is limited to an identifiable subgroup of patients, developing a therapy for an unselected patient population is ethically problematic and will also require unnecessarily large sample sizes because of a diluted treatment effect. On the other hand, erroneously restricting a drug development program to a subpopulation is also unethical, as it excludes patients from an effective treatment. Furthermore, it will entail a financial loss for the sponsor because of unnecessary costs of biomarker development and screening and the lower prevalence of the future patient population.

Several one and two stage clinical trial designs have been proposed in which the treatment effect is tested in an overall population as well as in a subgroup of biomarker positive patients [1–6] (see [7] for a recent review). To account for the resulting multiple comparisons, tailored multiplicity adjustments have been developed [8–16]. Alpha allocation has also been optimized using interim trial data and/or data external to the trial, with respect to a utility function, providing an early example of the use of decision analysis [2].

In this paper we use a comprehensive decision theoretic approach to derive optimal trial designs for the development of targeted therapies. In particular, the framework allows us to assess when it is favourable to investigate the biomarker in a clinical trial and when it is actually more efficient to disregard the biomarker and to proceed with a classical trial design. This extends earlier decision theoretic methods that focused on the selection of the population for clinical trials incorporating a biomarker [17–22].

Consider a setting where a single potentially predictive binary biomarker has been identified in advance, separating the full population F into biomarker positive (S) and biomarker negative (S′) patients and there is prior evidence suggesting that the treatment effect may be more pronounced (or only present) in the biomarker positive group. Let λ_S and λ_S′, satisfying λ_S + λ_S′ = 1, be the prevalences of biomarker positive and biomarker negative patients in the full population. For this situation we consider three design options for a pivotal clinical trial: (i) The classical design that does not account for the biomarker status and tests for a treatment effect in the full population only, (ii) the stratified design that also recruits patients from the full population but where the biomarker status of each patient is determined and the treatment effect is tested in the full population and the subpopulation, and, (iii) the enrichment design, where patients are screened for the biomarker status and only biomarker positive patients are included in the trial.

The choice of trial design will in general not only be based on power arguments, but on the overall expected utility of different designs, accounting for the potential rewards and costs. Rewards can be quantified by the sales revenue, from a sponsor’s view, or by a measure of the overall health benefit, from a public health view. The costs of the trial are determined by fixed and per patient costs as well as investments in biomarker development and the determination of the biomarker status for the patients in the trial. Based on a decision theoretic framework, we first optimize each of the three trial designs by choosing optimal sample sizes (and an optimized multiple testing procedure for the stratified design). Then, the optimal design can be selected among the three optimised designs based on their expected utilities. The optimal design choice depends on the type of utility function used (sponsor’s view or public health view), the reward and cost parameters, the prior distribution on the effect sizes and the prevalence of the biomarker positive subgroup.

2 Testing Problem and Considered Trial Designs

Let Δ = (δ_S, δ_S′) denote the treatment effects for the primary efficacy endpoint in the subgroup and its complement, respectively. Furthermore, let π(Δ) denote a prior distribution on Δ. We focus on priors that satisfy π(Δ) = 0 for δ_S < δ_S′. This accounts for settings where there is some evidence that the effect size in the biomarker positive treatment group may be larger than in the biomarker negative group but not the other way around.

For simplicity, we assume that the basis of marketing authorization is a single pivotal trial. We further assume that a necessary condition for regulators to approve a drug for the populations S or F is the demonstration of a significant treatment effect in the respective population by a suitable multiple testing procedure controlling the familywise error rate (FWER) at level α in the strong sense. Consider the two null hypotheses

$$H_{S} : δ_{S} \leq 0 \quad \text{and} \quad H_{F} : δ_{F} \leq 0,$$

where δ_F = λ_S δ_S + λ_S′ δ_S′, and let, for some trial design d, ψ_d = (ψ_S,d, ψ_F,d) denote a multiple testing procedure such that ψ_i,d = 1 (0) if there is a statistically significant (no significant) treatment effect in population i = S, F.

We consider three types of trial designs, the classical, the stratified and the enrichment design. Let $D = \{C_{n}, S_{n, α_{S}}, E_{n} \mid n \geq n_{min}, α_{S} \in [0, α]\}$ denote the set of considered trial designs, where $C_{n}, S_{n, α_{S}}, E_{n}$ are defined below:

Classical design C_n refers to a classical parallel group design with per group sample size n recruiting patients from the full population and testing H_F only. H_F is tested by a non-stratified test $ψ_{F, C_{n}}$ and we set $ψ_{S, C_{n}} = 0$.

Stratified Design $S_{n, α_{S}}$ refers to a stratified design, which differs from the classical design in that the analysis is stratified by the biomarker status and both hypotheses H_F and H_S are tested with a weighted multiple testing procedure with parameter α_S. As the multiple testing procedure we apply the closed Spiessens-Debois’ test [8, 13]. This test combines the Spiessens-Debois’ test for the rejection of the intersection hypothesis H_S ∩ H_F with the closed testing principle so as to obtain a test for the rejection of either H_S or H_F (or both). Let p_S and p_F denote unadjusted p-values for testing H_S and H_F, respectively. Here we assume that H_F is tested with a test stratified for the biomarker (in contrast to the classical design, where a non-stratified test is used as no biomarker information is available). For α_S, α_F ≥ 0, the closed Spiessens-Debois’ test then rejects H_S if p_S ≤ α and either p_S ≤ α_S or p_F ≤ α_F. Similarly, it rejects H_F if p_F ≤ α and either p_S ≤ α_S or p_F ≤ α_F. To control the familywise error rate at level α in the strong sense, the significance levels α_S and α_F need to satisfy

$$P_{H_{S} \cap H_{F}} (p_{S} \leq α_{S} \lor p_{F} \leq α_{F}) \leq α. \tag{1}$$

Thus, the significance level α_F is determined by Eq (1) if α_S ≤ α is given. Note that the corresponding function α_F(α_S) depends on the subgroup prevalence λ_S.
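The dependence of α_F on α_S and λ_S can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes the normal model of Section 4 with known variance, subgroup sample sizes exactly proportional to the prevalences, and the boundary configuration δ_S = δ_S′ = 0 of the intersection hypothesis. Under these assumptions the z-statistics satisfy Z_F = √λ_S · Z_S + √λ_S′ · Z_S′, so their correlation is √λ_S. The function names `intersection_level` and `alpha_f_given` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def intersection_level(alpha_s, alpha_f, lam_s, steps=4000):
    """P(p_S <= alpha_S or p_F <= alpha_F) when delta_S = delta_S' = 0.

    Assumes 0 <= lam_s < 1 and corr(Z_S, Z_F) = sqrt(lam_s); steps must be even.
    """
    if alpha_s <= 0:
        return alpha_f
    if alpha_f <= 0:
        return alpha_s
    a = -STD.inv_cdf(alpha_s)  # critical value for Z_S
    b = -STD.inv_cdf(alpha_f)  # critical value for Z_F
    rho, rho_c = sqrt(lam_s), sqrt(1.0 - lam_s)
    # Simpson's rule for P(Z_S < a, Z_F < b)
    #   = integral over z < a of pdf(z) * cdf((b - rho * z) / rho_c)
    lo = -8.0
    h = (a - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * STD.pdf(z) * STD.cdf((b - rho * z) / rho_c)
    return 1.0 - total * h / 3.0

def alpha_f_given(alpha_s, lam_s, alpha=0.025, iterations=40):
    """Largest alpha_F that satisfies Eq (1) for a given alpha_S (bisection)."""
    if alpha_s <= 0:
        return alpha
    lo, hi = 0.0, alpha
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if intersection_level(alpha_s, mid, lam_s) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo
```

Because the two test statistics are positively correlated, the resulting α_F is larger than the Bonferroni value α − α_S, and the gain grows with λ_S.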

We assume that in the stratified design, marketing authorization in the population F is not only determined by the treatment effect in F, but that regulators additionally require some evidence that there is a treatment effect in both S and S′, so that the rejection of H_F is not completely dominated by a treatment effect in a single subgroup only. Thus, we assume that the regulators’ decision rule corresponds to a hypothesis test where H_F is only rejected, if, in addition, the p-values p_S and p_S′ of tests for efficacy in the two subgroups fall below corresponding thresholds τ_S and τ_S′. The resulting modified Spiessens-Debois’ test $(ψ_{S, S_{n, α_{S}}}, ψ_{F, S_{n, α_{S}}})$ rejects H_S if {p_S ≤ α} ∧ {p_S ≤ α_S ∨ p_F ≤ α_F} and rejects H_F if {p_F ≤ α} ∧ {p_S ≤ α_S ∨ p_F ≤ α_F} ∧ {p_S ≤ τ_S ∧ p_S′ ≤ τ_S′}. Note that this test is strictly conservative, because the consistency thresholds τ_S and τ_S′ are not considered in the level α condition.
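The rejection rules of the modified test translate directly into code. The sketch below is illustrative only; the function name `modified_closed_test` and its argument names are hypothetical, and the default values for α and the consistency thresholds are the ones used in the numerical examples of Section 4.

```python
def modified_closed_test(p_s, p_f, p_s_comp, alpha_s, alpha_f,
                         alpha=0.025, tau_s=0.3, tau_s_comp=0.3):
    """Return (reject_H_S, reject_H_F) for the modified closed test.

    p_s, p_f, p_s_comp are the unadjusted p-values in S, F and S'.
    """
    # Weighted test of the intersection hypothesis H_S and H_F
    intersection_rejected = (p_s <= alpha_s) or (p_f <= alpha_f)
    # Closed testing: an elementary hypothesis also needs p <= alpha
    reject_s = (p_s <= alpha) and intersection_rejected
    # Consistency requirement: some evidence of an effect in both subgroups
    consistent = (p_s <= tau_s) and (p_s_comp <= tau_s_comp)
    reject_f = (p_f <= alpha) and intersection_rejected and consistent
    return reject_s, reject_f
```

For example, with α_S = 0.015 and α_F = 0.012, the p-values p_S = 0.01, p_F = 0.02 and p_S′ = 0.5 lead to rejection of H_S only, because the consistency threshold in S′ is not met.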

Enrichment Design E_n refers to an enrichment design, which differs from the classical design in that only patients from the subpopulation are recruited and only H_S is tested. In the enrichment design, H_S is tested by a test denoted by $ψ_{S, E_{n}}$ and we set $ψ_{F, E_{n}} = 0$.

3 Utility Functions

We define utility functions that quantify the potential rewards for each of the possible trial outcomes as well as the cost of the trial. To model the rewards, we distinguish between the sponsor and the public health view, leading to different utility functions for the two perspectives:

$$U^{(v)} (d) = \sum_{i = S, S^{'}} φ_{i, d}^{(v)} - C_{d}, \tag{2}$$

where v = Sponsor for the sponsor and v = Public for the public health view, d ∈ D denotes the trial design, $φ_{i, d}^{(v)}$ the reward due to the trial outcome in subgroup i = S, S′ and C_d the cost of the trial. The cost functions C_d of the different trial designs d ∈ D are sums of fixed costs and costs per recruited patient in the trial. Note that the per-group sample size n may vary among the three designs and below we will determine optimal sample sizes for each type of design.

For the classical design the cost function is given by

$$C_{C_{n}} = c_{\text{setup}} + 2 n \, c_{\text{per-patient}},$$

where the setup costs of the trial c_setup are fixed costs and c_per-patient are the marginal costs per patient. In the stratified design there are additional fixed costs c_biomarker to develop the biomarker and additional per patient costs to determine the biomarker status c_screening. Thus, the cost function of the stratified design is given by

$$C_{S_{n, α_{S}}} = c_{\text{setup}} + c_{\text{biomarker}} + 2 n \, (c_{\text{per-patient}} + c_{\text{screening}}).$$

For the enrichment design the fixed costs are the same as for the stratified design. However, to recruit only biomarker positive patients one has to screen (on average) 2n/λ_S patients from the full population until 2n biomarker positive patients are identified. Given that the screening and determination of the biomarker status induces costs c_screening the cost function is given by

$$C_{E_{n}} = c_{\text{setup}} + c_{\text{biomarker}} + 2 n \, (c_{\text{per-patient}} + c_{\text{screening}} / λ_{S}).$$
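As a small illustration, the three cost functions can be written as follows. The function and argument names are hypothetical, and all amounts are assumed to be in the same monetary unit.

```python
def cost_classical(n, c_setup, c_per_patient):
    """Classical design: setup costs plus 2n recruited patients."""
    return c_setup + 2 * n * c_per_patient

def cost_stratified(n, c_setup, c_per_patient, c_biomarker, c_screening):
    """Stratified design: every recruited patient is also screened."""
    return c_setup + c_biomarker + 2 * n * (c_per_patient + c_screening)

def cost_enrichment(n, lam_s, c_setup, c_per_patient, c_biomarker, c_screening):
    """Enrichment design: on average 2n / lam_s patients are screened."""
    return c_setup + c_biomarker + 2 * n * (c_per_patient + c_screening / lam_s)
```

With the Case 3 parameters of Section 4 expressed in MUSD (c_setup = 1, c_per-patient = 0.05, c_biomarker = 10, c_screening = 0.005), n = 100 and λ_S = 0.5, the costs are 11, 22 and 23 MUSD for the classical, stratified and enrichment design, respectively.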

3.1 The Sponsor’s Utility Function

For the sponsor, the utility is the Net Present Value (NPV), which is defined as the reward (sales revenue) minus the trial costs. We model the sponsor’s reward as a function of (i) the outcome of the regulatory approval process, (ii) the price the sponsor can achieve, and (iii) the size of the population the drug is licensed for.

To model (i) and (ii) we define reward functions $φ_{i}^{(Sponsor)}$ for i = S, S′ that specify the reward obtained in the respective population. The reward function may depend on the observed data, the design of the pivotal trial d and the prevalence of the subgroup. We model the reward as the product of the price of the drug for the treatment of a single patient times the market size. Given an overall market size N, the market sizes of the two subgroups are λ_S N and λ_S′ N, respectively. Furthermore, we assume that the payers are willing to pay more if a larger treatment effect was observed. If the drug is authorized for neither subgroup, both reward functions are set to zero. If the drug is authorized for the subgroup S only, the reward for the complement S′ is set to zero. If the drug is authorized for the full population, we assume that the same price is charged in both subgroups.

We assume that (given that the respective hypothesis test rejects and the observed effect size exceeds a clinically relevant threshold) the price increases linearly with the observed effect size. Then the reward functions for the subgroups S and S′ are

$$\begin{aligned} φ_{S, d}^{(\text{Sponsor})} &= \begin{cases} λ_{S} N r_{S} ψ_{S, d} (\hat{δ}_{S, d} - μ_{S})^{+} & \text{if } ψ_{F, d} = 0 \\ λ_{S} N r_{F} ψ_{F, d} (\hat{δ}_{F, d} - μ_{F})^{+} & \text{otherwise} \end{cases} \\ φ_{S^{'}, d}^{(\text{Sponsor})} &= λ_{S^{'}} N r_{F} ψ_{F, d} (\hat{δ}_{F, d} - μ_{F})^{+}, \end{aligned} \tag{3}$$

where μ_i denotes a minimal clinically relevant effect size for population i = S, F and (⋅)⁺ denotes the positive part. ${\hat{δ}}_{S, d}$ and ${\hat{δ}}_{F, d}$ are the estimates of δ_S and δ_F obtained from the trial data. The constants r_i for i = S, F are the marginal prices (the change in price if the observed effect size increases by one unit) and N denotes the total market size, which for the sponsor is defined as the number of future patients within the patent life of the therapy in the unselected, full population. Note that, given that efficacy is shown in the full population, a common treatment effect estimate ${\hat{δ}}_{F}$ is used in the price function. Then the overall reward within the patent life of the therapy is given by $φ_{S, d}^{(Sponsor)} + φ_{S^{'}, d}^{(Sponsor)}$.
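The case distinction in Eq (3) can be made concrete with a short sketch. It is an illustration rather than the authors' code: the function name `sponsor_rewards` is hypothetical, the arguments `n_r_s` and `n_r_f` stand for the products N·r_S and N·r_F, and the default thresholds are the values used in Section 4.

```python
def sponsor_rewards(psi_s, psi_f, est_s, est_f, lam_s, n_r_s, n_r_f,
                    mu_s=0.1, mu_f=0.1):
    """Return the sponsor rewards (phi_S, phi_S') of Eq (3).

    psi_s, psi_f: test decisions (1 = significant) in S and F.
    est_s, est_f: observed effect estimates in S and F.
    """
    def positive_part(x):
        return max(x, 0.0)

    if psi_f:
        # Approval in F: one common price, based on the estimate in F
        price_times_market = n_r_f * positive_part(est_f - mu_f)
        return lam_s * price_times_market, (1.0 - lam_s) * price_times_market
    if psi_s:
        # Approval in S only: no reward from the complement S'
        return lam_s * n_r_s * positive_part(est_s - mu_s), 0.0
    return 0.0, 0.0
```

For instance, with λ_S = 0.4, N·r_F = 1000 MUSD and an observed full-population effect of 0.3, approval in F yields rewards of about 80 and 120 MUSD in S and S′.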

3.2 Public Health Utility Function

With the public health utility function we model the utility of trial designs under the assumption that the drug is developed by public health authorities. Therefore, the utility of a trial is given by the total health benefit to the society (adjusted by the production cost of the drug) minus the cost of running the trial. We assume that the benefit of the drug is measured on a monetary scale representing the expected, accumulated (over the whole treated population) treatment effect. Costs are assumed to be the same as under the sponsor view. The reward functions for the subgroups S and S′ are given by

$$\begin{aligned} φ_{S, d}^{(\text{Public})} &= \begin{cases} λ_{S} N r_{S} ψ_{S, d} (δ_{S} - μ_{S}) & \text{if } ψ_{F, d} = 0 \\ λ_{S} N r_{F} ψ_{F, d} (δ_{F} - μ_{F}) & \text{otherwise} \end{cases} \\ φ_{S^{'}, d}^{(\text{Public})} &= λ_{S^{'}} N r_{F} ψ_{F, d} (δ_{F} - μ_{F}). \end{aligned} \tag{4}$$

The first term in the utility function Eq (2) denotes the total benefits summed over the whole population, which are assumed to be proportional to the effect size (adjusted for a minimal relevant threshold), if the drug is authorized. The constants r_i for i = S, F are the marginal benefits (the change in benefit if the effect size increases by one unit), and N denotes the size of the future (unselected) patient population. Note that the benefit depends on the actual effect sizes δ_i and not on the corresponding trial estimates ${\hat{δ}}_{i, d}$ , implying that the benefit may be negative if the effect size is low. A consequence of this model choice is that a public health authority will take into account the risk of false positive approvals when optimizing its trial design. Such considerations are absent when a sponsor is optimizing, since we have assumed that only the estimated effects enter its utility function.
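A short worked example shows how a false positive approval enters the public health utility. Suppose there is no effect in either subgroup, so that δ_F = 0, but H_F is nevertheless rejected (ψ_F,d = 1). Using, for illustration, the values μ_F = 0.1 and N r_F = 1000 MUSD from Case 2 of Section 4, Eq (4) gives a negative total reward:

```latex
\begin{aligned}
φ_{S,d}^{(\text{Public})} + φ_{S',d}^{(\text{Public})}
  &= (λ_{S} + λ_{S'}) \, N r_{F} \, (δ_{F} - μ_{F}) \\
  &= 1 \cdot 1000 \cdot (0 - 0.1) \\
  &= -100 \ \text{MUSD}.
\end{aligned}
```

Under the sponsor reward of Eq (3), the same trial outcome would instead yield a non-negative reward, because only the observed effect enters through the positive part.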

3.3 Optimizing the Expected Utility

Recall that π denotes a prior distribution on the effect sizes Δ. For a given utility function U^(v) and set of trial designs D the design optimizing the expected utility is given by

$$d^{*} \in \operatorname{argmax}_{d \in D} E_{π} [U^{(v)} (d)], \tag{5}$$

where

$$E_{π} [U^{(v)} (d)] = \int E_{Δ} [U^{(v)} (d)] \, d π (Δ). \tag{6}$$

Note that the expectation is first taken over the data distribution given the effect sizes Δ and then over the prior distribution π.
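For a discrete prior, as used in the numerical examples of Section 4, the integral in Eq (6) reduces to a finite sum. As an illustration, write Δ_i = (δ_S,i, δ_S′,i), i = 1, …, l, for the support points and π_i for their prior probabilities (this indexing is notation introduced here). For the enrichment design under the public health view, the inner expectation has a simple form because the reward in Eq (4) depends on the data only through the test decision:

```latex
\begin{aligned}
E_{π}\big[U^{(v)}(d)\big] &= \sum_{i=1}^{l} π_{i} \, E_{Δ_{i}}\big[U^{(v)}(d)\big], \\
E_{Δ}\big[U^{(\text{Public})}(E_{n})\big]
  &= λ_{S} N r_{S} \, (δ_{S} - μ_{S}) \, P_{Δ}\big(ψ_{S,E_{n}} = 1\big) - C_{E_{n}}.
\end{aligned}
```

Here P_Δ(ψ_S,E_n = 1) is the probability of rejecting H_S given the effect sizes Δ, that is, the type I error rate if δ_S = 0 and the power otherwise.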

4 Numerical Examples

We consider parallel group designs for the comparison of means of a continuous outcome. We assume that the responses in the control and experimental treatment arms k = C, T in subgroups j = S, S′ are normally distributed with mean θ_k,j and variance σ². However, utilizing the central limit theorem, the model can be modified to account for many other situations. The mean treatment effects in the two subgroups are given by δ_j = θ_T,j − θ_C,j. In the classical and the enrichment design non-stratified z-tests are performed to test H_F and H_S, respectively. In the stratified test, an elementary p-value p_S is computed from a z-test for H_S based on the observations in S and a p-value p_F from a stratified z-test for H_F stratifying by biomarker status. Then the closed Spiessens and Debois test is performed to adjust for multiplicity. We set σ = 1.

To be able to compute the expected utilities by numerical integration and not to have to rely on simulations, we approximated the sampling distributions for both the classical and the stratified designs by normal distributions (for the enrichment design the z-test statistic is exactly normally distributed). For the classical design, each subject recruited to the trial belongs to S with probability λ_S. Therefore, each observation in group i = T, C is with probability λ_S distributed as N(θ_i,S, σ²) and with probability λ_S′ distributed as N(θ_i,S′, σ²). If the biomarker is either prognostic (such that θ_C,S ≠ θ_C,S′) or predictive (such that δ_S ≠ δ_S′) the overall treatment effect estimate ${\hat{δ}}_{F}$ for the classical design is not exactly normal, but, for sufficiently large sample sizes, approximately normal by the central limit theorem. Because the observations are drawn from a mixture distribution, the standard deviation of ${\hat{δ}}_{F}$ increases with the absolute differences |θ_i,S − θ_i,S′|, i = T,C. For simplicity, in the numerical examples we assume that the biomarker is predictive only but not prognostic (i.e., θ_C,S = θ_C,S′, see Appendix A for further details). For the stratified design, we assume that the subgroup estimates, ${\hat{δ}}_{S}$ and ${\hat{δ}}_{S^{'}}$ , are constructed as the sample means of exactly λ_S n (resp. λ_S′ n) observations per group from the subgroups S and S′. However, if patients are not selected for the trial based on biomarker status, the number of subjects from each subgroup is binomially distributed, though, for large n, the random sample sizes have only little impact and the approximation becomes accurate. Therefore, in the numerical investigations, we introduced a minimal sample size of n_min = 50 patients per treatment arm. For the contour plot (in Subsection 4.1.3) optimization was performed by evaluating the objective function for a grid of candidate sample sizes (and α_F values for the stratified design). 
For the optimizations in the other plots, the objective functions were further optimized with the R (version 3.2.4) procedure optim [23], using the grid points as starting values.
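The effect of the mixture distribution on the variance in the classical design can be made explicit with the law of total variance. The derivation below is a sketch that assumes n independent observations per arm, each drawn from the two-component mixture described above; it is not taken from Appendix A.

```latex
\begin{aligned}
\operatorname{Var}(X_{i}) &= σ^{2} + λ_{S} λ_{S'} \, (θ_{i,S} - θ_{i,S'})^{2}, \qquad i = T, C, \\
\operatorname{Var}(\hat{δ}_{F}) &= \frac{1}{n} \Big[ 2 σ^{2}
  + λ_{S} λ_{S'} \big( (θ_{T,S} - θ_{T,S'})^{2} + (θ_{C,S} - θ_{C,S'})^{2} \big) \Big].
\end{aligned}
```

If the biomarker is predictive but not prognostic (θ_C,S = θ_C,S′), then θ_T,S − θ_T,S′ = δ_S − δ_S′ and the variance becomes (2σ² + λ_S λ_S′ (δ_S − δ_S′)²)/n, which reduces to the usual 2σ²/n when the effects in both subgroups are equal.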

The one-sided significance level is set to α = 0.025 and the consistency thresholds in the multiple test for the stratified design to τ_S = τ_S′ = 0.3. We consider discrete prior distributions π_{δ_S,i,δ_S′,i} on a grid (δ_S,i, δ_S′,i), i = 1, …, l of effect sizes and specify two priors corresponding to scenarios where there is either only weak or strong prior evidence that the biomarker is predictive. The prior distributions used in the examples are defined in Table 1 and depend on an effect size parameter δ. In the examples below we set δ = 0.3 with the exception of Subsection 4.1.3 where optimal designs for other choices of δ are explored.

Table 1. Prior distributions corresponding to scenarios where there is either only weak or strong prior evidence that the biomarker is predictive.

The constant δ > 0 parametrizes the effect sizes in the prior.

Effect size δ_S	0	δ	δ	δ
Effect size δ_S′	0	0	δ/2	δ
Prior probability, “weak biomarker prior”	0.2	0.2	0.3	0.3
Prior probability, “strong biomarker prior”	0.2	0.6	0.1	0.1

The reward and cost parameters in the sponsor and the public health utility function are specified via the following three cases:

Case 1 Corresponds to a large market, where the biomarker costs are negligible, i.e. Nr_S = Nr_F = 10,000 Million US Dollars (MUSD) per unit of efficacy and c_screening = c_biomarker = 0.
Case 2 Corresponds to a small market, where the biomarker costs are still negligible, i.e. Nr_F = Nr_S = 1000 MUSD per unit of efficacy.
Case 3 We add biomarker and screening costs, c_screening = 5000 USD per patient and c_biomarker = 10 MUSD. The reward parameters Nr_S and Nr_F are the same as in Case 2.

For all three cases we choose c_per-patient = 0.05 MUSD and c_setup = 1 MUSD. Note that the setup costs are assumed to be the same for the enrichment, classical and stratified design and therefore have no impact on the order of their expected utilities. However, they do have an impact on the sign of the utility, and thus on whether any trial design is superior to no trial at all. In the reward functions Eqs (3) and (4) we set the minimal clinically relevant thresholds to μ_S = μ_F = 0.1, which is a third of the effect size δ = 0.3 used in the prior distributions in Sections 4.1.1 and 4.1.2.
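To make the optimization step concrete, the following sketch optimizes the sample size of the enrichment design under the public health utility by a plain grid search. It is an illustration under stated assumptions, not the authors' R code: it uses the marginal prior of δ_S implied by Table 1 with δ = 0.3 (δ_S = 0 with probability 0.2 and δ_S = 0.3 with probability 0.8 under both priors), a one-sided z-test with σ = 1, and a hypothetical integer grid from n_min = 50 to 2000. The names `public_utility_enrichment` and `best_n` are made up for this example, and `n_r_s` stands for N·r_S in MUSD.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()
# Marginal prior of delta_S from Table 1 with delta = 0.3 (same for both priors)
PRIOR_DELTA_S = {0.0: 0.2, 0.3: 0.8}

def public_utility_enrichment(n, lam_s, n_r_s, mu_s=0.1, alpha=0.025, sigma=1.0,
                              c_setup=1.0, c_per_patient=0.05,
                              c_biomarker=0.0, c_screening=0.0):
    """Expected public health utility (MUSD) of the enrichment design E_n."""
    z_crit = STD.inv_cdf(1.0 - alpha)
    reward = 0.0
    for delta_s, prob in PRIOR_DELTA_S.items():
        # Probability that the one-sided z-test rejects H_S given delta_S
        power = 1.0 - STD.cdf(z_crit - delta_s * sqrt(n / 2.0) / sigma)
        # Benefit uses the true effect, so false positives count negatively
        reward += prob * lam_s * n_r_s * (delta_s - mu_s) * power
    cost = c_setup + c_biomarker + 2 * n * (c_per_patient + c_screening / lam_s)
    return reward - cost

def best_n(lam_s, n_r_s, n_grid=range(50, 2001), **kwargs):
    """Grid search for the per-group sample size with the largest utility."""
    return max(n_grid, key=lambda n: public_utility_enrichment(n, lam_s, n_r_s, **kwargs))

if __name__ == "__main__":
    for label, n_r_s in (("Case 1", 10000.0), ("Case 2", 1000.0)):
        n_opt = best_n(0.5, n_r_s)
        print(label, n_opt, round(public_utility_enrichment(n_opt, 0.5, n_r_s), 1))
```

In this sketch the optimized sample size for the large market (Case 1) is larger than for the small market (Case 2), in line with the qualitative pattern reported in the results below. A continuous optimizer could be started from the best grid point to refine the solution.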

4.1 Results

We discuss the optimal designs for the weak and the strong biomarker prior and the three cases specifying the cost and reward parameters.

4.1.1 Optimization under the Weak Biomarker Prior

Large market, no biomarker costs (Case 1) The optimized utilities and corresponding optimal classical, stratified and enriched designs are shown in Fig 1.

Fig 1 — Optimized expected utilities and sample sizes for the enrichment, classical and stratified design as functions of the prevalence for λ_S ∈ [0.05, 0.95]. For the stratified design, optimized levels α_S and α_F for the multiple testing procedure are given. The last row shows the overall probability (averaged over the prior) that a significant treatment effect in H_S or H_F can be shown (and, for the stratified design, that the thresholds τ_S and τ_S′ are crossed). The priors are defined as in Table 1 with δ = 0.3.

Optimal utility. For the sponsor utility function, the stratified design has the largest expected utility, except for low prevalences where the classical design is optimal. The latter is at first sight surprising, because in Case 1 we assume no biomarker costs. However, in the stratified design (in contrast to the classical design), to show efficacy in the full population, we require that p_S and p_S′ do not exceed τ_S = τ_S′ = 0.3 (in addition to rejection of H_F in the multiple testing procedure). Thus, for low prevalences the sample size of the stratified design needs to be substantially increased to reach a sufficient power to show efficacy in F and therefore its expected utility is lower.

For the public health utility function we observe a similar pattern. However, for large λ_S the expected utility of the classical design is almost identical to that of the stratified design. This holds because the power to reject H_S in the optimized stratified design approaches the power to reject H_F in the classical design and the rewards obtained for authorization in populations S and F are similar. Why is the stratified design for the sponsor view still optimal in this case? This results from the fact that the size of the reward in the sponsor view depends on the observed rather than the true treatment effect: for trial outcomes where H_F can be rejected in the classical design but, due to the variability of estimates, ${\hat{δ}}_{S}$ is large but ${\hat{δ}}_{S^{'}}$ is small (and thus ${\hat{δ}}_{F} < {\hat{δ}}_{S}$ ) the reward for a market authorization in S may become larger than the reward in F. However, while the classical design leads to rejection of H_F in such cases, the stratified design rejects H_S and not H_F because of the consistency threshold.

Optimal sample size. Overall, the optimized sample sizes for the public health utility function are larger than for the sponsor utility function. They are lowest for the enrichment design, and—for smaller prevalences—largest for the stratified design. For the latter, the sample size increases sharply for low prevalences. This is due to the fact that a sufficient sample size in the subgroup is required to achieve adequate power for the rejection of both H_S and H_F (for the latter due to the consistency threshold τ_S). Furthermore, the relationship of the optimal sample size and the prevalence is qualitatively different for the three designs. For both utility functions the optimal sample size is increasing in the prevalence for the enrichment design (because the gain when demonstrating efficacy in S increases), decreasing for the stratified design (because, as noted above, a sufficient sample size in S is required for the rejection of H_S and for the rejection of H_F) and non-monotone for the classical design (essentially because the effect size in population F is increasing in λ_S such that for small λ_S the expected utility does not sufficiently increase with the sample size to compensate the additional costs, while for large λ_S a smaller sample size is sufficient to achieve adequate power).

Significance levels. In the intersection hypothesis test of the optimal multiple testing procedure in the stratified design α_S is larger than α_F for almost all prevalences. To make up for the lower sample sizes in the subgroup, the optimal design uses a larger α_S than α_F. For increasing prevalences, the correlation of the test statistics used to test H_S and H_F increases such that less multiplicity correction is required and both α_S and α_F increase.

Power. We define the power corresponding to a specific trial design as the overall probability (averaged over the prior) of regulatory approval in any population. This is a slight generalization of the traditional concept of power, which in the current context may be defined as the probability of regulatory approval conditional on a specific pair of subgroup effects. The power obtained by averaging over a prior has also been referred to as assurance [24]. The curves shown in Fig 1 correspond to the optimal designs. It can be seen that the power is largest for the enrichment design, followed by the stratified and the classical design and that it increases with the prevalence.

Note that for the stratified design, the probability of obtaining marketing authorization in F is largest for intermediate values of λ_S and much lower than for the classical design if λ_S is large (even though the optimized sample sizes are similar in this case). This is due to the application of the consistency thresholds, which are more difficult to meet if one of the subgroups S or S′ is small.

Small market, no biomarker costs (Case 2) Case 2 differs from Case 1 only in that the rewards Nr_F and Nr_S are reduced by a factor of 10. Because of the lower rewards the optimized expected utilities are smaller than in Case 1 (see Fig 2 for the expected utilities and optimized design parameters). They decrease by more than a factor of 10 because the trial costs are not reduced proportionally.

However, the optimized sample sizes (and consequently the overall probabilities to show efficacy in the respective populations) are substantially smaller than in Case 1. Overall, the expected utilities follow a similar pattern as in Case 1 but the range of prevalences where the classical design has a higher expected utility is larger than in Case 1 for both the sponsor and the public health utility functions. The assumption of a smaller market qualitatively changes the optimized sample size of the stratified designs as a function of the prevalence. For low prevalences the optimized sample size is much lower than in Case 1: because the reward is lower, it does not pay off to invest in a large overall sample size to meet the threshold τ_S in the subpopulation. This is also reflected in the optimized significance levels α_S and α_F, which give more weight to H_F than in Case 1.

Small market with biomarker costs (Case 3) Note that the addition of biomarker costs has no impact on the expected utility of the optimal classical design (as it does not require the biomarker). However, the expected utilities of the enrichment and the stratified design become smaller compared to Case 2 because of the additional costs. Therefore, the classical design now dominates the stratified design for a broader range of (small) values of λ_S and the stratified design becomes optimal only for larger values of λ_S (see Fig A in S1 File). In the public health view, the classical design dominates the stratified design also for very large values of λ_S: even though the classical design leads to lower expected rewards compared to the stratified design (since the former is more likely to lead to market authorization for too large a population), this is compensated by the lower costs because no biomarker is required. In the sponsor view, in contrast, the difference between the expected rewards of the stratified and the classical design is larger because it is determined by observed treatment effects (see also the discussion of expected utilities in Case 1, where a similar pattern is observed). Therefore, the stratified design dominates also for large values of λ_S. Moreover, the biomarker costs lead to a reduction in sample size compared to Case 2.

4.1.2 Optimization under the Strong Biomarker Prior

First, note that the expected utility and optimal sample size of the enrichment design are the same for the weak and the strong biomarker prior because the prior distribution on the treatment effect in S is identical in both.

Large market, no biomarker costs (Case 1) For the sponsor’s utility function the stratified design is still optimal, with the exception of very low prevalences (see Fig B in S1 File). In contrast, for the public health utility function, the enrichment and the stratified design have almost identical expected utilities unless the prevalence is small.

Small market, no biomarker costs (Case 2) While, as in Case 1, the stratified design is optimal for the sponsor view for all but very low prevalences, the difference between the expected utilities of the stratified and enrichment design is small.

In contrast, for the public health utility function the enrichment design achieves the highest expected utility (see Fig 3). Furthermore, for very low prevalences, none of the trial designs has a positive expected utility in the public health view and the optimal strategy is to perform no trial at all. For the sponsor view it is still optimal to perform a trial in the unselected population, albeit with the minimal sample size if the prevalence is small. This is due to the assumption that the NPV depends on the observed effect sizes, which implies that the sponsor benefits from a high variability of the treatment effect estimates.

Note that the optimal test in the stratified design puts most weight on H_F for low prevalences and on H_S for large prevalences. This holds for both the public health and the sponsor utility function.

Small market, biomarker costs (Case 3) The pattern is very similar to Case 2; however, the range of λ_S values where the classical design (for the sponsor utility) or no trial (for the public health utility) is optimal becomes larger (see Fig C in S1 File).

4.1.3 Optimized Designs for Varying Effect Sizes

Fig 4 shows the optimal design as a function of the prevalence λ_S and the effect size parameter δ, which parametrizes the effect sizes in the weak and strong biomarker priors in Table 1. Under the sponsor view, either the classical or the stratified design is optimal, while the enrichment design never maximizes the expected utility. Surprisingly, even for δ = 0, it is never optimal from a sponsor point of view to conduct no study at all in the scenarios investigated. This is due to the fact that a false positive, even though unlikely, may lead to a large reward. Therefore the optimal sample size is the minimal sample size n_min in these scenarios. This choice minimizes the costs and maximizes the variability of estimates.
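The mechanism can be made explicit in the simplest special case. The following worked step is only a sketch: it uses the closed-form sponsor expression for a single tested population from Appendix A and assumes, for illustration, a true effect δ_S = 0, a threshold μ_S = 0 and a one-sided level α < 0.5 (so that the critical value is positive).

```latex
% Illustrative assumptions: \delta_S = 0, \mu_S = 0, \alpha < 0.5, V = 2\sigma^2 / n
\kappa = \frac{\max\left(\Phi^{-1}(1-\alpha)\, V^{1/2},\, 0\right) - 0}{V^{1/2}}
       = \Phi^{-1}(1-\alpha),
\qquad
E_{\Delta}\left[\lambda_S N r_S\, \psi_S\, \hat{\delta}_S^{+}\right]
  = \lambda_S N r_S \sqrt{\frac{2\sigma^2}{n}}\;
    \phi\left(\Phi^{-1}(1-\alpha)\right) > 0 .
```

Under these assumptions the expected reward is strictly positive and proportional to n^{-1/2}, so it shrinks as the sample size grows while the trial costs increase. The expected utility is therefore largest at the smallest admissible sample size; whether it is positive after subtracting the costs depends on the cost parameters.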

For the public health view in contrast, for very low effect sizes, the optimal decision is to perform no trial at all. Under the weak biomarker prior, the enrichment design is optimal under the public health view only in the scenarios without biomarker costs, for small δ and large enough prevalences (such that the population that will benefit from a new treatment in the future is large enough). For larger effect sizes the classical design is optimal for very low and very large prevalences and the stratified design otherwise.

Under the strong biomarker prior and intermediate δ, the public health utility is optimized by the enrichment design (unless the prevalence is too low and the classical design dominates). For larger δ the stratified design is optimal, again with the exception of very low prevalences. In addition, in the scenario with biomarker costs the classical design becomes optimal for large prevalences.

5 Discussion

The current study suggests decision-theoretic models for optimizing confirmatory biomarker trials, both from a sponsor and a public health perspective. Furthermore, it explores the potential discrepancies between the two perspectives.

The optimized designs depend sensitively on the particular configuration of parameter values. Besides the priors on the effect sizes, the assumptions on the market size and costs have a substantial impact on the optimized designs. Therefore, formulating simple rules of thumb for trial designs is hardly feasible. However, a few general observations can be made. The optimized sample sizes for the public health utility function are consistently larger than for the sponsor utility (assuming the same costs, market size and reward parameters r_F, r_S in both utility functions). This finding is likely due to the fact that sponsor benefit is based on the estimate of the benefit in the trial, whereas the public health benefit depends on the actual benefit. Thus the public health perspective implies a higher standard for the evidence. This finding provides a quantitative basis for the qualitative observation that health authorities tend to require a higher standard of evidence than desired by some sponsors. Mechanism design theory could potentially be applied to try to create mechanisms which align the incentives more completely.

Furthermore, for very low prevalences, the classical design outperforms the designs that are based on the biomarker. However, in these scenarios the expected utility for all designs can be negative in the public health perspective and so weakly positive in the sponsor perspective that the sponsor would allocate its resources elsewhere as well.

We find that in the sponsor view the enrichment designs never maximize the NPV in the considered scenarios. This is due to the fact that the sponsor may benefit from an authorization in the larger population even if the treatment is effective in the subpopulation only. For similar reasons, even under the global null hypothesis the strategy to perform a trial (with minimum sample size) gives a positive NPV in the sponsor view (a phenomenon that was observed also in other contexts [25]).

In the public health view the enrichment designs are optimal for a range of scenarios. In particular, if there is sufficient prior medical understanding that the biomarker negative subpopulation is unlikely to be positively affected by the drug, it can be a waste of resources to conduct the trial in this population. Ethical considerations reinforce this, as it can be argued that genuine informed consent [26] implies that patients should not be randomised if their expected utility is higher on standard of care than on randomised trial medication. On the other hand, in particular when subpopulations can be expected to be similar in efficacy, it is not always worthwhile to conduct biomarker screening. In fact, there is an increased risk in a stratified trial that the treatment is rejected in the biomarker negative subpopulation due to chance. Still, in situations with genuine uncertainty about the relative efficacy in the two subpopulations, biomarker determination and stratified designs may have a large value. An obvious extension of our model is to allow for trial adaptations, potentially closing the biomarker negative part of the trial at an interim analysis if the results are negative [27, 28].

When applying the presented framework to practical design decisions, the different model components should be scrutinized. In the numerical example we have assumed for simplicity that the biomarker is not prognostic but in practice this will often not be the case. If the biomarker is also prognostic, the variability of the effect size estimates will be increased with a consequent decrease in the expected utility of the classical design.

As regards the market size for the sponsor, N denotes the number of patients, determined by the patent life, for which full payment will be received upon regulatory approval. On the other hand, for the public health authority, N denotes the total number of future patients. In an extended model, N could be fixed to always be the total number of future patients and a factor could be added next to N for the sponsor. This factor would then represent the fraction of patients corresponding to the patent life, and could be made to depend on the choice of trial design in various ways. For example, in the enrichment design we accounted only for the screening costs arising from the determination of the biomarker status of patients. However, if the restriction of the trial population leads to slower recruitment and consequently a later authorization of the drug, the result will be a reduction of the market size and the remaining patent life. This, in turn, may reduce the potential reward in different ways for the two perspectives. Another simplification made in our framework is the assumption of a zero discount rate for the sponsor. In practice, a commercial sponsor would use a non-zero rate to discount future revenues, which would lead to a further reduction of its expected utility as compared to a public health authority.

In the considered model we assumed that the subgroup prevalences in the trial are the same as in the total patient population. However, unless the recruitment is stratified by subgroup, the actual prevalence in the trial will be stochastic. Furthermore, the propensity to participate in the trial may vary between subpopulations. While our results are generally robust to random variations in the prevalence, varying propensities for trial participation may lead to a biased estimation of the effect size in the full population in the classical design (and also the stratified design, if an overall effect estimate based on observed prevalences is computed). The question of generalizability of trial results to general patient populations is however not specific to the development of targeted therapies but a more general issue.

We did not explicitly incorporate a benefit risk evaluation of the treatment into the model. However, the parameters μ_S and μ_F in the reward functions (see Eqs (3) and (4)) can be interpreted as the minimal treatment effects that compensate the “costs” of the treatment, such as the burden of treatment, side effects and monetary costs. While these are considered as given in our model, they could alternatively be estimated from clinical trial data.

We modelled the sponsor and public health utility as essentially linear functions of the observed and true effect size, respectively. From a commercial perspective this can be reasonable for scenarios where no alternative treatment options are available. However, if competitor products are on the market, the model may need to be modified because the market share, in terms of number of doses prescribed, and not only the price or benefit per patient may depend on the effect size. This can be incorporated by models where the market share is a function of the posterior distribution of efficacy (and possibly safety) parameters [29, 30]. Another aspect of our model for reimbursement concerns the pricing. Although NICE in the UK indicates that they, in our situation, would accept a price proportional to net benefit, payers in other countries may use other price models, possibly closer to a constant price. As an alternative to our linear sales model, an aggregated commercial model could be plugged in and similar optimization could be performed.

Finally, we note that the case of a very low prevalence, small market size and no biomarker costs mimics the situation of a rare disease, except there is no complementary subgroup S′. Therefore, our results could be seen to suggest that the investigation of rare diseases is not recommended in either perspective. Consequently, the question arises whether research in rare diseases should receive special priority and be subsidised by society such that drug development occurs even though the expected utility to society is negative, or in some cases weakly positive but less positive than other alternative expenditures. However, this argument raises ethical questions because the purely utilitarian viewpoint that underlies the decision theoretic framework does not account for other ethical principles such as fairness and justice. Similar issues arise for small subgroups of common diseases, an increasing issue in cancer given that it is being subdivided into many small molecular subclasses. In the case of cancer, increased benefit due to matching between molecular subgroups and targeted therapies may mollify this issue, but this remains to be seen in individual cases, so that the ethical and public policy dilemma may still be present.

Appendix

A Computation of Expected Utilities

We derive the expected utilities for a given effect size Δ for the enrichment, the classical and the stratified design. The overall expected utilities are then obtained by integrating over the prior distribution.
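The averaging step over the prior can be sketched as follows. This is a minimal illustration that assumes a discrete two-dimensional prior with finitely many support points (δ_S, δ_S′); for a continuous prior the sum would be replaced by numerical quadrature. The function name `prior_expected_utility`, the prior values and the toy conditional utility are hypothetical and are not the values of Table 1.

```python
from typing import Callable, Sequence, Tuple

# One prior support point: (delta_S, delta_S_complement, prior probability).
PriorPoint = Tuple[float, float, float]


def prior_expected_utility(
    conditional_utility: Callable[[float, float], float],
    prior: Sequence[PriorPoint],
) -> float:
    """Average a conditional expected utility E_Delta[U] over a discrete prior."""
    total_weight = sum(weight for _, _, weight in prior)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("prior weights must sum to 1")
    return sum(
        weight * conditional_utility(delta_s, delta_sc)
        for delta_s, delta_sc, weight in prior
    )


# Hypothetical four-point prior (illustrative only, NOT Table 1).
example_prior = [
    (0.0, 0.0, 0.25),
    (0.3, 0.0, 0.25),
    (0.3, 0.15, 0.25),
    (0.3, 0.3, 0.25),
]


def toy_conditional_utility(delta_s: float, delta_sc: float) -> float:
    """Toy stand-in for E_Delta[U]: linear in the full-population effect."""
    delta_f = 0.4 * delta_s + 0.6 * delta_sc  # assumed prevalence 0.4
    return 100.0 * delta_f - 10.0


overall = prior_expected_utility(toy_conditional_utility, example_prior)
```

In an actual optimization, `toy_conditional_utility` would be replaced by the design-specific expressions derived in the remainder of this appendix, evaluated for each candidate sample size.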

Enrichment Design

For the enrichment design, $\psi_{F,E_n} = \varphi_{S',E_n}^{(\mathrm{Sponsor})} = 0$. Thus, the utility is given by

$$U^{(\mathrm{Sponsor})}(E_n) = \varphi_{S,E_n}^{(\mathrm{Sponsor})} - C_{E_n} = \lambda_S N r_S\, \psi_{S,E_n} (\hat{\delta}_{S,E_n} - \mu_S)^{+} - C_{E_n}.$$

Integrating over the resulting truncated normal distribution, the expected utility given Δ is given by

$$E_{\Delta}[U^{(\mathrm{Sponsor})}(E_n)] = \lambda_S N r_S \left( (1 - \Phi(\kappa)) (\delta_S - \mu_S) + V[\hat{\delta}_{S,E_n}]^{1/2} \phi(\kappa) \right) - C_{E_n},$$

where ϕ denotes the density and Φ denotes the cumulative distribution function of the standard normal distribution, $V[\hat{\delta}_{S,E_n}] = 2\sigma^2 / n$ and

$$\kappa = \frac{\max\left(\Phi^{-1}(1-\alpha)\, V[\hat{\delta}_{S,E_n}]^{1/2},\ \mu_S\right) - \delta_S}{V[\hat{\delta}_{S,E_n}]^{1/2}}.$$

Similarly, for the public health view utility function we obtain

$$E_{\Delta}[U^{(\mathrm{Public})}(E_n)] = \lambda_S N r_S (\delta_S - \mu_S) \left( 1 - \Phi\left( \Phi^{-1}(1-\alpha) - \delta_S V[\hat{\delta}_{S,E_n}]^{-1/2} \right) \right) - C_{E_n}.$$
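The two closed-form expressions for the enrichment design can be evaluated directly. The sketch below uses only the Python standard library; the function name and argument names are hypothetical, and the trial cost C_{E_n} is passed in as a plain number because the cost function is defined elsewhere in the paper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf = Phi, pdf = phi


def enrichment_expected_utilities(delta_s, mu_s, sigma, n, alpha,
                                  lam_s, N, r_s, cost):
    """Return (sponsor, public) expected utilities given the effect delta_s.

    Implements the two closed forms above with V = 2 * sigma^2 / n.
    `cost` stands for C_{E_n} and is supplied by the caller (assumption).
    """
    sd = sqrt(2.0 * sigma ** 2 / n)              # V^{1/2}
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)    # Phi^{-1}(1 - alpha)

    # Sponsor view: reward depends on the observed effect estimate.
    kappa = (max(z_alpha * sd, mu_s) - delta_s) / sd
    sponsor_reward = ((1.0 - STD_NORMAL.cdf(kappa)) * (delta_s - mu_s)
                      + sd * STD_NORMAL.pdf(kappa))
    sponsor = lam_s * N * r_s * sponsor_reward - cost

    # Public health view: reward depends on the true effect and the power.
    power = 1.0 - STD_NORMAL.cdf(z_alpha - delta_s / sd)
    public = lam_s * N * r_s * (delta_s - mu_s) * power - cost
    return sponsor, public


# Example call with purely illustrative parameter values.
sponsor_u, public_u = enrichment_expected_utilities(
    delta_s=0.3, mu_s=0.05, sigma=1.0, n=200, alpha=0.025,
    lam_s=0.5, N=1000.0, r_s=1.0, cost=50.0)
```

Under a true effect of zero with μ_S = 0, the public health expression reduces to the negative cost, whereas the sponsor expression retains the positive term proportional to ϕ(κ), which reflects the dependence of the sponsor reward on the observed rather than the true effect.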

Classical Design

In the classical design, $\psi_{S,C_n} = 0$, $\varphi_{S,C_n}^{(\mathrm{Sponsor})} = \varphi_{S',C_n}^{(\mathrm{Sponsor})}$ and

$$U^{(\mathrm{Sponsor})}(C_n) = \varphi_{S,C_n}^{(\mathrm{Sponsor})} + \varphi_{S',C_n}^{(\mathrm{Sponsor})} - C_{C_n} = N r_F\, \psi_{F,C_n} (\hat{\delta}_{F,C_n} - \mu_F)^{+} - C_{C_n}.$$

If the mean responses in S and S′ differ, it follows that the observations in the experimental treatment and control group follow a mixture distribution of two normal distributions. Therefore, the variance of $\hat{\delta}_{F,C_n}$ in the classical design is given by

$$V[\hat{\delta}_{F,C_n}] = \frac{2\sigma^2 + \lambda_S \lambda_{S'} \left( (\theta_{T,S} - \theta_{T,S'})^2 + (\theta_{C,S} - \theta_{C,S'})^2 \right)}{n}.$$

Thus, the expected utility given effect sizes Δ for the classical design is given by

$$E_{\Delta}[U^{(\mathrm{Sponsor})}(C_n)] = N r_F \left( (1 - \Phi(\kappa)) (\delta_F - \mu_F) + V[\hat{\delta}_{F,C_n}]^{1/2} \phi(\kappa) \right) - C_{C_n},$$

where $\delta_F = \lambda_S \delta_S + \lambda_{S'} \delta_{S'}$ and $\kappa = \left( \max\left( \Phi^{-1}(1-\alpha)\, V[\hat{\delta}_{F,C_n}]^{1/2},\ \mu_F \right) - \delta_F \right) / V[\hat{\delta}_{F,C_n}]^{1/2}$. Similarly, for the public health utility function,

$$E_{\Delta}[U^{(\mathrm{Public})}(C_n)] = N r_F (\delta_F - \mu_F) \left( 1 - \Phi\left( \Phi^{-1}(1-\alpha) - \delta_F V[\hat{\delta}_{F,C_n}]^{-1/2} \right) \right) - C_{C_n}.$$
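The classical-design expressions differ from the enrichment case mainly through the mixture variance. The following sketch illustrates this; the function and argument names are hypothetical, the cost C_{C_n} is supplied by the caller, and it is assumed that the effect in each subgroup is the difference between the treatment and control means, δ_S = θ_{T,S} − θ_{C,S} and δ_S′ = θ_{T,S′} − θ_{C,S′}.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def classical_variance(sigma, n, lam_s,
                       theta_t_s, theta_t_sc, theta_c_s, theta_c_sc):
    """Variance of the full-population estimate when S and S' are mixed."""
    lam_sc = 1.0 - lam_s
    mixture_term = lam_s * lam_sc * ((theta_t_s - theta_t_sc) ** 2
                                     + (theta_c_s - theta_c_sc) ** 2)
    return (2.0 * sigma ** 2 + mixture_term) / n


def classical_expected_utilities(theta_t_s, theta_t_sc, theta_c_s, theta_c_sc,
                                 mu_f, sigma, n, alpha, lam_s, N, r_f, cost):
    """Return (sponsor, public) expected utilities for the classical design."""
    # Assumption: subgroup effects are treatment minus control means.
    delta_s = theta_t_s - theta_c_s
    delta_sc = theta_t_sc - theta_c_sc
    delta_f = lam_s * delta_s + (1.0 - lam_s) * delta_sc

    sd = sqrt(classical_variance(sigma, n, lam_s, theta_t_s, theta_t_sc,
                                 theta_c_s, theta_c_sc))
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)

    kappa = (max(z_alpha * sd, mu_f) - delta_f) / sd
    sponsor = N * r_f * ((1.0 - STD_NORMAL.cdf(kappa)) * (delta_f - mu_f)
                         + sd * STD_NORMAL.pdf(kappa)) - cost

    power = 1.0 - STD_NORMAL.cdf(z_alpha - delta_f / sd)
    public = N * r_f * (delta_f - mu_f) * power - cost
    return sponsor, public


# Equal subgroup means: the variance reduces to 2 * sigma^2 / n.
v_equal = classical_variance(1.0, 100, 0.3, 0.5, 0.5, 0.2, 0.2)
# Control means differ between S and S' (prognostic biomarker): larger variance.
v_prognostic = classical_variance(1.0, 100, 0.3, 1.0, 0.5, 0.7, 0.2)
```

The second variance call illustrates why a prognostic biomarker inflates the variability of the classical estimate: differences between the subgroup means enter the numerator even when the treatment effect is the same in both subgroups.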

Stratified Design

The utility of the stratified design is given by

$$U^{(\mathrm{Sponsor})}(S_{n,\alpha_S}) = -C_{S_{n,\alpha_S}} + \begin{cases} \lambda_S N r_S\, \psi_{S,S_{n,\alpha_S}} (\hat{\delta}_{S,S_{n,\alpha_S}} - \mu_S)^{+} & \text{if } \psi_{F,S_{n,\alpha_S}} = 0, \\ N r_F\, \psi_{F,S_{n,\alpha_S}} (\hat{\delta}_{F,S_{n,\alpha_S}} - \mu_F)^{+} & \text{otherwise.} \end{cases}$$

The utility of the stratified design depends on the stratified treatment effect estimate in the full population (in the following we shorten the notation by dropping the design index, $\hat{\delta}_F := \hat{\delta}_{F,S_{n,\alpha_S}}$, $\hat{\delta}_S := \hat{\delta}_{S,S_{n,\alpha_S}}$, $\hat{\delta}_{S'} := \hat{\delta}_{S',S_{n,\alpha_S}}$, $\psi_F := \psi_{F,S_{n,\alpha_S}}$, $\psi_S := \psi_{S,S_{n,\alpha_S}}$), which is a weighted sum of $\hat{\delta}_S$ and $\hat{\delta}_{S'}$. The expected utility given the effect sizes Δ is given by

$$E_{\Delta}[U^{(\mathrm{Sponsor})}(S_{n,\alpha_S})] = N r_F\, E_{\Delta}[\psi_F (\hat{\delta}_F - \mu_F)^{+}] + \lambda_S N r_S\, E_{\Delta}[(1 - \psi_F) \psi_S (\hat{\delta}_S - \mu_S)^{+}] - C_{S_{n,\alpha_S}}$$

and can be computed by numerical integration. Let $A_F(n, \alpha_S; \sigma, \alpha, \lambda_S, \tau_S, \tau_{S'}, \mu_F)$ denote the region in $\mathbb{R}^2$ where $\psi_F (\hat{\delta}_F(Z_S, Z_{S'}) - \mu_F)^{+} > 0$ and let $A_S(n, \alpha_S; \sigma, \alpha, \lambda_S, \tau_S, \tau_{S'}, \mu_S)$ be the region where $(1 - \psi_F) \psi_S (\hat{\delta}_S(Z_S) - \mu_S)^{+} > 0$, where $\hat{\delta}_F(Z_S, Z_{S'})$ is the stratified treatment effect estimate and $Z_S$, $Z_{S'}$ are the z-statistics computed from the observations in S and S′, respectively. Then

$$E_{\Delta}[\psi_F (\hat{\delta}_F(Z_S, Z_{S'}) - \mu_F)^{+}] = \iint_{A_F} (\hat{\delta}_F(z_S, z_{S'}) - \mu_F)\, \phi(z_S)\, \phi(z_{S'})\, dz_S\, dz_{S'},$$

and

$$E_{\Delta}[(1 - \psi_F) \psi_S (\hat{\delta}_S(Z_S) - \mu_S)^{+}] = \iint_{A_S} (\hat{\delta}_S(z_S) - \mu_S)\, \phi(z_S)\, \phi(z_{S'})\, dz_S\, dz_{S'}.$$

The shapes of the regions A_F and A_S depend on the specific values of the parameters and the design variables (α_S, α_F, τ_S, τ_S′ and n). However, the regions may in all cases be described by means of a finite number of straight lines, implying that the expected values above can be computed using standard software for numerical quadrature in two dimensions. But since the integrands are linear in z_S′ and Z_S′ follows a normal distribution, one-dimensional integration may be carried out analytically in the z_S′-direction before applying a numerical method. This leads to faster numerical evaluations, which is useful when investigating how the optimal solution changes over the parameter space.
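The analytic inner integration rests on a standard identity for a standard normal variable Z: the integral of (a + b z) ϕ(z) over an interval (l, u) equals a (Φ(u) − Φ(l)) + b (ϕ(l) − ϕ(u)). The sketch below implements this identity for fixed bounds; in the actual computation the bounds and the coefficients a and b would depend on z_S through the shape of the regions, which is not reproduced here. The function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def linear_partial_expectation(a, b, lower, upper):
    """E[(a + b*Z) * 1{lower < Z < upper}] for Z ~ N(0, 1), in closed form.

    Uses  int z*phi(z) dz = -phi(z),  so the linear part contributes
    b * (phi(lower) - phi(upper)). Infinite bounds are allowed.
    """
    if upper <= lower:
        return 0.0
    probability = STD_NORMAL.cdf(upper) - STD_NORMAL.cdf(lower)
    first_moment = STD_NORMAL.pdf(lower) - STD_NORMAL.pdf(upper)
    return a * probability + b * first_moment


# Inner integral over z_S' for one fixed z_S (illustrative numbers only).
inner = linear_partial_expectation(a=0.3, b=0.7, lower=-0.5, upper=1.2)
```

With the inner integral available in closed form, only a one-dimensional quadrature over z_S remains, which is the source of the speed-up described above.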

For the public health view the expected utility given Δ may be written as

$$E_{\Delta}[U^{(\mathrm{Public})}(S_{n,\alpha_S})] = N r_F (\delta_F - \mu_F)\, E_{\Delta}[\psi_F] + \lambda_S N r_S (\delta_S - \mu_S)\, E_{\Delta}[(1 - \psi_F) \psi_S] - C_{S_{n,\alpha_S}}.$$

The numerical evaluation is similar to the evaluation of the conditional expectation of the utility of the stratified design for the sponsor’s view.
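As a cross-check for such quadrature results, the two rejection probabilities in the public health expression can also be approximated by simulation. The sketch below is not the paper's procedure: it assumes a plain weighted Bonferroni test (H_S at level α_S, H_F at level α − α_S, without any additional thresholds such as τ_S or τ_S′), subgroup sample sizes proportional to the prevalence, and a stratified estimate equal to the prevalence-weighted sum of the subgroup estimates. All names are hypothetical and the cost is supplied by the caller.

```python
import random
from math import sqrt
from statistics import NormalDist


def stratified_public_utility_mc(delta_s, delta_sc, mu_s, mu_f, sigma, n,
                                 alpha, alpha_s, lam_s, N, r_s, r_f, cost,
                                 draws=20000, seed=1):
    """Monte Carlo estimate of the public health expected utility given Delta.

    Assumed test (illustrative): weighted Bonferroni with levels alpha_s for
    H_S and alpha - alpha_s for H_F. Returns (utility, P[psi_F = 1],
    P[psi_F = 0 and psi_S = 1]).
    """
    if not 0.0 < alpha_s < alpha < 1.0:
        raise ValueError("need 0 < alpha_s < alpha < 1")
    rng = random.Random(seed)
    lam_sc = 1.0 - lam_s
    sd_s = sqrt(2.0 * sigma ** 2 / (n * lam_s))    # assumed n * lam_s per arm in S
    sd_sc = sqrt(2.0 * sigma ** 2 / (n * lam_sc))  # assumed n * lam_sc per arm in S'
    sd_f = sqrt(2.0 * sigma ** 2 / n)              # sd of the weighted sum
    crit_s = NormalDist().inv_cdf(1.0 - alpha_s)
    crit_f = NormalDist().inv_cdf(1.0 - (alpha - alpha_s))

    reject_f = 0
    reject_s_only = 0
    for _ in range(draws):
        est_s = rng.gauss(delta_s, sd_s)
        est_sc = rng.gauss(delta_sc, sd_sc)
        est_f = lam_s * est_s + lam_sc * est_sc
        if est_f / sd_f > crit_f:        # psi_F = 1
            reject_f += 1
        elif est_s / sd_s > crit_s:      # psi_F = 0 and psi_S = 1
            reject_s_only += 1

    p_f = reject_f / draws
    p_s_only = reject_s_only / draws
    delta_f = lam_s * delta_s + lam_sc * delta_sc
    utility = (N * r_f * (delta_f - mu_f) * p_f
               + lam_s * N * r_s * (delta_s - mu_s) * p_s_only - cost)
    return utility, p_f, p_s_only
```

Because the public health reward depends on the true effects, only the two probabilities need to be estimated; the sponsor version would additionally require averaging the observed estimates over the rejection regions.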

Supporting Information

S1 File. Supplementary Material.

(PDF)

Data Availability

This is a methodological paper which is not based on empirical data sets.

Funding Statement

MP, NS and TO were funded by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement number FP7 HEALTH 2013-602144 and CFB, FK and SJ under grant number FP7 HEALTH 2013-602552. The funder provided support in the form of salaries for authors SJ, FK, TO, MP and NS, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. CFB is an employee of AstraZeneca and was funded by grant number FP7 HEALTH 2013-602552 under a consultancy contract from AstraZeneca to Chalmers University for the purpose of conducting the work described herein. AstraZeneca did not play a role in the study data collection and analysis, decision to publish, preparation of the manuscript or financial support to CFB. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

1. Mandrekar SJ, Sargent DJ. Clinical Trial Designs for Predictive Biomarker Validation: One Size Does Not Fit All. Journal of Biopharmaceutical Statistics. 2009;19(3):530–542. doi:10.1080/10543400902802458
2. Chen C, Beckman RA. Hypothesis Testing in a Confirmatory Phase III Trial With a Possible Subset Effect. Statistics in Biopharmaceutical Research. 2009;1:431–440. doi:10.1198/sbr.2009.0039
3. Freidlin B, McShane LM, Korn EL. Randomized Clinical Trials With Biomarkers: Design Issues. JNCI Journal of the National Cancer Institute. 2010;102(3):152–160. doi:10.1093/jnci/djp477
4. Mandrekar SJ, Sargent DJ. Design of clinical trials for biomarker research in oncology. Clinical Investigation. 2011;1(12):1627–1636. doi:10.4155/cli.11.152
5. Freidlin B, McShane LM, Polley MYC, Korn EL. Randomized Phase II Trial Designs With Biomarkers. Journal of Clinical Oncology. 2012;30(26):3304–3309. doi:10.1200/JCO.2012.43.3946
6. Ziegler A, Koch A, Krockenberger K, Grosshennig A. Personalized medicine using DNA biomarkers: a review. Human Genetics. 2012;131(10):1627–1638. doi:10.1007/s00439-012-1188-9
7. Ondra T, Dmitrienko A, Friede T, Graf A, Miller F, Stallard N, et al. Methods for identification and confirmation of targeted subgroups in clinical trials: a systematic review. Journal of Biopharmaceutical Statistics. 2016;26(1):99–119. doi:10.1080/10543406.2015.1092034
8. Song Y, Chi GYH. A method for testing a prespecified subgroup in clinical trials. Statistics in Medicine. 2007;26(19):3535–3549. doi:10.1002/sim.2825
9. Alosh M, Huque MF. A flexible strategy for testing subgroups and overall population. Statistics in Medicine. 2009;28(1):3–23. doi:10.1002/sim.3461
10. Bretz F, Maurer W, Brannath W, Posch M. A graphical approach to sequentially rejective multiple test procedures. Statistics in Medicine. 2009;28(4):586–604. doi:10.1002/sim.3495
11. Burman CF, Sonesson C, Guilbaud O. A recycling framework for the construction of Bonferroni-based multiple tests. Statistics in Medicine. 2009;28(4):739–761. doi:10.1002/sim.3513
12. Zhao YD, Dmitrienko A, Tamura R. Design and Analysis Considerations in Clinical Trials With a Sensitive Subpopulation. Statistics in Biopharmaceutical Research. 2010;2(1):72–83. doi:10.1198/sbr.2010.08039
13. Spiessens B, Debois M. Adjusted significance levels for subgroup analyses in clinical trials. Contemporary Clinical Trials. 2010;31(6):647–656. doi:10.1016/j.cct.2010.08.011
14. Bretz F, Posch M, Glimm E, Klinglmueller F, Maurer W, Rohmeyer K. Graphical approaches for multiple comparison procedures using weighted Bonferroni, Simes, or parametric tests. Biometrical Journal. 2011;53(6):894–913. doi:10.1002/bimj.201000239
15. Millen BA, Dmitrienko A. Chain procedures: A class of flexible closed testing procedures with clinical trial applications. Statistics in Biopharmaceutical Research. 2011;3(1):14–30. doi:10.1198/sbr.2010.09014
16. Alosh M, Huque MF. Multiplicity considerations for subgroup analysis subject to consistency constraint. Biometrical Journal. 2013;55(3):444–462. doi:10.1002/bimj.201200065
17. Beckman RA, Clark J, Chen C. Integrating predictive biomarkers and classifiers into oncology clinical development programmes. Nature Reviews Drug Discovery. 2011;10(10):735–748. doi:10.1038/nrd3550
18. Krisam J, Kieser M. Decision Rules for Subgroup Selection Based on a Predictive Biomarker. Journal of Biopharmaceutical Statistics. 2014;24(1):188–202. doi:10.1080/10543406.2013.856018
19. Götte H, Donica M, Mordenti G. Improving probabilities of correct interim decision in population enrichment designs. Journal of Biopharmaceutical Statistics. 2015;25(5):1020–38. doi:10.1080/10543406.2014.929583
20. Kirchner M, Kieser M, Götte H, Schüler A. Utility-based optimization of phase II/III programs. Statistics in Medicine. 2016;35(2). doi:10.1002/sim.6624
21. Krisam J, Kieser M. Optimal Decision Rules for Biomarker-Based Subgroup Selection for a Targeted Therapy in Oncology. Int J Mol Sci. 2015;16(5):10354–75. doi:10.3390/ijms160510354
22. Graf AC, Posch M, Koenig F. Adaptive designs for subpopulation analysis optimizing utility functions. Biometrical Journal. 2015;57:76–89. doi:10.1002/bimj.201300257
23. R Core Team. R: A Language and Environment for Statistical Computing; 2016. Available from: https://www.R-project.org/.
24. O’Hagan A, Stevens JW, Campbell MJ. Assurance in clinical trial design. Pharmaceutical Statistics. 2005;4:187–201. doi:10.1002/pst.175
25. Posch M, Bauer P. Adaptive budgets in clinical trials. Statistics in Biopharmaceutical Research. 2013;5:282–292. doi:10.1080/19466315.2013.783504
26. Burman CF, Carlberg A. Future Challenges in Design and Ethics of Clinical Trials. In: Pharmaceutical Sciences Encyclopedia. vol. 51; 2010. p. 1–28. doi:10.1002/9780470571224.pse250
27. Brannath W, Zuber E, Branson M, Bretz F, Gallo P, Posch M, et al. Confirmatory adaptive designs with Bayesian decision tools for a targeted therapy in oncology. Statistics in Medicine. 2009;28:1445–1463. doi:10.1002/sim.3559
28. Bauer P, Bretz F, Dragalin V, König F, Wassmer G. Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls. Statistics in Medicine. 2016;35:325–347. doi:10.1002/sim.6472
29. Gittins J, Pezeshk H. A behavioral Bayes method for determining the size of a clinical trial. Drug Information Journal. 2000;34:355–363. doi:10.1177/009286150003400204
30. Kikuchi T, Pezeshk H, Gittins J. A Bayesian cost-benefit approach to the determination of sample size in clinical trials. Statistics in Medicine. 2008;27(1). doi:10.1002/sim.2965

Table 1. Prior distributions corresponding to scenarios where there is either only weak or strong prior evidence that the biomarker is predictive.

Fig 1. Weak biomarker prior and a large market with no biomarker costs (Case 1).

Fig 2. Weak biomarker prior and a small market with no biomarker costs (Case 2).

Fig 3. Strong biomarker prior and a small market with no biomarker costs (Case 2).

Fig 4. Optimal designs for different combinations of the prevalence λ_S ∈ [0.05, 0.95] and effect size parameter δ ∈ [0, 1].