Abstract
In this paper we consider a three-arm non-inferiority (NI) trial that includes an experimental, a reference, and a placebo arm. While for binary outcomes the risk difference (RD) is the most common and well-explored functional form for testing efficacy (or effectiveness), a recent FDA guideline suggests other measures, such as the relative risk (RR) and the odds ratio (OR), on the basis of which NI of an experimental treatment can be claimed. However, developing tests based on these functions of binary outcomes is challenging, since the construction and interpretation of the NI margin for such functions are not trivial extensions of the RD-based approach. Recently, we proposed Frequentist approaches for testing NI for these functionals. In this article we further develop Bayesian approaches for testing NI based on the effect retention approach for RR and OR. The Bayesian paradigm provides a natural path to integrate information from historical trials, and it allows patients’ and clinicians’ opinions to be used as prior information via sequential learning. In addition, we discuss in detail the sample size and power calculations, which can be readily used when designing such trials in practice.
Keywords: Assay Sensitivity, Binary Outcome, Fraction Margin, Markov chain Monte Carlo, Non-inferiority Margin, Risk/Odds Ratio, Three-arm Trial
1. Introduction
In the presence of established treatments/therapies, Active-controlled Non-inferiority (NI) trial designs are attractive ethical alternatives to placebo-controlled Randomized Control Trials (RCTs). A slightly less efficacious treatment may be preferable to a group of patients in view of lower toxicity, less intensive side effects, ease of delivery and other less debilitating factors. NI trials are intended to show if the new intervention retains a substantial portion of the active control effect, dictated by a pre-specified margin, often termed as NI margin (δ), yet possesses other attractive properties compared to established treatment regime. Such NI margin must be prospectively defined and should be so chosen to reflect maximum acceptable extent of clinical non-inferiority of an experimental treatment. Further detailed discussion on the construction and desirable properties of NI margin can be found in the regulatory guidelines (FDA (2016), ICHE9 (2009), ICHE10 (2009)) and references (e.g., Althunian et al. (2017), Schumi and Wittes (2011), Brown et al. (2006), Hung and Wang (2004)). NI trials may or may not include a placebo arm due to ethical reasons. Such two-arm placebo-free NI trials make two important assumptions regarding Assay Sensitivity (ICHE9 (2009), ICHE10 (2009)) and Constancy, and depend heavily on external validations (D’Agostino et al. (2003) and FDA (2016)) and several other limiting factors as specified in Kieser and Stucke (2016). Assay sensitivity (AS) refers to the ability of the trial to distinguish between effective and ineffective treatments. To alleviate some of these issues, it is recommended by EMA (2005) to include a placebo arm in the current trial, when ethically acceptable as well as practically feasible. This results in a three-arm “gold-standard” design that has greater confidence concerning AS and lesser concern on external validity.
For three-arm trials in the Frequentist setup, the fraction margin approach (Pigeot et al., 2003) is popularly used for NI testing, where the NI margin is adaptively constructed as a pre-specified negative fraction of the unknown effect size of the reference treatment over placebo in the current three-arm trial. This approach was extended by Kieser and Friede (2007) to binary outcomes with the risk difference (RD) as the function of interest, while Chowdhury et al. (2017) extended it for the same function using a Bayesian approach. In a very recent paper, Ghosh et al. (2018) proposed a Bayesian approach for RD, albeit under the joint testing of AS and NI. For binary outcomes, although RD is the simplest functional form, as mentioned in the recent FDA guidance (FDA (2016), Page 24), there are other functionals (e.g., risk ratio (RR), odds ratio (OR), etc.) which could be used for assessing treatment effect (Hashemi et al., 1997) and hence in the context of NI as well. Under the NI setup, there is published work on the odds ratio for two-arm trials under the Frequentist approach (see, for example, Hilton (2010) and Rousson and Seifert (2008)); however, to the best of our knowledge, no work existed for RR/OR in three-arm trials until Chowdhury et al. (2018b) recently proposed a novel Frequentist approach incorporating the condition of AS for developing three-arm NI tests for RR and OR. Also very recently, Chowdhury et al. (2018a) developed a Bayesian approach to testing NI for RR/OR in two-arm trials. Since NI trials involve treatments that have been well studied in the past, the availability and usage of historical information in the current NI trial is an advantage (Ghosh et al., 2011; Gamalo et al., 2011; Ghosh et al., 2018). The Bayesian paradigm provides a natural path to incorporate useful information from various historical trials and to combine it with the current trial, thus possibly reducing the sample size and other cost burdens.
In this paper, we propose a fully Bayesian and two approximation-based Bayesian approaches for both RR and OR. We also provide the sample size calculation under Bayesian methods to achieve a desired level of power.
The rest of the article is organized as follows. In Section 2, we give the NI hypothesis for RR and OR for three-arm NI trial and discuss briefly the Frequentist’s methods for testing. In Section 3, we propose three Bayesian methods, incorporating the condition of assay sensitivity, to design and perform the analysis of NI trial for both the functionals. In Section 4, we discuss the power and sample size calculation and details of simulation studies. Finally in Section 5, we apply our proposed methods on a published clinical trial dataset. We conclude the article with discussions in Section 6. All proofs are provided in a separate supplementary file.
2. Frequentist Approach for NI Testing for Different Functionals
For a three-arm trial, fraction margin approach (Pigeot et al., 2003; Kieser and Friede, 2007) is popularly used for testing NI hypothesis and finding the corresponding decision rule. We begin our illustration borrowing the notations from Kieser and Friede (2007). Denote the primary end-points from the Placebo (P ), Reference (R) and the Experimental (E) treatment in the current trial by XP, XR, and XE respectively, each following Bin (nl, πl), where πl is the probability of success and nl is the sample size for the lth arm, l ∈ {P, R, E}. Without loss of generality, we assume that higher response probabilities indicate greater treatment benefits. Gamalo et al. (2011) used the two-arm fixed margin approach for NI testing considering RD as the functional of interest. Pigeot et al. (2003) formulated the three-arm NI hypothesis for continuous outcome, whereas Kieser and Friede (2007) formulated the same for binary outcomes under fraction margin approach. In this set up, the construction of δ(< 0) can be mathematically expressed as δ = f (πR − πP ), where f is a negative fraction, f ∈ [−1, 0], assuming the condition of assay sensitivity, that is, (πR − πP > 0) holds. We reformulate the NI hypothesis for RR and OR using a general functional form as
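$$H_0:\ \psi(\pi_E,\pi_R) \le \delta \quad \text{vs.} \quad H_1:\ \psi(\pi_E,\pi_R) > \delta \tag{2.1}$$
where δ ∈ [0, 1] denotes a pre-chosen portion of the unknown effect size of the active control over placebo and ψ(.) is an appropriate function of choice (e.g., RR, OR). To formulate the NI hypothesis for RR and OR, we follow Wangge et al. (2013), where the formula for calculating δ (which is the same as the quantity M2 in their paper) is given as M2 = (1/M1)^(1 − preserved effect), where M1 is the effect of the active control relative to the placebo that is still present in the current trial. Along the same lines, for the RR (or OR) scale we propose the construction of δ as δ = (1/ψ(πR, πP))^(−f) = (1/ψ(πR, πP))^(1−θ), where f ∈ [−1, 0] is the loss of effect of the experimental drug compared to the active control effect, and θ = 1 + f is the preserved effect; that is, the proportion of the active control effect retained by the experimental drug in the current trial. Now for the specific case of RR, ψ(πR, πP) = πR/πP and δ = (πP/πR)^(1−θ), while for OR, ψ(πR, πP) = [πR/(1 − πR)]/[πP/(1 − πP)] and δ = ([πP/(1 − πP)]/[πR/(1 − πR)])^(1−θ). Note that for both RR and OR, the NI margin satisfies two important boundary conditions: for f = −1, the active control loses its practical significance over placebo, hence the test reduces to the simple superiority test of πE over πP, while for f = 0, the test becomes a superiority test of πE over πR. Below, we give the NI hypothesis for RR and OR respectively:
$$H_0:\ \frac{\pi_E}{\pi_R} \le \left(\frac{\pi_P}{\pi_R}\right)^{1-\theta} \quad \text{vs.} \quad H_1:\ \frac{\pi_E}{\pi_R} > \left(\frac{\pi_P}{\pi_R}\right)^{1-\theta} \tag{2.2}$$
$$H_0:\ \frac{\pi_E/(1-\pi_E)}{\pi_R/(1-\pi_R)} \le \left(\frac{\pi_P/(1-\pi_P)}{\pi_R/(1-\pi_R)}\right)^{1-\theta} \quad \text{vs.} \quad H_1:\ \frac{\pi_E/(1-\pi_E)}{\pi_R/(1-\pi_R)} > \left(\frac{\pi_P/(1-\pi_P)}{\pi_R/(1-\pi_R)}\right)^{1-\theta} \tag{2.3}$$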
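As a minimal numerical sketch of the margin construction described above, the following Python snippet computes δ = (1/ψ(πR, πP))^(1−θ) on the RR and OR scales and checks the two boundary cases. The function names (`ni_margin`, `effect`, `is_non_inferior`) are illustrative assumptions, and the comparison is applied to true response probabilities, not to trial data.

```python
import math


def odds(p):
    """Odds corresponding to a probability p in (0, 1)."""
    return p / (1.0 - p)


def ni_margin(pi_r, pi_p, theta, scale="RR"):
    """NI margin delta = (1 / psi(pi_R, pi_P)) ** (1 - theta).

    psi is the risk ratio for scale="RR" and the odds ratio for scale="OR".
    """
    if scale == "RR":
        psi_rp = pi_r / pi_p
    elif scale == "OR":
        psi_rp = odds(pi_r) / odds(pi_p)
    else:
        raise ValueError("scale must be 'RR' or 'OR'")
    return (1.0 / psi_rp) ** (1.0 - theta)


def effect(pi_e, pi_r, scale="RR"):
    """Effect of E relative to R on the chosen scale."""
    if scale == "RR":
        return pi_e / pi_r
    return odds(pi_e) / odds(pi_r)


def is_non_inferior(pi_e, pi_r, pi_p, theta, scale="RR"):
    """True when the (true) effect of E versus R exceeds the margin."""
    return effect(pi_e, pi_r, scale) > ni_margin(pi_r, pi_p, theta, scale)


# Boundary cases: theta = 0 (f = -1) gives delta = 1 / psi(pi_R, pi_P),
# theta = 1 (f = 0) gives delta = 1.
delta_rr = ni_margin(0.7, 0.1, theta=0.8, scale="RR")
delta_or = ni_margin(0.7, 0.1, theta=0.8, scale="OR")
```

With θ = 0 the comparison reduces to πE versus πP, and with θ = 1 it reduces to πE versus πR, which matches the two boundary conditions stated in the text.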
Since logarithm transformations make the data conform more closely to the Normal distribution giving better asymptotic performance, we also give the NI hypothesis for RR and OR taking logarithm of both sides of (2.2) and (2.3) respectively:
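$$H_0:\ \log\left(\frac{\pi_E}{\pi_R}\right) \le (1-\theta)\log\left(\frac{\pi_P}{\pi_R}\right) \quad \text{vs.} \quad H_1:\ \log\left(\frac{\pi_E}{\pi_R}\right) > (1-\theta)\log\left(\frac{\pi_P}{\pi_R}\right) \tag{2.4}$$
$$H_0:\ \log\left(\frac{\pi_E/(1-\pi_E)}{\pi_R/(1-\pi_R)}\right) \le (1-\theta)\log\left(\frac{\pi_P/(1-\pi_P)}{\pi_R/(1-\pi_R)}\right) \quad \text{vs.} \quad H_1:\ \log\left(\frac{\pi_E/(1-\pi_E)}{\pi_R/(1-\pi_R)}\right) > (1-\theta)\log\left(\frac{\pi_P/(1-\pi_P)}{\pi_R/(1-\pi_R)}\right) \tag{2.5}$$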
Next, we briefly discuss the Frequentist approaches for NI testing for RR and OR that have already been proposed in Chowdhury et al. (2018b). The NI hypothesis in (2.4) and (2.5) can be written in a general form as
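$$H_0:\ g(\pi_E) - \theta\, g(\pi_R) - (1-\theta)\, g(\pi_P) \le 0 \quad \text{vs.} \quad H_1:\ g(\pi_E) - \theta\, g(\pi_R) - (1-\theta)\, g(\pi_P) > 0 \tag{2.6}$$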
Thus for RR: g(πl) = log(πl), and for OR: g(πl) = log(πl/(1 − πl)), l ∈ {E, R, P}. Chowdhury et al. (2018b) proposed the marginal Frequentist approach to NI testing for RR and OR, following the existing guidelines developed by Pigeot et al. (2003) and Kieser and Friede (2007). The test is based on the assumption of asymptotic normality of the Frequentist test statistic T = g(π̂E) − θg(π̂R) − (1 − θ)g(π̂P), π̂l being the MLE of πl, l ∈ {E, R, P}. However, since the marginal approach does not take into account the AS condition, it leads to biased estimates of power and/or sample size. Chowdhury et al. (2018b) also proposed the conditional approach to NI testing, with a test statistic based on the restricted maximum likelihood estimate (RMLE), thus incorporating the AS condition in the Frequentist statistic itself. This leads to a modified test statistic, W, for NI testing. Under the asymptotic normality (AN) of W, W ∼ AN(µw, σw²), where E[W] = µw and Var[W] = σw². The null hypothesis is rejected if W > k*, where k* is obtained by fixing the size of the test at α. We express the sample sizes in the reference (nR) and the experimental (nE) arms as the ratios r1 and r2, respectively, of the sample size nP = n in the placebo arm, so that nP : nR : nE = 1 : nR/nP : nE/nP = 1 : r1 : r2. Here, r1 and r2 are known positive quantities that determine the allocation of the sample sizes in the arms R and E, respectively, relative to the arm P. The total sample size is thus N = n(1 + r1 + r2). The sample size nP = n (for the arm P) is calculated to achieve a power of at least 100(1 − β)% (Chowdhury et al., 2018b), that is, from P(W > k* | H1) ≥ 1 − β, where the mean and variance of W are evaluated under the alternative hypothesis in (2.6).
3. Proposed Bayesian Approaches for Non-Inferiority
In an NI trial with three-arms, if the condition of AS is satisfied, it assures that the NI study is being carried out under more or less the same circumstance as that of the former studies, in which efficacy of reference drug was tested. Gamalo et al. (2011) developed Bayesian procedures for NI testing in a two-arm trial for RD that allows the incorporation of the historical data on the active control via the use of informative priors. Chowdhury et al. (2017) developed NI testing in a three-arm trial for RD under Bayesian paradigm considering fully Bayesian and approximation-based Bayesian approaches, incorporating the AS condition. In this Section, we propose a fully Bayesian and two approximation-based Bayesian test procedures. However, we emphasize the fact that, for achieving better accuracy/estimate of the sample size one should consider fully Bayesian approach almost always, although it comes at a cost of extensive computation. Note that, parallel to the conditional Frequentist approach as discussed in Section 2, we incorporate the AS condition (πR − πP > 0) in our Bayesian approaches as well.
3.1. Fully Bayesian Approach
We consider the conjugate prior, where the AS condition (πR − πP > 0) or equivalently g(πR)−g(πP ) > 0 is incorporated explicitly parallel to the proposed conditional Frequentist’s approach in Chowdhury et al. (2018b). In the conjugate prior setting, we assume a Beta distribution as prior to the proportion of successes in each arm of the trial, that is, πl ∼ Beta (αl, βl), l ∈ {E, R, P }, where αl, βl are fixed hyper-parameters which can take any value on the positive real line ℝ+. After incorporating the assumption of AS, the joint prior distribution in the three-arms becomes:
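$$f(\pi_E,\pi_R,\pi_P) \propto \Big[\prod_{l\in\{E,R,P\}} f(\pi_l \mid \alpha_l,\beta_l)\Big]\, I(\pi_R > \pi_P),$$
where f(πl|αl, βl) is the density of the standard Beta distribution given by f(πl|αl, βl) = πl^(αl−1)(1 − πl)^(βl−1)/B(αl, βl), with B(·, ·) the Beta function and I(·) the indicator function. Under the well-known Beta-Binomial conjugacy, the posterior distribution of πl, l ∈ {E, R, P}, is also a Beta distribution. Their joint posterior distribution is given by
$$f(\pi_E,\pi_R,\pi_P \mid X) \propto \Big[\prod_{l\in\{E,R,P\}} \pi_l^{\alpha_l+X_l-1}(1-\pi_l)^{\beta_l+n_l-X_l-1}\Big]\, I(\pi_R > \pi_P), \quad 0 < \pi_l < 1.$$
Markov Chain Monte Carlo (MCMC) samples can be easily generated from the joint posterior, satisfying πR > πP. The choice of the hyper-parameters (αl, βl, l ∈ {E, R, P}) is driven by the precision of the available prior information. For example, in the absence of prior information, a flat prior may be assigned by choosing hyper-parameter values resulting in a large variance. The mean (µ), mode (µ0) and variance (σ²) of Beta(α, β) are given as µ = α/(α + β), µ0 = (α − 1)/(α + β − 2) for α, β > 1, and σ² = αβ/((α + β)²(α + β + 1)). An informative prior can be obtained by equating the mean (or mode) with the anticipated value of πl, l ∈ {E, R, P}, and choosing a rather small variance.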
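The link between the Beta hyper-parameters and the prior mean, mode and variance can be sketched as follows. The helper names `beta_moments` and `beta_from_mean_var` are hypothetical, and matching the first two moments is only one possible way (an assumption here, not a prescription of the paper) to turn a prior guess of πl and a chosen variance into (αl, βl).

```python
import math


def beta_moments(alpha, beta):
    """Mean, mode and variance of a Beta(alpha, beta) distribution.

    The mode is returned only when alpha > 1 and beta > 1; otherwise None.
    """
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    mode = (alpha - 1.0) / (total - 2.0) if alpha > 1 and beta > 1 else None
    return mean, mode, var


def beta_from_mean_var(mean, var):
    """Moment matching: find (alpha, beta) with the given mean and variance.

    Requires 0 < mean < 1 and 0 < var < mean * (1 - mean).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie in (0, 1)")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("var must lie in (0, mean * (1 - mean))")
    k = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * k, (1.0 - mean) * k


# A flat prior versus an informative prior centred at 0.7.
flat = beta_moments(1.0, 1.0)
alpha_inf, beta_inf = beta_from_mean_var(mean=0.7, var=0.01)
```

A smaller variance yields a larger α + β, that is, a prior carrying more pseudo-observations relative to the current trial.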
In determining NI of the experimental drug relative to the active control, the investigator picks a value of θ ∈ [0.5, 1) that represents clinically acceptable effect size relative to the placebo. The Bayesian decision rule to claim that the experimental drug is non-inferior to the reference is given by
$$P\left(\frac{g(\pi_E) - g(\pi_P)}{g(\pi_R) - g(\pi_P)} > \theta \,\Big|\, X\right) > p^*, \tag{3.1}$$
where p* is some clinically meaningful pre-specified cut-off and X = {XE, XR, XP} denotes the relevant data. The above probability can be calculated empirically by generating samples from the posterior distribution of (πl|Xl), and then calculating the functional g(.), l ∈ {E, R, P}, where the forms of g(.) for RR and OR are defined earlier. The estimated probability is given by
$$\hat{P} = \frac{1}{T}\sum_{t=1}^{T} I\left(\frac{g(\pi_E^{(t)}) - g(\pi_P^{(t)})}{g(\pi_R^{(t)}) - g(\pi_P^{(t)})} > \theta\right), \tag{3.2}$$
where πE^(t), πR^(t) and πP^(t) denote the tth MCMC sample drawn from the appropriate posterior distribution satisfying πR^(t) > πP^(t), with sufficiently large T, and g(πl^(t)) is the function g(.) evaluated at the sample value πl^(t), l ∈ {E, R, P}, t = 1, …, T.
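The estimated probability in (3.2) can be sketched in Python as below. This is a minimal illustration under stated assumptions: the event of interest is taken to be that the retention ratio (g(πE) − g(πP))/(g(πR) − g(πP)) exceeds θ, the AS restriction πR > πP is imposed by discarding independent Beta posterior draws that violate it, and the function names and the example counts are hypothetical.

```python
import numpy as np


def g(p, scale):
    """g(pi) = log(pi) for RR and log(pi / (1 - pi)) for OR."""
    p = np.asarray(p, dtype=float)
    if scale == "RR":
        return np.log(p)
    if scale == "OR":
        return np.log(p / (1.0 - p))
    raise ValueError("scale must be 'RR' or 'OR'")


def posterior_prob_ni(x, n, theta, scale="RR", prior=(1.0, 1.0),
                      T=10000, seed=0):
    """Monte Carlo estimate of the posterior probability of NI.

    x, n  : dicts with keys 'E', 'R', 'P' (successes and arm sizes)
    prior : common Beta(alpha, beta) prior for all arms (an assumption)
    """
    rng = np.random.default_rng(seed)
    a, b = prior
    draws = {l: rng.beta(a + x[l], b + n[l] - x[l], size=T) for l in "ERP"}
    keep = draws["R"] > draws["P"]  # assay sensitivity restriction
    if not keep.any():
        raise RuntimeError("no posterior draw satisfies pi_R > pi_P")
    g_e, g_r, g_p = (g(draws[l][keep], scale) for l in "ERP")
    ratio = (g_e - g_p) / (g_r - g_p)
    return float(np.mean(ratio > theta))


# Hypothetical data, for illustration only.
n_arm = {"E": 100, "R": 100, "P": 100}
p_good = posterior_prob_ni({"E": 70, "R": 70, "P": 10}, n_arm, theta=0.8)
p_poor = posterior_prob_ni({"E": 10, "R": 70, "P": 10}, n_arm, theta=0.8)
```

NI would be claimed when the returned value exceeds p*. Because draws violating πR > πP are discarded, the effective number of posterior samples can be smaller than T.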
3.2. Normal Approximation to the Prior
We propose an approximation-based Bayesian approach for NI testing incorporating the AS condition, that gives closed form of the posterior probability and hence saves the computation time for the MCMC sample generation from the posterior distribution. In this Section, we derive the test procedure and sample size calculation formula for both RR and OR.
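Under the Binomial responses, the marginal Frequentist test statistic T = g(π̂E) − θg(π̂R) − (1 − θ)g(π̂P) is used for testing the hypothesis expressed in the general form in (2.6). Under asymptotic normality, we have T ∼ AN(µT, σT²), where µT = g(πE) − θg(πR) − (1 − θ)g(πP) and σT² = σE²/nE + θ²σR²/nR + (1 − θ)²σP²/nP, with σl² = (1 − πl)/πl for RR, and σl² = 1/(πl(1 − πl)) for OR; l ∈ {E, R, P}. Under the conjugate Beta prior, denote the mean and variance of πl by E(πl) = αl/(αl + βl) and Var(πl) = αlβl/((αl + βl)²(αl + βl + 1)), l ∈ {E, R, P}. The mean of g(πl) can be approximated up to the second-order term of the Taylor series expansion, E[g(πl)] ≈ g(E(πl)) + ½ g″(E(πl)) Var(πl), and the variance of g(πl) can be approximated by the delta method as Var[g(πl)] ≈ [g′(πl)]² Var(πl) at πl = E(πl), l ∈ {E, R, P}. For RR, g′(πl) = 1/πl and g″(πl) = −1/πl², while for OR, g′(πl) = 1/(πl(1 − πl)) and g″(πl) = (2πl − 1)/(πl²(1 − πl)²). Putting a Normal prior on µT we have µT ∼ AN(µ*, σ*²), where µ* = E[g(πE)] − θE[g(πR)] − (1 − θ)E[g(πP)], and σ*² = Var[g(πE)] + θ²Var[g(πR)] + (1 − θ)²Var[g(πP)]. Next we bring in the condition of assay sensitivity, (g(πR) − g(πP) > 0). So instead of taking a prior on µT, we take a prior on νT ≡ µT | (g(πR) − g(πP) > 0). Assume that νT is approximately normally distributed, with the mean and variance given in Lemma 3.2.1. Then the posterior distribution of νT given the data is again approximately normal. We refer to Arnold and Beaver (1993) for the detailed derivation of the mean and variance of νT.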
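The second-order Taylor and delta-method approximations to the mean and variance of g(πl) under a Beta distribution can be checked numerically. The sketch below assumes the standard forms E[g(π)] ≈ g(µ) + ½g″(µ)σ² and Var[g(π)] ≈ [g′(µ)]²σ², and compares them with simulated moments; the function names and the Beta(71, 31) example are illustrative assumptions.

```python
import numpy as np


def g_and_derivatives(mu, scale):
    """Return g(mu), g'(mu), g''(mu) for the RR (log) or OR (logit) scale."""
    if scale == "RR":
        return np.log(mu), 1.0 / mu, -1.0 / mu ** 2
    if scale == "OR":
        q = mu * (1.0 - mu)
        return np.log(mu / (1.0 - mu)), 1.0 / q, (2.0 * mu - 1.0) / q ** 2
    raise ValueError("scale must be 'RR' or 'OR'")


def approx_moments_of_g(alpha, beta, scale):
    """Taylor (mean) and delta-method (variance) approximations."""
    total = alpha + beta
    mu = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    g0, g1, g2 = g_and_derivatives(mu, scale)
    return g0 + 0.5 * g2 * var, g1 ** 2 * var


def simulated_moments_of_g(alpha, beta, scale, size=200_000, seed=0):
    """Monte Carlo mean and variance of g(pi) with pi ~ Beta(alpha, beta)."""
    rng = np.random.default_rng(seed)
    p = rng.beta(alpha, beta, size=size)
    vals = np.log(p) if scale == "RR" else np.log(p / (1.0 - p))
    return float(vals.mean()), float(vals.var())


approx_rr = approx_moments_of_g(71, 31, "RR")
sim_rr = simulated_moments_of_g(71, 31, "RR")
approx_or = approx_moments_of_g(71, 31, "OR")
sim_or = simulated_moments_of_g(71, 31, "OR")
```

The agreement is expected to be close when the Beta distribution is concentrated away from 0 and 1, and to deteriorate for small α or β, where g is strongly curved.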
Lemma 3.2.1 Under conditional normal approximation, the mean and variance of νT = g(πE) − θg(πR) − (1 − θ)g(πP)| (g(πR) − g(πP) > 0) are given by
(3.3) |
where , , , , , and , and are functions of the mean and variance of Beta (αl, βl), l ∈ {E, R, P }.
Proof: See A.1 in the supplementary material.
The Bayesian decision rule for claiming the experimental treatment to be non-inferior is parallel to Gamalo et al. (2014) and is given by: P (νT ≥ 0|XE, XR, XP ) > p*, where p* is a pre-specified constant usually chosen to be 0.975 or 0.95.
3.3. Normal Approximation to the Posterior
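As in the previous approximation approach, we consider a similar setting: Xl ∼ Bin(nl, πl) and πl ∼ Beta(αl, βl), l ∈ {E, R, P}. Instead of approximating the Frequentist statistic T and the prior by Normal distributions, the approximation can also be made to the posterior distribution of g(πl)|Xl, l ∈ {E, R, P}. Our posterior approximation approach follows the lines of the posterior normal approximation proposed by Gamalo et al. (2011) for binary responses, albeit for two-arm NI testing. Bringing in the condition of assay sensitivity, we derive the posterior mean and variance of g(πE) − θg(πR) − (1 − θ)g(πP) conditional on g(πR) − g(πP) > 0. The posterior mean and variance of πl|Xl are respectively given by E(πl|Xl) = (αl + Xl)/(αl + βl + nl) and Var(πl|Xl) = (αl + Xl)(βl + nl − Xl)/((αl + βl + nl)²(αl + βl + nl + 1)). Hence, the posterior mean of g(πl)|Xl can be approximated, as in the prior approximation approach, by E[g(πl)|Xl] ≈ g(E(πl|Xl)) + ½ g″(E(πl|Xl)) Var(πl|Xl), and the posterior variance by Var[g(πl)|Xl] ≈ [g′(πl)]² Var(πl|Xl) at πl = E(πl|Xl). Specifically, for RR, g′(πl) = 1/πl and g″(πl) = −1/πl², while for OR, g′(πl) = 1/(πl(1 − πl)) and g″(πl) = (2πl − 1)/(πl²(1 − πl)²). Thus, under asymptotic normality, g(πE) − θg(πR) − (1 − θ)g(πP) given the data is approximately normally distributed, with mean and variance obtained by combining these arm-wise approximations. We again refer to Arnold and Beaver (1993) for the derivation of the conditional mean and variance under the AS condition.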
Lemma 3.3.1 Under conditional normal approximation, the mean and variance of are given by
(3.4) |
where , , , , , and , and are functions of the mean and variances of πl|Xl ∼ Beta (αl + Xl, βl + nl − Xl), l ∈ {E, R, P }.
Proof: See A.2 in the supplementary material.
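The Bayesian decision rule to claim that the experimental treatment is non-inferior to the active control in the current trial is, in parallel with Section 3.2, that the posterior probability of the event g(πE) − θg(πR) − (1 − θ)g(πP) ≥ 0, conditional on the AS condition and the data (XE, XR, XP), exceeds p*.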
3.4. Sample Size under Proposed Approaches
3.4.1. Fully Bayesian
We obtain the power of the test by repeating the calculation of the estimated probability in (3.2) n* times (usually 1000 or 5000), and then finding the proportion of times, out of n*, that NI of the test drug is claimed. The power function is obtained by varying the value of πE for given values of πR and πP such that the ratio (g(πE) − g(πP))/(g(πR) − g(πP)) belongs to a suitable range under the alternative, H1. The ratio is so chosen that for NI testing it equals θ ∈ [0.5, 1) under H0 and exceeds θ under H1. For a value of πE under H1, the estimated power of the test can be calculated as
$$\widehat{\text{Power}} = \frac{1}{n^*}\sum_{j=1}^{n^*} I\big(\hat{P}_j > p^*\big),$$
where P̂j denotes the estimated probability in (3.2) computed from the jth simulated dataset.
Considering the allocation ratio as discussed in Section 2, the minimum sample size n of the arm P can be obtained by setting the power at 100(1 − β)%. The sample size in the other two arms can be obtained from the respective allocation ratios. We note that, since the power and sample size calculations involve generating samples from posterior distribution, there could be possibly some sampling fluctuation.
3.4.2. Normal Approximation to the Prior
The sample size n of the arm P under Bayesian prior approximation approach can be calculated by satisfying the following two conditions (see also Gamalo et al. (2014))
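$$\text{(C1)}\quad P\big[\,P(\nu_T \ge 0 \mid X_E, X_R, X_P) > p^* \,\big|\, H_0\big] \le \alpha, \qquad \text{(C2)}\quad P\big[\,P(\nu_T \ge 0 \mid X_E, X_R, X_P) > p^* \,\big|\, H_1\big] \ge 1-\beta,$$
where the probability in (C1) is the estimated Bayesian version of the type-I error while that in (C2) is the estimated power of the test, β being the type-II error. The sample size n is determined from (C2) by fixing β to achieve at least 100(1 − β)% power while simultaneously satisfying (C1). As in the Frequentist approach, we choose α = 0.025. We note that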
where is the 100 (1 − p*) % of the N (0, 1) distribution. Now the power function is obtained by varying πE, keeping the other parameters πR, πP, and θ fixed. Let us denote µT and by and , respectively under H0, and similarly under H1 denote the respective quantities by and . Thus, condition (C1) can be rewritten in terms of T as
(3.5) |
Similarly, condition (C2) becomes
(3.6) |
Now n can be solved from equation (3.6) by setting β = 20% and simultaneously satisfying (3.5). We vary πE under H1 to obtain the minimum sample size achieving at least 80% power for each value of πE. The sample sizes for the arms R and E can be obtained from the allocation ratios r1 and r2 as discussed earlier.
3.4.3. Normal Approximation to the Posterior
Under the Bayesian posterior approximation approach, we note that the posterior probability in the decision rule involves the data (XE, XR, XP), since the posterior mean and variance entering it are functions of the posterior distribution of πl|Xl, l ∈ {E, R, P}. The power of the test can be calculated as the proportion of times, out of n*, that this posterior probability exceeds p*. The power function is obtained by varying the value of πE for given values of the other parameters, and the estimated power is obtained at values of πE satisfying H1. The sample sizes of the respective treatment arms can be obtained by setting the power to be at least 100(1 − β)% and using the allocation ratios as discussed in Section 2.
4. Simulation Studies and Sample Size Determination
In this Section, we first evaluate the performance of the Bayesian procedures and then focus on the sample size calculation for testing NI at a desired power. We consider both balanced and unbalanced allocation of the total sample size in the three arms.
4.1. Simulation Set-up
The following simulation steps are conducted to calculate the type-I error and power of the tests, under the conditional Frequentist, fully Bayesian as well as two Bayesian approximation approaches. Under the Bayesian approaches, we assume a non-informative prior for πl, l ∈ {E, R, P } for both RR and OR. Considering a randomized trial we generate the power curves assuming equal allocation to the three arms; that is, nE = nR = nP = n. In the following we give the simulation steps for the fully Bayesian approach. The steps for the approximation-based approaches are similar.
- S1: Specify nE = nR = nP = n, πl, l ∈ {E, R, P} with πP < πR, and θ, and vary πE such that the ratio (g(πE) − g(πP))/(g(πR) − g(πP)) belongs to a suitable range, to generate XE, XR and XP.
- S2: For a given value of the ratio, or equivalently of πE, generate the data Xl from the Binomial distribution Bin(nl, πl), l ∈ {E, R, P}.
- S3: Generate T MCMC samples from the posterior distribution given in Section 3, satisfying the condition πR > πP or equivalently g(πR) > g(πP).
- S4: For the tth posterior sample, calculate the ratio (g(πE) − g(πP))/(g(πR) − g(πP)) at the sampled values, and calculate the posterior probability as the proportion of the T samples for which the ratio exceeds θ, as in (3.2).
- S5: Bayesian criterion: with COUNTS initialized to 0 before the first replicate, increase COUNTS by 1 if the estimated posterior probability exceeds p*; otherwise leave COUNTS unchanged.
- S6: Go back to Step 2 and repeat the simulation n* (a large number chosen a priori) times:
  - Calculate the type-I error by dividing COUNTS by n* for πE such that the ratio equals θ.
  - Calculate the power by dividing COUNTS by n* for πE such that the ratio exceeds θ.
- S7: The power curve is generated for a sequence of values of πE such that the ratio belongs to a suitably wide range.
Note that for the conditional Frequentist and the two Bayesian approximation approaches, Step 3 is not needed, and Step 4 and Step 5 need to be replaced by the corresponding decision criteria given in Sections 2, 3.2, and 3.3, respectively.
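The simulation steps S1 to S6 for the fully Bayesian approach can be sketched as follows. This is an illustrative sketch, not the authors' code: the retention ratio is assumed to be (g(πE) − g(πP))/(g(πR) − g(πP)), a common Beta prior is used in all arms, the AS restriction is imposed by discarding posterior draws with πR ≤ πP, and the function names and default settings are hypothetical.

```python
import numpy as np


def g(p, scale):
    """g(pi) = log(pi) for RR, log(pi / (1 - pi)) for OR."""
    p = np.asarray(p, dtype=float)
    return np.log(p) if scale == "RR" else np.log(p / (1.0 - p))


def pi_e_for_ratio(ratio, pi_r, pi_p, scale):
    """Invert (g(pi_E) - g(pi_P)) / (g(pi_R) - g(pi_P)) = ratio for pi_E."""
    g_e = g(pi_p, scale) + ratio * (g(pi_r, scale) - g(pi_p, scale))
    if scale == "RR":
        return float(np.exp(g_e))  # may exceed 1 when ratio is well above 1
    return float(1.0 / (1.0 + np.exp(-g_e)))


def rejection_rate(pi, n, theta, scale="RR", p_star=0.975,
                   n_rep=1000, T=1000, prior=(1.0, 1.0), seed=0):
    """Proportion of simulated trials in which NI is claimed.

    Gives the type-I error when pi['E'] sits on the H0 boundary
    (ratio = theta) and the power when pi['E'] lies under H1.
    """
    rng = np.random.default_rng(seed)
    a, b = prior
    counts = 0  # initialised once, before the first replicate
    for _ in range(n_rep):
        # S2: generate binomial data in each arm
        x = {l: rng.binomial(n[l], pi[l]) for l in "ERP"}
        # S3: posterior draws, keeping only those with pi_R > pi_P
        d = {l: rng.beta(a + x[l], b + n[l] - x[l], size=T) for l in "ERP"}
        keep = d["R"] > d["P"]
        if not keep.any():
            continue  # no admissible draw: NI is not claimed
        g_e, g_r, g_p = (g(d[l][keep], scale) for l in "ERP")
        # S4: estimated posterior probability that the ratio exceeds theta
        prob = np.mean((g_e - g_p) / (g_r - g_p) > theta)
        # S5: Bayesian decision criterion
        if prob > p_star:
            counts += 1
    return counts / n_rep  # S6


n_arm = {"E": 100, "R": 100, "P": 100}
pi_null = {"E": pi_e_for_ratio(0.8, 0.7, 0.1, "RR"), "R": 0.7, "P": 0.1}
type1 = rejection_rate(pi_null, n_arm, theta=0.8, n_rep=300, T=500, seed=1)
power = rejection_rate({"E": 0.7, "R": 0.7, "P": 0.1}, n_arm, theta=0.8,
                       n_rep=300, T=500, seed=2)
```

Calling `rejection_rate` over a grid of πE values (Step S7) traces out the power curve, and repeating it over increasing n gives a simple search for the smallest placebo-arm size reaching the target power. The reduced `n_rep` and `T` in the example are chosen only to keep the run time short; estimates are subject to Monte Carlo error.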
4.2. Sample Size in Simulation
Before generating the sample size tables, we generate the power curves under the fully Bayesian approach to get an idea about the operating characteristics of the proposed methods. For the approximation-based approaches the power curves can be generated similarly. For the conjugate Beta prior, since the posterior is available in closed form, we chose the number of posterior samples T to be 1000. Throughout the simulation study, we consider the following specification of the parameters: πR = 0.7, πP = 0.1, and we set πE such that the ratio (g(πE) − g(πP))/(g(πR) − g(πP)) covers a wide enough range for exploration purposes for both RR and OR. We chose n* to be 5000. We take the value of the common sample size nE = nR = nP = n to be 100. Unequal allocation is also possible and is presented in the sample size tables (Table 1 and Table 2). Another important criterion is the choice of p*, which we fix at 0.975. As reported in Gamalo et al. (2012), this choice could give a too restrictive type-I error in the Bayesian context, and Bayesian calibration can be performed to alleviate this; however, this is not pursued here.
Table 1:
Sample size for RR under Frequentist, fully Bayesian and Bayesian Approximation approaches to achieve a power of 80% for θ = 0.8 and 0.7, α = 0.025 and πE ∈ [0.65, 0.9] under three different allocations
Allocation (P:R:E) | θ | πE | Frequentist nP | Frequentist N | Fully Bayesian nP | Fully Bayesian N | Prior Approx. nP | Prior Approx. N | Posterior Approx. nP | Posterior Approx. N
---|---|---|---|---|---|---|---|---|---|---
1:1:1 | 0.8 | 0.9 | 27 | 81 | 23 | 69 | 22 | 66 | 22 | 66
1:1:1 | 0.8 | 0.85 | 33 | 99 | 28 | 84 | 28 | 84 | 28 | 84
1:1:1 | 0.8 | 0.8 | 42 | 126 | 37 | 111 | 36 | 108 | 36 | 108
1:1:1 | 0.8 | 0.75 | 56 | 168 | 51 | 153 | 49 | 147 | 51 | 153
1:1:1 | 0.8 | 0.7 | 79 | 237 | 71 | 213 | 73 | 219 | 73 | 219
1:1:1 | 0.8 | 0.65 | 124 | 372 | 118 | 354 | 118 | 354 | 119 | 357
1:1:1 | 0.7 | 0.9 | 24 | 72 | 17 | 51 | 18 | 54 | 17 | 51
1:1:1 | 0.7 | 0.85 | 28 | 84 | 20 | 60 | 21 | 63 | 20 | 60
1:1:1 | 0.7 | 0.8 | 33 | 99 | 25 | 75 | 26 | 78 | 26 | 78
1:1:1 | 0.7 | 0.75 | 40 | 120 | 33 | 99 | 34 | 102 | 34 | 102
1:1:1 | 0.7 | 0.7 | 51 | 153 | 45 | 135 | 45 | 135 | 43 | 129
1:1:1 | 0.7 | 0.65 | 68 | 204 | 60 | 180 | 61 | 183 | 62 | 186
1:2:2 | 0.8 | 0.9 | 17 | 85 | 13 | 65 | 14 | 70 | 13 | 65
1:2:2 | 0.8 | 0.85 | 21 | 105 | 17 | 85 | 19 | 95 | 18 | 90
1:2:2 | 0.8 | 0.8 | 27 | 135 | 26 | 130 | 24 | 120 | 25 | 125
1:2:2 | 0.8 | 0.75 | 35 | 175 | 33 | 165 | 32 | 160 | 32 | 160
1:2:2 | 0.8 | 0.7 | 49 | 245 | 46 | 230 | 46 | 230 | 48 | 240
1:2:2 | 0.8 | 0.65 | 76 | 380 | 76 | 380 | 74 | 370 | 76 | 380
1:2:2 | 0.7 | 0.9 | 17 | 85 | 11 | 55 | 15 | 75 | 12 | 60
1:2:2 | 0.7 | 0.85 | 19 | 95 | 14 | 70 | 17 | 85 | 15 | 75
1:2:2 | 0.7 | 0.8 | 23 | 115 | 19 | 95 | 22 | 110 | 19 | 95
1:2:2 | 0.7 | 0.75 | 28 | 140 | 24 | 120 | 26 | 130 | 23 | 115
1:2:2 | 0.7 | 0.7 | 35 | 175 | 33 | 165 | 35 | 175 | 32 | 160
1:2:2 | 0.7 | 0.65 | 47 | 235 | 43 | 215 | 45 | 225 | 46 | 230
1:2:3 | 0.8 | 0.9 | 15 | 90 | 10 | 60 | 13 | 78 | 11 | 66
1:2:3 | 0.8 | 0.85 | 18 | 108 | 13 | 78 | 16 | 96 | 14 | 84
1:2:3 | 0.8 | 0.8 | 23 | 138 | 19 | 114 | 22 | 132 | 20 | 120
1:2:3 | 0.8 | 0.75 | 30 | 180 | 24 | 144 | 27 | 162 | 26 | 156
1:2:3 | 0.8 | 0.7 | 42 | 252 | 37 | 222 | 39 | 234 | 40 | 240
1:2:3 | 0.8 | 0.65 | 64 | 384 | 62 | 372 | 64 | 384 | 64 | 384
1:2:3 | 0.7 | 0.9 | 14 | 84 | 8 | 48 | 12 | 72 | 9 | 54
1:2:3 | 0.7 | 0.85 | 17 | 102 | 9 | 54 | 15 | 90 | 12 | 72
1:2:3 | 0.7 | 0.8 | 20 | 120 | 14 | 84 | 19 | 114 | 15 | 90
1:2:3 | 0.7 | 0.75 | 24 | 144 | 17 | 102 | 22 | 132 | 18 | 108
1:2:3 | 0.7 | 0.7 | 31 | 186 | 25 | 150 | 28 | 168 | 25 | 150
1:2:3 | 0.7 | 0.65 | 41 | 246 | 33 | 198 | 37 | 222 | 35 | 210
Table 2:
Sample size for OR under Frequentist, fully Bayesian and Bayesian Approximation approaches to achieve a power of 80% for θ = 0.8 and 0.7, α = 0.025 and πE ∈ [0.65, 0.9] under three different allocations
Allocation (P:R:E) | θ | πE | Frequentist nP | Frequentist N | Fully Bayesian nP | Fully Bayesian N | Prior Approx. nP | Prior Approx. N | Posterior Approx. nP | Posterior Approx. N
---|---|---|---|---|---|---|---|---|---|---
1:1:1 | 0.8 | 0.9 | 20 | 60 | 20 | 60 | 20 | 60 | 20 | 60
1:1:1 | 0.8 | 0.85 | 31 | 93 | 25 | 75 | 27 | 81 | 27 | 81
1:1:1 | 0.8 | 0.8 | 49 | 147 | 45 | 135 | 46 | 138 | 47 | 141
1:1:1 | 0.8 | 0.75 | 85 | 255 | 82 | 246 | 82 | 246 | 83 | 249
1:1:1 | 0.8 | 0.7 | 165 | 495 | 160 | 480 | 157 | 471 | 158 | 474
1:1:1 | 0.8 | 0.65 | 415 | 1245 | 404 | 1212 | 406 | 1218 | 406 | 1218
1:1:1 | 0.7 | 0.9 | 15 | 45 | 14 | 42 | 14 | 42 | 14 | 42
1:1:1 | 0.7 | 0.85 | 21 | 63 | 15 | 45 | 17 | 51 | 18 | 54
1:1:1 | 0.7 | 0.8 | 30 | 90 | 26 | 78 | 28 | 84 | 28 | 84
1:1:1 | 0.7 | 0.75 | 45 | 135 | 39 | 117 | 40 | 120 | 42 | 126
1:1:1 | 0.7 | 0.7 | 72 | 216 | 65 | 195 | 62 | 186 | 63 | 189
1:1:1 | 0.7 | 0.65 | 125 | 375 | 116 | 348 | 114 | 342 | 116 | 348
1:2:2 | 0.8 | 0.9 | 11 | 55 | 11 | 55 | 11 | 55 | 11 | 55
1:2:2 | 0.8 | 0.85 | 16 | 80 | 15 | 75 | 15 | 75 | 16 | 80
1:2:2 | 0.8 | 0.8 | 26 | 130 | 24 | 120 | 26 | 130 | 26 | 130
1:2:2 | 0.8 | 0.75 | 45 | 225 | 40 | 200 | 43 | 215 | 43 | 215
1:2:2 | 0.8 | 0.7 | 87 | 435 | 82 | 410 | 83 | 415 | 83 | 415
1:2:2 | 0.8 | 0.65 | 220 | 1100 | 212 | 1060 | 211 | 1055 | 212 | 1060
1:2:2 | 0.7 | 0.9 | 8 | 40 | 8 | 40 | 8 | 40 | 8 | 40
1:2:2 | 0.7 | 0.85 | 12 | 60 | 9 | 45 | 11 | 55 | 11 | 55
1:2:2 | 0.7 | 0.8 | 17 | 85 | 16 | 80 | 16 | 80 | 17 | 85
1:2:2 | 0.7 | 0.75 | 25 | 125 | 21 | 105 | 22 | 110 | 23 | 115
1:2:2 | 0.7 | 0.7 | 41 | 205 | 36 | 180 | 35 | 175 | 36 | 180
1:2:2 | 0.7 | 0.65 | 71 | 355 | 62 | 310 | 63 | 315 | 64 | 320
1:2:3 | 0.8 | 0.9 | 9 | 54 | 8 | 48 | 8 | 48 | 8 | 48
1:2:3 | 0.8 | 0.85 | 13 | 78 | 12 | 72 | 12 | 72 | 12 | 72
1:2:3 | 0.8 | 0.8 | 22 | 126 | 19 | 114 | 21 | 126 | 20 | 120
1:2:3 | 0.8 | 0.75 | 37 | 222 | 33 | 198 | 35 | 210 | 36 | 216
1:2:3 | 0.8 | 0.7 | 72 | 432 | 70 | 420 | 68 | 408 | 70 | 420
1:2:3 | 0.8 | 0.65 | 182 | 1092 | 177 | 1062 | 175 | 1050 | 177 | 1062
1:2:3 | 0.7 | 0.9 | 7 | 42 | 6 | 36 | 6 | 36 | 6 | 36
1:2:3 | 0.7 | 0.85 | 10 | 60 | 9 | 54 | 9 | 54 | 9 | 54
1:2:3 | 0.7 | 0.8 | 14 | 84 | 11 | 66 | 13 | 78 | 13 | 78
1:2:3 | 0.7 | 0.75 | 22 | 126 | 17 | 102 | 18 | 108 | 19 | 114
1:2:3 | 0.7 | 0.7 | 34 | 204 | 30 | 180 | 29 | 174 | 30 | 180
1:2:3 | 0.7 | 0.65 | 60 | 360 | 54 | 324 | 52 | 312 | 53 | 318
In Figure 1, we plot the power curves corresponding to three different values of θ : 0.8, 0.7 and 0.6 under conjugate prior for each treatment arm for both RR and OR under fully Bayesian method. For both RR and OR we consider Beta(1, 1) in each arm as the non-informative prior. The three values of θ correspond to f = −0.2, −0.3 and −0.4 respectively, which correspond to the three choices of the NI margin. This implies that the effect of the experimental drug, must be at least 80%, 70% and 60% respectively of the effect of the active control in order to be non-inferior. From Figure 1, we observe that as θ decreases, the power curve becomes steeper which means for smaller θ the proposed test is more powerful. This is so because for smaller θ it is easier to declare non-inferiority of the experimental drug over the reference.
Figure 1:
Power curves for different θ for RR test in (a) and OR test in (b).
We refer to the Sections 3.4.1, 3.4.2, and 3.4.3 for the sample size determination under fully Bayesian and two approximation-based Bayesian approaches respectively. We compare the sample size under Bayesian approaches with that obtained under conditional Frequentist approach (Chowdhury et al., 2018b) as briefly described in Section 2. As discussed in Section 2, sample sizes in the placebo, reference, and the experimental arms are denoted by n, r1n and r2n respectively, with r1, r2 ≥ 1. To compute (nE, nR, nP ), we consider three possible allocations for (E, R, P ): (1 : 1 : 1), (2 : 2 : 1) and (3 : 2 : 1) of the total sample size N. The power expression of both the Frequentist’s and the Bayesian normal approximations do not give an explicit solution for n and hence an iterative process is needed. For Frequentist’s approach we keep α = 0.025, and for Bayesian exact as well as for Bayesian approximation approaches, the sample sizes satisfying power ≥ 1 − β also yield estimated type-I error of at most α = 0.025. The sample sizes are presented for θ = 0.8 and 0.7, and for a range of πE keeping πR = 0.7 and πP = 0.1, for RR in Table 1 and OR in Table 2. From both the tables we observe that the sample sizes under Bayesian approaches are always smaller than or at most equal to that under Frequentist approach implying the effective gain in sample size using the former.
We present the sample sizes for the placebo arm, and those for the arms R and E can be obtained from the allocation ratios. With allocations written as (P : R : E), the total sample size for (1 : 1 : 1) is N = 3n; that for (1 : 2 : 2) is N = 5n; while for (1 : 2 : 3) it is N = 6n, where n is the respective sample size for the placebo arm under each allocation. From both tables we observe that the sample size decreases as θ decreases for a fixed power, which is consistent with the power curve plots, which show the trial becoming overpowered as θ decreases or, equivalently, as the margin widens. This change in sample size with varying θ is also robust to the sample size allocation. Although appealing at first glance, a balanced study design may not be desirable for two reasons: (i) for ethical reasons, when an effective treatment exists, the number of patients receiving the placebo should be kept as small as possible, and (ii) as pointed out by Koch and Tangen (1999), the difference between E and R should be expected to be much smaller than the difference of either of them relative to placebo, so that the latter differences are easier to detect. As observed by Pigeot et al. (2003) for continuous outcomes, the sample size required for the unbalanced allocations is remarkably smaller than that for the balanced one. From Table 2 for OR we notice that the necessary sample size is remarkably smaller for the unbalanced allocation (E : R : P) = (2 : 2 : 1) than for the balanced design (1 : 1 : 1), and a further minor reduction is obtained for the unbalanced allocation (3 : 2 : 1) as compared to (2 : 2 : 1). However, for RR the sample sizes do not follow the same pattern as for OR with respect to the allocation, as can be seen from Table 1. This might be due to the fact that, unlike OR, the logarithm transformation of RR yields a skewed distribution and does not conform well to the normal approximation.
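The arithmetic linking the placebo-arm size to the arm-wise and total sample sizes, N = n(1 + r1 + r2), can be sketched as below. The helper name `arm_sizes` is hypothetical; the inputs in the example are the Frequentist placebo-arm sizes from the first row of each allocation block of Table 1.

```python
def arm_sizes(n_p, r1, r2):
    """Return (n_P, n_R, n_E, N) for the allocation P : R : E = 1 : r1 : r2."""
    n_r = r1 * n_p
    n_e = r2 * n_p
    return n_p, n_r, n_e, n_p + n_r + n_e


# Frequentist rows of Table 1 for theta = 0.8 and pi_E = 0.9.
balanced = arm_sizes(27, 1, 1)      # allocation 1 : 1 : 1
unbalanced_a = arm_sizes(17, 2, 2)  # allocation 1 : 2 : 2
unbalanced_b = arm_sizes(15, 2, 3)  # allocation 1 : 2 : 3
```

The totals returned for these three rows agree with the N column of Table 1, while the number of patients on placebo drops from 27 to 15.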
5. Application
We illustrate our proposed Bayesian methods for both RR and OR with a published dataset, described in Higuchi et al. (2009), from a three-arm comparative study on major depressive disorder. We also compare the results with those of the conditional Frequentist method of Chowdhury et al. (2018b). Hida and Tango (2011) and Ghosh et al. (2015) also considered this dataset: Hida and Tango (2011) proposed a Frequentist version of the problem for continuous outcomes, while Ghosh et al. (2015) considered the Bayesian version. The objective of the depression trial was to compare the efficacy and safety of duloxetine (E) with those of paroxetine (R) and placebo (P ). The study was a double-blinded, randomized, parallel-group, active-controlled study of a six-week treatment with the following number of patients in each arm: duloxetine (nE = 147), paroxetine (nR = 148) and placebo (nP = 145). The primary endpoint was continuous, namely the change in HAMD-17 total score from baseline at the end of the sixth week. Hida and Tango (2011) considered two binary outcomes for their Frequentist approach, namely Response and Remission. Response is the primary outcome, defined as a reduction of more than 50% in the HAMD-17 total score. Remission is the secondary outcome, defined as maintaining a HAMD-17 score of ≤ 17 at the end of the sixth week. We present the basic data in Table 3, in terms of Response and Remission (see Hida and Tango (2011)). We analyze the Response and Remission outcomes separately using the conditional Frequentist and fully Bayesian approaches. To make a meaningful interpretation of the effect of the experimental drug, a clinically acceptable margin reflecting the largest acceptable loss of effect is chosen to determine non-inferiority of the experimental drug relative to the control. Here, we vary θ in the range [0.5, 0.8] to explore different possibilities. We use p* = 0.975 to determine NI of duloxetine relative to paroxetine.
For the Frequentist approach we calculate the p-value of the test for both RR and OR from Wobs, the observed value of the conditional Frequentist test statistic W, together with the mean and variance of W under H0 (Chowdhury et al., 2018b). For the analysis under the fully Bayesian approach, we assume Beta(1, 1) as the non-informative prior for πl, l ∈ {E, R, P }, for both RR and OR, and generate the posterior samples from the Beta distributions given in Step 3 of the simulation. We calculate the Bayesian posterior probability as given in Step 4 of the simulation. The Frequentist p-values and the Bayesian posterior probabilities P (H1|Data) are reported in Table 4 and Table 5 for RR and OR respectively, for both the Response and the Remission data.
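A minimal Monte Carlo sketch of the fully Bayesian calculation is given below. It relies on the standard conjugacy result that a Beta(a, b) prior combined with x successes out of n yields a Beta(a + x, b + n − x) posterior. The form of H1 used here, g(πE) − g(πP) > θ {g(πR) − g(πP)} with g = log for RR and g = logit for OR, is an assumption made for illustration, since the hypotheses are defined in an earlier section; the function name `posterior_prob_ni`, the number of draws and the seed are likewise hypothetical.

```python
import numpy as np


def posterior_prob_ni(x, n, theta, measure="RR", prior=(1.0, 1.0),
                      draws=200_000, seed=0):
    """Monte Carlo estimate of P(H1 | data) under independent Beta priors.

    ASSUMPTION: H1 is g(pi_E) - g(pi_P) > theta * (g(pi_R) - g(pi_P)),
    with g = log for RR and g = logit for OR.
    """
    rng = np.random.default_rng(seed)
    a, b = prior
    # Beta prior + binomial likelihood gives a Beta(a + x, b + n - x) posterior.
    post = {arm: rng.beta(a + x[arm], b + n[arm] - x[arm], size=draws)
            for arm in ("E", "R", "P")}
    if measure == "RR":
        g = np.log
    elif measure == "OR":
        def g(p):
            return np.log(p) - np.log1p(-p)  # logit
    else:
        raise ValueError("measure must be 'RR' or 'OR'")
    effect_e = g(post["E"]) - g(post["P"])  # E versus placebo
    effect_r = g(post["R"]) - g(post["P"])  # R versus placebo
    return float(np.mean(effect_e > theta * effect_r))


# Response counts and arm sizes from Table 3.
response = {"E": 80, "R": 78, "P": 56}
totals = {"E": 147, "R": 148, "P": 145}
for theta in (0.5, 0.8):
    print(theta, posterior_prob_ni(response, totals, theta, "RR"))
```

Under this sketch NI would be claimed when the estimated probability exceeds p* = 0.975. Agreement with the tabulated posterior probabilities should be expected only to the extent that the assumed form of H1 matches the definition used in the paper, and up to Monte Carlo error.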
Table 3:
Remission and Response as Outcome in the Depression Trial of Higuchi et al. (2009)
Outcome | Duloxetine | Paroxetine | Placebo |
---|---|---|---|
Remission | 50 | 49 | 32 |
Response | 80 | 78 | 56 |
Total | nE = 147 | nR = 148 | nP = 145 |
Table 4:
Frequentist p–values and Bayesian posterior probabilities, and rejection decision for Risk Ratio
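θ | Response: Freq | Response: Dec (Freq) | Response: Bayes Non-inform | Response: Dec (Non-inform) | Response: Bayes Inform | Response: Dec (Inform) | Remission: Freq | Remission: Dec (Freq) | Remission: Bayes Non-inform | Remission: Dec (Non-inform) | Remission: Bayes Inform | Remission: Dec (Inform) |
---|---|---|---|---|---|---|---|---|---|---|---|---|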
0.5 | 0.0514 | 0 | 0.971 | 0 | 0.983 | 1 | 0.1055 | 0 | 0.917 | 0 | 0.979 | 1 |
0.55 | 0.0650 | 0 | 0.953 | 0 | 0.977 | 1 | 0.1249 | 0 | 0.898 | 0 | 0.971 | 0 |
0.6 | 0.0826 | 0 | 0.938 | 0 | 0.968 | 0 | 0.1483 | 0 | 0.874 | 0 | 0.962 | 0 |
0.65 | 0.1052 | 0 | 0.922 | 0 | 0.946 | 0 | 0.1760 | 0 | 0.858 | 0 | 0.942 | 0 |
0.7 | 0.1335 | 0 | 0.893 | 0 | 0.932 | 0 | 0.2082 | 0 | 0.826 | 0 | 0.91 | 0 |
0.75 | 0.1681 | 0 | 0.853 | 0 | 0.908 | 0 | 0.2450 | 0 | 0.786 | 0 | 0.879 | 0 |
0.8 | 0.2094 | 0 | 0.814 | 0 | 0.86 | 0 | 0.2861 | 0 | 0.738 | 0 | 0.821 | 0 |
Table 5:
Frequentist p–values and Bayesian posterior probabilities, and rejection decision for Odds Ratio
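θ | Response: Freq | Response: Dec (Freq) | Response: Bayes Non-inform | Response: Dec (Non-inform) | Response: Bayes Inform | Response: Dec (Inform) | Remission: Freq | Remission: Dec (Freq) | Remission: Bayes Non-inform | Remission: Dec (Non-inform) | Remission: Bayes Inform | Remission: Dec (Inform) |
---|---|---|---|---|---|---|---|---|---|---|---|---|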
0.5 | 0.0462 | 0 | 0.97 | 0 | 0.985 | 1 | 0.1025 | 0 | 0.909 | 0 | 0.979 | 1 |
0.55 | 0.0622 | 0 | 0.95 | 0 | 0.979 | 1 | 0.1239 | 0 | 0.884 | 0 | 0.974 | 0 |
0.6 | 0.0829 | 0 | 0.926 | 0 | 0.96 | 0 | 0.1496 | 0 | 0.857 | 0 | 0.96 | 0 |
0.65 | 0.1091 | 0 | 0.901 | 0 | 0.942 | 0 | 0.1795 | 0 | 0.826 | 0 | 0.941 | 0 |
0.7 | 0.1410 | 0 | 0.868 | 0 | 0.922 | 0 | 0.2138 | 0 | 0.791 | 0 | 0.913 | 0 |
0.75 | 0.1788 | 0 | 0.831 | 0 | 0.885 | 0 | 0.2520 | 0 | 0.758 | 0 | 0.88 | 0 |
0.8 | 0.2218 | 0 | 0.777 | 0 | 0.757 | 0 | 0.2935 | 0 | 0.716 | 0 | 0.847 | 0 |
The Frequentist p-values are compared with α = 0.025, while the Bayesian posterior probabilities are compared with p*, to reach the respective decision. The decision (claim NI or not) is reported for both the Frequentist and Bayesian approaches. Here “1” stands for rejection of H0, implying that NI is claimed, while “0” represents failure to reject H0. From both Table 4 and Table 5 we observe that the posterior probabilities increase as θ decreases, implying a greater chance of declaring NI for smaller values of θ, which is consistent with the simulation results. The Frequentist p-values lead to a similar conclusion: they increase with θ, and since H0 is rejected (and NI claimed) only when the p-value is below α, larger values of θ make a claim of NI less likely. From both tables we observe that, under the non-informative prior, the posterior probabilities for both the Response and Remission data are less than the pre-specified cut-off p* = 0.975, and hence non-inferiority of E relative to R cannot be claimed. However, when we choose an appropriate informative Beta prior, NI is established for smaller values of θ. For both RR and OR we choose the informative Beta priors for the three arms as E : Beta(40, 34.22), R : Beta(40, 35.584) and P : Beta(40, 64.63), under which NI is established for θ ≤ 0.55 for the Response data. For the Remission data we assume the following informative priors: E : Beta(40, 76.7), R : Beta(40, 80.19) and P : Beta(40, 139.27), which establish NI for θ = 0.5. Note that the informative priors are chosen so that the modes of the specified Beta distributions equal the parameter estimates from the data (Table 3): approximately 0.54, 0.53 and 0.38 for πE, πR and πP with the Response data, and approximately 0.34, 0.33 and 0.22 with the Remission data. Also note that, other than equating the mode of the Beta distribution with the parameter estimates, the parameter values are chosen arbitrarily for illustration, taking into account that the variances of these informative Beta distributions are smaller than that of the non-informative Beta(1, 1) distribution.
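The mode-matching construction of the informative priors can be checked with a short calculation. For a Beta(a, b) distribution with a, b > 1 the mode is (a − 1)/(a + b − 2), so fixing a and a target mode m gives b = (a − 1)/m − a + 2. The sketch below assumes the targets are the two-decimal sample proportions from Table 3 (for example 80/147 ≈ 0.54); the helper names are hypothetical.

```python
def beta_b_from_mode(a, mode):
    """Second shape parameter b such that Beta(a, b) has the given mode.

    Uses mode = (a - 1) / (a + b - 2), valid for a > 1 and 0 < mode < 1.
    """
    if a <= 1 or not 0 < mode < 1:
        raise ValueError("need a > 1 and 0 < mode < 1")
    return (a - 1) / mode - a + 2


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Two-decimal sample proportions from Table 3 (assumed rounding).
targets = {
    "Response": {"E": 0.54, "R": 0.53, "P": 0.38},
    "Remission": {"E": 0.34, "R": 0.33, "P": 0.22},
}
for outcome, modes in targets.items():
    for arm, m in modes.items():
        b = beta_b_from_mode(40, m)
        print(outcome, arm, round(b, 3), beta_variance(40, b))

print("Beta(1, 1) variance:", beta_variance(1, 1))  # equals 1/12
```

With a = 40 the resulting values of b agree with the quoted priors to about two decimal places, and every informative prior has a variance well below the 1/12 of the Beta(1, 1) prior.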
6. Discussion
In this paper, we have presented new Bayesian test procedures for the “gold standard” three-arm NI trial, which includes a placebo arm, for binary endpoints, considering RR and OR as the functionals of interest. We believe that this is an important methodological contribution, as NI can be claimed based on these functionals according to the recent FDA guideline. In the fully Bayesian approach, we explored the standard conjugate prior and carried out the simulation considering non-informative priors. We also proposed two approximation-based Bayesian approaches which avoid a substantial part of the computation of the fully Bayesian approach. We tabulated the sample sizes in Tables 1 and 2 under three different types of allocation for both RR and OR. This should provide a good starting point for assessing sample size and allocation requirements when designing such trials. We have seen that even with a non-informative prior, the fully Bayesian approach as well as the Bayesian approximation-based approaches yield smaller sample sizes than the Frequentist approach for a desired power of 80%. The sample size gain is substantial for informative prior choices. The analysis of our real clinical trial data also suggests that the Bayesian methods perform favorably in all situations. Note that historical information plays a substantial role in the design and analysis of an NI trial. Hence it has to be reflected in several substantive aspects of the trial, including the choice of δ, the question of whether a placebo can be included as an additional arm of the study, and assay sensitivity, to give a few examples among others.
We also noted that under the fraction margin approach the fraction “f” is pre-specified, while the NI margin δ is unknown. Hence the value of δ, being a function of the estimated effect size of the reference treatment, can vary greatly. On the other hand, in the fixed margin approach (see Hida and Tango (2011) and Ghosh et al. (2015)) with three arms, joint testing of NI and AS may be performed; this needs additional care, since it may produce conservative decisions and may yield a restrictive type-I error (Chuang-Stein et al., 2007; Dmitrienko et al., 2009). Nevertheless, the development of such a procedure for RR and OR under an alternative definition of type-I error (e.g., the average testing error of Chuang-Stein et al. (2007)) and under Bayesian calibration (Ghosh et al., 2015) is an interesting future problem.
7. Acknowledgements
The research of the last author is partly supported by PCORI contract number ME-1409-21410 and NIH grant number P30-ES020957. The authors would also like to thank two anonymous referees, the AE and the editorial team, whose comments provided additional insights and have greatly improved the scope and presentation of the paper.
References
- EMA (2005). Guideline on the choice of the noninferiority margin (Doc. Ref. EMEA/CPMP/EWP/215).
- FDA (2016). Non-Inferiority Clinical Trials to Establish Effectiveness: Guidance for Industry.
- Althunian TA, de Boer A, Klungel OH, Insani WN, and Groenwold RH (2017). Methods of defining the non-inferiority margin in randomized, double-blind controlled trials: a systematic review. Trials, 18(1):107.
- Arnold BC and Beaver RJ (1993). The nontruncated marginal of a truncated bivariate normal distribution. Psychometrika, 58(3):471–488.
- Brown D, Volkers P, and Day S (2006). An introductory note to CHMP guidelines: choice of the non-inferiority margin and data monitoring committees. Statistics in Medicine, 25(10):1623–1627.
- Chowdhury S, Tiwari R, and Ghosh S (2018a). Approaches for testing non-inferiority in two-arm trial for risk ratio and odds ratio. Journal of Biopharmaceutical Statistics, Accepted for publication.
- Chowdhury S, Tiwari RC, and Ghosh S (2017). Non-inferiority testing for three-arm trials with binary outcome: Novel frequentist and Bayesian proposals. Under review.
- Chowdhury S, Tiwari RC, and Ghosh S (2018b). Non-inferiority testing for risk ratio, odds ratio and number needed to treat in three-arm trial. Computational Statistics & Data Analysis. Available online 15 September 2018.
- Chuang-Stein C, Stryszak P, Dmitrienko A, and Offen W (2007). Challenge of multiple co-primary endpoints: a new approach. Statistics in Medicine, 26(6):1181–1192.
- D’Agostino RB, Massaro JM, and Sullivan LM (2003). Noninferiority trials: Design concepts and issues-the encounters of academic consultants in statistics. Statistics in Medicine, 22(2):169–186.
- Dmitrienko A, Tamhane AC, and Bretz F (2009). Multiple testing problems in pharmaceutical statistics. CRC Press.
- Gamalo MA, Tiwari RC, and LaVange LM (2014). Bayesian approach to the design and analysis of non-inferiority trials for anti-infective products. Pharmaceutical Statistics, 13(1):25–40.
- Gamalo MA, Wu R, and Tiwari RC (2011). Bayesian approach to noninferiority trials for proportions. Journal of Biopharmaceutical Statistics, 21(5):902–919.
- Gamalo MA, Wu R, and Tiwari RC (2012). Bayesian approach to non-inferiority trials for normal means. Statistical Methods in Medical Research, 25(1):221–240.
- Ghosh P, Nathoo F, Gonen M, and Tiwari RC (2011). Assessing noninferiority in a three-arm trial using the Bayesian approach. Statistics in Medicine, 30(15):1795–1808.
- Ghosh S, Ghosh S, and Tiwari RC (2015). Bayesian approach for assessing non-inferiority in a three-arm trial with pre-specified margin. Statistics in Medicine, 35(5):695–708.
- Ghosh S, Tiwari RC, and Ghosh S (2018). Bayesian approach for assessing non-inferiority in a three-arm trial with binary endpoint. Pharmaceutical Statistics, 17(4):342–357.
- Hashemi L, Nandram B, and Goldberg R (1997). Bayesian analysis for a single 2 by 2 table. Statistics in Medicine, 16(12):1311–1328.
- Hida E and Tango T (2011). On the three-arm noninferiority trial including a placebo with a prespecified margin. Statistics in Medicine, 30(3):224–231.
- Higuchi T, Murasaki M, and Kamijima K (2009). Clinical evaluation of duloxetine in the treatment of major depressive disorder-placebo and paroxetine-controlled double-blinded comparative study. Japanese Journal of Clinical Psychopharmacology, 12:1613–1634.
- Hilton JF (2010). Noninferiority trial designs for odds ratios and risk differences. Statistics in Medicine, 29(9):982–993.
- Hung HMJ and Wang SJ (2004). Multiple testing of noninferiority hypotheses in active controlled trials. Journal of Biopharmaceutical Statistics, 14(2):327–335.
- ICHE10 (2009). ICH Harmonised Tripartite Guideline. Choice of Control Group and Related Issues in Clinical Trials. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use.
- ICHE9 (2009). ICH Harmonised Tripartite Guideline. Statistical Principles for Clinical Trials. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use.
- Kieser M and Friede T (2007). Planning and analysis of three-arm non-inferiority trials with binary endpoints. Statistics in Medicine, 26(2):253–273.
- Kieser M and Stucke K (2016). Assessing additional benefit in noninferiority trials. Biometrical Journal, 58(1):154–169.
- Koch GG and Tangen CM (1999). Nonparametric analysis of covariance and its role in non-inferiority clinical trials. Drug Information Journal, 33(4):1145–1159.
- Pigeot I, Schafer J, Rohmel J, and Hauschke D (2003). Assessing noninferiority of a new treatment in a three-arm clinical trial including a placebo. Statistics in Medicine, 22(6):883–899.
- Rousson V and Seifert B (2008). A mixed approach for proving non-inferiority in clinical trials with binary endpoints. Biometrical Journal, 50(2):190–204.
- Schumi J and Wittes JT (2011). Through the looking glass: understanding non-inferiority. Trials, 12(2):106–118.
- Wangge G, Roes KCB, Boer AD, Hoes AW, and Knol MJ (2013). The challenges of determining noninferiority margins: a case study of noninferiority randomized controlled trials of novel oral anticoagulants. Canadian Medical Association Journal, 185(3):222–227.