. 2020 Jun 17;21:528. doi: 10.1186/s13063-020-04334-x

Box 13.

Exemplars on reporting item 7b elements

Example 1. 2-arm 2-stage AD with options for early stopping for futility or superiority and to increase the sample size; binding stopping rules

“To calculate the number of patients needed to meet the primary endpoint, we expected a 3-year overall survival rate of 25% in the group assigned to preoperative chemotherapy (arm A) (based on two previous trials [191, 192]). In comparison, an increase of 10% (up to 35%) was anticipated by preoperative CRT. Using the log-rank test (one-sided at this point) at a significance level of 5%, we calculated to include 197 patients per group to ensure a power of 80%. In the first stage of the planned two-stage adaptive design [193], the study was planned to be continued on the basis of a new calculation of patients needed if the comparison of patient groups will be 0.0233 < p₁ < 0.5. Otherwise, the study may be closed for superiority (p₁ < 0.0233) or shall be closed for futility (p₁ ≥ 0.5). There was no maximum sample size cap and stopping rules were binding.” [194] Values p₁ and p₂ are p-values derived from independent stage 1 and stage 2 data, respectively. Evidence of benefit will be claimed if the overall two-stage p-value derived from p₁ and p₂ is ≤0.05.
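The stage 1 decision rule quoted in Example 1 can be written out as a short Python sketch. It is illustrative only and is not the trial's own code: the function name `stage1_decision` and the returned labels are hypothetical. The quoted text does not say what happens when p₁ equals 0.0233 exactly, so the sketch assumes that value counts as superiority.

```python
ALPHA_1 = 0.0233  # stage 1 superiority boundary quoted in Example 1
ALPHA_0 = 0.5     # stage 1 futility boundary quoted in Example 1


def stage1_decision(p1: float) -> str:
    """Classify the one-sided stage 1 p-value under the binding stopping rules."""
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("p1 must lie between 0 and 1")
    if p1 >= ALPHA_0:
        return "stop for futility"
    # Assumption: p1 == 0.0233 is treated as superiority; the source only
    # states the strict inequality p1 < 0.0233.
    if p1 <= ALPHA_1:
        return "stop for superiority"
    # 0.0233 < p1 < 0.5: continue, with the stage 2 sample size recalculated.
    return "continue to stage 2"
```

Because the rules are binding, the function returns exactly one outcome for every valid p₁; the overall two-stage test that combines p₁ and p₂ is not reproduced here.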

Example 2. Timing and frequency of interim analyses; planned stopping boundaries for superiority and futility. See Table 4

Example 3. Planned timing and frequency of interim analyses; pre-specified dose selection rules for an inferentially seamless phase 2/3 (7-arm 2-stage) AD

“The interim analysis was pre-planned for when at least 110 patients per group (770 total) had completed at least 2 weeks of treatment. The dose selection guidelines were based on efficacy and safety. The mean effect of each indacaterol dose versus placebo was judged against pre-set efficacy reference criteria for trough FEV₁ and FEV₁AUC_1–4h. For trough FEV₁, the reference efficacy criterion was the highest value of: (a) the difference between tiotropium and placebo, (b) the difference between formoterol and placebo, or (c) 120 mL (regarded as the minimum clinically important difference). For standardized FEV₁AUC_1–4h, the reference efficacy criterion was the highest value of: (a) the difference between tiotropium and placebo or (b) the difference between formoterol and placebo. If more than one indacaterol dose exceeded both the efficacy criteria, the lowest effective dose plus the next higher dose were to be selected. Data on peak FEV₁, % change in FEV₁, and FVC were also supplied to the DMC for possible consideration, but these measures were not part of the formal dose selection process and are not presented here. The DMC also took into consideration any safety signals observed in any treatment arm.” [141]
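The dose selection guideline in Example 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DMC's procedure: the function names, the dose labels, and every number are hypothetical, "next higher dose" is read as the next dose up on the studied dose ladder, and the safety review is not modelled. The quoted text gives the rule only for the case where more than one dose exceeds both criteria, so other cases simply return the qualifying doses.

```python
def reference_criterion(comparator_effects, floor=None):
    """Highest of the comparator-versus-placebo differences and an optional fixed floor."""
    candidates = list(comparator_effects)
    if floor is not None:
        candidates.append(floor)
    return max(candidates)


def select_doses(dose_effects, trough_ref, auc_ref):
    """Apply the pre-set efficacy rule.

    dose_effects maps dose -> (trough FEV1 difference vs placebo,
                               FEV1 AUC difference vs placebo).
    """
    doses = sorted(dose_effects)
    effective = [
        d for d in doses
        if dose_effects[d][0] > trough_ref and dose_effects[d][1] > auc_ref
    ]
    if len(effective) < 2:
        # The source states the rule only for more than one effective dose.
        return effective
    lowest = effective[0]
    # Assumption: "next higher dose" means the next dose on the studied ladder.
    next_higher = doses[doses.index(lowest) + 1]
    return [lowest, next_higher]


# Hypothetical numbers, not trial data.
trough_ref = reference_criterion([140, 110], floor=120)  # tiotropium, formoterol, 120 mL
auc_ref = reference_criterion([190, 210])                # tiotropium, formoterol
effects = {100: (150, 200), 200: (180, 230), 400: (185, 240), 800: (150, 210)}
selected = select_doses(effects, trough_ref, auc_ref)
```

With these made-up inputs, the two middle doses exceed both reference criteria, so the lowest of them and the next dose up are carried forward.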

Example 4. Timing and frequency of interim analyses; decision-making criteria for population enrichment and sample size increase

“Cohort 1 will enrol a total of 120 patients and follow them until 60 PFS events are obtained. At an interim analysis based on the first 40 PFS events, an independent data monitoring committee will compare the conditional power for the full population (CP_F) and the conditional power for the cutaneous subpopulation (CP_S). The formulae for these conditional powers are given in the supplementary appendix (part of item 3b, example 2, Box 8). (a) If CP_F < 0.3 and CP_S < 0.5, the results are in the unfavourable zone; the trial will enrol 70 patients to cohort 2 and follow them until 35 PFS events are obtained (then test effect in the full population). (b) If CP_F < 0.3 and CP_S > 0.5, the results are in the enrichment zone; the trial will enrol 160 patients with cutaneous disease (subpopulation) to cohort 2 and follow them until 110 PFS events have been obtained from the combined patients in both cohorts with cutaneous disease only (then test effect only in the cutaneous subpopulation). (c) If 0.3 ≤ CP_F ≤ 0.95, the results are in the promising zone (so increase sample size); the trial will enrol 220 patients (full population) to cohort 2 and follow them up until 110 PFS events are obtained (then test effect in the full population). (d) If CP_F > 0.95, the results are in the favourable zone; the trial will enrol 70 patients to cohort 2 and follow them until 35 PFS events are obtained (then test effect in full population).” [95] See Fig. 2 of Mehta et al. [95] for a decision-making tree.
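The four zones in Example 4 amount to a lookup on the two conditional powers, which the following Python sketch makes explicit. It is an illustration under stated assumptions rather than the trial's implementation: the names `Cohort2Plan` and `plan_cohort2` are hypothetical, the conditional powers are taken as given inputs (their formulae are in the cited appendix and are not reproduced), and the case CP_F < 0.3 with CP_S exactly 0.5, which the quoted rules leave undefined, is assumed to fall in the unfavourable zone.

```python
from typing import NamedTuple


class Cohort2Plan(NamedTuple):
    zone: str
    cohort2_patients: int
    target_pfs_events: int  # counting basis differs by zone; see the quoted rules
    test_population: str


def plan_cohort2(cp_full: float, cp_sub: float) -> Cohort2Plan:
    """Map interim conditional powers (CP_F, CP_S) to the cohort 2 plan."""
    for cp in (cp_full, cp_sub):
        if not 0.0 <= cp <= 1.0:
            raise ValueError("conditional power must lie between 0 and 1")
    if cp_full > 0.95:
        return Cohort2Plan("favourable", 70, 35, "full")
    if cp_full >= 0.3:
        # Promising zone: sample size is increased.
        return Cohort2Plan("promising", 220, 110, "full")
    if cp_sub > 0.5:
        # 110 events counted across both cohorts, cutaneous patients only.
        return Cohort2Plan("enrichment", 160, 110, "cutaneous")
    # Assumption: CP_S == 0.5 is grouped with the unfavourable zone.
    return Cohort2Plan("unfavourable", 70, 35, "full")
```

For instance, `plan_cohort2(0.2, 0.7)` falls in the enrichment zone, while `plan_cohort2(0.5, 0.7)` falls in the promising zone because CP_S is consulted only when CP_F is below 0.3.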

Example 5. Bayesian GSD with futility early stopping; frequency and timing of interim analyses; adaptation decision-making criteria; criteria for claiming treatment benefit

“We adopted a group-sequential Bayesian design [182] with three stages, of 40 patients each (in total), and two interim analyses after 40 and 80 randomised participants, and a final analysis after a maximum of 120 randomised participants. We decided that the trial should be stopped early if there is a high (posterior) probability (90% or greater) (item 3b details) that the 90-day survival odds ratio (OR) falls below 1 (i.e. REBOA is harmful) at the first or second interim analysis. REBOA will be declared “successful” if the probability that the 90-day survival OR exceeds 1 at the final analysis is 95% or greater.” [196]
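The stopping and success criteria in Example 5 can be illustrated with a small Python sketch. It rests on loud assumptions and is not the trial's analysis model: the posterior is approximated by Monte Carlo draws from independent Beta(1, 1) priors on each arm's 90-day survival probability (the trial's actual prior and model belong to item 3b and are not reproduced here), the OR is taken as intervention versus control, and all function names and counts are hypothetical.

```python
import random


def prob_or_above_1(surv_trt, n_trt, surv_ctl, n_ctl, draws=20000, seed=1):
    """Monte Carlo estimate of the posterior P(OR > 1) for survival.

    Assumption: independent Beta(1, 1) priors on each arm's survival probability.
    """
    rng = random.Random(seed)
    above = 0
    for _ in range(draws):
        p_trt = rng.betavariate(1 + surv_trt, 1 + n_trt - surv_trt)
        p_ctl = rng.betavariate(1 + surv_ctl, 1 + n_ctl - surv_ctl)
        # Odds increase with the probability, so OR > 1 exactly when p_trt > p_ctl.
        if p_trt > p_ctl:
            above += 1
    return above / draws


def interim_decision(prob_or_below_1: float) -> str:
    """Stop early when the posterior probability of harm (OR < 1) is 90% or greater."""
    return "stop early" if prob_or_below_1 >= 0.90 else "continue"


def final_decision(prob_or_above_1_value: float) -> str:
    """Declare success when the posterior P(OR > 1) is 95% or greater."""
    return "successful" if prob_or_above_1_value >= 0.95 else "not successful"


# Hypothetical interim data: 4/20 survivors on intervention, 18/20 on control.
p_benefit = prob_or_above_1(4, 20, 18, 20)
decision = interim_decision(1.0 - p_benefit)  # P(OR < 1) = 1 - P(OR > 1)
```

With made-up data this lopsided, nearly all posterior draws favour control, so the interim rule would stop the trial; less extreme data would leave the probability of harm below 90% and the trial would continue to the next stage.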