Abstract
Background
Incomplete data analysis remains a major issue for non-inferiority clinical trials. Given the steadily increasing use of the non-inferiority study design, we believe this topic deserves immediate attention.
Methods
We evaluated the performance of various strategies, including complete case analysis and several imputation techniques, for handling incomplete non-inferiority clinical trials when the outcome of interest is the difference between binomial proportions. Non-inferiority of a new treatment was determined using a fixed-margin approach with the 95%-95% confidence interval method. The methods used to construct the confidence intervals were also compared and included the Wald, Farrington-Manning, and Newcombe methods.
Results
We found that worst-case and best-case scenario imputation methods should not be used for the analysis of incomplete data in the non-inferiority trial design, since such methods seriously inflate type-I error rates and produce biased estimates. In addition, we report conditions under which complete case analysis is an acceptable strategy under a missing-at-random missingness mechanism. Importantly, we show how two-stage multiple imputation can be successfully applied to incomplete data that follow missing-not-at-random patterns, resulting in controlled type-I error rates and unbiased estimates.
Conclusion
This thorough simulation study provides a road map for the analysis of incomplete data in non-inferiority clinical trials under different types of missingness. We believe that the results reported in this paper can serve practitioners who encounter missing data problems in their non-inferiority clinical trials.
Keywords: Binary outcome, Incomplete data analysis, Multiple imputation, Non-inferiority design
1. Introduction
Non-inferiority (NI) clinical trials seek to show that the efficacy of a new treatment is not considerably worse than that of a standard treatment [1]. The maximal clinically acceptable deviation is called the margin. While a portion of the standard treatment effect may be lost with a non-inferior agent, the new agent offers other benefits, such as less severe adverse events, improved drug adherence, and/or lower costs [2]. An NI trial design is considered when the use of placebo is unethical, i.e., when delaying treatment with standard care would cause irreversible health damage or death [1,3].
Like most clinical trials, NI trials are prone to incomplete data, which, if not properly analyzed, may bias study results [4]. The importance of avoiding missing data, and of performing appropriate analysis of incomplete data in clinical trials, has been extensively discussed [5-9]. However, the missing data topic has received little attention with respect to NI trials [10-12]. Only a few simulation studies have assessed the impact of different analysis strategies for NI trials [12-14]. Moreover, the lack of deliberation around the missing data problem is evident in published NI trials. Rehal et al. [15] reported that over 50% of reviewed NI trials did not mention any imputation methods used in the statistical analysis. Similarly, Rabe et al. [16] showed that 50% of the reviewed NI and equivalence articles used complete case analysis (CCA), a method generally known to produce biased results [4].
One of the principled approaches for proper analysis of incomplete data is multiple imputation (MI) [17]. In this paper, we evaluate the performance of two-stage MI, an extension of the conventional MI method [18-20], along with CCA and best/worst-case imputation methods, for the analysis of incomplete NI data. Specifically, we focus on NI trials assessing the difference between binomial proportions, a commonly used outcome of interest [16]. In line with the FDA's recommendation to use confidence intervals (CIs) to test NI [1], we consider the following commonly used methods for a straightforward construction of a CI for the difference between binomial proportions: Wilson-Newcombe (WN) [26], Farrington-Manning (FM) [25], and Wald [21]. Based on a thorough simulation study implementing different missingness mechanisms [4,22], we provide recommendations regarding the above incomplete data analysis strategies.
According to the recent "Estimands and Sensitivity Analysis in Clinical Trials" (ICH E9(R1)) guideline, the handling of intercurrent events, such as treatment discontinuation, is embedded in the estimand's description [23]. Specifically, the guideline states that the occurrence of intercurrent events in NI trials using the treatment policy strategy might falsely contribute to apparent similarities between the treatment groups, and therefore requires "careful reflection" [23]. We believe our work provides a useful road map for properly handling incomplete data in NI trials, and thus can help address the above regulatory warning.
In Section 2, we introduce the CI methods mentioned above, the general missing data framework, and the two-stage MI strategy. In Section 3, we present the simulation set-up, followed by the results in Section 4 and conclusions in Section 5.
2. Methods
2.1. Confidence intervals for difference between proportions
Assume that the primary endpoint in our trial is the difference between the proportions of favorable events in the control (C) and new treatment (T) groups. Let $Y_{ij}$ be the event indicator for subject $j$ in treatment group $i \in \{C, T\}$, $j = 1, \ldots, n_i$, where $n_i$ is the total number of subjects in group $i$ and $Y_{ij} = 1$ means that the subject experienced a favorable event. If $p_i$ is the true proportion of favorable events in group $i$, and the acceptable margin is $\delta > 0$, then the hypothesis of interest has the following form:

$$H_0: p_C - p_T \geq \delta \quad \text{vs.} \quad H_1: p_C - p_T < \delta \tag{1}$$

We assume that the fixed margin approach is used for the above hypothesis testing, i.e., the margin is specified based on the relevant historical data prior to the current NI trial [1,24]. Further, we assume that $H_0$ in (1) is rejected at the pre-specified one-sided α level if the upper bound of the CI for $p_C - p_T$ is below $\delta$ [1].
Using a maximum likelihood approach, the proportions of favorable events in each treatment group are estimated by the average number of events in each group, denoted $\hat{p}_C$ and $\hat{p}_T$. Let $z$ be the upper quantile of a standard normal distribution corresponding to the desired coverage (for one-sided $\alpha = 0.025$, $z = z_{0.975}$); the approximate CI for $p_C - p_T$ using the Wald method has the following form:

$$\hat{p}_C - \hat{p}_T \pm z \sqrt{\frac{\hat{p}_C(1-\hat{p}_C)}{n_C} + \frac{\hat{p}_T(1-\hat{p}_T)}{n_T}} \tag{2}$$
The FM method has a form similar to Wald's CI, differing only in the variance term, where $\tilde{p}_C$ and $\tilde{p}_T$ are the maximum likelihood estimates of $p_C$ and $p_T$, respectively, under the restriction $p_C - p_T = \delta$ of the null hypothesis in (1) [25]:

$$\hat{p}_C - \hat{p}_T \pm z \sqrt{\frac{\tilde{p}_C(1-\tilde{p}_C)}{n_C} + \frac{\tilde{p}_T(1-\tilde{p}_T)}{n_T}} \tag{3}$$
Finally, the WN method is based on Wilson's score method for a single proportion [26,27]. Let $L$ and $U$ be the lower and upper CI bounds for $p_C - p_T$, respectively, defined as:

$$L = \hat{p}_C - \hat{p}_T - \sqrt{(\hat{p}_C - l_C)^2 + (u_T - \hat{p}_T)^2} \tag{4}$$

$$U = \hat{p}_C - \hat{p}_T + \sqrt{(u_C - \hat{p}_C)^2 + (\hat{p}_T - l_T)^2} \tag{5}$$

where $l_i$ and $u_i$ are the Wilson score bounds for $p_i$:

$$l_i = \frac{2 n_i \hat{p}_i + z^2 - z\sqrt{z^2 + 4 n_i \hat{p}_i (1-\hat{p}_i)}}{2(n_i + z^2)} \tag{6}$$

$$u_i = \frac{2 n_i \hat{p}_i + z^2 + z\sqrt{z^2 + 4 n_i \hat{p}_i (1-\hat{p}_i)}}{2(n_i + z^2)} \tag{7}$$
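The intervals above are straightforward to compute. The following minimal Python sketch (illustrative only; the paper's own implementation is the R package nibinom) computes the Wald interval (2) and the WN hybrid interval (4)-(7), together with the fixed-margin decision rule. The FM interval is omitted here because it additionally requires the restricted MLEs of (3).

```python
import math

Z = 1.959964  # z_{0.975}, matching a one-sided alpha of 0.025


def wald_ci(x_c, n_c, x_t, n_t, z=Z):
    """Wald CI (2) for p_C - p_T, given event counts and group sizes."""
    pc, pt = x_c / n_c, x_t / n_t
    se = math.sqrt(pc * (1 - pc) / n_c + pt * (1 - pt) / n_t)
    return pc - pt - z * se, pc - pt + z * se


def wilson_bounds(x, n, z=Z):
    """Wilson score bounds (6)-(7) for a single proportion."""
    p = x / n
    centre = (2 * n * p + z * z) / (2 * (n + z * z))
    half = z * math.sqrt(z * z + 4 * n * p * (1 - p)) / (2 * (n + z * z))
    return centre - half, centre + half


def wn_ci(x_c, n_c, x_t, n_t, z=Z):
    """Wilson-Newcombe hybrid score CI (4)-(5) for p_C - p_T."""
    pc, pt = x_c / n_c, x_t / n_t
    l_c, u_c = wilson_bounds(x_c, n_c, z)
    l_t, u_t = wilson_bounds(x_t, n_t, z)
    lower = pc - pt - math.sqrt((pc - l_c) ** 2 + (u_t - pt) ** 2)
    upper = pc - pt + math.sqrt((u_c - pc) ** 2 + (pt - l_t) ** 2)
    return lower, upper


def non_inferior(ci_upper, margin):
    """Fixed-margin rule: reject H0 in (1) if the CI upper bound < margin."""
    return ci_upper < margin
```

For identical event rates in the two groups, both intervals are symmetric around zero, which is a convenient sanity check.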
2.2. Missing data framework
2.2.1. Missing data assumptions
A common framework for missing data is based on the following missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) [4,22]. MCAR means that the missing values are completely random, i.e., independent of both the observed and unobserved data in the study; MAR implies that the missingness depends only on observed data; and MNAR means that the missingness depends on unobserved data. Since MCAR is unlikely to hold in clinical trials [9], analyses based on this assumption should be avoided. In addition to the missingness mechanism, distinctness between the data model parameters and the parameters involved in the generation of the missing values plays a central role in incomplete data analysis. For likelihood- and Bayes-based inferences, ignorability is the weakest, most general condition that allows ignoring the missingness model; it is characterized by both MAR and distinctness between the parameters mentioned above. Consequently, non-ignorability holds when at least one of these two assumptions is violated. A detailed review of missingness mechanisms, ignorability, and the relation between them can be found elsewhere [19,28,29]. For simplicity, in this paper we use the terms MAR/ignorable and MNAR/non-ignorable interchangeably.
2.2.2. Missing data methodology
MI can be applied to any type of missingness structure, including MNAR; a detailed review and implementation of MI can be found elsewhere [28,30]. When data are MNAR, a missingness model needs to be specified. In practice, an exact specification of such a model is difficult, if not impossible, as it relies on a set of unverifiable assumptions. Thus, the imputation model itself can be considered missing, and be multiply imputed together with the subject-level data using two-stage MI [19,20]. This approach incorporates the uncertainty associated with both the choice of the imputation model and the imputed subject-level data into the final inference, using simple arithmetic combination rules [18-20,31].
It is well known that while CCA generates unbiased estimates under MCAR, this is generally not the case under MAR [4]. Conventional MI, on the other hand, produces unbiased results under both MCAR and MAR [4], and is therefore usually recommended over CCA. Still, there are certain conditions under which CCA yields unbiased estimates under MAR and can therefore be safely used [32]. The advantage of conventional MI over CCA for NI trials assessing the difference between binomial proportions under MAR was previously shown in terms of unbiasedness and control of the type-I error [13]. The authors, however, did not evaluate cases in which CCA provides unbiased estimates of the treatment effect; we explore such conditions here. In addition, conventional MI may produce biased estimates under MNAR unless relevant auxiliary variables are included in the imputation model [33,34]. The inflation of the type-I error for NI trials under MNAR when analyzed with conventional MI was reported in [13]. To resolve this type-I error inflation, and the consequent treatment effect bias, for NI trials under MNAR, we propose using the two-stage MI procedure described in the following section.
2.3. Two-stage multiple imputation
If, for example, we knew that the event probability among the missing values is 10% greater than the event probability among the observed values, we could easily specify an imputation model to account for that. The imputation model can be based on a simple transformation of ignorably imputed values to non-ignorable ones using a multiplier k [17]. More specifically, we can adjust the ignorably imputed probability of an event ($p_i^{\ast}$) to a non-ignorably imputed probability of an event ($p_i^{\ast\ast}$) as follows: $p_i^{\ast\ast} = k \, p_i^{\ast}$. Unfortunately, in practice it is almost impossible to make a statement similar to the 10% example above, and consequently to set one particular value of k with absolute confidence. Therefore, following previous work by Siddique et al. [19,20], we suggest specifying a distribution for k, which serves as the imputation model distribution.
In practice, the imputation model distribution needs to be specified either by the study team or by the experts who collected the data. Such a distribution represents the study team's belief regarding the magnitude of the bias in the observed rate in treatment group i, and their confidence in this belief; these two quantities correspond to the center of the missingness model distribution ($\mu_i$) and its variance ($\sigma_i^2$), respectively. For example, if the team believes the study participants were more likely to drop out due to lack of efficacy in the new treatment, then the team will anticipate that the observed rate in the new treatment is greater than the actual rate. As a result, a $\mu_T$ below 1 will be chosen, so that the ignorably imputed rate is shifted closer to its true value. If, for the same study, the team believes the observed rate in the control treatment is unbiased, then $\mu_C = 1$ would represent that belief. As a result, there is a separate imputation model distribution for each treatment group: $k_C \sim N(\mu_C, \sigma_C^2)$ for control and $k_T \sim N(\mu_T, \sigma_T^2)$ for the new treatment. We chose the normality assumption for simplicity; other distributions can easily replace the normal.
After the imputation model distribution is specified, we can randomly draw M models from it. Within each of the imputed models, patient-level data can be imputed D times, resulting in $M \times D$ complete datasets. Each of these complete datasets is then analyzed using a standard statistical method, such as the methods presented in the previous sections. Results from the analyses are then combined using the nested imputation combination rules [18] described in the next section.
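The two stages can be sketched in a few lines. The Python toy below imputes missing binary outcomes for a single treatment group from its observed marginal event rate; this is a deliberately simplified stand-in for the MICE-based, covariate-driven imputation used in the paper, and all names are hypothetical.

```python
import random


def two_stage_impute(p_obs, n_missing, mu_k, sd_k, M=100, D=2, seed=0):
    """Toy two-stage MI for a binary outcome in one treatment group.

    Stage 1: draw M imputation models, i.e., multipliers k ~ N(mu_k, sd_k).
    Stage 2: within each model, impute the missing outcomes D times using
    the adjusted (non-ignorable) event probability k * p_obs.
    """
    rng = random.Random(seed)
    datasets = []  # M * D imputed vectors of 0/1 outcomes
    for _ in range(M):
        k = rng.gauss(mu_k, sd_k)               # stage-1 model draw
        p_star = min(max(k * p_obs, 0.0), 1.0)  # clip the rate to [0, 1]
        for _ in range(D):
            datasets.append([int(rng.random() < p_star)
                             for _ in range(n_missing)])
    return datasets
```

Each of the resulting M × D datasets would then be analyzed and the results pooled with the nested combination rules of the next section.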
2.4. Two-stage multiple imputation combination rules
To introduce the two-stage imputation rules, we use notation close to that of Siddique et al. [18,20]. Let Q be the quantity of interest, which approximately follows a normal distribution for completely observed data, i.e., $(Q - \hat{Q}) \sim N(0, U)$, where $\hat{Q}$ is a complete-data statistic estimating Q, and U is a complete-data statistic for the variance of $\hat{Q}$. The imputations mentioned above correspond to $M \times D$ completed datasets, where $\hat{Q}_{(m,d)}$ and $U_{(m,d)}$ represent the estimate and variance of Q, respectively, from the $d$th imputed dataset under model $m$ ($d = 1, \ldots, D$; $m = 1, \ldots, M$).

Let $\bar{Q}$ be the overall mean of the estimates, $\bar{Q} = \frac{1}{MD}\sum_{m=1}^{M}\sum_{d=1}^{D} \hat{Q}_{(m,d)}$, and let $\bar{Q}_m$ be the mean of the estimates from the $m$th model, $\bar{Q}_m = \frac{1}{D}\sum_{d=1}^{D} \hat{Q}_{(m,d)}$.

Also, let $\bar{U}$, W, and B be the three sources of variability, defined as the overall mean of the associated variance estimates, the within-model variance, and the between-model variance, respectively. Specifically:

$$\bar{U} = \frac{1}{MD}\sum_{m=1}^{M}\sum_{d=1}^{D} U_{(m,d)}, \qquad W = \frac{1}{M(D-1)}\sum_{m=1}^{M}\sum_{d=1}^{D}\left(\hat{Q}_{(m,d)} - \bar{Q}_m\right)^2, \qquad B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\bar{Q}_m - \bar{Q}\right)^2$$

Finally, the total variance of $\bar{Q}$ has the following form: $T = \bar{U} + \left(1 + \frac{1}{M}\right)B + \left(1 - \frac{1}{D}\right)W$. The final inferences for the multiply imputed data are based on the Student's t distribution, $(Q - \bar{Q})/\sqrt{T} \sim t_{\nu}$, where ν is the degrees of freedom, defined as:

$$\nu^{-1} = \frac{1}{M-1}\left[\frac{(1+M^{-1})B}{T}\right]^2 + \frac{1}{M(D-1)}\left[\frac{(1-D^{-1})W}{T}\right]^2 + \frac{1}{MD-1}\left[\frac{\bar{U}}{T}\right]^2$$
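The combination rules above reduce to simple arithmetic over the M × D point estimates and variances. A minimal Python illustration (not the authors' implementation):

```python
def combine_nested(Q, U):
    """Nested (two-stage) MI combination rules.

    Q[m][d], U[m][d]: point estimate and its variance from imputation d
    under model m.  Returns (pooled estimate, total variance, deg. freedom).
    """
    M, D = len(Q), len(Q[0])
    Qbar_m = [sum(Q[m]) / D for m in range(M)]       # per-model means
    Qbar = sum(Qbar_m) / M                           # overall mean
    Ubar = sum(U[m][d] for m in range(M) for d in range(D)) / (M * D)
    W = sum((Q[m][d] - Qbar_m[m]) ** 2               # within-model variance
            for m in range(M) for d in range(D)) / (M * (D - 1))
    B = sum((qm - Qbar) ** 2 for qm in Qbar_m) / (M - 1)  # between-model
    T = Ubar + (1 + 1 / M) * B + (1 - 1 / D) * W     # total variance
    inv_nu = (((1 + 1 / M) * B / T) ** 2 / (M - 1)
              + ((1 - 1 / D) * W / T) ** 2 / (M * (D - 1))
              + (Ubar / T) ** 2 / (M * D - 1))
    return Qbar, T, 1 / inv_nu
```

As a sanity check, when all imputed estimates coincide, W = B = 0, so the total variance collapses to the mean within-imputation variance.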
To implement the above procedure, we set $\hat{Q}_{(m,d)} = \hat{p}_{C(m,d)} - \hat{p}_{T(m,d)}$, the estimated difference in proportions between control and new treatment from the $d$th imputation under the $m$th model. For Wald and FM, the value of $U_{(m,d)}$ was set to the corresponding variance term used in the method, as presented under the square root in (2) and (3). For WN, the $\hat{p}_{i(m,d)}$ for each treatment group was plugged into (4)-(7).
3. Simulations
3.1. Simulation of fully observed data
In total, 30 NI clinical trial scenarios were considered. The $p_C$ values were set to the range between 0.6 and 0.95 in increments of 0.05, and the margin ($\delta$) values were set to 0.05, 0.075, 0.1, 0.15, and 0.2. All possible combinations of the above margins and probabilities were used, except that a margin greater than or equal to the corresponding failure rate $1 - p_C$, which would mean that the new treatment at least doubles the failure rate of the treated condition, was redefined as half of the original margin. Due to the high volume of the results, we present here only 9 of the 30 scenarios (unless stated otherwise), which are representative of the rest of the results. In addition, we assumed a one-sided type-I error of 2.5%, power of 90%, and a 1:1 group allocation ratio.
Since different methods for comparison of binomial proportions might require different sample sizes [35], sample sizes were calculated for each method separately using the above scenario assumptions. For the Wald and FM methods, the sample size calculations were performed by inversion of the corresponding CI formulas [35], while sample sizes for WN were estimated based on 5000 simulations. As a result, the sample size per arm ranged between 98 and 2017 patients.
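For the Wald method, inverting the CI formula under $H_1$ ($p_C = p_T = p$, 1:1 allocation) gives a closed-form per-arm sample size. The sketch below is an illustration under that assumption; its rounding may differ slightly from the paper's exact calculations.

```python
import math
from statistics import NormalDist


def wald_ni_sample_size(p, margin, alpha=0.025, power=0.90):
    """Per-arm sample size from inverting the Wald CI for an NI test,
    assuming p_C = p_T = p under H1 and 1:1 allocation."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # one-sided alpha
    z_b = NormalDist().inv_cdf(power)
    return math.ceil((z_a + z_b) ** 2 * 2 * p * (1 - p) / margin ** 2)
```

For example, the smallest margin (0.05) combined with $p = 0.6$ gives a per-arm size of roughly two thousand, consistent with the upper end of the range reported above.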
The outcome variable Y (subscripts are omitted for simplicity) was simulated for each subject using a logistic function of treatment ($TR = 0$ for the control treatment, $TR = 1$ for the new treatment) and two continuous baseline covariates ($X_1$, $X_2$) as follows:

$$\text{logit}\, P(Y = 1) = \beta_0 + \beta_1 TR + \beta_2 X_1 + \beta_3 X_2 \tag{8}$$
Further details regarding parameters setting in the above model are provided in the supplemental material. The total number of simulated trials per scenario and method under each hypothesis was set to 10,000 repetitions.
3.2. Simulation of incomplete data
Let $R_{ij}$ be a missing-indicator variable for outcome $Y_{ij}$, such that $R_{ij} = 1$ indicates that the outcome for patient j in group i is missing, while $R_{ij} = 0$ means that the outcome for that patient is observed. Upon generation of the complete datasets, the missing outcome values were imposed using the following logistic function (subscripts are omitted for simplicity):

$$\text{logit}\, P(R = 1) = \alpha + \beta_{TR}\, TR + \beta_{Y}\, Y + \beta_{TR \times Y}\, TR \cdot Y + \beta_{X}\, X_1 \tag{9}$$
Parameters $\beta_{TR}$, $\beta_{Y}$, $\beta_{TR \times Y}$, and $\beta_{X}$ represent the effects of treatment group, outcome, treatment-by-outcome interaction, and baseline covariate on missingness, respectively. In order to impose a specific missingness mechanism (MCAR, MAR, or MNAR), different parameter values were used. Overall drop-out rates of up to 20% were considered; the 20% rate was chosen as an upper bound because 86% of NI and equivalence trials with incomplete data reported drop-out rates of up to 20% [16].
For MCAR, all model parameters but α were set to 0, with $\alpha = \text{logit}(d)$, where $d$ is the target drop-out rate. For MAR, the covariate effect $\beta_X$ was set to a non-zero value, while $\beta_{TR}$ ranged between −0.9 and 0.9 in order to assess unbalanced drop-out rates of 5-15% between the treatment groups. MNAR was set up to implement scenarios in which dropping out of the study is associated either with lack of efficacy in the new treatment or with overwhelming efficacy in the control treatment; therefore, both $\beta_Y$ and $\beta_{TR \times Y}$ were set to non-zero values to implement: i) MNAR due to lack of efficacy in the new treatment; and ii) MNAR due to overwhelming efficacy in the control treatment. These two conditions were considered for MNAR, as both would cause the observed difference between the treatments to appear smaller than it actually is, leading to an incorrect study conclusion.
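A minimal Python sketch of imposing missingness through model (9); this is an illustration, and the parameter names are hypothetical stand-ins for the settings described above.

```python
import math
import random


def impose_missingness(rows, alpha, b_tr=0.0, b_y=0.0, b_txy=0.0, b_x=0.0,
                       seed=1):
    """Impose missingness on the outcome via the logistic model in (9).

    Each row is (tr, y, x1); returns the outcomes with None where missing.
    MCAR: all betas zero; MAR: b_x (and b_tr) non-zero; MNAR: b_y and
    b_txy non-zero.
    """
    rng = random.Random(seed)
    out = []
    for tr, y, x1 in rows:
        eta = alpha + b_tr * tr + b_y * y + b_txy * tr * y + b_x * x1
        p_miss = 1.0 / (1.0 + math.exp(-eta))  # P(R = 1), i.e. missing
        out.append(None if rng.random() < p_miss else y)
    return out
```

Under MCAR with $\alpha = \text{logit}(0.2)$, roughly 20% of outcomes are set to missing, matching the target drop-out rate.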
3.3. Analysis strategies for incomplete data
The following analysis strategies were used for the analysis of incomplete data: CCA, best-case scenario imputation, worst-case scenario imputation, and two-stage MI using multiple imputation by chained equations (MICE) [36].
Both best-case and worst-case scenario strategies were employed only for MCAR missingness mechanism. It was expected that these two strategies would inflate type-I errors, since they make the two treatment groups more alike, which in turn makes it easier to reject the null hypothesis specified in (1).
For MAR missingness, it was expected that the CCA strategy would lead to approximately unbiased estimates of $p_C$ and $p_T$ [32], because the baseline covariates were balanced and had a similar effect on the missingness in (9). Thus, it was expected that the CCA strategy would result in type-I errors that only slightly deviate from the desired level.
For MNAR missingness, it was expected that single-value imputation methods or CCA would produce biased results with inflated type-I error rates. Although conventional MI might produce unbiased estimates when relevant auxiliary variables are used [33,34], our simulation set-up did not address such a situation, and we therefore anticipated that conventional MI would not provide unbiased estimates under MNAR. In order to properly analyze incomplete data that follow such a missingness process, we used two-stage MI. Two-stage MI was compared to CCA rather than to conventional MI, both because CCA and conventional MI were expected to produce similarly biased estimates, and because CCA is an easy and dominant approach in clinical trials.
As specified in the previous section, two MNAR situations were simulated: drop-out due to lack of efficacy in the new treatment, and drop-out due to overwhelming efficacy in the control treatment. In the first situation, the observed rate in the new treatment group was expected to be higher than the actual rate, while the observed rate in the control group was expected to be unbiased; therefore, we specified $k_T \sim N(\mu_T, \sigma_T^2)$ with $\mu_T$ chosen below 1, and $\mu_C = 1$. In the second situation, the observed rate in the control group was expected to be lower than the actual rate, while the observed rate in the new treatment group was expected to be unbiased; therefore, we set $k_C \sim N(\mu_C, \sigma_C^2)$ with $\mu_C$ chosen above 1, and $\mu_T = 1$.
Similar to Siddique et al. [20], D was set to 2 and M was set to 100. The multiple imputation of the subject-level data within each imputed missingness model (i.e., each randomly drawn value of $k_i$) was performed using MICE with the two baseline covariates specified above. We also performed a sensitivity analysis for k by specifying different values of $\mu_i$ and by doubling the standard deviation.
3.4. Evaluation criteria
The performance of Wald, FM, and WN, along with the analysis strategies used to handle missing data, was assessed using the empirical type-I error, empirical power, and mean relative bias. The type-I error was estimated by the proportion of trials that rejected $H_0$ in (1) out of the trials simulated under $H_0$, and was considered appropriately controlled if it fell within the pre-specified bounds [37,38]. Power was estimated by the proportion of trials that rejected $H_0$ in (1) out of the trials simulated under $H_1$. The relative bias per repetition was defined under $H_0$ as the deviation of the estimated difference from the true difference, relative to the true difference. A result was considered unbiased if the mean relative bias fell within the pre-specified bounds. A negative bias implies that the new treatment (T) is worse than it appears, and thus non-inferiority of the new treatment may be incorrectly inferred.
The simulations were run using the R-package nibinom we developed. The package with additional code to reproduce the presented results are available here: https://github.com/yuliasidi/nibinom_apply.
4. Results
4.1. Missing completely at random
For MCAR, we present results for an overall drop-out rate of 20%, as these are representative of the results for lower drop-out rates. Also, since the three methods showed very similar results, only the Wald method is presented for MCAR. As can be seen in Table 1, the worst-case scenario imputation strategy produced inflated type-I error rates, more than double those of the completely observed data, along with significantly biased estimates. In contrast, CCA produced unbiased estimates, with type-I errors either within the pre-specified range or very close to it.
Table 1.
Empirical type-I errors and mean relative bias for MCAR, DO = 20%, worst-case imputation scenario and CCA strategies, Wald method.
| $p_C$ | $\delta$ | Type-I: Full | Type-I: Worst | Type-I: CCA | Bias: Worst | Bias: CCA |
|---|---|---|---|---|---|---|
| 0.65 | 0.05 | 0.026 | 0.103 | 0.029 | −0.214 | −0.019 |
| 0.65 | 0.10 | 0.027 | 0.093 | 0.028 | −0.210 | −0.016 |
| 0.65 | 0.15 | 0.025 | 0.090 | 0.026 | −0.211 | −0.015 |
| 0.75 | 0.05 | 0.025 | 0.079 | 0.026 | −0.201 | −0.002 |
| 0.75 | 0.10 | 0.026 | 0.087 | 0.029 | −0.205 | −0.004 |
| 0.75 | 0.15 | 0.023 | 0.084 | 0.025 | −0.209 | −0.009 |
| 0.80 | 0.15 | 0.024 | 0.074 | 0.026 | −0.212 | −0.011 |
| 0.85 | 0.05 | 0.023 | 0.066 | 0.024 | −0.194 | 0.008 |
| 0.85 | 0.10 | 0.028 | 0.067 | 0.026 | −0.198 | 0.003 |
Due to the significant inflation of the type-I error for the worst-case imputation method, empirical power was calculated for the CCA strategy only. As expected, power decreased with higher drop-out rates, dropping to 81.5% (see supplemental materials). Results for best-case scenario imputation were very similar to those for the worst-case scenario and are therefore omitted.
4.2. Missing at random
Empirical type-I errors under the MAR assumption, analyzed using CCA with balanced drop-out rates, were well controlled by all three methods in most scenarios (Fig. 1). In addition, this strategy resulted in unbiased estimates, while the empirical power went down to 81.7% (see supplemental materials). For unbalanced drop-out rates, as expected, CCA showed slight deviations from the desired type-I error level. The largest empirical type-I error was 0.0419, for an overall drop-out rate of 20% when the drop-out rates between the treatment groups differed by 15% (see supplemental materials). Nevertheless, the mean relative bias fell within the specified bounds for all scenarios, methods, and drop-out rates (results not presented).
Fig. 1.
Empirical type-I error, CCA strategy for MAR: drop-out rates are balanced between the treatment groups.
4.3. Missing not at random
Empirical type-I error rates for incomplete data under MNAR due to lack of efficacy in the new treatment were seriously inflated when analyzed using CCA (Fig. 2). This was not the case for two-stage MI, which produced type-I errors either within the specified bounds or very close to them (Fig. 3). In addition, for two-stage MI, the WN method showed less favorable results than Wald and FM. The advantage of two-stage MI over CCA is also demonstrated by the mean relative bias, with CCA resulting in a mean relative bias as large as −0.897 for a drop-out rate of 20% when using the Wald method (Table 2). The corresponding mean relative bias results for the other two methods were similar to Wald and are therefore omitted. Furthermore, while the mean relative bias was of a smaller magnitude for lower drop-out rates, CCA still produced biased estimates in most cases, whereas two-stage MI produced unbiased estimates (results not shown). The empirical power based on two-stage MI was below the desired level of 0.9, with the lowest rate being 65.8% for an overall drop-out rate of 20%; this is not surprising, given the variability introduced through the MI procedure (see supplemental materials). Results for MNAR due to overwhelming efficacy in the control treatment were similar, in terms of type-I errors, bias, and power, to those for MNAR due to lack of efficacy in the new treatment (see supplemental material).
Fig. 2.
Empirical type-I errors, CCA strategy for MNAR due to lack of efficacy in T.
Fig. 3.
Empirical type-I errors, two-stage MI strategy via MICE for MNAR due to lack of efficacy in T.
Table 2.
Mean relative bias for MNAR due to lack of efficacy in T, DO = 20%, CCA and two-stage MI strategies, Wald method.
| $p_C$ | $\delta$ | CCA | MI |
|---|---|---|---|
| 0.65 | 0.05 | −0.897 | −0.032 |
| 0.65 | 0.10 | −0.453 | 0.015 |
| 0.65 | 0.15 | −0.300 | 0.022 |
| 0.75 | 0.05 | −0.852 | −0.038 |
| 0.75 | 0.10 | −0.458 | 0.000 |
| 0.75 | 0.15 | −0.319 | 0.055 |
| 0.80 | 0.15 | −0.328 | 0.019 |
| 0.85 | 0.05 | −0.709 | −0.068 |
| 0.85 | 0.10 | −0.422 | −0.001 |
In Fig. 4, we present a sensitivity analysis for the choice of the distribution of imputation models specified by the multiplier k. Although the type-I error rates are affected by the choice of the imputation model distribution, in all cases they are much smaller than the rate observed for the CCA strategy (solid black horizontal line).
Fig. 4.
Choice of different distribution parameters for k: empirical type-I error, two-stage MI strategy via MICE, for MNAR due to lack of efficacy in T.
5. Conclusion
Our work presents a thorough simulation study that assesses different strategies for analysis of incomplete data, when NI design is employed and the outcome of interest is a difference between binomial proportions. We evaluated three commonly used methods for construction of confidence intervals for the difference between binomial proportions: Wald, WN and FM.
We found that both best-case and worst-case imputation strategies perform poorly even when the incomplete data follow MCAR. This is because treating incomplete cases identically in both treatment groups makes the estimated proportions more similar, which leads to an erroneous conclusion of NI. According to Rabe et al. [16], 28% of the reviewed articles that encountered some amount of incomplete data in the primary analysis used a single imputation strategy, including best/worst-case imputation. The simulation results we present here, together with the review results reported by Rabe et al. [16], are concerning. We believe that such imputation strategies should be abandoned when dealing with an NI analysis.
Similar to previous work by Bartlett et al. [32], we found that CCA performs well when incomplete data follow MAR and both the baseline covariates that affect the missingness and the corresponding drop-out rates are balanced between treatment arms. In addition, when drop-out rates were higher in the new treatment group, type-I errors could be inflated, depending on the scenario. Among the cases with unbalanced drop-out rates, the highest type-I error rate observed was 0.0419, for an overall drop-out rate of 20% with 15% higher drop-out in the new treatment. Considering the levels of inflation seen for MNAR, and the fact that the 0.0419 rate was reached under a relatively extreme missingness scenario, we believe that CCA can still be considered a safe choice for MAR incomplete data. It should be noted that if researchers assume that MAR is driven by variables that differ between the treatment groups, then the conventional MI strategy is recommended over CCA, as suggested in [13]. The importance of the MAR findings presented here is to demonstrate when CCA can be used, and what assumptions must hold for the inference to be valid.
Importantly, we demonstrated that while CCA performs poorly for incomplete data under MNAR, as does conventional MI [13], the two-stage MI strategy produces favorable results. We believe these results are of great importance for practitioners who encounter incomplete data in NI clinical trials. The limitation of this method is the need to specify the distribution of the multiple imputation model, i.e., the multiplier. Nevertheless, according to the sensitivity analysis we performed, even when the parameters of the multiplier's distribution are shifted, the type-I error rates remain substantially lower than those seen with the CCA strategy.
The results for the empirical power were in line with our expectations: in general, empirical power decreased with increasing drop-out rates. In terms of differences among the analysis methods considered here, in most cases there was no difference among the three. However, when the two-stage MI procedure was used, WN performed less favorably than Wald and FM. This could be explained by the fact that we used a plug-in method for WN rather than proper MI combination rules. A method for properly combining multiply imputed data when analyzing a difference between proportions with WN is not yet available and is of interest for future research.
Although we looked at a variety of scenarios, one limitation of our work is that it does not cover every possible scenario. Therefore, before finalizing the statistical analysis plan for an NI trial, researchers should always consider the specific scenario they are dealing with. Another limitation is that we considered moderate to large sample sizes. We did not evaluate small sample sizes, which might require exact methods, such as the method of Chan [39], and thus might have different implications when applying an MI strategy. In addition, only a simple transformation of ignorable to non-ignorable imputed values was evaluated for the MNAR analysis. A more thorough examination of various functional forms of such a transformation is of interest for future research.
In summary, we recommend the following analysis strategies when dealing with incomplete data in NI trials assessing the difference between binomial proportions: 1) if the incomplete data follow MAR and it is reasonable to assume that the missingness is caused by balanced baseline covariates only, then CCA can be used; 2) if the data are MAR but the missingness is caused by other, unbalanced variables, then, following the work of Lipkovich and Wiens [13], conventional MI should be used; 3) if MNAR is a more reasonable assumption, then two-stage MI should be used; 4) best-case and worst-case imputation should be avoided.
We believe that the above recommendations are useful for practitioners who face incomplete data analysis in NI trials that assess the difference between binomial proportions.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.conctc.2020.100567.
References
- 1. FDA. Non-inferiority Clinical Trials to Establish Effectiveness; Guidance for Industry. 2016.
- 2. Piaggio G., Elbourne D.R., Pocock S.J., Evans S.J., Altman D.G., CONSORT Group. Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA. 2012;308(24):2594–2604. doi: 10.1001/jama.2012.87802.
- 3. ICH. Choice of Control Group and Related Issues in Clinical Trials E10. 2000.
- 4. Little R.J., Rubin D.B. Statistical Analysis with Missing Data. Vol. 333. John Wiley & Sons; 2014.
- 5. Collins L.M., Schafer J.L., Kam C.-M. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol. Methods. 2001;6(4):330.
- 6. Dziura J.D., Post L.A., Zhao Q., Fu Z., Peduzzi P. Strategies for dealing with missing data in clinical trials: from design to analysis. Yale J. Biol. Med. 2013;86(3):343.
- 7. Fleming T.R. Addressing missing data in clinical trials. Ann. Intern. Med. 2011;154(2):113–117. doi: 10.1059/0003-4819-154-2-201101180-00010.
- 8. Little R.J., D'Agostino R., Cohen M.L., Dickersin K., Emerson S.S., Farrar J.T., Frangakis C., Hogan J.W., Molenberghs G., Murphy S.A. The prevention and treatment of missing data in clinical trials. N. Engl. J. Med. 2012;367(14):1355–1360. doi: 10.1056/NEJMsr1203730.
- 9. NRC. The Prevention and Treatment of Missing Data in Clinical Trials. National Academies Press; 2011.
- 10. Fleming T.R. Current issues in non-inferiority trials. Stat. Med. 2008;27(3):317–332. doi: 10.1002/sim.2855.
- 11. Gallo P., Chuang-Stein C. A note on missing data in noninferiority trials. Drug Inf. J. 2009;43(4):469–474.
- 12. Wiens B.L., Zhao W. The role of intention to treat in analysis of noninferiority studies. Clin. Trials. 2007;4(3):286–291. doi: 10.1177/1740774507079443.
- 13. Lipkovich I., Wiens B.L. The role of multiple imputation in non-inferiority trials for binary outcomes. Stat. Biopharm. Res. 2017;10(1):57–69.
- 14. Yoo B. Impact of missing data on type 1 error rates in non-inferiority trials. Pharmaceut. Stat. 2010;9(2):87–99. doi: 10.1002/pst.378.
- 15. Rehal S., Morris T.P., Fielding K., Carpenter J.R., Phillips P.P. Non-inferiority trials: are they inferior? A systematic review of reporting in major medical journals. BMJ Open. 2016;6(10). doi: 10.1136/bmjopen-2016-012594.
- 16. ICH. Estimands and Sensitivity Analysis in Clinical Trials. 2017.
- 17. Rubin D.B. Multiple Imputation for Nonresponse in Surveys. Vol. 81. John Wiley & Sons; 2004.
- 18. Shen Z. Nested Multiple Imputations. Ph.D. thesis. Harvard University; 2000.
- 19. Siddique J., Harel O., Crespi C.M. Addressing missing data mechanism uncertainty using multiple-model multiple imputation: application to a longitudinal clinical trial. Ann. Appl. Stat. 2012;6(4):1814. doi: 10.1214/12-AOAS555.
- 20. Siddique J., Harel O., Crespi C.M., Hedeker D. Binary variable multiple-model multiple imputation to address missing data mechanism uncertainty: application to a smoking cessation trial. Stat. Med. 2014;33(17):3013–3028. doi: 10.1002/sim.6137.
- 21. Wald A. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Am. Math. Soc. 1943;54(3):426–482.
- 22. Rubin D.B. Inference and missing data. Biometrika. 1976;63(3):581–592.
- 23. Rabe B.A., Day S., Fiero M.H., Bell M.L. Missing data handling in non-inferiority and equivalence trials: a systematic review. Pharmaceut. Stat. 2018.
- 24. Rothmann M.D., Wiens B.L., Chan I.S. Design and Analysis of Non-inferiority Trials. Chapman and Hall/CRC; 2016.
- 25. Farrington C.P., Manning G. Test statistics and sample size formulae for comparative binomial trials with null hypothesis of non-zero risk difference or non-unity relative risk. Stat. Med. 1990;9(12):1447–1454. doi: 10.1002/sim.4780091208.
- 26. Newcombe R.G. Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat. Med. 1998;17(8):873–890. doi: 10.1002/(sici)1097-0258(19980430)17:8<873::aid-sim779>3.0.co;2-i.
- 27. Wilson E.B. Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. 1927;22(158):209–212.
- 28. Harel O., Zhou X.-H. Multiple imputation: review of theory, implementation and software. Stat. Med. 2007;26(16):3057–3077. doi: 10.1002/sim.2787.
- 29. Yucel R. Impact of the non-distinctness and non-ignorability on the inference by multiple imputation in multivariate multilevel data: a simulation assessment. J. Stat. Comput. Simulat. 2017;87(9):1813–1826.
- 30. Schafer J.L. Multiple imputation: a primer. Stat. Methods Med. Res. 1999;8(1):3–15. doi: 10.1177/096228029900800102.
- 31. Reiter J.P., Raghunathan T.E. The multiple adaptations of multiple imputation. J. Am. Stat. Assoc. 2007;102(480):1462–1471.
- 32. Bartlett J.W., Harel O., Carpenter J.R. Asymptotically unbiased estimation of exposure odds ratios in complete records logistic regression. Am. J. Epidemiol. 2015;182(8):730–736. doi: 10.1093/aje/kwv114.
- 33. CHMP. Guideline on Missing Data in Confirmatory Clinical Trials. European Medicines Agency; London: 2010.
- 34. Demirtas H., Schafer J.L. On the performance of random-coefficient pattern-mixture models for non-ignorable drop-out. Stat. Med. 2003;22(16):2553–2575. doi: 10.1002/sim.1475.
- 35. Julious S.A., Owen R.J. A comparison of methods for sample size estimation for non-inferiority studies with binary outcomes. Stat. Methods Med. Res. 2011;20(6):595–612. doi: 10.1177/0962280210378945.
- 36. van Buuren S., Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R. J. Stat. Software. 2010:1–68.
- 37. Dann R.S., Koch G.G. Methods for one-sided testing of the difference between proportions and sample size considerations related to non-inferiority clinical trials. Pharmaceut. Stat. 2008;7(2):130–141. doi: 10.1002/pst.287.
- 38. Roebruck P., Kühn A. Comparison of tests and sample size formulae for proving therapeutic equivalence based on the difference of binomial probabilities. Stat. Med. 1995;14(14):1583–1594. doi: 10.1002/sim.4780141409.
- 39. Chan I.S. Exact tests of equivalence and efficacy with a non-zero lower bound for comparative studies. Stat. Med. 1998;17(12):1403–1413. doi: 10.1002/(sici)1097-0258(19980630)17:12<1403::aid-sim834>3.0.co;2-y.