Modified reference based imputation and tipping point analysis in the presence of missing data due to COVID-19

Man Jin; Ran Liu; Weining Robieson

doi:10.1016/j.cct.2021.106575

2021 Sep 28;110:106575. doi: 10.1016/j.cct.2021.106575

PMCID: PMC8479366 PMID: 34597836

Abstract

In longitudinal clinical trials, missing data are inevitable due to intercurrent events (ICEs) such as treatment interruption or premature discontinuation for different reasons. The COVID-19 pandemic has had a substantial impact on clinical trials since early 2020, as it may result in missing data due to missed visits and premature discontinuations. The missing data due to COVID-19 can reasonably be assumed to be missing at random (MAR).

We propose a combined hypothetical strategy for sensitivity analyses to handle missing data due to both COVID-19 and non-COVID reasons. We modify the commonly used missing not at random (MNAR) methods, reference based imputation (RBI) and tipping point analysis, under this strategy. We propose the standard multiple imputation approach and derive an analytic likelihood based approach to implement the proposed methods to improve efficiency in applications. The proposed strategy and methods are applicable to a more general scenario when there are missing data due to both MAR and MNAR reasons.

Keywords: COVID-19, Missing data, Reference based imputation, Tipping point analysis, Intercurrent event

1. Introduction

The ICH E9 (R1) addendum requires a clearly defined estimand that describes the quantity to be estimated to address a specific study objective for confirmatory clinical trials. One of the attributes in the construction of the estimand is to account for intercurrent events (ICEs) that may affect the estimand and the interpretation of the trial results. Different strategies for handling ICEs when defining the estimand may provide different information on the trial data; thus, the definition of the estimand should be aligned with the clinical interests of the trial [1], [2].

The COVID-19 pandemic has had substantial impact on planned and ongoing clinical trials since early 2020. Its impact on trial data may cause potential statistical issues and may be directly or indirectly relevant to the estimand and the interpretation of the analysis results [3]. FDA has issued a guidance and EMA has issued a “points to consider” document for statistical considerations in clinical trials conducted during the COVID-19 pandemic [4], [5]. Missing data may result from ICEs associated with COVID-19 due to missed visits and premature discontinuations. For example, missing data may result from interruptions or discontinuation due to COVID infection, quarantine, or hospitalization, and thus the missingness is not associated with the unobserved measurements. Therefore, missing data due to COVID-19 can reasonably be assumed to be missing at random (MAR). As a result, missing data due to COVID-19 can be analyzed by commonly used approaches under MAR, such as mixed model for repeated measures (MMRM) or multiple imputations (MI) [6]. However, the MAR assumption may still be difficult to justify for missing data due to non-COVID related reasons such as lack of efficacy or safety issues. Methods for missing not at random (MNAR) may be more appropriate to handle missing data due to non-COVID reasons for sensitivity analyses [7], [8]. To take the MAR missingness into account, sensitivity analyses conducted during COVID-19 pandemic should incorporate missing data handling due to both COVID-19 and non-COVID reasons. We propose a combined hypothetical strategy to handle ICEs associated with COVID-19 interruptions and non-COVID dropouts to define estimands for clinical trials conducted during COVID-19 pandemic. Under the combined hypothetical strategy, missing data due to COVID-19 reasons are handled by MAR methods and missing data due to non-COVID reasons are handled by MNAR methods. 
Because MNAR methods are usually more conservative than MAR methods, the combination of MAR and MNAR methods is expected to be more powerful than pure MNAR methods but less powerful than pure MAR methods. The power gain over pure MNAR methods depends on the proportion of missing data due to MAR reasons such as COVID-19.

Reference-based imputation (RBI) and tipping point analysis are commonly used to handle MNAR missingness [9], [10]. However, these methods only use MNAR assumptions and may not represent the reasonable missing mechanism for missingness due to COVID-19. In this paper, we propose modified RBI methods and tipping point analysis method under the combined hypothetical strategy to handle missing data incorporating missingness due to COVID-19 interruptions and dropouts. Since missingness due to COVID-19 is a special case of MAR missingness, the proposed framework can be generalized to other scenarios of combined MAR and MNAR missingness mechanisms.

In practice, there are two types of missingness: intermittent missingness and monotone missingness. Intermittent missingness is often assumed to be MAR and is then handled by MAR methods [6]. With this consideration, we focus our theoretical development and investigation on monotone missingness throughout the paper.

The remainder of the manuscript is organized as follows. In Section 2, we propose modified RBI methods to handle missing data incorporating missingness due to COVID-19 or other MAR reasons, and propose MI and likelihood based analytic approaches for statistical inference; in Section 3, we propose a modified tipping point analysis method to handle missing data incorporating missingness due to COVID-19 or other MAR reasons, and propose MI and likelihood based analytic approaches for statistical inference; in Section 4, we compare the general and modified J2R methods in a simulation study; in Section 5, the proposed methods are applied to a data example for illustration; and in Section 6, we conclude this paper with some discussion.

2. Modified RBI to incorporate missingness due to COVID-19

In this manuscript, we refer to two intercurrent events, dropouts due to COVID-19 and dropouts due to other reasons, resulting in missing data due to COVID-19 and non-COVID reasons. Reference based imputation methods have been used for analysis of missing data under MNAR [9]. The RBI methods use different imputation models for missing data in the two treatment groups: the missing data in the reference or control group are imputed under the MAR assumption, whereas the missing data in the treatment group are imputed based on the assumption that a patient who drops out of the treatment group will have a response profile similar to that of the patients in the reference group.

The three most commonly used RBI methods are described below in the order from the most to the least conservative:

• Jump to reference (J2R): it is assumed that the mean effect profile of the patients who discontinue the active treatment will immediately jump to that of patients in the reference group after discontinuation. Therefore, the missing data will have a distribution after dropout equal to that of the control group.
• Copy reference (CR): it is assumed that the missing data in the active treatment group will have the conditional distribution after dropout equal to that in the reference group. This approach is conservative, but not as conservative as J2R since it still allows the treatment benefit received prior to dropout to be carried over to the post-dropout visits by using the prior observed values in the active treatment group as predictors. As a result, the mean effect profile of the patients in the active treatment group will gradually transition to that of patients in the reference group.
• Copy increments in reference (CIR): it is assumed that the mean effect profile of the patients who discontinue the active treatment will have an increment profile equal to that of the reference group for visits after dropout.

RBI is a type of pattern mixture model (PMM) and imputes missing data under missing patterns [11], [12]. Let us consider a longitudinal trial with the endpoints measured at visits 0, 1, 2, …, K, where 0 is the baseline visit. For patient i, the change from baseline at each visit is denoted by the vector $Y_{i} = (Y_{i1}, \dots, Y_{iK})'$. Assume the sample size is $n^{Z}$ in group Z, where Z = T for the treatment group or C for the reference or control group.

Under the proposed combined hypothetical strategy, we propose modified RBI methods to impute missing data due to COVID-19 with MAR methods and missing data due to non-COVID reasons with MNAR methods. To facilitate our proposed imputation strategy, we need to consider missing patterns different from those defined in conventional RBI methods. In the presence of missing data due to COVID-19, there may be missingness due to COVID-19 and non-COVID reasons. We define the corresponding missing patterns due to COVID-19 and non-COVID reasons as follows:

• Missing pattern due to COVID-19: the jth missing pattern due to COVID-19 is for patients who have measures up to Visit j and drop out at Visit j + 1 due to COVID-19, j = 0, …, K − 1. Denote by $π_{j, cvd}^{Z}$ the probability of patients in missing pattern j due to COVID-19 in group Z = T, C; then $π_{cvd}^{Z} = \sum_{j = 0}^{K - 1} π_{j, cvd}^{Z}$ is the probability of patients who drop out due to COVID-19 in group Z.
• Missing pattern due to non-COVID reasons: the jth missing pattern due to non-COVID reasons is for patients who have measures up to Visit j and drop out at Visit j + 1 due to non-COVID reasons, j = 0, …, K − 1. Denote by $π_{j, ncvd}^{Z}$ the probability of patients in missing pattern j due to non-COVID reasons in group Z = T, C; then $π_{ncvd}^{Z} = \sum_{j = 0}^{K - 1} π_{j, ncvd}^{Z}$ is the probability of patients who drop out due to non-COVID reasons in group Z.
• Denote by $π_{K}^{Z}$ the probability of completers in group Z = T, C. Considering all patients who drop out due to COVID-19, who drop out for non-COVID reasons, and who complete the assigned therapy,
$\sum_{j = 0}^{K - 1} π_{j, cvd}^{Z} + \sum_{j = 0}^{K - 1} π_{j, ncvd}^{Z} + π_{K}^{Z} = π_{cvd}^{Z} + π_{ncvd}^{Z} + π_{K}^{Z} = 1 .$

In the presence of missing data due to COVID-19, there are naturally two strategies for our proposed missing patterns: one is the general RBI for all missing data, and the other is our proposed modified RBI, which uses MAR analysis for missing data due to COVID-19 and conventional RBI for missing data due to non-COVID reasons. For the two strategies, the corresponding probabilities of missing patterns in group Z = T, C are denoted as follows, using the quantities defined above:

\begin{aligned} p^{Z} &= (p_{0}^{Z}, p_{1}^{Z}, \dots, p_{K}^{Z})' = (π_{0, cvd}^{Z} + π_{0, ncvd}^{Z}, π_{1, cvd}^{Z} + π_{1, ncvd}^{Z}, \dots, π_{K - 1, cvd}^{Z} + π_{K - 1, ncvd}^{Z}, π_{K}^{Z})', \\ q^{Z} &= (q_{0}^{Z}, q_{1}^{Z}, \dots, q_{K}^{Z})' = (π_{0, ncvd}^{Z}, π_{1, ncvd}^{Z}, \dots, π_{K - 1, ncvd}^{Z}, π_{K}^{Z} + π_{cvd}^{Z})' . \end{aligned}

(1)
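As a minimal sketch of Eq. (1), the following Python function builds the two probability vectors for one group from the pattern probabilities. The function name and the numeric values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

def pattern_probabilities(pi_cvd, pi_ncvd, pi_complete):
    """Build the vectors p and q of Eq. (1) for one group Z.

    pi_cvd[j], pi_ncvd[j]: dropout probabilities for pattern j = 0..K-1
    pi_complete: probability of completers (pi_K)
    """
    pi_cvd = np.asarray(pi_cvd, dtype=float)
    pi_ncvd = np.asarray(pi_ncvd, dtype=float)
    total = pi_cvd.sum() + pi_ncvd.sum() + pi_complete
    if not np.isclose(total, 1.0):
        raise ValueError("pattern probabilities must sum to 1")
    # General RBI: every dropout pattern counts, whatever the reason.
    p = np.append(pi_cvd + pi_ncvd, pi_complete)
    # Modified RBI: COVID-19 dropouts are handled under MAR, so their
    # probability mass is pooled with the completers in the last entry.
    q = np.append(pi_ncvd, pi_complete + pi_cvd.sum())
    return p, q

# Hypothetical example with K = 3 post-baseline visits.
p, q = pattern_probabilities([0.02, 0.02, 0.01], [0.05, 0.05, 0.05], 0.80)
```

Both vectors sum to 1; only the allocation of the COVID-19 dropout mass differs.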

Denote by $(μ_{0}^{Z}, μ_{1}^{Z}, \dots, μ_{K}^{Z})'$ the vector of true means of change from baseline at Visit j = 0 (baseline), 1, …, K, where Z = T or C. Without loss of generality, it is assumed that a larger value favors the treatment effect. Assume $μ_{0}^{Z} = 0$ for the baseline mean effect. For any pattern j, denote the measures by $(Y_{o, j}^{Z}, Y_{m, j}^{Z})$, where $Y_{o, j}^{Z}$ is the vector of the observed measures up to Visit j, $(Y_{0}^{Z}, Y_{1}^{Z}, \dots, Y_{j}^{Z})'$, and $Y_{m, j}^{Z}$ is the vector of the missing measures, $(Y_{j + 1}^{Z}, Y_{j + 2}^{Z}, \dots, Y_{K}^{Z})'$. The covariance matrix can be partitioned into blocks with dimensions corresponding to $Y_{o, j}^{Z}$ and $Y_{m, j}^{Z}$ ($Σ_{oo, j}$ is the covariance matrix corresponding to $Y_{o, j}^{Z}$ and $Σ_{mm, j}$ to $Y_{m, j}^{Z}$):

\begin{pmatrix} Σ_{oo, j} & Σ_{om, j} \\ Σ_{mo, j} & Σ_{mm, j} \end{pmatrix} .

For missing pattern j > 0, denote $μ_{j}^{Z} = (μ_{0}^{Z}, μ_{1}^{Z}, \dots, μ_{j}^{Z})'$ the vector of true means of the observed measures up to Visit j. The missing data for the treatment group will use the imputation model built from the mean profile in the reference group as specified by the specific assumption of RBI, and the imputed mean vector for missing data using RBI at post-dropout Visit k, j < k ≤ K becomes:

μ_{j, imp - RBI}^{k} = \begin{cases} μ_{k}^{C}, & \text{J2R} \\ μ_{k}^{C} + {[Σ_{mo, j} Σ_{oo, j}^{- 1} (μ_{j}^{T} - μ_{j}^{C})]}_{k - j}, & \text{CR} \\ μ_{k}^{C} + μ_{j}^{T} - μ_{j}^{C}, & \text{CIR}, \end{cases}

(2)

where $[Σ_{mo, j} Σ_{oo, j}^{- 1} (μ_{j}^{T} - μ_{j}^{C})]$ is a vector with dimension (K − j) × 1 and ${[Σ_{mo, j} Σ_{oo, j}^{- 1} (μ_{j}^{T} - μ_{j}^{C})]}_{k - j}$ is the (k − j)th element of the vector.

For missing pattern j = 0, $μ_{j, imp - RBI}^{k} = μ_{k}^{C}$ because there are no post-baseline effects.
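The imputed means in Eq. (2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SAS implementation: the arrays are indexed by post-baseline visits 1, …, K only (the baseline change is identically 0 and is left out so that the observed block of the covariance matrix is invertible), and the function name and all numeric values are hypothetical.

```python
import numpy as np

def rbi_imputed_means(mu_t, mu_c, sigma, j, method):
    """Imputed means at visits j+1..K for a treatment-arm dropout in pattern j.

    mu_t, mu_c: mean change from baseline at post-baseline visits 1..K
    sigma: K x K covariance matrix of visits 1..K
    j: last visit with an observed value (0 = no post-baseline data)
    method: "J2R", "CR" or "CIR"
    """
    if method not in ("J2R", "CR", "CIR"):
        raise ValueError("method must be 'J2R', 'CR' or 'CIR'")
    mu_t = np.asarray(mu_t, dtype=float)
    mu_c = np.asarray(mu_c, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    ref_future = mu_c[j:]                    # reference means at visits j+1..K
    if j == 0 or method == "J2R":
        return ref_future.copy()             # jump straight to the reference
    diff_obs = mu_t[:j] - mu_c[:j]           # arm difference at visits 1..j
    if method == "CR":
        # Carry-over of the observed difference through the regression
        # of the missing visits on the observed visits.
        carry = sigma[j:, :j] @ np.linalg.solve(sigma[:j, :j], diff_obs)
        return ref_future + carry
    # CIR: keep the difference reached at the last observed visit.
    return ref_future + diff_obs[-1]

# Hypothetical inputs: K = 3, unit variances, correlation 0.5.
mu_t = [1.0, 2.0, 3.0]
mu_c = [0.5, 1.0, 1.5]
sigma = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)
j2r = rbi_imputed_means(mu_t, mu_c, sigma, j=1, method="J2R")
cr = rbi_imputed_means(mu_t, mu_c, sigma, j=1, method="CR")
cir = rbi_imputed_means(mu_t, mu_c, sigma, j=1, method="CIR")
```

In this toy example the imputed means increase from J2R to CR to CIR, matching the ordering from most to least conservative described above.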

By the general RBI methods, the true treatment difference between the treatment groups at Visit K can be derived by averaging the mean difference across all missing patterns [15], [16]:

β_{K}^{{RBI}_{g}} = \begin{cases} p_{K}^{T} (μ_{K}^{T} - μ_{K}^{C}), & \text{J2R} \\ p_{K}^{T} (μ_{K}^{T} - μ_{K}^{C}) + \sum_{j = 1}^{K - 1} p_{j}^{T} {[Σ_{mo, j} Σ_{oo, j}^{- 1} (μ_{j}^{T} - μ_{j}^{C})]}_{K}, & \text{CR} \\ \sum_{j = 1}^{K} p_{j}^{T} (μ_{j}^{T} - μ_{j}^{C}), & \text{CIR} . \end{cases}

(3)

Using the proposed modified RBI methods with the combined imputation strategy, for missing pattern j, the treatment difference using MAR imputation at Visit K becomes:

β_{j, imp - MAR}^{K} = μ_{K}^{T} - μ_{K}^{C},

and the treatment difference using RBI at Visit K is given by Eq. (3).

Therefore, we can derive the treatment difference at Visit K using the modified RBI methods which can incorporate missingness due to COVID-19 or other MAR reasons as follows:

β_{K}^{{RBI}_{m}} = \begin{cases} q_{K}^{T} (μ_{K}^{T} - μ_{K}^{C}), & \text{J2R} \\ q_{K}^{T} (μ_{K}^{T} - μ_{K}^{C}) + \sum_{j = 1}^{K - 1} q_{j}^{T} {[Σ_{mo, j} Σ_{oo, j}^{- 1} (μ_{j}^{T} - μ_{j}^{C})]}_{K}, & \text{CR} \\ \sum_{j = 1}^{K} q_{j}^{T} (μ_{j}^{T} - μ_{j}^{C}), & \text{CIR} . \end{cases}

(4)

From the probabilities of missing patterns with or without consideration of missingness due to COVID-19 in Eq. (1), it is straightforward that

p_{j}^{T} \geq q_{j}^{T}, j = 1, \dots, K - 1,

and

p_{K}^{T} \leq q_{K}^{T} .

Therefore, the true treatment difference at Visit K under the modified RBI is less conservative than the general RBI:

β_{K}^{{RBI}_{m}} \geq β_{K}^{{RBI}_{g}} .
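The comparison between Eqs. (3) and (4) can be checked numerically with a short Python sketch. It assumes post-baseline indexing (visits 1, …, K), reads the subscript K in the CR term as the Visit K component of the carried-over vector, and uses hypothetical inputs in which the treatment effect is positive and increasing over visits; none of the numbers come from the paper.

```python
import numpy as np

def rbi_difference(weights, mu_t, mu_c, sigma, method):
    """Treatment difference at Visit K under Eq. (3) or Eq. (4).

    weights: treatment-arm pattern probabilities for j = 0..K
             (p^T for the general RBI, q^T for the modified RBI)
    mu_t, mu_c, sigma: means and covariance for post-baseline visits 1..K
    """
    if method not in ("J2R", "CR", "CIR"):
        raise ValueError("method must be 'J2R', 'CR' or 'CIR'")
    w = np.asarray(weights, dtype=float)
    diff = np.asarray(mu_t, dtype=float) - np.asarray(mu_c, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    K = diff.size
    total = w[K] * diff[-1]                  # completers keep the full effect
    if method == "J2R":
        return total
    for j in range(1, K):
        if method == "CR":
            carry = sigma[j:, :j] @ np.linalg.solve(sigma[:j, :j], diff[:j])
            total += w[j] * carry[-1]        # Visit K component
        else:                                # CIR
            total += w[j] * diff[j - 1]      # effect at the last observed visit
    return total

# Hypothetical treatment-arm pattern probabilities (j = 0, 1, 2, K = 3).
p_t = [0.07, 0.07, 0.06, 0.80]   # general RBI weights
q_t = [0.05, 0.05, 0.05, 0.85]   # modified RBI weights (COVID dropouts pooled)
mu_t, mu_c = [1.0, 2.0, 3.0], [0.5, 1.0, 1.5]
sigma = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)

general = {m: rbi_difference(p_t, mu_t, mu_c, sigma, m) for m in ("J2R", "CR", "CIR")}
modified = {m: rbi_difference(q_t, mu_t, mu_c, sigma, m) for m in ("J2R", "CR", "CIR")}
```

With these inputs the modified difference is at least as large as the general one for all three methods, for example 1.275 versus 1.2 under J2R.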

We propose two approaches for the estimators and variances to implement the modified RBI methods: MI approach and likelihood based analytic approach.

2.1. Modified RBI by MI

Multiple imputation (MI) can be applied to obtain the estimators and variances of an imputation method. The first step of multiple imputation is to impute the missing values using a model under RBI multiple times to obtain m completed datasets, and the second step is to analyze each of the m datasets and combine the results using Rubin's rules [13].
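For reference, the combining step with Rubin's rules can be sketched as below. The function name and the input values are hypothetical; the formulas are the standard pooled estimate and total variance (within-imputation variance plus the inflated between-imputation variance).

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool m completed-data estimates and variances with Rubin's rules."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = est.size
    if m < 2:
        raise ValueError("at least two imputations are required")
    pooled = est.mean()                      # pooled point estimate
    within = var.mean()                      # average within-imputation variance
    between = est.var(ddof=1)                # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical estimates from m = 3 imputed datasets.
pooled, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```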

Tang (2017) proposed an efficient MI approach for RBI using PROC MI in SAS, which imputes the missing data by adjusting the mean difference in the posterior predictive distributions of missing data from the imputed values by MMRM under MAR [14]. We propose the following MI approach for the modified RBI methods:

• Impute all missing data assuming MAR by MI to obtain m completed datasets;
• Calculate the mean difference of the missing data between MMRM and RBI, from MMRM on imputed data and formula (2);
• Adjust the imputed values under MAR for those who drop out in the treatment group due to non-COVID reasons by subtracting the difference from the MAR imputation to yield the imputed values for RBI.

The SAS code for the modified RBI methods by MI is provided in Appendix A1.1.
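The adjustment step above can be illustrated in Python for the J2R case, where the difference between the MAR mean and the RBI mean from formula (2) at each post-dropout visit reduces to the difference between the arm means. This is a sketch only and is not the SAS code from the appendix; the function, the variable names, and the label values ("T", "noncovid", and so on) are hypothetical.

```python
import numpy as np

def adjust_to_j2r(y_mar, arm, reason, last_visit, mu_t_hat, mu_c_hat):
    """Turn one MAR-completed dataset into a modified-J2R dataset.

    y_mar: (n, K) outcomes at visits 1..K after MAR imputation
    arm: "T" or "C" per patient
    reason: "none", "covid" or "noncovid" per patient
    last_visit: last visit with observed data per patient (0..K)
    mu_t_hat, mu_c_hat: estimated arm means at visits 1..K (e.g. from MMRM)
    """
    y = np.array(y_mar, dtype=float, copy=True)
    # MAR mean minus J2R mean at each visit.
    shift = np.asarray(mu_t_hat, dtype=float) - np.asarray(mu_c_hat, dtype=float)
    for i in range(y.shape[0]):
        # Only treatment-arm dropouts for non-COVID reasons are adjusted;
        # COVID-19 dropouts keep their MAR imputations.
        if arm[i] == "T" and reason[i] == "noncovid":
            j = last_visit[i]
            y[i, j:] -= shift[j:]            # visits j+1..K
    return y

# Hypothetical toy data: three patients, K = 3, all with last visit 1.
y_mar = np.array([[1.0, 2.0, 3.0]] * 3)
adjusted = adjust_to_j2r(
    y_mar,
    arm=["T", "T", "C"],
    reason=["noncovid", "covid", "noncovid"],
    last_visit=[1, 1, 1],
    mu_t_hat=[1.0, 2.0, 3.0],
    mu_c_hat=[0.5, 1.0, 1.5],
)
```

Only the first patient is changed: the imputed values at visits 2 and 3 move from (2.0, 3.0) to (1.0, 1.5), while observed values are left untouched.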

2.2. Modified RBI by likelihood based analytic approach

The MI approach using Rubin's rules tends to overestimate the variance [15], [16]. Liu and Pang (2016) proposed an analytic likelihood based approach for the estimators and variances of the general RBI methods. Here we extend the analytic framework to our modified RBI methods in the presence of missing data due to COVID-19.

From our derived Eq. (4), the estimator can be obtained as follows:

\hat{β}_{K}^{{RBI}_{m}} = \begin{cases} \hat{q}_{K}^{T} (\hat{μ}_{K}^{T} - \hat{μ}_{K}^{C}), & \text{J2R} \\ \hat{q}_{K}^{T} (\hat{μ}_{K}^{T} - \hat{μ}_{K}^{C}) + \sum_{j = 1}^{K - 1} \hat{q}_{j}^{T} {[\hat{Σ}_{mo, j} \hat{Σ}_{oo, j}^{- 1} (\hat{μ}_{j}^{T} - \hat{μ}_{j}^{C})]}_{K}, & \text{CR} \\ \sum_{j = 1}^{K} \hat{q}_{j}^{T} (\hat{μ}_{j}^{T} - \hat{μ}_{j}^{C}), & \text{CIR}, \end{cases}

(5)

where ${\hat{q}}_{j}^{T}$ is the estimator of $q_{j}^{T}$ by sample proportions in the treatment group T, and ${\hat{μ}}_{j}^{T}, {\hat{μ}}_{j}^{C}$ are least-square (LS) means of the treatment effects obtained from the MMRM.

The variance can be estimated as follows:

Var (\hat{β}_{K}^{{RBI}_{m}}) = E [Var (\hat{β}_{K}^{{RBI}_{m}} | \hat{q}^{T})] + Var [E (\hat{β}_{K}^{{RBI}_{m}} | \hat{q}^{T})] .

(6)

The first term can be approximated by the estimated variance-covariance matrix of the parameters obtained from the MMRM, and the second term can be calculated using the point estimate of ${\hat{β}}_{K}^{{RBI}_{m}}$ and variance of the sample proportion ${\hat{q}}^{T}$ . The SAS code for the modified RBI methods by the analytic likelihood based approach is provided in Appendix A1.2.
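For the J2R case, the decomposition in Eq. (6) can be illustrated with a plug-in approximation. The sketch below is one simple reading of the two terms and is an assumption, not the appendix code: it treats the sample proportion as binomial and independent of the MMRM estimates, and approximates the expectation of the squared proportion by its plug-in value. All inputs are hypothetical.

```python
import math

def j2r_modified_analytic(q_hat, n_t, diff_hat, var_diff):
    """Plug-in estimate and variance for the modified J2R estimator.

    q_hat: sample proportion for q_K^T (completers plus COVID dropouts)
    n_t: treatment-arm sample size
    diff_hat: MMRM estimate of mu_K^T - mu_K^C
    var_diff: MMRM variance of diff_hat
    """
    estimate = q_hat * diff_hat
    # First term of Eq. (6): variance given the proportion (plug-in).
    var_given_q = q_hat ** 2 * var_diff
    # Second term of Eq. (6): variability of the binomial sample proportion.
    var_from_q = diff_hat ** 2 * q_hat * (1.0 - q_hat) / n_t
    return estimate, var_given_q + var_from_q

# Hypothetical inputs for illustration only.
est, var = j2r_modified_analytic(q_hat=0.85, n_t=65, diff_hat=1.86, var_diff=0.52 ** 2)
se = math.sqrt(var)
```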

3. Modified tipping point analysis to incorporate missingness due to COVID-19

The tipping point analysis has been suggested in the literature for analysis of missing data under MNAR [10]. It explores the influence of missingness on the overall conclusion about the treatment difference by shifting imputed missing values in the treatment group towards the reference group until the result becomes non-significant. The “tipping point” is the minimum shift needed to make the result non-significant. In other words, the tipping point method assesses how much the distribution of the missing values needs to deviate from the MAR assumption to make the primary analysis non-significant. This method shows what it would take to change the study conclusion under different assumptions for missing data. A clinically plausible tipping point indicates that the conclusion drawn under the MAR assumption may not be robust.

The tipping point method is a special application of the delta-adjustment method, evaluated over a series of shift parameters.

For the general tipping point method, it is straightforward to derive the true mean difference between groups at Visit K for a given single shift parameter δ as follows:

β_{K}^{TIPg} = p_{K}^{T} (μ_{K}^{T} - μ_{K}^{C}) + (1 - p_{K}^{T}) (μ_{K}^{T} - μ_{K}^{C} + δ) = μ_{K}^{T} - μ_{K}^{C} + (1 - p_{K}^{T}) δ .

(7)

Recently, there has been interest from a regulatory perspective in shifting both the treatment group and the reference group in tipping point analysis [17]. Under the framework of two-dimensional shift parameters, the true mean difference between groups at Visit K for given shifts $δ^{Z}$, Z = T, C, is derived as follows [17]:

β_{K}^{TIP 2 g} = μ_{K}^{T} + (1 - p_{K}^{T}) δ^{T} - [μ_{K}^{C} + (1 - p_{K}^{C}) δ^{C}] .

(8)

It can be seen that the single-shift tipping point method is a special case of the 2-dimension tipping point method, when the shift parameter in the reference group is set to 0.

We propose a modified tipping point method, which imputes the missingness due to COVID-19 by MAR method and imputes the missingness due to non-COVID reasons by tipping point method. The true mean difference between groups at Visit K by a given single shift parameter δ is:

β_{K}^{TIPm} = q_{K}^{T} (μ_{K}^{T} - μ_{K}^{C}) + (1 - q_{K}^{T}) (μ_{K}^{T} - μ_{K}^{C} + δ) = μ_{K}^{T} - μ_{K}^{C} + (1 - q_{K}^{T}) δ .

(9)

Under the framework of two-dimensional shift parameters, the true mean difference between groups at Visit K for given shifts $δ^{Z}$, Z = T, C, is:

β_{K}^{TIP 2 m} = μ_{K}^{T} + (1 - q_{K}^{T}) δ^{T} - [μ_{K}^{C} + (1 - q_{K}^{C}) δ^{C}] .

(10)

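Since a larger value is assumed to favor the treatment effect, δ ≤ 0 is usually used to shift imputed values toward the reference group. Because $p_{K}^{T} \leq q_{K}^{T}$, the shift is applied to a smaller proportion of patients under the modified method, so that $(1 - p_{K}^{T}) δ \leq (1 - q_{K}^{T}) δ$. Therefore, for the tipping point method with a single shift parameter,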

β_{K}^{TIPg} \leq β_{K}^{TIPm} .
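A small Python sketch of Eqs. (7) to (10) makes the comparison concrete. The function names and numeric values are hypothetical; the general method uses the completer probabilities $p_{K}^{Z}$ and the modified method uses $q_{K}^{Z}$.

```python
def tipping_difference(mu_t_K, mu_c_K, w_t, delta):
    """Single-shift mean difference, Eq. (7) with w_t = p_K^T or Eq. (9) with w_t = q_K^T."""
    return mu_t_K - mu_c_K + (1.0 - w_t) * delta

def tipping_difference_2d(mu_t_K, mu_c_K, w_t, w_c, delta_t, delta_c):
    """Two-shift mean difference, Eq. (8) or Eq. (10) depending on the weights."""
    return (mu_t_K + (1.0 - w_t) * delta_t) - (mu_c_K + (1.0 - w_c) * delta_c)

# Hypothetical values: 20% dropout overall, 5% of patients drop out due to COVID-19.
p_K, q_K = 0.80, 0.85
delta = -2.0                                 # shift toward the reference group
general = tipping_difference(3.0, 1.5, p_K, delta)
modified = tipping_difference(3.0, 1.5, q_K, delta)
# Setting the reference-group shift to 0 recovers the single-shift case.
same_as_single = tipping_difference_2d(3.0, 1.5, q_K, q_K, delta, 0.0)
```

Here the general difference is 1.1 and the modified difference is 1.2, in line with the inequality above.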

Under the framework of two-dimensional shift parameters, it is not straightforward to draw the same conclusion because the two-shift mean differences in Eqs. (8) and (10) depend on two shift parameters and on the probabilities of missingness in both groups.

3.1. Modified tipping point analysis by multiple imputation

Tipping point analysis can be implemented by SAS PROC MI with the MNAR statement. For our modified tipping point analysis, we impute missingness due to COVID-19 by MAR methods and impute missingness due to non-COVID reasons by the general tipping point method. The SAS code for the modified tipping point analysis by MI is provided in Appendix A2.

3.2. Modified tipping point analysis by likelihood based analytic approach

Here we propose an analytic likelihood based approach for our modified tipping point analysis to improve the estimated variance. For the tipping point method with a single shift parameter, the estimator can be obtained by

\hat{β}_{K}^{TIPm} = \hat{μ}_{K}^{T} - \hat{μ}_{K}^{C} + (1 - \hat{q}_{K}^{T}) δ .

(11)

Using the LS means and SEs from MMRM and the sample proportions for the missingness, the variance is derived as follows:

Var (\hat{β}_{K}^{TIPm}) = Var (\hat{μ}_{K}^{T} - \hat{μ}_{K}^{C}) + \frac{\hat{q}_{K}^{T} (1 - \hat{q}_{K}^{T}) δ^{2}}{n^{T}} .

(12)

Under the framework of 2-dimension of shift parameters, the estimator can be obtained by [17]

\hat{β}_{K}^{TIP 2 m} = \hat{μ}_{K}^{T} + (1 - \hat{q}_{K}^{T}) δ^{T} - [\hat{μ}_{K}^{C} + (1 - \hat{q}_{K}^{C}) δ^{C}] .

(13)

Using the LS means and SEs obtained from MMRM and the sample proportions for the missingness, the variance is derived as follows [17]:

Var (\hat{β}_{K}^{TIP 2 m}) = Var (\hat{μ}_{K}^{T} - \hat{μ}_{K}^{C}) + \frac{\hat{q}_{K}^{T} (1 - \hat{q}_{K}^{T}) (δ^{T})^{2}}{n^{T}} + \frac{\hat{q}_{K}^{C} (1 - \hat{q}_{K}^{C}) (δ^{C})^{2}}{n^{C}} .

(14)

The SAS code for the modified tipping point analysis by the analytic likelihood based method is provided in Appendix A1.2.
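The single-shift estimator and variance in Eqs. (11) and (12) lend themselves to a simple scan over shift values, sketched below in Python. The normal approximation for the one-sided p-value, the significance level, and all input values are assumptions made for this illustration and are not taken from the paper or its appendix.

```python
import math

def tipping_scan(diff_hat, var_diff, q_hat, n_t, deltas, alpha=0.025):
    """Evaluate Eqs. (11) and (12) over a grid of shift parameters.

    Returns one tuple (delta, estimate, se, one-sided p, tipped) per shift,
    assuming a larger value favors the treatment.
    """
    rows = []
    for delta in deltas:
        est = diff_hat + (1.0 - q_hat) * delta                      # Eq. (11)
        var = var_diff + q_hat * (1.0 - q_hat) * delta ** 2 / n_t   # Eq. (12)
        se = math.sqrt(var)
        # Upper-tail normal probability via the complementary error function.
        p_one_sided = 0.5 * math.erfc(est / se / math.sqrt(2.0))
        rows.append((delta, est, se, p_one_sided, p_one_sided > alpha))
    return rows

# Hypothetical MMRM summary: difference 1.86 with SE 0.52, n^T = 65, q_K^T = 0.85.
rows = tipping_scan(1.86, 0.52 ** 2, 0.85, 65, deltas=[0.0, -2.0, -4.0, -6.0, -8.0])
tipping_delta = next((d for d, _, _, _, tipped in rows if tipped), None)
```

The first shift on the grid flagged as tipped brackets the tipping point; a finer grid narrows the interval.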

4. Simulations

The tipping point method searches for the shift away from the MAR assumption that makes the results non-significant, so there is no need to run simulations for it. RBI methods have been simulated extensively in the literature [14], [15], and the properties of RBI methods are well known. Simulation for RBI methods using MI is time-consuming. Since our proposed modified RBI methods follow the same principle as the general RBI methods, with an adjustment for MAR missingness due to COVID-19, we conduct a limited simulation using the J2R methods to compare the modified method with the general method.

4.1. Settings

We conduct the following simulation studies with sample size N = 65 per arm, with baseline and 6 post-baseline visits. The data are generated from a multivariate normal distribution with means in the control group ($μ_{p}$) and treatment group ($μ_{d}$) as follows:

μ_{p} = (9.4, 10.4, 10.65, 10.9, 10.94, 10.97, 11.04)

and

μ_{d} = (9.4, 10.9, 11.4, 11.9, 12.15, 12.4, 12.9) .

The standard deviation for each visit is 2.9, and the correlation matrix used for data generation is

\begin{pmatrix} 1 & 0.4 & 0.1 & 0.2 & 0.2 & 0.1 & 0.3 \\ 0.4 & 1 & 0.5 & 0.6 & 0.4 & 0.5 & 0.6 \\ 0.1 & 0.5 & 1 & 0.5 & 0.4 & 0.4 & 0.6 \\ 0.2 & 0.6 & 0.5 & 1 & 0.5 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.4 & 0.5 & 1 & 0.6 & 0.4 \\ 0.1 & 0.5 & 0.4 & 0.3 & 0.6 & 1 & 0.4 \\ 0.3 & 0.6 & 0.6 & 0.6 & 0.4 & 0.4 & 1 \end{pmatrix} .
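The data-generating step can be sketched in Python as below. The sketch reads the displayed matrix, which has a unit diagonal, as a correlation matrix scaled by the common standard deviation of 2.9; that reading, the random seed, and the variable names are assumptions of this illustration.

```python
import numpy as np

# Means at baseline and the 6 post-baseline visits, as given in the text.
mu_p = np.array([9.4, 10.4, 10.65, 10.9, 10.94, 10.97, 11.04])   # control
mu_d = np.array([9.4, 10.9, 11.4, 11.9, 12.15, 12.4, 12.9])      # treatment

corr = np.array([
    [1.0, 0.4, 0.1, 0.2, 0.2, 0.1, 0.3],
    [0.4, 1.0, 0.5, 0.6, 0.4, 0.5, 0.6],
    [0.1, 0.5, 1.0, 0.5, 0.4, 0.4, 0.6],
    [0.2, 0.6, 0.5, 1.0, 0.5, 0.3, 0.6],
    [0.2, 0.4, 0.4, 0.5, 1.0, 0.6, 0.4],
    [0.1, 0.5, 0.4, 0.3, 0.6, 1.0, 0.4],
    [0.3, 0.6, 0.6, 0.6, 0.4, 0.4, 1.0],
])
sd = 2.9
cov = sd ** 2 * corr                         # assumed: common SD at every visit

rng = np.random.default_rng(2021)            # arbitrary seed for reproducibility
n_per_arm = 65
y_control = rng.multivariate_normal(mu_p, cov, size=n_per_arm)
y_treat = rng.multivariate_normal(mu_d, cov, size=n_per_arm)
```

Each row of `y_control` and `y_treat` is one simulated patient with complete data; missingness is imposed afterwards.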

The probability for a subject to drop out at Visit j is generated by the following logistic model:

\operatorname{logit} \{ P (\text{drop out at Visit } j \text{ or Visit } j \text{ is missing} \mid j > 0) \} = φ_{1} + φ_{2} Y_{j - 1} + φ_{3} Y_{j} .

(15)

In the logistic model (15), $φ_{2}$ is the parameter associated with the observed value at the previous Visit j − 1, and $φ_{3}$ is the parameter associated with the value at Visit j. When $φ_{2} = 0$, the probability that the Visit j value is missing depends only on the unobserved value, which is an MNAR mechanism. When $φ_{3} = 0$, the probability depends only on the observed value, which is an MAR mechanism. The following missing data scenarios with combined MAR and MNAR missingness are considered in the simulations:

• Scenario 1: The total missing rate is 20% in each group; and among the missingness, 5% is MAR missingness which is assumed to be missing due to COVID-19, and the other is MNAR missingness.
• Scenario 2: The total missing rate is 20% in treatment group and 30% in control group; and among the missingness, 5% is MAR missingness which is assumed to be missing due to COVID-19, and the other is MNAR missingness.
• Scenario 3: The total missing rate is 30% in treatment group and 20% in control group; and among the missingness, 5% is MAR missingness which is assumed to be missing due to COVID-19, and the other is MNAR missingness.
• Scenario 4: The total missing rate is 30% in each group; and among the missingness, 5% is MAR missingness which is assumed to be missing due to COVID-19, and the other is MNAR missingness.
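Monotone dropout under the logistic model (15) can be sketched in Python as follows. The paper does not report the values of the φ parameters used to reach the target missing rates, so the values in any call are hypothetical; setting `phi3 = 0` gives an MAR mechanism and setting `phi2 = 0` gives an MNAR mechanism, as described above.

```python
import numpy as np

def simulate_monotone_dropout(y, phi1, phi2, phi3, rng):
    """Return the last observed visit index for each subject.

    y: (n, K + 1) complete data, column 0 = baseline
    At each visit j > 0, a subject still in the study drops out with
    probability expit(phi1 + phi2 * Y_{j-1} + phi3 * Y_j).
    """
    n, n_visits = y.shape
    last_obs = np.full(n, n_visits - 1)      # completers by default
    active = np.ones(n, dtype=bool)
    for j in range(1, n_visits):
        logit = phi1 + phi2 * y[:, j - 1] + phi3 * y[:, j]
        prob = 1.0 / (1.0 + np.exp(-logit))
        drop = active & (rng.random(n) < prob)
        last_obs[drop] = j - 1               # data observed up to visit j - 1
        active &= ~drop                      # dropout is monotone
    return last_obs

rng = np.random.default_rng(15)              # arbitrary seed
y_demo = rng.normal(size=(200, 7))           # placeholder complete data
# Hypothetical MAR parameters (phi3 = 0); not the values used in the paper.
last_obs_mar = simulate_monotone_dropout(y_demo, phi1=-3.0, phi2=0.5, phi3=0.0, rng=rng)
```

Values after `last_obs_mar[i]` would then be set to missing for subject `i`.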

Under each missing data scenario, the MMRM and the J2R methods using MI and the likelihood based analytic approach are evaluated by mean estimator, standard error (SE) and power. For the MI approaches, 50 imputations are used and the analysis results are combined by Rubin's rules. Specifically, we compare the general J2R methods without adjustment for COVID with the modified J2R methods with adjustment for COVID.

4.2. Results

The estimates of treatment difference, standard error (SE), and power from the simulated data are reported in Table 1. The findings are:

• The MMRM method is unbiased and more powerful than the J2R method, which is consistent with the literature [14], [15].
• The modified J2R methods with adjustment for COVID generate a larger treatment difference and are more powerful than the general J2R methods without adjustment, by both MI and analytic approaches.
• The Type I errors are also simulated by setting $μ_{d} = μ_{p}$, and there is no inflation for any of the methods (the results are not reported here). This is consistent with the literature [14], [15].

Table 1. Simulation results.

Scenario	Parameter	MMRM	General J2R (Analytic)	General J2R (MI)	Modified J2R (Analytic)	Modified J2R (MI)
1	Mean	1.86	1.50	1.50	1.55	1.55
1	SE	0.52	0.43	0.52	0.44	0.52
1	Power (%)	95.1	95.0	87.6	95.0	89.4
2	Mean	1.86	1.49	1.48	1.55	1.55
2	SE	0.52	0.43	0.52	0.44	0.52
2	Power (%)	94.1	93.6	85.6	94.0	87.5
3	Mean	1.86	1.31	1.31	1.39	1.39
3	SE	0.53	0.39	0.52	0.41	0.53
3	Power (%)	93.0	92.4	74.7	93.2	80.5
4	Mean	1.86	1.30	1.30	1.41	1.41
4	SE	0.54	0.39	0.52	0.42	0.54
4	Power (%)	93.0	92.7	74.0	93.5	79.6

5. Illustration using a data example

We use an antidepressant trial to illustrate the proposed methods [18]. The dataset was provided in the online supporting information of a paper [19], where it was used to illustrate the missing data methods proposed in that paper. A total of 172 patients were randomized to the drug (n = 84) or placebo (n = 88) groups. The dataset includes the Hamilton 17-item rating scale for depression (HAMD-17) at baseline and at Visits 4, 5, 6, and 7 after randomization. The primary endpoint of interest is the change from baseline at Visit 7. Here, a lower (more negative) change from baseline indicates a better effect. About 25% of patients discontinued prior to Visit 7 in the drug and placebo groups.

We applied MMRM to the observed change from baseline in HAMD-17 as a benchmark for comparison. To demonstrate the proposed methods with missingness due to MAR reasons such as COVID-19, we generate a 10% random sample of patients out of those who discontinued prior to Visit 7 and assume their discontinuation is due to MAR reasons, while the total proportion of discontinuation is not changed.

The LS means, SEs and 95% confidence intervals (CIs) from the general RBI and the proposed modified RBI methods are reported in Table 2. As seen from Table 2, for both general RBI and modified RBI methods, the J2R approaches gave the most conservative estimates of treatment difference at Visit 7, followed by CR and then CIR. The estimated SEs from the analytic approaches are smaller than those from MI. The modified RBI methods gave estimates of the treatment difference that are larger in magnitude than those from the general RBI methods.

Table 2. Analysis of change from baseline in HAMD-17 at Visit 7.

Parameter	MMRM	J2R (MI)	J2R (Analytic)	CR (MI)	CR (Analytic)	CIR (MI)	CIR (Analytic)
General RBI methods
LS mean	-2.80	-2.19	-2.13	-2.43	-2.37	-2.51	-2.45
SE	1.12	1.12	0.86	1.10	1.00	1.10	1.02
p-Value	0.006	0.025	0.007	0.014	0.009	0.011	0.008
95% CI	-5.01,-0.60	-4.39,-0.01	-3.84,-0.43	-4.59,-0.27	-4.36,-0.39	-4.67,-0.35	-4.47,-0.43
Modified RBI methods, 10% dropouts due to COVID-19
LS mean	-2.80	-2.46	-2.40	-2.59	-2.54	-2.63	-2.58
SE	1.12	1.11	0.97	1.10	1.05	1.10	1.06
p-Value	0.006	0.013	0.007	0.009	0.008	0.008	0.007
95% CI	-5.01,-0.60	-4.63,-0.29	-4.31,-0.49	-4.74,-0.43	-4.61,-0.46	-4.79,-0.47	-4.68,-0.48

LS: least-square; p-value: one-sided; SE: standard error; CI: confidence interval.

The results using the general tipping point analysis and the modified tipping point analysis methods by MI are reported in Table 3. As discussed in Section 3, the single-shift tipping point method is a special case of the 2-dimension tipping point method with the shift parameter in the placebo group set to 0. We therefore apply the 2-dimension tipping point method with a given shift parameter in the drug group and the shift parameter in the placebo group set to 0. With the general tipping point method, the tipping point is between 2.0 and 2.5, which means the study conclusion under MAR is reversed when the shift parameter is between 2.0 and 2.5. The clinical plausibility of a shift of this size therefore needs to be assessed to judge the robustness of the MAR assumption. With the modified tipping point method, the tipping point is between 4.0 and 4.5, so a larger and less plausible shift is needed to reverse the study conclusion than with the general tipping point method. This is reasonable because less of the missingness is treated as MNAR once missingness due to COVID-19 is handled by MAR methods.

Table 3. Tipping point analysis from MI.

Shift 1 (drug)	Shift 2 (placebo)	p-Value	Tipping
General tipping point methods
1.5	0	0.032	No
2.0	0	0.042	No
2.5	0	0.054	Yes
3.0	0	0.070	Yes
3.5	0	0.090	Yes
Modified tipping point methods
3.0	0	0.033	No
3.5	0	0.039	No
4.0	0	0.046	No
4.5	0	0.054	Yes
5.0	0	0.063	Yes

The results using the tipping point analysis by the analytic approach, reported in Table 4, are consistent with those from the MI approach.

Table 4. Tipping point analysis from analytic methods.

Shift 1 (drug)	Shift 2 (placebo)	p-Value	Tipping
General tipping point methods
1.5	0	0.030	No
2.0	0	0.040	No
2.5	0	0.051	Yes
3.0	0	0.065	Yes
3.5	0	0.083	Yes
Modified tipping point methods
3.0	0	0.036	No
3.5	0	0.042	No
4.0	0	0.050	No
4.5	0	0.058	Yes
5.0	0	0.067	Yes
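The tipping intervals quoted in the text can be read off the reported shifts and p-values with a few lines of Python. The threshold of 0.05 is an assumption inferred from the Yes/No column of the tables (a result is flagged as tipped when the p-value exceeds it), and the function name is hypothetical.

```python
def find_tipping_interval(shifts, p_values, alpha=0.05):
    """Return (last non-tipping shift, first tipping shift), or None if never tipped."""
    previous = None
    for shift, p in sorted(zip(shifts, p_values)):
        if p > alpha:                        # result is no longer significant
            return previous, shift
        previous = shift
    return None

# Shifts and p-values as reported in Tables 3 (MI) and 4 (analytic).
general_shifts = [1.5, 2.0, 2.5, 3.0, 3.5]
modified_shifts = [3.0, 3.5, 4.0, 4.5, 5.0]
general_mi = find_tipping_interval(general_shifts, [0.032, 0.042, 0.054, 0.070, 0.090])
modified_mi = find_tipping_interval(modified_shifts, [0.033, 0.039, 0.046, 0.054, 0.063])
general_an = find_tipping_interval(general_shifts, [0.030, 0.040, 0.051, 0.065, 0.083])
modified_an = find_tipping_interval(modified_shifts, [0.036, 0.042, 0.050, 0.058, 0.067])
```

Both approaches place the general tipping point between 2.0 and 2.5 and the modified tipping point between 4.0 and 4.5.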

6. Discussion

The COVID-19 pandemic has had substantial impact on planned and ongoing clinical trials since early 2020 and will continue to impact clinical trials. How to handle the ICEs associated with COVID-19 that result in missing data is critical for estimand definition, analytic methods, and results.

The methods for sensitivity analyses in the current literature do not differentiate ICEs due to COVID-19 or due to other reasons. In this article, we have proposed a combined hypothetical strategy for handling the ICEs resulting in missingness due to COVID-19 and other non-COVID reasons, and it is a reasonable strategy since missingness due to COVID-19 are usually assumed to be MAR. Under this strategy, we have proposed modified RBI methods and tipping point analysis method for sensitivity analyses which analyze the MAR missingness by MAR methods and MNAR missingess by MNAR methods.

The modified methods can mitigate the impact of missing data resulting from COVID-19 because they handle missingness due to COVID-19 by MAR methods and missingness due to other reasons by MNAR methods. We have proposed standard MI approaches to obtain estimates and variances for the modified RBI methods and the modified tipping point analysis. Since the MI approaches tend to overestimate the variance, as discussed in Section 3, we have also derived analytic likelihood based approaches that estimate the variances more efficiently.
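
For orientation, the variance from a standard MI analysis is usually obtained with Rubin's combining rules [13]. The Python sketch below shows those rules in their textbook form; it assumes this is the combination step meant by "standard MI", and the function name `rubin_combine` and the input numbers are hypothetical.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Combine M completed-data analyses with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)          # average of the M point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical treatment-difference estimates from three imputed datasets.
pooled, total = rubin_combine([-2.0, -2.4, -2.2], [1.0, 1.1, 0.9])
print(pooled, total)
```

The total variance adds the between-imputation spread, inflated by 1 + 1/M, to the average within-imputation variance. This combined quantity is the MI variance that the analytic likelihood based approach is intended to improve on.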

We have provided analytical and theoretical justifications of the proposed methods. The proposed methods have also been illustrated using data from an antidepressant trial.

The proposed methods distinguish MAR missingness from MNAR missingness after treatment discontinuation instead of treating all missingness due to discontinuation as MNAR. Handling the two with different methods aligns the analysis with the missing data patterns. Missing data due to COVID-19 are a special case for the proposed strategy; more generally, the strategy applies to any scenario in which missing data after discontinuation arise from a combination of MAR and MNAR mechanisms.

When the proposed methods are planned for ongoing clinical trials, communication with regulatory authorities is recommended in order to reduce the impact of the COVID-19 pandemic on those trials. In particular, because the standard MI approach and the analytic likelihood based approach differ in efficiency, it is important to communicate to regulatory authorities which approach is proposed for handling missing data.

Disclosure

This manuscript was supported by AbbVie. AbbVie participated in the review and approval of the content. Man Jin and Weining Robieson are AbbVie employees, and Ran Liu is a former AbbVie employee. All authors may own AbbVie stock.

Declaration of Competing Interest

The authors declare no conflict of interest.

Appendix A. SAS code for modified RBI and modified tipping point analysis

SAS code for the modified RBI and modified tipping point analyses, by MI and by the analytic likelihood based approach, is provided below.

A.1. Modified RBI by MI

For modified RBI by the multiple imputation approach, we can follow the SAS implementation of the RBI adjustment by Tang (2017), but the missing data pattern for dropouts due to COVID-19 needs to be changed to 4, so that these patients are treated as completers under the MAR assumption.

Assume the dataset “indta” contains the variables patient (patient id), trt (treatment code, 1 for drug, 0 for placebo), basval (baseline value), change1, change2, change3, change4 (repeated measures after baseline), gender (1 for male, 0 for female), pattern (missing data pattern), and type (“completer” for complete case, “cvd” for dropout due to COVID-19, “non-cvd” for dropout due to non-COVID reasons).
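
The pattern change described above amounts to a one-line recode. The Python sketch below illustrates it on a toy version of “indta”; it is not the SAS implementation. The variable names follow the description above, while the example rows and the coding of the non-completer patterns (here 1 and 2) are hypothetical.

```python
COMPLETER_PATTERN = 4  # pattern value that marks completers, as stated in the text

def recode_covid_pattern(rows):
    """Return copies of the rows with pattern set to 4 for COVID-19 dropouts."""
    recoded_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if new_row["type"] == "cvd":
            new_row["pattern"] = COMPLETER_PATTERN
        recoded_rows.append(new_row)
    return recoded_rows

# Toy data: one completer, one COVID-19 dropout, one non-COVID dropout.
indta = [
    {"patient": 1, "trt": 1, "pattern": 4, "type": "completer"},
    {"patient": 2, "trt": 1, "pattern": 2, "type": "cvd"},
    {"patient": 3, "trt": 0, "pattern": 1, "type": "non-cvd"},
]
recoded = recode_covid_pattern(indta)
for row in recoded:
    print(row["patient"], row["type"], row["pattern"])
```

Only the pattern label changes. The missing post-baseline values of the COVID-19 dropouts still have to be imputed, which this sketch does not attempt.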

A.2. Modified RBI and tipping point analysis by likelihood based approach

For modified RBI by the likelihood based analytic approach, we can follow the SAS implementation by Liu and Pang (2016), but the missing data pattern for dropouts due to COVID-19 needs to be set to 4 when calculating the sample proportions of patterns, so that these patients are treated as completers under the MAR assumption. In addition, we added a term to the variance calculation to estimate the variances more accurately.
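
The effect of this adjustment on the sample proportions of patterns can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SAS code: the function name `pattern_proportions` and the toy rows are hypothetical, and only the proportions are computed, not the variance term.

```python
from collections import Counter

def pattern_proportions(rows, completer_pattern=4):
    """Per-arm sample proportions of patterns, counting COVID-19 dropouts as completers."""
    counts = {}  # maps treatment code to a Counter of pattern frequencies
    for row in rows:
        pattern = completer_pattern if row["type"] == "cvd" else row["pattern"]
        counts.setdefault(row["trt"], Counter())[pattern] += 1
    return {
        trt: {p: n / sum(c.values()) for p, n in sorted(c.items())}
        for trt, c in counts.items()
    }

# Toy data: four patients on drug (trt = 1) and two on placebo (trt = 0).
rows = [
    {"trt": 1, "pattern": 4, "type": "completer"},
    {"trt": 1, "pattern": 4, "type": "completer"},
    {"trt": 1, "pattern": 2, "type": "cvd"},      # counted under pattern 4
    {"trt": 1, "pattern": 2, "type": "non-cvd"},  # stays under pattern 2
    {"trt": 0, "pattern": 4, "type": "completer"},
    {"trt": 0, "pattern": 1, "type": "non-cvd"},
]
props = pattern_proportions(rows)
print(props)
```

In the drug arm of this toy example, the COVID-19 dropout moves from pattern 2 to pattern 4, so the proportion carried by the MNAR dropout pattern falls from 0.5 to 0.25.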

For modified tipping point analysis by the likelihood based analytic approach, we have developed the following code, which incorporates the modified RBI likelihood based analytic approach.

Inline graphic

A.3. Modified tipping point analysis by MI

Inline graphic
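
As a rough illustration of the shift step behind the modified tipping point analysis by MI, the Python sketch below adds a shift to MAR-imputed values for non-COVID dropouts only and leaves COVID-19 dropouts at their MAR-imputed values. It is not a translation of the SAS listing. The `imputed` flag, the function name `apply_shifts`, the example values, and the simplification that the shift is applied to the final visit (change4) only are all assumptions.

```python
def apply_shifts(rows, shift_drug, shift_placebo=0.0):
    """Shift MAR-imputed final-visit values of non-COVID dropouts.

    Each row needs: trt (1 drug, 0 placebo), type, change4, and
    imputed (True when change4 was imputed rather than observed).
    """
    shifted_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if row["imputed"] and row["type"] == "non-cvd":
            shift = shift_drug if row["trt"] == 1 else shift_placebo
            new_row["change4"] = row["change4"] + shift
        shifted_rows.append(new_row)
    return shifted_rows

# Toy MAR-imputed dataset (values are hypothetical).
imputed_rows = [
    {"trt": 1, "type": "completer", "change4": -6.0, "imputed": False},
    {"trt": 1, "type": "cvd", "change4": -5.0, "imputed": True},
    {"trt": 1, "type": "non-cvd", "change4": -5.0, "imputed": True},
    {"trt": 0, "type": "non-cvd", "change4": -3.0, "imputed": True},
]
shifted = apply_shifts(imputed_rows, shift_drug=2.0)
print([row["change4"] for row in shifted])
```

In a full analysis this step would be repeated for each imputed dataset and each value on the shift grid, with the shifted datasets analyzed and combined to give one p-value per shift. Setting `shift_placebo` to 0, as in the default here, corresponds to the single-shift special case.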

References

1. Jin M., Liu G. Estimand framework: delineating what to be estimated with clinical questions of interest in clinical trials. Contemp. Clin. Trials. 2020;96. doi: 10.1016/j.cct.2020.106093.
2. Mallinckrodt C.H., Bell J., Liu G., Ratitch B., O’Kelly M., Lipkovich I., Singh P., Xu L., Molenberghs G. Aligning estimators with estimands in clinical trials: putting the ICH E9(R1) guidelines into practice. Ther. Innov. Regul. Sci. 2020;54:353–364. doi: 10.1007/s43441-019-00063-9.
3. Meyer R.D., Ratitch B., et al. Statistical issues and recommendations for clinical trials conducted during the COVID-19 pandemic. Stat. Biopharm. Res. 2020;12:399–411. doi: 10.1080/19466315.2020.1779122.
4. U.S. Food and Drug Administration. FDA; 2020. Statistical Considerations for Clinical Trials During the COVID-19 Public Health Emergency.
5. European Medicines Agency. EMA; 2020. Points to Consider on Implications of Coronavirus Disease (COVID-19) on Methodological Aspects of Ongoing Clinical Trials.
6. O’Kelly M., Ratitch B. Wiley; 2014. Clinical Trials With Missing Data: A Guide for Practitioners.
7. Little R., Rubin D. Wiley; 2002. Statistical Analysis With Missing Data.
8. Mallinckrodt C., Roger J., Chuang-Stein C., Molenberghs G., Lane P., O’Kelly M. Missing data: turning guidance into action. Stat. Biopharm. Res. 2013;5:369–382.
9. Carpenter J., Roger J., Kenward M. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. J. Biopharm. Stat. 2013;23:1352–1371. doi: 10.1080/10543406.2013.834911.
10. Ratitch B., O’Kelly M., Tosiello R. Missing data in clinical trials: from clinical assumptions to statistical analysis using pattern mixture models. Pharm. Stat. 2013;16:337–347. doi: 10.1002/pst.1549.
11. Little R. Pattern-mixture models for multivariate incomplete data. J. Am. Stat. Assoc. 1993;88:125–134.
12. Little R., Yau L. Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics. 1996;52:1324–1333.
13. Rubin D. Wiley; New York, NY: 1987. Multiple Imputation for Nonresponse in Surveys.
14. Tang Y. An efficient multiple imputation algorithm for control-based and delta-adjusted pattern mixture models using SAS. Stat. Biopharm. Res. 2017;9:116–125.
15. Liu G., Pang L. On analysis of longitudinal clinical trials with missing data using reference-based imputation. J. Biopharm. Stat. 2016;26:297–304. doi: 10.1080/10543406.2015.1094810.
16. Mallinckrodt C., Molenberghs G., Lipkovich I., Ratitch B. CRC Press; 2019. Estimands, Estimators and Sensitivity Analysis in Clinical Trials.
17. Torres C. 2019. A Tipping Point Method to Evaluate Sensitivity to Potential Violations in Missing Data Assumptions. https://ww2.amstat.org/meetings/biop/2019/onlineprogram/AbstractDetails.cfm?AbstractID=301002
18. Goldstein D.J., Lu Y., Detke M.J., Wiltse C., Mallinckrodt C., Demitrack M.A. Duloxetine in the treatment of depression: a double-blind placebo-controlled comparison with paroxetine. J. Clin. Psychopharmacol. 2004;24:389–399. doi: 10.1097/01.jcp.0000132448.65972.d9.
19. Tang Y. A monotone data augmentation algorithm for multivariate nonnormal data: with applications to controlled imputations for longitudinal trials. Stat. Med. 2018;38:1715–1733. doi: 10.1002/sim.8062.

