Abstract
Patients often discontinue from a clinical trial because their health condition is not improving or they cannot tolerate the assigned treatment. Consequently, the observed clinical outcomes in the trial are likely better on average than if every patient had completed the trial. If these differences between trial completers and non-completers cannot be explained by the observed data, then the study outcomes are missing not at random (MNAR). One way to overcome this problem—the trimmed means approach for missing data due to study discontinuation—sets missing values as the worst observed outcome and then trims away a fraction of the distribution from each treatment arm before calculating differences in treatment efficacy (Permutt T, Li F. Trimmed means for symptom trials with dropouts. Pharm Stat. 2017;16(1):20–28). In this paper, we derive sufficient and necessary conditions for when this approach can identify the average population treatment effect. Simulation studies show the trimmed means approach’s ability to effectively estimate treatment efficacy when data are MNAR and missingness due to study discontinuation is strongly associated with an unfavorable outcome, but trimmed means fail when data are missing at random. If the reasons for study discontinuation in a clinical trial are known, analysts can improve estimates with a combination of multiple imputation and the trimmed means approach when the assumptions of each hold. We compare the methodology to existing approaches using data from a clinical trial for chronic pain. An R package trim implements the method. When the assumptions are justifiable, using trimmed means can help identify treatment effects notwithstanding MNAR data.
Keywords: clinical trials, estimands, missing data, trimmed means
1 |. INTRODUCTION
Some clinical trialists have proposed using a new method for dealing with missing values due to study discontinuation (dropout) that utilizes the one-sided trimmed mean.1–5 This procedure is straightforward to implement in a comparative trial with a continuous outcome that may be missing due to dropout. First, address the missing data problem by imputing any missing outcomes due to dropout as the worst observed outcome in that treatment arm. Then, calculate the difference in X% one-sided trimmed means between the two arms. The percentage trimmed X% from the lower tail of the distribution can be pre-specified, but must be at least the maximum percentage of dropout observed between the two arms. The challenging aspect of this novel approach for handling missing data due to dropout lies in the interpretation of the resulting estimate. Especially given the recent attention of the pharmaceutical community in first carefully defining a trial’s estimand,6 for any new statistical approach it is imperative to pose the question: “Can the method target the estimand of interest?”
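The two steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the toy data, and the convention that higher outcomes are better are all assumptions made for the example. Ranking dropouts below every observed value gives the same trimmed mean as imputing the worst observed value, provided all dropouts are trimmed.

```python
def one_sided_trimmed_mean(observed, n_randomized, trim_fraction):
    """Trimmed mean of one arm, assuming higher outcomes are better.

    Dropouts (n_randomized - len(observed)) are ranked as the worst outcomes,
    so they are the first patients removed by the lower-tail trimming.
    """
    n_dropout = n_randomized - len(observed)
    n_trim = round(trim_fraction * n_randomized)  # patients trimmed in this arm
    if n_trim < n_dropout:
        raise ValueError("trim_fraction must be at least the dropout fraction")
    if n_trim >= n_randomized:
        raise ValueError("trim_fraction must leave at least one patient")
    # Dropouts go first; any remaining trimming removes the worst observed values.
    kept = sorted(observed)[n_trim - n_dropout:]
    return sum(kept) / len(kept)


# Hypothetical toy data: 10 patients randomized per arm.
experimental = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # 1 dropout
reference = [0, 1, 2, 3, 4, 5, 6]            # 3 dropouts
n_per_arm = 10

# Trim at least the larger of the two dropout fractions (here 3/10).
trim_fraction = max(n_per_arm - len(experimental),
                    n_per_arm - len(reference)) / n_per_arm

effect = (one_sided_trimmed_mean(experimental, n_per_arm, trim_fraction)
          - one_sided_trimmed_mean(reference, n_per_arm, trim_fraction))
print(effect)
```

With the same fraction trimmed in both arms, the reference arm loses only its three dropouts, while the experimental arm loses its one dropout plus its two worst observed outcomes.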
The ICH E9 (R1) addendum emphasizes the importance of addressing intercurrent events (ICEs) when defining an estimand of interest. Examples of ICEs are treatment discontinuation, use of rescue medication, and death, but not missing data (such as dropout). Although dropout itself is not an ICE, in most clinical trial settings patients who discontinue from the study also discontinue treatment, which is an ICE. Additionally, the reason for study discontinuation is often known in well-conducted clinical trials, which allows us to refine the ICE of treatment discontinuation coinciding with dropout (e.g., due to an adverse event [AE], lack of efficacy, or administrative reasons). The guidelines describe five different strategies to take ICEs into account: treatment-policy, composite, hypothetical, principal stratum, and while-on-treatment.
Wang et al.2 point out that the trimmed means approach targets an estimand that is most similar to a composite estimand. Treatment discontinuation (coinciding with dropout) is incorporated in the variable definition—that is, the variable value for patients experiencing this ICE is set to the lowest observed value in their treatment arm. They concur with Permutt and Li1 in their original interpretation of this estimand, “The treatment effect estimated using this approach can be interpreted as the treatment effect among the X% best patients in each arm.” This interpretation may appear to suggest that the method estimates a principal stratum type estimand; however, this is not the case because the sub-population in the above interpretation is not defined by patients in a particular stratum of potential outcomes. The interpretation may also be reminiscent of per-protocol set (PPS) analyses where different sub-populations are compared for two treatment arms, and consequently PPS analyses do not describe a causal effect. However, the situation is different for the trimmed mean as the average is calculated for the same variable that is also used to define the subset. Hence as discussed in Section 2, a causal interpretation of the trimmed mean as a comparison of two marginal quantities is possible.
While closely intertwined, estimands and statistical methodologies for handling missing data are two distinct considerations.7 The estimand defines “what” treatment effect is to be estimated, while missing data methodology describes “how” to estimate that estimand. Because of this, a particular statistical method for dealing with missing data due to dropout is not automatically aligned with a particular estimand. Rather, the veracity of the assumptions of that missing data methodology determines what estimand is targeted; and inherently, no method for handling missing values is assumption-free. Therefore, there is no “trimmed means estimand” per se. Instead, given the assumption that one has the ability to rank outcomes, the approach can define the “best” X% outcomes in each arm of the study and compare the treatment effect between these patients. It may be difficult for clinicians to interpret such a unique estimand, as compared to the more familiar average treatment effect for all patients with a particular indication. By using a different set of assumptions, this paper extends the scope of estimands that are estimable using the trimmed means approach for handling missing data due to dropout. Besides missing data due to dropout, other types of missing data may occur (e.g., due to a missed visit or an inadequately handled blood sample). However, a Missing at Random (MAR) assumption for such types of missing data is usually uncontroversial, and such data are easily addressed by multiple imputation (MI) approaches. Therefore, such missing data will not be the main focus of this article.
Herein, we formalize assumptions that allow the trimmed means methodology to estimate the average treatment effect among all randomized patients. In the process, we demonstrate how the method can be an invaluable tool for estimation when data are missing not at random (MNAR). The rest of the paper proceeds as follows: Section 2 discusses the original target estimand and provides proofs for sufficient and necessary conditions under which the trimmed means approach can identify the population average treatment effect; Section 3 describes missing data due to dropout in the context of the trimmed means approach and extends its use to a combination with MI under MAR; Section 4 evaluates the finite sample properties of the approach in a simulation study under various missing data generating mechanisms for dropout; Section 5 provides recommendations on how to apply the approach in the context of a randomized clinical trial for comparing chronic pain medications; Section 6 concludes the article with a discussion.
2 |. TARGET ESTIMANDS
2.1 |. The original target estimand
We begin by formalizing the original composite target estimand of the trimmed means approach. We utilize the potential outcomes framework,8–12 which provides a clear mathematical way to define the variety of estimands described in the ICH E9 (R1).13 A potential outcome refers to the outcome a patient would have if he/she had taken a particular treatment. We will use the terms potential outcome and counterfactual interchangeably, as is often common practice.
Within this framework, missing data are not considered relevant when defining the causal estimand of interest; however, dropout (although not an ICE) coincides in almost all clinical trial settings with treatment discontinuation (an ICE). Herein, we consider treatment discontinuation coinciding with dropout as the only relevant ICE. This also seems to be the view of Permutt and Li1: “the patient will discontinue the study drug and try something else.” Under this view, treatment discontinuation for patients who would not drop out would be seen as a different ICE (for which, e.g., a treatment-policy strategy may be used). Alternatively, one could consider treatment discontinuations as the same type of ICE, regardless of whether they coincide with a dropout or not. Although this would not correspond to the original approach, the results below would still hold.
Consider counterfactual outcomes Y(a) where a ∈ {0,1} indicates potential treatment assignments. Next, A ∈ {0,1} is the binary indicator for observed treatment in the trial. Let R(a) denote a binary counterfactual indicating that taking treatment a would not result in the ICE of treatment discontinuation (coinciding with dropout). Denote U(a) as a composite counterfactual defined as:
$$U(a) = \begin{cases} Y(a) & \text{if } R(a) = 1, \\ -\infty & \text{if } R(a) = 0. \end{cases}$$
The above corresponds to low values representing unfavorable outcomes, and is defined for the entire population of interest. If high values of the outcome reflected unfavorable values, then the value of the composite variable would be set to +∞ when R(a) = 0. Note that this composite counterfactual implicitly makes the sole assumption of the trimmed means approach in its original presentation: that a patient with the ICE in the trial has a worse outcome than any of the observed data. One need only be able to rank outcomes, and no assumptions need to be made about the underlying counterfactual values Y(a). Using this counterfactual notation, the original target estimand of the trimmed means approach is:
$$E\left[U(1) \mid U(1) > F^{-1}_{U(1)}(\alpha)\right] - E\left[U(0) \mid U(0) > F^{-1}_{U(0)}(\alpha)\right].$$
Above, $F^{-1}_{U(a)}(\alpha)$ represents the inverse cdf of the counterfactual distribution of U(a) evaluated at α. As the two sub-populations $\{U(1) > F^{-1}_{U(1)}(\alpha)\}$ and $\{U(0) > F^{-1}_{U(0)}(\alpha)\}$ are typically distinct, this may superficially give the impression that this estimand is not causal. However, this estimand may also be written as the contrast of two marginal quantities, which shows that it in fact has a causal interpretation:
$$T_\alpha\left(F_{U(1)}\right) - T_\alpha\left(F_{U(0)}\right),$$
where we define the statistical functional $T_\alpha(F) = \frac{1}{1-\alpha}\int_\alpha^1 F^{-1}(u)\,du$ for a cdf F; see also Permutt and Li.1 Compared to a simple average, this composite estimand may be more challenging to interpret for clinicians and patients. Additionally, the above contrast actually represents an infinite set of estimands depending on the choice of trimming quantile α. Other robust population summary measures could also be considered, such as the median, a specific quantile, or the two-sided trimmed mean.
2.2 |. Equivalence to the population treatment effect
Herein, we reframe the trimmed means approach in order to extend the various types of target estimands estimable using the novel methodology. The estimand for average treatment efficacy in the population from which all randomized patients are drawn is the difference in counterfactual means E[Y(1)] – E[Y(0)]. In the following theorem, the sufficient conditions under which the original trimmed means estimand and the average treatment effect in the population are identical are formalized. Again the ICE of interest, R(a), is treatment discontinuation coinciding with dropout, which would result in unobserved counterfactual outcomes Y(a).
Theorem 1. If the counterfactual outcomes when treated and untreated are identically distributed relative to a shift and all counterfactual outcomes with the ICE fall below the trimming quantile, then the treatment difference estimated by the trimmed means approach is equivalent to the treatment difference in the population, that is:
$$E\left[U(1) \mid U(1) > F^{-1}_{U(1)}(\alpha)\right] - E\left[U(0) \mid U(0) > F^{-1}_{U(0)}(\alpha)\right] = E[Y(1)] - E[Y(0)].$$
The proof is given in the Appendix.
Theorem 1 uses the following two conditions in order to prove the equality of the original estimand targeted using the trimmed means approach and treatment difference in the whole population.
1. Location family.
The distribution of potential outcomes had the patient taken the experimental treatment Y(1) ~ f1(y) is in the same location family as the distribution of potential outcomes had the patient taken the reference treatment Y(0) ~ f0(y). That is, for some constant Δ:
$$f_1(y) = f_0(y - \Delta).$$
2. Quantile Missing Not At Random (QMNAR).
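All counterfactuals with the ICE fall below the quantile at which the distributions are trimmed. Explicitly, the QMNAR assumption states:
$$P\left(Y(a) < F^{-1}_{Y(a)}(\alpha) \mid R(a) = 0\right) = 1 \quad \text{for all } a \in \{0,1\}.$$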
Here $F^{-1}_{Y(a)}(\alpha)$ is the inverse cdf of the counterfactual distribution of Y(a) evaluated at α. The QMNAR assumption ensures that the distribution of the composite counterfactual outcome U(a) is identical to that of Y(a) for all quantiles above α. The QMNAR assumption does not state that patients with counterfactual values below the quantile all deterministically have the ICE; there very well may be patients with values below this quantile and no ICE, and the theorem will still hold. Nor does it state that the counterfactual values of those with the ICE are necessarily the worst outcomes of the distribution. Note that, using Theorem 1, only the difference in the counterfactual means can be recovered; one cannot accurately estimate the marginal means of Y(1) and Y(0) in the presence of MNAR data due to dropout.
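A short sketch of why the two conditions suffice may help the reader; the formal proof is in the Appendix. The symbol $T_\alpha$ for the one-sided trimmed mean functional is notation assumed for this sketch, and the location family condition is written as $Y(1)$ having the distribution of $Y(0) + \Delta$.

```latex
\begin{align*}
% One-sided trimmed mean: average of the quantile function above alpha
T_\alpha(F) &= \frac{1}{1-\alpha}\int_\alpha^1 F^{-1}(u)\,du \\[4pt]
% QMNAR: U(a) and Y(a) share all quantiles above alpha
F^{-1}_{U(a)}(u) = F^{-1}_{Y(a)}(u) \ \text{for } u > \alpha
  \quad&\Longrightarrow\quad T_\alpha\!\left(F_{U(a)}\right) = T_\alpha\!\left(F_{Y(a)}\right) \\[4pt]
% Location family: every quantile shifts by the same constant
Y(1) \overset{d}{=} Y(0) + \Delta
  \quad&\Longrightarrow\quad F^{-1}_{Y(1)}(u) = F^{-1}_{Y(0)}(u) + \Delta \\[4pt]
% Combining the two steps
T_\alpha\!\left(F_{U(1)}\right) - T_\alpha\!\left(F_{U(0)}\right)
  &= \frac{1}{1-\alpha}\int_\alpha^1 \Delta\,du
   = \Delta
   = E[Y(1)] - E[Y(0)]
\end{align*}
```

The trimming discards the same constant shift from neither arm, which is why the difference survives even though each marginal trimmed mean differs from the corresponding untrimmed mean.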
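Theorem 2. If the difference in untrimmed means between two distributions is equivalent to the difference in one-sided trimmed means for all percentiles, then the two distributions are a location shift of one another; that is, if
$$E[Y(1)] - E[Y(0)] = E\left[Y(1) \mid Y(1) > F^{-1}_{Y(1)}(\alpha)\right] - E\left[Y(0) \mid Y(0) > F^{-1}_{Y(0)}(\alpha)\right] \quad \text{for all } \alpha \in (0,1),$$
then, for some constant Δ,
$$f_1(y) = f_0(y - \Delta).$$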
The proof is given in the Appendix.
Theorem 2 reveals that the location family assumption is a sufficient and necessary condition for the equivalence of the estimands, while the QMNAR assumption is only a sufficient condition. For all possible α, the difference in trimmed means and the population mean are equivalent if and only if the distributions being compared are a location shift of one another. There are conditions where the difference in trimmed means and population means are equivalent when the QMNAR assumption is not true (i.e., the Missing Completely at Random [MCAR] case). Theorem 2 demonstrates that using the trimmed means approach to estimate the population average treatment effect is relevant solely for treatments with an additive effect.
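The role of the location family assumption can be checked numerically. The sketch below is an illustration under assumed normal distributions (not taken from the paper): it approximates the one-sided trimmed mean by averaging the quantile function over (α, 1), with higher values treated as better, and compares a pure location shift with a distribution that has the same mean difference but a different spread.

```python
from statistics import NormalDist


def trimmed_mean(dist, alpha, grid=5000):
    """One-sided trimmed mean: average of the quantile function over (alpha, 1),
    approximated with a midpoint rule on `grid` points."""
    step = (1.0 - alpha) / grid
    total = sum(dist.inv_cdf(alpha + (i + 0.5) * step) for i in range(grid))
    return total / grid


reference = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=1.0, sigma=1.0)    # location shift: mean difference 1
rescaled = NormalDist(mu=1.0, sigma=2.0)   # same mean difference, larger spread

results = {}
for alpha in (0.1, 0.3, 0.5):
    base = trimmed_mean(reference, alpha)
    results[alpha] = (trimmed_mean(shifted, alpha) - base,
                      trimmed_mean(rescaled, alpha) - base)
    print(alpha, results[alpha])
```

For the shifted distribution the trimmed difference stays at the mean difference for every α, whereas for the rescaled distribution it changes with α and overstates the mean difference.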
2.3 |. Intercurrent events
Theorem 1 extends the utility of the trimmed means approach by proving under what assumptions one can estimate the estimand representing the average treatment effect. By doing so, we demonstrate that this methodology is not aligned with a particular estimand. Rather, the approach can estimate different estimands depending on the assumptions one is willing to make about the potential outcomes characterizing the estimand of interest.
We discuss here some estimands—beyond the original composite one—that the trimmed means approach could estimate corresponding to different strategies for handling ICEs: treatment policy, hypothetical, and principal stratum. Again, the only relevant ICE considered here is treatment discontinuation coinciding with dropout. Note that the binary counterfactual R(a) is a mediator on the causal pathway from A to Y(a). Therefore, we discuss counterfactuals such as Y(a, R(a)), the counterfactual when a patient takes treatment a and then R(a) occurs downstream as a result. This is, however, different from counterfactual Y(a = 1, r = 0), which denotes the experimental treatment was given (i.e., a = 1) and the ICE was intervened on and prevented (i.e., r = 0).
2.3.1 |. Treatment policy
In the treatment policy strategy, the ICE is irrelevant to the treatment effect of interest. To see this, we formulate the treatment policy estimand (ΔITT) in counterfactual notation:
$$\Delta_{ITT} = E[Y(a = 1, R(a = 1))] - E[Y(a = 0, R(a = 0))] = E[Y(1)] - E[Y(0)].$$
Above, the ICE, R(a), is not intervened on, which allows us to collapse Y(a, R(a)) = Y(a). This estimand represents the total effect of treatment.14
Because the ICE is considered irrelevant to this estimand, endpoint data are always of interest, even after an ICE. When these outcomes are missing due to dropout, it is still possible to identify the treatment policy estimand of interest using trimmed means if the assumptions of Theorem 1 hold, that is, if the distributions of Y(a = 1) and Y(a = 0) are a location shift of one another and the QMNAR assumption holds for all a ∈ {0,1}. In the treatment policy or Intention To Treat (ITT) context, the QMNAR assumption may be plausible, especially if the patient’s clinical outcome measurement moves back towards baseline after treatment discontinuation. This is often the case with symptomatic treatments, for example.
2.3.2 |. Hypothetical
The hypothetical strategy for dealing with the ICE postulates what would have happened had the ICE been different in some way from what was observed in the trial. An example of one such hypothetical target estimand considers what would have happened had the ICE not occurred and the patient remained on assigned treatment for the duration of the trial. In this case, even if post-ICE data are collected, they are discarded. This hypothetical estimand in counterfactual notation is:
$$E[Y(a = 1, r = 0)] - E[Y(a = 0, r = 0)].$$
If the counterfactual distributions Y(a = 1, r = 0) and Y(a = 0, r = 0) are a location shift of one another and the QMNAR assumption holds for these counterfactuals for all a ∈ {0,1}, then by Theorem 1 the trimmed means approach can estimate this hypothetical estimand using the observed data. In the hypothetical estimand corresponding to full treatment adherence, the location family assumption seems plausible when the treatment has an additive effect.
2.3.3 |. Principal stratum
A principal stratum type estimand estimates the effect of treatment in a particular stratum of potential intercurrent outcomes. For instance, the principal stratum estimand for the treatment effect in those patients who would adhere to the experimental treatment if assigned to it is:
$$E[Y(a = 1) \mid R(a = 1) = 1] - E[Y(a = 0) \mid R(a = 1) = 1].$$
Again, should the assumptions of Theorem 1 hold for these potential outcomes of interest, that is, Y(a = 1) | R(a = 1) = 1 and Y(a = 0) | R(a = 1) = 1 are identically distributed relative to a shift and the QMNAR assumption holds for all a ∈ {0,1}, then we can identify this principal stratum estimand using the trimmed means. Missing data due to dropout aside, principal stratum estimands come with their own set of untestable assumptions.15
3 |. COMBINING TRIMMED MEANS WITH MI
We now turn our attention to the observed data. In well-conducted clinical trials, the reason for study discontinuation (and hence treatment discontinuation) is collected for each patient who drops out of the study. Treating all types of treatment discontinuation (coinciding with dropout) as poor outcomes and ranking them at the low end of the distribution, as the trimmed means approach does, may not be an appropriate strategy for all missing data. Knowing the reason for dropping out of a study, and using that information, should lead to more precise analyses. To that end, consider an expanded missing data indicator that records, for each patient, whether the outcome is observed or is missing under an MCAR, MAR, or MNAR mechanism.
Assume the complete data of Y are partitioned into observed and missing components as $Y = (Y_{obs}, Y_{MAR}, Y_{MNAR})$, which denote the observed, MAR, and MNAR components of Y, respectively.
Here, we propose imputing $Y_{MAR}$ and trimming $Y_{MNAR}$. We can perform MI of $Y_{MAR}$ when the conditional distribution $f(Y_{MAR} \mid Y_{obs}, A, X)$ is a valid imputation model given the MAR assumption. From here onward, when we use the shorthand MI we are referring to this most commonly applied MI that assumes MAR. Here $Y_{obs}$ denotes the observed outcomes, A is the assigned treatment, and X is a matrix of auxiliary covariates that may or may not be available and of use for imputations. Using this conditional distribution, one can draw k samples for the MAR and MCAR data to derive a set of data that is now complete for $(Y_{obs}, Y_{MAR})$, where the missing values $Y_{MNAR}$ remain. Let $\hat{\theta}$ denote the trimmed means statistic given that complete data on $(Y_{obs}, Y_{MAR})$ were available. Note that we do not need to observe $Y_{MNAR}$ since the trimmed means approach will trim these observations out of the analysis. MI relies on the asymptotically normal distribution of $\hat{\theta}$, which applies to the trimmed mean in the one-sided case.16 Since data on $Y_{MAR}$ are missing, the imputed data are utilized to calculate trimmed means estimates of the form $\hat{\theta}^{(\ell)}$, ℓ = 1, …, k, for the k imputed datasets. Lastly, we apply Rubin’s rules17 to summarize the results of the trimmed means applied to each partially imputed dataset,
$$\bar{\theta} = \frac{1}{k}\sum_{\ell=1}^{k}\hat{\theta}^{(\ell)}, \qquad \widehat{\mathrm{Var}}(\bar{\theta}) = \frac{1}{k}\sum_{\ell=1}^{k}\left(\sigma^{(\ell)}\right)^2 + \left(1 + \frac{1}{k}\right)\frac{1}{k-1}\sum_{\ell=1}^{k}\left(\hat{\theta}^{(\ell)} - \bar{\theta}\right)^2,$$
where σ(ℓ) is the estimated standard error of the trimmed means estimate in the ℓth imputed dataset.
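The pooling step can be sketched as follows. This is the standard form of Rubin's rules for a scalar estimate; the function name and the example numbers are hypothetical, and the inputs are assumed to be the trimmed means estimates and their standard errors from k imputed datasets.

```python
import math


def rubin_pool(estimates, std_errors):
    """Pool k per-imputation estimates and standard errors with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need k >= 2 estimates with matching standard errors")
    pooled = sum(estimates) / k
    within = sum(se ** 2 for se in std_errors) / k                      # average within-imputation variance
    between = sum((est - pooled) ** 2 for est in estimates) / (k - 1)   # between-imputation variance
    total_variance = within + (1 + 1 / k) * between
    return pooled, math.sqrt(total_variance)


# Hypothetical trimmed means estimates from k = 5 imputed datasets.
estimates = [-1.02, -0.95, -1.08, -0.99, -1.01]
std_errors = [0.30, 0.31, 0.29, 0.30, 0.30]
pooled_estimate, pooled_se = rubin_pool(estimates, std_errors)
print(pooled_estimate, pooled_se)
```

The between-imputation term is what propagates the extra uncertainty from imputing the MAR component into the final standard error.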
This combination approach would not be possible without Theorem 1 because it demonstrates that trimmed means and imputation can estimate the same estimand, albeit under different assumptions about the missing data generating mechanism. Therefore, the combination of MI and trimmed means is only valid if the conditions of Theorem 1 hold.
4 |. SIMULATION STUDIES
4.1 |. Simulation objectives
Numerical studies herein evaluate the finite sample properties of the trimmed means approach in estimating treatment efficacy under various missing data generating mechanisms. The simulation presented is motivated by the design of Wang et al.2 This earlier work is extended in a number of ways. Firstly, a comparison to MI as well as Jump to Reference imputation under various missing data generating mechanisms is explored. For MI, data are imputed by fully conditional specification using the mice package in R.18 We chose to implement the Jump to Reference approach of Carpenter et al.19 due to its popularity, but acknowledge there exists a more powerful alternative.20 Additionally, when there exists a mixture of missing data types the combination approach outlined in Section 3 is evaluated and compared to applying the trimmed means approach and MI globally. Furthermore, the relationship between bias and consistency with the QMNAR assumption of Theorem 1 is considered under different MNAR scenarios. Lastly, we explore sensitivity to the location family assumption.
4.2 |. Simulation design
We design the study using four different ways to generate the missing data: (a) MCAR, (b) MAR, (c) MNAR, and (d) a mixture of all three types.
We use a study sample size of N = 100 (n = 50 per treatment arm) in each of the four scenarios. Each scenario was replicated K = 5000 times. The proportion trimmed (α) was chosen adaptively, unless stated otherwise. The upper part of the distribution is trimmed, corresponding to lower values reflecting better outcomes. The underlying model for the continuous outcome remains the same in all simulations:
$$Y = \beta_0 + \beta_A A + \epsilon.$$
Y is the continuous outcome variable and A is the binary variable representing experimental treatment if 1 and reference treatment if 0. Here, the error term is normally distributed, ϵ ~ N(0, σ²). The goal is to estimate the difference of the means between treatments, E[Y(1)] – E[Y(0)], which is equal to βA for the outcome model used for simulation. In all scenarios, the values of the parameters for the outcome model are β0 = −1, βA = −1, σ = 1.5. We chose σ = 1.5 to obtain a benchmark of approximately 90% power in a one-sided t-test when there are no missing data.
The missing data in outcome Y were generated via the following logit model:
$$\operatorname{logit} P(R_Y = 1 \mid A, Y) = \gamma_0 + \gamma_A A + \gamma_Y Y,$$
where RY is the binary variable indicating that Y has been observed if equal to 1. In this model, setting parameters γA = γY = 0 corresponds to MCAR because the missing values are unrelated to treatment or outcome. Setting only parameter γY = 0 corresponds to MAR because the missing values are only dependent on the observed values and not the unobserved outcome. If γY ≠ 0 then the model represents an MNAR missing data generating mechanism.
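As a quick check of how the logit parameters map to missing data rates, the sketch below evaluates the probability of a missing outcome for the γ values reported for the MCAR and MAR settings in Sections 4.3.2 and 4.3.3. The helper names are assumptions made for the illustration.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def missing_probability(a, y, gamma_0, gamma_a, gamma_y):
    """P(R_Y = 0 | A = a, Y = y) when the logit model describes P(R_Y = 1)."""
    return 1.0 - expit(gamma_0 + gamma_a * a + gamma_y * y)


# MCAR: gamma_A = gamma_Y = 0, so the rate depends on gamma_0 only.
mcar_rates = [missing_probability(0, 0.0, g0, 0.0, 0.0)
              for g0 in (2.94, 2.20, 1.73, 1.39)]

# MAR in the experimental arm: gamma_0 = 10 keeps the reference arm fully observed.
mar_rates = [missing_probability(1, 0.0, 10.0, ga, 0.0)
             for ga in (-7.06, -7.80, -8.27, -8.61)]

print([round(r, 2) for r in mcar_rates])
print([round(r, 2) for r in mar_rates])
```

Both lists round to missing rates of 5%, 10%, 15%, and 20%, in line with the rates stated for those settings.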
4.3 |. Simulation results
4.3.1 |. Missing not at random
Given the theoretical results of Section 2, our main focus lies in the MNAR situation, which we present first. For this scenario, the γY parameter was set to values of −1, −2.5, −5, and −7.5, causing higher values of Y to be more likely to be missing, while keeping γ0 = 2.85 and γA = 0. Here, γY is negative because a decrease in Y reflects a better outcome. This setup induces missing data rates in the experimental versus reference arms of 2% versus 5%, 3% versus 10%, 5% versus 15%, and 7% versus 20%, respectively. These missing data are not simulated under a strict QMNAR mechanism but under a general MNAR missing data mechanism.
The trimmed means approach is nearly unbiased (0% to 3% bias), attains coverage of 0.96, and maintains power of about 0.90 in the MNAR setup as the amount of missing data increases. While the marginal means in each arm are biased, they shift at similar rates in the two arms, resulting in an accurate estimate of their difference. MI becomes increasingly biased (6% to 30%), loses coverage of the true effect, and loses power as the fraction of MNAR data increases. Jump to Reference is even more biased towards the null than MI, as it is a conservative approach (Table 1).
TABLE 1.
MNAR simulation results
Method | Missing rate, A = 1 (%) | Missing rate, A = 0 (%) | Exp | Ref | Diff (% bias) | Coverage | Power |
---|---|---|---|---|---|---|---|
Trimmed means | 2 | 5 | −2.14 | −1.11 | −1.03 (3%) | 0.96 | 0.90 |
Trimmed means | 3 | 10 | −2.27 | −1.26 | −1.01 (1%) | 0.96 | 0.90 |
Trimmed means | 5 | 15 | −2.41 | −1.41 | −1.00 (0%) | 0.96 | 0.89 |
Trimmed means | 7 | 20 | −2.48 | −1.48 | −1.00 (0%) | 0.96 | 0.90 |
Multiple imputation | 2 | 5 | −2.04 | −1.10 | −0.94 (6%) | 0.94 | 0.88 |
Multiple imputation | 3 | 10 | −2.09 | −1.26 | −0.83 (17%) | 0.90 | 0.83 |
Multiple imputation | 5 | 15 | −2.15 | −1.41 | −0.74 (26%) | 0.83 | 0.76 |
Multiple imputation | 7 | 20 | −2.18 | −1.48 | −0.70 (30%) | 0.80 | 0.74 |
Jump to reference | 2 | 5 | −1.99 | −1.13 | −0.92 (8%) | 0.95 | 0.87 |
Jump to reference | 3 | 10 | −2.06 | −1.26 | −0.80 (20%) | 0.90 | 0.81 |
Jump to reference | 5 | 15 | −2.12 | −1.41 | −0.71 (29%) | 0.82 | 0.73 |
Jump to reference | 7 | 20 | −2.15 | −1.48 | −0.67 (33%) | 0.78 | 0.70 |
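The MNAR behavior summarized in Table 1 can be illustrated with a single large simulated trial. The sketch below is not the authors' simulation code: it uses one replicate with a very large sample instead of 5000 replicates of N = 100, a complete-case difference in place of MI, and an assumed seed. The outcome and missingness parameters are those stated in Sections 4.2 and 4.3.1 for γY = −2.5.

```python
import math
import random


def simulate_arm(rng, n, mean, sd, gamma_0, gamma_y):
    """Return the observed outcomes of one arm; unobserved ones are MNAR dropouts."""
    observed = []
    for _ in range(n):
        y = rng.gauss(mean, sd)
        p_observed = 1.0 / (1.0 + math.exp(-(gamma_0 + gamma_y * y)))
        if rng.random() < p_observed:
            observed.append(y)
    return observed


def upper_trimmed_mean(observed, n, n_trim):
    """Rank dropouts as the worst (highest) outcomes, trim n_trim patients from
    the top, and average the remaining best (lowest) outcomes."""
    keep = n - n_trim
    return sum(sorted(observed)[:keep]) / keep


rng = random.Random(2021)           # arbitrary seed, assumed for reproducibility
n = 100_000                         # per arm; large to suppress sampling noise
exp_obs = simulate_arm(rng, n, -2.0, 1.5, 2.85, -2.5)   # beta_0 + beta_A = -2
ref_obs = simulate_arm(rng, n, -1.0, 1.5, 2.85, -2.5)   # beta_0 = -1

# Adaptive trimming: remove as many patients as the larger dropout count.
n_trim = max(n - len(exp_obs), n - len(ref_obs))

trimmed_diff = (upper_trimmed_mean(exp_obs, n, n_trim)
                - upper_trimmed_mean(ref_obs, n, n_trim))
complete_case_diff = sum(exp_obs) / len(exp_obs) - sum(ref_obs) / len(ref_obs)
print(trimmed_diff, complete_case_diff)
```

The true difference is −1. In this setup the trimmed difference should land close to it, while the complete-case difference is pulled towards the null because more unfavorable outcomes are removed from the reference arm than from the experimental arm.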
4.3.2 |. Missing at random
In the MAR setting, we first set the γA parameter to values of −8.61, −8.27, −7.80, and −7.06 to induce missing data rates of 20%, 15%, 10%, and 5% in the experimental arm while keeping γY = 0 and γ0 = 10 in order to maintain all outcomes observed in the reference arm. Next, we set γY = 0 and γA = 10 in order to fully observe outcomes in the experimental arm while varying γ0 to values of 2.94, 2.20, 1.73, and 1.39 to induce missing data rates of 5%, 10%, 15%, and 20% in the reference arm.
As expected, the trimmed means estimator is biased in all scenarios when the missing data are truly MAR. The bias increases when the fraction of missing data increases. The direction of the bias is positive when the placebo arm has more missing data and negative when the active arm has more missing data. The direction of the bias has an impact on power, with more MAR data in the active arm leading to a drastic decrease in power and more MAR data in the placebo arm causing unreasonably high power. The trimming is directional: all missing values are placed at the poor end of each respective treatment distribution, when in reality under MAR they come from all areas of the distribution. MI obtains valid estimates in this setting, as it was designed explicitly for situations where data are MAR. Jump to Reference provides valid estimates when MAR data are restricted to the placebo arm, but is biased when MAR data occur in the experimental arm. These results are presented in the Appendix (Table A1).
4.3.3 |. Missing completely at random
In the MCAR setting the γ0 parameter is set to values of 2.94, 2.20, 1.74, and 1.39 to induce missing data rates of 5%, 10%, 15%, and 20% while keeping γA = γY = 0. The missing data rates are the same on average in each arm since unobserved outcomes are completely random.
Under a completely random missing data generating mechanism (MCAR), the trimmed means approach estimates the true treatment difference without bias and with appropriate coverage even as the proportion of data missing varies. As expected, power decreases as the amount of data not trimmed decreases. MI performed similarly in that bias and coverage were accurate. However, as the amount of missing data increases, power does not deteriorate as quickly with MI as with trimmed means. This is because the trimmed means approach essentially performs inference on the subset of the observations post-trimming and thus uses a smaller effective sample size. Jump to Reference is biased in all scenarios, as it incorrectly imputes the missing data in the experimental arm under the placebo distribution. Results can be found in the Appendix (Table A2).
4.3.4 |. Mixture: MCAR, MAR, and MNAR
Having a mixture of reasons for missing data reflects the information one would have in a closely monitored clinical trial. In many trials, data are missing for a combination of reasons such as lack of efficacy, intolerability, and administrative reasons. In order to generate such data, the deletion strategies used in the previous three sections are combined. MNAR data (R3) were deleted first at rates of 2% versus 5%, 3% versus 10%, 5% versus 15%, and 7% versus 20% in the experimental versus reference arms, respectively. MAR data (R2) were then generated in the experimental group at rates of 23%, 17%, 10%, and 3%. MCAR data (R1) were generated at a rate of 5% in each arm. Overall, the missing data rates in the four mixture scenarios in the experimental versus reference arms are 30% versus 10%, 25% versus 15%, 20% versus 20%, and 15% versus 25%, respectively (Table 2).
In this mixture setting, the combination of trimmed means and MI exhibits improved bias, coverage, and power compared to each method applied individually, regardless of the fractions of MAR and MNAR data. Coverage is always near the target of 0.95. The slight reduction in power from the optimal 0.90 is proportional to the amount of MAR data in each scenario because additional uncertainty is propagated from the imputations. Across the four scenarios, the observed absolute percent bias for the combination of MI and trimmed means was lower, at 4%, 0%, 2%, and 3% (Table 2), versus 52%, 40%, 22%, and 7% for trimmed means alone (Table A3) and 6%, 18%, 27%, and 30% for MI alone (Table A4). As the fraction of MNAR data increases across the four scenarios, the bias, coverage, and power of trimmed means applied globally improve. Conversely, MI applied globally performs well with a large fraction of MAR data, and its performance weakens as the proportion of MNAR data increases.
TABLE 2.
Trimmed means + MI with a mixture of missing data types
R1, R2, R3, and Overall are missing rates in % (MCAR, MAR, MNAR, and total); operating characteristics are reported once per scenario, on the A = 1 row.
Trt | R1 | R2 | R3 | Overall | Exp | Ref | Diff (% bias) | Coverage | Power |
---|---|---|---|---|---|---|---|---|---|
A = 1 | 5 | 23 | 2 | 30 | −2.15 | −1.11 | −1.04 (4%) | 0.95 | 0.83 |
A = 0 | 5 | 0 | 5 | 10 | | | | | |
A = 1 | 5 | 17 | 3 | 25 | −2.27 | −1.26 | −1.00 (0%) | 0.95 | 0.84 |
A = 0 | 5 | 0 | 10 | 15 | | | | | |
A = 1 | 5 | 10 | 5 | 20 | −2.40 | −1.41 | −0.98 (2%) | 0.95 | 0.85 |
A = 0 | 5 | 0 | 15 | 20 | | | | | |
A = 1 | 5 | 3 | 7 | 15 | −2.45 | −1.48 | −0.97 (3%) | 0.95 | 0.86 |
A = 0 | 5 | 0 | 20 | 25 | | | | | |
4.3.5 |. Sensitivity to assumptions of Theorem 1
We explored sensitivity to the assumptions of Theorem 1 (i.e., QMNAR and location family). The results are presented in the Appendix. In brief, they are consistent with the theory presented in Section 2. That is, as the percentage of missing data truly falling below the trimming quantile increases, the estimator becomes less biased (Table A5). For the simulations regarding the QMNAR assumption, we also varied the choice of α. This reveals a bias-variance trade-off: trimming more data leads to a higher percentage of missing values that fall below the trim point, but also to a reduction in efficiency. The estimator is fairly robust to slight deviations from the location family assumption; however, the more dissimilar the treated and untreated distributions, the more the performance deteriorates (Tables A6 and A8).
5 |. APPLICATION TO A CLINICAL TRIAL
We applied the methodologies described above to data from a double-blind randomized clinical trial of two treatments (A and B) conducted in patients with neuropathic pain due to diabetic neuropathy. Seventy-one patients were randomized to treatment A and 70 to treatment B. The outcome of interest was change in pain severity from baseline to week 16, as assessed on a Visual Analog Scale (VAS). VAS is a well-studied instrument for recording pain where a score of 100 reflects the “worst pain possible” and a score of 0 reflects “no pain”.21 Pain scores were recorded in a digital diary daily by patients. At most, there were 16 weekly pain measurements for each patient, produced by averaging daily pain recordings during each week.
Study dropout, and subsequent treatment discontinuation, was common: 53 (38%) patients did not stay on trial for 16 weeks. Discontinuation differed between treatment arms, with 33 (46%) dropouts in the treatment A arm and 20 (29%) in the treatment B arm. The reason for discontinuing the study was recorded for each dropout and categorized as Adverse Event (AE), Loss of Efficacy (LoE), or Administrative (Table 3). The rates of study discontinuation, the times at which they occurred, and the observed data before dropout were used to inform missing data assumptions. AE and LoE dropouts generally occurred during the first half of the study period, while administrative dropout occurred uniformly throughout the trial. On average, AE and LoE dropouts occurred after 6.77 and 6.83 weeks on trial, respectively, and administrative dropouts after 10.6 weeks. Based on this exploratory data analysis and our clinical knowledge, we assume that the dropouts classified as AE and LoE were MNAR and that administrative dropouts were MCAR when targeting a treatment policy estimand. In addition, no off-treatment data were retrieved for the purposes of this analysis.
TABLE 3.
Treatment discontinuation reasons
Dropout type | Treatment A | Treatment B |
---|---|---|
Adverse events | 18 | 4 |
Loss of efficacy | 3 | 3 |
Administrative | 12 | 14 |
Total | 33 | 20 |
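As a quick arithmetic check, the sketch below computes the largest per-arm dropout rate, which is the smallest trimming fraction that removes every dropout from both arms. The arm sizes and counts come from the text and Table 3; the variable and function names are illustrative only.

```python
# Arm sizes from the text; dropout counts from Table 3.
ARM_SIZE = {"A": 71, "B": 70}
TOTAL_DROPOUT = {"A": 33, "B": 20}
# Dropouts assumed MNAR: adverse events plus loss of efficacy.
MNAR_DROPOUT = {"A": 18 + 3, "B": 4 + 3}


def trimming_fraction(dropouts, arm_size):
    """Largest per-arm dropout rate, i.e. the smallest admissible alpha."""
    return max(dropouts[arm] / arm_size[arm] for arm in arm_size)


alpha_all = trimming_fraction(TOTAL_DROPOUT, ARM_SIZE)
alpha_mnar = trimming_fraction(MNAR_DROPOUT, ARM_SIZE)
print(f"trimming all dropouts:    alpha = {alpha_all:.2f}")   # 0.46
print(f"trimming AE and LoE only: alpha = {alpha_mnar:.2f}")  # 0.30
```

In both cases the treatment A arm, which has the higher dropout rate, determines the fraction trimmed.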
We applied six different methods to the trial data. First, the trimmed means approach was applied globally to all dropouts (i.e., assuming all dropouts are MNAR). The fraction trimmed was chosen adaptively and thus corresponded to the amount of dropout in the treatment A arm (i.e., α = 0.46). To test the location shift assumption, we performed a Kolmogorov–Smirnov test comparing the distribution of treatment A, shifted by the treatment effect, with the observed distribution of treatment B. The test failed to reject the hypothesis that the untrimmed outcome distributions were a location shift of one another (D = 0.0946, p = 0.9849). Note that this test does not confirm the untestable assumption; rather, it demonstrates that there is no strong evidence to the contrary. Second, we applied MI to all dropouts in an Analysis of Variance model, despite the MAR assumption being unlikely for many dropouts. Third, we applied the approach that combines trimmed means and MI. To do this, we trimmed AE and LoE dropouts (MNAR) and imputed administrative dropout data (MCAR). As a consequence, the fraction trimmed was reduced to α = 0.30 in each of the imputed data sets. Lastly, we applied three more approaches for historical reference: a complete case analysis of all patients completing the trial (i.e., assuming all dropouts are MCAR), a Jump to Reference imputation, and a Last Observation Carried Forward (LOCF) analysis.
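The point estimate of the trimmed means approach can be sketched in a few lines. This is a minimal illustration on made-up numbers, not the R package or the trial analysis. The function name, the `higher_is_worse` flag, and the toy data are assumptions. Ranking dropouts behind every completer has the same effect on the estimate as imputing them with the worst observed outcome, provided α is at least the dropout rate of the arm.

```python
import math
from statistics import mean


def trimmed_mean(observed, n_missing, alpha, higher_is_worse=True):
    """One-sided trimmed mean with dropouts ranked as the worst outcomes.

    observed: outcomes of completers in one arm
    n_missing: number of dropouts in that arm
    alpha: fraction of the whole arm trimmed from the unfavorable tail
    """
    n = len(observed) + n_missing
    n_trim = math.ceil(alpha * n - 1e-9)  # tolerance guards against float noise
    if n_trim < n_missing:
        raise ValueError("alpha must be at least the dropout rate of the arm")
    if n_trim >= n:
        raise ValueError("alpha trims away the whole arm")
    # Sort from best to worst; dropouts implicitly sit behind the worst completer.
    ranked = sorted(observed, reverse=not higher_is_worse)
    return mean(ranked[: n - n_trim])


# Hypothetical changes in pain score (negative = improvement), 10 patients per arm.
arm_a = [-30, -20, -10, 0, 5, 10]           # 4 dropouts
arm_b = [-25, -15, -5, 5, 15, 25, 30, 35]   # 2 dropouts
alpha = 0.4                                  # largest dropout rate across arms
diff = trimmed_mean(arm_a, 4, alpha) - trimmed_mean(arm_b, 2, alpha)
print(diff)  # -7.5
```

In arm A all four trimmed patients are dropouts; in arm B the two dropouts and the two worst completers are trimmed, so both arms retain their best 60%.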
Table 4 contains the results of each of these approaches. The trimmed means approach applied to all dropouts showed the largest treatment difference, −14.48 points on the pain VAS; however, the trimming inflated the standard error to 7.61. The combination of trimmed means and MI had the second largest effect size, −12.67. Because the combination approach trimmed less data, its standard error was less inflated. These two methods, which involve trimming, resulted in a larger effect size than the other methods because they account for the fact that the worse performing treatment had a higher dropout rate. The other approaches presented are not appropriate given our missing data assumptions, but were included for illustrative purposes as a reference, especially for their relative standard errors.
TABLE 4.
Clinical trial analysis results
Method | Pain difference | Standard error | 95% CI | p-Value |
---|---|---|---|---|
Trimmed means | −14.48 | 7.61 | [−29.38, 0.43] | 0.055 |
Trimmed means + MI | −12.67 | 6.21 | [−24.83, −0.49] | 0.041 |
Complete case analysis | −3.74 | 5.39 | [−14.45, 6.97] | 0.497 |
Multiple imputation | −3.55 | 4.92 | [−13.20, 6.09] | 0.470 |
Jump to reference | −2.61 | 4.98 | [−12.37, 7.15] | 0.601 |
LOCF | −1.76 | 4.20 | [−10.06, 6.54] | 0.675 |
Focusing on the standard error column, LOCF has the smallest standard error, 4.20. This is expected, since the method does not acknowledge that data are missing: it replaces each patient's missing values with that patient's last available observation, a single imputation. The next smallest standard error, 4.92, belongs to MI, which treats all missing data as MAR, an assumption that may not be plausible for all discontinuation reasons in this trial. Like LOCF, MI creates data wherever they are missing, but it does so multiple times rather than once in order to better reflect the uncertainty induced by the missing data. As Jump to Reference is also an imputation method, it exhibited a similar standard error of 4.98. The complete case analysis has the next smallest value, 5.39; it was included only for completeness, as the shortcomings of this method are well known. Next is Trimmed Means + MI, with 6.21. This approach achieves an unbiased comparison by maintaining an “equal percentage” of the data for those who prematurely leave the study for cause. It also permits distinguishing between observations that are truly MAR and observations for which the MAR assumption is not plausible (i.e., MNAR). Finally, the Trimmed Means method has the largest standard error, 7.61, because it discards the most data. This trimming is excessive in that the fraction of the missing data that is MAR is best handled as MAR and is thus amenable to MI.
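To make concrete how repeated imputation carries missing-data uncertainty into the standard error, the sketch below pools per-imputation results with Rubin's rules, the standard combining rules for MI. Whether the analyses in Table 4 were pooled in exactly this way is an assumption, and the estimates and standard errors in the example are hypothetical, not trial results.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool m per-imputation estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(estimates)                       # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical trimmed-mean differences from three imputed data sets.
estimate, se = pool_rubin([-12.0, -13.0, -14.0], [6.0, 6.0, 6.0])
print(f"pooled estimate = {estimate:.2f}, pooled SE = {se:.2f}")
```

The between-imputation term is what a single imputation such as LOCF omits, which is one reason its standard error can look deceptively small.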
6 |. DISCUSSION
Our work extends the utility of the trimmed means approach for missing data due to dropout in two key ways: (1) it determines sufficient conditions under which the trimmed means approach can identify the average population treatment effect; and (2) it demonstrates that when different types of missing data due to dropout are present and can be distinguished, one can combine the trimmed means approach with MI to improve estimation.
The trimmed means approach was originally designed to estimate a unique estimand: the average treatment difference in the best (100 × (1 − α))% of patients of each arm, that is, those remaining after a fraction α is trimmed. The work herein allows us to view the trimmed means approach in a different way: not as a method estimating a unique estimand, but as a method that targets the more familiar estimand of the average treatment effect in the population, where accuracy depends on how well the assumptions are satisfied.
Careful consideration should be given to specifying missing data assumptions that are plausible given the counterfactual values that define the estimand of clinical interest. Again, choosing the estimand precedes choosing a statistical method for handling missing data. The assumptions for identifying the difference in population means using this approach may not be realistic in certain contexts and for certain estimands. For example, consider treatment discontinuation that coincides with use of rescue medication where a treatment policy estimand is of interest. In those settings, for patients with missing data after the ICE of rescue medication, one may expect patient outcomes to improve relative to those under their assigned treatment, which is not consistent with the QMNAR assumption. Another scenario is when the estimand of interest is a treatment policy one and patients from both arms discontinue treatment and return to baseline. This mixture of distributions may satisfy QMNAR, but would not be a location shift unless there is no treatment effect. One ideal scenario for targeting the population treatment effect using trimmed means is a hypothetical estimand where the treatment has an additive effect and worse performing patients drop out. Another is a treatment policy estimand where patients off treatment regress towards their baseline measurements by similar magnitudes. Appropriately applying this methodology will depend heavily on clinical input that can provide scientific justification for these assumptions in particular trials. In situations where the location family and QMNAR assumptions are unlikely to hold, it may still be justifiable to use trimmed means to target a composite estimand, if that is of scientific interest.
Missing data inferences are not possible without assumptions, and it is crucial to make these assumptions explicit. The QMNAR assumption, like the MAR assumption, is untestable. It would be rare for this assumption to hold perfectly; however, when applied to dropouts where poor outcomes are believed to be the primary cause of dropout, the assumption may hold for enough of the missing data to justify adopting the trimmed means approach. We should also note that when both the location family and QMNAR assumptions hold, differences in quantiles or quantile regression can also target the population treatment effect. These approaches target the same estimand, but may have different statistical properties that could be the subject of future research.
One paradox is that while trimming more data leads to a loss in efficiency, in theory it makes the QMNAR assumption more plausible, since missing values are then more likely to be trimmed. This bias-variance trade-off should be considered when choosing the value of α. In the clinical trial analysis, applying trimmed means to all dropouts produced a larger estimate of the difference in treatment effects, but because 46% of observations were missing in one arm of the study, the standard errors were inflated. The combination approach, however, preserved a similar estimate of the treatment comparison and did not inflate standard errors as drastically. The combination approach leverages a larger effective sample size than applying trimmed means alone.
The trimmed means approach is a creative solution for estimating treatment effects in a clinical trial when missing data due to dropout can safely be assumed to be due to poor outcomes. As is the case in any missing data analysis, especially those with MNAR data, no analytical method replaces a good sensitivity analysis to determine the plausible range of what could have happened. While no method can fully or confidently rectify the issues caused by missing data, MI, trimmed means, or a combination of the two could be useful when the assumptions of the chosen methods are plausible.
ACKNOWLEDGMENTS
We thank Jaffer Zaidi and Zack McCaw for useful discussions regarding the proofs of Theorems 1 and 2, respectively. We would also like to thank two anonymous reviewers for their detailed comments which helped to considerably improve this article. Alex Ocampo was supported by NIH-5T32AI007358.
Funding information
National Institutes of Health, Grant/Award Number: 5T32AI007358
Footnotes
CONFLICT OF INTEREST
The authors have no conflicts of interest to declare that are relevant to the content of this article.
SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author, [AO]. The simulation data and code will be publicly available. The clinical trial data are restricted for patient privacy.
REFERENCES
- 1. Permutt T, Li F. Trimmed means for symptom trials with dropouts. Pharm Stat. 2017;16(1):20–28.
- 2. Wang M, Liu J, Molenberghs G, Mallinckrodt CH. An evaluation of the trimmed mean approach in clinical trials with dropout. Pharm Stat. 2018;17(3):278–289.
- 3. Center for Drug Evaluation and Research. Statistical review and evaluation: glycemic control in adults with T2DM. Silver Spring, Maryland: Food and Drug Administration, US Dept of Health and Human Services; 2008. Accessed September 10, 2020.
- 4. Tang F, Kardatzke D, Burger HU. Trimmed mean to handle missing/meaningless outcomes – a recommendation from FDA. Slides Presented at: 3rd EFSPI Workshop on Regulatory Statistics; September 25th, 2018; Basel, Switzerland.
- 5. Liu GF, Liu F, Mehrotra DV. Model averaging using likelihoods that reflect poor outcomes for clinical trial dropouts. Stat Biopharm Res. 2020;12(1):79–89.
- 6. Committee for Human Medicinal Products. ICH E9 (R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials, Step 2b. London: European Medicines Evaluation Agency; 2017. Accessed June 15, 2018.
- 7. Akacha M, Bretz F, Ruberg S. Estimands in clinical trials – broadening the perspective. Stat Med. 2017;36(1):5–19.
- 8. Neyman J. On the application of probability theory to agricultural experiments. Essay on principles section 9. Stat Sci. 1923;5:465–480.
- 9. Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978;6:34–58.
- 10. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701.
- 11. Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods – application to control of the healthy worker survivor effect. Math Model. 1986;7(9–12):1393–1512.
- 12. Holland PW. Statistics and causal inference (with discussion). JASA. 1986;81(396):945–970.
- 13. Lipkovich I, Ratitch B, Mallinckrodt CH. Causal inference and estimands in clinical trials. Stat Biopharm Res. 2020;12(1):54–67.
- 14. Daniel RM, Cousens SN, Stavola BL, Kenward MG, Sterne JA. Methods for dealing with time-dependent confounding. Stat Med. 2013;32(9):1584–1618.
- 15. Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58(1):21–29.
- 16. Stigler SM. The asymptotic distribution of the trimmed mean. Ann Stat. 1973;1(3):472–477.
- 17. Rubin DB. Multiple imputation for nonresponse in surveys. Hoboken, NJ: John Wiley & Sons; 2004.
- 18. Van Buuren S, Groothuis-Oudshoorn K. Mice: multivariate imputation by chained equations in R. J Stat Software. 2010;45:1–68.
- 19. Carpenter JR, Roger JH, Kenward MG. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. J Biopharm Stat. 2013;23(6):1352–1371.
- 20. Mehrotra DV, Liu F, Permutt T. Missing data in clinical trials: control-based mean imputation and sensitivity analysis. Pharm Stat. 2017;16(5):378–392.
- 21. Burkhart B, Lorenz J. Pain measurement in man: neurophysiological correlates of pain. Electroencephalogr Clin Neurophysiol. 1984;107(4):227–253.