Abstract
Imputation and inference (or analysis) models that cannot be true simultaneously are frequently used in practice when missing outcomes are present. In these situations, the conclusions can be misleading, depending on how “different” the implicit inference model induced by the imputation model is from the inference model actually used. We introduce model-based compatibility (MBC), compare two MBC approaches with a non-MBC approach, and explore the inferential validity of the latter in a simple case. In addition, we evaluate more complex cases through a series of simulation studies. Overall, we recommend caution when making inferences using a non-MBC analysis and point out when the inferential “cost” is largest.
Keywords: compatible models, ignorability, missingness, multiple imputation
1. INTRODUCTION
Incomplete data is a common phenomenon in clinical trials. When missing outcomes are present, multiple imputation is a common approach.1–3 Multiple imputation was introduced by Rubin4,5 and recently reviewed in several papers.6,7 Missing at random (MAR) is a common assumption for multiple imputation. MAR holds when the missingness is independent of the unobserved data, given the observed data. To make a MAR assumption more reasonable, additional information (such as auxiliary covariates V, which are not of interest for the primary research question but may help predict missingness and impute the missing response) is often included in model construction.8,9 An imputation model incorporating both the auxiliary covariates V and the covariates of interest X is constructed, as well as an inference (or analysis) model that contains only the covariates of interest X. The concern with such a setup is whether the imputation and inference models can hold simultaneously.10 In a probability model–based framework, we formally define such compatibility here, discuss several ways to ensure that it holds, and examine the implications in simple and more complex examples when it does not hold.
Section 2 specifies the assumed missing data mechanism (MDM), defines our notion of compatibility, and describes a class of models to ensure compatibility holds. Several classes of models are carefully explored in a simple longitudinal example in Section 3. In Section 4, we compare the performance of the models in more complex settings through a series of simulation studies. Section 5 provides recommendations, and Section 6 concludes with a brief discussion.
2. GENERAL FRAMEWORK AND MODEL-BASED COMPATIBILITY
In this section, we formally define auxiliary variable MAR (A-MAR) and model-based compatibility (MBC) and introduce a class of models, which guarantees the latter.
2.1. Definition of A-MAR
We introduce some notation to define the MDM. Define the complete longitudinal response as y and the complete data as (y, r, x, v), where r indicates which components of y are observed. The observed data are (yobs, r, x, v), and the missing data are ymis, where y = (yobs, ymis). Ignorable missingness with auxiliary covariates is satisfied if the following two conditions hold: (1) the MDM is A-MAR, ie, p(r|y, v, x; ϕ) = p(r|yobs, v, x; ϕ); (2) the full-data parameters can be decomposed into three parts as ω = (α, ϕ, θ), where ϕ indexes the MDM p(r|y, v, x; ϕ), α indexes the full-data response model conditional on both the auxiliary covariates and the primary covariates of interest, g(y|x, v; α), and θ indexes the distribution of the auxiliary covariates given the covariates of interest, p(v|x; θ), and where (α, θ) and ϕ are distinct. In what follows, we assume that A-MAR holds given the included v’s. Under A-MAR, the imputation model will be specified as g(y|x, v; α). The key for A-MAR is that the imputation model does not depend on r but does depend on v.
2.2. MBC in the presence of auxiliary covariates
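Given an imputation model

$$\{\, g(y \mid x, v; \alpha) : \alpha \in A \,\},$$

let

$$p(y \mid x; \alpha, h) = \int g(y \mid x, v; \alpha)\, h(v \mid x)\, dv, \qquad h \in \mathcal{H},$$

where $\mathcal{H}$ is the collection of all the distributions for V|X such that the integral is finite. Suppose that the inferential model for Y|X to be used is

$$\{\, f(y \mid x; \beta) : \beta \in B \,\}.$$

Compatibility of the imputation model with the inferential model can be defined as follows: for any given β* ∈ B, there exist α* ∈ A and $h^* \in \mathcal{H}$ such that

$$f(y \mid x; \beta^*) = \int g(y \mid x, v; \alpha^*)\, h^*(v \mid x)\, dv$$

for all y, x; we call this model-based compatibility (MBC). MBC for a well-defined functional of the distribution of y|x (MBC-F), such as the mean, with

$$E_f(Y \mid x; \beta) = \int y\, f(y \mid x; \beta)\, dy$$

and

$$E_g(Y \mid x; \alpha, h) = \int\!\!\int y\, g(y \mid x, v; \alpha)\, h(v \mid x)\, dv\, dy,$$

may be defined as follows: for any given β* ∈ B, there exist α* ∈ A and $h^* \in \mathcal{H}$ such that

$$E_f(Y \mid x; \beta^*) = E_g(Y \mid x; \alpha^*, h^*)$$

for all x. MBC-F is weaker than MBC and implied by it. However, MBC and MBC-F are equivalent in certain cases, for example, when the functional is the mean (the most common choice) and either (1) normal regression models are used for both the inference and imputation models or (2) the data are multivariate binary and the models in Section 3 are used. In what follows, we focus our development on the mean as the functional.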
There are at least two ways to ensure MBC-F (in what follows, we suppress parameters for clarity). The first is to specify g(y|x, v) with a constraint that preserves the form of the functional (eg, the marginal mean, E[Y|x]); we call this constraint compatible (CC) and give a simple example in Section 3. The second is to specify g(y|x, v) in a saturated way for the functional: there are enough parameters in the imputation model that the functional for the imputation model does not contradict the same functional from the inference model; we call this saturation compatible (SC) and provide a simple example in Section 3.
We point out a few features of each. CC directly specifies a model for h(v|x), namely p(v|x; θ), whereas SC does not. CC implicitly treats the distribution of the auxiliary covariates as more of a “nuisance” that is needed to ensure compatibility. SC has more parameters to estimate in the imputation model and is only possible if v is categorical (details in Section 3). CC does not require explicit imputation.
In what follows, we will focus on CC and SC with the functional being the expectation of Y|x. We provide more details on CC in the next subsection and illustrate CC and SC in a simple example in Section 3.
2.3. General specification of model-based compatibility using constraints (CC)
CC can be constructed using the idea of likelihood-based marginalized models.11 To ensure compatibility with the functional being the mean, we need the following constraint:

$$E(Y \mid x; \beta) = \int E(Y \mid x, v; \alpha)\, p(v \mid x; \theta)\, dv.$$

This can be generalized to correlated-data settings, including longitudinal settings, by conditioning on previous responses and/or random effects, bi,
Here, we will always integrate over the auxiliary covariates.
Related formulations (without auxiliary covariates and/or in causal inference settings) have been specified for a variety of cases: (1) longitudinal binary responses12,13; (2) multivariate responses14; (3) continuous responses with a nonlinear link.15 Each can be adapted to our setting (with auxiliary covariates); Daniels et al16 did such an extension for univariate longitudinal binary data with auxiliary covariates. We provide details on this case next.
3. SIMPLE LONGITUDINAL EXAMPLE
To better understand the issues with the CC and SC models introduced in Section 2, we thoroughly explore a simple example. Consider the scenario with one binary (baseline) auxiliary covariate V, two longitudinal measurements for each subject, (Y1, Y2), and a binary covariate of interest (treatment), X.
We assume that the missingness is monotone and the MDM is A-MAR with the following form:
(1) |
(2) |
We assume that the inference model used is
(3) |
For ease of notation, we let Yi0 ≡ 0. In what follows, we assume n subjects and T time points (where, here, T = 2). The CC imputation model is
(4) |
where the parameters, Δit, which are (implicitly) a function of (β, θ, x, y), allow these two models to be compatible (ie, both can be correct simultaneously) via the following constraint:
where . To compute Δit and ensure MBC-F for the mean, the distribution p(v|x; θ) needs to be estimated. This distribution does not need to be estimated for the SC or the non-MBC models. We examine the impact of this on the efficiency and operating characteristics of the CC approach in Section 4 via simulations. In randomized clinical trials with the only subject-specific inference covariate being treatment, we can model the distribution of v separately for each arm of treatment or assume p(v|x) ≡ p(v) (by randomization). As such, in this example, we can estimate θ by the empirical distribution of v. We discuss estimation of p(v|x; θ) in more complicated settings in Section 6. In the CC approach, the imputation model is constrained by the (form of the mean of the) inference model and both are (implicitly) fit simultaneously/jointly.
Recall that the deterministic parameters, Δit, which are (implicitly) a function of (β, θ, x, y), enforce the form of the mean inference model. They are not free regression parameters, and the above specification has only five free regression parameters (β0, β1, β2, β3, α). Given the structure of the inference model in (3), there are six possible “mean” values corresponding to the values of {(t − 1), x, y}; these realizations are
{(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}.
As such, there are six unique Δit values.
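As a rough illustration of how a single Δ could be obtained, the sketch below solves the mean constraint numerically for one cell. It assumes, hypothetically, a binary auxiliary covariate, a logistic imputation model with linear predictor Δ + αv, and an inference model that supplies the target mean expit(η). The function names (`expit`, `marginal_mean`, `solve_delta`), the bisection solver, and the numeric inputs are ours and are not taken from the analysis reported here.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_mean(delta, alpha, p_v1):
    """Mean of Y after averaging expit(delta + alpha * v) over binary v."""
    return (1.0 - p_v1) * expit(delta) + p_v1 * expit(delta + alpha)


def solve_delta(eta, alpha, p_v1, n_iter=200):
    """Bisection for delta such that marginal_mean(delta) equals expit(eta).

    Assumed (hypothetical) setup: binary v with P(v = 1) = p_v1.
    The marginal mean is increasing in delta, and the root lies between
    eta - max(alpha, 0) and eta - min(alpha, 0).
    """
    target = expit(eta)
    lo, hi = eta - max(alpha, 0.0), eta - min(alpha, 0.0)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if marginal_mean(mid, alpha, p_v1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only: eta is the inference-model linear predictor for
# one cell, alpha the auxiliary effect, p_v1 the empirical P(v = 1).
delta = solve_delta(eta=-0.25, alpha=1.2, p_v1=0.4)
print(round(delta, 4), round(marginal_mean(delta, 1.2, 0.4), 4))
```

Repeating the solve for each unique cell gives one Δ per cell; when α = 0 the solution reduces to Δ = η, so the constraint has no effect.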
Remark 1. If the inference model contains continuous covariates, there will be n * T distinct values of Δit. In general, the number of unique Δit corresponds to the number of observed unique sets of values of in (2). There are slightly fewer here (if x was continuous), due to Y0 being fixed at zero.
Given the inference model in (3), a typical non–MBC-F imputation model would replace Δit with and use the following imputation model:
(5) |
This imputation model is not MBC-F with (3).
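One way to see the incompatibility numerically is to average a logistic imputation model over v and check whether the result is still additive on the logit scale. The sketch below is only an illustration under stated assumptions: the imputation model is taken to be logistic with linear predictor γ0 + γ1(t − 1) + γ2x + γ3y + αv, v is binary with a probability that does not depend on x or the previous response, and all names and numeric values are hypothetical.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def implied_marginal_logit(t1, x, y, gamma, alpha, p_v1):
    """Logit of the mean implied by averaging a logistic imputation model
    with linear predictor gamma'(1, t1, x, y) + alpha * v over binary v."""
    lin = gamma[0] + gamma[1] * t1 + gamma[2] * x + gamma[3] * y
    mean = (1.0 - p_v1) * expit(lin) + p_v1 * expit(lin + alpha)
    return logit(mean)


def xy_interaction(gamma, alpha, p_v1):
    """x-by-y interaction contrast at t - 1 = 1 on the marginal logit scale.

    The contrast is zero whenever the marginal mean is logit-additive
    in x and y, as an additive inference model would require.
    """
    def f(x, y):
        return implied_marginal_logit(1, x, y, gamma, alpha, p_v1)

    return f(1, 1) - f(1, 0) - f(0, 1) + f(0, 0)


gamma = (-0.5, 0.25, 0.4, 0.5)  # illustrative values, not estimates
for alpha in (0.0, 1.0, 2.0, 4.0):
    print(alpha, round(xy_interaction(gamma, alpha, p_v1=0.5), 4))
```

In this sketch the contrast vanishes when α = 0 and is nonzero at α = 2, so an additive logit inference model cannot reproduce the implied marginal mean exactly once v matters.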
Remaining in this simple scenario with only categorical covariates in the inference model, a seemingly non–MBC-F model that replaced Δit in (4) with a richer model than the one in (5), with design vector (1, t − 1, x, y, x * (t − 1), x * y), would be an SC model. We use the term saturated to indicate that there are enough parameters in the imputation model to accommodate the number of unique observed values of the mean in the inference model (this is the same as the number of unique Δit’s for the CC approach).
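The saturation claim can be checked mechanically: enumerate the unique mean cells and confirm that the richer design vector has as many linearly independent columns as there are cells. The sketch below does this for the two-time-point example with binary x and Y0 fixed at zero; the helper names (`mean_cells`, `design_row`, `rank`) are ours.

```python
from fractions import Fraction
from itertools import product


def mean_cells(T=2):
    """Unique (t - 1, x, y_prev) combinations with binary x and Y_0 fixed at 0."""
    cells = []
    for t1, x in product(range(T), (0, 1)):
        prev_values = (0,) if t1 == 0 else (0, 1)
        cells.extend((t1, x, y) for y in prev_values)
    return cells


def design_row(t1, x, y):
    """Design vector (1, t - 1, x, y, x * (t - 1), x * y) from the text."""
    return [1, t1, x, y, x * t1, x * y]


def rank(rows):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


cells = mean_cells()
saturated = [design_row(*cell) for cell in cells]
additive = [[1, t1, x, y] for t1, x, y in cells]
print(len(cells), rank(saturated), rank(additive))
```

A rank equal to the number of cells means the linear predictor can take any value in each cell, which is the sense of "saturated" used here; the additive design has lower rank and cannot.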
Remark 2. In general, for nonlinear links and continuous covariates, an SC model will not be possible or practical (cf Remark 1), as n * T regression parameters would be needed. Note also that such a general imputation model requires the estimation of 11 regression parameters (four in the inference model plus seven in the imputation model), unlike the five regression parameters in the CC model (though, as mentioned previously, CC requires an estimate of the distribution of the auxiliary covariates, for which we use the empirical distribution), and will only be equivalent to the CC specification asymptotically; we explore this further in the simulations in Section 4.
Remark 3. For linear links, we have fewer problems. For example, the following (mean) inference model:
can hold simultaneously with
as the marginalization over p(v|x) keeps the same functional form for the mean. However, these non-CC approaches require estimation of more regression parameters (9 vs 5) than the CC model.
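A short derivation may help show why the identity link causes fewer problems. It is a sketch under simplifying assumptions that are ours rather than the models of this remark: a single binary covariate x, no lagged response, and illustrative symbols γ0, γ1, and μ.

```latex
\begin{align*}
E(Y \mid x, v) &= \gamma_0 + \gamma_1 x + \alpha v, \\
E(Y \mid x)    &= \int E(Y \mid x, v)\, p(v \mid x)\, dv
                = \gamma_0 + \gamma_1 x + \alpha\, E(V \mid x), \\
\intertext{and, if $E(V \mid x) = \mu$ for all $x$ (eg, by randomization),}
E(Y \mid x)    &= (\gamma_0 + \alpha \mu) + \gamma_1 x .
\end{align*}
```

Averaging over v only shifts the intercept, so the marginal mean stays linear in x. With a logit link the average of expit(·) over v has no such closed form, which is why a constraint or saturation is needed there.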
Next, we point out factors that affect the bias of the inference model parameters for non–MBC-F models including: (1) ϕv, the effect of auxiliary covariate V in the MDM; (2) α, the effect of auxiliary covariate V in the imputation model; and (3) the overall missing rate.
If ϕv = 0, the auxiliary covariate is not needed to impute the missing values. However, if it is used, both CC and SC will be correct and neither will give the same inference (in small samples or asymptotically) as the non–MBC-F models.
Remark 4. Note that, when we assume A-MAR, we typically do not fit the MDM explicitly, so it would not be uncommon to have unneeded auxiliary covariates in the imputation model.
When the auxiliary covariate V has no effect in the imputation model, ie, α = 0, and the sample size n is large enough that the estimator α̂ is close to the true value, we will see no difference between the analyses. However, when the sample size n is small, it is possible that α̂ is “far” from zero, which will negatively impact inference for non–MBC-F models. As |α| → ∞, the effect of the auxiliary covariate V increases, and we expect larger bias for the non–MBC-F approach. Clearly, as the probability of any missingness goes to zero, the estimates from the three approaches will be very similar (with increasing sample size). So, if the probability of missingness is low and the auxiliary covariates’ effects on y are “small,” there will be less bias for non–MBC-F models.
We explore more complex settings and the issues raised here via simulations in the next section.
4. SIMULATIONS
We conduct a series of simulations to understand the “cost” of a non–MBC-F analysis compared to both MBC-F analyses in a more complex setting with four time points and more auxiliary covariates. We assess the impact of several factors on the performance of the two analyses including sample size, unneeded auxiliary covariates (certain ϕv = 0), estimation of p(v|x; θ), and no relation between the auxiliary covariates and the response (α = 0). The details of the simulations are provided in the following.
4.1. Simulation setup
Auxiliary covariates
Consider auxiliary covariates V with dimension p = 8, each having only two levels 0 and 1. Define
where uv* is calculated according to a log-linear model with three-way interactions
with , . One set of the true values for the remainder of the λ’s is randomly generated from
Here, we can examine the impact of estimating p(v|x; θ) using the empirical distribution on the operating characteristics of the estimates of the inference model parameters in the CC approach.
The inference model
In the setting of four longitudinal responses, Yt, the true inference model is specified as
(6) |
The imputation model
For the CC approach, the true imputation model has the following form:
(7) |
Δit is a function of (β, θ) and y and has seven possible values based on unique combinations of (t – 1, y): {(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)} in the inference model in (6). Recall that, for the CC analysis, the Δ’s in the imputation model are constrained so that the conditional mean E(Yt|Yt−1 = y) obtained by marginalization of E(Yt|Yt−1 = y, v) over auxiliary covariates V is equal to that from the inference model.
The CC model is the true data generating model here. We consider three different sets of parameter values in the imputation model (ie, α) as given in Table 1. For each set, we simulate the full data response Y from (7) using the parameter values given in Table 1. The non–MBC-F and SC approaches use the same inference model as above, with the following two imputation models, respectively.
TABLE 1.
| Inference Model | | MDM | | | | Imputation Set 1 | | | | Imputation Set 2 | | | | Imputation Set 3 | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| β0 | −0.5 | ϕv,1 | −0.5 | ϕv,7 | 0 | α1 | 0.4*3 | α6 | −0.6*3 | α1 | 0.4*1.8 | α6 | −0.6*1.8 | α1 | 0.4*0.1 | α6 | −0.6*0.1 |
| β1 | 0.25 | ϕv,2 | −1.2 | ϕv,8 | 0 | α2 | 0.3*3 | α7 | −0.3*3 | α2 | 0.3*1.8 | α7 | −0.3*1.8 | α2 | 0.3*0.1 | α7 | −0.3*0.1 |
| β2 | 0.4 | ϕv,3 | −0.8 | ϕv,9 | 0 | α3 | 0.5*3 | α8 | −0.7*3 | α3 | 0.5*1.8 | α8 | −0.7*1.8 | α3 | 0.5*0.1 | α8 | −0.7*0.1 |
| | | ϕv,3 | 0.5 | ϕ0 | 0.5 | α4 | 0.2*3 | | | α4 | 0.2*1.8 | | | α4 | 0.2*0.1 | | |
| | | ϕv,5 | 0.6 | ϕ1 | 0 | α5 | −0.4*3 | | | α5 | −0.4*1.8 | | | α5 | −0.4*0.1 | | |
| | | ϕv,6 | 0 | | | | | | | | | | | | | | |
Abbreviation: MDM, missing data mechanism.
The non–MBC-F model is specified as
(M1) |
and the SC model is specified as
(M2) |
The SC model parameters, , are sufficient to ensure that the inference model in (6) can hold simultaneously with the SC imputation model. There are more regression parameters in the SC model, but it should be “equivalent” to the CC analysis in larger samples.
Specification of MDM
We assume that the MDM is A-MAR and has the following form:
(8) |
where Rit = I (Yit is observed). The values of ϕ are listed in Table 1. With these values, we generate monotone missingness; the missing data rate at T = 4 is about 45%. By setting ϕ2 = 0, the MDM is assumed to be MAR conditional on the auxiliary covariates. Missingness in the full data response is generated using (8).
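To make the mechanism concrete, the sketch below generates monotone missingness indicators for one subject at a time. The functional form is an assumption on our part (a logistic model for remaining in the study given the previous response and the auxiliary covariates, with the first response always observed); the function name `simulate_monotone_r`, the placeholder responses, and the coefficient values are hypothetical.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_monotone_r(y, v, phi0, phi1, phi_v, rng):
    """Observation indicators R_1..R_T for one subject under monotone dropout.

    Assumed (hypothetical) form: given the subject is still in the study,
    logit P(R_t = 1) = phi0 + phi1 * y_{t-1} + sum_k phi_v[k] * v[k],
    so dropout depends only on the last observed response and on v.
    The first response is always observed in this sketch.
    """
    r = [1]
    for t in range(1, len(y)):
        if r[-1] == 0:
            r.append(0)  # monotone: once missing, always missing
            continue
        lin = phi0 + phi1 * y[t - 1] + sum(p * vk for p, vk in zip(phi_v, v))
        r.append(1 if rng.random() < expit(lin) else 0)
    return r


rng = random.Random(2024)
phi_v = [-0.5, -1.2, -0.8, 0.5]  # illustrative values only
patterns = []
for _ in range(1000):
    v = [rng.randint(0, 1) for _ in phi_v]
    y = [rng.randint(0, 1) for _ in range(4)]  # placeholder responses
    patterns.append(simulate_monotone_r(y, v, 0.5, 0.0, phi_v, rng))

# Proportion of subjects with a missing response at the last time point
print(sum(1 - r[-1] for r in patterns) / len(patterns))
```

Because the probability of remaining in the study uses only y at time t − 1, which is observed whenever the subject is still in the study, the generated missingness is MAR conditional on v.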
Inclusion of unnecessary auxiliary covariates V for A-MAR
To compare the robustness of all the analyses to the inclusion of auxiliary covariates that are unnecessary for the A-MAR assumption but predictive of the outcome Y, we set the coefficients of V5, … , V8 (ϕv,5, … , ϕv,8) to zero in the MDM in (8). Thus, V1, … , V4 are necessary auxiliary covariates, and V5, … , V8 are unnecessary (or extra) auxiliary covariates.
For each model considered, we fit models with and without unnecessary auxiliary covariates. This is to assess the development regarding ϕv from Section 3 in more detail in practice.
Sample size
We run simulations with sample sizes of 200, 500, 20 000, and 100 000. The largest sample size is meant to correspond to an “asymptotic result.” For each sample size, the results are based on 1000 simulated datasets.
4.2. Comparisons between MBC-F and non–MBC-F approaches
We compare the performance in six settings: (1) U* as non–MBC-F analysis with imputation model (M1) using all available auxiliary covariates, (2) SC* as SC analysis with imputation model (M2) using all available auxiliary covariates, (3) CC* as CC analysis using all auxiliary covariates, (4) U as non–MBC-F analysis with imputation model (M1) using only necessary auxiliary covariates, (5) SC as SC analysis with imputation model (M2) using only necessary auxiliary covariates, and (6) CC as CC analysis using only necessary auxiliary covariates.
For the imputation models, M1 and M2, to remove any bias from a small “finite” number of imputations, the missing outcomes are imputed m = 100 times based on the imputation model. The inference model is fit to each of these 100 “complete” datasets. We make inference for β and the conditional mean E(Yt|Yt−1 = y) according to Rubin’s multiple imputation rules.4 We use the optim() function in R to obtain the maximum likelihood estimates for the CC analysis. We run optim() several times from randomly selected starting values to ensure that the global maximum is found.
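For readers less familiar with the combining step, the following is a minimal sketch of Rubin's rules for a scalar parameter. It is written in Python for illustration (the analyses here were run in R), and the function name `rubin_pool` and the example numbers are ours.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m completed-data estimates and their variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 divisor)
    total = w + (1.0 + 1.0 / m) * b  # total variance
    if b == 0:
        df = math.inf                # no between-imputation variability
    else:
        df = (m - 1) * (1.0 + w / ((1.0 + 1.0 / m) * b)) ** 2
    return q_bar, total, df


# Made-up estimates and variances from m = 3 imputed datasets
q_bar, total_var, df = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, total_var, df)
```

Interval estimates then use q_bar plus or minus a t quantile with df degrees of freedom times the square root of the total variance.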
We compute the bias and mean squared error (MSE) for β and the 90% and 95% confidence interval coverage rates. Tables 2 and 3 show simulation results for parameter setting 1, Tables 4 and 5 show simulation results for parameter setting 2, and Tables S.1 and S.2 (in the supplementary materials) show simulation results for parameter setting 3. The first parameter setting illustrates the case where the auxiliary covariates have a large impact on the response (ie, in the imputation model). The second parameter setting corresponds to a smaller impact, and the third corresponds to the auxiliary covariates being almost unrelated to the response; the third setting will allow us to assess the conjecture in Section 3 that, in larger sample sizes, a non–MBC-F analysis will be essentially valid.
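The summary measures can be sketched as follows. This is an illustration rather than the code used for the tables: the function name `performance` is ours, and the coverage check uses a normal-quantile Wald interval, which is an assumption (intervals based on Rubin's rules would normally use a t reference distribution).

```python
import math
from statistics import NormalDist, mean, stdev


def performance(estimates, std_errors, truth, level=0.95):
    """Bias, MSE, and Wald-interval coverage across simulated datasets.

    Each entry of the result is (value, Monte Carlo standard error).
    """
    n = len(estimates)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    errors = [est - truth for est in estimates]
    sq_errors = [e * e for e in errors]
    covered = [
        1.0 if abs(est - truth) <= z * se else 0.0
        for est, se in zip(estimates, std_errors)
    ]
    summary = {}
    for name, values in (("bias", errors), ("mse", sq_errors), ("coverage", covered)):
        summary[name] = (mean(values), stdev(values) / math.sqrt(n))
    return summary


# Made-up estimates of a parameter whose true value is 1.0
out = performance([0.9, 1.1, 1.0, 1.4], [0.1, 0.1, 0.1, 0.1], truth=1.0)
print(out)
```

The second element of each pair plays the role of the Monte Carlo standard errors shown in parentheses in the bias and MSE tables.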
TABLE 2.
Size | Parameter | U* | SC* | CC* | U | SC | CC |
---|---|---|---|---|---|---|---|
200 | Bias β0 | −0.026(0.004) | −0.001(0.004) | −0.003(0.004) | −0.001(0.004) | 0.001(0.004) | 0.003(0.004) |
200 | Bias β1 | 0.021(0.002) | −0.001(0.002) | 0.002(0.002) | −0.012(0.003) | −0.001(0.003) | −0.002(0.002) |
200 | Bias β2 | −0.005(0.008) | 0.001(0.008) | −0.006(0.007) | 0.003(0.008) | 0.001(0.008) | −0.001(0.008) |
200 | 10³ × MSE β0 | 14.8(0.66) | 14.4(0.64) | 14.1(0.64) | 14.7(0.66) | 14.1(0.64) | 14.1(0.64) |
200 | 10³ × MSE β1 | 6.1(0.28) | 5.9(0.27) | 5.5(0.25) | 6.9(0.33) | 5.9(0.23) | 5.4(0.21) |
200 | 10³ × MSE β2 | 59.8(2.7) | 59.8(2.6) | 54.8(2.4) | 65.0(2.8) | 55.7(2.7) | 53.2(2.4) |
500 | Bias β0 | −0.023(0.002) | 0.001(0.002) | 0.001(0.002) | 0.003(0.002) | 0.001(0.002) | 0.001(0.002) |
500 | Bias β1 | 0.024(0.001) | 0.003(0.002) | 0.004(0.001) | −0.011(0.002) | −0.003(0.002) | −0.002(0.002) |
500 | Bias β2 | −0.013(0.005) | −0.001(0.005) | −0.001(0.004) | 0.001(0.005) | −0.001(0.005) | −0.001(0.005) |
500 | 10³ × MSE β0 | 6.0(0.27) | 5.6(0.26) | 5.2(0.24) | 5.6(0.25) | 5.7(0.26) | 5.7(0.25) |
500 | 10³ × MSE β1 | 2.7(0.13) | 2.7(0.12) | 2.6(0.12) | 2.6(0.12) | 2.3(0.10) | 2.1(0.09) |
500 | 10³ × MSE β2 | 21.0(0.98) | 23.1(1.1) | 21.8(0.99) | 23.0(1.1) | 21.4(0.97) | 18.3(0.88) |
20 000 | Bias β0 | −0.024(0.00) | 0.00(0.00) | −0.001(0.00) | 0.002(0.00) | 0.00(0.00) | 0.00(0.00) |
20 000 | Bias β1 | 0.021(0.00) | 0.00(0.00) | 0.001(0.00) | −0.013(0.00) | −0.001(0.00) | −0.001(0.00) |
20 000 | Bias β2 | −0.006(0.001) | −0.001(0.001) | −0.002(0.001) | 0.004(0.001) | 0.001(0.001) | 0.001(0.001) |
20 000 | 10³ × MSE β0 | 0.70(0.02) | 0.18(0.01) | 0.21(0.02) | 0.15(0.01) | 0.15(0.01) | 0.12(0.01) |
20 000 | 10³ × MSE β1 | 0.48(0.01) | 0.14(0.01) | 0.11(0.01) | 0.30(0.01) | 0.06(0.00) | 0.05(0.00) |
20 000 | 10³ × MSE β2 | 0.59(0.03) | 0.63(0.03) | 0.65(0.03) | 0.54(0.03) | 0.42(0.02) | 0.62(0.03) |
100 000 | Bias β0 | −0.024(0.00) | 0.00(0.00) | 0.00(0.00) | 0.002(0.00) | 0.00(0.00) | 0.00(0.00) |
100 000 | Bias β1 | 0.021(0.00) | 0.00(0.00) | 0.00(0.00) | −0.013(0.00) | 0.00(0.00) | 0.00(0.00) |
100 000 | Bias β2 | −0.005(0.00) | 0.001(0.00) | 0.00(0.00) | 0.005(0.00) | 0.001(0.00) | 0.00(0.00) |
100 000 | 10³ × MSE β0 | 0.59(0.01) | 0.06(0.00) | 0.09(0.01) | 0.03(0.00) | 0.02(0.00) | 0.03(0.00) |
100 000 | 10³ × MSE β1 | 0.43(0.00) | 0.19(0.00) | 0.27(0.00) | 0.26(0.01) | 0.01(0.00) | 0.01(0.00) |
100 000 | 10³ × MSE β2 | 0.13(0.01) | 0.15(0.01) | 0.14(0.01) | 0.10(0.00) | 0.11(0.01) | 0.07(0.00) |
Abbreviations: MBC, model-based compatibility; MSE, mean squared error.
TABLE 3.
Size | Parameter | U* 90 | SC* 90 | CC* 90 | U 90 | SC 90 | CC 90 | U* 95 | SC* 95 | CC* 95 | U 95 | SC 95 | CC 95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
200 | β0 | 0.90 | 0.90 | 0.89 | 0.90 | 0.91 | 0.89 | 0.95 | 0.95 | 0.94 | 0.95 | 0.94 | 0.95 |
200 | β1 | 0.90 | 0.91 | 0.90 | 0.92 | 0.93 | 0.91 | 0.90 | 0.91 | 0.90 | 0.92 | 0.93 | 0.93 |
200 | β2 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 | 0.93 | 0.93 | 0.92 | 0.94 | 0.94 | 0.93 |
500 | β0 | 0.90 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.96 | 0.95 | 0.95 | 0.96 | 0.95 | 0.95 |
500 | β1 | 0.90 | 0.92 | 0.92 | 0.91 | 0.92 | 0.92 | 0.96 | 0.95 | 0.95 | 0.96 | 0.95 | 0.95 |
500 | β2 | 0.89 | 0.90 | 0.90 | 0.89 | 0.90 | 0.90 | 0.93 | 0.93 | 0.94 | 0.93 | 0.94 | 0.94 |
20 000 | β0 | 0.41 | 0.92 | 0.92 | 0.92 | 0.92 | 0.93 | 0.54 | 0.94 | 0.94 | 0.97 | 0.94 | 0.94 |
20 000 | β1 | 0.16 | 0.92 | 0.90 | 0.53 | 0.92 | 0.92 | 0.25 | 0.94 | 0.96 | 0.68 | 0.95 | 0.96 |
20 000 | β2 | 0.85 | 0.90 | 0.91 | 0.84 | 0.90 | 0.92 | 0.90 | 0.94 | 0.94 | 0.89 | 0.94 | 0.95 |
100 000 | β0 | 0.00 | 0.93 | 0.93 | 0.91 | 0.93 | 0.93 | 0.01 | 0.93 | 0.93 | 0.96 | 0.94 | 0.94 |
100 000 | β1 | 0.00 | 0.93 | 0.92 | 0.03 | 0.93 | 0.93 | 0.00 | 0.96 | 0.97 | 0.05 | 0.96 | 0.97 |
100 000 | β2 | 0.81 | 0.93 | 0.93 | 0.78 | 0.94 | 0.95 | 0.89 | 0.95 | 0.96 | 0.86 | 0.94 | 0.96 |
TABLE 4.
Size | Parameter | U* | SC* | CC* | U | SC | CC |
---|---|---|---|---|---|---|---|
200 | Bias β0 | −0.014(0.004) | 0.001(0.004) | 0.001(0.004) | 0.002(0.004) | 0.001(0.004) | 0.001(0.004) |
200 | Bias β1 | 0.007(0.003) | −0.005(0.003) | −0.003(0.003) | −0.005(0.003) | −0.005(0.003) | −0.002(0.003) |
200 | Bias β2 | 0.013(0.007) | 0.008(0.007) | 0.003(0.007) | 0.014(0.007) | 0.008(0.007) | 0.003(0.007) |
200 | 10³ × MSE β0 | 42.4(0.78) | 12.7(0.79) | 13.4(0.78) | 11.9(0.78) | 3.6(0.79) | 4.0(0.78) |
200 | 10³ × MSE β1 | 6.9(0.28) | 7.2(0.30) | 6.8(0.28) | 6.4(0.26) | 6.8(0.27) | 6.3(0.26) |
200 | 10³ × MSE β2 | 65.5(2.7) | 65.8(2.7) | 63.6(2.6) | 64.5(2.7) | 63.9(2.7) | 61.0(2.6) |
500 | Bias β0 | −0.015(0.002) | 0.004(0.003) | 0.003(0.002) | −0.00(0.002) | 0.00(0.002) | −0.001(0.002) |
500 | Bias β1 | 0.01(0.002) | −0.001(0.002) | −0.001(0.002) | −0.008(0.002) | −0.001(0.002) | −0.001(0.002) |
500 | Bias β2 | 0.010(0.005) | 0.009(0.005) | 0.008(0.005) | 0.008(0.005) | 0.007(0.005) | 0.007(0.004) |
500 | 10³ × MSE β0 | 35.3(0.49) | 6.2(0.49) | 7.2(0.48) | 5.4(0.49) | 4.0(0.30) | 2.5(0.39) |
500 | 10³ × MSE β1 | 2.7(0.12) | 2.9(0.13) | 2.7(0.12) | 2.5(0.11) | 2.6(0.12) | 2.4(0.11) |
500 | 10³ × MSE β2 | 32.6(1.4) | 32.7(1.4) | 31.9(1.3) | 32.4(1.4) | 31.7(1.4) | 30.2(1.3) |
20 000 | Bias β0 | −0.014(0.00) | 0.003(0.00) | 0.003(0.00) | 0.001(0.001) | −0.00(0.00) | −0.00(0.00) |
20 000 | Bias β1 | 0.011(0.00) | 0.00(0.00) | 0.00(0.00) | −0.007(0.00) | 0.00(0.00) | −0.00(0.00) |
20 000 | Bias β2 | 0.003(0.001) | 0.002(0.001) | 0.001(0.001) | 0.00(0.001) | 0.00(0.001) | 0.00(0.001) |
20 000 | 10³ × MSE β0 | 0.90(0.02) | 0.73(0.01) | 0.90(0.00) | 0.81(0.01) | 0.37(0.01) | 0.68(0.00) |
20 000 | 10³ × MSE β1 | 0.18(0.01) | 0.16(0.01) | 0.15(0.01) | 0.12(0.01) | 0.07(0.00) | 0.05(0.00) |
20 000 | 10³ × MSE β2 | 0.56(0.02) | 0.54(0.02) | 0.45(0.02) | 0.49(0.01) | 0.44(0.01) | 0.41(0.01) |
100 000 | Bias β0 | −0.013(0.00) | 0.00(0.00) | 0.00(0.00) | 0.001(0.00) | 0.00(0.00) | 0.00(0.00) |
100 000 | Bias β1 | 0.01(0.00) | 0.00(0.00) | 0.00(0.00) | −0.008(0.00) | 0.00(0.00) | 0.00(0.00) |
100 000 | Bias β2 | 0.003(0.00) | 0.002(0.00) | 0.002(0.00) | 0.001(0.00) | 0.001(0.00) | 0.001(0.00) |
100 000 | 10³ × MSE β0 | 0.52(0.01) | 0.07(0.01) | 0.06(0.01) | 0.12(0.00) | 0.05(0.01) | 0.05(0.01) |
100 000 | 10³ × MSE β1 | 0.12(0.00) | 0.01(0.00) | 0.01(0.00) | 0.07(0.00) | 0.01(0.00) | 0.01(0.00) |
100 000 | 10³ × MSE β2 | 0.22(0.01) | 0.21(0.01) | 0.20(0.01) | 0.18(0.01) | 0.15(0.00) | 0.14(0.00) |
Abbreviations: MBC, model-based compatibility; MSE, mean squared error.
TABLE 5.
Size | Parameter | U* 90 | SC* 90 | CC* 90 | U 90 | SC 90 | CC 90 | U* 95 | SC* 95 | CC* 95 | U 95 | SC 95 | CC 95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
200 | β0 | 0.90 | 0.92 | 0.92 | 0.91 | 0.92 | 0.92 | 0.96 | 0.96 | 0.95 | 0.96 | 0.96 | 0.96 |
200 | β1 | 0.90 | 0.91 | 0.91 | 0.90 | 0.91 | 0.91 | 0.95 | 0.96 | 0.94 | 0.95 | 0.96 | 0.95 |
200 | β2 | 0.87 | 0.87 | 0.88 | 0.87 | 0.87 | 0.88 | 0.92 | 0.92 | 0.93 | 0.93 | 0.93 | 0.94 |
500 | β0 | 0.91 | 0.91 | 0.90 | 0.91 | 0.91 | 0.90 | 0.96 | 0.95 | 0.95 | 0.96 | 0.95 | 0.95 |
500 | β1 | 0.90 | 0.92 | 0.90 | 0.91 | 0.92 | 0.91 | 0.95 | 0.95 | 0.95 | 0.96 | 0.96 | 0.95 |
500 | β2 | 0.88 | 0.88 | 0.90 | 0.88 | 0.88 | 0.90 | 0.93 | 0.93 | 0.94 | 0.94 | 0.94 | 0.94 |
20 000 | β0 | 0.72 | 0.90 | 0.90 | 0.91 | 0.91 | 0.91 | 0.82 | 0.94 | 0.93 | 0.96 | 0.95 | 0.95 |
20 000 | β1 | 0.64 | 0.90 | 0.92 | 0.77 | 0.91 | 0.92 | 0.76 | 0.95 | 0.96 | 0.84 | 0.95 | 0.96 |
20 000 | β2 | 0.86 | 0.90 | 0.92 | 0.84 | 0.91 | 0.92 | 0.93 | 0.93 | 0.97 | 0.93 | 0.93 | 0.97 |
100 000 | β0 | 0.22 | 0.91 | 0.90 | 0.90 | 0.91 | 0.91 | 0.33 | 0.96 | 0.94 | 0.95 | 0.96 | 0.95 |
100 000 | β1 | 0.12 | 0.91 | 0.93 | 0.36 | 0.91 | 0.93 | 0.20 | 0.95 | 0.96 | 0.48 | 0.95 | 0.96 |
100 000 | β2 | 0.88 | 0.90 | 0.93 | 0.86 | 0.90 | 0.93 | 0.92 | 0.93 | 0.97 | 0.92 | 0.93 | 0.97 |
Overall, it is clear that the CC and SC analyses perform similarly, and both are more accurate and efficient than the non–MBC-F analysis; however, recall that the SC approach is not possible in general (eg, with continuous auxiliary covariates, as discussed in Section 3). We also note the findings regarding the comment in Remark 2. CC and SC seem to have similar efficiency (the former needs an estimate of the distribution of the auxiliary covariates, and the latter has more regression parameters to estimate). So, at least for the simulation settings considered here, the cost of the extra regression parameters in the SC approach is similar to the cost of estimating the distribution of the auxiliary covariates for CC. The bias for the non–MBC-F analysis does not disappear with sample size. In contrast, the estimates for the two MBC-F approaches are asymptotically unbiased, as shown in Tables 2 and 4 for the larger sample sizes. The true coverage rates of confidence intervals for the non–MBC-F analysis deteriorate severely for the larger sample sizes. For example, for the largest sample size considered, the coverage rates of the 95% CI for β0 and β1 from the non–MBC-F analysis with unneeded auxiliary covariates (U*) are no more than 33% in imputation parameter settings 1 and 2 (Tables 3 and 5).
These results also illustrate the robustness of the two MBC-F analyses to unnecessary auxiliary covariates. However, when they are included in the non–MBC-F analysis, its performance deteriorates very badly in terms of bias in β and the coverage rate of confidence intervals. For sample sizes of 20 000 and 100 000 in parameter settings 1 and 2, the bias in β0 from the non–MBC-F analysis including unnecessary auxiliary covariates is about 10 times that from the non–MBC-F analysis including only necessary auxiliary covariates, and the coverage rate for the former falls far below the nominal level (Tables 3 and 5). In contrast, it appears that the MBC-F analyses are minimally impacted in terms of efficiency loss and bias increase when unnecessary auxiliary covariates are added to the imputation model. This is a practically desirable feature of an MBC-F analysis since it can minimize the need for model selection for the auxiliary covariates in the MDM.
The magnitude of the coefficients of the auxiliary covariates in the imputation model (α) substantially impacts the performance of the non–MBC-F analysis. For parameter settings 1 and 2, the increase in bias and the drop in the true coverage rate for the non–MBC-F analysis are clear (Tables 2–5). This supports our conjecture in Section 3. However, for parameter setting 3, where the auxiliary covariates V have the smallest imputation model coefficients, all six analyses perform similarly (Tables S.1 and S.2).
5. RECOMMENDATIONS
Based on the example in Section 3 and the simulations in Section 4, we provide a few recommendations. In general, for the common setting where the analyst fits both the imputation and inference models, an MBC-F approach is clearly preferred. SC is a very simple MBC-F approach that does not require a model for the auxiliary covariates or explicit computation of Δit; however, this can only be used for the case of categorical covariates. The inclusion of unnecessary auxiliary covariates, which is typical, is problematic and exacerbates bias in non–MBC-F models; as such, MBC-F approaches are preferred again.
6. DISCUSSION
We have illustrated with an analytic example and simulations the misleading inferences that can result when conducting a non–MBC-F analysis. For example, we saw the negative impact of unneeded auxiliary covariates in the non–MBC-F model. As such, we urge caution in making inference based on a non–MBC-F analysis when it can be avoided. We also observed similar performance for the CC and SC models. The latter can be inefficient (or impossible) to specify for more complex inference models as the number of regression parameters in the imputation model to estimate greatly increases.
In addition, estimation of the distribution of the auxiliary covariates v given x can be more complex when x includes more than a few categorical covariates; otherwise, the empirical distribution can be used to estimate it (and in a randomized trial, the only covariate is typically treatment). In the more complex case, one might use a Bayesian nonparametric model (to minimize model misspecification problems) to estimate this distribution (eg, a Dirichlet process mixture of normals17) and Monte Carlo integration to compute the Δit; we are currently exploring this.
We would expect to see worse performance for MICE approaches,18 given they typically do not correspond to a valid joint distribution and are most often non–MBC-F without explicit adjustments.10
Extension to the case where missingness is still not at random even after including all available auxiliary covariates would be very useful. We are currently working on (Bayesian) approaches for MBC-F analysis under this scenario,19 which allow for sensitivity analysis.20 Also, formal construction of CC approaches in different MDMs is needed using the framework in Section 2. We are also working on such extensions.
7. SUPPLEMENTARY MATERIALS
Tables referenced in Section 4 are available with this paper at the website.
ACKNOWLEDGEMENT
This work was partially supported by the National Institutes of Health under grant CA183854.
REFERENCES
- 1. Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Statist Med. 1995;14(17):1913–1925.
- 2. Taylor L, Zhou XH. Multiple imputation methods for treatment noncompliance and nonresponse in randomized clinical trials. Biometrics. 2009;65(1):88–95.
- 3. Burns RA, Butterworth P, Kiely KM, et al. Multiple imputation was an efficient method for harmonizing the mini-mental state examination with missing item-level data. J Clinical Epidemiology. 2011;64(7):787–793.
- 4. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Hoboken, NJ: John Wiley & Sons; 1987.
- 5. Rubin DB. Multiple imputation after 18+ years (with discussion). J Am Statist Assoc. 1996;91:473–489.
- 6. Kenward MG, Carpenter J. Multiple imputation: current perspectives. Statist Methods Med Res. 2007;16:199–218.
- 7. Reiter JP, Raghunathan TE. The multiple adaptations of multiple imputation. J Am Statist Assoc. 2007;102(480):1462–1471.
- 8. Zhang M, Tsiatis AA, Davidian M. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics. 2008;64(3):707–715.
- 9. Conlon ASC, Taylor JMG, Sargent DJ. Improving efficiency in clinical trials using auxiliary information: application of a multi-state cure model. Biometrics. 2015;71(2):460–468.
- 10. Bartlett JW, Seaman SR, White IR, Carpenter JR; Alzheimer’s Disease Neuroimaging Initiative. Multiple imputation of covariates by fully conditional specification: accommodating the substantive model. Stat Methods Med Res. 2015;24(4):462–487.
- 11. Heagerty PJ, Zeger SL. Marginalized multilevel models and likelihood inference (with comments and a rejoinder by the authors). Statistical Science. 2000;15(1):1–26.
- 12. Heagerty PJ. Marginally specified logistic-normal models for longitudinal binary data. Biometrics. 1999;55(3):688–698.
- 13. Heagerty PJ. Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics. 2002;58(2):342–351.
- 14. Lee K, Daniels MJ, Joo Y. Flexible marginalized models for bivariate longitudinal ordinal data. Biostatistics. 2013;14(3):462–476.
- 15. Roy J, Lum KJ, Daniels MJ. A Bayesian nonparametric approach to marginal structural models for point treatments and a continuous or survival outcome. Biostatistics. 2016;18(1):32–47.
- 16. Daniels MJ, Wang C, Marcus BH. Fully Bayesian inference under ignorable missingness in the presence of auxiliary covariates. Biometrics. 2014;70(1):62–72.
- 17. Müller P, Erkanli A, West M. Bayesian curve fitting using multivariate normal mixtures. Biometrika. 1996;83(1):67–79.
- 18. Raghunathan TE, Lepkowski JM, Van Hoewyk J, Solenberger P. A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology. 2001;27(1):85–96.
- 19. Zhou T, Daniels MJ, Mueller P. A nonparametric Bayesian approach to dropout in longitudinal studies with auxiliary covariates. 2018. In press.
- 20. Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. New York, NY: CRC Press; 2008.