A note on compatibility for inference with missing data in the presence of auxiliary covariates

Michael J Daniels; Xuan Luo

doi:10.1002/sim.8025

Author manuscript; available in PMC: 2020 Oct 30.

Published in final edited form as: Stat Med. 2018 Nov 18;38(7):1190–1199. doi: 10.1002/sim.8025

PMCID: PMC7598794 NIHMSID: NIHMS1639699 PMID: 30450746

Abstract

Imputation and inference (or analysis) models that cannot be true simultaneously are frequently used in practice when missing outcomes are present. In these situations, the conclusions can be misleading depending on how “different” the implicit inference model, induced by the imputation model, is from the inference model actually used. We introduce model-based compatibility (MBC) and compare two MBC approaches to a non-MBC approach and explore the inferential validity of the latter in a simple case. In addition, we evaluate more complex cases through a series of simulation studies. Overall, we recommend caution when making inferences using a non-MBC analysis and point out when the inferential “cost” is the largest.

Keywords: compatible models, ignorability, missingness, multiple imputation

1. INTRODUCTION

Incomplete data is a common phenomenon in clinical trials. When missing outcomes are present, multiple imputation is a common approach.^1–3 Multiple imputation was introduced by Rubin^4,5 and recently reviewed in several papers.^6,7 Missing at random (MAR) is a common assumption for multiple imputation. MAR holds when the missingness is independent of the unobserved data, given the observed data. To make a MAR assumption more reasonable, additional information (such as auxiliary covariates V, which are not of interest for the primary research question but may help predict missingness and impute the missing response) is often included in model construction.^8,9 An imputation model incorporating both the auxiliary covariates V and covariates of interest X is constructed, as well as an inference (or analysis) model that contains only the covariates of interest X. The concern with such a setup is whether the imputation and inference models can hold simultaneously.¹⁰ In a probability model–based framework, we formally define such compatibility here, discuss several ways to ensure that it holds, and examine implications in simple and more complex examples when it does not hold.

Section 2 specifies the assumed missing data mechanism (MDM), defines our notion of compatibility, and describes a class of models to ensure compatibility holds. Several classes of models are carefully explored in a simple longitudinal example in Section 3. In Section 4, we compare the performance of the models in more complex settings through a series of simulation studies. Section 5 provides recommendations, and Section 6 concludes with a brief discussion.

2. GENERAL FRAMEWORK AND MODEL-BASED COMPATIBILITY

In this section, we formally define auxiliary variable MAR (A-MAR) and model-based compatibility (MBC) and introduce a class of models that guarantees the latter.

2.1. Definition of A-MAR

We introduce some notation to define the MDM. Define the complete longitudinal response as y and the complete data as (y, r, x, v), where r indicates which components of y are observed. The observed data is (y_obs, r, x, v), and the missing data is y_mis, where y = (y_obs, y_mis). Ignorable missingness with auxiliary covariates is satisfied if the following two conditions hold: (1) the MDM is A-MAR, ie, p(r|y, v, x; ϕ) = p(r|y_obs, v, x; ϕ); (2) full data parameters can be decomposed into three parts as ω = (α, ϕ, θ): ϕ indexes the MDM p(r|y, v, x; ϕ), α indexes full data response model conditional on both auxiliary covariates and primary covariates of interest g(y|x, v; α), and θ indexes the marginal distribution of auxiliary covariates p(v|x; θ), where (α, θ) and ϕ are distinct. In what follows, we assume that A-MAR holds given the included v’s. Under A-MAR, the imputation model will be specified as g(y|x, v; α). The key for A-MAR is that the imputation model does not depend on r but does depend on v.

2.2. MBC in the presence of auxiliary covariates

Given an imputation model

G = {g (y ∣ x, v; α) ∣ α \in A},

let

P = {p (y ∣ x; α, h) ∣ p (y ∣ x; α, h) = \int g (y ∣ x, v; α) h (v ∣ x) d v, α \in A, h \in H},

where $H$ is the collection of all the distributions for V|X such that the integral is finite. Suppose that the inferential model for Y|X to be used is

F = {f (y ∣ x; β) ∣ β \in B} .

Compatibility of the imputation model to the inferential model can be defined as, for any given β* ∈ B, there exist α* ∈ A and $h^{*} \in H$ such that

f (y ∣ x; β^{*}) = p (y ∣ x; α^{*}, h^{*})

for all y, x; we call this model-based compatibility (MBC). MBC for a well-defined functional of the distribution of y|x (MBC-F), such as the mean,

m (β, x) = \int y f (y ∣ x; β) d y

and

M (α, h, x) = \int y p (y ∣ x; α, h) d y,

may be defined as, for any given β* ∈ B, there exist α* ∈ A and $h^{*} \in H$ such that

m (β^{*}, x) = M (α^{*}, h^{*}, x),

for all x. This is weaker than MBC and implied by it. However, MBC and MBC-F are equivalent in certain cases. For example, they are equivalent when the functional is the mean (the most common choice) in two cases: (1) normal regression models are used for both the inference and imputation models and (2) the data are multivariate binary and the models in Section 3 are used. In what follows, we will focus development on the functional being the mean.

There are at least two ways to ensure MBC-F (in what follows, we will suppress parameters for clarity). The first is to specify g(y|x, v) with a constraint that preserves the form of the functional (eg, the marginal mean, E[Y|x]); we call this constraint compatible (CC) and give a simple example in Section 3. The second is to specify g(y|x, v) in a saturated way for the functional. In particular, there are enough parameters in the imputation model such that the functional for the imputation model does not contradict the same functional from the inference model; we call this saturation compatible (SC) and provide a simple example in Section 3.

We point out a few features of each. CC directly specifies a model for h(v|x), p(v|x; θ) while SC does not. CC implicitly treats the distribution of auxiliary covariates as more of a “nuisance” that is needed to ensure compatibility. SC has more parameters to estimate in the imputation model and is only possible if v is categorical (details in Section 3). CC does not require explicit imputation.

In what follows, we will focus on CC and SC with the functional being the expectation of Y|x. We provide more details on CC in the next subsection and illustrate CC and SC in a simple example in Section 3.

2.3. General specification of model-based compatible using constraints (CC)

CC can be constructed by using the idea of likelihood-based marginalized models.¹¹ To ensure compatibility with the functional being the mean, we need the following constraint:

E [Y_{i t} ∣ x_{i}] = \int E [Y_{i t} ∣ v_{i}, x_{i}] d H (v_{i} ∣ x_{i}) .

This can be generalized to correlated data settings, including longitudinal settings, by conditioning on previous responses, ${\bar{Y}}_{i t} = (Y_{i 1}, \dots, Y_{i t})$, and/or random effects, b_i,

E [Y_{i t} ∣ x_{i}] = \int E [Y_{i t} ∣ b_{i}, {\bar{Y}}_{i, t - 1}, v_{i}, x_{i}] d F (b_{i}, {\bar{Y}}_{i, t - 1}, v_{i} ∣ x_{i}) .

Here, we will always integrate over the auxiliary covariates.

Related formulations (without auxiliary covariates and/or in causal inference settings) have been specified for a variety of cases: (1) longitudinal binary responses^12,13; (2) multivariate responses¹⁴; (3) continuous responses with a nonlinear link.¹⁵ Each can be adapted to our setting (with auxiliary covariates); Daniels et al¹⁶ did such an extension for univariate longitudinal binary data with auxiliary covariates. We provide details on this case next.

3. SIMPLE LONGITUDINAL EXAMPLE

To better understand the issues with the CC and SC models introduced in Section 2, we thoroughly explore a simple example. Consider the scenario with one binary (baseline) auxiliary covariate V, two longitudinal measurements for each subject, (Y₁, Y₂), and a binary covariate of interest (treatment), X.

We assume that the missingness is monotone and the MDM is A-MAR with the following form:

logit P (R_{2} = 0 ∣ R_{1} = 1, Y_{1} = y, v) = ϕ_{02} + ϕ_{v} v + ϕ_{1} y

(1)

logit P (R_{1} = 0 ∣ v) = ϕ_{01} + ϕ_{v} v .

(2)

We assume that the inference model used is

logit P (Y_{i t} = 1 ∣ Y_{i, t - 1} = y, x; β) = β_{0} + β_{1} (t - 1) + β_{2} x + β_{3} y, for t = 1, 2.

(3)

For ease of notation, we let Y_i0 ≡ 0. In what follows, we assume n subjects and T time points (where, here, T = 2). The CC imputation model is

logit P (Y_{i t} = 1 ∣ Y_{i, t - 1} = y, x, v; α, β, θ) = Δ_{i t} + α v, for t = 1, 2.

(4)

where the parameters, Δ_it, which are (implicitly) a function of (β, θ, x, y), allow these two models to be compatible (ie, both can be correct simultaneously) via the following constraint:

E (Y_{t} ∣ Y_{t - 1} = y, x) = \sum_{v = 0}^{1} E (Y_{t} ∣ Y_{t - 1} = y, v, x) p (v ∣ Y_{t - 1} = y, x) = \sum_{v = 0}^{1} E (Y_{t} ∣ Y_{t - 1} = y, v, x) \frac{P (Y_{t - 1} = y ∣ v, x) p (v ∣ x)}{P (Y_{t - 1} = y ∣ x)},

where $P (Y_{t} = 1 ∣ v, x) = \sum_{y = 0}^{1} P (Y_{t} = 1 ∣ Y_{t - 1} = y, v, x) P (Y_{t - 1} = y ∣ v, x)$ . To compute Δ_it and ensure MBC-F for the mean, the distribution p(v|x; θ) needs to be estimated. This distribution does not need to be estimated for the SC or the non-MBC models. We examine the impact of this on the efficiency and operating characteristics of the CC approach in Section 4 via simulations. In randomized clinical trials with the only subject-specific inference covariate being treatment, we can model the distribution of v separately for each arm of treatment or assume p(v|x) ≡ p(v) (by randomization). As such, in this example, we can estimate θ by the empirical distribution of v. We discuss estimation of p(v|x; θ) in more complicated settings in Section 6. In the CC approach, the imputation model is constrained by the (form of the mean of the) inference model and both are (implicitly) fit simultaneously/jointly.

Recall that the deterministic parameters, Δ_it, which are (implicitly) a function of (β, θ, x, y), enforce the form of the mean inference model. They are not free regression parameters, and the above specification has only five free regression parameters (β₀, β₁, β₂, β₃, α). Given the structure of the inference model in (3), there are six possible “mean” values corresponding to the values of {(t − 1), x, y}; these realizations are

{(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)} .

As such, there are six unique Δ_it values.
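The constraint above can be solved numerically for each Δ_it. The following is a minimal sketch for the two-time-point example, assuming p(v|x) ≡ p(v) (as under randomization) and a known p(V = 1); the function names, the bisection solver, and the parameter values are illustrative assumptions and are not taken from the original analysis.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def solve_delta(target, alpha, weights, lo=-30.0, hi=30.0, tol=1e-12):
    """Bisection for delta with sum_v weights[v] * expit(delta + alpha*v) = target.

    The left-hand side is increasing in delta, so bisection is enough.
    """
    def marginal(delta):
        return sum(w * expit(delta + alpha * v) for v, w in weights.items())

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cc_deltas(beta, alpha, p_v1):
    """Return {(t-1, x, y): delta} for the six cells of inference model (3).

    Assumption (hypothetical setup): binary v with p(v | x) = p(v) = p_v1.
    """
    b0, b1, b2, b3 = beta
    prior = {0: 1.0 - p_v1, 1: p_v1}
    deltas = {}
    for x in (0, 1):
        # t = 1: Y_0 is fixed at 0, so the weights are p(v | x) = p(v).
        d1 = solve_delta(expit(b0 + b2 * x), alpha, prior)
        deltas[(0, x, 0)] = d1
        for y in (0, 1):
            # t = 2: weights are p(v | Y_1 = y, x), obtained by Bayes' rule.
            lik = {}
            for v in (0, 1):
                p1 = expit(d1 + alpha * v)
                lik[v] = p1 if y == 1 else 1.0 - p1
            norm = sum(lik[v] * prior[v] for v in (0, 1))
            post = {v: lik[v] * prior[v] / norm for v in (0, 1)}
            target = expit(b0 + b1 + b2 * x + b3 * y)
            deltas[(1, x, y)] = solve_delta(target, alpha, post)
    return deltas


# Illustrative (made-up) values for (beta_0, ..., beta_3), alpha, and p(V = 1).
example = cc_deltas((-0.5, 0.25, 0.5, 0.4), alpha=1.2, p_v1=0.3)
```

When α = 0, each Δ_it reduces to the linear predictor of (3), which is a convenient sanity check for an implementation of this kind.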

Remark 1. If the inference model contains continuous covariates, there will be n * T distinct values of Δ_it. In general, the number of unique Δ_it corresponds to the number of observed unique sets of values of $X_{i t}^{⋆}$ in (2). There are slightly fewer here (if x were continuous), due to Y₀ being fixed at zero.

Given the inference model in (3), a typical non–MBC-F imputation model would replace Δ_it with $α_{0}^{u} + α_{1}^{u} (t - 1) + α_{2}^{u} x + α_{3}^{u} y$ and use the following imputation model:

logit P (Y_{i t} = 1 ∣ Y_{i, t - 1} = y, x, v) = α_{0}^{u} + α_{1}^{u} (t - 1) + α_{2}^{u} x + α_{3}^{u} y + α_{4}^{u} v, for t = 1, 2.

(5)

This imputation model is not MBC-F with (3).
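One way to see the incompatibility numerically: under (3), the effect of x on the logit scale is the same constant β₂ in every (t − 1, y) stratum, whereas marginalizing (5) over v generally gives stratum-specific effects. The sketch below assumes a binary v with p(v|x) ≡ p(v); the function name and all parameter values are hypothetical.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_x_effects(a, p_v1):
    """Logit-scale effect of x after integrating v out of model (5).

    a = (a0, a1, a2, a3, a4) are the coefficients of (1, t-1, x, y, v).
    Returns {(t-1, y): logit P(Y_t=1 | y, x=1) - logit P(Y_t=1 | y, x=0)}.
    """
    a0, a1, a2, a3, a4 = a
    prior = {0: 1.0 - p_v1, 1: p_v1}

    def cond(s, x, y, v):
        # Conditional success probability under (5); s stands for t - 1.
        return expit(a0 + a1 * s + a2 * x + a3 * y + a4 * v)

    def marg(s, x, y):
        if s == 0:
            w = prior  # Y_0 is fixed, so the weights are p(v)
        else:
            # Weights p(v | Y_1 = y, x) by Bayes' rule.
            lik = {v: (cond(0, x, 0, v) if y == 1 else 1.0 - cond(0, x, 0, v))
                   for v in (0, 1)}
            norm = sum(lik[v] * prior[v] for v in (0, 1))
            w = {v: lik[v] * prior[v] / norm for v in (0, 1)}
        return sum(w[v] * cond(s, x, y, v) for v in (0, 1))

    strata = [(0, 0), (1, 0), (1, 1)]
    return {st: logit(marg(st[0], 1, st[1])) - logit(marg(st[0], 0, st[1]))
            for st in strata}


# With no v effect (a4 = 0), every stratum returns a2; with a4 != 0 they differ,
# so no single beta_2 in (3) can reproduce the marginalized means.
effects_no_v = marginal_x_effects((-0.5, 0.25, 0.5, 0.4, 0.0), p_v1=0.5)
effects_with_v = marginal_x_effects((-0.5, 0.25, 0.5, 0.4, 2.0), p_v1=0.5)
```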

Remaining in this simple scenario with only categorical covariates in the inference model, a seemingly non–MBC-F model that replaced Δ_it in (4) with a richer model than the one in (5), with design vector (1, t – 1, x, y, x * (t − 1), x * y), would be an SC model. We use the term saturated to indicate that there are enough parameters in the imputation model to accommodate the number of unique observed values of the mean in the inference model (this is the same as the number of unique Δ_it’s for the CC approach).
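The saturation argument can be checked by comparing the rank of each design over the six (t − 1, x, y) cells. This is only an illustrative sketch; the helper names are hypothetical, and exact rational arithmetic is used so the rank is not affected by rounding.

```python
from fractions import Fraction

# The six (t-1, x, y) cells that can occur when Y_0 is fixed at zero.
CELLS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]


def matrix_rank(rows):
    """Rank by Gauss-Jordan elimination in exact arithmetic."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                f = m[i][col] / m[rank][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def design_main(s, x, y):
    """Main-effects design replacing Delta_it in (5); s stands for t - 1."""
    return [1, s, x, y]


def design_sc(s, x, y):
    """Richer design (1, t-1, x, y, x*(t-1), x*y) of the SC model."""
    return [1, s, x, y, x * s, x * y]


rank_main = matrix_rank([design_main(*c) for c in CELLS])
rank_sc = matrix_rank([design_sc(*c) for c in CELLS])
```

A rank of six for the richer design means any six cell-specific values of Δ_it can be reproduced, whereas the main-effects design, with rank four, spans only a four-dimensional subset of them.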

Remark 2. In general, for nonlinear links and continuous covariates, an SC model will not be possible or practical (cf Remark 1) as n * T regression parameters would be needed. Note also that such a general imputation model requires the estimation of 11 regression parameters (four in the inference model plus seven in the imputation model), unlike the five regression parameters in the CC model (though, as mentioned previously, the CC model requires an estimate of the distribution of the auxiliary covariates, for which we use the empirical distribution). It will only be equivalent to the CC specification asymptotically; we will explore this further in the simulations in Section 4.

Remark 3. For linear links, we have fewer problems. For example, the following (mean) inference model:

E (Y_{i t} ∣ Y_{i, t - 1} = y, x) = β_{0} + β_{1} (t - 1) + β_{2} x + β_{3} y, for t = 1, 2

can hold simultaneously with

E (Y_{i t} ∣ Y_{i, t - 1} = y, x, v) = α_{0}^{u} + α_{1}^{u} (t - 1) + α_{2}^{u} x + α_{3}^{u} y + α_{4}^{u} v, for t = 1, 2

as the marginalization over p(v|x) keeps the same functional form for the mean. However, these non-CC approaches require estimation of more regression parameters (9 vs 5) than the CC model.

Next, we point out factors that affect the bias of the inference model parameters for non–MBC-F models including: (1) ϕ_v, the effect of auxiliary covariate V in the MDM; (2) α, the effect of auxiliary covariate V in the imputation model; and (3) the overall missing rate.

If ϕ_v = 0, the auxiliary covariate is not needed to impute the missing values. However, if it is used, both CC and SC will be correct and neither will give the same inference (in small samples or asymptotically) as the non–MBC-F models.

Remark 4. Note that, when we assume A-MAR, we typically do not fit the MDM explicitly, so it would not be uncommon to have unneeded auxiliary covariates in the imputation model.

When the auxiliary covariate V has no effect in the imputation model, ie, α = 0, and the sample size n is large enough that an estimator $\hat{α}$ is close to the true value, we will see no difference between the analyses. However, when the sample size n is small, it is possible that $\hat{α}$ can be “far” from zero, which will negatively impact inference for non–MBC-F models. As |α| increases, the effect of auxiliary covariate V increases, and we expect larger bias for the non–MBC-F approach. Clearly, as the probability of any missingness goes to zero, the estimates from the three approaches will be very similar (with increasing sample size). So, if the probability of missingness is low and the auxiliary covariates' effects on y are “small,” there will be less bias for non–MBC-F models.

We explore more complex settings and the issues raised here via simulations in the next section.

4. SIMULATIONS

We conduct a series of simulations to understand the “cost” of a non–MBC-F analysis compared to both MBC-F analyses in a more complex setting with four time points and more auxiliary covariates. We assess the impact of several factors on the performance of the analyses, including sample size, unneeded auxiliary covariates (certain ϕ_v = 0), estimation of p(v|x; θ), and no relation between the auxiliary covariates and the response (α = 0). The details of the simulations are provided in the following.

4.1. Simulation setup

Auxiliary covariates

Consider auxiliary covariates V with dimension p = 8, each having only two levels 0 and 1. Define

p (v) = θ_{v} = μ_{v} / (\sum_{all v^{*}} μ_{v^{*}}),

where μ_v is calculated according to a log-linear model with three-way interactions

log (μ_{v}) = \sum_{k} λ_{v_{k}}^{k} + \sum_{k \neq l} λ_{v_{k}, v_{l}}^{k l} + \sum_{k \neq l \neq m} λ_{v_{k}, v_{l}, v_{m}}^{k l m},

with $λ_{0}^{k} = 0$, $λ_{00}^{k l} = λ_{01}^{k l} = λ_{10}^{k l} = 0$, and $λ_{000}^{k l m} = λ_{001}^{k l m} = λ_{010}^{k l m} = λ_{011}^{k l m} = λ_{100}^{k l m} = λ_{101}^{k l m} = λ_{110}^{k l m} = 0$ for $k, l, m \in \{1, \dots, 8\}$. One set of the true values for the remainder of the λ’s is randomly generated from

λ_{1} = (λ_{1}^{1}, \dots, λ_{1}^{8}) ~ Unif (8, 0, 1)

λ_{11}^{k l} ~ N (1, 0, 0.4) for k, l \in {1, \dots, 8} and k \neq l

λ_{111}^{k l m} ~ N (- 0.1, 0.2) for k, l, m \in {1, \dots, 8} and k \neq l \neq m .

Here, we can examine the impact of estimating p(v|x; θ) using the empirical distribution on the operating characteristics of the estimates of the inference model parameters in the CC approach.
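As a rough sketch, the cell probabilities p(v) over the 2⁸ = 256 configurations can be built from the log-linear model as follows. Several readings are assumptions made here for illustration: sums are taken over unordered index sets, the main effects are drawn from Unif(0, 1), the two-way terms from a normal with mean 0 and standard deviation 0.4, and the three-way terms from a normal with mean −0.1 and standard deviation 0.2. The function names and the seed are hypothetical.

```python
import itertools
import math
import random


def draw_lambdas(p=8, seed=0):
    """Draw the nonzero lambdas (those whose indices all equal 1).

    The distributions and their parameterization are assumptions (see lead-in).
    """
    rng = random.Random(seed)
    main = [rng.uniform(0.0, 1.0) for _ in range(p)]
    two = {pair: rng.gauss(0.0, 0.4)
           for pair in itertools.combinations(range(p), 2)}
    three = {tri: rng.gauss(-0.1, 0.2)
             for tri in itertools.combinations(range(p), 3)}
    return main, two, three


def cell_probabilities(main, two, three):
    """Return {v: p(v)} with p(v) = mu_v / sum over all v* of mu_v*."""
    p = len(main)
    log_mu = {}
    for v in itertools.product((0, 1), repeat=p):
        on = [k for k in range(p) if v[k] == 1]
        # A term contributes only when every covariate it involves equals 1.
        s = sum(main[k] for k in on)
        s += sum(two[pair] for pair in itertools.combinations(on, 2))
        s += sum(three[tri] for tri in itertools.combinations(on, 3))
        log_mu[v] = s
    total = sum(math.exp(s) for s in log_mu.values())
    return {v: math.exp(s) / total for v, s in log_mu.items()}


probs = cell_probabilities(*draw_lambdas())
# Draw one auxiliary covariate vector from p(v).
cells = list(probs)
v_draw = random.Random(1).choices(cells, weights=[probs[c] for c in cells])[0]
```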

The inference model

In the setting of four longitudinal responses, Y_t, the true inference model is specified as

logit P (Y_{i t} = 1 ∣ Y_{i, t - 1} = y) = β_{0} + β_{1} (t - 1) + β_{2} y .

(6)

The imputation model

For the CC approach, the true imputation model has the following form:

logit P (Y_{i t} = 1 ∣ Y_{i, t - 1} = y, v_{i}) = Δ_{i t} + \sum_{j = 1}^{p} α_{j} v_{i j} .

(7)

Δ_it is a function of (β, θ) and y and has seven possible values based on unique combinations of (t – 1, y): {(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)} in the inference model in (6). Recall that, for the CC analysis, the Δ’s in the imputation model are constrained so that the conditional mean E(Y_t|Y_t−1 = y) obtained by marginalization of E(Y_t|Y_t−1 = y, v) over auxiliary covariates V is equal to that from the inference model.

The CC model is the true data generating model here. We consider three different sets of parameter values in the imputation model (ie, α) as given in Table 1. For each set, we simulate the full data response Y from (7) using the parameter values given in Table 1. The non–MBC-F and SC approaches use the same inference model as above, with the following two imputation models, respectively.

TABLE 1.

Simulation parameter values

Inference Model		MDM				Imputation Set 1				Imputation Set 2				Imputation Set 3
β₀	−0.5	ϕ_v,1	−0.5	ϕ_v,7	0	α₁	0.4*3	α₆	−0.6*3	α₁	0.4*1.8	α₆	−0.6*1.8	α₁	0.4*0.1	α₆	−0.6*0.1
β₁	0.25	ϕ_v,2	−1.2	ϕ_v,8	0	α₂	0.3*3	α₇	−0.3*3	α₂	0.3*1.8	α₇	−0.3*1.8	α₂	0.3*0.1	α₇	−0.3*0.1
β₂	0.4	ϕ_v,3	−0.8	ϕ_v,9	0	α₃	0.5*3	α₈	−0.7*3	α₃	0.5*1.8	α₈	−0.7*1.8	α₃	0.5*0.1	α₈	−0.7*0.1
		ϕ_v,4	0.5	ϕ₀	0.5	α₄	0.2*3			α₄	0.2*1.8			α₄	0.2*0.1
		ϕ_v,5	0.6	ϕ₁	0	α₅	−0.4*3			α₅	−0.4*1.8			α₅	−0.4*0.1
		ϕ_v,6	0


Abbreviation: MDM, missing data mechanism.

The non–MBC-F model is specified as

logit P (Y_{i t} = 1 ∣ Y_{i, t - 1} = y, v_{i}) = α_{0}^{u} + α_{1}^{u} (t - 1) + α_{2}^{u} y + \sum_{j = 1}^{8} α_{v, j}^{u} v_{i j},

(M1)

and the SC model is specified as

logit P (Y_{i t} = 1 ∣ Y_{i, t - 1} = y, v_{i}) = α_{0}^{SC} + α_{1}^{SC} I {t = 2} + α_{2}^{SC} I {t = 3} + α_{3}^{SC} I {t = 4} + α_{4}^{SC} I {t = 2} y + α_{5}^{SC} I {t = 3} y + α_{6}^{SC} I {t = 4} y + \sum_{j = 1}^{8} α_{v, j}^{SC} v_{i j} .

(M2)

The SC model parameters, $α_{j}^{SC} : j = 1, \dots, 6$, are sufficient to ensure that the inference model in (6) can hold simultaneously with the SC imputation model. There are more regression parameters in the SC model, but it should be “equivalent” to the CC analysis in larger samples.

Specification of MDM

We assume that the MDM is A-MAR and has the following form:

logit P (R_{i t} = 0 ∣ R_{i, t - 1} = 1, Y_{i} = y_{i}, v_{i}) = ϕ_{0} + ϕ_{1} y_{i, t - 1} + ϕ_{2} y_{i t} + \sum_{j = 1}^{8} ϕ_{v, j} v_{i j},

(8)

where R_it = I (Y_it is observed). The values of ϕ are listed in Table 1. With these values, we generate monotone missingness; the missing data rate at T = 4 is about 45%. By setting ϕ₂ = 0, the MDM is assumed to be MAR conditional on the auxiliary covariates. Missingness in the full data response is generated using (8).
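A minimal sketch of generating monotone dropout from (8) with ϕ₂ = 0 follows. It assumes that every subject is at risk at t = 1 and that Y_i0 ≡ 0 enters the first step; the function name and the numeric values in the usage lines are hypothetical.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_dropout(y, v, phi0, phi1, phi_v, rng):
    """Return r with r[t] = 1 if Y_t is observed and 0 if it is missing.

    Implements (8) with phi_2 = 0, so dropout at time t depends only on the
    previous response and the auxiliary covariates (A-MAR). Once a subject
    drops out, all later responses are missing (monotone pattern).
    """
    lin_v = sum(c * vj for c, vj in zip(phi_v, v))
    r = []
    observed = True
    y_prev = 0  # assumption: Y_0 is fixed at zero
    for y_t in y:
        if observed:
            p_drop = expit(phi0 + phi1 * y_prev + lin_v)
            if rng.random() < p_drop:
                observed = False
        r.append(1 if observed else 0)
        y_prev = y_t
    return r


# Illustrative call: four responses and eight binary auxiliary covariates.
rng = random.Random(2018)
pattern = simulate_dropout(
    y=[1, 0, 1, 1],
    v=[1, 0, 1, 0, 0, 1, 0, 1],
    phi0=-1.0,
    phi1=0.5,
    phi_v=[-0.5, -1.2, -0.8, 0.5, 0.0, 0.0, 0.0, 0.0],
    rng=rng,
)
```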

Inclusion of unnecessary auxiliary covariates V for A-MAR

To compare the robustness of all the analyses to the inclusion of auxiliary covariates that are unnecessary for the A-MAR assumption but predictive of the outcome Y, we set the coefficients of V₅, … , V₈ (ϕ_v,5, … , ϕ_v,8) to zero in the MDM in (8). Thus, V₁, … , V₄ are necessary auxiliary covariates, and V₅, … , V₈ are unnecessary (or extra) auxiliary covariates.

For each model considered, we fit models with and without unnecessary auxiliary covariates. This is to assess the development regarding ϕ_v from Section 3 in more detail in practice.

Sample size

We run simulations with sample sizes of 200, 500, 20 000, and 100 000. The largest sample size is meant to correspond to an “asymptotic result.” For each sample size, the results are based on 1000 simulated datasets.

4.2. Comparisons between MBC-F and non–MBC-F approaches

We compare the performance in six settings: (1) U* as non–MBC-F analysis with imputation model (M1) using all available auxiliary covariates, (2) SC* as SC analysis with imputation model (M2) using all available auxiliary covariates, (3) CC* as CC analysis using all auxiliary covariates, (4) U as non–MBC-F analysis with imputation model (M1) using only necessary auxiliary covariates, (5) SC as SC analysis with imputation model (M2) using only necessary auxiliary covariates, and (6) CC as CC analysis using only necessary auxiliary covariates.

For the imputation models, M1 and M2, to remove any bias from a small “finite” number of imputations, the missing outcomes are imputed m = 100 times based on the imputation model. The inference model is fit for each of these 100 “complete” datasets. We make inference for β and conditional mean E(Y_t|Y_t−1 = y) according to Rubin’s multiple imputation rules.⁴ We use the optim() function in R to obtain the maximum likelihood estimates for the CC analysis. We run optim() several times from randomly selected starting values to ensure that the global maximum is found.
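For reference, Rubin's combining rules for a scalar parameter can be sketched as below: the pooled estimate is the average of the m completed-data estimates, and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the toy inputs are illustrative only.

```python
def rubin_pool(estimates, variances):
    """Combine m completed-data estimates and their variances.

    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                              # pooled estimate
    w = sum(variances) / m                                  # within-imputation
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    t = w + (1.0 + 1.0 / m) * b                             # total variance
    if b > 0:
        df = (m - 1) * (1.0 + w / ((1.0 + 1.0 / m) * b)) ** 2
    else:
        df = float("inf")  # no between-imputation variability
    return q_bar, t, df


# Toy example with m = 3 imputations.
pooled, total_var, df = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```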

We compute the bias and mean squared error (MSE) for β and the 90% and 95% confidence interval coverage rates. Tables 2 and 3 show simulation results for parameter setting 1, Tables 4 and 5 show simulation results for parameter setting 2, and Tables S.1 and S.2 (in the supplementary materials) show simulation results for parameter setting 3. The first parameter setting illustrates the setting where the auxiliary covariates have a large impact on the response (ie, in the inference model). The second parameter setting corresponds to a smaller impact, and the third corresponds to the auxiliary covariates being almost unrelated to the response; the third setting will allow us to assess the conjecture in Section 3 that, in larger sample sizes, a non–MBC-F analysis will be essentially valid.
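The summary measures can be computed along the following lines. This sketch assumes normal-based (Wald) intervals for the coverage calculation, which may differ from the reference distribution used with Rubin's rules; the function name and the toy inputs are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize(estimates, std_errors, truth, level=0.95):
    """Bias, MSE, and interval coverage across simulated datasets.

    Monte Carlo standard errors are the standard deviation of the
    per-dataset quantity divided by sqrt(number of datasets).
    """
    n = len(estimates)
    errors = [e - truth for e in estimates]
    sq_errors = [d * d for d in errors]
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    covered = [abs(d) <= z * se for d, se in zip(errors, std_errors)]
    return {
        "bias": mean(errors),
        "bias_mcse": stdev(errors) / math.sqrt(n),
        "mse": mean(sq_errors),
        "mse_mcse": stdev(sq_errors) / math.sqrt(n),
        "coverage": sum(covered) / n,
    }


# Toy example: four simulated estimates of a parameter whose true value is 1.
result = summarize([0.9, 1.1, 1.0, 1.3], [0.1, 0.1, 0.1, 0.1], truth=1.0)
```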

TABLE 2.

Imputation parameter setting 1: comparison of non-MBC analysis (U) with saturation compatible (SC) and constraint compatible (CC). We denote approaches that have all available auxiliary covariates as U*, SC*, and CC* and only necessary auxiliary covariates as U, SC, and CC. Results based on 1000 simulated datasets with sample size 200, 500, 20 000, and 100 000. Monte Carlo standard errors are in parentheses

Size	Parameter	U*	SC*	CC*	U	SC	CC
200	Bias β₀	−0.026(0.004)	−0.001(0.004)	−0.003(0.004)	−0.001(0.004)	0.001(0.004)	0.003(0.004)
	Bias β₁	0.021(0.002)	−0.001(0.002)	0.002(0.002)	−0.012(0.003)	−0.001(0.003)	−0.002(0.002)
	Bias β₂	−0.005(0.008)	0.001(0.008)	−0.006(0.007)	0.003(0.008)	0.001(0.008)	−0.001(0.008)
	10³ *MSE β₀	14.8(0.66)	14.4(0.64)	14.1(0.64)	14.7(0.66)	14.1(0.64)	14.1(0.64)
	10³ *MSE β₁	6.1(0.28)	5.9(0.27)	5.5(0.25)	6.9(0.33)	5.9(0.23)	5.4(0.21)
	10³ *MSE β₂	59.8(2.7)	59.8(2.6)	54.8(2.4)	65.0(2.8)	55.7(2.7)	53.2(2.4)
500	Bias β₀	−0.023(0.002)	0.001(0.002)	0.001(0.002)	0.003(0.002)	0.001(0.002)	0.001(0.002)
	Bias β₁	0.024(0.001)	0.003(0.002)	0.004(0.001)	−0.011(0.002)	−0.003(0.002)	−0.002(0.002)
	Bias β₂	−0.013(0.005)	−0.001(0.005)	−0.001(0.004)	0.001(0.005)	−0.001(0.005)	−0.001(0.005)
	10³ *MSE β₀	6.0(0.27)	5.6(0.26)	5.2(0.24)	5.6(0.25)	5.7(0.26)	5.7(0.25)
	10³ *MSE β₁	2.7(0.13)	2.7(0.12)	2.6(0.12)	2.6(0.12)	2.3(0.10)	2.1(0.09)
	10³ *MSE β₂	21.0(0.98)	23.1(1.1)	21.8(0.99)	23.0(1.1)	21.4(0.97)	18.3(0.88)
20 000	Bias β₀	−0.024(0.00)	0.00(0.00)	−0.001(0.00)	0.002(0.00)	0.00(0.00)	0.00(0.00)
	Bias β₁	0.021(0.00)	0.00(0.00)	0.001(0.00)	−0.013(0.00)	−0.001(0.00)	−0.001(0.00)
	Bias β₂	−0.006(0.001)	−0.001(0.001)	−0.002(0.001)	0.004(0.001)	0.001(0.001)	0.001(0.001)
	10³ *MSE β₀	0.70(0.02)	0.18(0.01)	0.21(0.02)	0.15(0.01)	0.15(0.01)	0.12(0.01)
	10³ *MSE β₁	0.48(0.01)	0.14(0.01)	0.11(0.01)	0.30(0.01)	0.06(0.00)	0.05(0.00)
	10³ *MSE β₂	0.59(0.03)	0.63(0.03)	0.65(0.03)	0.54(0.03)	0.42(0.02)	0.62(0.03)
100 000	Bias β₀	−0.024(0.00)	0.00(0.00)	0.00(0.00)	0.002(0.00)	0.00(0.00)	0.00(0.00)
	Bias β₁	0.021(0.00)	0.00(0.00)	0.00(0.00)	−0.013(0.00)	0.00(0.00)	0.00(0.00)
	Bias β₂	−0.005(0.00)	0.001(0.00)	0.00(0.00)	0.005(0.00)	0.001(0.00)	0.00(0.00)
	10³ *MSE β₀	0.59(0.01)	0.06(0.00)	0.09(0.01)	0.03(0.00)	0.02(0.00)	0.03(0.00)
	10³ *MSE β₁	0.43(0.00)	0.19(0.00)	0.27(0.00)	0.26(0.01)	0.01(0.00)	0.01(0.00)
	10³ *MSE β₂	0.13(0.01)	0.15(0.01)	0.14(0.01)	0.10(0.00)	0.11(0.01)	0.07(0.00)


Abbreviations: MBC, model-based compatibility; MSE, mean squared error.

TABLE 3.

Imputation parameter setting 1 simulation results: posterior 90% and 95% confidence interval coverage rates for the six different models (U*, SC*, CC*, U, SC, and CC). Results are based on 1000 simulated datasets each for sample sizes of 200, 500, 20 000, and 100 000

Size	Parameter	U* 90	SC* 90	CC* 90	U 90	SC 90	CC 90	U* 95	SC* 95	CC* 95	U 95	SC 95	CC 95
200	β₀	0.90	0.90	0.89	0.90	0.91	0.89	0.95	0.95	0.94	0.95	0.94	0.95
	β₁	0.90	0.91	0.90	0.92	0.93	0.91	0.90	0.91	0.90	0.92	0.93	0.93
	β₂	0.89	0.89	0.89	0.89	0.89	0.89	0.93	0.93	0.92	0.94	0.94	0.93
500	β₀	0.90	0.92	0.92	0.92	0.92	0.92	0.96	0.95	0.95	0.96	0.95	0.95
	β₁	0.90	0.92	0.92	0.91	0.92	0.92	0.96	0.95	0.95	0.96	0.95	0.95
	β₂	0.89	0.90	0.90	0.89	0.90	0.90	0.93	0.93	0.94	0.93	0.94	0.94
20 000	β₀	0.41	0.92	0.92	0.92	0.92	0.93	0.54	0.94	0.94	0.97	0.94	0.94
	β₁	0.16	0.92	0.90	0.53	0.92	0.92	0.25	0.94	0.96	0.68	0.95	0.96
	β₂	0.85	0.90	0.91	0.84	0.90	0.92	0.90	0.94	0.94	0.89	0.94	0.95
100 000	β₀	0.00	0.93	0.93	0.91	0.93	0.93	0.01	0.93	0.93	0.96	0.94	0.94
	β₁	0.00	0.93	0.92	0.03	0.93	0.93	0.00	0.96	0.97	0.05	0.96	0.97
	β₂	0.81	0.93	0.93	0.78	0.94	0.95	0.89	0.95	0.96	0.86	0.94	0.96


TABLE 4.

Imputation parameter setting 2: comparison of non-MBC analysis (U) with saturation compatible (SC) and constraint compatible (CC). We denote approaches that have all available auxiliary covariates as U*, SC*, and CC* and only necessary auxiliary covariates as U, SC, and CC. Results based on 1000 simulated datasets with sample size 200, 500, 20 000, and 100 000. Monte Carlo standard errors are in parentheses

Size	Parameter	U*	SC*	CC*	U	SC	CC
200	Bias β₀	−0.014(0.004)	0.001(0.004)	0.001(0.004)	0.002(0.004)	0.001(0.004)	0.001(0.004)
	Bias β₁	0.007(0.003)	−0.005(0.003)	−0.003(0.003)	−0.005(0.003)	−0.005(0.003)	−0.002(0.003)
	Bias β₂	0.013(0.007)	0.008(0.007)	0.003(0.007)	0.014(0.007)	0.008(0.007)	0.003(0.007)
	10³ *MSE β₀	42.4(0.78)	12.7(0.79)	13.4(0.78)	11.9(0.78)	3.6(0.79)	4.0(0.78)
	10³ *MSE β₁	6.9(0.28)	7.2(0.30)	6.8(0.28)	6.4(0.26)	6.8(0.27)	6.3(0.26)
	10³ *MSE β₂	65.5(2.7)	65.8(2.7)	63.6(2.6)	64.5(2.7)	63.9(2.7)	61.0(2.6)
500	Bias β₀	−0.015(0.002)	0.004(0.003)	0.003(0.002)	−0.00(0.002)	0.00(0.002)	−0.001(0.002)
	Bias β₁	0.01(0.002)	−0.001(0.002)	−0.001(0.002)	−0.008(0.002)	−0.001(0.002)	−0.001(0.002)
	Bias β₂	0.010(0.005)	0.009(0.005)	0.008(0.005)	0.008(0.005)	0.007(0.005)	0.007(0.004)
	10³ *MSE β₀	35.3(0.49)	6.2(0.49)	7.2(0.48)	5.4(0.49)	4.0(0.30)	2.5(0.39)
	10³ *MSE β₁	2.7(0.12)	2.9(0.13)	2.7(0.12)	2.5(0.11)	2.6(0.12)	2.4(0.11)
	10³ *MSE β₂	32.6(1.4)	32.7(1.4)	31.9(1.3)	32.4(1.4)	31.7(1.4)	30.2(1.3)
20 000	Bias β₀	−0.014(0.00)	0.003(0.00)	0.003(0.00)	0.001(0.001)	−0.00(0.00)	−0.00(0.00)
	Bias β₁	0.011(0.00)	0.00(0.00)	0.00(0.00)	−0.007(0.00)	0.00(0.00)	−0.00(0.00)
	Bias β₂	0.003(0.001)	0.002(0.001)	0.001(0.001)	0.00(0.001)	0.00(0.001)	0.00(0.001)
	10³ *MSE β₀	0.90(0.02)	0.73(0.01)	0.90(0.00)	0.81(0.01)	0.37(0.01)	0.68(0.00)
	10³ *MSE β₁	0.18(0.01)	0.16(0.01)	0.15(0.01)	0.12(0.01)	0.07(0.00)	0.05(0.00)
	10³ *MSE β₂	0.56(0.02)	0.54(0.02)	0.45(0.02)	0.49(0.01)	0.44(0.01)	0.41(0.01)
100 000	Bias β₀	−0.013(0.00)	0.00(0.00)	0.00(0.00)	0.001(0.00)	0.00(0.00)	0.00(0.00)
	Bias β₁	0.01(0.00)	0.00(0.00)	0.00(0.00)	−0.008(0.00)	0.00(0.00)	0.00(0.00)
	Bias β₂	0.003(0.00)	0.002(0.00)	0.002(0.00)	0.001(0.00)	0.001(0.00)	0.001(0.00)
	10³ *MSE β₀	0.52(0.01)	0.07(0.01)	0.06(0.01)	0.12(0.00)	0.05(0.01)	0.05(0.01)
	10³ *MSE β₁	0.12(0.00)	0.01(0.00)	0.01(0.00)	0.07(0.00)	0.01(0.00)	0.01(0.00)
	10³ *MSE β₂	0.22(0.01)	0.21(0.01)	0.20(0.01)	0.18(0.01)	0.15(0.00)	0.14(0.00)


Abbreviations: MBC, model-based compatibility; MSE, mean squared error.

TABLE 5.

Imputation parameter setting 2 simulation results: posterior 90% and 95% confidence interval coverage rates for the six different models (U*, SC*, CC*, U, SC, and CC). Results are based on 1000 simulated datasets each for sample sizes of 200, 500, 20 000, and 100 000

Size	Parameter	U* 90	SC* 90	CC* 90	U 90	SC 90	CC 90	U* 95	SC* 95	CC* 95	U 95	SC 95	CC 95
200	β₀	0.90	0.92	0.92	0.91	0.92	0.92	0.96	0.96	0.95	0.96	0.96	0.96
	β₁	0.90	0.91	0.91	0.90	0.91	0.91	0.95	0.96	0.94	0.95	0.96	0.95
	β₂	0.87	0.87	0.88	0.87	0.87	0.88	0.92	0.92	0.93	0.93	0.93	0.94
500	β₀	0.91	0.91	0.90	0.91	0.91	0.90	0.96	0.95	0.95	0.96	0.95	0.95
	β₁	0.90	0.92	0.90	0.91	0.92	0.91	0.95	0.95	0.95	0.96	0.96	0.95
	β₂	0.88	0.88	0.90	0.88	0.88	0.90	0.93	0.93	0.94	0.94	0.94	0.94
20 000	β₀	0.72	0.90	0.90	0.91	0.91	0.91	0.82	0.94	0.93	0.96	0.95	0.95
	β₁	0.64	0.90	0.92	0.77	0.91	0.92	0.76	0.95	0.96	0.84	0.95	0.96
	β₂	0.86	0.90	0.92	0.84	0.91	0.92	0.93	0.93	0.97	0.93	0.93	0.97
100 000	β₀	0.22	0.91	0.90	0.90	0.91	0.91	0.33	0.96	0.94	0.95	0.96	0.95
	β₁	0.12	0.91	0.93	0.36	0.91	0.93	0.20	0.95	0.96	0.48	0.95	0.96
	β₂	0.88	0.90	0.93	0.86	0.90	0.93	0.92	0.93	0.97	0.92	0.93	0.97


Overall, it is clear that the performance of the CC analysis and that of the SC analysis are quite close, and both are more accurate and efficient than the non–MBC-F analysis; however, recall that the SC approach is not possible in general (eg, with continuous auxiliary covariates as discussed in Section 3). We also note the findings regarding the comment in Remark 2. Both CC and SC seem to have similar efficiency (the former needs an estimate of the distribution of the auxiliary covariates and the latter has more regression parameters to estimate). So, at least for the simulation settings considered here, there is a similar trade-off between the extra regression parameters in the SC approach and having to estimate the distribution of the auxiliary covariates for CC. The bias for the non–MBC-F analysis does not disappear with sample size. On the contrary, the estimates for the two MBC-F approaches are asymptotically unbiased, as shown in these tables for the larger sample sizes. The true coverage rates of confidence intervals for the non–MBC-F analysis severely deteriorate for the larger sample sizes. For example, for the largest sample size considered, the coverage rates of the 95% CI for β₀ and β₁ from the non–MBC-F analysis with unneeded auxiliary covariates (U*) are at most 1% in imputation parameter setting 1 and at most 33% in setting 2 (Tables 3 and 5).

These results also illustrate the robustness of the two MBC-F analyses to unnecessary auxiliary covariates. However, when they are included in the non–MBC-F analysis, its performance deteriorates very badly in terms of bias in β and the coverage rate of confidence intervals. For sample sizes of 20 000 and 100 000 in parameter settings 1 and 2, the bias in β₀ from the non–MBC-F analysis including unnecessary auxiliary covariates is about 10 times that from the non–MBC-F analysis including only necessary auxiliary covariates, and the 95% CI coverage rate for the former falls well below the nominal level, to between 1% and 82% (Tables 2–5). On the contrary, it appears that the MBC-F analyses are minimally impacted in terms of efficiency loss and bias increase when unnecessary auxiliary covariates are added into the imputation model. This is a practically desirable feature for an MBC-F analysis since it can minimize the need for model selection for the auxiliary covariates in the MDM.

The magnitude of the coefficients of the auxiliary covariates in the imputation model (α) substantially affects the performance of the non–MBC-F analysis. For parameter settings 1 and 2, the increase in bias and the drop in the true coverage rate for the non–MBC-F analysis are clear (Tables 2–5). This supports our conjecture in Section 3. However, for parameter setting 3, where the auxiliary covariates V have the smallest imputation model coefficients, all six models perform similarly (Tables S.1 and S.2).

5. RECOMMENDATIONS

Based on the example in Section 3 and the simulations in Section 4, we provide a few recommendations. In general, for the common setting where the analyst fits both the imputation and inference models, an MBC-F approach is clearly preferred. SC is a very simple MBC-F approach that does not require a model for the auxiliary covariates or explicit computation of Δ_it; however, it can only be used when the auxiliary covariates are categorical. The inclusion of unnecessary auxiliary covariates, which is typical in practice, is problematic and exacerbates bias in non–MBC-F models; this is a further reason to prefer MBC-F approaches.

6. DISCUSSION

We have illustrated, with an analytic example and simulations, the misleading inferences that can result from a non–MBC-F analysis. For example, we saw the negative impact of unneeded auxiliary covariates in the non–MBC-F model. As such, we urge caution in making inferences based on a non–MBC-F analysis when it can be avoided. We also observed similar performance for the CC and SC models. The latter can be inefficient (or impossible) to specify for more complex inference models because the number of regression parameters to estimate in the imputation model increases greatly.

In addition, estimating the distribution of the auxiliary covariates v given x can be more complex when x includes more than a few categorical covariates; otherwise, the empirical distribution can be used to estimate it (and in a randomized trial, the only covariate is typically treatment). In the more complex case, one might use a Bayesian nonparametric model (to minimize model misspecification problems) to estimate this distribution (eg, a Dirichlet process mixture of normals¹⁷) and Monte Carlo integration to compute the Δ_it; we are currently exploring this.
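The Monte Carlo step mentioned above can be sketched as follows. This is an illustration under assumptions that are not taken from the paper: a binary outcome with a logit link, a single auxiliary covariate v whose distribution given x is represented by draws (here standard normal draws standing in for draws from an estimated distribution), and a conditional model logit P(Y = 1 | x, v) = Δ + αv. The constraint is that averaging the conditional mean over v reproduces the marginal mean expit(xβ). The names `expit` and `solve_delta` and all parameter values are hypothetical.

```python
import math
import random


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve_delta(marginal_mean, v_draws, alpha, lo=-30.0, hi=30.0, tol=1e-10):
    """Bisection for delta such that the Monte Carlo average of
    expit(delta + alpha * v) over v_draws equals marginal_mean.

    The average is increasing in delta, so bisection is sufficient.
    """
    def mc_average(delta):
        return sum(expit(delta + alpha * v) for v in v_draws) / len(v_draws)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mc_average(mid) < marginal_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: draws of v given x, and illustrative coefficients.
rng = random.Random(0)
v_draws = [rng.gauss(0.0, 1.0) for _ in range(5000)]
beta0, beta1, x, alpha = -0.5, 1.0, 1.0, 2.0

target = expit(beta0 + beta1 * x)           # marginal mean implied by the inference model
delta = solve_delta(target, v_draws, alpha)  # intercept that enforces the constraint
implied = sum(expit(delta + alpha * v) for v in v_draws) / len(v_draws)
print(delta, target, implied)
```

In this sketch, setting α = 0 returns Δ equal to the marginal linear predictor. For nonzero α the solved Δ differs from it, which is why fitting the marginal model directly to data imputed from a conditional model that ignores the constraint need not recover β.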

We would expect to see worse performance for MICE approaches,¹⁸ given they typically do not correspond to a valid joint distribution and are most often non–MBC-F without explicit adjustments.¹⁰

Extension to the case where missingness is still not at random even after including all available auxiliary covariates would be very useful. We are currently working on (Bayesian) approaches for MBC-F analysis under this scenario,¹⁹ which allow for sensitivity analysis.²⁰ Also, formal construction of CC approaches in different MDMs is needed using the framework in Section 2. We are also working on such extensions.

7. SUPPLEMENTARY MATERIALS

Tables referenced in Section 4 are available with this paper at the website.

Supplementary Material

NIHMS1639699-supplement-supp_material.pdf (69.6 KB, PDF)

ACKNOWLEDGEMENT

This work was partially supported by the National Institutes of Health under grant CA183854.

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

REFERENCES

1. Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Statist Med. 1995;14(17):1913–1925.
2. Taylor L, Zhou XH. Multiple imputation methods for treatment noncompliance and nonresponse in randomized clinical trials. Biometrics. 2009;65(1):88–95.
3. Burns RA, Butterworth P, Kiely KM, et al. Multiple imputation was an efficient method for harmonizing the mini-mental state examination with missing item-level data. J Clinical Epidemiology. 2011;64(7):787–793.
4. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Hoboken, NJ: John Wiley & Sons; 1987.
5. Rubin DB. Multiple imputation after 18+ years (with discussion). J Am Statist Assoc. 1996;91:473–489.
6. Kenward MG, Carpenter J. Multiple imputation: current perspectives. Statist Methods Med Res. 2007;16:199–218.
7. Reiter JP, Raghunathan TE. The multiple adaptations of multiple imputation. J Am Statist Assoc. 2007;102(480):1462–1471.
8. Zhang M, Tsiatis AA, Davidian M. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics. 2008;64(3):707–715.
9. Conlon ASC, Taylor JMG, Sargent DJ. Improving efficiency in clinical trials using auxiliary information: application of a multi-state cure model. Biometrics. 2015;71(2):460–468.
10. Bartlett JW, Seaman SR, White IR, Carpenter JR; Alzheimer’s Disease Neuroimaging Initiative. Multiple imputation of covariates by fully conditional specification: accommodating the substantive model. Stat Methods Med Res. 2015;24(4):462–487.
11. Heagerty PJ, Zeger SL. Marginalized multilevel models and likelihood inference (with comments and a rejoinder by the authors). Statistical Science. 2000;15(1):1–26.
12. Heagerty PJ. Marginally specified logistic-normal models for longitudinal binary data. Biometrics. 1999;55(3):688–698.
13. Heagerty PJ. Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics. 2002;58(2):342–351.
14. Lee K, Daniels MJ, Joo Y. Flexible marginalized models for bivariate longitudinal ordinal data. Biostatistics. 2013;14(3):462–476.
15. Roy J, Lum KJ, Daniels MJ. A Bayesian nonparametric approach to marginal structural models for point treatments and a continuous or survival outcome. Biostatistics. 2016;18(1):32–47.
16. Daniels MJ, Wang C, Marcus BH. Fully Bayesian inference under ignorable missingness in the presence of auxiliary covariates. Biometrics. 2014;70(1):62–72.
17. Müller P, Erkanli A, West M. Bayesian curve fitting using multivariate normal mixtures. Biometrika. 1996;83(1):67–79.
18. Raghunathan TE, Lepkowski JM, Van Hoewyk J, Solenberger P. A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology. 2001;27(1):85–96.
19. Zhou T, Daniels MJ, Mueller P. A nonparametric Bayesian approach to dropout in longitudinal studies with auxiliary covariates. 2018. In press.
20. Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. New York, NY: CRC Press; 2008.
