Summary
In this paper, we carry out an in-depth theoretical investigation of inference for general regression models with missing response and covariate data. We assume throughout that the missing data are Missing at Random (MAR) or Missing Completely at Random (MCAR). Previous theoretical investigations in the literature have focused only on missing covariates or missing responses, but not both. Here, we consider theoretical properties of the estimates under three different estimation settings: complete case analysis (CC), a complete response analysis (CR) that uses only those subjects with completely observed responses, and the all case analysis (AC), which is based on all of the cases. Under each scenario, we derive general expressions for the likelihood and devise estimation schemes based on the EM algorithm. We carry out a theoretical investigation of the three estimation methods in the normal linear model, analytically characterize the loss of information for each method, and derive and compare the asymptotic variances of each method when the missing data are MAR or MCAR. In addition, we carry out a theoretical investigation of the bias of the CC method. A simulation study and a real dataset are given to illustrate the methodology.
1 Introduction
Missing data arise in nearly every type of application in the statistical sciences. Over the past 30 years, an enormous literature has developed on likelihood-based methods of estimation and inference for a wide variety of missing data problems, including missing covariate data in linear models, generalized linear models, generalized linear mixed models, and survival models, as well as missing response data in models for longitudinal data. Since the literature is too vast to list here, we refer the reader to three review articles that discuss various methods for handling missing data: Little (1992), Horton and Laird (1999), and Ibrahim, Chen, Lipsitz, and Herring (2005). There has also been some literature on likelihood-based methods for establishing identifiability and asymptotic properties of estimators in missing covariate problems, including Robins and Rotnitzky (1995), Lipsitz, Ibrahim, and Zhao (1999), Herring and Ibrahim (2001), and Chen, Ibrahim, and Shao (2004). There has also been some work on models for longitudinal data with nonignorable missing responses, including Baker and Laird (1988), Ibrahim, Chen, and Lipsitz (2001), and Tang, Little, and Raghunathan (2003), and on maximum likelihood estimation in the presence of ignorable or nonignorable missing response and/or covariate data in longitudinal models, including Stubbendick and Ibrahim (2003, 2006) and Chen and Ibrahim (2006). However, there has been almost no literature examining theoretical properties of estimators in the presence of both MAR responses and covariates in regression models. This type of missing data problem presents many new challenges in estimation and theory that do not arise in missing covariate problems or missing response problems alone.
We refer to a regression problem with missing covariates and responses as a “missing (x, y) problem” throughout. An important issue in a missing (x, y) problem is the contribution to the information matrix of the cases with missing responses alone, of the cases with missing covariates alone, and of the cases with both missing covariates and missing responses. In particular, we consider theoretical properties of the estimates under three different estimation settings: complete case analysis (CC), a complete response analysis (CR) that involves an analysis of those subjects with completely observed responses, and the all case analysis (AC), which is an analysis based on all of the cases. Under each scenario, we derive general expressions for the likelihood and devise estimation schemes based on the EM algorithm. We compare the three estimation methods in the normal linear model and characterize the loss of information for each method, as well as derive and compare the asymptotic variances for each method assuming the missing data are MAR. For the linear model, we show that the AC analysis has more information than the CR and CC analyses, in the sense that the Fisher information matrix for the AC analysis has a greater determinant and trace than the Fisher information matrices for the CR and CC analyses, and the CR analysis yields a Fisher information matrix with a greater determinant and trace than that of the CC analysis. Moreover, we show that the asymptotic variances of the estimates for the CC analysis are larger than those of the other two methods (CR or AC), while for some estimates the AC analysis yields no improvement over the corresponding estimates based on a CR analysis. We also carry out a theoretical investigation of bias for the CC method and analytically show that CC estimates under certain settings are biased.
The rest of this paper is organized as follows. In Section 2, we consider the basic data structure for a regression model with MAR response and/or covariate data. In Section 3, we consider the three analysis methods, CC, CR, and AC, and give the likelihood function corresponding to each method. Section 4 develops the heart of the theory and properties of the estimators for the three methods, with several results characterizing the behavior of the Fisher information matrix and asymptotic variances for each method in the normal linear model with missing (x, y). Section 5 examines bias issues for MAR response and covariate data. Section 6 presents a simulation study and a real dataset illustrating the theoretical results derived in Section 4. A brief discussion is given in Section 7. In Appendix A, we devise computational schemes based on the EM algorithm for obtaining the maximum likelihood estimates (MLE’s), deriving the E- and M-steps of the EM algorithm as well as the observed information matrix based on the observed data using Louis’s method for the missing (x, y) problem.
2 Model and Data Structure
2.1 Model
Suppose that {(xi, yi), i = 1, 2,…, n} are independent observations, where yi is the response variable, and xi = (xi1, … , xip)′ is a p × 1 random vector of covariates. We specify the joint distribution of (xi, yi) by specifying the conditional distribution of yi given xi, denoted [yi | xi], and the marginal distribution of xi, denoted [xi].
We let f(xi | α) denote the density of the marginal distribution [xi] for i = 1, 2, …, n, where α is the vector of model parameters. Assume that the distribution function for [yi | xi] is of the form
f(yi | x′iβ, ζ),   (2.1)
where β = (β1, β2,…, βp)′ denotes the p × 1 vector of regression coefficients, and ζ is a column vector of nuisance parameters. In (2.1), we assume that the distribution [yi | xi] depends on xi and β only through the linear predictor x′iβ. If an intercept is included in the model, xi and β are modified accordingly.
The generalized linear model (GLM) is a special case of (2.1). In the GLM, the conditional density of [yi|xi] is given by
f(yi | xi, β, τ) = exp{ [yiθi − b(θi)] / ai(τ) + c(yi, τ) },   (2.2)
where θi = θ(ηi) is the canonical parameter, ηi = x′iβ, and τ is a dispersion parameter. The functions b and c determine a particular family in the class, such as the binomial, normal, Poisson, etc. The functions ai(τ) are commonly of the form ai(τ) = τ/wi, where the wi’s are known weights. Thus, (2.1) reduces to the GLM with θi = θ(x′iβ) and ζ = τ.
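As a quick worked instance of this reduction (a standard exponential-family identity, added here for concreteness and not taken from the paper), consider the normal linear model with identity link:

```latex
% Normal linear model y_i | x_i ~ N(x_i' beta, sigma^2) written in the form (2.2):
%   f(y_i | x_i, beta, tau) = exp{ [y_i theta_i - b(theta_i)]/a_i(tau) + c(y_i, tau) }
\[
\theta_i = \eta_i = x_i'\beta, \qquad
b(\theta_i) = \tfrac{1}{2}\theta_i^2, \qquad
a_i(\tau) = \tau = \sigma^2, \qquad
c(y_i,\tau) = -\frac{y_i^2}{2\tau} - \frac{1}{2}\log(2\pi\tau),
\]
```

so that ζ = τ = σ² and [yi | xi] depends on (xi, β) only through x′iβ.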
2.2 Missing Data Structures
We consider a general setting in which yi and some components of xi may be missing. Let Mi = {1 ≤ l ≤ p : xil is missing} denote the set of indices of the missing covariates for the ith subject. We also let Ω = {1, 2,…, p} denote the whole index space for xi. Table 1 gives the general data structure and characterizations of the various missing data patterns in the missing (x, y) problem.
Table 1.
| Pattern (Block) | Response yi | Covariates xi |
|---|---|---|
| B1 | observed | completely observed (Mi = ∅) |
| B2 | observed | partially or completely missing (Mi ≠ ∅) |
| B3 | missing | completely observed (Mi = ∅) |
| B4 | missing | only partially missing (Mi ≠ ∅ and Mi ≠ Ω) |
| B5 | missing | completely missing (Mi = Ω) |
We denote each pattern above by Bj, j = 1,…, 5, and refer to Bj as the jth pattern or jth block. B1 denotes the portion of the data with both yi and xi completely observed. In B2, yi is observed while xi is at least partially missing; in B3, yi is missing and xi is completely observed; in B4, yi is missing and xi is partially observed; and in B5, both yi and xi are completely missing.
Based on the data structure given in Table 1, we write yi if the ith response is observed and yi,mis if the ith response is missing. Also, we write xi = (x′i,mis, x′i,obs)′, where xi,mis is the missing portion of xi and xi,obs is the observed portion of xi.
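To make the block structure concrete, the following minimal sketch (ours, not part of the original paper; the function name is ours) classifies a single observation into one of the five patterns from its missingness information:

```python
import numpy as np

def classify_block(y_missing, x):
    """Assign one observation to pattern B1-B5 from its missingness structure.

    y_missing -- True if the response y_i is missing
    x         -- covariate vector with np.nan marking missing components
    """
    M = np.flatnonzero(np.isnan(x))   # M_i, the set of missing covariate indices
    p = len(x)
    if not y_missing:
        return "B1" if M.size == 0 else "B2"
    if M.size == 0:
        return "B3"
    return "B4" if M.size < p else "B5"

print(classify_block(False, np.array([1.2, np.nan, 0.7])))  # -> "B2"
```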
3 Three Analysis Methods
In this section, we assume that the missing response and the missing covariates are missing at random (MAR). Under the MAR assumption, we only need to model [yi, xi]. We give the forms of the observed data log-likelihood functions under three analysis methods: complete case (CC) analysis, complete response (CR) analysis, and all case (AC) analysis.
3.1 Complete Case (CC) Analysis
Because standard techniques for regression models require full response and covariate information, one simple way to avoid the problem of missing data is to analyze only those subjects who are completely observed. This method is known as a complete case (CC) analysis.
Based on the data structure displayed in Table 1, the CC analysis uses the portion of data given in Block B1. Thus, the likelihood function under this method is given by
Lcc(θ) = ∏_{i∈B1} f(yi | xi, β, ζ) f(xi | α),   (3.1)

where θ = (β, ζ, α), and the log-likelihood function is given by

lcc(θ) = ∑_{i∈B1} { log f(yi | xi, β, ζ) + log f(xi | α) }.   (3.2)
3.2 Complete Response (CR) Analysis
The complete response (CR) analysis analyzes only those subjects whose responses are completely observed. Thus, in the CR analysis, we include only the portion of the data given in Blocks B1 and B2 of Table 1. The likelihood function under CR is given by
Lcr(θ) = ∏_{i∈B1} f(yi | xi, β, ζ) f(xi | α) ∏_{i∈B2} ∫ f(yi | xi, β, ζ) f(xi | α) dxi,mis,   (3.3)

and the log-likelihood function is given by

lcr(θ) = lcc(θ) + ∑_{i∈B2} log ∫ f(yi | xi, β, ζ) f(xi | α) dxi,mis.   (3.4)
3.3 All Case (AC) Analysis
The all case (AC) analysis uses all of the data. The likelihood function is given by

Lac(θ) = ∏_{i∈B1} f(yi | xi, β, ζ) f(xi | α) ∏_{i∈B2} ∫ f(yi | xi, β, ζ) f(xi | α) dxi,mis ∏_{i∈B3} ∫ f(yi | xi, β, ζ) f(xi | α) dyi ∏_{i∈B4} ∫∫ f(yi | xi, β, ζ) f(xi | α) dyi dxi,mis ∏_{i∈B5} ∫∫ f(yi | xi, β, ζ) f(xi | α) dyi dxi.   (3.5)

Since

∫ f(yi | xi, β, ζ) dyi = 1,

the likelihood function Lac(θ) reduces to

Lac(θ) = Lcr(θ) ∏_{i∈B3} f(xi | α) ∏_{i∈B4} ∫ f(xi | α) dxi,mis.   (3.6)
Thus, the portion of the data given in Block B5 of Table 1 does not make any contribution to the likelihood function even under the AC analysis under the MAR assumption.
Using (3.6), the log-likelihood function is given by

lac(θ) = lcr(θ) + ∑_{i∈B3} log f(xi | α) + ∑_{i∈B4} log ∫ f(xi | α) dxi,mis.   (3.7)
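To illustrate how the three observed-data log-likelihoods nest, here is a minimal sketch (ours), written for the simple normal model studied in Section 4.1 with unit variances and a single covariate, so that B2 corresponds to xi missing and B3 to yi missing; all function and variable names are ours:

```python
import numpy as np
from scipy.stats import norm

def logliks(theta, y1, x1, y2, x3):
    """Return (l_cc, l_cr, l_ac) for y_i = b0 + b1*x_i + N(0, 1), x_i ~ N(a, 1).

    (y1, x1) -- Block B1 (both observed); y2 -- Block B2 (x_i missing);
    x3       -- Block B3 (y_i missing). Block B5 contributes nothing under MAR.
    """
    b0, b1, a = theta
    l_cc = norm.logpdf(y1, b0 + b1 * x1, 1.0).sum() + norm.logpdf(x1, a, 1.0).sum()
    # B2 contribution: integrating x_i out gives y_i ~ N(b0 + b1*a, 1 + b1^2)
    l_cr = l_cc + norm.logpdf(y2, b0 + b1 * a, np.sqrt(1.0 + b1 ** 2)).sum()
    # B3 contribution: only the marginal distribution of x_i enters
    l_ac = l_cr + norm.logpdf(x3, a, 1.0).sum()
    return l_cc, l_cr, l_ac
```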
4 Theoretical Comparisons Between CC, CR, and AC for the Normal Linear Regression Model
In this section, we characterize the properties of the three analysis methods by examining the Fisher information matrix under each method, determining the information loss (gain) for each method, and comparing the asymptotic variances of each method. This comparison allows us to examine the efficiency of each method. To facilitate the comparisons in this section, we assume that the missing data are MCAR, since closed-form analytic results for the Fisher information are available in the missing (x, y) problem only under MCAR.
4.1 Simple Linear Regression Model with Missing Responses and Covariates
We first consider a simple normal regression model with a single covariate and unit variances. In this case, we have

yi = β0 + β1xi + εi,  εi ~ N(0, 1),  xi ~ N(α, 1),   (4.1)

with εi independent of xi.
Write θ = (β′, α)′. Let nj = #(Bj) be the cardinality of Bj for j = 1, 2, 3, and let n = n1 + n2 + n3. (With a single covariate, Block B4 is empty, and Block B5 makes no contribution to the likelihood.) For the CC analysis, we have

lcc(θ) = −(1/2) ∑_{i∈B1} [(yi − β0 − β1xi)² + (xi − α)²] + constant,

and the Fisher information matrix is given by

Icc = n1 [1, α, 0; α, 1 + α², 0; 0, 0, 1],   (4.2)

writing matrices row by row, with rows separated by semicolons.
For the CR analysis, we have

lcr(θ) = lcc(θ) − (n2/2) log(1 + β1²) − (1/(2(1 + β1²))) ∑_{i∈B2} (yi − β0 − β1α)² + constant,

since yi ~ N(β0 + β1α, 1 + β1²) marginally for i ∈ B2. After some messy algebra, we obtain the Fisher information matrix given by

Icr = Icc + (n2/(1 + β1²)) [1, α, β1; α, α², αβ1; β1, αβ1, β1²] + (2n2β1²/(1 + β1²)²) e2e2′,   (4.3)

where e2 = (0, 1, 0)′.
For the AC analysis, the log-likelihood function is given by

lac(θ) = lcr(θ) − (1/2) ∑_{i∈B3} (xi − α)² + constant.

The corresponding Fisher information matrix is given by

Iac = Icr + n3e3e3′,   (4.4)

where e3 = (0, 0, 1)′.
Using (4.2), we have

|Icc| = n1³,  tr(Icc) = n1(3 + α²),  Icc⁻¹ = (1/n1) [1 + α², −α, 0; −α, 1, 0; 0, 0, 1].   (4.5)
Observe that

Icr = A + uu′ + (2n2β1²/(1 + β1²)²) e2e2′,

where

u = (n2/(1 + β1²))^{1/2} (1, α, β1)′

and A = Icc. After some algebra, we have

|Icr| = |A| (1 + u′A⁻¹u) (1 + (2n2β1²/(1 + β1²)²) e2′(A + uu′)⁻¹e2).

Note that u′A⁻¹u = n2/n1 and e2′(A + uu′)⁻¹e2 = 1/n1. Thus, we have

|Icr| = n1 (n1 + n2) (n1 + 2n2β1²/(1 + β1²)²),   (4.6)
the trace of Icr is given by

tr(Icr) = n1(3 + α²) + (n2/(1 + β1²))(1 + α² + β1²) + 2n2β1²/(1 + β1²)²,   (4.7)

and the inverse matrix of Icr is given by

Icr⁻¹ = Icc⁻¹ − (n2/((1 + β1²) n1 (n1 + n2))) [1, 0, β1; 0, 0, 0; β1, 0, β1²] − (w/(n1(n1 + w))) [α², −α, 0; −α, 1, 0; 0, 0, 0],   (4.8)

where w = 2n2β1²/(1 + β1²)².
Similarly, we can write

Iac = A* + uu′ + (2n2β1²/(1 + β1²)²) e2e2′,

where A* = Icc + n3e3e3′. We have

|Iac| = ((n1 + 2n2β1²/(1 + β1²)²)/(1 + β1²)) [(n1 + n3)(n1 + n2 + n1β1²) + n1n2β1²],   (4.9)

the trace of Iac is given by

tr(Iac) = tr(Icr) + n3,   (4.10)

and the inverse matrix of Iac is given by

Iac⁻¹ = Icr⁻¹ − (n3/(1 + n3b*)) (Icr⁻¹e3)(Icr⁻¹e3)′,   (4.11)

where b* = e3′Icr⁻¹e3 = 1/n1 − n2β1²/((1 + β1²) n1 (n1 + n2)).
Using (4.5)–(4.11), we are led to the following results.
Result 4.1
Based on either the determinant or the trace of the Fisher information matrix, AC yields the most gain in information over both CR and CC, and CR gains more information than CC. Specifically, we have

|Iac| ≥ |Icr| ≥ |Icc|

and

tr(Iac) ≥ tr(Icr) ≥ tr(Icc).
The inverse of the Fisher information matrix gives the asymptotic variance–covariance matrix of the MLE’s under each analysis method. Now, let Var(β̂0,·), Var(β̂1,·), and Var(α̂·) denote the asymptotic variances under each of CC, CR, and AC, with the subscript indicating the method. Then, we have the following results.
Result 4.2
(i) CR leads to smaller asymptotic variances for all parameters than CC. Specifically, we have

Var(β̂0,cr) ≤ Var(β̂0,cc),  Var(β̂1,cr) ≤ Var(β̂1,cc),  and  Var(α̂cr) ≤ Var(α̂cc).
(ii) AC improves the asymptotic variances for β0 and α over CR, but not for β1. Specifically, we have

Var(β̂0,ac) ≤ Var(β̂0,cr),  Var(α̂ac) ≤ Var(α̂cr),  and  Var(β̂1,ac) = Var(β̂1,cr) = (n1 + 2n2β1²/(1 + β1²)²)⁻¹,

where b* is given in (4.11). The equality for β1 holds because e2′Icr⁻¹e3 = 0, so the rank-one update in (4.11) leaves the (β1, β1) entry of the inverse unchanged.
From Result 4.2, the additional information from Block B3 does improve the standard errors of β̂0 and α̂. Surprisingly, the information from Block B3 does not help improve the standard error of β̂1.
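Results 4.1 and 4.2 are easy to verify numerically. The sketch below (ours, not from the paper) rebuilds Icc, Icr, and Iac from (4.2)–(4.4) for arbitrary block sizes and checks the determinant and trace orderings as well as the equality Var(β̂1,ac) = Var(β̂1,cr):

```python
import numpy as np

def fisher_matrices(b1, alpha, n1, n2, n3):
    """I_cc, I_cr, I_ac for model (4.1); parameter order theta = (b0, b1, alpha)."""
    v = 1.0 + b1 ** 2
    I_cc = n1 * np.array([[1.0, alpha, 0.0],
                          [alpha, 1.0 + alpha ** 2, 0.0],
                          [0.0, 0.0, 1.0]])
    g = np.array([1.0, alpha, b1])        # d E(y_i) / d theta for a B2 case
    I_b2 = np.outer(g, g) / v
    I_b2[1, 1] += 2.0 * b1 ** 2 / v ** 2  # extra term from d Var(y_i) / d b1
    I_cr = I_cc + n2 * I_b2
    I_ac = I_cr.copy()
    I_ac[2, 2] += n3                      # B3 cases inform alpha only
    return I_cc, I_cr, I_ac

I_cc, I_cr, I_ac = fisher_matrices(b1=1.0, alpha=0.5, n1=50, n2=30, n3=40)
for name, M in [("CC", I_cc), ("CR", I_cr), ("AC", I_ac)]:
    print(name, np.linalg.det(M), np.trace(M))          # both increase CC -> CR -> AC
print(np.linalg.inv(I_cr)[1, 1], np.linalg.inv(I_ac)[1, 1])  # equal: Result 4.2 (ii)
```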
4.2 Multiple Linear Regression Model with Missing Responses and Covariates
To further examine the theoretical relationships among these three analysis methods, we consider a multiple normal linear regression model with p ≥ 2. For illustrative purposes, it suffices to consider two missing covariates. Specifically, we assume that xi,p−1 and xi,p have at least one missing value and that xi1, xi2, …, xi,p−2 are observed in all cases, since, as shown in Section 3.3, the cases in Block B5 make no contribution to the log-likelihood function lac(θ). For notational convenience, we let zi1 = (1, xi1,…,xi,p−2)′. We further assume
xi,p−1 | zi1 ~ N(α′1zi1, σ1²),

where α1 = (α10, α11,…,α1,p−2)′, and

xip | zi1, xi,p−1 ~ N(α′21zi1 + α22xi,p−1, σ2²),

where α21 = (α20, α21,…, α2,p−2)′. We assume all variances are known. For ease of exposition, we choose σ1² = σ2² = 1.
In this setting, we need to consider the cases from Blocks 1 to 4.
For Blocks 2 and 4, we assume B2 = B22 ∪ B23 ∪ B24 and B4 = B42 ∪ B43, where

Bj2 = {i ∈ Bj : only xi,p−1 is missing} and Bj3 = {i ∈ Bj : only xip is missing}

for j = 2, 4, and B24 = {i ∈ B2 : both xi,p−1 and xip are missing}. We further let njk = #(Bjk) be the cardinality of Bjk. Then we have n2 = n22 + n23 + n24 and n4 = n42 + n43.
Define , where j = 1, 22, 23, 24, 3, 42, 43, and . For the CC analysis, we have
and the Fisher information matrix is given by
where .
For the CR analysis, we have
where
and
After some messy algebra, we obtain the Fisher information matrix given by
where
.
For the AC analysis, the log-likelihood function is given by
The corresponding Fisher information matrix is given by
(4.12)
where .
For ease of exposition, we choose p = 2; in other words, the completely observed covariates include only the intercept.
Result 4.3
(i) When n23 = n24 = 0, CR leads to smaller asymptotic variances for β1 and β2 than CC. Specifically, we have
where .
(ii) When n23 = n24 = 0, AC leads to smaller asymptotic variances for β1 and β2 than CR. Specifically, we have
where .
Result 4.4
(i) When n22 = n24 = 0, CR leads to smaller asymptotic variances for β1 and β2 than CC. Specifically, we have
(ii) When n22 = n24 = 0, AC improves the asymptotic variance for β1 over CR, but not for β2. Specifically, we have
where .
Result 4.5
(i) When n22 = n23 = 0, CR leads to smaller asymptotic variances for β than CC. Specifically, we have
where .
(ii) When n22 = n23 = 0, AC leads to smaller asymptotic variances for β than CR. Specifically, we have
where .
Remark 1
The information in Block B43 does not improve the asymptotic variances of the estimates of β1 and β2 in any of the three situations considered here.
Remark 2
When n23 = n24 = 0 (Result 4.3), the differences of the asymptotic variances for β1 and β2, comparing CR with CC and AC with CR, do not depend on β2.
Remark 3
When n22 = n24 = 0 (Result 4.4), the differences of the asymptotic variances for β1 and β2, comparing CR with CC and AC with CR, do not depend on β1.
Remark 4
When n22 = n23 = 0 (Result 4.5), the ratios of the asymptotic variance improvements for β1 versus β2 are equal for CR versus CC and for AC versus CR.
Remark 5
When n23 = n24 = 0, the differences of the asymptotic variances of β1 are monotone decreasing functions of n1, for CR versus CC and for AC versus CR. Other monotonicity properties are listed in Table 2.
Table 2.

| Missing Pattern | Parameter | n1 | n22 | n23 | n24 |
|---|---|---|---|---|---|
| Result 4.3 | | NM | NM | | |
| n23 = n24 = 0 | | NM | NM | | |
| | | NM | NM | | |
| | | NM | NM | | |
| Result 4.4 | | ↘ | | NM | |
| n22 = n24 = 0 | | ↘ | | ↗ | |
| | | ↘ | | ↗ | |
| | | ↘ | | ↗ | |
| Result 4.5 | | ↘ | | | ↗ |
| n22 = n23 = 0 | | ↘ | | | ↗ |
| | | ↘ | | | ↗ |

(↘: monotone decreasing; ↗: monotone increasing; NM: not monotone.)
5 Analysis of Bias
In this section, we examine bias in the CC analysis when the missing data are MAR. For inference with only missing response data, Little and Rubin (2002, page 43) note without proof that, when the data are MAR, the CC estimates are unbiased when the missing data mechanism depends only on the covariates and not on the response. The estimates, however, are biased if the missing data mechanism depends on the response. We now examine this bias issue in the missing (x, y) problem, where the missingness is MAR.
Based on the data structure given in Table 1, define missing data indicators as

riy = 1 if yi is observed, and riy = 0 if yi is missing,

and

rij = 1 if xij is observed, and rij = 0 if xij is missing,

for j = 1, 2,…,p, and write ri = (riy, ri1, …, rip)′.
Let f(ri | ϕ, yi, xi) denote the distribution of ri, which may possibly depend on yi and xi, where ϕ is the vector of parameters in the model for ri. Under the MAR assumption, the following models for ri are possible:
f(ri | ϕ, yi, xi) = f(ri | ϕ),   (5.1)

f(ri | ϕ, yi, xi) = f(riy | ϕ, xi) ○ ∏_{j=1}^{p} f(rij | ϕ, xi),   (5.2)

where ○ denotes the direct product, or

f(ri | ϕ, yi, xi) = f(riy | ϕ, xi) ○ ∏_{j=1}^{p} f(rij | ϕ, yi, xi).   (5.3)
Note that the model specified by (5.1) defines MCAR, and other versions of MAR models can be considered as well.
As discussed in Section 3.1, the CC analysis avoids the missing data problem by analyzing only those subjects who are completely observed, that is, the portion of the data given in Block B1 of Table 1. With the missing data mechanism included, the likelihood function under this method is given by
Lcc(θ) = ∏_{i∈B1} f(yi | xi, β) f(xi | α) f(ri = 1 | ϕ, yi, xi),   (5.4)

where θ = (β, α, ϕ), and the log-likelihood function is given by

lcc(θ) = ∑_{i∈B1} { log f(yi | xi, β) + log f(xi | α) + log f(ri = 1 | ϕ, yi, xi) }.   (5.5)
Under CC, we make conditional inference given ri = 1, where 1 = (1, 1,…,1)′. More specifically, we need to consider the conditional distribution [yi, xi | ri = 1] in examining the bias of the MLE’s and in deriving the Fisher information matrix. We assume throughout that ϕ is distinct from β and α.
Under the model given by (5.1), we have

f(yi, xi | ri = 1) = f(yi | xi, β) f(xi | α).

Thus, under MCAR, the conditional distribution of (yi, xi) given ri = 1 is the same as the unconditional distribution, and hence the MLE’s of β and α are unbiased or asymptotically consistent under the usual regularity conditions.
Under MAR with the model given by (5.2) for ri, we have

f(yi, xi | ri = 1) = f(yi | xi, β) × f(xi | α) f(ri = 1 | ϕ, xi) / ∫ f(x | α) f(ri = 1 | ϕ, x) dx.   (5.6)

From (5.6), it is easy to see that the MLE of β is unbiased or asymptotically consistent, but the MLE of α may not be in this case.
Under MAR with the model given by (5.3) for ri, we obtain

f(yi, xi | ri = 1) = f(yi | xi, β) f(xi | α) f(ri = 1 | ϕ, yi, xi) / ∫∫ f(y | x, β) f(x | α) f(ri = 1 | ϕ, y, x) dy dx.   (5.7)

In this case, the MLE’s of both β and α are likely to be biased.
To obtain closed-form analytical results for (5.6) and (5.7), we consider the simple normal regression model given by (4.1). For notational simplicity, we assume that both yi and xi are observed for i = 1, 2,…,m. Let yobs = (y1, y2,…, ym)′, Xobs = (x̃1, x̃2,…, x̃m)′ with x̃i = (1, xi)′, and robs = (1, 1,…,1)′. Then, the MLE of β is given by β̂ = (X′obsXobs)⁻¹X′obsyobs, and the MLE of α is α̂ = x̄ = m⁻¹ ∑_{i=1}^{m} xi.
In (5.6), we assume that ri depends on xi alone, through f(ri = 1 | ϕ, xi). Then, (5.6) implies

E(yi | β, xi, ri = 1) = β0 + β1xi

and

f(xi | α, ϕ, ri = 1) = f(xi | α) f(ri = 1 | ϕ, xi) / ∫ f(x | α) f(ri = 1 | ϕ, x) dx.   (5.8)

Thus, we have

E(β̂ | Xobs, robs) = β,

which is unbiased. However,

E(α̂) = E(xi | ri = 1) ≠ α in general,

which may be biased. Also, an analytical derivation of the Fisher information matrix is not possible, as the conditional distribution of xi given ri = 1 involves an analytically intractable integral.
Under MAR given by (5.3), we assume a logistic regression model for rix that depends on the response, i.e.,

P(rix = 1 | ϕ, yi) = exp(ϕ0 + ϕ1yi) / [1 + exp(ϕ0 + ϕ1yi)].

From (5.7), we obtain a conditional density of yi given (xi, ri = 1) that no longer coincides with f(yi | xi, β). Thus, E[yi | β, α, ϕ, xi, ri = 1] ≠ β0 + β1xi. In this case, both β̂ and α̂ may be biased. Again, an analytical derivation of the Fisher information matrix is not possible.
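A small Monte Carlo check (ours, not from the paper; the mechanism intercept and slope are illustrative choices) reproduces both conclusions: when missingness depends only on xi, the CC estimates of β0 and β1 are essentially unbiased while α̂ is biased, and when it depends on yi, all three are biased:

```python
import numpy as np

rng = np.random.default_rng(1)
b0, b1, alpha, n, reps = 1.0, 1.0, 0.0, 2000, 500

def cc_estimates(depends_on_y):
    """Average CC estimates of (b0, b1, alpha) over `reps` replicates."""
    est = np.zeros(3)
    for _ in range(reps):
        x = rng.normal(alpha, 1.0, n)
        y = b0 + b1 * x + rng.normal(0.0, 1.0, n)
        driver = y if depends_on_y else x          # mechanism (5.3) vs (5.2)
        p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * driver)))
        keep = rng.random(n) < p_obs               # r_i = 1: subject fully observed
        X = np.column_stack([np.ones(keep.sum()), x[keep]])
        bh = np.linalg.lstsq(X, y[keep], rcond=None)[0]
        est += np.array([bh[0], bh[1], x[keep].mean()]) / reps
    return est

print("missingness depends on x:", cc_estimates(False))  # b0, b1 unbiased; alpha biased
print("missingness depends on y:", cc_estimates(True))   # b0, b1, alpha all biased
```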
6 Simulation Studies and a Real Data Example
In this section, we present two detailed simulation studies and a real data example, demonstrating the various properties of the CC, CR, and AC methodology for analyzing MCAR and MAR response and/or covariate data in linear regression and logistic regression. In particular, we study efficiency and bias in the estimates for the three methods for these two types of regression models.
6.1 Simulation Study I: Normal Linear Regression Model with MCAR Response and Covariate Data
We consider a multiple linear regression model with an intercept, one completely observed covariate, and two covariates subject to missingness, using 5,000 replicates with n = 500 subjects. The response model is yi ~ N(β0 + β1zi1 + β2xi1 + β3xi2, 1), where zi1 is simulated from Unif(0, 1), xi1 is simulated from N(α10 + α11zi1, 1), xi2 is simulated from N(α20 + α21zi1 + α22xi1, 1), and yi, xi1, and xi2 are missing for some subjects. In each simulation, the sizes of the missing data patterns, n1, n22, n23, n24, n3, n42, n43, and n5, were varied in order to evaluate the various properties of the CC, CR, and AC methods. To better study the differences of the asymptotic variances of the estimates of the regression coefficients using the three methods, we calculated the variances in two ways: by plugging the true parameter values and by plugging the maximum likelihood estimates into the Fisher information matrix.
Table 3 gives the simulation results for the linear regression model with the variances evaluated at the true parameter values. We note that the variance estimates decrease monotonically across the three methods CC, CR, and AC. In particular, Results 4.3 and 4.4 hold, and Remarks 1–3 hold, but not Remark 4. We note that Remark 1 is true only for the regression coefficients of the missing covariates, not for those of the completely observed covariates: the information in Block B43 does improve the asymptotic variances of the regression coefficients of zi0 = 1 and zi1 when the AC method is used.
Table 3.
| (n1, n22, n23, n24, n3, n42, n43, n5) | Para | CC (VCC) | CR (VCC−VCR) | AC (VCR−VAC) |
|---|---|---|---|---|
(50, 200, 0, 0, 100, 100, 50, 0) | β0 = 1.0 | 1.224 ×10−1 | 6.111 ×10−2 | 1.451 ×10−2 |
β1 = 1.0 | 2.895 ×10−1 | 1.420 ×10−1 | 3.481 ×10−2 | |
β2 = 2.0 | 1.000 ×10−1 | 6.419 ×10−2 | 1.117 ×10−3 | |
β3 = 3.0 | 2.000 ×10−2 | 1.336 ×10−2 | 8.718 ×10−6 | |
(50, 200, 0, 0, 100, 100, 50, 0) | β0 = 1.0 | 1.224 ×10−1 | 6.111 ×10−2 | 1.451 ×10−2 |
β1 = 1.0 | 2.895 ×10−1 | 1.420 ×10−1 | 3.481 ×10−2 | |
β2 = 2.0 | 1.000 ×10−1 | 6.419 ×10−2 | 1.117 ×10−3 | |
β3 = 2.5 | 2.000 ×10−2 | 1.336 ×10−2 | 8.718 ×10−6 | |
(50, 200, 0, 0, 100, 100, 20, 30) | β0 = 1.0 | 1.224 ×10−1 | 6.111 ×10−2 | 1.424 ×10−2 |
β1 = 1.0 | 2.895 ×10−1 | 1.420 ×10−1 | 3.412 ×10−2 | |
β2 = 2.0 | 1.000 ×10−1 | 6.419 ×10−2 | 1.117 ×10−3 | |
β3 = 3.0 | 2.000 ×10−2 | 1.336 ×10−2 | 8.718 ×10−6 | |
(50, 0, 200, 0, 100, 100, 50, 0) | β0 = 1.0 | 1.224 ×10−1 | 1.660 ×10−2 | 8.930 ×10−3 |
β1 = 1.0 | 2.895 ×10−1 | 3.001 ×10−2 | 2.380 ×10−2 | |
β2 = 2.0 | 1.000 ×10−1 | 3.509 ×10−2 | 1.749 ×10−3 | |
β3 = 3.0 | 2.000 ×10−2 | 8.372 ×10−3 | 0 | |
(50, 0, 200, 0, 100, 100, 50, 0) | β0 = 1.0 | 1.224 ×10−1 | 1.660 ×10−2 | 8.930 ×10−3 |
β1 = 1.0 | 2.895 ×10−1 | 3.001 ×10−2 | 2.380 ×10−2 | |
β2 = 1.5 | 1.000 ×10−1 | 3.509 ×10−2 | 1.749 ×10−3 | |
β3 = 3.0 | 2.000 ×10−2 | 8.372 ×10−3 | 0 | |
(50, 0, 200, 0, 100, 100, 20, 30) | β0 = 1.0 | 1.224 ×10−1 | 1.660 ×10−2 | 8.921 ×10−3 |
β1 = 1.0 | 2.895 ×10−1 | 3.001 ×10−2 | 2.378 ×10−2 | |
β2 = 2.0 | 1.000 ×10−1 | 3.509 ×10−2 | 1.749 ×10−3 | |
β3 = 3.0 | 2.000 ×10−2 | 8.372 ×10−3 | 0 | |
(50, 0, 0, 200, 100, 100, 50, 0) | β0 = 1.0 | 1.224 ×10−1 | 7.183 ×10−3 | 2.314 ×10−3 |
β1 = 1.0 | 2.895 ×10−1 | 1.057 ×10−2 | 4.814 ×10−2 | |
β2 = 2.0 | 1.000 ×10−1 | 1.439 ×10−2 | 8.327 ×10−3 | |
β3 = 3.0 | 2.000 ×10−2 | 5.173 ×10−3 | 1.090 ×10−3 | |
(50, 0, 0, 200, 100, 100, 50, 0) | β0 = 1.0 | 1.224 ×10−1 | 1.116 ×10−2 | 3.274 ×10−3 |
β1 = 1.0 | 2.895 ×10−1 | 1.476 ×10−2 | 9.621 ×10−2 | |
β2 = 2.0 | 1.000 ×10−1 | 2.191 ×10−2 | 1.112 ×10−3 | |
β3 = 2.0 | 2.000 ×10−2 | 7.754 ×10−3 | 1.264 ×10−3 | |
(50, 0, 0, 200, 100, 100, 20, 30) | β0 = 1.0 | 1.224 ×10−1 | 7.183 ×10−3 | 2.206 ×10−3 |
β1 = 1.0 | 2.895 ×10−1 | 1.057 ×10−2 | 4.519 ×10−3 | |
β2 = 2.0 | 1.000 ×10−1 | 1.439 ×10−2 | 8.327 ×10−3 | |
β3 = 3.0 | 2.000 ×10−2 | 5.173 ×10−3 | 1.090 ×10−3 |
Table 4 gives the simulation results for the linear regression model with the variances evaluated at the maximum likelihood estimates (MLE’s). The results show the gain in using the AC method over the CR method, and in using the CR method over the CC method. When n22 = 200 and n23 = n24 = 0, the gain in the asymptotic variance of β3 for AC relative to CR is small, and the difference in the empirical variances is slight.
Table 4.
| (n1, n22, n23, n24, n3, n42, n43, n5) | Para | CC: MLE | Sim:Vcc | Vcc | 95% CP | CR: MLE | Sim:Vcc−Sim:Vcr | Vcc−Vcr | 95% CP | AC: MLE | Sim:Vcr−Sim:Vac | Vcr−Vac | 95% CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(50, 200, 0, 0, 100, 100, 50, 0) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.2 | 0.994 | 6.922 ×10−2 | 6.214 ×10−2 | 97.4 | 0.992 | 1.448 ×10−2 | 1.490 ×10−2 | 96.8 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.2 | 0.995 | 1.500 ×10−1 | 1.442 ×10−1 | 97.4 | 0.994 | 4.252 ×10−2 | 3.566 ×10−2 | 97.5 | |
β2 = 2.0 | 2.003 | 0.112 | 0.100 | 97.0 | 1.987 | 7.352 ×10−2 | 6.392 ×10−2 | 96.7 | 1.981 | 1.383 ×10−3 | 1.101 ×10−3 | 96.6 | |
β3 = 3.0 | 3.000 | 0.022 | 0.020 | 97.1 | 3.006 | 1.483 ×10−2 | 1.332 ×10−2 | 97.7 | 3.009 | < 1.0 × 10−4 | < 1.0 × 10−4 | 98.0 | |
(50, 200, 0, 0, 100, 100, 50, 0) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.2 | 0.994 | 6.922 ×10−2 | 6.214 ×10−2 | 97.4 | 0.992 | 1.448 ×10−2 | 1.490 ×10−2 | 96.8 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.2 | 0.995 | 1.500 ×10−1 | 1.442 ×10−1 | 97.4 | 0.994 | 4.252 ×10−2 | 3.566 ×10−2 | 97.5 | |
β2 = 2.0 | 2.003 | 0.112 | 0.100 | 97.0 | 1.987 | 7.352 ×10−2 | 6.392 ×10−2 | 96.7 | 1.981 | 1.383 ×10−3 | 1.101 ×10−3 | 96.6 | |
β3 = 3.0 | 2.500 | 0.022 | 0.020 | 97.1 | 2.506 | 1.483 ×10−2 | 1.332 ×10−2 | 97.7 | 2.509 | < 1.0 × 10−4 | < 1.0 × 10−4 | 98.0 | |
(50, 200, 0, 0, 100, 100, 50, 0) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.2 | 0.994 | 6.922 ×10−2 | 6.214 ×10−2 | 97.4 | 0.992 | 1.405 ×10−2 | 1.461 ×10−2 | 96.8 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.2 | 0.995 | 1.500 ×10−1 | 1.442 ×10−1 | 97.4 | 0.994 | 4.117 ×10−2 | 3.495 ×10−2 | 97.4 | |
β2 = 2.0 | 2.003 | 0.112 | 0.100 | 97.0 | 1.987 | 7.352 ×10−2 | 6.392 ×10−2 | 96.7 | 1.981 | 1.383 ×10−3 | 1.100 ×10−3 | 96.6 | |
β3 = 3.0 | 3.000 | 0.022 | 0.020 | 97.1 | 3.006 | 1.483 ×10−2 | 1.332 ×10−2 | 97.7 | 3.009 | < 1.0 × 10−4 | < 1.0 × 10−4 | 98.0 | |
(50, 0, 200, 0, 100, 100, 50, 0) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.3 | 0.987 | 1.939 ×10−2 | 1.686 ×10−2 | 97.4 | 0.991 | 8.583 ×10−3 | 9.371 ×10−3 | 97.2 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.3 | 1.010 | 4.199 ×10−2 | 3.087 ×10−2 | 97.7 | 1.008 | 2.295 ×10−2 | 2.459 ×10−2 | 97.9 | |
β2 = 2.0 | 2.003 | 0.112 | 0.100 | 97.0 | 2.003 | 4.255 ×10−2 | 3.509 ×10−2 | 97.3 | 2.005 | 3.047 ×10−3 | 1.784 ×10−3 | 97.4 | |
β3 = 3.0 | 3.000 | 0.022 | 0.020 | 97.1 | 3.000 | 1.017 ×10−2 | 8.376 ×10−3 | 97.1 | 2.999 | < 1.0 × 10−4 | < 1.0 × 10−4 | 97.0 | |
(50, 0, 200, 0, 100, 100, 50, 0) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.3 | 0.987 | 1.939 ×10−2 | 1.686 ×10−2 | 97.4 | 0.991 | 8.583 ×10−3 | 9.371 ×10−3 | 97.2 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.3 | 1.010 | 4.199 ×10−2 | 3.087 ×10−2 | 97.7 | 1.008 | 2.295 ×10−2 | 2.459 ×10−2 | 97.9 | |
β2 = 1.5 | 1.503 | 0.112 | 0.100 | 97.0 | 1.503 | 4.255 ×10−2 | 3.509 ×10−2 | 97.3 | 1.503 | 3.047 ×10−3 | 1.784 ×10−3 | 97.4 | |
β3 = 3.0 | 3.000 | 0.022 | 0.020 | 97.1 | 3.000 | 1.017 ×10−2 | 8.376 ×10−3 | 97.1 | 2.999 | < 1.0 × 10−4 | < 1.0 × 10−4 | 97.0 | |
(50, 0, 200, 0, 100, 100, 20, 30) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.3 | 0.987 | 1.939 ×10−2 | 1.686 ×10−2 | 97.4 | 0.991 | 8.625 ×10−3 | 9.341 ×10−3 | 97.2 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.3 | 1.010 | 4.199 ×10−2 | 3.087 ×10−2 | 97.7 | 1.008 | 2.305 ×10−2 | 2.456 ×10−2 | 97.9 | |
β2 = 2.0 | 2.003 | 0.112 | 0.100 | 97.0 | 2.003 | 4.255 ×10−2 | 3.509 ×10−2 | 97.3 | 2.005 | 3.046 ×10−3 | 1.784 ×10−3 | 97.4 | |
β3 = 3.0 | 3.000 | 0.022 | 0.020 | 97.1 | 3.000 | 1.017 ×10−2 | 8.376 ×10−3 | 97.1 | 2.999 | < 1.0 × 10−4 | < 1.0 × 10−4 | 97.0 | |
(50, 0, 200, 0, 100, 100, 50, 0) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.2 | 0.983 | 2.849 ×10−3 | 6.160 ×10−3 | 96.9 | 0.987 | 3.209 ×10−3 | 3.779 ×10−3 | 96.9 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.2 | 1.005 | 5.615 ×10−3 | 8.373 ×10−3 | 97.4 | 1.008 | 8.295 ×10−3 | 8.444 ×10−3 | 97.5 | |
β2 = 2.0 | 2.003 | 0.112 | 0.100 | 97.0 | 1.998 | 2.241 ×10−4 | 1.448 ×10−2 | 95.9 | 2.001 | 1.159 ×10−4 | 8.268 ×10−3 | 95.1 | |
β3 = 3.0 | 3.000 | 0.022 | 0.020 | 97.1 | 3.003 | 1.849 ×10−4 | 5.171 ×10−3 | 94.9 | 3.001 | < 1.0 × 10−4 | 1.088 ×10−3 | 94.0 | |
(50, 0, 200, 0, 100, 100, 50, 0) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.2 | 0.983 | 5.208 ×10−3 | 1.032 ×10−2 | 96.6 | 0.988 | 5.290 ×10−3 | 4.636 ×10−3 | 96.8 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.2 | 1.004 | 9.803 ×10−3 | 1.305 ×10−2 | 97.4 | 1.007 | 1.362 ×10−2 | 1.395 ×10−2 | 97.5 | |
β2 = 2.0 | 2.003 | 0.112 | 0.100 | 97.0 | 1.996 | 5.067 ×10−4 | 2.205 ×10−2 | 95.3 | 2.000 | 2.442 ×10−4 | 1.101 ×10−2 | 93.6 | |
β3 = 2.0 | 2.000 | 0.022 | 0.020 | 97.1 | 2.004 | 2.720 ×10−4 | 7.745 ×10−3 | 93.1 | 2.001 | 1.257 ×10−4 | 1.264 ×10−3 | 91.9 | |
(50, 0, 200, 0, 100, 100, 50, 0) | β0 = 1.0 | 0.988 | 0.130 | 0.124 | 97.2 | 0.983 | 2.849 ×10−3 | 6.160 ×10−3 | 97.4 | 0.987 | 2.939 ×10−3 | 3.632 ×10−3 | 96.9 |
β1 = 1.0 | 1.011 | 0.305 | 0.292 | 97.2 | 1.005 | 5.615 ×10−3 | 8.373 ×10−3 | 97.7 | 1.008 | 7.312 ×10−3 | 8.101 ×10−3 | 97.6 | |
β2 = 2.0 | 2.003 | 0.112 | 0.100 | 97.0 | 1.998 | 2.241 ×10−4 | 1.448 ×10−2 | 97.3 | 2.001 | 9.505 ×10−5 | 8.266 ×10−3 | 95.1 | |
β3 = 3.0 | 3.000 | 0.022 | 0.020 | 97.1 | 3.003 | 1.849 ×10−4 | 5.171 ×10−3 | 97.1 | 3.001 | < 1.0 × 10−4 | 1.088 ×10−3 | 93.9 |
Sim:Vcc, Sim:Vcr, and Sim:Vac are the simulated variances for the CC, CR, and AC methods. Vcc, Vcr, and Vac are the variances for the CC, CR, and AC methods, computed using the formulas derived in Section 4 with the MLE’s plugged in. 95% CP is the 95% coverage probability.
6.2 Simulation Study II: Logistic Regression Model with MAR Response and Covariate Data
A simulation with 1,000 replicates was conducted to numerically compare the CC, CR, and AC methods in a logistic regression model. The estimates using the full data (FD), before the introduction of missingness, are also provided as a benchmark for the other methods. In each simulation, we generated 500 binary responses from a logistic regression model logit(P(yi = 1)) = β0 + β1zi1 + β2xi1, where zi1 was simulated from Unif(0, 1) and xi1 was simulated from a Bernoulli distribution with success probability modeled as logit(P(xi1 = 1)) = α0 + α1zi1. The covariate zi1 is completely observed for all subjects, and xi1 and the response yi are missing at random (MAR) for some subjects. The missing data mechanisms for yi and xi1 are logit(P(riy = 1)) = ϕ20 + ϕ21zi1 and logit(P(rix = 1)) = ϕ10 + ϕ11zi1 + ϕ12riyyi, where riy = 1 or rix = 1 if yi or xi1, respectively, is observed, and 0 otherwise. On average, 31.4% of the samples had a completely observed covariate and response, 34.8% had a missing covariate but observed response, 15.0% had an observed covariate but missing response, and 18.8% had a missing covariate and missing response.
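For concreteness, here is a sketch of this data-generating mechanism (our rendering of the description above; the ϕ values shown are hypothetical placeholders, since only the resulting average missingness fractions are reported):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
b0, b1, b2 = -1.8, 1.0, 2.0            # response model (true values from Table 5)
a0, a1 = -0.5, 2.0                      # covariate model
expit = lambda t: 1.0 / (1.0 + np.exp(-t))

z = rng.uniform(0.0, 1.0, n)
x = rng.binomial(1, expit(a0 + a1 * z))
y = rng.binomial(1, expit(b0 + b1 * z + b2 * x))

# MAR mechanisms (r = 1 means observed); phi values are illustrative only
phi20, phi21 = 0.5, -1.0                # logit P(r_y = 1) = phi20 + phi21 * z
r_y = rng.binomial(1, expit(phi20 + phi21 * z))
phi10, phi11, phi12 = 0.0, 1.0, -0.5    # x-mechanism may depend on r_y * y
r_x = rng.binomial(1, expit(phi10 + phi11 * z + phi12 * r_y * y))

y_obs = np.where(r_y == 1, y, np.nan)   # masked response
x_obs = np.where(r_x == 1, x, np.nan)   # masked covariate
```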
Table 5 gives the simulation results of the logistic regression model. The AC method provides estimates with higher precision (smaller standard error) and lower mean square error (MSE) than the CR method for all the parameters. Both the CR and AC methods are uniformly better than the CC method in terms of bias, simulated standard error and MSE.
Table 5.
| Method | | β0 = −1.8 | β1 = 1.0 | β2 = 2.0 | α0 = −0.5 | α1 = 2.0 |
|---|---|---|---|---|---|---|
CC | Bias | 0.224 | 0.483 | 0.059 | 0.104 | 0.273 |
SE | 0.450 | 0.723 | 0.408 | 0.351 | 0.672 | |
SSE | 0.466 | 0.720 | 0.419 | 0.351 | 0.680 | |
CP% | 99.5 | 99.6 | 97.8 | 98.8 | 98.4 | |
MSE | 0.484 | 1.268 | 0.355 | 0.257 | 1.000 | |
CR | Bias | −0.036 | −0.008 | 0.059 | −0.017 | 0.056 |
SE | 0.378 | 0.544 | 0.437 | 0.372 | 0.687 | |
SSE | 0.408 | 0.527 | 0.419 | 0.339 | 0.650 | |
CP% | 97.9 | 98.7 | 97.9 | 98.7 | 97.7 | |
MSE | 0.333 | 0.555 | 0.355 | 0.230 | 0.847 | |
AC | Bias | −0.035 | −0.007 | 0.055 | −0.019 | 0.051 |
SE | 0.377 | 0.528 | 0.417 | 0.262 | 0.539 | |
SSE | 0.402 | 0.516 | 0.415 | 0.250 | 0.533 | |
CP% | 97.7 | 98.3 | 97.6 | 98.5 | 97.7 | |
MSE | 0.324 | 0.533 | 0.347 | 0.126 | 0.571 | |
FD | Bias | −0.020 | 0.006 | 0.023 | −0.011 | 0.016 |
SE | 0.245 | 0.366 | 0.223 | 0.186 | 0.344 | |
SSE | 0.247 | 0.364 | 0.225 | 0.181 | 0.337 | |
CP% | 97.4 | 97.5 | 97.4 | 97.4 | 98.0 | |
MSE | 0.123 | 0.264 | 0.102 | 0.065 | 0.227 |
Bias is the difference between the average of the estimates and the true parameter value, SE is the mean of the standard errors calculated by Louis’s formula, SSE is the simulated standard error, CP is the coverage probability, and MSE = Bias² + SSE² is the mean square error.
6.3 Analysis of Small Cell Lung Cancer Data
We consider a real dataset to compare the three analysis methods in terms of bias and efficiency. The data come from a recent phase III clinical trial (Socinski et al., 2002) in non-small-cell lung cancer (NSCLC), which is the leading cause of cancer-related mortality. In the year 2001, among approximately 170,000 newly diagnosed patients, more than 90% died from NSCLC, and approximately 35% of all new cases were stage IIIB (malignant pleural effusion) or stage IV disease. A randomized, two-armed, multi-center trial was initiated in 1998 with the aim of determining the optimal duration of chemotherapy by comparing four cycles of therapy versus continuous therapy in advanced NSCLC. Patients were randomized to two treatment arms: four cycles of carboplatin at an area under the curve of 6 and paclitaxel 200 mg/m² every 21 days (arm A), or continuous treatment with carboplatin/paclitaxel until progression (arm B). At progression, all patients on both arms received second-line weekly paclitaxel at 80 mg/m². One of the primary endpoints was quality of life (QOL). There were n = 230 patients in this dataset. The response variable considered in this analysis is the QOL factg score. The covariates included in the model were treatment (trt, 0 = arm A, 1 = arm B), gender (0 = female, 1 = male), histology (hist, 0 = non-squamous, 1 = squamous), age at entry in years, highest grade toxicity recorded by cycle (apex, 0 if highest grade toxicity = 0 and 1 if highest grade toxicity > 0), and recovery status (recov, 0 if recovered and 1 otherwise). Of these six covariates, apex and recov had missing values, while trt, gender, hist, and age were completely observed for all cases. In this population, 63% of the patients were male, and the age at entry ranged from 32 to 82 with a mean of 62. The missing data fractions were 28% for apex, 54% for recov, and 35% for factg; overall, 74% of the cases had missing data on at least one of apex, recov, and factg.
We use a linear regression model for the response variable, factg:

factgi = β0 + β1trti + β2genderi + β3histi + β4agei + β5apexi + β6recovi + εi,  εi ~ N(0, σ²).
We consider two models for the missing covariates recov and apex as follows.
Model M1
Model M2
Table 6 shows the results for the CC, CR, and AC methods discussed in Section 4. We assume that the missing data are MAR, so that a missing data mechanism need not be considered in the estimation scheme for β. As shown in the table, the overall conclusions are the same for the CR and AC methods, as these two methods yield similar p-values for the various regression coefficients. However, the CR and AC methods yield more significant p-values than the CC analysis, especially for the age effect. Table 6 also shows that the estimates of β and the standard errors of the estimated regression coefficients are quite similar for models M1 and M2, indicating robustness of the estimates to the choice of covariate distribution. The EM algorithm was implemented for computing all maximum likelihood estimates. The convergence criterion was that the squared distance between the kth and (k + 10)th iterations be less than 10⁻⁷. The EM algorithm required 25 iterations to converge under both model M1 and model M2.
Table 6.
| | | Model M1 | | | Model M2 | | |
|---|---|---|---|---|---|---|---|
| Method | Effect | Estimate | SE | P-value | Estimate | SE | P-value |
CC | Intercept | 79.008 | 3.536 | < 0.001 | 79.008 | 3.536 | < 0.001 |
trt | −2.366 | 3.222 | 0.463 | −2.366 | 3.222 | 0.463 | |
gender | −3.033 | 3.431 | 0.377 | −3.033 | 3.431 | 0.377 | |
hist | 3.011 | 3.694 | 0.415 | 3.011 | 3.694 | 0.415 | |
age | 3.049 | 1.587 | 0.055 | 3.049 | 1.587 | 0.055 | |
apex | 4.825 | 5.004 | 0.335 | 4.825 | 5.004 | 0.335 | |
recov | −0.485 | 3.421 | 0.887 | −0.485 | 3.421 | 0.887 | |
| | σ² | 147.896 | 27.230 | < 0.001 | 147.896 | 27.230 | < 0.001 |
CR | Intercept | 81.565 | 2.912 | < 0.001 | 81.573 | 2.910 | < 0.001 |
trt | 0.743 | 2.543 | 0.770 | 0.730 | 2.540 | 0.774 | |
gender | −6.048 | 2.580 | 0.019 | −6.034 | 2.576 | 0.019 | |
hist | 2.003 | 3.076 | 0.515 | 2.007 | 3.076 | 0.514 | |
age | 3.640 | 1.244 | 0.003 | 3.641 | 1.244 | 0.003 | |
apex | 3.473 | 6.088 | 0.568 | 3.433 | 6.078 | 0.572 | |
recov | −4.676 | 4.319 | 0.279 | −4.700 | 4.317 | 0.276 | |
| | σ² | 210.197 | 25.511 | < 0.001 | 210.183 | 25.525 | < 0.001 |
AC | Intercept | 81.533 | 2.768 | < 0.001 | 81.535 | 2.762 | < 0.001 |
trt | 0.905 | 2.556 | 0.723 | 0.905 | 2.550 | 0.723 | |
gender | −5.778 | 2.516 | 0.021 | −5.777 | 2.508 | 0.021 | |
hist | 2.067 | 3.064 | 0.500 | 2.074 | 3.057 | 0.497 | |
age | 3.527 | 1.234 | 0.004 | 3.532 | 1.232 | 0.004 | |
apex | 3.227 | 6.168 | 0.601 | 3.199 | 6.154 | 0.603 | |
recov | −5.326 | 4.362 | 0.222 | −5.343 | 4.348 | 0.219 | |
| | σ² | 208.812 | 25.667 | < 0.001 | 208.436 | 25.573 | < 0.001 |
Although we have assumed that the data are MAR, so that a missing data mechanism need not be modeled, it is of some interest to determine the best fitting MAR missing data mechanism, so that we can at least (though somewhat ad hoc) assess whether the missing data are MAR or MCAR. Towards this goal, we posited several different MAR and MCAR missing data mechanisms and used the complete cases to fit these models as logistic regressions in SAS. We then computed the log-likelihood statistic to determine the best fitting missing data mechanism. We considered five missing data mechanisms: two MCAR and three MAR. Let ri,factg, ri,apex, and ri,recov denote the missing data indicators for factg, apex, and recov, respectively. To obtain the final log-likelihood statistic value, we added the contributions from the three binary regression models for ri,factg, ri,apex, and ri,recov. The two MCAR models are [ri,factg][ri,apex][ri,recov] (MCAR1) and [ri,factg|ri,apex, ri,recov][ri,apex][ri,recov|ri,apex] (MCAR2), where, for example, [ri,factg] denotes a logistic regression model with intercept only and [ri,factg|ri,apex, ri,recov] is a logistic regression model with intercept and covariates ri,apex and ri,recov. Let xi,obs = (trti, genderi, histi, agei). The three MAR models are [ri,factg|xi,obs, apexiri,apex, recoviri,recov][ri,apex|xi,obs][ri,recov|xi,obs, apexiri,apex] (MAR1), [ri,factg|xi,obs, ri,apex, ri,recov][ri,apex|xi,obs][ri,recov|xi,obs, ri,apex] (MAR2), and [ri,factg|trti, genderi, ri,apex, ri,recov][ri,apex|trti, genderi][ri,recov|trti, genderi, ri,apex] (MAR3). For the lung cancer data, the log-likelihood statistics, −2 log(likelihood), are 889.4, 873.5, 868.5, 856.6, and 860.5 under models MCAR1, MCAR2, MAR1, MAR2, and MAR3, respectively. These results show that the best fitting model is MAR2, suggesting that the missing data are missing at random.
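As a sketch of how this model comparison can be assembled outside SAS (ours; statsmodels is used in place of SAS, and all array names are hypothetical), each candidate mechanism is a set of logistic regressions whose −2 log-likelihood contributions are summed:

```python
import numpy as np
import statsmodels.api as sm

def neg2loglik(parts):
    """Sum -2 * log-likelihood over the logistic regressions of one mechanism.

    parts -- list of (r, X) pairs: a 0/1 missingness indicator and its design
             matrix (intercept column included).
    """
    return sum(-2.0 * sm.Logit(r, X).fit(disp=0).llf for r, X in parts)

# Example for MCAR1, where each indicator gets an intercept-only model;
# r_factg, r_apex, r_recov are assumed to be prepared 0/1 arrays of length n:
# const = np.ones((len(r_factg), 1))
# m2ll = neg2loglik([(r_factg, const), (r_apex, const), (r_recov, const)])
```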
7 Discussion
We have given several results regarding the bias and efficiency of estimates in missing (x, y) regression problems, and have shown that the AC analysis provides the most efficient and least biased estimates in the MAR setting. The results derived in Sections 4, 5, and 6 are new and important and shed light on the bias and efficiency of estimates in regression problems with MCAR or MAR responses and/or covariates. In Sections 4.1 and 4.2, the variances are assumed to be known. This assumption can be relaxed. The asymptotic variance–covariance matrix of the MLE’s under each analysis method for the simple linear regression model with unknown variances is derived in Appendix B. With known variances, as shown in Result 4.2, the AC analysis does not improve the efficiency of the MLE of β1 over the CR analysis. When the variances are unknown, it is interesting to see, from Appendix B, that the AC analysis improves the efficiency of the MLE of β1 over both the CC and CR analyses. Thus, the AC analysis becomes even more important in this case. However, the derivation of the asymptotic variance–covariance matrix of the MLE’s under each analysis method for the multiple linear regression model with unknown variances becomes very lengthy, and hence detailed derivations are omitted for brevity.
Finally, we mention that we have assumed throughout that the (xi, yi) are jointly iid. This is by far the most common approach in regression settings with missing covariate and/or response data. We note here, however, that since inference typically focuses on the parameters of [yi|xi], the yi’s conditional on the xi’s are not iid, but only independent. This development is still quite general since it covers settings such as the linear model and generalized linear models with MAR covariate and/or response data. Future work involves examination of the proposed methods for dependent responses, including dynamic linear models, models for longitudinal data, and generalized linear mixed models. Such theoretical investigations are currently being examined. The initial investigation undertaken here is the first of its kind, and should lead to fruitful results for other types of models.
Acknowledgements
The authors wish to thank the Editor, the Associate Editor, and two referees for helpful comments and suggestions which have improved the paper. Dr. Ibrahim’s and Dr. Chen’s research was partially supported by National Institutes of Health (NIH) grants GM 70335 and CA 74015.
Appendix A
Computational Development
We describe the model fitting and computational procedures for each of the analysis methods. For CC, the MLE of θ can be obtained using standard statistical software such as SAS. Here, we consider only the AC analysis, as the computation of the MLE’s under CR is similar to, and even easier than, that under AC.
We first consider the case where all xi,mis’s are categorical. In this case, we use the EM algorithm via the method of weights proposed by Ibrahim (1990). Let θ(t) = (β(t), ζ(t), α(t)) denote the value of θ at the tth iteration of the EM algorithm.
The E-step at the (t + 1)st iteration can be written as

Q(θ | θ(t)) = ∑_{i=1}^{n} Qi(θ | θ(t)),   (A.1)

where, writing xi(j) for the covariate vector with its missing components set to their jth possible value,

Qi(θ | θ(t)) = ∑_j wij,(t) { log f(yi | xi(j), β, ζ) + log f(xi(j) | α) }  for i ∈ B1 ∪ B2,   (A.2)

and

Qi(θ | θ(t)) = ∑_j wij,(t) log f(xi(j) | α)  for i ∈ B3 ∪ B4.   (A.3)

The inner sum extends over all of the possible values of the missing components of the covariate vector, with j indexing the distinct covariate patterns for subject i.

The weights, wij,(t), are the conditional probabilities corresponding to [xi,mis|xi,obs, yi, θ] or [xi,mis|xi,obs, α] and are given by

wij,(t) = f(yi | xi(j), β(t), ζ(t)) f(xi(j) | α(t)) / ∑_k f(yi | xi(k), β(t), ζ(t)) f(xi(k) | α(t))  for i ∈ B1 ∪ B2,

or

wij,(t) = f(xi(j) | α(t)) / ∑_k f(xi(k) | α(t))  for i ∈ B3 ∪ B4,

where the sums over k run over the possible covariate patterns for subject i; for i with xi completely observed, the weights are degenerate at the observed covariate vector.
The M-step at the (t + 1)st iteration proceeds as follows. We first compute

(β(t+1), ζ(t+1)) = arg max_{(β,ζ)} ∑_{i∈B1∪B2} ∑_j wij,(t) log f(yi | xi(j), β, ζ)

and

α(t+1) = arg max_α ∑_{i=1}^{n} ∑_j wij,(t) log f(xi(j) | α),

which are weighted complete-data maximizations.

When we use the saturated model for xi,mis, we have α = (α(j)), where α(j) denotes the probability of the jth covariate pattern. In this case, we update

α(j)(t+1) = n⁻¹ ∑_{i=1}^{n} wij,(t),

where, when xi is completely observed, wij,(t) = 1 if xi(j) = xi and wij,(t) = 0 if xi(j) ≠ xi.
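To fix ideas, here is a compact sketch (ours) of one EM iteration by the method of weights, for the special case of a single binary missing covariate in a normal linear model with error variance 1 and a Bernoulli(α) covariate model, restricted to subjects with yi observed (Blocks B1 and B2); all names are ours:

```python
import numpy as np
from scipy.stats import norm

def em_step(beta, alpha, y, x):
    """One EM iteration; x uses np.nan for missing values, y is fully observed."""
    b0, b1 = beta
    n = len(y)
    xs = np.array([0.0, 1.0])                 # support of the binary covariate
    # E-step: w_ij proportional to f(y_i | x = j) * f(x = j | alpha)
    fy = norm.pdf(y[:, None], b0 + b1 * xs[None, :], 1.0)
    fx = np.array([1.0 - alpha, alpha])[None, :]
    w = fy * fx
    w /= w.sum(axis=1, keepdims=True)
    obs = ~np.isnan(x)                        # degenerate weights if x observed
    w[obs] = 0.0
    w[np.where(obs)[0], x[obs].astype(int)] = 1.0
    # M-step: weighted least squares over "expanded" data (each subject appears
    # once with x = 0 and once with x = 1, weighted by w_i0 and w_i1)
    Xw = np.column_stack([np.ones(2 * n), np.repeat(xs, n)])
    yw = np.tile(y, 2)
    ww = w.T.ravel()                          # weights for (x = 0 rows, x = 1 rows)
    WX = Xw * ww[:, None]
    beta_new = np.linalg.solve(Xw.T @ WX, WX.T @ yw)
    alpha_new = w[:, 1].mean()                # expected fraction with x = 1
    return beta_new, alpha_new
```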
Let θ̂ denote the estimate of θ at EM convergence. We use Louis’s method (Louis, 1982) to compute the estimated observed information matrix of θ based on the observed data. Write the matrix of second derivatives of Q(θ|θ(t)) as

∂²Q(θ | θ(t)) / ∂θ∂θ′,

where γ = (β′, ζ′)′, with blocks

∑_i ∑_j wij,(t) ∂² log f(yi | xi(j), γ) / ∂γ∂γ′

and

∑_i ∑_j wij,(t) ∂² log f(xi(j) | α) / ∂α∂α′.

Write the gradient vector of Qi(θ|θ(t)) for the ith observation as

Q̇i(θ | θ(t)) = ∑_j wij,(t) Si(θ | xi(j), yi).

In addition, write the complete data score vector Si(θ|xi, yi) as

Si(θ | xi, yi) = ∂{ log f(yi | xi, γ) + log f(xi | α) } / ∂θ.

Then, the estimated observed information matrix of θ̂ is given by

I(θ̂) = −∂²Q(θ̂ | θ̂)/∂θ∂θ′ − ∑_{i=1}^{n} { ∑_j wij Si(θ̂ | xi(j), yi) Si(θ̂ | xi(j), yi)′ − Q̇i(θ̂ | θ̂) Q̇i(θ̂ | θ̂)′ },   (A.4)
where the weights, wij,(t), are computed at EM convergence. Thus, the estimate of the asymptotic covariance matrix of θ̂ is [I (θ̂)]−1.
When the missing covariates are continuous, or mixed continuous and categorical, a Monte Carlo EM (MCEM) algorithm is required. The implementation of MCEM is similar to the EM algorithm for categorical missing covariates, and is developed in detail in Ibrahim, Lipsitz and Chen (1999) and Ibrahim, Chen, and Lipsitz (1999). Specifically, we replace the weighted average in (A.2), (A.3), and (A.4) by a Monte Carlo average. For example, in the E-step, for missing covariates in Block B2, we take an MCMC sample of size m(t), {xi,mis(1), …, xi,mis(m(t))}, from [xi,mis | xi,obs, yi, θ(t)]. Then, we compute

Qi(θ | θ(t)) ≈ m(t)⁻¹ ∑_{k=1}^{m(t)} { log f(yi | xi,obs, xi,mis(k), β, ζ) + log f(xi,obs, xi,mis(k) | α) }.

We then take m(t+1) = m(t) + Δm, where Δm > 0. In this way, the MCEM algorithm requires much less computational time, as a large m(t) is not needed in the early iterations of the algorithm.
Appendix B
Simple Linear Regression Model with Unknown Variances
We consider a simple normal regression model with a single covariate and unknown variances here. In this case, we have

yi = β0 + β1xi + εi,  εi ~ N(0, σ²),  xi ~ N(α, τ²).
Write θ = (β′, σ2, α, τ2)′. Let nj = #(Bj) be the cardinality of Bj for j = 1, 2, 3 and n = n1 + n2 + n3. For the CC analysis, we have
and the Fisher information matrix is given by
For the CR analysis, we have
where . After some messy algebra, we obtain the Fisher information matrix given by
For the AC analysis, the log-likelihood function is given by
The corresponding Fisher information matrix is given by
Based on either the determinant or the trace of the Fisher information matrix, AC yields the most gain in information over both CR and CC, and CR gains more information than CC. Specifically, we have
and
CR leads to smaller asymptotic variances for all parameters than CC. Specifically, we have
where .
In addition, AC improves the asymptotic variances over CR. Specifically, we have
where .
References
- Baker SG, Laird NM. Regression Analysis for Categorical Variables with Outcome Subject to Nonignorable Nonresponse. Journal of the American Statistical Association. 1988;83:62–69.
- Chen Q, Ibrahim JG. Semiparametric Models for Missing Covariate and Response Data in Regression Models. Biometrics. 2006;62:177–184.
- Chen M-H, Ibrahim JG, Shao Q-M. Propriety of the Posterior Distribution and Existence of the Maximum Likelihood Estimator for Regression Models with Covariates Missing at Random. Journal of the American Statistical Association. 2004;99:421–438.
- Herring AH, Ibrahim JG. Likelihood-based Methods for Missing Covariates in the Cox Proportional Hazards Model. Journal of the American Statistical Association. 2001;96:292–302.
- Horton NJ, Laird NM. Maximum Likelihood Analysis of Generalized Linear Models with Missing Covariates. Statistical Methods in Medical Research. 1999;8:37–50.
- Ibrahim JG. Incomplete Data in Generalized Linear Models. Journal of the American Statistical Association. 1990;85:765–769.
- Ibrahim JG, Chen M-H, Lipsitz SR. Monte Carlo EM for Missing Covariates in Parametric Regression Models. Biometrics. 1999;55:591–596.
- Ibrahim JG, Chen M-H, Lipsitz SR. Missing Responses in Generalized Linear Mixed Models When the Missing Data Mechanism Is Nonignorable. Biometrika. 2001;88:551–564.
- Ibrahim JG, Chen M-H, Lipsitz SR, Herring AH. Missing Data Methods for Generalized Linear Models: A Comparative Review. Journal of the American Statistical Association. 2005;100:332–346.
- Ibrahim JG, Lipsitz SR, Chen M-H. Missing Covariates in Generalized Linear Models When the Missing Data Mechanism is Nonignorable. Journal of the Royal Statistical Society, Series B. 1999;61:173–190.
- Lipsitz SR, Ibrahim JG, Zhao LP. A New Weighted Estimating Equation for Missing Covariate Data with Properties Similar to Maximum Likelihood. Journal of the American Statistical Association. 1999;94:1147–1160.
- Little RJA. Regression with Missing X’s: A Review. Journal of the American Statistical Association. 1992;87:1227–1237.
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd Edition. New York: John Wiley & Sons; 2002.
- Louis TA. Finding the Observed Information Matrix When Using the EM Algorithm. Journal of the Royal Statistical Society, Series B. 1982;44:226–233.
- Robins JM, Rotnitzky A. Semiparametric Efficiency in Multivariate Regression Models with Missing Data. Journal of the American Statistical Association. 1995;90:122–129.
- Socinski MA, Schell MJ, Peterman A, Bakri K, Yates S, Gitten R, Unger P, Lee J, Lee JH, Tynan M, Moore M, Kies M. Phase III Trial Comparing a Defined Duration of Therapy Versus Continuous Therapy Followed by Second-Line Therapy in Advanced-Stage IIIB/IV Non-Small-Cell Lung Cancer. Journal of Clinical Oncology. 2002;20:1335–1343.
- Stubbendick AL, Ibrahim JG. Maximum Likelihood Methods for Nonignorable Responses and Covariates in Random Effects Models. Biometrics. 2003;59:1140–1150.
- Stubbendick AL, Ibrahim JG. Likelihood-based Inference with Nonignorable Missing Responses and Covariates in Models for Discrete Longitudinal Data. Statistica Sinica. 2006;16:1143–1167.
- Tang G, Little RJA, Raghunathan TE. Analysis of Multivariate Missing Data with Nonignorable Nonresponse. Biometrika. 2003;90:747–764.