Bayesian Case Influence Measures for Statistical Models with Missing Data

Hongtu Zhu; Joseph G Ibrahim; Hyunsoon Cho; Niansheng Tang

doi:10.1198/jcgs.2011.10139

J Comput Graph Stat. Author manuscript; available in PMC: 2013 Feb 6.

Published in final edited form as: J Comput Graph Stat. 2010 Aug 1;21(1):253–271. doi: 10.1198/jcgs.2011.10139

PMCID: PMC3565846 NIHMSID: NIHMS265375 PMID: 23399928

Abstract

We examine three Bayesian case influence measures, the ϕ-divergence, Cook's posterior mode distance, and Cook's posterior mean distance, for identifying a set of influential observations for a variety of statistical models with missing data, including models for longitudinal data and latent variable models in the absence/presence of missing data. Since it can be computationally prohibitive to compute these Bayesian case influence measures in models with missing data, we derive simple first-order approximations to the three measures by using the Laplace approximation formula and examine the applications of these approximations to the identification of influential sets. All of the computations for the first-order approximations can be easily done using Markov chain Monte Carlo samples from the posterior distribution based on the full data. Simulated data and an AIDS dataset are analyzed to illustrate the methodology.

Keywords: Case influence measures, Cook distance, First-order approximation, ϕ-divergence, Markov chain Monte Carlo

1 Introduction

One of the main goals of any statistical analysis is to assess model assumptions and model fit. Towards this goal, it is critical to assess the influence of individual cases (or generally, a set of observations) on an analysis and to identify influential observations (or sets of observations) and/or outliers (Cook, 1977; Cook and Weisberg, 1982; McCulloch, 1989; Geisser, 1975, 1993). In Bayesian analysis, considerable research has been devoted to developing single case influence measures for various specific statistical models including generalized linear models, time series models, and survival models (Johnson and Geisser, 1983; Johnson, 1985; Pettit, 1986; Kass et al., 1989; Carlin and Polson, 1991; Gelfand et al., 1992; Geisser, 1993; Blyth, 1994; Peng and Dey, 1995; Christensen, 1997; Bradlow and Zaslavsky, 1997). The influence of an individual observation or set of observations is often assessed by deleting the observation (or set of observations) and then comparing the posterior (or predictive) distribution based on the full data with the posterior (or predictive) distribution based on the data with the observation (or observations) deleted.

There are four major types of Bayesian case influence measures. These are posterior probabilities of outlying sets, posterior outlier statistics, predictive diagnostics, and posterior diagnostics for identifying outliers and influential points (Zhu et al., 2010). Computing posterior probabilities of outlying sets is conceptually simple (Box and Tiao, 1968), but it is computationally difficult to implement in most models with missing data. So far, this approach is limited to several simple regression models, such as the classical linear regression model with conjugate priors (Abraham and Box, 1979). The posterior outlier statistic is based on using the posterior distribution of an outlier statistic, such as the raw residual, to define outliers and calculate the posterior probability that an observation is an outlier. This method is computationally simple and has been further extended to generalized linear models, survival models, latent variable models, state space models, and many others (Chaloner, 1991; Albert and Chib, 1993; Lee, 2007). Predictive diagnostics assess the discordance of a set of observations based on their predictive distribution (Gelfand et al., 1992; Geisser, 1993; Gelfand and Dey, 1994). Predictive diagnostics are also conceptually simple, but computing them can be difficult for regression models with missing data. In contrast to predictive diagnostics, posterior diagnostics compare the posterior distributions of the parameters given the full data and the reduced data (Csiszár, 1967; Weiss and Cook, 1992).

Despite the extensive literature on Bayesian diagnostics for various types of models, very little has been done on systematically examining Bayesian case influence measures using case deletion, namely the ϕ-divergence, Cook's posterior mode distance and Cook's posterior mean distance, in statistical models for both dependent and independent data in the presence of missing data. We refer the reader to a review of Bayesian diagnostics in Zhu et al. (2010) and Peng and Dey (1995). Computationally, as shown later, it can be very difficult to directly compute such Bayesian case influence measures for many complex models with missing data (Molenberghs and Kenward, 2007; Molenberghs and Verbeke, 2005; Lee, 2007; Ibrahim et al., 2005; Daniels and Hogan, 2008; Little, 1992; Little and Rubin, 2002; Ibrahim and Molenberghs, 2009; Skrondal and Rabe-Hesketh, 2004). For example, in the real data analysis presented in Section 3.2, we present a Bayesian diagnostic analysis for a complex Bayesian structural equations model with nonignorable missing data, for which it is infeasible to compute the exact values of these Bayesian case influence measures. This setting thus motivates the derivation of computationally feasible approximations to these Bayesian case influence measures.

The aims of this paper are to systematically examine the above-mentioned Bayesian case influence measures based on case deletion, to derive their first-order approximations, and to evaluate their roles in detecting a set of influential observations for a variety of regression models with missing data. By using a Laplace approximation (Kass et al., 1990; Tierney et al., 1989), we show that under some mild conditions, the first-order approximations hold for a large class of statistical models for both dependent and independent data within the Bayesian framework. We extensively examine the accuracy of these first-order approximations for the three Bayesian case influence measures using both theoretical results and simulation studies. Specifically, we show that the first-order approximations are quite accurate and that all of the computations they require can be easily carried out using Markov chain Monte Carlo (MCMC) samples from the posterior distribution based on the full data.

The rest of this paper is organized as follows. In Section 2, we introduce the three Bayesian case influence measures, propose computational formulas for them, and derive two first-order approximations to them. In Section 3, we illustrate the proposed methodology for latent variable models with missing data. We conclude the paper with a discussion in Section 4.

2 Methods

2.1 Bayesian Case Influence Measures

Let p(Y∣θ) be the probability function for a random vector $Y^{T} = (Y_{1}^{T}, \dots, Y_{n}^{T})$, parameterized by an unknown parameter vector $θ = (θ_{1}, \dots, θ_{p})^{T}$ in an open subset Θ of $R^{p}$. The dimension of $Y_{i} = (y_{i1}, \dots, y_{i m_{i}})^{T}$, denoted by m_i, can vary across i. For example, in longitudinal studies and mixed models, m_i is the number of observations in the ith cluster, which may vary substantially across clusters. Let p(θ) be the prior distribution of θ. The posterior distribution for the full data Y is given by p(θ∣Y) ∝ p(Y∣θ)p(θ).

We are interested in assessing the influence of deleting a set of observations, denoted by S, on posterior inferences regarding θ. Let $N = \sum_{i = 1}^{n} m_{i}$ and N_S be, respectively, the total number of observations and the number of observations in the set S. A subscript ‘[S]’ denotes the relevant quantity with all observations in S deleted. For instance, if S = {i}, then Y_[S] is the observed data with all of Y_i deleted, whereas if S = {i₁, i₂}, then Y_[S] is the observed data with Y_i₁ and Y_i₂ deleted. More generally, we may set S = {i₁, …, i_k} or S = {(i₁, j₁), …, (i_k, j_k)} to allow for more complicated case deletion schemes. Let Y_S denote the subsample of Y consisting of all the observations in S, and let Y_[S] denote the subsample of Y with all observations in Y_S deleted. The posterior distribution based on the reduced data Y_[S] is given by p(θ∣Y_[S]) ∝ p(Y_[S]∣θ)p(θ).

Now, we examine three types of Bayesian case influence measures based on case deletion. The first type is the ϕ-influence of Y_[S], defined by

D_{ϕ} (S) = \int ϕ (R_{[S]} (θ)) p (θ ∣ Y) d θ,

(1)

where R_[S](θ) = p(θ∣Y_[S])/p(θ∣Y) and ϕ(·) is a convex function with ϕ(1) = 0 (Weiss and Cook, 1992; Weiss, 1996). D_ϕ(S) directly measures the distance (discrepancy) between the two posterior distributions p(θ∣Y_[S]) and p(θ∣Y) (Csiszár, 1967; Weiss and Cook, 1992), and a large value of D_ϕ(S) corresponds to a set of influential observations. Various forms of ϕ(·) have been widely considered in the literature (Kass et al., 1989; Weiss and Cook, 1992; Blyth, 1994; Peng and Dey, 1995; Weiss, 1996). For instance, ϕ(·) can be chosen to be ϕ_α(u), defined by $ϕ_{α}(u) = 4\{1 - u^{(1 + α)/2}\}/(1 - α^{2})$ for α ≠ ±1, ϕ₁(u) = u log(u) for α = 1, and ϕ₋₁(u) = −log(u) for α = −1. In particular, ϕ₁(·) and ϕ₋₁(·) lead to the Kullback-Leibler divergence (K-L divergence); moreover, ϕ(u) = ϕ₁(u) + ϕ₋₁(u) leads to the symmetric K-L divergence. The L₁-distance and the χ²-divergence correspond to ϕ(u) = 0.5|u − 1| and ϕ(u) = (u − 1)², respectively (Weiss, 1996).
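As a quick numerical check on the ϕ_α family, the short sketch below (NumPy only; the function names are illustrative, not from the paper) implements ϕ_α(u), the χ²-divergence, and the symmetric K-L divergence, and estimates the curvature of each at u = 1 by central finite differences. That curvature, written ϕ̈(1) later in this section, is what scales the first-order approximations.

```python
import numpy as np

def phi_alpha(u, alpha):
    """phi_alpha(u) as defined in the text, for u > 0."""
    u = np.asarray(u, dtype=float)
    if np.isclose(alpha, 1.0):
        return u * np.log(u)                  # alpha = 1: u log u
    if np.isclose(alpha, -1.0):
        return -np.log(u)                     # alpha = -1: -log u
    return 4.0 * (1.0 - u ** ((1.0 + alpha) / 2.0)) / (1.0 - alpha ** 2)

def phi_chi2(u):
    """chi^2-divergence generator (u - 1)^2."""
    return (np.asarray(u, dtype=float) - 1.0) ** 2

def phi_sym_kl(u):
    """Symmetric K-L generator phi_1(u) + phi_{-1}(u)."""
    return phi_alpha(u, 1.0) + phi_alpha(u, -1.0)

def second_derivative_at_one(phi, h=1e-4):
    """Central finite-difference estimate of the second derivative of phi at u = 1."""
    return float((phi(1.0 + h) - 2.0 * phi(1.0) + phi(1.0 - h)) / h ** 2)

for a in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(a, round(second_derivative_at_one(lambda u: phi_alpha(u, a)), 6))
print("chi2", round(second_derivative_at_one(phi_chi2), 6))
print("sym K-L", round(second_derivative_at_one(phi_sym_kl), 6))
```

Every member of the ϕ_α family should give a value close to 1, while the χ² and symmetric K-L choices give values close to 2.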

The second Bayesian influence measure assesses the discrepancy between the posterior modes of θ with and without the observations in S (Cook and Weisberg, 1982). We call this measure Cook's posterior mode distance. Specifically, we define the posterior modes of θ for the full sample Y and the subsample Y_[S] as θ̂ = argmax_θ log p(θ∣Y) and θ̂_[S] = argmax_θ log p(θ∣Y_[S]), respectively. Then Cook's posterior mode distance for comparing Y and Y_[S], denoted by CP(S), is defined as follows:

CP (S) = {({\hat{θ}}_{[S]} - \hat{θ})}^{T} G_{θ} ({\hat{θ}}_{[S]} - \hat{θ}),

(2)

where G_θ is chosen to be $- \partial_{θ}^{2} log p (θ ∣ Y) = - \partial_{θ}^{2} log p (Y ∣ θ) - \partial_{θ}^{2} log p (θ)$ evaluated at θ̂, where $\partial_{θ}^{2}$ represents the second-order derivative with respect to θ. If we consider a uniform improper prior for θ, then CP(S) reduces to the well-known Cook's distance for deleting a set of observations (Cook and Weisberg, 1982). A large value of CP(S) implies more influence of the set S on the posterior mode.

The third type of Bayesian influence measure assesses the distance between the posterior means of θ with and without the observations in S. We define the posterior means of θ for the full sample Y and the subsample Y_[S] as θ̃ = ∫ θ · p(θ∣Y)dθ and θ̃_[S] = ∫ θ · p(θ∣Y_[S])dθ, respectively. Cook's posterior mean distance for deleting the observations in the set S, denoted by CM(S), is then defined as follows:

CM (S) = {({\tilde{θ}}_{[S]} - \tilde{θ})}^{T} W_{θ} ({\tilde{θ}}_{[S]} - \tilde{θ}),

(3)

where W_θ is chosen to be the inverse of the full-data posterior covariance matrix of θ. A large value of CM(S) corresponds to an influential set S regarding the posterior mean.

Although all three Bayesian case influence measures assess the influence of a set of observations, they differ conceptually. D_ϕ(S) quantifies the effect of deleting a set of observations on the entire posterior distribution, whereas CP(S) and CM(S) quantify its effect on the posterior mode and the posterior mean of θ, respectively. Because D_ϕ(S) measures the overall difference between p(θ∣Y) and p(θ∣Y_[S]), which may involve changes in shape, mode, mean, and so on, it can be more sensitive than CP(S) and CM(S) to changes in the posterior distribution other than shifts in its mean or mode. However, CP(S) and CM(S) may be more sensitive than D_ϕ(S) to a change in the posterior mean or posterior mode.

2.2 Computational Formula and its Difficulties

When p(Y∣θ) is relatively easy to compute, all three Bayesian case influence measures can be computed using only MCMC samples from the full-data posterior distribution p(θ∣Y). We define p_S(θ), the ratio of the likelihoods with and without the observations in S, as

p_{S} (θ) = \frac{p (Y ∣ θ)}{p (Y_{[S]} ∣ θ)} = p (Y_{S} ∣ Y_{[S]}, θ),

(4)

which is the conditional density of Y_S given Y_[S] and θ. Then we have

p (θ ∣ Y_{[S]}) = {[p_{S} (θ)]}^{- 1} p (Y ∣ θ) p (θ) / \int {[p_{S} (θ)]}^{- 1} p (Y ∣ θ) p (θ) d θ .

Thus, following Weiss (1996), the computational formula for D_ϕ(S) can be obtained as

D_{ϕ} (S) = E_{θ ∣ Y} [ϕ (\frac{{[p_{S} (θ)]}^{- 1}}{E_{θ ∣ Y} {{[p_{S} (θ)]}^{- 1}}})],

(5)

where $E_{θ ∣ Y}$ denotes the expectation taken with respect to the posterior distribution p(θ∣Y). For example, for the K-L divergence with ϕ(u) = −log(u), the computational formula is $D_{ϕ} (S) = log E_{θ ∣ Y} \{[p_{S} (θ)]^{- 1}\} + E_{θ ∣ Y} \{log [p_{S} (θ)]\}$. It is well recognized that the accuracy of the MCMC estimate of D_ϕ(S) depends heavily on the variability of p_S(θ) (Epifani et al., 2008; Peruggia, 1997).
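A minimal sketch of the Monte Carlo version of (5), assuming that log p_S(θ) can be evaluated at each MCMC draw θ^(j) from p(θ∣Y) (in models with missing data, this is exactly the hard part). The weights [p_S(θ)]⁻¹ are normalized on the log scale to avoid overflow, and `kl_mc` is the special case ϕ(u) = −log(u) given above. The function names are hypothetical.

```python
import numpy as np

def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))

def phi_divergence_mc(log_ps, phi):
    """Estimate (5) from log p_S(theta^(j)) at draws theta^(j) ~ p(theta | Y)."""
    neg = -np.asarray(log_ps, dtype=float)      # log [p_S(theta)]^{-1}
    log_ratio = neg - log_mean_exp(neg)         # log R_[S](theta^(j))
    return float(np.mean(phi(np.exp(log_ratio))))

def kl_mc(log_ps):
    """phi(u) = -log(u): log E{[p_S]^{-1}} + E{log p_S}."""
    log_ps = np.asarray(log_ps, dtype=float)
    return float(log_mean_exp(-log_ps) + np.mean(log_ps))
```

For an assumed toy model y_i ∼ N(θ, 1) with a flat prior, log p_S(θ) for S = {1} is the normal log density of y_1, and `kl_mc` can be compared with the closed-form K-L divergence between the normal posteriors N(ȳ, 1/n) and N(ȳ_[S], 1/(n − 1)).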

To compute CP(S), we need to evaluate θ̂ and θ̂_[S]. In general, the posterior mode of θ does not have a closed analytic form, so we have to rely on iterative methods such as Newton-Raphson to obtain θ̂ and θ̂_[S] for every S. This can be computationally intensive for many models, such as state space models. G_θ in CP(S) can be obtained analytically by evaluating $J_{N} (θ) = - \partial_{θ}^{2} log p (θ ∣ Y) = - \partial_{θ}^{2} log p (Y ∣ θ) - \partial_{θ}^{2} log p (θ)$ at θ̂.

Since we can write $\tilde{θ} = E_{θ ∣ Y}(θ)$ and

{\tilde{θ}}_{[S]} = E_{θ ∣ Y} {θ \cdot {[p_{S} (θ)]}^{- 1}} / E_{θ ∣ Y} {{[p_{S} (θ)]}^{- 1}},

(6)

we can easily compute CM(S) using MCMC samples from the full-data posterior distribution p(θ∣Y). Specifically, θ̃ can be obtained directly by averaging the MCMC samples, and W_θ, the inverse of the posterior covariance matrix, can be estimated from the same samples or, asymptotically equivalently, obtained analytically by evaluating J_N(θ) at θ̃. Likewise, G_θ can be approximated by the inverse of the posterior covariance matrix estimated from the MCMC samples. Based on the above discussion, computing (5) and (6) depends strongly on the computation of p_S(θ) = p(Y_S∣Y_[S], θ).
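Equation (6) is a self-normalized importance-sampling estimate with weights proportional to [p_S(θ)]⁻¹. The sketch below, again assuming that log p_S(θ^(j)) is available at each full-data MCMC draw, computes θ̃_[S] this way and then CM(S), with W_θ estimated as the inverse of the sample covariance of the draws. It is an illustration under these assumptions, not the authors' code, and the function names are hypothetical.

```python
import numpy as np

def case_deleted_posterior_mean(theta_draws, log_ps):
    """Eq. (6): self-normalized importance weights proportional to 1 / p_S(theta)."""
    theta = np.asarray(theta_draws, dtype=float).reshape(len(theta_draws), -1)
    log_w = -np.asarray(log_ps, dtype=float)
    w = np.exp(log_w - log_w.max())           # rescale before exponentiating
    return (w / w.sum()) @ theta              # estimate of theta_tilde_[S]

def cook_posterior_mean_distance(theta_draws, log_ps):
    """CM(S) in (3) with W_theta = inverse sample covariance of the draws."""
    theta = np.asarray(theta_draws, dtype=float).reshape(len(theta_draws), -1)
    d = case_deleted_posterior_mean(theta, log_ps) - theta.mean(axis=0)
    W = np.linalg.inv(np.atleast_2d(np.cov(theta, rowvar=False)))
    return float(d @ W @ d)
```

If all log p_S values are equal, every draw gets the same weight, so θ̃_[S] = θ̃ and CM(S) = 0; large CM(S) values arise only when the weights shift the posterior mean.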

It can be computationally cumbersome to approximate p_S(θ) in the presence of missing data, which can make the computation of the three Bayesian case influence measures infeasible. To see this, denote the missing data by $Y_{mis} = {(Y_{1, mis}^{T}, \dots, Y_{n, mis}^{T})}^{T}$ and the complete data by $Y_{com} = {(Y_{1, com}^{T}, \dots, Y_{n, com}^{T})}^{T}$, in which $Y_{i, com} = (Y_{i, mis}, Y_{i})$ for i = 1, …, n. Let p(Y_com∣θ) be the probability function for Y_com. We define Y_com,[S] = (Y_[S], Y_mis) as the complete data after deleting all observations in Y_S, and let p(Y_com,[S]∣θ) be the probability function for Y_com,[S], so that ∫ p(Y_com,[S]∣θ)dY_mis = p(Y_[S]∣θ). This model structure is very general and subsumes most commonly used models, such as generalized linear models (GLMs) with missing responses and/or covariates and random-effects models (Ibrahim et al., 2005, 2010; Zhu et al., 2001; Molenberghs and Kenward, 2007; Molenberghs and Verbeke, 2005; Lee, 2007; Daniels and Hogan, 2008; Little, 1992; Little and Rubin, 2002; Ibrahim and Molenberghs, 2009; Skrondal and Rabe-Hesketh, 2004). With missing data, the primary computational challenge lies in the computation of p_S(θ), because

p_{S} (θ) = \frac{\int p (Y, Y_{mis} ∣ θ) d Y_{mis}}{\int p (Y_{[S]}, Y_{mis} ∣ θ) d Y_{mis}}

(7)

typically involves high-dimensional integrals.

Example 1

To illustrate the methodological development, we consider n independent observations {Y_i,com = (x_i, z_i, r_i, y_i), i = 1, …, n}, where y_i is the response variable, x_i is a p₁ × 1 vector of completely observed covariates, and z_i is a p₂ × 1 vector of partially observed covariates. Moreover, let z_mis,i and z_obs,i denote the missing and observed components of z_i, respectively. Let r_i be a p₂ × 1 vector whose jth component, r_ij, equals 1 if z_ij is observed and 0 if z_ij is missing. We assume that p(x_i, z_i, r_i, y_i∣θ) = p(y_i∣x_i, z_i, θ)p(x_i, z_i∣θ)p(r_i∣y_i, x_i, z_i, θ), where θ denotes the vector of unknown parameters.

We assume a generalized linear model for p(y_i∣x_i, z_i, β, τ) given by

p (y_{i} ∣ x_{i}, z_{i}, β, τ) = exp {a_{i}^{- 1} (τ) [y_{i} η_{i} (β) - b (η_{i} (β))] + c (y_{i}, τ)}

(8)

for i = 1, …, n, where a_i(·), b(·) and c(·, ·) are known functions, η_i = η(μ_i) and $μ_{i} = g ((x_{i}^{'}, z_{i}^{'}) β)$ , in which g(·) is a known link function, β = (β₁, …, β_p)′ and p = p₁ + p₂. We assume that

p (x_{i}, z_{i} ∣ α) = p (z_{i p_{2}} ∣ z_{i, p_{2} - 1}, \dots, z_{i 1}, x_{i}, α) \times \dots \times p (z_{i 1} ∣ x_{i}, α) p (x_{i} ∣ α) .

(9)

Similarly, we model the missing-data mechanism p(r_i∣y_i, x_i, z_i, ξ) as

p (r_{i} ∣ y_{i}, x_{i}, z_{i}, ξ) = p (r_{i p_{2}} ∣ r_{i, p_{2} - 1}, \dots, r_{i 1}, y_{i}, x_{i}, z_{i}, ξ) \times \dots \times p (r_{i 1} ∣ y_{i}, x_{i}, z_{i}, ξ) .

(10)

Here, θ = (β, τ, α, ξ). To carry out a Bayesian analysis, we need to specify a prior for θ. Following Huang, Chen and Ibrahim (2005), we take the components of θ to be independent a priori, so that p(θ) = p(τ)p(β)p(ξ)p(α).

Now we consider the deletion of the i–th observation (x_i, z_i, r_i, y_i), that is S = {i}. With some calculations, we get

p_{\{i\}} (θ) = \int p (y_{i} ∣ x_{i}, z_{i}, β, τ) p (x_{i}, z_{i} ∣ α) p (r_{i} ∣ y_{i}, x_{i}, z_{i}, ξ) d z_{mis, i},

(11)

which may involve intractable integrals when the dimension of z_mis,i is relatively large. Although one may be able to use numerical methods (e.g., the trapezoidal rule) to approximate p_S(θ), the accuracy of such approximations can be difficult to assess when the integrals are high dimensional. This setting thus calls for computationally simple approximations to these Bayesian case influence measures.

2.3 First-order Approximations

For diagnostic purposes, it is desirable to derive computationally feasible approximations to these case influence measures. We obtain the following theorems, whose detailed proofs can be found in the supplementary document.

Theorem 1

If Assumptions C1-C5 in the supplementary document hold and N_S is bounded by a fixed constant, then we have the following results:

(a) D_ϕ(S) can be approximated by

$D_{ϕ} (S) = 0.5 \ddot{ϕ} (1) {[\partial_{θ} log p_{S} (\hat{θ})]}^{T} {[J_{N} (\hat{θ})]}^{- 1} [\partial_{θ} log p_{S} (\hat{θ})] [1 + O_{p} (N^{- 1})],$ (12)

where $\ddot{ϕ} (1) = \partial_{u}^{2} ϕ (u) |_{u = 1}$.

(b) The one-step approximation for θ̂_[S] is given by

${\hat{θ}}_{[S]} = \hat{θ} + O_{p} (N^{- 1}) = \hat{θ} - {[J_{N} (\hat{θ})]}^{- 1} \partial_{θ} log p_{S} (\hat{θ}) [1 + O_{p} (N^{- 1})] .$ (13)

(c) The one-step approximation for θ̃_[S] is given by

${\tilde{θ}}_{[S]} = \tilde{θ} - {[J_{N} (\hat{θ})]}^{- 1} \partial_{θ} log p_{S} (\hat{θ}) [1 + O_{p} (N^{- 1})] .$ (14)

(d) 2D_ϕ(S)/ϕ̈(1), CP(S), and CM(S) are asymptotically equivalent, that is,

$D_{ϕ} (S) = 0.5 \ddot{ϕ} (1) \times C P (S) + O_{p} (N^{- 2}) = 0.5 \ddot{ϕ} (1) \times C M (S) + O_{p} (N^{- 2}) .$ (15)
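To see Theorem 1 (d) at work in the simplest possible setting, consider an assumed toy model (not from the paper): y_i ∼ N(θ, 1) with a flat prior, where the n − 1 retained observations have mean 0 and the deleted observation is y_1 = 3. Both posteriors are normal, so D_ϕ(S) with ϕ(u) = −log(u) (for which ϕ̈(1) = 1) and CP(S) are available in closed form. The sketch below computes their gap D_ϕ(S) − 0.5 CP(S).

```python
import numpy as np

def kl_normal(m0, v0, m1, v1):
    """KL(N(m0, v0) || N(m1, v1)) for scalar normal distributions."""
    return 0.5 * np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / (2.0 * v1) - 0.5

def gap_theorem1d(n, y_out=3.0):
    """D_phi(S) - 0.5 * CP(S) for the toy model described above."""
    m0, v0 = y_out / n, 1.0 / n          # p(theta | Y): mean of all n observations
    m1, v1 = 0.0, 1.0 / (n - 1)          # p(theta | Y_[S]): retained mean is 0
    # phi(u) = -log u gives D_phi(S) = KL(p(theta | Y) || p(theta | Y_[S]))
    d_phi = kl_normal(m0, v0, m1, v1)
    # posterior mode = posterior mean here; G_theta = J_N = n under the flat prior
    cp = n * (m1 - m0) ** 2
    return d_phi - 0.5 * cp

for n in (10, 100, 1000):
    print(n, n ** 2 * gap_theorem1d(n))
```

Expanding the closed forms for this toy model gives D_ϕ(S) − 0.5 CP(S) = −4.25/n² + O(n⁻³), so the printed values settle near −4.25 as n grows, in line with the O_p(N⁻²) remainder in (15).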

Theorem 1 has several important implications. Theorem 1 (a) provides a theoretical and computational approximation of D_ϕ(S) as a quadratic form in ∂_θ log p_S(θ̂). Theorem 1 (b) and (c) provide one-step approximations of θ̂_[S] and θ̃_[S], which remove the need to recompute θ̂_[S] and θ̃_[S] for each S. Moreover, to the best of our knowledge, Theorem 1 (d) is the first result that establishes a direct connection between D_ϕ(S), CP(S), and CM(S) for any ϕ(·) within the Bayesian framework. In particular, for the family ϕ_α(u) defined above, it can be shown that $\partial_{u}^{2} ϕ_{α} (u) |_{u = 1} = 1$ for all α, which leads to $D_{ϕ_{α}}(S)$ = 0.5CP(S) + O_p(N⁻²) = 0.5CM(S) + O_p(N⁻²). Furthermore, for the χ²-divergence and the symmetric K-L divergence, we have $\partial_{u}^{2} ϕ (u) |_{u = 1} = 2$, which gives D_ϕ(S) = CP(S) + O_p(N⁻²) = CM(S) + O_p(N⁻²). However, for ϕ(u) = 0.5|u − 1|, ϕ(u) is not differentiable at u = 1 (and its second derivative is 0 for all u ≠ 1), so the conditions of Theorem 1 do not hold. Thus, neither the approximation in Theorem 1 (a) nor the equivalence among the three diagnostic measures holds for the L₁-distance.
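The one-step approximation in Theorem 1 (b) replaces a full re-optimization for each S with a single Newton-type correction. The sketch below illustrates it for a logistic regression with no missing data and an assumed N(0, 10 I) prior on the coefficients (a hypothetical choice that the paper does not prescribe), so that ∂_θ log p_S(θ̂) is simply the score contribution of the deleted observations. It compares the one-step value with the posterior mode obtained by refitting without S.

```python
import numpy as np

def grad_hess(beta, X, y, prior_var):
    """Gradient and Hessian of log p(beta | Y) for logistic regression
    with an assumed N(0, prior_var * I) prior."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    grad = X.T @ (y - p) - beta / prior_var
    hess = -(X.T * (p * (1.0 - p))) @ X - np.eye(X.shape[1]) / prior_var
    return grad, hess

def posterior_mode(X, y, prior_var=10.0, max_iter=50):
    """Newton-Raphson iterations for the posterior mode."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g, H = grad_hess(beta, X, y, prior_var)
        step = np.linalg.solve(H, g)
        beta = beta - step
        if np.max(np.abs(step)) < 1e-12:
            break
    return beta

def one_step_deleted_mode(beta_hat, X, y, S, prior_var=10.0):
    """Theorem 1 (b): beta_hat - J_N(beta_hat)^{-1} d/dbeta log p_S(beta_hat)."""
    _, H = grad_hess(beta_hat, X, y, prior_var)      # J_N = -H
    p_S = 1.0 / (1.0 + np.exp(-(X[S] @ beta_hat)))
    score_S = X[S].T @ (y[S] - p_S)                  # score of the deleted observations
    return beta_hat - np.linalg.solve(-H, score_S)
```

Only one linear solve per candidate set S is needed, instead of a full Newton-Raphson run on each reduced data set.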

Practically, these approximations lead to computationally efficient formulas for the Bayesian case influence measures. The first-order approximations hold for a large class of statistical models for both dependent and independent data, including models with missing data, within the Bayesian framework, and they are reminiscent of the first-order approximations derived for various specific models within the frequentist framework (Christensen et al., 1992; Cook and Weisberg, 1982; Wei, 1998). For influential points, even though the first-order approximation may be relatively inaccurate, the approximated measure can still easily pick out these points, since identifying them only requires their approximated values to remain large relative to those of the other points. Thus, for diagnostic purposes, the first-order approximations may be even more effective than the three exact Bayesian case influence measures at identifying influential clusters. In Section 3, we conduct simulation studies to compare the performance of the first-order approximations with that of the exact formulas; see the numerical comparisons in Table 4.

Table 4.

Frequency of correctly ranking the influential clusters as the top 4 clusters based on 100 replications for four different scenarios.

n	Type	D_ϕ	CP	CM	AP₁	AP₂
10	I	88	84	85	91	91
10	II	99	99	99	99	99
50	I	96	74	94	96	96
50	II	100	91	100	100	100


According to Theorem 1, to approximate these case influence measures, we only need to compute the posterior mean θ̃, the observed-data information matrix J_N(θ̃), ∂_θ log p_S(θ) evaluated at θ̃, and

{AP}_{1} (S; \tilde{θ}) = {[\partial_{θ} log p_{S} (\tilde{θ})]}^{T} {[J_{N} (\tilde{θ})]}^{- 1} [\partial_{θ} log p_{S} (\tilde{θ})] .

(16)

In particular, θ̃ and J_N(θ̃) can be easily computed from the MCMC samples based on the full data; J_N(θ̃) can be approximated using Louis's formula (Louis, 1982). For most statistical models, the computation of ∂_θ log p_S(θ) = ∂_θ log p(Y∣θ) − ∂_θ log p(Y_[S]∣θ) is relatively straightforward. Specifically, we have

\partial_{θ} log p (Y ∣ \tilde{θ}) \approx \int \partial_{θ} log p (Y_{com} ∣ \tilde{θ}) p (Y_{mis} ∣ Y, \tilde{θ}) d Y_{mis},

(17)

\partial_{θ} log p (Y_{[S]} ∣ \tilde{θ}) \approx \int \partial_{θ} log p (Y_{com, [S]} ∣ \tilde{θ}) p (Y_{mis} ∣ Y, \tilde{θ}) d Y_{mis} .

(18)

Here, we use the fact that the posterior mean and posterior mode are asymptotically equivalent under suitable regularity conditions which are satisfied for the models considered here.
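Once θ̃, J_N(θ̃), and the score difference ∂_θ log p_S(θ̃) are available, (16) is a single quadratic form. A minimal NumPy sketch (function names hypothetical) that solves a linear system rather than forming [J_N(θ̃)]⁻¹ explicitly:

```python
import numpy as np

def ap1(score_S, J_N):
    """AP_1(S; theta_tilde) in (16) with g = d/dtheta log p_S(theta_tilde)."""
    g = np.atleast_1d(np.asarray(score_S, dtype=float))
    J = np.atleast_2d(np.asarray(J_N, dtype=float))
    return float(g @ np.linalg.solve(J, g))   # g^T J^{-1} g without inverting J

def score_log_pS(score_full, score_reduced):
    """d/dtheta log p_S = d/dtheta log p(Y | theta) - d/dtheta log p(Y_[S] | theta)."""
    return np.asarray(score_full, dtype=float) - np.asarray(score_reduced, dtype=float)

g = score_log_pS([1.5, -0.25], [-0.5, -2.25])   # g = [2.0, 2.0]
print(ap1(g, np.diag([2.0, 4.0])))               # 2**2/2 + 2**2/4, prints 3.0
```

By Theorem 1 (a), 0.5 ϕ̈(1) times this value then approximates D_ϕ(S) when N_S is bounded.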

Example 1 (continued)

Consider the deletion of the ith observation (x_i, z_i, r_i, y_i). It can be shown that

\partial_{θ} log p_{\{i\}} (\tilde{θ}) = \int \partial_{θ} [log p (y_{i} ∣ x_{i}, z_{i}, β, τ) + log p (x_{i}, z_{i} ∣ α) + log p (r_{i} ∣ y_{i}, x_{i}, z_{i}, ξ)] \Big|_{θ = \tilde{θ}} \, p (z_{mis, i} ∣ y_{i}, x_{i}, z_{obs, i}, \tilde{θ}) \, d z_{mis, i} .

Note that in most models with missing data, it is relatively easy to compute the first-order derivative of the complete-data log-likelihood function. Moreover, compared to the computation of the observed-data log-likelihood function, it is numerically much more stable to calculate the first-order derivative of the complete-data log-likelihood function. This is the key advantage of using (17) and (18) to approximate AP₁({i}; θ̃).

We now present the four key steps in computing AP₁(S; θ̃) in (16).

Step 1

Using the full data Y, we obtain MCMC samples $θ^{(j)}$, j = 1, …, J, from the posterior distribution p(θ∣Y) and estimate $\tilde{θ} = J^{- 1} \sum_{j = 1}^{J} θ^{(j)}$.

Step 2

We use MCMC methods to draw samples $\{Y_{mis}^{(j)} : j = 1, \dots, J\}$ from p(Y_mis∣Y, θ̃), given θ̃ and Y.

Step 3

For each set S, we approximate J_N(θ̃) using Louis's formula and approximate ∂_θ log p(Y∣θ̃) and ∂_θ log p(Y_[S]∣θ̃) by

\partial_{θ} log p (Y ∣ \tilde{θ}) \approx J^{- 1} \sum_{j = 1}^{J} \partial_{θ} log p (Y_{mis}^{(j)}, Y ∣ \tilde{θ}),

(19)

\partial_{θ} log p (Y_{[S]} ∣ \tilde{θ}) \approx J^{- 1} \sum_{j = 1}^{J} \partial_{θ} log p (Y_{mis}^{(j)}, Y_{[S]} ∣ \tilde{θ}) .

(20)

Step 4

Approximate AP₁(S; θ̃) using equation (16) for each set S.
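Steps 1-4 can be sketched end to end for an assumed toy random-intercept model that is not from the paper: y_ij = b_i + e_ij, with missing (latent) b_i ∼ N(μ, τ²), e_ij ∼ N(0, σ²), τ² and σ² known, a flat prior on θ = μ, and S = {i} removing the whole complete-data contribution of cluster i, as in Example 1 (continued). Because the posterior of μ is normal here, Step 1 uses its closed-form mean instead of MCMC output, and Step 2 draws b_i exactly from its normal conditional distribution. Louis's formula then gives J_N(θ̃) as the expected complete-data information minus the conditional variance of the complete-data score. The closed-form score (ȳ_i − μ̃)/(τ² + σ²/m_i) is returned only for checking.

```python
import numpy as np

def toy_steps_ap1(y_clusters, tau2, sigma2, n_draws=20_000, seed=0):
    """Hypothetical toy model: y_ij = b_i + e_ij, b_i ~ N(mu, tau2) missing,
    e_ij ~ N(0, sigma2), tau2 and sigma2 known, flat prior on theta = mu."""
    rng = np.random.default_rng(seed)
    m = np.array([len(y) for y in y_clusters], dtype=float)
    ybar = np.array([np.mean(y) for y in y_clusters])
    v = tau2 + sigma2 / m                    # Var(ybar_i | mu)
    # Step 1 (replaced): the posterior of mu is normal, so its mean is exact.
    mu_tilde = np.sum(ybar / v) / np.sum(1.0 / v)
    # Step 2: draw the missing b_i from p(b_i | y_i, mu_tilde).
    prec = 1.0 / tau2 + m / sigma2
    mean_b = (mu_tilde / tau2 + m * ybar / sigma2) / prec
    b = mean_b + rng.standard_normal((n_draws, m.size)) / np.sqrt(prec)
    # Step 3a: cluster i's complete-data score for mu is (b_i - mu) / tau2.
    unit_scores = (b - mu_tilde) / tau2      # shape (n_draws, n_clusters)
    # Step 3b: Louis's formula, E[-d2 l_com | Y] - Var[d l_com | Y].
    J_N = m.size / tau2 - np.var(unit_scores.sum(axis=1))
    # Step 3c: deleting cluster i removes its whole complete-data term, so the
    # draw average of that term approximates d/dmu log p_{i}(mu_tilde).
    mc_score = unit_scores.mean(axis=0)
    exact_score = (ybar - mu_tilde) / v      # closed form, for checking only
    # Step 4: AP_1({i}; mu_tilde) for scalar theta.
    return mc_score ** 2 / J_N, mc_score, exact_score, J_N
```

Cluster i is flagged when AP₁({i}; μ̃) is large relative to the values for the other clusters. In a real model, Step 2 would use an MCMC sampler for p(Y_mis∣Y, θ̃) and J_N(θ̃) would come from the same draws.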

Although we have systematically examined the deletion of a relatively small number of observations, it is common to delete relatively large numbers of observations for clustered data. Specifically, unbalanced clustered data are commonly collected from familial and longitudinal studies and we may be interested in deleting all the observations in a cluster, whose number may be comparable with the total number of observations N (Wang et al., 1999). We now obtain the following theorem for large cluster sizes.

Theorem 2

If Assumptions C1, C2, C3, C4′ and C5 in the supplementary document hold and N_S → ∞ and N_S/N → γ ∈ [0, 1), then we have the following results:

(a) The one-step approximation for θ̂_[S] is given by

${\hat{θ}}_{[S]} = \hat{θ} + O_{p} (N^{- 1 / 2}) = \hat{θ} - {[J_{N, [S]} (\hat{θ})]}^{- 1} \partial_{θ} log p_{S} (\hat{θ}) [1 + O_{p} (N^{- 1 / 2})],$ (21)

where $J_{N, [S]} (θ) = - \partial_{θ}^{2} log p (θ ∣ Y_{[S]})$. If γ = 0, then

${\hat{θ}}_{[S]} = \hat{θ} - {[J_{N} (\hat{θ})]}^{- 1} \partial_{θ} log p_{S} (\hat{θ}) [1 + O_{p} (N^{- 1 / 2}) + O_{p} (N_{S} / N)] .$ (22)

(b) The one-step approximation for θ̃_[S] is given by

${\tilde{θ}}_{[S]} - \tilde{θ} = ({\hat{θ}}_{[S]} - \hat{θ}) [1 + o_{p} (1)] .$

(c) CP(S) and CM(S) can be approximated by

$A P_{2} (S; \tilde{θ}) = {[\partial_{θ} log p_{S} (\tilde{θ})]}^{T} {[J_{N, [S]} (\tilde{θ})]}^{- 1} [J_{N} (\tilde{θ})] {[J_{N, [S]} (\tilde{θ})]}^{- 1} [\partial_{θ} log p_{S} (\tilde{θ})] .$

If γ = 0, then

$C P (S) = C M (S) [1 + o_{p} (1)] = A P_{2} (S; \tilde{θ}) [1 + o_{p} (1)] .$ (23)

(d) D_ϕ(S) can be approximated by

$D_{ϕ} (S) = ϕ (A_{S}) + O_{p} (N^{- 1}),$ (24)

where $A_{S} = σ \, p (Y_{[S]} ∣ \hat{θ}) p (\hat{θ}) / [σ_{[S]} \, p (Y_{[S]} ∣ {\hat{θ}}_{[S]}) p ({\hat{θ}}_{[S]})]$, $σ^{2} = {∣ J_{N} (\hat{θ}) / N ∣}^{- 1}$, and $σ_{[S]}^{2} = {∣ J_{N, [S]} ({\hat{θ}}_{[S]}) / N ∣}^{- 1}$.

Theorem 2 has several important implications. Theorem 2 (a) and (b) provide one-step approximations of θ̂_[S] and θ̃_[S], which remove the need to recompute θ̂_[S] and θ̃_[S] for each S. Theorem 2 (c) provides theoretical approximations of CP(S) and CM(S). If N_S/N → 0, for example when $N_{S} = \sqrt{N}$, then CP(S) and CM(S) can be well approximated by AP₂(S; θ̃). Theorem 2 (d) shows that when N_S → ∞, D_ϕ(S), which can be approximated by ϕ(A_S), is in general not asymptotically equivalent to AP₂(S; θ̃). Therefore, we cannot use AP₂(S; θ̃) to characterize the asymptotic behavior of D_ϕ(S). Since calculating CM(S), CP(S), D_ϕ(S), and p(Y_[S]∣θ) can be computationally tedious for models with missing data, we generally suggest using the first-order approximations AP₁(S; θ̃) and AP₂(S; θ̃) for identifying influential observations. A procedure similar to Steps 1-4 can easily be developed for computing AP₂(S; θ̃).
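AP₂(S; θ̃) differs from AP₁(S; θ̃) only in using J_N,[S](θ̃) to map the score into a parameter shift and J_N(θ̃) to measure the size of that shift. A minimal NumPy sketch (names hypothetical), which relies on J_N,[S] being symmetric:

```python
import numpy as np

def ap2(score_S, J_N, J_N_S):
    """AP_2(S; theta_tilde) = g^T J_[S]^{-1} J_N J_[S]^{-1} g, g = d/dtheta log p_S."""
    g = np.atleast_1d(np.asarray(score_S, dtype=float))
    J = np.atleast_2d(np.asarray(J_N, dtype=float))
    J_S = np.atleast_2d(np.asarray(J_N_S, dtype=float))
    x = np.linalg.solve(J_S, g)     # x = J_[S]^{-1} g (approximate shift of theta)
    return float(x @ J @ x)         # size of the shift measured by J_N
```

When J_N,[S](θ̃) equals J_N(θ̃), which is approximately the case when N_S/N is small, AP₂ reduces to AP₁.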

3 Illustrative Examples

In this section, we illustrate our methodology with simulated data and a real dataset.

3.1 Simulated Data

The goal of our simulations was to evaluate the accuracy of the first-order approximations of the three Bayesian case influence measures in small and moderate sample sizes. We generated 100 data sets from a binomial mixed model, which has been extensively studied in the literature (Molenberghs and Verbeke, 2005). Each data set contains n clusters. For each cluster, the random effect b_i, which can be regarded as “missing data”, was first independently generated from a N(0, σ²) distribution; then, given b_i, the observations y_ij (j = 1, 2, 3; i = 1, …, n) were independently generated from a binomial distribution, y_ij ∼ B(n_ij, p_ij), with p_ij satisfying $log (p_{i j} / (1 - p_{i j})) = x_{i j}^{T} β + b_{i}$, in which the n_ij were randomly drawn from {1, …, 5}. The covariates x_ij were set as (1, u_ij − 0.5)^T, with the u_ij independently generated from a U[0, 1] distribution. For each of the 100 data sets, both the responses and the covariates were regenerated, while the true value of (β^T, σ²) was fixed at (0.5, 0.5, 0.5). The number of clusters n was set at 10 and 50 to represent small and moderate sample sizes, respectively.
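The data-generating design described above can be sketched with NumPy's random `Generator` as follows. The function name and the (n, 3, 2) array layout are illustrative choices; drawing n_ij uniformly from {1, …, 5} is an assumption (the text says only "randomly drawn"); and the sketch does not add the influential clusters described next.

```python
import numpy as np

def simulate_binomial_mixed(n, beta=(0.5, 0.5), sigma2=0.5, seed=None):
    """One data set from the design above: n clusters, 3 observations each."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, np.sqrt(sigma2), size=n)              # random effects b_i
    u = rng.uniform(0.0, 1.0, size=(n, 3))
    X = np.stack([np.ones((n, 3)), u - 0.5], axis=-1)         # x_ij = (1, u_ij - 0.5)^T
    n_trials = rng.integers(1, 6, size=(n, 3))                # n_ij uniform on {1, ..., 5} (assumed)
    eta = X @ np.asarray(beta, dtype=float) + b[:, None]      # logit(p_ij)
    p = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(n_trials, p)                             # y_ij ~ B(n_ij, p_ij)
    return y, n_trials, X, b
```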

For each simulated data set, we created two types of influential observations in order to assess the accuracy of the first-order approximations and their ability to identify these influential clusters. For the first type, we deleted all the observations in clusters n − 1 and n, then set n_ij = 3 for j = 1, 2, 3 and i = n − 1, n, and set (b_{n−1}, b_n) = (4, −4) to generate y_ij for i = n − 1, n and j = 1, 2, 3 according to the above binomial random effects model. Thus, the new (n − 1)th and nth clusters can be regarded as influential clusters due to the extreme values of b_{n−1} and b_n. Moreover, the number of observations in these two clusters is relatively small compared to the total number of observations.

For the second type of influential observations, we deleted the observations in the nth cluster, set n_nj = 10 for j = 1, 2, 3, and then generated y_n1 and y_n2 with b_n = 0 and y_n3 with b_n = −6 from the same model. Since different values of b_n were used to generate different observations in the nth cluster, the nth cluster can be treated as an “outlier”. Moreover, the number of observations in the last cluster is relatively large compared to the total number of observations when n = 10.

For each data set, we deleted each cluster one at a time and calculated the differences between the three Bayesian diagnostic measures and their first-order approximations for each cluster. Since b_i is one-dimensional, we used a standard numerical method to compute p_S(θ) = p(Y_S∣Y_[S], θ) and to approximate the exact values of the three Bayesian diagnostic measures based on (5) and (6). For the true ‘good’ and influential clusters, we computed the average biases and standard errors of these differences (Tables 1-3). For the true ‘good’ clusters, AP₂(S; θ̃) approximates CP(S) and CM(S) with smaller average biases than AP₁(S; θ̃) and with comparable standard errors. Increasing the sample size decreases the average bias and standard error of the first-order approximations. Compared with AP₂(S; θ̃), AP₁(S; θ̃) is a better approximation to D_ϕ(S).
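The paper does not name the "standard numerical method" used here. One common choice for a one-dimensional random effect, shown below purely as an assumed example, is Gauss-Hermite quadrature. Because clusters are independent, deleting cluster i gives p_S(θ) = p(Y_i∣θ) = ∫ Π_j B(y_ij; n_ij, p_ij(b)) N(b; 0, σ²) db. The sketch uses NumPy's probabilists' Gauss-Hermite rule (`numpy.polynomial.hermite_e.hermegauss`, weight exp(−x²/2)) and returns log p(Y_i∣θ), which can then be evaluated at each MCMC draw and plugged into (5) and (6).

```python
import numpy as np
from math import comb

def cluster_log_lik(y_i, n_i, X_i, beta, sigma2, n_nodes=30):
    """log p(Y_i | theta) for one cluster of the binomial mixed model,
    integrating b_i ~ N(0, sigma2) out by Gauss-Hermite quadrature."""
    y_i = np.asarray(y_i, dtype=float)
    n_i = np.asarray(n_i, dtype=float)
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)    # weight exp(-z^2 / 2)
    b = np.sqrt(sigma2) * z                                # b = sigma * z
    eta = (np.asarray(X_i) @ np.asarray(beta))[:, None] + b[None, :]
    # binomial log pmf: log C(n, y) + y * eta - n * log(1 + exp(eta))
    log_c = np.log([comb(int(n), int(k)) for n, k in zip(n_i, y_i)])
    log_f = (log_c[:, None] + y_i[:, None] * eta
             - n_i[:, None] * np.logaddexp(0.0, eta)).sum(axis=0)
    # E_z[f(sigma z)] ~= sum_k w_k f(sigma z_k) / sqrt(2 pi), on the log scale
    a = log_f + np.log(w) - 0.5 * np.log(2.0 * np.pi)
    return float(a.max() + np.log(np.sum(np.exp(a - a.max()))))
```

Working on the log scale keeps the product over observations from underflowing when clusters contain many trials, as in the second type of influential cluster.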

Table 1.

Results from simulation studies for two types of influential observations for n = 10. Average biases (ABs) and standard errors (SEs) of the differences between the three Bayesian case influence measures and their first-order approximations. For the first type, clusters 9 and 10 are influential clusters, while cluster 10 is the outlying cluster for the second type.

Type	Cluster	AP₁ (AB)			AP₂ (AB)			AP₁ (SE)			AP₂ (SE)
		D_ϕ	CP	CM	D_ϕ	CP	CM	D_ϕ	CP	CM	D_ϕ	CP	CM
I	1	-0.055	0.077	0.057	-0.119	0.013	-0.007	0.062	0.133	0.098	0.159	0.075	0.088
	2	-0.050	0.079	0.059	-0.124	0.005	-0.015	0.065	0.165	0.138	0.235	0.170	0.148
	3	-0.057	0.076	0.063	-0.136	-0.003	-0.016	0.074	0.141	0.118	0.211	0.089	0.086
	4	-0.053	0.070	0.055	-0.117	0.006	-0.008	0.064	0.088	0.086	0.147	0.075	0.065
	5	-0.059	0.058	0.045	-0.128	-0.011	-0.025	0.068	0.082	0.092	0.197	0.132	0.139
	6	-0.046	0.055	0.058	-0.101	0.001	0.003	0.047	0.106	0.104	0.119	0.071	0.065
	7	-0.062	0.086	0.062	-0.150	-0.003	-0.027	0.071	0.125	0.124	0.230	0.130	0.152
	8	-0.061	0.074	0.061	-0.137	-0.002	-0.015	0.055	0.135	0.107	0.180	0.113	0.111
	9*	-0.200	0.129	0.080	-0.532	-0.203	-0.252	0.100	0.223	0.201	0.478	0.327	0.355
	10*	-0.348	0.250	0.012	-1.380	-0.782	-1.021	0.108	0.249	0.220	0.913	0.789	0.923

II	1	-0.066	0.081	0.074	-0.151	-0.004	-0.011	0.080	0.137	0.147	0.239	0.180	0.216
	2	-0.050	0.053	0.048	-0.097	0.005	0.000	0.064	0.077	0.076	0.125	0.068	0.087
	3	-0.063	0.061	0.058	-0.132	-0.008	-0.011	0.069	0.097	0.108	0.172	0.098	0.129
	4	-0.073	0.088	0.074	-0.176	-0.016	-0.029	0.092	0.134	0.162	0.262	0.129	0.165
	5	-0.057	0.062	0.057	-0.109	0.010	0.005	0.066	0.103	0.100	0.134	0.085	0.093
	6	-0.069	0.063	0.052	-0.144	-0.013	-0.024	0.078	0.096	0.088	0.198	0.123	0.122
	7	-0.062	0.076	0.050	-0.137	0.000	-0.026	0.078	0.143	0.131	0.202	0.109	0.156
	8	-0.066	0.078	0.113	-0.172	-0.029	0.006	0.077	0.119	0.171	0.324	0.219	0.284
	9	-0.068	0.071	0.074	-0.162	-0.023	-0.020	0.088	0.173	0.186	0.304	0.234	0.269
	10*	0.085	0.681	0.301	-2.637	-2.041	-2.421	0.771	0.544	0.509	2.549	2.926	3.014
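
The AB and SE entries in Tables 1–3 summarize, for each cluster, how the differences vary across the simulated data sets. A minimal sketch, assuming the difference is taken as the exact measure minus its approximation and that SE denotes the sample standard deviation of the differences across the 100 simulated data sets; neither convention is spelled out in the text, and the helper name is hypothetical.

```python
import numpy as np


def average_bias_and_se(exact, approx):
    """Average bias (AB) and standard error (SE) of exact - approx for each cluster.

    exact, approx : arrays of shape (n_datasets, n_clusters) holding a Bayesian case
    influence measure (e.g. CP(S)) and its first-order approximation (e.g. AP2(S))
    for every simulated data set and deleted cluster.
    """
    diff = np.asarray(exact, dtype=float) - np.asarray(approx, dtype=float)
    ab = diff.mean(axis=0)         # average over the simulated data sets
    se = diff.std(axis=0, ddof=1)  # spread of the differences across data sets
    return ab, se
```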


Table 3.

Selected results from simulation studies for n = 50 and the second type of influential observations. Average biases (ABs) and standard errors (SEs) of the differences between the three Bayesian case influence measures and their first-order approximations. Note that cluster 50 is the outlying cluster.

Cluster	AP₁ (AB)			AP₂ (AB)			AP₁ (SE)			AP₂ (SE)
	D_ϕ	CP	CM	D_ϕ	CP	CM	D_ϕ	CP	CM	D_ϕ	CP	CM
1	-0.019	0.008	0.010	-0.022	0.005	0.007	0.019	0.024	0.043	0.026	0.026	0.046
5	-0.020	0.011	0.005	-0.023	0.007	0.002	0.021	0.022	0.028	0.029	0.024	0.032
10	-0.016	0.010	0.011	-0.018	0.007	0.008	0.016	0.020	0.034	0.019	0.020	0.034
15	-0.016	0.014	0.010	-0.020	0.010	0.007	0.016	0.025	0.033	0.022	0.023	0.034
20	-0.020	0.012	0.007	-0.024	0.008	0.003	0.018	0.025	0.031	0.025	0.026	0.033
25	-0.016	0.009	0.006	-0.019	0.006	0.004	0.018	0.022	0.028	0.023	0.023	0.030
30	-0.019	0.009	0.014	-0.022	0.005	0.010	0.019	0.028	0.045	0.026	0.030	0.046
35	-0.026	0.008	0.004	-0.032	0.001	-0.002	0.028	0.031	0.047	0.040	0.034	0.057
40	-0.021	0.014	0.011	-0.025	0.010	0.006	0.021	0.028	0.038	0.029	0.030	0.040
45	-0.024	0.007	0.005	-0.031	0.000	-0.002	0.035	0.032	0.041	0.062	0.055	0.065
50*	-0.162	0.034	-0.170	-0.510	-0.314	-0.518	0.058	0.081	0.164	0.416	0.414	0.507


For each scenario, we also computed the frequency with which the true influential/outlying clusters were ranked among the top four clusters across the 100 simulated data sets (Table 4). Because of the randomness introduced by the random number generator, a few "good" observations may occasionally appear to be influential. Among the three case influence measures, D_ϕ(S) is more sensitive than CP(S) and CM(S) in detecting influential/outlying clusters. Although AP₁(S; θ̃) and AP₂(S; θ̃) are not very accurate approximations to the three Bayesian case influence measures (Tables 1–3), they consistently ranked the influential clusters among the top four. In fact, the results in Table 4 suggest that AP₁(S; θ̃) and AP₂(S; θ̃) may be more effective than all three Bayesian case influence measures in detecting influential/outlying clusters in all scenarios. Thus, for the purpose of detecting influential/outlying clusters, the performance of the first-order approximations AP₁(S; θ̃) and AP₂(S; θ̃) is quite satisfactory.
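
The ranking criterion behind Table 4 can be sketched as follows, assuming "correctly ranking" means that every true influential/outlying cluster appears among the four clusters with the largest values of the measure. Cluster indices are 0-based, so clusters 9 and 10 of the first scenario correspond to indices 8 and 9; the helper name is hypothetical.

```python
import numpy as np


def top_k_hit_rate(scores, true_clusters, k=4):
    """Fraction of simulated data sets in which all true clusters rank in the top k.

    scores        : (n_datasets, n_clusters) values of one measure, e.g. D_phi({i}) or AP1({i})
    true_clusters : 0-based indices of the truly influential/outlying clusters
    """
    scores = np.asarray(scores, dtype=float)
    target = set(true_clusters)
    hits = 0
    for row in scores:
        top_k = set(np.argsort(row)[::-1][:k].tolist())  # indices of the k largest values
        hits += target.issubset(top_k)
    return hits / scores.shape[0]
```

Ties are broken by `argsort` order here; with continuous measures ties are unlikely, but a production version would need an explicit rule.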

Finally, we generated a data set from the same Binomial mixed model. We deleted all observations of the 38th cluster, which has the smallest leverage (Figure 1), and regenerated {y_38,j : j = 1, 2, 3} from the same model except that x_38,j,2 was replaced by x_38,j,2 + 5.0. When fitting the Binomial mixed model, however, we used the original x_38,j. Thus, the 38th cluster can be regarded as an "outlying" cluster, since its observations were generated from covariate values different from those used in the fit. Index plots of CM(i) and D_ϕ(i) (Figure 2) suggest that CM(i) is more sensitive to outliers with low-leverage covariates.
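
The covariate-shift construction for the 38th cluster can be sketched in the same way. This is a minimal sketch, again assuming a logit-link Binomial mixed model; `X_c`, `beta`, `b_c`, `n_trials`, and the column index `col` of the second covariate are hypothetical inputs. The shifted covariates are used only to draw the responses, while the model would then be fitted with the unchanged `X_c`.

```python
import numpy as np


def expit(x):
    """Numerically stable inverse logit."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(x, dtype=float)))


def regenerate_with_shifted_covariate(X_c, beta, b_c, n_trials, rng, col=1, shift=5.0):
    """Draw y_cj from the model after adding `shift` to covariate column `col`.

    X_c itself is left unchanged, so the fit can still use the original covariates.
    """
    X_gen = np.array(X_c, dtype=float)  # copy: only the generating covariates are shifted
    X_gen[:, col] += shift
    eta = X_gen @ np.asarray(beta, dtype=float) + b_c
    return rng.binomial(n_trials, expit(eta))
```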

Figure 1. Index plot of leverage values in the simulation study.

Figure 2. Index plots of CM(i), D_ϕ(i), AP₁(i) and AP₂(i) in the simulation study.

3.2 AIDS Data

We considered a small portion of a real dataset from a study of the relationship between acquired immune deficiency syndrome (AIDS) and condom use (Morisky et al., 1998). This portion contains 11 items on knowledge about AIDS and on beliefs, behaviors, and attitudes towards condom use, collected from n = 1116 subjects. Nine of the items were taken as responses y_i = (y_i₁, …, y_i₉)^T, and a continuous item x_i₁ (item 37) and an ordered categorical item x_i₂ (item 21, treated as continuous) were taken as covariates. The definitions of the nine items are given in the Appendix. The variables y_i₁, y_i₂, y_i₃, y_i₇, y_i₈ and y_i₉ were measured on a 5-point scale and were treated as continuous, and the variables y_i₄, y_i₅ and y_i₆ are continuous. At least one response or covariate is missing for 361 subjects (32%), whereas the covariate x_i₂ is completely observed. The missing data patterns for the response variables are given in Table 4 of Lee and Tang (2006).

To fit the AIDS data, we considered a nonlinear structural equation model with responses that are not missing at random (NMAR) and covariates that are missing at random (MAR). The responses (y_i₁, y_i₂, y_i₃) are related to a latent variable η_i, which can be interpreted as the "threat of AIDS", while the responses (y_i₄, y_i₅, y_i₆) and (y_i₇, y_i₈, y_i₉) are related to the latent variables ξ_i₁ and ξ_i₂, which can be interpreted as the "aggressiveness of the sex worker" and the "worry of contracting AIDS", respectively. Specifically, to relate the responses y_i to the latent variables ω_i = (η_i, ξ_i₁, ξ_i₂)^T, we consider the following measurement equation:

$$y_{i} = \mu + \Lambda \omega_{i} + \varepsilon_{i}, \quad i = 1, \dots, n, \tag{25}$$

where μ = (μ₁, …, μ₉)^T is a vector of intercepts, (ξ_i₁, ξ_i₂) is independent of the measurement error vector ε_i, and (ξ_i₁, ξ_i₂) ∼ N(0, Φ) and ε_i ∼ N(0, Ψ), in which Ψ = diag(ψ₁, …, ψ₉) and Φ = (ϕ_ij) is a 2 × 2 covariance matrix. We also assume the following structure for Λ:

$$\Lambda^{T} = \begin{pmatrix} 1.0^{*} & \lambda_{21} & \lambda_{31} & 0.0^{*} & 0.0^{*} & 0.0^{*} & 0.0^{*} & 0.0^{*} & 0.0^{*} \\ 0.0^{*} & 0.0^{*} & 0.0^{*} & 1.0^{*} & \lambda_{52} & \lambda_{62} & 0.0^{*} & 0.0^{*} & 0.0^{*} \\ 0.0^{*} & 0.0^{*} & 0.0^{*} & 0.0^{*} & 0.0^{*} & 0.0^{*} & 1.0^{*} & \lambda_{83} & \lambda_{93} \end{pmatrix}, \tag{26}$$

where the entries marked 0.0* and 1.0* are fixed; the 1.0* entries identify the scales of the latent factors. To study the relationship between η and (x₁, x₂, ξ₁, ξ₂), we consider the following nonlinear structural equation:

$$\eta_{i} = b_{1} x_{i1} + b_{2} x_{i2} + \gamma_{1} \xi_{i1} + \gamma_{2} \xi_{i2} + \gamma_{3} \xi_{i1} \xi_{i2} + \delta_{i}, \tag{27}$$

where δ_i ∼ N(0, ψ_δ). Let r_yij = 1 if y_ij is missing and r_yij = 0 if it is observed; similarly, let r_xi₁ = 1 if x_i₁ is missing and r_xi₁ = 0 if it is observed. Based on the observed missingness patterns, we assume that the missing responses are NMAR while the missing covariates are MAR. In particular, we consider the following missing data mechanism for y_ij:

$$\operatorname{logit}\{\Pr(r_{yij} = 1 \mid \varphi)\} = \varphi_{0} + \varphi_{1} y_{i1} + \dots + \varphi_{9} y_{i9}, \tag{28}$$

where φ = (φ₀, φ₁, …, φ₉)^T. Since x_i₁ may be missing, we must specify a distribution for it; for simplicity, we assume that x_i₁ ∼ N(0, ψ_x). Posterior inference for the structural parameters is carried out with the Markov chain Monte Carlo methods of Lee and Tang (2006).
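
To make the model in (25)–(28) concrete, the following sketch simulates one subject's responses and missingness indicators. Beyond what the text states, it assumes that r_yi1, …, r_yi9 are independent Bernoulli draws sharing the single probability in (28), and it does not simulate the MAR mechanism for x_i₁, which is not specified here. The parameter values in the usage lines are full-data posterior means reported in Table 6 and serve only as an illustration; the function names are hypothetical.

```python
import numpy as np


def loading_matrix(lam21, lam31, lam52, lam62, lam83, lam93):
    """9 x 3 matrix Lambda of (26); columns correspond to omega = (eta, xi1, xi2)."""
    lam = np.zeros((9, 3))                  # the fixed 0.0* entries
    lam[0:3, 0] = [1.0, lam21, lam31]       # y1-y3 load on eta
    lam[3:6, 1] = [1.0, lam52, lam62]       # y4-y6 load on xi1
    lam[6:9, 2] = [1.0, lam83, lam93]       # y7-y9 load on xi2
    return lam


def simulate_subject(rng, mu, lam, psi, Phi, b, gamma, psi_delta, x, varphi):
    """Draw (y_i with NaN for missing entries, r_yi) from (25), (27) and (28)."""
    xi = rng.multivariate_normal(np.zeros(2), Phi)               # (xi_i1, xi_i2) ~ N(0, Phi)
    eta = (b[0] * x[0] + b[1] * x[1] + gamma[0] * xi[0] + gamma[1] * xi[1]
           + gamma[2] * xi[0] * xi[1] + rng.normal(0.0, np.sqrt(psi_delta)))  # (27)
    omega = np.array([eta, xi[0], xi[1]])
    y = mu + lam @ omega + rng.normal(0.0, np.sqrt(psi))        # (25) with diagonal Psi
    logit = varphi[0] + varphi[1:] @ y                           # (28)
    p_miss = 1.0 / (1.0 + np.exp(-logit))
    r = rng.random(9) < p_miss                                   # assumed independent over j
    return np.where(r, np.nan, y), r.astype(int)


rng = np.random.default_rng(1998)
lam = loading_matrix(0.340, 0.579, 1.957, 0.855, 0.691, 1.176)
mu = np.array([0.213, 0.164, 0.258, -0.042, -0.041, -0.047, 0.159, 0.120, 0.229])
psi = np.array([0.348, 0.802, 0.398, 0.423, 0.399, 0.336, 0.430, 0.770, 0.474])
Phi = np.array([[0.079, -0.014], [-0.014, 0.099]])
varphi = np.array([-2.976, 0.037, -0.263, 0.242, -0.218, -0.413, -0.170, -0.383, -0.071, 0.171])
x = np.array([rng.normal(0.0, np.sqrt(1.314)), 3.0])  # x_i1 ~ N(0, psi_x); x_i2 = 3 is arbitrary
y_obs, r = simulate_subject(rng, mu, lam, psi, Phi, b=(-0.047, 0.059),
                            gamma=(-0.233, 0.290, -0.123), psi_delta=0.230, x=x, varphi=varphi)
```

Because the missingness probability depends on y_i itself, the missing entries are NMAR in exactly the sense of (28): subjects with different response values have different chances of having unobserved items.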

Due to the computational complexity of calculating the exact values of the three Bayesian influence measures, we calculated only the approximations AP₁({i}) and AP₂({i}) (Figure 3). Both measures identify subjects 14, 25, 28, 137, 168, 175, 274, 408, 492, and 985 as influential cases (Figures 3 and 4), among which cases 14 and 985 stand out as the most influential. Inspecting Table 5 reveals that case 14 has the largest number of vaginal sex acts (y₄) in the last 7 days and case 985 has the largest number of blow jobs (y₅) in the last 7 days (Figure 4 (a) and (c)). Cases 25, 28, 137, and 408 have the largest numbers of hand jobs (y₆) given in the last 7 days, cases 168, 175, and 274 have relatively large numbers of blow jobs compared with other subjects, and case 492 has the second largest number of vaginal sex acts in the last 7 days (Table 5).

Figure 3. AIDS data: index plots of diagnostic measures (a) AP₁(i) and (b) AP₂(i).

Figure 4. AIDS data: scatter plots of (a) (y_i₄, y_i₅), (b) (y_i₄, y_i₆), (c) (y_i₅, y_i₆), and (d) (y_i₄, y_i₅, y_i₆).

Table 5.

The observed responses and covariates of the ten influential observations from the AIDS data set. Variables y₄, y₅, y₆ and x₁ are centered. "No" denotes the subject number, and missing values are represented by ·.

No	y₁	y₂	y₃	y₄	y₅	y₆	y₇	y₈	y₉	x₁	x₂
14	3	3	5	-0.03	0.98	20.88	3	4	3	4.12	1
25	4	4	3	3.86	9.29	10.30	5	1	4	-0.14	3
28	1	2	2	0.18	9.78	-0.28	3	1	3	-0.25	2
137	2	3	4	-0.46	9.29	-0.28	5	2	2	0.93	2
168	3	3	2	1.69	2.94	6.77	3	2	3	-0.33	4
175	4	4	·	1.69	4.40	6.77	5	1	·	0.93	4
274	5	3	5	3.86	-0.48	6.77	5	1	5	-0.37	3
408	2	5	5	1.69	6.85	0.42	5	·	5	1.84	5
492	2	5	2	8.19	0.98	-0.28	2	5	5	0.93	4
985	3	1	3	20.97	-0.48	-0.28	5	5	3	-0.45	1


Furthermore, following the classical linear model, we calculated the leverage $h_{i} = x_{i}^{T} {(\sum_{k} x_{k} x_{k}^{T})}^{- 1} x_{i}$ for every subject with fully observed x_i, where the sum over k runs over all subjects with fully observed covariates. Among the 10 influential observations, only subject 14 is a high-leverage point, with h₁₄ = 0.0155. Inspecting all leverage values reveals that the 93rd subject, who has an extremely high leverage value and the longest period as a prostitute among all subjects, is nevertheless not influential.
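
The leverage formula above is the diagonal of the classical hat matrix restricted to subjects with complete covariates. A minimal sketch, assuming x_i = (x_i₁, x_i₂)^T with no intercept column (exactly as in the displayed formula) and missing covariates stored as NaN; the helper name is hypothetical.

```python
import numpy as np


def leverages(X):
    """h_i = x_i' (sum_k x_k x_k')^{-1} x_i over rows of X with no missing entries.

    Rows containing NaN (missing covariates) get NaN; the sum over k runs only
    over fully observed rows.
    """
    X = np.asarray(X, dtype=float)
    full = ~np.isnan(X).any(axis=1)
    Xf = X[full]
    G = Xf.T @ Xf                                              # sum_k x_k x_k'
    h_full = np.sum(Xf * np.linalg.solve(G, Xf.T).T, axis=1)  # avoids forming G^{-1}
    h = np.full(X.shape[0], np.nan)
    h[full] = h_full
    return h
```

As a sanity check, the leverages of the fully observed rows sum to the number of covariates (here 2), the trace of the hat matrix.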

We further computed the posterior means and standard deviations of all the parameters with and without these 10 influential observations. Deleting these observations noticeably changed some of the parameters associated with y₄, y₅, and y₆ (Table 6). For instance, after deleting the influential observations, the posterior mean of λ₆₂ changed from 0.855 to 1.296 and that of φ₅ changed from -0.413 to -0.297. This indicates that it is important to identify influential observations and to assess their effects on statistical inference in complex statistical models with missing data.
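
One way to gauge the size of these shifts is to express each change in posterior mean in units of the full-data posterior standard deviation. This yardstick is an illustrative choice, not a criterion used in the paper; the means and standard deviations are the Table 6 entries for λ₆₂ and φ₅.

```python
def standardized_shift(full_mean, full_sd, reduced_mean):
    """Change in posterior mean after deletion, in units of the full-data posterior SD."""
    return (reduced_mean - full_mean) / full_sd


# Table 6 values: full-data mean, full-data SD, mean without the 10 influential cases
for name, full_mean, full_sd, reduced_mean in [("lambda_62", 0.855, 0.126, 1.296),
                                               ("varphi_5", -0.413, 0.092, -0.297)]:
    print(name, round(standardized_shift(full_mean, full_sd, reduced_mean), 2))
```

This gives about 3.5 posterior standard deviations for λ₆₂ and about 1.26 for φ₅, so the change in λ₆₂ is the more pronounced of the two.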

Table 6.

AIDS data: posterior means and standard deviations of the parameters with and without 10 influential observations. ES denotes the posterior mean and SD denotes the posterior standard deviation. Parameters with relatively large changes are highlighted.

Results with 10 influential observations

Parameter	μ₁	μ₂	μ₃	μ₄*	μ₅	μ₆	μ₇	μ₈	μ₉
ES	0.213	0.164	0.258	-0.042	-0.041	-0.047	0.159	0.120	0.229
SD	0.025	0.030	0.024	0.021	0.025	0.019	0.023	0.028
Parameter	λ₂₁	λ₃₁	λ₅₂	λ₆₂*	λ₈₃	λ₉₃	ψ_δ	ψ_x
ES	0.340	0.579	1.957	0.855	0.691	1.176	0.230	1.314
SD	0.093	0.101	0.305	0.126	0.160	0.272	0.035	0.861
Parameter	ψ₁	ψ₂	ψ₃	ψ₄*	ψ₅*	ψ₆*	ψ₇	ψ₈	ψ₉
ES	0.348	0.802	0.398	0.423	0.399	0.336	0.430	0.770	0.474
SD	0.038	0.039	0.025	0.022	0.055	0.019	0.030	0.036	0.039
Parameter	b₁	b₂	γ₁	γ₂	γ₃	ϕ₁₁	ϕ₁₂	ϕ₂₂	φ₀
ES	-0.047	0.059	-0.233	0.290	-0.123	0.079	-0.014	0.099	-2.976
SD	0.024	0.022	0.098	0.133	0.217	0.015	0.006	0.025	0.073
Parameter	φ₁	φ₂	φ₃	φ₄	φ₅*	φ₆	φ₇	φ₈	φ₉
ES	0.037	-0.263	0.242	-0.218	-0.413	-0.170	-0.383	-0.071	0.171
SD	0.083	0.077	0.124	0.103	0.092	0.097	0.077	0.075	0.130

Results without 10 influential observations

Parameter	μ₁	μ₂	μ₃	μ₄*	μ₅	μ₆	μ₇	μ₈	μ₉
ES	0.209	0.171	0.259	-0.003	0.003	0.002	0.158	0.116	0.234
SD	0.026	0.030	0.023	0.029	0.031	0.030	0.023	0.029
Parameter	λ₂₁	λ₃₁	λ₅₂	λ₆₂*	λ₈₃	λ₉₃	ψ_δ	ψ_x
ES	0.315	0.584	2.126	1.296	0.686	1.198	0.227	1.326
SD	0.102	0.082	0.286	0.145	0.187	0.246	0.030	0.889
Parameter	ψ₁	ψ₂	ψ₃	ψ₄*	ψ₅*	ψ₆*	ψ₇	ψ₈	ψ₉
ES	0.348	0.798	0.399	0.890	0.551	0.825	0.437	0.774	0.472
SD	0.031	0.039	0.024	0.041	0.074	0.049	0.029	0.037	0.046
Parameter	b₁	b₂	γ₁	γ₂	γ₃	ϕ₁₁	ϕ₁₂	ϕ₂₂	φ₀
ES	-0.039	0.057	-0.283	0.333	-0.041	0.100	-0.016	0.096	-2.961
SD	0.023	0.112	0.104	0.200	0.015	0.009	0.022	0.071
Parameter	φ₁	φ₂	φ₃	φ₄	φ₅*	φ₆	φ₇	φ₈	φ₉
ES	0.042	-0.245	0.288	-0.251	-0.297	-0.097	-0.372	-0.091	0.205
SD	0.085	0.074	0.120	0.103	0.077	0.071	0.076	0.073	0.139


4 Discussion

We have derived two first-order approximations to three Bayesian case influence measures under the deletion of a small (or large) number of observations. We have shown that the first-order approximation measures are useful tools for detecting influential observations in the presence of missing data for a large class of statistical models.

Supplementary Material

Appendix: NIHMS265375-supplement-appendix.pdf (193.3 KB, PDF)

Table 2.

Cluster	AP₁ (AB)			AP₂ (AB)			AP₁ (SE)			AP₂ (SE)
	D_ϕ	CP	CM	D_ϕ	CP	CM	D_ϕ	CP	CM	D_ϕ	CP	CM
1	-0.023	0.012	-0.002	-0.028	0.006	-0.008	0.027	0.026	0.026	0.041	0.031	0.038
5	-0.022	0.014	0.003	-0.027	0.009	-0.002	0.025	0.028	0.023	0.035	0.030	0.028
10	-0.018	0.014	0.004	-0.022	0.010	-0.000	0.025	0.023	0.022	0.036	0.027	0.029
15	-0.022	0.013	0.004	-0.026	0.008	-0.001	0.032	0.027	0.021	0.041	0.028	0.025
20	-0.021	0.016	0.007	-0.026	0.011	0.002	0.026	0.029	0.033	0.041	0.033	0.044
25	-0.016	0.011	0.003	-0.019	0.009	0.001	0.016	0.024	0.017	0.020	0.025	0.018
30	-0.022	0.010	0.003	-0.027	0.005	-0.002	0.030	0.029	0.028	0.043	0.030	0.036
35	-0.022	0.012	-0.001	-0.028	0.005	-0.008	0.027	0.023	0.035	0.043	0.029	0.048
40	-0.022	0.011	0.005	-0.027	0.006	0.000	0.024	0.028	0.030	0.033	0.032	0.035
45	-0.023	0.014	0.002	-0.029	0.008	-0.004	0.025	0.026	0.023	0.039	0.029	0.032
49*	-0.051	0.005	-0.017	-0.066	-0.009	-0.032	0.016	0.041	0.037	0.022	0.043	0.039
50*	-0.141	-0.001	-0.137	-0.238	-0.098	-0.235	0.070	0.083	0.101	0.124	0.122	0.154


Footnotes

Supplemental materials for the article are available online.

C++ code: The supplemental files for this article include C++ programs that can be used to replicate the simulation study and the real data analysis in the article. Please read the README file contained in the zip archive for more details (ZhuJoeTang.zip, zip archive).

Appendix: The supplemental files include an Appendix that describes all variables in the AIDS dataset and gives the proofs of Theorems 1 and 2 (cho2-jcgsApp.pdf).

Contributor Information

Hongtu Zhu, Department of Biostatistics, University of North Carolina at Chapel Hill.

Joseph G. Ibrahim, Department of Biostatistics, University of North Carolina at Chapel Hill

Hyunsoon Cho, Department of Biostatistics, University of North Carolina at Chapel Hill.

Niansheng Tang, Department of Statistics, Yunnan University.

References

Abraham B, Box GEP. Bayesian analysis of some outlier problems in time series. Biometrika. 1979;66:229–236.
Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association. 1993;88:669–679.
Blyth S. Local divergence and association. Biometrika. 1994;81:579–584.
Box GEP, Tiao GC. A Bayesian approach to some outlier problems. Biometrika. 1968;55:119–129.
Bradlow ET, Zaslavsky AM. Case influence analysis in Bayesian inference. Journal of Computational and Graphical Statistics. 1997;6:314–331.
Carlin BP, Polson NG. An expected utility approach to influence diagnostics. Journal of the American Statistical Association. 1991;86:1013–1021.
Chaloner K. Bayesian residual analysis in the presence of censoring. Biometrika. 1991;78:637–644.
Christensen R. Log-linear Models and Logistic Regression. Springer-Verlag; 1997.
Christensen R, Pearson LM, Johnson W. Case-deletion diagnostics for mixed models. Technometrics. 1992;34:38–45.
Cook RD. Detection of influential observation in linear regression. Technometrics. 1977;19:15–18.
Cook RD, Weisberg S. Residuals and Influence in Regression. Chapman & Hall; 1982.
Csiszár I. Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica. 1967;2:299–318.
Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. London: Chapman and Hall; 2008.
Epifani I, MacEachern SN, Peruggia M. Case-deletion importance sampling estimators: central limit theorems and related results. Electronic Journal of Statistics. 2008;2:774–806.
Geisser S. The predictive sample reuse method with applications. Journal of the American Statistical Association. 1975;70:320–328.
Geisser S. Predictive Inference: An Introduction. London: Chapman & Hall; 1993.
Gelfand AE, Dey DK. Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society Series B. 1994;56:501–514.
Gelfand AE, Dey DK, Chang H. Model determination using predictive distributions, with implementation via sampling-based methods (with discussion, pp. 160–167). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics 4. Oxford: Oxford University Press; 1992. pp. 147–159.
Ibrahim JG, Chen MH, Lipsitz SR, Herring A. Missing-data methods for generalized linear models: a comparative review. Journal of the American Statistical Association. 2005;100:332–346.
Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test. 2009;18:1–43. doi: 10.1007/s11749-009-0138-x.
Ibrahim JG, Zhu HT, Tang NS. Bayesian local influence for survival models. Lifetime Data Analysis. 2010. doi: 10.1007/s10985-010-9170-0.
Johnson W. Influence measures for logistic regression: another point of view. Biometrika. 1985;72:59–65.
Johnson W, Geisser S. A predictive view of the detection and characterization of influential observations in regression analysis. Journal of the American Statistical Association. 1983;78:137–144.
Kass RE, Kadane JB, Tierney L. Asymptotic evaluation of integrals arising in Bayesian inference. In: Page C, LePage R, editors. Computing Science and Statistics: Proceedings of the Symposium on the Interface. Springer-Verlag; 1990. pp. 38–42.
Kass RE, Tierney L, Kadane JB. Approximate methods for assessing influence and sensitivity in Bayesian analysis. Biometrika. 1989;76:663–674.
Lee SY. Structural Equation Modelling: A Bayesian Approach. Wiley; 2007.
Lee SY, Tang NS. Analysis of nonlinear structural equation models with nonignorable missing covariates and ordered categorical data. Statistica Sinica. 2006;16:1117–1141.
Little RJA. Regression with missing x's: a review. Journal of the American Statistical Association. 1992;87:1227–1237.
Little RJA, Rubin DB. Statistical Analysis With Missing Data. New York: Wiley; 2002.
Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society Series B. 1982;44:226–233.
McCulloch RE. Local model influence. Journal of the American Statistical Association. 1989;84:473–478.
Molenberghs G, Kenward MG. Missing Data in Clinical Studies. New York: Wiley; 2007.
Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. New York: Springer; 2005.
Morisky DE, Tiglao TV, Sneed CD, Tempongko SB, Baltazar JC, Detels R, Stein JA. The effects of establishment practices, knowledge and attitudes on condom use among Filipina sex workers. AIDS Care. 1998;10:213–320. doi: 10.1080/09540129850124460.
Peng F, Dey DK. Bayesian analysis of outlier problems using divergence measures. The Canadian Journal of Statistics / La Revue Canadienne de Statistique. 1995;23:199–213.
Peruggia M. On the variability of case-deletion importance sampling weights in the Bayesian linear model. Journal of the American Statistical Association. 1997;92:199–207.
Pettit LI. Diagnostics in Bayesian model choice. The Statistician: Journal of the Institute of Statisticians. 1986;35:183–190.
Skrondal A, Rabe-Hesketh S. Generalized Latent Variable Modeling. London: Chapman and Hall; 2004.
Tierney L, Kass RE, Kadane JB. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association. 1989;84:710–716.
Wang HM, Jones MP, Burns TL. Regression diagnostics for the class A regressive model with quantitative phenotypes. Genetic Epidemiology. 1999;17:174–187. doi: 10.1002/(SICI)1098-2272(1999)17:3<174::AID-GEPI3>3.0.CO;2-G.
Wei B. Exponential Family Nonlinear Models. Singapore: Springer-Verlag; 1998.
Weiss R. An approach to Bayesian sensitivity analysis. Journal of the Royal Statistical Society Series B. 1996;58:739–750.
Weiss RE, Cook RD. A graphical case statistic for assessing posterior influence. Biometrika. 1992;79:51–55.
Zhu H, Lee SY, Wei BC, Zhou J. Case deletion measures for models with incomplete data. Biometrika. 2001;88:727–737.
Zhu HT, Ibrahim JG, Cho H, Tang NS. A general review of Bayesian influence analysis. In: Chen MH, Dey DK, Müller P, Sun D, Ye K, editors. Frontiers of Statistical Decision Making and Bayesian Analysis. New York: Springer-Verlag; 2010. pp. 219–237.
