Author manuscript; available in PMC: 2013 Feb 6.
Published in final edited form as: J Comput Graph Stat. 2010 Aug 1;21(1):253–271. doi: 10.1198/jcgs.2011.10139

Bayesian Case Influence Measures for Statistical Models with Missing Data

Hongtu Zhu 1, Joseph G Ibrahim 2, Hyunsoon Cho 3,*, Niansheng Tang 4
PMCID: PMC3565846  NIHMSID: NIHMS265375  PMID: 23399928

Abstract

We examine three Bayesian case influence measures including the φ-divergence, Cook's posterior mode distance and Cook's posterior mean distance for identifying a set of influential observations for a variety of statistical models with missing data including models for longitudinal data and latent variable models in the absence/presence of missing data. Since it can be computationally prohibitive to compute these Bayesian case influence measures in models with missing data, we derive simple first-order approximations to the three Bayesian case influence measures by using the Laplace approximation formula and examine the applications of these approximations to the identification of influential sets. All of the computations for the first-order approximations can be easily done using Markov chain Monte Carlo samples from the posterior distribution based on the full data. Simulated data and an AIDS dataset are analyzed to illustrate the methodology.

Keywords: Case influence measures, Cook distance, First-order approximation, ϕ-divergence, Markov chain Monte Carlo

1 Introduction

One of the main goals of any statistical analysis is to assess model assumptions and model fit. Towards this goal, it is critical to assess the influence of individual cases (or generally, a set of observations) on an analysis and to identify influential observations (or sets of observations) and/or outliers (Cook, 1977; Cook and Weisberg, 1982; McCulloch, 1989; Geisser, 1975, 1993). In Bayesian analysis, considerable research has been devoted to developing single case influence measures for various specific statistical models including generalized linear models, time series models, and survival models (Johnson and Geisser, 1983; Johnson, 1985; Pettit, 1986; Kass et al., 1989; Carlin and Polson, 1991; Gelfand et al., 1992; Geisser, 1993; Blyth, 1994; Peng and Dey, 1995; Christensen, 1997; Bradlow and Zaslavsky, 1997). The influence of an individual observation or set of observations is often assessed by deleting the observation (or set of observations) and then comparing the posterior (or predictive) distribution based on the full data with the posterior (or predictive) distribution based on the data with the observation (or observations) deleted.

There are four major types of Bayesian case influence measures. These are posterior probabilities of outlying sets, posterior outlier statistics, predictive diagnostics, and posterior diagnostics for identifying outliers and influential points (Zhu et al., 2010). Computing posterior probabilities of outlying sets is conceptually simple (Box and Tiao, 1968), but it is difficult to implement computationally in most models with missing data. So far, this approach is limited to several simple regression models, such as the classical linear regression model with conjugate priors (Abraham and Box, 1979). The posterior outlier statistic is based on using the posterior distribution of an outlier statistic, such as the raw residual, to define outliers and calculate the posterior probability that an observation is an outlier. This method is computationally simple and has been further extended to generalized linear models, survival models, latent variable models, state space models, and many others (Chaloner, 1991; Albert and Chib, 1993; Lee, 2007). Predictive diagnostics assess the discordance of a set of observations based on their predictive distribution (Gelfand et al., 1992; Geisser, 1993; Gelfand and Dey, 1994). Predictive diagnostics are also conceptually simple, but computing them can be difficult for regression models with missing data. In contrast to predictive diagnostics, posterior diagnostics compare the posterior distributions of the parameters given the complete data and the reduced data (Csiszár, 1967; Weiss and Cook, 1992).

Despite the extensive literature on Bayesian diagnostics for various types of models, very little has been done on systematically examining Bayesian case influence measures using case deletion, namely the ϕ-divergence, Cook's posterior mode distance and Cook's posterior mean distance, in statistical models for both dependent and independent data in the presence of missing data. We refer the reader to the reviews of Bayesian diagnostics in Zhu et al. (2010) and Peng and Dey (1995). Computationally, as shown later, it can be very difficult to directly compute such Bayesian case influence measures for many complex models with missing data (Molenberghs and Kenward, 2007; Molenberghs and Verbeke, 2005; Lee, 2007; Ibrahim et al., 2005; Daniels and Hogan, 2008; Little, 1992; Little and Rubin, 2002; Ibrahim and Molenberghs, 2009; Skrondal and Rabe-Hesketh, 2004). For example, in the real data analysis presented in Section 3.2, we present a Bayesian diagnostic analysis for a complex Bayesian structural equations model with nonignorable missing data, for which it is infeasible to compute the exact values of these Bayesian case influence measures. This setting thus motivates the derivation of computationally feasible approximations for these Bayesian case influence measures.

The aims of this paper are to systematically examine the above-mentioned Bayesian case influence measures based on case deletion, to derive their first-order approximations, and to evaluate their roles in detecting a set of influential observations for a variety of regression models with missing data. By using a Laplace approximation (Kass et al., 1990; Tierney et al., 1989), we show that under some mild conditions, the first-order approximations hold for a large class of statistical models for both dependent and independent data within the Bayesian framework. We extensively examine the accuracy of these first-order approximations for the three Bayesian case influence measures using both theoretical results and simulation studies. Specifically, we show that the first-order approximations are quite accurate and all of the computations for the first-order approximations can be easily numerically computed using Markov chain Monte Carlo (MCMC) samples from the posterior distribution based on the full data.

The rest of this paper is organized as follows. In Section 2, we introduce the three Bayesian case influence measures and propose computational formulas for them. We derive two first-order approximations to the Bayesian case influence measures. In Section 3, we illustrate the proposed methodology for latent variable models with missing data. We conclude the paper with some discussions in Section 4.

2 Methods

2.1 Bayesian Case Influence Measures

Let p(Y | θ) be the probability function for a random vector Y^T = (Y_1^T, …, Y_n^T), parameterized by an unknown parameter vector θ = (θ1, …, θp)T in an open subset Θ of R^p. Moreover, the dimension of Yi = (yi1, …, yimi)T, denoted by mi, can vary across i. For example, in longitudinal studies and mixed models, mi is the number of observations in each cluster and this may vary significantly across the clusters. Let p(θ) be the prior distribution of θ. The posterior distribution for the full data Y is given by p(θ | Y) ∝ p(Y | θ)p(θ).

We are interested in assessing the influence of deleting a set of observations, denoted by S, on posterior inferences regarding θ. Let N = ∑_{i=1}^{n} mi and NS be, respectively, the total number of observations and the number of observations in the set S. A subscript ‘[S]’ denotes the relevant quantity with all observations in S deleted. For instance, if S = {i}, then Y[S] is the corresponding observed data with all of Yi deleted, whereas if S = {i1, i2}, then Y[S] is the corresponding observed data with Yi1 and Yi2 deleted. Furthermore, we may set S = {i1, …, ik} and S = {(i1, j1), …, (ik, jk)} to allow for more complicated case deletion schemes. Let YS denote a subsample of Y consisting of all the observations in S and let Y[S] denote a subsample of Y with all observations in YS deleted. The posterior distribution for a subsample of the data Y is given by p(θ | Y[S]) ∝ p(Y[S] | θ)p(θ).

Now, we examine three types of Bayesian case influence measures based on case deletion. The first type is the ϕ–influence of Y[S], defined by

$$D_\phi(S) = \int \phi\big(R_{[S]}(\theta)\big)\, p(\theta \mid Y)\, d\theta, \tag{1}$$

where R[S](θ) = p(θ | Y[S])/p(θ | Y) and ϕ(·) is a convex function with ϕ(1) = 0 (Weiss and Cook, 1992; Weiss, 1996). Dϕ(S) directly measures the distance (discrepancy) between the two posterior distributions p(θ | Y[S]) and p(θ | Y) (Csiszár, 1967; Weiss and Cook, 1992), and a large value of Dϕ(S) corresponds to a set of influential observations. Various forms of ϕ(·) have been widely considered in the literature (Kass et al., 1989; Weiss and Cook, 1992; Blyth, 1994; Peng and Dey, 1995; Weiss, 1996). For instance, ϕ(·) can be chosen to be ϕα(u), which is defined by 4{1 − u^{(1+α)/2}}/(1 − α^2) for α ≠ ±1, u log(u) for α = 1, and − log(u) for α = −1. In particular, ϕ1(·) and ϕ−1(·) lead to the Kullback-Leibler divergence (K-L divergence); moreover, ϕ(u) = ϕ1(u) + ϕ−1(u) leads to the symmetric K-L divergence. The L1–distance and the χ^2–divergence correspond to ϕ(u) = 0.5|u − 1| and ϕ(u) = (u − 1)^2, respectively (Weiss, 1996).
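As a small illustration of the ϕ functions listed above, the following Python sketch encodes the ϕα family, the symmetric K-L choice, the L1 choice and the χ^2 choice. The function names (`phi_alpha`, `second_derivative_at_one`, and so on) are hypothetical and not from the paper; the finite-difference helper is only a numerical convenience for inspecting the curvature of ϕ at u = 1.

```python
import math


def phi_alpha(u: float, alpha: float) -> float:
    """phi_alpha(u) as defined in Section 2.1; u must be positive."""
    if alpha == 1:
        return u * math.log(u)
    if alpha == -1:
        return -math.log(u)
    return 4.0 * (1.0 - u ** ((1.0 + alpha) / 2.0)) / (1.0 - alpha ** 2)


def phi_symmetric_kl(u: float) -> float:
    """Symmetric K-L choice: phi_1(u) + phi_{-1}(u)."""
    return phi_alpha(u, 1) + phi_alpha(u, -1)


def phi_l1(u: float) -> float:
    """L1-distance choice: 0.5 * |u - 1|."""
    return 0.5 * abs(u - 1.0)


def phi_chi2(u: float) -> float:
    """Chi-square divergence choice: (u - 1)^2."""
    return (u - 1.0) ** 2


def second_derivative_at_one(phi, h: float = 1e-4) -> float:
    """Central finite-difference estimate of the second derivative of phi at u = 1."""
    return (phi(1.0 + h) - 2.0 * phi(1.0) + phi(1.0 - h)) / h ** 2
```

Each of these functions returns 0 at u = 1, matching the requirement ϕ(1) = 0.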

The second Bayesian influence measure assesses the discrepancy between the posterior modes of θ with and without the observations in S (Cook and Weisberg, 1982). We call this measure Cook's posterior mode distance. Specifically, we define the posterior modes of θ for the full sample Y and a subsample Y[S] as θ̂ = argmaxθ log p(θ | Y) and θ̂[S] = argmaxθ log p(θ | Y[S]), respectively. Then, Cook's posterior mode distance for comparing Y and Y[S], denoted by CP(S), can be defined as follows:

$$CP(S) = (\hat\theta_{[S]} - \hat\theta)^T G_\theta\, (\hat\theta_{[S]} - \hat\theta), \tag{2}$$

where Gθ is chosen to be −∂²θ log p(θ | Y) = −∂²θ log p(Y | θ) − ∂²θ log p(θ) evaluated at θ̂, where ∂²θ represents the second-order derivative with respect to θ. If we consider a uniform improper prior for θ, then CP(S) reduces to the well-known Cook's distance for deleting a set of observations (Cook and Weisberg, 1982). A large value of CP(S) implies more influence of the set S on the posterior mode.

The third type of Bayesian influence measure assesses the distance between the posterior means of θ with and without the observations in S. We define the posterior means of θ for the full sample Y and a subsample Y[S] as θ̃ = ∫ θ · p(θ | Y)dθ and θ̃[S] = ∫ θ · p(θ | Y[S])dθ, respectively. Cook's posterior mean distance for deleting the observations in the set S, denoted by CM(S), can then be defined as follows:

$$CM(S) = (\tilde\theta_{[S]} - \tilde\theta)^T W_\theta\, (\tilde\theta_{[S]} - \tilde\theta), \tag{3}$$

where Wθ is chosen to be the inverse of the full-data posterior covariance matrix of θ. A large value of CM(S) corresponds to an influential set S regarding the posterior mean.

Although all three Bayesian case influence measures assess the influence of a set of observations, they differ conceptually. Dϕ(S) quantifies the effect of deleting a set of observations on the overall posterior distribution, whereas CP(S) and CM(S) quantify the effects on the posterior mode and the posterior mean of θ, respectively. Since Dϕ(S) measures the overall difference between p(θ | Y) and p(θ | Y[S]), which may include differences in shape, mode, mean, and so on, Dϕ(S) can be more sensitive than CP(S) and CM(S) to changes in the posterior distribution other than shifts in the posterior mean or posterior mode. Conversely, CP(S) and CM(S) may be more sensitive than Dϕ(S) to a change in the posterior mode or posterior mean.

2.2 Computational Formula and its Difficulties

When p(Y | θ) is relatively easy to compute, all three Bayesian case influence measures can be computed using only MCMC samples from the full posterior distribution, p(θ | Y). We define pS(θ), the ratio of the likelihoods with and without the observations in S, as

$$p_S(\theta) = \frac{p(Y \mid \theta)}{p(Y_{[S]} \mid \theta)} = p(Y_S \mid Y_{[S]}, \theta), \tag{4}$$

which is the conditional distribution of YS given Y[S]. Then, we have

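$$p(\theta \mid Y_{[S]}) = \frac{[p_S(\theta)]^{-1}\, p(Y \mid \theta)\, p(\theta)}{\int [p_S(\theta)]^{-1}\, p(Y \mid \theta)\, p(\theta)\, d\theta}.$$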

Thus, following Weiss (1996), the computational formula for Dϕ(S) can be obtained as

$$D_\phi(S) = E_{\theta \mid Y}\left[\phi\left(\frac{[p_S(\theta)]^{-1}}{E_{\theta \mid Y}\{[p_S(\theta)]^{-1}\}}\right)\right], \tag{5}$$

where Eθ|Y denotes the expectation taken with respect to the posterior distribution p(θ | Y). Specifically, for the K-L divergence (ϕ(u) = − log(u)), the computational formula is given by Dϕ(S) = log Eθ|Y{[pS(θ)]^{−1}} + Eθ|Y{log[pS(θ)]}. It is well recognized that the accuracy of approximating Dϕ(S) depends heavily on the variability of pS(θ) (Epifani et al., 2008; Peruggia, 1997).
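The K-L special case of formula (5) can be sketched in a few lines of Python. This is a minimal illustration, assuming that log pS(θ) has already been evaluated at each full-posterior MCMC draw; the function name `kl_influence` and its input layout are hypothetical. A log-sum-exp shift is used for numerical stability, which is an implementation choice rather than something prescribed in the text.

```python
import numpy as np


def kl_influence(log_p_s) -> float:
    """Monte Carlo estimate of D_phi(S) for phi(u) = -log(u), using formula (5).

    log_p_s[j] is log p_S(theta^(j)), where theta^(j) are MCMC draws from the
    full-data posterior p(theta | Y).
    """
    log_p_s = np.asarray(log_p_s, dtype=float)
    neg = -log_p_s
    shift = neg.max()
    # log of the posterior mean of 1 / p_S(theta), computed stably
    log_mean_inv = shift + np.log(np.mean(np.exp(neg - shift)))
    # D = log E{1 / p_S} + E{log p_S}
    return float(log_mean_inv + log_p_s.mean())
```

By Jensen's inequality the estimate is nonnegative, and it is zero when pS(θ) is constant over the draws. Its Monte Carlo error grows with the variability of pS(θ), which is the concern raised above.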

To compute CP(S), we need to evaluate θ̂ and θ̂[S]. In general, the posterior mode of θ does not have a closed analytic form, so we have to rely on iterative methods such as Newton-Raphson to obtain θ̂ and θ̂[S]. However, this can be computationally intensive for most models, such as state space models. Gθ in CP(S) can be analytically obtained by evaluating JN(θ) = −∂²θ log p(θ | Y) = −∂²θ log p(Y | θ) − ∂²θ log p(θ) at θ̂.

Since we can write θ̃ = Eθ|Y(θ) and

$$\tilde\theta_{[S]} = \frac{E_{\theta \mid Y}\{\theta\, [p_S(\theta)]^{-1}\}}{E_{\theta \mid Y}\{[p_S(\theta)]^{-1}\}}, \tag{6}$$

we can easily compute CM(S) using MCMC samples from the full posterior distribution, p(θ | Y). Specifically, the posterior mean of θ, denoted θ̃, can be obtained directly by averaging the MCMC samples and Wθ can be analytically obtained by evaluating JN(θ) at θ̃. Furthermore, Gθ can be approximated by the inverse of the posterior covariance matrix, obtained from the MCMC samples. Based on the above discussion, computing (5) and (6) strongly depends on the computation of pS(θ) = p(YS | Y[S], θ).
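Formula (6) amounts to a self-normalized importance-weighted average of the full-posterior draws, with weights proportional to 1/pS(θ). The sketch below illustrates this and then forms CM(S) from (3), taking Wθ as the inverse of the sample covariance of the draws (the choice stated after (3)). The function names and the (J, p) array layout are assumptions made for illustration only.

```python
import numpy as np


def deleted_posterior_mean(theta_draws, log_p_s):
    """Estimate the case-deleted posterior mean via formula (6).

    theta_draws: (J, p) array of MCMC draws from p(theta | Y).
    log_p_s:     length-J array with log p_S(theta^(j)).
    """
    theta_draws = np.asarray(theta_draws, dtype=float)
    log_w = -np.asarray(log_p_s, dtype=float)
    w = np.exp(log_w - log_w.max())  # weights proportional to 1 / p_S(theta)
    w /= w.sum()
    return w @ theta_draws


def cook_posterior_mean_distance(theta_draws, log_p_s) -> float:
    """CM(S) from (3), with W_theta set to the inverse sample covariance of the draws."""
    theta_draws = np.asarray(theta_draws, dtype=float)
    theta_tilde = theta_draws.mean(axis=0)
    diff = deleted_posterior_mean(theta_draws, log_p_s) - theta_tilde
    cov = np.atleast_2d(np.cov(theta_draws, rowvar=False))
    return float(diff @ np.linalg.solve(cov, diff))
```

As with the estimate of Dϕ(S), the weights can be dominated by a few draws when pS(θ) varies strongly, so the effective sample size of the weights is worth monitoring in practice.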

It can be computationally quite cumbersome to approximate pS(θ) in the presence of missing data, which makes the computation of the three Bayesian case influence measures infeasible. To see this, denote the missing data by Ymis = (Y_{1,mis}^T, …, Y_{n,mis}^T)^T and the complete data by Ycom = (Y_{1,com}^T, …, Y_{n,com}^T)^T, in which Yi,com = (Yi,mis, Yi) for i = 1, …, n. Let p(Ycom | θ) be the probability function for Ycom. We define Ycom,[S] = (Y[S], Ymis) as the complete data after deleting all observations in YS, and p(Ycom,[S] | θ) is the probability function for Ycom,[S] such that ∫ p(Ycom,[S] | θ)dYmis = p(Y[S] | θ). This kind of model structure is very general and subsumes most commonly used models, such as GLMs with missing responses and/or covariates and random-effects models (Ibrahim et al., 2005, 2010; Zhu et al., 2001; Molenberghs and Kenward, 2007; Molenberghs and Verbeke, 2005; Lee, 2007; Daniels and Hogan, 2008; Little, 1992; Little and Rubin, 2002; Ibrahim and Molenberghs, 2009; Skrondal and Rabe-Hesketh, 2004). With missing data, the primary computational challenge lies in the computation of pS(θ), because

$$p_S(\theta) = \frac{\int p(Y, Y_{mis} \mid \theta)\, dY_{mis}}{\int p(Y_{[S]}, Y_{mis} \mid \theta)\, dY_{mis}} \tag{7}$$

typically involves high-dimensional integrals.

Example 1

To illustrate the methodological development, we consider n independent observations {Yi,com = (xi, zi, ri, yi), i = 1, …, n}, where yi is the response variable, xi is a p1 × 1 vector of completely observed covariates, and zi is a p2 × 1 vector of partially observed covariates. Moreover, let zmis,i and zobs,i denote the missing and observed components of zi, respectively. Let ri be a p2 × 1 vector, whose jth component, rij, equals 1 if zij is observed, and 0 if zij is missing. We assume that p(xi, zi, ri, yi | θ) = p(yi | xi, zi, θ)p(xi, zi | θ)p(ri | yi, xi, zi, θ), where θ denotes the vector of unknown parameters.

We assume a generalized linear model for p(yi | xi, zi, β, τ) given by

$$p(y_i \mid x_i, z_i, \beta, \tau) = \exp\{a_i^{-1}(\tau)\,[y_i \eta_i(\beta) - b(\eta_i(\beta))] + c(y_i, \tau)\} \tag{8}$$

for i = 1, …, n, where ai(·), b(·) and c(·, ·) are known functions, ηi = η(μi) and μi = g((xi, zi)β), in which g(·) is a known link function, β = (β1, …, βp)′ and p = p1 + p2. We assume that

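$$p(x_i, z_i \mid \alpha) = p(z_{ip_2} \mid z_{i,p_2-1}, \ldots, z_{i1}, x_i, \alpha) \times \cdots \times p(z_{i1} \mid x_i, \alpha)\, p(x_i \mid \alpha). \tag{9}$$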

Similarly, we model the missing-data mechanism p(ri | yi, xi, zi, ξ) as

$$p(r_i \mid y_i, x_i, z_i, \xi) = p(r_{ip_2} \mid r_{i,p_2-1}, \ldots, r_{i1}, y_i, x_i, z_i, \xi) \times \cdots \times p(r_{i1} \mid y_i, x_i, z_i, \xi). \tag{10}$$

Here, θ = (β, τ, α, ξ). To carry out a Bayesian analysis, we need to specify a prior for θ. Following Huang, Chen and Ibrahim (2005), we specify a prior of θ such that p(θ) = p(τ)p(β)p(ξ)p(α).

Now we consider the deletion of the i–th observation (xi, zi, ri, yi), that is S = {i}. With some calculations, we get

$$p_{\{i\}}(\theta) = \int p(y_i \mid x_i, z_i, \beta, \tau)\, p(x_i, z_i \mid \alpha)\, p(r_i \mid y_i, x_i, z_i, \xi)\, dz_{mis,i}, \tag{11}$$

which may involve intractable integrals when the dimension of zmis,i is relatively large. Although one may be able to use some numerical methods (e.g., the trapezoidal rule) to approximate pS(θ), the accuracy of such approximations can be impossible to assess when the integrals are high dimensional. This setting thus requires the derivation of computationally simple approximations to these Bayesian case influence measures.

2.3 First-order Approximations

For diagnostic purposes, it is desirable to derive computationally feasible approximations to these case influence measures. We obtain the following theorems, whose detailed proofs can be found in the supplementary document.

Theorem 1

If Assumptions C1-C5 in the supplementary document hold and NS is bounded by a fixed constant, then we have the following results:

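  (a) Dϕ(S) can be approximated by

    $$D_\phi(S) = 0.5\, \ddot\phi(1)\, [\partial_\theta \log p_S(\hat\theta)]^T [J_N(\hat\theta)]^{-1} [\partial_\theta \log p_S(\hat\theta)]\, [1 + O_p(N^{-1})], \tag{12}$$

    where ϕ̈(1) = ∂²u ϕ(u) evaluated at u = 1.

  (b) The one-step approximation for θ̂[S] is given by

    $$\hat\theta_{[S]} = \hat\theta + O_p(N^{-1}) = \hat\theta - [J_N(\hat\theta)]^{-1}\, \partial_\theta \log p_S(\hat\theta)\, [1 + O_p(N^{-1})]. \tag{13}$$

  (c) The one-step approximation for θ̃[S] is given by

    $$\tilde\theta_{[S]} = \tilde\theta - [J_N(\hat\theta)]^{-1}\, \partial_\theta \log p_S(\hat\theta)\, [1 + O_p(N^{-1})]. \tag{14}$$

  (d) 2Dϕ(S)/ϕ̈(1), CP(S), and CM(S) are asymptotically equivalent, that is,

    $$D_\phi(S) = 0.5\, \ddot\phi(1) \times CP(S) + O_p(N^{-2}) = 0.5\, \ddot\phi(1) \times CM(S) + O_p(N^{-2}). \tag{15}$$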

Theorem 1 has several important implications. Theorem 1 (a) provides a theoretical and computational approximation of Dϕ(S) as a quadratic form in ∂θ log pS(θ̂). Theorem 1 (b) and (c) provide the one-step approximations of θ̂[S] and θ̃[S], which reduce the burden of computing θ̂[S] and θ̃[S] for each S. Moreover, to the best of our knowledge, Theorem 1 (d) is the first result that establishes a direct connection between Dϕ(S), CP(S) and CM(S) for any ϕ(·) within the Bayesian framework. In particular, for the family ϕα(u), which includes ϕ−1(u) = − log(u), it can be shown that ∂²u ϕα(u) = 1 at u = 1, which leads to Dϕα(S) = 0.5CP(S) + Op(N^{−2}) = 0.5CM(S) + Op(N^{−2}) for all α. Furthermore, for the χ^2–divergence and the symmetric K-L divergence, we have ∂²u ϕ(u) = 2 at u = 1, which gives Dϕ(S) = CP(S) + Op(N^{−2}) = CM(S) + Op(N^{−2}). However, for ϕ(u) = 0.5|u − 1|, because ϕ̈(1) = 0 and ϕ(u) is not differentiable at u = 1, the conditions of Theorem 1 do not hold. Thus, the approximation given in Theorem 1 (a) and the equivalence among the three diagnostic measures do not hold for the L1–distance.
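The curvature values quoted above can be checked directly from the definitions of ϕ in Section 2.1. The following short derivation is added here as a worked check and is not taken from the supplementary document. For α ≠ ±1,

```latex
\begin{aligned}
\phi_\alpha(u) &= \frac{4\{1 - u^{(1+\alpha)/2}\}}{1-\alpha^2}, \\
\dot\phi_\alpha(u) &= -\frac{4}{1-\alpha^2}\cdot\frac{1+\alpha}{2}\,u^{(\alpha-1)/2}
                    = -\frac{2}{1-\alpha}\,u^{(\alpha-1)/2}, \\
\ddot\phi_\alpha(u) &= -\frac{2}{1-\alpha}\cdot\frac{\alpha-1}{2}\,u^{(\alpha-3)/2}
                     = u^{(\alpha-3)/2},
\qquad\text{so } \ddot\phi_\alpha(1) = 1. \\[4pt]
\alpha = 1:&\quad \frac{d^2}{du^2}\,\{u\log u\} = \frac{1}{u}, \qquad
\alpha = -1:\quad \frac{d^2}{du^2}\,\{-\log u\} = \frac{1}{u^2}, \\
\text{symmetric K-L}:&\quad \frac{1}{u} + \frac{1}{u^2}\ \Big|_{u=1} = 2, \qquad
\chi^2:\quad \frac{d^2}{du^2}\,(u-1)^2 = 2 .
\end{aligned}
```

In each case the second derivative is evaluated at u = 1, giving ϕ̈(1) = 1 for every member of the ϕα family and ϕ̈(1) = 2 for the symmetric K-L and χ^2 choices.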

Practically, these approximations lead to computationally efficient formulas for approximating these Bayesian case influence measures. Because the first-order approximations hold for a large class of statistical models for both dependent and independent data, and in the presence of missing data within the Bayesian framework, they are reminiscent of the first-order approximations for various specific models within the frequentist framework (Christensen et al., 1992; Cook and Weisberg, 1982; Wei, 1998). For influential points, even though the accuracy of the first-order approximation may be relatively low, the first-order approximated measure can easily pick out these influential points. Thus, for diagnostic purposes, the first-order approximation may be more effective at identifying influential clusters compared with the three Bayesian case influence measures. We conduct simulation studies to investigate the performance of these first-order approximations relative to the exact formula in Section 3. See numerical comparisons in Table 4.

Table 4.

Frequency of correctly ranking the influential clusters as the top 4 clusters based on 100 replications for four different scenarios.

n     Type   Dϕ     CP     CM     AP1    AP2
10    I      88     84     85     91     91
10    II     99     99     99     99     99
50    I      96     74     94     96     96
50    II     100    91     100    100    100

According to Theorem 1, to approximate these case influence measures, we only need to compute the posterior mean θ̃, the observed-data information matrix JN(θ̃), ∂θ log pS(θ) evaluated at θ̃, and

$$AP_1(S; \theta) = [\partial_\theta \log p_S(\theta)]^T [J_N(\theta)]^{-1} [\partial_\theta \log p_S(\theta)]. \tag{16}$$

In particular, θ̃ and JN(θ̃) can be easily computed from the MCMC samples based on the full data. Specifically, JN(θ̃) can be approximated by using Louis' formula (Louis, 1982). For most statistical models, the computation of ∂θ log pS(θ) = ∂θ log p(Y | θ) − ∂θ log p(Y[S] | θ) is relatively straightforward. Specifically, we have

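$$\partial_\theta \log p(Y \mid \theta) \approx \int \partial_\theta \log p(Y_{com} \mid \theta)\, p(Y_{mis} \mid Y, \theta)\, dY_{mis}, \tag{17}$$

$$\partial_\theta \log p(Y_{[S]} \mid \theta) \approx \int \partial_\theta \log p(Y_{com,[S]} \mid \theta)\, p(Y_{mis} \mid Y, \theta)\, dY_{mis}. \tag{18}$$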

Here, we use the fact that the posterior mean and posterior mode are asymptotically equivalent under suitable regularity conditions which are satisfied for the models considered here.
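The relation in (17) is an instance of a standard identity for models with missing data (it also underlies Louis' formula). The derivation below is a reminder of that identity under the assumption that differentiation and integration can be interchanged; it is not reproduced from the supplementary document.

```latex
\begin{aligned}
\partial_\theta \log p(Y \mid \theta)
  &= \frac{\partial_\theta \int p(Y_{com} \mid \theta)\, dY_{mis}}{p(Y \mid \theta)}
   = \int \frac{\partial_\theta\, p(Y_{com} \mid \theta)}{p(Y_{com} \mid \theta)}\,
          \frac{p(Y_{com} \mid \theta)}{p(Y \mid \theta)}\, dY_{mis} \\
  &= \int \partial_\theta \log p(Y_{com} \mid \theta)\; p(Y_{mis} \mid Y, \theta)\, dY_{mis}.
\end{aligned}
```

The last step uses p(Ycom | θ)/p(Y | θ) = p(Ymis | Y, θ), so the observed-data score is the conditional expectation of the complete-data score given the observed data.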

Example 1 (continued)

Consider the deletion of the i–th observation (xi, zi, ri, yi). It can be shown that

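$$\partial_\theta \log p_{\{i\}}(\tilde\theta) = \int \partial_\theta \big[\log p(y_i \mid x_i, z_i, \beta, \tau) + \log p(x_i, z_i \mid \alpha) + \log p(r_i \mid y_i, x_i, z_i, \xi)\big]\Big|_{\theta = \tilde\theta}\; p(z_{mis,i} \mid y_i, x_i, z_{obs,i}, \tilde\theta)\, dz_{mis,i}.$$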

Note that in most models with missing data, it is relatively easy to compute the first-order derivative of the complete-data log-likelihood function. Moreover, compared to the computation of the observed-data log-likelihood function, it is numerically much more stable to calculate the first-order derivative of the complete-data log-likelihood function. This is the key advantage of using (17) and (18) to approximate AP1({i}; θ̃).

We now present the four key steps in computing AP1(S; θ̃) in (16).

Step 1

Using the full data Y, we obtain the MCMC sample θ(j) for j = 1, …, J from the posterior p(θ | Y) and estimate θ̃ = J^{−1} ∑_{j=1}^{J} θ(j).

Step 2

We use MCMC methods to draw samples {Ymis(j) : j = 1, …, J} from p(Ymis | Y, θ̃) given θ̃ and Y.

Step 3

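For each set S, we approximate JN(θ̃) using Louis' formula and approximate ∂θ log p(Y | θ̃) and ∂θ log p(Y[S] | θ̃) by

$$\partial_\theta \log p(Y \mid \tilde\theta) \approx J^{-1} \sum_{j=1}^{J} \partial_\theta \log p(Y_{mis}^{(j)}, Y_{obs} \mid \tilde\theta), \tag{19}$$

$$\partial_\theta \log p(Y_{[S]} \mid \tilde\theta) \approx J^{-1} \sum_{j=1}^{J} \partial_\theta \log p(Y_{mis}^{(j)}, Y_{[S]} \mid \tilde\theta). \tag{20}$$

Step 4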

Approximate AP1(S; θ̃) using equation (16) for each set S.
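Steps 3 and 4 can be sketched as follows once the complete-data scores have been evaluated at θ̃ for each imputed Ymis(j). This is a minimal illustration: the function name `ap1`, the (J, p) layout of the score arrays, and the assumption that an estimate of JN(θ̃) is already available (for example from Louis' formula) are all choices made for the example, not details given in the text.

```python
import numpy as np


def ap1(score_full_draws, score_deleted_draws, info_matrix) -> float:
    """First-order approximation AP1(S; theta_tilde) from (16), (19) and (20).

    score_full_draws:    (J, p) complete-data scores with all of Y, one row per Y_mis^(j).
    score_deleted_draws: (J, p) complete-data scores with Y_S removed, same Y_mis^(j).
    info_matrix:         (p, p) estimate of J_N(theta_tilde).
    """
    score_full = np.mean(np.asarray(score_full_draws, dtype=float), axis=0)        # (19)
    score_deleted = np.mean(np.asarray(score_deleted_draws, dtype=float), axis=0)  # (20)
    grad_log_p_s = score_full - score_deleted
    # Quadratic form g^T J^{-1} g, computed with a linear solve instead of an inverse
    return float(grad_log_p_s @ np.linalg.solve(np.asarray(info_matrix, dtype=float), grad_log_p_s))
```

Because the same MCMC draws of Ymis and the same information matrix are reused for every S, looping this function over all candidate sets is inexpensive compared with refitting the model.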

Although we have systematically examined the deletion of a relatively small number of observations, it is common to delete relatively large numbers of observations for clustered data. Specifically, unbalanced clustered data are commonly collected from familial and longitudinal studies and we may be interested in deleting all the observations in a cluster, whose number may be comparable with the total number of observations N (Wang et al., 1999). We now obtain the following theorem for large cluster sizes.

Theorem 2

If Assumptions C1, C2, C3, C4′ and C5 in the supplementary document hold and NS → ∞ and NS/N → γ ∈ [0, 1), then we have the following results:

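  (a) The one-step approximation for θ̂[S] is given by

    $$\hat\theta_{[S]} = \hat\theta + O_p(N^{-1/2}) = \hat\theta - [J_{N,[S]}(\hat\theta)]^{-1}\, \partial_\theta \log p_S(\hat\theta)\, [1 + O_p(N^{-1/2})], \tag{21}$$

    where JN,[S](θ) = −∂²θ log p(θ | Y[S]). If γ = 0, then

    $$\hat\theta_{[S]} = \hat\theta - [J_N(\hat\theta)]^{-1}\, \partial_\theta \log p_S(\hat\theta)\, [1 + O_p(N^{-1/2}) + O_p(N_S/N)]. \tag{22}$$

  (b) The one-step approximation for θ̃[S] is given by

    $$\tilde\theta_{[S]} - \tilde\theta = (\hat\theta_{[S]} - \hat\theta)\, [1 + o_p(1)].$$

  (c) CP(S) and CM(S) can be approximated by

    $$AP_2(S; \theta) = [\partial_\theta \log p_S(\theta)]^T [J_{N,[S]}(\theta)]^{-1} [J_N(\theta)] [J_{N,[S]}(\theta)]^{-1} [\partial_\theta \log p_S(\theta)].$$

    If γ = 0, then

    $$CP(S) = CM(S)\, [1 + o_p(1)] = AP_2(S; \tilde\theta)\, [1 + o_p(1)]. \tag{23}$$

  (d) Dϕ(S) can be approximated by

    $$D_\phi(S) = \phi(A_S) + O_p(N^{-1}), \tag{24}$$

    where AS = σ × p(Y[S] | θ̂)p(θ̂)/[σ[S] × p(Y[S] | θ̂[S])p(θ̂[S])], σ^2 = |JN(θ̂)/N|^{−1} and σ[S]^2 = |JN,[S](θ̂[S])/N|^{−1}.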

Theorem 2 has several important implications. Theorem 2 (a) and (b) provide the one-step approximations of θ̂[S] and θ̃[S], which reduce the burden of computing θ̂[S] and θ̃[S] for each S. Theorem 2 (c) provides the theoretical approximations of CP(S) and CM(S). If NS/N → 0, such as when NS = √N, then CP(S) and CM(S) can be well approximated by AP2(S; θ̃). Theorem 2 (d) shows that when NS → ∞, Dϕ(S), which can be approximated by ϕ(AS), is not asymptotically equivalent to AP2(S; θ̃) in any case. Therefore, we cannot use AP2(S; θ̃) to characterize the asymptotic behavior of Dϕ(S). Since calculating CM(S), CP(S), Dϕ(S) and p(Y[S] | θ) can be computationally tedious for models with missing data, we generally suggest using their first-order approximations AP1(S; θ̃) and AP2(S; θ̃) for identifying influential observations. Moreover, we can easily develop a similar procedure for computing AP2(S; θ̃).
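A procedure for AP2 differs from the one for AP1 only in the matrices used. The sketch below assumes that ∂θ log pS(θ̃), JN(θ̃) and JN,[S](θ̃) have already been estimated and that JN,[S] is symmetric; the function name `ap2` and its argument names are hypothetical.

```python
import numpy as np


def ap2(grad_log_p_s, info_full, info_deleted) -> float:
    """AP2(S; theta) from Theorem 2 (c).

    grad_log_p_s: length-p gradient of log p_S at theta.
    info_full:    (p, p) matrix J_N(theta).
    info_deleted: (p, p) matrix J_{N,[S]}(theta), assumed symmetric.
    """
    g = np.asarray(grad_log_p_s, dtype=float)
    # v = J_{N,[S]}^{-1} g; with symmetric J_{N,[S]}, AP2 equals v^T J_N v
    v = np.linalg.solve(np.asarray(info_deleted, dtype=float), g)
    return float(v @ np.asarray(info_full, dtype=float) @ v)
```

When JN,[S] coincides with JN, the expression collapses to the AP1 quadratic form in (16), which is one way to sanity-check an implementation.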

3 Illustrative Examples

In this section, we illustrate our methodology with simulated data and a real dataset.

3.1 Simulated Data

The goal of our simulations was to evaluate the accuracy of the first-order approximations of the three Bayesian case influence measures in small and moderate sample sizes. We generated 100 data sets from a Binomial mixed model, which has been extensively studied in the literature (Molenberghs and Verbeke, 2005). Specifically, each data set contains n clusters. For each cluster, the random effect bi, which can be regarded as “missing data”, was first independently generated from a N(0, σ^2) distribution and then, given bi, the observations yij (j = 1, 2, 3; i = 1, …, n) were independently generated from a Binomial random generator such that yij ~ B(nij, pij) with pij satisfying log(pij/(1 − pij)) = xij^T β + bi, in which the nij were randomly drawn from {1, …, 5}. Moreover, the covariates xij were set as (1, uij − 0.5)T, and the uij were independently generated from a U[0, 1] distribution. For all 100 data sets, both the responses and covariates were repeatedly generated, while the true value of (βT, σ^2) was fixed at (0.5, 0.5, 0.5). The sample size n was set at 10 and 50, respectively, to represent small and moderate sample sizes.
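The data-generating mechanism described above can be sketched with NumPy as follows. The function name, the returned dictionary layout and the seeding are assumptions for illustration; only the distributions and parameter values come from the description.

```python
import numpy as np


def simulate_binomial_mixed(n_clusters: int, beta=(0.5, 0.5), sigma2: float = 0.5, seed: int = 0):
    """Simulate one data set from the Binomial mixed model of Section 3.1."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, np.sqrt(sigma2), size=n_clusters)    # random effects b_i ~ N(0, sigma^2)
    u = rng.uniform(0.0, 1.0, size=(n_clusters, 3))          # u_ij ~ U[0, 1]
    x = np.stack([np.ones_like(u), u - 0.5], axis=-1)        # x_ij = (1, u_ij - 0.5), shape (n, 3, 2)
    n_trials = rng.integers(1, 6, size=(n_clusters, 3))      # n_ij drawn from {1, ..., 5}
    eta = x @ np.asarray(beta, dtype=float) + b[:, None]     # logit(p_ij) = x_ij^T beta + b_i
    p = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(n_trials, p)                            # y_ij ~ B(n_ij, p_ij)
    return {"y": y, "n_trials": n_trials, "x": x, "b": b}
```

For example, `simulate_binomial_mixed(10)` and `simulate_binomial_mixed(50)` would correspond to the small and moderate sample sizes, before any influential clusters are planted.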

For each simulated data set, we created two types of influential observations in order to compare the accuracy of the first-order approximations and their capability in the identification of these influential clusters. For the first type, we deleted all the observations in clusters n − 1 and n and then reset {ni,j = 3 : j = 1, 2, 3; i = n − 1, n} and (bn−1, bn) = (4, −4) to generate yi,j for i = n − 1, n and j = 1, 2, 3 according to the above binomial random effects model. Thus, the new (n − 1)th and nth clusters can be regarded as influential clusters due to the extreme values of bn−1 and bn. Moreover, the number of observations in these two clusters is relatively small compared to the total number of observations.

For the second type of influential observations, we deleted the observations from the n-th cluster and then reset {ni,j = 10 : j = 1, 2, 3; i = n} and then generated {ynj : j = 1, 2} with bn = 0 and yn3 with bn = −6 from the same model. Since different bn values were used to generate different observations in the n–th cluster, the n-th cluster can be treated as an “outlier”. Moreover, the number of observations in the last cluster is relatively large compared to the total number of observations when n = 10.

For each data set, we deleted each cluster one at a time and then calculated the differences between the three Bayesian diagnostic measures and their first-order approximations for each cluster. Since bi is one-dimensional, we used a standard numerical method to compute pS(θ) = p(YS | Y[S], θ) and to approximate the exact values of the three Bayesian diagnostic measures based on (5) and (6). Moreover, for the true ‘good’ and influential clusters, we computed the average biases and standard errors of these differences (Tables 1-3). For the true ‘good’ clusters, AP2(S; θ̃) approximates CP(S) and CM(S) with smaller average biases than AP1(S; θ̃) and with comparable standard errors. Increasing the sample size decreases the average bias and standard error of the first-order approximations. Compared with AP2(S; θ̃), AP1(S; θ̃) is a better approximation to Dϕ(S).
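The text does not say which numerical method was used for the one-dimensional integral. As one possibility, the sketch below uses Gauss-Hermite quadrature to evaluate log pS(θ) when S is a whole cluster, in which case pS(θ) is the marginal likelihood of that cluster because clusters are independent. The function name, the argument layout and the number of quadrature nodes are assumptions for illustration.

```python
import numpy as np
from math import comb, log


def log_cluster_likelihood(y, n_trials, x, beta, sigma2: float, n_nodes: int = 40) -> float:
    """log p(Y_i | theta) for one cluster of the Binomial mixed model.

    Integrates the random intercept b_i ~ N(0, sigma2) out by Gauss-Hermite
    quadrature (an assumed choice of numerical method).
    y, n_trials: length-m sequences; x: (m, 2) covariate array.
    """
    t, w = np.polynomial.hermite.hermgauss(n_nodes)          # nodes/weights for weight exp(-t^2)
    b = np.sqrt(2.0 * sigma2) * t                            # change of variable b = sqrt(2) sigma t
    eta = (np.asarray(x, dtype=float) @ np.asarray(beta, dtype=float))[:, None] + b[None, :]
    log_coef = np.array([log(comb(int(n), int(k))) for n, k in zip(n_trials, y)])
    # log Binomial pmf at each node: log C(n, y) + y * eta - n * log(1 + exp(eta))
    log_f = (log_coef[:, None]
             + np.asarray(y)[:, None] * eta
             - np.asarray(n_trials)[:, None] * np.logaddexp(0.0, eta)).sum(axis=0)
    shift = log_f.max()
    return float(shift + np.log(np.sum(w * np.exp(log_f - shift)) / np.sqrt(np.pi)))
```

Evaluating this function at each MCMC draw of θ gives the values of log pS(θ) needed in formulas (5) and (6) for cluster deletion.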

Table 1.

Results from simulation studies for two types of influential observations for n = 10. Average biases (ABs) and standard errors (SEs) of the differences between the three Bayesian case influence measures and their first-order approximations. For the first type, clusters 9 and 10 are influential clusters, while cluster 10 is the outlying cluster for the second type.

Type  Cluster  AP1 (AB)                AP2 (AB)                AP1 (SE)                AP2 (SE)
               Dϕ      CP      CM      Dϕ      CP      CM      Dϕ      CP      CM      Dϕ      CP      CM
I     1        -0.055  0.077   0.057   -0.119  0.013   -0.007  0.062   0.133   0.098   0.159   0.075   0.088
I     2        -0.050  0.079   0.059   -0.124  0.005   -0.015  0.065   0.165   0.138   0.235   0.170   0.148
I     3        -0.057  0.076   0.063   -0.136  -0.003  -0.016  0.074   0.141   0.118   0.211   0.089   0.086
I     4        -0.053  0.070   0.055   -0.117  0.006   -0.008  0.064   0.088   0.086   0.147   0.075   0.065
I     5        -0.059  0.058   0.045   -0.128  -0.011  -0.025  0.068   0.082   0.092   0.197   0.132   0.139
I     6        -0.046  0.055   0.058   -0.101  0.001   0.003   0.047   0.106   0.104   0.119   0.071   0.065
I     7        -0.062  0.086   0.062   -0.150  -0.003  -0.027  0.071   0.125   0.124   0.230   0.130   0.152
I     8        -0.061  0.074   0.061   -0.137  -0.002  -0.015  0.055   0.135   0.107   0.180   0.113   0.111
I     9*       -0.200  0.129   0.080   -0.532  -0.203  -0.252  0.100   0.223   0.201   0.478   0.327   0.355
I     10*      -0.348  0.250   0.012   -1.380  -0.782  -1.021  0.108   0.249   0.220   0.913   0.789   0.923

II    1        -0.066  0.081   0.074   -0.151  -0.004  -0.011  0.080   0.137   0.147   0.239   0.180   0.216
II    2        -0.050  0.053   0.048   -0.097  0.005   0.000   0.064   0.077   0.076   0.125   0.068   0.087
II    3        -0.063  0.061   0.058   -0.132  -0.008  -0.011  0.069   0.097   0.108   0.172   0.098   0.129
II    4        -0.073  0.088   0.074   -0.176  -0.016  -0.029  0.092   0.134   0.162   0.262   0.129   0.165
II    5        -0.057  0.062   0.057   -0.109  0.010   0.005   0.066   0.103   0.100   0.134   0.085   0.093
II    6        -0.069  0.063   0.052   -0.144  -0.013  -0.024  0.078   0.096   0.088   0.198   0.123   0.122
II    7        -0.062  0.076   0.050   -0.137  0.000   -0.026  0.078   0.143   0.131   0.202   0.109   0.156
II    8        -0.066  0.078   0.113   -0.172  -0.029  0.006   0.077   0.119   0.171   0.324   0.219   0.284
II    9        -0.068  0.071   0.074   -0.162  -0.023  -0.020  0.088   0.173   0.186   0.304   0.234   0.269
II    10*      0.085   0.681   0.301   -2.637  -2.041  -2.421  0.771   0.544   0.509   2.549   2.926   3.014

Table 3.

Selected results from simulation studies for n = 50 and the first type of influential observations. Average biases (ABs) and standard errors (SEs) of the differences between the three Bayesian case influence measures and their first-order approximations. Note that cluster 50 is the outlying cluster.

Cluster  AP1 (AB)                AP2 (AB)                AP1 (SE)                AP2 (SE)
         Dϕ      CP      CM      Dϕ      CP      CM      Dϕ      CP      CM      Dϕ      CP      CM
1        -0.019  0.008   0.010   -0.022  0.005   0.007   0.019   0.024   0.043   0.026   0.026   0.046
5        -0.020  0.011   0.005   -0.023  0.007   0.002   0.021   0.022   0.028   0.029   0.024   0.032
10       -0.016  0.010   0.011   -0.018  0.007   0.008   0.016   0.020   0.034   0.019   0.020   0.034
15       -0.016  0.014   0.010   -0.020  0.010   0.007   0.016   0.025   0.033   0.022   0.023   0.034
20       -0.020  0.012   0.007   -0.024  0.008   0.003   0.018   0.025   0.031   0.025   0.026   0.033
25       -0.016  0.009   0.006   -0.019  0.006   0.004   0.018   0.022   0.028   0.023   0.023   0.030
30       -0.019  0.009   0.014   -0.022  0.005   0.010   0.019   0.028   0.045   0.026   0.030   0.046
35       -0.026  0.008   0.004   -0.032  0.001   -0.002  0.028   0.031   0.047   0.040   0.034   0.057
40       -0.021  0.014   0.011   -0.025  0.010   0.006   0.021   0.028   0.038   0.029   0.030   0.040
45       -0.024  0.007   0.005   -0.031  0.000   -0.002  0.035   0.032   0.041   0.062   0.055   0.065
50*      -0.162  0.034   -0.170  -0.510  -0.314  -0.518  0.058   0.081   0.164   0.416   0.414   0.507

We also computed the frequency of correctly ranking the true influential/outlying clusters as the top four clusters based on the 100 simulated data sets for each scenario (Table 4). Due to the randomness introduced by the random number generator, it is possible that a few ‘good’ observations may appear as influential observations. Among all three case influence measures, Dϕ(S) is more sensitive for detecting influential/outlying clusters compared to CP(S) and CM(S). Although AP1(S; θ̃) and AP2(S; θ̃) are not very accurate approximations to the three Bayesian case influence measures (Tables 1-3), they consistently selected the influential clusters as the top four clusters. Furthermore, the results in Table 4 indicate that, compared with all three Bayesian case influence measures, AP1(S; θ̃) and AP2(S; θ̃) may be more effective in detecting influential/outlying clusters in all scenarios. Thus, for the purpose of detecting influential/outlying clusters, the performance of the first-order approximations AP1(S; θ̃) and AP2(S; θ̃) is quite satisfactory.

Finally, we generated a data set by using the same Binomial mixed model. We deleted all observations of the 38th cluster, which has the smallest leverage (Figure 1), and then regenerated {y38,j : j = 1, 2, 3} by using the same model except that x38,j,2 was changed into x38,j,2 + 5.0. However, when we fitted the same Binomial mixed model, we used the original x38,j. Thus, the 38th cluster can be regarded as an ‘outlying’ cluster, because the x38,j values used to generate the observations differ from those used to fit the model. Index plots of CM(i) and Dϕ(i) (Figure 2) indicate that CM(i) seems to be more sensitive to outliers with low leverage covariates.

Figure 1. Index plot of leverage values in simulation study.

Figure 2. Index plots of CM(i), Dϕ(i), AP1(i) and AP2(i) in simulation study.

3.2 AIDS Data

We considered a small portion of a real dataset from a study of the relationship between acquired immune deficiency syndrome (AIDS) and the use of condoms (Morisky et al., 1998). This data set contained 11 items on knowledge about AIDS and beliefs, behaviors and attitudes towards condom use collected from n = 1116 subjects. Nine of them were taken as responses in yi = (yi1, …, yi9)T, and a continuous item xi1 (item 37) and an ordered categorical item xi2 (item 21, which was treated as continuous) were taken as covariates. The definitions of the nine items are given in the Appendix. In this dataset, the variables yi1, yi2, yi3, yi7, yi8 and yi9 were measured via a 5-point scale and hence were treated as continuous; the variables yi4, yi5 and yi6 were continuous. The responses and covariates are missing at least once for 361 subjects (32%), while the covariate xi2 is completely observed. The missing data patterns for the response variables are given in Table 4 of Lee and Tang (2006).

To fit the AIDS data, we considered a complex structural equation model in the presence of not missing at random (NMAR) responses and missing at random (MAR) covariates. The responses (yi1, yi2, yi3) are related to a latent variable, ηi, which can be interpreted as the ‘threat of AIDS’, while the responses (yi4, yi5, yi6) and (yi7, yi8, yi9) are respectively related to the latent variables ξi1 and ξi2, which can be respectively interpreted as ‘aggressiveness of the sex worker’ and ‘worry of contracting AIDS’. Specifically, to identify the relationship between the responses yi and the latent variables ωi = (ηi, ξi1, ξi2)T, we consider the following measurement equation:

$$y_i = \mu + \Lambda \omega_i + \varepsilon_i, \quad i = 1, \ldots, n, \tag{25}$$

where μ = (μ1, …, μ9)T is a vector of intercepts, (ξi1, ξi2) is independent of the measurement error vector εi, (ξi1, ξi2) ∼ N(0, Φ) and εi ∼ N(0, Ψ), in which Ψ = diag(ψ1, …, ψ9) and Φ = (ϕij) is a 2 × 2 covariance matrix. We also assume the following structure for Λ:

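$$\Lambda^T = \begin{pmatrix} 1.0^* & \lambda_{21} & \lambda_{31} & 0.0^* & 0.0^* & 0.0^* & 0.0^* & 0.0^* & 0.0^* \\ 0.0^* & 0.0^* & 0.0^* & 1.0^* & \lambda_{52} & \lambda_{62} & 0.0^* & 0.0^* & 0.0^* \\ 0.0^* & 0.0^* & 0.0^* & 0.0^* & 0.0^* & 0.0^* & 1.0^* & \lambda_{83} & \lambda_{93} \end{pmatrix}, \tag{26}$$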

where 0.0* and 1.0* are regarded as fixed values to identify the scale of the latent factor. To study the relationship between η and (x1, x2, ξ1, ξ2), we consider the following nonlinear structural equations model:

$$\eta_i = b_1 x_{i1} + b_2 x_{i2} + \gamma_1 \xi_{i1} + \gamma_2 \xi_{i2} + \gamma_3 \xi_{i1} \xi_{i2} + \delta_i, \tag{27}$$

where δi ∼ N(0, ψδ). We let ryij = 1 if yij is missing and ryij = 0 if yij is observed; rxi1 = 1 if xi1 is missing and rxi1 = 0 if xi1 is observed. Based on the missingness patterns, we assume that the missing data for the responses is NMAR whilst the missing data for the covariates is MAR. In particular, we consider the following missing data mechanism for yij:

$$\operatorname{logit}\{\Pr(r_{yij} = 1 \mid \varphi)\} = \varphi_0 + \varphi_1 y_{i1} + \cdots + \varphi_9 y_{i9}, \tag{28}$$

where φ = (φ0, φ1, …, φ9)T. Since xi1 may be missing, we have to specify a distribution for it. For simplicity, we assume that xi1 ∼ N(0, ψx). Posterior inference for the structural parameters is obtained via Markov chain Monte Carlo methods as introduced in Lee and Tang (2006).
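To make the model in (25)-(28) concrete, the following sketch draws one synthetic subject from it. It is illustrative only and is not the MCMC fitting procedure of Lee and Tang (2006). The parameter values are the posterior means from the first block of Table 6, the function name `simulate_subject` and the value x2 = 3.0 are hypothetical, and all nine responses are simulated as continuous. As (28) is written, its right-hand side does not depend on j, so the sketch uses a single missingness probability for all nine responses of a subject.

```python
import math
import random

# Illustrative parameter values: posterior means from the first block of Table 6.
MU = [0.213, 0.164, 0.258, -0.042, -0.041, -0.047, 0.159, 0.120, 0.229]
PSI = [0.348, 0.802, 0.398, 0.423, 0.399, 0.336, 0.430, 0.770, 0.474]
# Loading matrix Lambda (9 x 3), following the fixed 0/1 pattern in (26).
LAMBDA = [
    [1.0, 0.0, 0.0], [0.340, 0.0, 0.0], [0.579, 0.0, 0.0],
    [0.0, 1.0, 0.0], [0.0, 1.957, 0.0], [0.0, 0.855, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.0, 0.691], [0.0, 0.0, 1.176],
]
B = (-0.047, 0.059)
GAMMA = (-0.233, 0.290, -0.123)
PHI = ((0.079, -0.014), (-0.014, 0.099))
PSI_DELTA, PSI_X = 0.230, 1.314
VARPHI = [-2.976, 0.037, -0.263, 0.242, -0.218,
          -0.413, -0.170, -0.383, -0.071, 0.171]


def simulate_subject(rng, x2):
    """Draw (y, r, p_miss) for one subject from (25), (27) and (28)."""
    x1 = rng.gauss(0.0, math.sqrt(PSI_X))
    # (xi1, xi2) ~ N(0, PHI) via a hand-written 2 x 2 Cholesky factor.
    l11 = math.sqrt(PHI[0][0])
    l21 = PHI[0][1] / l11
    l22 = math.sqrt(PHI[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xi1, xi2 = l11 * z1, l21 * z1 + l22 * z2
    # Structural equation (27), including the interaction term.
    eta = (B[0] * x1 + B[1] * x2 + GAMMA[0] * xi1 + GAMMA[1] * xi2
           + GAMMA[2] * xi1 * xi2 + rng.gauss(0.0, math.sqrt(PSI_DELTA)))
    omega = (eta, xi1, xi2)
    # Measurement equation (25): y = mu + Lambda * omega + epsilon.
    y = [MU[j] + sum(LAMBDA[j][k] * omega[k] for k in range(3))
         + rng.gauss(0.0, math.sqrt(PSI[j])) for j in range(9)]
    # Missingness mechanism (28): logistic in the full response vector.
    linear = VARPHI[0] + sum(VARPHI[j + 1] * y[j] for j in range(9))
    p_miss = 1.0 / (1.0 + math.exp(-linear))
    r = [1 if rng.random() < p_miss else 0 for _ in range(9)]
    return y, r, p_miss


rng = random.Random(2010)  # fixed seed so the draw is reproducible
y, r, p_miss = simulate_subject(rng, x2=3.0)
print([round(v, 3) for v in y], r, round(p_miss, 4))
```

Because the missingness probability depends on the responses themselves, including those that end up unobserved, the mechanism is NMAR in the sense described above.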

Due to the computational complexity in calculating the exact values of the three Bayesian influence measures, we only calculated the case influence measures AP1({i}) and AP2({i}) (Figure 3). Both case influence measures identify subjects 14, 25, 28, 137, 168, 175, 274, 408, 492, and 985 as influential cases (Figs. 3 and 4). Among them, cases 14 and 985 stand out as the most influential cases. Inspecting Table 5 reveals that case 14 has the largest number of vaginal sex (y4) in the last 7 days and case 985 has the largest number of blow jobs (y5) in the last 7 days (Fig. 4 (a) and (c)). Cases 25, 28, 137, and 408 have the largest numbers of hand jobs (y6) given in the last 7 days, cases 168, 175, and 274 have relatively large numbers of blow jobs compared with other subjects, and case 492 has the second largest number of vaginal sex in the last 7 days (Table 5).

Figure 3. AIDS data: index plots of diagnostic measures (a) AP1(i) and (b) AP2(i).

Figure 4. AIDS data: scatter plots of (a) (yi4, yi5), (b) (yi4, yi6), (c) (yi5, yi6), and (d) (yi4, yi5, yi6).

Table 5.

The observed responses and covariates of ten influential observations from the AIDS data set. Note that variables y4, y5, y6 and x1 are centered. ‘No’ denotes the subject number and missing value is represented as ·.

No y1 y2 y3 y4 y5 y6 y7 y8 y9 x1 x2
14 3 3 5 -0.03 0.98 20.88 3 4 3 4.12 1
25 4 4 3 3.86 9.29 10.30 5 1 4 -0.14 3
28 1 2 2 0.18 9.78 -0.28 3 1 3 -0.25 2
137 2 3 4 -0.46 9.29 -0.28 5 2 2 0.93 2
168 3 3 2 1.69 2.94 6.77 3 2 3 -0.33 4
175 4 4 · 1.69 4.40 6.77 5 1 · 0.93 4
274 5 3 5 3.86 -0.48 6.77 5 1 5 -0.37 3
408 2 5 5 1.69 6.85 0.42 5 · 5 1.84 5
492 2 5 2 8.19 0.98 -0.28 2 5 5 0.93 4
985 3 1 3 20.97 -0.48 -0.28 5 5 3 -0.45 1

Furthermore, following the classical linear model, we also calculated the leverage $h_i = x_i^T (\sum_k x_k x_k^T)^{-1} x_i$ for all of the fully observed xi's, where Σk sums over all subjects with fully observed covariates. Among the 10 influential observations, only subject 14 is a high leverage point, with h14 = 0.0155. Inspecting all leverage values reveals that the 93rd subject, who has an extremely high leverage value, has the longest period as a prostitute among all subjects, yet this subject is not influential.
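The leverage formula above can be computed as in the sketch below. The helper names `solve` and `leverages` are hypothetical, and the input uses only the (x1, x2) pairs of the ten subjects in Table 5 as toy data. Because the paper's sum runs over all subjects with fully observed covariates, the toy values will not reproduce h14 = 0.0155.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        rest = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def leverages(xs):
    """h_i = x_i^T (sum_k x_k x_k^T)^{-1} x_i for every covariate vector in xs."""
    p = len(xs[0])
    gram = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    return [sum(xi * si for xi, si in zip(x, solve(gram, x))) for x in xs]


# Toy input: (x1, x2) of the ten subjects listed in Table 5, in table order.
toy_x = [
    (4.12, 1), (-0.14, 3), (-0.25, 2), (0.93, 2), (-0.33, 4),
    (0.93, 4), (-0.37, 3), (1.84, 5), (0.93, 4), (-0.45, 1),
]
h = leverages(toy_x)
print([round(v, 4) for v in h])
```

A useful sanity check is that the leverages sum to the number of covariates (here 2), since they are the diagonal of a projection matrix.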

We further computed the posterior means and standard deviations of all the parameters with and without these 10 influential observations. We observed some effects of deleting these 10 influential observations on the parameters associated with y4, y5, and y6 (Table 6). For instance, after deleting the influential observations, λ62 was changed from 0.855 to 1.296 and φ5 was changed from -0.413 to -0.297. This indicates that it is important to identify influential observations and assess their effects on the statistical inference in complex statistical models with missing data.

Table 6.

AIDS data: posterior means and standard deviations of the parameters with and without 10 influential observations. ES denotes the posterior mean and SD denotes the posterior standard deviation. Parameters with relatively large changes are highlighted.

Results with 10 influential observations
μ1 μ2 μ3 μ4 μ5 μ6 μ7 μ8 μ9
ES 0.213 0.164 0.258 -0.042 -0.041 -0.047 0.159 0.120 0.229
SD 0.025 0.030 0.024 0.021 0.025 0.019 0.023 0.028 0.028
λ21 λ31 λ52 λ62 λ83 λ93 ψδ ψx
ES 0.340 0.579 1.957 0.855 0.691 1.176 0.230 1.314
SD 0.093 0.101 0.305 0.126 0.160 0.272 0.035 0.861
ψ1 ψ2 ψ3 ψ4 ψ5 ψ6 ψ7 ψ8 ψ9
ES 0.348 0.802 0.398 0.423 0.399 0.336 0.430 0.770 0.474
SD 0.038 0.039 0.025 0.022 0.055 0.019 0.030 0.036 0.039
b1 b2 γ1 γ2 γ3 ϕ11 ϕ12 ϕ22 φ0
ES -0.047 0.059 -0.233 0.290 -0.123 0.079 -0.014 0.099 -2.976
SD 0.024 0.022 0.098 0.133 0.217 0.015 0.006 0.025 0.073
φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9
ES 0.037 -0.263 0.242 -0.218 -0.413 -0.170 -0.383 -0.071 0.171
SD 0.083 0.077 0.124 0.103 0.092 0.097 0.077 0.075 0.130

Results without 10 influential observations
μ1 μ2 μ3 μ4 μ5 μ6 μ7 μ8 μ9
ES 0.209 0.171 0.259 -0.003 0.003 0.002 0.158 0.116 0.234
SD 0.026 0.030 0.023 0.029 0.031 0.030 0.023 0.029 0.029
λ21 λ31 λ52 λ62 λ83 λ93 ψδ ψx
ES 0.315 0.584 2.126 1.296 0.686 1.198 0.227 1.326
SD 0.102 0.082 0.286 0.145 0.187 0.246 0.030 0.889
ψ1 ψ2 ψ3 ψ4 ψ5 ψ6 ψ7 ψ8 ψ9
ES 0.348 0.798 0.399 0.890 0.551 0.825 0.437 0.774 0.472
SD 0.031 0.039 0.024 0.041 0.074 0.049 0.029 0.037 0.046
b1 b2 γ1 γ2 γ3 ϕ11 ϕ12 ϕ22 φ0
ES -0.039 0.057 -0.283 0.333 -0.041 0.100 -0.016 0.096 -2.961
SD 0.023 0.023 0.112 0.104 0.200 0.015 0.009 0.022 0.071
φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9
ES 0.042 -0.245 0.288 -0.251 -0.297 -0.097 -0.372 -0.091 0.205
SD 0.085 0.074 0.120 0.103 0.077 0.071 0.076 0.073 0.139
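
One way to read Table 6 is to express each change in posterior mean in units of the full-data posterior standard deviation. This summary is an assumption introduced here for illustration and is not a measure used in the paper. The values are copied from the two blocks of Table 6: the block with λ62 = 0.855 (all observations) and the block with λ62 = 1.296 (after deleting the 10 influential observations).

```python
# name: (posterior mean with all data, posterior SD with all data,
#        posterior mean after deleting the 10 influential observations)
table6 = {
    "mu4": (-0.042, 0.021, -0.003),
    "lambda62": (0.855, 0.126, 1.296),
    "psi4": (0.423, 0.022, 0.890),
    "psi5": (0.399, 0.055, 0.551),
    "psi6": (0.336, 0.019, 0.825),
    "varphi5": (-0.413, 0.092, -0.297),
}


def standardized_change(es_full, sd_full, es_deleted):
    """Shift in posterior mean, in units of the full-data posterior SD."""
    return (es_deleted - es_full) / sd_full


changes = {name: standardized_change(*vals) for name, vals in table6.items()}
# Print the parameters from the largest to the smallest absolute shift.
for name, value in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:9s} {value:+.2f}")
```

On this scale the error variances ψ4 and ψ6 move by far the most, which is in line with the statement that the deleted cases mainly affect the parameters associated with y4, y5, and y6.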

4 Discussion

We have derived two first-order approximations to three Bayesian case influence measures under the deletion of a small (or large) number of observations. We have shown that the first-order approximation measures are useful tools for detecting influential observations in the presence of missing data for a large class of statistical models.

Table 2.

Selected results from simulation studies for n = 50 and the first type of influential observations. Average biases (ABs) and standard errors (SEs) of the differences between the three Bayesian case influence measures and their first-order approximations. Note that clusters 49 and 50 are influential clusters.

Cluster AP1 (AB) AP2 (AB) AP1 (SE) AP2 (SE)
Dϕ CP CM Dϕ CP CM Dϕ CP CM Dϕ CP CM
1 -0.023 0.012 -0.002 -0.028 0.006 -0.008 0.027 0.026 0.026 0.041 0.031 0.038
5 -0.022 0.014 0.003 -0.027 0.009 -0.002 0.025 0.028 0.023 0.035 0.030 0.028
10 -0.018 0.014 0.004 -0.022 0.010 -0.000 0.025 0.023 0.022 0.036 0.027 0.029
15 -0.022 0.013 0.004 -0.026 0.008 -0.001 0.032 0.027 0.021 0.041 0.028 0.025
20 -0.021 0.016 0.007 -0.026 0.011 0.002 0.026 0.029 0.033 0.041 0.033 0.044
25 -0.016 0.011 0.003 -0.019 0.009 0.001 0.016 0.024 0.017 0.020 0.025 0.018
30 -0.022 0.010 0.003 -0.027 0.005 -0.002 0.030 0.029 0.028 0.043 0.030 0.036
35 -0.022 0.012 -0.001 -0.028 0.005 -0.008 0.027 0.023 0.035 0.043 0.029 0.048
40 -0.022 0.011 0.005 -0.027 0.006 0.000 0.024 0.028 0.030 0.033 0.032 0.035
45 -0.023 0.014 0.002 -0.029 0.008 -0.004 0.025 0.026 0.023 0.039 0.029 0.032
49* -0.051 0.005 -0.017 -0.066 -0.009 -0.032 0.016 0.041 0.037 0.022 0.043 0.039
50* -0.141 -0.001 -0.137 -0.238 -0.098 -0.235 0.070 0.083 0.101 0.124 0.122 0.154

Footnotes

Supplemental materials for the article are available online.

C++ Code: The supplemental files for this article include C++ programs that can be used to replicate the simulation study and real data analysis included in the article. Please read the README file contained in the zip file for more details. (ZhuJoeTang.zip, zip archive)

Appendix: The supplemental files include the Appendix, which describes all variables in the AIDS dataset and gives the proofs of Theorems 1 and 2. (cho2-jcgsApp.pdf)

Contributor Information

Hongtu Zhu, Department of Biostatistics, University of North Carolina at Chapel Hill.

Joseph G. Ibrahim, Department of Biostatistics, University of North Carolina at Chapel Hill.

Hyunsoon Cho, Department of Biostatistics, University of North Carolina at Chapel Hill.

Niansheng Tang, Department of Statistics, Yunnan University.

References

  1. Abraham B, Box GEP. Bayesian analysis of some outlier problems in time series. Biometrika. 1979;66:229–236.
  2. Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association. 1993;88:669–679.
  3. Blyth S. Local divergence and association. Biometrika. 1994;81:579–584.
  4. Box GEP, Tiao GC. A Bayesian approach to some outlier problems. Biometrika. 1968;55:119–129.
  5. Bradlow ET, Zaslavsky AM. Case influence analysis in Bayesian inference. Journal of Computational and Graphical Statistics. 1997;6:314–331.
  6. Carlin BP, Polson NG. An expected utility approach to influence diagnostics. Journal of the American Statistical Association. 1991;86:1013–1021.
  7. Chaloner K. Bayesian residual analysis in the presence of censoring. Biometrika. 1991;78:637–644.
  8. Christensen R. Log-linear Models and Logistic Regression. Springer-Verlag Inc.; 1997.
  9. Christensen R, Pearson LM, Johnson W. Case-deletion diagnostics for mixed models. Technometrics. 1992;34:38–45.
  10. Cook RD. Detection of influential observation in linear regression. Technometrics. 1977;19:15–18.
  11. Cook RD, Weisberg S. Residuals and Influence in Regression. Chapman & Hall; 1982.
  12. Csiszár I. Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica. 1967;2:299–318.
  13. Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. London: Chapman and Hall; 2008.
  14. Epifani I, MacEachern SN, Peruggia M. Case-deletion importance sampling estimators: central limit theorems and related results. Electronic Journal of Statistics. 2008;2:774–806.
  15. Geisser S. The predictive sample reuse method with applications. Journal of the American Statistical Association. 1975;70:320–328.
  16. Geisser S. Predictive Inference: An Introduction. London: Chapman & Hall; 1993.
  17. Gelfand AE, Dey DK. Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society Series B. 1994;56:501–514.
  18. Gelfand AE, Dey DK, Chang H. Model determination using predictive distributions, with implementation via sampling-based methods (Disc: P160-167) In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics 4. Oxford: Oxford University Press; 1992. pp. 147–159.
  19. Ibrahim JG, Chen MH, Lipsitz SR, Herring A. Missing-data methods for generalized linear models: a comparative review. Journal of the American Statistical Association. 2005;100:332–346.
  20. Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test. 2009;18:1–43. doi: 10.1007/s11749-009-0138-x.
  21. Ibrahim JG, Zhu HT, Tang NS. Bayesian local influence for survival models. Lifetime Data Analysis. 2010 doi: 10.1007/s10985-010-9170-0.
  22. Johnson W. Influence measures for logistic regression: another point of view. Biometrika. 1985;72:59–65.
  23. Johnson W, Geisser S. A predictive view of the detection and characterization of influential observations in regression analysis. Journal of the American Statistical Association. 1983;78:137–144.
  24. Kass RE, Kadane JB, Tierney L. Asymptotic evaluation of integrals arising in Bayesian inference. In: Page C, LePage R, editors. Computing Science and Statistics: Proceedings of the Symposium on the Interface. Springer-Verlag Inc.; 1990. pp. 38–42.
  25. Kass RE, Tierney L, Kadane JB. Approximate methods for assessing influence and sensitivity in Bayesian analysis. Biometrika. 1989;76:663–674.
  26. Lee SY. Structural Equation Modelling: A Bayesian Approach. Wiley; 2007.
  27. Lee SY, Tang NS. Analysis of nonlinear structural equation models with nonignorable missing covariates and ordered categorical data. Statistica Sinica. 2006;16:1117–1141.
  28. Little RJA. Regression with missing x's: A review. Journal of the American Statistical Association. 1992;87:1227–1237.
  29. Little RJA, Rubin DB. Statistical Analysis With Missing Data. New York: Wiley; 2002.
  30. Louis TA. Finding the observed information using the EM algorithm. Journal of the Royal Statistical Society Series B. 1982;44:226–233.
  31. McCulloch RE. Local model influence. Journal of the American Statistical Association. 1989;84:473–478.
  32. Molenberghs G, Kenward G. Missing Data in Clinical Studies. Wiley; New York: 2007.
  33. Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. Springer; New York: 2005.
  34. Morisky DE, Tiglao TV, Sneed CD, Tempongko SB, Baltazar JC, Detels R, Stein JA. The effects of establishment practices, knowledge and attitudes on condom use among Filipina sex workers. AIDS Care. 1998;10:213–320. doi: 10.1080/09540129850124460.
  35. Peng F, Dey DK. Bayesian analysis of outlier problems using divergence measures. The Canadian Journal of Statistics / La Revue Canadienne de Statistique. 1995;23:199–213.
  36. Peruggia M. On the variability of case-deletion importance sampling weights in the Bayesian linear model. Journal of the American Statistical Association. 1997;92:199–207.
  37. Pettit LI. Diagnostics in Bayesian model choice. The Statistician: Journal of the Institute of Statisticians. 1986;35:183–190.
  38. Skrondal A, Rabe-Hesketh S. Generalized Latent Variable Modeling. London: Chapman and Hall; 2004.
  39. Tierney L, Kass RE, Kadane JB. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association. 1989;84:710–716.
  40. Wang HM, Jones MP, Burns TL. Regression diagnostics for the class a regressive model with quantitative phenotypes. Genetic Epidemiology. 1999;17:174–187. doi: 10.1002/(SICI)1098-2272(1999)17:3<174::AID-GEPI3>3.0.CO;2-G.
  41. Wei B. Exponential Family Nonlinear Models. Singapore: Springer-Verlag; 1998.
  42. Weiss R. An approach to Bayesian sensitivity analysis. Journal of the Royal Statistical Society Series B. 1996;58:739–750.
  43. Weiss RE, Cook RD. A graphical case statistic for assessing posterior influence. Biometrika. 1992;79:51–55.
  44. Zhu H, Lee SY, Wei BC, Zhou J. Case deletion measures for models with incomplete data. Biometrika. 2001;88:727–737.
  45. Zhu HT, Ibrahim JG, Cho N, Tang NS. A general review of Bayesian influence analysis. In: Chen MH, Dey DK, Muller P, Sun D, Ye K, editors. Frontier of statistical decision making and Bayesian analysis. New York: Springer-Verlag; 2010. pp. 219–237.
