Bayesian Sensitivity Analysis of Statistical Models with Missing Data

HONGTU ZHU; JOSEPH G IBRAHIM; NIANSHENG TANG

doi:10.5705/ss.2012.126

Author manuscript; available in PMC: 2014 Oct 1.

Published in final edited form as: Stat Sin. 2014 Apr;24(2):871–896. doi: 10.5705/ss.2012.126

PMCID: PMC3991016 NIHMSID: NIHMS519520 PMID: 24753718

Abstract

Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the nonignorable (not missing at random, NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures.

Keywords: Influence measure, Missing data mechanism, Perturbation manifold, Sensitivity analysis

1. Introduction

It is common to have missing data in surveys, clinical trials, and longitudinal studies. Various statistical methods have been developed to handle missing data. These methods depend on the missing data mechanism that generates the missing values and other modeling assumptions at various stages, and the resulting estimates and tests can be sensitive to these assumptions. Sensitivity analyses are commonly performed to perturb the model assumptions and/or individual observations to check the sensitivity of a specific influence measure (e.g., a parameter of interest). There is an extensive literature on sensitivity analysis for missing data problems in frequentist analysis (Copas and Eguchi (2005), Little and Rubin (2002), Zhu and Lee (2001), Copas and Li (1997), van Steen et al. (2001), Troxel (1998), Jansen et al. (2006), Jansen et al. (2003), Verbeke et al. (2001), Troxel et al. (2004), Shi, Zhu and Ibrahim (2009), Hens et al. (2006), Daniels and Hogan (2008)).

The literature on influence measures includes Copas and Eguchi (2005), Zhu and Lee (2001), Troxel et al. (2004), Copas and Li (1997), van Steen et al. (2001), Troxel (1998), Jansen et al. (2006), Jansen et al. (2003), Hens et al. (2006), Verbeke et al. (2001), Shi, Zhu and Ibrahim (2009), and Daniels and Hogan (2008). For instance, in frequentist analysis, Copas and Eguchi (2005) developed a general formulation for assessing the bias of maximum likelihood estimates in the presence of small model perturbations for missing data problems. The local influence method in Cook (1986) was successfully applied to carry out sensitivity analyses for various statistical models with missing data (van Steen et al. (2001), Troxel (1998), Jansen et al. (2006), Hens et al. (2006), Jansen et al. (2003), Verbeke et al. (2001)). Shi, Zhu and Ibrahim (2009) further systematically investigated the local influence methods proposed in Zhu et al. (2007) for GLMs with missing at random (MAR) covariates as well as not missing at random (NMAR) covariates, often referred to as nonignorable missing covariates.

In contrast, in the Bayesian literature, several analogues of Cook (1986) were developed to carry out model assessment by using either the curvature of some influence measures (Millar and Stewart (2007), Linde (2007), Lavine (1991)) or the Fréchet derivative of the posterior with respect to the prior (Dey, Ghosh and Lou (1996), Gustafson (1996a), Gustafson (1996b), Berger (1994)). Daniels and Hogan (2008) examined several global and local sensitivity methods in the Bayesian analysis of pattern mixture models (Little (1994), Andridge and Little (2011)). Recently, Zhu, Ibrahim and Tang (2011) developed a general framework of Bayesian influence analysis for assessing various perturbation schemes to the data, the prior and the sampling distribution for a class of statistical models without missing data.

The aim of this paper is to develop a formal Bayesian sensitivity analysis in statistical models with missing data. We introduce various perturbations to the modeling of the missing data mechanism, individual observations, and the prior. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We examine several influence measures for sensitivity analysis and for quantifying the effect of various perturbations to statistical models with missing data.

In this paper, we (i) develop a Bayesian perturbation manifold for a large class of statistical models with missing data; (ii) examine three Bayesian influence measures, namely the ϕ-divergence, the posterior mean distance, and the Bayes factor; and (iii) focus on assessing the missing data mechanism while simultaneously perturbing other distributional assumptions, the prior, and individual observations.

To motivate our methodology, we consider data on 1116 female sex workers in Philippine cities from a study of the relationship between Acquired Immune Deficiency Syndrome (AIDS) and the use of condoms (Morisky et al. (1998)), which is discussed in more detail in Section 3. The data contains items about knowledge of AIDS, attitude toward AIDS, belief, and self efficiency of condom use. Nine variables in the original data set (items 33, 32, 31, 43, 72, 74, 27h, 27e, and 27i in the questionnaire) were taken as responses. The primary interest here was to find how the threat of AIDS is associated with aggressiveness of the sex worker and the fear of contracting AIDS. The responses and covariates are missing at least once for 361 workers (32.35%). In Section 3, we carry out a Bayesian analysis of a structural equations model with both missing covariates and responses to analyze this data set, and present a formal Bayesian sensitivity analysis.

The rest of this paper is organized as follows. In Section 2, we construct a Bayesian perturbation manifold to characterize various perturbations to statistical models with missing data and derive its associated geometric quantities. We propose global and local influence measures to quantify the effects on posterior quantities of perturbing the missing data mechanism while simultaneously perturbing the data, the prior, and other model assumptions. In Section 3, we present simulation studies and a data analysis to illustrate the importance of the proposed method in assessing the missing data mechanism and other potential misspecifications.

2. Bayesian sensitivity analysis

2.1. Statistical models with missing data

Let z_obs = (z_1,o, . . . , z_n,o) and z_mis = (z_1,m, . . . , z_n,m) be the observed and missing data, respectively, and z_com = (z_1,c, . . . , z_n,c) = (z_mis, z_obs) be the complete data, where z_i,c = (z_i,o, z_i,m) for i = 1, . . . , n. In applications, the dimensions of z_i,c, z_i,o and z_i,m may be different across i. For instance, the number of observations may vary across clusters for clustered data.

For missing data problems, we consider a statistical model p(z_com | θ) for the complete data such that p(z_com | θ) is the product of a model for the observed data p(z_obs | θ) and a model for the missing data given the observed data p(z_mis | z_obs, θ). This class of statistical models for missing data includes generalized linear models with missing covariates and/or responses, generalized linear mixed models, nonlinear models, parametric survival models, and many others. To carry out Bayesian inference, we usually use Markov chain Monte Carlo (MCMC) methods to simulate samples from the posterior distribution of θ given the observed data

$$p(\theta \mid z_{obs}) \propto p(z_{obs} \mid \theta)\, p(\theta) \propto \int p(z_{com} \mid \theta)\, p(\theta)\, dz_{mis}. \tag{2.1}$$
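As a rough illustration of how (2.1) is sampled in practice, the sketch below runs a data-augmentation Gibbs sampler that alternates between imputing z_mis given θ and drawing θ given the completed data. The model is a toy chosen for this illustration and is not taken from the paper: z_i ~ N(θ, 1) with a flat prior on θ and an ignorable mechanism. The function name `gibbs_normal_mean` and all settings are hypothetical.

```python
import numpy as np

def gibbs_normal_mean(z_obs, n_mis, n_draws=5000, burn_in=500, seed=0):
    """Data-augmentation Gibbs sampler for theta in z_i ~ N(theta, 1).

    Toy assumptions (not from the paper): unit variance, flat prior on theta,
    and an ignorable mechanism, so that z_mis | theta ~ N(theta, 1).
    """
    rng = np.random.default_rng(seed)
    z_obs = np.asarray(z_obs, dtype=float)
    n = z_obs.size + n_mis
    theta = z_obs.mean()  # starting value
    draws = np.empty(n_draws)
    for s in range(burn_in + n_draws):
        # Imputation step: draw z_mis from p(z_mis | z_obs, theta).
        z_mis = rng.normal(theta, 1.0, size=n_mis)
        # Posterior step: p(theta | z_com) is N(mean(z_com), 1/n) under a flat prior.
        z_bar = (z_obs.sum() + z_mis.sum()) / n
        theta = rng.normal(z_bar, 1.0 / np.sqrt(n))
        if s >= burn_in:
            draws[s - burn_in] = theta
    return draws

rng = np.random.default_rng(1)
z_obs = rng.normal(2.0, 1.0, size=30)
draws = gibbs_normal_mean(z_obs, n_mis=20)
# In this toy model the target is theta | z_obs ~ N(mean(z_obs), 1/30).
print(draws.mean(), z_obs.mean())
```

In this toy setting the marginal draws of θ should match the analytic posterior given z_obs alone, which gives a simple check that integrating out z_mis by simulation works as (2.1) suggests.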

Example 1 (Missing Covariates Data). Consider n independent observations z_com = {z_i,c = (x_i, c_i, r_i, y_i), i = 1, . . . , n}, where y_i is the response variable, x_i is a p₁ × 1 vector of completely observed covariates, and c_i = (c_i,m, c_i,o) is a p₂ × 1 vector of partially observed covariates, where c_i,m and c_i,o denote the missing and observed components of c_i, respectively. Let r_i be a p₂ × 1 vector whose j^th component, r_ij, equals 1 if the j^th component of c_i, denoted by c_ij, is observed, and 0 if c_ij is missing. We assume that p(x_i, c_i, r_i, y_i|θ) = p(y_i|x_i, c_i, θ)p(x_i, c_i|θ) p(r_i|y_i, x_i, c_i, θ), where θ denotes the vector of unknown parameters. In this case, z_i,m = c_i,m and z_i,o = (x_i, c_i,o, r_i, y_i) for all i.

We assume the generalized linear model (GLM)

$$p(y_i \mid x_i, c_i, \beta, \tau) = \exp\left[a_i^{-1}(\tau)\{y_i \eta_i(\beta) - b_1(\eta_i(\beta))\} + b_2(y_i, \tau)\right] \tag{2.2}$$

for i = 1, . . . , n, where a_i(·), b₁(·), and b₂(·,·) are known functions, η_i = η(μ_i) and $\mu_i = g((x_i', c_i')\beta)$, in which g(·) is a known link function, β = (β₁, . . . , β_p)′ and p = p₁ + p₂. We assume that

$$p(x_i, c_i \mid \alpha) = p(c_{ip_2} \mid c_{i,p_2-1}, \dots, c_{i1}, x_i, \alpha_{2p_2}) \times \dots \times p(c_{i1} \mid x_i, \alpha_{21})\, p(x_i \mid \alpha_1). \tag{2.3}$$

Similarly, we model the missing-data mechanism as

$$p(r_i \mid y_i, x_i, c_i, \xi) = p(r_{ip_2} \mid r_{i,p_2-1}, \dots, r_{i1}, y_i, x_i, c_i, \xi_{p_2}) \times \dots \times p(r_{i1} \mid y_i, x_i, c_i, \xi_1). \tag{2.4}$$

To carry out a full Bayesian analysis, we need to specify a prior for θ. We can take an independent prior for θ such that p(θ) = p(τ)p(β)p(ξ)p(α). For τ and β, we can take τ ~ gamma(α₀/2, λ₀/2) and β ~ N(μ₀, Σ₀), where α₀, λ₀, μ₀ (p × 1), and Σ₀ (a p × p positive definite matrix) are pre-specified hyperparameters. Let λ_min(Σ₀) and λ_max(Σ₀) denote the smallest and largest eigenvalues of Σ₀. If λ_min(Σ₀) converges to ∞, then N(μ₀, Σ₀) tends to an improper prior. In contrast, if λ_max(Σ₀) is very small, then N(μ₀, Σ₀) tends to a strongly informative prior. For α, we can take an independent prior p(α) = p(α₁)p(α₂₁) · · · p(α_2p₂). Valid Bayesian inference about β requires an appropriate prior p(θ) and correct specification of the sampling distributions (2.2)-(2.4), so it is crucial to assess the robustness of the posterior estimate of β to both the prior and the sampling distribution. In particular, there is a growing awareness of the need for a formal method for investigating the sensitivity of inference to the missing-data mechanism (Copas and Eguchi (2005), Little and Rubin (2002), Zhu and Lee (2001), Troxel et al. (2004), Copas and Li (1997), van Steen et al. (2001), Troxel (1998), Jansen et al. (2006), Jansen et al. (2003), Verbeke et al. (2001), Shi, Zhu and Ibrahim (2009), Daniels and Hogan (2008), Ibrahim, Chen and Lipsitz (2005)).

Example 2 (Missing Response Data). We consider n independent observations z_com = {z_i,c = (x_i, r_i, y_i), i = 1, . . . , n}, where y_i = (y_i,m, y_i,o) is a p_y × 1 response vector, in which y_i,m and y_i,o denote the missing and observed components of y_i, respectively, and x_i is a p_x × 1 vector of completely observed covariates. Moreover, r_i is a p_y × 1 vector, whose j^th component, r_ij, equals 1 if the j^th component of y_i, denoted by y_ij, is observed, and 0 if y_ij is missing. It is common to model the joint distribution of (y_i, r_i) given x_i such that

$$p(y_i, r_i \mid x_i, \theta) = p(y_{i,m}, y_{i,o} \mid x_i, \theta_I)\, p(r_i \mid y_{i,m}, y_{i,o}, x_i, \theta_N), \tag{2.5}$$

where θ_I is the vector of parameters of interest and θ_N includes all parameters in the missing data mechanism p(r_i|y_i,m, y_i,o, x_i, θ_N). In this case, z_i,m = y_i,m and z_i,o = (x_i, y_i,o, r_i) for all i.

To carry out a full Bayesian analysis, we need to specify a prior for θ and the missing data mechanism. For instance, a well-known ignorability condition (Rubin, 1976) is commonly used to carry out posterior inference on θ_I without specifying the missing data mechanism. Specifically, a missing data mechanism is said to be ignorable if it is MAR, (2.5) is true and p(θ) = p(θ_I)p(θ_N). Although it is computationally easier to assume the ignorability condition, most missing data mechanisms are nonignorable (Daniels and Hogan (2008)). An alternative method for nonignorable missing data is to use the extrapolation factorization

$$p(y_i, r_i \mid x_i, \theta) = p(y_{i,m} \mid y_{i,o}, r_i, x_i, \theta_N)\, p(y_{i,o}, r_i \mid x_i, \theta_I). \tag{2.6}$$

In this case, p(y_i,m|y_i,o, r_i, x_i, θ_N) is an extrapolation model that cannot be identified from the observed data, while p(y_i,o, r_i|x_i, θ_I) is an observed data model. Here, the components in θ_N are called sensitivity parameters (Daniels and Hogan (2008)).

2.2. Bayesian Perturbation Manifold

We introduce a perturbation vector ω = ω(z_com, θ) in a set Ω to perturb the complete-data model p(z_com, θ) = p(θ)p(z_com | θ). To ensure that the perturbation ω is meaningful and sensible, we require the following. (1) p(z_com, θ | ω) is the probability density of (z_com, θ) for the perturbed model as ω varies in a set Ω; (2) There is an ω⁰ ∈ Ω such that p(z_com, θ | ω⁰) = p(z_com, θ) and p(z_obs, θ | ω⁰) = ∫ p(z_com, θ | ω⁰)dz_mis = p(z_obs, θ) for all (z, θ). The ω⁰ can be regarded as the ‘central point’ of Ω representing no perturbation. See Gustafson (2006) and Daniels and Hogan (2008) for general discussions of model expansion from a Bayesian viewpoint.

Example 1 (Continued) We are interested in perturbing the missing-data mechanism p(r_i|y_i, x_i, c_i, ξ) in (2.4). For instance, when (2.4) is assumed to be MAR, we can consider a general perturbation scheme

$$p(r_i \mid y_i, x_i, c_i, \xi, \omega) = p(r_{ip_2} \mid r_{i,p_2-1}, \dots, r_{i1}, y_i, x_i, c_i, \xi_{p_2}, \omega) \cdots p(r_{i1} \mid y_i, x_i, c_i, \xi_1, \omega), \tag{2.7}$$

where ω = (ω₁, . . . , ω_m)^T is an m×1 vector. The perturbation (2.7) is commonly used to perturb the given GLM with MAR covariates in the direction of NMAR (Shi, Zhu and Ibrahim (2009), Verbeke et al. (2001)). We can also consider the individual-specific infinitesimal perturbation (Verbeke et al. (2001), Hens et al. (2006), Jansen et al. (2006), Jansen et al. (2003))

$$p(r_i \mid y_i, x_i, c_i, \xi, \omega_i) = p(r_{ip_2} \mid r_{i,p_2-1}, \dots, r_{i1}, y_i, x_i, c_i, \xi_{p_2}, \omega_i) \cdots p(r_{i1} \mid y_i, x_i, c_i, \xi_1, \omega_i). \tag{2.8}$$

A large effect of ω_i in (2.8) can provide insight into which cases have large influence. Influence measures developed for the perturbation (2.8) are closely related to Bayesian case influence measures, such as the conditional predictive ordinate (Geisser (1993), Gelfand et al. (1992)).
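To make the perturbations (2.7) and (2.8) concrete, the following sketch codes one possible perturbed missingness model for a single partially observed covariate (p₂ = 1). The logistic form, the function name `log_p_r`, and the way ω enters the linear predictor are assumptions made for illustration only; the paper does not prescribe them. Setting ω = 0 removes the dependence on the possibly missing covariate c (MAR), while ω ≠ 0 moves the mechanism in the direction of NMAR. Passing an array for `omega` gives the case-specific scheme (2.8).

```python
import numpy as np

def log_p_r(r, y, x, c, xi, omega=0.0):
    """Log-probability of the missingness indicator r (1 = c observed).

    Hypothetical logistic form (an assumption, not the paper's model):
        logit P(r = 1) = xi[0] + xi[1] * x + xi[2] * y + omega * c.
    omega may be a scalar (scheme 2.7) or one value per case (scheme 2.8).
    """
    eta = xi[0] + xi[1] * x + xi[2] * y + omega * c
    # Stable evaluation of log expit(eta) and log(1 - expit(eta)).
    log_p1 = -np.logaddexp(0.0, -eta)
    log_p0 = -np.logaddexp(0.0, eta)
    return np.where(r == 1, log_p1, log_p0)

xi = np.array([0.5, -1.0, 0.3])
r = np.array([1, 0])
y = np.array([0.2, 1.0])
x = np.array([1.0, -1.0])
c = np.array([5.0, -3.0])

mar = log_p_r(r, y, x, c, xi)                               # omega = 0: c has no effect
nmar = log_p_r(r, y, x, c, xi, omega=np.array([0.4, 0.0]))  # perturb case 1 only
print(mar, nmar)
```

With the case-specific call, only the first case's log-probability changes, which is the sense in which ω_i isolates the influence of individual observations.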

We develop a geometric framework, called a Bayesian perturbation manifold, to delineate the effect of introducing each perturbation ω in Ω. Under some conditions, $M = \{p(z_{com}, \theta \mid \omega) : \omega \in \Omega\}$ is a Riemannian Hilbert manifold (Lang (1995)). On $M$, we consider a smooth curve C(t) given by

$$C(t) = \left\{p(z_{com}, \theta \mid \omega(t)) : [-\epsilon, \epsilon] \to M,\ C(0) = p(z_{com}, \theta \mid \omega),\ \text{and } \int \dot{\ell}(z_{com}, \theta \mid \omega(t))^2\, p(z_{com}, \theta \mid \omega(t))\, dz_{com}\, d\theta < \infty\right\}, \tag{2.9}$$

in which $\dot{\ell}(z_{com}, \theta \mid \omega(t)) = d \log p(z_{com}, \theta \mid \omega(t)) / dt$ is called the tangent (or derivative) vector. The tangent vectors for all possible curves of the form C(t) form the tangent space of $M$ at ω, denoted by $T_{\omega} M$. The inner product of any two tangent vectors v₁(ω) and v₂(ω) in $T_{\omega} M$ is given by

$$\langle v_1, v_2 \rangle(\omega) = \int v_1(\omega)\, v_2(\omega)\, p(z_{com}, \theta \mid \omega)\, dz_{com}\, d\theta. \tag{2.10}$$

It can be shown that the length of the curve C(t) from t₁ to t₂ is

$$S_C(\omega(t_1), \omega(t_2)) = \int_{t_1}^{t_2} \sqrt{\langle \dot{\ell}(z_{com}, \theta \mid \omega(t)), \dot{\ell}(z_{com}, \theta \mid \omega(t)) \rangle}\, dt. \tag{2.11}$$
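The inner product (2.10) and the curve length (2.11) can be checked numerically on a very small example. The sketch below uses a one-dimensional stand-in for the perturbed model, the location family N(ω, σ²) for a single z with no θ, which is an assumption made purely for illustration. For this family the tangent vector along ω(t) is the score times dω/dt, the squared norm is (dω/dt)²/σ², and the length of the curve from ω₁ to ω₂ is |ω₂ − ω₁|/σ. All names (`density`, `tangent`, `inner_product`, `curve_length`) are hypothetical.

```python
import numpy as np

SIGMA = 2.0
Z = np.linspace(-40.0, 40.0, 20001)  # integration grid for z
DZ = Z[1] - Z[0]

def density(omega):
    """Density of the toy perturbed model N(omega, SIGMA^2) on the grid Z."""
    return np.exp(-0.5 * ((Z - omega) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

def tangent(omega, domega_dt):
    """d/dt log p(z | omega(t)): the score in omega times d omega / dt."""
    return (Z - omega) / SIGMA**2 * domega_dt

def inner_product(v1, v2, omega):
    """Riemann-sum approximation of the inner product (2.10)."""
    return np.sum(v1 * v2 * density(omega)) * DZ

def curve_length(omega_of_t, t1, t2, n_steps=100, h=1e-5):
    """Midpoint-rule approximation of the curve length (2.11)."""
    edges = np.linspace(t1, t2, n_steps + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    dt = (t2 - t1) / n_steps
    total = 0.0
    for t in mids:
        omega = omega_of_t(t)
        domega = (omega_of_t(t + h) - omega_of_t(t - h)) / (2.0 * h)
        v = tangent(omega, domega)
        total += np.sqrt(inner_product(v, v, omega)) * dt
    return total

length = curve_length(lambda t: 3.0 * t, 0.0, 1.0)
print(length, 3.0 / SIGMA)
```

The numerical length should agree with the closed form 3/σ, reflecting that this toy manifold is flat in ω.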

A geodesic on $M$ is the direct extension of a straight line in Euclidean space. For a real function f(ω) defined on $M$, we take $df[v](\omega) = \lim_{t \to 0} t^{-1}\{f[p(z_{com}, \theta \mid \omega(t))] - f[p(z_{com}, \theta \mid \omega(0))]\}$ as the directional derivative of f at the perturbation distribution p(z_com, θ | ω) in the direction of $v(\omega) \in T_{\omega} M$. For any two smooth vector fields u(ω) and v(ω) in $T_{\omega} M$, we define the directional derivative of the vector field u(ω) at the perturbation distribution in the direction of v(ω) as $du[v](\omega) = \lim_{t \to 0} t^{-1}\{u(\omega(t)) - u(\omega(0))\}$; this derivative is called the connection. The popular Levi-Civita connection, denoted by ∇_vu(ω), is

$$du[v](\omega) - 0.5\left\{u(\omega)\, v(\omega)\, p(z_{com}, \theta \mid \omega) - \int u(\omega)\, v(\omega)\, p(z_{com}, \theta \mid \omega)\, dz_{com}\, d\theta\right\}. \tag{2.12}$$

A geodesic on the manifold $M$ is a smooth curve τ(t) = p(z_com, θ | ω(t)) on $M$ with $\dot{\ell}(z_{com}, \theta \mid \omega(t)) = v(\omega(t))$ such that ∇_vv(ω(t)) = 0. The geodesic is (locally) the shortest path between points on $M$. Finally, based on these geometric quantities of $M$, we define $(M, \langle u, v \rangle, \nabla_v u)$ as the Bayesian perturbation manifold (BPM) with the inner product ⟨u, v⟩ and the Levi-Civita connection ∇_vu.

Compared to the existing sensitivity analysis methods, a key advantage of using the BPM is that it provides a framework for quantifying simultaneous perturbations to the prior, the missing data mechanism and other distributional assumptions, and individual observations. Such simultaneous perturbations can be important, since they allow one to disentangle the uncertainty about unverifiable missing data mechanism assumptions from the misspecification of the prior and other distributional assumptions, as well as the presence of outliers. To the best of our knowledge, no methods currently exist for handling such simultaneous perturbations.

Example 1 (Continued) Consider the simultaneous perturbation model

$$p(\theta \mid \omega_\theta) \prod_{i=1}^{n} \left\{p(y_i \mid x_i, c_i, \beta, \tau, \omega_{iy})\, p(x_i, c_i \mid \alpha, \omega_{ic})\, p(r_i \mid x_i, c_i, y_i, \xi, \omega_{ir})\right\}, \tag{2.13}$$

where ω includes ω_θ and $ω_{i} = (ω_{i y}^{T}, ω_{i c}^{T}, ω_{i r}^{T})$ for all i and all components of ω are assumed to be independent of z_com and θ. The three terms on the right hand side of (2.13) are assumed to be probability densities and ω_θ, ω_iy, ω_ic, and ω_ir for all i have no components in common. In this case, the BPM is given by

$$M = \left\{p(\theta \mid \omega_\theta) \prod_{i=1}^{n} p(x_i, c_i, r_i, y_i \mid \theta, \omega_i) : (\omega_\theta, \omega_1, \dots, \omega_n) \in \Omega\right\}, \tag{2.14}$$

where p(x_i, c_i, r_i, y_i|θ, ω_i) denotes the product of the three terms on the right hand side of (2.13). Consider ω(t) as a vector of smooth functions of t and v_h = dω(0)/dt. It follows from the arguments in Zhu, Ibrahim and Tang (2011) that $T_{\omega} M$ is spanned by the functions ∂_{ω_θ} log p(θ | ω_θ), ∂_{ω_iy} log p(y_i | x_i, c_i, β, τ, ω_iy), ∂_{ω_ic} log p(x_i, c_i | α, ω_ic), and ∂_{ω_ir} log p(r_i | x_i, c_i, y_i, ξ, ω_ir), where ∂_ω = ∂/∂ω. By using the chain rule, we have

$$v(\omega(0)) = v_h^T\, \partial_\omega \ell(z_{com}, \theta \mid \omega(0)) \quad \text{and} \quad \langle v, v \rangle(\omega(0)) = v_h^T\, G(\omega(0))\, v_h, \tag{2.15}$$

where

$$G(\omega(0)) = \int [\partial_\omega \ell(z_{com}, \theta \mid \omega(0))]^{\otimes 2}\, p(z_{com}, \theta \mid \omega(0))\, dz_{com}\, d\theta \tag{2.16}$$

is the Bayesian Fisher information matrix with respect to ω (Daniels and Hogan (2008)). Geometrically, ω_θ, ω_iy, ω_ic, and ω_ir are orthogonal to each other with respect to the inner product defined in (2.10) (Cox and Reid (1987)). Similar to Zhu et al. (2007), one can easily separate out the influence of the missing data mechanism from that of the data, the prior, and other distributional assumptions.

Example 2 (Continued). The sensitivity parameters in (2.6) can be either fixed at a range of values, or assigned an appropriate distribution (Daniels and Hogan (2008)). Here we take the first approach and treat θ_N or its parametrization as a perturbation vector. Generally, we consider a simultaneous perturbation model

$$p(\theta \mid \omega_\theta) \prod_{i=1}^{n} \left\{p(y_{i,m} \mid y_{i,o}, r_i, x_i, \omega_N)\, p(y_{i,o}, r_i \mid x_i, \theta_I, \omega_I)\right\}, \tag{2.17}$$

where ω includes ω_θ, ω_N, and ω_I, which represent the perturbation vectors to the prior, the extrapolation model, and the observed data model, respectively. For simplicity, we assume that ω_θ, ω_N, and ω_I do not share any common components and are independent of z_com and finite dimensional parameters. Moreover, it is assumed that p(θ|ω_θ), p(y_i,m|y_i,o, r_i, x_i, ω_N), and p(y_i,o, r_i|x_i, θ_I, ω_I) for all i are probability densities. Generally, it is possible that ω_N and ω_I may depend on z_com and vary across i.

Consider ω(t) as a vector of smooth functions of t and v_h = dω(0)/dt. In this case, $T_{\omega} M$ is spanned by ∂_{ω_θ} log p(θ | ω_θ), $\sum_{i=1}^{n} \partial_{\omega_N} \log p(y_{i,m} \mid y_{i,o}, r_i, x_i, \omega_N)$, and $\sum_{i=1}^{n} \partial_{\omega_I} \log p(y_{i,o}, r_i \mid x_i, \theta_I, \omega_I)$. Subsequently, we can calculate the Bayesian Fisher information matrix G(ω(0)) according to (2.16). Geometrically, ω_θ, ω_N, and ω_I are also orthogonal to each other with respect to the inner product defined in (2.10) (Cox and Reid (1987)).
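When the integral in (2.16) has no closed form, one option is a Monte Carlo estimate: draw (z_com, θ) from the unperturbed joint model, evaluate the score vector ∂_ω log p(z_com, θ | ω) at ω(0), and average the outer products. The sketch below does this for a toy two-component perturbation that is assumed here for illustration and is not the paper's model: θ ~ N(ω_θ, 1) and z | θ ~ N(θ + ω_z, 1), both perturbed at 0. The function name `bayesian_fisher_information` is hypothetical. In this toy case the off-diagonal entry of G is zero, which mirrors the orthogonality of prior and sampling-model perturbations noted above.

```python
import numpy as np

def bayesian_fisher_information(scores):
    """Monte Carlo estimate of G in (2.16) as an average outer product.

    `scores` has shape (S, m): row s holds the derivative of
    log p(z_com, theta | omega) with respect to omega, evaluated at omega(0),
    for the s-th draw of (z_com, theta) from the unperturbed joint model.
    """
    scores = np.asarray(scores, dtype=float)
    return scores.T @ scores / scores.shape[0]

rng = np.random.default_rng(0)
S = 200_000
theta = rng.normal(0.0, 1.0, size=S)      # theta ~ N(omega_theta, 1) at omega_theta = 0
z = theta + rng.normal(0.0, 1.0, size=S)  # z | theta ~ N(theta + omega_z, 1) at omega_z = 0

# Scores at omega = 0: d/d omega_theta gives theta, d/d omega_z gives z - theta.
scores = np.column_stack([theta, z - theta])
G = bayesian_fisher_information(scores)
print(G)
```

For this toy model the exact answer is the 2 × 2 identity matrix, so the printed estimate gives a quick sanity check on the estimator.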

2.3. Intrinsic influence measures

As the purpose of a sensitivity analysis is to assess the uncertainty of the parameter of interest as ω varies in Ω given the data at hand, we take an intrinsic influence measure (IFM) to be a functional of p(θ | z_obs, ω) as ω varies in Ω, where p(θ | z_obs, ω) is the perturbed posterior distribution of θ given z_obs and ω. Generally, let IF(ω) = IF(p(θ | z_obs, ω)) be the IFM. Three common intrinsic influence measures are the ϕ-divergence function, the posterior mean, and the Bayes factor (Kass et al. (1989), Kass and Raftery (1995)).

For the missing data mechanism, one can fix an ω₀ ∈ Ω corresponding to MAR and then develop a relative intrinsic influence measure (RIFM) as a functional of p(θ | z_obs, ω) and p(θ | z_obs, ω₀),

$$RI(\omega, \omega_0) = RI(p(\theta \mid z_{obs}, \omega), p(\theta \mid z_{obs}, \omega_0)). \tag{2.18}$$

For instance, RI(ω, ω₀) can be the total variation distance between p(θ | z_obs, ω₀) and p(θ | z_obs, ω) (Dey, Ghosh and Lou (1996)). Alternatively, one can take RI(ω, ω₀) = IF(ω) – IF(ω₀), the difference between the IFMs at ω and ω₀. See more examples in Section 2.5.

We also suggest rescaling RI(ω, ω₀) by using the minimal geodesic distance between p(z_com, θ | ω) and p(z_com, θ | ω₀), g(ω, ω₀), on the BPM $M$ . Thus, we define the intrinsic influence measure for comparing p(θ | z_obs, ω) to p(θ | z_obs, ω₀) as

$$IGI_{RI}(\omega, \omega_0) = \frac{RI(\omega, \omega_0)^2}{g(\omega, \omega_0)^2}. \tag{2.19}$$

The proposed IGI_RI(ω, ω₀) can be interpreted as the ratio of the change of the objective function to the minimal distance between p(z_com, θ | ω) and p(z_com, θ | ω₀) on $M$. In practice, one can identify the most influential ω in Ω, denoted by $\hat{\omega}_I$, which maximizes IGI_RI(ω, ω₀) over all ω ∈ Ω.
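For a low-dimensional ω, the maximization of IGI_RI can be approximated by a plain grid search once RI and the geodesic distance g are available. The sketch below assumes both are supplied as callables; the toy choices of `ri` and `g` (a saturating influence measure on a flat one-dimensional manifold) and the names `igi` and `most_influential` are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def igi(ri_value, geodesic_distance):
    """Intrinsic global influence (2.19): squared change per squared geodesic distance."""
    return (ri_value / geodesic_distance) ** 2

def most_influential(omega_grid, ri, g, omega0):
    """Grid search for the omega that maximizes IGI_RI(omega, omega0)."""
    omega_grid = np.asarray(omega_grid, dtype=float)
    values = np.array([igi(ri(w, omega0), g(w, omega0)) for w in omega_grid])
    best = int(np.argmax(values))
    return omega_grid[best], values[best]

# Hypothetical one-dimensional toy (not from the paper).
ri = lambda w, w0: (w - w0) ** 2 / (1.0 + (w - w0) ** 2)
g = lambda w, w0: abs(w - w0)

grid = np.linspace(0.01, 3.0, 300)  # excludes omega0 = 0, where g vanishes
omega_hat, igi_max = most_influential(grid, ri, g, omega0=0.0)
print(omega_hat, igi_max)
```

In this toy case IGI equals ω²/(1 + ω²)², so the search should return a maximizer near ω = 1. The grid deliberately excludes ω₀, where the ratio in (2.19) is undefined.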

We consider the local behavior of RI(ω(t), ω₀) as t approaches zero along all possible smooth curves p(z_com, θ | ω(t)) passing through ω(0) = ω₀. Since RI(ω(t), ω₀) is a function from R to R, it follows from a Taylor series expansion that

$$RI(\omega(t), \omega_0) = RI(\omega(0), \omega_0) + \partial RI(\omega(0))\, t + 0.5\, \partial^2 RI(\omega(0))\, t^2 + o(t^2),$$

where ∂RI(ω(0)) and ∂²RI(ω(0)) denote the first- and second-order derivatives of RI(ω(t), ω₀) with respect to t evaluated at t = 0. We need to distinguish the case in which ∂RI(ω(0)) ≠ 0 for some smooth curves ω(t) from the case in which ∂RI(ω(0)) = 0 for all smooth curves ω(t). In the first case, ∂RI(ω(0)) = d(RI)[v](ω(0)) is the directional derivative of RI in the direction of $v \in T_{\omega(0)} M$ (Lang (1995)). The first-order local influence measure is defined as

$$FI_{RI}[v](\omega(0)) = \lim_{t \to 0} IGI_{RI}(\omega(0), \omega(t)) = \frac{\{d(RI)[v](\omega(0))\}^2}{\langle v, v \rangle(\omega(0))}. \tag{2.20}$$

We use the tangent vector v_FI,max in $T_{\omega(0)} M$ that maximizes FI_RI[v](ω(0)) to carry out a sensitivity analysis.
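For a finite-dimensional ω, the maximizing direction has a closed form. The sketch below assumes, following the chain rule in (2.15), that the directional derivative can be written as d(RI)[v] = ∇RIᵀv_h for a gradient vector ∇RI with respect to ω, and that ⟨v, v⟩ = v_hᵀGv_h. Under that assumption the Cauchy-Schwarz inequality gives the maximizer v_h ∝ G⁻¹∇RI with maximum value ∇RIᵀG⁻¹∇RI. The function name `first_order_influence` and the numbers in the example are hypothetical.

```python
import numpy as np

def first_order_influence(grad, G):
    """Maximize FI[v] = (grad' v_h)^2 / (v_h' G v_h) over directions v_h.

    Assumes grad is nonzero (the case dRI != 0) and G is positive definite.
    Returns the maximizing direction, scaled so that v_h' G v_h = 1,
    together with the maximum value grad' G^{-1} grad.
    """
    grad = np.asarray(grad, dtype=float)
    G = np.asarray(G, dtype=float)
    direction = np.linalg.solve(G, grad)  # proportional to G^{-1} grad
    fi_max = float(grad @ direction)
    v_max = direction / np.sqrt(fi_max)
    return v_max, fi_max

G = np.array([[2.0, 0.5], [0.5, 1.0]])
grad = np.array([1.0, -1.0])
v_max, fi_max = first_order_influence(grad, G)
print(v_max, fi_max)
```

Large absolute components of the returned direction flag the components of ω to which the chosen influence measure is locally most sensitive.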

For the case ∂RI(ω(0)) = 0, we use ∂²RI(ω(0)) to assess the second-order local influence of ω to a statistical model (Zhu et al. (2007)). The second-order influence measure in the direction $v \in T_{ω (0)} M$ is defined as

$$SI_{RI}[v](\omega(0)) = \frac{\partial^2 RI(\omega(0))}{\langle v, v \rangle(\omega(0))}. \tag{2.21}$$

Geometrically, SI_RI[v](ω(0)) is invariant to scalar transformations and smooth transformations. To carry out a sensitivity analysis, we use the tangent vector v_S,max in $T_{ω (0)} M$ that maximizes SI_RI[v](ω(0)) for all $v \in T_{ω (0)} M$ .
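In the finite-dimensional case, and under the additional assumption (made here for illustration) that ∂²RI(ω(0)) can be written as a quadratic form v_hᵀHv_h for some symmetric matrix H, maximizing (2.21) is a generalized eigenvalue problem for the pair (H, G). The sketch below solves it with a Cholesky factor of G so that only NumPy is needed. The function name `second_order_influence` and the example matrices are hypothetical.

```python
import numpy as np

def second_order_influence(H, G):
    """Maximize SI[v] = (v_h' H v_h) / (v_h' G v_h) over directions v_h.

    Assumes H is symmetric and G is symmetric positive definite.
    Returns the maximizing direction, scaled so that v_h' G v_h = 1,
    together with the largest generalized eigenvalue of (H, G).
    """
    H = np.asarray(H, dtype=float)
    G = np.asarray(G, dtype=float)
    L = np.linalg.cholesky(G)            # G = L L'
    L_inv = np.linalg.inv(L)
    A = L_inv @ H @ L_inv.T              # same Rayleigh quotients after v_h = L^{-T} u
    A = 0.5 * (A + A.T)                  # remove rounding asymmetry
    eigvals, eigvecs = np.linalg.eigh(A) # eigenvalues in ascending order
    u = eigvecs[:, -1]
    v_max = np.linalg.solve(L.T, u)
    return v_max, float(eigvals[-1])

H = np.array([[1.0, 0.2], [0.2, 0.5]])
G = np.array([[2.0, 0.5], [0.5, 1.0]])
v_smax, si_max = second_order_influence(H, G)
print(v_smax, si_max)
```

Because the quotient is unchanged when v_h is rescaled, the sign and length of the returned direction are a normalization choice, and only the relative sizes of its components matter for diagnosis.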

2.4. Bayesian Sensitivity Analysis

Our sensitivity analysis consists of four steps.

Step 1. Introduce a Bayesian perturbation manifold based on p(z_com, θ | ω).
Step 2. Calculate the geometric metric ⟨v, v⟩(ω₀) of the perturbation manifold.
Step 3. Choose an intrinsic influence measure IF(ω). If ∂RI(ω(0)) ≠ 0, then we calculate v_FI,max to assess local influence of minor perturbations to the model. If ∂RI(ω(0)) = 0, then we compute v_S,max. We inspect v_FI,max (or v_S,max) in order to detect the most influential components of ω.
Step 4. For the most influential subcomponents of ω, we calculate IGI_RI(ω, ω₀) and $\hat{\omega}_I = \arg\max_{\omega \in \Omega} IGI_{RI}(\omega, \omega_0)$.

In practice, we iteratively perform the four-step influence analysis described above. We start with a simultaneous perturbation to z_com, p(θ) and p(z_com|θ). We choose a set of parametric perturbations characterized by a finite dimensional ω such that the perturbed model is large enough to cover a large class of candidate models for the data set. With parametric perturbations, it is computationally simple to carry out the Bayesian sensitivity analysis, and a perturbation model with a large number of perturbations can approximate most interesting perturbation models. We start with a local influence analysis to examine the sensitivity of all components and then focus on a few influential components using an intrinsic influence analysis. For instance, if a few influential hyper-parameters of the prior are identified, one further perturbs their associated prior distribution using the additive ε-contamination class and then carries out an intrinsic influence analysis. After combining the information learned from the influence analysis, we might choose a new sampling distribution and/or a new prior. This procedure can be repeated until a satisfactory model is reached.

2.5. Examples of Bayesian influence measures

We focus on assessing the influence of a perturbation scheme ω on the posterior distribution based on the ϕ-divergence, the posterior mean distance, and the Bayes factor. The Bayes factor, the ϕ-divergence, and the posterior mean distance quantify the effects of introducing ω on the overall assumed model, on the overall posterior distribution, and on the posterior mean of θ, respectively. Since the Bayes factor measures the overall difference between p(z_obs|ω) and p(z_obs|ω₀), it can be more sensitive to discrepancies between the assumed model and the observed data. The ϕ-divergence measures the overall difference between p(z_mis, θ|z_obs, ω) and p(z_mis, θ|z_obs, ω₀), which may involve the mean, the median, and other features, so it can be more sensitive to general changes in the posterior distribution, whereas the posterior mean distance is more sensitive to a subtle change in the posterior mean.

Example 3 (Bayes factor). The logarithm of the Bayes factor for comparing ω with ω₀ is

$$\begin{aligned} BF(\omega, \omega_0) &= \log p(z_{obs} \mid \omega) - \log p(z_{obs} \mid \omega_0) \\ &= \log \int p(z_{com} \mid \theta, \omega)\, p(\theta \mid \omega)\, dz_{mis}\, d\theta - \log \int p(z_{com} \mid \theta)\, p(\theta)\, dz_{mis}\, d\theta. \end{aligned}$$

The value of BF(ω, ω₀) can be regarded as a statistic for testing hypotheses of ω against ω₀ (Kass and Raftery (1995)). Under some smoothness conditions, BF(ω, ω₀) is a continuous map from $M$ to R.

We set RI(ω, ω₀) = BF(ω, ω₀) and let ω(t) be a smooth curve on $M$ with ω(0) = ω₀ and $d_t \log p(z_{com}, \theta \mid \omega(t))|_{t=0} = v(\omega_0) \in T_{\omega(0)} M$, where d_t = d/dt. It can be shown that

$$\partial RI(\omega(0)) = E\{d_t \log p(z_{com}, \theta \mid \omega(t)) \mid z_{obs}, \omega(t)\}\big|_{t=0},$$

where the conditional expectation is taken with respect to p(z_mis, θ | z_obs, ω(t)). We can use MCMC methods to draw samples $\{(\theta^{(s)}, z_{mis}^{(s)}) : s = 1, \dots, S_0\}$ from p(z_mis, θ | z_obs) and then approximate ∂RI(ω(0)) by using $S_0^{-1} \sum_{s=1}^{S_0} d_t \log p(z_{obs}, z_{mis}^{(s)}, \theta^{(s)} \mid \omega_0)$.
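The Monte Carlo approximation just described amounts to averaging a derivative over posterior draws. The sketch below illustrates it under toy assumptions that are not from the paper: one observed value z_obs and one missing value z_mis, both N(θ, 1), with a prior θ ~ N(ω(t), 1) perturbed along ω(t) = t. In this toy case exact posterior draws stand in for MCMC output, the derivative in t is taken by central finite differences, and the exact answer is z_obs/2. The names `d_t_log_joint`, `bayes_factor_first_derivative`, and `log_joint` are hypothetical.

```python
import numpy as np

def d_t_log_joint(log_joint, z_mis, theta, h=1e-5):
    """Central finite difference of t -> log p(z_obs, z_mis, theta | omega(t)) at t = 0."""
    return (log_joint(z_mis, theta, h) - log_joint(z_mis, theta, -h)) / (2.0 * h)

def bayes_factor_first_derivative(log_joint, z_mis_draws, theta_draws):
    """Average of d_t log p over draws from the unperturbed posterior."""
    return float(np.mean(d_t_log_joint(log_joint, z_mis_draws, theta_draws)))

Z_OBS = 1.2  # single observed value in the toy model

def log_joint(z_mis, theta, t):
    """Toy log joint density up to a constant: prior N(t, 1), data N(theta, 1)."""
    log_prior = -0.5 * (theta - t) ** 2
    log_lik = -0.5 * (Z_OBS - theta) ** 2 - 0.5 * (z_mis - theta) ** 2
    return log_prior + log_lik

rng = np.random.default_rng(0)
S0 = 100_000
# Exact draws from p(theta | z_obs) = N(z_obs / 2, 1/2), then z_mis | theta ~ N(theta, 1).
theta_draws = rng.normal(Z_OBS / 2.0, np.sqrt(0.5), size=S0)
z_mis_draws = rng.normal(theta_draws, 1.0)

d_ri = bayes_factor_first_derivative(log_joint, z_mis_draws, theta_draws)
print(d_ri, Z_OBS / 2.0)
```

When analytic derivatives of the log joint density are available they are preferable to finite differences; the finite-difference version is used here only to keep the sketch generic.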

We consider a simultaneous perturbation to both the prior and the sampling distribution. We have

$$FI_{BF}[v](\omega(0)) = \frac{[E\{d_t \log p(z_{com}, \theta \mid \omega(0)) \mid z_{obs}, \omega_0\}]^2}{\langle v, v \rangle(\omega_0)}.$$

For instance, for the perturbation to the prior given by p(θ; t) = p(θ) + t{g(θ) – p(θ)}, it can be shown that

$$FI_{BF}[v](\omega(0)) = \frac{[E\{g(\theta)/p(\theta) \mid z_{obs}\}]^2}{\mathrm{var}_P\{g(\theta)/p(\theta)\}} = \frac{\{p_g(z_{obs})/p(z_{obs})\}^2}{\mathrm{var}_P\{g(\theta)/p(\theta)\}},$$

where p(z_obs) = ∫ p(z_com; θ)p(θ)dz_misdθ and p_g(z_obs) = ∫ p(z_com; θ)g(θ)dz_misdθ. Since the ratio of p_g(z_obs) to p(z_obs) is the Bayes factor in favor of g(θ) against p(θ), the first-order local influence measure is the square of the normalized Bayes factor of g(θ) against p(θ).

Example 4 (ϕ–divergence). The ϕ–divergence between two posterior distributions for ω₀ and ω is

$$\Phi_{RI}(\omega, \omega_0) = \int \phi(R(z_{mis}, \theta \mid \omega, \omega_0))\, p(z_{mis}, \theta \mid z_{obs}, \omega_0)\, dz_{mis}\, d\theta,$$

where R(z_mis,θ | ω, ω₀) = p(z_mis, θ | z_obs, ω)/p(z_mis, θ | z_obs, ω₀) and ϕ(·) is a convex function with ϕ(1) = 0, such as the Kullback-Leibler divergence or the χ²-divergence (Kass et al. (1989)).

We set RI(ω, ω₀) = Φ_RI(ω, ω₀) and let ω(t) be a smooth curve on $M$ with ω(0) = ω₀ and $d_t \log p(z_{com}, \theta \mid \omega(t))|_{t=0} = v(\omega_0) \in T_{\omega(0)} M$. It can be shown that ∂RI(ω(0)) = 0 and

$$\partial^2 RI(\omega(0)) = \ddot{\phi}(1) \int [d_t \log p(z_{mis}, \theta \mid z_{obs}, \omega(t))]^2\, p(z_{mis}, \theta \mid z_{obs}, \omega_0)\, dz_{mis}\, d\theta\, \Big|_{t=0},$$

where $\ddot{\phi}(t) = d^2 \phi(t) / dt^2$. For computation, note that $d_t \log p(z_{mis}, \theta \mid z_{obs}, \omega(t))$ evaluated at t = 0 equals

$$d_t \log p(z_{com}, \theta \mid \omega(0)) - \int [d_t \log p(z_{com}, \theta \mid \omega(0))]\, p(z_{mis}, \theta \mid z_{obs}, \omega_0)\, dz_{mis}\, d\theta.$$

In practice, we use MCMC methods to draw samples $\{(\theta^{(s)}, z_{mis}^{(s)}) : s = 1, \dots, S_0\}$ from p(θ, z_mis | z_obs, ω₀) and then approximate ∂²RI(ω(0)) using

$$\ddot{\phi}(1)\, S_0^{-1} \sum_{s=1}^{S_0} \left[d_t \log p(z_{mis}^{(s)}, z_{obs}, \theta^{(s)} \mid \omega(0)) - S_0^{-1} \sum_{s'=1}^{S_0} d_t \log p(z_{mis}^{(s')}, z_{obs}, \theta^{(s')} \mid \omega(0))\right]^2.$$
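The approximation above is the sample variance (with divisor S₀) of the per-draw derivatives, scaled by the second derivative of ϕ at 1. A minimal sketch follows, assuming the derivatives d_t log p have already been evaluated at each posterior draw. The toy check reuses a prior-mean shift in a normal model (an illustrative assumption), for which the derivative at each draw equals θ and the posterior variance is 1/2; for the Kullback-Leibler choice ϕ(u) = u log u the scaling factor is ϕ̈(1) = 1. The function name `phi_divergence_second_derivative` is hypothetical.

```python
import numpy as np

def phi_divergence_second_derivative(d_t_log_p, phi_dd1=1.0):
    """Sample version of the second derivative of the phi-divergence at t = 0.

    d_t_log_p: derivatives of log p(z_mis, z_obs, theta | omega(t)) at t = 0,
               one value per posterior draw.
    phi_dd1:   second derivative of phi at 1 (equal to 1 for phi(u) = u log u).
    """
    d = np.asarray(d_t_log_p, dtype=float)
    return phi_dd1 * float(np.mean((d - d.mean()) ** 2))

rng = np.random.default_rng(0)
# Toy stand-in for MCMC output: theta | z_obs ~ N(0.6, 1/2), derivative equal to theta.
theta_draws = rng.normal(0.6, np.sqrt(0.5), size=100_000)
d2_ri = phi_divergence_second_derivative(theta_draws)
print(d2_ri)
```

Centering by the sample mean implements the subtraction of the posterior expectation, so any additive constant in the per-draw derivatives has no effect on the result.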

For perturbation schemes to the prior distribution, it can be shown that

$$\langle v, v \rangle(\omega(0)) = \int [d_t \log p(\theta \mid \omega(0))]^2\, p(\theta \mid \omega(0))\, d\theta$$

and $\partial^2 RI(\omega(0)) = \ddot{\phi}(1)\, \mathrm{var}[d_t \log p(\theta \mid \omega(0)) \mid z_{obs}, \omega_0]$, which are, respectively, the Fisher information matrices of ω(t) based on the prior and posterior distributions, where var(· | z_obs, ω₀) denotes the posterior variance. For instance, for p(θ | ω(t)) = p(θ) + t{g(θ) – p(θ)}, we can show that

$$SI_{\Phi_{RI}}[v](\omega(0)) = \frac{\ddot{\phi}(1)\, \mathrm{var}\{g(\theta)/p(\theta) \mid z_{obs}\}}{\mathrm{var}_P\{g(\theta)/p(\theta)\}},$$

where var_P(·) denotes the prior variance.

Example 5 (Posterior mean distance). We measure the distance between the posterior means of h(θ) for ω₀ and ω (Kass et al. (1989), Gustafson (1996b)). The posterior mean of h(θ) after introducing ω is

$$M_h(\omega) = \int h(\theta)\, p(z_{mis}, \theta \mid z_{obs}, \omega)\, dz_{mis}\, d\theta.$$

Cook's posterior mean distance for characterizing the influence of ω is then

{CM}_{h} (ω, ω_{0}) = {M_{h} (ω) - M_{h} (ω_{0})}^{T} G_{h} {M_{h} (ω) - M_{h} (ω_{0})},

(2.22)

where G_h is a positive definite matrix. Henceforth, G_h is the inverse of the posterior covariance matrix of h(θ) for p(θ | z_obs, ω₀).
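Definition (2.22) can be evaluated directly from two sets of MCMC output. The sketch below is one possible implementation, assuming `h_draws_omega` and `h_draws_omega0` are S × d arrays of h(θ) sampled under ω and ω₀ respectively (hypothetical names), with G_h taken as the inverse sample covariance under ω₀.

```python
import numpy as np

def cook_posterior_mean_distance(h_draws_omega, h_draws_omega0):
    """Cook's posterior mean distance CM_h(omega, omega0) from posterior draws.

    Both inputs are 2-D arrays (draws x dimension of h); hypothetical inputs.
    """
    h1 = np.asarray(h_draws_omega, dtype=float)
    h0 = np.asarray(h_draws_omega0, dtype=float)
    diff = h1.mean(axis=0) - h0.mean(axis=0)        # M_h(omega) - M_h(omega0)
    cov0 = np.atleast_2d(np.cov(h0, rowvar=False))  # posterior covariance under omega0
    # G_h is the inverse covariance; solve avoids forming the inverse explicitly
    return float(diff @ np.linalg.solve(cov0, diff))
```

Because G_h is the inverse posterior covariance, the distance is unchanged by invertible linear reparameterizations of h(θ).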

We set RI(ω, ω₀) = CM_h(ω, ω₀), where ω(t) is a smooth curve on $M$ with ω(0) = ω₀ and $d_{t} \log p (z_{c o m}, θ ∣ ω (t)) ∣_{t = 0} = v (ω_{0}) \in T_{ω (0)} M$ . It can be shown that ∂RI(ω(0)) = 0 and $\partial^{2} RI (ω (0)) = {\dot{M}}_{h} (v)^{T} G_{h} {\dot{M}}_{h} (v)$ , where

{\dot{M}}_{h} (v) = d_{t} M_{h} (ω (0)) = Cov {h (θ), d_{t} log p (z_{c o m}, θ ∣ ω (t)) ∣ z_{o b s}, ω^{0}} ∣_{t = 0} .

We can use MCMC methods to approximate ${\dot{M}}_{h} (v)$ and G_h.
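A minimal sketch of this approximation is given below. It assumes an S × d array `h_draws` of h(θ^(s)) and a length-S array `score_draws` of d_t log p(z_com^(s), θ^(s) | ω(t)) at t = 0, both computed from posterior draws under ω₀; these names are hypothetical. The posterior covariance between h(θ) and the score estimates the derivative of M_h, and the inverse sample covariance of h(θ) estimates G_h.

```python
import numpy as np

def local_cm_curvature(h_draws, score_draws):
    """Estimate the quadratic form M_dot^T G_h M_dot from posterior draws."""
    h = np.asarray(h_draws, dtype=float)        # shape (S, d)
    s = np.asarray(score_draws, dtype=float)    # shape (S,)
    n_draws = h.shape[0]
    hc = h - h.mean(axis=0)                     # centered h(theta)
    sc = s - s.mean()                           # centered score
    m_dot = hc.T @ sc / n_draws                 # Cov{h(theta), score | z_obs}
    cov_h = hc.T @ hc / n_draws                 # posterior covariance of h(theta)
    return float(m_dot @ np.linalg.solve(cov_h, m_dot))
```

When the score is an exact linear function a^T h(θ) of h, the estimate reduces to a^T Cov(h) a, which gives a simple sanity check.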

2.6. A Simple Theoretical Example

We consider a simple example involving missing responses (Daniels and Hogan (2008)). Consider a data set z_com = ((y₁, r₁), · · · , (y_n, r_n))^T, where r_i = 1 if y_i is observed and 0 if y_i is missing. We focus on perturbing the missing-data mechanism.

First, we fit a pattern mixture model for (y_i, r_i) such that

y_{i} ∣ r_{i} = 1 \sim N (μ_{1}, σ^{2}), y_{i} ∣ r_{i} = 0 \sim N (μ_{0}, σ^{2}), r_{i} \sim B e r (ϕ) .

(2.23)

Model (2.23) assumes that the observed and missing responses differ in their mean but share the same variance. Since the observed data do not contain any information on μ₀, we assume μ₀ = μ₁ + ω_μ.

Here ω_μ can be regarded as a perturbation and θ = (μ₁, σ², ϕ). The complete-data likelihood function is

p (z_{c o m}, θ ∣ ω_{μ}) = ϕ^{\sum_{i} r_{i}} {(1 - ϕ)}^{n - \sum_{i} r_{i}} {\prod_{i, r_{i} = 0} p (y_{i}; μ_{1} + ω_{μ}, σ^{2})} {\prod_{i, r_{i} = 1} p (y_{i}; μ_{1}, σ^{2})},

where p(y|μ, σ²) denotes the normal density function. Regardless of the prior for θ, it can be shown that $G (ω_{μ}) = \sum_{i = 1}^{n} (1 - r_{i}) ∕ σ^{2}$ , which is independent of ω_μ, and thus $M$ is flat and g(ω_μ,1, ω_μ,2) = c|ω_μ,1 – ω_μ,2|, where c is a scalar (Zhu et al. (2007)). Moreover, since the observed-data likelihood function ∫ p(z_com, θ|ω_μ)dz_mis does not depend on ω_μ, all IFs and IFMs based on p(θ|z_obs, ω_μ) are zero. This indicates that varying ω_μ does not influence the posterior inferences on θ given z_obs. Instead, if we consider the posterior mean μ₁ + (1 – ϕ)ω_μ, the mean of y_i, as the influence measure, then we have

\begin{matrix} IF (ω_{μ}) = E [μ_{1} + (1 - ϕ) ω_{μ} ∣ z_{o b s}] = E [μ_{1} ∣ z_{o b s}] + {1 - E [ϕ ∣ z_{o b s}]} ω_{μ}, \\ {IGI}_{R I} (ω_{μ, 1}, ω_{μ, 2}) = \frac{{[IF (ω_{μ, 1}) - IF (ω_{μ, 2})]}^{2}}{g {(ω_{μ, 1}, ω_{μ, 2})}^{2}} = \frac{{1 - E [ϕ ∣ z_{o b s}]}^{2} σ^{2}}{\sum_{i = 1}^{n} (1 - r_{i})}, \end{matrix}

where E[·|z_obs] denotes the expectation taken with respect to p(θ|z_obs). In this case, IF(ω_μ) does not belong to any of the three Bayesian influence measures considered in Section 2.4, but our invariant influence measure is applicable. Moreover, the constant IGI_RI(ω_μ,1, ω_μ,2) indicates that any inference about the mean of y_i is completely driven by the assumptions regarding the size of ω_μ.
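The constancy of the intrinsic measure in the pattern-mixture case can be checked numerically. The sketch below assumes the scalar c in the geodesic distance equals the square root of G(ω_μ) = Σ(1 − r_i)/σ², which is what the displayed IGI formula implies; the function names and the numeric inputs in the usage note are illustrative only.

```python
import math

def influence_function(omega_mu, post_mean_mu1, post_mean_phi):
    """IF(omega_mu) = E[mu_1 | z_obs] + {1 - E[phi | z_obs]} * omega_mu."""
    return post_mean_mu1 + (1.0 - post_mean_phi) * omega_mu

def geodesic_distance(omega_a, omega_b, n_missing, sigma2):
    """Distance on the flat manifold: sqrt(G) * |omega_a - omega_b|."""
    return math.sqrt(n_missing / sigma2) * abs(omega_a - omega_b)

def igi(omega_a, omega_b, post_mean_mu1, post_mean_phi, n_missing, sigma2):
    """Squared change in IF divided by the squared geodesic distance."""
    num = (influence_function(omega_a, post_mean_mu1, post_mean_phi)
           - influence_function(omega_b, post_mean_mu1, post_mean_phi)) ** 2
    return num / geodesic_distance(omega_a, omega_b, n_missing, sigma2) ** 2
```

For any pair of distinct perturbation values, for instance `igi(0.5, 2.0, 1.0, 0.7, 30, 4.0)` and `igi(-1.0, 3.0, 1.0, 0.7, 30, 4.0)`, the function returns the same number, {1 − E[ϕ | z_obs]}² σ² / Σ(1 − r_i).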

Second, we fit a selection model for (y_i, r_i) such that

y_{i} \sim N (μ_{1}, σ^{2}), r_{i} \sim B e r (ϕ_{i}) with logit (ϕ_{i}) = ξ_{1} + ω_{ξ} y_{i},

(2.24)

where logit(·) denotes the logit function. In (2.24), ω_ξ = 0 corresponds to MAR, whereas ω_ξ ≠ 0 corresponds to NMAR. In this case, ω_ξ can be regarded as a perturbation and θ = (μ₁, σ², ξ₁). The complete-data likelihood function is

p (z_{c o m}, θ ∣ ω_{ξ}) = \prod_{i = 1}^{n} ϕ_{i}^{r_{i}} {(1 - ϕ_{i})}^{1 - r_{i}} p (y_{i}; μ_{1}, σ^{2}) .

If p(θ) is the prior for θ, it can be shown that

G (ω_{ξ}) = n \int y^{2} \frac{exp (ξ_{1} + ω_{ξ} y)}{{[1 + exp (ξ_{1} + ω_{ξ} y)]}^{2}} p (y; μ_{1}, σ^{2}) p (θ) d y d θ,

which does not have a simple form. Moreover, since the observed-data likelihood function ∫ p(z_com, θ|ω_ξ)dz_mis does depend on ω_ξ, all IFs and IFMs based on p(θ|z_obs, ω_ξ) can be numerically calculated according to the formulas given in Sections 2.3-2.4. Thus, in the selection model, varying ω_ξ generally does influence the posterior inferences about θ given z_obs.
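Although G(ω_ξ) has no closed form, the inner integral over y is easy to evaluate numerically. The sketch below computes it for a single fixed θ = (μ₁, σ², ξ₁) with Gauss-Hermite quadrature from NumPy; averaging the result over draws of θ from the prior p(θ) would complete the calculation. The function name and the choice of 60 nodes are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def metric_tensor_fixed_theta(omega_xi, xi1, mu1, sigma2, n, n_nodes=60):
    """n * integral of y^2 * e^eta / (1 + e^eta)^2 * N(y; mu1, sigma2) dy,
    with eta = xi1 + omega_xi * y, for one fixed value of theta."""
    nodes, weights = hermegauss(n_nodes)      # weight function exp(-z^2 / 2)
    y = mu1 + np.sqrt(sigma2) * nodes         # map standard normal nodes to y
    eta = xi1 + omega_xi * y
    p = 1.0 / (1.0 + np.exp(-eta))            # logistic probability phi_i
    integrand = y ** 2 * p * (1.0 - p)        # e^eta / (1 + e^eta)^2 equals p(1 - p)
    return n * np.sum(weights * integrand) / np.sqrt(2.0 * np.pi)
```

At ω_ξ = 0 the logistic factor is constant, so the value reduces to n p(1 − p)(μ₁² + σ²) with p = exp(ξ₁)/{1 + exp(ξ₁)}, which provides a check on the quadrature.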

3. Simulation Study

We consider a two-level model. We assume that data are obtained from N individuals nested within J groups, with group j containing n_j individuals, where $N = \sum_{j = 1}^{J} n_{j}$ . The level-1 units are the individuals and the level-2 units are the groups. At level-1, for each group j (j = 1, . . . , J), the within-group model is given by

y_{i j} = x_{i j}^{T} β_{j} + ε_{i j}, i = 1, \dots, n_{j},

(3.1)

where y_ij is the outcome variable, x_ij is a q-vector with explanatory variables (including a constant), β_j is a q-vector of regression coefficients, and ε_ij is the residual. At level-2, we further assume β_j to be a vector of random regression coefficients,

β_{j} = Z_{j} γ + u_{j},

(3.2)

where Z_j is a q × r matrix with explanatory variables (including a constant) obtained at the group level, γ is a r-vector containing fixed coefficients, and u_j is a q-vector of residuals. Assume that u_j is independent of ε_ij, u_j ~ N_q(0, Σ), and $ε_{i j} \sim N (0, σ_{ε}^{2})$ . We assume that the covariates x_ij and Z_j are completely observed for i = 1, . . . , n_j and j = 1, . . . , J, but the responses y_ij may be missing.
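Substituting (3.2) into (3.1) gives the combined form of the model, which makes the implied marginal distribution of each group explicit. The worked step below introduces the stacked notation y_j, X_j, and ε_j, which is not used in the original text: X_j denotes the n_j × q matrix whose rows are x_ij^T.

```latex
\begin{align*}
y_{j} &= X_{j} Z_{j} \gamma + X_{j} u_{j} + \varepsilon_{j},
  \qquad X_{j} = (x_{1j}, \dots, x_{n_{j} j})^{T}, \\
E(y_{j}) &= X_{j} Z_{j} \gamma, \\
\mathrm{Cov}(y_{j}) &= X_{j} \Sigma X_{j}^{T} + \sigma_{\varepsilon}^{2} I_{n_{j}} .
\end{align*}
```

Since u_j and ε_j are independent and normal, y_j is marginally normal with this mean and covariance, and observations in different groups are independent.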

We simulated a data set according to (3.1)-(3.2). We set J = 100, q = 2, and r = 3, and then we chose varying values of n_j in order to create a scenario with different cluster sizes. Specifically, we set n₁ = . . . = n₁₀ = 3, n₉₁ = . . . = n₁₀₀ = 20, and n_j ∈ {5, 7, 8, 10, 12, 13, 15, 17} for j = 11, . . . , 90. We independently generated all components (except the intercept) of x_ij and Z_j as U(0, 1). We assumed that the y_ij's were missing at random (MAR) with missing data mechanism

\Pr (r_{i j} = 1 ∣ x_{i j}, φ) = \frac{exp (φ_{0} + φ_{x}^{T} x_{i j})}{1 + exp (φ_{0} + φ_{x}^{T} x_{i j})},

(3.3)

where φ = (φ₀, φ_x), r_ij = 1 if y_ij is missing and r_ij = 0 if y_ij is observed. We set φ₀ = –2.0, φ_x = (0.5, 0.5)^T , γ = (0.8, 0.8, 0.8)^T , $Σ = 0.5 \, 1_{2} 1_{2}^{T} + 0.5 I_{2}$ , and $σ_{ε}^{2} = 1.0$ . The missing fraction of the responses is about 18.4%. To add some outliers, we modified the simulated data set by generating new {y_ij : j = 1, 99, 100; i = 1, . . . , n_j} from a $N (x_{i j}^{T} Z_{j} γ + x_{i j}^{T} u_{j}, σ_{ε}^{2})$ distribution with $u_{j} \sim N (5.6 \, 1_{2}, 1.96 I_{2} + 0.3 Σ)$ (j = 1, 99, 100).
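The baseline data-generating step (before the outliers are injected) could be sketched in Python as follows. Several details are assumptions rather than statements from the paper: the group sizes for j = 11, . . . , 90 are drawn uniformly at random from the listed set, the first column of Z_j is taken to be the constant, x_ij consists of an intercept plus one U(0, 1) covariate, and φ_x multiplies the full vector x_ij. The realized missing fraction therefore need not match the 18.4% reported above.

```python
import numpy as np

def simulate_two_level(seed=0):
    """Simulate groups from (3.1)-(3.3) under the stated assumptions."""
    rng = np.random.default_rng(seed)
    J, q, r = 100, 2, 3
    sizes = np.empty(J, dtype=int)
    sizes[:10] = 3                                    # n_1 = ... = n_10 = 3
    sizes[90:] = 20                                   # n_91 = ... = n_100 = 20
    sizes[10:90] = rng.choice([5, 7, 8, 10, 12, 13, 15, 17], size=80)  # assumed uniform
    gamma = np.full(r, 0.8)
    Sigma = 0.5 * np.ones((q, q)) + 0.5 * np.eye(q)
    sigma2_eps = 1.0
    phi0, phi_x = -2.0, np.array([0.5, 0.5])
    groups = []
    for n_j in sizes:
        X = np.column_stack([np.ones(n_j), rng.uniform(size=n_j)])  # intercept + U(0, 1)
        Z = rng.uniform(size=(q, r))
        Z[:, 0] = 1.0                                 # assumed position of the constant
        u = rng.multivariate_normal(np.zeros(q), Sigma)
        beta = Z @ gamma + u                          # level-2 model (3.2)
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2_eps), size=n_j)  # level-1 model (3.1)
        p_miss = 1.0 / (1.0 + np.exp(-(phi0 + X @ phi_x)))  # MAR mechanism (3.3)
        missing = rng.uniform(size=n_j) < p_miss      # True where y_ij is missing
        groups.append({"X": X, "Z": Z, "y": y, "missing": missing})
    return groups
```

Each returned dictionary holds the design matrices, the complete responses, and the missingness indicators for one group, so the observed data are obtained by masking `y` with `missing`.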

We fit (3.1)-(3.3) to the simulated data set and used MCMC sampling to carry out the Bayesian influence analysis (Chen, Shao and Ibrahim (2000)). We took

p (γ) \overset{D}{=} N (γ^{0}, H_{0 ε}), p (σ_{ε}^{- 2}) \overset{D}{=} Γ (α_{0 ε}, β_{0 ε}), p (Σ) \overset{D}{=} I W_{q} (ρ_{0}, R^{0}),

where γ⁰, H_0ε, α_0ε, β_0ε, R⁰, and ρ₀ are hyperparameters whose values are prespecified. We assumed that $p (φ) \overset{D}{=} N (φ^{0}, H_{0 φ})$ , where φ⁰ and H_0φ are the given hyperparameters. Furthermore, we set γ⁰ = (0.8, 0.8, 0.8)^T, $R^{0} = 2 I_{2} + 2 \, 1_{2} 1_{2}^{T}$ , φ⁰ = (–2.0, 0.5, 0.5)^T, H_0φ = I₃, α_0ε = 10.0, β_0ε = 8.0, ρ₀ = 10, and H_0ε = diag(0.2, 0.2, 0.2).

We simultaneously perturbed the distributions of u_j and the prior distributions of γ, Σ, and $σ_{ε}^{2}$ , whose perturbed complete-data joint (unnormalized) log-posterior density is given by

ℓ (z_{c o m}, θ ∣ ω) = \sum_{j = 1}^{J} {- q log (2 π ∕ ω_{j}) - log ∣ Σ ∣ - ω_{j} u_{j}^{T} Σ^{- 1} u_{j}} ∕ 2 + {- r log (2 π ∕ ω_{γ}) - log ∣ H_{0 ε} ∣ - ω_{γ} {(γ - γ^{0})}^{T} H_{0 ε}^{- 1} (γ - γ^{0})} ∕ 2 - (ρ_{0} - q - 1) log ∣ Σ ∣ ∕ 2 - ω_{Σ} tr (R^{0} Σ^{- 1}) ∕ 2 - q ρ_{0} log (2) ∕ 2 + ρ_{0} log ∣ ω_{Σ} R^{0} ∣ ∕ 2 - log Γ_{q} (ρ_{0} ∕ 2) + log [p (σ_{ε}^{- 2}) + ω_{σ} {g (σ_{ε}^{- 2}) - p (σ_{ε}^{- 2})}],

where $g (σ_{ε}^{- 2})$ is the density of a Gamma (α_0ε + 3, β_0ε + 1) distribution and ω = (ω₁, . . . , ω_J, ω_γ, ω_Σ, ω_σ )^T. In this case, ω⁰ = (1, 1, . . . , 1, 0)′ represents no perturbation. By differentiating $ℓ (z_{c o m}, θ ∣ ω)$ with respect to ω, after some calculations, we have

G (ω^{0}) = diag [q I_{J} ∕ 2, r ∕ 2, {var}_{Σ} {tr (R^{0} Σ^{- 1})} ∕ 4, {var}_{σ_{ε}^{2}} {g (σ_{ε}^{- 2}) ∕ p (σ_{ε}^{- 2})}],

where var_Σ and ${var}_{σ_{ε}^{2}}$ denote the variances with respect to the priors of Σ and $σ_{ε}^{2}$ , respectively. Then, we chose a new perturbation scheme $\tilde{ω} = ω^{0} + G {(ω^{0})}^{1 ∕ 2} (ω - ω^{0})$ and calculated the associated local influence measures $v_{F, \max} = argmax {FI}_{B F} [v] {\tilde{ω} (0)}$ , SI_{Φ_IR}[e_j], and SI_{CM_h}[e_j], in which ϕ(·) was chosen to be the Kullback-Leibler divergence and h(θ) = θ. Note that the numbers of observations in groups 1, 99, and 100 were, respectively, 3, 20, and 20. Groups 1, 99, and 100 were detected to be influential by all our local influence measures. Selected results for SI_{Φ_IR}[e_j] are presented in Fig. 1(a).
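Because G(ω⁰) is diagonal here, the rescaling to the new perturbation scheme is a componentwise operation. The sketch below assembles the diagonal from the closed-form entries q/2 and r/2 and applies the transformation. The two prior-variance entries are passed in as numbers because they depend on the chosen hyperparameters; the function and argument names are hypothetical.

```python
import numpy as np

def metric_diagonal(J, q, r, var_trace_term, var_ratio_term):
    """Diagonal of G(omega^0): J entries q/2, then r/2, var_Sigma{tr(R0 Sigma^-1)}/4,
    and var{g/p} for the contamination of the prior on sigma_eps^{-2}."""
    return np.concatenate([
        np.full(J, q / 2.0),                                    # group-level perturbations
        [r / 2.0, var_trace_term / 4.0, var_ratio_term],        # prior perturbations
    ])

def rescale_perturbation(omega, omega0, g_diag):
    """omega_tilde = omega0 + G(omega0)^{1/2} (omega - omega0) for diagonal G."""
    omega = np.asarray(omega, dtype=float)
    omega0 = np.asarray(omega0, dtype=float)
    return omega0 + np.sqrt(np.asarray(g_diag, dtype=float)) * (omega - omega0)
```

With this rescaling every component of the new scheme has unit metric at ω⁰, so the local influence measures for different components are directly comparable.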

Figure 1. Simulation Study: group index plots of local influence measures for simultaneous perturbation: (a) SI_{Φ_IR}[e_j] can detect the three influential groups (1, 99, and 100); (b) SI_{Φ_IR}[e_j] can simultaneously detect the three influential groups (1, 99, and 100) and the perturbed prior distribution p(γ).

We used the same setup, except that we employed a perturbed prior distribution for $γ : p (γ) \overset{D}{=} N (4 γ^{0}, H_{0 ε})$ , and then applied the same MCMC method, perturbation scheme, and local influence measures. Groups 1, 99, and 100 and the perturbed prior distribution of γ were identified to be influential by all our local influence measures. Selected results for SI_{Φ_D}[e_j] are presented in Fig. 1 (b).

Next, we explored the potential deviations of the MAR mechanism in the direction of NMAR. We simulated a data set using the same setup except that the missing data mechanism for y_ij was

\Pr (r_{i j} = 1 ∣ x_{i j}, y_{i j}, φ, φ_{y}) = \frac{exp (φ_{0} + φ_{x}^{T} x_{i j} + φ_{y} y_{i j})}{1 + exp (φ_{0} + φ_{x}^{T} x_{i j} + φ_{y} y_{i j})},

(3.4)

with φ_y = 0.5 to make the missing data fraction approximately equal to 25%.

Similar to sensitivity analysis methods in missing data problems (Molenberghs and Kenward (2007), Little and Rubin (2002)), we fit model (3.1)-(3.2) and (3.4), with φ_y fixed at a value ω_y, to the simulated data set. When ω_y = 0, the missing data are MAR and hence the missing data mechanism in (3.4) is ignorable. Thus, by varying ω_y in an interval Ω₁, we can treat ω_y as a perturbation scheme to the sampling distribution and then calculate the associated local influence measures. Specifically, we chose ω = (ω_y) and obtained a curve C(t) on $M$ at t = ω.

We used the same prior distributions for γ, φ, $σ_{ε}^{2}$ , and Σ as before and used MCMC sampling to carry out the Bayesian influence analysis. We calculated the intrinsic influence measures IGI_f(ω⁰, Ω₁) for Φ_D(ω) and M_h(ω), in which we chose ϕ(·) as the Kullback-Leibler divergence, set h(θ) = γ, and treated ω⁰ = 0 as no perturbation. We set Ω₁ = [–2.0, 2.0] and approximated Ω₁ via K₀ = 41 grid points ω_g,(k) = –2.0 + 0.1k for k = 0, . . . , 40. For a given ω ∈ Ω₁, d(ω⁰, ω) was calculated via a composite trapezoidal rule.
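The grid and the trapezoidal step can be sketched as follows. The code assumes that the square root of the metric, evaluated at each grid point, and the objective function values f(ω) have already been obtained from MCMC output; it also assumes that the intrinsic measure takes the form {f(ω) − f(ω⁰)}²/d(ω⁰, ω)², as in the example of Section 2.6. The function names are hypothetical.

```python
import numpy as np

def geodesic_distance_on_grid(grid, sqrt_g_values, index0):
    """d(omega0, omega_k): absolute integral of sqrt(G) between grid[index0]
    and grid[k], computed with the composite trapezoidal rule."""
    grid = np.asarray(grid, dtype=float)
    v = np.asarray(sqrt_g_values, dtype=float)
    steps = 0.5 * (v[1:] + v[:-1]) * np.diff(grid)       # area of each trapezoid
    cumulative = np.concatenate([[0.0], np.cumsum(steps)])
    return np.abs(cumulative - cumulative[index0])

def igi_on_grid(f_values, distances, index0):
    """Squared change in f relative to squared distance; NaN at omega0 itself."""
    f = np.asarray(f_values, dtype=float)
    d = np.asarray(distances, dtype=float)
    out = np.full(f.shape, np.nan)
    mask = d > 0
    out[mask] = (f[mask] - f[index0]) ** 2 / d[mask] ** 2
    return out

grid = -2.0 + 0.1 * np.arange(41)   # K0 = 41 grid points; grid[20] corresponds to omega0 = 0
```

With a constant metric the distance is proportional to |ω − ω⁰|, so a linear objective function yields a constant intrinsic measure, mirroring the pattern-mixture example of Section 2.6.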

Figures 2 (a) and 2 (b) present plots of IGI_IR(ω⁰, ω) against ω ∈ Ω₁ for Φ_IR(ω) and M_h(ω), respectively. The intrinsic influence measures reach maxima near the true value of φ_y = 0.5. This indicates that the nonignorable missing data mechanism is tenable for the simulated data. We also followed a standard sensitivity analysis to compute the posterior means and standard deviations of γ for different φ_y in Table 1. Although we observed that the posterior distribution of γ varies with φ_y, it is hard to tell why φ_y = 0.5 is more meaningful. We also carried out a local influence analysis under this NMAR setting (not presented here) and observed that the proposed local influence method can pick up anomalous features of the data that are not necessarily associated with the missing data mechanism (Jansen et al. (2006)).

Figure 2. Simulation Study: plots of IGI_IR(ω⁰, ω) against ω ∈ Ω₁ for (a) Φ_IR(ω) and (b) M_h(ω), in which h(θ) = γ.

Table 1. Posterior means (PMs) and standard deviations (SDs) of γ at different values of φ_y. True γ⁰ = (0.8, 0.8, 0.8)^T.

φ_y	γ₁ PM	γ₁ SD	γ₂ PM	γ₂ SD	γ₃ PM	γ₃ SD
0.5	0.831	0.174	0.721	0.251	0.809	0.255
0.3	0.777	0.170	0.697	0.249	0.786	0.247
0.15	0.738	0.167	0.661	0.243	0.776	0.249
0.0	0.697	0.177	0.622	0.247	0.749	0.250


4. Real data example

We consider a small portion of a data set from a study of the relationship between acquired immune deficiency syndrome (AIDS) and the use of condoms (Morisky et al. (1998)). This subset contains 11 items on such topics as knowledge about AIDS and beliefs, behaviours, and attitudes towards condom use, collected from 1116 female sex workers. Nine items, denoted by y = (y₁, . . . , y₉)^T, were taken as responses. Items (y₁, y₂, y₃) are related to a latent variable, η, which can be roughly interpreted as threat of AIDS, while items (y₄, y₅, y₆) and (y₇, y₈, y₉) are, respectively, related to latent variables ξ₁ and ξ₂, which can be interpreted as aggressiveness of the sex worker and worry of contracting AIDS (Lee and Tang (2006)). All response variables were treated as continuous. A continuous item x₁ on the duration as a sex worker and an ordered categorical item x₂ on the knowledge about AIDS were taken as covariates. At least one response variable or covariate is missing for 361 of the 1116 participants (32.35%) (see Table 4 of Lee and Tang (2006)). The covariate x₂ is completely observed.

Let y_i = (y_i1, . . . , y_i9)^T and $ϖ_{i} = {(η_{i}, ξ_{i 1}, ξ_{i 2})}^{T}$ . We considered the measurement and structural equations given as

\begin{matrix} y_{i} & = μ + Λ ϖ_{i} + ε_{i}, \\ η_{i} & = b_{1} x_{i 1} + b_{2} x_{i 2} + γ_{1} ξ_{i 1} + γ_{2} ξ_{i 2} + δ_{i}, for i = 1, \dots, 1116, \end{matrix}

where μ = (μ₁, . . . , μ₉)^T and

Λ^{T} = (\begin{matrix} {1.0}^{*} & λ_{21} & λ_{31} & {0.0}^{*} & {0.0}^{*} & {0.0}^{*} & {0.0}^{*} & {0.0}^{*} & {0.0}^{*} \\ {0.0}^{*} & {0.0}^{*} & {0.0}^{*} & {1.0}^{*} & λ_{52} & λ_{62} & {0.0}^{*} & {0.0}^{*} & {0.0}^{*} \\ {0.0}^{*} & {0.0}^{*} & {0.0}^{*} & {0.0}^{*} & {0.0}^{*} & {0.0}^{*} & {1.0}^{*} & λ_{83} & λ_{93} \end{matrix}),

in which 0.0* and 1.0* are regarded as fixed values to identify the scale of the latent factor. We took ε_i distributed as N(0, Ψ), where Ψ = diag(ψ₁, . . . , ψ₉), and $ϖ_{i}$ and ε_i are independent. In the structural equation, Γ = (b₁, b₂, γ₁, γ₂) is a vector of unknown parameters, ξ_i = (ξ_i1, ξ_i2)^T is distributed as N(0, Φ), δ_i is distributed as N(0, ψ_δ), and ξ_i and δ_i are independent.

We took the missing data as NMAR, and hence the missingness mechanism of the response variables is non-ignorable (Ibrahim and Molenberghs (2009)). Let r_yij = 1 if y_ij is missing and r_yij = 0 if y_ij is observed. For the missing data mechanism of the response variables, we took logit{pr(r_yij = 1 | y_i)} = φ₀ + φ₁y_i1 + . . . + φ₉y_i9, where φ = (φ₀, φ₁, . . . , φ₉)^T. We also assumed that the covariate x_i1 is NMAR. Let r_xi1 = 1 if x_i1 is missing and r_xi1 = 0 if x_i1 is observed. It was assumed that x_i1 follows a $N (0, τ_{x}^{2})$ distribution and that logit{pr(r_xi1 = 1 | φ_x)} = φ_x0 + ωx_i1. When ω = 0, the missingness mechanism reduces to MAR.

We fitted the proposed structural equation models to the AIDS data set and used MCMC sampling to carry out the Bayesian influence analysis. We specified the prior distributions for μ, Λ, Ψ, Γ, ω, Φ, ψ_δ, φ, φ_x0, and τ_x as those in Lee and Tang (2006). A total of 40,000 MCMC samples were used to compute the intrinsic and local influence measures.

By varying ω in an interval [–2, 2], we can treat ω as a perturbation parameter to the sampling distribution. In this case, ω⁰ = 0 represents no perturbation. We calculated two intrinsic influence measures for the Kullback-Leibler divergence and the posterior mean distance, denoted by CM_h(ω). Specifically, CM_h(ω, ω⁰) = {M_h(ω) – M_h(ω⁰)}^T C_h{M_h(ω) – M_h(ω⁰)}, where M_h(ω) = ∫ h(θ)p(θ | z, ω)dθ, in which h(θ) = Γ, and C_h is the posterior covariance matrix of Γ based on p(Γ |z, ω⁰). We calculated IGI_RI(ω⁰, ω) at 41 evenly spaced grid points in [–2, 2] (Fig. 3). An inspection of Figure 3 shows that the largest IGI_RI(ω⁰, ω) values are close to 0.1 for both the Kullback-Leibler divergence and M_h(ω). This indicates that the nonignorable missing data mechanism may be tenable for the AIDS data. We also carried out a standard sensitivity analysis and computed posterior means and standard deviations of Γ at different values of ω, as shown in Figure 4. Although we observe that the posterior means and standard deviations of Γ vary with ω, it is difficult to make any meaningful inference from them alone.

Figure 3. AIDS data analysis results: plots of IGI_RI(ω⁰, ω) against ω ∈ [–2, 2] for (a) Φ_RI(ω) and (b) M_h(ω), in which h(θ) = Γ.

Figure 4. AIDS data analysis results: plots of (posterior mean – posterior mean at ω = 0)/(posterior standard deviation at ω = 0) ((a),(c),(e),(g)) and the ratio of the posterior standard deviation to the posterior standard deviation at ω = 0 ((b),(d),(f),(h)) for b₁, b₂, γ₁, γ₂ as functions of ω ∈ [–2, 2].

We also calculated the local influence measures of the Kullback-Leibler divergence under a simultaneous perturbation scheme. The simultaneous perturbation scheme ω includes variance perturbations ω_c for individual observations, perturbations ω_s to coefficients in the structural equations model, perturbations ω_ξ to the sampling distribution of ξ_i, perturbations ω_μ to the prior distribution of μ, perturbations ω_Γ to the prior distribution of Γ, perturbations ω_φ to the prior distribution of φ, and perturbations ω_x to the missing data mechanism. The corresponding kernel of the joint log-posterior density of (z, θ) based on the complete data is given by

log p (z, θ ∣ ω) \equiv ℓ (z, θ ∣ ω) = l_{c} (ω_{c}) + l_{s} (ω_{s}) + l_{ξ} (ω_{ξ}) + l_{μ} (ω_{μ}) + l_{Γ} (ω_{Γ}) + l_{φ} (ω_{φ}) + l_{x} (ω_{x}),

(4.1)

where

\begin{matrix} l_{c} (ω_{c}) & = {- p \sum_{i = 1}^{n} log (2 π ∕ ω_{i}) - \sum_{i = 1}^{n} ω_{i} {(y_{i} - μ - Λ ϖ_{i})}^{T} Ψ^{- 1} (y_{i} - μ - Λ ϖ_{i})} ∕ 2, \\ l_{s} (ω_{s}) & = {- \sum_{i = 1}^{n} {(η_{i} - b_{1} x_{i 1} - b_{2} x_{i 2} - γ_{1} ξ_{i 1} - γ_{2} ξ_{i 2} - ω_{γ 1} ξ_{i 1}^{2} - ω_{γ 2} ξ_{i 2}^{2} - ω_{γ 3} ξ_{i 1} ξ_{i 2})}^{2} ∕ ψ_{δ}} ∕ 2, \\ l_{ξ} (ω_{ξ}) & = {- n q_{2} log (2 π ∕ ω_{ξ}) - n log ∣ Φ ∣ - ω_{ξ} \sum_{i = 1}^{n} ξ_{i}^{T} Φ^{- 1} ξ_{i}} ∕ 2, \\ l_{μ} (ω_{μ}) & = {- p log (2 π ∕ ω_{μ}) - log ∣ Σ_{0} ∣ - ω_{μ} {(μ - μ_{0})}^{T} Σ_{0}^{- 1} (μ - μ_{0})} ∕ 2, \\ l_{Γ} (ω_{Γ}) & = {- (s + t) log (2 π ∕ ω_{Γ}) - log ∣ H_{Γ} ∣ - ω_{Γ} {(Γ - Γ^{0})}^{T} H_{Γ}^{- 1} (Γ - Γ^{0})} ∕ 2, \\ l_{φ} (ω_{φ}) & = {- (p + 1) log (2 π ∕ ω_{φ}) - log ∣ H_{φ} ∣ - ω_{φ} {(φ - φ^{0})}^{T} H_{φ}^{- 1} (φ - φ^{0})} ∕ 2, \\ l_{x} (ω_{x}) & = \sum_{i = 1}^{n} [r_{x i 1} (φ_{x 0} + ω_{x} x_{i 1}) - log {1.0 + exp (φ_{x 0} + ω_{x} x_{i 1})}] . \end{matrix}

In this case, $ω^{0} = {(ω_{c}^{0 T}, ω_{s}^{0 T}, ω_{ξ}^{0}, ω_{μ}^{0}, ω_{Γ}^{0}, ω_{φ}^{0}, ω_{x}^{0})}^{T}$ represents no perturbation, in which $ω_{c}^{0} = {(1, \dots, 1)}^{T}$ , $ω_{s}^{0} = {(0, 0, 0)}^{T}$ , $ω_{ξ}^{0} = ω_{μ}^{0} = ω_{Γ}^{0} = ω_{φ}^{0} = 1$ and $ω_{x}^{0} = 0.1$ .
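As an illustration of one component of (4.1), the term l_x(ω_x) for the covariate missingness mechanism and its derivative with respect to ω_x can be coded directly. The derivative below is obtained by differentiating the displayed expression for l_x; the function and argument names are hypothetical.

```python
import numpy as np

def l_x(omega_x, phi_x0, x1, r_x1):
    """Sum over i of r_xi1 * eta_i - log{1 + exp(eta_i)}, eta_i = phi_x0 + omega_x * x_i1."""
    x1 = np.asarray(x1, dtype=float)
    r_x1 = np.asarray(r_x1, dtype=float)
    eta = phi_x0 + omega_x * x1
    return float(np.sum(r_x1 * eta - np.logaddexp(0.0, eta)))  # stable log(1 + e^eta)

def dl_x(omega_x, phi_x0, x1, r_x1):
    """Derivative of l_x with respect to omega_x: sum of (r_xi1 - p_i) * x_i1."""
    x1 = np.asarray(x1, dtype=float)
    r_x1 = np.asarray(r_x1, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(phi_x0 + omega_x * x1)))          # Pr(r_xi1 = 1)
    return float(np.sum((r_x1 - p) * x1))
```

A finite-difference comparison such as `(l_x(w + h, ...) - l_x(w - h, ...)) / (2 * h)` against `dl_x(w, ...)` is a quick way to confirm the analytic derivative before it is used in a metric tensor calculation.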

We calculated $\partial_{ω} ℓ (z, θ ∣ ω)$ and then obtained its metric tensor as

G (ω^{0}) = diag {G_{c} (ω_{c}^{0}), G_{s} (ω_{s}^{0}), G_{ξ} (ω_{ξ}^{0}), G_{μ} (ω_{μ}^{0}), G_{Γ} (ω_{Γ}^{0}), G_{φ} (ω_{φ}^{0}), G_{x} (ω_{x}^{0})},

where $G_{c} (ω_{c}^{0}) = diag (p ∕ 2, \dots, p ∕ 2)$ , $G_{ξ} (ω_{ξ}^{0}) = n q_{2} ∕ 2$ , $G_{μ} (ω_{μ}^{0}) = p ∕ 2$ , $G_{Γ} (ω_{Γ}^{0}) = (s + t) ∕ 2$ , $G_{φ} (ω_{φ}^{0}) = (p + 1) ∕ 2$ , $G_{s} (ω_{s}^{0}) = diag [3 n E_{ϕ, ψ_{δ}} (ϕ_{11}^{2} ∕ ψ_{δ}), 3 n E_{ϕ, ψ_{δ}} (ϕ_{22}^{2} ∕ ψ_{δ}), n E_{ϕ, ψ_{δ}} {(ϕ_{11} ϕ_{22} + 2 ϕ_{12}^{2}) ∕ ψ_{δ}}]$ , and $G_{x} (ω_{x}^{0}) = E_{φ_{x 0}, r_{x}, x_{1}} {[\sum_{i = 1}^{n} {r_{x i 1} - \exp (φ_{x 0} + ω_{x}^{0} x_{i 1}) ∕ (1.0 + \exp (φ_{x 0} + ω_{x}^{0} x_{i 1}))}]}^{2}$ . The diagonal elements of the metric tensor G(ω⁰) reveal that ω_γ1, ω_γ2, ω_γ3, ω_ξ, and ω_x have larger effects compared to other perturbations (see Fig. 5(a)). Then, we chose a new perturbation scheme $\tilde{ω} = ω^{0} + G {(ω^{0})}^{1 ∕ 2} (ω - ω^{0})$ and calculated the associated local influence measures SI_{Φ_IR}[e_j] for the Kullback-Leibler divergence. The local influence measures based on the ϕ-divergence are able to detect cases {14, 25, 28, 137, 175, 408, 985} as influential observations (see Fig. 5(b)), while ω_γ1 and ω_γ3 indicate that it may be important to include $ξ_{i 1}^{2}$ and ξ_i1ξ_i2 in the structural model (see Fig. 5(b)).
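The entry p/2 for each case-weight perturbation can be verified with a short calculation. The step below introduces Q_i as shorthand for the quadratic form in l_c(ω_c); this symbol is not used in the original text.

```latex
\begin{align*}
Q_{i} &= (y_{i} - \mu - \Lambda \varpi_{i})^{T} \Psi^{-1} (y_{i} - \mu - \Lambda \varpi_{i}), \\
\frac{\partial l_{c}(\omega_{c})}{\partial \omega_{i}}
  &= \frac{p}{2 \omega_{i}} - \frac{Q_{i}}{2}, \\
\left. \frac{\partial l_{c}(\omega_{c})}{\partial \omega_{i}} \right|_{\omega_{i} = 1}
  &= \frac{p - Q_{i}}{2}, \qquad Q_{i} \sim \chi^{2}_{p}, \\
E\left[ \left( \frac{p - Q_{i}}{2} \right)^{2} \right]
  &= \frac{\mathrm{Var}(Q_{i})}{4} = \frac{2p}{4} = \frac{p}{2} .
\end{align*}
```

At ω_i = 1 the residual y_i − μ − Λϖ_i is N(0, Ψ), so Q_i is chi-squared with p degrees of freedom, with mean p and variance 2p. Independence across i makes the cross terms vanish, which is why G_c(ω_c⁰) is diagonal. The same argument with the appropriate dimension gives the entries for ω_ξ, ω_μ, ω_Γ, and ω_φ.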

Figure 5. AIDS data analysis results: index plots of (a) metric tensor g_jj(ω⁰) and (b) local influence measures SI_{Φ_IR}[e_j] for simultaneous perturbation.

5. Discussion

We have developed Bayesian sensitivity analysis methods for assessing various perturbations to statistical models with missing data. We have developed a Bayesian perturbation manifold to characterize the intrinsic structure of the perturbation model and to quantify the degree of each perturbation in the perturbation model. We have developed global and local influence measures for selecting the most influential perturbation based on various objective functions and their statistical properties. Finally, we have also examined a number of examples to highlight the broad spectrum of applications of this method for Bayesian influence analysis in missing data problems.

Many issues merit further research. Our Bayesian sensitivity analysis method can be extended to more complex data structures (e.g., survival data) and other parametric and semiparametric models with nonparametric priors. In further research, we will generalize our methodology to the setting of estimating equations and empirical likelihood of generalized estimating equations for missing data problems. We will develop Bayesian sensitivity analysis methods to deal with the well-known masking and swamping effects in the diagnostic literature.

Contributor Information

HONGTU ZHU, Department of Biostatistics, University of North Carolina at Chapel Hill, 3109 McGavran-Greenberg Hall, Campus Box 7420, Chapel Hill, North Carolina 27516, U.S.A. hzhu@bios.unc.edu.

JOSEPH G. IBRAHIM, Department of Biostatistics, University of North Carolina at Chapel Hill, 3109 McGavran-Greenberg Hall, Campus Box 7420, Chapel Hill, North Carolina 27516, U.S.A. ibrahim@bios.unc.edu

NIANSHENG TANG, Department of Statistics, Yunnan University, Kunming 650091, P. R. China nstang@ynu.edu.cn.

References

Andridge RR, Little RJA. Proxy pattern-mixture analysis for survey nonresponse. Journal of Official Statistics. 2011;27:153–180.
Berger JO. An overview of robust Bayesian analysis. Test. 1994;3:5–58.
Chen MH, Shao QM, Ibrahim JG. Monte Carlo Methods in Bayesian Computation. Springer-Verlag; New York: 2000.
Cook RD. Assessment of local influence (with discussion). J. Roy. Statist. Soc. Ser. B. 1986;48:133–169.
Copas J, Eguchi S. Local model uncertainty and incomplete data bias (with discussion). J. Roy. Statist. Soc. Ser. B. 2005;67:459–512.
Copas JB, Li HG. Inference for non-random samples (with discussion). J. Roy. Statist. Soc. Ser. B. 1997;59:55–96.
Cox DR, Reid N. Parameter orthogonality and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B. 1987;49:1–39.
Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman and Hall; London: 2008.
Dey DK, Ghosh SK, Lou KR. On local sensitivity measures in Bayesian analysis (with discussion). In: Berger JO, Betró B, Moreno E, Pericchi LR, Ruggeri F, Salinetti G, Wasserman L, editors. Bayesian Robustness. IMS Lecture Notes-Monograph Series; 1996. pp. 21–39.
Geisser S. Predictive Inference: An Introduction. Chapman and Hall; London: 1993.
Gelfand AE, Dey DK, Chang H. Model determination using predictive distributions, with implementation via sampling-based methods (disc: P160-167). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics. Vol. 4. Oxford University Press; Oxford: 1992. pp. 147–159.
Gustafson P. Local sensitivity of inferences to prior marginals. Journal of the American Statistical Association. 1996a;91:774–781.
Gustafson P. Local sensitivity of posterior expectations. Annals of Statistics. 1996b;24:174–195.
Gustafson P. On model expansion, model contraction, identifiability, and prior information: two illustrative scenarios involving mismeasured variables (with discussion). Statistical Science. 2006;20:111–140.
Hens N, Aerts M, Molenberghs G, Thijs H, Verbeke G. Kernel weighted influence measures. Computational Statistics and Data Analysis. 2006;48:467–487.
Ibrahim JG, Chen MH, Lipsitz SR, Herring A. Missing-data methods for generalized linear models: a comparative review. Journal of the American Statistical Association. 2005;100:332–346.
Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test. 2009;18:1–43. doi: 10.1007/s11749-009-0138-x.
Jansen I, Hens N, Molenberghs G, Aerts M, Verbeke G, Kenward MG. The nature of sensitivity in monotone missing not at random models. Computational Statistics and Data Analysis. 2006;50:830–858.
Jansen I, Molenberghs G, Aerts M, Thijs H, Van Steen K. A local influence approach to binary data from a psychiatric study. Biometrics. 2003;59:410–419. doi: 10.1111/1541-0420.00048.
Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.
Kass RE, Tierney L, Kadane JB. Approximate methods for assessing influence and sensitivity in Bayesian analysis. Biometrika. 1989;76:663–674.
Lang S. Differential and Riemannian Manifolds. 3rd ed. Springer-Verlag; New York: 1995.
Lavine M. Sensitivity in Bayesian statistics: the prior and the likelihood. Journal of the American Statistical Association. 1991;86:396–399.
Lee SY, Tang NS. Analysis of nonlinear structural equation models with nonignorable missing covariates and ordered categorical data. Statistica Sinica. 2006;16:1117–1141.
Little RJA, Rubin DB. Statistical Analysis With Missing Data. Wiley; New York: 2002.
Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81:471–483.
Millar RB, Stewart WS. Assessment of locally influential observations in Bayesian models. Bayesian Analysis. 2007;2:365–384.
Molenberghs G, Kenward MG. Missing Data in Clinical Studies. Wiley; New York: 2007.
Morisky DE, Tiglao TV, Sneed CD, Tempongko SB, Baltazar JC, Detels R, Stein JA. The effects of establishment practices, knowledge and attitudes on condom use among Filipina sex workers. AIDS Care. 1998;10:213–320. doi: 10.1080/09540129850124460.
Shi XY, Zhu HT, Ibrahim JG. Local influence for generalized linear models with missing covariates. Biometrics. 2009;65:1164–1174. doi: 10.1111/j.1541-0420.2008.01179.x.
Troxel AB. A comparative analysis of quality of life data from a southwest oncology group randomized trial of advanced colorectal cancer. Statistics in Medicine. 1998;17:767–779. doi: 10.1002/(sici)1097-0258(19980315/15)17:5/7<767::aid-sim820>3.0.co;2-b.
Troxel AB, Ma G, Heitjan DF. An index of local sensitivity to nonignorability. Statistica Sinica. 2004;14:1221–1237.
van der Linde A. Local influence on posterior distributions under multiplicative modes of perturbation. Bayesian Analysis. 2007;2:319–332.
van Steen K, Molenberghs G, Thijs H. A local influence approach to sensitivity analysis of incomplete longitudinal ordinal data. Statistical Modelling: An International Journal. 2001;1:125–142.
Verbeke G, Molenberghs G, Thijs H, Lesaffre E, Kenward MG. Sensitivity analysis for non-random dropout: a local influence approach. Biometrics. 2001;57:43–50. doi: 10.1111/j.0006-341x.2001.00007.x.
Zhu HT, Ibrahim JG, Lee SY, Zhang HP. Perturbation selection and influence measures in local influence analysis. Annals of Statistics. 2007;35:2565–2588.
Zhu HT, Ibrahim JG, Tang NS. Bayesian local influence analysis: a geometric approach. Biometrika. 2011;98:307–323. doi: 10.1093/biomet/asr009.
Zhu HT, Lee SY. Local influence for incomplete-data models. J. Roy. Statist. Soc. Ser. B. 2001;63(1):111–126.

[R1] Andridge RR, Little RJA. Proxy pattern-mixture analysis for survey nonresponse. Journal of Official Statistics. 2011;27:153–180. [Google Scholar]

[R2] Berger JO. An overview of robust bayesian analysis. Test. 1994;3:5–58. [Google Scholar]

[R3] Chen MH, Shao QM, Ibrahim JG. Monte Carlo Methods in Bayesian Computation. Springer-Verlag; New York: 2000. [Google Scholar]

[R4] Cook RD. Assessment of local influence (with Discussion). J. Roy. Statist. Soc. Ser. B. 1986;48:133–169. [Google Scholar]

[R5] Copas J, Eguchi S. Local model uncertainty and incomplete data bias (with discussion). J. Roy. Statist. Soc. Ser. B. 2005;67:459–512. [Google Scholar]

[R6] Copas JB, Li HG. Inference for non-random samples (with discussion). J. Roy. Statist. Soc. Ser. B. 1997;59:55–96. [Google Scholar]

[R7] Cox DR, Reid N. Parameter orthogonality and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B. 1987;49:1–39. [Google Scholar]

[R8] Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman and Hall; London: 2008. [Google Scholar]

[R9] Dey DK, Ghosh SK, Lou KR. Berger JO, Betró B, Moreno e., Pericchi l. R., ruggeri F, Salinetti G, Wasserman L, editors. On local sensitivity measures in bayesian (with discussion). Bayesian Robustness. 1996. pp. 21–39. IMS Lecture Notes-Monograph Series.

[R10] Geisser S. Predictive Inference: An Introduction. Chapman and Hall; London: 1993. [Google Scholar]

[R11] Gelfand AE, Dey DK, Chang H. Model determination using predictive distributions, with implementation via sampling-based methods (disc: P160-167). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics. Vol. 4. Oxford University Press; Oxford: 1992. pp. 147–159. [Google Scholar]

[R12] Gustafson P. Local sensitivity of inferences to prior marginals. Journal of the American Statistical Association. 1996a;91:774–781. [Google Scholar]

[R13] Gustafson P. Local sensitivity of posterior expectations. Annals of Statistics. 1996b;24:174–195. [Google Scholar]

[R14] Gustafson P. On model expansion, model contraction, identifability, and prior information: two illustrative scenarios involving mismeasured variables (with discussion). Statistical Science. 2006;20:111–140. [Google Scholar]

[R15] Hens N, Aerts M, Molenberghs G, Thijs H, Verbeke G. Kernel weighted in influence measures. Computational Statistics and Data Analysis. 2006;48:467–487. [Google Scholar]

[R16] Ibrahim JG, Chen MH, Lipsitz SR, Herring A. Missing-data methods for generalized linear models: a comparative review. Journal of the American Statistical Association. 2005;100:332–346.

[R17] Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test. 2009;18:1–43. doi: 10.1007/s11749-009-0138-x.

[R18] Jansen I, Hens N, Molenberghs G, Aerts M, Verbeke G, Kenward MG. The nature of sensitivity in monotone missing not at random models. Computational Statistics and Data Analysis. 2006;50:830–858.

[R19] Jansen I, Molenberghs G, Aerts M, Thijs H, Van Steen K. A local influence approach to binary data from a psychiatric study. Biometrics. 2003;59:410–419. doi: 10.1111/1541-0420.00048.

[R20] Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.

[R21] Kass RE, Tierney L, Kadane JB. Approximate methods for assessing influence and sensitivity in Bayesian analysis. Biometrika. 1989;76:663–674.

[R22] Lang S. Differential and Riemannian Manifolds. 3rd ed. Springer-Verlag; New York: 1995.

[R23] Lavine M. Sensitivity in Bayesian statistics: the prior and the likelihood. Journal of the American Statistical Association. 1991;86:396–399.

[R24] Lee SY, Tang NS. Analysis of nonlinear structural equation models with nonignorable missing covariates and ordered categorical data. Statistica Sinica. 2006;16:1117–1141.

[R25] Little RJA, Rubin DB. Statistical Analysis With Missing Data. Wiley; New York: 2002.

[R26] Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81:471–483.

[R27] Millar RB, Stewart WS. Assessment of locally influential observations in Bayesian models. Bayesian Analysis. 2007;2:365–384.

[R28] Molenberghs G, Kenward MG. Missing Data in Clinical Studies. Wiley; New York: 2007.

[R29] Morisky DE, Tiglao TV, Sneed CD, Tempongko SB, Baltazar JC, Detels R, Stein JA. The effects of establishment practices, knowledge and attitudes on condom use among Filipina sex workers. AIDS Care. 1998;10:213–220. doi: 10.1080/09540129850124460.

[R30] Shi XY, Zhu HT, Ibrahim JG. Local influence for generalized linear models with missing covariates. Biometrics. 2009;65:1164–1174. doi: 10.1111/j.1541-0420.2008.01179.x.

[R31] Troxel AB. A comparative analysis of quality of life data from a Southwest Oncology Group randomized trial of advanced colorectal cancer. Statistics in Medicine. 1998;17:767–779. doi: 10.1002/(sici)1097-0258(19980315/15)17:5/7<767::aid-sim820>3.0.co;2-b.

[R32] Troxel AB, Ma G, Heitjan DF. An index of local sensitivity to nonignorability. Statistica Sinica. 2004;14:1221–1237.

[R33] van der Linde A. Local influence on posterior distributions under multiplicative modes of perturbation. Bayesian Analysis. 2007;2:319–332.

[R34] van Steen K, Molenberghs G, Thijs H. A local influence approach to sensitivity analysis of incomplete longitudinal ordinal data. Statistical Modelling: An International Journal. 2001;1:125–142.

[R35] Verbeke G, Molenberghs G, Thijs H, Lesaffre E, Kenward MG. Sensitivity analysis for non-random dropout: a local influence approach. Biometrics. 2001;57:43–50. doi: 10.1111/j.0006-341x.2001.00007.x.

[R36] Zhu HT, Ibrahim JG, Lee SY, Zhang HP. Perturbation selection and influence measures in local influence analysis. Annals of Statistics. 2007;35:2565–2588.

[R37] Zhu HT, Ibrahim JG, Tang NS. Bayesian local influence analysis: a geometric approach. Biometrika. 2011;98:307–323. doi: 10.1093/biomet/asr009.

[R38] Zhu HT, Lee SY. Local influence for incomplete-data models. Journal of the Royal Statistical Society, Series B. 2001;63(1):111–126.
