Biometrika. 2011 Jun;98(2):307–323. doi: 10.1093/biomet/asr009

Bayesian influence analysis: a geometric approach

HONGTU ZHU 1, JOSEPH G IBRAHIM 1, NIANSHENG TANG 2
PMCID: PMC3897258  NIHMSID: NIHMS265394  PMID: 24453379

Summary

In this paper we develop a general framework of Bayesian influence analysis for assessing various perturbation schemes to the data, the prior and the sampling distribution for a class of statistical models. We introduce a perturbation model to characterize these various perturbation schemes. We develop a geometric framework, called the Bayesian perturbation manifold, and use its associated geometric quantities including the metric tensor and geodesic to characterize the intrinsic structure of the perturbation model. We develop intrinsic influence measures and local influence measures based on the Bayesian perturbation manifold to quantify the effect of various perturbations to statistical models. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of this local influence method in a formal Bayesian analysis.

Keywords: Influence measure, Perturbation manifold, Perturbation model, Prior distribution

1. Introduction

A formal Bayesian analysis of data z = (z1, . . . , zn) involves the specification of a sampling distribution p(z | θ) and a prior distribution p(θ), where θ = (θ1, . . . ,θk)T represents the parameters of inferential interest and varies in an open set Θ of Rk. To carry out Bayesian inference, we usually use Markov chain Monte Carlo methods to simulate samples from the posterior distribution p(θ | z), which is proportional to p(z | θ) p(θ). Subsequently, we can calculate posterior quantities of θ in Rk, such as the posterior mean M(h) = ∫ h(θ) p(θ | z) dθ of a function h(θ). For notational simplicity, we do not emphasize the dominating measure explicitly throughout the paper. There is a great deal of interest in the degree to which posterior inferences are sensitive to p(θ), p(z | θ) and (z1, . . . , zn) (Kass et al., 1989; McCulloch, 1989; Berger, 1990, 1994; Dey et al., 1996; Gustafson, 2000; Sivaganesan, 2000; Oakley & O’Hagan, 2004).

There are three major formal influence techniques, including case influence measures and global and local robustness approaches, for quantifying the degree of dependence of the posterior distribution on these three key elements of Bayesian analysis including the prior, the sampling distribution and the data (Berger, 1990, 1994). In Bayesian analysis, case influence measures primarily calculate the influence of a set of observations in order to identify outliers and influential observations. Most case influence measures are based on the posterior and/or predictive distribution through either case deletion or perturbation (Guttman & Peña, 1993; Peña & Guttman, 1993; Carlin & Polson, 1991; Peng & Dey, 1995). For instance, several case influence diagnostics have been developed to quantify the possible outlyingness of a set of observations based on mean-shift or variance-shift models (Guttman & Peña, 1993; Peña & Guttman, 1993).

The key idea of the global robustness approach is to compute a range of posterior quantities as the perturbation to each of the three key elements varies over a certain set of distributions, and then determine the extremal ones. This approach has drawbacks, including sensitivity to the scale chosen for the posterior quantities and to the size of the perturbation, as well as its restriction to linear functionals and to relatively simple models. To address the scale issue, several scaled versions of the range have been proposed for the prior perturbation class (Ruggeri & Sivaganesan, 2000).

The local robustness approach primarily computes the derivatives of posterior quantities with respect to a minor perturbation to p(θ) and p(z | θ). In the frequentist literature, Cook’s (1986) influence approach is particularly useful for perturbing p(z | θ) in order to detect influential observations and assess model misspecification in parametric and semiparametric models (Zhu & Lee, 2001; Zhu et al., 2007). McCulloch (1989) further extends the local influence approach of Cook (1986) to assess the effects of perturbing the prior in a Bayesian analysis. In the Bayesian literature, several analogues of local influence have been developed using either the curvature of influence measures (Lavine, 1992; Dey & Birmiwal, 1994; Millar & Stewart, 2007; Van der Linde, 2007) or the Fréchet derivative of the posterior with respect to the prior (Berger, 1994; Gustafson & Wasserman, 1995; Dey et al., 1996; Gustafson, 1996; Berger et al., 2000). Very little has been done on developing general Bayesian influence analysis methods for simultaneously perturbing z, p(θ) and p(z | θ), assessing their effects and examining their applications in statistical models (Berger et al., 2000). To our knowledge, Clarke & Gustafson (1998) is one of the few papers on simultaneously perturbing {z, p(θ), p(z | θ)} in the context of independent and identically distributed data.

A key motivation for the proposed methodology is to unify influence concepts for many complex Bayesian models, for which very few or no methods exist, so that the effects of different perturbations can be identified. These models include many Bayesian parametric and semiparametric models, perhaps with missing data; see the Supplementary Material. Our development includes formal assessment of outliers and influential points as well as sensitivity analyses regarding the three major components of the Bayesian model: the prior, the sampling distribution and the data. For instance, sensitivity to the data can be evaluated by perturbing all the data points by random noise, redoing the analysis, and obtaining a spectrum of different inferences indexed by the noise (Wang et al., 2009; Clarke, 2010).

2. The Bayesian perturbation model and manifold

2.1. The Bayesian perturbation model

We develop a Bayesian model to characterize various perturbation schemes to z, p(z | θ) and p(θ). We introduce perturbations into the model p(z, θ) = p(z | θ) p(θ) through a vector ω = ω(z, θ), which varies in a set Ω. That is, ω is a mapping from the product space of the sample space 𝒵 and the parameter space Θ to Ω. Generally, ω accommodates many perturbation schemes, including the additive ε-contamination class for the prior detailed below. Moreover, ω must be chosen carefully so that the perturbation is meaningful and sensible.

Let p(z, θ | ω) be the probability density of (z, θ) for the perturbed model. We assume that the probability measures of p(z, θ | ω) for all ω ∈ Ω have a common dominating measure and that there is an ω0 ∈ Ω such that p(z, θ | ω0) = p(z, θ) for all (z, θ). We refer to p(z, θ | ω0) = p(z, θ) as the baseline joint distribution, where ω0 can be regarded as the central point of Ω representing no perturbation. We define the Bayesian perturbation model as a family of probability densities p(z, θ | ω) as ω varies in Ω. The Bayesian perturbation model includes individual perturbation schemes to z, p(θ) and p(z | θ), and their combinations. We focus on each individual scheme as follows.

Example 1. The Bayesian perturbation model for the prior includes many existing schemes, such as the additive ε-contamination class and the linear and nonlinear perturbation classes. For instance, the additive ε-contamination scheme is given by p(θ | ω) = p(θ) + λ{g(θ) − p(θ)}, where λ ∈ [0, 1] and g(θ) belongs to a class of contaminating distributions, denoted by 𝒢 (Berger, 1994; Dey & Birmiwal, 1994). In this case, Ω = {ω = λ{g(θ) − p(θ)} : (λ, g(·)) ∈ [0, 1] × 𝒢} and ω(z, θ) is independent of the data. Thus, ω0 = 0 and p(z, θ | ω) = p(z | θ) p(θ | ω).

Example 2. The Bayesian perturbation model for the data includes many perturbation schemes to individual observations of z (Cook, 1986; Guttman & Peña, 1993; Peña & Guttman, 1993; Zhu et al., 2007). Perturbation schemes for data points are proposed for identifying outliers and influential observations. As an illustration, we consider the standard linear regression model yi = xiTβ + εi, where xi is a p × 1 covariate vector, β is a p × 1 vector of regression coefficients and the εi are independently and identically distributed N(0, σ²) random variables. Let cl denote an l × 1 vector with all elements equal to c for a fixed scalar c and an integer l; for example, 1n, 1p and 0m. A perturbation scheme for the covariate xi is given by xi(ωi) = xi + ωi1p. In this case, zi = (yi, xiT)T, θ = (βT, σ²)T, ω = (ω1, …, ωn)T, ω0 = 0n and Ω is a subset of Rn. An alternative perturbation scheme for the linear regression model is the well-known mean-shift model (Guttman & Peña, 1993; Peña & Guttman, 1993). It is assumed that yi = xiTβ + ωi + εi for i in a set of k distinct integers chosen from {1, . . . , n}, denoted by I = {i1, . . . , ik}, and yi = xiTβ + εi for all other i. In this case, the perturbation scheme is ω = (ωi1, . . . , ωik)T and ω0 = 0k. Another important scheme is a geometric mixture model for case deletion or case weights (Millar & Stewart, 2007; Van der Linde, 2007). Specifically, let q(zi) be an arbitrary density of zi independent of θ; then the geometric mixture model for perturbing the ith observation is given by p(z | θ, ω) = {Πj≠i p(zj | θ)} p(zi | θ)λ q(zi)1−λ/{∫ p(zi | θ)λ q(zi)1−λ dzi}, where ω = λ varies in [0, 1] and p(zi | θ) is the density of zi under the linear model assumption. In this case, ω0 = 1 represents no perturbation. When λ = 0, p(zi | θ) disappears from p(z | θ, 0), which is equivalent to deleting zi.
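As a concrete numerical illustration of the mean-shift scheme above, the following minimal sketch (the simulated data, seed and all function names are ours, not part of the paper) shifts a subset of responses by ω and measures the resulting change in the least-squares estimate of β:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 50
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])  # intercept + one covariate
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n_obs)

def ols(X, y):
    # least-squares fit of beta
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mean_shift(y, I, omega):
    # mean-shift perturbation: y_i = x_i' beta + omega + eps_i for i in I
    y_pert = y.copy()
    y_pert[I] += omega
    return y_pert

beta_base = ols(X, y)
beta_pert = ols(X, mean_shift(y, [0, 1], 5.0))   # shift two cases by omega = 5
shift = float(np.linalg.norm(beta_pert - beta_base))
```

Shifting only the cases in I while leaving the rest of z untouched is exactly what makes this scheme useful for flagging those cases as potential outliers.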

Example 3. The Bayesian perturbation model for the sampling distribution includes many perturbation schemes to p(z | θ), such as the additive ε-contamination class. We may also consider a class of perturbed sampling distributions p(z | θ, ω) defined by

p(z | θ, ω) = p(z | θ) exp{Σj=1m ωj uj(z; θ) − 0.5 Σj=1m ωj² uj(z; θ)² − C(θ, ω)}, (1)

where C(θ, ω) is the normalizing constant, ω = (ω1, . . . , ωm)T is an m × 1 vector and uj (z; θ) is a fixed scalar function having zero mean under p(z | θ). In this case, ω0 = 0m represents no perturbation. The number m in the perturbation (1) can either be as small as 1 or can increase with n (Copas & Eguchi, 2005; Zhu et al., 2007).
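The tilting in (1) is easy to realize numerically. The sketch below (our own toy setup: a single observation with p(z | θ) = N(θ, 1), m = 1 and u1(z; θ) = z − θ, which has zero mean under p; helper names are ours) computes the normalizing constant C(θ, ω) by quadrature and checks that the perturbed density integrates to one and that ω = 0 recovers the baseline:

```python
import numpy as np

def integrate(fvals, x):
    # simple trapezoid rule, kept explicit to avoid version-specific numpy helpers
    return float(np.sum(0.5 * (fvals[1:] + fvals[:-1]) * np.diff(x)))

def normal_pdf(z, theta):
    return np.exp(-0.5 * (z - theta) ** 2) / np.sqrt(2.0 * np.pi)

def u(z, theta):
    # fixed function with zero mean under p(z | theta)
    return z - theta

def perturbed_density(z, theta, omega, grid):
    tilt_grid = np.exp(omega * u(grid, theta) - 0.5 * omega ** 2 * u(grid, theta) ** 2)
    # C(theta, omega) makes the perturbed density integrate to one
    C = np.log(integrate(normal_pdf(grid, theta) * tilt_grid, grid))
    tilt_z = np.exp(omega * u(z, theta) - 0.5 * omega ** 2 * u(z, theta) ** 2)
    return normal_pdf(z, theta) * tilt_z * np.exp(-C)

grid = np.linspace(-10.0, 10.0, 4001)
dens = perturbed_density(grid, theta=0.0, omega=0.3, grid=grid)
total = integrate(dens, grid)                          # ≈ 1 by construction
baseline = perturbed_density(grid, 0.0, 0.0, grid)     # omega_0 = 0 recovers p(z | theta)
```
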

2.2. The Bayesian perturbation manifold

We develop a new geometric framework, called a Bayesian perturbation manifold, to measure each perturbation ω in the Bayesian perturbation model. Based on this manifold, we are able to measure the amount of perturbation, the extent to which each component of a perturbation model contributes to p(z, θ) and the degree of orthogonality for the components of the perturbation model. Such a quantification is useful for rigorously assessing the relative influence of each component in the Bayesian analysis, and can reveal any discrepancies among the data, the prior or the sampling model.

For an infinite-dimensional set Ω, we assume throughout the paper that ℳ forms a Riemannian Hilbert manifold (Friedrich, 1991; Lang, 1995) under some regularity conditions. For a given p(z, θ | ω) ∈ ℳ, we consider a smooth curve C(t) = p{z, θ | ω(t)} through the space of perturbation models with open interval domain containing 0 and p{z, θ | ω(0)} = p(z, θ | ω). Note that ω may be different from ω0. We require C(t) to be smooth enough that ℓ̇{z, θ | ω(t)} = d log p{z, θ | ω(t)}/dt, called the tangent or derivative vector, exists with ∫ ℓ̇{z, θ | ω(t)}² p{z, θ | ω(t)} dz dθ < ∞ for all t in the open interval domain. Since p{z, θ | ω(t)} is the joint density of (z, θ) given ω(t), that is, ∫ p{z, θ | ω(t)} dz dθ = 1, the tangent space of ℳ at ω, denoted by Tωℳ, is formed by the tangent vectors ℓ̇{z, θ | ω(0)} of all possible smooth curves C(t), which satisfy ∫ ℓ̇{z, θ | ω(0)} p{z, θ | ω(0)} dz dθ = 0. We can introduce the inner product of any two tangent vectors v1(ω) and v2(ω) in Tωℳ as

< v1, v2 > (ω) = ∫ v1(ω) v2(ω) p(z, θ | ω) dz dθ. (2)

When ω varies in a Euclidean space and is independent of z and θ, the inner product < v1, v2 > (ω) in (2) is closely associated with the Fisher information; see Example 6 for details. Thus, the squared length ||v(ω)||² of a tangent vector v(ω) ∈ Tωℳ is < v, v > (ω) = ∫ v(ω)² p(z, θ | ω) dz dθ. The length of the curve C(t) from t1 to t2 is

SC{ω(t1), ω(t2)} = ∫t1t2 [< ℓ̇{z, θ | ω(t)}, ℓ̇{z, θ | ω(t)} > {ω(t)}]1/2 dt. (3)
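The curve length (3) can be approximated by estimating the tangent-norm ⟨ℓ̇, ℓ̇⟩{ω(t)} by Monte Carlo on a grid of t values and then integrating. The sketch below uses a toy setup of our own (not from the paper): a N(0, 1) prior on θ and the perturbed sampling model N(θ + t, 1), for which ℓ̇{z, θ | ω(t)} = z − θ − t and the exact length from t = 0 to t = 1 is 1:

```python
import numpy as np

rng = np.random.default_rng(1)

def tangent_norm_sq(t, n_mc=100_000):
    # Monte Carlo estimate of <l_dot, l_dot>{omega(t)} under p{z, theta | omega(t)}
    theta = rng.normal(size=n_mc)               # prior draws, N(0, 1)
    z = rng.normal(loc=theta + t, size=n_mc)    # perturbed sampling model N(theta + t, 1)
    ldot = z - theta - t                        # tangent vector along the curve
    return float(np.mean(ldot ** 2))

# curve length (3) from t1 = 0 to t2 = 1 via the trapezoid rule
ts = np.linspace(0.0, 1.0, 21)
norms = [np.sqrt(tangent_norm_sq(t)) for t in ts]
length = sum(0.5 * (norms[i] + norms[i + 1]) * (ts[i + 1] - ts[i])
             for i in range(len(ts) - 1))
```
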

Next, we need to introduce the concept of a geodesic on ℳ, which is a direct extension of the straight line in Euclidean space. Consider a real function f(ω) defined on ℳ and a smooth curve p{z, θ | ω(t)} in ℳ with p{z, θ | ω(0)} = p(z, θ | ω) and ℓ̇{z, θ | ω(0)} = v(ω). We define df[v](ω) = limt→0 t−1(f[p{z, θ | ω(t)}] − f[p{z, θ | ω(0)}]) as the directional derivative of f at the perturbation distribution p(z, θ | ω) in the direction of v(ω) ∈ Tωℳ. We consider two smooth vector fields u(ω) and v(ω), which are not only tangent vectors in Tωℳ but also smooth functions of ω in Ω. We define the directional derivative of the vector field u(ω) in the direction of v(ω), called the connection, by du[v](ω) = limt→0 t−1[u{ω(t)} − u{ω(0)}]. Intuitively, if ω varies in a Euclidean space, then du[v](ω) is closely associated with the second derivative of ℓ(z, θ | ω) with respect to ω. We consider the Levi–Civita connection, which has several nice geometric properties (Amari, 1990; Lang, 1995) and is given by

▿vu(ω) = du[v](ω) + 0.5{u(ω)v(ω) − ∫ u(ω)v(ω) p(z, θ | ω) dz dθ}.

A geodesic with respect to the Levi–Civita connection on ℳ is a smooth curve γ(t) = p{z, θ | ω(t)} on ℳ with open interval domain (a, b) and ℓ̇{z, θ | ω(t)} = v{ω(t)} such that the Levi–Civita connection satisfies ▿vv{ω(t)} = 0. Intuitively speaking, as one moves the tangent vectors of a geodesic along the geodesic itself, they keep pointing in the same direction. Moreover, geodesics can be interpreted as locally shortest paths between points on ℳ. For a fixed perturbation distribution p(z, θ | ω) and a given direction v(ω) ∈ Tωℳ, there is a unique geodesic γ(t) = p{z, θ | ω(t)} with open interval domain covering 0 such that γ(0) = p(z, θ | ω) and γ̇(0) = v(ω). Finally, based on these geometric quantities of ℳ, we introduce the definition of a Bayesian perturbation manifold.

Definition 1. A Bayesian perturbation manifold (ℳ, < u, v >, ▿vu) is the manifold ℳ with an inner product < u, v > and the Levi–Civita connection ▿vu.

When Ω is an open set of Rm, under some regularity conditions, the Bayesian perturbation manifold is an m-dimensional manifold (Amari, 1990, p. 16; Kass & Vos, 1997; Zhu et al., 2007). Now, we examine some examples of Bayesian perturbation manifolds based on several perturbations to the data, the prior, and the sampling distribution.

Example 1, continued. We consider the Bayesian perturbation model for the ε-contamination class to the prior given by ℳ = {{(1 − λ) p(θ) + λg(θ)} p(z | θ) : λ ∈ [0, 1], g(·) ∈ 𝒢}. In this case, ω(t) = t{g(θ) − p(θ)} for a given g(·) ∈ 𝒢, and therefore we consider the smooth curve Cg(t) = p{z, θ | ω(t)} = [p(θ) + t{g(θ) − p(θ)}] p(z | θ). It can be shown that vg{ω(t)} = ℓ̇{z, θ | ω(t)} = {g(θ) − p(θ)}/[p(θ) + t{g(θ) − p(θ)}]. For any two densities g1(·) and g2(·) in 𝒢, we can calculate the tangent vectors vgi{ω(0)} = {gi(θ) − p(θ)}{p(θ)}−1 for i = 1, 2 and their inner product as

< vg1, vg2 > (ω0) = ∫ [g1(θ){p(θ)}−1 − 1][g2(θ){p(θ)}−1 − 1] p(θ) dθ,

which is also independent of p(z | θ). In particular, < vg, vg > (ω0) = ∫ {g(θ)/p(θ) − 1}² p(θ) dθ reduces to the L2 norm considered in Gustafson (1996).
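This L2 norm is straightforward to evaluate numerically. A small sketch under assumptions of our own choosing (baseline prior N(0, 1), contaminant g = N(0.7, 1); for two such normal priors the integral has the closed form exp(m²) − 1 with m = 0.7, a fact we derived for this check, not stated in the paper):

```python
import numpy as np
from math import exp

def integrate(fvals, x):
    # trapezoid rule on a fixed grid
    return float(np.sum(0.5 * (fvals[1:] + fvals[:-1]) * np.diff(x)))

def npdf(x, m):
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)

theta = np.linspace(-12.0, 12.0, 16001)
p = npdf(theta, 0.0)     # baseline prior N(0, 1)
g = npdf(theta, 0.7)     # contaminating prior N(0.7, 1)

# <v_g, v_g>(omega_0) = ∫ {g(theta)/p(theta) - 1}^2 p(theta) dtheta
norm_sq = integrate((g / p - 1.0) ** 2 * p, theta)
closed_form = exp(0.7 ** 2) - 1.0
```

Note that the norm is independent of the sampling distribution, in line with Proposition 1 below.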

We further consider a Bayesian perturbation model for the sole perturbation scheme to hyperparameters of the prior given by ℳ = {p(z, θ | ω) = p(θ | ω) p(z | θ) : ω = (ω1, . . . , ωm)T}, in which ω is independent of both z and θ. Let ω(t) = (ω1, . . . , ωj−1, ωj + t, ωj+1, . . . , ωm)T, ℓ(θ | ω) = log p(θ | ω) and ωk(t) be the kth component of ω(t). Since ℓ(z, θ | ω) = log p(θ | ω) + log p(z | θ), we have

ℓ̇{z, θ | ω(0)} = dℓ{z, θ | ω(t)}/dt |t=0 = Σk=1m [ω̇k(t) ∂ωk ℓ{θ | ω(t)}] |t=0 = ∂ωj ℓ(θ | ω),

where ω̇k(t) = dωk(t)/dt and ∂ωj = ∂/∂ωj. Therefore, Tωℳ is spanned by the m functions ∂ωj ℓ(θ | ω) pointwise in ω. Since ∫ p(z | θ) dz = 1, the inner product between ∂ωj ℓ(θ | ω) and ∂ωk ℓ(θ | ω), denoted by Gjk(ω), is given by

Gjk(ω) = ∫ ∂ωj ℓ(θ | ω) ∂ωk ℓ(θ | ω) p(θ | ω) p(z | θ) dz dθ = ∫ ∂ωj ℓ(θ | ω) ∂ωk ℓ(θ | ω) p(θ | ω) dθ, (4)

which is independent of p(z | θ).

Furthermore, suppose that p(θ) = p(θ1) p(θ2 | θ[1]) · · · p(θm | θ[m−1]) has a hierarchical structure, where θ[j] = (θ1, . . . , θj) and p(θj | θ[j−1]) denotes the density of the conditional distribution of θj given θ[j−1]. Then, we perturb each level of p(θ) such that p(θ | ω) = p(θ1 | ω1) p(θ2 | θ[1], ω2) · · · p(θm | θ[m−1], ωm), ∫ p(θ1 | ω1) dθ1 = 1 and ∫ p(θj | θ[j−1], ωj) dθj = 1 for j = 2, . . . , m. In this case, Tωℳ is spanned by the m functions ∂ω1 log p(θ1 | ω1) and ∂ωj log p(θj | θ[j−1], ωj) for j = 2, . . . , m. Moreover, Gjk(ω) = 0 for all j ≠ k. For instance, it can be shown that G12(ω) = ∫ ∂ω1 log p(θ1 | ω1) ∂ω2 log p(θ2 | θ[1], ω2) p(θ | ω) dθ = ∂ω1 ∂ω2 ∫∫ p(θ1 | ω1) p(θ2 | θ1, ω2) dθ2 dθ1 = ∂ω1 ∂ω2 1 = 0. Thus, different components of ω are orthogonal to each other (Zhu et al., 2007). Furthermore, it follows from (4) that G11(ω) = ∫ {∂ω1 log p(θ1 | ω1)}² p(θ1 | ω1) dθ1 and Gjj(ω) = ∫ {∂ωj log p(θj | θ[j−1], ωj)}² p(θj | θ[j−1], ωj) dθj for j ⩾ 2.
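The orthogonality of the levels of a hierarchical prior can be checked by Monte Carlo. A minimal sketch, assuming a two-level normal hierarchy of our own choosing, θ1 ~ N(ω1, 1) and θ2 | θ1 ~ N(θ1 + ω2, 1), for which the scores with respect to ω1 and ω2 are available in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
w1, w2 = 0.5, -0.3
n = 500_000
theta1 = rng.normal(loc=w1, size=n)      # p(theta1 | w1) = N(w1, 1)
theta2 = rng.normal(loc=theta1 + w2)     # p(theta2 | theta1, w2) = N(theta1 + w2, 1)

s1 = theta1 - w1             # d log p(theta1 | w1) / d w1
s2 = theta2 - theta1 - w2    # d log p(theta2 | theta1, w2) / d w2

G11 = float(np.mean(s1 * s1))   # ≈ 1, the (1, 1) entry of the metric (4)
G12 = float(np.mean(s1 * s2))   # ≈ 0: the two levels of the hierarchy are orthogonal
```
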

Combining the above results, we are led to the following proposition, whose proof can be found in the Supplementary Material.

Proposition 1. Consider any Bayesian perturbation model to the prior given by ℳ= {p(θ | ω) p(z | θ) : ω ∈ Ω}. If ω is independent of z, then the metric tensor of its Bayesian perturbation manifold ℳ is independent of the specification of the sampling distribution p(z | θ).

Proposition 1 has important implications. The independence property ensures that existing results on local robustness to the prior can be considered as a special case of the new method developed here (McCulloch, 1989; Gustafson, 1996).

Example 4. Consider a Bayesian perturbation model given by

ℳ = {p(z, θ | ω) = p(θ | ωp) p(z | θ, ωs) : ω = (ωpT, ωsT)T, ∫ p(θ | ωp) dθ = ∫ p(z | θ, ωs) dz = 1},

in which ωp = (ω1, . . . , ωm)T and ωs = (ωm+1, . . . , ωm+n)T are assumed to be independent of both z and θ. We consider ω(t) = (ω1, . . . , ωj−1, ωj + t, ωj+1, . . . , ωm+n)T with ω(0) = ω for each j ∈ {1, . . . , m + n}. Thus, ω̇k(0) = dωk(t)/dt |t=0 = 1 for k = j and 0 otherwise. Letting ℓ(θ | ωp) = log p(θ | ωp) and ℓ(z | θ, ωs) = log p(z | θ, ωs), we have

ℓ̇{z, θ | ω(0)} = Σk=1m+n ω̇k(0) ∂ωk log p(z, θ | ω) = ∂ωj ℓ(θ | ωp) + ∂ωj ℓ(z | θ, ωs). (5)

Since ωs and ωp have no components in common, Tωℳ is spanned by the m + n functions ∂ωj ℓ(θ | ωp) for j = 1, . . . , m and ∂ωj ℓ(z | θ, ωs) for j = m + 1, . . . , m + n. Note that ∫ ∂ωk ℓ(θ | ωp) ∂ωj ℓ(z | θ, ωs) p(z, θ | ω) dz dθ = ∫ ∂ωk p(θ | ωp) ∂ωj p(z | θ, ωs) dz dθ = ∫ ∂ωk p(θ | ωp) {∂ωj ∫ p(z | θ, ωs) dz} dθ = 0 holds for any j, k. Therefore, it follows from (5) that the inner product of ∂ωj ℓ(z, θ | ω) and ∂ωk ℓ(z, θ | ω), denoted by Gjk(ω), is

∫ ∂ωj ℓ(θ | ωp) ∂ωk ℓ(θ | ωp) p(z, θ | ω) dz dθ + ∫ ∂ωj ℓ(z | θ, ωs) ∂ωk ℓ(z | θ, ωs) p(z, θ | ω) dz dθ. (6)

Moreover, the first term of (6) simplifies to ∫ ∂ωj ℓ(θ | ωp) ∂ωk ℓ(θ | ωp) p(θ | ωp) dθ since ∫ p(z | θ, ωs) dz = 1. For j = 1, . . . , m and k = m + 1, . . . , m + n, it follows from (6) that < ∂ωj ℓ(z, θ | ω), ∂ωk ℓ(z, θ | ω) > = 0, since ∂ωk ℓ(θ | ωp) = 0 and ∂ωj ℓ(z | θ, ωs) = 0. Thus, ωs and ωp are orthogonal to each other with respect to the inner product (2).

Combining the above results, we obtain the following proposition.

Proposition 2. Consider ℳ = {p(z, θ | ω) = p(θ | ωp) p(z | θ, ωs) : ω = (ωpT, ωsT)T}. Assume that ωp is independent of z and that ∫ p(θ | ωp) dθ = ∫ p(z | θ, ωs) dz = 1. Consider two smooth curves p{z, θ | ω(k)(t)} with ω(k)(t) = {ω(k),p(t), ω(k),s(t)}T such that ω(1)(0) = ω(2)(0) = ω, and such that ω(1),p(t) and ω(2),s(t) are independent of t. For any two tangent vectors vk(ω) = ℓ̇{z, θ | ω(k)(0)} ∈ Tωℳ for k = 1, 2, we have < v1, v2 > (ω) = 0.

Proposition 2 has important implications. For simultaneous perturbations to the prior and the sampling distribution, it ensures that ωp and ωs are geometrically orthogonal to each other. Thus, we can separate out the influence of the prior from that of the data and the sampling distribution.

Finally, we consider a simultaneous perturbation model, denoted by p(z, θ | ωp, ωd, ωs), in which ωp, ωd and ωs represent individual perturbations to the prior, the data and the sampling distribution, respectively. In addition to Propositions 1 and 2, we can obtain the following theorem.

Theorem 1. Let ℳ = {p(z, θ | ω) = p(θ | ωp) p(z | θ, ωd, ωs) : ω = (ωp, ωd, ωs)} with ∫ p(θ | ωp) dθ = ∫ p(z | θ, ωd, ωs) dz = 1, and assume that ωp is independent of z. Consider two smooth curves p{z, θ | ω(k)(t)} with ω(k)(t) = {ω(k),p(t), ω(k),d(t), ω(k),s(t)}T passing through ω(1)(0) = ω(2)(0) = ω and having two tangent vectors vk(ω) = ℓ̇{z, θ | ω(k)(0)} ∈ Tωℳ, k = 1, 2. Then:

  (i) if ω(1),p(t) and {ω(2),d(t), ω(2),s(t)} are independent of t, then < v1, v2 > (ω) = 0;

  (ii) if {ω(1),p(t), ω(1),d(t)} and {ω(2),p(t), ω(2),s(t)} are independent of t and p(z | θ, ωd, ωs) = p1(z | θ, ωd) p2(z | θ, ωs) for any (ωd, ωs), then < v1, v2 > (ω) = 0.

For simultaneous perturbations to the prior, the data, and the sampling distribution, Theorem 1 (i) ensures that ωp and (ωd, ωs) are geometrically orthogonal to each other. If p(z | θ, ωd, ωs) = p1(z | θ, ωd) p2(z | θ, ωs), then ωp, ωd, and ωs are geometrically orthogonal to each other.

3. Influence measures and their properties

3.1. Intrinsic influence measures

We consider some objective functions, such as the ϕ-divergence, the posterior mean and the Bayes factor, and develop associated intrinsic influence measures for quantifying the effects of perturbing the three key elements of a Bayesian analysis. An objective function of interest for sensitivity analysis is often chosen to be a functional of the perturbed posterior distribution of θ given z, given by p(θ | z, ω) = p(z, θ | ω)/∫ p(z, θ | ω) dθ, and of the unperturbed posterior distribution p(θ | z, ω0). Such an objective function, denoted by f(ω, ω0) = f{p(θ | z, ω), p(θ | z, ω0)}, can also be regarded as a mapping from ℳ × ℳ to R. Throughout the paper, we assume that f(ω, ω0) is a smooth function of ω and is a path-independent function of p(θ | z, ω) and p(θ | z, ω0) such that f(ω, ω) = 0 for any ω ∈ Ω. For instance, f(ω, ω0) can be set as the total variation distance between p(θ | z, ω0) and p(θ | z, ω) (Dey et al., 1996). Most standard influence measures, such as the range (Berger, 1990, 1994), can be regarded as special cases of f(ω, ω0).

A large value of these influence measures can be caused both by the size of the perturbation ω to the baseline distribution, regardless of the observed data, and by discrepancies between the observed data and the fitted model p(z, θ). Since the purpose of any influence analysis is to detect the discrepancies between the observed data and p(z, θ), we suggest rescaling f(ω, ω0) by the shortest distance between p(z, θ | ω) and p(z, θ | ω0). We explicitly quantify this distance by the minimal geodesic distance, denoted by d(ω, ω0). If ℳ is a complete and finite-dimensional Riemannian manifold, then the Hopf–Rinow theorem states that any two points on ℳ can be joined by a minimal geodesic (Ekeland, 1978). Furthermore, if ℳ is a complete infinite-dimensional Riemannian manifold, any two points on ℳ can be joined by a path which is almost a minimal geodesic (Ekeland, 1978). We introduce an intrinsic influence measure for comparing ω and ω0 ∈ Ω as follows. Geometrically, an intrinsic measure is invariant to certain reparameterizations.

Definition 2. The intrinsic influence measure for comparing p(θ | z, ω) to p(θ | z, ω0) is defined as IGIf (ω, ω0) = f(ω, ω0)2/d(ω, ω0)2.

The proposed IGIf (ω, ω0) can be interpreted as the ratio of the change of the objective function relative to the minimal distance between p(z, θ | ω) and p(z, θ | ω0) on . Since f(ω, ω0) is path-independent and d(ω, ω0) is invariant to smooth reparametrization of ω, IGIf (ω, ω0) is also invariant. Moreover, we suggest identifying the most influential ω in Ω, denoted by ω̂I, which maximizes IGIf (ω, ω0) for all ω ∈ Ω.

Example 5. We consider the logarithm BF(ω, ω0) = log ∫ p(z, θ | ω) dθ − log ∫ p(z, θ | ω0) dθ of the Bayes factor for comparing p(z | θ, ω) and p(z | θ, ω0), which can be regarded as a statistic for testing hypotheses of ω against ω0 (Kass & Raftery, 1995). Under mild conditions, BF(ω, ω0) is a smooth mapping from ℳ to R. We can set f(ω, ω0) = BF(ω, ω0) and obtain the intrinsic influence measure

IGIBF(ω, ω0) = BF(ω, ω0)²/d(ω, ω0)².

3.2. First-order local influence measures

We consider the local behaviour of f{ω(t), ω0} as t approaches 0 along all possible smooth curves p{z, θ | ω(t)} passing through ω0, that is, ω(0) = ω0. Since f{ω(t), ω0} is a function from R to R, it follows by a Taylor series expansion that f{ω(t), ω0} = f{ω(0), ω0} + ḟ{ω(0)}t + 0.5 f̈{ω(0)}t² + o(t²), where ḟ{ω(0)} and f̈{ω(0)} denote the first- and second-order derivatives of f{ω(t), ω0} with respect to t evaluated at t = 0. We need to distinguish between ḟ{ω(0)} ≠ 0 for some smooth curves ω(t) and ḟ{ω(0)} = 0 for all smooth curves ω(t). We first consider the case ḟ{ω(0)} ≠ 0 for some smooth curves ω(t). Let ℓ̇{z, θ | ω(0)} = v ∈ Tω(0)ℳ. Then, ḟ{ω(0)} = df[v]{ω(0)} is the directional derivative of f in the direction of v ∈ Tω(0)ℳ (Lang, 1995). We are led to the following definition.

Definition 3. The first-order local influence measure is defined as FIf[v]{ω(0)} = limt→0 IGIf {ω(0), ω(t)} = [df [v]{ω(0)}]2/[< v, v > {ω(0)}].

To carry out a sensitivity analysis, we use the tangent vector vF,max in Tω(0)ℳ, which maximizes FIf[v]{ω(0)} and is invariant to reparameterization of ω(t). We now have the following result.

Theorem 2. The quantity FIf [v]{ω(0)} is invariant to smooth reparameterization of ω(t).

In addition to the invariance property in Theorem 2, FIf [v]{ω(0)} is a direct generalization of the first-order measure for a finite-dimensional perturbation manifold (Zhu et al., 2007; Wu & Luo, 1993).

Example 5 (continued). We set f{ω(t), ω0} = BF{ω(t), ω0}. Since d[BF{ω(t), ω0}]/dt |t=0 = ∫ ℓ̇{z, θ | ω(0)} p{z, θ | ω(0)} dθ / ∫ p{z, θ | ω(0)} dθ = ∫ ℓ̇{z, θ | ω(0)} p{θ | z, ω(0)} dθ, we have

FIf[v]{ω(0)} = [∫ ℓ̇{z, θ | ω(0)} p{θ | z, ω(0)} dθ]² / ∫ ℓ̇{z, θ | ω(0)}² p{z, θ | ω(0)} dz dθ.

It is relatively easy to compute FIf [v]{ω(0)} for a specific perturbation. For instance, for the contamination to the prior given by p{θ | ω(t)} = p(θ) + t{g(θ) − p(θ)}, it can be shown that

FIf[v]{ω(0)} = (∫ [g(θ){p(θ)}−1 − 1] p{θ | z, ω(0)} dθ)² / ∫ [g(θ){p(θ)}−1 − 1]² p(θ) dθ = [pg(z){p(z)}−1 − 1]² / ∫ [g(θ){p(θ)}−1 − 1]² p(θ) dθ,

where p(z) = ∫ p(z | θ) p(θ) dθ and pg(z) = ∫ g(θ) p{z | θ, ω(0)} dθ. Since the ratio of pg(z) to p(z) is the Bayes factor in favour of g(θ) versus p(θ), FIf[v]{ω(0)} is the square of the normalized Bayes factor of g(θ) versus p(θ).
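The equality of the two expressions above can be confirmed numerically. A hedged quadrature sketch, under a conjugate setup of our own choosing (baseline prior N(0, 1), contaminant g = N(1, 1), a single observation z = 2 from a N(θ, 1) likelihood; for these normals the Bayes factor pg(z)/p(z) equals exp(0.75) and the denominator equals e − 1, facts we derived for this check):

```python
import numpy as np

def integrate(fvals, x):
    # trapezoid rule on a fixed grid
    return float(np.sum(0.5 * (fvals[1:] + fvals[:-1]) * np.diff(x)))

def npdf(x, m, s=1.0):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

theta = np.linspace(-12.0, 12.0, 16001)
p_prior = npdf(theta, 0.0)     # baseline prior
g = npdf(theta, 1.0)           # contaminating prior
z_obs = 2.0
lik = npdf(z_obs, theta)       # N(theta, 1) likelihood at the observed point

p_z = integrate(lik * p_prior, theta)    # p(z), marginal under the prior
pg_z = integrate(lik * g, theta)         # p_g(z), marginal under g
post = lik * p_prior / p_z               # posterior p(theta | z)

num_a = integrate((g / p_prior - 1.0) * post, theta) ** 2   # posterior-integral form
num_b = (pg_z / p_z - 1.0) ** 2                             # Bayes-factor form
denom = integrate((g / p_prior - 1.0) ** 2 * p_prior, theta)
FI = num_b / denom
```
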

Example 6. Consider the Bayesian perturbation manifold ℳ = {p(z, θ | ω) : ω ∈ Ω ⊂ Rm} and p{z, θ | ω(t)} as a smooth curve on ℳ, in which ω is not a function of z and θ, such as the perturbation scheme in the mean-shift model, and ω(t) = {ω1(t), . . . , ωm(t)}T is a smooth vector function of t. Let vh = (vh,1, . . . , vh,m)T = dω(t)/dt |t=0. By using the chain rule, we have

v{ω(0)} = dℓ{z, θ | ω(t)}/dt |t=0 = Σk=1m ω̇k(0) ∂ωk ℓ{z, θ | ω(0)} = Σk=1m vh,k ∂ωk ℓ{z, θ | ω(0)},
df[v]{ω(0)} = df{ω(t), ω0}/dt |t=0 = Σk=1m vh,k ∂ωk f{ω(0)} = vhT ∂ω f{ω(0)},
< v, v > {ω(0)} = Σj,k=1m vh,j vh,k < ∂ωj ℓ{z, θ | ω(0)}, ∂ωk ℓ{z, θ | ω(0)} > {ω(0)} = vhT G{ω(0)} vh, (7)

where ∂ωk f(ω) denotes the first-order partial derivative of f(ω, ω0) with respect to ωk and G{ω(0)} = ∫ [∂ω ℓ{z, θ | ω(0)}]⊗2 p{z, θ | ω(0)} dz dθ is the m × m Fisher information matrix with respect to ω. Thus, it follows from (7) and the definition of FIf[v]{ω(0)} that FIf[v]{ω(0)} = [df[v]{ω(0)}]²/[< v, v > {ω(0)}] = [vhT ∂ω f{ω(0)}]²/vhT G{ω(0)} vh. Finally, we obtain vF,max{ω(0)} = argmaxv FIf[v]{ω(0)} = [G{ω(0)}]−1/2 ∂ω f{ω(0)}.

3.3. Second-order local influence measures

We use f̈{ω(0)} to assess the second-order local influence of ω on a statistical model (Zhu et al., 2007). However, for a general smooth curve ω(t) on ℳ, f̈{ω(0)} is not geometrically well behaved (Lang, 1995; Zhu et al., 2007). We consider only the geodesic p{z, θ | ω(t)}, denoted by Expω(0)(tv), passing through Expω(0)(tv) |t=0 = p{z, θ | ω(0)} with initial direction ℓ̇{z, θ | ω(0)} = v{ω(0)} ∈ Tω(0)ℳ. It follows from a Taylor series expansion (Lang, 1995; Zhu et al., 2007) that

f{Expω(0)(tv), ω0} = f{ω(0), ω0} + t df[v]{ω(0)} + 0.5t² f̈{Expω(0)(tv)} |t=0 + o(t²), (8)

where f̈{Expω(0)(tv)} = d²f{Expω(0)(tv), ω0}/dt². Geometrically, f̈{Expω(0)(tv)} |t=0 in (8) is called the Riemannian Hessian and is denoted by Hess(f)(v, v){ω(0)} (Lang, 1995). The Riemannian Hessian is symmetric. We now introduce a second-order influence measure.

Definition 4. The second-order influence measure in the direction v ∈ Tω(0)ℳ is defined as SIf[v]{ω(0)} = Hess(f)(v, v){ω(0)}/[< v, v > {ω(0)}].

Geometrically, SIf[v]{ω(0)} is invariant to scalar transformations and smooth reparameterizations. To carry out a sensitivity analysis, we use the tangent vector vS,max ∈ Tω(0)ℳ, which maximizes SIf[v]{ω(0)} over all v ∈ Tω(0)ℳ. There is a direct connection between the second-order measures in finite- and infinite-dimensional spaces. Therefore, the diagnostic method proposed here can be regarded as an extension of existing local influence approaches (Cook, 1986; Zhu et al., 2007) to an infinite-dimensional setting.

Example 6, continued. We consider the Bayesian perturbation model in Example 6. If df[v]{ω(0)} = 0 for all v ∈ Tω(0)ℳ, then Hess(f)(v, v){ω(0)} reduces to vhT Hf{ω(0)} vh, where Hf{ω(0)} = ∂ω² f{ω(0)} denotes the matrix of second-order partial derivatives of f(ω, ω0) with respect to ω (Zhu et al., 2007). In this case, SIf[v]{ω(0)} = vhT Hf{ω(0)} vh / vhT G{ω(0)} vh and vS,max equals the eigenvector of [G{ω(0)}]−1/2 Hf{ω(0)} [G{ω(0)}]−1/2 corresponding to its largest eigenvalue. Let ej be an m × 1 vector with jth element 1 and 0 otherwise. We also suggest an index plot of SIf[ej]{ω(0)} to examine influential cases (Zhu et al., 2007, p. 2572).
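The eigen-computation above is a few lines of linear algebra. A minimal sketch with hypothetical numbers (the matrices G and Hf below are illustrative placeholders, not taken from any example in the paper); it finds the top eigenvector of G^{-1/2} Hf G^{-1/2} and checks that the corresponding direction attains the largest value of SIf[v]{ω(0)}:

```python
import numpy as np

# hypothetical metric G{omega(0)} and Hessian Hf{omega(0)} for m = 3
G = np.diag([1.0, 2.0, 4.0])
Hf = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 0.5],
               [0.0, 0.5, 1.0]])

G_inv_half = np.diag(1.0 / np.sqrt(np.diag(G)))   # G^{-1/2} for a diagonal metric
M = G_inv_half @ Hf @ G_inv_half
eigvals, eigvecs = np.linalg.eigh(M)              # eigenvalues in ascending order
v_smax = eigvecs[:, -1]                           # eigenvector for the largest eigenvalue

def SI(vh):
    # SI_f[v]{omega(0)} = vh' Hf vh / vh' G vh
    return float(vh @ Hf @ vh) / float(vh @ G @ vh)

vh_max = G_inv_half @ v_smax   # the same direction expressed in the omega coordinates
```

By construction, SI evaluated at vh_max equals the largest eigenvalue of M, and any other direction (e.g. a coordinate vector ej for an index plot) gives a smaller or equal value.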

3.4. Bayesian influence analysis

We now summarize the four key steps in carrying out our proposed influence analysis.

  • Step 1. Construct a Bayesian perturbation model p(z, θ | ω).

  • Step 2. Given the Bayesian perturbation model, we calculate the geometric quantities, such as <v, v> {ω(0)}, of the perturbation manifold.

  • Step 3. Choose an objective function f(ω, ω0) and calculate IGIf(ω, ω0) and ω̂I = argmaxω∈Ω IGIf(ω, ω0).

In Step 3, we need to compute f(ω, ω0) and d(ω, ω0). Since f(ω, ω0) is a function of p(θ | z, ω) and p(θ | z, ω0), we use Markov chain Monte Carlo methods to draw random samples from p(θ | z, ω) and p(θ | z, ω0) and then evaluate f(ω, ω0) (Chen et al., 2000). We use Dijkstra's algorithm (Dijkstra, 1959) to approximate the geodesic distance between p(z, θ | ω) and p(z, θ | ω0). The main idea is to discretize the model {p(z, θ | ω) : ω ∈ Ω} into a simpler space {p(z, θ | ω) : ω ∈ ΩD}, where ΩD contains a set of refined grid points of Ω, and then to approximate d(ω, ω0) by the shortest path on the resulting graph. Based on the grid ΩD, we then calculate {IGIf(ω, ω0) : ω ∈ ΩD} and approximate ω̂I by argmaxω∈ΩD IGIf(ω, ω0).
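The Dijkstra approximation of d(ω, ω0) can be sketched as follows; the grid, the flat metric G = diag(1, 1/σ0²) and all names are our own illustrative assumptions, chosen so the exact distance from the origin to (1, 1) is known to be (1 + 1/σ0²)^{1/2}:

```python
import heapq
import numpy as np

# discretize Omega = [0, 1]^2 into a grid Omega_D and approximate the geodesic
# distance by a shortest path, assuming the flat metric G = diag(1, 1/sigma0^2)
sigma0 = 0.5
G = np.diag([1.0, 1.0 / sigma0 ** 2])
n_grid = 21
h = 1.0 / (n_grid - 1)

def edge_length(step):
    # local Riemannian length of one grid step
    d = np.array(step, dtype=float) * h
    return float(np.sqrt(d @ G @ d))

steps = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
dist = {(0, 0): 0.0}
heap = [(0.0, (0, 0))]
while heap:
    d0, node = heapq.heappop(heap)
    if d0 > dist.get(node, float("inf")):
        continue                      # stale heap entry
    for s in steps:
        nb = (node[0] + s[0], node[1] + s[1])
        if 0 <= nb[0] < n_grid and 0 <= nb[1] < n_grid:
            nd = d0 + edge_length(s)
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))

d_approx = dist[(n_grid - 1, n_grid - 1)]   # ≈ sqrt(1 + 1/sigma0^2) here
```

For a non-flat metric the same machinery applies, with edge_length evaluated at the local ω; the grid spacing then controls the approximation error.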

  • Step 4. If df[v]{ω(0)} ≠ 0 for some v, then we calculate vF,max to assess the local influence of minor perturbations to the model. However, if df[v]{ω(0)} = 0 for all v, then we compute SIf[v]{ω(0)} and find vS,max = argmaxv SIf[v]{ω(0)}.

In Step 4, we need to compute FIf[v]{ω(0)} and SIf[v]{ω(0)}. For many infinite-dimensional manifolds, such as the additive ε-contamination class, v varies in a set 𝒱, which may be well approximated by a finite number of grid points {vl : l = 1, . . . , K0}. We can approximate argmaxv[FIf[v]{ω(0)}] and argmaxv[SIf[v]{ω(0)}] by argmaxvl[FIf[vl]{ω(0)}] and argmaxvl[SIf[vl]{ω(0)}], respectively.

4. A theoretical example

We consider a dataset z = (z1, . . . , zn)T to illustrate the potential applications of our proposed diagnostics. Assume that z1, . . . , zn are independently and identically distributed as N(θ, 1) and that the baseline prior distribution of θ is N(μ0, σ0²). Letting z̄ = Σi=1n zi/n, we have p(θ | z) ∝ exp[−0.5(n + 1/σ0²){θ − (nz̄ + μ0/σ0²)/(n + 1/σ0²)}²].

We first consider a simple perturbation to the location of the baseline prior, whose perturbed model is given by

$p(z, \theta \mid \omega) = p(z \mid \theta)\,p(\theta \mid \omega) = p(z \mid \theta)\exp\{-0.5(\theta - \omega - \mu_0)^2/\sigma_0^2\}/(2\pi\sigma_0^2)^{0.5}$

for $\omega \in [\omega_L, \omega_U]$, where $\omega_L$ and $\omega_U$ are known scalars. We have $E(\theta \mid z, \omega) = \int \theta\, p(\theta \mid z, \omega)\,d\theta = \{n\bar z + (\omega + \mu_0)/\sigma_0^2\}/(n + 1/\sigma_0^2)$, and we set $f(\omega, \omega_0) = E(\theta \mid z, \omega) - E(\theta \mid z, \omega_0)$. Thus, following Berger (1990), the range of $f(\omega, \omega_0)$ equals $f(\omega_U, \omega_0) - f(\omega_L, \omega_0) = (\omega_U - \omega_L)/(n\sigma_0^2 + 1)$. A large range can be caused by a large $\omega_U - \omega_L$, which is associated with the size of the perturbation to the prior, as shown later.

We compute the intrinsic structure of $p(z, \theta \mid \omega)$ and the intrinsic influence measure. We can calculate the geodesic distance between $p(z, \theta \mid \omega_L)$ and $p(z, \theta \mid \omega_U)$. Since $\omega(t) = t$ and $\dot\ell\{z, \theta \mid \omega(t)\} = (\theta - \mu_0 - t)/\sigma_0^2$, we have $\langle\dot\ell\{z, \theta \mid \omega(t)\}, \dot\ell\{z, \theta \mid \omega(t)\}\rangle\{\omega(t)\} = 1/\sigma_0^2$ and $d(\omega_L, \omega_U) = \int_{\omega_L}^{\omega_U} dt/\sigma_0 = (\omega_U - \omega_L)/\sigma_0$, which is the size of the perturbation to the prior regardless of the data. Both a small $\sigma_0$ and a large $\omega_U - \omega_L$ can introduce large perturbations. When $f(\omega, \omega_0) = E(\theta \mid z, \omega) - E(\theta \mid z, \omega_0)$, we have $\mathrm{IGI}_f(\omega, \omega_0) = \sigma_0^2/(n\sigma_0^2 + 1)^2$, which is independent of $\omega$. This indicates that, relative to the perturbation of the prior, $f(\omega, \omega_0)$ does not change much. A large range can therefore give a false indication of nonrobustness when it is actually caused by large perturbations to the prior (Sivaganesan, 2000).
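Both facts are easy to confirm numerically (a sketch; following the discussion above we take $\mathrm{IGI}_f(\omega, \omega_0)$ to be $f(\omega, \omega_0)^2/d(\omega, \omega_0)^2$, our reading of the intrinsic measure, which reproduces the stated constant).

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu0, sigma0 = 10, 1.0, 1.5
z = rng.normal(size=n)
zbar = z.mean()

def post_mean(omega):
    # E(theta | z, omega) under the location-perturbed N(mu0 + omega, sigma0^2) prior
    return (n * zbar + (omega + mu0) / sigma0**2) / (n + 1 / sigma0**2)

omega_L, omega_U, omega0 = -2.0, 5.0, 0.0
f = lambda w: post_mean(w) - post_mean(omega0)

# range of f over [omega_L, omega_U] equals (omega_U - omega_L)/(n sigma0^2 + 1)
assert np.isclose(f(omega_U) - f(omega_L), (omega_U - omega_L) / (n * sigma0**2 + 1))

# intrinsic measure: f^2 over the squared geodesic distance {(omega - omega0)/sigma0}^2
for w in (0.5, 2.0, 4.0):
    igi = f(w)**2 / ((w - omega0) / sigma0)**2
    assert np.isclose(igi, sigma0**2 / (n * sigma0**2 + 1)**2)
```

The loop makes the key point explicit: the intrinsic measure is constant in $\omega$, while the raw range of $f$ grows with the interval $[\omega_L, \omega_U]$.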

Secondly, we consider a simultaneous perturbation to the prior and the model, given by

$p(z, \theta \mid \omega) \propto \exp\{-0.5\sum_{i=1}^n (z_i - \omega_i - \theta)^2 - 0.5(\theta - \mu_0 - \omega_{n+1})^2/\sigma_0^2\},$ (9)

where $\omega = (\omega_1, \ldots, \omega_{n+1})^T \in R^{n+1}$. In this case, $\omega_0 = 0_{n+1}$ represents no perturbation. Let $\delta_{ij}$ equal 1 for $i = j$ and 0 otherwise. Following Example 6, we can show that for $i, j = 1, \ldots, n$,

$\partial_{\omega_i}\ell(z, \theta \mid \omega) = z_i - \omega_i - \theta, \qquad \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega) = (\theta - \mu_0 - \omega_{n+1})/\sigma_0^2, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_j}\ell(z, \theta \mid \omega)\rangle(\omega) = \delta_{ij}, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 0, \qquad \langle\partial_{\omega_{n+1}}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 1/\sigma_0^2.$ (10)

Thus, when $\sigma_0 \neq 1$, the components $\omega_i$ for $i = 1, \ldots, n$ and $\omega_{n+1}$ introduce different levels of perturbation to the fitted model $p(z, \theta \mid \omega)$. Furthermore, since $\langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_j}\ell(z, \theta \mid \omega)\rangle(\omega)$ for all $i, j$ are independent of $\omega$, the manifold determined by (9) is a flat manifold (Lang, 1995). For any $\omega$ in $R^{n+1}$, the geodesic connecting $p(z, \theta \mid \omega)$ and $p(z, \theta \mid \omega_0)$ is given by $p\{z, \theta \mid \omega_0 + t(\omega - \omega_0)\}$ for $t \in [0, 1]$. By using (3), we can show that $d(\omega, \omega_0)^2 = \sum_{i=1}^n \omega_i^2 + \omega_{n+1}^2/\sigma_0^2$, which quantifies the size of the perturbation scheme (9) to the prior and the fitted model.
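Because the metric in (10) does not depend on $\omega$, the squared geodesic distance can be checked directly (our sketch): along the straight line from $\omega_0$ to $\omega$ the velocity is constant, so the Riemannian length reduces to a single inner product.

```python
import numpy as np

n, sigma0 = 5, 2.0
G = np.diag([1.0] * n + [1.0 / sigma0**2])   # constant metric tensor from (10)
omega = np.array([0.3, -1.2, 0.5, 2.0, -0.7, 1.5])
omega0 = np.zeros(n + 1)

# straight-line path omega(t) = omega0 + t(omega - omega0), t in [0, 1];
# its velocity v is constant, so the length integral is sqrt(v' G v)
v = omega - omega0
length = np.sqrt(v @ G @ v)

assert np.isclose(length**2, np.sum(omega[:n]**2) + omega[n]**2 / sigma0**2)
```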

We calculate the logarithm of the Bayes factor $\mathrm{BF}(\omega, \omega_0)$ as discussed in Example 5. Since the terms in the exponent of (9) form a quadratic function of $\theta$, we can explicitly calculate $\mathrm{BF}(\omega, \omega_0) = P(\omega) - P(\omega_0)$, where $P(\omega) = \log \int p(z, \theta \mid \omega)\,d\theta$ equals

$C - 0.5\big[(\omega_{n+1} + \mu_0)^2/\sigma_0^2 + \sum_{i=1}^n (z_i - \omega_i)^2 - \{(\omega_{n+1} + \mu_0)/\sigma_0^2 + \sum_{i=1}^n (z_i - \omega_i)\}^2/(n + 1/\sigma_0^2)\big],$

and $C$ is a scalar independent of $\omega$. Now recall the results of Example 5. For a smooth curve $\omega(t) \in R^{n+1}$ with $\omega(0) = \omega_0$, $\mathrm{FI}_f[v]\{\omega(0)\}$ is determined by $\partial_\omega \mathrm{BF}(\omega, \omega_0)$ and $v_{F,\max}(\omega) = \{G(\omega_0)\}^{-1/2}\partial_\omega \mathrm{BF}(\omega, \omega_0)$, in which $G(\omega_0) = \mathrm{diag}(1, \ldots, 1, \sigma_0^{-2})$ as calculated in (10). Taking derivatives of $\mathrm{BF}(\omega, \omega_0)$ with respect to $\omega$, we get

$\partial_{\omega_{n+1}}\mathrm{BF}(\omega, \omega_0) = -(\omega_{n+1} + \mu_0)/\sigma_0^2 + \{(\omega_{n+1} + \mu_0)/\sigma_0^2 + \sum_{i=1}^n (z_i - \omega_i)\}/(n\sigma_0^2 + 1), \qquad \partial_{\omega_i}\mathrm{BF}(\omega, \omega_0) = z_i - \omega_i - \{(\omega_{n+1} + \mu_0)/\sigma_0^2 + \sum_{i=1}^n (z_i - \omega_i)\}/(n + 1/\sigma_0^2)$

for i = 1, . . . , n, which yields

$v_{F,\max}(\omega_0) = \Big\{z_1 - \frac{n\bar z + \mu_0/\sigma_0^2}{n + 1/\sigma_0^2}, \;\ldots,\; z_n - \frac{n\bar z + \mu_0/\sigma_0^2}{n + 1/\sigma_0^2}, \;\frac{n(\bar z - \mu_0)\sigma_0}{n\sigma_0^2 + 1}\Big\}^T.$ (11)

By inspecting the first n components of vF,max(ω0), we can identify outlying points zi which are far from the posterior mean of θ, while the last component of vF,max(ω0) can pick up an influential hyperparameter μ0.
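Expression (11) can be checked numerically (our sketch): implement $P(\omega)$ from the closed form above, differentiate $\mathrm{BF}(\omega, \omega_0) = P(\omega) - P(\omega_0)$ by central differences at $\omega_0 = 0$, and rescale by $G(\omega_0)^{-1/2} = \mathrm{diag}(1, \ldots, 1, \sigma_0)$; a planted outlier then yields the largest component.

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu0, sigma0 = 20, 0.0, 2.0
z = rng.normal(size=n)
z[-1] += 8.0                      # plant one outlying observation
zbar = z.mean()

def P(omega):
    # log of the integrated perturbed model, up to the constant C
    w, wn1 = omega[:n], omega[n]
    S = (wn1 + mu0) / sigma0**2 + np.sum(z - w)
    return -0.5 * ((wn1 + mu0)**2 / sigma0**2 + np.sum((z - w)**2)
                   - S**2 / (n + 1 / sigma0**2))

omega0, eps = np.zeros(n + 1), 1e-5
e = np.eye(n + 1)
grad = np.array([(P(omega0 + eps * e[k]) - P(omega0 - eps * e[k])) / (2 * eps)
                 for k in range(n + 1)])
v = grad * np.append(np.ones(n), sigma0)   # G(omega0)^{-1/2} rescaling

post_mean = (n * zbar + mu0 / sigma0**2) / (n + 1 / sigma0**2)
closed = np.append(z - post_mean, n * (zbar - mu0) * sigma0 / (n * sigma0**2 + 1))
assert np.allclose(v, closed, atol=1e-5)
assert np.argmax(np.abs(v[:n])) == n - 1   # the outlier has the largest component
```

Since $P(\omega)$ is exactly quadratic, the central differences agree with the analytic gradient up to rounding error.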

Thirdly, we consider a simultaneous perturbation to the prior and the sampling distribution,

$p(z, \theta \mid \omega) \propto \exp\Big\{-0.5\sum_{i=1}^n \omega_i(z_i - \theta)^2 - 0.5\,\omega_{n+1}(\theta - \mu_0)^2/\sigma_0^2 + 0.5\sum_{i=1}^{n+1}\log(\omega_i)\Big\},$

where $\omega = (\omega_1, \ldots, \omega_{n+1})^T$ has positive components. In this case, $\omega_0 = 1_{n+1}$ represents no perturbation. Following Example 6, we can show that for $i, j = 1, \ldots, n$,

$\partial_{\omega_i}\ell(z, \theta \mid \omega) = -0.5(z_i - \theta)^2 + 0.5\,\omega_i^{-1}, \qquad \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega) = -0.5(\theta - \mu_0)^2/\sigma_0^2 + 0.5\,\omega_{n+1}^{-1}, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_j}\ell(z, \theta \mid \omega)\rangle(\omega) = 0.5\,\omega_i^{-2}\delta_{ij}, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 0, \qquad \langle\partial_{\omega_{n+1}}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 0.5\,\omega_{n+1}^{-2}.$ (12)

Thus, $G(\omega_0)$ is 0.5 times the $(n + 1) \times (n + 1)$ identity matrix.

We consider a sensitivity analysis for predictive distributions (Lavine, 1992; Millar & Stewart, 2007). Let $z_{n+1}$ denote a future observation from $N(\theta, 1)$. The predictive density of $z_{n+1}$ given $z$, denoted by $p(z_{n+1} \mid z, \omega)$, is then $N\{(\sum_{i=1}^n \omega_i z_i + \omega_{n+1}\mu_0/\sigma_0^2)/(\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2),\; 1 + 1/(\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2)\}$. We set $f(\omega, \omega_0) = \int z_{n+1}\,p(z_{n+1} \mid z, \omega)\,dz_{n+1} - \int z_{n+1}\,p(z_{n+1} \mid z, \omega_0)\,dz_{n+1}$. Now recall the results of Example 6 and the metric tensor in (12). For a smooth curve $\omega(t) \in R^{n+1}$ with $\omega(0) = \omega_0$, $\mathrm{FI}_f[v]\{\omega(0)\}$ is determined by $\partial_\omega f(\omega, \omega_0)$ and $v_{F,\max}(\omega)$ is proportional to $\partial_\omega f(\omega, \omega_0)$, which are given by

$\partial_{\omega_{n+1}}f(\omega, \omega_0) = \sigma_0^{-2}\mu_0/(\textstyle\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2) - \sigma_0^{-2}(\omega_{n+1}\mu_0/\sigma_0^2 + \textstyle\sum_{i=1}^n z_i\omega_i)/(\textstyle\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2)^2, \qquad \partial_{\omega_i}f(\omega, \omega_0) = z_i/(\textstyle\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2) - (\omega_{n+1}\mu_0/\sigma_0^2 + \textstyle\sum_{i=1}^n z_i\omega_i)/(\textstyle\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2)^2$

for $i = 1, \ldots, n$. This yields that $v_{F,\max}(\omega_0)$ is proportional to

$\frac{1}{n + 1/\sigma_0^2}\Big(z_1 - \frac{n\bar z + \mu_0/\sigma_0^2}{n + 1/\sigma_0^2}, \;\ldots,\; z_n - \frac{n\bar z + \mu_0/\sigma_0^2}{n + 1/\sigma_0^2}, \;\frac{n(\mu_0 - \bar z)}{n\sigma_0^2 + 1}\Big)^T.$ (13)

We observe that $v_{F,\max}(\omega_0)$ in (13) is closely associated with $v_{F,\max}(\omega_0)$ in (11), and thus it is able to pick up outlying points $z_i$ and the influential hyperparameter $\mu_0$.
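The gradient behind (13) can likewise be verified by central differences of the predictive mean at $\omega_0 = 1_{n+1}$ (our sketch; the closed form in the code is our own factorization of the derivatives above).

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu0, sigma0 = 15, 1.0, 2.0
z = rng.normal(size=n)
zbar = z.mean()

def pred_mean(omega):
    # mean of the predictive density p(z_{n+1} | z, omega)
    w, wn1 = omega[:n], omega[n]
    W = w.sum() + wn1 / sigma0**2
    return (w @ z + wn1 * mu0 / sigma0**2) / W

omega0, eps = np.ones(n + 1), 1e-6
e = np.eye(n + 1)
grad = np.array([(pred_mean(omega0 + eps * e[k]) - pred_mean(omega0 - eps * e[k])) / (2 * eps)
                 for k in range(n + 1)])

m0 = (n * zbar + mu0 / sigma0**2) / (n + 1 / sigma0**2)
closed = np.append(z - m0, n * (mu0 - zbar) / (n * sigma0**2 + 1)) / (n + 1 / sigma0**2)
assert np.allclose(grad, closed, atol=1e-6)
```

The common factor $1/(n + 1/\sigma_0^2)$ only rescales the direction, so the pattern of components — and hence which cases are flagged — is unchanged.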

Finally, we examine a more general setting in which the $z_i$ $(i = 1, \ldots, 50)$ are independent $N(\theta_i, 1)$ variables, with the $\theta_i$ independently generated from a Dirichlet process prior $DP(c_0F_1)$, where the base measure $F_1$ is that of a $N(5, 1)$ distribution and the confidence parameter $c_0$ is set equal to 2 (Escobar, 1994). Furthermore, the $z_i$ were changed to $z_i + 5$ for $i = 49$ and 50, which can be regarded as two outliers. We fit a model with $z_i \sim N(\theta_i, 1)$ and $\theta_i \sim DP(2F_0)$, where $F_0$ is the probability measure of a $N(0, 1)$ distribution. The base measure $F_0$ is misspecified because the mean of the $N(0, 1)$ distribution differs from that of the true base measure $N(5, 1)$. We consider a simultaneous perturbation to the prior and the data. We have

$p(z, \theta \mid \omega) \propto \exp\Big(-0.5\sum_{i=1}^n (z_i - \omega_i - \theta_i)^2 + \sum_{i=1}^n \log\Big[c_0F_0(\theta_i) + c_0\omega_{n+1}\{F_1(\theta_i) - F_0(\theta_i)\} + \sum_{j=1}^{i-1}\delta_{\theta_j}(\theta_i)\Big]\Big).$ (14)

In this case, $\omega_0 = 0_{n+1}$ represents no perturbation. By differentiating $\ell(z, \theta \mid \omega) = \log p(z, \theta \mid \omega)$ in (14) with respect to each component of $\omega$, we have that for $i = 1, \ldots, n$,

$\partial_{\omega_i}\ell(z, \theta \mid \omega) = z_i - \omega_i - \theta_i, \qquad \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega) = \sum_{i=1}^n \frac{c_0\{F_1(\theta_i) - F_0(\theta_i)\}}{c_0F_0(\theta_i) + c_0\omega_{n+1}\{F_1(\theta_i) - F_0(\theta_i)\} + \sum_{j=1}^{i-1}\delta_{\theta_j}(\theta_i)}.$

Since $\int (z_i - \omega_i - \theta_i)\,p(z, \theta \mid \omega)\,dz = 0$ and $\int (z_i - \omega_i - \theta_i)(z_j - \omega_j - \theta_j)\,p(z, \theta \mid \omega)\,dz = \delta_{ij}$, we have

$\langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_j}\ell(z, \theta \mid \omega)\rangle(\omega) = \delta_{ij}, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 0, \qquad \langle\partial_{\omega_{n+1}}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = E[\{\partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\}^2].$

Similar to (11), we set $f(\omega, \omega_0) = \mathrm{BF}(\omega, \omega_0)$ and substitute the results from (7) to calculate $v_{F,\max}(\omega_0)$, using 50 000 Markov chain Monte Carlo samples generated from the posterior distribution $p(\theta_1, \ldots, \theta_n \mid z_1, \ldots, z_{50})$ after a burn-in of 5000 samples. Inspecting the components of $v_{F,\max}(\omega_0)$ reveals the outlying cases 49 and 50 and shows the sensitivity to the misspecified base measure $F_0$ of the Dirichlet process prior for the $\theta_i$ (Fig. 1).

Fig. 1. Simultaneous perturbation model using a Dirichlet process prior and perturbing individual observations: (a) local influence measures $v_{B,\max}(\omega_0)$ for the logarithm of the Bayes factor $f(\omega, \omega_0) = \mathrm{BF}(\omega, \omega_0)$, from which the outlying cases 49 and 50 and the perturbation to the Dirichlet process prior were detected; (b) index plot of the metric tensor $g_{ii}(\omega_0)$ for the perturbation (15).

In addition to this theoretical example, an extensive simulation and a real data analysis involving missing data are given in the Supplementary Material. In practice, we suggest an iterative process for carrying out the four-step influence analysis of § 3.4. If one is concerned about sensitivity to the prior, then one may introduce a finite-dimensional perturbation, as in Example 1, to all hyperparameters of the prior and identify influential hyperparameters according to their local influence measures. Then, for a few influential hyperparameters, one further perturbs their associated prior distributions using the additive $\epsilon$-contamination class and carries out an intrinsic influence analysis. If one is concerned about the sampling distribution, then one may introduce various perturbations, including the additive $\epsilon$-contamination class and the perturbation model (1), to $p(z \mid \theta)$ and use the local influence measures to detect which parts of $p(z \mid \theta)$ are sensitive to minor perturbations. One may then focus on these influential parts and carry out an intrinsic influence analysis. After refining the prior and the sampling distribution, one may perturb individual observations and detect a set of influential observations. After examining the information from each influence analysis, we carry out a simultaneous perturbation to $z$, $p(\theta)$ and $p(z \mid \theta)$. We start with a local influence analysis to examine the sensitivity of all components and then focus on a few influential components using an intrinsic influence analysis.

Acknowledgments

We thank the editor, an associate editor and two referees for many valuable suggestions which have greatly improved this paper.

Appendix

Proof of Proposition 1. Consider any two smooth curves $p\{z, \theta \mid \omega^{(k)}(t)\} = p\{\theta \mid \omega^{(k)}(t)\}\,p(z \mid \theta)$ with $p\{z, \theta \mid \omega^{(k)}(0)\} = p(\theta \mid \omega)\,p(z \mid \theta)$ for $k = 1, 2$. For each $k$, by differentiating $\ell\{z, \theta \mid \omega^{(k)}(t)\}$ with respect to $t$, we obtain a tangent vector $v_k(\omega) = \dot\ell\{z, \theta \mid \omega^{(k)}(0)\} = d\log p\{\theta \mid \omega^{(k)}(t)\}/dt\mid_{t=0} \in T_\omega$, which is independent of $p(z \mid \theta)$. Furthermore, letting $d_t = d/dt$, the inner product of $v_1(\omega)$ and $v_2(\omega)$ is given by $\int [d_t\log p\{\theta \mid \omega^{(1)}(t)\}][d_t\log p\{\theta \mid \omega^{(2)}(t)\}]\,p(z, \theta \mid \omega)\,dz\,d\theta = \int [d_t\log p\{\theta \mid \omega^{(1)}(t)\}][d_t\log p\{\theta \mid \omega^{(2)}(t)\}]\,p(\theta \mid \omega)\,d\theta$, which is also independent of $p(z \mid \theta)$.

Proof of Proposition 2. Consider two smooth curves $p\{z, \theta \mid \omega^{(k)}(t)\}$ with $\omega^{(k)}(t) = \{\omega^{(k),p}(t)^T, \omega^{(k),s}(t)^T\}^T$ such that $\omega^{(1)}(0) = \omega^{(2)}(0) = \omega$ and such that $\omega^{(1),p}(t)$ and $\omega^{(2),s}(t)$ are independent of $t$. Let $\ell(z \mid \theta, \omega^{(1),s}) = \log p(z \mid \theta, \omega^{(1),s})$. Since $\omega^{(1),p}(t)$ is independent of $t$,

$v_1(\omega) = \dot\ell\{z, \theta \mid \omega^{(1)}(0)\} = \frac{d}{dt}\log p\{\theta \mid \omega^{(1),p}(t)\}\Big|_{t=0} + \frac{d}{dt}\log p\{z \mid \theta, \omega^{(1),s}(t)\}\Big|_{t=0} = \dot\ell\{z \mid \theta, \omega^{(1),s}(0)\}.$

Let $\ell(\theta \mid \omega^{(2),p}) = \log p(\theta \mid \omega^{(2),p})$. Similarly, we have

$v_2(\omega) = \dot\ell\{z, \theta \mid \omega^{(2)}(0)\} = \frac{d}{dt}\log p\{\theta \mid \omega^{(2),p}(t)\}\Big|_{t=0} = \dot\ell\{\theta \mid \omega^{(2),p}(0)\}.$

Thus, the inner product of $v_1(\omega)$ and $v_2(\omega)$, denoted by $\langle v_1, v_2\rangle(\omega)$, is given by

$\int \dot\ell\{\theta \mid \omega^{(2),p}(0)\}\,\dot\ell\{z \mid \theta, \omega^{(1),s}(0)\}\,p(z, \theta \mid \omega)\,dz\,d\theta = \int \frac{dp\{\theta \mid \omega^{(2),p}(0)\}}{dt}\,\frac{dp\{z \mid \theta, \omega^{(1),s}(0)\}}{dt}\,dz\,d\theta = \int \frac{dp\{\theta \mid \omega^{(2),p}(0)\}}{dt}\Big[\int \frac{dp\{z \mid \theta, \omega^{(1),s}(0)\}}{dt}\,dz\Big]\,d\theta = \int \frac{dp\{\theta \mid \omega^{(2),p}(0)\}}{dt}\,\frac{d}{dt}\Big[\int p\{z \mid \theta, \omega^{(1),s}(0)\}\,dz\Big]\,d\theta = \int \frac{dp\{\theta \mid \omega^{(2),p}(0)\}}{dt}\,\frac{d\,1}{dt}\,d\theta = 0.$

Proof of Theorem 1. Since Theorem 1(i) follows from Proposition 2, we focus on Theorem 1(ii). Since $\{\omega^{(1),p}(t), \omega^{(1),d}(t)\}$ and $\{\omega^{(2),p}(t), \omega^{(2),s}(t)\}$ are independent of $t$ and $p(z \mid \theta, \omega^d, \omega^s) = p_1(z \mid \theta, \omega^d)\,p_2(z \mid \theta, \omega^s)$, we have

$v_1(\omega) = \dot\ell\{z, \theta \mid \omega^{(1)}(0)\} = \frac{d}{dt}\log p_2\{z \mid \theta, \omega^{(1),s}(t)\}\Big|_{t=0}, \qquad v_2(\omega) = \dot\ell\{z, \theta \mid \omega^{(2)}(0)\} = \frac{d}{dt}\log p_1\{z \mid \theta, \omega^{(2),d}(t)\}\Big|_{t=0}.$

Thus, $\langle v_1, v_2\rangle(\omega)$ is given by

$\int \frac{d\log p_2\{z \mid \theta, \omega^{(1),s}(t)\}}{dt}\Big|_{t=0}\,\frac{d\log p_1\{z \mid \theta, \omega^{(2),d}(t)\}}{dt}\Big|_{t=0}\,p(z, \theta \mid \omega)\,dz\,d\theta = \int \frac{dp_2\{z \mid \theta, \omega^{(1),s}(0)\}}{dt}\,\frac{dp_1\{z \mid \theta, \omega^{(2),d}(0)\}}{dt}\,p(\theta \mid \omega^p)\,dz\,d\theta = \frac{d^2\,1}{dt\,dt} = 0.$

Proof of Theorem 2. Consider a smooth curve $p\{z, \theta \mid \omega(t)\}$. Let $R(s) : [c_1, c_2] \to [-\epsilon, \epsilon]$ be a differentiable map such that $R(c_3) = 0$ and $\dot R(c_3) = dR(s)/ds\mid_{s=c_3} \neq 0$ for some $c_3 \in (c_1, c_2)$. Then $p[z, \theta \mid \omega\{R(s)\}]$ is a differentiable map from $[c_1, c_2]$ into the perturbation manifold. It follows from the chain rule that $d_s f[\omega\{R(s)\}, \omega_0] = d_r f\{\omega(r), \omega_0\}\,\dot R(s)$ and $d_s\,\ell[z, \theta \mid \omega\{R(s)\}] = d_r\,\ell\{z, \theta \mid \omega(r)\}\,\dot R(s)$, where $\dot R(s) = d_sR(s)$, $d_r = d/dr$ and $d_s = d/ds$. Thus, as $\omega(0) = \omega_0$, we have

$df[\dot R(c_3)v][\omega\{R(c_3)\}] = \dot R(c_3)\,df[v](\omega), \qquad \langle\dot R(c_3)v, \dot R(c_3)v\rangle(\omega) = \dot R(c_3)^2\,\langle v, v\rangle(\omega).$

Supplementary material

Supplementary Material available at Biometrika online includes the proof of Proposition 1, a real data analysis on missing data problems and an extensive simulation.

asr009supp.pdf (1MB, pdf)

References

  1. Amari S. Differential-Geometrical Methods in Statistics. 2nd edn. Lecture Notes in Statistics, Vol. 28. Berlin: Springer; 1990.
  2. Berger JO. Robust Bayesian analysis: sensitivity to the prior. J Statist Plan Infer. 1990;25:303–28.
  3. Berger JO. An overview of robust Bayesian analysis. TEST. 1994;3:5–58.
  4. Berger JO, Rios Insua D, Ruggeri F. Bayesian robustness. In: Rios Insua D, Ruggeri F, editors. Robust Bayesian Analysis. Lecture Notes in Statistics, Vol. 152. New York: Springer; 2000. pp. 1–32.
  5. Carlin BP, Polson NG. An expected utility approach to influence diagnostics. J Am Statist Assoc. 1991;86:1013–21.
  6. Chen MH, Shao QM, Ibrahim JG. Monte Carlo Methods in Bayesian Computation. New York: Springer; 2000.
  7. Clarke B. Desiderata for a predictive theory of statistics. Bayesian Anal. 2010;5:283–318.
  8. Clarke B, Gustafson P. On the overall sensitivity of the posterior distribution to its inputs. J Statist Plan Infer. 1998;71:137–50.
  9. Cook RD. Assessment of local influence (with discussion). J R Statist Soc B. 1986;48:133–69.
  10. Copas JB, Eguchi S. Local model uncertainty and incomplete-data bias (with discussion). J R Statist Soc B. 2005;67:459–513.
  11. Dey DK, Birmiwal LR. Robust Bayesian analysis using divergence measures. Statist Prob Lett. 1994;20:287–94.
  12. Dey DK, Ghosh SK, Lou KR. On local sensitivity measures in Bayesian analysis (with discussion). In: Berger JO, Betro B, Moreno E, Pericchi LR, Ruggeri F, Salinetti G, Wasserman L, editors. Bayesian Robustness. IMS Lecture Notes–Monograph Series, Vol. 29. Hayward, CA: Institute of Mathematical Statistics; 1996. pp. 21–40.
  13. Dijkstra EW. A note on two problems in connexion with graphs. Numer Math. 1959;1:269–71.
  14. Ekeland I. The Hopf–Rinow theorem in infinite dimension. J Diff Geom. 1978;13:287–301.
  15. Escobar MD. Estimating normal means with a Dirichlet process prior. J Am Statist Assoc. 1994;89:268–77.
  16. Friedrich T. Die Fisher-Information und symplektische Strukturen. Math Nachr. 1991;153:273–96.
  17. Gustafson P. Local sensitivity of inferences to prior marginals. J Am Statist Assoc. 1996;91:774–81.
  18. Gustafson P. Local robustness in Bayesian analysis. In: Rios Insua D, Ruggeri F, editors. Robust Bayesian Analysis. New York: Springer; 2000. pp. 71–88.
  19. Gustafson P, Wasserman L. Local sensitivity diagnostics for Bayesian inference. Ann Statist. 1995;23:2153–67.
  20. Guttman I, Peña D. A Bayesian look at diagnostics in the univariate linear model. Statist Sinica. 1993;3:367–90.
  21. Kass RE, Raftery AE. Bayes factors. J Am Statist Assoc. 1995;90:773–95.
  22. Kass RE, Tierney L, Kadane JB. Approximate methods for assessing influence and sensitivity in Bayesian analysis. Biometrika. 1989;76:663–74.
  23. Kass RE, Vos PW. Geometrical Foundations of Asymptotic Inference. New York: Wiley; 1997.
  24. Lang S. Differential and Riemannian Manifolds. 3rd edn. New York: Springer; 1995.
  25. Lavine M. Local predictive influence in Bayesian linear models with conjugate priors. Commun Statist B. 1992;2:269–83.
  26. McCulloch RE. Local model influence. J Am Statist Assoc. 1989;84:473–78.
  27. Millar RB, Stewart WS. Assessment of locally influential observations in Bayesian models. Bayesian Anal. 2007;2:365–84.
  28. Oakley JE, O'Hagan A. Probabilistic sensitivity analysis of complex models: a Bayesian approach. J R Statist Soc B. 2004;66:751–69.
  29. Peña D, Guttman I. Comparing probabilistic methods for outlier detection in linear models. Biometrika. 1993;80:603–10.
  30. Peng F, Dey DK. Bayesian analysis of outlier problems using divergence measures. Can J Statist. 1995;23:199–213.
  31. Ruggeri F, Sivaganesan S. On a global sensitivity measure for Bayesian inference. Sankhya. 2000;62:110–27.
  32. Sivaganesan S. Global and local robustness approaches: uses and limitations. In: Rios Insua D, Ruggeri F, editors. Robust Bayesian Analysis. Lecture Notes in Statistics, Vol. 152. New York: Springer; 2000. pp. 89–108.
  33. Van der Linde A. Local influence on posterior distributions under multiplicative modes of perturbation. Bayesian Anal. 2007;2:319–32.
  34. Wang Q, Stefanski LA, Genton MG, Boos DD. Robust time series analysis via measurement error modeling. Statist Sinica. 2009;19:1263–80.
  35. Wu X, Luo Z. Second-order approach to local influence. J R Statist Soc B. 1993;55:929–36.
  36. Zhu HT, Ibrahim JG, Lee SY, Zhang HP. Perturbation selection and influence measures in local influence analysis. Ann Statist. 2007;35:2565–88.
  37. Zhu HT, Lee SY. Local influence for incomplete data models. J R Statist Soc B. 2001;63:111–26.

