Biometrika. 2011 Jun;98(2):307–323. doi: 10.1093/biomet/asr009

Bayesian influence analysis: a geometric approach

HONGTU ZHU 1, JOSEPH G IBRAHIM 1, NIANSHENG TANG 2
PMCID: PMC3897258  NIHMSID: NIHMS265394  PMID: 24453379

Summary

In this paper we develop a general framework of Bayesian influence analysis for assessing various perturbation schemes to the data, the prior and the sampling distribution for a class of statistical models. We introduce a perturbation model to characterize these various perturbation schemes. We develop a geometric framework, called the Bayesian perturbation manifold, and use its associated geometric quantities including the metric tensor and geodesic to characterize the intrinsic structure of the perturbation model. We develop intrinsic influence measures and local influence measures based on the Bayesian perturbation manifold to quantify the effect of various perturbations to statistical models. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of this local influence method in a formal Bayesian analysis.

Keywords: Influence measure, Perturbation manifold, Perturbation model, Prior distribution

1. Introduction

A formal Bayesian analysis of data z = (z1, . . . , zn) involves the specification of a sampling distribution p(z | θ) and a prior distribution p(θ), where θ = (θ1, . . . ,θk)T represents the parameters of inferential interest and varies in an open set Θ of Rk. To carry out Bayesian inference, we usually use Markov chain Monte Carlo methods to simulate samples from the posterior distribution p(θ | z), which is proportional to p(z | θ) p(θ). Subsequently, we can calculate posterior quantities of θ in Rk, such as the posterior mean M(h) = ∫ h(θ) p(θ | z) dθ of a function h(θ). For notational simplicity, we do not emphasize the dominating measure explicitly throughout the paper. There is a great deal of interest in the degree to which posterior inferences are sensitive to p(θ), p(z | θ) and (z1, . . . , zn) (Kass et al., 1989; McCulloch, 1989; Berger, 1990, 1994; Dey et al., 1996; Gustafson, 2000; Sivaganesan, 2000; Oakley & O’Hagan, 2004).

There are three major formal influence techniques, including case influence measures and global and local robustness approaches, for quantifying the degree of dependence of the posterior distribution on these three key elements of Bayesian analysis including the prior, the sampling distribution and the data (Berger, 1990, 1994). In Bayesian analysis, case influence measures primarily calculate the influence of a set of observations in order to identify outliers and influential observations. Most case influence measures are based on the posterior and/or predictive distribution through either case deletion or perturbation (Guttman & Peña, 1993; Peña & Guttman, 1993; Carlin & Polson, 1991; Peng & Dey, 1995). For instance, several case influence diagnostics have been developed to quantify the possible outlyingness of a set of observations based on mean-shift or variance-shift models (Guttman & Peña, 1993; Peña & Guttman, 1993).

The key idea of the global robustness approach is to compute a range of posterior quantities as the perturbation to each of the three key elements varies over a certain set of distributions, and then determine the extremal ones. This approach has drawbacks, including sensitivity to the scale chosen for the posterior quantities and to the size of the perturbation, as well as its restriction to linear functionals and to relatively simple models. To address the scale issue, several scaled versions of the range have been proposed for the prior perturbation class (Ruggeri & Sivaganesan, 2000).

The local robustness approach primarily computes the derivatives of posterior quantities with respect to a minor perturbation to p(θ) and p(z | θ). In the frequentist literature, Cook’s (1986) influence approach is particularly useful for perturbing p(z | θ) in order to detect influential observations and assess model misspecification in parametric and semiparametric models (Zhu & Lee, 2001; Zhu et al., 2007). McCulloch (1989) further extends the local influence approach of Cook (1986) to assess the effects of perturbing the prior in a Bayesian analysis. In the Bayesian literature, several analogues of local influence have been developed using either the curvature of influence measures (Lavine, 1992; Dey & Birmiwal, 1994; Millar & Stewart, 2007; Van der Linde, 2007) or the Fréchet derivative of the posterior with respect to the prior (Berger, 1994; Gustafson & Wasserman, 1995; Dey et al., 1996; Gustafson, 1996; Berger et al., 2000). Very little has been done on developing general Bayesian influence analysis methods for simultaneously perturbing z, p(θ) and p(z | θ), assessing their effects and examining their applications in statistical models (Berger et al., 2000). To our knowledge, Clarke & Gustafson (1998) is one of the few papers on simultaneously perturbing {z, p(θ), p(z | θ)} in the context of independent and identically distributed data.

A key motivation for the proposed methodology is to unify influence concepts for many complex Bayesian models, for which very few or no methods exist, so that the effects of different perturbations can be identified. These models include many Bayesian parametric and semiparametric models, perhaps with missing data; see the Supplementary Material. Our development includes formal assessment of outliers and influential points as well as sensitivity analyses regarding the three major components of the Bayesian model: the prior, the sampling distribution and the data. For instance, sensitivity to the data can be evaluated by perturbing all the data points by random noise, redoing the analysis, and obtaining a spectrum of different inferences indexed by the noise (Wang et al., 2009; Clarke, 2010).

2. The Bayesian perturbation model and manifold

2.1. The Bayesian perturbation model

We develop a Bayesian model to characterize various perturbation schemes to z, p(z | θ) and p(θ). We introduce perturbations into the model p(z, θ) = p(z | θ) p(θ) through a vector ω = ω(z, θ), which varies in a set Ω. That is, ω is a mapping from the product space of the sample space 𝒵 and the parameter space Θ to Ω. Generally, ω accommodates many perturbation schemes, including the additive ε-contamination class for the prior detailed below. Moreover, ω must be chosen carefully so that the perturbation is meaningful and sensible.

Let p(z, θ | ω) be the probability density of (z, θ) for the perturbed model. We assume that the probability measures of p(z, θ | ω) for all ω ∈ Ω have a common dominating measure and that there is an ω0 ∈ Ω such that p(z, θ | ω0) = p(z, θ) for all (z, θ). We refer to p(z, θ | ω0) = p(z, θ) as the baseline joint distribution, where ω0 can be regarded as the central point of Ω representing no perturbation. We define the Bayesian perturbation model as a family of probability densities p(z, θ | ω) as ω varies in Ω. The Bayesian perturbation model includes individual perturbation schemes to z, p(θ) and p(z | θ), and their combinations. We focus on each individual scheme as follows.

Example 1. The Bayesian perturbation model for the prior includes many existing schemes, such as the additive ε-contamination class and the linear and nonlinear perturbation classes. For instance, the additive ε-contamination scheme is given by p(θ | ω) = p(θ) + λ{g(θ) − p(θ)}, where λ ∈ [0, 1] and g(θ) belongs to a class of contaminating distributions, denoted by 𝒢 (Berger, 1994; Dey & Birmiwal, 1994). In this case, Ω = {ω = λ{g(θ) − p(θ)} : (λ, g(·)) ∈ [0, 1] × 𝒢} and ω(z, θ) is independent of the data. Thus, ω0 = 0 and p(z, θ | ω) = p(z | θ) p(θ | ω).

Example 2. The Bayesian perturbation model for the data includes many perturbation schemes to individual observations of z (Cook, 1986; Guttman & Peña, 1993; Peña & Guttman, 1993; Zhu et al., 2007). Perturbation schemes for data points are proposed for identifying outliers and influential observations. As an illustration, we consider the standard linear regression model yi = xiTβ + εi, where xi is a p × 1 covariate vector, β is a p × 1 vector of regression coefficients and the εi are independently and identically distributed N(0, σ²) random variables. Let cl denote an l × 1 vector with all elements equal to c for a fixed scalar c and an integer l; for example, 1n, 1p and 0m. A perturbation scheme for the covariate xi is given by xi(ωi) = xi + ωi1p. In this case, zi = (yi, xiT)T, θ = (βT, σ²)T, ω = (ω1, …, ωn)T, ω0 = 0n and Ω is a subset of Rn. An alternative perturbation scheme for the linear regression model is the well-known mean-shift model (Guttman & Peña, 1993; Peña & Guttman, 1993). It is assumed that yi = xiTβ + ωi + εi for i in a set of k distinct integers chosen from {1, . . . , n}, denoted by I = {i1, . . . , ik}, and yi = xiTβ + εi for all other i. In this case, the perturbation scheme is ω = (ωi1, . . . , ωik)T and ω0 = 0k. Another important scheme is a geometric mixture model for case deletion or case weights (Millar & Stewart, 2007; Van der Linde, 2007). Specifically, let q(zi) be an arbitrary density of zi independent of θ; then the geometric mixture model for perturbing the ith observation is given by p(z | θ, ω) = {Πj≠i p(zj | θ)} p(zi | θ)λ q(zi)1−λ/{∫ p(zi | θ)λ q(zi)1−λ dzi}, where ω = λ varies in [0, 1] and p(zi | θ) is the density of zi under the linear model assumption. In this case, ω0 = 1 represents no perturbation. When λ = 0, p(zi | θ) disappears from p(z | θ, 0), which is equivalent to deleting zi.
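As a concrete numerical illustration of the mean-shift scheme above, the following minimal sketch (the simulated data, seed and all function names are ours, not part of the paper) shifts a subset of responses by ω and measures the resulting change in the least-squares estimate of β:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 50
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])  # intercept + one covariate
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n_obs)

def ols(X, y):
    # least-squares fit of beta
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mean_shift(y, I, omega):
    # mean-shift perturbation: y_i = x_i' beta + omega + eps_i for i in I
    y_pert = y.copy()
    y_pert[I] += omega
    return y_pert

beta_base = ols(X, y)
beta_pert = ols(X, mean_shift(y, [0, 1], 5.0))   # shift two cases by omega = 5
shift = float(np.linalg.norm(beta_pert - beta_base))
```

Shifting only the cases in I while leaving the rest of z untouched is exactly what makes this scheme useful for flagging those cases as potential outliers.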

Example 3. The Bayesian perturbation model for the sampling distribution includes many perturbation schemes to p(z | θ), such as the additive ε-contamination class. We may also consider a class of perturbed sampling distributions p(z | θ, ω) defined by

p(z | θ, ω) = p(z | θ) exp{Σj=1m ωj uj(z; θ) − 0.5 Σj=1m ωj² uj(z; θ)² − C(θ, ω)}, (1)

where C(θ, ω) is the normalizing constant, ω = (ω1, . . . , ωm)T is an m × 1 vector and uj (z; θ) is a fixed scalar function having zero mean under p(z | θ). In this case, ω0 = 0m represents no perturbation. The number m in the perturbation (1) can either be as small as 1 or can increase with n (Copas & Eguchi, 2005; Zhu et al., 2007).
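The tilting in (1) is easy to realize numerically. The sketch below (our own toy setup: a single observation with p(z | θ) = N(θ, 1), m = 1 and u1(z; θ) = z − θ, which has zero mean under p; helper names are ours) computes the normalizing constant C(θ, ω) by quadrature and checks that the perturbed density integrates to one and that ω = 0 recovers the baseline:

```python
import numpy as np

def integrate(fvals, x):
    # simple trapezoid rule, kept explicit to avoid version-specific numpy helpers
    return float(np.sum(0.5 * (fvals[1:] + fvals[:-1]) * np.diff(x)))

def normal_pdf(z, theta):
    return np.exp(-0.5 * (z - theta) ** 2) / np.sqrt(2.0 * np.pi)

def u(z, theta):
    # fixed function with zero mean under p(z | theta)
    return z - theta

def perturbed_density(z, theta, omega, grid):
    tilt_grid = np.exp(omega * u(grid, theta) - 0.5 * omega ** 2 * u(grid, theta) ** 2)
    # C(theta, omega) makes the perturbed density integrate to one
    C = np.log(integrate(normal_pdf(grid, theta) * tilt_grid, grid))
    tilt_z = np.exp(omega * u(z, theta) - 0.5 * omega ** 2 * u(z, theta) ** 2)
    return normal_pdf(z, theta) * tilt_z * np.exp(-C)

grid = np.linspace(-10.0, 10.0, 4001)
dens = perturbed_density(grid, theta=0.0, omega=0.3, grid=grid)
total = integrate(dens, grid)                          # ≈ 1 by construction
baseline = perturbed_density(grid, 0.0, 0.0, grid)     # omega_0 = 0 recovers p(z | theta)
```
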

2.2. The Bayesian perturbation manifold

We develop a new geometric framework, called a Bayesian perturbation manifold, to measure each perturbation ω in the Bayesian perturbation model. Based on this manifold, we are able to measure the amount of perturbation, the extent to which each component of a perturbation model contributes to p(z, θ) and the degree of orthogonality for the components of the perturbation model. Such a quantification is useful for rigorously assessing the relative influence of each component in the Bayesian analysis, and can reveal any discrepancies among the data, the prior or the sampling model.

For an infinite-dimensional set Ω, we assume throughout the paper that ℳ forms a Riemannian Hilbert manifold (Friedrich, 1991; Lang, 1995) under some regularity conditions. For a given p(z, θ | ω) ∈ ℳ, we consider a smooth curve C(t) = p{z, θ | ω(t)} through the space of perturbation models with open interval domain containing 0 and p{z, θ | ω(0)} = p(z, θ | ω). Note that ω may be different from ω0. We require C(t) to be smooth enough that ℓ̇{z, θ | ω(t)} = d log p{z, θ | ω(t)}/dt, called the tangent or derivative vector, exists with ∫ ℓ̇{z, θ | ω(t)}² p{z, θ | ω(t)} dz dθ < ∞ for all t in the open interval domain. Since p{z, θ | ω(t)} is the joint density of (z, θ) given ω(t), that is, ∫ p{z, θ | ω(t)} dz dθ = 1, the tangent space of ℳ at ω, denoted by Tωℳ, is formed by the tangent vectors ℓ̇{z, θ | ω(0)} of all possible smooth curves C(t), which satisfy ∫ ℓ̇{z, θ | ω(0)} p{z, θ | ω(0)} dz dθ = 0. We can introduce the inner product of any two tangent vectors v1(ω) and v2(ω) in Tωℳ as

< v1, v2 > (ω) = ∫ v1(ω) v2(ω) p(z, θ | ω) dz dθ. (2)

When ω varies in a Euclidean space and is independent of z and θ, the inner product < v1, v2 > (ω) in (2) is closely associated with the Fisher information; see Example 6 for details. Thus, the squared length ||v(ω)||² of a tangent vector v(ω) ∈ Tωℳ is < v, v > (ω) = ∫ v(ω)² p(z, θ | ω) dz dθ. The length of the curve C(t) from t1 to t2 is

SC{ω(t1), ω(t2)} = ∫t1t2 [< ℓ̇{z, θ | ω(t)}, ℓ̇{z, θ | ω(t)} > {ω(t)}]1/2 dt. (3)
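The curve length (3) can be approximated by estimating the tangent-norm ⟨ℓ̇, ℓ̇⟩{ω(t)} by Monte Carlo on a grid of t values and then integrating. The sketch below uses a toy setup of our own (not from the paper): a N(0, 1) prior on θ and the perturbed sampling model N(θ + t, 1), for which ℓ̇{z, θ | ω(t)} = z − θ − t and the exact length from t = 0 to t = 1 is 1:

```python
import numpy as np

rng = np.random.default_rng(1)

def tangent_norm_sq(t, n_mc=100_000):
    # Monte Carlo estimate of <l_dot, l_dot>{omega(t)} under p{z, theta | omega(t)}
    theta = rng.normal(size=n_mc)               # prior draws, N(0, 1)
    z = rng.normal(loc=theta + t, size=n_mc)    # perturbed sampling model N(theta + t, 1)
    ldot = z - theta - t                        # tangent vector along the curve
    return float(np.mean(ldot ** 2))

# curve length (3) from t1 = 0 to t2 = 1 via the trapezoid rule
ts = np.linspace(0.0, 1.0, 21)
norms = [np.sqrt(tangent_norm_sq(t)) for t in ts]
length = sum(0.5 * (norms[i] + norms[i + 1]) * (ts[i + 1] - ts[i])
             for i in range(len(ts) - 1))
```
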

Next, we need to introduce the concept of a geodesic on ℳ, which is a direct extension of the straight line in Euclidean space. Consider a real function f(ω) defined on ℳ and a smooth curve p{z, θ | ω(t)} in ℳ with p{z, θ | ω(0)} = p(z, θ | ω) and ℓ̇{z, θ | ω(0)} = v(ω). We define df[v](ω) = limt→0 t−1(f[p{z, θ | ω(t)}] − f[p{z, θ | ω(0)}]) as the directional derivative of f at the perturbation distribution p(z, θ | ω) in the direction of v(ω) ∈ Tωℳ. We consider two smooth vector fields u(ω) and v(ω), which are not only tangent vectors in Tωℳ but also smooth functions of ω in Ω. We define the directional derivative of the vector field u(ω) in the direction of v(ω), called the connection, by du[v](ω) = limt→0 t−1[u{ω(t)} − u{ω(0)}]. Intuitively, if ω varies in a Euclidean space, then du[v](ω) is closely associated with the second derivative of ℓ(z, θ | ω) with respect to ω. We consider the Levi–Civita connection, which has several nice geometric properties (Amari, 1990; Lang, 1995) and is given by

▿vu(ω) = du[v](ω) + 0.5{u(ω)v(ω) − ∫ u(ω)v(ω) p(z, θ | ω) dz dθ}.

A geodesic with respect to the Levi–Civita connection on ℳ is a smooth curve γ(t) = p{z, θ | ω(t)} on ℳ with open interval domain (a, b) and ℓ̇{z, θ | ω(t)} = v{ω(t)} such that the Levi–Civita connection satisfies ▿vv{ω(t)} = 0. Intuitively speaking, as one moves the tangent vectors of a geodesic along the geodesic itself, they keep pointing in the same direction. Moreover, geodesics can be interpreted as locally shortest paths between points on ℳ. For a fixed perturbation distribution p(z, θ | ω) and a given direction v(ω) ∈ Tωℳ, there is a unique geodesic γ(t) = p{z, θ | ω(t)} with open interval domain covering 0 such that γ(0) = p(z, θ | ω) and γ̇(0) = v(ω). Finally, based on these geometric quantities of ℳ, we introduce the definition of a Bayesian perturbation manifold.

Definition 1. A Bayesian perturbation manifold (ℳ, < u, v >, ▿vu) is the manifold ℳ with an inner product < u, v > and the Levi–Civita connection ▿vu.

When Ω is an open set of Rm, under some regularity conditions, the Bayesian perturbation manifold is an m-dimensional manifold (Amari, 1990, p. 16; Kass & Vos, 1997; Zhu et al., 2007). Now, we examine some examples of Bayesian perturbation manifolds based on several perturbations to the data, the prior, and the sampling distribution.

Example 1, continued. We consider the Bayesian perturbation model for the ε-contamination class to the prior given by ℳ = {{(1 − λ) p(θ) + λg(θ)} p(z | θ) : λ ∈ [0, 1], g(·) ∈ 𝒢}. In this case, ω(t) = t{g(θ) − p(θ)} for a given g(·) ∈ 𝒢, and therefore we consider the smooth curve Cg(t) = p{z, θ | ω(t)} = [p(θ) + t{g(θ) − p(θ)}] p(z | θ). It can be shown that vg{ω(t)} = ℓ̇{z, θ | ω(t)} = {g(θ) − p(θ)}/[p(θ) + t{g(θ) − p(θ)}]. For any two densities g1(·) and g2(·) in 𝒢, we can calculate the tangent vectors vgi{ω(0)} = {gi(θ) − p(θ)}{p(θ)}−1 for i = 1, 2 and their inner product as

< vg1, vg2 > (ω0) = ∫ [g1(θ){p(θ)}−1 − 1][g2(θ){p(θ)}−1 − 1] p(θ) dθ,

which is also independent of p(z | θ). In particular, < vg, vg > (ω0) = ∫ {g(θ)/p(θ) − 1}² p(θ) dθ reduces to the L2 norm considered in Gustafson (1996).
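This L2 norm is straightforward to evaluate numerically. A small sketch under assumptions of our own choosing (baseline prior N(0, 1), contaminant g = N(0.7, 1); for two such normal priors the integral has the closed form exp(m²) − 1 with m = 0.7, a fact we derived for this check, not stated in the paper):

```python
import numpy as np
from math import exp

def integrate(fvals, x):
    # trapezoid rule on a fixed grid
    return float(np.sum(0.5 * (fvals[1:] + fvals[:-1]) * np.diff(x)))

def npdf(x, m):
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)

theta = np.linspace(-12.0, 12.0, 16001)
p = npdf(theta, 0.0)     # baseline prior N(0, 1)
g = npdf(theta, 0.7)     # contaminating prior N(0.7, 1)

# <v_g, v_g>(omega_0) = ∫ {g(theta)/p(theta) - 1}^2 p(theta) dtheta
norm_sq = integrate((g / p - 1.0) ** 2 * p, theta)
closed_form = exp(0.7 ** 2) - 1.0
```

Note that the norm is independent of the sampling distribution, in line with Proposition 1 below.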

We further consider a Bayesian perturbation model for the sole perturbation scheme to hyperparameters of the prior given by ℳ = {p(z, θ | ω) = p(θ | ω) p(z | θ) : ω = (ω1, . . . , ωm)T}, in which ω is independent of both z and θ. Let ω(t) = (ω1, . . . , ωj−1, ωj + t, ωj+1, . . . , ωm)T, ℓ(θ | ω) = log p(θ | ω) and ωk(t) be the kth component of ω(t). Since ℓ(z, θ | ω) = log p(θ | ω) + log p(z | θ), we have

ℓ̇{z, θ | ω(0)} = dℓ{z, θ | ω(t)}/dt |t=0 = Σk=1m [ω̇k(t) ∂ωk ℓ{θ | ω(t)}] |t=0 = ∂ωj ℓ(θ | ω),

where ω̇k(t) = dωk(t)/dt and ∂ωj = ∂/∂ωj. Therefore, Tωℳ is spanned by the m functions ∂ωj ℓ(θ | ω) pointwise in ω. Since ∫ p(z | θ) dz = 1, the inner product between ∂ωj ℓ(θ | ω) and ∂ωk ℓ(θ | ω), denoted by Gjk(ω), is given by

Gjk(ω) = ∫ ∂ωj ℓ(θ | ω) ∂ωk ℓ(θ | ω) p(θ | ω) p(z | θ) dz dθ = ∫ ∂ωj ℓ(θ | ω) ∂ωk ℓ(θ | ω) p(θ | ω) dθ, (4)

which is independent of p(z | θ).

Furthermore, suppose that p(θ) = p(θ1) p(θ2 | θ[1]) · · · p(θm | θ[m−1]) has a hierarchical structure, where θ[j] = (θ1, . . . , θj) and p(θj | θ[j−1]) denotes the density of the conditional distribution of θj given θ[j−1]. Then, we perturb each level of p(θ) such that p(θ | ω) = p(θ1 | ω1) p(θ2 | θ[1], ω2) · · · p(θm | θ[m−1], ωm), ∫ p(θ1 | ω1) dθ1 = 1 and ∫ p(θj | θ[j−1], ωj) dθj = 1 for j = 2, . . . , m. In this case, Tωℳ is spanned by the m functions ∂ω1 log p(θ1 | ω1) and ∂ωj log p(θj | θ[j−1], ωj) for j = 2, . . . , m. Moreover, Gjk(ω) = 0 for all j ≠ k. For instance, it can be shown that G12(ω) = ∫ ∂ω1 log p(θ1 | ω1) ∂ω2 log p(θ2 | θ[1], ω2) p(θ | ω) dθ = ∂ω1 ∂ω2 ∫∫ p(θ1 | ω1) p(θ2 | θ1, ω2) dθ2 dθ1 = ∂ω1 ∂ω2 1 = 0. Thus, different components of ω are orthogonal to each other (Zhu et al., 2007). Furthermore, it follows from (4) that G11(ω) = ∫ {∂ω1 log p(θ1 | ω1)}² p(θ1 | ω1) dθ1 and Gjj(ω) = ∫ {∂ωj log p(θj | θ[j−1], ωj)}² p(θj | θ[j−1], ωj) dθj for j ⩾ 2.
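The orthogonality of the levels of a hierarchical prior can be checked by Monte Carlo. A minimal sketch, assuming a two-level normal hierarchy of our own choosing, θ1 ~ N(ω1, 1) and θ2 | θ1 ~ N(θ1 + ω2, 1), for which the scores with respect to ω1 and ω2 are available in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
w1, w2 = 0.5, -0.3
n = 500_000
theta1 = rng.normal(loc=w1, size=n)      # p(theta1 | w1) = N(w1, 1)
theta2 = rng.normal(loc=theta1 + w2)     # p(theta2 | theta1, w2) = N(theta1 + w2, 1)

s1 = theta1 - w1             # d log p(theta1 | w1) / d w1
s2 = theta2 - theta1 - w2    # d log p(theta2 | theta1, w2) / d w2

G11 = float(np.mean(s1 * s1))   # ≈ 1, the (1, 1) entry of the metric (4)
G12 = float(np.mean(s1 * s2))   # ≈ 0: the two levels of the hierarchy are orthogonal
```
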

Combining the above results, we are led to the following proposition, whose proof can be found in the Supplementary Material.

Proposition 1. Consider any Bayesian perturbation model to the prior given by ℳ= {p(θ | ω) p(z | θ) : ω ∈ Ω}. If ω is independent of z, then the metric tensor of its Bayesian perturbation manifold ℳ is independent of the specification of the sampling distribution p(z | θ).

Proposition 1 has important implications. The independence property ensures that existing results on local robustness to the prior can be considered as a special case of the new method developed here (McCulloch, 1989; Gustafson, 1996).

Example 4. Consider a Bayesian perturbation model given by

ℳ = {p(z, θ | ω) = p(θ | ωp) p(z | θ, ωs) : ω = (ωpT, ωsT)T, ∫ p(θ | ωp) dθ = ∫ p(z | θ, ωs) dz = 1},

in which ωp = (ω1, . . . , ωm)T and ωs = (ωm+1, . . . , ωm+n)T are assumed to be independent of both z and θ. We consider ω(t) = (ω1, . . . , ωj−1, ωj + t, ωj+1, . . . , ωm+n)T with ω(0) = ω for each j ∈ {1, . . . , m + n}. Thus, ω̇k(0) = dωk(t)/dt |t=0 = 1 for k = j and 0 otherwise. Letting ℓ(θ | ωp) = log p(θ | ωp) and ℓ(z | θ, ωs) = log p(z | θ, ωs), we have

ℓ̇{z, θ | ω(0)} = Σk=1m+n ω̇k(0) ∂ωk log p(z, θ | ω) = ∂ωj ℓ(θ | ωp) + ∂ωj ℓ(z | θ, ωs). (5)

Since ωs and ωp have no components in common, Tωℳ is spanned by the m + n functions ∂ωj ℓ(θ | ωp) for j = 1, . . . , m and ∂ωj ℓ(z | θ, ωs) for j = m + 1, . . . , m + n. Note that ∫ ∂ωk ℓ(θ | ωp) ∂ωj ℓ(z | θ, ωs) p(z, θ | ω) dz dθ = ∫ ∂ωk p(θ | ωp) ∂ωj p(z | θ, ωs) dz dθ = ∫ ∂ωk p(θ | ωp) {∂ωj ∫ p(z | θ, ωs) dz} dθ = 0 holds for any j, k. Therefore, it follows from (5) that the inner product of ∂ωj ℓ(z, θ | ω) and ∂ωk ℓ(z, θ | ω), denoted by Gjk(ω), is

∫ ∂ωj ℓ(θ | ωp) ∂ωk ℓ(θ | ωp) p(z, θ | ω) dz dθ + ∫ ∂ωj ℓ(z | θ, ωs) ∂ωk ℓ(z | θ, ωs) p(z, θ | ω) dz dθ. (6)

Moreover, the first term of (6) simplifies to ∫ ∂ωj ℓ(θ | ωp) ∂ωk ℓ(θ | ωp) p(θ | ωp) dθ since ∫ p(z | θ, ωs) dz = 1. For j = 1, . . . , m and k = m + 1, . . . , m + n, it follows from (6) that < ∂ωj ℓ(z, θ | ω), ∂ωk ℓ(z, θ | ω) > = 0, since ∂ωk ℓ(θ | ωp) = 0 and ∂ωj ℓ(z | θ, ωs) = 0. Thus, ωs and ωp are orthogonal to each other with respect to the inner product (2).

Combining the above results, we obtain the following proposition.

Proposition 2. Consider ℳ = {p(z, θ | ω) = p(θ | ωp) p(z | θ, ωs) : ω = (ωpT, ωsT)T}. Assume that ωp is independent of z and that ∫ p(θ | ωp) dθ = ∫ p(z | θ, ωs) dz = 1. Consider two smooth curves p{z, θ | ω(k)(t)} with ω(k)(t) = {ω(k),p(t), ω(k),s(t)}T such that ω(1)(0) = ω(2)(0) = ω, and such that ω(1),p(t) and ω(2),s(t) are independent of t. For any two tangent vectors vk(ω) = ℓ̇{z, θ | ω(k)(0)} ∈ Tωℳ for k = 1, 2, we have < v1, v2 > (ω) = 0.

Proposition 2 has important implications. For simultaneous perturbations to the prior and the sampling distribution, it ensures that ωp and ωs are geometrically orthogonal to each other. Thus, we can separate out the influence of the prior from that of the data and the sampling distribution.

Finally, we consider a simultaneous perturbation model, denoted by p(z, θ | ωp, ωd, ωs), in which ωp, ωd and ωs represent individual perturbations to the prior, the data and the sampling distribution, respectively. In addition to Propositions 1 and 2, we can obtain the following theorem.

Theorem 1. Let ℳ = {p(z, θ | ω) = p(θ | ωp) p(z | θ, ωd, ωs) : ω = (ωp, ωd, ωs)} with ∫ p(θ | ωp) dθ = ∫ p(z | θ, ωd, ωs) dz = 1, and assume that ωp is independent of z. Consider two smooth curves p{z, θ | ω(k)(t)} with ω(k)(t) = {ω(k),p(t), ω(k),d(t), ω(k),s(t)}T passing through ω(1)(0) = ω(2)(0) = ω and having two tangent vectors vk(ω) = ℓ̇{z, θ | ω(k)(0)} ∈ Tωℳ, k = 1, 2. Then:

  (i) if ω(1),p(t) and {ω(2),d(t), ω(2),s(t)} are independent of t, then < v1, v2 > (ω) = 0;

  (ii) if {ω(1),p(t), ω(1),d(t)} and {ω(2),p(t), ω(2),s(t)} are independent of t and p(z | θ, ωd, ωs) = p1(z | θ, ωd) p2(z | θ, ωs) for any (ωd, ωs), then < v1, v2 > (ω) = 0.

For simultaneous perturbations to the prior, the data, and the sampling distribution, Theorem 1 (i) ensures that ωp and (ωd, ωs) are geometrically orthogonal to each other. If p(z | θ, ωd, ωs) = p1(z | θ, ωd) p2(z | θ, ωs), then ωp, ωd, and ωs are geometrically orthogonal to each other.

3. Influence measures and their properties

3.1. Intrinsic influence measures

We consider some objective functions, such as the ϕ-divergence, the posterior mean and the Bayes factor, and develop associated intrinsic influence measures for quantifying the effects of perturbing the three key elements of a Bayesian analysis. An objective function of interest for sensitivity analysis is often chosen to be a functional of the perturbed posterior distribution of θ given z, given by p(θ | z, ω) = p(z, θ | ω)/∫ p(z, θ | ω) dθ, and of the unperturbed posterior distribution p(θ | z, ω0). Such an objective function, denoted by f(ω, ω0) = f{p(θ | z, ω), p(θ | z, ω0)}, can also be regarded as a mapping from ℳ × ℳ to R. Throughout the paper, we assume that f(ω, ω0) is a smooth function of ω and is a path-independent function of p(θ | z, ω) and p(θ | z, ω0) such that f(ω, ω) = 0 for any ω ∈ Ω. For instance, f(ω, ω0) can be set as the total variation distance between p(θ | z, ω0) and p(θ | z, ω) (Dey et al., 1996). Most standard influence measures, such as the range (Berger, 1990, 1994), can be regarded as special cases of f(ω, ω0).

A large value of these influence measures can be caused both by the size of the perturbation ω to the baseline distribution, regardless of the observed data, and by discrepancies between the observed data and the fitted model p(z, θ). Since the purpose of any influence analysis is to detect the discrepancies between the observed data and p(z, θ), we suggest rescaling f(ω, ω0) by the shortest distance between p(z, θ | ω) and p(z, θ | ω0). We explicitly quantify this distance by the minimal geodesic distance, denoted by d(ω, ω0). If ℳ is a complete and finite-dimensional Riemannian manifold, then the Hopf–Rinow theorem states that any two points on ℳ can be joined by a minimal geodesic (Ekeland, 1978). Furthermore, if ℳ is a complete infinite-dimensional Riemannian manifold, any two points on ℳ can be joined by a path which is almost a minimal geodesic (Ekeland, 1978). We introduce an intrinsic influence measure for comparing ω and ω0 ∈ Ω as follows. Geometrically, an intrinsic measure is invariant to certain reparameterizations.

Definition 2. The intrinsic influence measure for comparing p(θ | z, ω) to p(θ | z, ω0) is defined as IGIf (ω, ω0) = f(ω, ω0)2/d(ω, ω0)2.

The proposed IGIf (ω, ω0) can be interpreted as the ratio of the change of the objective function relative to the minimal distance between p(z, θ | ω) and p(z, θ | ω0) on . Since f(ω, ω0) is path-independent and d(ω, ω0) is invariant to smooth reparametrization of ω, IGIf (ω, ω0) is also invariant. Moreover, we suggest identifying the most influential ω in Ω, denoted by ω̂I, which maximizes IGIf (ω, ω0) for all ω ∈ Ω.

Example 5. We consider the logarithm BF(ω, ω0) = log ∫ p(z, θ | ω) dθ − log ∫ p(z, θ | ω0) dθ of the Bayes factor for comparing p(z | θ, ω) and p(z | θ, ω0), which can be regarded as a statistic for testing hypotheses of ω against ω0 (Kass & Raftery, 1995). Under mild conditions, BF(ω, ω0) is a smooth mapping from ℳ to R. We can set f(ω, ω0) = BF(ω, ω0) and obtain the intrinsic influence measure

IGIBF(ω, ω0) = BF(ω, ω0)²/d(ω, ω0)².

3.2. First-order local influence measures

We consider the local behaviour of f{ω(t), ω0} as t approaches 0 along all possible smooth curves p{z, θ | ω(t)} passing through ω0, that is, ω(0) = ω0. Since f{ω(t), ω0} is a function from R to R, it follows by a Taylor series expansion that f{ω(t), ω0} = f{ω(0), ω0} + ḟ{ω(0)}t + 0.5 f̈{ω(0)}t² + o(t²), where ḟ{ω(0)} and f̈{ω(0)} denote the first- and second-order derivatives of f{ω(t), ω0} with respect to t evaluated at t = 0. We need to distinguish between ḟ{ω(0)} ≠ 0 for some smooth curves ω(t) and ḟ{ω(0)} = 0 for all smooth curves ω(t). We first consider the case ḟ{ω(0)} ≠ 0 for some smooth curves ω(t). Let ℓ̇{z, θ | ω(0)} = v ∈ Tω(0)ℳ. Then, ḟ{ω(0)} = df[v]{ω(0)} is the directional derivative of f in the direction of v ∈ Tω(0)ℳ (Lang, 1995). We are led to the following definition.

Definition 3. The first-order local influence measure is defined as FIf[v]{ω(0)} = limt→0 IGIf {ω(0), ω(t)} = [df [v]{ω(0)}]2/[< v, v > {ω(0)}].

To carry out a sensitivity analysis, we use the tangent vector vF,max in Tω(0)ℳ, which maximizes FIf[v]{ω(0)} and is invariant to reparameterization of ω(t). We now have the following result.

Theorem 2. The quantity FIf [v]{ω(0)} is invariant to smooth reparameterization of ω(t).

In addition to the invariance property in Theorem 2, FIf [v]{ω(0)} is a direct generalization of the first-order measure for a finite-dimensional perturbation manifold (Zhu et al., 2007; Wu & Luo, 1993).

Example 5 (continued). We set f{ω(t), ω0} = BF{ω(t), ω0}. Since d[BF{ω(t), ω0}]/dt |t=0 = ∫ ℓ̇{z, θ | ω(0)} p{z, θ | ω(0)} dθ / ∫ p{z, θ | ω(0)} dθ = ∫ ℓ̇{z, θ | ω(0)} p{θ | z, ω(0)} dθ, we have

FIf[v]{ω(0)} = [∫ ℓ̇{z, θ | ω(0)} p{θ | z, ω(0)} dθ]² / ∫ ℓ̇{z, θ | ω(0)}² p{z, θ | ω(0)} dz dθ.

It is relatively easy to compute FIf [v]{ω(0)} for a specific perturbation. For instance, for the contamination to the prior given by p{θ | ω(t)} = p(θ) + t{g(θ) − p(θ)}, it can be shown that

FIf[v]{ω(0)} = (∫ [g(θ){p(θ)}−1 − 1] p{θ | z, ω(0)} dθ)² / ∫ [g(θ){p(θ)}−1 − 1]² p(θ) dθ = [pg(z){p(z)}−1 − 1]² / ∫ [g(θ){p(θ)}−1 − 1]² p(θ) dθ,

where p(z) = ∫ p(z | θ) p(θ) dθ and pg(z) = ∫ g(θ) p{z | θ, ω(0)} dθ. Since the ratio of pg(z) to p(z) is the Bayes factor in favour of g(θ) versus p(θ), FIf[v]{ω(0)} is the square of the normalized Bayes factor of g(θ) versus p(θ).
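The equality of the two expressions above can be confirmed numerically. A hedged quadrature sketch, under a conjugate setup of our own choosing (baseline prior N(0, 1), contaminant g = N(1, 1), a single observation z = 2 from a N(θ, 1) likelihood; for these normals the Bayes factor pg(z)/p(z) equals exp(0.75) and the denominator equals e − 1, facts we derived for this check):

```python
import numpy as np

def integrate(fvals, x):
    # trapezoid rule on a fixed grid
    return float(np.sum(0.5 * (fvals[1:] + fvals[:-1]) * np.diff(x)))

def npdf(x, m, s=1.0):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

theta = np.linspace(-12.0, 12.0, 16001)
p_prior = npdf(theta, 0.0)     # baseline prior
g = npdf(theta, 1.0)           # contaminating prior
z_obs = 2.0
lik = npdf(z_obs, theta)       # N(theta, 1) likelihood at the observed point

p_z = integrate(lik * p_prior, theta)    # p(z), marginal under the prior
pg_z = integrate(lik * g, theta)         # p_g(z), marginal under g
post = lik * p_prior / p_z               # posterior p(theta | z)

num_a = integrate((g / p_prior - 1.0) * post, theta) ** 2   # posterior-integral form
num_b = (pg_z / p_z - 1.0) ** 2                             # Bayes-factor form
denom = integrate((g / p_prior - 1.0) ** 2 * p_prior, theta)
FI = num_b / denom
```
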

Example 6. Consider the Bayesian perturbation manifold ℳ = {p(z, θ | ω) : ω ∈ Ω ⊂ Rm} and p{z, θ | ω(t)} as a smooth curve on ℳ, in which ω is not a function of z and θ, such as the perturbation scheme in the mean-shift model, and ω(t) = {ω1(t), . . . , ωm(t)}T is a smooth vector function of t. Let vh = (vh,1, . . . , vh,m)T = dω(t)/dt |t=0. By using the chain rule, we have

v{ω(0)} = dℓ{z, θ | ω(t)}/dt |t=0 = Σk=1m ω̇k(0) ∂ωk ℓ{z, θ | ω(0)} = Σk=1m vh,k ∂ωk ℓ{z, θ | ω(0)},
df[v]{ω(0)} = df{ω(t), ω0}/dt |t=0 = Σk=1m vh,k ∂ωk f{ω(0)} = vhT ∂ω f{ω(0)},
< v, v > {ω(0)} = Σj,k=1m vh,j vh,k < ∂ωj ℓ{z, θ | ω(0)}, ∂ωk ℓ{z, θ | ω(0)} > {ω(0)} = vhT G{ω(0)} vh, (7)

where ∂ωk f(ω) denotes the first-order partial derivative of f(ω, ω0) with respect to ωk and G{ω(0)} = ∫ [∂ω ℓ{z, θ | ω(0)}]⊗2 p{z, θ | ω(0)} dz dθ is the m × m Fisher information matrix with respect to ω. Thus, it follows from (7) and the definition of FIf[v]{ω(0)} that FIf[v]{ω(0)} = [df[v]{ω(0)}]²/[< v, v > {ω(0)}] = [vhT ∂ω f{ω(0)}]²/vhT G{ω(0)} vh. Finally, we obtain vF,max{ω(0)} = argmaxv FIf[v]{ω(0)} = [G{ω(0)}]−1/2 ∂ω f{ω(0)}.

3.3. Second-order local influence measures

We use f̈{ω(0)} to assess the second-order local influence of ω on a statistical model (Zhu et al., 2007). However, for a general smooth curve ω(t) on ℳ, f̈{ω(0)} is not geometrically well behaved (Lang, 1995; Zhu et al., 2007). We consider only the geodesic p{z, θ | ω(t)}, denoted by Expω(0)(tv), passing through Expω(0)(tv) |t=0 = p{z, θ | ω(0)} with initial direction ℓ̇{z, θ | ω(0)} = v{ω(0)} ∈ Tω(0)ℳ. It follows from a Taylor series expansion (Lang, 1995; Zhu et al., 2007) that

f{Expω(0)(tv), ω0} = f{ω(0), ω0} + t df[v]{ω(0)} + 0.5t² f̈{Expω(0)(tv)} |t=0 + o(t²), (8)

where f̈{Expω(0)(tv)} = d²f{Expω(0)(tv), ω0}/dt². Geometrically, f̈{Expω(0)(tv)} |t=0 in (8) is called the Riemannian Hessian and is denoted by Hess(f)(v, v){ω(0)} (Lang, 1995). The Riemannian Hessian is symmetric. We now introduce a second-order influence measure.

Definition 4. The second-order influence measure in the direction v ∈ Tω(0)ℳ is defined as SIf[v]{ω(0)} = Hess(f)(v, v){ω(0)}/[< v, v > {ω(0)}].

Geometrically, SIf[v]{ω(0)} is invariant to scalar transformations and smooth reparameterizations. To carry out a sensitivity analysis, we use the tangent vector vS,max ∈ Tω(0)ℳ, which maximizes SIf[v]{ω(0)} over all v ∈ Tω(0)ℳ. There is a direct connection between the second-order measures in finite- and infinite-dimensional spaces. Therefore, the diagnostic method proposed here can be regarded as an extension of existing local influence approaches (Cook, 1986; Zhu et al., 2007) to an infinite-dimensional setting.

Example 6, continued. We consider the Bayesian perturbation model in Example 6. If df[v]{ω(0)} = 0 for all v ∈ Tω(0)ℳ, then Hess(f)(v, v){ω(0)} reduces to vhT Hf{ω(0)} vh, where Hf{ω(0)} = ∂ω² f{ω(0)} denotes the matrix of second-order partial derivatives of f(ω, ω0) with respect to ω (Zhu et al., 2007). In this case, SIf[v]{ω(0)} = vhT Hf{ω(0)} vh / vhT G{ω(0)} vh and vS,max equals the eigenvector of [G{ω(0)}]−1/2 Hf{ω(0)} [G{ω(0)}]−1/2 corresponding to its largest eigenvalue. Let ej be an m × 1 vector with jth element 1 and 0 otherwise. We also suggest an index plot of SIf[ej]{ω(0)} to examine influential cases (Zhu et al., 2007, p. 2572).
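The eigen-computation above is a few lines of linear algebra. A minimal sketch with hypothetical numbers (the matrices G and Hf below are illustrative placeholders, not taken from any example in the paper); it finds the top eigenvector of G^{-1/2} Hf G^{-1/2} and checks that the corresponding direction attains the largest value of SIf[v]{ω(0)}:

```python
import numpy as np

# hypothetical metric G{omega(0)} and Hessian Hf{omega(0)} for m = 3
G = np.diag([1.0, 2.0, 4.0])
Hf = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 0.5],
               [0.0, 0.5, 1.0]])

G_inv_half = np.diag(1.0 / np.sqrt(np.diag(G)))   # G^{-1/2} for a diagonal metric
M = G_inv_half @ Hf @ G_inv_half
eigvals, eigvecs = np.linalg.eigh(M)              # eigenvalues in ascending order
v_smax = eigvecs[:, -1]                           # eigenvector for the largest eigenvalue

def SI(vh):
    # SI_f[v]{omega(0)} = vh' Hf vh / vh' G vh
    return float(vh @ Hf @ vh) / float(vh @ G @ vh)

vh_max = G_inv_half @ v_smax   # the same direction expressed in the omega coordinates
```

By construction, SI evaluated at vh_max equals the largest eigenvalue of M, and any other direction (e.g. a coordinate vector ej for an index plot) gives a smaller or equal value.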

3.4. Bayesian influence analysis

We now summarize the four key steps in carrying out our proposed influence analysis.

  • Step 1. Construct a Bayesian perturbation model p(z, θ | ω).

  • Step 2. Given the Bayesian perturbation model, we calculate the geometric quantities, such as <v, v> {ω(0)}, of the perturbation manifold.

  • Step 3. Choose an objective function f(ω, ω0) and calculate IGIf(ω, ω0) and ω̂I = argmaxω∈Ω IGIf(ω, ω0).

In Step 3, we need to compute f(ω, ω0) and d(ω, ω0). Since f(ω, ω0) is a function of p(θ | z, ω) and p(θ | z, ω0), we use Markov chain Monte Carlo methods to draw random samples from p(θ | z, ω) and p(θ | z, ω0) and then evaluate f(ω, ω0) (Chen et al., 2000). We use Dijkstra's algorithm (Dijkstra, 1959) to approximate the geodesic distance between p(z, θ | ω) and p(z, θ | ω0). The main idea is to discretize the model {p(z, θ | ω) : ω ∈ Ω} into a simpler space {p(z, θ | ω) : ω ∈ ΩD}, where ΩD contains a set of refined grid points of Ω, and then to approximate d(ω, ω0) by the shortest path on the resulting graph. Based on the grid ΩD, we then calculate {IGIf(ω, ω0) : ω ∈ ΩD} and approximate ω̂I by argmaxω∈ΩD IGIf(ω, ω0).
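The Dijkstra approximation of d(ω, ω0) can be sketched as follows; the grid, the flat metric G = diag(1, 1/σ0²) and all names are our own illustrative assumptions, chosen so the exact distance from the origin to (1, 1) is known to be (1 + 1/σ0²)^{1/2}:

```python
import heapq
import numpy as np

# discretize Omega = [0, 1]^2 into a grid Omega_D and approximate the geodesic
# distance by a shortest path, assuming the flat metric G = diag(1, 1/sigma0^2)
sigma0 = 0.5
G = np.diag([1.0, 1.0 / sigma0 ** 2])
n_grid = 21
h = 1.0 / (n_grid - 1)

def edge_length(step):
    # local Riemannian length of one grid step
    d = np.array(step, dtype=float) * h
    return float(np.sqrt(d @ G @ d))

steps = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
dist = {(0, 0): 0.0}
heap = [(0.0, (0, 0))]
while heap:
    d0, node = heapq.heappop(heap)
    if d0 > dist.get(node, float("inf")):
        continue                      # stale heap entry
    for s in steps:
        nb = (node[0] + s[0], node[1] + s[1])
        if 0 <= nb[0] < n_grid and 0 <= nb[1] < n_grid:
            nd = d0 + edge_length(s)
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))

d_approx = dist[(n_grid - 1, n_grid - 1)]   # ≈ sqrt(1 + 1/sigma0^2) here
```

For a non-flat metric the same machinery applies, with edge_length evaluated at the local ω; the grid spacing then controls the approximation error.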

  • Step 4. If df[v]{ω(0)} ≠ 0 for some v, then we calculate vF,max to assess the local influence of minor perturbations to the model. However, if df[v]{ω(0)} = 0 for all v, then we compute SIf[v]{ω(0)} and find vS,max = argmaxv SIf[v]{ω(0)}.

In Step 4, we need to compute FIf[v]{ω(0)} and SIf[v]{ω(0)}. For many infinite-dimensional manifolds, such as the additive ε-contamination class, v varies in a set 𝒱, which may be well approximated by a finite number of grid points {vl : l = 1, . . . , K0}. We can approximate argmaxv[FIf[v]{ω(0)}] and argmaxv[SIf[v]{ω(0)}] by argmaxvl[FIf[vl]{ω(0)}] and argmaxvl[SIf[vl]{ω(0)}], respectively.

4. A theoretical example

We consider a dataset z = (z1, . . . , zn)T to illustrate the potential applications of our proposed diagnostics. Assume that z1, . . . , zn are independently and identically distributed as N(θ, 1) and that the baseline prior distribution of θ is N(μ0, σ0²). Letting z̄ = Σi=1n zi/n, we have p(θ | z) ∝ exp[−0.5(n + 1/σ0²){θ − (nz̄ + μ0/σ0²)/(n + 1/σ0²)}²].

We first consider a simple perturbation to the location of the baseline prior, whose perturbed model is given by

$p(z, \theta \mid \omega) = p(z \mid \theta)\,p(\theta \mid \omega) = p(z \mid \theta)\exp\{-0.5(\theta - \omega - \mu_0)^2/\sigma_0^2\}/(2\pi\sigma_0^2)^{0.5}$

for $\omega \in [\omega_L, \omega_U]$, where $\omega_L$ and $\omega_U$ are known scalars. We have $E(\theta \mid z, \omega) = \int \theta\, p(\theta \mid z, \omega)\,d\theta = \{n\bar z + (\omega + \mu_0)/\sigma_0^2\}/(n + 1/\sigma_0^2)$, and we set $f(\omega, \omega_0) = E(\theta \mid z, \omega) - E(\theta \mid z, \omega_0)$. Thus, following Berger (1990), the range of $f(\omega, \omega_0)$ equals $f(\omega_U, \omega_0) - f(\omega_L, \omega_0) = (\omega_U - \omega_L)/(n\sigma_0^2 + 1)$. A large range can be caused by a large $\omega_U - \omega_L$, which is associated with the size of the perturbation to the prior, as shown later.

We compute the intrinsic structure of $p(z, \theta \mid \omega)$ and the intrinsic influence measure. We can calculate the geodesic distance between $p(z, \theta \mid \omega_L)$ and $p(z, \theta \mid \omega_U)$. Since $\omega(t) = t$ and $\dot\ell\{z, \theta \mid \omega(t)\} = (\theta - \mu_0 - t)/\sigma_0^2$, we have $\langle\dot\ell\{z, \theta \mid \omega(t)\}, \dot\ell\{z, \theta \mid \omega(t)\}\rangle\{\omega(t)\} = 1/\sigma_0^2$ and $d(\omega_L, \omega_U) = \int_{\omega_L}^{\omega_U} dt/\sigma_0 = (\omega_U - \omega_L)/\sigma_0$, which is the size of the perturbation to the prior regardless of the data. Both a small $\sigma_0$ and a large $\omega_U - \omega_L$ can introduce large perturbations. When $f(\omega, \omega_0) = E(\theta \mid z, \omega) - E(\theta \mid z, \omega_0)$, we have $\mathrm{IGI}_f(\omega, \omega_0) = \sigma_0^2/(n\sigma_0^2 + 1)^2$, which is independent of $\omega$. This indicates that, relative to the perturbation of the prior, $f(\omega, \omega_0)$ does not change much. A large range can therefore give a false indication of nonrobustness when it is actually caused by large perturbations to the prior (Sivaganesan, 2000).
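Both facts are easy to confirm numerically (a sketch; following the discussion above we take $\mathrm{IGI}_f(\omega, \omega_0)$ to be $f(\omega, \omega_0)^2/d(\omega, \omega_0)^2$, our reading of the intrinsic measure, which reproduces the stated constant).

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu0, sigma0 = 10, 1.0, 1.5
z = rng.normal(size=n)
zbar = z.mean()

def post_mean(omega):
    # E(theta | z, omega) under the location-perturbed N(mu0 + omega, sigma0^2) prior
    return (n * zbar + (omega + mu0) / sigma0**2) / (n + 1 / sigma0**2)

omega_L, omega_U, omega0 = -2.0, 5.0, 0.0
f = lambda w: post_mean(w) - post_mean(omega0)

# range of f over [omega_L, omega_U] equals (omega_U - omega_L)/(n sigma0^2 + 1)
assert np.isclose(f(omega_U) - f(omega_L), (omega_U - omega_L) / (n * sigma0**2 + 1))

# intrinsic measure: f^2 over the squared geodesic distance {(omega - omega0)/sigma0}^2
for w in (0.5, 2.0, 4.0):
    igi = f(w)**2 / ((w - omega0) / sigma0)**2
    assert np.isclose(igi, sigma0**2 / (n * sigma0**2 + 1)**2)
```

The loop makes the key point explicit: the intrinsic measure is constant in $\omega$, while the raw range of $f$ grows with the interval $[\omega_L, \omega_U]$.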

Secondly, we consider a simultaneous perturbation to the prior and the model, given by

$p(z, \theta \mid \omega) \propto \exp\{-0.5\sum_{i=1}^n (z_i - \omega_i - \theta)^2 - 0.5(\theta - \mu_0 - \omega_{n+1})^2/\sigma_0^2\},$ (9)

where $\omega = (\omega_1, \ldots, \omega_{n+1})^T \in R^{n+1}$. In this case, $\omega_0 = 0_{n+1}$ represents no perturbation. Let $\delta_{ij}$ equal 1 for $i = j$ and 0 otherwise. Following Example 6, we can show that for $i, j = 1, \ldots, n$,

$\partial_{\omega_i}\ell(z, \theta \mid \omega) = z_i - \omega_i - \theta, \qquad \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega) = (\theta - \mu_0 - \omega_{n+1})/\sigma_0^2, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_j}\ell(z, \theta \mid \omega)\rangle(\omega) = \delta_{ij}, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 0, \qquad \langle\partial_{\omega_{n+1}}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 1/\sigma_0^2.$ (10)

Thus, when $\sigma_0 \neq 1$, the components $\omega_i$ for $i = 1, \ldots, n$ and $\omega_{n+1}$ introduce different levels of perturbation to the fitted model $p(z, \theta \mid \omega)$. Furthermore, since $\langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_j}\ell(z, \theta \mid \omega)\rangle(\omega)$ for all $i, j$ are independent of $\omega$, the manifold determined by (9) is a flat manifold (Lang, 1995). For any $\omega$ in $R^{n+1}$, the geodesic connecting $p(z, \theta \mid \omega)$ and $p(z, \theta \mid \omega_0)$ is given by $p\{z, \theta \mid \omega_0 + t(\omega - \omega_0)\}$ for $t \in [0, 1]$. By using (3), we can show that $d(\omega, \omega_0)^2 = \sum_{i=1}^n \omega_i^2 + \omega_{n+1}^2/\sigma_0^2$, which quantifies the size of the perturbation scheme (9) to the prior and the fitted model.
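Because the metric in (10) does not depend on $\omega$, the squared geodesic distance can be checked directly (our sketch): along the straight line from $\omega_0$ to $\omega$ the velocity is constant, so the Riemannian length reduces to a single inner product.

```python
import numpy as np

n, sigma0 = 5, 2.0
G = np.diag([1.0] * n + [1.0 / sigma0**2])   # constant metric tensor from (10)
omega = np.array([0.3, -1.2, 0.5, 2.0, -0.7, 1.5])
omega0 = np.zeros(n + 1)

# straight-line path omega(t) = omega0 + t(omega - omega0), t in [0, 1];
# its velocity v is constant, so the length integral is sqrt(v' G v)
v = omega - omega0
length = np.sqrt(v @ G @ v)

assert np.isclose(length**2, np.sum(omega[:n]**2) + omega[n]**2 / sigma0**2)
```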

We calculate the logarithm of the Bayes factor $\mathrm{BF}(\omega, \omega_0)$ as discussed in Example 5. Since the terms in the exponent of (9) form a quadratic function of $\theta$, we can explicitly calculate $\mathrm{BF}(\omega, \omega_0) = P(\omega) - P(\omega_0)$, where $P(\omega) = \log \int p(z, \theta \mid \omega)\,d\theta$ equals

$C - 0.5\big[(\omega_{n+1} + \mu_0)^2/\sigma_0^2 + \sum_{i=1}^n (z_i - \omega_i)^2 - \{(\omega_{n+1} + \mu_0)/\sigma_0^2 + \sum_{i=1}^n (z_i - \omega_i)\}^2/(n + 1/\sigma_0^2)\big],$

and $C$ is a scalar independent of $\omega$. Now recall the results of Example 5. For a smooth curve $\omega(t) \in R^{n+1}$ with $\omega(0) = \omega_0$, $\mathrm{FI}_f[v]\{\omega(0)\}$ is determined by $\partial_\omega \mathrm{BF}(\omega, \omega_0)$ and $v_{F,\max}(\omega) = \{G(\omega_0)\}^{-1/2}\partial_\omega \mathrm{BF}(\omega, \omega_0)$, in which $G(\omega_0) = \mathrm{diag}(1, \ldots, 1, \sigma_0^{-2})$ as calculated in (10). Taking derivatives of $\mathrm{BF}(\omega, \omega_0)$ with respect to $\omega$, we get

$\partial_{\omega_{n+1}}\mathrm{BF}(\omega, \omega_0) = -(\omega_{n+1} + \mu_0)/\sigma_0^2 + \{(\omega_{n+1} + \mu_0)/\sigma_0^2 + \sum_{i=1}^n (z_i - \omega_i)\}/(n\sigma_0^2 + 1), \qquad \partial_{\omega_i}\mathrm{BF}(\omega, \omega_0) = z_i - \omega_i - \{(\omega_{n+1} + \mu_0)/\sigma_0^2 + \sum_{i=1}^n (z_i - \omega_i)\}/(n + 1/\sigma_0^2)$

for i = 1, . . . , n, which yields

$v_{F,\max}(\omega_0) = \Big\{z_1 - \frac{n\bar z + \mu_0/\sigma_0^2}{n + 1/\sigma_0^2}, \;\ldots,\; z_n - \frac{n\bar z + \mu_0/\sigma_0^2}{n + 1/\sigma_0^2}, \;\frac{n(\bar z - \mu_0)\sigma_0}{n\sigma_0^2 + 1}\Big\}^T.$ (11)

By inspecting the first n components of vF,max(ω0), we can identify outlying points zi which are far from the posterior mean of θ, while the last component of vF,max(ω0) can pick up an influential hyperparameter μ0.
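Expression (11) can be checked numerically (our sketch): implement $P(\omega)$ from the closed form above, differentiate $\mathrm{BF}(\omega, \omega_0) = P(\omega) - P(\omega_0)$ by central differences at $\omega_0 = 0$, and rescale by $G(\omega_0)^{-1/2} = \mathrm{diag}(1, \ldots, 1, \sigma_0)$; a planted outlier then yields the largest component.

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu0, sigma0 = 20, 0.0, 2.0
z = rng.normal(size=n)
z[-1] += 8.0                      # plant one outlying observation
zbar = z.mean()

def P(omega):
    # log of the integrated perturbed model, up to the constant C
    w, wn1 = omega[:n], omega[n]
    S = (wn1 + mu0) / sigma0**2 + np.sum(z - w)
    return -0.5 * ((wn1 + mu0)**2 / sigma0**2 + np.sum((z - w)**2)
                   - S**2 / (n + 1 / sigma0**2))

omega0, eps = np.zeros(n + 1), 1e-5
e = np.eye(n + 1)
grad = np.array([(P(omega0 + eps * e[k]) - P(omega0 - eps * e[k])) / (2 * eps)
                 for k in range(n + 1)])
v = grad * np.append(np.ones(n), sigma0)   # G(omega0)^{-1/2} rescaling

post_mean = (n * zbar + mu0 / sigma0**2) / (n + 1 / sigma0**2)
closed = np.append(z - post_mean, n * (zbar - mu0) * sigma0 / (n * sigma0**2 + 1))
assert np.allclose(v, closed, atol=1e-5)
assert np.argmax(np.abs(v[:n])) == n - 1   # the outlier has the largest component
```

Since $P(\omega)$ is exactly quadratic, the central differences agree with the analytic gradient up to rounding error.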

Thirdly, we consider a simultaneous perturbation to the prior and the sampling distribution,

$p(z, \theta \mid \omega) \propto \exp\Big\{-0.5\sum_{i=1}^n \omega_i(z_i - \theta)^2 - 0.5\,\omega_{n+1}(\theta - \mu_0)^2/\sigma_0^2 + 0.5\sum_{i=1}^{n+1}\log(\omega_i)\Big\},$

where $\omega = (\omega_1, \ldots, \omega_{n+1})^T$ has positive components. In this case, $\omega_0 = 1_{n+1}$ represents no perturbation. Following Example 6, we can show that for $i, j = 1, \ldots, n$,

$\partial_{\omega_i}\ell(z, \theta \mid \omega) = -0.5(z_i - \theta)^2 + 0.5\,\omega_i^{-1}, \qquad \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega) = -0.5(\theta - \mu_0)^2/\sigma_0^2 + 0.5\,\omega_{n+1}^{-1}, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_j}\ell(z, \theta \mid \omega)\rangle(\omega) = 0.5\,\omega_i^{-2}\delta_{ij}, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 0, \qquad \langle\partial_{\omega_{n+1}}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 0.5\,\omega_{n+1}^{-2}.$ (12)

Thus, $G(\omega_0)$ is 0.5 times the $(n + 1) \times (n + 1)$ identity matrix.

We consider a sensitivity analysis for predictive distributions (Lavine, 1992; Millar & Stewart, 2007). Let $z_{n+1}$ denote a future observation from $N(\theta, 1)$. The predictive density of $z_{n+1}$ given $z$, denoted by $p(z_{n+1} \mid z, \omega)$, is then $N\{(\sum_{i=1}^n \omega_i z_i + \omega_{n+1}\mu_0/\sigma_0^2)/(\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2),\; 1 + 1/(\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2)\}$. We set $f(\omega, \omega_0) = \int z_{n+1}\,p(z_{n+1} \mid z, \omega)\,dz_{n+1} - \int z_{n+1}\,p(z_{n+1} \mid z, \omega_0)\,dz_{n+1}$. Now recall the results of Example 6 and the metric tensor in (12). For a smooth curve $\omega(t) \in R^{n+1}$ with $\omega(0) = \omega_0$, $\mathrm{FI}_f[v]\{\omega(0)\}$ is determined by $\partial_\omega f(\omega, \omega_0)$ and $v_{F,\max}(\omega)$ is proportional to $\partial_\omega f(\omega, \omega_0)$, which are given by

$\partial_{\omega_{n+1}}f(\omega, \omega_0) = \sigma_0^{-2}\mu_0/(\textstyle\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2) - \sigma_0^{-2}(\omega_{n+1}\mu_0/\sigma_0^2 + \textstyle\sum_{i=1}^n z_i\omega_i)/(\textstyle\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2)^2, \qquad \partial_{\omega_i}f(\omega, \omega_0) = z_i/(\textstyle\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2) - (\omega_{n+1}\mu_0/\sigma_0^2 + \textstyle\sum_{i=1}^n z_i\omega_i)/(\textstyle\sum_{i=1}^n \omega_i + \omega_{n+1}/\sigma_0^2)^2$

for $i = 1, \ldots, n$. This yields that $v_{F,\max}(\omega_0)$ is proportional to

$\frac{1}{n + 1/\sigma_0^2}\Big(z_1 - \frac{n\bar z + \mu_0/\sigma_0^2}{n + 1/\sigma_0^2}, \;\ldots,\; z_n - \frac{n\bar z + \mu_0/\sigma_0^2}{n + 1/\sigma_0^2}, \;\frac{n(\mu_0 - \bar z)}{n\sigma_0^2 + 1}\Big)^T.$ (13)

We observe that $v_{F,\max}(\omega_0)$ in (13) is closely associated with $v_{F,\max}(\omega_0)$ in (11), and thus it is able to pick up outlying points $z_i$ and the influential hyperparameter $\mu_0$.
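The gradient behind (13) can likewise be verified by central differences of the predictive mean at $\omega_0 = 1_{n+1}$ (our sketch; the closed form in the code is our own factorization of the derivatives above).

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu0, sigma0 = 15, 1.0, 2.0
z = rng.normal(size=n)
zbar = z.mean()

def pred_mean(omega):
    # mean of the predictive density p(z_{n+1} | z, omega)
    w, wn1 = omega[:n], omega[n]
    W = w.sum() + wn1 / sigma0**2
    return (w @ z + wn1 * mu0 / sigma0**2) / W

omega0, eps = np.ones(n + 1), 1e-6
e = np.eye(n + 1)
grad = np.array([(pred_mean(omega0 + eps * e[k]) - pred_mean(omega0 - eps * e[k])) / (2 * eps)
                 for k in range(n + 1)])

m0 = (n * zbar + mu0 / sigma0**2) / (n + 1 / sigma0**2)
closed = np.append(z - m0, n * (mu0 - zbar) / (n * sigma0**2 + 1)) / (n + 1 / sigma0**2)
assert np.allclose(grad, closed, atol=1e-6)
```

The common factor $1/(n + 1/\sigma_0^2)$ only rescales the direction, so the pattern of components — and hence which cases are flagged — is unchanged.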

Finally, we examine a more general setting in which the $z_i$ $(i = 1, \ldots, 50)$ are independent $N(\theta_i, 1)$ variables, with the $\theta_i$ independently generated from a Dirichlet process prior $DP(c_0F_1)$, where the base measure $F_1$ is that of a $N(5, 1)$ distribution and the confidence parameter $c_0$ is set equal to 2 (Escobar, 1994). Furthermore, the $z_i$ were changed to $z_i + 5$ for $i = 49$ and 50, which can be regarded as two outliers. We fit a model with $z_i \sim N(\theta_i, 1)$ and $\theta_i \sim DP(2F_0)$, where $F_0$ is the probability measure of a $N(0, 1)$ distribution. The base measure $F_0$ is misspecified because the mean of the $N(0, 1)$ distribution differs from that of the true base measure $N(5, 1)$. We consider a simultaneous perturbation to the prior and the data. We have

$p(z, \theta \mid \omega) \propto \exp\Big(-0.5\sum_{i=1}^n (z_i - \omega_i - \theta_i)^2 + \sum_{i=1}^n \log\Big[c_0F_0(\theta_i) + c_0\omega_{n+1}\{F_1(\theta_i) - F_0(\theta_i)\} + \sum_{j=1}^{i-1}\delta_{\theta_j}(\theta_i)\Big]\Big).$ (14)

In this case, $\omega_0 = 0_{n+1}$ represents no perturbation. By differentiating $\ell(z, \theta \mid \omega) = \log p(z, \theta \mid \omega)$ in (14) with respect to each component of $\omega$, we have that for $i = 1, \ldots, n$,

$\partial_{\omega_i}\ell(z, \theta \mid \omega) = z_i - \omega_i - \theta_i, \qquad \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega) = \sum_{i=1}^n \frac{c_0\{F_1(\theta_i) - F_0(\theta_i)\}}{c_0F_0(\theta_i) + c_0\omega_{n+1}\{F_1(\theta_i) - F_0(\theta_i)\} + \sum_{j=1}^{i-1}\delta_{\theta_j}(\theta_i)}.$

Since $\int (z_i - \omega_i - \theta_i)\,p(z, \theta \mid \omega)\,dz = 0$ and $\int (z_i - \omega_i - \theta_i)(z_j - \omega_j - \theta_j)\,p(z, \theta \mid \omega)\,dz = \delta_{ij}$, we have

$\langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_j}\ell(z, \theta \mid \omega)\rangle(\omega) = \delta_{ij}, \qquad \langle\partial_{\omega_i}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = 0, \qquad \langle\partial_{\omega_{n+1}}\ell(z, \theta \mid \omega), \partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\rangle(\omega) = E[\{\partial_{\omega_{n+1}}\ell(z, \theta \mid \omega)\}^2].$

Similar to (11), we set $f(\omega, \omega_0) = \mathrm{BF}(\omega, \omega_0)$ and substitute the results from (7) to calculate $v_{F,\max}(\omega_0)$, using 50 000 Markov chain Monte Carlo samples generated from the posterior distribution $p(\theta_1, \ldots, \theta_n \mid z_1, \ldots, z_{50})$ after a burn-in of 5000 samples. Inspecting the components of $v_{F,\max}(\omega_0)$ reveals the outlying cases 49 and 50 and shows the sensitivity to the misspecified base measure $F_0$ of the Dirichlet process prior for the $\theta_i$ (Fig. 1).

Fig. 1. Simultaneous perturbation model using a Dirichlet process prior and perturbing individual observations: (a) local influence measures $v_{B,\max}(\omega_0)$ for the logarithm of the Bayes factor $f(\omega, \omega_0) = \mathrm{BF}(\omega, \omega_0)$, from which the outlying cases 49 and 50 and the perturbation to the Dirichlet process prior were detected; (b) index plot of the metric tensor $g_{ii}(\omega_0)$ for the perturbation (15).

In addition to this theoretical example, an extensive simulation and a real data analysis involving missing data are given in the Supplementary Material. In practice, we suggest an iterative process for carrying out the four-step influence analysis of § 3.4. If one is concerned about sensitivity to the prior, then one may introduce a finite-dimensional perturbation, as in Example 1, to all hyperparameters of the prior and identify influential hyperparameters according to their local influence measures. Then, for a few influential hyperparameters, one further perturbs their associated prior distributions using the additive $\epsilon$-contamination class and carries out an intrinsic influence analysis. If one is concerned about the sampling distribution, then one may introduce various perturbations, including the additive $\epsilon$-contamination class and the perturbation model (1), to $p(z \mid \theta)$ and use the local influence measures to detect which parts of $p(z \mid \theta)$ are sensitive to minor perturbations. One may then focus on these influential parts and carry out an intrinsic influence analysis. After refining the prior and the sampling distribution, one may perturb individual observations and detect a set of influential observations. After examining the information from each influence analysis, we carry out a simultaneous perturbation to $z$, $p(\theta)$ and $p(z \mid \theta)$. We start with a local influence analysis to examine the sensitivity of all components and then focus on a few influential components using an intrinsic influence analysis.

Acknowledgments

We thank the editor, an associate editor and two referees for many valuable suggestions which have greatly improved this paper.

Appendix

Proof of Proposition 1. Consider any two smooth curves $p\{z, \theta \mid \omega^{(k)}(t)\} = p\{\theta \mid \omega^{(k)}(t)\}\,p(z \mid \theta)$ with $p\{z, \theta \mid \omega^{(k)}(0)\} = p(\theta \mid \omega)\,p(z \mid \theta)$ for $k = 1, 2$. For each $k$, by differentiating $\ell\{z, \theta \mid \omega^{(k)}(t)\}$ with respect to $t$, we obtain a tangent vector $v_k(\omega) = \dot\ell\{z, \theta \mid \omega^{(k)}(0)\} = d\log p\{\theta \mid \omega^{(k)}(t)\}/dt\mid_{t=0} \in T_\omega$, which is independent of $p(z \mid \theta)$. Furthermore, letting $d_t = d/dt$, the inner product of $v_1(\omega)$ and $v_2(\omega)$ is given by $\int [d_t\log p\{\theta \mid \omega^{(1)}(t)\}][d_t\log p\{\theta \mid \omega^{(2)}(t)\}]\,p(z, \theta \mid \omega)\,dz\,d\theta = \int [d_t\log p\{\theta \mid \omega^{(1)}(t)\}][d_t\log p\{\theta \mid \omega^{(2)}(t)\}]\,p(\theta \mid \omega)\,d\theta$, which is also independent of $p(z \mid \theta)$.

Proof of Proposition 2. Consider two smooth curves $p\{z, \theta \mid \omega^{(k)}(t)\}$ with $\omega^{(k)}(t) = \{\omega^{(k),p}(t)^T, \omega^{(k),s}(t)^T\}^T$ such that $\omega^{(1)}(0) = \omega^{(2)}(0) = \omega$ and such that $\omega^{(1),p}(t)$ and $\omega^{(2),s}(t)$ are independent of $t$. Let $\ell(z \mid \theta, \omega^{(1),s}) = \log p(z \mid \theta, \omega^{(1),s})$. Since $\omega^{(1),p}(t)$ is independent of $t$,

$v_1(\omega) = \dot\ell\{z, \theta \mid \omega^{(1)}(0)\} = \frac{d}{dt}\log p\{\theta \mid \omega^{(1),p}(t)\}\Big|_{t=0} + \frac{d}{dt}\log p\{z \mid \theta, \omega^{(1),s}(t)\}\Big|_{t=0} = \dot\ell\{z \mid \theta, \omega^{(1),s}(0)\}.$

Let $\ell(\theta \mid \omega^{(2),p}) = \log p(\theta \mid \omega^{(2),p})$. Similarly, we have

$v_2(\omega) = \dot\ell\{z, \theta \mid \omega^{(2)}(0)\} = \frac{d}{dt}\log p\{\theta \mid \omega^{(2),p}(t)\}\Big|_{t=0} = \dot\ell\{\theta \mid \omega^{(2),p}(0)\}.$

Thus, the inner product of $v_1(\omega)$ and $v_2(\omega)$, denoted by $\langle v_1, v_2\rangle(\omega)$, is given by

$\int \dot\ell\{\theta \mid \omega^{(2),p}(0)\}\,\dot\ell\{z \mid \theta, \omega^{(1),s}(0)\}\,p(z, \theta \mid \omega)\,dz\,d\theta = \int \frac{dp\{\theta \mid \omega^{(2),p}(0)\}}{dt}\,\frac{dp\{z \mid \theta, \omega^{(1),s}(0)\}}{dt}\,dz\,d\theta = \int \frac{dp\{\theta \mid \omega^{(2),p}(0)\}}{dt}\Big[\int \frac{dp\{z \mid \theta, \omega^{(1),s}(0)\}}{dt}\,dz\Big]\,d\theta = \int \frac{dp\{\theta \mid \omega^{(2),p}(0)\}}{dt}\,\frac{d}{dt}\Big[\int p\{z \mid \theta, \omega^{(1),s}(0)\}\,dz\Big]\,d\theta = \int \frac{dp\{\theta \mid \omega^{(2),p}(0)\}}{dt}\,\frac{d\,1}{dt}\,d\theta = 0.$

Proof of Theorem 1. Since Theorem 1(i) follows from Proposition 2, we focus on Theorem 1(ii). Since $\{\omega^{(1),p}(t), \omega^{(1),d}(t)\}$ and $\{\omega^{(2),p}(t), \omega^{(2),s}(t)\}$ are independent of $t$ and $p(z \mid \theta, \omega^d, \omega^s) = p_1(z \mid \theta, \omega^d)\,p_2(z \mid \theta, \omega^s)$, we have

$v_1(\omega) = \dot\ell\{z, \theta \mid \omega^{(1)}(0)\} = \frac{d}{dt}\log p_2\{z \mid \theta, \omega^{(1),s}(t)\}\Big|_{t=0}, \qquad v_2(\omega) = \dot\ell\{z, \theta \mid \omega^{(2)}(0)\} = \frac{d}{dt}\log p_1\{z \mid \theta, \omega^{(2),d}(t)\}\Big|_{t=0}.$

Thus, $\langle v_1, v_2\rangle(\omega)$ is given by

$\int \frac{d\log p_2\{z \mid \theta, \omega^{(1),s}(t)\}}{dt}\Big|_{t=0}\,\frac{d\log p_1\{z \mid \theta, \omega^{(2),d}(t)\}}{dt}\Big|_{t=0}\,p(z, \theta \mid \omega)\,dz\,d\theta = \int \frac{dp_2\{z \mid \theta, \omega^{(1),s}(0)\}}{dt}\,\frac{dp_1\{z \mid \theta, \omega^{(2),d}(0)\}}{dt}\,p(\theta \mid \omega^p)\,dz\,d\theta = \frac{d^2\,1}{dt\,dt} = 0.$

Proof of Theorem 2. Consider a smooth curve $p\{z, \theta \mid \omega(t)\}$. Let $R(s) : [c_1, c_2] \to [-\epsilon, \epsilon]$ be a differentiable map such that $R(c_3) = 0$ and $\dot R(c_3) = dR(s)/ds\mid_{s=c_3} \neq 0$ for some $c_3 \in (c_1, c_2)$. Then $p[z, \theta \mid \omega\{R(s)\}]$ is a differentiable map from $[c_1, c_2]$ into the perturbation manifold. It follows from the chain rule that $d_s f[\omega\{R(s)\}, \omega_0] = d_r f\{\omega(r), \omega_0\}\,\dot R(s)$ and $d_s\,\ell[z, \theta \mid \omega\{R(s)\}] = d_r\,\ell\{z, \theta \mid \omega(r)\}\,\dot R(s)$, where $\dot R(s) = d_sR(s)$, $d_r = d/dr$ and $d_s = d/ds$. Thus, as $\omega(0) = \omega_0$, we have

$df[\dot R(c_3)v][\omega\{R(c_3)\}] = \dot R(c_3)\,df[v](\omega), \qquad \langle\dot R(c_3)v, \dot R(c_3)v\rangle(\omega) = \dot R(c_3)^2\,\langle v, v\rangle(\omega).$

Supplementary material

Supplementary Material available at Biometrika online includes the proof of Proposition 1, a real data analysis on missing data problems and an extensive simulation.

asr009supp.pdf (1MB, pdf)

References

  1. Amari S. Differential-Geometrical Methods in Statistics. 2nd edn. Lecture Notes in Statistics, Vol. 28. Berlin: Springer; 1990.
  2. Berger JO. Robust Bayesian analysis: sensitivity to the prior. J Statist Plan Infer. 1990;25:303–28.
  3. Berger JO. An overview of robust Bayesian analysis. TEST. 1994;3:5–58.
  4. Berger JO, Rios Insua D, Ruggeri F. Bayesian robustness. In: Rios Insua D, Ruggeri F, editors. Robust Bayesian Analysis. Lecture Notes in Statistics, Vol. 152. New York: Springer; 2000. pp. 1–32.
  5. Carlin BP, Polson NG. An expected utility approach to influence diagnostics. J Am Statist Assoc. 1991;86:1013–21.
  6. Chen MH, Shao QM, Ibrahim JG. Monte Carlo Methods in Bayesian Computation. New York: Springer; 2000.
  7. Clarke B. Desiderata for a predictive theory of statistics. Bayesian Anal. 2010;5:283–318.
  8. Clarke B, Gustafson P. On the overall sensitivity of the posterior distribution to its inputs. J Statist Plan Infer. 1998;71:137–50.
  9. Cook RD. Assessment of local influence (with discussion). J R Statist Soc B. 1986;48:133–69.
  10. Copas JB, Eguchi S. Local model uncertainty and incomplete-data bias (with discussion). J R Statist Soc B. 2005;67:459–513.
  11. Dey DK, Birmiwal LR. Robust Bayesian analysis using divergence measures. Statist Prob Lett. 1994;20:287–94.
  12. Dey DK, Ghosh SK, Lou KR. On local sensitivity measures in Bayesian analysis (with discussion). In: Berger JO, Betro B, Moreno E, Pericchi LR, Ruggeri F, Salinetti G, Wasserman L, editors. Bayesian Robustness. IMS Lecture Notes–Monograph Series, Vol. 29. Hayward, CA: Institute of Mathematical Statistics; 1996. pp. 21–40.
  13. Dijkstra EW. A note on two problems in connexion with graphs. Numer Math. 1959;1:269–71.
  14. Ekeland I. The Hopf–Rinow theorem in infinite dimension. J Diff Geom. 1978;13:287–301.
  15. Escobar MD. Estimating normal means with a Dirichlet process prior. J Am Statist Assoc. 1994;89:268–77.
  16. Friedrich T. Die Fisher-Information und symplektische Strukturen. Math Nachr. 1991;153:273–96.
  17. Gustafson P. Local sensitivity of inferences to prior marginals. J Am Statist Assoc. 1996;91:774–81.
  18. Gustafson P. Local robustness in Bayesian analysis. In: Rios Insua D, Ruggeri F, editors. Robust Bayesian Analysis. New York: Springer; 2000. pp. 71–88.
  19. Gustafson P, Wasserman L. Local sensitivity diagnostics for Bayesian inference. Ann Statist. 1995;23:2153–67.
  20. Guttman I, Peña D. A Bayesian look at diagnostics in the univariate linear model. Statist Sinica. 1993;3:367–90.
  21. Kass RE, Raftery AE. Bayes factors. J Am Statist Assoc. 1995;90:773–95.
  22. Kass RE, Tierney L, Kadane JB. Approximate methods for assessing influence and sensitivity in Bayesian analysis. Biometrika. 1989;76:663–74.
  23. Kass RE, Vos PW. Geometrical Foundations of Asymptotic Inference. New York: Wiley; 1997.
  24. Lang S. Differential and Riemannian Manifolds. 3rd edn. New York: Springer; 1995.
  25. Lavine M. Local predictive influence in Bayesian linear models with conjugate priors. Commun Statist B. 1992;2:269–83.
  26. McCulloch RE. Local model influence. J Am Statist Assoc. 1989;84:473–78.
  27. Millar RB, Stewart WS. Assessment of locally influential observations in Bayesian models. Bayesian Anal. 2007;2:365–84.
  28. Oakley JE, O'Hagan A. Probabilistic sensitivity analysis of complex models: a Bayesian approach. J R Statist Soc B. 2004;66:751–69.
  29. Peña D, Guttman I. Comparing probabilistic methods for outlier detection in linear models. Biometrika. 1993;80:603–10.
  30. Peng F, Dey DK. Bayesian analysis of outlier problems using divergence measures. Can J Statist. 1995;23:199–213.
  31. Ruggeri F, Sivaganesan S. On a global sensitivity measure for Bayesian inference. Sankhya. 2000;62:110–27.
  32. Sivaganesan S. Global and local robustness approaches: uses and limitations. In: Rios Insua D, Ruggeri F, editors. Robust Bayesian Analysis. Lecture Notes in Statistics, Vol. 152. New York: Springer; 2000. pp. 89–108.
  33. Van der Linde A. Local influence on posterior distributions under multiplicative modes of perturbation. Bayesian Anal. 2007;2:319–32.
  34. Wang Q, Stefanski LA, Genton MG, Boos DD. Robust time series analysis via measurement error modeling. Statist Sinica. 2009;19:1263–80.
  35. Wu X, Luo Z. Second-order approach to local influence. J R Statist Soc B. 1993;55:929–36.
  36. Zhu HT, Ibrahim JG, Lee SY, Zhang HP. Perturbation selection and influence measures in local influence analysis. Ann Statist. 2007;35:2565–88.
  37. Zhu HT, Lee SY. Local influence for incomplete data models. J R Statist Soc B. 2001;63:111–26.

