Author manuscript; available in PMC 2021 Mar 1.
Published in final edited form as: J Korean Stat Soc. 2020 Jan 1;49(1):244–263. doi: 10.1007/s42952-019-00013-z

Inflated Density Ratio and Its Variation and Generalization for Computing Marginal Likelihoods

Yu-Bo Wang 1, Ming-Hui Chen 2,*, Wei Shi 3, Paul Lewis 4, Lynn Kuo 5
PMCID: PMC7560979  NIHMSID: NIHMS1547654  PMID: 33071541

Abstract

In the Bayesian framework, the marginal likelihood plays an important role in variable selection and model comparison. The marginal likelihood is the marginal density of the data after integrating out the parameters over the parameter space. However, this quantity is often analytically intractable due to the complexity of the model. In this paper, we first examine the properties of the inflated density ratio (IDR) method, which is a Monte Carlo method for computing the marginal likelihood using a single MC or Markov chain Monte Carlo (MCMC) sample. We then develop a variation of the IDR estimator, called the dimension reduced inflated density ratio (Dr.IDR) estimator. We further propose a more general identity and then obtain a general dimension reduced (GDr) estimator. Simulation studies are conducted to examine empirical performance of the IDR estimator as well as the Dr.IDR and GDr estimators. We further demonstrate the usefulness of the GDr estimator for computing the normalizing constants in a case study on the inequality-constrained analysis of variance.

Keywords: CMDE, Conditional posterior density, Constrained parameter space, IWMDE, Marginal posterior density, 62F15, 62-M05

1. Introduction

In Bayesian analysis, different models can be compared by using their marginal likelihoods (also known as normalizing constants). Defined as the integral of the likelihood function times the prior distribution over the parameter space, the marginal likelihood measures the average fit of the model to the data and is useful in Bayesian hypothesis testing. The statistical theory of hypothesis testing is concerned with methods for checking whether an improvement in fit is statistically significant, and the Bayes factor (BF), defined as the ratio of the marginal likelihoods under two competing models, is one such method. Due to the complicated form of the product of the likelihood function and the prior, which is referred to as the posterior kernel hereafter, the marginal likelihood is often analytically intractable. Hence, computing the marginal likelihood or the normalizing constant is of great current interest and remains an active research area.

There is a vast literature on computing marginal likelihoods or normalizing constants, including but not limited to importance sampling (IS) [1], the harmonic mean (HM) [2], the posterior density based approaches of Chib [3] and Chib and Jeliazkov [4], path sampling (PS) [5], the inflated density ratio (IDR) [6], the latent variable approach of Chen [7], thermodynamic integration (TI) [8], the stepping stone method (SSM) [9, 10], and the partition weighted kernel (PWK) estimator [11]. These methods can be divided into two categories: methods such as IS, HM, IDR, Chen [7], and PWK use only a single MCMC sample, while other methods such as Chib [3], TI, and SSM use multiple MCMC samples. These methods can be shown to asymptotically converge to the quantity of interest under some mild conditions. Varying in their use of information, such as Markov chain Monte Carlo (MCMC) samples or the posterior kernel, they all have pros and cons in different applications. Among them, the IDR method works by introducing a perturbed density that adds an analytically calculable mass around the posterior mode and has lighter tails than the posterior kernel, and it can estimate the marginal likelihood precisely based on an MCMC sample from the posterior distribution. With such a control of the posterior tails, Petris and Tardella [12] also show that IDR has a finite variance in one-dimensional problems. However, the properties of the IDR estimator have not been fully examined and its usefulness in statistical applications has not been fully explored.

In this paper, we first reveal an interesting property of the IDR estimator, namely, its superior efficiency in two-dimensional problems. This property is further confirmed in our simulation study. To broaden the applicability of IDR, we develop a variation of IDR, namely, the dimension reduced inflated density ratio (Dr.IDR) estimator. This new estimator not only enjoys the property of dimension reduction but also has a nice connection to marginal posterior density estimation. We further extend the idea of Dr.IDR to develop the generalized dimension reduced (GDr) estimator, which is constructed via a new identity based on the marginal posterior density. An interesting case study is carried out to demonstrate how to compute the normalizing constant for the inequality-constrained analysis of variance model. The new estimators developed in this paper have great potential for computing marginal likelihoods or normalizing constants for high-dimensional problems and complex models such as those with an inequality-constrained parameter space.

The rest of the article is organized as follows. Section 2 presents the formulation of the IDR estimator and examines its theoretical properties. In Section 3, we develop a variation of IDR, called the dimension reduced IDR (Dr.IDR) and discuss various properties of Dr.IDR. A new identity based on the marginal posterior density is introduced and a so-called generalized dimension reduced (GDr) estimator is then constructed in Section 4. Two simulation studies to examine empirical performance of the IDR, Dr.IDR, and GDr estimators are presented in Section 5. Section 6 is a case study, which demonstrates how to compute the normalizing constant for the inequality-constrained analysis of variance model. We conclude this paper with a brief discussion in Section 7.

2. The Inflated Density Ratio (IDR) Estimator

Suppose ζ is a p-dimensional vector of parameters with the unbounded support Ω. Given that L(ζ|D) is the likelihood function of parameters ζ given the data D and π(ζ) is a proper prior distribution in the sense that ∫_Ω π(ζ) dζ = 1, the marginal likelihood is given by

c = ∫_Ω L(ζ|D) π(ζ) dζ = ∫_Ω q(ζ) dζ,   (1)

where q(ζ) = L(ζ|D)π(ζ) is the posterior kernel function. The corresponding posterior distribution is written as

π(ζ|D) = L(ζ|D) π(ζ)/c = q(ζ)/c.   (2)

We note that when ∫_Ω π(ζ) dζ ≠ 1, or when ∫_Ω π(ζ) dζ = ∞ but c < ∞, c is called the normalizing constant. When the model structure is complex, the constant c is often analytically intractable. However, a Markov chain Monte Carlo (MCMC) sample can still be generated from the posterior distribution given in (2) without knowing c.

To estimate (1), Petris and Tardella [6, 12] proposed the inflated density ratio (IDR) method, which requires a single Monte Carlo (MC) or MCMC sample from (2). This method first constructs a perturbed density qr(ζ) based on the posterior kernel q(ζ). Let ζ0 be the “center” of the posterior distribution, which is typically the posterior mode of ζ. Define qr(.) as

qr(ζ) = { q(ζ0)                  if ‖ζ − ζ0‖ ≤ r,
          q(ζ0 + h(ζ − ζ0))      if ‖ζ − ζ0‖ > r,   (3)

where r is a prespecified radius and h(ζ − ζ0) = (1 − r^p/‖ζ − ζ0‖^p)^{1/p} (ζ − ζ0). It can be shown that the integral of qr(·) over the whole support is the sum of the target c and a tractable mass k = q(ζ0) br, where br is the volume of the ball {ζ : ‖ζ − ζ0‖ ≤ r}, namely br = π^{p/2} r^p / Γ(p/2 + 1). Thus, we have the IDR identity given by

c = q(ζ0) br / { E[qr(ζ)/q(ζ) | D] − 1 },   (4)

where E[qr(ζ)/q(ζ) | D] is the expectation of qr(ζ)/q(ζ) with respect to the posterior distribution in (2). Based on the IDR identity in (4), the IDR estimator for c is given by

ĉIDR = k / { (1/n) Σ_{i=1}^n qr(ζ(i))/q(ζ(i)) − 1 },   (5)

where {ζ(1), …, ζ(n)} is an MC/MCMC sample from the posterior distribution in (2). Figure 1 provides the visualization of qr(.) in a univariate normal example. The area under the curve qr(.) is k + c = 2rq(0) + 1 since ζ0 = 0 in this case.

Figure 1: An illustration of the perturbed density in the IDR method.
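The construction in (3)–(5) can be sketched in a few lines of code. The following is a minimal illustration and not the authors' implementation; the kernel log_q, the center ζ0, and the radius r are supplied by the user, and the toy example checks the estimator on a univariate standard normal kernel whose true normalizing constant is √(2π).

```python
# A minimal sketch of the IDR estimator in (5); illustrative only, with a toy
# unnormalized kernel q(zeta) = exp(-zeta'zeta/2) whose true c is (2*pi)^{p/2}.
import numpy as np
from scipy.special import gamma

def idr_estimate(draws, log_q, zeta0, r):
    """IDR estimate of c from an (n, p) array of posterior draws."""
    draws = np.atleast_2d(draws)
    n, p = draws.shape
    u = draws - zeta0                           # center the draws at zeta0
    dist = np.linalg.norm(u, axis=1)
    scale = np.zeros_like(dist)                 # points inside the ball map to zeta0
    out = dist > r
    scale[out] = (1.0 - (r / dist[out]) ** p) ** (1.0 / p)   # deflation map h
    perturbed = zeta0 + scale[:, None] * u
    ratio = np.exp(log_q(perturbed) - log_q(draws))          # q_r / q
    k = np.exp(log_q(zeta0.reshape(1, -1))[0]) * np.pi ** (p / 2) * r ** p / gamma(p / 2 + 1)
    return k / (ratio.mean() - 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    log_q = lambda z: -0.5 * np.sum(np.atleast_2d(z) ** 2, axis=1)
    draws = rng.standard_normal((100_000, 1))
    print(idr_estimate(draws, log_q, np.zeros(1), r=0.5))    # close to sqrt(2*pi) ~ 2.5066
```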

Petris and Tardella [12] showed that under certain conditions, such as the log-Lipschitz condition, the variance of ĉIDR is finite in one-dimensional problems. If {ζ(1), …, ζ(n)} is an MCMC sample from (2), then under certain mild regularity conditions, such as time reversibility, invariance, and irreducibility, it is easy to show that ĉIDR is a consistent estimator of c since the denominator of (5) converges to

lim_{n→∞} (1/n) Σ_{i=1}^n qr(ζ(i))/q(ζ(i)) − 1 = ∫_Ω [qr(ζ)/q(ζ)] [q(ζ)/c] dζ − 1 = ∫_Ω [qr(ζ)/c] dζ − 1 = (c + k)/c − 1 = k/c,

as n → ∞. As pointed out by Petris and Tardella [12], Arima and Tardella [13], and Wang et al. [11], the IDR method requires the full space of q(·) and a careful selection of the radius. Wang et al. [11] further pointed out that mode finding is essential and that standardization of an MCMC sample with respect to the mode and the sample covariance matrix is required for IDR. Thus, the best way to use IDR is to obtain the perturbed density after the standardization, as demonstrated in Section 5.2. If any parameter in the model has a bounded support, an additional transformation is needed to define the perturbed density qr(·). This extra effort may limit its applications, especially for a problem with complicated constraints (see Section 6 for an example). During our empirical investigation, we found that the IDR has an interesting property in the bivariate standard normal case.

Consider the posterior kernel function as

q(ζ) = (1/(2π)) exp(−μ′μ/2),   (6)

where ζ = μ = (μ1, μ2)′ is a vector of unknown parameters with center at ζ0 = (0, 0). Under such a setting, the perturbed density is given by

qr(ζ) = { 1/(2π)                          if ‖μ‖ ≤ r,
          (1/(2π)) exp{−(μ′μ − r²)/2}     if ‖μ‖ > r,

so that

qr(ζ)/q(ζ) = { exp(μ′μ/2)   if ‖μ‖ ≤ r,
               exp(r²/2)     if ‖μ‖ > r.   (7)

We see from (7) that the ratio term within the radius is always bounded by exp(r²/2), while the ratio term outside the radius is always the constant exp(r²/2), which implies that the variance or any higher-order moments of the IDR estimator are finite. We formally state this result in the following theorem.

Theorem 2.1. Given that {ζ(1), …, ζ(n)} is a random sample from the posterior density, any vth moment (v > 0) of the IDR estimator is finite for the bivariate normal distribution given in (6).

Proof. Letting g(ζ(i)) = qr(ζ(i))/q(ζ(i)), we have

g(ζ(i)) ≤ exp(r²/2) < ∞.

Thus,

E| (1/n) Σ_{i=1}^n g(ζ(i)) |^v ≤ exp(v r²/2) < ∞.

By the δ-method, the vth moment of the IDR estimator is finite. □

Following Theorem 2.1, we can see that the variance of IDR is finite. This result also implies that the IDR method is potentially efficient in two-dimensional problems if the posterior distribution approximates a bivariate normal distribution. To examine the empirical property of IDR, we carry out a simulation study in Section 5.1.

3. Dimension Reduced Inflated Density Ratio (Dr.IDR) Estimator

In this section, we develop a variation of IDR called the dimension reduced inflated density ratio estimator (Dr.IDR). Suppose ζ′ = (θ′, ξ′), where θ is a p1-dimensional vector of the parameters and ξ is a vector of the remaining p2 = p − p1 parameters. We assume that the support of the marginal posterior density π(θ|D) = ∫ π(θ, ξ|D) dξ is R^{p1} and θ0 is a ‘center’, such as the mode, of the marginal posterior distribution of θ. Then, we define the perturbed density as

qr(θ, ξ) = { q(θ0, ξ)                    if ‖θ − θ0‖ ≤ r,
             q(θ0 + h(θ − θ0), ξ)        if ‖θ − θ0‖ > r,   (8)

where h(θ − θ0) = (1 − r^{p1}/‖θ − θ0‖^{p1})^{1/p1} (θ − θ0) and r > 0 is a prespecified radius. Similar to the IDR identity given in (4), we obtain a new identity given by

c = br ∫ q(θ0, ξ) dξ / { E[qr(θ, ξ)/q(θ, ξ) | D] − 1 },   (9)

where br = π^{p1/2} r^{p1} / Γ(p1/2 + 1) and E[qr(θ, ξ)/q(θ, ξ) | D] is the expectation of qr(θ, ξ)/q(θ, ξ) with respect to the posterior distribution given by (2).

Let {(θ(i), ξ(i)), i = 1, …, n} denote an MC/MCMC sample from the posterior distribution π(ζ|D). If ∫ q(θ0, ξ)dξ is analytically available, using (9), a Dr.IDR estimate of c is given by

ĉDr.IDR = br ∫ q(θ0, ξ) dξ / { (1/n) Σ_{i=1}^n qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1 }.   (10)

Under certain ergodic conditions, ĉDr.IDR converges to c almost surely as n → ∞.

Remark 3.1: In (9) and (10), the perturbed density qr(θ, ξ) in (8) is constructed on θ and the dimension of θ is p1, which is smaller than p for the original vector of the parameters, ζ. When p1 = p, (9) and (10) reduce to (4) and (5). For these reasons, c^Dr.IDR is called the dimension reduced IDR (Dr.IDR) estimator.
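As a concrete illustration of (10), the sketch below (ours, not the authors') applies the Dr.IDR construction to a toy two-dimensional kernel q(θ, ξ) = exp{−(θ² + ξ²)/2}, for which ∫ q(θ0 = 0, ξ) dξ = √(2π) is available in closed form and the true value of c is 2π.

```python
# A minimal sketch of the Dr.IDR estimator in (10) with theta one-dimensional
# (p1 = 1) and the closed-form value of int q(theta0, xi) d xi supplied by the user.
import math
import numpy as np

def dr_idr_estimate(theta, xi, log_q, theta0, int_q_theta0, r):
    theta = np.asarray(theta, dtype=float).reshape(len(theta), -1)
    p1 = theta.shape[1]
    u = theta - theta0
    dist = np.linalg.norm(u, axis=1)
    scale = np.zeros_like(dist)                    # inside the ball: map theta to theta0
    out = dist > r
    scale[out] = (1.0 - (r / dist[out]) ** p1) ** (1.0 / p1)
    theta_pert = theta0 + scale[:, None] * u
    ratio = np.exp(log_q(theta_pert, xi) - log_q(theta, xi))   # q_r / q
    b_r = np.pi ** (p1 / 2) * r ** p1 / math.gamma(p1 / 2 + 1)
    return b_r * int_q_theta0 / (ratio.mean() - 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    draws = rng.standard_normal((100_000, 2))                  # posterior draws of (theta, xi)
    log_q = lambda th, xi: -0.5 * (th[:, 0] ** 2 + xi ** 2)    # toy kernel, true c = 2*pi
    c_hat = dr_idr_estimate(draws[:, 0], draws[:, 1], log_q,
                            np.zeros(1), np.sqrt(2 * np.pi), r=0.5)
    print(c_hat, 2 * np.pi)
```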

Remark 3.2: The perturbed density in (8) can be extended as

qr(θ, ξ) = { q(θ0(ξ), ξ)                        if ‖θ − θ0(ξ)‖ ≤ r(ξ),
             q(θ0(ξ) + h(θ − θ0(ξ)), ξ)         if ‖θ − θ0(ξ)‖ > r(ξ),   (11)

where θ0(ξ) and r(ξ) are two functions of ξ. However, the Dr.IDR estimator in (10) needs to be modified as

ĉDr.IDR = ∫ q(θ0(ξ), ξ) dξ / { (1/n) Σ_{i=1}^n [1/br(ξ(i))] [qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1] },   (12)

where br(ξ) = π^{p1/2} r(ξ)^{p1} / Γ(p1/2 + 1). In (12), both the center and the radius of each perturbed circle can be determined by the conditional distribution π(θ|ξ, D) instead of the marginal posterior distribution π(θ|D), which may lead to a more efficient estimator.

Remark 3.3: In (10), we assume that ∫ q(θ0, ξ) dξ is analytically available. When c(θ0) = ∫ q(θ0, ξ) dξ is not available in closed form, c(θ0) is the normalizing constant of the conditional posterior distribution π(ξ|θ0, D). Since θ0 is a fixed point, we can generate an MC/MCMC sample {ξ(i), i = 1, …, n} from π(ξ|θ0, D). Then, we can use (5) or (10) to construct an estimator for c(θ0), and the resulting estimator of c may be termed a Dr.IDR2 estimator. But we notice that the version of ĉDr.IDR given in (12) is difficult to extend if ∫ q(θ0(ξ), ξ) dξ is analytically intractable.

Remark 3.4: From the Dr.IDR identity in (9), we obtain

∫ [q(θ0, ξ)/c] dξ = { E[qr(θ, ξ)/q(θ, ξ) | D] − 1 } / br.   (13)

Since the term on the left hand side is precisely the marginal posterior density of θ evaluated at θ0, we obtain a new marginal posterior density estimator given by

π̂Dr.IDR(θ0|D) = (1/br) [ (1/n) Σ_{i=1}^n qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1 ].   (14)

Write

w(θ|ξ) = [qr(θ, ξ) − q(θ, ξ)] / [br q(θ0, ξ)].   (15)

It is easy to show that ∫ w(θ|ξ)dθ = 1 and also (14) can be rewritten as

π̂Dr.IDR(θ0|D) = (1/n) Σ_{i=1}^n w(θ(i)|ξ(i)) q(θ0, ξ(i))/q(θ(i), ξ(i)).   (16)

Therefore, π̂Dr.IDR(θ0|D) is an importance weighted marginal density estimator (IWMDE) of [14]. However, the weight function w(θ|ξ) is not the conditional posterior density π(θ|ξ, D). Thus, the IDR marginal posterior density estimator π̂Dr.IDR(θ0|D) may not be the most efficient, as the “best” estimator is the conditional marginal density estimator (CMDE) with w(θ|ξ) = π(θ|ξ, D), as discussed in [14] and [15].

4. Generalization of the IDR Estimator

Observing that the marginal posterior density of θ is defined as

π(θ|D) = ∫ [q(θ, ξ)/c] dξ,

we have the following identity

c = ∫ q(θ, ξ) dξ / π(θ|D) = c(θ)/π(θ|D).   (17)

The above marginal posterior density based identity is a natural extension of the identity given in Chib [3], in which the marginal likelihood is expressed as the ratio of the likelihood function times the prior to the posterior density of the model parameters.

We let c(θ) denote the term in the numerator of (17), i.e.,

c(θ) = ∫ q(θ, ξ) dξ.   (18)

Then, c(θ) is precisely the normalizing constant of the conditional posterior density of ξ given θ. We note that the identity (17) holds for all θ. Similar to [3], we take θ = θ0 as a high marginal density point such as the posterior mean or the posterior mode of θ. Then, (17) can be rewritten as

c = c(θ0)/π(θ0|D).   (19)

In (19), the original p-dimensional estimation problem of c becomes the computation of c(θ0) and π(θ0|D), each with a smaller dimension (p2 and p1, respectively). If c(θ0) is analytically tractable, then we only need to compute π(θ0|D). If c(θ0) is intractable, as discussed in Remark 3.3, we first generate an MC/MCMC sample {ξ(i), i = 1, …, n} from the conditional posterior distribution π(ξ|θ0, D) and then obtain an IDR estimator or a Dr.IDR estimator of c(θ0). For instance, an IDR estimator of c(θ0) is given by

ĉ(θ0) = br q(θ0, ξ0) / { (1/n) Σ_{i=1}^n qr(θ0, ξ(i))/q(θ0, ξ(i)) − 1 },   (20)

where ξ0 is the posterior mode of ξ, br = π^{p2/2} r^{p2} / Γ(p2/2 + 1),

qr(θ0, ξ) = { q(θ0, ξ0)                     if ‖ξ − ξ0‖ ≤ r,
              q(θ0, ξ0 + h(ξ − ξ0))         if ‖ξ − ξ0‖ > r,

and h(ξ − ξ0) = (1 − r^{p2}/‖ξ − ξ0‖^{p2})^{1/p2} (ξ − ξ0), and r > 0 is a prespecified radius, which may be different from the one in (3) or (8).

For the marginal posterior density π(θ0|D), which is the term in the denominator of (19), a Dr.IDR estimator π̂Dr.IDR(θ0|D) given in (16) is readily available using an MC/MCMC sample {(θ(i), ξ(i)), i = 1, …, n} from the full posterior distribution π(ζ|D). As discussed in Remark 3.4, π̂Dr.IDR(θ0|D) may not be the most efficient. A more efficient estimator can be obtained using the CMDE of [16] or the partition weighted marginal density estimator (PWMDE) of [15]. We note that the PWMDE is a special case of the IWMDE of [14], which has the potential to achieve the “optimal” CMDE in estimating π(θ0|D). As empirically shown in [15], the PWMDE of π(θ0|D) is more efficient when θ0 is close to the “center” of the marginal posterior distribution π(θ|D). This is intuitively appealing since (i) the PWMDE uses the MC/MCMC sample {(θ(i), ξ(i)), i = 1, …, n} from the full posterior distribution π(ζ|D); (ii) fewer of the θ(i)’s are around θ0 when θ0 is away from the “center”; and (iii) in this case, π(θ0|D) is more difficult to estimate and, consequently, the corresponding PWMDE becomes less efficient. Therefore, the identity (19) works best when θ0 is chosen as a high marginal density point.

The new identity (19) provides great flexibility in computing the marginal likelihood c. This identity also sheds light on obtaining an efficient Monte Carlo estimator of c for a posterior distribution with a high-dimensional parameter space. An estimator based on (19) is termed a generalized dimension reduced (GDr) estimator. In Section 5.2, we carry out a simulation study to examine the empirical performance of the IDR, Dr.IDR, and GDr estimators.
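To make the identity (19) concrete, the following sketch (ours, not the authors') applies it to a toy bivariate normal posterior with kernel q(θ, ξ) = exp{−(θ² − 2ρθξ + ξ²)/(2(1 − ρ²))}, so that c = 2π√(1 − ρ²): c(θ0) is available in closed form and π(θ0|D) is estimated by the CMDE, which averages the conditional density π(θ0|ξ, D) over posterior draws of ξ.

```python
# A minimal sketch of the GDr identity (19), c = c(theta0) / pi(theta0 | D),
# on a toy correlated bivariate normal posterior.  All names are illustrative.
import numpy as np

rho, n = 0.7, 200_000
rng = np.random.default_rng(3)
cov = np.array([[1.0, rho], [rho, 1.0]])
draws = rng.multivariate_normal(np.zeros(2), cov, size=n)   # (theta, xi) posterior draws
xi = draws[:, 1]

theta0 = 0.0                                                # a high marginal density point
# closed-form conditional normalizing constant c(theta0) = int q(theta0, xi) d xi
c_theta0 = np.sqrt(2 * np.pi * (1 - rho ** 2)) * np.exp(-theta0 ** 2 / 2)
# CMDE of the marginal posterior density: average of N(theta0; rho*xi, 1 - rho^2)
cond_dens = (np.exp(-(theta0 - rho * xi) ** 2 / (2 * (1 - rho ** 2)))
             / np.sqrt(2 * np.pi * (1 - rho ** 2)))
c_hat = c_theta0 / np.mean(cond_dens)
print(c_hat, 2 * np.pi * np.sqrt(1 - rho ** 2))             # estimate vs. true c
```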

5. Simulation Studies

5.1. Simulation Study I

In this simulation study, we empirically examine whether the result established in Theorem 2.1 implies a potential gain in estimation efficiency of an IDR estimator for computing the normalizing constant of the bivariate normal distribution compared to the univariate normal distribution or the multivariate normal distribution with more than two dimensions. Assume that the posterior kernel is given by

q(ζ) = (1/(2π))^{p/2} exp(−μ′μ/2),   (21)

where μ is a vector of p unknown parameters. We apply the IDR estimator to compute the normalizing constant of (21) with different p’s, and evaluate its performance by comparing the means, standard errors, and root mean square errors (RMSE) of the estimates in log scale. Let ĉt denote the IDR estimate for the tth replicate of data, t = 1, 2, …, 1000. The mean and the standard error (SE) are c̃ = (1/1000) Σ_{t=1}^{1000} log ĉt and SE = [Σ_{t=1}^{1000} (log ĉt − c̃)² / 999]^{1/2}, respectively. The RMSE is defined as

RMSE = [ (1/1000) Σ_{t=1}^{1000} (log ĉt − log c)² ]^{1/2},

where log c is 0 in this case. The results for p = 1, 2, 3, and 5 are summarized in Table 1. We see that when r ≤ 1, the IDR estimator for p = 2 has superior performance compared to those for p ≠ 2. Interestingly, the IDR estimator for p = 3 has a smaller SE than the one for p = 1 for every value of r considered in this simulation. However, the IDR estimator for p = 5 performs worse than those for p = 1, 2, and 3. These simulation results empirically confirm that the result established in Theorem 2.1 does imply that the IDR estimator has an “optimal” performance when p = 2 under the normal distributions.
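For reference, the replication summaries just defined can be computed with a short helper such as the one below (a sketch; log_c_hat is assumed to hold the replicate log estimates).

```python
# A small helper for the replication summaries above: mean, SE (with divisor T - 1),
# and RMSE of replicate log estimates of the normalizing constant.
import numpy as np

def summarize(log_c_hat, log_c_true=0.0):
    log_c_hat = np.asarray(log_c_hat, dtype=float)
    T = len(log_c_hat)
    mean = log_c_hat.mean()
    se = np.sqrt(np.sum((log_c_hat - mean) ** 2) / (T - 1))
    rmse = np.sqrt(np.mean((log_c_hat - log_c_true) ** 2))
    return mean, se, rmse
```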

Table 1:

Performance of the IDR method in the p-dimensional normal distribution with 1000 replicates: mean, standard error (SE), and root mean squared error (RMSE) of the normalizing constant estimate in log scale

r      p   Mean       SE        RMSE
1.5    1   −0.00019   0.03222   0.03221
1.5    2    0.00050   0.00740   0.00742
1.5    3   −0.00005   0.00537   0.00537
1.5    5    0.00017   0.01163   0.01163
1.0    1   −0.00031   0.01938   0.01937
1.0    2   −0.00023   0.00450   0.00451
1.0    3    0.00008   0.00458   0.00457
1.0    5    0.00077   0.01798   0.01799
0.5    1   −0.00021   0.01260   0.01260
0.5    2    0.00003   0.00216   0.00216
0.5    3   −0.00009   0.00574   0.00574
0.5    5    0.00014   0.03159   0.03157
0.01   1   −0.00012   0.00800   0.00800
0.01   2    0.00000   0.00004   0.00004
0.01   3   −0.00016   0.00754   0.00754
0.01   5    0.00420   0.06300   0.06311

5.2. Simulation Study II

In the second simulation study, we revisit the bivariate normal example in [11]. Assume that yj | μ, Σ ~ i.i.d. N(μ, Σ) for j = 1, …, m, where μ = (μ1, μ2)′ and Σ = (σ1², ρσ1σ2; ρσ1σ2, σ2²) are the unknown mean vector and variance-covariance matrix. We specify the conjugate priors

Σ ~ IWν0(Λ0^{−1})

and

μ | Σ ~ N(μ0, Σ/κ0),

respectively. Therefore, the joint posterior kernel is given by

q(μ, Σ) = f(y1, y2, …, ym | μ, Σ) π(μ|Σ) π(Σ)
        = (2π)^{−m} |Σ|^{−(m+ν0+2)/2 − 1} (1/Z) exp{ −(1/2) Σ_{j=1}^m (yj − μ)′ Σ^{−1} (yj − μ) }
          × exp{ −(κ0/2) (μ − μ0)′ Σ^{−1} (μ − μ0) } exp{ −(1/2) trace(Λ0 Σ^{−1}) },

with Z = 2π · 2^{ν0} Γ2(ν0/2) |Λ0|^{−ν0/2} / κ0. Also, the marginal likelihood has the closed form

c = π^{−md/2} [Γd(νm/2)/Γd(ν0/2)] [|Λ0|^{ν0/2}/|Λm|^{νm/2}] (κ0/κm)^{d/2},

where d = 2 is the dimension of yj, Λm = Λ0 + S² + [κ0 m/(κ0 + m)] (μ0 − ȳ)(μ0 − ȳ)′, S² = Σ_{j=1}^m (yj − ȳ)(yj − ȳ)′, κm = κ0 + m, and νm = ν0 + m. A random sample y with m = 200 is generated from a bivariate normal distribution with μ = (0, 0)′ and Σ = (1, 0.7; 0.7, 1). The corresponding sample mean ȳ is (−0.029, 0.040)′, and the sum-of-squares matrix is S² = (201.987, 143.330; 143.330, 192.365). Under this setting and the pre-specified hyperparameters μ0 = (0, 0)′, κ0 = 0.01, ν0 = 3, and Λ0 = (1, 0.7; 0.7, 1), the marginal likelihood in log scale is −507.278.
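For completeness, the closed-form log marginal likelihood above can be evaluated directly. The sketch below is our own, with the data assumed to be an (m, 2) array y; the multivariate gamma function Γd(·) is computed on the log scale via scipy.

```python
# A sketch of the closed-form log marginal likelihood for the conjugate
# normal / inverse-Wishart model above (d = 2); y is an (m, d) data array.
import numpy as np
from scipy.special import multigammaln

def log_marginal_likelihood(y, mu0, kappa0, nu0, Lambda0):
    m, d = y.shape
    ybar = y.mean(axis=0)
    S2 = (y - ybar).T @ (y - ybar)                      # sum-of-squares matrix
    kappa_m, nu_m = kappa0 + m, nu0 + m
    Lambda_m = Lambda0 + S2 + (kappa0 * m / (kappa0 + m)) * np.outer(mu0 - ybar, mu0 - ybar)
    return (-(m * d / 2) * np.log(np.pi)
            + multigammaln(nu_m / 2, d) - multigammaln(nu0 / 2, d)
            + (nu0 / 2) * np.log(np.linalg.det(Lambda0))
            - (nu_m / 2) * np.log(np.linalg.det(Lambda_m))
            + (d / 2) * (np.log(kappa0) - np.log(kappa_m)))
```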

To implement the IDR estimator, we consider two “center” values, (−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)′ and (0.0437, 0.1101, 0.1093, 1.9891, 0.0615)′, for the transformed parameter vector (μ1, μ2, log σ1², log{(1+ρ)/(1−ρ)}, log σ2²)′, where (−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)′ is the posterior mean.

To implement the Dr.IDR estimator, we consider two blocks as θ = (μ1, μ2)′ and ξ = (σ1, σ2, ρ)′. First, since θ | ξ, D ~ N(μc, Σc) with μc = (mȳ + ν0μ0)/(m + ν0) and Σc = Σ/(m + ν0), we consider the standardization θ* = Σc^{−1/2}(μ − μc). With such a standardization, we then have ∫ q̃(0, ξ) dξ = ∫ q(μc, Σ) |Σc|^{1/2} dΣ = c/(2π) and

q̃r(θ*, ξ)/q̃(θ*, ξ) = { exp(θ*′θ*/2)   if ‖θ*‖ ≤ r,
                        exp{−(1/2) [(1 − r²/‖θ*‖²)^{1/2} θ*]′ [(1 − r²/‖θ*‖²)^{1/2} θ*]} / exp(−θ*′θ*/2) = exp(r²/2)   if ‖θ*‖ > r,

where q̃(·) is the posterior kernel of (θ*, ξ) and q̃r(·) is the perturbed density of q̃(·). For the Dr.IDR estimator, we also consider θ* = Σc^{−1/2}(μ − μc − (0.0709, 0.0709)′), where 0.0709 is the average of the posterior standard deviations of μ1 and μ2, for a sensitivity analysis.

For the GDr estimator, we consider two blocks as θ = (σ1, σ2, ρ)′ and ξ = (μ1, μ2)′ instead. Note that the posterior mean of θ is (1.015, 0.967, 0.727)′ and the posterior mode of θ is (0.985, 0.939, 0.727)′. We first choose θ0 = (1, 1, 0.7)′ (and also θ0 = (0.5, 0.5, 0)′ for a sensitivity analysis) and estimate c(θ0) by the IDR approach. By using the same standardization ξ* = Σc^{−1/2}(μ − μc), (20) becomes

ĉ(θ0) = π r² q̄(θ0, 0) / { (1/n) Σ_{i=1}^n q̄r(θ0, ξ*(i))/q̄(θ0, ξ*(i)) − 1 },

where q̄(θ0, 0) = q(1, 1, 0.7, μc) |Σc|^{1/2} with the corresponding perturbed density q̄r(·), and

q̄r(θ0, ξ*)/q̄(θ0, ξ*) = { exp(ξ*′ξ*/2)   if ‖ξ*‖ ≤ r,
                          exp(r²/2)       if ‖ξ*‖ > r.

Secondly, we use the CMDE to estimate π(θ0|D), since π(θ|ξ, D) is an inverse Wishart distribution with νm + 1 degrees of freedom and scale matrix Λ0 + Σ_{j=1}^m (yj − ξ)(yj − ξ)′ + κ0(ξ − μ0)(ξ − μ0)′. Specifically, π̂(θ0|D) = (1/n) Σ_{i=1}^n π(θ0 | ξ(i), D). We note that for the GDr estimator, {θ(i) = (σ1(i), σ2(i), ρ(i))′, ξ(i) = μ(i), i = 1, …, n} is an MC sample from the posterior distribution with the kernel q(μ, Σ), Σ(i) = ((σ1(i))², ρ(i)σ1(i)σ2(i); ρ(i)σ1(i)σ2(i), (σ2(i))²), Σc(i) = Σ(i)/(m + ν0), and ξ*(i) = (Σc(i))^{−1/2}(μ(i) − μc) for i = 1, …, n.

Considering that the GDr estimator requires an extra MCMC sample, we set the size (n = 5,000) of each MCMC sample for the GDr estimator as half of the MCMC sample size (n = 10,000) for the IDR and Dr.IDR estimators in each replicate. Table 2 summarizes the results of the IDR, Dr.IDR, and GDr estimators based on 1,000 replicates when r = 0.001, 0.5, 1, 1.5. It is not surprising that the IDR estimator is sensitive to the specification of the “center” value, as the values of SE and RMSE in columns 6 and 7 are much larger than the corresponding values in columns 3 and 4. As expected, both the Dr.IDR and GDr estimators outperform the IDR estimator. The Dr.IDR estimator has the best performance when the posterior mean is used in the standardization since in this case it enjoys both the dimension reduction and the closed form of the conditional marginal likelihood, which has to be estimated in the GDr estimator. The GDr estimator yields results comparable to the Dr.IDR estimator. It is interesting to see that unlike the IDR estimator, both the Dr.IDR and GDr estimators are quite robust to the choices of the “center” values of μc for Dr.IDR and θ0 for GDr. Certainly, better choices of μc and θ0 do yield slightly smaller values of SE and RMSE, as expected.

Table 2:

Simulation Results of the IDR, Dr.IDR, and GDr estimators for a bivariate normal example with 1000 replicates: mean, standard error (SE), and root mean squared error (RMSE) of the marginal likelihood estimate in the log scale

Mean SE RMSE Mean SE RMSE
IDR with a “center” value of (μ1, μ2, log σ1², log{(1+ρ)/(1−ρ)}, log σ2²)′

(−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)’ (0.0437, 0.1101, 0.1093, 1.9891, 0.0615)’

r=1.5 −509.124 0.141 1.852 −511.609 0.310 4.342
r=1.0 −508.910 0.043 1.633 −511.118 0.166 3.844
r=0.5 −508.774 0.052 1.497 −510.597 0.250 3.329
r=0.001 −508.711 0.137 1.440 −511.592 0.373 4.331

Dr.IDR with

Σc^{−1/2}(μ − μc)    Σc^{−1/2}(μ − μc − (0.0709, 0.0709)′)

r=1.5 −507.278 0.007 0.007 −507.279 0.019 0.019
r=1.0 −507.278 0.004 0.004 −507.278 0.016 0.016
r=0.5 −507.278 0.002 0.002 −507.278 0.015 0.015
r=0.001 −507.278 0.000 0.000 −507.276 0.030 0.030

GDr with (σ10, σ20, ρ0)′

(1, 1, 0.7)’ (0.5,0.5,0.0)’

r=1.5 −507.278 0.010 0.010 −507.279 0.019 0.019
r=1.0 −507.277 0.006 0.006 −507.279 0.017 0.017
r=0.5 −507.277 0.003 0.003 −507.279 0.016 0.016
r=0.001 −507.278 0.001 0.001 −507.278 0.016 0.016

Remark 5.1: From Table 1 in Simulation Study I, we see that the IDR estimator has the smallest SE and RMSE when r = 0.01 for p =1, 2, r = 1 for p = 3, and r = 1.5 for p = 5. The results shown in Table 2 in Simulation Study II indicate that a smaller but not too small value of r generally yields a smaller SE or RMSE for IDR, Dr.IDR, as well as GDr. In practice, as suggested in [13], we first calculate the IDR, Dr.IDR, and GDr estimates for a grid of different values of r, then use the overlapping batch statistics approach of [17] to compute the Monte Carlo errors of these estimates, and finally choose an “optimal” r* as the r for which the Monte Carlo errors are minimum.
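A sketch of this tuning recipe is given below; obm_se is a generic overlapping batch means standard error for the mean of a sequence, and ratio_fn(draws, r), which returns the per-draw ratios qr/q for a given radius, is a placeholder for whichever of the three estimators is being tuned.

```python
# A sketch of the radius selection in Remark 5.1: evaluate a grid of r values and
# pick the one whose overlapping batch means (OBM) Monte Carlo error is smallest.
import numpy as np

def obm_se(x, batch_len=None):
    """Overlapping batch means standard error of the sample mean of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    b = batch_len or max(1, int(np.sqrt(n)))
    csum = np.concatenate(([0.0], np.cumsum(x)))
    batch_means = (csum[b:] - csum[:-b]) / b          # all n - b + 1 overlapping batches
    var = n * b / ((n - b) * (n - b + 1)) * np.sum((batch_means - x.mean()) ** 2)
    return np.sqrt(var / n)

def choose_radius(ratio_fn, draws, r_grid):
    """Return the r in r_grid minimizing the OBM error of the averaged ratio terms."""
    errors = [obm_se(ratio_fn(draws, r)) for r in r_grid]
    return r_grid[int(np.argmin(errors))]
```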

Remark 5.2: In this simulation study, the two blocks θ = (μ1, μ2)′ and ξ = (σ1, σ2, ρ)′ yielded an efficient Dr.IDR estimator, as shown in Table 2. Another natural combination of these parameters would be θ = (σ1, σ2, ρ)′ and ξ = (μ1, μ2)′. Under this combination, the dimension of the reduced parameter space corresponding to θ is 3, while an additional transformation (log σ1, log σ2, log{(ρ + 1)/(1 − ρ)})′ is also needed. After this transformation, the full conditional distribution of θ given ξ is not available analytically. Furthermore, the closed-form expression of θ0(ξ) in (11) for the perturbed density is also no longer available. Thus, this combination of the parameters leads to a less efficient and more computationally intensive implementation of Dr.IDR, and an appropriate combination of the parameters should be carefully selected in order to implement Dr.IDR efficiently.

6. A Case Study: Inequality-Constrained Analysis of Variance

In this section, we consider the inequality-constrained analysis of variance model to demonstrate how to compute the normalizing constant using the GDr estimator. We use the data in Hoijtink et al. [18] for studying amnesia in patients with dissociative identity disorder (DID). We refer to this data set as the DID data hereafter. We model the memory performance score ykj for the kth subject in group j as an independent observation from a normal distribution with mean μj and variance σ² for k = 1, 2, …, mj and j = 1, …, J. In the DID data, J = 4, m1 = 19, m2 = 25, m3 = 25, and m4 = 25. The sample means of the memory performance scores are 3.105, 13.28, 1.88, and 4.56, respectively, for these four groups. Let D = {m, y}, where m = Σ_{j=1}^J mj and y = (y11, …, y_{m1 1}, …, y1J, …, y_{mJ J}), denote the observed data. Then, the likelihood function is given by

L(μ, σ²|D) = (2πσ²)^{−m/2} exp{ −(1/(2σ²)) Σ_{j=1}^J Σ_{k=1}^{mj} (ykj − μj)² },

where μ = (μ1, …, μJ)′. Let Ω denote the parameter space. Following Chen and Kim [19] and Wang et al. [15], we specify the joint prior of (μ, σ2) as

π(μ, σ²|a0) ∝ π*(μ, σ²|a0) = [ (2πσ²)^{−J/2} Π_{j=1}^J (a0 mj)^{1/2} exp{ −(a0 mj/(2σ²)) μj² } ] × [ b02^{b01}/Γ(b01) ] (σ²)^{−b01−1} exp(−b02/σ²),   (22)

for (μ, σ2) ∈ Ω where a0 > 0 is a scalar parameter and b01 > 0 and b02 > 0 are prespecified hyperparameters.

Following Chen and Kim [19] and Wang et al. [15], we specify b01 = b02 = 0.0001 and a0 = 0.01. We also consider the constrained parameter space: Ω = {μ2 > (μ1, μ4) > μ3, σ2 > 0}. Under this setting, the posterior kernel is given by

q(μ, σ²) = L(μ, σ²|D) π*(μ, σ²|a0),   (23)

where π*(μ, σ2|a0) is defined in (22). We are interested in computing

c = ∫_Ω q(μ, σ²) dμ dσ².   (24)

Under the constrained parameter space, c in (24) is a normalizing constant but it is not the marginal likelihood since ∫_Ω π*(μ, σ²|a0) dμ dσ² ≠ 1. Also, the closed form expression of c in (24) is not available. However, for the unconstrained parameter space ΩU = {(μ, σ²) : −∞ < μj < ∞, j = 1, …, 4, σ² > 0}, we have

cU = ∫_{ΩU} q(μ, σ²) dμ dσ²
   = (2π)^{−m/2} [ b02^{b01} Γ(b01 + m/2)/Γ(b01) ] [ a0/(1 + a0) ]^{J/2}
     × [ b02 + (1/2) Σ_{j=1}^J { Σ_{k=1}^{mj} (ykj − ȳj)² + (a0 mj/(1 + a0)) ȳj² } ]^{−(b01 + m/2)},   (25)

where ȳj = (1/mj) Σ_{k=1}^{mj} ykj for j = 1, …, J. Since ∫_{ΩU} π*(μ, σ²|a0) dμ dσ² = 1, cU is the marginal likelihood. Let 1{A} denote the indicator function such that 1{A} = 1 if A is true and 0 if A is false. Observe that

c = ∫_{ΩU} q(μ, σ²) 1{(μ, σ²) ∈ Ω} dμ dσ² = cU ∫_{ΩU} 1{(μ, σ²) ∈ Ω} [q(μ, σ²)/cU] dμ dσ² = cU EU[1{(μ, σ²) ∈ Ω} | D],   (26)

where the posterior expectation is taken with respect to the posterior distribution q(μ, σ²)/cU. From (26), we see that the posterior probability PU((μ, σ²) ∈ Ω | D) = EU[1{(μ, σ²) ∈ Ω} | D] = c/cU.
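The identity in (26) is straightforward to use once an MCMC sample from the unconstrained posterior is available. The sketch below (with mu_draws a hypothetical (n, 4) array of draws of μ) simply multiplies cU by the sampled proportion falling in Ω.

```python
# A minimal sketch of (26): estimate c by multiplying the closed-form c_U by the
# Monte Carlo estimate of P_U((mu, sigma^2) in Omega | D), i.e. the fraction of
# unconstrained posterior draws satisfying mu_2 > (mu_1, mu_4) > mu_3.
import numpy as np

def log_c_constrained(log_cU, mu_draws):
    mu1, mu2, mu3, mu4 = mu_draws.T
    in_omega = (mu2 > np.maximum(mu1, mu4)) & (np.minimum(mu1, mu4) > mu3)
    prob = in_omega.mean()            # estimate of E_U[1{(mu, sigma^2) in Omega} | D]
    return log_cU + np.log(prob), prob
```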

Next, we present the detailed formulation of the GDr estimator of c and then use the identity given in (26) to empirically validate the accuracy of the GDr estimator, since in this case the “true” value of c is unknown. We set θ = (μ1, μ4, σ²)′ and ξ = (μ2, μ3)′. Let ζ = (θ′, ξ′)′, and let θ0 = (μ10, μ40, σ0²)′ denote the posterior mean of θ under the posterior distribution π(ζ|D) with the constrained parameter space. Using (19), we have

c = c(μ10, μ40, σ0²) / π(μ10, μ40, σ0² | D),   (27)

where c(μ10, μ40, σ0²) = ∫_Ω q(μ10, μ2, μ3, μ40, σ0²) dμ2 dμ3. After some lengthy algebra, we obtain an analytic expression of c(μ10, μ40, σ0²) as follows:

c(μ10, μ40, σ0²) = (2π)^{−(m+J−2)/2} [ Π_{j=1}^J (a0 mj)^{1/2} ] [ b02^{b01}/Γ(b01) ] (σ0²)^{−((m+J−2)/2 + b01 + 1)} × [ (a0 + 1)² m2 m3 ]^{−1/2}
    × exp( −(1/σ0²) [ b02 + (1/2) Σ_{j=1}^J { Σ_{k=1}^{mj} (ykj − ȳj)² + (a0/(1 + a0)) mj ȳj² } ] )
    × exp[ −((a0 + 1)/(2σ0²)) { m1 (μ10 − ȳ1/(a0 + 1))² + m4 (μ40 − ȳ4/(a0 + 1))² } ]
    × ( 1 − Φ[ {(a0 + 1) m2/σ0²}^{1/2} (μ10 ∨ μ40 − ȳ2/(a0 + 1)) ] )
    × Φ[ {(a0 + 1) m3/σ0²}^{1/2} (μ10 ∧ μ40 − ȳ3/(a0 + 1)) ],

where μ10 ∨ μ40 = max{μ10, μ40}, μ10 ∧ μ40 = min{μ10, μ40}, and Φ(·) is the standard normal N(0, 1) cumulative distribution function. For the term in the denominator of (27), we write

π(μ10, μ40, σ0² | D) = π(μ10, μ40 | D) π(σ0² | μ10, μ40, D).

We generate one MCMC sample {(μ2,1(i), μ3,1(i), σ1²(i)), i = 1, …, n1} from the posterior π(ζ|D) with the constrained parameter space and another MCMC sample {(μ2,2(i), μ3,2(i)), i = 1, …, n2} from the conditional posterior distribution π(μ2, μ3 | μ10, μ40, D). Then, the CMDE of π(μ10, μ40 | D) is given by

π̂(μ10, μ40 | D) = (1/n1) Σ_{i=1}^{n1} π(μ10 | μ2,1(i), μ3,1(i), σ1²(i), D) π(μ40 | μ2,1(i), μ3,1(i), σ1²(i), D),

where

π(μ10 | μ2, μ3, σ², D) = {(a0 + 1) m1/(2πσ²)}^{1/2} exp{ −[(a0 + 1) m1/(2σ²)] (μ10 − ȳ1/(a0 + 1))² } / ( Φ[ {(a0 + 1) m1/σ²}^{1/2} (μ2 − ȳ1/(a0 + 1)) ] − Φ[ {(a0 + 1) m1/σ²}^{1/2} (μ3 − ȳ1/(a0 + 1)) ] ),

π(μ40 | μ2, μ3, σ², D) = {(a0 + 1) m4/(2πσ²)}^{1/2} exp{ −[(a0 + 1) m4/(2σ²)] (μ40 − ȳ4/(a0 + 1))² } / ( Φ[ {(a0 + 1) m4/σ²}^{1/2} (μ2 − ȳ4/(a0 + 1)) ] − Φ[ {(a0 + 1) m4/σ²}^{1/2} (μ3 − ȳ4/(a0 + 1)) ] ).

The CMDE of π(σ0² | μ10, μ40, D) is also available, which is given by

π̂(σ0² | μ10, μ40, D) = (1/n2) Σ_{i=1}^{n2} π(σ0² | μ10, μ2,2(i), μ3,2(i), μ40, D),

where

π(σ0² | μ1, μ2, μ3, μ4, D) = [ b02 + (1/2) Σ_{j=1}^J { Sj² + (a0/(a0 + 1)) mj ȳj² + (a0 + 1) mj (μj − ȳj/(a0 + 1))² } ]^{(m+J)/2 + b01} / { (σ0²)^{(m+J)/2 + b01 + 1} Γ((m + J)/2 + b01) }
    × exp( −(1/σ0²) [ b02 + (1/2) Σ_{j=1}^J { Sj² + (a0/(a0 + 1)) mj ȳj² + (a0 + 1) mj (μj − ȳj/(a0 + 1))² } ] ),

and Sj² = Σ_{k=1}^{mj} (ykj − ȳj)².

For the DID data, we first generate an MCMC sample of size 50,000 from each of the posterior distributions with and without constraints to compute the posterior means, the posterior standard deviations (SDs), and the 95% highest posterior density (HPD) intervals of μ and σ². The results are shown in Table 3. We see from this table that both sets of the posterior estimates are very close to each other, implying that the hypothesis of μ2 > (μ1, μ4) > μ3 is likely true. Notice that these four means are from the four groups consisting of DID, mimic normal, symptom simulated, and true amnesic subjects. Next, we compute log cU from the closed-form expression given in (25) and use the dropping-needle method discussed in Hoijtink et al. [18] to calculate EU[1{(μ, σ²) ∈ Ω} | D] using an MCMC sample {(μ(i), σ²(i)), i = 1, …, n} from the posterior distribution without constraints on μ. We further generate two MCMC samples to estimate log c. Using (27), we have log ĉ = log c(μ10, μ40, σ0²) − log π̂(μ10, μ40 | D) − log π̂(σ0² | μ10, μ40, D). The results of these Monte Carlo estimates are reported in Table 3. We see from Table 3 that the estimate ÊU[1{(μ, σ²) ∈ Ω} | D] based on the dropping-needle method is very close to ĉ/cU based on the GDr estimate for various MC sample sizes, and ÊU[1{(μ, σ²) ∈ Ω} | D] ≈ 0.988. The high posterior probability of {ζ ∈ Ω} implies that the constraints on μ are most likely to hold and also provides a further explanation of why the two sets of the posterior estimates of μ and σ² in Table 3 are very similar. We note that since the GDr estimate requires two MCMC samples, we choose n1 = n2 = n/2 to make a fair comparison between the two methods. These empirical results indicate that the “true” value of log(c) is very likely to be −205.645.

Table 3:

Posterior Estimates of μ and σ2 and MC Estimates of the Posterior Probability and the Normalizing Constant for the DID Data

Parameter Unconstrained Model Constrained Model

Posterior Mean SD 95% HPD Interval Posterior Mean SD 95% HPD Interval
μ1 3.073 0.405 (2.271, 3.856) 3.085 0.393 (2.326, 3.865)
μ2 13.148 0.351 (12.465, 13.847) 13.149 0.354 (12.467, 13.863)
μ3 1.862 0.351 (1.170, 2.545) 1.856 0.349 (1.182, 2.550)
μ4 4.516 0.354 (3.836, 5.229) 4.516 0.351 (3.825, 5.197)
σ2 3.147 0.469 (2.282, 4.083) 3.145 0.471 (2.283, 4.085)
Monte Carlo Estimates of EU [1{(μ, σ2) ∈ Ω}|D] and c

log cU    n    ÊU[1{(μ, σ²) ∈ Ω} | D]    n1 = n2    log ĉ    ĉ/cU

−205.63261 50,000 0.98810 25,000 −205.64522 0.98747
200,000 0.98781 100,000 −205.64520 0.98748
500,000 0.98776 250,000 −205.64470 0.98799

Finally, we note that instead of setting θ = (μ1, μ4, σ²)′ and ξ = (μ2, μ3)′, we can also take θ = (μ2, μ3, σ²)′ and ξ = (μ1, μ4)′. Let θ0 = (μ20, μ30, σ0²)′ denote the posterior mean of θ under the posterior distribution π(ζ|D) with the constrained parameter space. Then we have

c = c(μ20, μ30, σ0²) / π(μ20, μ30, σ0² | D),   (28)

where c(μ20, μ30, σ0²) = ∫_Ω q(μ1, μ20, μ30, μ4, σ0²) dμ1 dμ4. Similar to (27), a closed form expression of c(μ20, μ30, σ0²) is available. For the term in the denominator of (28), we write

π(μ20, μ30, σ0² | D) = π(μ20, μ30 | D) π(σ0² | μ20, μ30, D).

Again, the CMDEs are available for estimating π(μ20, μ30 | D) and π(σ0² | μ20, μ30, D). This formulation is as efficient as the one given in (27).

7. Discussion

In this paper, we first examine properties of the IDR estimator and find that the IDR estimator is most efficient in two dimensions when the distribution is approximately normal. We then develop an extension of the IDR estimator, called the Dr.IDR estimator. The Dr.IDR estimator is more attractive than the IDR estimator since it allows for dimension reduction and also has a nice connection to marginal posterior density estimation. The GDr estimator is constructed using the identity based on the marginal posterior density. This new identity, which is given in (19), is a natural extension of the identity of Chib [3] based on the full posterior density. Both the Dr.IDR and GDr estimators are potentially useful in computing marginal likelihoods or normalizing constants for models with high-dimensional parameters or complex structure, such as constraints on the model parameters, the autocorrelation structure in a time-series model, or the spatial structure in a spatio-temporal model. Computing the normalizing constant in (24) for the DID data is quite interesting. We use two different approaches for computing this constant and the results are quite comparable. There are other methods, such as the stepping stone approaches of Xie et al. [9] and Fan et al. [10] and the PWK method of Wang et al. [11], that can be used to compute this normalizing constant. These additional comparisons and a further investigation of the applicability of the Dr.IDR and GDr estimators in high-dimensional problems are interesting future research projects, which are currently under investigation.

Acknowledgments

The authors gratefully thank the Editor-in-Chief, the Editor, the Associate Editor, and the three anonymous reviewers for their constructive comments and suggestions that helped improve the article. This material is based upon work partially supported by the National Science Foundation under Grant No. DEB-1354146. Dr. M.-H. Chen’s research was also partially supported by NIH grants #GM70335 and #P01CA142538.

Footnotes

Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record that is published in the journal is kept up to date and so may therefore differ from this version.

Contributor Information

Yu-Bo Wang, School of Mathematical and Statistical Sciences, Clemson University.

Ming-Hui Chen, Department of Statistics, University of Connecticut.

Wei Shi, Department of Statistics, University of Connecticut.

Paul Lewis, Department of Ecology and Evolutionary Biology, University of Connecticut.

Lynn Kuo, Department of Statistics, University of Connecticut.

References

  • [1].Kahn H, Random sampling Monte Carlo techniques in neutron attenuation problems, Nucleonics 6 (1950) 27–37. [PubMed] [Google Scholar]
  • [2].Newton MA, Raftery AE, Approximate Bayesian inference by the weighted likelihood bootstrap, Journal of the Royal Statistical Society, Series B 56 (1994) 3–48. [Google Scholar]
  • [3].Chib S, Marginal likelihood from the Gibbs output, Journal of the American Statistical Association 90 (1995) 1313–1321. [Google Scholar]
  • [4].Chib S, Jeliazkov I, Marginal likelihood from the Metropolis-Hastings output, Journal of the American Statistical Association 96 (2001) 270–281. [Google Scholar]
  • [5].Gelman A, Meng X-L, Simulating normalizing constants: From importance sampling to bridge sampling to path sampling, Statistical Science 13 (1998) 163–185. [Google Scholar]
  • [6].Petris G, Tardella L, A geometric approach to transdimensional Markov chain Monte Carlo, The Canadian Journal of Statistics 31(4) (2003) 469–482. [Google Scholar]
  • [7].Chen M-H, Computing marginal likelihoods from a single MCMC output, Statistica Neerlandica 59 (2005) 16–29. [Google Scholar]
  • [8].Lartillot N, Philippe H, Computing Bayes factors using thermodynamic integration, Systematic Biology 55 (2006) 195–207. [DOI] [PubMed] [Google Scholar]
  • [9].Xie W, Lewis PO, Fan Y, Kuo L, Chen M-H, Improving marginal likelihood estimation for Bayesian phylogenetic model selection, Systematic Biology 60 (2011) 150–160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Fan Y, Wu R, Chen M-H, Kuo L, Lewis PO, Choosing among partition models in Bayesian phylogenetics, Molecular Biology and Evolution 28 (2011) 523–532. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Wang Y-B, Chen M-H, Kuo L, Lewis PO, A new Monte Carlo method for estimating marginal likelihoods, Bayesian Analysis 13 (2018) 311–333. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Petris G, Tardella L, New perspectives for estimating normalizing constants via posterior simulation, Technical Report, Università di Roma La Sapienza (2007). [Google Scholar]
  • [13].Arima S, Tardella L, IDR for marginal likelihood in Bayesian phylogenetics, in: Chen M-H, Kuo L, Lewis P (Eds.), Bayesian Phylogenetics: Methods, Algorithms, and Applications, CRC Press, 2014, pp. 25–57. [Google Scholar]
  • [14].Chen M-H, Importance-weighted marginal Bayesian posterior density estimation, Journal of the American Statistical Association 89 (1994) 818–824. [Google Scholar]
  • [15].Wang Y-B, Chen M-H, Kuo L, Lewis PO, Partition weighted approach for estimating the marginal posterior density with applications, Journal of Computational and Graphical Statistics, doi: 10.1080/10618600.2018.1529600 (2019) In press. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Gelfand AE, Smith AF, Lee T-M, Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling, Journal of the American Statistical Association 87 (1992) 523–532. [Google Scholar]
  • [17].Schmeiser BW, Avramidis TN, Hashem S, Overlapping batch statistics, in: Proceedings of the 22nd Conference on Winter Simulation, IEEE Press, pp. 395–398. [Google Scholar]
  • [18].Hoijtink H, Klugkist I, Boelen P, Bayesian Evaluation of Informative Hypotheses, Springer Science & Business Media, 2008. [Google Scholar]
  • [19].Chen M-H, Kim S, The Bayes factor versus other model selection criteria for the selection of constrained models, in: Hoijtink H, Klugkist I, Boelen P (Eds.), Bayesian Evaluation of Informative Hypotheses, Springer, 2008, pp. 155–180. [Google Scholar]
