Author manuscript; available in PMC 2021 Mar 1.
Published in final edited form as: J Korean Stat Soc. 2020 Jan 1;49(1):244–263. doi: 10.1007/s42952-019-00013-z

Inflated Density Ratio and Its Variation and Generalization for Computing Marginal Likelihoods

Yu-Bo Wang 1, Ming-Hui Chen 2,*, Wei Shi 3, Paul Lewis 4, Lynn Kuo 5
PMCID: PMC7560979  NIHMSID: NIHMS1547654  PMID: 33071541

Abstract

In the Bayesian framework, the marginal likelihood plays an important role in variable selection and model comparison. The marginal likelihood is the marginal density of the data after integrating out the parameters over the parameter space. However, this quantity is often analytically intractable due to the complexity of the model. In this paper, we first examine the properties of the inflated density ratio (IDR) method, which is a Monte Carlo method for computing the marginal likelihood using a single MC or Markov chain Monte Carlo (MCMC) sample. We then develop a variation of the IDR estimator, called the dimension reduced inflated density ratio (Dr.IDR) estimator. We further propose a more general identity and then obtain a general dimension reduced (GDr) estimator. Simulation studies are conducted to examine empirical performance of the IDR estimator as well as the Dr.IDR and GDr estimators. We further demonstrate the usefulness of the GDr estimator for computing the normalizing constants in a case study on the inequality-constrained analysis of variance.

Keywords: CMDE, Conditional posterior density, Constrained parameter space, IWMDE, Marginal posterior density, 62F15, 62-M05

1. Introduction

In Bayesian analysis, different models can be compared by using their marginal likelihoods (also known as normalizing constants). Defined as the integral of the likelihood function times the prior distribution over the parameter space, the marginal likelihood measures the average fit of the model to the data and is useful in Bayesian hypothesis testing. The statistical theory of hypothesis testing is concerned with methods for checking whether an improvement in fit is statistically significant, and the Bayes factor (BF), defined as the ratio of the marginal likelihoods under two competing models, is one such method. Due to the complicated form of the product of the likelihood function and the prior, which is referred to as the posterior kernel hereafter, the marginal likelihood is often analytically intractable. Hence, computing the marginal likelihood or the normalizing constant is of great current interest and remains an active research area.

There is a vast literature on computing marginal likelihoods or normalizing constants, including but not limited to importance sampling (IS) [1], the harmonic mean (HM) [2], the posterior density based approaches of Chib [3] and Chib and Jeliazkov [4], path sampling (PS) [5], the inflated density ratio (IDR) [6], the latent variable approach of Chen [7], thermodynamic integration (TI) [8], the stepping stone method (SSM) [9, 10], and the partition weighted kernel (PWK) estimator [11]. These methods can be divided into two categories: methods such as IS, HM, IDR, Chen [7], and PWK use only a single MCMC sample, while other methods such as Chib [3], TI, and SSM use multiple MCMC samples. These methods can be shown to asymptotically converge to the quantity of interest under some mild conditions. Varying in their use of information, such as Markov chain Monte Carlo (MCMC) samples or the posterior kernel, they all have pros and cons in different applications. Among them, the IDR method works by introducing a perturbed density that adds an analytically calculable mass around the posterior mode and has lighter tails than the posterior kernel, and it can estimate the marginal likelihood precisely based on an MCMC sample from the posterior distribution. With such a control of the posterior tails, Petris and Tardella [12] also show that IDR has a finite variance in one-dimensional problems. However, the properties of the IDR estimator have not been fully examined and its usefulness in statistical applications has not been fully explored.

In this paper, we first reveal an interesting property of the IDR estimator, namely, its superior efficiency in two-dimensional problems. This property is further confirmed in our simulation study. To broaden the applicability of IDR, we develop a variation of IDR, namely, the dimension reduced inflated density ratio (Dr.IDR) estimator. This new estimator not only enjoys the property of dimension reduction but also has a nice connection to marginal posterior density estimation. We further extend the idea of Dr.IDR to develop the generalized dimension reduced (GDr) estimator, which is constructed via a new identity based on the marginal posterior density. An interesting case study is carried out to demonstrate how to compute the normalizing constant for the inequality-constrained analysis of variance model. The new estimators developed in this paper have great potential for computing marginal likelihoods or normalizing constants for high-dimensional problems and complex models such as those with an inequality-constrained parameter space.

The rest of the article is organized as follows. Section 2 presents the formulation of the IDR estimator and examines its theoretical properties. In Section 3, we develop a variation of IDR, called the dimension reduced IDR (Dr.IDR) and discuss various properties of Dr.IDR. A new identity based on the marginal posterior density is introduced and a so-called generalized dimension reduced (GDr) estimator is then constructed in Section 4. Two simulation studies to examine empirical performance of the IDR, Dr.IDR, and GDr estimators are presented in Section 5. Section 6 is a case study, which demonstrates how to compute the normalizing constant for the inequality-constrained analysis of variance model. We conclude this paper with a brief discussion in Section 7.

2. The Inflated Density Ratio (IDR) Estimator

Suppose ζ is a p-dimensional vector of parameters with the unbounded support Ω. Given that L(ζ|D) is the likelihood function of parameters ζ given the data D and π(ζ) is a proper prior distribution in the sense that ∫_Ω π(ζ) dζ = 1, the marginal likelihood is given by

c = ∫_Ω L(ζ|D) π(ζ) dζ = ∫_Ω q(ζ) dζ,   (1)

where q(ζ) = L(ζ|D)π(ζ) is the posterior kernel function. The corresponding posterior distribution is written as

π(ζ|D) = L(ζ|D) π(ζ)/c = q(ζ)/c.   (2)

We note that when ∫_Ω π(ζ) dζ ≠ 1, or when ∫_Ω π(ζ) dζ = ∞ but c < ∞, c is called the normalizing constant. When the model structure is complex, the constant c is often analytically intractable. However, a Markov chain Monte Carlo (MCMC) sample can still be generated from the posterior distribution given in (2) without knowing c.

To estimate (1), Petris and Tardella [6, 12] proposed the inflated density ratio (IDR) method, which requires a single Monte Carlo (MC) or MCMC sample from (2). This method first constructs a perturbed density qr(ζ) based on the posterior kernel q(ζ). Let ζ0 be the “center” of the posterior distribution, which is typically the posterior mode of ζ. Define qr(.) as

qr(ζ) = { q(ζ0)                  if ‖ζ − ζ0‖ ≤ r,
          q(ζ0 + h(ζ − ζ0))      if ‖ζ − ζ0‖ > r,   (3)

where r is a prespecified radius and h(ζ − ζ0) = (1 − r^p/‖ζ − ζ0‖^p)^{1/p} (ζ − ζ0). It can be shown that the integral of qr(·) over the whole support is the sum of the target c and a tractable mass k = q(ζ0) br, where br is the volume of the ball {ζ : ‖ζ − ζ0‖ ≤ r}, namely br = π^{p/2} r^p / Γ(p/2 + 1). Thus, we have the IDR identity given by

c = q(ζ0) br / { E[qr(ζ)/q(ζ) | D] − 1 },   (4)

where E[qr(ζ)/q(ζ) | D] is the expectation of qr(ζ)/q(ζ) with respect to the posterior distribution in (2). Based on the IDR identity in (4), the IDR estimator for c is given by

ĉIDR = k / { (1/n) Σ_{i=1}^n qr(ζ(i))/q(ζ(i)) − 1 },   (5)

where {ζ(1), …, ζ(n)} is an MC/MCMC sample from the posterior distribution in (2). Figure 1 provides the visualization of qr(.) in a univariate normal example. The area under the curve qr(.) is k + c = 2rq(0) + 1 since ζ0 = 0 in this case.

Figure 1: An illustration of the perturbed density in the IDR method.
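The construction in (3)–(5) can be sketched in a few lines of code. The following is a minimal illustration and not the authors' implementation; the kernel log_q, the center ζ0, and the radius r are supplied by the user, and the toy example checks the estimator on a univariate standard normal kernel whose true normalizing constant is √(2π).

```python
# A minimal sketch of the IDR estimator in (5); illustrative only, with a toy
# unnormalized kernel q(zeta) = exp(-zeta'zeta/2) whose true c is (2*pi)^{p/2}.
import numpy as np
from scipy.special import gamma

def idr_estimate(draws, log_q, zeta0, r):
    """IDR estimate of c from an (n, p) array of posterior draws."""
    draws = np.atleast_2d(draws)
    n, p = draws.shape
    u = draws - zeta0                           # center the draws at zeta0
    dist = np.linalg.norm(u, axis=1)
    scale = np.zeros_like(dist)                 # points inside the ball map to zeta0
    out = dist > r
    scale[out] = (1.0 - (r / dist[out]) ** p) ** (1.0 / p)   # deflation map h
    perturbed = zeta0 + scale[:, None] * u
    ratio = np.exp(log_q(perturbed) - log_q(draws))          # q_r / q
    k = np.exp(log_q(zeta0.reshape(1, -1))[0]) * np.pi ** (p / 2) * r ** p / gamma(p / 2 + 1)
    return k / (ratio.mean() - 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    log_q = lambda z: -0.5 * np.sum(np.atleast_2d(z) ** 2, axis=1)
    draws = rng.standard_normal((100_000, 1))
    print(idr_estimate(draws, log_q, np.zeros(1), r=0.5))    # close to sqrt(2*pi) ~ 2.5066
```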

Petris and Tardella [12] showed that under certain conditions, such as the log-Lipschitz condition, the variance of ĉIDR is finite in one-dimensional problems. If {ζ(1), …, ζ(n)} is an MCMC sample from (2), then under certain mild regularity conditions, such as time reversibility, invariance, and irreducibility, it is easy to show that ĉIDR is a consistent estimator of c since the denominator of (5) converges to

lim_{n→∞} (1/n) Σ_{i=1}^n qr(ζ(i))/q(ζ(i)) − 1 = ∫_Ω [qr(ζ)/q(ζ)] [q(ζ)/c] dζ − 1 = ∫_Ω [qr(ζ)/c] dζ − 1 = (c + k)/c − 1 = k/c,

as n → ∞. As pointed out by Petris and Tardella [12], Arima and Tardella [13], and Wang et al. [11], the IDR method requires the full space of q(·) and a careful selection of the radius. Wang et al. [11] further pointed out that mode finding is essential and that standardization of an MCMC sample with respect to the mode and the sample covariance matrix is required for IDR. Thus, the best way to use IDR is to obtain the perturbed density after the standardization, as demonstrated in Section 5.2. If any parameter in the model has a bounded support, an additional transformation is needed to define the perturbed density qr(·). This extra effort may limit its applications, especially for a problem with complicated constraints (see Section 6 for an example). During our empirical investigation, we found that the IDR has an interesting property in the bivariate standard normal case.

Consider the posterior kernel function as

q(ζ) = (1/(2π)) exp(−μ′μ/2),   (6)

where ζ = μ = (μ1, μ2)′ is a vector of unknown parameters with center at ζ0 = (0, 0). Under such a setting, the perturbed density is given by

qr(ζ) = { 1/(2π)                          if ‖μ‖ ≤ r,
          (1/(2π)) exp{−(μ′μ − r²)/2}     if ‖μ‖ > r,

so that

qr(ζ)/q(ζ) = { exp(μ′μ/2)   if ‖μ‖ ≤ r,
               exp(r²/2)     if ‖μ‖ > r.   (7)

We see from (7) that the ratio term within the radius is always bounded by exp(r²/2), while the ratio term outside the radius is always the constant exp(r²/2), which implies that the variance or any higher-order moments of the IDR estimator are finite. We formally state this result in the following theorem.

Theorem 2.1. Given that {ζ(1), …, ζ(n)} is a random sample from the posterior density, any vth moment (v > 0) of the IDR estimator is finite for the bivariate normal distribution given in (6).

Proof. Letting g(ζ(i)) = qr(ζ(i))/q(ζ(i)), we have

g(ζ(i)) ≤ exp(r²/2) < ∞.

Thus,

E| (1/n) Σ_{i=1}^n g(ζ(i)) |^v ≤ exp(v r²/2) < ∞.

By the δ-method, the vth moment of the IDR estimator is finite. □

Following Theorem 2.1, we can see that the variance of IDR is finite. This result also implies that the IDR method is potentially efficient in two-dimensional problems if the posterior distribution approximates a bivariate normal distribution. To examine the empirical property of IDR, we carry out a simulation study in Section 5.1.

3. Dimension Reduced Inflated Density Ratio (Dr.IDR) Estimator

In this section, we develop a variation of IDR called the dimension reduced inflated density ratio estimator (Dr.IDR). Suppose ζ′ = (θ′, ξ′), where θ is a p1-dimensional vector of the parameters and ξ is a vector of the remaining p2 = p − p1 parameters. We assume that the support of the marginal posterior density π(θ|D) = ∫ π(θ, ξ|D) dξ is R^{p1} and θ0 is a ‘center’, such as the mode, of the marginal posterior distribution of θ. Then, we define the perturbed density as

qr(θ, ξ) = { q(θ0, ξ)                    if ‖θ − θ0‖ ≤ r,
             q(θ0 + h(θ − θ0), ξ)        if ‖θ − θ0‖ > r,   (8)

where h(θ − θ0) = (1 − r^{p1}/‖θ − θ0‖^{p1})^{1/p1} (θ − θ0) and r > 0 is a prespecified radius. Similar to the IDR identity given in (4), we obtain a new identity given by

c = br ∫ q(θ0, ξ) dξ / { E[qr(θ, ξ)/q(θ, ξ) | D] − 1 },   (9)

where br = π^{p1/2} r^{p1} / Γ(p1/2 + 1) and E[qr(θ, ξ)/q(θ, ξ) | D] is the expectation of qr(θ, ξ)/q(θ, ξ) with respect to the posterior distribution given by (2).

Let {(θ(i), ξ(i)), i = 1, …, n} denote an MC/MCMC sample from the posterior distribution π(ζ|D). If ∫ q(θ0, ξ)dξ is analytically available, using (9), a Dr.IDR estimate of c is given by

ĉDr.IDR = br ∫ q(θ0, ξ) dξ / { (1/n) Σ_{i=1}^n qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1 }.   (10)

Under certain ergodic conditions, ĉDr.IDR converges to c almost surely as n → ∞.

Remark 3.1: In (9) and (10), the perturbed density qr(θ, ξ) in (8) is constructed on θ and the dimension of θ is p1, which is smaller than p for the original vector of the parameters, ζ. When p1 = p, (9) and (10) reduce to (4) and (5). For these reasons, c^Dr.IDR is called the dimension reduced IDR (Dr.IDR) estimator.
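As a concrete illustration of (10), the sketch below (ours, not the authors') applies the Dr.IDR construction to a toy two-dimensional kernel q(θ, ξ) = exp{−(θ² + ξ²)/2}, for which ∫ q(θ0 = 0, ξ) dξ = √(2π) is available in closed form and the true value of c is 2π.

```python
# A minimal sketch of the Dr.IDR estimator in (10) with theta one-dimensional
# (p1 = 1) and the closed-form value of int q(theta0, xi) d xi supplied by the user.
import math
import numpy as np

def dr_idr_estimate(theta, xi, log_q, theta0, int_q_theta0, r):
    theta = np.asarray(theta, dtype=float).reshape(len(theta), -1)
    p1 = theta.shape[1]
    u = theta - theta0
    dist = np.linalg.norm(u, axis=1)
    scale = np.zeros_like(dist)                    # inside the ball: map theta to theta0
    out = dist > r
    scale[out] = (1.0 - (r / dist[out]) ** p1) ** (1.0 / p1)
    theta_pert = theta0 + scale[:, None] * u
    ratio = np.exp(log_q(theta_pert, xi) - log_q(theta, xi))   # q_r / q
    b_r = np.pi ** (p1 / 2) * r ** p1 / math.gamma(p1 / 2 + 1)
    return b_r * int_q_theta0 / (ratio.mean() - 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    draws = rng.standard_normal((100_000, 2))                  # posterior draws of (theta, xi)
    log_q = lambda th, xi: -0.5 * (th[:, 0] ** 2 + xi ** 2)    # toy kernel, true c = 2*pi
    c_hat = dr_idr_estimate(draws[:, 0], draws[:, 1], log_q,
                            np.zeros(1), np.sqrt(2 * np.pi), r=0.5)
    print(c_hat, 2 * np.pi)
```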

Remark 3.2: The perturbed density in (8) can be extended as

qr(θ, ξ) = { q(θ0(ξ), ξ)                        if ‖θ − θ0(ξ)‖ ≤ r(ξ),
             q(θ0(ξ) + h(θ − θ0(ξ)), ξ)         if ‖θ − θ0(ξ)‖ > r(ξ),   (11)

where θ0(ξ) and r(ξ) are two functions of ξ. However, the Dr.IDR estimator in (10) needs to be modified as

ĉDr.IDR = ∫ q(θ0(ξ), ξ) dξ / { (1/n) Σ_{i=1}^n [1/br(ξ(i))] [qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1] },   (12)

where br(ξ) = π^{p1/2} r(ξ)^{p1} / Γ(p1/2 + 1). In (12), both the center and the radius of each perturbed circle can be determined by the conditional distribution π(θ|ξ, D) instead of the marginal posterior distribution π(θ|D), which may lead to a more efficient estimator.

Remark 3.3: In (10), we assume that ∫ q(θ0, ξ) dξ is analytically available. When c(θ0) = ∫ q(θ0, ξ) dξ is not available in closed form, c(θ0) is the normalizing constant of the conditional posterior distribution π(ξ|θ0, D). Since θ0 is a fixed point, we can generate an MC/MCMC sample {ξ(i), i = 1, …, n} from π(ξ|θ0, D). Then, we can use (5) or (10) to construct an estimator for c(θ0), and the resulting estimator of c may be termed a Dr.IDR2 estimator. But we notice that the version of ĉDr.IDR given in (12) is difficult to extend if ∫ q(θ0(ξ), ξ) dξ is analytically intractable.

Remark 3.4: From the Dr.IDR identity in (9), we obtain

∫ [q(θ0, ξ)/c] dξ = { E[qr(θ, ξ)/q(θ, ξ) | D] − 1 } / br.   (13)

Since the term on the left hand side is precisely the marginal posterior density of θ evaluated at θ0, we obtain a new marginal posterior density estimator given by

π̂Dr.IDR(θ0|D) = (1/br) [ (1/n) Σ_{i=1}^n qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1 ].   (14)

Write

w(θ|ξ) = [qr(θ, ξ) − q(θ, ξ)] / [br q(θ0, ξ)].   (15)

It is easy to show that ∫ w(θ|ξ)dθ = 1 and also (14) can be rewritten as

π̂Dr.IDR(θ0|D) = (1/n) Σ_{i=1}^n w(θ(i)|ξ(i)) q(θ0, ξ(i))/q(θ(i), ξ(i)).   (16)

Therefore, π̂Dr.IDR(θ0|D) is an importance weighted marginal density estimator (IWMDE) of [14]. However, the weight function w(θ|ξ) is not the conditional posterior density π(θ|ξ, D). Thus, the IDR marginal posterior density estimator π̂Dr.IDR(θ0|D) may not be the most efficient, as the “best” estimator is the conditional marginal density estimator (CMDE) with w(θ|ξ) = π(θ|ξ, D), as discussed in [14] and [15].

4. Generalization of the IDR Estimator

Observing that the marginal posterior density of θ is defined as

π(θ|D) = ∫ [q(θ, ξ)/c] dξ,

we have the following identity

c = ∫ q(θ, ξ) dξ / π(θ|D) = c(θ)/π(θ|D).   (17)

The above marginal posterior density based identity is a natural extension of the identity given in Chib [3], in which the marginal likelihood is expressed as the ratio of the likelihood function times the prior to the posterior density of the model parameters.

We let c(θ) denote the term in the numerator of (17), i.e.,

c(θ) = ∫ q(θ, ξ) dξ.   (18)

Then, c(θ) is precisely the normalizing constant of the conditional posterior density of ξ given θ. We note that the identity (17) holds for all θ. Similar to [3], we take θ = θ0 as a high marginal density point such as the posterior mean or the posterior mode of θ. Then, (17) can be rewritten as

c = c(θ0)/π(θ0|D).   (19)

In (19), the original p-dimensional estimation problem of c becomes the computation of c(θ0) and π(θ0|D), each with a smaller dimension (p2 and p1, respectively). If c(θ0) is analytically tractable, then we only need to compute π(θ0|D). If c(θ0) is intractable, as discussed in Remark 3.3, we first generate an MC/MCMC sample {ξ(i), i = 1, …, n} from the conditional posterior distribution π(ξ|θ0, D) and then obtain an IDR estimator or a Dr.IDR estimator of c(θ0). For instance, an IDR estimator of c(θ0) is given by

ĉ(θ0) = br q(θ0, ξ0) / { (1/n) Σ_{i=1}^n qr(θ0, ξ(i))/q(θ0, ξ(i)) − 1 },   (20)

where ξ0 is the posterior mode of ξ, br = π^{p2/2} r^{p2} / Γ(p2/2 + 1),

qr(θ0, ξ) = { q(θ0, ξ0)                     if ‖ξ − ξ0‖ ≤ r,
              q(θ0, ξ0 + h(ξ − ξ0))         if ‖ξ − ξ0‖ > r,

and h(ξ − ξ0) = (1 − r^{p2}/‖ξ − ξ0‖^{p2})^{1/p2} (ξ − ξ0), and r > 0 is a prespecified radius, which may be different from the one in (3) or (8).

For the marginal posterior density π(θ0|D), which is the term in the denominator of (19), a Dr.IDR estimator π̂Dr.IDR(θ0|D) given in (16) is readily available using an MC/MCMC sample {(θ(i), ξ(i)), i = 1, …, n} from the full posterior distribution π(ζ|D). As discussed in Remark 3.4, π̂Dr.IDR(θ0|D) may not be the most efficient. A more efficient estimator can be obtained using the CMDE of [16] or the partition weighted marginal density estimator (PWMDE) of [15]. We note that the PWMDE is a special case of the IWMDE of [14], which has the potential to achieve the “optimal” CMDE in estimating π(θ0|D). As empirically shown in [15], the PWMDE of π(θ0|D) is more efficient when θ0 is close to the “center” of the marginal posterior distribution π(θ|D). This is intuitively appealing since (i) the PWMDE uses the MC/MCMC sample {(θ(i), ξ(i)), i = 1, …, n} from the full posterior distribution π(ζ|D); (ii) fewer of the θ(i)’s are around θ0 when θ0 is away from the “center”; and (iii) in this case, π(θ0|D) is more difficult to estimate and, consequently, the corresponding PWMDE becomes less efficient. Therefore, the identity (19) works best when θ0 is chosen as a high marginal density point.

The new identity (19) provides great flexibility in computing the marginal likelihood c. This identity also sheds light on obtaining an efficient Monte Carlo estimator of c for a posterior distribution with a high-dimensional parameter space. An estimator based on (19) is termed a generalized dimension reduced (GDr) estimator. In Section 5.2, we carry out a simulation study to examine the empirical performance of the IDR, Dr.IDR, and GDr estimators.
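To make the identity (19) concrete, the following sketch (ours, not the authors') applies it to a toy bivariate normal posterior with kernel q(θ, ξ) = exp{−(θ² − 2ρθξ + ξ²)/(2(1 − ρ²))}, so that c = 2π√(1 − ρ²): c(θ0) is available in closed form and π(θ0|D) is estimated by the CMDE, which averages the conditional density π(θ0|ξ, D) over posterior draws of ξ.

```python
# A minimal sketch of the GDr identity (19), c = c(theta0) / pi(theta0 | D),
# on a toy correlated bivariate normal posterior.  All names are illustrative.
import numpy as np

rho, n = 0.7, 200_000
rng = np.random.default_rng(3)
cov = np.array([[1.0, rho], [rho, 1.0]])
draws = rng.multivariate_normal(np.zeros(2), cov, size=n)   # (theta, xi) posterior draws
xi = draws[:, 1]

theta0 = 0.0                                                # a high marginal density point
# closed-form conditional normalizing constant c(theta0) = int q(theta0, xi) d xi
c_theta0 = np.sqrt(2 * np.pi * (1 - rho ** 2)) * np.exp(-theta0 ** 2 / 2)
# CMDE of the marginal posterior density: average of N(theta0; rho*xi, 1 - rho^2)
cond_dens = (np.exp(-(theta0 - rho * xi) ** 2 / (2 * (1 - rho ** 2)))
             / np.sqrt(2 * np.pi * (1 - rho ** 2)))
c_hat = c_theta0 / np.mean(cond_dens)
print(c_hat, 2 * np.pi * np.sqrt(1 - rho ** 2))             # estimate vs. true c
```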

5. Simulation Studies

5.1. Simulation Study I

In this simulation study, we empirically examine whether the result established in Theorem 2.1 implies a potential gain in estimation efficiency of an IDR estimator for computing the normalizing constant of the bivariate normal distribution compared to the univariate normal distribution or the multivariate normal distribution with more than two dimensions. Assume that the posterior kernel is given by

q(ζ) = (1/(2π))^{p/2} exp(−μ′μ/2),   (21)

where μ is a vector of p unknown parameters. We apply the IDR estimator to compute the normalizing constant of (21) with different p’s, and evaluate its performance by comparing the means, standard errors, and root mean square errors (RMSE) of the estimates in log scale. Let ĉt denote the IDR estimate for the tth replicate of data, t = 1, 2, …, 1000. The mean and the standard error (SE) are c̃ = (1/1000) Σ_{t=1}^{1000} log ĉt and SE = [Σ_{t=1}^{1000} (log ĉt − c̃)² / 999]^{1/2}, respectively. The RMSE is defined as

RMSE = [ (1/1000) Σ_{t=1}^{1000} (log ĉt − log c)² ]^{1/2},

where log c is 0 in this case. The results for p = 1, 2, 3, and 5 are summarized in Table 1. We see that when r ≤ 1, the IDR estimator for p = 2 has superior performance compared to those for p ≠ 2. Interestingly, the IDR estimator for p = 3 has a smaller SE than the one for p = 1 for every value of r considered in this simulation. However, the IDR estimator for p = 5 performs worse than those for p = 1, 2, and 3. These simulation results empirically confirm that the result established in Theorem 2.1 does imply that the IDR estimator has an “optimal” performance when p = 2 under the normal distributions.
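For reference, the replication summaries just defined can be computed with a short helper such as the one below (a sketch; log_c_hat is assumed to hold the replicate log estimates).

```python
# A small helper for the replication summaries above: mean, SE (with divisor T - 1),
# and RMSE of replicate log estimates of the normalizing constant.
import numpy as np

def summarize(log_c_hat, log_c_true=0.0):
    log_c_hat = np.asarray(log_c_hat, dtype=float)
    T = len(log_c_hat)
    mean = log_c_hat.mean()
    se = np.sqrt(np.sum((log_c_hat - mean) ** 2) / (T - 1))
    rmse = np.sqrt(np.mean((log_c_hat - log_c_true) ** 2))
    return mean, se, rmse
```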

Table 1:

Performance of the IDR method in the p-dimensional normal distribution with 1000 replicates: mean, standard error (SE), and root mean squared error (RMSE) of the normalizing constant estimate in log scale

r      p   Mean       SE        RMSE
1.5    1   −0.00019   0.03222   0.03221
1.5    2    0.00050   0.00740   0.00742
1.5    3   −0.00005   0.00537   0.00537
1.5    5    0.00017   0.01163   0.01163
1.0    1   −0.00031   0.01938   0.01937
1.0    2   −0.00023   0.00450   0.00451
1.0    3    0.00008   0.00458   0.00457
1.0    5    0.00077   0.01798   0.01799
0.5    1   −0.00021   0.01260   0.01260
0.5    2    0.00003   0.00216   0.00216
0.5    3   −0.00009   0.00574   0.00574
0.5    5    0.00014   0.03159   0.03157
0.01   1   −0.00012   0.00800   0.00800
0.01   2    0.00000   0.00004   0.00004
0.01   3   −0.00016   0.00754   0.00754
0.01   5    0.00420   0.06300   0.06311

5.2. Simulation Study II

In the second simulation study, we revisit the bivariate normal example in [11]. Assume that yj | μ, Σ ~ i.i.d. N(μ, Σ) for j = 1, …, m, where μ = (μ1, μ2)′ and Σ = (σ1², ρσ1σ2; ρσ1σ2, σ2²) are the unknown mean vector and variance-covariance matrix. We specify the conjugate priors

Σ ~ IWν0(Λ0^{−1})

and

μ | Σ ~ N(μ0, Σ/κ0),

respectively. Therefore, the joint posterior kernel is given by

q(μ, Σ) = f(y1, y2, …, ym | μ, Σ) π(μ|Σ) π(Σ)
        = (2π)^{−m} |Σ|^{−(m+ν0+2)/2 − 1} (1/Z) exp{ −(1/2) Σ_{j=1}^m (yj − μ)′ Σ^{−1} (yj − μ) }
          × exp{ −(κ0/2) (μ − μ0)′ Σ^{−1} (μ − μ0) } exp{ −(1/2) trace(Λ0 Σ^{−1}) },

with Z = 2π · 2^{ν0} Γ2(ν0/2) |Λ0|^{−ν0/2} / κ0. Also, the marginal likelihood has the closed form

c = π^{−md/2} [Γd(νm/2)/Γd(ν0/2)] [|Λ0|^{ν0/2}/|Λm|^{νm/2}] (κ0/κm)^{d/2},

where d = 2 is the dimension of yj, Λm = Λ0 + S² + [κ0 m/(κ0 + m)] (μ0 − ȳ)(μ0 − ȳ)′, S² = Σ_{j=1}^m (yj − ȳ)(yj − ȳ)′, κm = κ0 + m, and νm = ν0 + m. A random sample y with m = 200 is generated from a bivariate normal distribution with μ = (0, 0)′ and Σ = (1, 0.7; 0.7, 1). The corresponding sample mean ȳ is (−0.029, 0.040)′, and the sum-of-squares matrix is S² = (201.987, 143.330; 143.330, 192.365). Under this setting and the pre-specified hyperparameters μ0 = (0, 0)′, κ0 = 0.01, ν0 = 3, and Λ0 = (1, 0.7; 0.7, 1), the marginal likelihood in log scale is −507.278.
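For completeness, the closed-form log marginal likelihood above can be evaluated directly. The sketch below is our own, with the data assumed to be an (m, 2) array y; the multivariate gamma function Γd(·) is computed on the log scale via scipy.

```python
# A sketch of the closed-form log marginal likelihood for the conjugate
# normal / inverse-Wishart model above (d = 2); y is an (m, d) data array.
import numpy as np
from scipy.special import multigammaln

def log_marginal_likelihood(y, mu0, kappa0, nu0, Lambda0):
    m, d = y.shape
    ybar = y.mean(axis=0)
    S2 = (y - ybar).T @ (y - ybar)                      # sum-of-squares matrix
    kappa_m, nu_m = kappa0 + m, nu0 + m
    Lambda_m = Lambda0 + S2 + (kappa0 * m / (kappa0 + m)) * np.outer(mu0 - ybar, mu0 - ybar)
    return (-(m * d / 2) * np.log(np.pi)
            + multigammaln(nu_m / 2, d) - multigammaln(nu0 / 2, d)
            + (nu0 / 2) * np.log(np.linalg.det(Lambda0))
            - (nu_m / 2) * np.log(np.linalg.det(Lambda_m))
            + (d / 2) * (np.log(kappa0) - np.log(kappa_m)))
```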

To implement the IDR estimator, we consider two “center” values, (−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)′ and (0.0437, 0.1101, 0.1093, 1.9891, 0.0615)′, for the transformed parameter vector (μ1, μ2, log σ1², log{(1+ρ)/(1−ρ)}, log σ2²)′, where (−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)′ is the posterior mean.

To implement the Dr.IDR estimator, we consider two blocks as θ = (μ1, μ2)′ and ξ = (σ1, σ2, ρ)′. First, since θ | ξ, D ~ N(μc, Σc) with μc = (mȳ + ν0μ0)/(m + ν0) and Σc = Σ/(m + ν0), we consider the standardization θ* = Σc^{−1/2}(μ − μc). With such a standardization, we then have ∫ q̃(0, ξ) dξ = ∫ q(μc, Σ) |Σc|^{1/2} dΣ = c/(2π) and

q̃r(θ*, ξ)/q̃(θ*, ξ) = { exp(θ*′θ*/2)   if ‖θ*‖ ≤ r,
                        exp{−(1/2) [(1 − r²/‖θ*‖²)^{1/2} θ*]′ [(1 − r²/‖θ*‖²)^{1/2} θ*]} / exp(−θ*′θ*/2) = exp(r²/2)   if ‖θ*‖ > r,

where q̃(·) is the posterior kernel of (θ*, ξ) and q̃r(·) is the perturbed density of q̃(·). For the Dr.IDR estimator, we also consider θ* = Σc^{−1/2}(μ − μc − (0.0709, 0.0709)′), where 0.0709 is the average of the posterior standard deviations of μ1 and μ2, for a sensitivity analysis.

For the GDr estimator, we consider two blocks as θ = (σ1, σ2, ρ)′ and ξ = (μ1, μ2)′ instead. Note that the posterior mean of θ is (1.015, 0.967, 0.727)′ and the posterior mode of θ is (0.985, 0.939, 0.727)′. We first choose θ0 = (1, 1, 0.7)′ (and also θ0 = (0.5, 0.5, 0)′ for a sensitivity analysis) and estimate c(θ0) by the IDR approach. By using the same standardization ξ* = Σc^{−1/2}(μ − μc), (20) becomes

ĉ(θ0) = π r² q̄(θ0, 0) / { (1/n) Σ_{i=1}^n q̄r(θ0, ξ*(i))/q̄(θ0, ξ*(i)) − 1 },

where q̄(θ0, 0) = q(1, 1, 0.7, μc) |Σc|^{1/2} with the corresponding perturbed density q̄r(·), and

q̄r(θ0, ξ*)/q̄(θ0, ξ*) = { exp(ξ*′ξ*/2)   if ‖ξ*‖ ≤ r,
                          exp(r²/2)       if ‖ξ*‖ > r.

Secondly, we use the CMDE to estimate π(θ0|D), since π(θ|ξ, D) is an inverse Wishart distribution with νm + 1 degrees of freedom and scale matrix Λ0 + Σ_{j=1}^m (yj − ξ)(yj − ξ)′ + κ0(ξ − μ0)(ξ − μ0)′. Specifically, π̂(θ0|D) = (1/n) Σ_{i=1}^n π(θ0 | ξ(i), D). We note that for the GDr estimator, {θ(i) = (σ1(i), σ2(i), ρ(i))′, ξ(i) = μ(i), i = 1, …, n} is an MC sample from the posterior distribution with the kernel q(μ, Σ), Σ(i) = ((σ1(i))², ρ(i)σ1(i)σ2(i); ρ(i)σ1(i)σ2(i), (σ2(i))²), Σc(i) = Σ(i)/(m + ν0), and ξ*(i) = (Σc(i))^{−1/2}(μ(i) − μc) for i = 1, …, n.

Considering that the GDr estimator requires an extra MCMC sample, we set the size (n = 5,000) of each MCMC sample for the GDr estimator as half of the MCMC sample size (n = 10,000) for the IDR and Dr.IDR estimators in each replicate. Table 2 summarizes the results of the IDR, Dr.IDR, and GDr estimators based on 1,000 replicates when r = 0.001, 0.5, 1, 1.5. It is not surprising that the IDR estimator is sensitive to the specification of the “center” value, as the values of SE and RMSE in columns 6 and 7 are much larger than the corresponding values in columns 3 and 4. As expected, both the Dr.IDR and GDr estimators outperform the IDR estimator. The Dr.IDR estimator has the best performance when the posterior mean is used in the standardization since in this case it enjoys both the dimension reduction and the closed form of the conditional marginal likelihood, which has to be estimated in the GDr estimator. The GDr estimator yields results comparable to the Dr.IDR estimator. It is interesting to see that unlike the IDR estimator, both the Dr.IDR and GDr estimators are quite robust to the choices of the “center” values of μc for Dr.IDR and θ0 for GDr. Certainly, better choices of μc and θ0 do yield slightly smaller values of SE and RMSE, as expected.

Table 2:

Simulation Results of the IDR, Dr.IDR, and GDr estimators for a bivariate normal example with 1000 replicates: mean, standard error (SE), and root mean squared error (RMSE) of the marginal likelihood estimate in the log scale

Mean SE RMSE Mean SE RMSE
IDR with a “center” value of (μ1, μ2, log σ1², log{(1+ρ)/(1−ρ)}, log σ2²)′

(−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)’ (0.0437, 0.1101, 0.1093, 1.9891, 0.0615)’

r=1.5 −509.124 0.141 1.852 −511.609 0.310 4.342
r=1.0 −508.910 0.043 1.633 −511.118 0.166 3.844
r=0.5 −508.774 0.052 1.497 −510.597 0.250 3.329
r=0.001 −508.711 0.137 1.440 −511.592 0.373 4.331

Dr.IDR with

Σc^{−1/2}(μ − μc)    Σc^{−1/2}(μ − μc − (0.0709, 0.0709)′)

r=1.5 −507.278 0.007 0.007 −507.279 0.019 0.019
r=1.0 −507.278 0.004 0.004 −507.278 0.016 0.016
r=0.5 −507.278 0.002 0.002 −507.278 0.015 0.015
r=0.001 −507.278 0.000 0.000 −507.276 0.030 0.030

GDr with (σ10, σ20, ρ0)′

(1, 1, 0.7)’ (0.5,0.5,0.0)’

r=1.5 −507.278 0.010 0.010 −507.279 0.019 0.019
r=1.0 −507.277 0.006 0.006 −507.279 0.017 0.017
r=0.5 −507.277 0.003 0.003 −507.279 0.016 0.016
r=0.001 −507.278 0.001 0.001 −507.278 0.016 0.016

Remark 5.1: From Table 1 in Simulation Study I, we see that the IDR estimator has the smallest SE and RMSE when r = 0.01 for p =1, 2, r = 1 for p = 3, and r = 1.5 for p = 5. The results shown in Table 2 in Simulation Study II indicate that a smaller but not too small value of r generally yields a smaller SE or RMSE for IDR, Dr.IDR, as well as GDr. In practice, as suggested in [13], we first calculate the IDR, Dr.IDR, and GDr estimates for a grid of different values of r, then use the overlapping batch statistics approach of [17] to compute the Monte Carlo errors of these estimates, and finally choose an “optimal” r* as the r for which the Monte Carlo errors are minimum.
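A sketch of this tuning recipe is given below; obm_se is a generic overlapping batch means standard error for the mean of a sequence, and ratio_fn(draws, r), which returns the per-draw ratios qr/q for a given radius, is a placeholder for whichever of the three estimators is being tuned.

```python
# A sketch of the radius selection in Remark 5.1: evaluate a grid of r values and
# pick the one whose overlapping batch means (OBM) Monte Carlo error is smallest.
import numpy as np

def obm_se(x, batch_len=None):
    """Overlapping batch means standard error of the sample mean of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    b = batch_len or max(1, int(np.sqrt(n)))
    csum = np.concatenate(([0.0], np.cumsum(x)))
    batch_means = (csum[b:] - csum[:-b]) / b          # all n - b + 1 overlapping batches
    var = n * b / ((n - b) * (n - b + 1)) * np.sum((batch_means - x.mean()) ** 2)
    return np.sqrt(var / n)

def choose_radius(ratio_fn, draws, r_grid):
    """Return the r in r_grid minimizing the OBM error of the averaged ratio terms."""
    errors = [obm_se(ratio_fn(draws, r)) for r in r_grid]
    return r_grid[int(np.argmin(errors))]
```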

Remark 5.2: In this simulation study, the two blocks θ = (μ1, μ2)′ and ξ = (σ1, σ2, ρ)′ yielded an efficient Dr.IDR estimator, as shown in Table 2. Another natural combination of these parameters would be θ = (σ1, σ2, ρ)′ and ξ = (μ1, μ2)′. Under this combination, the dimension of the reduced parameter space corresponding to θ is 3, while an additional transformation (log σ1, log σ2, log{(ρ + 1)/(1 − ρ)})′ is also needed. After this transformation, the full conditional distribution of θ given ξ is not available analytically. Furthermore, the closed-form expression of θ0(ξ) in (11) for the perturbed density is also no longer available. Thus, this combination of the parameters leads to a less efficient and more computationally intensive implementation of Dr.IDR, and an appropriate combination of the parameters should be carefully selected in order to implement Dr.IDR efficiently.

6. A Case Study: Inequality-Constrained Analysis of Variance

In this section, we consider the inequality-constrained analysis of variance model to demonstrate how to compute the normalizing constant using the GDr estimator. We use the data in Hoijtink et al. [18] for studying amnesia in patients with dissociative identity disorder (DID). We refer to this data set as the DID data hereafter. We model the memory performance score ykj for the kth subject in group j as an independent observation from a normal distribution with mean μj and variance σ² for k = 1, 2, …, mj and j = 1, …, J. In the DID data, J = 4, m1 = 19, m2 = 25, m3 = 25, and m4 = 25. The sample means of the memory performance scores are 3.105, 13.28, 1.88, and 4.56, respectively, for these four groups. Let D = {m, y}, where m = Σ_{j=1}^J mj and y = (y11, …, y_{m1 1}, …, y1J, …, y_{mJ J}), denote the observed data. Then, the likelihood function is given by

L(μ, σ²|D) = (2πσ²)^{−m/2} exp{ −(1/(2σ²)) Σ_{j=1}^J Σ_{k=1}^{mj} (ykj − μj)² },

where μ = (μ1, …, μJ)′. Let Ω denote the parameter space. Following Chen and Kim [19] and Wang et al. [15], we specify the joint prior of (μ, σ2) as

π(μ, σ²|a0) ∝ π*(μ, σ²|a0) = [ (2πσ²)^{−J/2} Π_{j=1}^J (a0 mj)^{1/2} exp{ −(a0 mj/(2σ²)) μj² } ] × [ b02^{b01}/Γ(b01) ] (σ²)^{−b01−1} exp(−b02/σ²),   (22)

for (μ, σ2) ∈ Ω where a0 > 0 is a scalar parameter and b01 > 0 and b02 > 0 are prespecified hyperparameters.

Following Chen and Kim [19] and Wang et al. [15], we specify b01 = b02 = 0.0001 and a0 = 0.01. We also consider the constrained parameter space: Ω = {μ2 > (μ1, μ4) > μ3, σ2 > 0}. Under this setting, the posterior kernel is given by

q(μ, σ²) = L(μ, σ²|D) π*(μ, σ²|a0),   (23)

where π*(μ, σ2|a0) is defined in (22). We are interested in computing

c = ∫_Ω q(μ, σ²) dμ dσ².   (24)

Under the constrained parameter space, c in (24) is a normalizing constant but it is not the marginal likelihood since ∫_Ω π*(μ, σ²|a0) dμ dσ² ≠ 1. Also, the closed form expression of c in (24) is not available. However, for the unconstrained parameter space ΩU = {(μ, σ²) : −∞ < μj < ∞, j = 1, …, 4, σ² > 0}, we have

cU = ∫_{ΩU} q(μ, σ²) dμ dσ²
   = (2π)^{−m/2} [ b02^{b01} Γ(b01 + m/2)/Γ(b01) ] [ a0/(1 + a0) ]^{J/2}
     × [ b02 + (1/2) Σ_{j=1}^J { Σ_{k=1}^{mj} (ykj − ȳj)² + (a0 mj/(1 + a0)) ȳj² } ]^{−(b01 + m/2)},   (25)

where ȳj = (1/mj) Σ_{k=1}^{mj} ykj for j = 1, …, J. Since ∫_{ΩU} π*(μ, σ²|a0) dμ dσ² = 1, cU is the marginal likelihood. Let 1{A} denote the indicator function such that 1{A} = 1 if A is true and 0 if A is false. Observe that

c = ∫_{ΩU} q(μ, σ²) 1{(μ, σ²) ∈ Ω} dμ dσ² = cU ∫_{ΩU} 1{(μ, σ²) ∈ Ω} [q(μ, σ²)/cU] dμ dσ² = cU EU[1{(μ, σ²) ∈ Ω} | D],   (26)

where the posterior expectation is taken with respect to the posterior distribution q(μ, σ²)/cU. From (26), we see that the posterior probability PU((μ, σ²) ∈ Ω | D) = EU[1{(μ, σ²) ∈ Ω} | D] = c/cU.
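The identity in (26) is straightforward to use once an MCMC sample from the unconstrained posterior is available. The sketch below (with mu_draws a hypothetical (n, 4) array of draws of μ) simply multiplies cU by the sampled proportion falling in Ω.

```python
# A minimal sketch of (26): estimate c by multiplying the closed-form c_U by the
# Monte Carlo estimate of P_U((mu, sigma^2) in Omega | D), i.e. the fraction of
# unconstrained posterior draws satisfying mu_2 > (mu_1, mu_4) > mu_3.
import numpy as np

def log_c_constrained(log_cU, mu_draws):
    mu1, mu2, mu3, mu4 = mu_draws.T
    in_omega = (mu2 > np.maximum(mu1, mu4)) & (np.minimum(mu1, mu4) > mu3)
    prob = in_omega.mean()            # estimate of E_U[1{(mu, sigma^2) in Omega} | D]
    return log_cU + np.log(prob), prob
```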

Next, we present the detailed formulation of the GDr estimator of c and then use the identity given in (26) to empirically validate the accuracy of the GDr estimator, since in this case the “true” value of c is unknown. We set θ = (μ1, μ4, σ²)′ and ξ = (μ2, μ3)′. Let ζ = (θ′, ξ′)′, and let θ0 = (μ10, μ40, σ0²)′ denote the posterior mean of θ under the posterior distribution π(ζ|D) with the constrained parameter space. Using (19), we have

c = c(μ10, μ40, σ0²) / π(μ10, μ40, σ0² | D),   (27)

where c(μ10, μ40, σ0²) = ∫_Ω q(μ10, μ2, μ3, μ40, σ0²) dμ2 dμ3. After some lengthy algebra, we obtain an analytic expression of c(μ10, μ40, σ0²) as follows:

c(μ10, μ40, σ0²) = (2π)^{−(m+J−2)/2} [ Π_{j=1}^J (a0 mj)^{1/2} ] [ b02^{b01}/Γ(b01) ] (σ0²)^{−((m+J−2)/2 + b01 + 1)} × [ (a0 + 1)² m2 m3 ]^{−1/2}
    × exp( −(1/σ0²) [ b02 + (1/2) Σ_{j=1}^J { Σ_{k=1}^{mj} (ykj − ȳj)² + (a0/(1 + a0)) mj ȳj² } ] )
    × exp[ −((a0 + 1)/(2σ0²)) { m1 (μ10 − ȳ1/(a0 + 1))² + m4 (μ40 − ȳ4/(a0 + 1))² } ]
    × ( 1 − Φ[ {(a0 + 1) m2/σ0²}^{1/2} (μ10 ∨ μ40 − ȳ2/(a0 + 1)) ] )
    × Φ[ {(a0 + 1) m3/σ0²}^{1/2} (μ10 ∧ μ40 − ȳ3/(a0 + 1)) ],

where μ10 ∨ μ40 = max{μ10, μ40}, μ10 ∧ μ40 = min{μ10, μ40}, and Φ(·) is the standard normal N(0, 1) cumulative distribution function. For the term in the denominator of (27), we write

π(μ10, μ40, σ0² | D) = π(μ10, μ40 | D) π(σ0² | μ10, μ40, D).

We generate one MCMC sample {(μ2,1(i), μ3,1(i), σ1²(i)), i = 1, …, n1} from the posterior π(ζ|D) with the constrained parameter space and another MCMC sample {(μ2,2(i), μ3,2(i)), i = 1, …, n2} from the conditional posterior distribution π(μ2, μ3 | μ10, μ40, D). Then, the CMDE of π(μ10, μ40 | D) is given by

π̂(μ10, μ40 | D) = (1/n1) Σ_{i=1}^{n1} π(μ10 | μ2,1(i), μ3,1(i), σ1²(i), D) π(μ40 | μ2,1(i), μ3,1(i), σ1²(i), D),

where

π(μ10 | μ2, μ3, σ², D) = {(a0 + 1) m1/(2πσ²)}^{1/2} exp{ −[(a0 + 1) m1/(2σ²)] (μ10 − ȳ1/(a0 + 1))² } / ( Φ[ {(a0 + 1) m1/σ²}^{1/2} (μ2 − ȳ1/(a0 + 1)) ] − Φ[ {(a0 + 1) m1/σ²}^{1/2} (μ3 − ȳ1/(a0 + 1)) ] ),

π(μ40 | μ2, μ3, σ², D) = {(a0 + 1) m4/(2πσ²)}^{1/2} exp{ −[(a0 + 1) m4/(2σ²)] (μ40 − ȳ4/(a0 + 1))² } / ( Φ[ {(a0 + 1) m4/σ²}^{1/2} (μ2 − ȳ4/(a0 + 1)) ] − Φ[ {(a0 + 1) m4/σ²}^{1/2} (μ3 − ȳ4/(a0 + 1)) ] ).

The CMDE of π(σ0² | μ10, μ40, D) is also available, which is given by

π̂(σ0² | μ10, μ40, D) = (1/n2) Σ_{i=1}^{n2} π(σ0² | μ10, μ2,2(i), μ3,2(i), μ40, D),

where

π(σ0² | μ1, μ2, μ3, μ4, D) = [ b02 + (1/2) Σ_{j=1}^J { Sj² + (a0/(a0 + 1)) mj ȳj² + (a0 + 1) mj (μj − ȳj/(a0 + 1))² } ]^{(m+J)/2 + b01} / { (σ0²)^{(m+J)/2 + b01 + 1} Γ((m + J)/2 + b01) }
    × exp( −(1/σ0²) [ b02 + (1/2) Σ_{j=1}^J { Sj² + (a0/(a0 + 1)) mj ȳj² + (a0 + 1) mj (μj − ȳj/(a0 + 1))² } ] ),

and Sj² = Σ_{k=1}^{mj} (ykj − ȳj)².

For the DID data, we first generate an MCMC sample of size 50,000 from each of the posterior distributions with and without constraints to compute the posterior means, the posterior standard deviations (SDs), and the 95% highest posterior density (HPD) intervals of μ and σ². The results are shown in Table 3. We see from this table that both sets of the posterior estimates are very close to each other, implying that the hypothesis of μ2 > (μ1, μ4) > μ3 is likely true. Notice that these four means are from the four groups consisting of DID, mimic normal, symptom simulated, and true amnesic subjects. Next, we compute log cU from the closed-form expression given in (25) and use the dropping-needle method discussed in Hoijtink et al. [18] to calculate EU[1{(μ, σ²) ∈ Ω} | D] using an MCMC sample {(μ(i), σ²(i)), i = 1, …, n} from the posterior distribution without constraints on μ. We further generate two MCMC samples to estimate log c. Using (27), we have log ĉ = log c(μ10, μ40, σ0²) − log π̂(μ10, μ40 | D) − log π̂(σ0² | μ10, μ40, D). The results of these Monte Carlo estimates are reported in Table 3. We see from Table 3 that the estimate ÊU[1{(μ, σ²) ∈ Ω} | D] based on the dropping-needle method is very close to ĉ/cU based on the GDr estimate for various MC sample sizes, and ÊU[1{(μ, σ²) ∈ Ω} | D] ≈ 0.988. The high posterior probability of {ζ ∈ Ω} implies that the constraints on μ are most likely to hold and also provides a further explanation of why the two sets of the posterior estimates of μ and σ² in Table 3 are very similar. We note that since the GDr estimate requires two MCMC samples, we choose n1 = n2 = n/2 to make a fair comparison between the two methods. These empirical results indicate that the “true” value of log(c) is very likely to be −205.645.

Table 3:

Posterior Estimates of μ and σ2 and MC Estimates of the Posterior Probability and the Normalizing Constant for the DID Data

Parameter Unconstrained Model Constrained Model

Posterior Mean SD 95% HPD Interval Posterior Mean SD 95% HPD Interval
μ1 3.073 0.405 (2.271, 3.856) 3.085 0.393 (2.326, 3.865)
μ2 13.148 0.351 (12.465, 13.847) 13.149 0.354 (12.467, 13.863)
μ3 1.862 0.351 (1.170, 2.545) 1.856 0.349 (1.182, 2.550)
μ4 4.516 0.354 (3.836, 5.229) 4.516 0.351 (3.825, 5.197)
σ2 3.147 0.469 (2.282, 4.083) 3.145 0.471 (2.283, 4.085)
Monte Carlo Estimates of EU [1{(μ, σ2) ∈ Ω}|D] and c

log cU    n    ÊU[1{(μ, σ²) ∈ Ω} | D]    n1 = n2    log ĉ    ĉ/cU

−205.63261 50,000 0.98810 25,000 −205.64522 0.98747
200,000 0.98781 100,000 −205.64520 0.98748
500,000 0.98776 250,000 −205.64470 0.98799

Finally, we note that instead of setting θ = (μ1, μ4, σ²)′ and ξ = (μ2, μ3)′, we can also take θ = (μ2, μ3, σ²)′ and ξ = (μ1, μ4)′. Let θ0 = (μ20, μ30, σ0²)′ denote the posterior mean of θ under the posterior distribution π(ζ|D) with the constrained parameter space. Then we have

c = c(μ20, μ30, σ0²) / π(μ20, μ30, σ0² | D),   (28)

where c(μ20, μ30, σ0²) = ∫_Ω q(μ1, μ20, μ30, μ4, σ0²) dμ1 dμ4. Similar to (27), a closed form expression of c(μ20, μ30, σ0²) is available. For the term in the denominator of (28), we write

π(μ20, μ30, σ0² | D) = π(μ20, μ30 | D) π(σ0² | μ20, μ30, D).

Again, the CMDEs are available for estimating π(μ20, μ30 | D) and π(σ0² | μ20, μ30, D). This formulation is as efficient as the one given in (27).

7. Discussion

In this paper, we first examine properties of the IDR estimator and find that the IDR estimator is most efficient in two dimensions when the distribution is approximately normal. We then develop an extension of the IDR estimator, called the Dr.IDR estimator. The Dr.IDR estimator is more attractive than the IDR estimator since it allows for dimension reduction and also has a nice connection to marginal posterior density estimation. The GDr estimator is constructed using the identity based on the marginal posterior density. This new identity, which is given in (19), is a natural extension of the identity of Chib [3] based on the full posterior density. Both the Dr.IDR and GDr estimators are potentially useful in computing marginal likelihoods or normalizing constants for models with high-dimensional parameters or complex structure, such as constraints on the model parameters, the autocorrelation structure in a time-series model, or the spatial structure in a spatio-temporal model. Computing the normalizing constant in (24) for the DID data is quite interesting. We use two different approaches for computing this constant and the results are quite comparable. There are other methods, such as the stepping stone approaches of Xie et al. [9] and Fan et al. [10] and the PWK method of Wang et al. [11], that can be used to compute this normalizing constant. These additional comparisons and a further investigation of the applicability of the Dr.IDR and GDr estimators in high-dimensional problems are interesting future research projects, which are currently under investigation.

Acknowledgments

The authors gratefully thank the Editor-in-Chief, the Editor, the Associate Editor, and the three anonymous reviewers for their constructive comments and suggestions that helped improve the article. This material is based upon work partially supported by the National Science Foundation under Grant No. DEB-1354146. Dr. M.-H. Chen’s research was also partially supported by NIH grants #GM70335 and #P01CA142538.

Footnotes

Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record that is published in the journal is kept up to date and so may therefore differ from this version.

Contributor Information

Yu-Bo Wang, School of Mathematical and Statistical Sciences, Clemson University.

Ming-Hui Chen, Department of Statistics, University of Connecticut.

Wei Shi, Department of Statistics, University of Connecticut.

Paul Lewis, Department of Ecology and Evolutionary Biology, University of Connecticut.

Lynn Kuo, Department of Statistics, University of Connecticut.

References

  • [1].Kahn H, Random sampling Monte Carlo techniques in neutron attenuation problems, Nucleonics 6 (1950) 27–37. [PubMed] [Google Scholar]
  • [2].Newton MA, Raftery AE, Approximate Bayesian inference by the weighted likelihood bootstrap, Journal of the Royal Statistical Society, Series B 56 (1994) 3–48. [Google Scholar]
  • [3].Chib S, Marginal likelihood from the Gibbs output, Journal of the American Statistical Association 90 (1995) 1313–1321. [Google Scholar]
  • [4].Chib S, Jeliazkov I, Marginal likelihood from the Metropolis-Hastings output, Journal of the American Statistical Association 96 (2001) 270–281. [Google Scholar]
  • [5].Gelman A, Meng X-L, Simulating normalizing constants: From importance sampling to bridge sampling to path sampling, Statistical Science 13 (1998) 163–185. [Google Scholar]
  • [6].Petris G, Tardella L, A geometric approach to transdimensional Markov chain Monte Carlo, The Canadian Journal of Statistics 31(4) (2003) 469–482. [Google Scholar]
  • [7].Chen M-H, Computing marginal likelihoods from a single MCMC output, Statistica Neerlandica 59 (2005) 16–29. [Google Scholar]
  • [8].Lartillot N, Philippe H, Computing Bayes factors using thermodynamic integration, Systematic Biology 55 (2006) 195–207. [DOI] [PubMed] [Google Scholar]
  • [9].Xie W, Lewis PO, Fan Y, Kuo L, Chen M-H, Improving marginal likelihood estimation for Bayesian phylogenetic model selection, Systematic Biology 60 (2011) 150–160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Fan Y, Wu R, Chen M-H, Kuo L, Lewis PO, Choosing among partition models in Bayesian phylogenetics, Molecular Biology and Evolution 28 (2011) 523–532. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Wang Y-B, Chen M-H, Kuo L, Lewis PO, A new Monte Carlo method for estimating marginal likelihoods, Bayesian Analysis 13 (2018) 311–333. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Petris G, Tardella L, New perspectives for estimating normalizing constants via posterior simulation, Technical Report, Università di Roma La Sapienza (2007). [Google Scholar]
  • [13].Arima S, Tardella L, IDR for marginal likelihood in Bayesian phylogenetics, in: Chen M-H, Kuo L, Lewis P (Eds.), Bayesian Phylogenetics: Methods, Algorithms, and Applications, CRC Press, 2014, pp. 25–57. [Google Scholar]
  • [14].Chen M-H, Importance-weighted marginal Bayesian posterior density estimation, Journal of the American Statistical Association 89 (1994) 818–824. [Google Scholar]
  • [15].Wang Y-B, Chen M-H, Kuo L, Lewis PO, Partition weighted approach for estimating the marginal posterior density with applications, Journal of Computational and Graphical Statistics, doi: 10.1080/10618600.2018.1529600 (2019) In press. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Gelfand AE, Smith AF, Lee T-M, Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling, Journal of the American Statistical Association 87 (1992) 523–532. [Google Scholar]
  • [17].Schmeiser BW, Avramidis TN, Hashem S, Overlapping batch statistics, in: Proceedings of the 22nd Conference on Winter Simulation, IEEE Press, pp. 395–398. [Google Scholar]
  • [18].Hoijtink H, Klugkist I, Boelen P, Bayesian Evaluation of Informative Hypotheses, Springer Science & Business Media, 2008. [Google Scholar]
  • [19].Chen M-H, Kim S, The Bayes factor versus other model selection criteria for the selection of constrained models, in: Hoijtink H, Klugkist I, Boelen P (Eds.), Bayesian Evaluation of Informative Hypotheses, Springer, 2008, pp. 155–180. [Google Scholar]
