Abstract
In the Bayesian framework, the marginal likelihood plays an important role in variable selection and model comparison. The marginal likelihood is the marginal density of the data obtained after integrating the parameters out over the parameter space. However, this quantity is often analytically intractable due to the complexity of the model. In this paper, we first examine the properties of the inflated density ratio (IDR) method, a Monte Carlo method for computing the marginal likelihood from a single Monte Carlo (MC) or Markov chain Monte Carlo (MCMC) sample. We then develop a variation of the IDR estimator, called the dimension reduced inflated density ratio (Dr.IDR) estimator. We further propose a more general identity and obtain from it a generalized dimension reduced (GDr) estimator. Simulation studies are conducted to examine the empirical performance of the IDR, Dr.IDR, and GDr estimators. We further demonstrate the usefulness of the GDr estimator for computing normalizing constants in a case study on the inequality-constrained analysis of variance.
Keywords: CMDE, Conditional posterior density, Constrained parameter space, IWMDE, Marginal posterior density

MSC: 62F15, 62M05
1. Introduction
In Bayesian analysis, different models can be compared through their marginal likelihoods (also known as normalizing constants). Defined as the integral of the likelihood function times the prior distribution over the parameter space, the marginal likelihood measures the average fit of the model to the data and is useful in Bayesian hypothesis testing. The statistical theory of hypothesis testing is associated with methods for checking whether an improvement in fit is statistically significant. The Bayes factor (BF), defined as the ratio of the marginal likelihoods under two competing models, is one such method. Due to the complicated form of the product of the likelihood function and the prior, referred to as the posterior kernel below, the marginal likelihood is often analytically intractable. Hence, computing the marginal likelihood or the normalizing constant is of great current interest and remains an active research area.
There is a vast literature on computing marginal likelihoods or normalizing constants, including but not limited to importance sampling (IS) [1], the harmonic mean (HM) [2], the posterior density based approaches of Chib [3] and Chib and Jeliazkov [4], path sampling (PS) [5], the inflated density ratio (IDR) [6], the latent variable approach of Chen [7], thermodynamic integration (TI) [8], the stepping stone method (SSM) [9, 10], and the partition weighted kernel (PWK) estimator [11]. These methods can be divided into two categories: methods such as IS, HM, IDR, Chen [7], and PWK use only a single MCMC sample, while methods such as Chib [3], TI, and SSM require multiple MCMC samples. All of these methods can be shown to converge asymptotically to the quantity of interest under mild conditions. Varying in their use of information, such as Markov chain Monte Carlo (MCMC) samples or the posterior kernel, they all have pros and cons in different applications. Among them, the IDR method introduces a perturbed density that adds an analytically calculable mass at the posterior mode and has lighter tails than the posterior kernel, and it can estimate the marginal likelihood precisely from a single MCMC sample from the posterior distribution. With such control of the posterior tails, Petris and Tardella [12] also show that IDR has a finite variance in one-dimensional problems. However, the properties of the IDR estimator have not been fully examined and its usefulness in statistical applications has not been fully explored.
In this paper, we first reveal an interesting property of the IDR estimator, namely, its superior efficiency in two-dimensional problems. This property is further confirmed in our simulation study. To broaden the applicability of IDR, we develop a variation of IDR, namely, the dimension reduced inflated density ratio (Dr.IDR) estimator. This new estimator not only enjoys the property of dimension reduction but also has a nice connection to marginal posterior density estimation. We further extend the idea of Dr.IDR to develop the generalized dimension reduced (GDr) estimator, which is constructed via a new identity based on the marginal posterior density. An interesting case study is carried out to demonstrate how to compute the normalizing constant for the inequality-constrained analysis of variance model. The new estimators developed in this paper have great potential for computing marginal likelihoods or normalizing constants for high-dimensional problems and complex models, such as those with an inequality-constrained parameter space.
The rest of the article is organized as follows. Section 2 presents the formulation of the IDR estimator and examines its theoretical properties. In Section 3, we develop a variation of IDR, called the dimension reduced IDR (Dr.IDR) and discuss various properties of Dr.IDR. A new identity based on the marginal posterior density is introduced and a so-called generalized dimension reduced (GDr) estimator is then constructed in Section 4. Two simulation studies to examine empirical performance of the IDR, Dr.IDR, and GDr estimators are presented in Section 5. Section 6 is a case study, which demonstrates how to compute the normalizing constant for the inequality-constrained analysis of variance model. We conclude this paper with a brief discussion in Section 7.
2. The Inflated Density Ratio (IDR) Estimator
Suppose ζ is a p-dimensional vector of parameters with the unbounded support Ω. Given that L(ζ|D) is the likelihood function of ζ given the data D and π(ζ) is a proper prior distribution in the sense that ∫Ω π(ζ) dζ = 1, the marginal likelihood is given by

c = ∫Ω q(ζ) dζ,  (1)

where q(ζ) = L(ζ|D)π(ζ) is the posterior kernel function. The corresponding posterior distribution is written as

π(ζ|D) = q(ζ)/c.  (2)
We note that when ∫Ω π(ζ) dζ = 1, c is the marginal likelihood; when ∫Ω π(ζ) dζ ≠ 1 but c < ∞, c is called the normalizing constant. When the model structure is complex, the constant c is often analytically intractable. However, a Markov chain Monte Carlo (MCMC) sample can still be generated from the posterior distribution given in (2) without knowing c.
To estimate (1), Petris and Tardella [6, 12] proposed the inflated density ratio (IDR) method, which requires a single Monte Carlo (MC) or MCMC sample from (2). This method first constructs a perturbed density qr(ζ) based on the posterior kernel q(ζ). Let ζ0 be the “center” of the posterior distribution, which is typically the posterior mode of ζ. Define qr(.) as
qr(ζ) = q(ζ0)·1{‖ζ − ζ0‖ ≤ r} + q(ζ0 + h(ζ − ζ0))·1{‖ζ − ζ0‖ > r},  (3)

where r is a prespecified radius, 1{A} denotes the indicator function of the event A, and h(ζ − ζ0) = (1 − r^p/‖ζ − ζ0‖^p)^{1/p}(ζ − ζ0). It can be shown that the integral of qr(·) over the whole support is the sum of the target c and a tractable mass k = q(ζ0)br, where br = volume of the ball {ζ : ‖ζ − ζ0‖ ≤ r} = π^{p/2}r^p/Γ(p/2 + 1). Thus, we have the IDR identity given by
c = k/{E[qr(ζ)/q(ζ)] − 1},  (4)

where E[qr(ζ)/q(ζ)] is the expectation of qr(ζ)/q(ζ) with respect to the posterior distribution in (2). Based on the IDR identity in (4), the IDR estimator for c is given by
ĉIDR = k/{(1/n) Σ_{i=1}^{n} qr(ζ(i))/q(ζ(i)) − 1},  (5)

where {ζ(1), …, ζ(n)} is an MC/MCMC sample from the posterior distribution in (2). Figure 1 provides a visualization of qr(·) in a univariate normal example. The area under the curve qr(·) is k + c = 2rq(0) + 1 since ζ0 = 0 in this case.
Figure 1: An illustration of the perturbed density qr(·) in the IDR method.
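To make the construction concrete, the following minimal sketch implements the IDR estimator (5) for a p-dimensional standard normal posterior kernel, whose normalizing constant is c = 1. The code is ours, not the authors' implementation; the function names and the choice of kernel are illustrative assumptions.

```python
import numpy as np
from scipy.special import gammaln

def log_q(z):
    # log posterior kernel: standard p-variate normal density, so c = 1
    return -0.5 * np.sum(z * z, axis=-1) - 0.5 * z.shape[-1] * np.log(2.0 * np.pi)

def idr_estimate(sample, r):
    """IDR estimate (5) of c from an n x p MC/MCMC sample, with center zeta_0 = 0."""
    n, p = sample.shape
    d = np.linalg.norm(sample, axis=1)                    # ||zeta - zeta_0||
    # h in (3); the shrink factor is 0 inside the ball, mapping those points to zeta_0
    shrink = (1.0 - (r / np.maximum(d, r)) ** p) ** (1.0 / p)
    log_ratio = log_q(shrink[:, None] * sample) - log_q(sample)   # log(q_r/q)
    # log k = log q(zeta_0) + log b_r, with b_r = pi^(p/2) r^p / Gamma(p/2 + 1)
    log_k = log_q(np.zeros(p)) + 0.5 * p * np.log(np.pi) + p * np.log(r) - gammaln(0.5 * p + 1.0)
    return np.exp(log_k) / (np.exp(log_ratio).mean() - 1.0)

rng = np.random.default_rng(1)
print(idr_estimate(rng.standard_normal((100_000, 2)), r=0.5))  # approx 1
```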
Petris and Tardella [12] showed that under certain conditions, such as a log-Lipschitz condition, the variance of ĉIDR is finite in one-dimensional problems. If {ζ(1), …, ζ(n)} is an MCMC sample from (2), then under certain mild regularity conditions, such as time reversibility, invariance, and irreducibility, it is easy to show that ĉIDR is a consistent estimator of c since the denominator of (5) converges to E[qr(ζ)/q(ζ)] − 1 = k/c as n → ∞. As pointed out by Petris and Tardella [12], Arima and Tardella [13], and Wang et al. [11], the IDR method requires the support of q(·) to be the full space and a careful selection of the radius r. Wang et al. [11] further pointed out that mode finding is essential and that standardization of an MCMC sample with respect to the mode and the sample covariance matrix is required for IDR. Thus, the best way to use IDR is to obtain the perturbed density after the standardization, as demonstrated in Section 5.2. If any parameter in the model has a bounded support, an additional transformation is needed to define the perturbed density qr(·). This extra effort may limit its applications, especially for a problem with complicated constraints (see Section 6 for an example). During our empirical investigation, we found that the IDR has an interesting property in the bivariate standard normal case.
Consider the posterior kernel function

q(ζ) = (2π)^{−1} exp{−(μ1^2 + μ2^2)/2},  (6)

where ζ = μ = (μ1, μ2)′ is a vector of unknown parameters with center at ζ0 = (0, 0)′. Under such a setting, the perturbed density is given by

qr(ζ) = (2π)^{−1}·1{‖ζ‖ ≤ r} + (2π)^{−1} exp{−(‖ζ‖^2 − r^2)/2}·1{‖ζ‖ > r},

so that

qr(ζ)/q(ζ) = exp{‖ζ‖^2/2}·1{‖ζ‖ ≤ r} + exp{r^2/2}·1{‖ζ‖ > r}.  (7)

We see from (7) that the ratio term within the radius is always bounded by exp{r^2/2} while the ratio term outside the radius is always equal to the constant exp{r^2/2}, which implies that the variance and any higher-order moments of the IDR estimator are finite. We formally state this result in the following theorem.
Theorem 2.1. Given that {ζ(1), …, ζ(n)} is a random sample from the posterior density, any vth moment (v > 0) of the IDR estimator is finite for the bivariate normal distribution given in (6).
Proof. Letting Yi = qr(ζ(i))/q(ζ(i)) for i = 1, …, n, we have from (7) that

1 ≤ Yi ≤ exp{r^2/2},  i = 1, …, n.

Thus,

E[{(1/n) Σ_{i=1}^{n} Yi}^v] ≤ exp{vr^2/2} < ∞  for any v > 0.

By the δ-method, the vth moment of the IDR estimator is finite. □
Following Theorem 2.1, we can see that the variance of IDR is finite. This result also implies that the IDR method is potentially efficient in two-dimensional problems if the posterior distribution approximates a bivariate normal distribution. To examine the empirical property of IDR, we carry out a simulation study in Section 5.1.
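The constancy of the ratio outside the radius in (7) is easy to check numerically; a minimal sketch (ours, for illustration only) applies the generic perturbation h of (3) with p = 2 to the bivariate standard normal kernel in (6):

```python
import numpy as np

def log_q(z):
    # bivariate standard normal kernel from (6)
    return -0.5 * np.sum(z * z, axis=-1) - np.log(2.0 * np.pi)

r = 1.0
rng = np.random.default_rng(2)
u = rng.standard_normal((5, 2))
u /= np.linalg.norm(u, axis=1, keepdims=True)         # random directions
z = u * (r + rng.uniform(0.5, 3.0, size=(5, 1)))      # points strictly outside the ball
d = np.linalg.norm(z, axis=1)
shrink = (1.0 - (r / np.maximum(d, r)) ** 2) ** 0.5   # h(z) for p = 2
ratio = np.exp(log_q(shrink[:, None] * z) - log_q(z))
print(ratio, np.exp(r**2 / 2))                        # every entry equals exp(r^2/2)
```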
3. Dimension Reduced Inflated Density Ratio (Dr.IDR) Estimator
In this section, we develop a variation of IDR called the dimension reduced inflated density ratio (Dr.IDR) estimator. Suppose ζ′ = (θ′, ξ′), where θ is a p1-dimensional vector of parameters and ξ is a vector of the remaining p − p1 ≡ p2 parameters. We assume that the support of the marginal posterior density π(θ|D) = ∫ π(θ, ξ|D) dξ is R^{p1} and that θ0 is a 'center', such as the mode, of the marginal posterior distribution of θ. Then, we define the perturbed density as
qr(θ, ξ) = q(θ0, ξ)·1{‖θ − θ0‖ ≤ r} + q(θ0 + h(θ − θ0), ξ)·1{‖θ − θ0‖ > r},  (8)

where h(θ − θ0) = (1 − r^{p1}/‖θ − θ0‖^{p1})^{1/p1}(θ − θ0) and r > 0 is a prespecified radius. Similar to the IDR identity given in (4), we obtain a new identity given by

c = k1/{E[qr(θ, ξ)/q(θ, ξ)] − 1},  (9)

where k1 = b_{r,p1} ∫ q(θ0, ξ) dξ with b_{r,p1} = π^{p1/2}r^{p1}/Γ(p1/2 + 1), and E[qr(θ, ξ)/q(θ, ξ)] is the expectation of qr(θ, ξ)/q(θ, ξ) with respect to the posterior distribution given by (2).
Let {(θ(i), ξ(i)), i = 1, …, n} denote an MC/MCMC sample from the posterior distribution π(ζ|D). If ∫ q(θ0, ξ) dξ is analytically available, then using (9), a Dr.IDR estimate of c is given by

ĉDr.IDR = k1/{(1/n) Σ_{i=1}^{n} qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1}.  (10)

Under certain ergodic conditions, we have ĉDr.IDR → c almost surely as n → ∞.
Remark 3.1: In (9) and (10), the perturbed density qr(θ, ξ) in (8) is constructed on θ, and the dimension of θ is p1, which is smaller than the dimension p of the original vector of parameters ζ. When p1 = p, (9) and (10) reduce to (4) and (5). For these reasons, ĉDr.IDR is called the dimension reduced IDR (Dr.IDR) estimator.
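As an illustration, the sketch below applies (10) to a toy two-block target q(θ, ξ) = φ(θ)φ(ξ), the product of two standard normal densities (so c = 1 and p1 = p2 = 1), for which ∫ q(θ0, ξ) dξ = φ(θ0) is available in closed form. The code and its toy target are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_phi(x):
    # standard normal log density
    return -0.5 * x * x - 0.5 * np.log(2.0 * np.pi)

def dr_idr_estimate(theta, xi, r, theta0=0.0):
    """Dr.IDR estimate (10): perturb the theta block only (p1 = 1)."""
    d = np.abs(theta - theta0)
    # the p1 = 1 case of h in (8): points inside the ball map to theta0
    theta_pert = theta0 + np.sign(theta - theta0) * np.maximum(d - r, 0.0)
    # q_r/q; the xi factor is common to q_r and q, so it cancels
    ratio = np.exp(log_phi(theta_pert) - log_phi(theta))
    k1 = 2.0 * r * np.exp(log_phi(theta0))   # b_{r,1} = 2r, integral = phi(theta0)
    return k1 / (ratio.mean() - 1.0)

rng = np.random.default_rng(3)
n = 100_000
print(dr_idr_estimate(rng.standard_normal(n), rng.standard_normal(n), r=0.5))  # approx 1
```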
Remark 3.2: The perturbed density in (8) can be extended as

qr(θ, ξ) = q(θ0(ξ), ξ)·1{‖θ − θ0(ξ)‖ ≤ r(ξ)} + q(θ0(ξ) + hξ(θ − θ0(ξ)), ξ)·1{‖θ − θ0(ξ)‖ > r(ξ)},  (11)

where θ0(ξ) and r(ξ) are two functions of ξ and hξ(θ − θ0(ξ)) = (1 − r(ξ)^{p1}/‖θ − θ0(ξ)‖^{p1})^{1/p1}(θ − θ0(ξ)). However, the Dr.IDR estimator in (10) needs to be modified as

ĉDr.IDR = k̃1/{(1/n) Σ_{i=1}^{n} qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1},  (12)

where k̃1 = ∫ b_{r(ξ),p1} q(θ0(ξ), ξ) dξ with b_{r(ξ),p1} = π^{p1/2}r(ξ)^{p1}/Γ(p1/2 + 1). In (12), both the center and the radius of each perturbed circle can be determined by the conditional distribution π(θ|ξ, D) instead of the marginal posterior distribution π(θ|D), which may lead to a more efficient estimator.
Remark 3.3: In (10), we assume that ∫ q(θ0, ξ) dξ is analytically available. When c(θ0) = ∫ q(θ0, ξ) dξ is not available in closed form, c(θ0) is the normalizing constant of the conditional posterior distribution π(ξ|θ0, D). Since θ0 is a fixed point, we can generate an MC/MCMC sample {ξ(i), i = 1, …, n} from π(ξ|θ0, D). Then, we can use (5) or (10) to construct an estimator for c(θ0), and the resulting estimator of c may be termed a Dr.IDR2 estimator. However, we notice that the version of ĉDr.IDR given in (12) is difficult to extend if ∫ q(θ0(ξ), ξ) dξ is analytically intractable.
Remark 3.4: From the Dr.IDR identity in (9), we obtain

∫ q(θ0, ξ) dξ / c = (1/b_{r,p1}){E[qr(θ, ξ)/q(θ, ξ)] − 1}.  (13)

Since the term on the left-hand side is precisely the marginal posterior density of θ evaluated at θ0, that is, π(θ0|D), we obtain a new marginal posterior density estimator given by

π̂IDR(θ0|D) = (1/b_{r,p1}){(1/n) Σ_{i=1}^{n} qr(θ(i), ξ(i))/q(θ(i), ξ(i)) − 1}.  (14)
Write

w(θ|ξ) = {qr(θ, ξ) − q(θ, ξ)}/{b_{r,p1} q(θ0, ξ)}.  (15)

It is easy to show that ∫ w(θ|ξ) dθ = 1 and that (14) can be rewritten as

π̂IDR(θ0|D) = (1/n) Σ_{i=1}^{n} w(θ(i)|ξ(i)) q(θ0, ξ(i))/q(θ(i), ξ(i)).  (16)
Therefore, π̂IDR(θ0|D) is an importance weighted marginal density estimator (IWMDE) in the sense of [14]. However, the weight function w(θ|ξ) is not the conditional posterior density π(θ|ξ, D). Thus, the IDR marginal posterior density estimator may not be the most efficient one, as the "best" estimator is the conditional marginal density estimator (CMDE) with w(θ|ξ) = π(θ|ξ, D), as discussed in [14] and [15].
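Continuing the toy factorized target from the sketch after Remark 3.1, the same Monte Carlo average turned around as in (14) estimates the marginal posterior density at θ0; here the true value is π(0|D) = φ(0) ≈ 0.39894. Again, this is an illustrative sketch under our assumed toy target:

```python
import numpy as np

rng = np.random.default_rng(4)
theta, r = rng.standard_normal(100_000), 0.5
# perturbed theta under the p1 = 1 version of h in (8), center theta0 = 0
theta_pert = np.sign(theta) * np.maximum(np.abs(theta) - r, 0.0)
ratio = np.exp(-0.5 * (theta_pert**2 - theta**2))   # q_r/q; the xi factor cancels
print((ratio.mean() - 1.0) / (2.0 * r))             # estimate of pi(0|D), approx 0.39894
```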
4. Generalization of the IDR Estimator
Observing that the marginal posterior density of θ is defined as

π(θ|D) = ∫ π(θ, ξ|D) dξ = ∫ q(θ, ξ) dξ / c,

we have the following identity:

c = ∫ q(θ, ξ) dξ / π(θ|D).  (17)
The above marginal posterior density based identity is a natural extension of the identity given in Chib [3], in which the marginal likelihood is expressed as the ratio of the likelihood function times the prior to the posterior density of the model parameters.
We let c(θ) denote the term in the numerator of (17), i.e.,

c(θ) = ∫ q(θ, ξ) dξ.  (18)
Then, c(θ) is precisely the normalizing constant of the conditional posterior density of ξ given θ. We note that the identity (17) holds for all θ. Similar to [3], we take θ = θ0 to be a high marginal density point, such as the posterior mean or the posterior mode of θ. Then, (17) can be rewritten as

c = c(θ0)/π(θ0|D).  (19)
In (19), the original p-dimensional estimation problem of c becomes the computation of c(θ0) and π(θ0|D), each of a smaller dimension (p2 and p1, respectively). If c(θ0) is analytically tractable, then we only need to compute π(θ0|D). If c(θ0) is intractable, as discussed in Remark 3.3, we first generate an MC/MCMC sample {ξ(i), i = 1, …, n} from the conditional posterior distribution π(ξ|θ0, D) and then obtain an IDR estimator or a Dr.IDR estimator of c(θ0). For instance, an IDR estimator of c(θ0) is given by

ĉ(θ0) = k2/{(1/n) Σ_{i=1}^{n} qr(ξ(i)|θ0)/q(θ0, ξ(i)) − 1},  (20)

where ξ0 is the posterior mode of ξ given θ0,

qr(ξ|θ0) = q(θ0, ξ0)·1{‖ξ − ξ0‖ ≤ r} + q(θ0, ξ0 + h(ξ − ξ0))·1{‖ξ − ξ0‖ > r} with h(ξ − ξ0) = (1 − r^{p2}/‖ξ − ξ0‖^{p2})^{1/p2}(ξ − ξ0),

k2 = q(θ0, ξ0)b_{r,p2} with b_{r,p2} = π^{p2/2}r^{p2}/Γ(p2/2 + 1), and r > 0 is a prespecified radius, which may be different from the one in (3) or (8).
For the marginal posterior density π(θ0|D), which is the term in the denominator of (19), the Dr.IDR estimator given in (16) is readily available using an MC/MCMC sample {(θ(i), ξ(i)), i = 1, …, n} from the full posterior distribution π(ζ|D). As discussed in Remark 3.4, π̂IDR(θ0|D) may not be the most efficient estimator. A more efficient estimator can be obtained using the CMDE of [16] or the partition weighted marginal density estimator (PWMDE) of [15]. We note that the PWMDE is a special case of the IWMDE of [14], which has the potential to achieve the "optimal" CMDE in estimating π(θ0|D). As empirically shown in [15], the PWMDE of π(θ0|D) is more efficient when θ0 is close to the "center" of the marginal posterior distribution π(θ|D). This is intuitively appealing since (i) the PWMDE uses the MC/MCMC sample {(θ(i), ξ(i)), i = 1, …, n} from the full posterior distribution π(ζ|D); (ii) fewer of the θ(i)'s are around θ0 when θ0 is away from the "center"; and (iii) in this case, π(θ0|D) is more difficult to estimate and, consequently, the corresponding PWMDE becomes less efficient. Therefore, the identity (19) works best when θ0 is chosen as a high marginal density point.
The new identity (19) provides great flexibility in computing the marginal likelihood c. This identity also sheds light on obtaining an efficient Monte Carlo estimator of c for a posterior distribution with a high-dimensional parameter space. An estimator based on (19) is termed a generalized dimension reduced (GDr) estimator. In Section 5.2, we carry out a simulation study to examine the empirical performance of the IDR, Dr.IDR, and GDr estimators.
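The following sketch assembles the GDr pieces on a correlated bivariate normal target N2(0, Σ) with unit variances and correlation ρ (so c = 1): c(θ0) is estimated by a one-dimensional IDR run (20) on a sample from π(ξ|θ0, D), and π(θ0|D) by the CMDE. The target, the exact conditional draws, and all helper names are our illustrative assumptions.

```python
import numpy as np

rho, theta0, r, n = 0.7, 0.0, 0.5, 100_000
rng = np.random.default_rng(5)

def log_q(th, xi):
    # bivariate normal kernel with unit variances and correlation rho (c = 1)
    quad = (th * th - 2.0 * rho * th * xi + xi * xi) / (1.0 - rho**2)
    return -0.5 * quad - np.log(2.0 * np.pi) - 0.5 * np.log(1.0 - rho**2)

# c(theta0) via a 1-dim IDR run (20) on xi | theta0 ~ N(rho*theta0, 1 - rho^2)
xi0 = rho * theta0
xi = xi0 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
xi_pert = xi0 + np.sign(xi - xi0) * np.maximum(np.abs(xi - xi0) - r, 0.0)
ratio = np.exp(log_q(theta0, xi_pert) - log_q(theta0, xi))
c_theta0 = 2.0 * r * np.exp(log_q(theta0, xi0)) / (ratio.mean() - 1.0)  # k2 = 2r q(theta0, xi0)

# pi(theta0 | D) via the CMDE, using draws of xi from its N(0, 1) marginal:
# theta | xi, D ~ N(rho*xi, 1 - rho^2)
xi_full = rng.standard_normal(n)
cond = np.exp(-0.5 * (theta0 - rho * xi_full) ** 2 / (1.0 - rho**2)) / np.sqrt(2.0 * np.pi * (1.0 - rho**2))
print(c_theta0 / cond.mean())   # GDr estimate (19), approx c = 1
```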
5. Simulation Studies
5.1. Simulation Study I
In this simulation study, we empirically examine whether the result established in Theorem 2.1 implies a potential gain in estimation efficiency of an IDR estimator for computing the normalizing constant of the bivariate normal distribution compared to the univariate normal distribution or the multivariate normal distribution with more than two dimensions. Assume that the posterior kernel is given by
q(μ) = (2π)^{−p/2} exp{−μ′μ/2},  (21)

where μ is a vector of p unknown parameters. We apply the IDR estimator to compute the normalizing constant of (21) for different values of p and evaluate its performance by comparing the means, standard errors, and root mean square errors (RMSE) of the estimates in the log scale. Let log ĉt denote the IDR estimate for the tth replicate of data, t = 1, 2, …, 1000. The mean and the standard error (SE) are equal to

Mean = (1/1000) Σ_{t=1}^{1000} log ĉt and SE = {(1/999) Σ_{t=1}^{1000} (log ĉt − Mean)^2}^{1/2},

respectively. The RMSE is defined as

RMSE = {(1/1000) Σ_{t=1}^{1000} (log ĉt − log c)^2}^{1/2},

where log c is 0 in this case. The results for p = 1, 2, 3, and 5 are summarized in Table 1. We see that when r ≤ 1, the IDR estimator for p = 2 has superior performance compared to those for p ≠ 2. Interestingly, the IDR estimator for p = 3 has a smaller SE than the one for p = 1 for every value of r considered in this simulation. However, the IDR estimator for p = 5 performs the worst among all the dimensions considered. These simulation results empirically confirm that, as suggested by Theorem 2.1, the IDR estimator has an "optimal" performance when p = 2 under normal distributions.
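For reference, the replication summaries reported in Table 1 can be computed as in the short sketch below (our code; `log_estimates` is assumed to hold the 1000 log-scale IDR estimates):

```python
import numpy as np

def summarize(log_estimates, log_c=0.0):
    # mean, SE, and RMSE of replicate log-estimates about the true log c
    mean = log_estimates.mean()
    se = log_estimates.std(ddof=1)
    rmse = np.sqrt(np.mean((log_estimates - log_c) ** 2))
    return mean, se, rmse
```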
Table 1:
Performance of the IDR method for the p-dimensional normal distribution with 1000 replicates: mean, standard error (SE), and root mean squared error (RMSE) of the normalizing constant estimate in the log scale
| | | Mean | SE | RMSE | | | Mean | SE | RMSE |
|---|---|---|---|---|---|---|---|---|---|
| r=1.5 | p=1 | −0.00019 | 0.03222 | 0.03221 | r=0.5 | p=1 | −0.00021 | 0.01260 | 0.01260 |
| | p=2 | 0.00050 | 0.00740 | 0.00742 | | p=2 | 0.00003 | 0.00216 | 0.00216 |
| | p=3 | −0.00005 | 0.00537 | 0.00537 | | p=3 | −0.00009 | 0.00574 | 0.00574 |
| | p=5 | 0.00017 | 0.01163 | 0.01163 | | p=5 | 0.00014 | 0.03159 | 0.03157 |
| r=1 | p=1 | −0.00031 | 0.01938 | 0.01937 | r=0.01 | p=1 | −0.00012 | 0.00800 | 0.00800 |
| | p=2 | −0.00023 | 0.00450 | 0.00451 | | p=2 | 0.00000 | 0.00004 | 0.00004 |
| | p=3 | 0.00008 | 0.00458 | 0.00457 | | p=3 | −0.00016 | 0.00754 | 0.00754 |
| | p=5 | 0.00077 | 0.01798 | 0.01799 | | p=5 | 0.00420 | 0.06300 | 0.06311 |
5.2. Simulation Study II
In the second simulation study, we revisit the bivariate normal example in [11]. Assume that yj|μ, Σ ~ N2(μ, Σ) independently for j = 1, …, m, where μ = (μ1, μ2)′ and

Σ = [σ1^2  ρσ1σ2; ρσ1σ2  σ2^2]

are the unknown mean vector and variance-covariance matrix. We specify the conjugate priors

μ|Σ ~ N2(μ0, Σ/κ0) and Σ ~ IW(ν0, Λ0),

respectively. Therefore, the joint posterior kernel is given by

q(μ, Σ) = (2π)^{−m}|Σ|^{−(m+ν0+4)/2} exp[−(1/2){Σ_{j=1}^{m} (yj − μ)′Σ^{−1}(yj − μ) + κ0(μ − μ0)′Σ^{−1}(μ − μ0) + tr(Λ0Σ^{−1})}]/Z

with Z = 2π·2^{ν0}Γ2(ν0/2)|Λ0|^{−ν0/2}/κ0, where Γ2(a) = π^{1/2}Γ(a)Γ(a − 1/2) is the bivariate gamma function. Also, the marginal likelihood has the closed form
c = π^{−m}(κ0/κm)|Λ0|^{ν0/2}Γ2(νm/2)/{|Λm|^{νm/2}Γ2(ν0/2)},

where Λm = Λ0 + Σ_{j=1}^{m} (yj − ȳ)(yj − ȳ)′ + (κ0m/κm)(ȳ − μ0)(ȳ − μ0)′, ȳ = (1/m) Σ_{j=1}^{m} yj, κm = κ0 + m, and νm = ν0 + m. A random sample y of size m = 200 is generated from a bivariate normal distribution with mean μ = (0, 0)′ and a fixed variance-covariance matrix Σ. The corresponding sample mean is (−0.029, 0.040)′ and the sample variance-covariance matrix is denoted by S. Under this setting and the prespecified hyperparameters μ0 = (0, 0)′, κ0 = 0.01, ν0 = 3, and Λ0, the marginal likelihood in the log scale is −507.278.
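The closed-form marginal likelihood above is the standard conjugate normal-inverse-Wishart result and can be evaluated as in the sketch below (our code; the data-generating covariance and hyperparameter values passed in are assumptions for illustration):

```python
import numpy as np
from scipy.special import multigammaln

def log_marglik(y, mu0, kappa0, nu0, Lambda0):
    """Closed-form log c for the conjugate bivariate normal model above."""
    m, p = y.shape
    ybar = y.mean(axis=0)
    S = (y - ybar).T @ (y - ybar)                    # centered sum-of-squares matrix
    kappa_m, nu_m = kappa0 + m, nu0 + m
    Lambda_m = Lambda0 + S + (kappa0 * m / kappa_m) * np.outer(ybar - mu0, ybar - mu0)
    return (-0.5 * m * p * np.log(np.pi)
            + 0.5 * p * np.log(kappa0 / kappa_m)
            + 0.5 * nu0 * np.linalg.slogdet(Lambda0)[1]
            - 0.5 * nu_m * np.linalg.slogdet(Lambda_m)[1]
            + multigammaln(0.5 * nu_m, p) - multigammaln(0.5 * nu0, p))

rng = np.random.default_rng(6)
y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=200)
print(log_marglik(y, np.zeros(2), 0.01, 3.0, np.eye(2)))
```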
To implement the IDR estimator, we consider two "center" values, (−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)′ and (0.0437, 0.1101, 0.1093, 1.9891, 0.0615)′, for the transformed parameter vector ζ* of (μ1, μ2, σ1, σ2, ρ)′, in which σ1 and σ2 are log-transformed and ρ is transformed via log{(1 + ρ)/(1 − ρ)}; here (−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)′ is the posterior mean of ζ*.
To implement the Dr.IDR estimator, we consider the two blocks θ = (μ1, μ2)′ and ξ = (σ1, σ2, ρ)′. First, since θ|ξ, D ~ N2(μ̄, Σ/κm) with μ̄ = (κ0μ0 + mȳ)/κm, we consider the standardization θ* = κm^{1/2}Σ^{−1/2}(θ − μc) with the "center" value μc = μ̄. With such a standardization, we then have θ*|ξ, D ~ N2(0, I2) and

ĉDr.IDR = k1*/{(1/n) Σ_{i=1}^{n} q*r(θ*(i), ξ(i))/q*(θ*(i), ξ(i)) − 1},

where q*(θ*, ξ) is the posterior kernel of (θ*, ξ), q*r(θ*, ξ) is the perturbed density of q*(θ*, ξ) constructed as in (8) with center 0, and k1* = b_{r,2} ∫ q*(0, ξ) dξ is available in closed form. For the Dr.IDR estimator, we also consider the shifted "center" value μc = μ̄ + (0.0709, 0.0709)′, where 0.0709 is the average of the posterior standard deviations of μ1 and μ2, for a sensitivity analysis.
For the GDr estimator, we consider the two blocks θ = (σ1, σ2, ρ)′ and ξ = (μ1, μ2)′ instead. Note that the posterior mean of θ is (1.015, 0.967, 0.727)′ and the posterior mode of θ is (0.985, 0.939, 0.727)′. We first choose θ0 = (1, 1, 0.7)′ (and also θ0 = (0.5, 0.5, 0)′ for a sensitivity analysis) and estimate c(θ0) by the IDR approach. Let Σ0 denote the variance-covariance matrix determined by θ0. Using the same standardization ξ* = κm^{1/2}Σ0^{−1/2}(ξ − μ̄), (20) becomes

ĉ(θ0) = k2*/{(1/n) Σ_{i=1}^{n} q*r(ξ*(i)|θ0)/q*(θ0, ξ*(i)) − 1},

where q*(θ0, ξ*) is the posterior kernel of ξ* given θ0, q*r(ξ*|θ0) is the corresponding perturbed density with center 0, and k2* = πr^2 q*(θ0, 0). Secondly, we use the CMDE to estimate π(θ0|D), since the conditional posterior distribution of Σ given ξ is inverse Wishart with νm + 1 degrees of freedom and scale matrix Λ0 + Σ_{j=1}^{m} (yj − μ)(yj − μ)′ + κ0(μ − μ0)(μ − μ0)′. Specifically,

π̂CMDE(θ0|D) = (1/n) Σ_{i=1}^{n} π(θ0|ξ(i), D),

where π(θ0|ξ(i), D) is the above inverse Wishart density evaluated at Σ0, multiplied by the Jacobian of the transformation from Σ to θ = (σ1, σ2, ρ)′. We note that for the GDr estimator, {(θ(i), ξ(i) = μ(i)), i = 1, …, n} is an MC sample from the posterior distribution with the kernel q(μ, Σ), where Σ(i) ~ IW(νm, Λm), μ(i)|Σ(i) ~ N2(μ̄, Σ(i)/κm), and θ(i) = (σ1(i), σ2(i), ρ(i))′ is obtained from Σ(i) for i = 1, …, n.
Considering that the GDr estimator requires an extra MCMC sample, we set the size (n = 5,000) of each MCMC sample for the GDr estimator to half of the MCMC sample size (n = 10,000) used for the IDR and Dr.IDR estimators in each replicate. Table 2 summarizes the results of the IDR, Dr.IDR, and GDr estimators based on 1,000 replicates when r = 0.001, 0.5, 1, 1.5. It is not surprising that the IDR estimator is sensitive to the specification of the "center" value, as the values of SE and RMSE in columns 6 and 7 are much larger than the corresponding values in columns 3 and 4. As expected, both the Dr.IDR and GDr estimators outperform the IDR estimator. The Dr.IDR estimator has the best performance when the posterior mean is used in the standardization, since in this case it enjoys both the dimension reduction and the closed form of the conditional marginal likelihood, which must instead be estimated in the GDr estimator. The GDr estimator yields results comparable to the Dr.IDR estimator. It is interesting to see that, unlike the IDR estimator, both the Dr.IDR and GDr estimators are quite robust to the choices of the "center" values, μc for Dr.IDR and θ0 for GDr. Certainly, better choices of μc and θ0 do yield slightly smaller values of SE and RMSE, as expected.
Table 2:
Simulation Results of the IDR, Dr.IDR, and GDr estimators for a bivariate normal example with 1000 replicates: mean, standard error (SE), and root mean squared error (RMSE) of the marginal likelihood estimate in the log scale
| | Mean | SE | RMSE | Mean | SE | RMSE |
|---|---|---|---|---|---|---|
| IDR with a "center" value of | | | | | | |
| | (−0.0281, 0.0401, 0.0097, 1.8471, −0.0389)′ | | | (0.0437, 0.1101, 0.1093, 1.9891, 0.0615)′ | | |
| r=1.5 | −509.124 | 0.141 | 1.852 | −511.609 | 0.310 | 4.342 |
| r=1.0 | −508.910 | 0.043 | 1.633 | −511.118 | 0.166 | 3.844 |
| r=0.5 | −508.774 | 0.052 | 1.497 | −510.597 | 0.250 | 3.329 |
| r=0.001 | −508.711 | 0.137 | 1.440 | −511.592 | 0.373 | 4.331 |
| Dr.IDR with a "center" value μc of | | | | | | |
| | μ̄ (posterior mean) | | | μ̄ + (0.0709, 0.0709)′ | | |
| r=1.5 | −507.278 | 0.007 | 0.007 | −507.279 | 0.019 | 0.019 |
| r=1.0 | −507.278 | 0.004 | 0.004 | −507.278 | 0.016 | 0.016 |
| r=0.5 | −507.278 | 0.002 | 0.002 | −507.278 | 0.015 | 0.015 |
| r=0.001 | −507.278 | 0.000 | 0.000 | −507.276 | 0.030 | 0.030 |
| GDr with (σ10, σ20, ρ0)′ | | | | | | |
| | (1, 1, 0.7)′ | | | (0.5, 0.5, 0.0)′ | | |
| r=1.5 | −507.278 | 0.010 | 0.010 | −507.279 | 0.019 | 0.019 |
| r=1.0 | −507.277 | 0.006 | 0.006 | −507.279 | 0.017 | 0.017 |
| r=0.5 | −507.277 | 0.003 | 0.003 | −507.279 | 0.016 | 0.016 |
| r=0.001 | −507.278 | 0.001 | 0.001 | −507.278 | 0.016 | 0.016 |
Remark 5.1: From Table 1 in Simulation Study I, we see that the IDR estimator has the smallest SE and RMSE when r = 0.01 for p = 1 and 2, r = 1 for p = 3, and r = 1.5 for p = 5. The results shown in Table 2 in Simulation Study II indicate that a smaller, but not too small, value of r generally yields a smaller SE or RMSE for IDR, Dr.IDR, and GDr. In practice, as suggested in [13], we first calculate the IDR, Dr.IDR, and GDr estimates for a grid of different values of r, then use the overlapping batch statistics approach of [17] to compute the Monte Carlo errors of these estimates, and finally choose an "optimal" r* as the r for which the Monte Carlo error is minimized.
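A minimal sketch of this recipe is given below: `obm_se` computes the usual overlapping-batch-means standard error of a sample mean, which can be applied to the averaged ratio terms driving (5), (10), or (20) at each candidate radius. The helper names and the grid are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def obm_se(x, b=None):
    """Overlapping-batch-means standard error of the mean of the series x."""
    n = len(x)
    b = b or int(np.sqrt(n))                       # a common default batch length
    csum = np.concatenate(([0.0], np.cumsum(x)))
    bm = (csum[b:] - csum[:-b]) / b                # all n - b + 1 overlapping batch means
    var_hat = n * b / ((n - b) * (n - b + 1)) * np.sum((bm - x.mean()) ** 2)
    return np.sqrt(var_hat / n)

# choose the radius whose Monte Carlo error is smallest on a grid;
# ratio_terms(r) is assumed to return the n ratio values q_r/q at radius r
# r_star = min(r_grid, key=lambda r: obm_se(ratio_terms(r)))
```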
Remark 5.2: In this simulation study, the two blocks θ = (μ1, μ2)′ and ξ = (σ1, σ2, ρ)′ yielded an efficient Dr.IDR estimator, as shown in Table 2. Another natural combination of these parameters would be θ = (σ1, σ2, ρ)′ and ξ = (μ1, μ2)′. Under this combination, the dimension of the reduced parameter space corresponding to θ is 3, and an additional transformation (log σ1, log σ2, log{(ρ + 1)/(1 − ρ)})′ is also needed. After this transformation, the full conditional distribution of θ given ξ is not available analytically. Furthermore, the closed-form expression of θ0(ξ) in (11) for the perturbed density is also no longer available. This combination of the parameters therefore leads to a less efficient and more computationally intensive implementation of Dr.IDR. Thus, an appropriate combination of the parameters should be carefully selected in order to implement Dr.IDR efficiently.
6. A Case Study: Inequality-Constrained Analysis of Variance
In this section, we consider the inequality-constrained analysis of variance model to demonstrate how to compute the normalizing constant using the GDr estimator. We use the data in Hoijtink et al. [18] for studying amnesia in patients with dissociative identity disorder (DID). We refer to this data set as the DID data hereafter. We model the memory performance score ykj for the kth subject in group j as an independent observation from a normal distribution with mean μj and variance σ2 for k = 1, 2, …, mj and j = 1, …, J. In the DID data, J = 4, m1 = 19, m2 = 25, m3 = 25, and m4 = 25. The sample means of the memory performance scores are 3.105, 13.28, 1.88, and 4.56, respectively, for these four groups. Let D = {ykj, k = 1, …, mj, j = 1, …, J} denote the observed data. Then, the likelihood function is given by

L(μ, σ2|D) = Π_{j=1}^{J} Π_{k=1}^{mj} (2πσ2)^{−1/2} exp{−(ykj − μj)^2/(2σ2)},

where μ = (μ1, …, μJ)′. Let Ω denote the parameter space. Following Chen and Kim [19] and Wang et al. [15], we specify the joint prior of (μ, σ2) as

π*(μ, σ2|a0) = [Π_{j=1}^{J} {a0/(2πσ2)}^{1/2} exp{−a0μj^2/(2σ2)}] × {b02^{b01}/Γ(b01)}(σ2)^{−(b01+1)} exp{−b02/σ2}  (22)

for (μ, σ2) ∈ Ω, where a0 > 0 is a scalar parameter and b01 > 0 and b02 > 0 are prespecified hyperparameters.
Following Chen and Kim [19] and Wang et al. [15], we specify b01 = b02 = 0.0001 and a0 = 0.01. We also consider the constrained parameter space Ω = {μ2 > (μ1, μ4) > μ3, σ2 > 0}. Under this setting, the posterior kernel is given by

q(μ, σ2) = L(μ, σ2|D) π*(μ, σ2|a0), (μ, σ2) ∈ Ω,  (23)

where π*(μ, σ2|a0) is defined in (22). We are interested in computing

c = ∫Ω L(μ, σ2|D) π*(μ, σ2|a0) dμ dσ2.  (24)
Under the constrained parameter space, c in (24) is a normalizing constant but not the marginal likelihood, since ∫Ω π*(μ, σ2|a0) dμ dσ2 ≠ 1. Also, a closed form expression of c in (24) is not available. However, for the unconstrained parameter space ΩU = {(μ, σ2) : −∞ < μj < ∞, j = 1, …, 4, σ2 > 0}, we have
cU = ∫ΩU L(μ, σ2|D) π*(μ, σ2|a0) dμ dσ2 = (2π)^{−N/2} [Π_{j=1}^{J} {a0/(mj + a0)}^{1/2}] b02^{b01} Γ(b01 + N/2) / [Γ(b01){b02 + (1/2) Σ_{j=1}^{J} Sj}^{b01+N/2}],  (25)

where N = Σ_{j=1}^{J} mj and Sj = Σ_{k=1}^{mj} ykj^2 − mj^2 ȳj^2/(mj + a0) with ȳj = (1/mj) Σ_{k=1}^{mj} ykj for j = 1, …, J. Since ∫ΩU π*(μ, σ2|a0) dμ dσ2 = 1, cU is the marginal likelihood. Let 1{A} denote the indicator function such that 1{A} = 1 if A is true and 0 if A is false. Observe that

c = ∫ΩU L(μ, σ2|D) π*(μ, σ2|a0) 1{(μ, σ2) ∈ Ω} dμ dσ2 = cU EU[1{(μ, σ2) ∈ Ω}|D],  (26)

where the posterior expectation EU is taken with respect to the posterior distribution πU(μ, σ2|D) ∝ L(μ, σ2|D)π*(μ, σ2|a0) over the unconstrained parameter space ΩU. From (26), we see that the posterior probability PU((μ, σ2) ∈ Ω|D) = c/cU.
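Given draws from the unconstrained posterior, the identity (26) is straightforward to use; a minimal sketch is below (our code, with `mu` assumed to be an n x 4 array of unconstrained posterior draws of μ):

```python
import numpy as np

def log_c_constrained(log_cU, mu):
    """log c via (26): log c_U plus the log Monte Carlo frequency of Omega."""
    # Omega: mu2 > (mu1, mu4) > mu3, with columns ordered (mu1, mu2, mu3, mu4)
    in_omega = (mu[:, 1] > np.maximum(mu[:, 0], mu[:, 3])) & \
               (np.minimum(mu[:, 0], mu[:, 3]) > mu[:, 2])
    return log_cU + np.log(in_omega.mean())
```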
Next, we present the detailed formulation of the GDr estimator of c and then use the identity given in (26) to empirically validate the accuracy of the GDr estimator, since in this case the "true" value of c is unknown. We set θ = (μ1, μ4, σ2)′ and ξ = (μ2, μ3)′. Let ζ = (θ′, ξ′)′. Let θ0 = (μ10, μ40, σ0^2)′ denote the posterior mean of θ under the posterior distribution π(ζ|D) with the constrained parameter space. Using (19), we have

c = c(θ0)/π(θ0|D) = c(μ10, μ40, σ0^2)/π(μ10, μ40, σ0^2|D),  (27)

where c(θ0) = ∫∫ q(μ10, μ2, μ3, μ40, σ0^2) dμ2 dμ3. After some lengthy algebra, we obtain an analytic expression of c(θ0) of the form

c(θ0) = C(θ0)[1 − Φ({μ10 ∨ μ40 − μ̂2}(m2 + a0)^{1/2}/σ0)] Φ({μ10 ∧ μ40 − μ̂3}(m3 + a0)^{1/2}/σ0),

where C(θ0) collects the remaining closed-form factors, μ̂j = mjȳj/(mj + a0), μ10 ∨ μ40 = max{μ10, μ40}, μ10 ∧ μ40 = min{μ10, μ40}, and Φ(·) is the standard normal N(0, 1) cumulative distribution function. For the term in the denominator of (27), we write

π(μ10, μ40, σ0^2|D) = π(μ10, μ40|D) π(σ0^2|μ10, μ40, D).
We generate one MCMC sample from the posterior π(ζ|D) with the constrained parameter space and another MCMC sample from the conditional posterior distribution π(μ2, μ3|μ10, μ40, D). Then, the CMDE of π(μ10, μ40|D) is given by

π̂(μ10, μ40|D) = (1/n1) Σ_{i=1}^{n1} π(μ10, μ40|μ2^{(i)}, μ3^{(i)}, σ2^{(i)}, D),

where {(μ2^{(i)}, μ3^{(i)}, σ2^{(i)}), i = 1, …, n1} is the MCMC sample from π(ζ|D) and π(μ1, μ4|μ2, μ3, σ2, D) is the product, over j = 1 and 4, of the normal densities with means μ̂j = mjȳj/(mj + a0) and variances σ2/(mj + a0), truncated to the interval (μ3, μ2). The CMDE of π(σ0^2|μ10, μ40, D) is also available, which is given by

π̂(σ0^2|μ10, μ40, D) = (1/n2) Σ_{i=1}^{n2} π(σ0^2|μ10, μ2^{(i)}, μ3^{(i)}, μ40, D),

where {(μ2^{(i)}, μ3^{(i)}), i = 1, …, n2} is the MCMC sample from π(μ2, μ3|μ10, μ40, D) and π(σ2|μ, D) is the inverse gamma density with shape b01 + (N + J)/2 and scale b02 + (1/2){Σ_{j=1}^{J} Σ_{k=1}^{mj} (ykj − μj)^2 + a0 Σ_{j=1}^{J} μj^2}.
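The CMDE for π(μ10, μ40|D) above is simply an average of products of truncated normal densities over the constrained posterior draws; a minimal sketch is below (our code; the conditional means and standard deviations follow the prior form written in (22) and should be treated as illustrative):

```python
import numpy as np
from scipy.stats import norm

def trunc_norm_pdf(x, m, s, lo, hi):
    # density of N(m, s^2) truncated to (lo, hi), evaluated at x
    dens = norm.pdf(x, m, s) / (norm.cdf(hi, m, s) - norm.cdf(lo, m, s))
    return np.where((x > lo) & (x < hi), dens, 0.0)

def cmde_mu(mu10, mu40, mu2, mu3, sig2, ybar, mgrp, a0):
    """CMDE of pi(mu10, mu40 | D) from constrained posterior draws (arrays)."""
    dens = 1.0
    for mu_j0, j in ((mu10, 0), (mu40, 3)):           # groups 1 and 4
        mhat = mgrp[j] * ybar[j] / (mgrp[j] + a0)     # conditional mean
        s = np.sqrt(sig2 / (mgrp[j] + a0))            # conditional sd, one per draw
        dens = dens * trunc_norm_pdf(mu_j0, mhat, s, mu3, mu2)
    return dens.mean()
```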
For the DID data, we first generate an MCMC sample of size 50,000 from each of the posterior distributions with and without the constraints to compute the posterior means, the posterior standard deviations (SDs), and the 95% highest posterior density (HPD) intervals of μ and σ2. The results are shown in Table 3. We see from this table that the two sets of posterior estimates are very close to each other, implying that the hypothesis μ2 > (μ1, μ4) > μ3 is likely true. Notice that the four means are from the four groups consisting of DID, mimic normal, symptom simulated, and true amnesic subjects. Next, we compute log cU from the closed form expression given in (25) and use the dropping-needle method discussed in Hoijtink et al. [18] to calculate ÊU[1{(μ, σ2) ∈ Ω}|D] using an MCMC sample {(μ(i), σ2(i)), i = 1, …, n} from the posterior distribution without constraints on μ. We further generate two MCMC samples to estimate log c. Using (27), we have log ĉGDr = log ĉ(θ0) − log π̂(θ0|D). The results of these Monte Carlo estimates are reported in Table 3. We see from Table 3 that the estimate log ĉ = log cU + log ÊU[1{(μ, σ2) ∈ Ω}|D] based on the dropping-needle method is very close to log ĉGDr based on the GDr estimate for various MC sample sizes, and that the two methods give nearly identical estimates of the posterior probability c/cU. The high posterior probability of {ζ ∈ Ω} implies that the constraints on μ are most likely to hold and also provides a further explanation of why the two sets of posterior estimates of μ and σ2 in Table 3 are very similar. We note that since the GDr estimate requires two MCMC samples, we choose n1 = n2 = n/2 to make a fair comparison between the two methods. These empirical results indicate that the "true" value of log c is very likely to be −205.645.
Table 3:
Posterior Estimates of μ and σ2 and MC Estimates of the Posterior Probability and the Normalizing Constant for the DID Data
| Parameter | Unconstrained Model | | | Constrained Model | | |
|---|---|---|---|---|---|---|
| | Posterior Mean | SD | 95% HPD Interval | Posterior Mean | SD | 95% HPD Interval |
| μ1 | 3.073 | 0.405 | (2.271, 3.856) | 3.085 | 0.393 | (2.326, 3.865) |
| μ2 | 13.148 | 0.351 | (12.465, 13.847) | 13.149 | 0.354 | (12.467, 13.863) |
| μ3 | 1.862 | 0.351 | (1.170, 2.545) | 1.856 | 0.349 | (1.182, 2.550) |
| μ4 | 4.516 | 0.354 | (3.836, 5.229) | 4.516 | 0.351 | (3.825, 5.197) |
| σ2 | 3.147 | 0.469 | (2.282, 4.083) | 3.145 | 0.471 | (2.283, 4.085) |
| Monte Carlo Estimates of EU[1{(μ, σ2) ∈ Ω}|D] and c | | | | | |
| log cU | n | ÊU[1{(μ, σ2) ∈ Ω}|D] | n1 = n2 | log ĉGDr | ĉGDr/cU |
| −205.63261 | 50,000 | 0.98810 | 25,000 | −205.64522 | 0.98747 |
| | 200,000 | 0.98781 | 100,000 | −205.64520 | 0.98748 |
| | 500,000 | 0.98776 | 250,000 | −205.64470 | 0.98799 |
Finally, we note that instead of setting θ = (μ1, μ4, σ2)′ and ξ = (μ2, μ3)′, we can also take θ = (μ2, μ3, σ2)′ and ξ = (μ1, μ4)′. Let θ0 = (μ20, μ30, σ0^2)′ denote the posterior mean of θ under the posterior distribution π(ζ|D) with the constrained parameter space. Then we have

c = c(θ0)/π(θ0|D) = c(μ20, μ30, σ0^2)/π(μ20, μ30, σ0^2|D),  (28)

where c(θ0) = ∫∫ q(μ1, μ20, μ30, μ4, σ0^2) dμ1 dμ4. Similar to (27), a closed form expression of c(θ0) is available. For the term in the denominator of (28), we write

π(μ20, μ30, σ0^2|D) = π(μ20, μ30|D) π(σ0^2|μ20, μ30, D).

Again, the CMDEs are available for estimating π(μ20, μ30|D) and π(σ0^2|μ20, μ30, D). This formulation is as efficient as the one given in (27).
7. Discussion
In this paper, we first examine properties of the IDR estimator and find that it is most efficient in two dimensions when the distribution is approximately normal. We then develop an extension of the IDR estimator, called the Dr.IDR estimator. The Dr.IDR estimator is more attractive than the IDR estimator since it allows for dimension reduction and also has a nice connection to marginal posterior density estimation. The GDr estimator is constructed using the identity based on the marginal posterior density. This new identity, which is given in (19), is a natural extension of the identity of Chib [3] based on the full posterior density. Both the Dr.IDR and GDr estimators are potentially useful in computing marginal likelihoods or normalizing constants for models with high-dimensional parameters or complex structure, such as constraints on the model parameters, the autocorrelation structure in a time series model, or the spatial structure in a spatial-temporal model. Computing the normalizing constant in (24) for the DID data is quite interesting. We use two different approaches for computing this constant, and the results are quite comparable. There are other methods, such as the stepping stone approaches of Xie et al. [9] and Fan et al. [10] and the PWK method of Wang et al. [11], that can also be used to compute this normalizing constant. These additional comparisons and further investigation of the applicability of the Dr.IDR and GDr estimators in high-dimensional problems are interesting future research projects, which are currently under investigation.
Acknowledgments
The authors gratefully thank the Editor-in-Chief, the Editor, the Associate Editor, and the three anonymous reviewers for their constructive comments and suggestions, which helped improve the article. This material is based upon work partially supported by the National Science Foundation under Grant No. DEB-1354146. Dr. M.-H. Chen's research was also partially supported by NIH grants #GM70335 and #P01CA142538.
Contributor Information
Yu-Bo Wang, School of Mathematical and Statistical Sciences, Clemson University.
Ming-Hui Chen, Department of Statistics, University of Connecticut.
Wei Shi, Department of Statistics, University of Connecticut.
Paul Lewis, Department of Ecology and Evolutionary Biology, University of Connecticut.
Lynn Kuo, Department of Statistics, University of Connecticut.
References
- [1] Kahn H, Random sampling Monte Carlo techniques in neutron attenuation problems, Nucleonics 6 (1950) 27–37.
- [2] Newton MA, Raftery AE, Approximate Bayesian inference with the weighted likelihood bootstrap, Journal of the Royal Statistical Society, Series B 56 (1994) 3–48.
- [3] Chib S, Marginal likelihood from the Gibbs output, Journal of the American Statistical Association 90 (1995) 1313–1321.
- [4] Chib S, Jeliazkov I, Marginal likelihood from the Metropolis-Hastings output, Journal of the American Statistical Association 96 (2001) 270–281.
- [5] Gelman A, Meng X-L, Simulating normalizing constants: from importance sampling to bridge sampling to path sampling, Statistical Science 13 (1998) 163–185.
- [6] Petris G, Tardella L, A geometric approach to transdimensional Markov chain Monte Carlo, The Canadian Journal of Statistics 31 (2003) 469–482.
- [7] Chen M-H, Computing marginal likelihoods from a single MCMC output, Statistica Neerlandica 59 (2005) 16–29.
- [8] Lartillot N, Philippe H, Computing Bayes factors using thermodynamic integration, Systematic Biology 55 (2006) 195–207.
- [9] Xie W, Lewis PO, Fan Y, Kuo L, Chen M-H, Improving marginal likelihood estimation for Bayesian phylogenetic model selection, Systematic Biology 60 (2011) 150–160.
- [10] Fan Y, Wu R, Chen M-H, Kuo L, Lewis PO, Choosing among partition models in Bayesian phylogenetics, Molecular Biology and Evolution 28 (2011) 523–532.
- [11] Wang Y-B, Chen M-H, Kuo L, Lewis PO, A new Monte Carlo method for estimating marginal likelihoods, Bayesian Analysis 13 (2018) 311–333.
- [12] Petris G, Tardella L, New perspectives for estimating normalizing constants via posterior simulation, Technical Report, Università di Roma La Sapienza (2007).
- [13] Arima S, Tardella L, IDR for marginal likelihood in Bayesian phylogenetics, in: Chen M-H, Kuo L, Lewis PO (Eds.), Bayesian Phylogenetics: Methods, Algorithms, and Applications, CRC Press, 2014, pp. 25–57.
- [14] Chen M-H, Importance-weighted marginal Bayesian posterior density estimation, Journal of the American Statistical Association 89 (1994) 818–824.
- [15] Wang Y-B, Chen M-H, Kuo L, Lewis PO, Partition weighted approach for estimating the marginal posterior density with applications, Journal of Computational and Graphical Statistics (2019), in press. doi: 10.1080/10618600.2018.1529600.
- [16] Gelfand AE, Smith AF, Lee T-M, Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling, Journal of the American Statistical Association 87 (1992) 523–532.
- [17] Schmeiser BW, Avramidis TN, Hashem S, Overlapping batch statistics, in: Proceedings of the 22nd Conference on Winter Simulation, IEEE Press, 1990, pp. 395–398.
- [18] Hoijtink H, Klugkist I, Boelen P, Bayesian Evaluation of Informative Hypotheses, Springer Science & Business Media, 2008.
- [19] Chen M-H, Kim S, The Bayes factor versus other model selection criteria for the selection of constrained models, in: Hoijtink H, Klugkist I, Boelen P (Eds.), Bayesian Evaluation of Informative Hypotheses, Springer, 2008, pp. 155–180.