Summary
We present a definition for the effective sample size of a parametric prior distribution in a Bayesian model, and propose methods for computing the effective sample size in a variety of settings. Our approach first constructs a prior chosen to be vague in a suitable sense, and updates this prior to obtain a sequence of posteriors corresponding to each of a range of sample sizes. We then compute a distance between each posterior and the parametric prior, defined in terms of the curvature of the logarithm of each distribution, and the posterior minimizing the distance defines the effective sample size of the prior. For cases where the distance cannot be computed analytically, we provide a numerical approximation based on Monte Carlo simulation. We provide general guidelines for application, illustrate the method in several standard cases where the answer seems obvious, and then apply it to some nonstandard settings.
Keywords: Bayesian analysis, Computationally intensive methods, Effective sample size, Epsilon-information prior, Parametric prior distribution
1. Introduction
A fundamental question in any Bayesian analysis is the amount of information contained in the prior. For many commonly used models, the answer seems straightforward. For example, it can be argued that a beta(a, b) distribution has effective sample size (ESS) a + b. This is based on the fact that a binomial variable Y from a sample of size n with success probability θ following a beta(a, b) prior implies a beta(a + Y, b + n − Y) posterior. In other words, given a sample of size n, the prior sum a + b becomes the posterior sum a + b + n. Thus, saying that a given beta(a, b) prior has ESS m = a + b requires the implicit reasoning that the beta(a, b) may be identified with a beta(c + Y, d + m − Y) posterior arising from a previous beta(c, d) prior having a very small amount of information. A simple way to formalize this is to set c + d = ε for an arbitrarily small value ε > 0 and solve for m = a + b − (c + d) = a + b − ε.
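To make this bookkeeping concrete, here is a minimal numerical sketch in Python (the values are illustrative, not from the text) of the conjugate update that motivates the definition:

```python
# Illustrative values: conjugate beta-binomial updating adds the sample
# size n to the beta pseudo-count total a + b.
a, b = 3.0, 7.0            # prior Be(3, 7), putative ESS = a + b = 10
n, Y = 20, 6               # Y successes observed among n Bernoulli trials
a_post, b_post = a + Y, b + n - Y
assert a_post + b_post == (a + b) + n   # prior total 10 becomes posterior total 30
```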
More generally, one may match a given prior p(θ) with the posterior qm(θ|Y) arising from an earlier prior q0(θ) that is chosen to be vague in a suitable sense and that was updated by a sample of size m, and consider m to be the ESS of p(θ). In this general formulation, p(θ), q0(θ), and qm(θ|Y) play roles analogous to those of the beta(a, b), beta(c, d), and beta(a + Y, b + n − Y) distributions given above. In some cases one may find the hyperparameters of qm(θ|Y) as a function of m, compare qm(θ|Y) with p(θ), and solve for m analytically. For many parametric Bayesian models, however, this analytic approach does not work, and it is not obvious how to determine the ESS of the prior. A simple example is the usual normal linear regression model where the observed response variable Y for predictor X has mean β0 + β1X and variance σ2, so that θ = (β0, β1, σ2). A traditional, technically convenient prior is that (β0, β1) is bivariate normal and σ2 is inverse chi-squared, with hyperparameters chosen either for computational convenience or by elicitation. In either case, there is no obvious answer to the question of what the ESS of the prior may be. Moreover, for many commonly used choices of q0(θ), the joint prior p(θ) cannot be matched with qm(θ|Y) analytically.
Understanding the prior ESS is important when applying Bayesian methods in settings with a small to moderate sample size. For example, when fitting a Bayesian model to a data set of 10 observations, an a priori ESS of 1 is reasonable, whereas a prior ESS of 20 implies that the prior, rather than the data, dominates posterior inferences. If the prior is elicited from a domain expert, then an informative prior is desirable (Chaloner and Rhame, 2001; Garthwaite, Kadane, and O’Hagan, 2005). In contrast, if the prior is only a technically convenient ad hoc choice, as is often the case in practice, then understanding the ESS may prompt the investigator to reconsider the prior choice. Thus, it is important to have a good idea of the prior’s ESS when interpreting one’s inferences. This is especially important from the viewpoint of defending Bayesian methods against the concern that the prior may inappropriately introduce artificial information.
In this article, we present a definition for the ESS of a prior p(θ) in a Bayesian parametric model, and we provide methods for computing the ESS in a wide variety of settings. Our approach relies on the idea of constructing an “ε-information” prior q0(θ), considering a sample Y of size m and the posterior qm(θ|Y), and computing a distance between qm(θ|Y) and p(θ) in terms of the curvature (second derivatives) of log {p(θ)} and log {qm(θ|Y)}. The value of m minimizing the distance is the prior ESS. For cases where the distance cannot be computed analytically, we provide a numerical approximation based on Monte Carlo simulations from qm(θ|Y). In cases where θ is multivariate, one may compute multiple ESSs, one associated with each of several subvectors of θ.
Section 2 presents a motivating application and defines ε-information priors and ESS. Computational methods are presented in Section 3. Section 4 gives guidelines for using ESS computations in specific settings. Applications are described in Sections 5 and 6, including discussions of connections between our proposed procedures and related methods given by Spiegelhalter, Freedman, and Parmar (1994), Ibrahim and Chen (2000), Hodges and Sargent (2001), and Spiegelhalter et al. (2002). We close with a brief discussion in Section 7.
2. Effective Sample Size
The following example illustrates why it may be useful to determine the ESS of a prior. We consider a design for a phase I trial to determine an optimal dose combination X = (X1, X2) of two cytotoxic agents (Thall et al., 2003). The toxicity probability at X is given by the six-parameter model
π(X, θ) = {α1·X1^β1 + α2·X2^β2 + α3·(X1^β1·X2^β2)^β3} / {1 + α1·X1^β1 + α2·X2^β2 + α3·(X1^β1·X2^β2)^β3}, (1)
where all parameters in θ = (α1, β1, α2, β2, α3, β3) are nonnegative. Under this model, if only agent 1 is administered at dose X1, with X2 = 0, as in a single-agent phase I trial, then π(X, θ) reduces to a function π1(X1, θ1) that depends only on X1 and θ1 = (α1, β1). Similarly, if X1 = 0, then π(X, θ) reduces to π2(X2, θ2), depending only on X2 and θ2 = (α2, β2). The parameters θ3 = (α3, β3) characterize interactions that may occur when the two agents are used in combination. The model parameter vector thus is partitioned as θ = (θ1, θ2, θ3). Because phase I trials of combinations generally require that each agent be previously tested alone, it is natural to obtain informative priors on θ1 and θ2, but to assume a vague prior on θ3. Denoting by Ga(a, b) the gamma distribution with mean a/b and variance a/b², the elicitation process (Thall et al., 2003, Section 3) yielded the priors α1 ~ Ga(1.74, 4.07), β1 ~ Ga(10.24, 1.34) for the effects of agent 1 alone, α2 ~ Ga(2.32, 5.42), β2 ~ Ga(15.24, 1.95) for the effects of agent 2 alone, and α3 ~ Ga(0.33, 0.33), β3 ~ Ga(0.0008, 0.0167) for the interaction parameters.
Because doses must be selected sequentially in phase I trials based on very small amounts of data, an important question is: Which ESS may be associated with the prior? Our proposed methods (Section 5, below) show that the overall ESS of this prior is m = 1.5. However, because informative priors on θ1 and θ2 were obtained while a vague prior on θ3 was desired, it also is important to determine the prior ESS of each subvector. Applying our proposed methods yielded prior ESSs m1 = 547.3 for θ1, m2 = 756.8 for θ2, and m3 = 0.01 for θ3. The small value for m3 confirms that the prior on θ3 reflects little information about the interaction of the two agents. The large numerical discrepancy between m = 1.5 and (m1, m2) = (547.3, 756.8) is desirable. It reflects the fact that, for each i = 1, 2, θi has a very different meaning in the submodel πi(Xi, θi) parameterized by θi alone versus its meaning in the full six-parameter model π(X, θ). See, for example, Berger and Pericchi (2001). From a geometric viewpoint, if π(X, θ) is thought of as a response surface varying as a function of the two-dimensional dose (X1, X2), because the edges of the surface correspond to the submodels π1(X1, θ1) where X2 = 0 and π2(X2, θ2) where X1 = 0, the large values of m1 and m2 indicate that the locations of the edges were well known, whereas the small overall ESS m = 1.5 says that otherwise very little was known about the surface. In practice, one would report m1, m2, m3, and m to the clinician from whom the priors were elicited. The clinician could then judge whether m1 and m2 are reasonable characterizations of his/her prior information about the single agents, and compare m to the trial's sample size. In the motivating application, a trial of gemcitabine and cyclophosphamide for advanced cancer, the large values of m1 and m2 were appropriate because there was substantial clinical experience with each single agent, and the small overall ESS also was appropriate because no clinical data on the two agents used together were available and a sample size of 60 patients was planned.
This example illustrates four key features of our proposed method, namely, that (1) ESS is a readily interpretable index of a prior’s informativeness, (2) it may be useful to compute ESSs for both the entire parameter vector and for particular subvectors, (3) ESS values may be used as feedback in the elicitation process, and (4) even when standard distributions are used, it may not be obvious how to define a prior’s ESS.
The intuitive motivation for the following construction is to mimic the rationale, given in Section 1, for why the ESS of a beta(a, b) prior equals a + b. As a general Bayesian framework, let f(Y|θ) denote the probability density function (pdf) of an s-dimensional random vector Y, and let p(θ|θ̃) be the prior on the parameter vector θ = (θ1, …, θd), where θ̃ denotes the vector of hyperparameters. The likelihood of an independent and identically distributed (i.i.d.) sample Ym = (Y1, …, Ym) is then given by fm(Ym|θ) = ∏_{i=1}^{m} f(Yi|θ).
We define an ε-information prior q0(θ|θ̃0) by requiring it to have the same mean, Eq0(θ) = Ep(θ), and the same correlations, Corrq0(θj, θj′) = Corrp(θj, θj′) for j ≠ j′, as p(θ|θ̃), while inflating the variances of the elements of θ so that Varq0(θj) ≫ Varp(θj); in this way q0(θ|θ̃0) carries little information, although Varq0(θj) must still exist for j = 1, …, d. Table 1 illustrates how to specify q0(θ|θ̃0) for several standard parametric priors. Given the likelihood fm(Ym|θ) and ε-information prior q0(θ|θ̃0), we denote the posterior by qm(θ|θ̃0, Ym) ∝ q0(θ|θ̃0)fm(Ym|θ) and the marginal distribution under p(θ|θ̃) by
Table 1.
Examples of ε-information prior distributions. The hyperparameters c, c1, and c2 are very large constants chosen to inflate the variances of the elements of θ under q0.
| d | Distribution | p(θ|θ̃) | q0(θ|θ̃0) |
|---|---|---|---|
| 1 | Beta | Be(α̃, β̃) | Be(α̃/c, β̃/c) |
| 1 | Gamma | Ga(α̃, β̃) | Ga(α̃/c, β̃/c) |
| 1 | Univariate normal with known variance | N(μ̃, σ̃²) | N(μ̃, cσ̃²) |
| 1 | Scaled inverse χ² | Inv-χ²(ν̃, σ̃²) | Inv-χ²(4 + c⁻¹, ν̃σ̃²/{2(ν̃ − 2)}) |
| 2 | Normal inverse χ² | N(μ̃, σ²/φ̃) × Inv-χ²(ν̃, σ̃²) | N(μ̃, cσ²/φ̃) × Inv-χ²(4 + c⁻¹, ν̃σ̃²/{2(ν̃ − 2)}) |
| 3 | Dirichlet | Dir(α̃1, α̃2, α̃3) | Dir(α̃1/c, α̃2/c, α̃3/c) |
| 3 | Multivariate normal | N(μ̃, Σ̃) | N(μ̃, cΣ̃) |
fm(Ym|θ̃) = ∫ fm(Ym|θ) p(θ|θ̃) dθ. (2)
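To make the variance-inflation recipe of Table 1 concrete, the following minimal Python sketch (the helper names are ours, not from the article) constructs q0 for three of the rows; in each case the mean of p(θ|θ̃) is preserved while the variance is inflated using the large constant c:

```python
c = 10_000  # a very large constant, as in Table 1

def eps_info_beta(a, b):
    # Be(a/c, b/c): the mean a/(a + b) is unchanged, while the variance
    # grows toward its maximum value a*b/(a + b)**2 given that mean.
    return a / c, b / c

def eps_info_gamma(a, b):
    # Ga(a/c, b/c): the mean a/b is unchanged; the variance a*c/b**2 is
    # c times the variance of Ga(a, b).
    return a / c, b / c

def eps_info_normal(mu, sig2):
    # N(mu, c*sig2): same mean, variance inflated by the factor c.
    return mu, c * sig2
```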
When θ̃ is fixed we write fm(Ym) for brevity. To define the ESS(s), consider the following three cases based on p(θ|θ̃). For implementation, we find it useful to distinguish between these cases although, formally, cases 1 and 2 are special instances of case 3.
- Case 1. d = 1, with p(θ|θ̃) being a univariate parametric model. For this case, we will define one ESS. Examples include the beta, gamma, univariate normal with known variance, and inverse χ² distributions.
- Case 2. d ≥ 2, with p(θ|θ̃) being a d-variate parametric model. For this case, we will define one ESS. Examples include the Dirichlet and multivariate normal (MVN) distributions.
The following case deals with settings where it is scientifically appropriate to define two or more ESSs for p(θ|θ̃).
- Case 3. d ≥ 2, with p(θ|θ̃) written as a product of K parametric distributions, p(θ|θ̃) = ∏_{k=1}^{K} pk(θk|θ̃k, θ1, …, θk−1), where θ = (θ1, …, θK) is partitioned into K subvectors, 1 < K ≤ d. In this case, a vector of K ESSs, one for each subvector, may be meaningful. An example is a normal inverse χ² distribution where (θ1, θ2) = (σ², μ), the variance and mean of a normal sampling model, with p(θ1, θ2) = p1(σ²)p2(μ|σ²), σ² ~ Inv-χ²(ν̃, σ̃²), and μ|σ² ~ N(μ̃, σ²/φ̃). Here K = d = 2 and the two subvectors of θ are the single parameters σ² and μ. We will discuss other examples in Sections 4 and 5.
To define the distance between p(θ|θ̃) and qm(θ|θ̃0, Ym) in cases 1 and 2, the basic idea is to find the sample size, m, that would be implied by normal approximations of the prior p(θ) and the posterior qm(θ|θ̃0, Ym). This led us to use the second derivatives of the log densities to define the distance. The real validation and justification of our definition, however, comes from comparing the resulting ESS values with the commonly reported ESS in standard settings. We carry out these comparisons in Section 5.
Let θ̄ = Ep(θ) denote the prior mean under p(θ|θ̃). We define

Dp,j(θ) = −∂² log{p(θ|θ̃)}/∂θj², j = 1, …, d,

and

Dq,j(m, θ, Ym) = −∂² log{qm(θ|θ̃0, Ym)}/∂θj², j = 1, …, d.

Denote Dp,+(θ̄) = Σ_{j=1}^{d} Dp,j(θ̄) and Dq,+(m, θ̄) = Σ_{j=1}^{d} ∫ Dq,j(m, θ̄, Ym) fm(Ym) dYm. We define the distance between p(θ|θ̃) and qm(θ|θ̃0, Ym) for sample size m as the absolute difference of the traces of the two information matrices,

δ(m, θ̄, p, q0) = | Dp,+(θ̄) − Dq,+(m, θ̄) |. (3)
That is, we define the distance in terms of the trace of the information matrix (second derivative of the log density) of the prior p(θ|θ̃), and the expected information matrix of the posterior qm(θ|θ̃0, Ym), where the expectation is with respect to the marginal fm(Ym). When d = 1, because the “+” subscript is superfluous, we write Dp(θ̄) and Dq(m, θ̄).
Definition 1
The ESS of p(θ|θ̃) with respect to the likelihood fm(Ym|θ) is the integer m that minimizes the distance δ(m, θ̄, p, q0).
Algorithm 1, below, will generalize this to allow noninteger-valued m. An essential point is that the ESS is defined as a property of a prior and likelihood pair, so that, for example, a given prior might have two different ESS values in the context of two different likelihoods.
The definition of the distance (3) involves some arbitrary choices. We chose this definition after an extensive empirical investigation (not shown) of alternative formulations. Instead of evaluating the curvature at the prior mean, one could use the prior mode. Similarly, one could average over Ym with respect to fm(Ym|θ̄) rather than the marginal fm(Ym), or use the determinant rather than the trace of the information matrix. One also could define δ(·) in terms of Kullback–Leibler divergence, or variances. We investigated all of these alternatives, evaluated the resulting ESS in each of several standard cases, and found that the proposed distance (3) was best at matching the results that are commonly used as ESS values.
For case 3, a more general definition is required. A motivating example is the logistic regression model, logit{π(X, θ)} = β0 + β1X1 + β2X2, where d = 3, θ = (β0, β1, β2), and βj ~ N(μ̃j, σ̃j²) independently for j = 0, 1, 2. In this case, the subvectors of interest are θ1 = β0 and θ2 = (β1, β2), so two ESS values, m1 and m2, may be computed. To accommodate case 3, we generalize (3) by defining a set of K subvector-specific distances. Let γk be the set of indices of the elements of θ belonging to θk, and denote Dᵏp,+(θ̄) = Σ_{j∈γk} Dp,j(θ̄) and Dᵏq,+(m, θ̄) = Σ_{j∈γk} ∫ Dq,j(m, θ̄, Ym) fm(Ym) dYm. For each k = 1, …, K, we define the distance between pk(θk|θ̃k, θ1, …, θk−1) and qm,k(θk|θ̃0,k, Ym, θ1, …, θk−1) to be

δk(m, θ̄, p, q0) = | Dᵏp,+(θ̄) − Dᵏq,+(m, θ̄) |. (4)
Definition 2
Assume p(θ|θ̃) as in case 3. Let mk = arg minₘ δk(m, θ̄, p, q0). We define (m1, …, mK) to be the ESSs of the prior p(θ|θ̃) with respect to the model fm(Ym|θ) and the partition θ = (θ1, …, θK).
3. Computational Methods
Let θ̄ = (θ̄1, …, θ̄d) denote the prior mean vector. With the following algorithms, we generalize Definitions 1 and 2 to allow noninteger ESS values.
Algorithm 1, for cases 1 and 2
Let M be a positive integer chosen so that, initially, it is reasonable to assume that m ≤ M.
- Step 1. Specify q0(θ|θ̃0).
- Step 2. For each m = 0, …, M, compute δ(m, θ̄, p, q0).
- Step 3. The ESS is the interpolated value of m minimizing δ(m, θ̄, p, q0).
In practice, step 2 is carried out either analytically or using the simulation-based numerical approximation described later in this section.
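As an illustration, the following Python sketch (function name ours) carries out steps 1–3 of Algorithm 1 for the beta/Bernoulli model treated in Example 1 below, where δ(m, θ̄, p, q0) has a closed form because the posterior curvature is linear in Y and E(Y) = mθ̄ under the beta-binomial marginal fm(Ym):

```python
import numpy as np

def ess_beta_bernoulli(a, b, M=100, c=10_000):
    theta = a / (a + b)                       # plug-in value: the prior mean
    Dp = (a - 1) / theta**2 + (b - 1) / (1 - theta)**2
    m = np.arange(M + 1)                      # step 2: delta(m) on a grid
    EY = m * theta                            # E(Y) under the marginal f_m(Y_m)
    Dq = ((a / c + EY - 1) / theta**2
          + (b / c + m - EY - 1) / (1 - theta)**2)
    return m[np.argmin(np.abs(Dp - Dq))]      # step 3 (grid minimizer)

print(ess_beta_bernoulli(3, 7))               # 10 = a + b, as in Example 1
```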
Algorithm 2, for case 3
For each k = 1, …, K, let Mk be a positive integer chosen so that, initially, it is reasonable to assume that mk ≤ Mk.
- Step 1. Specify q0(θ|θ̃0) = ∏_{k=1}^{K} q0,k(θk|θ̃0,k, θ1, …, θk−1).
- Step 2. For each k = 1, …, K and mk = 0, …, Mk, compute δk(mk, θ̄, p, q0).
- Step 3. The ESS of θk is the interpolated value of mk minimizing δk(mk, θ̄, p, q0).
If the hyperparameter θ̃ of p(θ|θ̃) includes a degrees-of-freedom (df) parameter ν̃, as with an inverse χ², inverse gamma, inverse Wishart, or t distribution, then the corresponding hyperparameter of q0(θ|θ̃0) is ν̃0 = ν̃min + ε, where ν̃min is the smallest integer ensuring that the second moments of θ ~ q0(θ|θ̃0) exist and ε > 0 is arbitrarily small. In such cases, we add Dq,+(ν̃min, θ̄) − Dq,+(0, θ̄) to Dq,+(m, θ̄), and add Dᵏq,+(ν̃min, θ̄) − Dᵏq,+(0, θ̄) to Dᵏq,+(m, θ̄), to ensure that ESS > ν̃min.
For each m = 1, …, M, when ∫ Dq,j(m, θ̄, Ym) fm(Ym) dYm cannot be computed analytically, we use the following simulation-based approximation. Given θ̄ = Ep(θ), we first simulate a Monte Carlo sample θ⁽¹⁾, …, θ⁽ᵀ⁾ from p(θ|θ̃) for large T, for example, T = 100,000. For each t = 1, …, T, simulate YM⁽ᵗ⁾ = (Y1⁽ᵗ⁾, …, YM⁽ᵗ⁾) from fM(YM|θ⁽ᵗ⁾). Use the Monte Carlo average T⁻¹ Σt Dq,j(m, θ̄, Ym⁽ᵗ⁾) in place of ∫ Dq,j(m, θ̄, Ym) fm(Ym) dYm. For case 3, the same method is used to evaluate Dᵏq,+(m, θ̄) in (4).
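The scheme just described can be coded generically; the following Python sketch (names and structure are ours, not from the article) accepts user-supplied samplers for p(θ|θ̃) and fm(Ym|θ) together with the posterior-curvature function evaluated at θ̄:

```python
import numpy as np

def mc_expected_curvature(m, sample_prior, sample_data, Dq_at_theta_bar,
                          T=10_000, seed=0):
    # Approximates the integral of D_q(m, theta_bar, Y_m) against f_m(Y_m):
    # draw theta^(t) ~ p(theta | theta_tilde), then Y_m^(t) ~ f_m(. | theta^(t)),
    # and average. (The article uses T on the order of 100,000.)
    rng = np.random.default_rng(seed)
    vals = np.empty(T)
    for t in range(T):
        theta_t = sample_prior(rng)
        Ym_t = sample_data(rng, m, theta_t)
        vals[t] = Dq_at_theta_bar(m, Ym_t)
    return vals.mean()
```

For instance, for the beta/Bernoulli model one would pass sample_prior = lambda rng: rng.beta(3, 7), sample_data = lambda rng, m, th: rng.binomial(1, th, size=m), and a Dq_at_theta_bar implementing the posterior curvature at θ̄; the result then approximates the expectation used in step 2 of Algorithm 1.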
For regression models of Y as a function of a u-dimensional predictor X, we extend Definition 1 by augmenting the regression model with a probability distribution gm(Xm|ξ) for the covariates, with prior r(ξ|ξ̃), usually assuming independence, gm(Xm|ξ) = ∏_{i=1}^{m} g(Xi|ξ). Then we define the distance as in (3), with Dq,+(m, θ̄) obtained by averaging Dq,+(m, θ̄, Ym, Xm) over both Ym and Xm under the implied marginal distribution.
In this case, we simulate θ⁽¹⁾, …, θ⁽ᵀ⁾ from p(θ|θ̃) and ξ⁽¹⁾, …, ξ⁽ᵀ⁾ from r(ξ|ξ̃), then simulate each XM⁽ᵗ⁾ from gM(XM|ξ⁽ᵗ⁾) and Yi⁽ᵗ⁾ from f(Yi|θ⁽ᵗ⁾, Xi⁽ᵗ⁾) for each i = 1, …, M, to obtain (YM⁽ᵗ⁾, XM⁽ᵗ⁾). Finally, we compute the Monte Carlo average T⁻¹ Σt Dq,j(m, θ̄, Ym⁽ᵗ⁾, Xm⁽ᵗ⁾). For case 3, the same method is used to evaluate Dᵏq,+(m, θ̄) in (4).
4. Guidelines for Application
Before illustrating how the above methods for computing ESS may be applied in particular cases, we provide general guidelines for using ESS values in some commonly encountered settings of Bayesian inference.
Prior elicitation. When eliciting a prior from an area expert, ESS values may be provided as a readily interpretable form of feedback. The area expert may use this as a basis to modify his/her judgments, if desired, and this process may be iterated. For example, in the motivating example of Section 2, we would report the ESS values m1 = 547 and m2 = 756 to the investigator planning the study. If his/her prior were based on earlier single-agent trials with around 100 patients each, (s)he would be prompted to revise the replies to the prior elicitation questions.
Formalizing uninformative priors. Often an investigator wishes to formalize vague prior information. The ESS can be used to confirm that the chosen prior carries little information, as desired. For example, in the motivating example of Section 2, the reported ESS m3 = 0.01 for the interaction parameters confirms that this prior is vague.
Reviewing others’ analyses. When interpreting or formally reviewing a Bayesian data analysis, the ESS of the analyst’s prior provides a tool for evaluating the reasonableness of the analysis. In particular, if it is claimed that a vague or uninformative prior was used, the ESS provides an objective index to evaluate this claim. If appropriate, one may alert the analyst if a prior appears to be overly informative. Similarly, if an informative prior based on historical data is used in the analysis, reporting the ESS enables the reviewer to verify that the prior data are given appropriate weight.
Sensitivity analyses. In performing a conventional Bayesian sensitivity analysis in which prior parameters are varied and corresponding posterior values of interest are computed, the ESS of each prior may be computed to enhance interpretation of this analysis. The ESS itself may be used as an index of prior informativeness in such a sensitivity analysis.
Designing outcome-adaptive experiments. When formulating a prior as part of a Bayesian model to be used in a sequentially outcome-adaptive experiment, the ESS may be used to calibrate the prior to ensure that the data, rather than the prior, will dominate early decisions during the trial.
Reviewing Bayesian designs. When interpreting or formally reviewing a Bayesian design, such as that given in a clinical trial protocol, the ESS of the prior provides a tool for determining the extent to which the prior may influence the design’s decisions. Currently, an important reservation about using Bayesian inference in a regulatory environment, such as the planning of clinical trial protocols, is the difficulty of evaluating and judging the appropriateness of prior distributions in complex probability models. The ESS provides a useful tool to mitigate such concerns.
5. Validation with Standard Models
We validate the proposed definition of ESS by computing the implied sample sizes in standard models (Table 2) for which commonly reported prior-equivalent sample sizes exist. Following Gelman et al. (2004), we denote Be(α, β), Bin(n, θ), Ga(α, β), Exp(θ), N(μ, σ2), Inv χ2(ν, s2), Dir(α1, …, αJ), Mn(n, θ1, …, θJ), and BeBin(n, α, β) for the beta, binomial, gamma, exponential, normal, scaled inverse χ2, Dirichlet, multinomial, and beta-binomial distributions. The corresponding ε-information priors are given in Table 1. For each model in Table 2, the reported ESS matches the obvious choice.
Table 2.
Prior, likelihood, and corresponding posterior qm with respect to the ε-information prior q0, and traditionally reported prior effective sample size, ESS, for some common models. In line three, (ν̃0, σ̃0²) = (4 + c⁻¹, ν̃σ̃²/{2(ν̃ − 2)}) are the q0 hyperparameters from Table 1, and we denote σ̂m² = m⁻¹ Σi Yi².

| p(θ|θ̃) | f(Ym|θ) | qm(θ|θ̃0, Ym) | ESS |
|---|---|---|---|
| Be(α̃, β̃) | Bin(n, θ) | Be(c⁻¹α̃ + Y, c⁻¹β̃ + m − Y) | α̃ + β̃ |
| Ga(α̃, β̃) | Exp(θ) | Ga(c⁻¹α̃ + m, c⁻¹β̃ + Σ Yi) | α̃ |
| Inv-χ²(ν̃, σ̃²) | N(0, σ²) | Inv-χ²(ν̃0 + m, (ν̃0σ̃0² + mσ̂m²)/(ν̃0 + m)) | ν̃ |
| Dir(α̃) | Mn(n, θ) | Dir(c⁻¹α̃ + S) | Σ α̃j |
Example 1. Beta/binomial model
For the beta/binomial model, Dp(θ̄) = (α̃ − 1)/θ̄² + (β̃ − 1)/(1 − θ̄)² and Dq(m, θ̄, Ym) = (α̃/c + Y − 1)/θ̄² + (β̃/c + m − Y − 1)/(1 − θ̄)², where fm(Ym) = BeBin(m, α̃, β̃) and θ̄ = Ep(θ) = α̃/(α̃ + β̃). Figure 1 shows a plot of δ(m, θ̄, p, q0) against m in the case θ̃ = (α̃, β̃) = (3, 7). Using θ ~ Be(3, 7), the computed ESS is 10, matching the commonly reported ESS in this case. Analogous plots (not shown) in all other cases examined below are very similar in appearance to Figure 1.
Figure 1.
Plot of δ(m, θ̄, p, q0) against m for the beta/binomial model with θ̃ = (α̃, β̃) = (3, 7).
Example 2. Gamma/exponential model
δ(m, θ̄, p, q0) = |(α̃ − 1)θ̄⁻² − (α̃/c + m − 1)θ̄⁻²|, where θ̄ = α̃/β̃, and the ESS is found analytically to be α̃, as desired.
Example 3. Univariate normal with known variance
For Y|θ ~ N(θ, σ²) with σ² known and prior θ|θ̃ ~ N(μ̃, σ̃²), so that θ̃ = (μ̃, σ̃²), one may compute analytically Dp(θ) = −∂²log{p(θ|θ̃)}/∂θ² = 1/σ̃², and similarly, Dq(m, θ̄) = m/σ². Thus, δ(m, θ̄, p, q0) = |1/σ̃² − m/σ²|, so the ESS = σ²/σ̃², the ratio of the known variance in the likelihood to the prior variance of θ. In applying this model to a clinical trial setting where θ is the difference between two treatment effects, Spiegelhalter et al. (1994, Section 3.1.2) propose assuming that σ̃² = σ²/n0 to obtain a prior that "… is equivalent to a normalized likelihood arising from a (hypothetical) trial of n0 patients with an observed value μ̄ of the treatment difference statistic." Thus, in this case, the two methods for defining prior ESS agree.
Example 4. Inverse χ2/normal model
We find analytically that Dp(θ) = −(σ²)⁻²(ν̃ + 2)/2 + (σ²)⁻³ν̃σ̃², whereas ∫ Dq(m, θ̄, Ym) fm(Ym) dYm is obtained by simulation. As explained in Section 3, the adjustment factor {Dq(4, θ̄) − Dq(0, θ̄)} is added to Dq(m, θ̄). For θ̃ = (ν̃, σ̃²) = (20, 1), ESS = 20 = ν̃, as desired.
Example 5. Dirichlet/multinomial model
From Table 1, denote θ̃ = (α̃1, …, α̃J), θ = (θ1, …, θJ), and S = (S1, …, SJ), where Sj counts the observations falling in category j. Compute Dq,j(m, θ) analytically, as with the beta/binomial. For d = 3 and θ̃ = (10, 15, 25), ESS = 50 = Σ α̃j, as desired.
Example 6. Power priors
Ibrahim and Chen (2000) propose a class of "power priors" based on an initial prior p0(θ|c0), a likelihood L(θ|D0) of historical data D0, and a scalar prior parameter a0. The power prior is p(θ|D0, a0) ∝ L(θ|D0)^a0 p0(θ|c0), so that a0 weights the historical data relative to the data that will be obtained in the future. To see how one would compute the ESS of a power prior, consider the beta/binomial model with a beta(1, 1) initial prior and D0 consisting of three successes in 10 historical trials. The power prior is p(θ|D0, a0) = p(θ|(3, 10), a0) ∝ {θ³(1 − θ)⁷}^a0, the kernel of a beta(3a0 + 1, 7a0 + 1), and it follows easily (case 1) that ESS = 10a0 + 2. More generally, the ESS of p(θ|D0, a0) is a0 × ESS{L(θ|D0)} + ESS{p0(θ|c0)}, the weighting parameter times the ESS of the historical data likelihood treated as a function of θ, plus the ESS of the initial prior.
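A quick numeric check of this example, as a sketch using the values quoted above: the power prior is the kernel of a Be(3a0 + 1, 7a0 + 1), so its beta ESS a + b equals 10a0 + 2.

```python
# Power prior from Example 6: L(theta | D0) = theta**3 * (1 - theta)**7,
# initial prior Be(1, 1); the result is a Be(3*a0 + 1, 7*a0 + 1) kernel.
for a0 in (0.5, 1.0, 2.0):
    a, b = 3 * a0 + 1, 7 * a0 + 1
    print(a0, a + b)    # prints 7.0, 12.0, 22.0, i.e., 10*a0 + 2
```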
Hodges and Sargent (2001) derive a formula for the effective degrees of freedom (EDF) of a richly parameterized model, and illustrate this for a balanced one-way normal linear random effects model for Nn observations {Yij, i = 1, …, N, j = 1, …, n}, given by the likelihood Yi1, …, Yin|θi, σ2 ~ i.i.d. N(θi, σ2) for each i, and prior θ1, …, θN|μ̃, σ̃2 ~ i.i.d. N(μ̃, σ̃2). They show that the EDF for this model is ρ = (nN + φ)/(n + φ), where φ = σ2/σ̃2, the ratio of the residual variance and the prior variance. Recall from Example 3 that φ is the ESS of the simple normal model with known variance. In the limiting case with φ → ∞, that is, all θi are equal, θi = μ, we find ρ = 1. In other words, for large ESS and essentially only one group, Hodges and Sargent report ρ ≈ 1. At the other extreme, for φ → 0, that is, for small ESS and θi’s very different from each other, they report ρ ≈ N. However, such comparisons should not be overinterpreted. EDF and ESS are quite different summaries. Formally, the EDF is a function of the sample size n. In contrast, ESS is not a function of n. Rather, it reports an equivalent sample size for the given model.
Using an information-theoretic argument, Spiegelhalter et al. (2002) also derive a measure of the effective number of parameters in complex models, such as generalized linear (mixed effects) models: pD, defined as the difference between the posterior mean of the deviance and the deviance evaluated at the posterior means of the parameters of interest. But, like the EDF ρ, pD is of a different nature than the proposed ESS. Formally, pD is a function of the data, whereas the ESS is not.
6. Application to Some Nonstandard Cases
The following examples show how ESS values may be computed in settings where no commonly agreed-upon ESS exists, using the numerical approximations described earlier to obtain δ(m, θ̄, p, q0).
Example 7. Logistic regression
Thall and Lee (2003) use a logistic regression model to determine a maximum tolerable dose in a phase I clinical trial. Each patient receives one of six doses 100, 200, 300, 400, 500, 600 mg/m², denoted by x(1), …, x(6), with standardized doses X(z) = log x(z) − (1/6) Σj log x(j), so that the X(z) are centered at 0. The outcome variable is the indicator Yi = 1 if patient i suffers toxicity, 0 if not. A logistic model π(Xi, θ) = Pr(Yi = 1|Xi, θ) = logit⁻¹{η(Xi, θ)} with η(Xi, θ) = μ + βXi is assumed, where logit⁻¹(x) = eˣ/(1 + eˣ). Hence d = 2, θ = (θ1, θ2) = (μ, β), and the likelihood for m patients is

fm(Ym|θ, Xm) = ∏_{i=1}^{m} π(Xi, θ)^{Yi} {1 − π(Xi, θ)}^{1−Yi}.
Thall and Lee (2003) obtained independent normal priors for μ and β, based on elicited prior means of π(X, θ) at the doses x(2) = 200 and x(5) = 500, and setting σ̃μ = σ̃β = 2 based on preliminary sensitivity analyses, which yielded μ ~ N(μ̃μ, σ̃μ²) with μ̃μ = −0.1313 and β ~ N(μ̃β, σ̃β²) with μ̃β = 2.3980. For this application, Algorithms 1 and 2 may be applied to compute one ESS of p(θ|θ̃) and two ESSs mμ and mβ of the priors for μ and β, as follows. For step 1, specify q0(θ|θ̃0) = N(μ̃μ, cσ̃μ²) × N(μ̃β, cσ̃β²), with c = 10,000. Next, compute Dp,1(θ) = 1/σ̃μ², Dp,2(θ) = 1/σ̃β², and

Dq,1(m, θ, Ym, Xm) = 1/(cσ̃μ²) + Σ_{i=1}^{m} π(Xi, θ){1 − π(Xi, θ)},
Dq,2(m, θ, Ym, Xm) = 1/(cσ̃β²) + Σ_{i=1}^{m} Xi² π(Xi, θ){1 − π(Xi, θ)}.

Because Dq,1(m, θ, Ym, Xm) and Dq,2(m, θ, Ym, Xm) depend on Xm but not on Ym, this simplifies the simulation method given in Section 3. We assume a uniform distribution on the six doses for the probability model g(Xi|ξ). Draw Xi⁽ᵗ⁾ independently from {X(1), …, X(6)} with probability 1/6 each, for t = 1, …, 100,000. Then, using the plug-in vector θ̄ = (μ̄, β̄) = (μ̃μ, μ̃β), compute δ(m, θ̄, p, q0) for each m = 0, …, M, δ1(mμ, θ̄, p, q0) for each mμ = 0, …, M1, and δ2(mβ, θ̄, p, q0) for each mβ = 0, …, M2. As shown in Table 3, m = 2.3, mμ = 1.4, and mβ = 6.3.
Because the standardized doses Xi were defined to be centered at 0, one may interpret mμ as the ESS for the prior on the average effect, and mβ as the ESS for the dose effect. The prior indicates greater knowledge about the effects of the doses than about the average response. Because m = 2.3, after enrolling 3 patients, the information from the likelihood starts to dominate the prior, as desired.
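The computation just described reduces to a small amount of code. The following Python sketch is our reconstruction (variable names ours; the dose grid, the centering of the standardized doses, the prior means, and σ̃μ = σ̃β = 2 are taken from the text and Table 3), and it reproduces the reported values up to Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(0)
doses = np.array([100, 200, 300, 400, 500, 600])
x = np.log(doses) - np.log(doses).mean()      # standardized doses, centered at 0
mu_bar, beta_bar = -0.1313, 2.3980            # plug-in theta_bar = prior means
s2 = 2.0 ** 2                                 # prior variance for both mu and beta
c, T = 10_000, 100_000

pi = 1.0 / (1.0 + np.exp(-(mu_bar + beta_bar * x)))
w = pi * (1.0 - pi)                           # Bernoulli information at theta_bar

idx = rng.integers(0, 6, size=T)              # X^(t) uniform over the six doses
e1 = w[idx].mean()                            # E[pi(1 - pi)] per patient
e2 = (x[idx] ** 2 * w[idx]).mean()            # E[X^2 pi(1 - pi)] per patient

def ess(Dp, eps, per_patient):
    # D_q(m) = eps + m * per_patient is linear in m, so the minimizer of
    # |D_p - D_q(m)| has the interpolated closed form below.
    return (Dp - eps) / per_patient

m_mu = ess(1 / s2, 1 / (c * s2), e1)
m_beta = ess(1 / s2, 1 / (c * s2), e2)
m = ess(2 / s2, 2 / (c * s2), e1 + e2)
print(round(m, 1), round(m_mu, 1), round(m_beta, 1))   # approx 2.3 1.4 6.3
```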
As a sensitivity analysis, Table 3 summarizes corresponding results for σ̃² ranging from 0.5² to 5.0². As a basis for comparison, we also include the ESS at each dose obtained by the crude method of equating the mean and variance of π(X(z), θ) at each dose to the corresponding moments of a beta distribution, E(θ) = α̃/(α̃ + β̃) and Var(θ) = E(θ){1 − E(θ)}/(α̃ + β̃ + 1), and solving for α̃ + β̃. We denote by m̄ the average of the ESSs mX(1), …, mX(6) at the six doses obtained in this way. The results indicate that the crude method provides smaller estimates of the ESS for σ̃² < 5.0².
Table 3.
Comparison of ESSs computed using the proposed method and the crude method that matches first and second moments to a beta distribution, for the logistic regression model π(Xi, θ) = Pr(Yi = 1|Xi, θ) = exp(μ + βXi)/{1 + exp(μ + βXi)}, where the priors are μ ~ N(μ̃μ, σ̃²) with μ̃μ = −0.1313 and β ~ N(μ̃β, σ̃²) with μ̃β = 2.3980. The columns m, mμ, and mβ are from the proposed method; m̄ is the average of the crude per-dose ESSs mX(1), …, mX(6).

| σ̃² | m | mμ | mβ | m̄ | mX(1) | mX(2) | mX(3) | mX(4) | mX(5) | mX(6) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5² | 37.1 | 22.7 | 101.3 | 18.2 | 23.5 | 18.2 | 17.0 | 16.6 | 16.8 | 17.3 |
| 1.0² | 9.3 | 5.7 | 25.3 | 4.5 | 4.1 | 4.7 | 4.8 | 4.6 | 4.4 | 4.2 |
| 2.0² | 2.3 | 1.4 | 6.3 | 1.3 | 1.0 | 1.4 | 1.5 | 1.5 | 1.3 | 1.2 |
| 3.0² | 1.0 | 0.6 | 2.8 | 0.7 | 0.5 | 0.8 | 0.8 | 0.8 | 0.7 | 0.7 |
| 5.0² | 0.4 | 0.2 | 1.0 | 0.4 | 0.3 | 0.4 | 0.4 | 0.4 | 0.4 | 0.3 |
It also is useful to examine how the ESS in this example would vary with a0 if one wished to reweight the prior by replacing it with a power prior {p(θ|θ̃)}^a0. Identifying p(θ|θ̃) with L(θ|D0) in the set-up of Ibrahim and Chen (2000), and considering the additional ESS of an initial prior to be negligible, the ESS may be computed by applying Algorithms 1 and 2 with the ε-information prior set to {q0(θ|θ̃0)}^a0. This yields the values summarized in Table 4. These values illustrate, as in Example 6 given earlier, that the power a0 acts essentially as a multiplier in the ESS domain, aside from the additive ESS of an initial prior.
Table 4.
ESSs for power priors {p(θ|θ̃)}^a0 based on the prior p(θ|θ̃) in the logistic regression example, using hyperparameter values μ̃μ = −0.1313 and μ̃β = 2.3980, as in Table 3, with σ̃² = 2.0²

| a0 | m | mμ | mβ |
|---|---|---|---|
| 0.5 | 1.2 | 0.7 | 3.2 |
| 1 | 2.3 | 1.4 | 6.3 |
| 2 | 4.6 | 2.8 | 12.6 |
| 4 | 9.3 | 5.7 | 25.3 |
Example 8. Two-agent dose–response model
The next example is the one described earlier in Section 2—a design to find acceptable dose combinations of two cytotoxic agents used together in a phase I trial. Recall the definition of π(X, θ) given in equation (1). The likelihood for m patients with toxicity indicators Ym = (Y1, …, Ym) and dose pairs Xm = (X1, …, Xm) is

fm(Ym|θ, Xm) = ∏_{i=1}^{m} π(Xi, θ)^{Yi} {1 − π(Xi, θ)}^{1−Yi}. (5)
Based on (5) and the gamma priors given in Section 2, Algorithm 1 is used to compute one ESS, m, of p(θ|θ̃), and the three ESSs m1, m2, and m3 for θ1, θ2, and θ3 are computed using Algorithm 2. In step 1, with c = 10,000, each elicited gamma prior Ga(α̃, β̃) is replaced by the ε-information prior Ga(α̃/c, β̃/c), following Table 1. In step 2, we computed Dp,+(θ̄) and the subvector terms Dᵏp,+(θ̄) analytically. The numerical methods given in Section 3 give δ(m, θ̄, p, q0) and δk(mk, θ̄, p, q0) for k = 1, 2, 3, yielding the values m = 1.5, m1 = 547.3, m2 = 756.8, and m3 = 0.01, as reported earlier.
Example 9. Linear regression
The last example is a linear regression model used to analyze a small data set (Y1, X1), …, (Y10, X10), where Yi is December rainfall and Xi is November rainfall for 10 consecutive years i = 1, …, 10 (Congdon, 2001). The sampling model is Yi|Xi, θ ~ N(μi, 1/τ) with μi = α + β(Xi − X̄), where τ denotes the precision and X̄ is the sample average of the original predictor, so θ = (θ1, θ2, θ3) = (α, β, τ). Let N(x; m, s) indicate that the random variable x is normally distributed with moments (m, s). In Congdon (2001), an independent prior p(θ) = p1(θ1, θ2|θ̃1, θ̃2) · p2(θ3|θ̃3) is assumed, with p1 = N(θ1; μ̃α, σ̃α²) N(θ2; μ̃β, σ̃β²) and p2 = Ga(ã, b̃). Congdon (2001) uses μ̃α = μ̃β = 0, σ̃α² = σ̃β² = 1000, and ã = b̃ = 0.001. Algorithm 2 was used to compute two ESSs: m1 for p1(θ1, θ2|θ̃1, θ̃2) and m2 for p2(θ3|θ̃3). The plug-in vector is θ̄ = Ep(θ) = (μ̃α, μ̃β, ã/b̃). In step 1, specify q0(θ|θ̃0) = N(θ1; μ̃α, cσ̃α²) N(θ2; μ̃β, cσ̃β²) Ga(ã/c, b̃/c), with c = 10,000. In step 2, compute analytically Dp,1(θ) = 1/σ̃α², Dp,2(θ) = 1/σ̃β², Dp,3(θ) = (ã − 1)τ⁻², Dq,1(m1, θ, Ym1, Xm1) = 1/(cσ̃α²) + m1τ, Dq,2(m1, θ, Ym1, Xm1) = 1/(cσ̃β²) + τ Σi (Xi − X̄)², and Dq,3(m2, θ, Ym2, Xm2) = (ã/c − 1)τ⁻² + m2τ⁻²/2. For this case, Dq,+ depends only on X. Following the methods in Section 3, we simulated covariate vectors Xm⁽ᵗ⁾ for t = 1, …, 100,000 to obtain m1 = 0.001 and m2 = 0.002. We interpret the reported ESSs as evidence of very vague priors. As a sensitivity analysis, we also computed the ESSs of two alternative priors, p′(θ|θ̃) = N(0, 100) N(0, 10) Ga(1, 1) and p″(θ|θ̃) = N(0, 1) N(0, 1) Ga(2, 2), which gave m1 = 0.06 and m2 = 2.0 for p′(θ|θ̃), and m1 = 1.0 and m2 = 4.0 for p″(θ|θ̃).
7. Discussion
The methods proposed in this article are useful in Bayesian analysis, particularly in settings with elicited priors or where the data consist of a relatively small number of observations. By computing ESSs, one may avoid the use of an overly informative prior, in the sense that inference is dominated by the prior rather than the data. As noted in our guidelines for application, other uses of ESS values include interpreting or reviewing others' Bayesian analyses or designs, using the ESS values themselves to perform sensitivity analyses of the prior's informativeness, and calibrating the parameters of outcome-adaptive Bayesian designs.
Extension of our methods to accommodate hierarchical models is not straightforward. This is a potentially important area for future research, because it would be useful to compute ESS values in such settings. Other potential applications involving more complicated problems include mixture priors synthesizing multiple component priors, or the class of ε-contaminated priors, where ε reflects the amount of uncertainty in the prior information (Greenhouse and Wasserman, 1995).
Acknowledgments
Satoshi Morita’s work was supported in part by grant H16-TRANS-003 from the Ministry of Health, Labour, and Welfare in Japan. Peter Thall’s work was partially supported by NCI grant RO1 CA 83932. Peter Müller’s work was partially supported by NCI grant R01 CA 075981. We thank the associate editor and the referee for their thoughtful and constructive comments and suggestions.
Footnotes
The R program to compute ESS values and instructions for using this program are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.
References
- Berger J, Pericchi L. Objective Bayesian methods for model selection: Introduction and comparison (with discussion). In: Lahiri P, editor. Model Selection. IMS Lecture Notes, Monograph Series, Volume 38. Bethesda, Maryland: Institute of Mathematical Statistics; 2001.
- Chaloner K, Rhame FS. Quantifying and documenting prior beliefs in clinical trials. Statistics in Medicine. 2001;20:581–600.
- Congdon P. Bayesian Statistical Modelling. Chichester, U.K.: John Wiley and Sons; 2001.
- Garthwaite PH, Kadane JB, O'Hagan A. Statistical methods for eliciting probability distributions. Journal of the American Statistical Association. 2005;100:680–701.
- Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd edition. New York: Chapman and Hall/CRC; 2004.
- Greenhouse J, Wasserman L. Robust Bayesian methods for monitoring clinical trials. Statistics in Medicine. 1995;14:1379–1391.
- Hodges JS, Sargent DJ. Counting degrees of freedom in hierarchical and other richly-parameterized models. Biometrika. 2001;88:367–379.
- Ibrahim JG, Chen MH. Power prior distributions for regression models. Statistical Science. 2000;15:46–60.
- Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A. 1994;157:357–416.
- Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B. 2002;64:583–639.
- Thall PF, Lee SJ. Practical model-based dose-finding in phase I clinical trials: Methods based on toxicity. International Journal of Gynecological Cancer. 2003;13:251–261.
- Thall PF, Millikan RE, Mueller P, Lee SJ. Dose-finding with two agents in phase I oncology trials. Biometrics. 2003;59:487–496.