Prior Effective Sample Size in Conditionally Independent Hierarchical Models

Satoshi Morita; Peter F Thall; Peter Müller

doi:10.1214/12-BA720

. Author manuscript; available in PMC: 2013 Oct 28.

Published in final edited form as: Bayesian Anal. 2012 Sep;7(3):10.1214/12-BA720. doi: 10.1214/12-BA720

Prior Effective Sample Size in Conditionally Independent Hierarchical Models

Satoshi Morita ¹, Peter F Thall ², Peter Müller ²

PMCID: PMC3810292 NIHMSID: NIHMS498566 PMID: 24175005

Abstract

Prior effective sample size (ESS) of a Bayesian parametric model was defined by Morita, et al. (2008, Biometrics, 64, 595–602). Starting with an ε-information prior defined to have the same means and correlations as the prior but to be vague in a suitable sense, the ESS is the required sample size to obtain a hypothetical posterior very close to the prior. In this paper, we present two alternative definitions for the prior ESS that are suitable for a conditionally independent hierarchical model. The two definitions focus on either the first level prior or second level prior. The proposed methods are applied to important examples to verify that each of the two types of prior ESS matches the intuitively obvious answer where it exists. We illustrate the method with applications to several motivating examples, including a single-arm clinical trial to evaluate treatment response probabilities across different disease subtypes, a dose-finding trial based on toxicity in this setting, and a multicenter randomized trial of treatments for affective disorders.

Keywords: Bayesian hierarchical model, Conditionally independent hierarchical model, Computationally intensive methods, Effective sample size, Epsilon-information prior

1 Introduction

Recently, a definition for the effective sample size (ESS) of a given prior π(θ) with respect to a sampling model p(Y | θ) was proposed by Morita, Thall, and Müller (2008) (MTM). The ESS provides an easily interpretable index of the informativeness of a prior with respect to a given likelihood. The approach is to first define an ε-information prior π₀(θ) having the same means and correlations as π(θ) but being vague in a suitable sense, and then define the ESS to be the sample size n of hypothetical outcomes Y_n = (Y₁, ···Y_n) that, starting with π₀(θ), yields a hypothetical posterior π_n(θ | Y_n) very close to π(θ). MTM define the distance between π(θ) and π_n(θ | Y_n) in terms of the trace of the negative second derivative matrices of log{π(θ)} and log{π_n(θ | Y_n)}. The ESS is defined as the interpolated value of n that minimizes this “prior-to-posterior” distance. While this definition is suitable for a wide range of models and applications, it fails for hierarchical models.

In this paper, we propose two extensions of the definition for prior ESS that are applicable to two-stage conditionally independent hierarchical models (CIHMs) (Kass and Steffey, 1989). Our approach is pragmatic in that the ESS can always be evaluated either analytically or by using a simulation-based approach. We validate the definitions by verifying that they match the intuitively obvious answers in important special cases where such answers exist. We focus on the class of CIHMs due to their practical importance in settings where data are collected from exchangeable subgroups, such as study centers, schools, cities, etc. Important areas of application include meta-analysis (cf. Berry and Stangl, 1996; Berlin and Colditz, 1999), and clinical trial design (cf. Thall et al., 2003).

Moreover, we restrict attention to CIHMs as the most commonly used versions of hierarchical models. For more complex hierarchical models, one might report prior ESS values for appropriate sub-models, although it might become less meaningful to report an overall ESS.

A two-level CIHM for K sub groups is defined as follows. Let Y_k = (Y_k,₁, …, Y_{k,n_k}) denote the vector of outcomes for sub-group k and let Inline graphic = (Y₁,…, Y_K), with the K vectors assumed to be distributed independently conditional on hyperparameters. We use f(·) generically to indicate the sampling model of observable data, which may be a single variable Y_k,j, a vector Y_k, or the vector of all n₁ +···+n_K observations. The nature of the argument will clarify the specific meaning of f(·). In the first level, Y_k follows distribution f(Y_k | θ_k). In the second level, the subgroup-specific parameters θ=(θ₁, ···, θ_K) are assumed to be i.i.d. with prior π₁(θ_k | θ̃), where the hyperparameter θ̃ has a hyperprior π₂(θ̃ | φ) with known φ. The model is summarized in equation (1).

Sampling model f (Y_{M} ∣ θ) = \prod_{k = 1}^{K} f (Y_{k} ∣ θ_{k}) (Level 1) Prior π_{1} (θ ∣ \tilde{θ}) = \prod_{k = 1}^{K} π_{1} (θ_{k} ∣ \tilde{θ}) (Level 2) Hyperprior π_{2} (\tilde{θ} ∣ φ) .

(1)

A common example of a CIHM (1) is a conjugate normal/inverse χ²-normal-normal model. Let Inv-χ²(ν, S) denote a scaled inverse χ² distribution with ν degrees of freedom, mean $\frac{ν S}{ν - 2}$ for ν > 2, and variance $\frac{2 ν^{2} S^{2}}{{(ν - 2)}^{2} (ν - 4)}$ for ν > 4. This model has a normal sampling distribution Y_k,i | θ_k ~ N(θ_k,σ²) with known σ², where N(μ, σ²) denotes a normal distribution with mean μ, and variance σ². Independent conjugate normal priors θ_k | μ̃, γ̃² ~ N(μ̃, γ̃²) on the location parameters θ₁, ···, θ_K are assumed, with a normal/inverse χ² hyperprior μ̃ | μ_φ, $\tilde{μ} ∣ μ_{φ}, τ_{φ}^{2} ~ N (μ_{φ}, τ_{φ}^{2})$ and γ̃² | ν_φ, S_φ ~ Inv-χ²(ν_φ, S_φ). Once the methodology is established, we will explain how to compute ESS when σ² is not assumed to be known but rather is random with its own prior.

To compute an ESS under a CIHM, we will consider the following two cases, which address different inferential objectives. In case 1, the target is the marginalized prior,

π_{12} (θ ∣ φ) = \int π_{1} (θ ∣ \tilde{θ}) π_{2} (\tilde{θ} ∣ φ) d \tilde{θ} .

(2)

An example is a setting where θ₁,…,θ_K are treatment effects in K different disease subtypes and they are the parameters of primary interest. In case 2, the target prior is

π_{2} (\tilde{θ} ∣ φ) .

(3)

For example, this would arise if an overall treatment effect θ̃ obtained by averaging over K clinical centers in a multi-center clinical trial is the parameter of primary interest. We propose two definitions for the ESS under a CIHM, one for each case, allowing the possibility that both types of ESS may be of interest in a given analysis. For later reference we define the marginal likelihood

f_{1} (Y_{k} ∣ \tilde{θ}) = \int f (Y_{k} ∣ θ_{k}) π_{1} (θ_{k} ∣ \tilde{θ}) d θ_{k}

(4)

by integrating with respect to the level I prior.

Section 2 presents motivating examples. We briefly summarize the MTM formulation in Section 3. The two definitions of ESS in CIHMs and accompanying computational methods are given in Section 3. In Section 4 we compute the ESS for the three motivating examples. In Section 5 we discuss some standard CIHMs, and we close with a brief discussion in Section 6.

2 Motivating examples

2.1 A Single-Arm Sarcoma Trial

Thall et al. (2003) present a design for a single-arm phase II trial to examine the efficacy of the targeted drug imatinib for sarcoma, a disease with many subtypes. Since sarcomas are uncommon, the goal was to construct a design that allowed the efficacy of imatinib to be evaluated in K = 10 sarcoma subtypes. This was achieved by assuming the following CIHM, where the treatment effects differ across subtypes. The parameters of primary interest were the subtype-specific tumor response probabilities, ξ₁, ···, ξ₁₀. Let Ga(a_φ, b_φ) denote a gamma distribution with mean a_φ/b_φ and variance $a_{φ} / b_{φ}^{2}$ . Denoting θ_k = log{ξ_k/(1 − ξ_k)}, it was assumed that θ₁, …, θ₁₀ were i.i.d. N(μ̃, γ̃⁻¹) and that μ̃, and the precision parameter γ̃ followed independent normal and gamma hyperpriors, respectively. Elicitation of prior probabilities characterizing association between pairs of ξ_k’s yielded the hyperpriors μ̃~ N(−1.386,10) and γ̃ ~ Ga(2, 20), so that E(γ̃) = .10 and var(γ̃) = .005. In summary, the trial design assumed the following model:

\begin{array}{l} Sampling model & Y_{k, m} ∣ θ_{k} & ~ & Bin (m, ξ_{k}) indep . for all k \\ Prior & θ_{k} ∣ \tilde{μ}, \tilde{γ} & ~ & N (\tilde{μ}, {\tilde{γ}}^{- 1}) i . i . d . for all k \\ Hyperpriors & \tilde{μ} & ~ & N (- 1.386, 10) \\ \tilde{γ} & ~ & Ga (2, 20) . \end{array}

(5)

Thall et al. (2003) used the marginal posterior probability Pr(ξ_k > 0.30 | Inline graphic ) to define an early stopping criterion in disease subtype k, which was computed based on the posterior π(θ | ) under (5). Thus, 10 stopping rules were applied, one for each subtype. Note that included the data from all ten subtypes in order to exploit the association among the θ_k’s induced by the hierarchical model. This rule was first applied after observing a minimum of eight patients in each disease subtype, and subsequently at sample sizes of 17, 23, and 30 patients. Thus, an overly informative prior, for example, with a prior ESS ≥ 40 might be considered inappropriate since the prior, rather than patient response data, would dominate early termination decisions. Because both a prior and hyperprior were specified, the methods in MTM are not applicable, and it is not obvious how to determine the ESS of this model. We will show below that the ESS can be determined for this model using an approach that is coherent in the sense that it gives the intuitively obvious answer in cases where the ESS exists.

2.2 A CRM Dose-finding Trial for Multiple Patient Subgroups

As an extension of the HM in (5) we consider a phase I dose-finding trial with multiple patient subgroups using a model-based design. We assume an implementation that generalizes the continual reassessment method (O’Quigley et al, 1990). Suppose that there are K = 4 subgroups (k = 1,…, 4) with population proportions (.40, .30, .20, .10). Each patient in each subgroup receives one of six doses, 100, 200, 300, 400, 500, 600, denoted by d₁,…,d₆, with standardized doses $x_{z} = log (d_{z}) - 1 / 6 \sum_{l = 1}^{6} log (d_{l})$ . The outcome variable is the indicator Y_k,i = 1 if the i^th patient in subgroup k suffers toxicity, 0 if not. The probability of toxicity in subgroup k under dose x_i is denoted by π_k(x_i,α_k,β_k) = Pr(Y_k,i = 1 | x_i,α_k,/β_k) with logit (π_k(x_i, α_k,/β_k)} = α_k + β_kx_i, for k = 1,2,3,4. We have a CRM-type goal of finding the “optimal” dose $x_{k}^{*}$ in each subgroup k. Optimal is defined as the posterior mean of $π_{k} (x_{k}^{*})$ being closest to some fixed target π^*. The maximum sample size is 36, with the cohort size of 1, starting at the lowest dose d₁, and not skipping a dose level when escalating, with target toxicity probability π^* = .30. The parameters of primary interest are θ_k = (α_k, β_k), k = 1,2, 3, 4. It is assumed that α₁, …, α₄ and β₁,…, β₄ are i.i.d. $N ({\tilde{μ}}_{α}, {\tilde{σ}}_{α}^{2})$ and $N ({\tilde{μ}}_{β}, {\tilde{σ}}_{β}^{2})$ , respectively, and that (μ̃_α and μ̃_β follow independent normal hyperpriors. For the variance hyperparameters ${\tilde{σ}}_{α}^{2}$ and ${\tilde{σ}}_{β}^{2}$ , following Gelman (2006, Section 4.3) we assume that σ̃_α and σ̃_β are uniform on [0, U_φ]. Denoting the dose assigned to the i^th patient by x_[_i_], in summary we assume

\begin{array}{l} Sampling model & Y_{k, i} ∣ θ_{k}, x_{[i]} & ~ & Bernoulli (π_{k} (x_{[i]}, θ_{k})) indep . for all k \\ Prior & α_{k} ∣ {\tilde{μ}}_{α}, {\tilde{σ}}_{α}^{2} & ~ & N ({\tilde{μ}}_{α}, {\tilde{σ}}_{α}^{2}) i . i . d . for all k \\ β_{k} ∣ {\tilde{μ}}_{β}, {\tilde{σ}}_{β}^{2} & ~ & N ({\tilde{μ}}_{β}, {\tilde{σ}}_{β}^{2}) i . i . d . for all k \\ Hyperpriors & {\tilde{μ}}_{α} ∣ μ_{α, φ}, σ_{α, φ}^{2} & ~ & N (μ_{α, φ}, σ_{α, φ}^{2}) \\ {\tilde{μ}}_{β} ∣ μ_{β, φ}, σ_{β, φ}^{2} & ~ & N (μ_{β, φ}, σ_{β, φ}^{2}) \\ {\tilde{σ}}_{α}, {\tilde{σ}}_{β} ∣ U_{φ} & ~ & U (0, U_{φ}) . \end{array}

(6)

We will later, in Section 3.4, discuss how the ESS in this example depends not only on the assumed probability model and hyperparameters, but also on design choices like the adaptive dose-finding algorithm, the population proportions, etc. Note that we assume that (μ̃_α, ${\tilde{σ}}_{α}^{2}$ ) and (μ̃_β, ${\tilde{σ}}_{β}^{2}$ ) are independent, in order to have a reasonably parsimonious model. We can use elicited information to solve for the hyperprior means μ_α_,_φ and μ_β_,_φ, as follows. Given the standardized doses, those hyperprior means are calculated based on the elicited values E{π_k (x₂ = −.403)} = .25 at the second lowest dose and E{π_k(x₅ = .513)} = .75 at the second highest dose for all the subgroups. These give μ_α,φ= −0.131 and μ_β,φ= 2.398. We will evaluate the ESS under several combinations of ( $σ_{α, φ}^{2}, σ_{β, φ}^{2}$ , U_φ) in a sensitivity analysis.

2.3 A Multicenter Randomized Trial

When analyzing data from a multicenter trial, it often is important to examine the inter-center variability of treatment effects, i.e., treatment-by-center interaction (Gray, 1994), since substantial variation among treatment effects across centers may cause a regulatory agency to question the generalizability of results obtained from such a trial before giving approval for a new therapy. As a third example, we consider a multicenter randomized clinical trial reported by Stangl (1996). The trial was carried out to examine the inter-center variability of the effect of imipramine hydrochloride for preventing the recurrence of depression. The primary outcome was time to the first recurrence of a depressive episode, denoted by T_jk_,1, …,T_{jk,n_jk} for n_jk patients receiving treatment j at the k^th center, for j = 1,2 and k = 1,…, K. A total of 150 patients were enrolled in K = 5 centers. For each (j, k), the recurrence times T_jk_,1, …, T_{jk,n_jk} were assumed to be i.i.d. exponentially distributed with recurrence rate θ_jk. Working with the transformed parameters ζ_k = log(θ₁_k/θ₂_k) and η_k = log(θ₂_k)> the priors were assumed to be $ζ_{k} ~ i . i . d . N ({\tilde{μ}}_{ζ}, {\tilde{σ}}_{ζ}^{2})$ and $η_{k} ~ i . i . d . N ({\tilde{μ}}_{η}, {\tilde{σ}}_{η}^{2})$ . The hyperparameter ${\tilde{σ}}_{ζ}^{2}$ of the inter-center heterogeneity of the treatment effect log ratios is of primary interest in this example, while ${\tilde{σ}}_{η}^{2}$ represents the inter-center heterogeneity in the effect of the control treatment arm. Lognormal hyperpriors were assumed with ${\tilde{σ}}_{ζ} ~ LN (m_{φ}, s_{φ}^{2})$ , and σ_η ~ LN(−0.22,1), where LN(μ, σ²) denotes the lognormal distribution of e^X for X ~ N(μ, σ²). The model is summarized as follows:

\begin{array}{l} Sampling model & T_{j k, i} ∣ θ_{j k} & ~ & Exp (θ_{j k}) indep . for j = 1, 2, and all k \\ Priors & ζ_{k} ∣ {\tilde{μ}}_{ζ}, {\tilde{σ}}_{ζ}^{2} & ~ & N ({\tilde{μ}}_{ζ}, {\tilde{σ}}_{ζ}^{2}) i . i . d . for all k \\ η_{k} ∣ {\tilde{μ}}_{η}, {\tilde{σ}}_{η}^{2} & ~ & N ({\tilde{μ}}_{η}, {\tilde{σ}}_{η}^{2}) i . i . d . for all k \\ Hyperpriors & {\tilde{σ}}_{ζ} ∣ m_{φ}, s_{φ}^{2} & ~ & LN (m_{φ}, s_{φ}^{2}), \\ {\tilde{σ}}_{η} & ~ & LN (- 0.22, 1^{2}), \\ {\tilde{μ}}_{ζ}, {\tilde{μ}}_{η} & ~ & U (- 20, 20) . \end{array}

(7)

Stangl assumed two alternative sets of hyperparameters (m_φ, $s_{φ}^{2}$ ), to represent two types of prior belief on ${\tilde{σ}}_{ζ}^{2}$ in a Bayesian sensitivity analysis. The first choice was $(m_{φ}, s_{φ}^{2}) = (- 1.61, {0.50}^{2})$ , which places substantial prior belief on smaller ${\tilde{σ}}_{ζ}^{2}$ , and the second was $(m_{φ}, s_{φ}^{2}) = (0, {0.50}^{2})$ , which places prior weight on larger ${\tilde{σ}}_{ζ}^{2}$ . We will evaluate the ESS of each prior on ${\tilde{σ}}_{ζ}^{2}$ under case 2 of our proposed methods.

Event time data, like the recurrence time, often includes extensive censoring. In the presence of censoring, the amount of information, and thus the ESS, depends on the number of observed events in addition to the sample size. The ESS computation in this example, therefore, needs to account for censoring cases which can occur depending on study duration. We will discuss details of the ESS computation in the presence of censoring and other relevant design details in Section 3.4.

In any CIHM, the prior choice is subject to two competing desiderata. On one hand, an informative hyperprior that expresses a belief of strong association among the θ_k’s is needed to borrow strength across subpopulations. In some settings, it is appropriate to use an informative prior that reflects accurate and comprehensive prior knowledge. If the hyperprior is elicited from an area expert, then the ESS provides an easily understood numerical value that the expert may use, if desired, to modify his/her original elicited values. On the other hand, in some settings it is necessary to avoid excessively informative priors that may compromise the objectivity of one’s conclusions. In practice, many arbitrary choices are made for technical convenience while formulating a model. A skeptical reviewer may like to quantify the prior information as being equivalent to a certain number of hypothetical observations, i.e., a prior ESS. Such a summary immediately allows a reader to judge the relative contributions of the prior and the data to the final conclusion.

3 Effective sample size in CIHMs

3.1 Prior Effective Sample Size in Non-hierarchical Models

In this subsection, we review and formalize the heuristic definition of ESS for non-hierarchical models given by MTM. We give formal definitions of the ε-information prior and a prior-to-posterior distance. The intuitive motivation for MTM’s method is to mimic the rationale for why the ESS of a beta distribution, Be(a,b), equals a + b. A binomial variable Y with binomial sample size n and success probability θ following a Be(a, b) prior implies a Be(a + Y, b + n − Y) posterior. Thus, saying that a given Be(a, b) prior has ESS m = a + b requires the implicit reasoning that the Be(a, b) may be identified with a Be(c + Y, d + m − Y) posterior arising from a previous Be(c, d) prior having a very small amount of information. A simple way to formalize this is to set c + d, = ε for an arbitrarily small value ε > 0 and solve for m = a + b − (c + d) = a + b − ε.

In a generic, non-hierarchical model, let f(Y | θ) denote the distribution of a random, possibly vector-valued outcome Y and π(θ | θ̃) the prior on the parameter vector θ = (θ₁, …, θ_d), with hyperparameters θ̃. The definition of the ESS of π(θ | θ̃) given f(Y | θ) requires the notions of an ε-information prior and the distance between the prior and a hypothetical posterior corresponding to a sample of a given size used to update an ε-information prior. The following definition formalizes the heuristic definition given by MTM. Denote $σ_{π, j}^{2} = {var}_{π} (θ_{j})$ .

Definition of epsilon-information prior

Let π(θ | θ̃) be a prior with $σ_{π, j}^{2} < {\bar{V}}_{j} (\tilde{θ}) \leq \infty$ , where V̄_j(θ̃) is a fixed bound, for all j = 1,…, d. Given arbitrarily small ε > 0. the prior π₀(θ | θ̃₀) defined on the same domain and in the same parametric family as π(θ | θ̃) is an ε-information prior if E_π₀(θ) = E_π(θ), Corr_π₀(θ_j, θ_j′) = Corr_π(θ_j, θ_j′) for all j ≠ j′ and

{\frac{{\bar{V}}_{j} (\tilde{θ}) - σ_{π_{0}, j}^{2}}{{\bar{V}}_{j} (\tilde{θ}) - σ_{π, j}^{2}}} \frac{σ_{π, j}^{2}}{σ_{π_{0}, j}^{2}} < ε for j = 1, \dots, d .

(8)

In the case V̄_j(θ̃) = ∞, condition (8) reduces to $σ_{π, j}^{2} / σ_{π_{0}, j}^{2} < ε$ . Thus, in this case one may simply choose θ̃₀, subject to the constraints on the means and correlations, so that $σ_{π_{0}, j}^{2}$ is large enough to ensure that this inequality holds. When V̄_j(θ̃) < ∞, note that (8) can be written in the form ${{\bar{V}}_{j} (\tilde{θ}) / σ_{π_{0}, j}^{2} - 1} < ε {{\bar{V}}_{j} (\tilde{θ}) / σ_{π, j}^{2} - 1}$ . Thus, in this case one should choose θ̃₀ so that $σ_{π_{0}, j}^{2}$ is sufficiently close to V̄_j(θ̃) from below to ensure this inequality For example, if π(θ | θ̃) is a beta distribution, Be(α̃,β̃), then $σ_{π}^{2} = μ (1 - μ) / (\tilde{α} + \tilde{β} + 1)$ and V̄(θ̃) = μ(1 − μ), where μ = α̃/(α̃ + β̃). One may construct π₀(θ | θ̃₀)as a Be(α̃/c, β̃/c) by choosing c > 0 large enough so that 1/c < ε, since this implies that the above inequality holds.

Alternatively, a proper non-informative prior could be considered for π₀(θ | θ̃₀). However, this might not be appropriate for defining ESS. For example, the Jeffreys prior for a binomial success probability is $Be (\frac{1}{2}, \frac{1}{2})$ , which gives ESS values that are too large by an additive factor of 1 due to the implied sample size of 1.

To see how the definition works for θ of dimension > 1, consider a bivariate normal distribution with mean μ̃ = (μ̃₁, μ̃₂)′, variances ${\tilde{σ}}_{1}^{2}, {\tilde{σ}}_{2}^{2}$ and covariance σ̃₁₂. An ε-information prior is specified by using the same means μ̃ but variances $c_{1}^{2} {\tilde{σ}}_{1}^{2}$ and $c_{2}^{2} {\tilde{σ}}_{2}^{2}$ and covariance c₁c₂<σ̃₁₂ for arbitrarily large c₁ > 0 and c₂ > 0. As a practical guideline, we suggest choosing c large enough so that a further increase would not change the ESS by more than 0.1.

Given the likelihood $f (Y_{n} ∣ θ) = \prod_{i = 1}^{n} f (Y_{i} ∣ θ)$ of a i.i.d. sample Y_n = (Y₁, …, Y_n) and ε-information prior π₀(θ | θ̃₀), the posterior is

π_{n} (θ ∣ {\tilde{θ}}_{0}, Y_{n}) \propto π_{0} (θ ∣ {\tilde{θ}}_{0}) f (Y_{n} ∣ θ) .

MTM define a distance between the prior π(θ | θ̃) and π_n(θ | θ̃₀, Y_n) obtained from a hypothetical sample Y_n of size n starting with π₀. To do this, MTM use the prior variance $σ_{π, j}^{2}$ and the average posterior variance $σ_{π_{0}, j, n}^{2}$ under the ε-information prior. The average is with respect to the prior predictive distribution f(Y_n | θ̃) = ∫ f(Y_n | θ) π(θ | θ̃)dθ. Let $σ_{π_{0}, j, n}^{2} (Y_{n})$ denote the posterior variance of θ_j conditional on Y_n, under the ε-information prior π₀. Then $σ_{π_{0}, j, n}^{2} = \int σ_{π_{0}, j, n}^{2} (Y_{n}) d f (Y_{n} ∣ \tilde{θ})$ . In cases where these variances cannot be computed analytically, we use approximations, denoted by ${\hat{σ}}_{π, j}^{- 2}$ and ${\hat{σ}}_{π_{0}, j, n}^{- 2}$ . For example, one could use the negative second derivatives of the log distributions, as in MTM. We define the distance between π and π_0,_n to be

Δ (n, π, π_{0}) = \sum_{j = 1}^{d} ∣ σ_{π, j}^{- 2} - σ_{π_{0}, j, n}^{- 2} ∣ .

(9)

When interest focuses on only one of the parameters, say θ_j, then we use $∣ σ_{π, j}^{- 2} - σ_{π_{0}, j, n}^{- 2} ∣$ to define a distance between the marginal prior on θ_j versus the marginal posterior on θ_j under the ε-information prior and a sample of size n.

Definition of ESS in the non-CIHM case

The effective sample size (ESS) of a parametric prior π(θ | θ̃) for which $σ_{π, j}^{2} < \infty$ for all j = 1, …, d, with respect to the likelihood f(Y_n | θ) is the interpolated integer n that minimizes the distance Δ(n, π, π₀).

Computation of Δ(n, π, π₀) is carried out either analytically or using a simulation-based numerical approximation.

3.2 Conditionally independent hierarchical models

We extend the definition of ESS to accommodate two-level CIHMs in a balanced study design with K subgroups each having sample size m, i.e., Y_k = (Y_k_,1, ···, Y_k_,_m) for each k = 1, ···, K and the total sample size of Inline graphic = (Y₁, …, Y_K) is M = m × K. To accommodate the hierarchial structure, we propose the following two alternative definitions of ESS for CIHMs. Recall the discussion following (1). Under a CIHM interest may focus on θ₁, …, θ_K (case 1) or on θ̃ (case 2).

Case 1

When θ = (θ₁, …, θ_K) are the parameters of primary interest, the ESS is a function of the target marginal prior π₁₂(θ | φ) and the sampling model f(·). In this case, we constructively define an ε-information prior, π_12,0(θ | φ), as follows. First, specify an ε-information prior π_1,0(θ | θ̃) of π₁(θ | θ̃), as described in Section 3.1, and use this to define π_12,0(θ | φ) = ∫ π_1,0(θ | θ̃) π₂(θ̃ | φ)dθ̃. A proof that π_12,0 defined in this way is in fact an ε-information prior is given in the appendix.

An alternative way to define an ε-information prior for case 1 could be to first specify π₁₂(θ | φ) and then compute π_12,0. However, this approach is tractable only if π₁₂(θ | φ) can be specified analytically. Consequently, we do not use this alternative approach.

Given the likelihood $f (Y_{M} ∣ θ) = \prod_{k = 1}^{K} f (Y_{k} ∣ θ_{k})$ and π_12,0(θ | φ), we denote the hypothetical posterior by π_12,_M(θ | φ, Inline graphic ) ∝ π_12,0(θ | φ)f( | θ). Denote θ̄₁₂ = E_π₁₂(θ | φ), the prior mean of θ under π₁₂(θ | φ) and let = E( | θ̄₁₂) denote the prior mean of under θ̄₁₂. We will later discuss how to proceed in problems where is not meaningfully defined, due to censoring, the use of covariates, and other reasons. Let Σ_π₁₂ denote the marginal variance-covariance matrix of θ under π₁₂, and similarly for Σ_{π_12,M} ( Inline graphic ) under π_12,_M(θ | φ, ). Often exact evaluation is not possible and approximations Σ̂_π₁₂ and Σ̂_{π_12,M} must be used. For example, one could use the negative inverse Hessian matrices of log(π₁₂) and log(π_12,_M) evaluated at θ̄₁₂. Similarly to Δ(·) in equation (9) for a non-hierarchical model, in case 1 of a CIHM we define the distance between π₁₂(θ | φ) and π_12,_M(θ | φ, Inline graphic ) to be

Δ_{1} (M, π_{1}, π_{2}, π_{1, 0}) = | \det (\sum_{π_{12}}^{- 1}) - \det {\sum_{π_{12, M}}^{- 1} ({\bar{Y}}_{M})} | .

(10)

Definition of ESS for Case 1

The ESS of π₁₂(θ | φ) with respect to f(·), denoted by ESS₁₂(θ | φ), is the interpolated value of the sample size M minimizing Δ₁(M, π₁, π₂, π_1,0).

We use determinants in the definition of Δ₁ to incorporate the off-diagonal elements of the variance/covariance matrices and thereby account for association induced by the hyperprior among the parameters, θ₁, …, θ_K, which is a key aspect of a CIHM. The determinants quantify the total amounts of variability of the prior π₁₂ and the hypothetical posterior π_12,_M. The same definition of distance could be used for non-hierarchical models in place of (9). However, the trace used in (9) is easier to evaluate and leads to exactly the same ESS in all cases that we considered. Although our choice seems to make sense intuitively, it still is arbitrary. We validate the definition by investigating important cases of CIHMs, given in Section 5 and the supplementary materials.

Case 2

When the hyperparameters θ̃ are of primary interest, the ESS is a function of the target prior π₂(θ̃ | φ) and the marginal likelihood (4). We define an ε-information prior, π_2,0(θ̃ | φ₀), and use the marginal likelihood f₁( Inline graphic | θ̃) to update π_2,0(θ̃ | φ₀) to obtain the hypothetical posterior π_2,_M(θ̃ | φ₀, ). Let ${\bar{\tilde{θ}}}_{2} = E_{π_{2}} (\tilde{θ} ∣ φ)$ denote the prior mean of θ̃ under π₂(θ̃ | φ), and let ${\bar{Y}}_{M, 2} = E (Y_{M} ∣ {\bar{\tilde{θ}}}_{2})$ . We define a distance between π₂(θ̃ | φ) and π_2,_M(θ̃ | φ₀, Inline graphic ) for sample size M as in the definition in (11), using the variance/covariance matrices Σ_π₂ under the prior π₂(θ̃ | φ) and Σ_{π_2,M}( ) under π_2,_M(θ̃ | φ₀, ). When the variance/covariance matrices are not available in closed form, one can again use a numerical approximation.

Definition of ESS for Case 2

The ESS of π₂(θ̃ | φ) with respect to f_M( Inline graphic | θ̃), denoted by ESS₂(θ̃ | φ), is the interpolated value of M minimizing

Δ_{2} (M, π_{2}, π_{2, 0}) = | \det (\sum_{π_{2}}^{- 1}) - \det {\sum_{π_{2, M}}^{- 1} ({\bar{Y}}_{M, 2})} | .

(11)

Definition (11) is equivalent to the non-hierarchical ESS definition, simply because after marginalizing with respect to the group specific parameters θ_k the hierarchical model (1) reduces to a non-hierarchical model. If interest is focused on a subvector θ̃_s of θ̃, the ESS can be determined similarly in terms of the marginal hyperprior π₂(θ̃_s | φ).

An important aspect of the definition is that the ESS could be infinite when it is not possible to achieve a comparably informative posterior with any finite sample size. For example, recall the stylized normal/normal CIHM introduced after (1). Assume we fix γ̃² = 1, μ̃ ~ N(0, 0.01) and K = 10. In particular, we keep the number K of subpopulations fixed and consider an increasing number of samples per group. No matter how many observations, it is impossible to match the prior variance Var(μ̃) = 0.01 as the posterior variance under π_0,_n. Formally, the minimization (11) has no solution. This is desireable. The informative prior μ̃ ~ N(0, 0.01) cannot be interpreted as the posterior under any hypothetical sample having finite sample size, because, in this example, the number of groups is fixed at K = 10. However, while no sample size of a hypothetical prior study under the same experimental design, i.e., fixed K, can justify the employed prior, it still is possible to report a meaningful finite ESS under an alternative experimental design. For example, one could report the required sample size of a hypothetical prior study with fixed m = 20, σ² = 1, and increasing K. In this example we find ESS = 2100 with K = 105.

3.3 Algorithm for ESS Computation

Let θ̄₁₂ denote the prior mean vector of θ under π₁₂(θ | φ), and let ${\bar{\tilde{θ}}}_{2}$ denote the prior mean vector of θ̃ under π₂(θ̃ | φ). Let M_max = m_max × K be a positive integer multiple of K chosen so that it is reasonable to assume that ESS ≤ M_max.

Algorithm 1, ESS for Case 1

Step 1
Evaluate the target marginal prior π₁₂(θ | φ)
Step 2
Evaluate ε-information prior π_12,0(θ | φ)
Step 3
For each M = 0, K, 2K, …, M_max, compute Δ₁(M, π₁, π₂, π_1,0).
Step 4
ESS₁₂(θ | φ) is the interpolated value of M minimizing Δ₁(M, π₁, π₂, π_1,0).

When $\sum_{π_{12}}^{- 1}$ cannot be evaluated analytically, then one can use the negative Hessian of the log prior as approximation ${\sum^{^}}_{π_{12}}^{- 1} = - H_{π_{12}} ({\bar{θ}}_{12})$ . The Hessian H_π₁₂ is the matrix of second derivatives of the log(π₁₂) with respective to θ_j. The Hessian is evaluated at θ̄₁₂. When π₁₂(θ | φ) itself is not available in closed form, then we use the following simulation-based approximation. First, simulate a Monte Carlo sample θ̃⁽¹⁾, …, θ̃⁽^T⁾ from π₂(θ̃ | φ) for large T, e.g. T = 100,000. Use the Monte Carlo average $T^{- 1} \sum_{t = 1}^{T} \frac{\partial}{\partial θ_{k}} π_{1} (θ ∣ {\tilde{θ}}^{(t)})$ in place of $\int \frac{\partial}{\partial θ_{k}} π_{1} (θ ∣ \tilde{θ}) π_{2} (\tilde{θ} ∣ φ) d \tilde{θ}$ , and similarly for the second order partial derivatives. When θ̄₁₂ = E_π₁₂(θ | φ) cannot be computed analytically, first simulate a Monte Carlo sample θ⁽¹⁾, …, θ⁽^T⁾ from π₁₂(θ | φ) = ∫ π₁(θ | θ̃) π₂(θ̃ | φ)dθ̃ for large T, and compute the mean $T^{- 1} \sum_{t = 1}^{T} θ^{(t)}$ . Similarly, when $\sum_{12, M}^{- 1} ({\bar{Y}}_{M})$ is not available in closed form one can use the negative Hessian of log π_12,_M(θ̄₁₂ | Inline graphic , φ) the as a convenient approximation, ${\sum^{^}}_{12, M}^{- 1} ({\bar{Y}}_{M}) = - H_{π_{12, M}} ({\bar{θ}}_{12}, {\bar{Y}}_{M})$ . The posterior π_12,_M is evaluated conditional on and the Hessian is evaluated at θ̄₁₂.

Algorithm 2, ESS for Case 2

Step 1
Specify π_2,0(θ̃ | φ₀).
Step 2
For each M = 0, K, 2K, …, M_max, compute Δ₂(M, π₂, π_2,0).
Step 3
ESS₂(θ̃ | φ) is the interpolated value of M minimizing Δ₂(M, π₂, π_2,0).

When $\sum_{π_{2}}^{- 1}$ or $\sum_{π_{2}, M}^{- 1} ({\bar{Y}}_{M, 2})$ are not available in closed form we proceed similarly as before, using the negative Hessians to approximate $\sum_{π_{2}}^{- 1}$ and $\sum_{π_{2}, M}^{- 1}$ . The posterior π_2,_M is evaluated conditional on Inline graphic . The Hessian is evaluated at ${\bar{\tilde{θ}}}_{2}$ . In many cases, steps 2 and 3 may be repeated to compute ESS₂(θ̃ | φ) for subvectors of θ̃ that are of interest.

3.4 Numerical Evaluation of ESS

Equations (10) and (11) define a distance between prior and posterior as a difference of determinants of a prior precision matrix versus a posterior precision matrix. The posterior precision matrix is evaluated using the plug-in values Inline graphic in Δ₁ and in Δ₂, respectively. In some studies, however, the definition of or as an expectation is simply not meaningful. This is explained most easily by the following examples. Many phase I studies involve choosing doses for future patients as a function of the outcomes of earlier patients. In such settings, it is not meaningful to represent a sample of size M by a mean Inline graphic or . More generally, outcome adaptive designs for clinical studies make it difficult to meaningfully define . Similar difficulties arise when studies involve censored outcomes or regression on covariates.

In these cases, we use an extended definition of Δ₁ that replaces the plug in of Inline graphic by full prior predictive marginalization. Specifically, we replace $det {∣ \sum_{π_{12, M}}^{- 1} ({\bar{Y}}_{M}) ∣}$ by an expectation of $det {∣ \sum_{π_{12, M}}^{- 1} (Y_{M}) ∣}$ , averaging over possible study realizations with M patients. The sampling scheme might include additional variables, like assigned dose, patient-specific covariates, or censoring times. Let x_i denote these additional variables for the i^th patient. Let g(x_i | Y₁, …, Y_i₋₁, x₁, …, x_i₋₁, θ̄₁₂) denote a probability model for x_i. When x_i is a deterministic function of the conditioning variables, we interpret g(·) as a degenerate distribution with a single point mass. For example, in the case of an adaptive dose escalation design, the dose for the i^th patient is a function of the earlier outcomes. Thus, the outcomes are not independent, making the interpretation of Inline graphic tricky. We replace use of the plug in by the expectation of $det {∣ \sum_{π_{12, M}}^{- 1} (Y_{M}) ∣}$ with respect to the marginal model,

f_{M} (Y_{M} ∣ {\bar{θ}}_{12}) = \int \prod_{i = 1}^{M} f (Y_{i} ∣ Y_{1}, \dots, Y_{i - 1}, x_{i}, {\bar{θ}}_{12}) g (x_{i} ∣ Y_{1}, \dots, Y_{i - 1}, x_{1}, \dots, x_{i - 1}, {\bar{θ}}_{12}) d x_{1} \dots d x_{M} .

(12)

In words, f_M(·) is the marginal distribution of possible study outcomes when we carry out the study under a given design, and the prior mean θ̄₁₂. The predictive model (12) highlights the fact that ESS is defined in the context of an assumed experimental design. This includes details like the expected extent of censoring, distribution of covariates x_i, and more. Recall the stylized normal/normal CIHM stated below (1). One could consider two alternative experiments, either increasing the number of observations within each subpopulation, k = 1, …, K for fixed K, or alternatively one could consider increasing the number of groups K with a fixed number of observations per group. The two experiments naturally report different ESS values for the same prior, simply because the information that is being accrued with hypothetical future patients differs.

In the definition of the ESS, Case 1, for such studies we replace $det {∣ \sum_{π_{12, M}}^{- 1} ({\bar{Y}}_{M}) ∣}$ in definition of Δ₁ by

\int det {\sum_{π_{12, M}}^{- 1} (Y_{M})} f_{M} (Y_{M} ∣ {\bar{θ}}_{12}) d Y_{M}

(13)

where Σ⁻¹(Y_M) = Var(θ | Inline graphic ) is the posterior variance under data and prior π₁₂. Similarly, we replace $det {∣ \sum_{π_{2, M}}^{- 1} ({\bar{Y}}_{M, 2}) ∣}$ in the definition of Δ₂ in Case 2 by

\int det {\sum_{π_{2, M}}^{- 1} (Y_{M})} f_{M, 2} (Y_{M} ∣ {\bar{\tilde{θ}}}_{2}) d Y_{M}

(14)

where

f_{M, 2} (Y_{M} ∣ {\bar{\tilde{θ}}}_{2}) = \int \int \prod_{i = 1}^{M} f (Y_{i} ∣ Y_{1}, \dots, Y_{i - 1}, x_{i}, θ) g (x_{i} ∣ Y_{1}, \dots, Y_{i - 1}, x_{i}, \dots, x_{i - 1}, θ) d x_{1} \dots d x_{M} π_{1} (θ ∣ {\bar{\tilde{θ}}}_{2}) d θ .

(15)

For the standard CIHMs discussed later, in §5, the definitions using (13) instead of $det {∣ \sum_{π_{12, M}}^{- 1} ({\bar{Y}}_{M}) ∣}$ and (14) in place of $det {∣ \sum_{π_{2, M}}^{- 1} ({\bar{Y}}_{M, 2}) ∣}$ in the definitions of Δ₁ and Δ₂, respectively, give the same ESS as the original definitions. That is why we recommend using the simplified definitions (10) and (11) when possible. We recommend using the extended definitions only for problems where the definition of Inline graphic or is not meaningful.

Algorithms 1 and 2 remain unchanged. The only additional detail is the evaluation of Δ₁ and Δ₂. For Δ₁, we use numerical evaluation of the integral (13).

Algorithm 3, Numerical Evaluation of Δ₁

Evaluation of (13) as a Monte Carlo integral.

Repeat L iterations of Steps 1–6:

Step 1: Generate Y₁ ~ f(Y₁ | x₁, θ̄₁₂)

Repeat for i = 2, …, M:

Step 2: Evaluate g(x_i | Y₁, …, Y_i₋₁, x_i, …, x_i₋₁, θ̄₁₂). If g(·) is a deterministic function (as in dose allocation), record the value as x_i. If g(·) is a non-degenerate distribution generate x_i accordingly.

Step 3: Generate Y_i ~ f(Y_i | Y₁, …, Y_i₋₁, x_i, θ̄₁₂), using x_i from Step 2.

Step 4: The vector Inline graphic

= (Y₁, …, Y_M) is a simulated outcome vector.

Step 5: Evaluate the posterior variance covariance matrix Σ(θ | Inline graphic

). This evaluation might require Markov chain Monte Carlo integration when the model is not conjugate.

Step 6: Evaluate det(Σ⁻¹( Inline graphic

)).

Step 7: The average over all L iterations of steps 1 through 6.

\frac{1}{L} \sum det (\sum^{- 1} (Y_{M})) \approx \int det {\sum^{- 1} (Y_{M})} f_{M} (Y_{M} ∣ {\bar{θ}}_{12}) {d Y}_{M},

(16)

approximates the desired integral. The sum is over the L repeat simulations of Steps 1 through 6, plugging in the vector Inline graphic

that is generated in each step of the simulation.

Open in a new tab

With a minor variation, the same algorithm can be used to evaluate (14). Step 1 is replaced by

Step 1′: Generate $θ ~ π_{1} (θ ∣ {\bar{\tilde{θ}}}_{2})$ and generate Y₁ ~ f(Y₁ | x₁, θ).

The rest of the algorithm for computing ESS remains unchanged, with θ replacing θ̄₁₂. It is important that a new value θ is generated for each of the L repetitions of steps 1 through 6. This approximates the outer integral with respect to θ.

4 Application to the motivating examples

4.1 ESS for the sarcoma trial

The goal of the sarcoma trial was to estimate the efficacy of a molecularly targeted drug within each disease subtype. Recall that ξ_k was the response probability in subtype k, and θ_k = log{ξ_k/(1 − ξ_k)}. The likelihood for M patients with m patients in each subtype having response indicators Y_k_,_m = (Y_k₁, …, Y_km) is

f (Y_{M} ∣ θ) \propto \prod_{k = 1}^{K} \prod_{i = 1}^{m} ξ_{k} {(θ_{k})}^{Y_{k i}} {1 - ξ_{k} (θ_{k})}^{1 - Y_{k i}} .

(17)

For this example, based on the normal-gamma hierarchical prior described in Section 2, a closed form of the target prior in case 1 can be found analytically, as we will show in Section 5. Based on this prior and the likelihood (17), applying Algorithm 1 of subsection 3.3 gives ESS₁₂(θ | φ_μ̃ = (−1.386, 10), φ_γ̃ = (2, 20)) = 2.6. This ESS value is reasonable, since a separate early stopping rule was applied for each subtype in this study. For comparison, if one assumes instead that φ_μ̃ = (−1.386, 1) and φ_γ̃= (20, 1), then ESS₁₂(θ | φ_μ̃) = 702.2. This illustrates the important practical point that a seemingly reasonable choice of $φ_{\tilde{μ}} = (μ_{φ}, τ_{φ}^{2})$ and φ_γ̃ = (a_φ, b_φ) may give an excessively informative prior. Figure 1 gives plots of the ESS₁₂ values as a function of $τ_{φ}^{2}$ for each of the four pairs φ_γ̃ = (2, 20), (5, 20), (10, 20), and (10, 10), when μ_φ = −1.386. Figure 1 shows that, as the variance parameter $τ_{φ}^{2}$ in the hyperprior of μ̃ gets larger, the ESS₁₂ values decrease, which is intuitively correct. Similarly, ESS₁₂ increases with the prior mean of the precision parameter γ̃ under π₂(γ̃ | φ_γ̃).

Plots of *ESS*₁₂ as a function of variance parameter $τ_{φ}^{2}$ in the hyperprior of μ̃ for the four sets of φ*_γ̃* = (2, 20), (5, 20), (10, 20), and (10, 10), when *μ_φ* = −1.386.

If one wishes to focus, instead, on the overall mean treatment effect across all disease subtypes, (case 2), it leads to consideration of the hyperprior π₂(μ̃ | φ_μ̃). The ESS in this case may be computed using Algorithm 2 of subsection 3.3, which gives ESS₂(μ̃ | φ) = 3.7. We conclude that the hierarchical prior used by Thall et al. (2003) is reasonable under both points of view, either when one is concerned with inference on success probabilities for the sarcoma subtypes, or when one is interested in the average response probability for all sarcomas.

4.2 A CRM Dose-finding Trial for Multiple Patient Subgroups

In the second example, the goal of the CRM-type dose-finding study is to find the optimal dose in each subgroup. Recall that the probability of toxicity in each subgroup is modeled by π_k(x_i, θ_k) with logit{π_k(x_i, θ_k)} = α_k + β_kx_i, where θ_k = (α_k, β_k). Let m_1, m₂, m₃, m₄ denote the number of patients in the four subgroups, let x_k,_[_i_] denote the dose assignment for the i^th patient in subgroup k, and let D_{m_k} = {Y_k_,1, x_k,_[1], …, Y_{k,m_k}, x_{k,[m_k]}} denote the dose assignments and the responses observed in subgroup k. The likelihood for all $M = \sum_{k = 1}^{4} m_{k}$ patients is

f (Y_{M} ∣ x_{M}, θ) \propto \prod_{k = 1}^{4} \prod_{i = 1}^{m_{k}} π_{k} {(x_{k, [i]}, θ_{k})}^{Y_{k, i}} {1 - π_{k} (x_{k, [i]}, θ_{k})}^{1 - Y_{k, i}} .

(18)

Following the extended ESS definition given in Section 3.4, we compute ESS values under several combinations of ( $σ_{α, φ}^{2}, σ_{β, φ}^{2}$ , U_φ) for $σ_{α, φ}^{2} = σ_{β, φ}^{2}$ using Algorithm 3. Dose levels are determined on the basis of the observed data. As mentioned before, ESS is always defined in the context of an assumed experimental design. In the case of this study, the assumed relative accrual rates (0.4, 0.3, 0.2, 0.1) are part of the design, and are used in the evaluation of ESS.

Posterior summaries are evaluated using Markov chain Monte Carlo posterior simulation (with three parallel chains of length 5,000 with a burn-in of 500). The posterior means at the six dose levels in each subgroup are required in the dose-escalation/de-escalation algorithm for the respective new cohort. Table 1 summarizes the results of the computations, including the subgroup ESS values based on the assumed relative accrual rates (0.4, 0.3, 0.2, 0.1). Those computations suggest that, considering the maximum sample size of 36, the prior choices that are summarized in the first two rows of the table are defensible. The implied ESS is reasonably small compared to the sample size, since in these cases the per-subgroup sample size is less than 2. The prior in the third row may or may not be considered suitably non-informative for a dose-finding trial, but the prior in row 4 clearly is far too informative. Without computing ESS, it would be very difficult to make such determinations.

Table 1.

ESS values computed for several sets of level 2 prior parameters ( $σ_{α, φ}^{2}, σ_{β, φ}^{2}$ , U_φ). The per-subgroup ESS values are based on assumed relative accrual rates .40, .30, .20, .10 for k = 1, 2, 3, 4.

σ_{α, φ}^{2} = σ_{β, φ}^{2}

U_φ

ESS

Per-subgroup ESS

100

6.6

2.6

2.0

1.3

0.7

7.1

2.8

2.1

1.4

0.7

9.1

3.6

2.7

1.8

0.9

25.8

10.3

7.7

5.2

2.6

Open in a new tab

Computation of frequentist operating characteristics (OCs) by computer simulation, and calibration of prior hyperparameters as well as design parameters on that basis, has become standard practice in Bayesian clinical trial design. An important question is how computing the prior’s ESS can improve this process. This is especially important logistically because obtaining both prior ESS and the design’s OCs are computationally intensive. However, computing an ESS takes even less time than the time required for OCs under only one scenario. This is the case because in contrast to OC simulation, the evaluation of ESS does not require the simulation of complete trial histories. Also, in practice, after establishing hyperparameters that determine prior mean values, one guesses numerical values of second order hyperparameters, including variances and correlations, simulates the trial using that prior, and examines the resulting OCs. Since the numerical values of prior hyper-variances have no meaning per se, in the absence of ESS, one must rely on the OCs alone to determine whether the prior is acceptable and reasonable. This process often is repeated many times, until satisfactory choices are found, and is quite time-consuming and tedious. When the ESS of each prior is computed, however, it may be used to calibrate the hyper-variances so that they give a reasonable ESS, before running any simulations of complete trial histories. This saves a great deal of time because, as noted above, computing an ESS is far less time-consuming than running simulations.

Aside from obtaining a design’s OCs, because the ESS is readily interpretable, it is a useful tool for deciding whether a prior is reasonable per se. Recall that the ESS is a property of the prior, likelihood, and experimental design. In contrast, obtaining the OCs of a given design also requires the specification of particular scenarios (fixed outcome probabilities) under which the design is evaluated. In the absence of ESS, it is possible to obtain OCs that look reasonable when in fact the prior is undesirably informative. That is, one may unknowingly be misled by simulation results. This is especially worrying when the clinical outcome is a complex function of treatment, as in a dose-response setting. As an example, consider the CRM dose-finding trial with heterogeneous subgroups under the prior with level 2 prior parameters $σ_{α, φ}^{2} = σ_{β, φ}^{2} = 25$ and U_φ = 2 in the last row of Table 1. Recall that the ESS = 25.8 for that prior. A simulation of the trial assuming these level 2 hyper-prior parameters is summarized in Table 2, which shows that, despite the overly informative prior in terms of ESS, the subgroup-specific CRM design is most likely to choose the right doses in subgroups 1, 2, and 3. In contrast, the dose selection is clearly wrong in subgroup 4, since the highest dose level would be most desirable in that subgroup. In the assumed scenario, the dose-toxicity curve in subgroup 4 is qualitatively different from the other subgroups, with true π₄(x_i) = .10 or .20 for all doses, well below the target of .30. Since subgroup 4 has the lowest accrual rate of 10% and thus accrues on average .10 × 36 = 3.6 patients, the data from the trial cannot overcome the overly informative prior (ESS = 2.7) in this subgroup. Based on the ESS, however, one would never assume the prior with $σ_{φ}^{2} = 25$ and U_φ = 2. This example shows how use of the prior ESS complements the evaluation of OCs. In summary, in the context of Bayesian clinical trial design, prior ESS is both a tool for calibrating hyper-variances and thus speeding up the process of simulation to obtain the design’s OCs, and also a simple summary statistic that helps investigators decide whether a prior is acceptable.

Table 2.

Simulation study of the CRM dose-finding trial with heterogeneous subgroups assuming the prior with level 2 prior parameters $σ_{α, φ}^{2} = σ_{β, φ}^{2} = 25$ and U_φ = 2. The percentage of times that the method selected each dose level as the final MTD in each subgroup (%recommendation) and the numbers of patients who were treated at each dose level (No. of patients) out of 36 patients are summarized. Correct selections are marked in bold-face.

Subgroup	Dose level	d₁	d₂	d₃	d₄	d₅	d₆
1	True prob. tox	.05	.10	.15	.30	.50	.65
	%recommendation	.05	.16	.30	.36	.07	.04
	No. of patients	2.1	3.1	4.5	3.6	1.1	.4
2	True prob. tox	.10	.20	.30	.45	.60	.70
	%recommendation	.08	.26	.39	.24	.02	.01
	No. of patients	1.8	2.9	3.3	2.0	.6	.1
3	True prob. tox	.15	.30	.45	.55	.65	.75
	%recommendation	.12	.34	.32	.16	.02	.01
	No. of patients	1.2	2.2	1.9	1.0	.3	.1
4	True prob. tox	.10	.10	.10	.20	.20	.20
	%recommendation	.05	.20	.32	.31	.09	.02
	No. of patients	.5	.9	1.0	.8	.2	.1

Open in a new tab

4.3 ESS for a Multicenter Randomized Trial

In the third example, recall that θ₁_k and θ₂_k denote the affective disorder recurrence rates in the two treatments arms for trial center k, and these are reparameterized as ζ_k = log(θ₁_k/θ₂_k) and η_k = log(θ₂_k). Recall that T_jk_,_i is the time to the event for patient i receiving treatment j at center k. Denote the observed time to an event at T or right-censoring by T^o and δ = I[T^o = T]. Let $T_{M}^{o}$ and δ_m denote the vectors of all event observation times and indicators in the sample of M = m × K patients. The likelihood under the exponential model is

f (T_{M}^{o}, δ_{M} ∣ θ) = \prod_{k = 1}^{K} \prod_{j = 1}^{2} \prod_{i = 1}^{m_{j, k}} {θ_{j k} \exp (- θ_{j k} T_{j k, i})}^{δ_{jki}} {\exp (- θ_{j k} T_{j k, i}^{o})}^{1 - δ_{jki}},

(19)

where θ₁_k = exp(ζ_k + η_k) and θ₂_k = exp(η_k). Recall that the hyperparameter ${\tilde{σ}}_{ζ}^{2}$ characterizing inter-center variability is of primary interest (Case 2). Based on the hierarchical model described in Section 2 and the likelihood (19), we use Algorithm 2 to compute ${ESS}_{2} ({\tilde{σ}}_{ζ}^{2} ∣ m_{φ}, s_{φ}^{2})$ , which are ${ESS}_{2} ({\tilde{σ}}_{ζ}^{2} ∣ m_{φ} = - 1.61, s_{φ}^{2} = {0.5}^{2}) = 27.6$ and ${ESS}_{2} ({\tilde{σ}}_{ζ}^{2} ∣ m_{φ} = 0, s_{φ}^{2} = {0.5}^{2}) = 9.1$ . The two ESS values indicate that the first prior may be too informative, since the total sample size of the trial is 150. One may obtain a less informative hyperprior for ${\tilde{σ}}_{ζ}^{2}$ by specifying larger $s_{φ}^{2}$ . For example, $s_{φ}^{2} = {0.75}^{2}$ , 1.0², and 1.25² give ${ESS}_{2} ({\tilde{σ}}_{ζ}^{2} ∣ m_{φ} = 0, s_{φ}^{2}) = 5.0$ , 1.4, and 0.4. Table 3 summarizes the ${ESS}_{2} ({\tilde{σ}}_{ζ}^{2} ∣ m_{φ}, s_{φ}^{2})$ values computed for several sets of (m_φ, $s_{φ}^{2}$ ) pairs.

Table 3.

${ESS}_{2} ({\tilde{σ}}_{ζ}^{2} ∣ m_{φ}, s_{φ}^{2})$ values computed for several sets of (m_φ, $s_{φ}^{2}$ ), where ${\tilde{σ}}_{ζ}^{2}$ represents the inter-center heterogeneity of the treatment effect, and m_φ and $s_{φ}^{2}$ respectively denote the mean and variance parameters of the lognormal hyperprior of σ̃_ζ.

m_φ

s_{φ}^{2}

ESS₂

−1.61

0.5²

27.6

−1.61

0.75²

12.1

−1.61

1.0²

8.4

−1.61

1.25²

6.6

−0.5

0.5²

15.5

−0.5

0.75²

8.0

−0.5

1.0²

5.7

−0.5

1.25²

2.1

0.0

0.5²

9.1

0.0

0.75²

5.0

0.0

1.0²

1.4

0.0

1.25²

0.4

Open in a new tab

As we mentioned in Section 2.3, the ESS computation in this example accounts for censoring in the process of numerical computation of the distance between the prior and posterior. For the above computations, we set-up the study duration at 2 years, as it was done in Stangl’s study design. That is, patients are followed for 2 years, or until they experience the event. If patients do not experience the event by the end of the study, they are treated as censored cases. Given the event rates in the two treatment groups, the proportion of censored cases goes up as the study duration becomes shorter. When the study duration is 2 years, the ESS is computed to be 27.6, as shown in the first row of Table 3. When the study duration is 10 years, the ESS value is computed to be 24.5, while the ESS is 36.3 in case the duration is 0.5 years. That is, as the proportion of censored cases rises up, the ESS goes up, and vice-versa.

From Figure 7 of Stangl (1996), it can readily be seen by inspection that prior 1 is too informative by comparing the posterior and prior of the 3rd stage variances. The ESS formalizes this judgment by quantifying the informativeness of prior 1. Moreover, the ESS validates prior 2 as being reasonable, whereas this in not obvious using only a graphical evaluation.

5 Validation in Some Standard CIHMs

The aim of the proposed ESS definitions is to provide an easily interpretable way to quantify the prior informativeness of commonly used CIHMs. The definition of ESS obtained by matching a given prior with a posterior distribution under an ε-information prior could be argued to be a natural choice. In order to formalize this, however, many detailed technical choices must be made. The definitions that we have proposed here are two of many possible formalizations. We arrived at the definitions after considering many alternatives, and evaluating the performance of each for several widely used CIHMs. The examples (one is in this section and two are in the supplementary materials) report the implied ESS for these models. As standard CIHMs, we use hierarchical extensions of standard models listed in Chapter 5 of Congdon (2005). In this section, we show how to compute the ESSs in cases 1 and 2 analytically in the normal CIHM with known sampling variance, and we argue that the obtained ESSs are sensible. In the supplementary materials, we provide computational details for a hierarchical model for binary responses recorded over different subpopulations as in the sarcoma trial of Section 2.1, and also for an alternative hierarchical model for the multicenter randomized trial analysis given in Section 2.3, assuming an exponential sampling model for the observed recurrence times.

Perhaps the most widely used CIHM is a hierarchical model with normal sampling model and conjugate prior and hyperprior. A typical example of such a CIHM is discussed in Gelman et al. (1995, Section 5.5), for a study of special coaching programs to prepare students for the scholastic aptitude test (SAT). The observed outcomes are SAT scores, where Y_ki denotes score of the i-th student in school k. The scores are assumed to be normally distributed with school specific means θ₁, …, θ_k and known variance σ². The model is completed with conjugate hyperpriors, as follows:

\begin{array}{l} Sampling model & Y_{k, 1}, \dots Y_{k, m} ∣ θ_{k}, σ^{2} & ~ & N (θ_{k}, σ^{2}) indep . for all k \\ Prior & θ_{k} ∣ \tilde{μ}, {\tilde{γ}}^{2} & ~ & N (\tilde{μ}, {\tilde{γ}}^{2}) i . i . d . for all k \\ Hyperpriors & \tilde{μ} ∣ μ_{φ}, τ_{φ}^{2} & ~ & N (μ_{φ}, τ_{φ}^{2}) \\ {\tilde{γ}}^{2} ∣ ν_{φ}, S_{φ} & ~ & Inv - χ^{2} (ν_{φ}, S_{φ}) . \end{array}

Thus, for K schools, μ̃ represents an overall effect and γ̃² represents inter-school variability. The fixed hyperparameters are $φ_{\tilde{μ}} = (μ_{φ}, τ_{φ}^{2})$ and φ_γ̃² = (ν_φ, S_φ). Let $r = τ_{φ}^{2} / ({\tilde{γ}}^{2} + τ_{φ}^{2})$ denote the intra-class correlation. In this model the ESS in both cases 1 and 2 can be found analytically, because both the marginal prior π₁₂ and the posterior π_12,_M are multivariate normal, thus the information matrices of the prior and posterior can be obtained analytically as the inverses of their variance-covariance matrices. Details are given in the supplementary materials.

Case 1

When the main goal is inference on the school-specific SAT scores θ₁, …, θ_K, assuming fixed prior variance γ̃²,

{ESS}_{12} (θ ∣ φ_{\tilde{μ}}) = \frac{σ^{2}}{{\tilde{γ}}^{2}} {(\frac{1 - r}{1 + (K - 1) r})}^{1 / K} \times K .

(20)

This formula confirms what one would expect intuitively. The precision 1/γ̃² of the conjugate normal prior for each school is equivalent to σ²/γ̃² observations (SAT scores). Thus K independent priors for θ₁, …, θ_K, corresponding to r = 0, would be equivalent to Kσ²/γ̃² observations. The second factor in (20) is obtained analytically from the determinant of the information matrix of π₁₂ for the normal CIHM with conjugate priors (computational details are given in the supplementary materials). This factor accounts for the dependence among θ₁, …, θ_K in the two-level prior. Given K, as the correlation among θ₁, …, θ_K increases, this factor reduces the ESS₁₂. That is, as r → 1, the ESS₁₂ → 0. In particular, for K = 2 schools, ${ESS}_{12} (θ ∣ φ_{\tilde{μ}}) = \frac{σ^{2}}{{\tilde{γ}}^{2}} {(\frac{1 - r}{1 + r})}^{1 / 2} \times 2$ . Figure 2 shows a plot of ESS₁₂(θ | φ_μ̃) as a function of r and γ̃² for K = 5 and σ² = 1.0.

Plots of ESS₁₂(θ | φ*_μ̃*) as a function of prior variance γ̃² and r for the normal/inverse χ²-normal-normal model, when K = 5 and σ² = 1.0.

To validate the plausibility of (20), we consider three limiting cases, each of which leads to a non-hierarchical model having an obvious prior ESS. (i) Given γ̃², as r → 0, i.e. as $τ_{φ}^{2} \to 0$ , which is the limiting case with K independent schools, ESS₁₂ → K × σ²/γ̃² and σ²/γ̃² is the ratio of the data variance to the prior variance, which is the prior ESS of a non-hierarchical model (Spiegelhalter et al. 1994, Section 3.1.2; MTM, Section 5, Example 3). (ii) In contrast, given γ̃², as $τ_{φ}^{2} \to + \infty$ , and consequently r → 1, and there is one common effect θ = θ₁ = … = θ_k, i.e. this limiting case is a non-hierarchical model where schools are ignored. The marginal prior variance ${\tilde{γ}}^{2} + τ_{φ}^{2}$

for the common effect diverges, however, which is reflected by the fact that ESS₁₂ → 0, as desired, (iii) On the other hand, consider the case where $τ_{φ}^{2}$ is fixed and γ̃² → 0. Again, the model reduces to one common effect θ = θ₁ = … = θ_K, but now with finite marginal prior variance $τ_{φ}^{2}$ . In this case, to learn with certainty that all effects are equal would require extremely reliable data, i.e., a very large number of students in each school, which is (correctly) reflected by the fact that ESS₁₂(θ | φ_μ̃) → +∞. If we allow γ̃² to be random, ESS₁₂(θ | φ) takes the same form as expression (20), but with γ̃² replaced by its prior mean μ_γ̃². (see Supplementary Materials, section 1).

Case 2

If the primary focus is the population parameter μ̃, averaging across all schools,

{ESS}_{2} (\tilde{μ} ∣ φ) = \frac{σ^{2}}{τ_{φ}^{2}} {(1 - \frac{μ_{{\tilde{γ}}^{2}}}{τ_{φ}^{2} K})}^{- 1},

(21)

under the restriction that $τ_{φ}^{2} > K^{- 1} μ_{{\tilde{γ}}^{2}}$ . An explanation of this restriction is given in the supplementary materials, Section 1 (Normal/Inverse χ²-Normal-Normal Model).

6 Discussion

We have proposed two practical definitions for the prior ESS of a conditionally independent hierarchical model. The main limitations are the need for numerical evaluations in many cases and the use of several arbitrary choices in the definitions. The arbitrary choices in the definition include the construction of an ε-information prior, evaluation of the posterior variance/covariance matrices conditional on Inline graphic and , and evaluation of the distances Δ₁ and Δ₂ at the prior means θ̄₁₂ and ${\bar{\tilde{θ}}}_{2}$ . The definitions of the two distances, based on the determinants of the information matrices, are reasonable but arbitrary. Alternative choices could be investigated. For example, one could use L₂ distance after a transformation of the variance-covariance matrix to some suitable standard form. E.g. the Cayley transformation maps the variance-covariance matrix Σ to the orthogonal matrix (I − Σ)(I + Σ)⁻¹. The main strengths of the method are the constructive nature of the definitions, and validation by matching prior ESS values obtained by the method with intuitively correct prior ESS values in special but important cases.

One of the reasons for the big success of hierarchical models, in particular, in inference for biomedical studies is the following feature. The hierarchical model constructs an informative prior for any given subgroup by borrowing strength from other subgroups and using the information contained in the data. If this feature is important, then the investigator might wish assurance that the hyperprior should not be excessively informative. The proposed ESS facilitates such judgements.

An important area of practical application for the proposed prior ESS summaries is design and inference for early phase clinical trials with small to moderate sample sizes. A concern about inappropriately influential prior information is one of the main impediments to a more widespread use of Bayesian methods in clinical trials. With the ever growing pressure for efficient use of resources and higher ethical standards, clinical trial designs are becoming increasingly more complex. Trial designs now routinely include the use of multiple outcomes, borrowing strength across different disease sub-types or patient subpopulations, the use of biomarkers, adaptive allocation, sequential stopping, etc. Such complexity may lead naturally to the use of a CIHM. The ability to compute prior ESS in such models provides a natural basis for calibrating priors as well as explaining the model to non-statisticians. Prior ESS values provide a similar tool when using a CIHM in a meta-analysis, which is an area of extensive activity. The two case studies in Section 2 are typical examples. It may be argued that, without an understanding of the prior ESS, it is impossible to carry out a fair regulatory review of a clinical trial protocol using a method based on a Bayesian CIHM. The proposed methodology may be used to mitigate this concern.

Overall, we believe that reporting ESS is most useful in judging the relative role of the prior, either informative or non-informative, in a clinical trial design, and as a tool to greatly facilitate the process of design simulation to calibrate the prior and obtain OCs. In this setting, both investigators and regulators may be very concerned that the trial conduct should not be overly biased by (sometimes inappropriately) optimistic priors. The ESS also is helpful to report and interpret the results of a data analysis, since comparing the ESS to the data sample size allows one to judge the prior’s relative informativeness.

Supplementary Material

NIHMS498566-supplement-supplement_1.pdf^{(199.1KB, pdf)}

Acknowledgments

Satoshi Morita’s work was supported in part by Grant H20-CLINRES-G-009 from the Ministry of Health, Labour, and Welfare in Japan and a non-profit organization Epidemiological and Clinical Research Information Network (ECRIN). Peter Müller’s work was partially supported by Grant NIH/NCI R01 CA 75981. Peter Thall’s work was partially supported by Grant NIH/NCI R01 CA 083932. We thank the associate editor and the referees for their thoughtful and constructive comments and suggestions.

Appendix

Assume for simplicity that V̄_j = ∞; the proof in the case V̄_j < ∞ is similar. Accounting for the variability at both levels of the CIHM, the variance of each θ_j can be written in the expanded form $σ_{π_{12}}^{2} (θ_{j}) = E_{π_{2}} {σ_{π_{1}}^{2} (θ_{j} ∣ \tilde{θ})} + σ_{π_{2}}^{2} {E_{π_{1}} (θ_{j} ∣ \tilde{θ})}$ . Similarly, using π_1,0 in place of π₁, $σ_{π_{12, 0}}^{2} (θ_{j}) = E_{π_{2}} {σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0})} + σ_{π_{2}}^{2} {E_{π_{1, 0}} (θ_{j} ∣ {\tilde{θ}}_{0})}$ . Since the definition of ε-information prior ensures that E_{π_1,0}(θ_j|θ̃₀) = E_π₁(θ_j|θ̃), the second terms of the two expansions are identical. Denoting this common term by c_j, and also denoting $a_{j} = E_{π_{2}} {σ_{π_{1}}^{2} (θ_{j} ∣ \tilde{θ})}$ and $b_{j} = E_{π_{2}} {σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0})}$ , the ratio of variances may be written as

\frac{σ_{π_{12}}^{2} (θ_{j})}{σ_{π_{12, 0}}^{2} (θ_{j})} = \frac{E_{π_{2}} {σ_{π_{1}}^{2} (θ_{j} ∣ \tilde{θ})} + c_{j}}{E_{π_{2}} {σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0})} + c_{j}} = \frac{a_{j} + c_{j}}{b_{j} + c_{j}} .

(22)

Given θ̃, under the assumption V̄_j = ∞ one can choose θ̃₀ so that $c_{j} / σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0}) < ε / 2$ and $σ_{π_{1}}^{2} (θ_{j} ∣ \tilde{θ}) / σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0}) < ε / 2$ . Writing these inequalities as $c_{j} - σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0}) ε / 2 < 0$ and $σ_{π_{1}}^{2} (θ_{j} ∣ \tilde{θ}) - σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0}) ε / 2 < 0$ , taking the expectations with respect to π₂ and reversing the algebra in each inequality, it follows that $c_{j} / E_{π_{2}} {σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0})} = c_{j} / b_{j} < ε / 2$ and $E_{π_{2}} {σ_{π_{1}}^{2} (θ_{j} ∣ \tilde{θ})} / E_{π_{2}} {σ_{π_{1, 0}}^{2} (θ_{j} ∣ {\tilde{θ}}_{0})} = a_{j} / b_{j} < ε / 2$ . Writing the right-hand side of (22) as (a_j/b_j + c_j/b_j)/(1 + c_j/b_j), since c_j/b_j > 0 this is bounded above by a_j/b_j + c_j/b_j, which by the foregoing is bounded above by ε/2 + ε/2 = ε.

References

Berlin JA, Colditz GA. The role of meta-analysis in the regulatory process for foods, drugs, and devices. Journal of the American Medical Association. 1999;281:830–834. doi: 10.1001/jama.281.9.830. [DOI] [PubMed] [Google Scholar]
Berry DA, Stangl DK. Bayesian Biostatistics. New York: Marcel Dekker; 1996. [Google Scholar]
Congdon P. Bayesian Statistical Modelling. 2. Chichester: John Wiley and Sons; 2005. [Google Scholar]
Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2. New York: Chapman and Hall/CRC; 2004. [Google Scholar]
Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;1:515–533. [Google Scholar]
Gray RJ. A Bayesian analysis of institutional effects in a multicenter cancer clinical trial. Biometrics. 1994;50:244–253. [PubMed] [Google Scholar]
Kass RE, Steffey D. Approximate Bayesian Inference in Conditionally Independent Hierarchical Models. Journal of the American Statistical Association. 1989;84:717–726. [Google Scholar]
Morita S, Thall PF, Müller P. Determining the effective sample size of a parametric prior. Biometrics. 2008;64:595–602. doi: 10.1111/j.1541-0420.2007.00888.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
O’Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics. 1990;46:33–48. [PubMed] [Google Scholar]
Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A. 1994;157:357–416. [Google Scholar]
Stangl DK. Hierarchical Analysis of Continuous-Time Survival Models. In: Berry DA, Stangl DK, editors. Bayesian Biostatistics. 1996. pp. 429–450. [Google Scholar]
Thall PF, Wathen K, 1, Bekele BN, Champlin RE, Baker LH, Benjamin RS. Hierarchical Bayesian approaches to phase II trials in diseases with multiple subtypes. Statistics in Medicine. 2003;22:763–780. doi: 10.1002/sim.1399. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

NIHMS498566-supplement-supplement_1.pdf^{(199.1KB, pdf)}

[R1] Berlin JA, Colditz GA. The role of meta-analysis in the regulatory process for foods, drugs, and devices. Journal of the American Medical Association. 1999;281:830–834. doi: 10.1001/jama.281.9.830. [DOI] [PubMed] [Google Scholar]

[R2] Berry DA, Stangl DK. Bayesian Biostatistics. New York: Marcel Dekker; 1996. [Google Scholar]

[R3] Congdon P. Bayesian Statistical Modelling. 2. Chichester: John Wiley and Sons; 2005. [Google Scholar]

[R4] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2. New York: Chapman and Hall/CRC; 2004. [Google Scholar]

[R5] Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis. 2006;1:515–533. [Google Scholar]

[R6] Gray RJ. A Bayesian analysis of institutional effects in a multicenter cancer clinical trial. Biometrics. 1994;50:244–253. [PubMed] [Google Scholar]

[R7] Kass RE, Steffey D. Approximate Bayesian Inference in Conditionally Independent Hierarchical Models. Journal of the American Statistical Association. 1989;84:717–726. [Google Scholar]

[R8] Morita S, Thall PF, Müller P. Determining the effective sample size of a parametric prior. Biometrics. 2008;64:595–602. doi: 10.1111/j.1541-0420.2007.00888.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] O’Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics. 1990;46:33–48. [PubMed] [Google Scholar]

[R10] Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A. 1994;157:357–416. [Google Scholar]

[R11] Stangl DK. Hierarchical Analysis of Continuous-Time Survival Models. In: Berry DA, Stangl DK, editors. Bayesian Biostatistics. 1996. pp. 429–450. [Google Scholar]

[R12] Thall PF, Wathen K, 1, Bekele BN, Champlin RE, Baker LH, Benjamin RS. Hierarchical Bayesian approaches to phase II trials in diseases with multiple subtypes. Statistics in Medicine. 2003;22:763–780. doi: 10.1002/sim.1399. [DOI] [PubMed] [Google Scholar]

PERMALINK

Prior Effective Sample Size in Conditionally Independent Hierarchical Models

Satoshi Morita

Peter F Thall

Peter Müller

Abstract

1 Introduction

2 Motivating examples

2.1 A Single-Arm Sarcoma Trial

2.2 A CRM Dose-finding Trial for Multiple Patient Subgroups

2.3 A Multicenter Randomized Trial

3 Effective sample size in CIHMs

3.1 Prior Effective Sample Size in Non-hierarchical Models

Definition of epsilon-information prior

Definition of ESS in the non-CIHM case

3.2 Conditionally independent hierarchical models

Case 1

Definition of ESS for Case 1

Case 2

Definition of ESS for Case 2

3.3 Algorithm for ESS Computation

Algorithm 1, ESS for Case 1

Algorithm 2, ESS for Case 2

3.4 Numerical Evaluation of ESS

Algorithm 3, Numerical Evaluation of Δ1

4 Application to the motivating examples

4.1 ESS for the sarcoma trial

Figure 1.

4.2 A CRM Dose-finding Trial for Multiple Patient Subgroups

Table 1.

Table 2.

4.3 ESS for a Multicenter Randomized Trial

Table 3.

5 Validation in Some Standard CIHMs

Case 1

Figure 2.

Case 2

6 Discussion

Supplementary Material

Acknowledgments

Appendix

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Algorithm 3, Numerical Evaluation of Δ₁