Semiparametric Bayesian inference for repeated fractional measurement data

Ying Yang; Peter Müller; Gary L Rosner

Author manuscript; available in PMC: 2011 Aug 4.

Published in final edited form as: Chil J Stat. 2010 Apr 1;1(1):59–74.

PMCID: PMC3150508 NIHMSID: NIHMS217175 PMID: 21822354

Abstract

We discuss inference for repeated fractional data, with outcomes between 0 and 1, including positive probability masses at 0 and 1. The point masses at the boundaries prevent the routine use of logit and other commonly used transformations of (0, 1) data. We introduce a model augmentation with latent variables that allows for the desired positive probability at 0 and 1 in the model. A linear mixed effects model is imposed on the latent variables. We propose a Bayesian semiparametric model for the random effects distribution. Specifically, we use a Polya tree prior for the unknown random effects distribution. The proposed model can capture possible multimodality and skewness of the random effects distribution. We discuss implementation of posterior inference by Markov chain Monte Carlo simulation. The proposed model is illustrated by a simulation study and a cancer study in dogs.

Keywords: Fractional data, Linear mixed model, MCMC algorithm, Polya tree, Repeated measurement data, Semiparametric Bayesian inference

1. Introduction

Random effects are used to model dependence of repeated measurements on patients or other experimental units. Random effects can be thought of as unmeasured covariates whose values can be considered randomly distributed amongst study individuals. For continuous outcomes with normal errors, Laird and Ware (1982) proposed a normal linear random effects model. In this model, random effects are assumed to be centered around the mean regression coefficients for the populations, also known as the fixed effects. Conditional on random effects, repeated observations on a subject are considered independent. Goldstein (1986) and Longford (1987) developed a model that incorporates nested random effects, representing nested group-specific as well as individual-specific sources of heterogeneity (uncontrolled variation) to be modeled. Gilks et al. (1993) presented a linear multiple-random-effects model that simultaneously accommodates group-specific sources of heterogeneity for several groupings of individuals. They use Gibbs sampling to implement posterior inference. Kleinman and Ibrahim (1998a) described a semiparametric Bayesian generalization of the normal linear random effect model, where a nonparametric prior distribution is specified for the random effects.

In this article, we consider the practically important case in which the outcome variable is a fraction that is continuous between 0 and 1, with positive point masses at 0 and 1. We call such data “fractional data”. Statistical modeling of fractional data of this type has not been investigated so far for longitudinal or repeated measurement data. Zero-inflated distributions have been used to model count data (Ridout et al., 1998) but would not be appropriate here. Fractional data arise in cancer studies. For example, investigators may study the proportion of hypoxic cells in a cancer microenvironment study or the proportion of stained cells in immunohistochemistry (IHC) studies. Such data take values between 0 and 1, including the boundary values.

A standard approach for constrained data is the use of transformations to remove the constraint, such as a logit or probit transformation. Albert and Chib (1993) propose a probit regression model for binary and polychotomous outcomes. They impose a normal regression structure on latent continuous data. Values of the latent data are simulated from suitable truncated normal distributions. After the latent data have been generated, the posterior distributions of the parameters are computed using standard results from normal linear models. Draws from these posteriors are used to sample new latent data. The process is iterated, leading to a Gibbs sampling scheme. For the fractional data we are considering, however, a complication arises from the fact that 0 and 1 are included in the range of possible values, with positive probabilities. This complicates the use of conventional logit or probit transformations.

In this paper, we propose a simulation-based approach to implement posterior inference for the parameters of interest in a model for fractional data. The key idea is to introduce additional latent variables to represent the awkward point masses at 0 and 1. A mixed-effects model is imposed on these latent variables. The model accommodates individual-specific sources of heterogeneity. We start with standard mixed normal linear model assumptions, as usual for continuous data, and then extend to a nonparametric Bayesian model.

The paper is organized as follows. In Section 2 we describe the proposed model for fractional data. In Section 3 we present the normal prior for the random effects in the mixed-effect model. In Section 4 we discuss the Polya tree prior for random effects distribution. We present a simulation study in Section 5 and a cancer study in Section 6. We conclude with a summary discussion in Section 7.

2. Model Formulation on Fractional Data

Suppose that a fractional outcome vector, y_i = (y_i₁, …, y_{in_i}), y_ij ∈ [0, 1] with n_i repeated measurements is observed in individual i. Responses y_ij can also be 0 or 1 with positive probability. Latent variables z_ij are introduced to address this data structure by including point masses at 0 and 1 in the model.

y_{i j} = \begin{cases} 0, & \text{if } z_{i j} \leq 0; \\ z_{i j}, & \text{if } 0 < z_{i j} < 1; \\ 1, & \text{if } z_{i j} \geq 1, \end{cases} \qquad i = 1, \dots, n, \; j = 1, \dots, n_{i} .

(1)

The z_ij are unknown. The distribution of z_ij is unconstrained and continuous at 0 and 1. We can therefore proceed with standard linear mixed model assumptions as is usual for continuous data, including normal distribution assumptions. If the application demands, one can use other functional forms for the latent continuous variable. Let N_p(t, S) denote a p-dimensional normal probability density function with mean t and covariance matrix S. We construct the following model for the ith individual:

z_{i} = X_{i} β + U_{i} θ_{i} + e_{i},

(2)

where β is a p × 1 vector of regression coefficients, commonly called fixed effects. The matrix X_i is an n_i × p design matrix of known covariates for the ith individual. U_i is an n_i × q matrix of covariates for the q × 1 random effect vector θ_i, and e_i is an n_i × 1 vector of residuals. We assume that e_i and θ_i are independent and e_i ~ N_{n_i} (0, σ²I_{n_i}).

For the distribution of random effects, we initially assume a normal distribution. Later, we will introduce alternative and generalized models, as and if indicated by model diagnostics and criticism. We will assign nonparametric priors to the distribution of random effects.

The model includes a monotonicity assumption. We assume that P(y_ij = 1) increases as the mean of the subdensity p(y_ij|0 < y_ij < 1) increases. We make the analogous assumption that P(y_ij = 0) decreases as this mean increases. We feel this is reasonable in most applications. For example, in an application with y_ij being the fraction of stained cells in immunohistochemistry data, it is reasonable to assume that the probability of all cells being stained (i.e., y_ij = 1) increases as the average fraction of stained cells rises.
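The monotonicity can be read off directly from models (1) and (2). As a short worked illustration, write μ_ij for the jth element of X_iβ + U_iθ_i and Φ for the standard normal distribution function (both symbols are introduced here only for this sketch). Conditional on β, θ_i, and σ, the boundary probabilities are

```latex
P(y_{ij} = 0 \mid β, θ_{i}, σ) = Φ\left(\frac{- μ_{ij}}{σ}\right), \qquad
P(y_{ij} = 1 \mid β, θ_{i}, σ) = 1 - Φ\left(\frac{1 - μ_{ij}}{σ}\right) .
```

For fixed σ, the first expression decreases and the second increases in μ_ij, which is the same direction in which the mean of the interior part of the distribution moves.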

3. Normal Linear Random Effect Model

The linear model given in Equation (2) defines the top level sampling model. Without loss of generality we assume conditional independence within experimental units (e.g., dogs), that is, a diagonal variance-covariance matrix. Little would change in the following discussion if we were to assume a non-diagonal variance-covariance matrix.

3.1 Prior specification

We complete the model with conjugate priors. For the fixed effects, we assume a conjugate multivariate normal prior

β \sim N_{p} (μ_{0}, Σ_{0}) .

(3)

Random effects are assumed to arise from the normal random effects model given by

θ_{i} \overset{iid}{\sim} N_{q} (0, Σ_{θ}) .

The prior on the residual variance is specified as

τ = {(σ^{2})}^{- 1} \sim Ga (\frac{γ_{0}}{2}, \frac{λ_{0}}{2}),

(4)

where Ga(a, b) denotes a gamma distribution with mean a/b and variance a/b². The conjugate priors of Equation (3) through Equation (4) are chosen for technical convenience. Substantial prior information might require different prior distributions. Finally, μ₀, Σ₀, Σ_θ, γ₀, and λ₀ are fixed hyperparameters.

3.2 Posterior inference

We implement posterior simulation by Gibbs sampling, resampling each of the indicated parameters conditional on the currently imputed values of all other parameters and the data. We did not use analytic forms because:

All priors are only conditionally conjugate. They are not conjugate for the joint posterior distribution;
Sampling one parameter at a time avoids manipulating an excessively large design matrix;
Additionally, data are truncated by 0 and 1, which breaks joint conjugacy. Moreover, the normal linear mixed model is assumed on z, not on y.

Resampling z

Conditional on other parameters, resampling the latent variable z requires truncated normal sampling. From Equations (1) and (2), we find

z_{i j} \begin{cases} \sim N (X_{i j} β + U_{i j} θ_{i}, σ^{2}) I (z_{i j} \leq 0), & \text{if } y_{i j} = 0; \\ = y_{i j}, & \text{if } 0 < y_{i j} < 1; \\ \sim N (X_{i j} β + U_{i j} θ_{i}, σ^{2}) I (z_{i j} \geq 1), & \text{if } y_{i j} = 1, \end{cases}

(5)

where X_ij and U_ij are the jth rows of the matrices X_i and U_i, respectively. Generating random samples from a truncated normal distribution is straightforward. Gelfand et al. (1990) provided an algorithm for sampling from a truncated normal distribution.
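The update in Equation (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it draws from the truncated normal by inverting the distribution function, which is one common choice, and the function names `sample_truncated_normal` and `resample_z` are hypothetical. Only the Python standard library is used.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def sample_truncated_normal(mean, sd, lower, upper, rng):
    """Draw from N(mean, sd^2) restricted to (lower, upper) by cdf inversion.

    A bound of None means the interval is unbounded on that side.
    """
    p_lo = 0.0 if lower is None else _STD_NORMAL.cdf((lower - mean) / sd)
    p_hi = 1.0 if upper is None else _STD_NORMAL.cdf((upper - mean) / sd)
    u = rng.uniform(p_lo, p_hi)
    # Keep u strictly inside (0, 1) so that inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    draw = mean + sd * _STD_NORMAL.inv_cdf(u)
    # Guard against rounding at the truncation points.
    if lower is not None:
        draw = max(draw, lower)
    if upper is not None:
        draw = min(draw, upper)
    return draw


def resample_z(y, mean, sd, rng):
    """One draw of z_ij given y_ij, following the three cases of Equation (5)."""
    if y <= 0.0:
        return sample_truncated_normal(mean, sd, None, 0.0, rng)
    if y >= 1.0:
        return sample_truncated_normal(mean, sd, 1.0, None, rng)
    return y  # interior observations are not censored


rng = random.Random(0)
print(resample_z(0.0, 0.5, 0.2, rng), resample_z(0.3, 0.5, 0.2, rng))
```

Here `mean` stands for X_ijβ + U_ijθ_i and `sd` for σ. Inversion loses accuracy when the truncation region lies far in the tail, so a rejection-based sampler may be preferable in that situation.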

Resampling β

We proceed as in a standard normal linear regression. Equation (2) can be re-written as

\underbrace{z_{i} - U_{i} θ_{i}}_{z_{i}^{★}} = X_{i} β + e_{i} .

Thus, conditional on other parameters, $z_{i}^{★} \sim N_{n_{i}} (X_{i} β, τ^{- 1} I_{n_{i}})$ . Conditioning on z_i and θ_i, i = 1, …, n, and combining with the conjugate prior in Equation (3), we find p(β|…) = N(μ_β, v_β) with moments $v_{β} = {(Σ_{0}^{- 1} + τ \sum_{i} X_{i}^{\prime} X_{i})}^{- 1}$ and $μ_{β} = v_{β} (Σ_{0}^{- 1} μ_{0} + τ \sum_{i} X_{i}^{\prime} z_{i}^{★})$ .

We implement Gibbs sampling posterior simulation by iterating over the complete conditional posterior distributions given in the above expressions, starting with initial values for β, θ_i, i = 1, …, n, and τ. For later reference, we summarize the Gibbs sampling algorithm:

Generate z using Equation (5).
Draw β ~ p(β|…) from the multivariate normal distribution N(μ_β, v_β).
Draw τ ~ p(τ|…) from the gamma conditional posterior distribution (equivalently, σ² = 1/τ follows an inverse gamma distribution).
For i = 1, …, n, under models (1) and (2), θ_i ~ p(θ_i| …) can be easily generated from a normal distribution.
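The last two steps of the algorithm can be sketched for the simplest case. The sketch below assumes a scalar random intercept (q = 1, U_ij = 1) with prior variance `var_theta`, and it uses the standard conjugate results for the prior in Equation (4): τ | … ~ Ga((γ₀ + N)/2, (λ₀ + Σ r_ij²)/2), where N is the total number of observations and r_ij = z_ij − X_ijβ − θ_i. The function names are hypothetical and the code is illustrative rather than the authors' implementation.

```python
import random


def draw_tau(residuals, gamma0, lambda0, rng):
    """Draw the precision tau from its gamma full conditional.

    residuals: all r_ij = z_ij - X_ij beta - theta_i, pooled over i and j.
    """
    shape = 0.5 * (gamma0 + len(residuals))
    rate = 0.5 * (lambda0 + sum(r * r for r in residuals))
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)


def draw_theta_i(partial_residuals, tau, var_theta, rng):
    """Draw a scalar random intercept theta_i from its normal full conditional.

    partial_residuals: z_ij - X_ij beta for j = 1, ..., n_i (subject i only).
    """
    precision = 1.0 / var_theta + tau * len(partial_residuals)
    mean = tau * sum(partial_residuals) / precision
    return rng.gauss(mean, precision ** -0.5)


rng = random.Random(1)
tau = draw_tau([0.1, -0.05, 0.02, 0.08], gamma0=0.02, lambda0=0.02, rng=rng)
theta = draw_theta_i([0.31, 0.27], tau, var_theta=1.0, rng=rng)
print(tau, theta)
```

For a vector-valued θ_i the same calculation applies with matrices in place of scalars.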

Ergodic averages over the simulated parameter values approximate posterior integrals, including posterior means, posterior predictive distributions, etc. Geweke’s test (Geweke, 1992) could be used to verify the convergence of the simulations.

4. Polya Tree Random Prior for the Random Effect Distribution

The normal assumption on the random effects distribution can be restrictive. The distribution of random effects may be multimodal or skewed in ways that are hard to anticipate. In biomedical data, multimodality frequently arises from patient heterogeneity, presence of outliers, exclusion of unknown but important covariates, alternative biologic mechanisms, etc. Examples of heterogeneous populations include adult vs. pediatric populations, invasive vs. non-invasive tumors, strokes arising from bursting blood vessels vs. blocked vessels, and so forth. For accurate accounting of uncertainties and for improved prediction, it is critical to account for such heterogeneity. Also, the nature of the heterogeneity might be of interest in itself, as, for example, in the discovery of new (sub-) types of cancer. We achieve the desired generalization with a nonparametric Bayesian model.

A commonly used technical definition of nonparametric Bayesian models is probability models with infinitely many parameters (Bernardo and Smith, 1994). In other words, a nonparametric Bayesian model is a probability model on a function space. Nonparametric Bayesian models are used to avoid critical dependence on parametric assumptions, to make parametric models more robust, and to define model diagnostics and sensitivity analysis for parametric models by embedding them in a larger encompassing nonparametric model (Müller and Quintana, 2004). Bayesian nonparametric and semiparametric approaches include mixture models (West, 1992), Dirichlet process and Dirichlet process mixture models (Ferguson, 1973; Antoniak, 1974; Escobar and West, 1995; MacEachern and Müller, 1998), and Polya tree priors (Lavine, 1992, 1994). For a recent review of nonparametric Bayesian models, see Walker et al. (1999).

In this section, we will explore the Polya tree model as a prior for the random effects distribution in models (1) and (2). Polya trees were proposed as a generalization of the Dirichlet process in Bayesian data analysis by Lavine (1992, 1994). The definition, properties, and construction of Polya trees can be found in the Appendix.

A random probability measure G that has a Polya tree distribution, or a Polya tree prior, with parameter (Π, A) is written as G ~ PT(Π, A). The set Π determines the partition structure of the Polya tree. The parameters α_ε in A determine the smoothness of a realization of G and control how quickly the posterior predictive distribution moves from its prior mean to the empirical distribution.

4.1 Posterior predictive simulation

The joint marginal distribution of a sample (X₁, …, X_n) generated from a random distribution with a Polya tree prior has a closed form. The random probability measure G can be integrated out analytically. Suppose $x_{i} \overset{iid}{\sim} G$ , i = 1, …, n and G ~ PT(Π, A). Let g be the density function of G and g₀ be the density function of the centering distribution G₀ such that E(G) = G₀ (see the Appendix for the choice of g₀). The marginal joint density of (X₁, …, X_n) is given by

\begin{array}{l} p (x_{1}, \dots, x_{n}) = \int p (x_{1}, \dots, x_{n} ∣ G) d p (G) \\ = \int \prod_{i = 1}^{n} g (x_{i}) d p (G) \\ = f (x_{1}) \prod_{i = 2}^{n} f (x_{i} ∣ x_{1}, \dots, x_{i - 1}), \end{array}

where f(x₁) = g₀(x₁) and

f (x_{i} ∣ x_{1}, \dots, x_{i - 1}) = g_{0} (x_{i}) \lim_{M \to \infty} \prod_{m = 2}^{M} \frac{α_{{\underline{ε}}_{m}}^{'} (α_{{\underline{ε}}_{m - 1} 0} + α_{{\underline{ε}}_{m - 1} 1})}{α_{{\underline{ε}}_{m}} (α_{{\underline{ε}}_{m - 1} 0}^{'} + α_{{\underline{ε}}_{m - 1} 1}^{'})},

(6)

where $\underline{ε}_{m} = ε_{1} \cdots ε_{m}$ identifies the level m subset containing x_i, i.e., x_i ∈ B_{ε₁,…,ε_m}. Here $α_{{\underline{ε}}_{m}}^{'}$ is equal to $α_{{\underline{ε}}_{m}}$ plus the number of observations among x₁, …, x_{i−1} that belong to B_{ε₁,…,ε_m}.

In practice, we can always reduce the right side of Equation (6) to a finite product. Starting at a sufficiently large level M_{x_i}, the sets B_{ε₁,…,ε_m}, m ≥ M_{x_i}, contain no data points x₁, …, x_{i−1} and thus $α_{{\underline{ε}}_{m}}^{'} = α_{{\underline{ε}}_{m}}$ , so that all later factors equal 1 and the product runs only over m = 2, …, M_{x_i}. Alternatively, the right side of Equation (6) can be approximated by truncating the product at a fixed, sufficiently large level M.
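A finite-product evaluation of Equation (6) might look like the following sketch. It makes several assumptions that are not part of the derivation above: the centering distribution is N(0, `base_sd`²), the partition sets are the dyadic quantile intervals of G₀, and α = c m² at level m (the choices made later in Section 4.2). The product starts at `start_level=2` as printed in Equation (6); passing `start_level=1` also updates the first-level split. The function name is hypothetical.

```python
from statistics import NormalDist


def pt_predictive_density(x, previous, c=0.1, base_sd=5.0,
                          max_level=12, start_level=2):
    """Predictive density of x given earlier draws, as a finite product."""
    g0 = NormalDist(0.0, base_sd)

    def index(value, m):
        # Index of the level-m quantile interval of G0 that contains value.
        return min(int(g0.cdf(value) * 2 ** m), 2 ** m - 1)

    density = g0.pdf(x)
    for m in range(start_level, max_level + 1):
        alpha = c * m * m
        k = index(x, m)
        n_same = sum(1 for v in previous if index(v, m) == k)
        n_sibling = sum(1 for v in previous if index(v, m) == (k ^ 1))
        if n_same + n_sibling == 0:
            break  # parent set holds no data, so all later factors equal 1
        density *= ((alpha + n_same) * 2.0 * alpha
                    / (alpha * (2.0 * alpha + n_same + n_sibling)))
    return density


earlier = [-1.0, -0.8, 1.2, 0.9, 1.1]
print(pt_predictive_density(1.0, earlier), NormalDist(0.0, 5.0).pdf(1.0))
```

The interval endpoints are assigned by flooring, so a point exactly on a boundary goes to the right-hand set; this differs from the right-closed sets used in the text only on a set of probability zero.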

4.2 A semiparametric model for repeated fractional data

In this section, we improve on certain aspects of the normal linear mixed effect model introduced earlier. In particular, we discuss posterior inference for a model that assumes a Polya tree prior for an unknown random effect distribution. The Polya tree prior is centered around a parametric probability distribution. With this approach, random effects can be directly sampled and inference will be based on the predictive density.

As before, let the observed outcome be y_ij and assume that the distribution of the latent variable z_ij for the jth measurement from the ith individual follows models (1) and (2). Given β and θ_i, z_i is normally distributed. The prior specifications for β and τ are the same conjugate priors as before, namely, Equations (3) and (4), respectively.

The random effect model is generalized by assuming a nonparametric random effect prior. That is,

θ_{i} \overset{iid}{\sim} G, G \sim PT (Π, A) .

We center the Polya tree distribution around a normal distribution G₀ with median (mean) 0 and a fixed large variance $σ_{θ}^{2}$ (i.e. $G_{0} = N (0, σ_{θ}^{2})$ ). A fixed partition at level m is generated by taking

B_{ε_{1}, \dots, ε_{m}} = (G_{0}^{- 1} (\frac{j}{2^{m}}), G_{0}^{- 1} (\frac{j + 1}{2^{m}})],

where j = 0, 1, …, 2^m − 1. The parameter set A is specified as

A = \{α_{ε_{1}, \dots, ε_{m}} = c m^{2} : (ε_{1}, \dots, ε_{m}) \in \{0, 1\}^{m}\},

with a pre-specified value of c.
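To see what this choice implies, recall from the Appendix that the splitting probabilities are beta distributed. With equal parameters c m² for both subsets at level m, a splitting probability Y at that level has

```latex
E(Y) = \frac{1}{2}, \qquad
Var(Y) = \frac{1}{4 (2 c m^{2} + 1)} .
```

As an illustration with c = 0.1 (the value used in the examples below), the variance is approximately 0.208 at m = 1, 0.139 at m = 2, and 0.089 at m = 3. The splits are therefore tied more closely to the centering distribution at deeper levels, and a smaller c leaves more room for the data to reshape G.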

To implement Gibbs sampling posterior simulation, we need to generate from the conditional posterior distributions as in Section 3.2. The relevant full conditional posterior distributions include p(β|z, θ, σ²), p(σ²|z, β, θ), and p(θ_i|z, θ_{−i}, β, σ²), where θ_{−i} = (θ₁, …, θ_{i−1}, θ_{i+1}, …, θ_n), i = 1, …, n. Samples can be obtained using an MCMC algorithm, in particular a Metropolis-Hastings-within-Gibbs method.

Conditional on β and the θ_i’s, z_ij can be generated by Equation (5) as before. Similarly, β can be sampled from N(μ_β, v_β) given z, θ = (θ₁, …, θ_n), and σ². The full conditional posterior distribution for τ is easily found to be a gamma distribution (an inverse gamma distribution for σ² = 1/τ), allowing random variable generation.

Now we describe how to update θ. Let z_i = (z_ij, j = 1, …, n_i). The likelihood for θ_i, given z, σ², β, and θ_{−i}, is given by

l (z_{i} ∣ β, θ_{i}, θ_{- i}, σ^{2}) = \prod_{j = 1}^{n_{i}} N (z_{i j} ∣ X_{i j} β + U_{i j} θ_{i}, σ^{2}) .

The conditional posterior distribution for θ_i can be written, up to a constant of proportionality, as

p (θ_{i} ∣ z, θ_{- i}, β, σ^{2}) \propto l (z_{i} ∣ β, θ_{i}, σ^{2}) p (θ_{i} ∣ θ_{- i}),

(7)

for each i = 1, …, n. This motivates the following Metropolis-Hastings transition probability.

A proposed value $θ_{i}^{(p)}$ is generated from the prior predictive distribution p(θ_i|θ₋_i). The prior predictive can be written as

θ_{i}^{(p)} \sim p (θ_{i} ∣ θ_{- i}) = \int G (θ^{(p)}) d p (G ∣ θ_{- i}) with p (G ∣ θ_{- i}) = PT (A^{★}, Π),

using the posterior PT, conditional on θ_{−i}, with $A^{★} = \{α_{ε}^{★} = α_{ε} + n_{ε}\}$ . Here $n_{ε} = \sum_{h \neq i} I (θ_{h} \in B_{ε})$ counts the number of θ_h, h ≠ i, that fall into the partitioning subset B_ε.

We proceed as follows to generate $θ_{i}^{(p)}$ . Recall the definition of Y_ε as the random splitting probabilities in the constructive definition of the PT. Starting with Y₀, we generate the sequence of random probabilities for G ~ PT(A^★, Π), and set ε₀ = 0 with probability Y₀, etc. We continue generating random probabilities Y_{ε₀,…,ε_m} and ε_m until we reach a level m with $α_{ε_{0}, \dots, ε_{m}}^{★} = α_{ε_{0}, \dots, ε_{m}}$ , i.e., until we reach a partitioning subset B_{ε₀,…,ε_m} that does not contain any of the currently imputed random effects θ_h, h ≠ i. Finally, we generate $θ_{i}^{(p)}$ from the base measure G₀ restricted to this set B_{ε₀,…,ε_m}. This generates $θ_{i}^{(p)} \sim p (θ_{i} ∣ θ_{- i})$ .

The Markov chain moves to the candidate point $θ_{i}^{(p)}$ with probability

\min \left\{1, \frac{l (z_{i} ∣ β, θ_{i}^{(p)}, σ^{2})}{l (z_{i} ∣ β, θ_{i}^{(c)}, σ^{2})}\right\},

where $θ_{i}^{(c)}$ is the current sample.
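The proposal and acceptance steps can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: a scalar random intercept (U_ij = 1), G₀ = N(0, `base_sd`²), and α = c m² at level m. Because each splitting probability is used only once, the sketch moves down the tree with the posterior mean of each Y, which is equivalent to first drawing Y from its beta distribution and then choosing the branch. The function names and the `max_level` cap are hypothetical.

```python
import math
import random
from statistics import NormalDist


def propose_theta(others, c, base_sd, rng, max_level=30):
    """Draw a proposal from the Polya tree predictive given the other effects."""
    g0 = NormalDist(0.0, base_sd)
    inside = [g0.cdf(v) for v in others]  # work on the G0 probability scale
    lo, hi, m = 0.0, 1.0, 0
    while inside and m < max_level:
        m += 1
        alpha = c * m * m
        mid = 0.5 * (lo + hi)
        left = [u for u in inside if u <= mid]
        n_left, n_right = len(left), len(inside) - len(left)
        p_left = (alpha + n_left) / (2.0 * alpha + n_left + n_right)
        if rng.random() < p_left:
            hi, inside = mid, left
        else:
            lo, inside = mid, [u for u in inside if u > mid]
    # The current set holds no other effects: draw from G0 restricted to it.
    u = min(max(rng.uniform(lo, hi), 1e-12), 1.0 - 1e-12)
    return g0.inv_cdf(u)


def log_lik(z_i, fixed_i, theta, sigma):
    """Normal log likelihood of subject i, up to an additive constant."""
    return sum(-0.5 * ((z - f - theta) / sigma) ** 2
               for z, f in zip(z_i, fixed_i))


def mh_update_theta(theta_cur, z_i, fixed_i, others, sigma, c, base_sd, rng):
    """One Metropolis-Hastings update; fixed_i holds X_ij beta for subject i."""
    theta_prop = propose_theta(others, c, base_sd, rng)
    log_ratio = (log_lik(z_i, fixed_i, theta_prop, sigma)
                 - log_lik(z_i, fixed_i, theta_cur, sigma))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return theta_prop
    return theta_cur


rng = random.Random(3)
others = [0.2, 0.25, 0.15, -0.2, -0.22, -0.18]
print(mh_update_theta(0.0, [0.31, 0.29], [0.0, 0.0], others, 0.1, 0.1, 5.0, rng))
```

Because the proposal is the prior predictive, the prior terms cancel in the Metropolis-Hastings ratio and only the likelihood ratio remains, as in the acceptance probability above.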

5. A Simulation Study

In this section, we fit a mixed fractional data model to simulated data. Normal and Polya tree priors are considered as priors of the distributions of random effects, and we compare the resulting inference. We simulated n = 100 random effects from

θ_{i} \overset{iid}{\sim} 0.5 N (- 0.2, {0.15}^{2}) + 0.5 N (0.2, {0.15}^{2}), i = 1, \dots, n .

We introduce one covariate, $X_{i j} \overset{iid}{\sim} N (0.5, {0.3}^{2})$ , i = 1, …, n, j = 1, 2. The simulation truth for the regression coefficient was set at β = (β₀, β₁) = (0.8, −0.6), and the random residual e was generated from N(0, 0.1²). Let z_ij = β₀ + X_ijβ₁ + θ_i + e_ij, i = 1, …, n, j = 1, 2. The observed fractional data are recorded as y_ij = 1 if z_ij ≥ 1; y_ij = z_ij if 0 < z_ij < 1; and y_ij = 0 if z_ij ≤ 0.
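The data-generating mechanism just described can be reproduced with a few lines of code. The sketch below follows the stated simulation truth; the function name, the seed, and the tuple layout of the output are hypothetical choices, and the draws will not match the authors' data set.

```python
import random


def simulate_fractional_data(n=100, reps=2, seed=0):
    """Simulate (subject, replicate, x, y) tuples from the Section 5 design."""
    rng = random.Random(seed)
    beta0, beta1 = 0.8, -0.6
    data = []
    for i in range(n):
        # Bimodal random effect: equal mixture of N(-0.2, 0.15^2) and N(0.2, 0.15^2).
        center = -0.2 if rng.random() < 0.5 else 0.2
        theta = rng.gauss(center, 0.15)
        for j in range(reps):
            x = rng.gauss(0.5, 0.3)
            z = beta0 + beta1 * x + theta + rng.gauss(0.0, 0.1)
            y = min(max(z, 0.0), 1.0)  # Equation (1): censor at 0 and 1
            data.append((i, j, x, y))
    return data


data = simulate_fractional_data()
ys = [row[3] for row in data]
print(len(ys), sum(y == 0.0 for y in ys), sum(y == 1.0 for y in ys))
```

The printed counts show how many observations fall on each boundary.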

The histogram of the observed data y is displayed in Figure 1. From this figure, note the point masses at the boundaries. We analyze the data using model (1) through Equations (3) and (4), with z_ij = β₀ + X_ijβ₁ + θ_i + e_ij, and random effects $θ_{i} \overset{iid}{\sim} G$ , and residuals $e_{i j} \overset{iid}{\sim} N (0, σ^{2})$ , i = 1, …, n, j = 1, 2.

We assume a diffuse normal prior on β = (β₀, β₁), namely β ~ N₂((0, 0), 100I₂), and model the random effects distribution by the following two approaches.

The distribution of the random effect θ is first modeled with a normal distribution, that is, $G (θ) = N (0, σ_{θ}^{2})$ with a hyper-prior $1 / σ_{θ}^{2} \sim Ga (0.01, 0.01)$ . We carry out the posterior simulation as described in Section 3.2. The MCMC simulation was run for 20,000 iterations. The first 5,000 iterations were discarded as burn-in. The point estimates and 95% posterior intervals of β and selected random effects are presented in Table 1. Figure 2(a) shows the posterior predictive density for y_{new, 1} under the normal random effect prior when X_{new, 1} takes the average of all observed X’s. The density looks roughly normal and does not capture the bimodal feature in the data, which suggests that a normal random effects prior is not appropriate in this example.

Table 1.

Posterior 2.5th, 50th, and 97.5th percentiles for selected parameters under the semiparametric and the parametric normal random effects models, reported as median (2.5th percentile, 97.5th percentile). β₀ is the intercept, β₁ is the coefficient of the covariate, and σ is the residual standard deviation. θ_i is the random effect for the ith subject.

Parameter	Normal prior	Polya tree prior (c = 0.1)
β₀	0.830 (0.768, 0.890)	0.847 (0.723, 0.943)
β₁	−0.596 (−0.659, −0.534)	−0.599 (−0.661, −0.533)
σ	0.093 (0.080, 0.108)	0.093 (0.081, 0.109)
θ₆₉	0.272 (0.133, 0.401)	0.259 (0.085, 0.396)
θ₉₉	−0.200 (−0.338, −0.051)	−0.219 (−0.378, −0.039)
σ_θ	0.247 (0.194, 0.350)	–

Figure 2. (a) Predictive density of y_{new, 1} under the normal random effect distribution, (b) predictive density of y_{new, 1} under the PT prior for the random effects distribution (note the bimodal feature), and (c) estimated random effect distribution Ḡ = E(G|z) (dashed line) vs. the simulation truth (solid line).

We specify a Polya tree prior for the random effect distribution to allow for multimodality. Priors for other parameters remain unchanged.

We center the Polya tree prior around N(0, 5²). At the mth partition level, α_{ε₁,…,ε_m} is taken to be cm², with c fixed at 0.1. Posterior simulation was run for 25,000 iterations, with the first 5,000 being discarded as a burn-in period. Every 20th iteration was saved. Posterior summary statistics for the estimated parameters are presented in Table 1. The point estimates for β₀ and β₁ are fairly close to the simulation truth. Figure 2(b) shows the posterior predictive density for y_{new, 1} under the Polya tree prior for the random effects distribution. It demonstrates the bimodal feature in the data. The posterior predictive density of θ_{n+1} is plotted in Figure 2(c). It shows that inference on G captures the bimodal nature of the random effect distribution very well.

As expected, point estimates of the fixed effects are similar for both models. Posterior intervals are wider under the Polya tree prior, most visibly for the intercept β₀ (Table 1). The increased posterior uncertainty under the Polya tree random effects model is due to the additional randomness in the latter model.

The posterior medians for two θ_i, shown in Table 1, are slightly different under the two models. This is to be expected. The semiparametric model allows multiple shrinkage. Under the normal model, all estimates for θ_j are shrunk towards the common mean, E(θ_j) = 0.

6. Application

In cancer studies, some characteristics of the tumor microenvironment are often assessed by histologic evaluation of tumor biopsies. These include oxygenation and proliferation, two important features of the tumor environment that may influence the response to treatment. One approach to measure tumor oxygenation is to measure bound nitroimidazoles based on information from tumor biopsies. The accuracy of biopsy-based methods, however, is related to how precisely the information derived from the biopsies represents the overall tumor microenvironment. Thrall et al. (1997) studied binding of CCI-103F and pimonidazole, both 2-nitroimidazole compounds, in canine solid tumors to assess pretreatment oxygenation and changes in oxygenation during irradiation.

The study reports data for n = 9 dogs, each with a primary solid tumor. Twenty-four hours before the first radiation treatment, CCI-103F was administered intravenously. Immediately prior to the first radiation treatment, up to eight biopsies were obtained from different geographic regions of the tumor. One to four sections from each biopsy sample were placed on glass slides. Slides from four out of eight biopsies were measured 20 minutes after injecting the dye. Slides from the rest of biopsies in the same tumor were measured 24 hours after injecting the dye. The volume fraction of hypoxic tumor tissue was reported by measuring the CCI-103F labelled area in each slide. Responses are recorded as

y = \frac{\text{CCI-103F labelled counts}}{\text{CCI-103F labelled counts} + \text{unlabelled counts}} .

An important feature of the data is that the volume fraction of hypoxic tumor tissue can be 0 or 1 in some slides. Raw cell counts are not reported, only the fraction y, resulting in the fractional data format discussed earlier. The questions of interest include: “What is the average fraction of hypoxic cells in a tumor?”; “How variable is the fraction within a tumor and between dogs?”; and “How is the measurement of hypoxic cells in the same tumor affected by different measuring times?”.

Figure 3 presents the average fraction of hypoxic cells in each biopsy for each dog. The x-axis denotes the biopsy label. Biopsies labelled 1, 2, 3, and 4 were measured 20 minutes after injecting the dye. Biopsies labelled 5, 6, 7, and 8 were measured 24 hours after injecting the dye.

We use conventional Bayesian and Bayesian semiparametric random effect models to fit the above fractional data. Let y_ijk denote the fraction of hypoxic cells in the kth slide from the jth biopsy of the ith dog. Time t_ij is the measurement time for the jth biopsy from the ith dog, v_i is the volume of the tumor from the ith dog, d_i denotes a random dog effect, and b_{j(i)} a random biopsy effect nested within the ith dog. To include point mass probabilities for y_ijk = 0 and y_ijk = 1, we introduce a latent variable z_ijk into the model. As in Equation (1):

y_{ijk} = \begin{cases} 0, & \text{if } z_{ijk} \leq 0; \\ z_{ijk}, & \text{if } 0 < z_{ijk} < 1; \\ 1, & \text{if } z_{ijk} \geq 1. \end{cases}

The mixed-effects model (2) becomes z_ijk = β₀ + β₁t_ij + β₂v_i + d_i + b_{j(i)} + ε_ijk, with residuals $ε_{ijk} \overset{iid}{\sim} N (0, σ^{2})$ , i = 1, …, 9, j = 1, …, 8, k = 1, …, 4.

The model is completed with conjugate priors: β = (β₀, β₁, β₂) ~ N(0, 4I₃), $d_{i} \overset{iid}{\sim} N (0, σ_{d}^{2})$ , and a random effect distribution for the biopsy effects: $b_{j (i)} \overset{iid}{\sim} F$ . As hyperpriors, we assume $1 / σ_{d}^{2} \sim Ga (0.01, 0.01)$ and 1/σ² ~ Ga(0.01, 0.01).

Because of the small number of dogs, the distribution of the random dog effect is assumed to be normal. Two types of random effects distributions are considered for the distribution of the biopsy effect. First, a normal prior $F = N (0, σ_{b}^{2})$ is specified. From other studies, we expect more variability within tumors than between subject-specific averages. Then the parametric prior assumption is relaxed by specifying a nonparametric Polya tree prior, i.e., F ~ PT(Π, A).

6.1 Normal prior for the biopsy effect

Assume $b_{j (i)} \overset{iid}{\sim} N (0, σ_{b}^{2})$ and $1 / σ_{b}^{2} \sim Ga (0.01, 0.01)$ . We ran the Gibbs sampler over 35,000 iterations, with the first 5,000 being discarded as a burn-in period. In addition, because of high autocorrelation, only every 12th iteration was saved and the rest discarded, leading to a total Monte Carlo sample size of 2,500.

The histogram of posterior mean estimated biopsy effects is presented in Figure 4. The distribution of biopsy effects does not appear normal. The non-normality may be due to heterogeneity across dogs, or biopsies, or other covariates that are not recorded.

Figure 4. Histogram of mean biopsy effects under the parametric model.

Posterior estimates of model parameters are summarized in Table 2. The results indicate that the proportion of hypoxic cells is strongly related to time but not significantly associated with tumor volume.

Table 2.

Posterior median and central 95% posterior intervals for various parameters from the parametric normal model and the Polya tree model. Here, β₀ is the intercept, β₁ is the slope over time, β₂ is the effect of tumor volume, σ is the error standard deviation, σ_d is the standard deviation of the dog effect, and σ_b is the standard deviation of the biopsy effect.

Parameter	Normal model	Polya tree model
β₀	0.303 (0.127, 0.507)	0.347 (0.173, 0.553)
β₁	0.113 (0.019, 0.212)	0.054 (−0.051, 0.169)
β₂	−0.001 (−0.003, 0.003)	−0.001 (−0.003, 0.002)
σ	0.093 (0.085, 0.103)	0.093 (0.085, 0.102)
σ_d	0.252 (0.154, 0.527)	0.263 (0.158, 0.510)
σ_b	0.197 (0.163, 0.241)	–

6.2 Polya tree prior for the biopsy effect

Suppose the biopsy effect arises from a random probability measure F, and we assume a Polya tree prior for F. We use a Polya tree with centering distribution F₀ = N(0, 3²). We place the same priors on fixed effects and random dog effects as before. The parameters of the Polya tree prior are fixed as follows. The partitioning points to define Π are chosen to be the percentiles of F₀, and c is set to 0.1.

Posterior simulation was run for 38,000 iterations, with the first 4,000 being discarded as burn-in period. Every 20th iteration was saved. Convergence of the posterior distributions of parameters was assessed using Geweke’s method. The parameter estimates are presented in Table 2. Note that the time effect is no longer significant under the Polya tree prior.

The posterior mean of F is shown in Figure 5, which indicates high between-biopsy variability. The posterior mean is centered around 0, as we would expect.

Figure 5. Estimated random biopsy effect distribution F̄ = E(F|z).

7. Discussion

We have proposed Bayesian nonparametric modeling for a class of important inference problems arising in biomedical data analysis. We developed a model and corresponding inference for repeated fractional data. The proposed techniques include specifying a nonparametric Polya tree prior for the random effects distribution. We provided appropriate posterior simulation schemes.

In the fractional data model, Polya tree priors avoid assuming a specific parametric distribution for random effects. This allows us to estimate the distribution of random effects, which provides insight into population heterogeneity and honestly accounts for related uncertainties.

Polya trees have some practical limitations, however. First, the resulting random probability measure depends on the specific partition sequence adopted. Second, using a fixed partitioning sequence Π results in discontinuities in the predictive distributions. Third, implementations for higher dimensional distributions require extensive housekeeping and are impractical.

Paddock et al. (2003) and Hanson and Johnson (2002) introduced randomized Polya trees to mitigate problems related to the discontinuities. The idea is based on dyadic rational partitions, but instead of taking the nominal half-point, Paddock et al. (2003) randomly chose a cutoff centered around the nominal cutoff point. This construction is shown to mitigate the first two limitations noted above. Hanson and Johnson (2002) consider a mixture with respect to a hyperparameter that defines the partitioning tree. A prior was placed on the spread of the prior distribution of G, which can be learned from the data. Doing so eliminates the need to choose a centering distribution with adequate spread in an ad hoc way.

In the application discussed in this paper, the fractional response was measured at two time points. In other applications the fractional measurements might be obtained from the same biopsy or sample at more than two time points, or the measurements might be made at several time points per cycle over multiple cycles for each subject. The proposed model can be generalized to incorporate additional random effects, such as a nested cycle effect. This generalization is a topic of future research.

Acknowledgments

This work was supported by grant CA075981 from the National Cancer Institute.

Appendix

A.1 Definition and Basic Properties of Polya Trees

Let E = {0, 1}, E⁰ = ∅, E^m be the m-fold product E × E × ··· × E, $E^{★} = \cup_{m = 0}^{\infty} E^{m}$ and E^N be the set of infinite sequences of elements of E. Let Ω be a separable measurable space, π₀ = Ω and Π = {π_m; m = 0, 1, …} be a separating binary tree of partitions of Ω; that is, let π₀, π₁, … be a sequence of partitions such that $\cup_{m = 0}^{\infty} π_{m}$ generates the measurable sets and such that every B ∈ π_{m+1} is obtained by splitting some B′ ∈ π_m into two subsets. Let B_∅ = Ω and, for all ε = ε₁, …, ε_m ∈ E^★, let B_ε₀ and B_ε₁ be the two subsets into which B_ε is split.

Definition 7.1 [Lavine (1992)]

A random probability measure G is said to have a Polya tree distribution, or a Polya tree prior, with parameter (Π, $\mathcal{A}$), written as G ~ PT(Π, $\mathcal{A}$), if there exist non-negative numbers $\mathcal{A}$ = (α₀, α₁, α₀₀, …) and random variables $\mathcal{Y}$ = (Y₀, Y₀₀, Y₁₀, …) such that all random variables in $\mathcal{Y}$ are independent; for every ε, Y_ε₀ ~ Beta(α_ε₀, α_ε₁); and for every m = 1, 2, …, and every ε = ε₁, …, ε_m,

$G(B_{ε_{1}, \dots, ε_{m}}) = \left( \prod_{j = 1, ε_{j} = 0}^{m} Y_{ε_{1}, \dots, ε_{j - 1} 0} \right) \left( \prod_{j = 1, ε_{j} = 1}^{m} (1 - Y_{ε_{1}, \dots, ε_{j - 1} 0}) \right),$

where the first term, i.e. for j = 1, is interpreted as Y₀ or 1 − Y₀.

The random variable Y_ε₀ is the conditional probability of partition subset B_ε₀ given B_ε. For instance, for m = 2, G(B₀₀) = Y₀Y₀₀, G(B₀₁) = Y₀(1 − Y₀₀), G(B₁₀) = (1 − Y₀)Y₁₀, and G(B₁₁) = (1 − Y₀)(1 − Y₁₀). The set Π determines the partition structure of the Polya tree. The parameters α_ε in $\mathcal{A}$ determine the smoothness of a realization of G and control how quickly the posterior predictive distribution moves from its prior mean to the empirical distribution.
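The product formula can be illustrated with a short simulation. This is a minimal sketch, not the authors' implementation: the function name, the string encoding of ε, the truncation at a finite depth, and the choice α_ε = m² (the case c = 1 of the cm² family mentioned later in this appendix) are all assumptions made for illustration.

```python
import random

def sample_polya_tree_masses(alpha, depth, rng):
    """Draw G(B_eps) for every binary string eps of length `depth`.

    `alpha(eps)` returns the parameter alpha_eps; the tree is truncated
    at `depth` levels (an assumption made for this sketch).
    """
    masses = {"": 1.0}  # G(Omega) = 1
    for _ in range(depth):
        new_masses = {}
        for eps, mass in masses.items():
            # Y_{eps0} ~ Beta(alpha_{eps0}, alpha_{eps1}) is the conditional
            # probability of B_{eps0} given B_eps.
            y = rng.betavariate(alpha(eps + "0"), alpha(eps + "1"))
            new_masses[eps + "0"] = mass * y
            new_masses[eps + "1"] = mass * (1.0 - y)
        masses = new_masses
    return masses

rng = random.Random(1)
# Hypothetical choice: alpha_eps = m^2, where m = len(eps) is the level.
masses = sample_polya_tree_masses(lambda eps: len(eps) ** 2, depth=2, rng=rng)
for eps in sorted(masses):
    print(eps, round(masses[eps], 4))
```

For depth 2 the four printed values are one realization of G(B₀₀), G(B₀₁), G(B₁₀) and G(B₁₁), and they sum to one by construction.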

Several properties facilitate the use of the Polya tree for nonparametric Bayesian inference. Polya trees are conjugate under i.i.d. sampling. Assume p(G | Π, $\mathcal{A}$) = PT(Π, $\mathcal{A}$) and $x_{i} \overset{iid}{\sim} G$, i = 1, …, n. Then p(G | x, Π, $\mathcal{A}$) = PT(Π, $\mathcal{A}^{★}$), where $\mathcal{A}^{★} = \{α_{ε}^{★} = α_{ε} + n_{ε} : ε \in E^{★}\}$ and n_ε is the number of x_i's in B_ε. In words, the posterior distribution of G under i.i.d. sampling is again a Polya tree with the same fixed partition sequence, and each partitioning probability Y_ε₀ is generated from a beta distribution with the updated parameters $α_{ε0}^{★}$ and $α_{ε1}^{★}$.
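The conjugate update amounts to counting observations in each partition set. The sketch below illustrates it under stated assumptions: Ω = (0, 1] with the dyadic partition, a tree truncated at two levels, prior parameters α_ε = m², and four made-up observations. The helper names are hypothetical.

```python
from collections import Counter

def set_label(x, depth):
    """Binary label eps of the dyadic subset of (0, 1] that contains x."""
    lo, hi, eps = 0.0, 1.0, ""
    for _ in range(depth):
        mid = (lo + hi) / 2.0
        if x <= mid:           # left subset is (lo, mid]
            eps, hi = eps + "0", mid
        else:                  # right subset is (mid, hi]
            eps, lo = eps + "1", mid
    return eps

def posterior_alpha(alpha, data, depth):
    """Return alpha*_eps = alpha_eps + n_eps for all eps up to `depth`."""
    counts = Counter()
    for x in data:
        label = set_label(x, depth)
        for m in range(1, depth + 1):
            counts[label[:m]] += 1   # x also lies in every ancestor set
    labels = [format(i, f"0{m}b")
              for m in range(1, depth + 1) for i in range(2 ** m)]
    return {eps: alpha(eps) + counts[eps] for eps in labels}

data = [0.10, 0.20, 0.30, 0.80]   # hypothetical observations in (0, 1]
post = posterior_alpha(lambda eps: len(eps) ** 2, data, depth=2)
print(post)
```

Here three observations fall in B₀ and one in B₁, so α₀ and α₁ move from 1 to 4 and 2, respectively.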

The Polya tree includes the Dirichlet process as a special case. A Polya tree is a Dirichlet process if α_ε = α_ε₀ + α_ε₁ for every ε ∈ E^★ (Ferguson, 1974).

The parameters of a Polya tree can be chosen such that G is absolutely continuous with probability 1. In particular, any choice α_{ε₁,…,ε_m} = ρ(m) such that $\sum_{m = 1}^{\infty} ρ(m)^{-1} < \infty$ guarantees that G is absolutely continuous. For example, Walker and Mallick (1999) and Paddock et al. (2003) consider α_{ε₁,…,ε_m} = cm², where c > 0.
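As a worked check of this condition, added here for illustration, take ρ(m) = cm² with c > 0. The series is then a multiple of the Basel series:

$$\sum_{m = 1}^{\infty} ρ(m)^{-1} = \frac{1}{c} \sum_{m = 1}^{\infty} \frac{1}{m^{2}} = \frac{π^{2}}{6c} < \infty .$$

By contrast, a constant choice ρ(m) = c gives $\sum_{m = 1}^{\infty} c^{-1} = \infty$, so the sufficient condition does not apply in that case.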

A.2 Construction of a Polya Tree

Two parameters specify a Polya tree prior: the partition Π and the set $\mathcal{A}$. Through the choice of Π, we may center the Polya tree prior around a particular continuous distribution G₀. To do so, we take the partition points to align with percentiles of G₀. For instance, if $B_{0} = (-\infty, G_{0}^{-1}(\frac{1}{2})]$ (hence $B_{1} = (G_{0}^{-1}(\frac{1}{2}), \infty)$), $B_{00} = (-\infty, G_{0}^{-1}(\frac{1}{4})]$, … and α₀ = α₁, α₀₀ = α₀₁, …, then since G(B₀) = Y₀ ~ Beta(α₀, α₁), $E(G(B_{0})) = \frac{1}{2} = G_{0}(B_{0})$. Also, e.g., G(B₀₀) = Y₀Y₀₀ implies, by the independence of Y₀ and Y₀₀, that $E(G(B_{00})) = \frac{1}{4} = G_{0}(B_{00})$, and for any B ∈ Π, E(G(B)) = G₀(B). We need not confine ourselves to quantiles of the form $G_{0}^{-1}(\frac{j}{2^{r}})$.
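The centering construction can be sketched by computing the cut points $G_{0}^{-1}(j/2^{r})$ directly. The example below assumes, purely for illustration, a standard normal centering distribution G₀ and uses Python's standard-library `statistics.NormalDist`; the function name is hypothetical.

```python
from statistics import NormalDist

def partition_cutpoints(g0_inv_cdf, level):
    """Interior cut points G0^{-1}(j / 2^level), j = 1, ..., 2^level - 1."""
    n_sets = 2 ** level
    return [g0_inv_cdf(j / n_sets) for j in range(1, n_sets)]

g0 = NormalDist(mu=0.0, sigma=1.0)   # assumed centering distribution
cuts = partition_cutpoints(g0.inv_cdf, level=2)
print([round(c, 4) for c in cuts])

# Each level-2 set between consecutive cut points has G0-mass 1/4.
inner_mass = g0.cdf(cuts[1]) - g0.cdf(cuts[0])
print(round(inner_mass, 4))
```

With these cut points the level-2 sets are (−∞, cuts[0]], (cuts[0], cuts[1]], (cuts[1], cuts[2]] and (cuts[2], ∞), each with prior expected mass 1/4 when α₀ = α₁ and α₀₀ = α₀₁ = α₁₀ = α₁₁.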

Figure 6 shows an example of the construction of a Polya tree prior on Ω = (0, 1] (Ferguson, 1974). At the top level of the tree, Ω is split in half at the dyadic rational 0.5. Thus B₀ = (0, 0.5], B₁ = (0.5, 1] and Ω = B₀ ∪ B₁. At the second level, B₀ and B₁ are split at 0.25 and 0.75, respectively, resulting in the subsets B₀₀ = (0, 0.25], B₀₁ = (0.25, 0.5], B₁₀ = (0.5, 0.75], and B₁₁ = (0.75, 1], and so on.

Figure 6. Construction of a Polya tree prior on (0, 1] (Ferguson, 1974).

One also has to choose the parameters in $\mathcal{A}$. The parameters α_ε in $\mathcal{A}$ control how quickly the updated predictive distribution moves from the centering distribution G₀ to the empirical distribution. If the α_ε's are large, then the distribution of x_{n+1} | x₁, …, x_n is close to G₀. If the α_ε's are small, then the distribution of x_{n+1} | x₁, …, x_n is close to the empirical distribution function. The parameters α_ε also express the belief about the smoothness of G. Ferguson (1974) provides conditions on $\mathcal{A}$ which yield discrete, continuous singular, and absolutely continuous distributions with probability one. For instance, for level m = 1, 2, …, α_{ε₁,…,ε_m} = $2^{-m}$ implies a Dirichlet process, α_ε = 1 yields a random probability G of a type considered by Dubins and Freedman (1966) and shown to be continuous singular with probability one, and α_{ε₁,…,ε_m} = m² implies an absolutely continuous distribution with probability 1. Walker and Mallick (1999) and Paddock et al. (2003) considered α_{ε₁,…,ε_m} = cm², where c > 0.
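The effect of the size of the α_ε's on the predictive distribution can be seen in a small numerical sketch. It assumes, for illustration only, a dyadic Polya tree on (0, 1] centered at the uniform distribution, parameters α_{ε₁,…,ε_m} = cm², and four made-up observations. The predictive probability of a set is the product of the posterior means of the beta-distributed branching probabilities along the path to that set.

```python
def predictive_mass(eps, data, c):
    """P(x_{n+1} in B_eps | data) for a dyadic Polya tree on (0, 1]
    with alpha_{eps_1..eps_m} = c * m**2 (an illustrative choice)."""
    lo, hi, prob = 0.0, 1.0, 1.0
    for m, bit in enumerate(eps, start=1):
        mid = (lo + hi) / 2.0
        n_left = sum(lo < x <= mid for x in data)
        n_right = sum(mid < x <= hi for x in data)
        a = c * m ** 2                       # alpha of both children at level m
        n_child = n_left if bit == "0" else n_right
        # Posterior mean of the branching probability at this level.
        prob *= (a + n_child) / (2 * a + n_left + n_right)
        lo, hi = (lo, mid) if bit == "0" else (mid, hi)
    return prob

data = [0.10, 0.20, 0.30, 0.80]   # hypothetical observations
for c in (0.01, 1.0, 100.0):
    print(c, round(predictive_mass("00", data, c), 3))
```

In this example the empirical proportion of observations in B₀₀ = (0, 0.25] is 0.5 and the G₀-probability is 0.25. The predictive mass is close to 0.5 for small c and close to 0.25 for large c, in line with the description above.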

Therefore, through the selection of $\mathcal{A}$ and G₀, one can center the Polya tree prior around G₀ as closely as desired, with the closeness determined by $\mathcal{A}$, in a manner analogous to the specification of the baseline measure and precision parameter in the Dirichlet process. $\mathcal{A}$ can be thought of as a precision parameter and G₀ as a base measure.

References

Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association. 1993;88:669–680.
Antoniak CE. Mixtures of Dirichlet process with applications to nonparametric problems. Annals of Statistics. 1974;2:1152–1174.
Bernardo JM, Smith AFM. Bayesian theory. Wiley; New York: 1994.
Dubins LE, Freedman DA. Random distribution functions. Proceeding of Fifth Berkeley Symposium Mathematical Statistics and Probability. 1966;3:183–214.
Escobar MD, West M. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association. 1995;90:577–580.
Ferguson TS. A Bayesian analysis of some nonparametric problems. Annals of Statistics. 1973;1:209–230.
Ferguson TS. Prior distributions on spaces of probability measures. Annals of Statistics. 1974;2:615–629.
Geweke J. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments (with discussion). Bayesian Statistics. 1992;4:169–193.
Gelfand AE, Hills SE, Racine-Poon A, Smith AFM. Illustration of Bayesian inference in normal data model using Gibbs sampling. Journal of the American Statistical Association. 1990;85:972–985.
Gilks WR, Wang CC, Yvonnet B, Coursaget P. Random-effects models for longitudinal data using Gibbs sampling. Biometrics. 1993;49:441–453.
Goldstein H. Multilevel mixed linear model analysis using iterative generalized least squares. Biometrika. 1986;73:43–56.
Hanson T, Johnson WO. Modelling regression error with a mixture of Polya trees. Journal of the American Statistical Association. 2002;97:1020–1033.
Kleinman KP, Ibrahim JG. A semiparametric Bayesian approach to the random effects model. Biometrics. 1998a;54:921–938.
Laird NM, Ware JM. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
Lavine M. Some aspects of Polya tree distributions for statistical modelling. Annals of Statistics. 1992;20:1222–1235.
Lavine M. More aspects of Polya tree distributions for statistical modelling. Annals of Statistics. 1994;22:1161–1176.
Longford NT. A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika. 1987;74:817–827.
MacEachern SN, Müller P. Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics. 1998;7:223–338.
Müller P, Quintana F. Nonparametric Bayesian data analysis. Statistical Science. 2004;19:95–110.
Paddock S, Ruggeri F, Lavine M, West M. Randomised Polya tree models for nonparametric Bayesian inference. Statistica Sinica. 2003;13:443–460.
Ridout MS, Demetrio CGB, Hinde JP. Models for counts data with many zeros. International Biometric Conference; 1998. pp. 179–192.
Thrall DE, Rosner GL, Azuma C, McEntee MC, Raleigh JA. Hypoxia marker labelling in tumor biopsies: quantification of labeling variation and criteria for biopsy sectioning. Radiotherapy and Oncology. 1997;44:171–176. doi: 10.1016/s0167-8140(97)01931-2.
Walker SG, Mallick BK. Semiparametric accelerated life time model. Biometrics. 1999;55:477–483. doi: 10.1111/j.0006-341x.1999.00477.x.
Walker SG, Damien P, Laud PW, Smith AFM. Bayesian nonparametric inference for random distributions and related functions. Journal of The Royal Statistical Society Series B – Statistical Methodology. 1999;61:485–509.
West M. ISDS discussion paper 92-A03. Duke University; USA: 1992. Hyperparameter estimation in Dirichlet process mixture.

