Author manuscript; available in PMC 2020 March 6.
Published in final edited form as: Biometrika. 2008 Nov 3; 95(4): 859–874. doi: 10.1093/biomet/asn043

Bayesian nonparametric inference on stochastic ordering

DAVID B. DUNSON and SHYAMAL D. PEDDADA
PMCID: PMC7059979  NIHMSID: NIHMS156945  PMID: 32148335

Summary

This article considers Bayesian inference about collections of unknown distributions subject to a partial stochastic ordering. To address problems in testing of equalities between groups and estimation of group-specific distributions, we propose classes of restricted dependent Dirichlet process priors. These priors have full support in the space of stochastically ordered distributions, and can be used for collections of unknown mixture distributions to obtain a flexible class of mixture models. Theoretical properties are discussed, efficient methods are developed for posterior computation using Markov chain Monte Carlo, and the methods are illustrated using data from a study of DNA damage and repair.

Keywords: Dependent Dirichlet process, Hypothesis testing, Mixture model, Nonparametric Bayes, Order restriction

1. Introduction

Our focus is inference on K group-specific distributions. For example, in toxicology studies, the groups may correspond to different doses of a potentially adverse chemical exposure. In such settings, it is of interest to assess whether or not the response distribution changes across groups, while also estimating the group-specific distributions. Although parametric assumptions may be difficult to justify, it is common to have prior knowledge that the magnitude of a response for a particular experimental unit would not decrease if that unit had been exposed to a higher dose. This implies stochastic ordering in the response distributions.

Focusing on the two-sample case, Arjas & Gasbarra (1996) proposed nonparametric Bayes methods for ordered hazard functions, while Gelfand & Kottas (2001) induced priors on stochastically ordered distributions through products of independent Dirichlet process (Ferguson, 1973, 1974) components. More recently, Hoff (2003b) developed general methods for estimating probability measures constrained to belong to a convex set, considering applications to mean, mode, quantile and stochastic ordering constraints. The problem of Bayesian estimation of probability measures subject to a partial stochastic order was further considered by Hoff (2003a), relying on the theory in Hoff (2003b) to develop latent variable and rejection sampling methods for posterior computation.

It can be difficult to implement these methods routinely, particularly for moderate to large K. In addition, interest often focuses not only on estimation of the group-specific distributions but also on testing hypotheses of equalities between groups against stochastically ordered alternatives. There has been surprisingly little work on nonparametric Bayes hypothesis testing and model selection: Gutierrez-Pena & Walker (2005) imbed parametric model selection problems within a Bayesian nonparametric framework; Berger & Guglielmi (2001) test the fit of a parametric model through comparison to a nonparametric alternative; Dass & Lee (2004) study consistency of Bayes factors for testing point nulls versus nonparametric alternatives; and Basu & Chib (2003) develop methods for calculating Bayes factors for comparing Dirichlet process mixture models.

There has been recent interest in Bayesian methods for unconstrained collections of probability measures. Much of this work relies on extending the Dirichlet process. Suppose P is an unknown probability measure on (X,B(X)), with X a Borel subset of Euclidean space and B(X) the Borel σ-algebra of subsets of X. Then P ~ DP(αP0) denotes that P is assigned a Dirichlet process prior with precision α and base measure P0. Sethuraman (1994) showed that this is equivalent to the stick-breaking representation:

$$P(\cdot)=\sum_{h=1}^{\infty}\pi_h\,\delta_{\Theta_h}(\cdot),\qquad \pi_h=V_h\prod_{l<h}(1-V_l), \qquad (1)$$

where Vh ~ Be(1, α) and Θh ~ P0, independently for h = 1, …, ∞. Here, δΘ is a probability measure concentrated at the atom Θ, V = {Vh, h = 1, …, ∞} is an infinite sequence of stick-breaking weights, and Θ = {Θh, h = 1, …, ∞} is an infinite sequence of atoms, with V and Θ mutually independent.
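As a concrete illustration, here is a minimal Python sketch, not part of the original paper, that draws an approximate realization of P ~ DP(αP0) by truncating the stick-breaking construction (1) at N atoms; the choice of a standard normal P0 and all function names are ours.

```python
import numpy as np

def stick_breaking_dp(alpha, base_sampler, N=200, rng=None):
    """Truncated draw from DP(alpha * P0) via Sethuraman's construction (1)."""
    rng = np.random.default_rng(rng)
    V = rng.beta(1.0, alpha, size=N)   # V_h ~ Be(1, alpha)
    V[-1] = 1.0                        # truncate so the weights sum to one
    pi = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))  # pi_h = V_h prod_{l<h}(1 - V_l)
    theta = base_sampler(N, rng)       # Theta_h ~ P0
    return pi, theta

# Draw a realization with a standard normal base measure and sample from it.
pi, theta = stick_breaking_dp(1.0, lambda n, rng: rng.normal(size=n), rng=0)
x = np.random.default_rng(1).choice(theta, size=10, p=pi)
```

Because the weights πh decay geometrically in expectation, a moderate truncation level captures almost all of the mass when α is small.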

In order to generalize the Dirichlet process to place a prior on a collection of probability measures (P1, …, PK), MacEachern (1999, 2001) proposed a dependent Dirichlet process, which represents each Pk as in (1), but incorporates dependence through dependent stick-breaking weights and/or dependent atoms. De Iorio et al. (2004) used the dependent Dirichlet process to create an analysis-of-variance-like dependence structure for random measures, assuming fixed stick-breaking weights but dependent atoms. Gelfand et al. (2005) developed a related approach for spatially-dependent probability measures. More flexible methods that also allow the stick-breaking weights to vary, with some computational expense, have been proposed by Griffin & Steel (2006) and Duan et al. (2007). Alternative strategies for modelling dependent collections of random measures through convex combinations of independent Dirichlet processes have been proposed by Müller et al. (2004), Pennell & Dunson (2006) and Dunson et al. (2007).

These methods are appealing in allowing flexible borrowing of information across groups. Our contribution is the development of a framework for testing of equalities in distributions between groups against stochastically ordered alternatives, while also allowing estimation of smooth, stochastically ordered densities. Hoff (2003a) assumes the distributions are discrete, noting that Dirichlet process mixtures of parametric models can be used to address this problem, but without developing an approach for inference in such a case.

2. Stochastically ordered random probability measures

2·1. Formulation and background

Let $(P_1,\ldots,P_K)\in\mathcal{P}^K$, with $\mathcal{P}^K$ the set of $K\times 1$ collections of probability measures on $(\mathcal{X},\mathcal{B}(\mathcal{X}))$. In addition, define the following convex subset of $\mathcal{P}^K$:

$$\mathcal{C}_E=\{(P_1,\ldots,P_K)\in\mathcal{P}^K : P_i\preceq P_j \text{ for all } (i,j)\in E\},$$

where $E\subset\{1,\ldots,K\}^2$ is a prespecified set of ordered pairs defining a partial ordering. Here, $P_i\preceq P_j$ if and only if $P_i(x,\infty)\leq P_j(x,\infty)$ for all $x$, so that $P_j$ is stochastically larger than $P_i$. The collections of probability measures belonging to $\mathcal{C}_E$ satisfy the partial ordering defined by $E$.

As shown by Hoff (2003a,b), $\mathcal{C}_E$ is a weakly closed convex set with extreme points $\{(\delta_{s_1},\ldots,\delta_{s_K}): s\in\mathcal{S}_E\}$, where $\mathcal{S}_E=\{(s_1,\ldots,s_K)\in\mathcal{X}^K: s_i\leq s_j \text{ for all } (i,j)\in E\}$. Using a corollary to Choquet's theorem, Hoff (2003a,b) shows that every $(P_1,\ldots,P_K)\in\mathcal{C}_E$ can be represented as a mixture over the extreme points. Hence, by placing an unconstrained prior on the mixing measure, one can obtain a prior for collections of stochastically ordered random measures. Let $x=(x_1,\ldots,x_n)'$ denote the observed data, with $x_i\sim P_{a_i}$, where $a_i\in\{1,\ldots,K\}$ is a group index. Then Hoff (2003a) induces a prior on $(P_1,\ldots,P_K)$ with weak support on $\mathcal{C}_E$ by letting $x_i=s_{i,a_i}$, with $s_i\sim Q$, where $Q\sim DP(\alpha Q_0)$ and $Q_0$ is a $K$-variate Borel probability measure on $\mathcal{S}_E$.

2·2. Restricted dependent Dirichlet process

As a direct consequence of applying the Sethuraman (1994) formulation to Q, we obtain the following specification for (P1, …, PK):

$$P_k(\cdot)=\sum_{h=1}^{\infty}\pi_h\,\delta_{\Theta_{hk}}(\cdot),\qquad \pi_h=V_h\prod_{l<h}(1-V_l),\qquad k=1,\ldots,K, \qquad (2)$$

where $V_h\sim\text{Be}(1,\alpha)$ and $\Theta_h=(\Theta_{h1},\ldots,\Theta_{hK})'\sim Q_0$, for $h=1,\ldots,\infty$, with $V=\{V_h\}_{h=1}^{\infty}$ and $\Theta=\{\Theta_h\}_{h=1}^{\infty}$ mutually independent sequences of stick-breaking random variables and atoms, respectively. We refer to (2) as a restricted dependent Dirichlet process, since the prior implies a dependent Dirichlet process for the collection $(P_1,\ldots,P_K)$, with restrictions incorporated through the atoms.

Although the prior on P = (P1, …, PK) is equivalent to that induced by the Hoff (2003a) latent variable specification, so that his theoretical results apply, the reparameterization in (2) allows one to obtain additional insight into properties and to develop methods for hypothesis testing and posterior computation, as will be clear in §3 and 4.

In considering choice of $Q_0$ in (2), it is important to keep in mind that, unless $Q_0$ is chosen to assign probability mass to boundaries of $\mathcal{S}_E$, $\text{pr}(\Theta_{hi}<\Theta_{hj},\ h=1,\ldots,\infty,\ \text{for all }(i,j)\in E)=1$. The strict order restriction on the atoms for the different groups can lead to a tendency to overestimate group differences, particularly when the true difference is small, sample sizes are small to moderate, and the number of groups is moderate to large. Similar issues arise in much simpler settings involving estimation of normal means or regression parameters subject to order restrictions (Dunson & Neelon, 2003). By choosing $Q_0$ to allow probability mass on the boundary of $\mathcal{S}_E$, we allow atoms to be identical in the different groups with positive probability.

In considering choice of $Q_0$ and the impact on borrowing of information across groups, we focus on the two-group case in which $E=\{(1,2)\}$, so that $P_1\preceq P_2$. Data consist of $x_i\sim P_{a_i}$ independently, where $a_i=1$ for $i=1,\ldots,n_1$ and $a_i=2$ for $i=n_1+1,\ldots,n$. We specify $Q_0$ as

$$f(\Theta_1,\Theta_2)=f_1(\Theta_1)\{\pi_0\,\delta_0(\Theta_2-\Theta_1)+(1-\pi_0)\,f_2(\Theta_2-\Theta_1)\}, \qquad (3)$$

where $\mathcal{X}=\mathbb{R}$, $f_1(\cdot)$ is a density on $\mathbb{R}$, such as Gaussian, $0\leq\pi_0\leq 1$ is the prior probability of $\Theta_1=\Theta_2$, and $f_2(\cdot)$ is a density with support on $\mathbb{R}^+$, such as a truncated Gaussian.
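To make the two-group construction concrete, the following sketch, with our own variable names and a half-normal choice for f2, draws shared stick-breaking weights as in (2) and atom pairs from (3), so that each Group-2 atom equals its Group-1 partner with probability π0 and otherwise exceeds it, giving P1 ≼ P2 by construction.

```python
import numpy as np

def rddp_two_groups(alpha, pi0, N=200, rng=None):
    """Truncated draw of (P1, P2) from the restricted DDP (2) with base measure (3)."""
    rng = np.random.default_rng(rng)
    V = rng.beta(1.0, alpha, size=N); V[-1] = 1.0
    pi = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))   # shared weights
    theta1 = rng.normal(size=N)                                  # f1: Gaussian atoms
    tie = rng.random(N) < pi0                                    # identical atom with prob pi0
    beta = np.where(tie, 0.0, np.abs(rng.normal(size=N)))        # f2: half-normal increment
    return pi, theta1, theta1 + beta                             # (weights, Group-1, Group-2 atoms)

pi, th1, th2 = rddp_two_groups(alpha=1.0, pi0=0.792, rng=1)
```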

It is clear from (3) that borrowing of information between Groups 1 and 2 is controlled by the probability π0 placed on identical atoms, and by the choice of f2, which has an impact on the magnitude of the differences in the subset of paired atoms that are not identical. In order to assess how borrowing of information occurs in prediction, after updating the prior with data from the two groups, we calculate the conditional predictive distribution of xn+1 for a new subject in Group 2.

By a generalization of the Blackwell & MacQueen (1973) Pólya urn scheme, the conditional distribution of $x_{n+1}$, given $a_{n+1}=2$ and the group-specific data vectors $x^{(1)}=(x_1,\ldots,x_{n_1})'$ and $x^{(2)}=(x_{n_1+1},\ldots,x_n)'$, is

$$(x_{n+1}\mid x^{(2)},x^{(1)})\sim\Big(\frac{\alpha}{\alpha+n}\Big)f_2^*(\cdot)+\sum_{h=1}^{n_1}\Big\{\Big(\frac{\pi_0}{\alpha+n}\Big)\delta_{x_h}(\cdot)+\Big(\frac{1-\pi_0}{\alpha+n}\Big)f_2(\cdot\,;x_h)\Big\}+\sum_{h=n_1+1}^{n}\Big(\frac{1}{\alpha+n}\Big)\delta_{x_h}(\cdot), \qquad (4)$$

where $f_2^*=f_1*f_2$ is the convolution of $f_1$ and $f_2$, and $f_2(\cdot\,;x_h)=\delta_{x_h}*f_2$. Hence, a new subject in Group 2 has probability $(n-n_1)/(\alpha+n)$ of being assigned the same $x$ value as a Group-2 subject in the dataset, and probability $\pi_0 n_1/(\alpha+n)$ of being assigned the same value as a Group-1 subject. The probability of being assigned a new value, $x_{n+1}\notin\{x_1,\ldots,x_n\}$, is $\{\alpha+n_1(1-\pi_0)\}/(\alpha+n)$.
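As a numeric check of (4), the snippet below, using assumed values for α, π0 and the group sizes, evaluates the three allocation probabilities for a new Group-2 subject and confirms that they sum to one.

```python
alpha, pi0, n1, n2 = 1.0, 0.792, 10, 10   # assumed values for illustration
n = n1 + n2
p_tie2 = (n - n1) / (alpha + n)                   # same value as a Group-2 subject
p_tie1 = pi0 * n1 / (alpha + n)                   # same value as a Group-1 subject
p_new = (alpha + n1 * (1.0 - pi0)) / (alpha + n)  # a new value
assert abs(p_tie2 + p_tie1 + p_new - 1.0) < 1e-12
print(p_tie2, p_tie1, p_new)   # approximately 0.476, 0.377, 0.147
```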

2·3. Restricted dependent Dirichlet process mixtures

Expression (2) implicitly assumes almost sure discreteness of each Pk, so does not allow modelling of continuous distributions subject to a stochastic order constraint. As a broader class of restricted dependent Dirichlet process mixture models, we propose

$$x_i\sim\mathcal{K}_{\sigma_i}(\cdot,\mu_i),\quad (\mu_i,\sigma_i)\sim P_{a_i},\quad i=1,\ldots,n,\qquad P_k=\sum_{h=1}^{\infty}\pi_h\,\delta_{\Psi_{hk}},\quad k=1,\ldots,K, \qquad (5)$$

where $\mathcal{K}_\sigma:\mathcal{D}\times\mathcal{X}\to\mathbb{R}^+$ is a kernel that depends on the scale parameter $\sigma$, $\mathcal{D}$ is the domain of the data, $\Psi_{hk}=(\Theta_{hk},\Gamma_h)'$ with $(\Theta_h,\Gamma_h)\sim Q_0\otimes R_0$, $\{\pi_h\}_{h=1}^{\infty}$, $\{\Theta_h\}_{h=1}^{\infty}$ and $Q_0$ are as defined in (2), $\Gamma_h\in\mathbb{R}^+$, $R_0$ is a base probability measure on $(\mathbb{R}^+,\mathcal{B}(\mathbb{R}^+))$, and $Q_0\otimes R_0$ denotes the product measure.

Letting $g_k(x)=\int\mathcal{K}_\sigma(x,\mu)\,P_k(d\mu,d\sigma)$, for $k=1,\ldots,K$ and all $x\in\mathcal{D}$, the prior for $(P_1,\ldots,P_K)$ in (5) induces a prior for the collection $g=\{g_1,\ldots,g_K\}\in\mathcal{L}_E$. Suppose that, for all $\sigma\in\mathbb{R}^+$ and $\mu\in\mathcal{X}$, $\mathcal{K}_\sigma(\cdot,\mu)$ is a nondegenerate density on $\mathcal{D}$ satisfying the monotone stochastic ordering condition

$$\int_{\mathcal{D}(z)}\mathcal{K}_\sigma(x,\mu)\,dx\;\geq\;\int_{\mathcal{D}(z)}\mathcal{K}_\sigma(x,\mu+\Delta)\,dx,\quad\text{for all }\Delta>0,\;z\in\mathcal{D},$$

with $\mathcal{D}(z)$ the subset of $\mathcal{D}$ excluding values greater than $z$. Then $\mathcal{L}_E$ is a set of densities satisfying the partial stochastic ordering $E$, from the following result:

LEMMA 1. For all $(g_1,\ldots,g_K)\in\mathcal{L}_E$,

$$\int_{\mathcal{D}(z)}g_k(x)\,dx\;\geq\;\int_{\mathcal{D}(z)}g_l(x)\,dx,\quad\text{for all }z\in\mathcal{D}\text{ and }(k,l)\in E.$$

Lemma 1 follows from the monotone stochastic ordering condition after noting that

$$\int_{\mathcal{D}(z)}g_k(x)\,dx=\sum_{h=1}^{\infty}\pi_h\int_{\mathcal{D}(z)}\mathcal{K}_{\sigma_h}(x,\mu_{hk})\,dx,\qquad k=1,\ldots,K,$$

and $\mu_{hk}\leq\mu_{hl}$ for all $(k,l)\in E$.

Note that $\mathcal{L}_E$ is a convex set with extreme points

$$[\{\mathcal{K}_\sigma(\cdot,\mu_1),\ldots,\mathcal{K}_\sigma(\cdot,\mu_K)\}:\sigma\in\mathbb{R}^+,\;\mu\in\mathcal{S}_E],$$

assuming that $\mathcal{K}_\sigma(\cdot,\mu)$ cannot be expressed as a convex combination of kernels of the same form but with different $\mu,\sigma$. Every $g\in\mathcal{L}_E$ can be represented as a mixture over the extreme points, and our prior in (5) corresponds to placing a Dirichlet process prior on the mixing measure. Consider the special case in which $\mathcal{K}_\sigma(x,\mu)=\mathcal{K}\{(x-\mu)/\sigma\}/\sigma$, with $\mathcal{K}(\cdot)$ an arbitrary nondegenerate density on $\mathbb{R}$, such as the standard normal. In this case, $g_k(x)=\int\mathcal{K}\{(x-\mu)/\sigma\}/\sigma\,P_k(d\mu,d\sigma)$, with $P_k$ having a marginal Dirichlet process prior on $(\mathbb{R}\times\mathbb{R}^+,\mathcal{B}(\mathbb{R})\otimes\mathcal{B}(\mathbb{R}^+))$. From Lo (1984, expression 3.1) it follows that the closure of the support of the prior for $g_k$ contains all densities on $\mathbb{R}$ with respect to Lebesgue measure.

Note that expression (5) induces a Dirichlet process location-scale mixture on each of the densities $g_k$. A convenient special case corresponds to location-scale mixtures of normal densities. However, it is well known that one can approximate any density on $\mathbb{R}$ with respect to Lebesgue measure through use of a Dirichlet process location mixture of normals having a single unknown variance (Ghosal et al., 1999; Lijoi et al., 2005). Hence, for simplicity in applications, we focus the remainder of the article on the location mixture

$$x_i\sim\mathcal{K}_\sigma(\cdot,\mu_i),\qquad \mu_i\sim P_{a_i},\qquad i=1,\ldots,n, \qquad (6)$$

with the prior for (P1, …, PK) specified as in (2) and with σ an unknown scale.
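The ordering asserted by Lemma 1 is easy to verify numerically for the location mixture (6). The sketch below, with illustrative weights and atoms rather than draws from the prior, evaluates the two mixture distribution functions under a normal kernel and checks that G2(z) ≤ G1(z) for all z.

```python
import numpy as np
from scipy.stats import norm

pi = np.array([0.5, 0.3, 0.2])          # shared stick-breaking weights (illustrative)
mu1 = np.array([-1.0, 0.0, 1.0])        # Group-1 atoms
mu2 = mu1 + np.array([0.0, 0.5, 1.2])   # Group-2 atoms: mu2 >= mu1 elementwise
sigma = 0.5
z = np.linspace(-4.0, 5.0, 200)
G1 = pi @ norm.cdf((z - mu1[:, None]) / sigma)   # G1(z) = sum_h pi_h Phi{(z - mu_h1)/sigma}
G2 = pi @ norm.cdf((z - mu2[:, None]) / sigma)
assert np.all(G2 <= G1 + 1e-12)                  # stochastic ordering of the mixtures
```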

3. Properties and hypothesis testing

3·1. Two-group case

One of our primary goals is to develop an approach for hypothesis testing of equalities in distributions between groups against stochastically ordered alternatives. We focus on the model defined in expressions (2) and (6). Under this model, differences between groups in (g1, …, gK) are controlled through differences in the mixture distributions. Hence, we base inferences about differences among (g1, …, gK) on tests of differences among (P1, …, PK).

In developing the methods, we initially consider the two-group case in which $P_1\preceq P_2$. Instead of requiring the distributions to be exactly identical in the two groups, interval null hypotheses are formulated by bounding a distance metric. In particular, we focus on the total variation distance,

$$d_{12}=\max_{B\in\mathcal{B}(\mathcal{X})}\,|P_2(B)-P_1(B)|. \qquad (7)$$

LEMMA 2. Under the prior defined in (2) and (3),

$$d_{12}=\sum_{h=1}^{\infty}\pi_h\,1(\beta_h>0)\sim\text{Be}\{\alpha(1-\pi_0),\,\alpha\pi_0\},$$ where $\beta_h=\Theta_{h2}-\Theta_{h1}$.

The proof is in the Appendix. Lemma 2 implies that the distance between P1 and P2 stochastically increases with 1−π0. In addition, E(d12) = 1−π0, with α controlling uncertainty in this prior expectation.
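Lemma 2 can be checked by simulation. The sketch below, truncating the stick-breaking sum at N = 500 atoms as an approximation, draws d12 repeatedly under prior (3) and compares its mean with E(d12) = 1 − π0.

```python
import numpy as np

def draw_d12(alpha, pi0, N=500, rng=None):
    """One prior draw of d12 = sum_h pi_h 1(beta_h > 0) under (2)-(3), truncated at N."""
    rng = np.random.default_rng(rng)
    V = rng.beta(1.0, alpha, size=N); V[-1] = 1.0
    pi = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    nonzero = rng.random(N) >= pi0        # beta_h > 0 with probability 1 - pi0
    return pi[nonzero].sum()

alpha, pi0 = 1.0, 0.792
draws = np.array([draw_d12(alpha, pi0, rng=s) for s in range(2000)])
print(draws.mean())   # should be near E(d12) = 1 - pi0 = 0.208
```

The empirical histogram of such draws should match the Be{α(1 − π0), απ0} density.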

Using the distance metric in (7), we formalize the null and alternative hypotheses as

$$H_0: d_{12}\leq\epsilon,\qquad H_1: d_{12}>\epsilon, \qquad (8)$$

where $\epsilon$ is a small positive constant. In the limit as $\pi_0\to 0$, $d_{12}\to\delta_1$ in distribution, so that $\text{pr}(d_{12}<\epsilon)\to 0$ for any $\epsilon<1$.

Theorem 1 provides justification for using hypotheses on the mixing measures as a basis for inferences on G1 and G2.

THEOREM 1. Specify $H_0$ and $H_1$ as in (8) and let $G_k(B)=\int_B g_k(x)\,dx$, $k=1,2$, with $g_k(x)=\int\mathcal{K}(x,s)\,dP_k(s)$. Then $H_0$ implies that

$$\max_{x\in\mathcal{X}}\,|G_2(x,\infty)-G_1(x,\infty)|<\epsilon.$$

3·2. Multiple-group case

It is straightforward to generalize the approach in §3.1 to the multiple-group setting. When there are $K$ groups subject to the partial stochastic ordering indexed by $E$, it is first necessary to specify $P_0$. For most orderings of interest in applications, one can express the $h$th atom in group $k$ as $\Theta_{hk}=w_k'\beta_h^*$, where $w_k$ is a fixed $K\times 1$ vector of 0s and 1s, and $\beta_h^*=(\beta_{h1}^*,\ldots,\beta_{hK}^*)'$ is a parameter vector having a subset of elements constrained to be nonnegative or nonpositive.

For example, when there is a simple stochastic ordering, $P_1\preceq P_2\preceq\cdots\preceq P_K$, we let $w_k=(\mathbf{1}_k',\mathbf{0}_{K-k}')'$, which implies that $\beta_{hk}^*=\Theta_{hk}-\Theta_{h,k-1}$, for $k=2,\ldots,K$. Hence, constraining $\beta_{hk}^*\geq 0$, for $k=2,\ldots,K$, induces a $P_0$ with the appropriate support. Generalizing prior (3), we let

$$f(\beta_h^*)=f_1(\beta_{h1}^*)\prod_{k=2}^{K}\{\pi_{0k}\,\delta_0(\beta_{hk}^*)+(1-\pi_{0k})\,f_k(\beta_{hk}^*)\}, \qquad (9)$$

where $\pi_{0k}=\text{pr}(\Theta_{hk}=\Theta_{h,k-1})$ and $f_k$ is a density with support on $\mathbb{R}^+$, for $k=2,\ldots,K$. To modify the prior to accommodate a tree stochastic order, i.e. $P_1\preceq P_k$, $k=2,\ldots,K$, we can use the same $P_0$ but with $w_k=(1,\mathbf{0}_{k-2}',1,\mathbf{0}_{K-k}')'$. Umbrella orderings, with the location of the peak known, and cases involving multiple factors, such as that discussed in §6, can also be accommodated by changing $W=(w_1,\ldots,w_K)'$.
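To illustrate, the sketch below, our own construction rather than code from the paper, builds the design matrix W for the simple and tree orderings just described and verifies the atom patterns implied by Θhk = wk'βh*.

```python
import numpy as np

def W_simple(K):
    """Simple order P1 <= ... <= PK: w_k = (1_k', 0_{K-k}')'."""
    return np.tril(np.ones((K, K)))

def W_tree(K):
    """Tree order P1 <= Pk, k = 2,...,K: w_k = (1, 0_{k-2}', 1, 0_{K-k}')'."""
    W = np.zeros((K, K))
    W[:, 0] = 1.0
    for k in range(1, K):
        W[k, k] = 1.0
    return W

beta_h = np.array([0.0, 0.3, 0.0, 0.5])   # beta_h*, with beta_hk >= 0 for k >= 2
print(W_simple(4) @ beta_h)   # [0.0, 0.3, 0.3, 0.8]: nondecreasing atoms
print(W_tree(4) @ beta_h)     # [0.0, 0.3, 0.0, 0.5]: Group 1 stochastically smallest
```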

For ease in exposition, we focus our discussion of multiple group hypothesis testing on the simple ordering case, though modifications to other cases are automatic. Generalizing the hypotheses in (8), we let

$$H_{0k}: d_{k,k+1}\leq\epsilon,\qquad H_{1k}: d_{k,k+1}>\epsilon, \qquad (10)$$

for $k=1,\ldots,K-1$, with $d_{k,k+1}=\max_{x\in\mathcal{X}}|P_{k+1}(x,\infty)-P_k(x,\infty)|$. We refer to $H_{0k}$ as the local null hypothesis of near-equivalence of Groups $k$ and $k+1$. One can also consider the global null hypothesis $H_0: d_M\leq\epsilon$, where

$$d_M=\max_{(j,l)\in\{1,\ldots,K\}^2}\;\max_{B\in\mathcal{B}(\mathcal{X})}\,|P_l(B)-P_j(B)|.$$

Hence, $H_0$ implies near-equivalence of all $K$ groups. It is straightforward to show that $d_M\sim\text{Be}\{\alpha(1-\Pi_0),\,\alpha\Pi_0\}$, with $\Pi_0=\prod_{k=1}^{K-1}\pi_{0k}$. Using the Gibbs sampler of §4, we can calculate local and global hypothesis probabilities from a single chain, while also obtaining model-averaged group-specific density estimates.
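As a numeric illustration of the induced prior on the global distance, the snippet below, with assumed values α = 1, K = 5 and π0k = 0.792 for every k, evaluates the prior probability of the global null directly from the stated beta distribution.

```python
from scipy.stats import beta

alpha, eps, K, pi0k = 1.0, 0.05, 5, 0.792   # assumed values
Pi0 = pi0k ** (K - 1)                       # Pi0 = prod_{k=1}^{K-1} pi_0k
prior_global_null = beta.cdf(eps, alpha * (1.0 - Pi0), alpha * Pi0)
print(prior_global_null)   # pr(d_M <= eps) a priori; smaller than each local pr(H_0k)
```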

4. Posterior computation

4·1. Model and background

In describing algorithms for posterior computation, we focus on the following case:

$$x_i\sim N(w_{a_i}'\beta_i,\,\tau^{-1}),\qquad \beta_i\sim P=\sum_{h=1}^{\infty}V_h\prod_{l<h}(1-V_l)\,\delta_{\beta_h^*},\qquad V_h\sim\text{Be}(1,\alpha),\quad \beta_h^*\sim P_0, \qquad (11)$$

where $a_i=k$, for $i\in\{n_{k-1}+1,\ldots,n_{k-1}+n_k\}$, $k=1,\ldots,K$, with $n_0=0$ and $n_k$ the number of subjects in group $k$, $w_k$ chosen as discussed in §3.2, and $\beta_i=(\beta_{i1},\ldots,\beta_{iK})'$. Note that this expression is a special case of (5) with $\mathcal{K}$ corresponding to a normal kernel and with $\mu_i=w_{a_i}'\beta_i$. Assuming prior (9), we let $f_1(\beta_{h1}^*)=N(\beta_{h1}^*;\mu_0,\sigma_0^2)$ and $f_k(\beta_{hk}^*)=N^+(\beta_{hk}^*;0,\kappa^{-1})$, for $k=2,\ldots,K$, with $N^+(\cdot)$ denoting the normal density truncated to be positive.

As a result of the reparameterization used in expression (11), we effectively have a typical Dirichlet process mixture of normal linear regression models, with a constrained mixture structure used in the base measure, P0. Hence, we can use standard Markov chain Monte Carlo algorithms for posterior computation in Dirichlet process mixture models. Since the structure of the base measure creates some difficulties in implementing Pólya urn-based algorithms, we focus on a blocked Gibbs sampler (Ishwaran & James, 2001). This algorithm is based on updating the random weights and atoms in a truncation approximation to the infinite stick-breaking representation.

In particular, let $P=\sum_{h=1}^{N}V_h\prod_{l<h}(1-V_l)\,\delta_{\beta_h^*}$, with the components defined as in (11) but with $V_N=1$, so that terms $N+1,\ldots,\infty$ can be excluded. Following the approach of Ishwaran & James (2001), one can show that this approximation tends to be accurate for moderate $N$, such as $N=20$, particularly if $\alpha\leq 1$. One can assess whether or not $N$ is sufficiently large by monitoring the maximum index of an occupied cluster, $N_O=\max_i\zeta_i$, with $\zeta_i=h$ if $\beta_i=\beta_h^*$. If the posterior probability assigned to values of $N_O$ close to $N$ is not close to zero, $N$ should be increased.
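A small sketch of this truncation diagnostic, using placeholder allocation draws in place of real Gibbs output:

```python
import numpy as np

N = 20                                     # truncation level
# Placeholder posterior draws of (zeta_1, ..., zeta_n) over 2000 iterations.
zeta = np.random.default_rng(0).integers(1, 8, size=(2000, 100))
N_O = zeta.max(axis=1)                     # maximum occupied cluster index per iteration
print(np.mean(N_O >= N - 2))               # should be near zero; otherwise increase N
```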

4·2. Prior specification and posterior inference

We recommend a default prior specification. To avoid sensitivity to the measurement scale, standardize the data by subtracting the Group-1 mean and dividing by the Group-1 standard deviation. Then $\mu_0=0$ and $\sigma_0^2=1$ are chosen for the mean and variance of the Group-1 atoms. To assign high probability to a wide range of mild to moderate shifts in the response density, let $\kappa\sim\text{Ga}(1/2,1/2)$ to induce a Cauchy prior on $\beta_{hk}^*$, for $k>1$. The Cauchy is often used as a robust default in parametric model selection (Jeffreys, 1961). A diffuse prior can be chosen for the error precision $\tau$, which is common to all the models.

In addition, we choose a $\text{Ga}(a_\alpha,b_\alpha)$ prior for $\alpha$, with the gamma distribution parameterized to have mean $a_\alpha/b_\alpha$ and variance $a_\alpha/b_\alpha^2$. There is substantial information in the data about $\alpha$, so a diffuse prior could be used, but we recommend letting $a_\alpha=1$ and $b_\alpha=1$ to favour a small-to-moderate number of clusters. Letting $\epsilon=0.05$ and fixing $\alpha=1$, one can assign $\text{pr}(H_{0k})=\text{pr}(H_{1k})=1/2$ by choosing $\pi_{0k}=0.792$, the solution to the equation $0.5=\int_0^{0.05}\text{Be}(\pi;1-\pi_{0k},\pi_{0k})\,d\pi$. Letting $a_\pi+b_\pi=1$ to correspond to a unit information prior, we obtain a hyperprior for $\pi_{0k}$ with mean 0.792 by letting $a_\pi=0.792$, $b_\pi=0.208$. In the two-group case, the induced prior on $d_{12}$ is U-shaped, having a mean of 0.208 and high density near 0 and 1.
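The default value π0k = 0.792 can be reproduced numerically; the following snippet, a sketch using scipy, solves the calibration equation stated above.

```python
from scipy.optimize import brentq
from scipy.stats import beta

eps = 0.05
# Solve 0.5 = Be-cdf(eps; 1 - pi0, pi0), i.e. pr(d12 <= eps) = 1/2 with alpha = 1.
f = lambda pi0: beta.cdf(eps, 1.0 - pi0, pi0) - 0.5
pi0 = brentq(f, 1e-6, 1.0 - 1e-6)
print(pi0)   # approximately 0.792
```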

With the prior specification complete, the Gibbs sampler described in the Appendix can be run to obtain draws from the posterior distribution. These draws can be used to obtain group-specific density estimates and posterior hypothesis probabilities. The posterior distribution of the distance between any two groups can also be obtained.

5. Simulation examples

We considered two simulation cases. In both cases, K = 2 and data in Group 1 were simulated from the following mixture of three normals:

$$f(y)=0.2\,N(y;-2.5,\tau^{-1})+0.7\,N(y;0,\tau^{-1})+0.1\,N(y;1.5,\tau^{-1}),$$

with τ = 3. In Case 1, data in Group 2 were simulated from this same density, while in Case 2 we increased the component-specific means slightly from (−2.5, 0, 1.5) to (−2.4, 0.4, 2.2), resulting in a distance of d12 = 1, since each of the component means differed between the groups.

For each case, we simulated 100 datasets under three sample sizes, n0 = 10, 25 and 100, with n0 denoting the sample size per group. Each dataset was analyzed using the Gibbs sampler of §4, with 2000 iterations collected after a 500-iteration burn-in. Apparent convergence was rapid and mixing was good. The set of simulations in the n0 = 100 case ran in approximately 10 hours using Matlab on a Mac PowerBook G4 laptop.
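For reference, one dataset per group under this design can be generated as follows; the function name and seeding are our own.

```python
import numpy as np

def simulate(n0, case=1, tau=3.0, rng=None):
    """Draw n0 observations per group from the stated three-component mixtures."""
    rng = np.random.default_rng(rng)
    w = [0.2, 0.7, 0.1]
    mu1 = np.array([-2.5, 0.0, 1.5])                          # Group-1 component means
    mu2 = mu1 if case == 1 else np.array([-2.4, 0.4, 2.2])    # shifted in Case 2
    draw = lambda mu: mu[rng.choice(3, size=n0, p=w)] + rng.normal(0.0, tau ** -0.5, n0)
    return draw(mu1), draw(mu2)

x1, x2 = simulate(100, case=2, rng=0)
```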

Figure 1 plots the results in Case 1, showing the group-specific Bayesian density estimates for each of the 100 simulated datasets in the three sample sizes under consideration, along with a histogram of the d12 distance measure. For a small sample size of 10/group, the estimates were surprisingly good on average, though the primary or secondary mode was substantially overestimated in a small proportion of simulations. In addition, in many of the simulations, the density estimate exhibited shrinkage towards the normal base measure, with the primary peak and dip in the left-hand tail underestimated. The performance improved dramatically for n0 = 25, with most of the density estimates close to the truth. The n0 = 100 estimates were all very close to the truth. The distribution of the posterior mean of d12 across the simulations was increasingly concentrated at the true value of 0 as the sample size increased. Figure 2 plots the results in Case 2. The performance was similar to that described in case 1, but with the distribution of d12 concentrated increasingly away from 0 as the sample size increased.

Figure 1:

Results from simulation study Case 1. The rows correspond to the sample sizes of 10, 25 and 100 per group. The first two columns show the posterior mean density estimates for Groups 1 and 2 from each simulation in dotted curves, with the true densities as solid curves. The third column shows a histogram of the posterior means of the distance d12 between the groups, with vertical lines for the prior mean.

Figure 2:

Results from simulation study Case 2. The rows correspond to the sample sizes of 10, 25 and 100 per group. The first two columns show the posterior mean density estimates for Groups 1 and 2 from each simulation in dotted curves, with the true densities as solid curves. The third column shows a histogram of the posterior means of the distance d12 between groups, with vertical lines for the prior mean.

Summary statistics of d12 across the simulations are shown in Table 1. The posterior probability of H0 increased as the sample size increased under Case 1, and decreased as the sample size increased under Case 2. For the two small sample sizes, there was not strong evidence in favour of either hypothesis. This is as expected, given that we have purposely chosen a simulation case in which the difference between the groups is subtle. Repeating the simulation in a case with a larger difference, results not shown, showed that we can obtain strong evidence in favour of H1 even in sample sizes of n0 = 10. Table 1 illustrates that the posterior hypothesis probabilities are somewhat robust to ϵ, with changes of less than 0.10 even with ϵ varying from 0.01 to 0.1. To assess sensitivity of hypothesis testing to the hyperprior on π0, we repeated simulation Cases 1 and 2 for sample size n0 = 100 with (i) π0 fixed at 0.5 and (ii) aπ = bπ = 1. In each case, density estimation performance was the same as that shown in Figs 1 and 2. However, when π0 = 0.5, there was a tendency to overestimate d12, leading to substantially lower posterior probabilities of H0 in Case 1 than those shown in Table 1. This is as expected given that fixing π0 does not allow borrowing of information across the atoms in the restricted dependent Dirichlet process specification. Although prior (ii) was also centred on π0 = 0.5, there was less of a tendency to overestimate d12 under H0.

Table 1.

Simulation study. Summary statistics of d12 across the simulations for each sample size. Results shown are means with 95% empirical confidence limits in parentheses.

Case  Common sample size  d12                    pr(d12 < ϵ | data)
                                                 ϵ = 0.01            ϵ = 0.05            ϵ = 0.1
1     10                  0.140 (0.028, 0.614)   0.67 (0.17, 0.86)   0.72 (0.18, 0.91)   0.75 (0.22, 0.93)
1     25                  0.085 (0.011, 0.299)   0.72 (0.19, 0.89)   0.78 (0.22, 0.94)   0.81 (0.28, 0.96)
1     100                 0.044 (0.003, 0.215)   0.79 (0.28, 0.94)   0.85 (0.30, 0.99)   0.89 (0.42, 1.00)
2     10                  0.233 (0.029, 0.660)   0.55 (0.10, 0.82)   0.60 (0.10, 0.91)   0.63 (0.10, 0.93)
2     25                  0.278 (0.032, 0.765)   0.46 (0.02, 0.84)   0.51 (0.02, 0.87)   0.54 (0.06, 0.91)
2     100                 0.560 (0.091, 0.884)   0.13 (0.00, 0.74)   0.15 (0.00, 0.79)   0.18 (0.00, 0.83)

6. Genotoxicology application

6·1. Data structure and the scientific problem

We applied the approach to data from a study of DNA damage and repair. Batches of cells were exposed to 0, 5, 20, 50 or 100 micromoles H2O2 and DNA damage was then measured in individual cells after allowing a repair time of 0, 60 or 90 minutes. With i = 1, …, n indexing the cells under study, the measured response xi for cell i was the Olive tail moment, which is a surrogate of the frequency of DNA strand-breaks obtained using the comet assay.

The goal of the study is to assess the sensitivity of the comet assay for detecting damage induced by the known genotoxic agent H2O2, while also investigating how rapidly damage is repaired. Let ai ∈ {1, …, K} be a group index denoting the level of H2O2 and the repair time for cell i. The value of ai for each dose × repair-time combination is shown in Fig. 3. The total sample size is 1400, with 100 cells per group except for Groups 9 and 13, which had 50 cells each.

Figure 3:

Genotoxicity application. Directed graph illustrating order restriction. Arrows point towards stochastically larger groups. Posterior probabilities of H1k are shown.

Among cells with zero repair time, DNA damage should be nondecreasing with dose of H2O2. In addition, within a given dose level, DNA damage should be nonincreasing with repair time. Hence, we make the ordering assumption illustrated in Fig. 3 using a directed graph, with arrows pointing towards stochastically larger groups. We wish to assess whether or not DNA damage continues to increase at higher levels of H2O2 exposure and to investigate whether or not damage is significantly reduced across each increment of the repair time.

6·2. Analysis and results

Previous authors analyzed the data in Groups 1–5 using a dynamic mixture of Dirichlet processes (Dunson, 2006; Pennell & Dunson, 2006), which allows for dependence in the distributions within adjacent dose groups but does not enforce stochastic ordering restrictions. In addition, that model does not allow the incorporation of both dose of H2O2 and repair time, though extensions are possible.

We implemented the approach described in §4. The Gibbs sampler was run for 20,000 iterations after a 1000-iteration burn-in. As in the simulation study, the chain appeared to converge rapidly and mix efficiently based on standard diagnostics. In fact, 20,000 iterations was more than sufficient: we obtained essentially indistinguishable results based on a 500-iteration burn-in and a 2000-iteration collection interval, as used in the simulation studies.

Figure 4 plots group-specific Bayesian density estimates and 95% pointwise credible intervals. Although the Bayes estimates have been constrained to follow a stochastic ordering, there is no evidence of systematic deviations from frequentist kernel density estimates obtained using only the data in a given group. Figure 3 provides posterior probabilities of the local alternative hypotheses for different group comparisons. The results suggest highly significant increases in DNA damage between the 0, 5 and 20 micromole H2O2 dose groups given a repair time of 0 minutes, with the evidence of further increases at higher doses less clear. As expected, there is no evidence of a change in the distribution between Groups 1, 6 and 11, since there was no induced damage to be repaired. However, there were highly significant decreases in DNA damage in each of the exposed groups after a repair time of 60 minutes. Allowing an additional 30 minutes of repair did not significantly alter the distribution. These conclusions were the same for ϵ = 0.01 and ϵ = 0.10.

Figure 4:

Genotoxicity application. Estimated densities of the Olive tail moment in a subset of the H2O2 dose × repair-time groups. Solid curves are posterior mean density estimates and dashed curves provide 95% pointwise credible intervals.

These results are consistent with the raw data and the plots shown in Fig. 4, and are both scientifically reasonable and interesting. It is known that the type of damage induced by hydrogen peroxide can be repaired quickly by base excision repair mechanisms. The finding that there is no further improvement after 60 minutes suggests that it may be unnecessary to collect data for repair times exceeding an hour in future molecular epidemiology studies using the comet assay to identify genotypes predictive of DNA repair rates.

To assess sensitivity, we repeated analyses using a variety of alternative hyperprior settings. In particular, we tried the cases aπ + bπ = 5 instead of 1 to correspond to a more informative prior, aα = 1, bα = 1 to favour more clusters, κ = 1, and κ = 2. The estimated densities did not change noticeably across these analyses and the conclusions were robust.

7. Discussion

The proposed approach for posterior computation focused on order restrictions that can be expressed in terms of sign constraints on a K × 1 vector of regression coefficients specific to each Dirichlet process component. This encompasses simple orders, tree orders, umbrella orders with known changepoints, and certain multi-factor cases, such as that considered in §6. However, orders having more than K − 1 restrictions cannot be accommodated. For example, considering the order restriction illustrated graphically in Fig. 3, a different computational approach would be needed if the graph were modified to include additional arrows, e.g. from Group 6 to Group 7.

Although we have focused on a relatively simple setting, it is straightforward to imbed our prior for stochastically ordered mixture distributions within a larger hierarchical model. For example, one can incorporate a parametric adjustment for covariates within the kernel. In addition, one can allow for stochastically ordered latent variable distributions. Ongoing work focuses on the extension to incorporate continuous predictors, which can conceptually be accomplished by replacing the atoms with nondecreasing stochastic processes.

Acknowledgement

This research was supported by the Intramural Research Program of the U.S. National Institutes of Health, National Institute of Environmental Health Sciences. The authors thank an anonymous referee for important comments.

Appendix

Technical details

Proof of Lemma 2.

We first show that

$$d_{12}=\max_{B\in\mathcal{B}(\mathcal{X})}\Big\{\sum_{h=1}^{\infty}\pi_h\,1(\Theta_{h1}\in B,\;\Theta_{h2}\notin B)\Big\}=\sum_{h=1}^{\infty}\pi_h\,1(\beta_h>0).$$

This follows by first noting that the expression in braces is maximized when $B$ is chosen as the union of small neighbourhoods around the atoms $\{\Theta_{h1}\}_{h=1}^{\infty}$, with these neighbourhoods sufficiently small to exclude all elements of $\{\Theta_{h2}\}_{h=1}^{\infty}$ that do not belong to $\{\Theta_{h1}\}_{h=1}^{\infty}$. Then, under (3), it is clear that, for this choice of $B$, $1(\Theta_{h1}\in B,\,\Theta_{h2}\notin B)=1(\beta_h>0)$. The final part of Lemma 2 follows from noting that $G\sim DP(\alpha G_0)$ implies

$$G(B)=\sum_{h=1}^{\infty}\pi_h\,1(\Theta_h\in B)\sim\text{Be}[\alpha G_0(B),\,\alpha\{1-G_0(B)\}],\quad\text{for all }B\in\mathcal{B}(\mathcal{X}),$$

where $V_h\sim\text{Be}(1,\alpha)$, $\Theta_h\sim G_0$, $h=1,\ldots,\infty$. In addition, $1(\Theta_h\in B)\sim\text{Ber}\{G_0(B)\}$.

Proof of Theorem 1.

The conditions in Theorem 1 imply that

$$G_1(x,\infty)=\sum_{h=1}^{\infty}\pi_h\,\mathcal{K}^*(x,\Theta_{h1})\quad\text{and}\quad G_2(x,\infty)=\sum_{h=1}^{\infty}\pi_h\,\mathcal{K}^*(x,\Theta_{h1}+\beta_h),\qquad x\in\mathcal{X},$$

where $\mathcal{K}^*(x,\Theta)=\int_x^{\infty}\mathcal{K}(z,\Theta)\,dz$. Letting $\mathcal{H}_0=\{h:\beta_h=0,\;h=1,2,\ldots\}$ and $\bar{\mathcal{H}}_0=\{1,2,\ldots\}\setminus\mathcal{H}_0$, we have

$$G_2(x,\infty)-G_1(x,\infty)=\sum_{h\in\bar{\mathcal{H}}_0}\pi_h\{\mathcal{K}^*(x,\Theta_{h1}+\beta_h)-\mathcal{K}^*(x,\Theta_{h1})\}.$$

Since $\sum_{h\in\bar{\mathcal{H}}_0}\pi_h<\epsilon$ under $H_0$ and, for any fixed $x$, $\mathcal{K}^*(x,\Theta)\in[0,1]$ is a monotone increasing function of $\Theta$, Theorem 1 follows directly.

Gibbs sampling steps.

Let $\zeta_i=h$ denote that individual $i$ is sampled from component $h$, so that $\beta_i=\beta_h^*$, for $h=1,\ldots,N$. Then the blocked Gibbs sampler proceeds through the following steps.

Step 1. Update $\zeta_i$, for $i=1,\ldots,n$, by sampling from the multinomial conditional distribution with

$$\text{pr}(\zeta_i=h)=\frac{\pi_h\,N(x_i;w_{a_i}'\beta_h^*,\tau^{-1})}{\sum_{l=1}^{N}\pi_l\,N(x_i;w_{a_i}'\beta_l^*,\tau^{-1})},\qquad h=1,\ldots,N.$$

Step 2. Assuming $\tau\sim\text{Ga}(a_\tau,b_\tau)$, update $\tau$ by sampling from the full conditional distribution,

$$\text{Ga}\Big(a_\tau+\frac{n}{2},\;b_\tau+\frac{1}{2}\sum_{i=1}^{n}(x_i-w_{a_i}'\beta_i)^2\Big).$$

Step 3. Update Vh, for h = 1, …, N − 1, by sampling from the full conditional distribution,

$$\text{Be}\Big(1+\sum_{i=1}^{n}1(\zeta_i=h),\;\alpha+\sum_{i=1}^{n}1(\zeta_i>h)\Big).$$

Step 4. Update $\beta_h^*$, for $h=1,\ldots,N$, via the following Gibbs sub-steps:

  1. Update $\beta_{h1}^*$ from $N(E_{h1},V_{h1})$, where
     $$E_{h1}=V_{h1}\Big\{\sigma_0^{-2}\mu_0+\tau\sum_{i:\zeta_i=h}\Big(x_i-\sum_{k=2}^{K}w_{a_i,k}\beta_{hk}^*\Big)\Big\},\qquad V_{h1}=\Big\{\sigma_0^{-2}+\tau\sum_{i=1}^{n}1(\zeta_i=h)\Big\}^{-1}.$$
  2. Update $\beta_{hk}^*$, for $k=2,\ldots,K$, from the full conditional $\hat\pi_{hk}\,\delta_0+(1-\hat\pi_{hk})\,N^+(E_{hk},V_{hk})$, where the conditional probability of $\beta_{hk}^*=0$ is
     $$\hat\pi_{hk}=\pi_{0k}\Big[\pi_{0k}+(1-\pi_{0k})\,\frac{\surd(2\kappa)\int_0^{\infty}N(z;E_{hk},V_{hk})\,dz}{\surd\pi\,N(0;E_{hk},V_{hk})}\Big]^{-1},$$
     and the mean and variance in the normal component are, respectively,
     $$E_{hk}=V_{hk}\Big\{\tau\sum_{i:\zeta_i=h}w_{a_i,k}\Big(x_i-\sum_{l\neq k}w_{a_i,l}\beta_{hl}^*\Big)\Big\},\qquad V_{hk}=\Big\{\kappa+\tau\sum_{i:\zeta_i=h}w_{a_i,k}^2\Big\}^{-1}.$$

Step 5. Assuming α ~ Ga(aα, bα), update α from its full conditional distribution,

$$\text{Ga}\Big(a_\alpha+N-1,\;b_\alpha-\sum_{h=1}^{N-1}\log(1-V_h)\Big).$$

Step 6. Assuming $\pi_{0k}\sim\text{Be}(a_\pi,b_\pi)$, for $k=2,\ldots,K$, update $\pi_{0k}$ from its full conditional distribution,

$$\text{Be}\Big(a_\pi+\sum_{h=1}^{N}1(\beta_{hk}^*=0),\;b_\pi+\sum_{h=1}^{N}1(\beta_{hk}^*\neq 0)\Big).$$

Step 7. Assuming κ ~ Ga(aκ, bκ), update κ from its full conditional distribution,

$$\text{Ga}\Big(a_\kappa+\frac{1}{2}\sum_{h=1}^{N}\sum_{k=2}^{K}1(\beta_{hk}^*\neq 0),\;b_\kappa+\frac{1}{2}\sum_{h=1}^{N}\sum_{k=2}^{K}(\beta_{hk}^*)^2\Big).$$
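For concreteness, the following condensed Python sketch, not the authors' Matlab implementation, strings Steps 1–7 together into one sweep of the blocked Gibbs sampler for model (11) under a simple ordering; the state and hyperparameter containers (st, hp) and their field names are our own, and the spike-probability computation follows the full conditional in sub-step 2 of Step 4 (it is numerically fragile for extreme E/V ratios, which is adequate for a sketch).

```python
import numpy as np
from scipy.stats import norm, truncnorm

def gibbs_sweep(x, a, W, st, hp, rng):
    """One sweep: st holds V (N,), beta (N,K), tau, alpha, kappa, pi0 (K,)."""
    N, K = st["beta"].shape
    V, beta, pi0 = st["V"], st["beta"], st["pi0"]
    tau, alpha, kappa = st["tau"], st["alpha"], st["kappa"]
    pi = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    Wa = W[a]                                        # (n, K): row i is w_{a_i}'
    # Step 1: component allocations zeta_i
    mu = beta @ Wa.T                                 # mu[h, i] = w_{a_i}' beta_h*
    logp = np.log(pi)[:, None] + norm.logpdf(x[None, :], mu, tau ** -0.5)
    p = np.exp(logp - logp.max(0)); p /= p.sum(0)
    zeta = (p.cumsum(0) > rng.random(len(x))).argmax(0)
    # Step 2: error precision tau
    r = x - (Wa * beta[zeta]).sum(1)
    tau = st["tau"] = rng.gamma(hp["a_tau"] + len(x) / 2,
                                1.0 / (hp["b_tau"] + 0.5 * r @ r))
    # Step 3: stick-breaking weights V_h
    cnt = np.bincount(zeta, minlength=N)
    above = cnt[::-1].cumsum()[::-1] - cnt           # number of zeta_i > h
    V[:-1] = rng.beta(1 + cnt[:-1], alpha + above[:-1]); V[-1] = 1.0
    # Step 4: atoms beta_h* (Gaussian for k = 1; spike-and-slab for k >= 2)
    for h in range(N):
        obs = zeta == h
        for k in range(K):
            w = Wa[obs, k]
            res = x[obs] - (np.delete(Wa[obs], k, 1) * np.delete(beta[h], k)).sum(1)
            if k == 0:
                Vh = 1.0 / (hp["sigma0"] ** -2 + tau * (w ** 2).sum())
                Eh = Vh * (hp["mu0"] / hp["sigma0"] ** 2 + tau * (w * res).sum())
                beta[h, 0] = rng.normal(Eh, np.sqrt(Vh))
            else:
                Vh = 1.0 / (kappa + tau * (w ** 2).sum())
                Eh = Vh * tau * (w * res).sum()
                s = np.sqrt(Vh)
                odds = (np.sqrt(2 * kappa) * norm.sf(0, Eh, s)
                        / (np.sqrt(np.pi) * norm.pdf(0, Eh, s)))  # slab/spike ratio
                if rng.random() < pi0[k] / (pi0[k] + (1 - pi0[k]) * odds):
                    beta[h, k] = 0.0
                else:
                    beta[h, k] = truncnorm.rvs(-Eh / s, np.inf, loc=Eh, scale=s,
                                               random_state=rng)
    # Step 5: Dirichlet process precision alpha
    st["alpha"] = rng.gamma(hp["a_alpha"] + N - 1,
                            1.0 / (hp["b_alpha"] - np.log(1.0 - V[:-1]).sum()))
    # Step 6: point-mass probabilities pi_0k
    for k in range(1, K):
        z0 = np.sum(beta[:, k] == 0)
        pi0[k] = rng.beta(hp["a_pi"] + z0, hp["b_pi"] + N - z0)
    # Step 7: slab precision kappa
    nz = np.sum(beta[:, 1:] != 0)
    st["kappa"] = rng.gamma(hp["a_kappa"] + 0.5 * nz,
                            1.0 / (hp["b_kappa"] + 0.5 * np.sum(beta[:, 1:] ** 2)))
    return zeta
```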

REFERENCES

  1. ARJAS E & GASBARRA D (1996). Bayesian inference of survival probabilities, under stochastic ordering constraints. J. Am. Statist. Assoc. 91, 1101–9.
  2. BASU S & CHIB S (2003). Marginal likelihood and Bayes factors for Dirichlet process mixture models. J. Am. Statist. Assoc. 98, 224–35.
  3. BERGER JO & GUGLIELMI A (2001). Bayesian and conditional frequentist testing of a parametric model versus nonparametric alternatives. J. Am. Statist. Assoc. 96, 174–81.
  4. DASS SC & LEE J (2004). A note on the consistency of Bayes factors for testing point null versus non-parametric alternatives. J. Statist. Plan. Infer. 119, 143–52.
  5. DE IORIO M, MÜLLER P, ROSNER GL & MACEACHERN SN (2004). An ANOVA model for dependent random measures. J. Am. Statist. Assoc. 99, 205–15.
  6. DUAN JA, GUINDANI M & GELFAND AE (2007). Generalized spatial Dirichlet process models. Biometrika 94, 809–25.
  7. DUNSON DB & NEELON B (2003). Bayesian inference on order-constrained parameters in generalized linear models. Biometrics 59, 286–95.
  8. DUNSON DB, PILLAI N & PARK J-H (2007). Bayesian density regression. J. R. Statist. Soc. B 69, 163–83.
  9. GELFAND AE & KOTTAS A (2001). Nonparametric Bayesian modeling for stochastic order. Ann. Inst. Statist. Math. 53, 865–76.
  10. GELFAND AE, KOTTAS A & MACEACHERN SN (2005). Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Am. Statist. Assoc. 100, 1021–35.
  11. GHOSAL S, GHOSH JK & RAMAMOORTHI RV (1999). Posterior consistency of Dirichlet mixtures in density estimation. Ann. Statist. 27, 143–58.
  12. GRIFFIN JE & STEEL MFJ (2006). Order-based dependent Dirichlet processes. J. Am. Statist. Assoc. 101, 179–94.
  13. GUTIERREZ-PENA E & WALKER SG (2005). Statistical decision problems and Bayesian nonparametric methods. Int. Statist. Rev. 73, 309–30.
  14. HANSEN MB & LAURITZEN SL (2002). Nonparametric Bayes inference for concave distribution functions. Statist. Neer. 56, 110–27.
  15. HOFF PD (2003a). Bayesian methods for partial stochastic orderings. Biometrika 90, 303–17.
  16. HOFF PD (2003b). Nonparametric estimation of convex models via mixtures. Ann. Statist. 31, 174–200.
  17. ISHWARAN H & JAMES LF (2001). Gibbs sampling methods for stick-breaking priors. J. Am. Statist. Assoc. 96, 161–73.
  18. JEFFREYS H (1961). Theory of Probability, 3rd ed. Oxford University Press.
  19. LIJOI A, PRUNSTER I & WALKER SG (2005). On consistency of nonparametric normal mixtures for Bayesian density estimation. J. Am. Statist. Assoc. 100, 1292–6.
  20. LO AY (1984). On a class of Bayesian nonparametric estimates: I. Density estimates. Ann. Statist. 12, 351–7.
  21. MACEACHERN SN (1999). Dependent nonparametric processes. In Proc. Sect. Bayes. Statist. Sci., pp. 50–5. Alexandria, VA: American Statistical Association.
  22. MACEACHERN SN (2001). Decision theoretic aspects of dependent nonparametric processes. In Bayesian Methods with Applications to Science, Policy and Official Statistics, Ed. E. George, pp. 551–60. Crete: International Society for Bayesian Analysis.
  23. MÜLLER P, QUINTANA F & ROSNER G (2004). A method for combining inference across related nonparametric Bayesian models. J. R. Statist. Soc. B 66, 735–49.
  24. PENNELL ML & DUNSON DB (2006). Bayesian semiparametric dynamic frailty models for multiple event time data. Biometrics 62, 1044–52.
  25. SETHURAMAN J (1994). A constructive definition of Dirichlet priors. Statist. Sinica 4, 639–50.
