Abstract
We develop a dependent Dirichlet process (DDP) model for repeated measures multiple membership (MM) data. This data structure arises in studies under which an intervention is delivered to each client through a sequence of elements which overlap with those of other clients on different occasions. Our interest concentrates on study designs for which the overlaps of sequences occur for clients who receive an intervention in a shared or grouped fashion whose memberships may change over multiple treatment events. Our motivating application focuses on evaluation of the effectiveness of a group therapy intervention with treatment delivered through a sequence of cognitive behavioral therapy session blocks, called modules. An open-enrollment protocol permits entry of clients at the beginning of any new module in a manner that may produce unique MM sequences across clients. We begin with a model that composes an addition of client and multiple membership module random effect terms, which are assumed independent. Our MM DDP model relaxes the assumption of conditionally independent client and module random effects by specifying a collection of random distributions for the client effect parameters that are indexed by the unique set of module attendances. We demonstrate how this construction facilitates examining heterogeneity in the relative effectiveness of group therapy modules over repeated measurement occasions.
Keywords: Bayesian hierarchical models, Conditional autoregressive prior, Dependent Dirichlet process, Group therapy, Growth curve, Mental health, Multiple membership, Non-parametric priors, Substance abuse treatment
1 Introduction
For many applications in which data have a multilevel structure, observations on a study participant might not be nested within a single higher-level unit. Multiple membership (MM) modeling is used to account for such data structures, which arise in applications such as the estimation of teacher effects from student test scores, where each student is typically linked to multiple teachers over one or more grades (Hill & Goldstein 1998). MM structures also occur in the analysis of health care costs when patients are treated by multiple providers (Carey 2000) and smoothing disease rates when modeling health outcomes across geographic areas (Langford et al. 1999).
In our motivating application, the MM structure arises in a study of the effect of group cognitive behavioral therapy (CBT) on reducing depressive symptoms among clients in residential substance abuse treatment. The Building Recovery by Improving Goals, Habits and Thoughts (BRIGHT) study (Watkins et al. 2011) was a community-based effectiveness trial of a group CBT intervention for treating residential substance abuse treatment clients having depressive symptoms. The BRIGHT study employed a quasi-experimental design in which cohorts of clients at each of four study sites received either residential treatment as usual (UC) (n = 159) or residential treatment enhanced with the BRIGHT intervention (CBT) provided by trained substance abuse treatment counselors (n = 140). Clients were assigned to receive either CBT or UC according to which intervention was offered at their study sites at the time of entry into residential substance abuse treatment. CBT and UC were offered at each study site on an alternating basis over time. The clients assigned to the CBT condition were expected to complete four modules of group CBT, with each module consisting of four thematically-similar sessions offered over a two-week period. This sequence of modules was then offered on a repeating basis. In all, S = 61 group CBT modules were offered to the clients assigned to the CBT condition. These 61 modules were divided into G = 4 CBT open-enrollment therapy groups, which are sequences of sessions that have distinct sets of clients; the number of clients enrolled in each open-enrollment group was 17, 21, 19, and 83, respectively. Enrollment into the therapy group occurs on an open basis (Morgan-Lopez & Fals-Stewart 2006, Paddock et al. 2011), with clients entering the therapy group at the start of new modules. The primary study outcome is client depressive symptomatology, as measured by the Beck Depression Inventory-II (BDI-II) (Beck et al. 1996).
The BDI-II score is a sum across 21 four-level items (scored 0–3), with a higher score indicating a greater level of depressive symptoms. The BDI-II score for client i is measured up to oi times, with oi = 1 for clients with only a baseline assessment at study entry and up to oi = 3 for clients measured as well at both 3 and 6 months post-baseline. The MM structure arises here since client outcomes might be correlated due to common module attendance, and the BDI-II scores are not uniquely associated with a single module but rather with all modules attended by a client.
For longitudinal studies in which participants belong to multiple higher-level units, the standard analytic approach is to include a single set of random effects terms that are assumed to be constant over time to account for the multiple membership. However, constraining these random effects to be constant across time does not allow for changes in correlations among outcomes for clients who attend modules together; their outcomes might be more strongly correlated immediately following group therapy versus at baseline or longer-term follow-up times. Further, including distinct terms in the model to account for multiple membership and for the correlation of repeated measurements within-client might be too restrictive for applications such as group cognitive behavioral therapy (CBT). Not all clients benefit similarly from group therapy (Smokowski et al. 2001). For example, group climate and cohesion are associated with improved outcomes (Ryum et al. 2009, Crowe & Grenyer 2008). Thus, not only might the effects of modules change over time, but the effects of modules on participant outcome trajectories might vary across study participants.
We present a dependent Dirichlet process (DDP) model for repeated measures multiple membership data. Specifically, we propose a set of random distributions for client random effect parameters that are indexed by therapy group module attendance sequences. Our model allows one to obtain treatment effect estimates for group therapy versus a comparison condition that account for the correlation of client outcomes due to the attendance sequences, with the framework embedded in a hierarchical construction for modeling repeated measures data. One may use our approach to examine whether there is heterogeneity in the relative effectiveness of group therapy modules by identifying clusters of clients whose outcome trajectories vary across modules. Our framework is flexible enough to retain application-specific modeling choices. For the BRIGHT study, this includes specifying a proper conditionally autoregressive (CAR) base distribution for the non-parametric prior on module random effects, which accounts for the open enrollment-induced client overlap in attendance of modules that are offered at adjacent time points (Paddock et al. 2011). We demonstrate that the DDP model may be re-cast for estimation as a DP under our multiple membership linkage of clients to treatment in a similar fashion as for the analysis of variance (ANOVA) DDP (De Iorio et al. 2004).
In §2, we introduce an additive model that employs client and MM random effects for BRIGHT study modules, which was examined for open-enrollment group therapy data by Paddock & Savitsky (to appear), and then build upon that work by introducing a multivariate generalization to allow for time-varying MM random effects. We present the DDP model in §3 to generalize the additive MM model to jointly model dependence owing to repeated measures within clients and group therapy module participation. We briefly describe our computational approach and software for conducting posterior simulations under the multiple membership models in §4, followed by an exploration of the properties of the models on simulated data in §5. In §6, we apply the models to our motivating application, the assessment of a group CBT intervention deployed in an open-enrollment study design for the treatment of depressive symptoms among clients in residential substance abuse treatment. We conclude with a discussion in §7.
2 Multiple Membership Additive Semi-Parametric Models
This section introduces model constructions that include module random effects, which are mapped to each client according to the modules attended by that client using multiple membership modeling. These models permit inference about the relative effectiveness of the CBT intervention while accounting for differences in module effects as well as the dependence induced among clients based on overlaps in the sequences of modules attended. A separate client random effects term captures the within-client dependence among repeated measures.
2.1 Model Construction and Definitions
We begin with the model of Paddock & Savitsky (to appear) for modeling longitudinal post-treatment outcomes and allowing outcomes for clients who attend the same therapy group to be correlated:
$$y_{ij} = \mu + d_{ij}'\beta + z_{ij}'b_i + x_i'\gamma + \epsilon_{ij} \tag{1}$$
where yij is the BDI-II depressive symptom score for client i (i = 1, …, n) at repeated measurement event j (j = 1, …, oi). The global intercept is represented by μ. dij are the fixed effects predictors and their associated effects are β. For the BRIGHT study, we parameterize dij to comprise $T_i$, $t_{ij}$, $t_{ij}^2$ and the interactions $T_i t_{ij}$ and $T_i t_{ij}^2$, where Ti specifies an indicator for the treatment arm assigned to client i (Ti = 1 for clients receiving cognitive behavioral therapy (CBT), Ti = 0 for those receiving the “usual care” (UC)), and tij denotes the continuously-valued time at which yij was observed. The components of dij are chosen to estimate the effects on depressive symptom scores of CBT assignment, time, and the interaction of CBT assignment and time; a quadratic specification was chosen based on previous data analysis (Paddock & Savitsky (to appear)). The random effects predictor, zij, is a q × 1 vector associated with the q random effects for client i, {bi}. We set $z_{ij} = (1, t_{ij}, t_{ij}^2)'$ for the BRIGHT study, so that the (q = 3) × 1 vector of random effect parameters for client i, bi, captures client-specific variation in change in BDI-II scores over time. Our parameterization of fixed and client random effects employs global second order polynomial terms to enforce smoothness and prevent over-fitting under a study design with a relatively small number of measurement waves per client, as is typical of behavioral intervention studies such as BRIGHT. The second-to-last term of Equation 1 allows for multiple membership modeling since depressive symptom scores observed post-treatment, yij, are not linked to specific therapy group modules, but rather to all modules attended by client i. This term maps the yij’s to the vector of S module random effects, γ, by multiplying γ by an S × 1 weight vector, xi, that is normalized to sum to 1 (Hill & Goldstein 1998). In particular, Si equals the number of modules attended by client i; xis = 1/Si if client i attended module s and xis = 0 otherwise. Let N = Σi oi denote the number of repeated measures observed for all clients.
Observational error is indicated by $\epsilon_{ij} \sim \mathcal{N}(0, \tau_\epsilon^{-1})$, with scalar precision parameter τε. We produce within-sample fitted client growth trajectories in §5 and §6 using (β, {bi}, γ).
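The multiple membership weighting described above can be illustrated with a short sketch. The function names, the 0-based module indexing, and the numerical module effects below are illustrative assumptions and are not taken from the BRIGHT analysis; the sketch only shows that normalized weights xis = 1/Si turn the MM term into an average of the effects of the attended modules.

```python
def mm_weights(attended, n_modules):
    """Return the S x 1 multiple membership weight vector x_i.

    attended: iterable of 0-based module indices attended by one client
    (a hypothetical input format). Each attended module gets weight 1/S_i,
    so the weights sum to 1; a client attending no modules gets all zeros.
    """
    attended = sorted(set(attended))
    if any(s < 0 or s >= n_modules for s in attended):
        raise ValueError("module index out of range")
    weights = [0.0] * n_modules
    for s in attended:
        weights[s] = 1.0 / len(attended)
    return weights


def mm_term(weights, gamma):
    """Multiple membership contribution x_i' gamma to the linear predictor."""
    return sum(w * g for w, g in zip(weights, gamma))


# A client who attended modules 2, 3, 4 and 5 out of S = 8 (0-based indices).
x_i = mm_weights([2, 3, 4, 5], n_modules=8)
gamma = [0.5, -1.0, 2.0, 0.0, -2.0, 4.0, 1.0, 3.0]  # made-up module effects
contribution = mm_term(x_i, gamma)  # average effect of the attended modules
```

Under this weighting, a client's outcome is linked to the mean of the module effects over the attended sequence rather than to any single module.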
2.2 Distribution of Client Random Effects
Though one may parametrically model the client random effects, {bi}, we model them non-parametrically using a Dirichlet process (DP) prior to motivate the subsequent DDP development and to exploit the DP’s usefulness for flexibly modeling the distribution of the {bi}’s despite having no more than three repeated measures per client in the BRIGHT study (Paddock & Savitsky (to appear)). Specify,
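$$b_1, \ldots, b_n \mid F \stackrel{iid}{\sim} F \tag{2}$$
$$F \mid \alpha, F_0 \sim \mathrm{DP}(\alpha, F_0) \tag{3}$$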
where we choose base distribution, F0 ≡ $\mathcal{N}_q(0, \Lambda^{-1})$, a convenient conjugate form that spans the support for b and simplifies posterior sampling while still allowing the data to estimate a general form for F. We further specify, α ~ Ga(a1 = 1, b1 = 1), to allow the data to estimate the DP concentration parameter, reflecting its importance for determining the total number of client clusters formed. We may equivalently enumerate (2) as a discrete mixture (Sethuraman 1994),
$$F = \sum_{h=1}^{\infty} p_h\, \delta_{b^*_h} \tag{4}$$
of countably infinite weighted point masses, where “locations” ($b^*_1, \ldots, b^*_M$) index the unique values for the {bi}. The discrete construction for F allows for ties among sampled values for {bi}, so that M ≤ n, and we index clusters (i.e., clients sharing locations, or having the same value of b) with the n × 1 vector, s, where si = m implies $b_i = b^*_m$. Then the set, ($s$, $\{b^*_m\}$), provides an equivalent parameterization to {bi}, though the former provides better mixing under posterior sampling (Neal 2000).
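A minimal sketch of the stick-breaking construction may help fix ideas. It truncates the infinite sum at a finite number of atoms and uses a scalar standard normal in place of the q-variate base distribution; both are simplifying assumptions made here for illustration, as are all function and variable names.

```python
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights p_1, ..., p_H for a DP(alpha, F0).

    v_h ~ Beta(1, alpha) and p_h = v_h * prod_{l<h} (1 - v_l). The last
    weight takes the remaining stick so the truncated weights sum to 1.
    """
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def draw_from_truncated_dp(alpha, n_atoms, n_clients, base_draw, rng):
    """Draw client effects b_i from a truncated realisation of F."""
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    locations = [base_draw(rng) for _ in range(n_atoms)]
    labels = rng.choices(range(n_atoms), weights=weights, k=n_clients)
    return [locations[h] for h in labels], labels


rng = random.Random(1)
# Scalar standard normal stands in for the q-variate base distribution F0.
effects, labels = draw_from_truncated_dp(
    alpha=1.0, n_atoms=50, n_clients=30,
    base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_clusters = len(set(labels))  # number of occupied atoms, M <= n
```

Because several clients typically select the same atom, the number of occupied atoms is usually far smaller than the number of clients, which is the source of the ties among the {bi}.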
2.3 Distribution of Module Random Effects
2.3.1 Univariate Module Effects
Owing to the overlap in client attendance of modules under open enrollment into group therapy, we specify a conditionally autoregressive (CAR) prior for module random effects to allow them to be correlated. The degree of correlation is determined by the closeness of the modules, which depends on how we define which modules are neighbors. We define modules offered at adjacent time points within the same open-enrollment group as neighbors given that clients tend to attend subsequent modules in the BRIGHT study’s residential treatment setting (Paddock et al. 2011).
To implement this, we enumerate a two-part form for the covariance matrix (Besag et al. 1991). Firstly, define an S × S adjacency matrix, Ω, to encode dependence among neighboring modules where we set ωss′= 1 if module s is a neighbor of or “communicates” with module s′ (denoted with “~” in s ~ s′), and 0 otherwise. Construct D = Diag(ωs+), where ωs+ =Σjωsj equals the number of neighbors of module s. Then compose the covariance matrix, Q−= (D − Ω)−, the Moore-Penrose pseudo-inverse, as Q is not of full rank, and specify the joint distribution of random module effects,
$$\gamma \mid \tau_\gamma \sim \mathcal{N}_S\left(0,\ \tau_\gamma^{-1} Q^{-}\right) \tag{5}$$
where the scalar precision parameter, τγ, controls the overall strength of variation. The rank of (D − Ω) is S − G, where G represents the number of distinct open-enrollment therapy groups (Hodges et al. 2003). We use the following model short-hand label for simulated data and BRIGHT data analysis,
MMCAR: Employ the additive model of Equation 1 under the joint prior construction of Equation 5 in the fashion of Paddock & Savitsky (to appear).
Note that one could use a standard MM model for applications under which random effects may be assumed exchangeable.
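The adjacency structure and the rank statement above can be checked on a toy example. The sketch below assumes modules are ordered in time within each group and that only consecutive modules in the same group are neighbors; the group sizes are made up and the helper names are hypothetical.

```python
from fractions import Fraction


def car_structure(group_sizes):
    """Adjacency matrix Omega and Q = D - Omega for modules ordered in time.

    Modules are neighbours when they are consecutive within the same
    open-enrollment group; modules in different groups never communicate.
    """
    n = sum(group_sizes)
    omega = [[0] * n for _ in range(n)]
    start = 0
    for size in group_sizes:
        for s in range(start, start + size - 1):
            omega[s][s + 1] = omega[s + 1][s] = 1
        start += size
    q = [[(sum(omega[r]) if r == c else 0) - omega[r][c] for c in range(n)]
         for r in range(n)]
    return omega, q


def matrix_rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0),
                     None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


group_sizes = [3, 4, 2]        # toy example: S = 9 modules in G = 3 groups
omega, q = car_structure(group_sizes)
rank_q = matrix_rank(q)        # S - G for this block structure
```

Each row of D − Ω sums to zero within a group, so each group contributes one null direction, which is why the pseudo-inverse is needed in Equation 5.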
2.3.2 Multivariate Module Effects
The univariate module effects may be replaced with a multivariate model specification that relaxes the assumption of constant module effects over time specified in Equation 5. Re-state Equation 1,
$$y_{ij} = \mu + d_{ij}'\beta + z_{ij}'b_i + z_{ij}'\Gamma' x_i + \epsilon_{ij} \tag{6}$$
where the S × q matrix, Γ = (γ1, …, γS)′, collects the multivariate q × 1 vectors, γs. We again assume a second order polynomial model, but this time for the module effects, where each module, s, is parameterized with a (q = 3) × 1 random effects vector back multiplied by zij, which permits the effect of module s under the BRIGHT study to vary with time, tij. We may most easily make the extension of the CAR modeling of Besag et al. (1991) by stacking each of the q, S × 1 columns from Γ into the qS × 1 vector, $\mathrm{vec}(\Gamma) = (\gamma_{(1)}', \ldots, \gamma_{(q)}')'$, where $\gamma_{(k)}$ here denotes the k-th S × 1 column of Γ. Then compose the multivariate CAR prior,
$$\mathrm{vec}(\Gamma) \mid \Lambda \sim \mathcal{N}_{qS}\left(0,\ Q^{-}\right) \tag{7}$$
for qS × qS precision matrix, Q = (D − Ω) ⊗ Λ, where Λ describes the dependence among the q random effects per module and is specified to be identical to that used for base distribution associated with prior (2) imposed on {bi}. In summary, Equation 6 extends Equation 1 by permitting MM random (module) effects to vary over time. Assign the following label for our multivariate construction,
MM_MV: Employ the additive model of Equation 6 under the joint prior construction of Equation 7.
2.4 Prior Distributions for Other Parameters
Scalar precision parameters, (τε, τγ), are each specified with a Ga(0.1, 0.1) prior with mean 1, while the q × q precision matrix, Λ, receives a Wishart prior with q + 1 degrees of freedom, where the degrees of freedom are set to the minimum value to encourage updating by the data. Lastly, (μ, β) each receive non-informative priors. In instances where our priors specify fixed hyperparameters, we use values intended to be easily overwhelmed in the presence of data rather than eliciting them from our data.
3 Dependent Dirichlet Process for Multiple Membership Data
To allow for greater flexibility in modeling changes in module effects over time as well as the effects of modules on client depressive symptom trajectories, we now reformulate Equation 1 to explicitly index the client random effects by group therapy module identifiers, under which each client is assigned a q × (S + 1) matrix of random effects. This contrasts with the previous specification of sets of q × 1, {bi}i=1,…,n client random effects and the S × 1 module effects, γ, given in Equation 1. The resulting client-by-module matrix parameterization arises from replacing a single random prior distribution for client effects with a collection of random prior distributions that are indexed by the unique module attendance sequences. First, we re-formulate Equation 1 in a more flexible composition,
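$$y_{ij} = \mu + d_{ij}'\beta + z_{ij}'\Delta_i x_i + \epsilon_{ij} \tag{8}$$
$$\Delta_1, \ldots, \Delta_n \mid F \stackrel{iid}{\sim} F \tag{9}$$
$$F \mid \alpha, F_0 \sim \mathrm{DP}(\alpha, F_0) \tag{10}$$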
where we have replaced q × 1, bi and S × 1, γ with the q × (S + 1), Δi for client i composed with,
$$\Delta_i = \left[\, b_i,\ a_{1,i},\ \ldots,\ a_{S,i} \,\right] \tag{11}$$
The first column of Δi employs the analogous bi client random effects from the additive models. The {as,i}s=1,…,S collect a set of q × 1 module random effect vectors for client i. We note that every client receives an effect term, as,i, for all of the S modules, even for modules they have not attended; this holds even for clients in the UC arm. By contrast, the additive model of Equation 1 is only defined at observed sequences of client module attendances, while this formulation is defined over a broader space of potential module attendance sequences across clients. We impose a DP prior on the set of client-by-module effects, Δi, in order that we may borrow strength and reduce dimension to discover clusters of clients expressing differential response sensitivities to treatment exposures. Employment of a continuous base distribution under the DP prior for the {Δi}i=1,…,n allows posterior inference on an arbitrary sequence of group therapy module linkages for each client. Effect values at unobserved modules are drawn from the non-degenerate continuous base distribution as updated by the observed module attendances. The module effect estimates for unobserved attendances for each client are set equal to the location values associated with the cluster to which the client is assigned. The ability to develop a proper posterior distribution for arbitrary module attendance sequences is referred to by De Iorio et al. (2004) as non-degeneracy.
Each of the q × 1 columns of Δi in Equation 8 is back multiplied by xi, which is the MM weight vector we earlier defined, but with a 1 prepended for a random intercept. More specifically, for xi equal to some value x, we construct the latter object as x ≡ (1, x1, …, xS) for xs ∈ [0, 1] to encode the vector sequence for group therapy module attendance. Under our MM construction, the entries (x1, …, xS) of the (S + 1) × 1 vector, x, take values in [0, 1] with $\sum_{s=1}^{S} x_s = 1$ for clients who attend at least one module, and $\sum_{s=1}^{S} x_s = 0$ for clients who do not.
We define the q × 1 parameter vector, θx,i ≡ Δix, resulting from composition of the client-by-module random effects with the module attendance sequence. We write θx,i and θx,i′ for clients i and i′ that share the same attendance sequence, x ∈ $\mathcal{X}$. Construct the subsequence, (xs(1), …, xs(K)), for the K ≤ S non-zero entries in x corresponding to modules attended by the one or more clients with xi = x. Then we may provide the more granular construction, θx,i = bi + xs(1)as(1),i +…+ xs(K)as(K),i, for client i, where we note that only those modules attended by client i contribute to the likelihood. The multiplication of each as(k),i by xs(k) reflects the MM design with xs(k) ∈ [0, 1].
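The equivalence between θx,i = Δix and the granular sum over attended modules can be verified numerically. The matrix entries below are arbitrary illustrative numbers (q = 3, S = 4), and the function names are hypothetical.

```python
def theta_from_delta(delta, x):
    """Compute theta = Delta_i x for a q x (S + 1) matrix and (S + 1) vector."""
    return [sum(d * w for d, w in zip(row, x)) for row in delta]


def theta_by_columns(delta, x):
    """Same quantity built as b_i plus weighted columns of attended modules."""
    q = len(delta)
    theta = [delta[r][0] for r in range(q)]  # first column is b_i (x_0 = 1)
    for s in range(1, len(x)):
        if x[s] > 0:                         # unattended modules drop out
            for r in range(q):
                theta[r] += x[s] * delta[r][s]
    return theta


# q = 3 polynomial orders, S = 4 modules; all numbers are arbitrary.
delta_i = [[1.0, 0.4, -0.2, 0.8, 0.0],
           [0.5, -1.0, 0.6, 0.2, 1.0],
           [-0.1, 0.3, 0.1, -0.5, 0.7]]
x = [1.0, 0.0, 0.5, 0.5, 0.0]       # client attended modules 2 and 3
theta = theta_from_delta(delta_i, x)

x_uc = [1.0, 0.0, 0.0, 0.0, 0.0]    # usual care client: no modules attended
theta_uc = theta_from_delta(delta_i, x_uc)  # reduces to the b_i column
```

For a client with no module attendance, the product returns only the first column of Δi, which is how the UC arm serves as the comparator sequence.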
Our formulation in Equation 8 may be re-expressed with the q × 1 vector of client random effects, θx,i, in a similar fashion as the q × 1 bi in Equation 1, but here we index the client random effects by module attendance sequence, x. The prior for θx,i is specified under a collection of random distributions, {Fx}, indexed by the unique attendance sequences, x ∈ $\mathcal{X}$,
$$y_{ij} = \mu + d_{ij}'\beta + z_{ij}'\theta_{x,i} + \epsilon_{ij} \tag{12}$$
with random effects vector, zij, the same as composed in Equation 1. Specify the prior formulation for θx,i,
$$\theta_{x,i} \mid F_x \stackrel{ind}{\sim} F_x \tag{13}$$
We next enumerate a multiple membership dependent Dirichlet process (MM DDP) set of non-parametric distributions indexed by module attendance sequence, x, in the stick-breaking construction (Sethuraman 1994),
$$F_x = \sum_{h=1}^{\infty} p_h\, \delta_{\theta^*_{x,h}} \tag{14}$$
of weighted point mass locations where the weights are common for all values of x ∈ $\mathcal{X}$, but the locations are indexed by the unique attendance sequences (unlike for the simpler DP). We note that marginally, for each x, the locations, $\{\theta^*_{x,h}\}$, are exchangeable in h, such that Fx follows a Dirichlet process and we have established the propriety of the MM DDP. Denote the following short-hand notation for the MM DDP construction,
| (15) |
| (16) |
where we have extended the ANOVA DDP prior of De Iorio et al. (2004) to a multiple membership framework for the set of effect random distributions, {Fx}.
We achieve Equation 8 from Equation 12 by extending a property of ANOVA DDP to the MM DDP that re-writes Equation 14 as a DP due to the finite indexing space of group therapy modules with,
| (17) |
| (18) |
| (19) |
Then we may re-write our DDP model formulation of Equation 12 to the DP construction specified in Equation 8.
Though we use Equations 8 – 10 to estimate the MM DDP, the conceptual alternative in Equations 12–14 provides insight into the inferential properties of the MM DDP. The indexing of distributions, rather than just mean effects, by the module attendance sequences better spans the space of distributions generating the client random effects and allows the estimation of client module effects for modules not attended.
We also gain insight into the manner in which strength is borrowed over the set of module attendance sequences. The MM DDP formulation employs {Fx}, x ∈ $\mathcal{X}$, indexed by the set of unique module attendance sequences. Few clients, however, may be expected to exactly overlap or to share the same x. Yet, clients will overlap for a portion of the module attendance sequences such that we have repeated observations for each module s ∈ (1, …, S) for estimation of the dependent {as,i′}i′ for all i′ : xs,i′ > 0. The partial overlaps among the attendance sequences, x ∈ $\mathcal{X}$, induce a dependence structure among the {Fx} based on the extent of overlaps.
3.1 Base Distribution
We structure the base distribution, F0, for our q × (S + 1) client-by-module parameters to leverage the adjacency dependence of the BRIGHT study modules. Compose F0 for draws of the cluster locations, $\Delta^*_m$, as the product of multivariate Gaussian distributions for the q × 1 client effect column and for the q × S block of module effect columns that, together, comprise $\Delta^*_m$, with,
| (20) |
| (21) |
where m indexes cluster location. The matrix-variate Gaussian construction in Equation 21 employs a separable (parsimonious) covariance formulation for the distribution on the q × S block of matrix variate module effect parameters. We have employed the notation of Dawid (1981) under which the q × q, Λ, defines the precision matrix for the columns of this block and the S × S, Q, for the rows. The covariance formulation is equivalent to a Kronecker product of $Q^{-1}$ and $\Lambda^{-1}$. (See Hoff (2011) for an intuitive discussion of separable covariance formulations.) Lastly, the preceding 0 presents the value of the q × S mean. Consistent with prior formulations under the additive models, the q × q, Λ, receives a Wishart prior with q + 1 degrees of freedom. We structure the S × S precision matrix, Q, which models the module-induced adjacency dependence among the set of q × 1 module effect vectors, with a proper CAR formulation as enumerated in Jin et al. (2005), where Q = (D − ρΩ), and ρ ∈ (−1, 1) ensures Q is of full rank and may be viewed as a smoothing parameter that measures the strength of adjacency association. Matrices (D, Ω) hold the same definitions as earlier specified in §2.3.
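The role of ρ in making Q full rank can be seen on a small chain of modules. The sketch below is an illustration under the assumption that every module has at least one neighbor; it tests positive definiteness by attempting a Cholesky factorization, and the helper names are hypothetical.

```python
import math


def proper_car_precision(omega, rho):
    """Q = D - rho * Omega, with D the diagonal matrix of neighbour counts."""
    n = len(omega)
    return [[(float(sum(omega[r])) if r == c else 0.0) - rho * omega[r][c]
             for c in range(n)] for r in range(n)]


def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorisation; success means positive definite."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if acc <= tol:
                    return False  # zero or negative pivot: not full rank PD
                chol[i][j] = math.sqrt(acc)
            else:
                chol[i][j] = acc / chol[j][j]
    return True


# Chain of 5 modules offered at adjacent time points in a single group.
omega = [[1 if abs(r - c) == 1 else 0 for c in range(5)] for r in range(5)]
pd_proper = is_positive_definite(proper_car_precision(omega, 0.7))
pd_intrinsic = is_positive_definite(proper_car_precision(omega, 1.0))
```

With |ρ| < 1 the matrix is strictly diagonally dominant and hence invertible, whereas ρ = 1 recovers the rank-deficient D − Ω of the intrinsic CAR used in §2.3.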
Proceeding with the notation of Dawid (1981), we pull together the components of the base distribution into,
| (22) |
where P = diag (1, Q). Let us prepare F0 in the form we will use to conduct posterior simulations by stacking the q rows of $\Delta^*_m$ (each an (S + 1) × 1 vector) into a q(S + 1) × 1 vector in,
| (23) |
Vectorize the q × S block of module effects in a similar manner to obtain a qS × 1 vector, whose prior is similar to (7) but is full rank to permit efficient joint posterior sampling under high within-cluster dependence among its qS elements under (9). Our MM DDP formulation specifies the full set of S module effects for client i set equal to the location values drawn from the CAR base distribution for the cluster, m, that contains client i at a given posterior sampling iteration.
Due to the BRIGHT study design, there were G = 4 open-enrollment therapy groups. Each group was composed of modules having at least partial overlap with another module with respect to the set of clients in attendance, and the sets of clients in the four groups were different. We thus add more flexibility in (23) by specializing the CAR prior in P to each open-enrollment therapy group with,
| (24) |
where we have defined a set, {Qg}g=1,…,G, of CAR precision matrices composed as Sg × Sg, Qg = (Dg − ρgΩg), and recover D = diag (D1, …, DG) and Ω = diag (Ω1, …, ΩG), reflecting the disjoint, non-communicating structure we seek to model. Jin et al. (2005) note that the parameterization of a single global scalar smoothing parameter, ρ, may be overly restrictive, and they offer more heavily parameterized alternatives to permit the learning to adapt more locally. Our specification, which indexes ρg by disjoint group, allows smoothing across client-indexed module effects to be local to group. We may specify other continuous, multivariate distributions in place of the CAR for each group, including replacing the CAR covariance matrix construction with an anisotropic Gaussian process (Savitsky & Vannucci 2010) or with an unspecified formulation under an inverse Wishart prior.
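One reading of the group-specific construction is a block-diagonal precision matrix with a leading 1 for the client effect column followed by one proper CAR block per open-enrollment group. The sketch below assembles such a matrix for a toy design; the block-diagonal form, the group sizes and the ρg values are assumptions made for illustration, and each group is assumed to hold at least two modules.

```python
def chain_car_precision(size, rho):
    """Q_g = D_g - rho_g * Omega_g for `size` modules adjacent in time."""
    q = [[0.0] * size for _ in range(size)]
    for s in range(size):
        neighbours = [t for t in (s - 1, s + 1) if 0 <= t < size]
        q[s][s] = float(len(neighbours))
        for t in neighbours:
            q[s][t] = -rho
    return q


def block_diagonal(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    n = sum(len(b) for b in blocks)
    out = [[0.0] * n for _ in range(n)]
    offset = 0
    for block in blocks:
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                out[offset + r][offset + c] = value
        offset += len(block)
    return out


group_sizes = [3, 2]   # toy design: S = 5 modules in G = 2 groups
rhos = [0.9, 0.3]      # hypothetical group-specific smoothing parameters
blocks = [[[1.0]]] + [chain_car_precision(n, r)
                      for n, r in zip(group_sizes, rhos)]
p = block_diagonal(blocks)  # (S + 1) x (S + 1)
```

The zero off-diagonal blocks encode that modules in different groups do not communicate, while each ρg controls smoothing only within its own group.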
Assign the following label for the non-parametric construction,
DDP: Equations 12–14 under the base distribution of Equation 23.
4 Computational Approach
Convergence of the sampler employed for simulation and the BRIGHT data analyses was assessed by employing a fixed width estimator with Monte Carlo standard errors (MCSE) computed using the consistent batch means (CBM) method (Jones et al. 2006). Computational software for the posterior simulations is available in growcurves (Savitsky & Paddock 2012), our package for the R statistical software (R Development Core Team 2011). All of the methods, fit statistics and charts presented in this paper may be readily reproduced from growcurves. The parameters under DP priors are all sampled in a conjugate fashion by marginalizing over the random measure, F, to produce the Pólya urn scheme of Blackwell & MacQueen (1973), under which each cluster assignment indicator is sampled from a mixture of existing clusters and a new cluster. When a new cluster is selected, the associated parameter locations are generated (and subsequently re-sampled) from the posterior of the base distribution under a single observation (see Paddock & Savitsky (to appear) for details).
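The Pólya urn scheme can be sketched in its prior form, before any likelihood weighting. This is not the posterior sampler implemented in growcurves, which additionally weights each option by the likelihood of the client's data; it only illustrates how existing clusters are chosen in proportion to their size and a new cluster in proportion to α. The names and the seed are arbitrary.

```python
import random


def polya_urn_assignments(n_clients, alpha, rng):
    """Sequentially assign clients to clusters under the Polya urn prior.

    A client joins existing cluster m with probability n_m / (alpha + i)
    and opens a new cluster with probability alpha / (alpha + i), where i
    counts the clients already assigned.
    """
    counts, labels = [], []
    for _ in range(n_clients):
        weights = counts + [alpha]  # last entry is the "new cluster" option
        m = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if m == len(counts):
            counts.append(1)
        else:
            counts[m] += 1
        labels.append(m)
    return labels, counts


rng = random.Random(2013)
labels, counts = polya_urn_assignments(n_clients=299, alpha=1.0, rng=rng)
n_clusters = len(counts)
```

Larger values of α make the new-cluster option more likely at every step, which is why the concentration parameter drives the total number of client clusters formed.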
We employ the cross-validatory, log pseudo marginal likelihood (LPML) leave-one-out fit statistic as described in Congdon (2005) under importance re-sampling of the posterior distributions over model parameters to estimate f(yi|y−i, Mr), where Mr indexes our models. The leave-one-out property induces a penalty for model complexity and helps to assess the possibility of over-fitting. We also include the DIC3 criterion of Celeux et al. (2006), which composes the marginal (predictive) density to estimate f(y|θ) for composition of pD and is more appropriate for the (DP or DDP) mixture formulations that characterize all of our models. The non-penalized mean deviance, D̄, is also utilized.
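As an illustration of how a leave-one-out statistic can be computed from posterior draws, the sketch below uses the common harmonic-mean estimator of the conditional predictive ordinate, which treats the full posterior as the importance density. This is a simplified stand-in and may differ in detail from the importance re-sampling variant used for the results reported here; the input layout is an assumption.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def lpml(log_lik):
    """Log pseudo marginal likelihood from posterior draws.

    log_lik[t][i] holds log f(y_i | theta^(t)) for draw t and client i.
    The leave-one-out density for client i is estimated by the harmonic
    mean of f(y_i | theta^(t)) over draws, computed on the log scale.
    """
    n_draws = len(log_lik)
    total = 0.0
    for i in range(len(log_lik[0])):
        neg = [-log_lik[t][i] for t in range(n_draws)]
        total += math.log(n_draws) - log_sum_exp(neg)
    return total


# Toy input: 2 posterior draws, 2 clients, made-up log-likelihood values.
toy = [[-1.0, -2.0],
       [-1.5, -2.5]]
toy_lpml = lpml(toy)
```

Larger LPML (smaller −LPML) indicates better out-of-sample predictive performance, so the statistic rewards fit only to the extent that it generalizes to held-out clients.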
5 Simulation Study
5.1 Data Generation
We generate data sets for simulation modeling from (12) by allocating the first 132 clients to the CBT and a remaining 168 to a non-group therapy usual care (UC) condition. We employ 24 modules for our simulation. Each CBT client attends 4 modules and each module on average holds 22 clients. The module attendance sequences, {xi}, used to select columns of the client-indexed matrix effects, {Δi}, are next generated in an open enrollment manner by randomly selecting the starting module for each CBT client in the block of 4 modules to which they are assigned. We set xi = [1, 0, …, 0] for all UC clients (who, by design, don’t attend group therapy modules) as our hold-out or comparator module attendance sequence for identification. Such a design instantiates partial overlaps among the module attendance sequences for clients. The minimum and maximum numbers of clients linked to modules were restricted to 11 and 26, respectively, to conform to practical limitations on the underlying structure for group therapy modules. We simulate up to three repeated measures per client.
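A simplified version of the attendance generation can be sketched as follows. It assumes each CBT client starts at a uniformly chosen module and attends 4 consecutive modules, and it omits both the block assignment and the restriction of 11 to 26 clients per module described above; these simplifications, the seed and the function name are assumptions for illustration only.

```python
import random


def simulate_attendance(n_cbt, n_uc, n_modules, n_attend, rng):
    """Return (S + 1)-vectors x_i: a leading 1, then MM weights per module."""
    rows = []
    for _ in range(n_cbt):
        # Open enrollment: enter at a random module, then attend in sequence.
        start = rng.randrange(n_modules - n_attend + 1)
        x = [1.0] + [0.0] * n_modules
        for s in range(start, start + n_attend):
            x[1 + s] = 1.0 / n_attend
        rows.append(x)
    for _ in range(n_uc):
        rows.append([1.0] + [0.0] * n_modules)  # usual care: no modules
    return rows


rng = random.Random(7)
x_all = simulate_attendance(n_cbt=132, n_uc=168, n_modules=24, n_attend=4,
                            rng=rng)
per_module = [sum(1 for x in x_all if x[1 + s] > 0) for s in range(24)]
mean_clients_per_module = sum(per_module) / 24
```

With 132 CBT clients each attending 4 of 24 modules, the average module size is 132 × 4 / 24 = 22 clients regardless of the random starting points, matching the design stated above.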
We simulate 4 clusters of clients, where each cluster generates a (q = 3) × ((S = 24) + 1) set of effect locations, $\Delta^*_m$, shared by all clients assigned to that cluster. The q = 3 rows of $\Delta^*_m$ capture up to second order (intercept, linear and quadratic) polynomial effects for each module. The effects are generated in a vectorized fashion from a multivariate Gaussian with the covariance formulation as outlined for the DDP base distribution enumerated in §3.1. The module effects are generated from a multivariate proper CAR prior under the assumption of adjacency for successive modules with smoothing parameter, ρ = 0.7. A covariance matrix allowing for q = 3 polynomial orders of module random effects is defined with,
| (25) |
where the diagonals encode the variance of the first through third polynomial orders, respectively, for each of (S = 24) × 1 multivariate cluster effect locations. We formulate Λ−1 such that the first and second orders and the second and third orders express negative correlations; for example, if the slope for the effect trajectory of a given module expresses a negative trajectory, then the quadratic term is positive and will tend to decelerate or bend the curve back up. Once the effects are generated, clients are randomly assigned to one of the 4 clusters with equal probability. Each cluster will hold both UC and CBT clients, though the module attendance sequence for the UC clients is set to 0’s such that their assigned module effects do not contribute to the generation of the response values. The model intercept, μ, is set to 35 and fixed effect coefficients are set to β = (−3, 0.25, 0, −2.5, 0.25) for the five components of dij, respectively, for each client, i, where Ti is an indicator for the treatment arm assigned to client i (Ti = 1 for CBT, Ti = 0 for UC), and tij denotes the j = 1, …, 3 continuously-valued time at which yij was observed, taking on value 0, 3, or 6 months. The q × (S + 1) resultant set of random effects for client i, Δi, are multiplied with the (S + 1) × 1 MM link vector, xi, to produce q × 1, θx,i, matched to zij for client-specific polynomial variation from the mean time trend (which is captured in β). The model noise precision is set to τε = 0.1.
5.2 Data Modeling
Figure 1 presents in-sample predicted growth curves for randomly selected clients within each treatment arm along with actual client data values. Client growth curves under the DDP model express more adaptiveness to the data, both for U-shaped curves as expressed by client 6 and bell-shaped curves estimated for client 58.
Figure 1.
Posterior mean client growth curves under semi-parametric (MMCAR, MM_MV) and non-parametric (DDP) MM models under simulated data. Simulated data shown by circles.
Posterior mean values for the 3 polynomial effect terms assigned to each module are composed into module effect trajectories through time in Figure 2 comparing MM_MV and DDP models for each of the 4 clusters (columns) and for 4 randomly-selected modules. The posterior mean module effect trajectories estimated under the DDP model track closer to the true trajectory shapes than do the non-client adaptive curves for MM_MV.
Figure 2.
Posterior mean client module effect trajectories from simulated data for four clusters of clients, where the columns index clusters and the rows represent randomly selected modules. The curves are dimensioned in response units and represent the contribution of the modules to the response.
We compose a small Monte Carlo simulation with 10 iterations, where each generates a data set with the above-noted specifications. Estimation is performed under our models for each generated data set and the posterior draws for the fixed effects are concatenated across iterations to examine performance of the 3 comparator formulations under repeated sampling. Figure 3 reveals the posterior distribution over the 95% credible intervals under each model estimated using the predictive margins technique; see Lane & Nelder (1982). We note that the DDP formulation expresses the least uncertainty around the true values (represented by a dashed line at each of the 3 measurement months).
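The pooling step can be illustrated with a short sketch. It assumes equal-tailed quantile intervals with linear interpolation, which the text does not specify, and it takes the pooled draws of a treatment effect at one measurement month as given; the predictive margins computation itself is not reproduced. The function names are hypothetical.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list, 0 <= p <= 1."""
    if not sorted_values:
        raise ValueError("need at least one draw")
    pos = (len(sorted_values) - 1) * p
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def pooled_summary(draws_by_iteration):
    """Concatenate draws across Monte Carlo iterations and summarize them."""
    pooled = sorted(d for draws in draws_by_iteration for d in draws)
    return {
        "ci95": (quantile(pooled, 0.025), quantile(pooled, 0.975)),
        "iqr": (quantile(pooled, 0.25), quantile(pooled, 0.75)),
    }


# Toy input: two "iterations" whose draws together cover 0, 1, ..., 100.
summary = pooled_summary([list(range(0, 51)), list(range(51, 101))])
```

The `ci95` and `iqr` entries correspond to the segments and boxes, respectively, described in the caption of Figure 3.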
Figure 3.
Predictive margins for the treatment effect of CBT versus usual care at 0 (left panel), 3 (middle panel) and 6 (right panel) months for MMCAR, MM_MV and DDP formulations under a Monte Carlo simulation from a data generating model where effects are indexed by cluster of clients and modules. Segments reflect the 95% credible intervals and boxes represent the interquartile range of the marginal posterior distribution. The dashed lines in each time period indicate the true treatment effects over the simulated datasets.
5.2.1 Model Fit Statistics
Model fit statistics, D̄, −LPML, and DIC3, are presented in Table 1. One observes lower (better) values across all 3 statistics for the DDP than the other two comparator models, while MM_MV, employing multivariate module effects, outperforms MMCAR parameterized with univariate module effects. In particular, the leave-one-out LPML statistic strongly prefers the DDP model. While the DDP is parameterized with client-by-module random effects, the effective parameterization is reduced under the clustering of clients. Nevertheless, the DDP would generally be expected to express a higher number of effective parameters than the two additive models, though the LPML performances do not indicate over-fitting. The polynomial construction for zij enforces smoothness in the estimated fit as demonstrated in the client growth curves from Figure 1, which also serves to mitigate the possibility for over-fitting. We performed additional simulations to explore scenarios with 48 and 66 modules that, on average, have 11 and 8 clients per module, respectively, with the same number of clients. The relative model differences persist under −LPML. The −LPML difference between DDP and MM_MV is 158 under S = 24 modules, 149 under 48 modules and 251 under 66 modules.
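For readers unfamiliar with the leave-one-out statistic, the following sketch computes −LPML from posterior likelihood evaluations using the usual conditional predictive ordinate (CPO), the harmonic mean of the per-observation likelihood over posterior draws. The function name and input layout are assumptions for illustration, and the exact computation used for the tables may differ in detail.

```python
import math


def neg_lpml(likelihood_draws):
    """Return -LPML given likelihood values p(y_i | theta^(s)).

    likelihood_draws[s][i] is the likelihood of observation i evaluated
    at posterior draw s. CPO_i is the harmonic mean over the draws, and
    LPML is the sum of log CPO_i over observations.
    """
    n_draws = len(likelihood_draws)
    n_obs = len(likelihood_draws[0])
    lpml = 0.0
    for i in range(n_obs):
        inv_mean = sum(1.0 / likelihood_draws[s][i] for s in range(n_draws)) / n_draws
        lpml += -math.log(inv_mean)  # log CPO_i
    return -lpml


# Toy input: 2 posterior draws (rows) by 2 observations (columns).
toy = [[0.5, 0.2], [0.25, 0.2]]
value = neg_lpml(toy)
```

In practice the harmonic mean is usually evaluated on the log scale to avoid underflow. The difference of 158 reported for S = 24 modules matches the −LPML entries for MM_MV and DDP in Table 1 (2592 − 2434).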
Table 2.
BRIGHT Study Model Fit Comparisons: D̄, −LPML and DIC3 scores for model alternatives. Lower values imply better performance.
| Model | D̄ | −LPML | DIC3 |
|---|---|---|---|
| MMCAR | 5505 | 2980 | 5679 |
| MM_MV | 5547 | 2994 | 5716 |
| DDP | 5079 | 2929 | 5302 |
6 Application to Group Therapy Data
We now return to the BRIGHT study for comparison of fit among our 3 model formulations. We further focus on inference under the MM DDP construction to examine heterogeneity with respect to module type in BDI-II trajectories across disjoint clusters of clients. We recall our parameterization of fixed effects for the BRIGHT study data, where Ti is an indicator for the treatment arm assigned to client i (Ti = 1 for CBT, Ti = 0 for UC), and tij denotes the continuously-valued time at which depressive symptom score, yij, was observed. As before, set .
We simplify and focus inference by composing posterior distributions for module effects up to clusters of clients. The client clustering is obtained from among the posterior samples of client partitions using the least squares algorithm of Dahl et al. (2008). The shapes, magnitudes and differences across the clusters express the range we see among clients so that we don’t lose generality with a focus at the cluster, rather than client, level. The most populated 6 clusters are employed and contain (88, 51, 24, 23, 20, 19) clients, respectively, that together hold 225 out of 299 total BRIGHT study clients. Roughly half of the clients in the 6 clusters are UC clients who do not attend any group therapy modules. UC clients with mean client random effects, bi, similar to those of a subset of CBT clients are expected to co-cluster in posterior sampling such that the module effect values for all clients in the cluster are assigned the module effect location values for that cluster. This is an intuitive result where UC clients who express similar idiosyncratic characteristics to co-clustered CBT clients would be expected to similarly respond to CBT treatment were it offered to them.
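The least squares selection of a single partition can be sketched as follows. This is a minimal illustration of the least squares criterion as it is commonly described (choose the sampled partition whose pairwise association matrix is closest, in squared distance, to the posterior co-clustering probabilities); whether it matches the authors' implementation in every detail is an assumption, and the function name is hypothetical.

```python
def least_squares_partition(partitions):
    """Pick the sampled partition closest to the co-clustering matrix.

    partitions is a list of label vectors, one per posterior draw;
    partitions[s][i] is the cluster label of client i in draw s.
    """
    n_draws = len(partitions)
    n = len(partitions[0])
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    # Estimated posterior probability that clients i and j share a cluster.
    prob = {
        (i, j): sum(p[i] == p[j] for p in partitions) / n_draws
        for i, j in pairs
    }

    def loss(p):
        # Squared distance between this draw's 0/1 association
        # indicators and the pairwise probabilities.
        return sum(((p[i] == p[j]) - prob[(i, j)]) ** 2 for i, j in pairs)

    # min() returns the first draw attaining the smallest loss.
    return min(partitions, key=loss)


# Toy input: 3 posterior draws over 3 clients.
draws = [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
selected = least_squares_partition(draws)
```

Because the selected clustering is one of the sampled partitions, it is always a valid partition of the clients, which is convenient when module effects are subsequently averaged within clusters.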
Figure 4 renders module effect trajectories of the BDI-II depressive symptom scores for randomly selected modules. Results are summarized by averaging trajectories into client clusters, with the largest six clusters shown across the columns, denoted by cluster_1, …, cluster_6. Each client cluster’s trajectories are presented for each of the 4 open-enrollment CBT therapy groups along the rows within clusters, which are denoted by cbt_1, …, cbt_4 in the Figure. Large differences are observed in shape and magnitude among modules, particularly for client cluster 1, whose trajectories for each of the four open-enrollment groups are provided in the left-most column of plot cells of Figure 4. The range of the curves expresses clinically meaningful differences of 4 – 6 (BDI-II) points (Furukawa 2010). Scanning the columns from left-to-right reveals a marked attenuation in cluster responsiveness to the CBT intervention. Member clients of clusters 4 – 6 express much less depressive symptom sensitivity to participation in the modules and, therefore, one notes much less differentiation in effect values among the modules for these clusters.
Figure 4.
Posterior mean client module effect trajectories of BDI-II depressive symptom scores under the DDP model. Results are summarized by averaging trajectories into client clusters. Each plot cell is indexed by client cluster within each of 4 disjoint CBT groups, (cbt_1, …, cbt_4). The rows of plot cells are indexed by CBT group and the columns, by cluster. The largest 6 clusters of clients are represented (in order of number of clients contained in each). Each plot contains module effect trajectories for randomly-selected modules within each of the four CBT groups.
Figure 5 provides additional insight from the DDP model for examining the variation in module effects across clusters of clients and how those effects vary over time. The Figure shows module effect trajectories disaggregated into the q = 3 posterior mean polynomial effects from which they are each rendered across the 6 clusters of clients. The 3 polynomial effect values are presented for all modules, organized in the same cbt group-within-cluster format utilized in Figure 4. These polynomial parameters imply a module effect trajectory with the order 1 effect providing the intercept, the order 2 effect the slope and order 3, a non-linear quadratic term. The resulting effects trajectory for a module would be U-shaped if the order 3 term is positive. As we noted in Figure 4, there is notable variation in the effect of modules on depressive symptoms across client clusters within each of the four CBT therapy groups as we scan from left-to-right, particularly for cbt groups 1–3; for example, the first two clusters of each CBT therapy group, shown in the first two columns of Figure 5, show clinically meaningful variation in client outcomes.
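The link between the three polynomial effects and the shape of a module effect trajectory can be made concrete with a small sketch. It assumes the trajectory is the plain quadratic intercept + slope·t + quadratic·t² in time t (months); the function name and the example values are hypothetical.

```python
def trajectory_shape(intercept, slope, quadratic):
    """Classify a quadratic effect trajectory and locate its turning point."""
    if quadratic == 0:
        return {"shape": "linear", "turning_point": None, "extreme_value": None}
    shape = "U-shaped" if quadratic > 0 else "bell-shaped"
    turning_point = -slope / (2 * quadratic)
    # Value of the trajectory at the turning point.
    extreme_value = intercept - slope ** 2 / (4 * quadratic)
    return {
        "shape": shape,
        "turning_point": turning_point,
        "extreme_value": extreme_value,
    }


# Hypothetical order 1-3 effects for one module within one client cluster.
summary = trajectory_shape(intercept=2.0, slope=-3.0, quadratic=0.5)
```

A positive order 3 term gives a U-shaped curve, as noted above. If the turning point falls outside the 0 to 6 month observation window, the trajectory is monotone over the measured occasions even though the quadratic term is non-zero.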
Figure 5.
Posterior mean client module intercept (order 1), linear (order 2) and quadratic (order 3) effects for each module within the 4 disjoint CBT groups of modules averaged to cluster for BRIGHT case study under the DDP model. The rows of plot cells are indexed by CBT group and the columns, by cluster.
Model fit statistics (Table 2) reveal an improved fit for the DDP in comparison to the other two models; however, unlike for the simulation results, the MMCAR produces a better fit for the BRIGHT data than does MM_MV. These results indicate the importance of differences across clients in responsiveness to modules. Within-sample predicted growth curves (not shown) demonstrate an improvement in shape and orientation adaptability for the DDP relative to the other models, similar to that observed in Figure 1 and consistent with its better fit performance.
We explore sensitivity of the clustering of clients to our prior specification for the DP concentration parameter, α, employed in (10) for the MM DDP model by varying the shape and rate hyperparameters, (a1, b1), employed in the prior, α ~ Ga(a1, b1). We vary both hyperparameters in combinations within a range of 1 – 4 for each, producing a prior number of clusters from a minimum of 3 to maximum of 18. While our group therapy data application results show small differences in the posterior numbers of clusters formed, the allocation of clients to the most populous clusters is essentially unchanged as is our inference on client-module effects. Distributions for underlying parameters are also essentially unchanged.
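To give a sense of how the gamma hyperparameters translate into a prior number of clusters, the sketch below uses the standard expression for the expected number of clusters under a DP with concentration α and n clients, the sum over i = 1, …, n of α / (α + i − 1). Plugging in the prior mean a1/b1 for α is a simplifying assumption that ignores prior uncertainty in α, and the text does not state how the reported range of 3 to 18 was obtained; the function name is hypothetical.

```python
def expected_clusters(alpha, n):
    """Prior expected number of clusters under a DP with n clients."""
    return sum(alpha / (alpha + i) for i in range(n))


n_clients = 299  # total BRIGHT study clients, as reported in the text

# Corner and middle settings of the (a1, b1) grid over the range 1 to 4.
for a1, b1 in [(1, 4), (1, 1), (4, 1)]:
    alpha_mean = a1 / b1  # mean of a gamma prior with shape a1 and rate b1
    print(a1, b1, round(expected_clusters(alpha_mean, n_clients), 1))
```

Larger shape or smaller rate values shift prior mass toward larger α and therefore toward more clusters, which is the direction of sensitivity explored in this paragraph.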
7 Discussion
Our MM DDP approach extends the ANOVA DDP construction of De Iorio et al. (2004) to a multiple membership framework. The MM DDP provides wide support on the space of distributions indexed by the set of distinct multiple membership sequences through the borrowing of strength in overlaps among expressed sequences. The formulation allows one to examine whether element (e.g. module) effects vary across different client trajectories and vice-versa, allowing for one to learn about differing response sensitivities among clients to treatment elements, even for unobserved combinations of clients and treatment elements. We compose a model base distribution to retain straight-forward and efficient posterior sampling properties of the DP while allowing flexibility for Gaussian covariance specifications to parsimoniously parameterize dependence among module effects; in particular, we illustrate adjacency-based formulations for the covariance matrix of the Gaussian base measure in a fashion that renders flexibility while retaining conjugacy.
Other alternatives to our MM DDP may be considered, such as the hierarchical DP (HDP) (Teh et al. 2006) or the nested DP (NDP) (Rodríguez et al. 2008), which both target a grouped data structure with nested observations. These approaches, however, don’t anticipate a multiple membership construction where sub-groups of clients share connections to the same modules as does the MM DDP, which indexes the collection of random measures, {Fx}, by multiple membership (attendance) sequence. While one may ignore the multiple membership composition and employ either the HDP or the NDP, both perform posterior simulations in a nested, two-step, fashion (for a two-level hierarchical formulation), whereas the MM DDP reduces to a DP that permits a simpler computational approach. Lastly, neither the HDP nor the NDP allows inference on unobserved module attendance sequences as does the MM DDP.
The usefulness of our approach may be limited for data with decreasing overlaps among the treatment element (e.g. client module attendance) sequences, {xi}, as this would restrict the ability of the data to borrow strength in the estimation of the collection of random distributions, {Fx}. In one direction where clients perfectly overlap into disjoint groupings of client-modules for CBT studies, the MM DDP reduces to the ANOVA DDP. In the other direction, however, where clients express progressively fewer overlaps in modules attended, estimability may be compromised. In practice, resource limitations in the total number of modules offered for typical open-enrollment group therapy studies tend to produce a sufficient level of overlaps of clients on each module for estimation.
Software implementing the MM DDP is available for the R statistical software (R Development Core Team 2011) in a package called growcurves (Savitsky & Paddock 2012). All of the methods, fit statistics and charts presented in this paper may be reproduced from growcurves.
Table 1.
Simulation Study Model Fit Comparisons: D̄, −LPML and DIC3 scores for model alternatives. Lower values imply better performance.
| Model | D̄ | −LPML | DIC3 |
|---|---|---|---|
| MMCAR | 5073 | 2691 | 5208 |
| MM_MV | 4905 | 2592 | 5034 |
| DDP | 4607 | 2434 | 4715 |
Acknowledgments
This research was supported by the National Institute on Alcohol Abuse and Alcoholism grant to Susan Paddock (grant number R01AA019663). Data collection was supported by National Institute on Alcohol Abuse and Alcoholism grant to Katherine Watkins (grant number R01AA014699).
References
- Beck A, Steer R, Brown G. Manual for the Beck Depression Inventory-II. Psychological Corporation; 1996.
- Besag J, York J, Mollié A. Bayesian image restoration, with two applications in spatial statistics (Disc: P21–59). Annals of the Institute of Statistical Mathematics. 1991;43:1–20.
- Blackwell D, MacQueen JB. Ferguson distributions via Pólya urn schemes. The Annals of Statistics. 1973;1:353–355.
- Carey K. A multilevel modelling approach to analysis of patient costs under managed care. Health Economics. 2000;9:435–446. doi: 10.1002/1099-1050(200007)9:5<435::aid-hec523>3.0.co;2-z.
- Celeux G, Forbes F, Robert CP, Titterington DM. Reply to comments on “Deviance information criteria for missing data models” (Pkg: P651–706). Bayesian Analysis. 2006;1(4):701–706.
- Congdon P. Bayesian models for categorical data. Wiley series in probability and statistics. Wiley; 2005.
- Crowe T, Grenyer B. Is therapist alliance or whole group cohesion more influential in group psychotherapy outcomes? Clinical Psychology and Psychotherapy. 2008;15(4):239–246. doi: 10.1002/cpp.583.
- Dahl DB, Day R, Tsai JW. Distance-based probability distribution on set partitions with applications to protein structure prediction. Technical Report. 2008.
- Dawid AP. Some matrix-variate distribution theory: Notational considerations and a Bayesian application. Biometrika. 1981;68:265–274.
- De Iorio M, Müller P, Rosner GL, MacEachern SN. An ANOVA model for dependent random measures. Journal of the American Statistical Association. 2004;99(465):205–215.
- Furukawa T. Assessment of mood: Guides for clinicians. Journal of Psychosomatic Research. 2010;68:581–589. doi: 10.1016/j.jpsychores.2009.05.003.
- Hill PW, Goldstein H. Multilevel modeling of educational data with cross-classification and missing identification for units. Journal of Educational and Behavioral Statistics. 1998;23:117–128.
- Hodges JS, Carlin BP, Fan Q. On the precision of the conditionally autoregressive prior in spatial models. Biometrics. 2003;59(2):317–322. doi: 10.1111/1541-0420.00038.
- Hoff PD. Separable covariance arrays via the Tucker product, with applications to multivariate relational data (Pkg: P179–208). Bayesian Analysis. 2011;6(2):179–196.
- Jin X, Carlin BP, Banerjee S. Generalized hierarchical multivariate CAR models for areal data. Biometrics. 2005;61(4):950–961. doi: 10.1111/j.1541-0420.2005.00359.x.
- Jones GL, Haran M, Caffo BS, Neath R. Fixed-width output analysis for Markov Chain Monte Carlo. Journal of the American Statistical Association. 2006;101(476):1537–1547.
- Lane PW, Nelder JA. Analysis of covariance and standardization as instances of prediction. Biometrics. 1982;38:613–621.
- Langford IH, Leyland AH, Rasbash J, Goldstein H. Multilevel modelling of the geographical distributions of diseases. Journal of the Royal Statistical Society, Series C: Applied Statistics. 1999;48:253–268. doi: 10.1111/1467-9876.00153.
- Morgan-Lopez A, Fals-Stewart W. Analytic complexities associated with group therapy in substance abuse treatment research: Problems, recommendations, and future directions. Experimental and Clinical Psychopharmacology. 2006;14(2):265–273. doi: 10.1037/1064-1297.14.2.265.
- Neal RM. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics. 2000;9(2):249–265.
- Paddock SM, Hunter SB, Watkins KE, McCaffrey DF. Analysis of rolling group therapy data using conditionally autoregressive priors. Annals of Applied Statistics. 2011;5(2A):605–627. doi: 10.1214/10-AOAS434.
- Paddock SM, Savitsky TD. Bayesian hierarchical semiparametric modeling of longitudinal post-treatment outcomes from open-enrollment therapy groups. Journal of the Royal Statistical Society, Series A: Statistics in Society. To appear. doi: 10.1111/j.1467-985X.2012.12002.x.
- R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2011. URL: http://www.R-project.org/
- Rodríguez A, Dunson DB, Gelfand AE. The nested Dirichlet process. Journal of the American Statistical Association. 2008;103(483):1131–1154.
- Ryum T, Hagen R, Nordahl H, Vogel P, Stiles T. Perceived group climate as a predictor of long-term outcome in a randomized controlled trial of cognitive-behavioural group therapy for patients with comorbid psychiatric disorders. Behavioural and Cognitive Psychotherapy. 2009;37:497–510. doi: 10.1017/S1352465809990208.
- Savitsky TD, Vannucci M. Spiked Dirichlet process priors for generalized Gaussian process models. Journal of Probability and Statistics. 2010;2010:1–14. doi: 10.1155/2010/201489. (Article ID 201489)
- Savitsky T, Paddock S. growcurves: Semiparametric Hierarchical Bayesian Modeling of Longitudinal Outcomes. R package version 2.15.2. 2012. URL: http://CRAN.R-project.org/package=growcurves
- Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica. 1994;4(2):639–650.
- Smokowski P, Rose S, Bacallao M. Damaging experiences in therapeutic groups: How vulnerable consumers become group casualties. Small Group Research. 2001;32(2):223–251.
- Teh YW, Jordan MI, Beal MJ, Blei DM. Hierarchical Dirichlet processes. Journal of the American Statistical Association. 2006;101(476):1566–1581.
- Watkins KE, Hunter SB, Hepner KA, Paddock SM, de la Cruz E, Zhou AJ, Gilmore J. An effectiveness trial of group cognitive behavioral therapy for patients with persistent depressive symptoms in substance abuse treatment. Archives of General Psychiatry. 2011;68(6):1–8. doi: 10.1001/archgenpsychiatry.2011.53.





