Author manuscript; available in PMC: 2023 Feb 28.
Published in final edited form as: Int J Biostat. 2021 Aug 13;18(2):439–453. doi: 10.1515/ijb-2019-0126

Causal Inference Under Interference with Prognostic Scores for Dynamic Group Therapy Studies

Bing Han 1,*, Susan M Paddock 2, Lane Burgette 3
PMCID: PMC9973534  NIHMSID: NIHMS1876458  PMID: 34391217

Abstract

Group therapy is a common treatment modality for behavioral health conditions. Patients often enter and exit groups on an ongoing basis, leading to dynamic therapy groups. Examining the effect of high versus low session attendance on patient outcomes is a research question of interest. However, there are several challenges to identifying causal effects in this setting, including the lack of randomization, interference among patients, and the interrelatedness of patient participation. Dynamic therapy groups motivate a unique causal inference scenario, as the treatment statuses are completely defined by the patient attendance record for the therapy sessions, which is also the structure inducing interference. We adopt the Rubin Causal Model framework to define the causal effect of high versus low session attendance of group therapy at both the individual patient and peer levels. We propose a strategy to identify individual, peer, and total effects of high attendance versus low attendance on patient outcomes through prognostic score stratification. We examine the performance of our approach via simulation and apply it to data from a group cognitive behavioral therapy trial for treating depression among patients in a substance use disorders treatment setting.

Keywords: Causal inference, Interference, Mental health, Prognostic score

1. Introduction

Group therapy is a common modality for the treatment of behavioral health conditions. Enrollment of patients who are in substance use disorders (SUDs) treatment settings into therapy groups is typically open: group members cycle in and out of the group as treatment regimens conclude and space becomes available for new members to join [1, 2], resulting in dynamic therapy groups [3, 4, 5]. New members may join a group and existing patients may quit at different times. Some patients may also miss some sessions during their enrollment periods. The Building Recovery by Improving Goals, Habits, and Thoughts (BRIGHT) study examined group cognitive behavioral therapy (GCBT) with such dynamic therapy groups. The primary focus of this GCBT was to treat depressive symptoms in patients receiving care in residential substance abuse treatment. Throughout its course, the BRIGHT study consisted of 245 group therapy sessions [6]. Figure 1 shows how group memberships changed session-to-session for the first 25 patients enrolled into the BRIGHT study.

Figure 1:


Illustration of enrollment for the first 25 patients in GCBT: each row corresponds to a patient and points in each row stand for the attended sessions. Individual treatment status Z is attending 14 or more sessions, and peer treatment status V is 50% or more peers having Z = 1.

An important question is whether there is a relationship between session attendance and patient outcomes measured post treatment. One meta-analysis focused on patients in SUD treatment settings reported an association between the number of CBT sessions attended and SUD outcomes [7]. By contrast, Cuipers et al. [8] did not find a significant association between individual CBT session attendance and depression outcomes. Very few studies have examined whether session attendance of fellow group members is associated with improved outcomes [9]. Examining group members’ contributions is motivated by the idea that group dynamics is a function of the group overall and its individual members [10]. The outcome measures of patients in a group therapy are usually mutually dependent. The dependence structure can be complicated due to the dynamic group memberships over time. Conventionally, a parametric correlation structure is assumed in modeling the outcome measure. Secondary analyses of data from such studies have been undertaken to examine the association of non-randomized high versus low attendance with patient outcomes.

In contrast to prior analyses focusing on establishing associations, we define the causal treatment effect of high versus low attendance in GCBT on patient outcomes using the Rubin Causal Model (RCM) [11]. Under the RCM framework [11], potential outcomes are defined as the outcomes that would have been observed under each possible treatment status, and the average treatment effect is defined as the mean difference between potential outcomes under the treatments of interest. As only one of all possible potential outcomes can be observed per individual, assumptions are required to identify the average treatment effect. One such assumption that is commonly adopted is the stable unit treatment value assumption (SUTVA), also called the “no interference” assumption [11]. Under SUTVA, potential outcomes of one individual are not affected by the treatment statuses of others.

However, SUTVA is questionable in dynamic therapy groups. Group processes have been shown to be associated with outcomes, including depression [12], a key outcome in our motivating application. A consequence of interference is that the potential outcomes for focal individuals depend not only on their own treatment status (i.e., the individual-level treatment), but also on the treatment statuses of others (i.e., the peer-level treatment) [13, 14, 15]. For a study with N individuals and a treatment that is binary at the individual level, there would be $2^N$ potential outcomes for a focal individual, considering both the individual-level and the peer-level treatments. By contrast, there are just two potential outcomes for a focal individual under SUTVA, since the potential outcomes for individual i are not related to the treatment statuses of other individuals.

To reduce the difficulty of handling $2^N$ potential outcomes, SUTVA can be relaxed by assuming partial interference, such that interference exists within certain groups of individuals but not across groups, for groups such as schools [14], neighborhoods [16, 17], or a general social network [18]. Partial interference assumptions in this setting specify who interferes with whom, such as whether only those directly versus indirectly connected interfere with one another [19, 20, 21, 18]. Partial interference limits the set of potential outcomes to be considered in estimating a treatment effect.

A network among members describes a structure in which certain partial interference of members must be considered, i.e., potential outcomes may vary according to the network. Li et al. [21] characterized two types of networks. In the first type, the network of interference among individuals is fixed, and treatment statuses are assigned by a random mechanism exogenous to the network. Examples of this type include varying vaccine coverage across fixed geographic areas, where the spatial relationship among regions was the network of interference and the vaccine coverage statuses were the treatment [17]; and the retention of kindergarten students in given schools, where the school rosters were the network of interference and grade retention was the treatment [14]. In the second type, the network of interference is random but individuals’ treatment statuses are fixed. Li et al. [21] considered an example where freshmen in college were randomly assigned to dormitories (i.e., the network of interference) and their admission statuses (regular versus talented) were fixed.

Our motivating GCBT study yields a new type of causal inference under interference: the network of interference (i.e., group memberships) is random but individuals’ treatment statuses are fully determined by the network. In dynamic therapy groups, the group memberships are the session attendance records, e.g., Figure 1. This attendance record is one instance of many possible attendance records. We focus on the dichotomous treatment statuses of high versus low attendance in a GCBT for a tractable application of the RCM that can be used to guide clinical practice. Similar modeling strategies have been used in prior studies, e.g.,[14, 21]. We also employ the prognostic score [22] technique for estimation.

This paper is organized as follows. In Section 2 we briefly review a conventional parametric approach to group therapy data, i.e., the multiple membership model. In Section 3, we present our prognostic score approach to identify causal effects in a dynamic group under the new type of causal inference under interference. In Section 4 we describe the details of the estimation procedure based on the identification results in the previous section. In Section 5 we report a numerical simulation to demonstrate the performance of the proposed approach. In Section 6 we apply the proposed method to estimate the causal effects of high versus low attendance in the BRIGHT study. Section 7 concludes with discussion. The appendix sketches the proof for the key identification results in Section 3.

2. Classic multiple membership model

The multiple membership model [23, 24, 4, 25] has been widely applied to analyze data from groups with changing memberships over time. The multiple membership model is a special instance of the linear mixed-effect model, where a weight matrix accounts for the correlations among outcomes due to changing group structures. The weight matrix can be seen as a standardized record of attendance for the entire course of a GCBT. Let N and R denote the total number of patients and the total number of sessions in a group therapy study, respectively. Let $r_i$ denote the total number of sessions in which patient i participated. A weight matrix, denoted by M, is an N × R matrix such that the (i, j)th element equals $1/r_i$ if the ith participant attended the jth session, and equals zero otherwise. Hence, the row sums of the weight matrix are all equal to one. For example, consider the following weight matrix

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ .5 & 0 & .5 & 0 \\ .333 & .333 & .333 & 0 \end{pmatrix}.$$

In this example, the first patient participated in the first session only, the second patient participated in the first and the third sessions, and the third patient participated in the first three sessions.
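As a minimal sketch of this construction, the snippet below row-standardizes a 0/1 attendance record into a weight matrix. The function name `weight_matrix` and the list-of-lists layout are illustrative assumptions, not part of the original analysis; the attendance record is the three-patient, four-session example described above.

```python
def weight_matrix(attendance):
    """Row-standardize a 0/1 attendance record (patients x sessions).

    Entry (i, j) becomes 1 / r_i if patient i attended session j, else 0,
    where r_i is the number of sessions attended by patient i.
    """
    weights = []
    for row in attendance:
        r_i = sum(row)
        if r_i == 0:
            raise ValueError("every patient must attend at least one session")
        weights.append([a / r_i for a in row])
    return weights


# Attendance record for the three-patient example in the text.
attendance = [
    [1, 0, 0, 0],  # patient 1: first session only
    [1, 0, 1, 0],  # patient 2: first and third sessions
    [1, 1, 1, 0],  # patient 3: first three sessions
]
M = weight_matrix(attendance)
row_sums = [sum(row) for row in M]  # each should equal one
```

A session that nobody attended (the fourth column here) simply yields a column of zeros.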

Let Y = (Y1, Y2, …, YN)′ denote the vector of patient-level end-stage outcome, and let Z = (Z1, Z2, …, ZN)′ denote a vector of treatment exposure, e.g., 1 for high attendance and 0 for low attendance, as defined by a clinical threshold. The classic multiple membership model is

Y=λ0+λ1Z+Xβ+MΦ+ϵ, (1)

where λ1 represents the association between treatment exposure and the outcome, Xβ is the individual-level covariate fixed effect, MΦ is the random effect through the multiple membership weights, where Φ = (Φ1, Φ2, …, ΦR)′ and Φj, j = 1, …, R are i.i.d. with mean zero and an unknown variance, and the error vector ϵ is also i.i.d. and independent of the random effect. The model can be readily fitted using standard modeling procedures for linear mixed-effect models. Extensions to this formulation have been proposed in the literature. For example, the session random effects Φ have been modeled by a Bayesian approach using conditionally autoregressive priors [2]. In our motivating application, we used i.i.d. random effects for themes, where a theme consists of four consecutive sessions. See the data analysis section for more details.

Model (1) assumes that the dynamic group structure only introduces a correlation among outcomes through the random effects, but does not affect the mean of the outcome. This assumption is questionable given the expectation that therapy group participants will interact, implying that peers with certain attributes might affect a focal individual’s outcome. The classic multiple membership model can be readily extended to include such peer effects.

We define a patient’s peers as other patients who attended at least one session in common with the focal patient. Other definitions are also possible, such as anyone who started earlier than patient i and shared a common peer, or the peers of a patient including other patients who share at least k sessions with patient i. Let the peers of patient i be denoted by an index set $s(i) \subseteq \{j : 1 \le j \le N, j \ne i\}$. In this paper we assume that $j \in s(i)$ implies $i \in s(j)$, i.e., the definition of peers is commutative. Given the enrollment record M and a definition of peers, the index sets s(1), s(2), …, s(N), i.e., peers of each individual, are also determined.

For an individual-level predictor in X, we can define the corresponding peer-level predictor as a summary statistic, such as the sample mean or median, of the same individual-level predictor calculated among one’s peers. Let W denote the collection of all peer-level predictors. Conceptually, peer-level covariates can be even richer than individual-level predictors. Other possible peer-level predictors may include variations of individual-level predictors (e.g., sample standard deviation or inter-quartile range), modes of individual-level predictors (e.g., male majority or not), or discretized ordinal summary measures (e.g., cutting baseline BDI scores to majority below 13, majority between 14 and 17, and otherwise).
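The peer definition and a peer-level covariate can be sketched as follows, assuming peers are patients who share at least one session with the focal patient and the peer-level covariate is the sample mean among peers. The helper names `peer_sets` and `peer_mean` and the toy data are hypothetical; returning `None` for a patient with no peers is a choice made for this sketch only.

```python
from statistics import mean


def peer_sets(attendance):
    """Return s(i) for each patient: indices of others sharing >= 1 session."""
    n = len(attendance)
    peers = []
    for i in range(n):
        s_i = set()
        for j in range(n):
            shared = any(a and b for a, b in zip(attendance[i], attendance[j]))
            if j != i and shared:
                s_i.add(j)
        peers.append(s_i)
    return peers


def peer_mean(x, peers):
    """Peer-level covariate: mean of x over each patient's peers (None if no peers)."""
    return [mean(x[j] for j in s_i) if s_i else None for s_i in peers]


# Toy attendance record: 4 patients, 4 sessions (illustrative values only).
attendance = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
x = [10.0, 20.0, 30.0, 40.0]  # an individual-level baseline covariate
peers = peer_sets(attendance)
W = peer_mean(x, peers)
```

Because sharing a session is a symmetric relation, the resulting index sets satisfy the commutativity property assumed in the text.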

Similarly, we can define the peer-level treatment status as $V_i = I\left(\sum_{j \in s(i)} Z_j / |s(i)| \ge \text{threshold}\right)$, where $|s(i)|$ denotes the number of patients in $s(i)$. The threshold can be based on prior clinical knowledge, a sample quantile, or a hard cutoff such as 0.5. The definition of peer-level treatment statuses may also take into consideration the time order of group therapy attendance. For example, instead of using the individual-level treatment status $Z_j$, $j \in s(i)$, to define $V_i$ for the focal patient i, we can use the partial count of sessions of peer j by the time patient i left the study to define j’s actual treatment status concurrent with patient i’s duration. Such a time-varying definition would require a large enough sample size so that all combinations of peer- and individual-level treatment statuses have sufficient sample sizes.

In this paper we use a hard cutoff of 50% based on individual-level treatment status and ignoring the time order, i.e., $V_i = I\left(\sum_{j \in s(i)} Z_j / |s(i)| \ge 0.5\right)$. We choose this threshold based on three main considerations. First, prior methodological work on observational studies with treatment interference used either the same or a similar percentage cutoff to define peer-level treatment status. For example, Hong and Raudenbush [14] defined a school as a “retention school” if 50% of kindergarteners were retained in their Kindergarten year. Second, to our knowledge there is no guidance in the substantive literature to define a clinically meaningful cutoff for a patient’s peers. The cutoff of 50% differentiates the higher half and the lower half of all patients’ peers and can be seen as a first attempt to explore the potential role of a peer effect. Third, studies conducted in group therapy settings such as the BRIGHT study do not have a large sample size. The thresholds for both individual and neighborhood treatment statuses need to be set carefully so that all treatment levels can have a reasonable sample size. In particular, the subsample Vi = Zi = 0 needs to be sufficiently large in order to fit a reasonable prognostic model.
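The two dichotomized treatment statuses can be sketched as below, assuming the individual-level cutoff of 14 sessions from Figure 1 and the 50% peer cutoff. The function names, the session counts, and the peer sets are hypothetical; leaving V undefined (`None`) for a patient without peers is an assumption of this sketch, since the text does not address that case.

```python
def individual_status(session_counts, min_sessions=14):
    """Z_i = 1 if patient i attended at least `min_sessions` sessions."""
    return [1 if r >= min_sessions else 0 for r in session_counts]


def peer_status(z, peers, threshold=0.5):
    """V_i = 1 if the share of peers with Z_j = 1 is at least `threshold`."""
    v = []
    for s_i in peers:
        if not s_i:
            v.append(None)  # assumption: undefined when a patient has no peers
            continue
        share = sum(z[j] for j in s_i) / len(s_i)
        v.append(1 if share >= threshold else 0)
    return v


# Illustrative inputs: sessions attended and symmetric peer sets s(i).
session_counts = [15, 3, 9, 20]
peers = [{1, 2}, {0, 2, 3}, {0, 1}, {1}]

Z = individual_status(session_counts)
V = peer_status(Z, peers)
```

Patient 3 in this toy example (index 2) has exactly half of its peers at high attendance, so the "50% or more" rule assigns V = 1.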

Let V = (V1, V2, …, VN)′ denote the vector of peer-level treatment statuses. The extended multiple membership model with peer effects is

Y=λ0+λ1Z+λ2V+λ3VZ+Xβ+Wγ+MΦ+ϵ, (2)

where VZ represents the element-wise product between V and Z. This two-way interaction term VZ defines four distinct associations by individual- and peer-level treatment statuses:

  • Individual-level treatment effect: λ1 when peer-level treatment status is 0, and λ1 + λ3 when peer-level treatment status is 1.

  • Peer-level treatment effect: λ2 when individual-level treatment status is 0, and λ2 + λ3 when individual-level treatment status is 1.

  • Total effect: λ1 + λ2 + λ3.

By definition, the total effect is equal to the sum of one peer effect and the corresponding individual effect. When the two peer effects (or the two individual effects) are unequal, the total effect has two distinct decompositions. In Model (2), the mean outcome of a focal individual includes the fixed peer effects. The outcomes among patients may also be correlated through the random effect and the weight matrix M.
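The linear contrasts implied by Model (2) can be written out in a few lines. This is only an illustrative sketch: the function name `effects_from_lambdas` and the dictionary keys are hypothetical, and the coefficient values are arbitrary placeholders rather than estimates.

```python
def effects_from_lambdas(lam1, lam2, lam3):
    """Individual, peer, and total effects as linear contrasts of Model (2)."""
    return {
        "individual_when_v0": lam1,         # Z: 0 -> 1 with V = 0
        "individual_when_v1": lam1 + lam3,  # Z: 0 -> 1 with V = 1
        "peer_when_z0": lam2,               # V: 0 -> 1 with Z = 0
        "peer_when_z1": lam2 + lam3,        # V: 0 -> 1 with Z = 1
        "total": lam1 + lam2 + lam3,        # (0, 0) -> (1, 1)
    }


effects = effects_from_lambdas(0.2, 0.2, 0.3)  # placeholder coefficients

# The two decompositions of the total effect mentioned in the text.
path_a = effects["peer_when_z0"] + effects["individual_when_v1"]
path_b = effects["individual_when_v0"] + effects["peer_when_z1"]
```

Both paths recover the same total effect; they differ only in whether the peer-level or the individual-level status is switched first.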

3. Potential outcomes and identification

To overcome the inherent shortcomings in the parametric multiple membership model, we propose a causal inference framework to analyze the GCBT data. We adopt the general notation used in prior studies for causal inference under interference [14, 15]. Let Yi(𝓜) denote the potential outcome of the ith patient that could be observed under a hypothetical and fixed attendance record 𝓜. Note that the ith patient’s potential outcome generally depends on other patients in two ways: first, causally one’s potential outcome depends on others’ attendance through 𝓜; second, potential outcomes of different patients, e.g., Yi(𝓜) and Yj(𝓜), may be correlated.

There are a large number of possible records of attendance 𝓜, and hence many potential outcomes. Only one of the many potential outcomes is observed, corresponding to 𝓜 = M, where M is the observable record in the current study, i.e., the weight matrix used in the multiple membership model. We adopt the simplifying assumption used in previous studies of interference [14, 15]. Let z1, …, zN denote the individual-level treatment statuses and v1, …, vN denote the peer-level treatment statuses, all corresponding to 𝓜 and an adopted definition for peers.

Assumption 1. Under a hypothetical record of attendance 𝓜,

$$Y_i(\mathcal{M}) = Y_i(z_i, v_i) + e_i(\mathcal{M}),$$

where $E[e_i(\mathcal{M})] = 0$ and $e_i(\mathcal{M}) \perp Y_i(z_i, v_i)$. In addition, Yi(z, v), i = 1, …, N, under the same (z, v) are i.i.d. sampled from a common distribution and E[Y (z, v)] denotes the common expectation. Yi(z, v), for i = 1, …, N and all values of (z, v), are referred to as the essential potential outcomes.

Similar to the multiple membership model (2), Assumption 1 breaks down a potential outcome into two terms: an essential potential outcome Yi(zi, vi), which is similar to the fixed effect in (2), and the second term ei(𝓜), which mimics a random effect with mean zero and allows for correlation of potential outcomes across patients. As in prior studies in the causal inference literature, the large number of potential outcomes is reduced to four essential potential outcomes. For example, consider two hypothetical attendance records $\mathcal{M}$ and $\mathcal{M}'$. If the individual- and peer-level treatment statuses are the same for one individual, i.e., $z_i = z_i'$ and $v_i = v_i'$, we have $E[Y_i(\mathcal{M})] = E[Y_i(z_i, v_i)] = E[Y_i(z_i', v_i')] = E[Y_i(\mathcal{M}')]$.

A few distinct average effects of the treatment can be defined by the expectations of essential potential outcomes.

  • Total effect: E[Y (1, 1)] – E[Y (0, 0)]

  • Peer effects: E[Y (1, 1)] – E[Y (1, 0)], E[Y (0, 1)] – E[Y (0, 0)], where the two peer effects may be assumed equal or unequal.

  • Individual effect: E[Y (1, 1)] – E[Y (0, 1)], E[Y (1, 0)] – E[Y (0, 0)], where the two individual effects may be assumed equal or unequal.

Next we introduce a key assumption to identify a causal effect from the observed data, which is an extension of the original ignorability assumption in the RCM that has been adopted by others studying treatment effects under interference, e.g., [13, 14].

Assumption 2 (Ignorability) {Yi(z, v), for all z, v} and M are conditionally independent given (Xi, Wi), where Xi are individual-level covariates for a patient i and Wi are peer-level covariates. Namely

$$\{Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1)\} \perp M \mid X_i, W_i.$$

Note that the conditional independence in Assumption 2 is not automatically satisfied. Although M and all Xi, i = 1, …, N define Wi for a focal patient i, the relationship is not 1-to-1. Many distinct M, Xi, i = 1, …, N can yield the same Wi. Assumption 2 essentially assumes that the observed information at the individual and peer level is sufficient to identify the essential potential outcomes. This assumption, although standard in the RCM framework, can be violated in practice when unobservables are not ignorable. However, without non-ignorable unobservables, Wi can contain sufficient observed information from peers, as long as it can identify essential potential outcomes. In practice, Assumption 2 intends to simplify peers’ impact to a focal patient’s essential outcome through some parsimonious choices of Wi.

By Assumptions 1 and 2, the observed attendance record M has three ways to affect a focal patient’s observed outcome. First, M determines the treatment statuses Z, V and hence the unconditional mean of the observed outcome, i.e., E[Y (Z, V)]. Second, M and all individual-level predictors determine the peer-level covariates Wi. Third, M also relates to the error term ei(M), which generates correlations in observed outcomes among patients.

Assumption 3 (Positivity) 0 < P (Zi = z, Vi = v) < 1, for all z, v.

Assumption 4 (Common support) The four conditional distributions $f(X_i, W_i \mid Z_i = z, V_i = v)$ have the same support for all i, z, v.

Under Assumptions 1 to 4, the causal effects can be identified through iterated expectations of observed outcomes. We state two important identification results below.

Proposition 1 (Identification by matching covariates) Under all assumptions above, further assume X1, …, XN are an i.i.d. sample from a study population. The expectation of the essential potential outcome in the same population is identified by

$$E[Y(z,v)] = E\left\{E\left[Y_i(M) \mid M = \mathcal{M}, Z_i = z, V_i = v, X_i, W_i\right]\right\},$$

where $P(Z_i = z, V_i = v, M = \mathcal{M}) > 0$ for some $1 \le i \le N$, and the outer expectation is taken over the marginal distribution of Xi.

Proposition 1 suggests that if two patients have identical covariates, then they can be seen as sharing the same set of essential potential outcomes. For simplicity in presentation, consider four patients with distinct treatment statuses sharing exactly the same covariate values. Then each patient’s observed outcome is an unbiased estimate for the unobserved essential potential outcome of any of the other three patients. If each patient with an observed treatment status can be matched in covariates to patients in the other three treatment statuses, then averaging the difference within matched tuples will yield an unbiased estimate for the average treatment effects. However, such a matching procedure can be difficult due to the many covariates. One may also attempt to define a vector-valued propensity score, such as $P(V_i, Z_i \mid X_i, W_i)$, to simplify the covariate matching process. However, note that the treatment statuses are determined by the attendance record, i.e., Vi and Zi are functions of M. This is different from the usual setting for the propensity score, where the treatment status is still random given the other observed data. Additional technical assumptions are needed to estimate the propensity score from the observable data. See, e.g., [18]. Proposition 1 also allows for a direct modeling approach for the outcome by adjusting for covariates, e.g., the multiple membership model (2), provided that the model is correctly specified and the number of covariates is reasonably small.

Instead of adapting the propensity score, in this paper we state an identification result using an alternative approach, the prognostic score [22]. Let $\psi_i = \psi(X_i, W_i)$ be a function such that $(X_i, W_i) \perp Y_i(0,0) \mid \psi_i$. Then $\psi_i$ is a prognostic score. Further assume that either

$$Y_i(z,v) \perp (X_i, W_i) \mid \psi_i, \quad \text{or} \quad Y_i(z,v) \perp (X_i, W_i) \mid \psi_i, m(X_i, W_i), \tag{3}$$

for all (z, v), where m(Xi, Wi) is called the effect modifier.

Proposition 2 (Identification by prognostic score) Under all assumptions above,

$$Y_i(z,v) \perp M \mid \psi_i, m(X_i, W_i). \tag{4}$$

In the absence of effect modifier, the prognostic score ψi is sufficient for identifying all four potential outcomes.

Proposition 3 (Estimation by prognostic score) Under all assumptions above, further assume X1, …, XN are an i.i.d. sample from a study population. The expectation of the essential potential outcome in the same population is identified by

$$E[Y(z,v)] = E\left\{E\left[Y_i(M) \mid M = \mathcal{M}, Z_i = z, V_i = v, \psi_i, m(X_i, W_i)\right]\right\}, \tag{5}$$

where $P(Z_i = z, V_i = v, M = \mathcal{M}) > 0$ for some $1 \le i \le N$, and the outer expectation is taken over the marginal distribution of ψi, m(Xi, Wi).

4. Estimation of average treatment effects

The usual choice of prognostic score is $\psi_i = E[Y_i(0,0) \mid X_i, W_i]$, or a linear predictor for g(E[Yi(0, 0)]), where g() is a monotonic transformation. Since the true values of the prognostic scores are unknown, we need to estimate them by applying an outcome model to the subset of patients with observed treatment statuses Zi = 0, Vi = 0. Such a model is referred to as the prognostic model, which can be fitted using historic data independent of the current study, or by the subsample with Zi = 0, Vi = 0 in the current study when historic data are not available. In our application with no available historic data, we used a multiple membership model for the subsample with Zi = 0, Vi = 0 and did not include any treatment status predictor in the model.
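The idea of fitting the prognostic model on the Z = 0, V = 0 subsample and then scoring every patient can be sketched as follows. Note the substitution: for brevity this sketch uses an ordinary least-squares line with a single covariate, not the multiple membership model used in the application, and the function name `fit_prognostic_line` and the data are hypothetical.

```python
from statistics import mean


def fit_prognostic_line(x, y, z, v):
    """Least-squares line for Y on X using only patients with Z = 0 and V = 0."""
    idx = [i for i in range(len(y)) if z[i] == 0 and v[i] == 0]
    if len(idx) < 2:
        raise ValueError("need at least two patients with Z = 0 and V = 0")
    x_bar = mean(x[i] for i in idx)
    y_bar = mean(y[i] for i in idx)
    sxx = sum((x[i] - x_bar) ** 2 for i in idx)
    if sxx == 0:
        raise ValueError("covariate has no variation in the Z = 0, V = 0 subsample")
    sxy = sum((x[i] - x_bar) * (y[i] - y_bar) for i in idx)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Illustrative data: the first three patients have Z = V = 0 and lie on y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [0, 0, 0, 1, 1, 0]
v = [0, 0, 0, 0, 1, 1]
y = [5.0, 8.0, 11.0, 20.0, 30.0, 25.0]

intercept, slope = fit_prognostic_line(x, y, z, v)
# Estimated prognostic score for every patient, treated or not.
scores = [intercept + slope * xi for xi in x]
```

The treated patients contribute nothing to the fit; their scores are predictions of what their outcome would have been under low individual and low peer attendance.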

Given the estimated prognostic score, a causal effect of interest can be estimated by many strategies, such as stratification and regression adjustment. The stratification estimator splits the patients into several prognostic score strata. Within each stratum the treatment effect is assumed identical. The average treatment effect is then estimated by averaging across all strata. The regression adjustment estimator fits an outcome model that includes the prognostic score as a predictor.

In this paper we propose a hybrid approach using stratification and regression adjustment. We first stratify the patients by the sample quantiles in the estimated prognostic scores. Assuming no effect modifiers, we then fit an outcome model adjusting for the prognostic strata:

Y=Ψζ+λ1Z+λ2V+λ3VZ+Xβ+Wγ+MΦ+ϵ, (6)

where Ψ is a set of stratum indicators and ζ stands for their coefficients. The total, peer, and individual effects can be estimated by linear contrasts among λi, i = 1, 2, 3. The last two terms in (6) are the random effects for statistical dependence. Conventionally, many observational studies using propensity scores used 5 to 10 propensity score strata. We adopted the same convention and used 5 prognostic score strata in our data analysis. In clinical studies with a small to moderate sample size, a basic rule of thumb in statistics is to check the ratio of the number of parameters to the sample size in model (6). We did not include effect modifiers here, but the model can be readily extended to include effect modifiers by interactions between (Z, V) and (X, W).
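Stratifying on sample quantiles of the estimated prognostic score and building the indicator matrix Ψ can be sketched as below. The function names and the scores are hypothetical, and the quantile rule is whatever Python's `statistics.quantiles` uses by default, which is an assumption; the paper does not state a particular quantile definition.

```python
from bisect import bisect_right
from statistics import quantiles


def prognostic_strata(scores, n_strata=5):
    """Assign each score to a stratum 0..n_strata-1 using sample quantile cut points."""
    cuts = quantiles(scores, n=n_strata)  # n_strata - 1 interior cut points
    return [bisect_right(cuts, s) for s in scores]


def stratum_indicators(strata, n_strata=5):
    """One row per patient, one 0/1 column per stratum (the matrix Psi in (6))."""
    return [[1 if s == k else 0 for k in range(n_strata)] for s in strata]


# Illustrative estimated prognostic scores for 20 patients.
scores = [float(s) for s in range(1, 21)]
strata = prognostic_strata(scores)
Psi = stratum_indicators(strata)
counts = [sum(col) for col in zip(*Psi)]  # patients per stratum
```

Model (6) has no separate intercept, so all five indicator columns can enter the model; with an intercept, one column would be dropped as the reference stratum.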

To check the prognostic balance by the estimated prognostic scores, we consider a diagnostic based on Proposition 2, which says that the potential outcome is independent of observed treatment statuses after adjusting for prognostic scores and effect modifiers. We can regress the estimated prognostic scores, denoted as $\hat{Y}(0,0)$, on (V, Z), Ψ, and, if applicable, m(X, W). If the coefficients related to (V, Z) are jointly significant, the data are inconsistent with Proposition 2: either the prognostic score model is inadequate, or the stratification adjustment is insufficient, or both.

The estimation error in the prognostic model adds uncertainty to the final effect estimates. Because an analytic result accounting for this estimation error is difficult to obtain, we instead parametrically bootstrap the prognostic model. Specifically, we first draw B samples from a multivariate normal distribution with mean equal to the coefficient estimates $(\hat\beta^*, \hat\gamma^*)$ of the prognostic model and variance equal to their sampling variance estimate. Denote the resulting prognostic score strata by $\Psi_b$, b = 1, …, B, for each bootstrap sample. Next, for each bootstrap sample, we fit the outcome model (6). The variance of the effect estimates is

$$\frac{1}{B}\sum_{b=1}^{B} \widehat{\mathrm{var}}\left(\hat\theta_b\right) + \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta_b - \frac{1}{B}\sum_{b=1}^{B}\hat\theta_b\right)^2, \tag{7}$$

where $\hat\theta_b$ denotes the estimator for a causal effect and $\widehat{\mathrm{var}}(\hat\theta_b)$ is the model-based variance estimate based on the bth sample of prognostic scores. The first term in (7) may be considered as the within-sample variance and the second term as the between-sample variance in $\hat\lambda$. This variance decomposition is analogous to the rules for combining estimates from multiply imputed datasets [26].
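Once the B bootstrap fits are available, combining them as in (7) is a short computation. The sketch below assumes the per-sample effect estimates and their model-based variances have already been collected in two lists; the function name `combined_variance` and the numbers are hypothetical.

```python
from statistics import mean, variance


def combined_variance(estimates, model_variances):
    """Within-sample plus between-sample variance, following formula (7)."""
    if len(estimates) != len(model_variances):
        raise ValueError("need one model-based variance per estimate")
    if len(estimates) < 2:
        raise ValueError("need at least two bootstrap samples")
    within = mean(model_variances)  # (1/B) * sum of model-based variances
    between = variance(estimates)   # sample variance with divisor B - 1
    return within + between


# Illustrative values for B = 3 bootstrap samples of the prognostic model.
theta_hat = [0.10, 0.20, 0.30]
var_hat = [0.01, 0.02, 0.03]
total_var = combined_variance(theta_hat, var_hat)
```

Unlike the usual multiple-imputation rule, formula (7) as stated carries no extra factor on the between-sample term, and the sketch follows the formula as written.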

The estimation strategy in this section falls in the scope of the semi-parametric methods for causal inference [27]. Our strategy is based on the nonparametric identification assumptions in Section 3 and parametric components, including the multiple membership prognostic model and the outcome model adjusting for prognostic strata. These parametric components are not essential to the general prognostic score method, but the relatively small sample size in our clinical trial requires a parsimonious modeling technique. Another implicit parametric assumption is the definition of peer-level covariates. In this paper, we always use scalar-valued summaries, e.g., sample means of individual-level covariates among peers. When the sample size is not a restricting factor, all of these parametric restrictions should be relaxed. For example, the prognostic model can use a modern nonparametric prediction tool instead of a linear model. The outcome model can also be more flexible, for example, by a partially linear model where the prognostic score serves as the nonparametric component. In fact, the current formulation (6) can be seen as such a partially linear model with a crude nonparametric component, i.e., piecewise linear. In addition, various effect modifiers can be included in the outcome model to improve the goodness of fit. Two or more summary measures, e.g., sample mean and sample standard deviation, can be used in the peer-level covariates.

5. Simulations

5.1. Simulation for efficiency comparison

We first conducted a simulation study to compare the efficiency of the proposed prognostic score approach relative to standard covariate adjustment for estimating the parameters used to derive treatment effect estimates, λ1, λ2, and λ3. The data generating strategy consists of three steps. First, we generated a single baseline covariate X from a uniform distribution on (1, 10). Second, the total number of sessions R was set to 200. We then generated the record of attendance M. For simplicity, we assumed that all patients had continuous attendance, where the first session was randomly assigned, and the total number of attended sessions was sampled from a Poisson distribution with mean equal to 1 + 2X, truncated at the maximum number of sessions one could possibly attend. The individual-level treatment status was set to 1 for patients attending 12 or more sessions, and the peer-level treatment status was set to 1 when 55% or more of a patient’s peers had individual-level treatment status equal to 1. These choices made the sample sizes in all four treatment modalities roughly balanced. Third, we used two outcome generating processes, where Z and V are defined as previously described. Then:

$$Y_i = 1 + .1W_i + .1X_i + .2Z_i + .2V_i + .3Z_iV_i + M_i\Phi + \epsilon_i, \tag{8}$$

$$Y_i = 1 + (W_i - 6)^2 + .05X_i + .2Z_i + .2V_i + .3Z_iV_i + M_i\Phi + \epsilon_i, \tag{9}$$

where Mi is the ith row in the weight matrix M. In both outcome processes, the random session effects Φ follow i.i.d. $N(0, .4^2)$, and the random errors ϵi follow i.i.d. $N(0, .2^2)$. The first process (8) has a linear mean function so that standard covariate adjustment in a multiple membership outcome model should suffice. The second process (9) represents a nonlinear situation where standard covariate adjustment may not work well. For each outcome process, we considered two sample sizes: N = 200 and 400. This gives a total of four data generating scenarios. In each scenario, we simulated 10 sets of fixed effects, and 100 sets of random effects for each set of fixed effects, for a total of 1,000 datasets.
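The data-generating steps for the linear process (8) can be sketched as follows. Several details are assumptions made only for this sketch and are not stated in the text: the first session is drawn uniformly over all sessions, every patient attends at least one session, a patient with no peers gets V = 0 and W equal to their own X, and the session-effect and error standard deviations are read as .4 and .2. The function names are hypothetical.

```python
import math
import random


def poisson(rng, mu):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_linear(n=200, n_sessions=200, seed=0):
    rng = random.Random(seed)
    x = [rng.uniform(1, 10) for _ in range(n)]
    start, stop = [], []
    for xi in x:
        first = rng.randrange(n_sessions)         # assumption: uniform first session
        count = max(1, poisson(rng, 1 + 2 * xi))  # assumption: at least one session
        count = min(count, n_sessions - first)    # truncate at the final session
        start.append(first)
        stop.append(first + count)                # exclusive end of attendance
    z = [1 if stop[i] - start[i] >= 12 else 0 for i in range(n)]
    # Peers: patients whose continuous attendance intervals overlap.
    peers = [[j for j in range(n)
              if j != i and start[i] < stop[j] and start[j] < stop[i]]
             for i in range(n)]
    # Assumption: a patient with no peers gets V = 0 and W = own X.
    v = [1 if s and sum(z[j] for j in s) / len(s) >= 0.55 else 0 for s in peers]
    w = [sum(x[j] for j in s) / len(s) if s else x[i] for i, s in enumerate(peers)]
    phi = [rng.gauss(0, 0.4) for _ in range(n_sessions)]  # session random effects
    y = []
    for i in range(n):
        # M_i * Phi is the average session effect over attended sessions.
        session_effect = sum(phi[start[i]:stop[i]]) / (stop[i] - start[i])
        y.append(1 + .1 * w[i] + .1 * x[i] + .2 * z[i] + .2 * v[i]
                 + .3 * z[i] * v[i] + session_effect + rng.gauss(0, 0.2))
    return x, w, z, v, y


x, w, z, v, y = simulate_linear(n=50, seed=1)
```

Because attendance is continuous, each patient's row of M is a run of equal weights, which is why the session effect reduces to a simple average over a slice.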

For each simulated dataset we fit three versions of the outcome model. The first version did not include prognostic score strata and served as a benchmark for comparison. It was still a multiple membership model with standard covariate adjustment for X and W. The other two versions used (6) as the outcome model with five prognostic score strata, and differed in how the prognostic score strata were estimated. One prognostic score model was a linear model and the other was a polynomial regression, both regressing the observed Y on X and W in the subsample with Z = 0, V = 0.

Table 1 summarizes the simulation results. For simplicity in presentation, we report the performance in estimating the three regression coefficients of the treatment status indicators. When N = 200, the three methods have almost identical performance under the linear outcome process (8). Under the nonlinear outcome process (9), the benchmark method has sizable biases in estimating (λ1, λ2, λ3). By contrast, the two prognostic approaches have substantially smaller biases. The linear prognostic approach gives a moderate reduction in biases, and the nonlinear prognostic approach has even smaller biases. The standard errors are very close to the true sampling standard deviation. The comparisons are similar under N = 400. Overall, the prognostic score-based approaches have smaller biases without sacrificing precision. The prognostic score approaches did not completely remove biases. The fundamental reason is that stratification with a fixed number of strata will never be a perfect match on prognostic scores. A second cause is that the outcome model (6) is always misspecified under process (9), because the actual outcome generating process has a quadratic relationship with Wi.

Table 1:

Summary of simulation results for efficiency comparison: numbers are multiplied by 1,000. Benchmark: Covariate adjustment; Linear: Prognostic strata based on a linear model plus covariate adjustment; Nonlinear: Prognostic strata based on a polynomial model plus covariate adjustment.

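| N | Outcome process (equation) | Adjustment method | λ1 Bias | λ1 sd | λ1 mean se | λ2 Bias | λ2 sd | λ2 mean se | λ3 Bias | λ3 sd | λ3 mean se |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | Linear (8) | Benchmark | −1 | 85 | 79 | −4 | 74 | 78 | 7 | 77 | 81 |
| 200 | Linear (8) | Linear | −2 | 83 | 80 | −7 | 77 | 81 | 10 | 82 | 85 |
| 200 | Linear (8) | Nonlinear | −2 | 85 | 80 | −6 | 76 | 81 | 8 | 82 | 83 |
| 200 | Nonlinear (9) | Benchmark | 129 | 64 | 79 | −52 | 50 | 67 | −289 | 109 | 150 |
| 200 | Nonlinear (9) | Linear | 97 | 65 | 73 | −47 | 58 | 63 | −156 | 118 | 141 |
| 200 | Nonlinear (9) | Nonlinear | 85 | 67 | 68 | −3 | 53 | 61 | −79 | 122 | 137 |
| 400 | Linear (8) | Benchmark | 6 | 49 | 55 | 8 | 45 | 45 | −10 | 50 | 50 |
| 400 | Linear (8) | Linear | 6 | 49 | 55 | 8 | 45 | 46 | −9 | 49 | 51 |
| 400 | Linear (8) | Nonlinear | 7 | 50 | 55 | 8 | 47 | 46 | −9 | 51 | 52 |
| 400 | Nonlinear (9) | Benchmark | 54 | 67 | 68 | 165 | 55 | 65 | −271 | 68 | 68 |
| 400 | Nonlinear (9) | Linear | 15 | 69 | 66 | 80 | 63 | 65 | −142 | 75 | 71 |
| 400 | Nonlinear (9) | Nonlinear | 4 | 77 | 63 | 68 | 67 | 61 | −98 | 77 | 76 |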
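For readers reproducing a table of this kind, the three summaries reported per coefficient can be computed from the simulation replicates as in the short Python sketch below. It assumes the usual definitions (bias as the mean estimate minus the true value, sd as the empirical standard deviation of the estimates, and mean se as the average model-based standard error); the function name and arguments are hypothetical.

```python
import statistics


def summarize(estimates, standard_errors, truth, scale=1000):
    """Summarize replicates of one coefficient, multiplied by `scale`."""
    bias = statistics.fmean(estimates) - truth
    sd = statistics.stdev(estimates)             # sampling SD across replicates
    mean_se = statistics.fmean(standard_errors)  # average model-based SE
    return {"bias": scale * bias, "sd": scale * sd, "mean_se": scale * mean_se}
```

A mean se close to sd indicates that the model-based standard errors track the actual sampling variability.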

5.2. Simulation for effect modification

We conducted a second simulation study to demonstrate how the proposed prognostic score approach handles effect modification. We reused the data generating strategy for covariates and attendance records from the previous section. We further generated a single effect modifier, denoted by Xi*, from a discrete distribution on {−1, 1} with equal probabilities of 0.5. We revised the outcome generating process (9) to

$$Y_i = 1 + \left(\frac{W_i}{6}\right)^2 + .05X_i + .3Z_iV_iX_i^* + M_i\Phi + \epsilon_i. \tag{10}$$

By (10), the modifier Xi* does not contribute to the prognostic score, but affects potential outcomes in the group with Zi = Vi = 1. We also set the effects in the other two treatment groups to zero. Since Xi* is symmetrically distributed around 0, the average effect in the treatment group Zi = Vi = 1 is 0. The coefficient λ3 = .3 reflects the extent to which Xi* modifies the treatment effect for this group. Together, this is a relatively simple and special case of effect modification as defined in (3), but it suffices to demonstrate the possible role an effect modifier could play. We applied the nonlinear prognostic score approach from the previous section with and without effect modification to illustrate estimates with or without the modifier. In all analyses, we did not assume that any of the zero-valued treatment effects were known. We used a sample size of 400 patients and 200 sessions and generated 1,000 datasets.
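To make the zero average effect explicit, the following short derivation uses only the mean function in (10), taking the session and error terms to have mean zero. The lower-case $x^*$ is notation introduced here for a value of the modifier.

$$E\left[Y_i(1,1) - Y_i(0,0) \mid X_i^* = x^*\right] = .3\,x^*,$$

so the conditional effect is $+.3$ when $X_i^* = 1$ and $-.3$ when $X_i^* = -1$. Averaging over the two equally likely values gives

$$E\left[Y_i(1,1) - Y_i(0,0)\right] = .5\,(.3) + .5\,(-.3) = 0.$$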

Table 2 summarizes the simulation results. When the analysis did not use the modifier, the estimated average treatment effects were close to zero, with negligible biases in all three treatment modalities. In this case, the prognostic score approach estimated the average effect integrating over the sample distribution of the effect modifier. The result may still be interpretable if the sample distribution of the modifier is realistic or at least meaningful. When the correct effect modifier was included, the detailed effect modification was well estimated.

Table 2:

Summary of simulation results for effect modification: numbers are multiplied by 1,000

| Analysis | λ1 = 0: Bias | sd | mean(se) | λ2 = 0: Bias | sd | mean(se) | λ3 = 0: Bias | sd | mean(se) |
|---|---|---|---|---|---|---|---|---|---|
| No effect modification | 17 | 53 | 55 | 15 | 40 | 45 | −20 | 86 | 67 |
| Effect modification | 1 | 23 | 23 | −3 | 21 | 20 | −3 | 51 | 53 |

5.3. Simulation for data-intensive prognostic model

We conducted a third simulation study to explore the possibility of using modern prediction tools to estimate the prognostic scores. Building on the data generating strategy in Section 5.1, we further enriched the mean function to be

$$Y_i = 1 + \left(\frac{W_i}{6}\right)^2 + .05X_i + .2\sin(\pi X_i/5) + .1W_iX_i + .2Z_i + .2V_i + .3Z_iV_i + M_i\Phi + \epsilon_i. \tag{11}$$

In each simulated dataset, we first generated 10 independent sites, where each site consisted of 100 patients and 200 sessions. Data for each site were generated according to the strategy in Section 5.1 and (11). Under this simulation scenario, each therapy group in a site was still at the size of current clinical trials, but the same therapy was implemented across multiple sites, and the total sample size was sufficient to enrich both the prognostic model and the outcome model. We used the random forest method in the randomForest library in R to fit the prognostic score and used 50 strata. In the outcome model we used stratum-specific fixed effects for X and W through interactions between the stratum indicators and X or W. We did not include an interaction term between X and W because such a term is unlikely to be used in practice. We generated 100 simulation datasets. The prognostic score approach performed well: the sample biases (SD) for λ1, λ2, and λ3 were .023 (.033), .021 (.034), and −.037 (.042), respectively. The model-based SE was also close to the sampling SD but moderately larger. By contrast, the classic multiple membership model had severe biases due to misspecification, although its SD was smaller because the model is more parsimonious: the results were −1.25 (.028), −1.70 (.026), and 2.56 (.033) in the same order as above.

6. Analysis of the group cognitive behavioral therapy data

The Building Recovery by Improving Goals, Habits, and Thoughts (BRIGHT) study was a clinical trial of a GCBT intervention delivered by substance abuse treatment counselors to patients with persistent depressive symptoms [6]. All study participants had a Beck Depression Inventory-II (BDI-II) [28] score indicating mild-to-severe depressive symptoms (BDI-II > 17) at baseline. A complete course of GCBT for a patient in the BRIGHT study was 16 sessions, divided into four modules of four sessions each, with each module focused on a particular theme. Patients were expected to complete the 16 sessions over an 8-week period by attending sessions twice per week. Each session was co-led by two substance abuse treatment counselors. Counselors received two days of training plus practice leading the intervention, coupled with weekly supervision from a licensed clinical psychologist to maintain fidelity of the intervention. A patient was expected to enter the GCBT at the beginning of any module and complete 16 sessions across four modules. In practice, however, the therapy groups were dynamic for various reasons. There were 245 GCBT sessions and 60 modules offered during the course of the BRIGHT study; a small number of modules had more than four sessions to accommodate holidays and make-up sessions. Further details about study design and additional eligibility criteria are available elsewhere [6]. Among the 140 eligible patients assigned to the GCBT, 132 attended at least one therapy session. Our analysis focuses on those 132 patients. The number of therapy sessions a patient attended ranges between 1 and 19, with a median of 13 and an IQR of 8. The main outcome of interest is the BDI-II depression symptom score following the GCBT, measured three months after entry into the study. Individual-level covariates include age, gender, the baseline BDI-II score, marital status, employment status, and proportion of time spent with different session leaders. Peer-level covariates are the sample mean or sample proportion of the individual-level covariates of one’s peers.

We define the individual-level treatment status as whether a patient attended at least 14 sessions. This threshold is deemed clinically meaningful and is also close to the sample median of the number of sessions a patient attended. The peers of patient i are defined as all other patients who attended at least one session with patient i. The peer treatment status is defined as whether more than half of a patient’s peers have positive individual treatment status. The peer-level covariates are defined as the mean age, % male, and the mean baseline BDI-II score among peers of a patient. Among the 132 patients, 43 have both high individual- and peer-level session attendance (Z = 1, V = 1), 19 have high individual-level session attendance and low peer-level attendance (Z = 1, V = 0), 24 have low individual-level attendance and high peer-level attendance (Z = 0, V = 1), and the remaining 46 have low individual- and peer-level attendance (Z = 0, V = 0).
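The definitions of Z and V above can be sketched in Python directly from an attendance record. The data structure (a mapping from patient identifier to the set of sessions attended), the function name, and the handling of a patient with no peers (V set to 0) are assumptions made for this illustration.

```python
def treatment_statuses(attendance, min_sessions=14, peer_cutoff=0.5):
    """Return (z, v) dictionaries keyed by patient id.

    attendance maps each patient id to the set of session ids attended.
    """
    # Individual-level status: attended at least `min_sessions` sessions.
    z = {p: int(len(sessions) >= min_sessions)
         for p, sessions in attendance.items()}
    v = {}
    for p, sessions in attendance.items():
        # Peers: all other patients sharing at least one session with p.
        peers = [q for q, other in attendance.items()
                 if q != p and sessions & other]
        share = sum(z[q] for q in peers) / len(peers) if peers else 0.0
        # Peer-level status: more than half of the peers have z = 1.
        v[p] = int(share > peer_cutoff)
    return z, v
```

Because both statuses are computed from the same attendance record, changing `min_sessions` or `peer_cutoff` changes Z and V jointly, which is why alternative cutoffs define different treatments rather than a different analysis of the same treatment.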

We use three methods to estimate treatment effects. First, we make simple two-sample comparisons between any two treatment statuses defined by (Z, V), without any adjustment. This comparison is biased due to the lack of randomization in treatment statuses, and the inference is incorrect because it does not account for correlations, but it serves as a descriptive summary. Second, we fit the multiple membership model (2). Third, we apply the prognostic score adjustment proposed in Section 4. The GCBT data have a relatively small sample size (N = 132) and a large number of sessions (R = 245), which makes it difficult to include the session random effects in Equations (2) and (6). We instead include random effects for the 60 modules. The prognostic model includes the three individual-level covariates and their peer analogues, plus random module effects to model correlation among outcomes for patients attending the same sessions. We divide the estimated prognostic scores into four strata using the sample quartiles. We use the diagnostics discussed in Section 4 to check whether the prognostic score strata are sufficient: we regress the estimated prognostic score on (Z, V) and the prognostic score strata. The asymptotic F-test (d.f. = 3, 70) for the overall effect of (Z, V) has a p-value of 0.68. This suggests that the four prognostic strata may be adequate and effect modifiers may not be needed. We bootstrap the prognostic model 100 times to apply the adjusted variance in Equation (6). We then fit the outcome model (Equation 6), adjusting for the prognostic score strata.

Table 3 shows the estimated treatment effects on BDI-II depression scores post-treatment. Our proposed prognostic strata-based approach shows a larger total effect estimate (a 9.5-point reduction) than the corresponding estimates under the two-sample and classic multiple membership model approaches (reductions of 6.2 and 6.6 points, respectively). The standard error estimates are generally largest under the classic multiple membership modeling approach, even with the uncertainty induced by the prognostic strata estimation accounted for in the results shown in Table 3.

Table 3:

Mean (sd) of estimated treatment effects of high versus low GCBT session attendance on BDI-II depression symptom scores: “two-sample” are two-sample comparisons without any covariate adjustments; “classic” is the multiple membership model; “prognostic” is the proposed prognostic score adjustment approach

| Effect | Two-sample | Classic | Prognostic |
|---|---|---|---|
| Individual effect 1: E[Y(1, 0) − Y(0, 0)] | −6.7 (3.8) | −5.4 (3.7) | −6.7 (3.2) |
| Individual effect 2: E[Y(1, 1) − Y(0, 1)] | −1.7 (2.8) | −1.5 (3.4) | −2.2 (3.1) |
| Peer effect 1: E[Y(0, 1) − Y(0, 0)] | −4.5 (3.7) | −5.1 (4.0) | −5.9 (3.2) |
| Peer effect 2: E[Y(1, 1) − Y(1, 0)] | 0.5 (2.9) | −1.1 (4.2) | −2.8 (3.3) |
| Total effect: E[Y(1, 1) − Y(0, 0)] | −6.2 (3.0) | −6.6 (3.7) | −9.5 (2.9) |

Breaking down the total effect into peer and individual components reveals substantively interesting findings. The total effect is equal to the sum of “Individual effect 1” and “Peer effect 2.” The Peer effect 2 results indicate that, among individuals who attend a relatively large number of sessions, the effect of the level of their peers’ attendance on their outcomes is small-to-negligible: Peer effect 2 under the prognostic approach indicates a change in BDI-II of −2.8 points (SE 3.3). Individual effect 1 is, as expected, close to the total effect. The total effect in this case is mainly associated with the individual effect. In contrast, a comparison of Individual effect 2 and Peer effect 1 suggests a different trade-off between peer and individual contributions to the total effect. The peer effect in this case explains more of the total effect than the corresponding individual effect, suggesting that peers play an important role in the group-based intervention for individuals who attend relatively low numbers of sessions. Thus, the individual effect cannot be interpreted and understood without the peer effect, and vice versa.
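The two ways of breaking down the total effect follow from adding and subtracting an intermediate potential outcome. Written for the estimands in Table 3, the identities are:

$$E[Y(1,1) - Y(0,0)] = \underbrace{E[Y(1,0) - Y(0,0)]}_{\text{Individual effect 1}} + \underbrace{E[Y(1,1) - Y(1,0)]}_{\text{Peer effect 2}}$$

$$E[Y(1,1) - Y(0,0)] = \underbrace{E[Y(0,1) - Y(0,0)]}_{\text{Peer effect 1}} + \underbrace{E[Y(1,1) - Y(0,1)]}_{\text{Individual effect 2}}$$

For example, along the first path the prognostic estimates in Table 3 give $-6.7 + (-2.8) = -9.5$, which matches the reported total effect. These identities hold for the estimands; they are stated here as a reading aid and are not a new result.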

Lastly, we conduct several sensitivity analyses. Nineteen patients (14%) have a missing BDI-II measurement at follow-up. Missing values are imputed by the sample mean among patients with the same treatment status and prognostic score stratum as the patient with the missing BDI-II score. We test additional covariates (race/ethnicity, education, employment, and marital status). These checks yield similar estimates for all treatment effects. We also perform analyses using two alternative cutoffs on individual- and peer-level treatment statuses: individual-level treatment is defined as 15 sessions or more and the peer-level cutoff is lowered to 40%; and individual-level treatment is defined as 13 sessions or more and the peer-level cutoff is raised to 60%. Effect estimates under the alternative definitions cannot be directly compared to Table 3 because the treatment definitions differ. However, the total effects under both alternative definitions are significant at the .05 level, and there are some relatively minor differences in the individual and peer effect estimates.
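The imputation rule described above (replace a missing outcome with the observed mean in the same treatment status and prognostic score stratum) can be sketched in Python as follows. The record layout, the key names, and the choice to leave a value missing when its cell has no observed outcomes are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import fmean


def impute_cell_means(records):
    """Fill missing outcomes with the mean of their (z, v, stratum) cell.

    records is a list of dicts with keys 'z', 'v', 'stratum', and 'y',
    where 'y' is None when the follow-up outcome is missing.
    """
    observed = defaultdict(list)
    for r in records:
        if r["y"] is not None:
            observed[(r["z"], r["v"], r["stratum"])].append(r["y"])
    completed = []
    for r in records:
        cell = (r["z"], r["v"], r["stratum"])
        y = r["y"]
        if y is None and observed[cell]:
            y = fmean(observed[cell])  # cell mean of observed outcomes
        completed.append({**r, "y": y})  # copy, leaving the input untouched
    return completed
```

Single mean imputation of this kind does not propagate the uncertainty due to missingness, which is one reason to treat it as a sensitivity check.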

7. Discussion

We present a novel application of the Rubin Causal Model framework to estimate treatment effects of attendance at an open-enrollment group cognitive behavioral therapy on post-treatment depressive symptoms. We reduce the set of potential outcomes considered in two ways: first, we employ a partial interference assumption to define peers among whom interference occurs; and second, we define a scalar function, vi, to characterize peer treatment statuses. Compared with other studies of causal inference under interference, our application is unique in that both the individual and peer treatment statuses are based on the group therapy attendance records, instead of an exposure to an external factor. Given that session attendance was not randomized, we adjust for prognostic score strata [22]. The proposed method can be applied to other dynamic group settings. The implications of our approach for causal inference in networks are another area of future study.

Our approach aligns with theoretical arguments [10] and empirical findings [12] about the importance of groups in the therapeutic process. This literature supports two aspects of our approach: interference among patients should be addressed when defining treatment effects, and the total effect of group therapy is a function of both individual and peer effects. Our findings show different magnitudes of the individual treatment effect for the two peer treatment statuses, with the individual treatment effect of attending a relatively large number of sessions being of greater magnitude when peers were less likely to attend a relatively large number of sessions.

Provided that no important unobservables were missed, the results for the BRIGHT study suggested important causal pathways to realize the group therapy’s effect on patients’ depression symptom score. The covariates we used were significantly related to the outcome. In particular, controlling for the baseline BDI score was necessary and important to encompass many potential factors. We have conducted various sensitivity analyses and checked the goodness of fit of the prognostic score. Our results are robust to various alternative configurations.

That being said, a major limitation of our proposed method for group therapy data analyses is posed by the small sample size of our data set, which is typical of therapy group intervention studies. This limited our ability to examine alternative definitions of Z and V, such as defining relatively high session attendance by a different threshold (e.g., half of the targeted 16 sessions), examining alternative definitions of peers, and examining alternative specifications for random effects in prognostic score and outcome modeling. We were also not able to apply more sophisticated modeling formulations beyond (6). Future research is warranted to develop approaches that result in a tractable set of potential outcomes while avoiding or mitigating the difficulties of dichotomizing the underlying treatment status variable. In this paper, we did not take into account the time order among patients: peers were defined as long as patients shared sessions in common. The temporal feature of an interference network also needs further investigation.

Acknowledgments

This work was supported by the National Institute On Alcohol Abuse And Alcoholism of the National Institutes of Health [R01AA019663]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Appendix

Proposition 1 is a direct result from Assumption 2.

Proof of Proposition 2.

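$$
\begin{aligned}
f\{Y_i(z,v), M \mid \psi_i, m(X_i, W_i)\}
&= f\{Y_i(z,v), M \mid \psi_i, m_i\} \\
&= \int f\{Y_i(z,v), M \mid X_i, W_i, \psi_i, m_i\}\, f(X_i, W_i \mid \psi_i, m_i)\, dX_i\, dW_i \\
&= \int f\{Y_i(z,v) \mid \psi_i, m_i\}\, f(M \mid X_i, W_i, \psi_i, m_i)\, f(X_i, W_i \mid \psi_i, m_i)\, dX_i\, dW_i \\
&= f\{Y_i(z,v) \mid \psi_i, m_i\} \int f(M, X_i, W_i \mid \psi_i, m_i)\, dX_i\, dW_i \\
&= f\{Y_i(z,v) \mid \psi_i, m_i\}\, f(M \mid \psi_i, m_i).
\end{aligned}
$$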

Proof of Proposition 3.

Noticing that (Vi, Zi) is determined by M,

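$$
\begin{aligned}
E[Y(z,v)]
&= E\big(E[Y_i(z,v) \mid \psi_i, m(X_i, W_i)]\big) \\
&= E\big(E[Y_i(z,v) \mid \psi_i, m_i]\big) \\
&= E\big(E[Y_i(z,v) \mid M = \mathcal{M}, V_i = v, Z_i = z, \psi_i, m_i]\big) && \text{by Proposition 2} \\
&= E\big(E[Y_i(Z_i, V_i) \mid M = \mathcal{M}, V_i = v, Z_i = z, \psi_i, m_i]\big) \\
&= E\big(E[Y_i(Z_i, V_i) + e_i(M) \mid M = \mathcal{M}, V_i = v, Z_i = z, \psi_i, m_i]\big) && \text{by } E[e_i(M) \mid M] = 0 \\
&= E\big(E[Y_i(M) \mid M = \mathcal{M}, V_i = v, Z_i = z, \psi_i, m_i]\big).
\end{aligned}
$$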

Footnotes

Conflict of Interest: None declared.

Data availability statement:

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  • [1] Morgan-Lopez AA and Fals-Stewart W. Analytic methods for modeling longitudinal data from rolling therapy groups with membership turnover. Journal of Consulting and Clinical Psychology, 75(4):580–593, 2007.
  • [2] Paddock Susan M, Hunter Sarah B, Watkins Katherine E, and McCaffrey Daniel F. Analysis of rolling group therapy data using conditionally autoregressive priors. The Annals of Applied Statistics, 5(2A):605–627, 2011.
  • [3] Bauer DJ, Sterba SK, and Hallfors DD. Evaluating group-based interventions when control participants are ungrouped. Multivariate Behavioral Research, 43(2):210–236, 2008.
  • [4] Cafri Guy, Hedeker Donald, and Aarons Gregory A. An introduction and integration of cross-classified, multiple membership, and dynamic group random-effects models. Psychological Methods, 20(4):407–421, 2015.
  • [5] Burgette Lane F and Paddock Susan M. Bayesian models for semicontinuous outcomes in rolling admission therapy groups. Psychological Methods, 22(4):725, 2017.
  • [6] Watkins Katherine E, Hunter Sarah B, Hepner Kimberly A, Paddock Susan M, de la Cruz Erin, Zhou Annie J, and Gilmore Jim. An effectiveness trial of group cognitive behavioral therapy for patients with persistent depressive symptoms in substance abuse treatment. Archives of General Psychiatry, 68(6):1–8, 2011.
  • [7] Magill M and Ray LA. Cognitive-behavioral treatment with adult alcohol and illicit drug users: a meta-analysis of randomized controlled trials. Journal of Studies on Alcohol and Drugs, 70(4):516–527, 2009.
  • [8] Cuijpers P, Huibers M, Ebert DD, Koole SL, and Andersson G. How much psychotherapy is needed to treat depression? A metaregression analysis. Journal of Affective Disorders, 149:1–13, 2013.
  • [9] Kivlighan DM, Owen J, and Antle B. Members’ attendance rates and outcomes of relationship education groups: A consensus-dispersion analysis. Journal of Family Psychology, 31(3):358–366, 2017.
  • [10] Agazarian YM. Theory of the invisible group applied to individual and group-as-a-whole interpretations. Group, 7(2):27–37, 1983.
  • [11] Rosenbaum Paul R and Rubin Donald B. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
  • [12] Burlingame Gary M, Theobald McClendon Debra, and Alonso Jennifer. Cohesion in group therapy. Psychotherapy, 48(1):34–42, 2011.
  • [13] Hong Guanglei and Raudenbush Stephen W. Effects of kindergarten retention policy on children’s cognitive growth in reading and mathematics. Educational Evaluation and Policy Analysis, 27(3):205–224, 2005.
  • [14] Hong Guanglei and Raudenbush Stephen W. Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. Journal of the American Statistical Association, 101(475):901–910, 2006.
  • [15] Vanderweele Tyler J, Hong Guanglei, Jones Stephanie M, and Brown Joshua L. Mediation and spillover effects in group-randomized trials: A case study of the 4rs educational intervention. Journal of the American Statistical Association, 108(502):469–482, 2013.
  • [16] Sobel Michael E. What do randomized studies of housing mobility demonstrate?: Causal inference in the face of interference. Journal of the American Statistical Association, 101(476):1398–1407, 2006.
  • [17] Hudgens Michael G and Halloran M Elizabeth. Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832–842, 2008.
  • [18] Forastiere Laura, Airoldi Edoardo M, and Mealli Fabrizia. Identification and estimation of treatment and interference effects in observational studies on networks. arXiv preprint arXiv:1609.06245, 2016.
  • [19] Aronow Peter M, Samii Cyrus, et al. Estimating average causal effects under general interference, with application to a social network experiment. The Annals of Applied Statistics, 11(4):1912–1947, 2017.
  • [20] Ogburn Elizabeth L, VanderWeele Tyler J, et al. Vaccines, contagion, and social networks. The Annals of Applied Statistics, 11(2):919–948, 2017.
  • [21] Li X, Ding P, Lin Q, Yang D, and Liu JS. Randomization inference for peer effects. Journal of the American Statistical Association, 114(529):1651–1664, 2019.
  • [22] Hansen Ben B. The prognostic analogue of the propensity score. Biometrika, 95(2):481–488, 2008.
  • [23] Hill Peter W and Goldstein Harvey. Multilevel modeling of educational data with cross-classification and missing identification for units. Journal of Educational and Behavioral Statistics, 23(2):117–128, 1998.
  • [24] Goldstein Harvey. Multilevel statistical models. Arnold, 2003.
  • [25] Raudenbush Stephen W and Bryk Anthony S. Hierarchical linear models: Applications and data analysis methods, volume 1. Sage, 2002.
  • [26] Rubin Donald B. Multiple imputation for nonresponse in surveys, volume 81. John Wiley & Sons, 2004.
  • [27] Kennedy Edward H. Semiparametric theory and empirical processes in causal inference. In Statistical causal inferences and their applications in public health research, pages 141–167. Springer, 2016.
  • [28] Beck AT, Steer RA, and Brown GK. Manual for the Beck Depression Inventory-II. Psychological Corporation, 1996.
