Author manuscript; available in PMC: 2015 Nov 3.
Published in final edited form as: Exp Clin Psychopharmacol. 2006 May;14(2):265–273. doi: 10.1037/1064-1297.14.2.265

Analytic Complexities Associated with Group Therapy in Substance Abuse Treatment Research: Problems, Recommendations, and Future Directions

Antonio A Morgan-Lopez 1, William Fals-Stewart 1
PMCID: PMC4631029  NIHMSID: NIHMS55969  PMID: 16756430

Abstract

In community-based alcoholism and drug abuse treatment programs, the vast majority of interventions are delivered in a group therapy context. In turn, treatment providers and funding agencies have called for more research on interventions delivered in groups in an effort to make the emerging empirical literature on the treatment of substance abuse more ecologically valid. Unfortunately, the complexity of data structures derived from therapy groups (due to member interdependence and changing membership over time) and the present lack of statistically valid and generally accepted approaches to analyze these data have had a significant stifling effect on group therapy research. The purpose of this article is to (a) describe the analytic challenges inherent in data generated from therapy groups, (b) outline common (but flawed) analytic and design approaches investigators often use to address these issues (e.g., ignoring group-level nesting, treating data from therapy groups with changing membership as fully hierarchical), and (c) provide recommendations for handling data from therapy groups using presently available methods. In addition, promising data analytic frameworks that may eventually serve as foundations for the development of more appropriate analytic methods for data from group therapy research (i.e., non-hierarchical data modeling, pattern mixture approaches) are also briefly described. Although there are other substantial obstacles that impede rigorous research on therapy groups (e.g., evaluation and measurement of group process, limited control over treatment delivery ingredients), addressing data analytic problems is critical for improving the accuracy of statistical inferences made from research on ecologically valid group-based substance abuse interventions.

Keywords: group therapy, data analysis, missing data, pattern mixtures, multiple membership


Driven by both therapeutic and economic considerations, the vast majority of treatment for substance-abusing patients in community-based programs is delivered in a group therapy format (e.g., Price et al., 1991; Stinchfield, Owen, & Winters, 1994). Conversely, even a cursory review of the empirical literature reveals that most research to date has been on the evaluation of substance abuse treatment interventions delivered in individual-based, one-on-one counseling. Assuming that a fundamental goal of substance abuse treatment research is to inform treatment providers about best practice methods, the disparity between the form and format of treatment in research contexts versus community settings naturally begs the question, “Why isn’t the study of group therapy more common in substance abuse treatment research?” Although there is no single or simple answer to this question, there are clear barriers that have made the study of group therapy1 truly daunting.

A fundamental (although largely untested) assumption about therapy delivered in groups is that the interdependence of members serves as a primary curative mechanism. This interdependence is believed to facilitate positive change among individual participants (e.g., Flores, 1988; Yalom, 1995). The perpetual changes in participant membership over the life of a therapy group lead to changes, both gradual and dramatic, in the group’s environment, norms and internal processes. These changes in membership are very likely to influence participants’ treatment response and eventual outcomes. For example, in the context of substance abuse treatment, the addition of a new group member who is, at the time of admission, not motivated to achieve abstinence and is seeking companions with whom to use drugs, may have very negative effects (e.g., premature drop-out of members, relapses) on standing group members whose levels of commitment to abstinence are ‘shaky.’ Conversely, the addition of such a new member to a group of patients with long-term sobriety who are also strongly committed to continued abstinence may ultimately have a very positive effect on the new member (e.g., eventual commitment to long-term sobriety, formation of a new, more healthful social network). The complexities created by the addition of a single member to a standing group, as described in this simple example, pale in comparison to those of most therapy groups running in substance abuse treatment programs, in which multiple members may be added or leave the group at any given point, standing members may miss multiple sessions consecutively or intermittently over the course of the group, and so forth. The challenge for investigators is to understand and describe the dynamic and reciprocal effects between the group and its members when both are in a continual state of flux.

Thus, the effects of interdependence of group participants and group membership changes over time, with their influence on the group as a whole and on the individual members, create very complicated analytic issues for investigators who wish to appropriately model data derived from therapy groups. A primary barrier in the analysis of data from therapy groups is an assumption that is inherent to most models under the generalized linear mixed model family; namely, that the composition of the treatment group does not change over time. This is an assumption that is clearly not tenable in the context of typical therapy groups running in substance abuse treatment programs. The vexing analytic problems inherent in data generated from therapy groups are widely recognized; yet, our own review of the statistical literature, together with consultation with many recognized experts in statistical methods in the U.S. and abroad, revealed that methods to analyze such data have simply not yet been fully explicated.

Unfortunately, it is easy for reviewers of manuscripts or grant applications to highlight the limitations of the analytic methods chosen to model data from therapy groups, which can negatively influence summaries and recommendations, often to a substantial degree. However, with no statistically valid or generally recommended approaches presently available to analyze these data, what are investigators to do? Judging by the great imbalance of federally funded research favoring the study of substance abuse treatments delivered in a one-on-one counseling format over group-delivered interventions, the implicit choice that investigators have made is relatively clear. More specifically, the presence of this unsolved problem has had a decided stifling effect on group therapy research in substance abuse treatment, with many investigators avoiding research on therapy groups altogether and focusing on individual-based treatments. Although this strategy avoids the problems inherent in studies that use therapy groups, it comes with a heavy price, particularly in terms of ecological validity. More pointedly, many providers in community-based treatment programs characterize the majority of efficacy trials examining substance abuse treatments as largely inconsequential to their work because these studies do not focus on interventions they deliver, particularly group therapy.

Recently, the voice of the treatment community has been heard; there is now a concerted effort by federal funding agencies, with encouragement from Congress, to promote a more ‘community-friendly’ substance abuse treatment research portfolio (National Institute on Drug Abuse, 2002); this will inevitably lead to more research on group-delivered treatments. Thus, the strategy of avoiding group therapy research, with all its complexity notwithstanding, will be less viable in the emerging funding environment.

Thus, the purpose of this article is to examine some of the methodological, logistical, and analytic problems in conducting research that includes therapy groups, with a strong emphasis on the latter. The emphasis on analytic problems is critical, given the implications that analytic problems have for making correct (and possibly incorrect) inferences in group-based substance abuse treatment trials (e.g., advocating for a treatment that may be ineffective in reality). We also review common, but flawed (to a greater or lesser extent) approaches investigators use to deal with these problems. We provide recommendations to researchers who are confronted with analyzing data from therapy groups using presently available methods. Lastly, we also describe two emerging and related analytic modeling frameworks from which to understand therapy groups, both of which may ultimately serve as foundations for the development and refinement of statistically valid methods to handle these data.

Research on Group Therapy for Substance Abuse

Evidence of Efficacy

Weiss et al. (2004) recently completed a comprehensive meta-analytic review of treatment outcome studies comparing interventions delivered in a group therapy context to other treatment conditions for patients with substance use disorders. The investigations reviewed were classified into one of the following six categories: (a) group therapy versus no group therapy; (b) group therapy versus individual therapy; (c) group therapy plus individual therapy versus group therapy alone; (d) group therapy plus individual therapy versus individual therapy alone; (e) group therapy versus another group therapy with different content or theoretical orientation; and (f) more group therapy versus less group therapy.

In general, these authors concluded that specialized groups can enhance the effectiveness of treatment-as-usual or waitlist control conditions. However, results did not demonstrate reliable outcome differences between groups with different therapeutic content; moreover, significant differences did not emerge between interventions delivered in a group or individual modality. Given the presumed superior cost profile of group-based versus individual-based interventions, the latter finding is particularly promising and provides a strong cost-effectiveness justification for the use of therapy groups in drug and alcohol treatment programs.

Paucity of Research

Among the most notable aspects of the Weiss et al. (2004) article was the dearth of research on group therapy for substance abuse. Their search of the empirical literature, spanning more than 30 years, yielded only 24 prospective substance abuse treatment outcome studies comparing group therapy and other conditions. This aspect of their review highlights a glaring disconnect between the way substance abuse treatment is delivered in community-based practice versus delivery in efficacy trials. As stated by these authors, “The discrepancy between the widespread use of group therapy in clinical practice and the paucity of research on this topic stems, in part, from the inherent difficulties in conducting meaningful research on group therapy” (p. 348). If investigations on substance abuse treatment and its effectiveness are to have greater ecological validity and assume more relevance to the treatment community, investigators must identify and overcome these “inherent difficulties” and make research on group therapy a staple of their programmatic research.

Barriers Impeding Research on Group Therapy

Logistical and Methodological Problems

A meeting recently convened by the National Institute on Drug Abuse (National Institute on Drug Abuse, 2003) identified several interrelated methodological, logistical, and data analytic challenges encountered in studying therapy groups. Among the most vexing issues identified by the panel had to do with group membership enrollment paradigms. In clinical trials and in community-based treatment programs, two general types of group admission procedures are used. In closed-enrollment groups, or what are often referred to as “closed groups,” all participants are enrolled before the group begins and group membership is ostensibly designed to remain constant (although member drop out or discharge usually makes this more the exception than the rule). In contrast, in open-enrollment groups, or “rolling groups,” participants are permitted to initiate treatment at different times, either at specified entry points during the group (semi-rolling) or, more commonly, at any point during the life of the group (full-rolling).

Closed-enrollment groups have the advantage of minimizing (although not eliminating) the effects of changing group membership on outcomes; yet, the recruitment process has its own liabilities. In particular, a requisite number of group members have to accumulate before the group can start; the inherent delay in this process increases the likelihood that potential participants will lose interest in the group and seek help elsewhere (e.g., Coviello et al., 2001). In addition, if drop out rates are high, membership may dwindle during the course of the group, to the point where the group itself is no longer viable. Although these were identified as barriers to conducting research with closed-enrollment groups, these are also related to the practical problems encountered with the use of closed-enrollment groups in standard community-based treatment practice. Results of a survey we completed revealed that the majority of project directors and treatment providers in outpatient substance abuse treatment programs reported they used rolling groups far more frequently than closed groups for reasons that were both practical and clinical (Fals-Stewart, 2005). In particular, those interviewed reported that rolling admissions allowed them to sustain the groups in the face of high program (and group) dropout and avoided delays in treatment initiation. In addition, rolling groups tend to have a better cost-benefit profile than closed-enrollment groups because (a) patients can enter rolling groups almost immediately (versus waiting for a closed group to form), resulting in more billable sessions and, relatedly, (b) once started, rolling groups have the potential to be perpetually ongoing and therefore always generating revenue (compared to non–revenue generating periods for closed groups when they end or are waiting to form).

However, use of rolling groups brings its own set of unique problems. Because patients are entering at different points over the life of the group, it becomes difficult to develop or follow a structured, sequential treatment approach that builds on previously presented material. In addition, the varying group composition over time introduces substantial heterogeneity, which can be difficult to manage. Lastly, members may be uncomfortable in a group that continually adds new members, making it difficult for members to disclose sensitive personal information, thereby potentially impeding clinical progress.

Data Analytic Challenges

When we move from a focus on methodological and logistical problems to analytic concerns, other issues become evident. In particular, if we attempt to understand how therapy groups work by collecting information from patients during and after participation, the question becomes, “Can we analyze data generated from these groups appropriately?”

Two significant considerations confront investigators who conduct, or plan to conduct, research on group therapy, each of which has important implications for data analysis: interdependence of group participants and changing membership. By their very nature and design, participation in therapy groups creates a degree of interdependence among the members; indeed, this is one of the defining and, it is hoped, desirable qualities of therapy groups. However, this non-independence makes data analysis more complex than, for example, data from studies that use individual-based treatments. More specifically, to draw correct inferences from analysis of data generated from members of therapy groups, this interdependence must be accounted for in the data analytic model.

In a hypothetical closed enrollment group with no change in membership, members are fully nested within the group, creating a clear hierarchical data structure. Although there may be changes in the degree of member interdependence over the course of treatment (e.g., it may increase), this can be captured in presently available state-of-the-art hierarchical data modeling approaches, whether analyzed in multilevel regression or structural equation modeling frameworks (Hox, 2002).

In reality, closed enrollment therapy groups are more dynamic in substance abuse treatment settings because members drop out during the course of the group (and drop out can be substantial in community-based treatment programs). Rolling groups are, by design, far more complex than closed groups because of ongoing membership additions and removals. Unfortunately, models for analysis of data derived from groups with changing membership have not been fully explicated. Because group membership gradually or abruptly changes over time (e.g., new members are added to the group intermittently while other members drop out or are removed), participants are not, in an analytic sense, consistently nested within a given group because it is not the same group over time (at least in terms of member composition). Conversely, the changing groups are not wholly different each time new members are added or members leave; groups are similar to one another over time (although the degree of that similarity is likely to vary substantially as a function of the change in membership at each time point).

This phenomenon is shown in Figure 1, which graphically illustrates membership in a therapy group during a 5-week period. In this fictional rolling group, the addition and removal of multiple members, missed sessions, and drop out are all evident, highlighting the complexities of trying to understand the behavior of the group and its members, as well as the reciprocal effects they have on each other. In this scenario, group members would best be conceptualized as being “partially nested” within the time-varying groups. The groups are similar to one another over time, in terms of membership, but are not the same from week to week; in a statistical sense, Blalock (1990) refers to this as ‘fuzzy’ group membership. If this fictional group used a closed enrollment paradigm, group membership would change as a consequence of member drop out only; nonetheless, this would still create membership changes over time. Thus, along with accounting for member interdependence, it would also be optimal for any data analytic approach to be used with data from therapy groups to account for the temporal similarity of group membership, regardless of whether the groups under consideration are closed or rolling.

Figure 1.

Diagram of a rolling group admission sequence during a 5-week period.

Note. GL = Group Leader. Numbers contained within circles represent group members. Numbers in groups that are bolded, italicized, and underlined represent new members added to the group in a given week. In this group, there are five original members, as illustrated for Week 1 and members numbered 1 through 5. The next week (Week 2), a new member is added to (i.e., “rolls into”) the group (Member number 6, in enlarged, bolded, italicized, and underlined type) and one member from Week 1 (Member 3) is a “no-show.” In Week 3, two new members roll into the group (Members 7 and 8); Member 3 has returned, but Member 2 is not present and is a dropout (i.e., does not attend any other groups). In Week 4, no new members roll in; Members 5, 6, and 8 do not attend. Member 6 is a dropout. In Week 5, two new group members roll in (i.e., Members 9 and 10); Member 8 returns for this group.
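
The membership pattern described in the note to Figure 1 can be written down directly as a set of attendees per week, which makes the "similar but not the same" character of the group concrete. The following is a minimal sketch only: the Jaccard overlap between consecutive weeks is our illustrative choice of similarity index, not a method proposed in this article, and the names `attendance`, `jaccard`, and `week_to_week_overlap` are hypothetical. Week 5 is left out because the note does not say whether Member 5 attends that week.

```python
from typing import Dict, List, Set

# Attendance by week, transcribed from the note to Figure 1 (Weeks 1-4 only).
# Week 5 is omitted: the note does not state whether Member 5 attends.
attendance: Dict[int, Set[int]] = {
    1: {1, 2, 3, 4, 5},        # five original members
    2: {1, 2, 4, 5, 6},        # Member 6 rolls in, Member 3 is a no-show
    3: {1, 3, 4, 5, 6, 7, 8},  # Members 7 and 8 roll in, Member 2 has dropped out
    4: {1, 3, 4, 7},           # Members 5, 6, and 8 do not attend
}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Shared attendees divided by all attendees across the two sessions."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def week_to_week_overlap(att: Dict[int, Set[int]]) -> List[float]:
    """Jaccard overlap for each pair of consecutive weeks, in week order."""
    weeks = sorted(att)
    return [jaccard(att[w1], att[w2]) for w1, w2 in zip(weeks, weeks[1:])]


if __name__ == "__main__":
    for pair, value in zip(["1-2", "2-3", "3-4"], week_to_week_overlap(attendance)):
        print(f"Weeks {pair}: overlap = {value:.2f}")
```

An overlap of 1 would correspond to a fully hierarchical structure (the same members every week), and an overlap of 0 to entirely separate groups; the fictional group in Figure 1 falls between these extremes in every week, which is the sense in which members are only partially nested.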

Handling Data from Therapy Groups: Common Approaches and Their Pitfalls

These analytic challenges notwithstanding, many investigators who wish to do research on therapy groups are ultimately faced with the task of trying to analyze data from these groups appropriately (or, more accurately, most appropriately) with the tools that are presently available. Because statistically valid approaches to analyze such data have not been fully developed specifically for the context of changing group membership, investigators have typically been forced to choose among three available, but less-than-optimal analytic and design approaches.

Approaches that Ignore Group-Level Nesting

The most common and oldest approach is to ignore the interdependence of data generated from group members; that is, do not account for the “group” level in the analysis and treat individuals of groups as if they are fully independent of each other. As has been explicated in many articles and texts during the last quarter century, failure to disaggregate data into individual-level and group-level variance components in this fashion often leads to a serious risk of committing Type I error (cf. Barcikowski, 1981; Kish, 1987; Hox, 2002; Snijders & Bosker, 1999). Estimates of standard errors from traditional analytic approaches that assume independence of observations are too small and are more likely to produce spurious “significant” results. Increased likelihood of Type I errors at the individual level, when failing to account for group-level variability, can lead to the consequence of advocating for substance abuse treatments that may be ineffective in reality. In other circumstances, particularly when examining within-individual changes (e.g., growth models), failure to account for variability in outcomes over time due to repeated measurements from the same individuals (i.e., within-individual variability) can lead to unnecessarily conservative tests (i.e., increased Type II error; Moerbeek et al., 2003) for between-individual effects (i.e., “Level-2” effects).

Another approach to modeling data from therapy groups is to separate variability in outcomes into individual-level and group-level components for the sole purpose of correcting individual-level standard errors. These disaggregation approaches are appropriate for contexts where group-level variability is not important per se, as the analysis typically “hides” group-level variability. These methods treat interdependence among the group members as a nuisance and attempt to eliminate it. However, particularly in the study of groups, the interdependence is very often the phenomenon of greatest interest. Eliminating interdependence from the models may mask the very essence of what therapy groups are and how they bring about therapeutic change.

Hierarchical Modeling Approaches

A more modern approach is to treat data from therapy groups as fully hierarchical and to analyze the data in readily available mixed model, multilevel, or structural equation modeling frameworks. The underlying assumption of such a modeling scheme is that the group remains the same over time and that membership changes do not affect this appreciably (i.e., in the analysis, “group” is treated as a level in the model and group members are fully nested within the group). This is a more elegant approach than disaggregation because it attempts to model member interdependence (as opposed to ignoring it or eliminating it as a nuisance). However, the inherent bias of this approach is that it ignores the dynamic changes of the group as members are added (in the case of rolling groups) and removed (in the cases of rolling and closed groups). As noted earlier, group participants are not fully nested within a given group because the group gradually changes over time. In other words, members are partially nested within groups. Fully hierarchical models are most useful for closed-enrollment groups that do not have appreciable member drop out, but may be more limited for typical closed groups (that very often have high rates of drop out) and rolling groups. From a data analytic perspective, treating participants as fully nested within groups when they are not results in a model that is under-specified because there are sources of variation that have not been included. This can lead to significant under-estimation of standard errors for parameters and inflated Type I error (Rasbash & Browne, 2001).

In practical terms, treating therapy groups that have changing membership as fixed groups also makes a critical assumption that may not be tenable. More specifically, treating therapy groups with changing membership as fixed groups makes the assumption that individuals who enter the treatment group late or who drop out of treatment early come from the same population as individuals who stay in the treatment group consistently. It also assumes that the treatment effect will be consistent across each of these “subgroups”. By not making any provision for differences in treatment efficacy among these subgroups (i.e., completers, early dropouts, late starters) standard multilevel analyses will miss subgroups for whom the treatment may be less effective, particularly those that leave treatment early.

Perhaps the most readily available and analytically sophisticated approach to analyzing data derived from therapy groups involves treating elements of the data as both hierarchical and non-hierarchical. As described by Hill and Goldstein (1998), the most common of these approaches is cross-classified modeling. Patients would be viewed as being nested in a series of groups (as the group changes over time with the addition and deletion of members), not just a single group. Of the three approaches, this modeling approach may come closest to capturing the group structure as seen in most contexts. However, an assumption of these models, at least as they are most typically constructed, is that groups are treated as independent of each other which, of course, they are not (i.e., the “different” groups share members over time). In contrast to treating the data as fully hierarchical, this approach is most likely to over-estimate the amount of variation between groups, leading to an inflation of standard errors and an increase in Type II error.

Design Approaches

Although not a modeling approach, it is important to mention another tactic used to deal with the methodological complexities of therapy groups which is likely very often used but not openly promoted or acknowledged. Recognizing the analytic complexities of modeling data generated from therapy groups, coupled with the lack of a well-accepted, statistically valid approach to handling these data, many investigators circumvent this issue completely by avoiding the inclusion of therapy groups in the designs of their studies. Although it is difficult to quantify how often experimental design decisions are made with this concern as a motivating factor, it was recognized as a strategy that is relatively common among substance abuse treatment investigators (National Institute on Drug Abuse, 2003). Certainly, using individual-based intervention approaches allows for the application of well-established and widely accepted modeling approaches. Because modeling strategies for therapy groups with changing membership have not been fully developed, avoiding their use is understandable, at least from a grantsmanship and publication standpoint. After all, investigators seeking funding or attempting to publish their research wish to avoid less favorable priority scores and negative reviewer critiques, respectively.

With all other factors being equal, if inclusion of therapy groups leads reviewers to highlight the analytic issues inherent in designs that include them, investigators will tend to avoid their use. As noted, however, all other factors are not presently equal; researchers are being strongly encouraged by federal agencies and community providers to conduct research that is more ecologically valid. With this mandate will come more research on therapy groups; as such, investigators need guidance on how to handle the data that will be derived from these trials with presently available methods.

Recommendations for Analysis of Data Generated From Therapy Groups Using Presently Available Approaches

With the lack of an accepted modeling approach to draw from the statistical tool chest, what options do substance abuse treatment researchers realistically have when they are faced with the complexities of group therapy research? As we are sure is now evident, none of the options available will be optimal in most instances. Thus, it is understandable why many investigators have chosen to avoid the issue completely by using individual-based interventions in the designs of their studies. However, this strategy is scientifically unacceptable unless there is a compelling rationale that precludes the use of therapy groups in a given trial (e.g., situations where treatment in groups may be iatrogenic; see Moos, 2005, for a discussion). Therapy groups are so widely used in community practice that an unwillingness to study them in research designs, however understandable, will perpetuate problems of ecological validity that have plagued substance abuse treatment research to date.

Similarly, it is difficult to make the case for completely ignoring group membership and treating data as if individual participants were independent of each other. The significant statistical problems notwithstanding (i.e., inflation of Type I error and spurious significant results), much has been written during the last half century about conceptual problems associated with analyzing data at one level and formulating conclusions about another level (e.g., Kreft & de Leeuw, 1987; Robinson, 1950). In this case, drawing any inferences about the group based on data analyses that ignore the group-level clustering will very often lead to incorrect inferences; this is referred to as the atomistic fallacy (Alker, 1969). Thus, if there is an interest in the effect of the group, this approach, on both statistical and conceptual grounds, is untenable.

In situations where the investigator is not substantively interested in the effect of the group, methods that account for the dependencies of the data resulting from participation in group, but treat it as a nuisance that is accounted for in the modeling, may be appropriate. Generalized estimating equations treat any clustering as a nuisance and make adjustments based on the extent of clustering (Hardin & Hilbe, 2003). Similarly, correction formulae can be used to adjust standard errors (based on the degree of member interdependence as measured by the intraclass correlation coefficient) when an analytic method is used that does not account for clustering (for an overview, see Gulliford, Ukoumunne, & Chin, 1999). The major conceptual disadvantage of treating clustering effects as a nuisance is that the effect of group is masked (i.e., no explicit output of the group-level intraclass correlation). In most cases, it would seem more advantageous to isolate and quantify the extent to which the group influences its individual members, leaving the investigator with the option of ignoring this effect if he or she so chooses.
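
As a concrete illustration of a correction formula of this kind, one widely used adjustment is the design effect for equal-sized clusters, 1 + (m - 1) × ICC, where m is the number of members per group and ICC is the intraclass correlation; a standard error computed under independence is multiplied by the square root of this quantity. The sketch below assumes equal group sizes and a known ICC, and it is offered as an example of the general idea rather than as the specific formulae reviewed in the sources cited above. The function names and the example values (8 members per group, ICC of .05) are hypothetical.

```python
import math


def design_effect(group_size: float, icc: float) -> float:
    """Variance inflation due to clustering, assuming equal-sized groups.

    group_size: number of members per group (m), must be at least 1.
    icc: intraclass correlation coefficient, between 0 and 1.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (group_size - 1.0) * icc


def adjusted_standard_error(naive_se: float, group_size: float, icc: float) -> float:
    """Inflate a standard error that was computed as if observations were independent."""
    return naive_se * math.sqrt(design_effect(group_size, icc))


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    deff = design_effect(group_size=8, icc=0.05)
    se = adjusted_standard_error(naive_se=0.10, group_size=8, icc=0.05)
    print(f"design effect = {deff:.2f}, adjusted SE = {se:.4f}")
```

Even a modest ICC inflates the variance noticeably once groups have more than a handful of members, which is why tests that ignore clustering tend to be too liberal. Note that this adjustment presupposes a fixed group with a single size, so it inherits the same difficulty with changing membership that is discussed throughout this article.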

In most instances, however, the effect of the group will be of some interest and, in many cases in the study of therapy groups, may be of primary importance. In particular, if the effect of the group on the individual and, relatedly, how group environment and processes may influence group members differentially is of interest, hierarchical data modeling approaches (conducted in a multilevel regression or a structural equation modeling framework) may be the most appropriate. Although treating the data from therapy groups as either fully hierarchical or as cross-classified can lead to biased inferences, as noted, and thus is not optimal, these models will nonetheless come closer to capturing the group process than the others described thus far. Recognizing the limits of the modeling approaches that are presently available, the ‘best’ approach may be to analyze group therapy data using multiple methods and determine whether the pattern of results is consistent across methods (i.e., sensitivity analysis); if there is consistency in results, this would increase confidence that the inferences are reasonable.

Lastly, developing and disseminating statistical methods to analyze data from therapy groups with changing membership is an area of research that we and others are pursuing. Thus, it is important for investigators to collect time series data that capture fluctuations in different aspects of group process and environment vis-a-vis changing membership. Such information will be valuable once valid methods have been developed and are available to investigators.

Future Directions: Promising Modeling Frameworks for Data from Therapy Groups

As we hope is evident, it is imperative that the problems associated with analyzing data from therapy groups with changing membership are recognized and addressed by theoretical and applied statisticians, with the aim of developing and promulgating valid and accessible modeling approaches. An important starting point is to identify general modeling frameworks that can capture issues related to both group participant interdependence and time-varying changes in group membership; from there, statistical approaches can be developed and refined.

What modeling frameworks may best fit the unique challenges of group therapy data structures? In our view, the key to conceptualizing the problem of (and solutions for) group therapy data is that both rolling and closed therapy groups can reasonably be seen as both (a) a non-hierarchical data modeling problem and (b) a pattern mixture/missing data problem. Although we and others have not fully developed methods for application to therapy group data with changing membership, each of these analytic frameworks holds some promise as foundations for ultimately addressing the inherent data analytic complexities we have been describing.

Non-Hierarchical Data Modeling Approaches

Much of the statistical literature on multilevel analysis has focused on data structures that are purely hierarchical; in many applications, it is assumed that the data are nested unambiguously. However, this assumption is not always justified; in fact, it is commonly the case that data derived from many situations have a complex hierarchical structure. When lower level units are influenced by more than one higher level unit, we have crossed random effects. For example, consider a study by Raudenbush (1994) in which he examined cognitive growth in children from grades 1 through 4. As children change grades, they are exposed to different teachers. Thus, the Level-1 observations were the time-series data for each child and the Level-2 units were the children. However, children were not fully nested within teachers; children had different teachers in different grades. In this instance, children and teachers were crossed factors, which can be accounted for via crossed random effects analysis.

In a similar vein, the structure of therapy groups with changing membership results in crossed factors (group members are crossed with groups), with group members being treated by many different, but not wholly distinct, groups (i.e., the group changes gradually as members enter and leave, resulting in “many” groups). A particular group member spends a proportion of time in a series of different groups. In this instance, the group member has ‘multiple membership’ of units at the group-level of clustering.

Goldstein (2003) and Raudenbush (1993) present a general framework for handling complex hierarchical data structures with random cross-classifications. Rasbash and Goldstein (1994) provide a description of a method for estimating cross-classified models using a fairly standard hierarchical formulation and a set of dummy variables (0, 1) for each unit of one of the cross-classified random variables. The dummy variables are used as explanatory variables in the random part of the model; typically, the variances of the random coefficients of these dummy variables are constrained to be equal, thus allowing estimation of between-unit variance.
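To make the dummy variable formulation concrete, the following minimal sketch builds the set of (0, 1) indicators for one cross-classified factor, with one column per unit. The helper name and the group labels are hypothetical; fitting the model itself (with the variances of the random coefficients constrained to be equal) would be carried out in multilevel software and is not shown.

```python
def indicator_columns(unit_ids):
    """Build 0/1 dummy variables for a cross-classified factor.

    Returns the sorted unit labels and one row of dummies per observation,
    with exactly one column per unit.
    """
    units = sorted(set(unit_ids))
    rows = [[1 if uid == unit else 0 for unit in units] for uid in unit_ids]
    return units, rows


# Hypothetical data: the higher-level unit attached to each of four observations.
observed_units = ["G1", "G2", "G1", "G3"]
units, dummies = indicator_columns(observed_units)

for label, row in zip(observed_units, dummies):
    print(label, row)
```

Each row contains a single 1, reflecting the assumption that an observation belongs to exactly one unit of the cross-classified factor.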

However, a standard assumption of these models is that the effects of the higher units (in this case, groups) are mutually independent (Snijders & Bosker, 1999). Unfortunately, this assumption is not tenable with most therapy groups, in which the variation in groups from, say, week-to-week, is gradual. Thus, in this case, the assumption of group independence over time is very likely to be violated. The goal is to include in the model the degree of group similarity and variation, as a consequence of time and changing membership, and how this may influence treatment response and outcome.

A multiple membership modeling approach we are exploring extends standard random cross-classified modeling to encompass the situation presented by therapy groups with changing membership. As noted earlier, standard dummy coding is typically used to fit crossed random effects models. We are presently experimenting with assigning weights other than zero and one to represent each group, with the weights varying as a function of membership overlap in each temporal iteration of the group. Using weights other than zero or one to indicate multiple unit membership may provide the basis for handling the fuzzy group membership inherent in most therapy groups used to treat substance abuse.
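The weighting scheme itself is not specified here, so the sketch below is only one possible illustration and not the approach under development. It assumes that each session roster defines a temporal iteration of the group, measures overlap between consecutive rosters with the Jaccard index, and gives each member equal-share weights across the sessions he or she attended. The rosters, member labels, and function names are all hypothetical.

```python
def jaccard(roster_a, roster_b):
    """Share of members common to two session rosters (0 = none, 1 = identical)."""
    a, b = set(roster_a), set(roster_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def membership_weights(rosters, member):
    """Fractional (non 0/1) weights over session-specific groups for one member.

    Each attended session receives an equal share, so the weights sum to 1.
    """
    attended = [member in roster for roster in rosters]
    n_attended = sum(attended)
    if n_attended == 0:
        return [0.0] * len(rosters)
    return [1.0 / n_attended if here else 0.0 for here in attended]


# Hypothetical rosters for five consecutive weekly sessions of one rolling group.
rosters = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"A", "D"},
    {"A", "D", "E"},
]

overlap = [jaccard(rosters[i], rosters[i + 1]) for i in range(len(rosters) - 1)]
weights_b = membership_weights(rosters, "B")
print("week-to-week overlap:", [round(x, 2) for x in overlap])
print("weights for member B:", weights_b)
```

In a scheme of this kind, the overlap values could inform how similar adjacent iterations of the group are assumed to be, while the fractional weights replace the single 1 in the usual dummy coding.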

Pattern Mixture Approaches

Another way to view the structure of data collected from treatment groups is to consider the patterns of available and missing data due to differences in attendance patterns. For example, in Figure 1, group member #1 does not miss any sessions. Group member #2 comes to group the first two weeks and drops out for the remainder of the trial. Group member #8 does not join the group until Week 3, misses Week 4 and returns at Week 5.
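Attendance histories like these can be coded as simple patterns of available and missing sessions. The sketch below does so for the three members just described; the five-week observation window, the member labels, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def attendance_pattern(sessions_attended, n_sessions):
    """Encode attendance as a string: '1' = present, '0' = absent, per session."""
    return "".join(
        "1" if week in sessions_attended else "0"
        for week in range(1, n_sessions + 1)
    )


def group_by_pattern(attendance, n_sessions):
    """Collect members who share the same attendance (missing data) pattern."""
    groups = defaultdict(list)
    for member, sessions in attendance.items():
        groups[attendance_pattern(sessions, n_sessions)].append(member)
    return dict(groups)


# Hypothetical five-week window (the length of the trial is assumed here).
attendance = {
    "member_1": {1, 2, 3, 4, 5},  # misses no sessions
    "member_2": {1, 2},           # attends two weeks, then drops out
    "member_8": {3, 5},           # joins at Week 3, misses Week 4, returns Week 5
}

patterns = group_by_pattern(attendance, n_sessions=5)
for pattern, members in patterns.items():
    print(pattern, members)
```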

From an analytic standpoint, each of these three people is in the same treatment group (for the purpose of group-level nesting) and, by default, the same treatment condition (i.e., “experimental” or “treatment-as-usual”/control condition). However, each of these three group members may have (a) different responses to treatment that are dependent on length of stay in treatment, reasons for entering treatment “late” and reasons for leaving treatment early (i.e., attendance group X treatment interaction effects) (Hedeker & Gibbons, 1997) and (b) differences in the levels of cohesiveness to the treatment group because of variability in the length of time for assumption of group-level norms. The latter may have implications for the extent to which attendance patterns impact group-level variance components because there may not be the same level of cohesion (i.e., lower intraclass correlation coefficient) among individuals who have not been in the group as long even if they are in the same treatment group.

The modeling of differences in treatment efficacy as a function of attendance (i.e., missing data) patterns has its roots in pattern mixture modeling for non-ignorable missing data (Allison, 1987; Hedeker & Gibbons, 1997; Little, 1993; Muthén, Kaplan & Hollis, 1987; Roy, 2003). Pattern mixture modeling is conducted by (a) creating groups according to missing data patterns, (b) estimating the model of interest separately in each missing data group, and (c) combining estimates and standard errors across the missing data patterns in order to get a single set of treatment effect estimates for the overall sample (Allison, 1987; Hedeker & Gibbons, 1997; Muthén, et al., 1987; Roy, 2003). In the hierarchical data modeling framework, estimates are combined across missing data patterns by first estimating treatment effect by missing data pattern interactions in a standard mixed linear model framework (e.g., SAS Proc Mixed) and then using matrix procedures (e.g., SAS Proc IML) to combine the missing data group-specific estimates into a single estimate of the treatment effect.
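Step (c), combining pattern-specific estimates, can be sketched as a weighted average in which each pattern's treatment effect is weighted by the proportion of the sample showing that pattern. The standard error below uses a delta-method approximation that assumes the pattern-specific estimates are independent of one another and of the estimated pattern proportions; it is an illustration with hypothetical numbers, not a reproduction of the SAS procedures mentioned above.

```python
import math


def combine_pattern_estimates(estimates, std_errors, counts):
    """Average pattern-specific treatment effects over missing data patterns.

    estimates  : treatment effect estimated within each pattern
    std_errors : standard error of each pattern-specific estimate
    counts     : number of participants showing each pattern
    Returns (pooled estimate, delta-method standard error).
    """
    n_total = sum(counts)
    props = [c / n_total for c in counts]

    pooled = sum(p * est for p, est in zip(props, estimates))

    # Uncertainty in the pattern-specific estimates (proportions held fixed).
    var_estimates = sum((p ** 2) * (se ** 2) for p, se in zip(props, std_errors))

    # Uncertainty in the estimated pattern proportions (multinomial sampling).
    second_moment = sum(p * est ** 2 for p, est in zip(props, estimates))
    var_props = max(0.0, second_moment - pooled ** 2) / n_total

    return pooled, math.sqrt(var_estimates + var_props)


# Hypothetical results for three attendance patterns.
pooled, pooled_se = combine_pattern_estimates(
    estimates=[-0.50, -0.20, -0.35],
    std_errors=[0.10, 0.20, 0.15],
    counts=[50, 30, 20],
)
print(f"pooled treatment effect = {pooled:.3f} (SE = {pooled_se:.3f})")
```

When the pattern-specific effects differ substantially, the second variance term grows, reflecting the added uncertainty about how the sample is distributed across attendance patterns.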

The closest analog to this approach in structural equation modeling is found in Allison (1987), where treatment effect estimates are combined using equality constraints (but this approach can only be used with summary data (i.e., means & covariances) and not with raw data). Extensions of work on pattern mixtures include considerations for probabilistic missing data group membership (Roy, 2003). This work has also seen continued development of pattern mixture models in the context of finite-mixture SEM (Patock-Peckham & Morgan-Lopez, in press), as development of pattern mixtures in multilevel regression has outpaced development in structural equation modeling (Hedeker & Gibbons, 1997).

Conclusion

In a narrow sense, the objectives of this article were to (a) highlight the problems inherent in analyzing data from therapy groups conducted in substance abuse treatment programs, which typically have changes in membership over time, (b) note the limitations of presently available methods for modeling these data, (c) provide recommendations for analysis using available statistical methods, and (d) describe promising modeling frameworks that may lead to statistically valid analytic methods to handle these data. However, our overall aim was much broader. By raising awareness of these issues and providing recommendations from which investigators can draw when seeking funding for their research or writing research reports for publication, we also were interested in facilitating more research on group therapy in substance abuse treatment.

It would be both naïve and overly simplistic, of course, to view data analytic issues as the only barriers to group therapy research in substance abuse treatment. Without question, there are other substantial obstacles that impede research on therapy groups, which include (but are not limited to): (a) evaluating and measuring what occurs during the course of a therapy group (i.e., group process); (b) limited control over various elements of treatment delivery, making identification of active ingredients extremely difficult; and (c) feasibility issues (e.g., time required to recruit a sufficient number of participants for a cohort). Our narrow focus in this article on analytic modeling issues inherent in data from therapy groups is not intended to minimize the size and scope of other barriers, but rather to illuminate the nature of this particular problem more fully. Certainly, other obstacles to group therapy research deserve equal attention and scrutiny.

With that stated, in our view and that of others (e.g., NIDA, 2003), analytic complexities inherent in group therapy data have been a major impediment for many investigators who want to conduct research in this area, but also wish to circumvent the problems we have highlighted herein so as to avoid criticisms when their grant applications and articles are reviewed. Although it is difficult to estimate the magnitude of the stifling effect this has had on group therapy research in our field (net of other methodological obstacles), we believe it to be substantial. It is hoped that this paper will serve to educate the substance abuse research community (i.e., reviewers and researchers alike) about the best available methods to handle the issues we have highlighted and stimulate more research by applied and theoretical statisticians to address these pressing problems.

Acknowledgments

This project was supported, in part, by grants from the National Institute on Drug Abuse (R01DA12189, R01DA014402, R01DA014402-SUPL, R01DA015937, R01DA016236, R01DA016235-SUPL), the National Institute on Alcohol Abuse and Alcoholism (R21AA013690) and the Behavioral Health Research Division, RTI International.

Footnotes

1. In this review, the term “group therapy” signifies an intervention delivery context in which treatment is delivered to multiple individuals in a group. As used in this paper, group therapy is a type of intervention delivery format and not a particular type of therapy.

References

  1. Alker HR. A typology of fallacies. In: Dogan M, Rokkan S, editors. Quantitative ecological analysis in the social sciences. Cambridge, MA: M.I.T. Press; 1969. pp. 69–86.
  2. Allison PD. Estimation of linear models with incomplete data. In: Schuessler K, editor. Sociological Methodology 1987. San Francisco: Jossey-Bass; 1987. pp. 71–103.
  3. Barcikowski RS. Statistical power with group mean as the unit of analysis. Journal of Educational Statistics. 1981;6:267–285.
  4. Blalock HM. Auxiliary measurement theories revisited. In: Hox JJ, De Jong-Gierveld J, editors. Operationalization and research strategy. Amsterdam: Swets & Zeitlinger; 1990.
  5. Coviello DM, Alterman AI, Rutherford MJ, Cacciola JS, McKay JR, Zanis DA. The effectiveness of two intensities of psychosocial treatment for cocaine dependence. Drug and Alcohol Dependence. 2001;61(2):145–154. doi: 10.1016/s0376-8716(00)00136-8.
  6. Fals-Stewart W. Use of group therapy in outpatient substance abuse treatment programs: A survey of randomly selected programs in the Northeastern United States. 2005. Unpublished raw data.
  7. Flores P. Group psychotherapy with addicted populations. New York: The Haworth Press; 1988.
  8. Goldstein H. Multilevel statistical models. 3rd ed. London: Arnold; 2003.
  9. Gulliford MC, Ukoumunne OC, Chinn S. Components of variance and intraclass correlations for the design of community-based surveys and intervention studies. American Journal of Epidemiology. 1999;149:876–883. doi: 10.1093/oxfordjournals.aje.a009904.
  10. Hardin JW, Hilbe JM. Generalized estimating equations. Boca Raton, Florida: Chapman and Hall/CRC; 2003.
  11. Hedeker D, Gibbons RD. Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods. 1997;2:64–78.
  12. Hill PW, Goldstein H. Multilevel modelling of educational data with cross classification and missing identification of units. Journal of Educational and Behavioural Statistics. 1998;23:117–128.
  13. Hox J. Multilevel analysis. Techniques and applications. Mahwah, NJ: Lawrence Erlbaum Associates; 2002.
  14. Kish L. Statistical design for research. New York: John Wiley and Sons; 1987.
  15. Kreft GG, de Leeuw ED. The see-saw effect: A multilevel problem? Quality & Quantity. 1988;22:127–137.
  16. Little RJA. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88:125–134.
  17. Moerbeek M, van Breukelen GJP, Berger MPF. A comparison between traditional and multilevel regression for the analysis of multi-center intervention studies. Journal of Clinical Epidemiology. 2003;56:341–350. doi: 10.1016/s0895-4356(03)00007-6.
  18. Moos RH. Iatrogenic effects of psychosocial interventions for substance use disorders: Prevalence, predictors, prevention. Addiction. 2005;100:595–604. doi: 10.1111/j.1360-0443.2005.01073.x.
  19. Muthén B, Kaplan D, Hollis M. On structural equation modeling with data that are not missing completely at random. Psychometrika. 1987;52:431–462.
  20. National Institute on Drug Abuse. Director’s report to the National Advisory Council on Drug Abuse. Washington, DC: Department of Health and Human Services; 2002 Sep.
  21. National Institute on Drug Abuse. Request for applications for group therapy for individuals in drug abuse and alcoholism treatment. Washington, DC: Department of Health and Human Services; 2003. (RFA-DA-04-008).
  22. Patock-Peckham JA, Morgan-Lopez AA. College drinking behaviors: Mediational links between parenting styles, impulse control, drinking control, and alcohol use and problems. Psychology of Addictive Behaviors. (in press). doi: 10.1037/0893-164X.20.2.117.
  23. Price RH, Burke AC, D’Aunno TA, Klingel DM, McCaughrin WC, Rafferty JA, Vaughn TE. Outpatient drug abuse treatment services, 1988: Results of a national survey. In: Pickens RW, Leukefield CG, Schuster CR, editors. Improving drug abuse treatment. Rockville, MD: National Institute on Drug Abuse; 1991. pp. 63–92.
  24. Rasbash J, Browne WJ. Modelling non-hierarchical structures. In: Leyland AH, Goldstein H, editors. Multilevel modelling of health statistics. Chichester: John Wiley & Sons; 2001. pp. 93–105.
  25. Raudenbush SW. A crossed random effects model for unbalanced data with applications in cross sectional and longitudinal research. Journal of Educational Statistics. 1993;18:321–349.
  26. Raudenbush SW. A cross random effects model for studying social context effects on individual growth. Multilevel Modelling Newsletter. 1994;6:2–6.
  27. Robinson WS. Ecological correlations and the behavior of individuals. American Sociological Review. 1950;15:351–357.
  28. Roy J. Modeling longitudinal data with nonignorable dropouts using a latent dropout class model. Biometrics. 2003;59(4):829–836. doi: 10.1111/j.0006-341x.2003.00097.x.
  29. Snijders TAB, Bosker RJ. Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage Publications; 1999.
  30. Stinchfield RD, Owen PL, Winters KC. Group therapy for substance abuse: A review of the empirical literature. In: Fuhriman A, Burlingame GM, editors. Handbook of group psychotherapy: An empirical and clinical synthesis. New York: John Wiley and Sons; 1994. p. 459.
  31. Weiss RD, Jaffe WB, de Menil VP, Cogley CB. Group therapy for substance use disorders: What do we know? Harvard Review of Psychiatry. 2004;12:339–350. doi: 10.1080/10673220490905723.
  32. Yalom I. The theory and practice of group psychotherapy. 4th ed. New York: Basic Books; 1995.
