Author manuscript; available in PMC: 2015 Oct 1.
Published in final edited form as: J Consult Clin Psychol. 2014 Feb 3;82(5):920–930. doi: 10.1037/a0035628

Analyzing Multiple Outcomes in Clinical Research Using Multivariate Multilevel Models

Scott A Baldwin 1, Zac E Imel 2, Scott R Braithwaite 3, David C Atkins 4
PMCID: PMC4119868  NIHMSID: NIHMS557790  PMID: 24491071

Abstract

Objective

Multilevel models have become a standard data analysis approach in intervention research. Although the vast majority of intervention studies involve multiple outcome measures, few studies use multivariate analysis methods. The authors discuss multivariate extensions to the multilevel model that can be used by psychotherapy researchers.

Method and Results

Using simulated longitudinal treatment data, the authors show how multivariate models extend common univariate growth models and how the multivariate model can be used to examine multivariate hypotheses involving fixed effects (e.g., does the size of the treatment effect differ across outcomes?) and random effects (e.g., is change in one outcome related to change in the other?). An online supplemental appendix provides annotated computer code and simulated example data for implementing a multivariate model.

Conclusions

Multivariate multilevel models are flexible, powerful models that can enhance clinical research.

Keywords: intervention data, multilevel, multivariate


Multilevel (mixed) models (Hox, 2010; Raudenbush & Bryk, 2002) have become a standard method for analyzing psychotherapy outcome data given the hierarchical structure of psychotherapy data, for example: (a) observations (level-1) clustered within persons (level-2) in longitudinal data (Singer & Willet, 2003), (b) patients (level-1) clustered within therapists or groups (level-2; Crits-Christoph & Mintz, 1991; Wampold & Serlin, 2000), and (c) effect sizes (level-1) clustered within studies (level-2) in meta-analysis (Hox, 2010). Moreover, it is not unusual to have additional grouping factors that can lead to three or more levels within psychotherapy data (e.g., repeated measures on individuals nested within couples; Atkins, 2005).

The hierarchical structure of psychotherapy data is important for both substantive and methodological reasons. Substantively, we are often interested in variability among higher-level factors, such as person-to-person variability or therapist-to-therapist variability (Baldwin & Imel, 2013; Crits-Christoph et al., 1991; Imel, Baldwin, Bonus, & Maccoon, 2008; Saxon & Barkham, 2012). Methodologically, the hierarchical structure leads to correlations among the observations within a cluster. For example, in group therapy, data from patients within the same group are likely to be correlated relative to data from patients in other groups. This correlation violates the assumption of independence of observations common to most statistical tests, which can lead to biased p-values, incorrect confidence intervals, and inflated effect sizes (Baldwin, Murray, & Shadish, 2005; Crits-Christoph & Mintz, 1991; Wampold & Serlin, 2000). Multilevel models accommodate the correlation among observations by modeling between-cluster variability via additional error terms called random effects (Raudenbush & Bryk, 2002; Singer & Willet, 2003) and are also more flexible regarding ignorable missing data and the correlation structure of the residuals than earlier methods (e.g., ANOVA; Singer & Willet, 2003). In addition, multilevel models can accommodate longitudinal data where participants are measured on different schedules (Hox, 2010), treat time as categorical, continuous, or some combination (Singer & Willet, 2003), be extended to situations where clustering affects some participants but not others (Baldwin, Bauer, Stice, & Rohde, 2011; Bauer, Sterba, & Hallfors, 2008), and accommodate non-normal outcomes (Rabe-Hesketh & Skrondal, 2008). Overall, multilevel models represent highly useful statistical tools for psychotherapy researchers.

However, one particular class of multilevel models has not been widely adopted within the psychotherapy research community: multivariate multilevel models, which extend multilevel models to two or more outcomes (Hox, 2010; MacCallum, Kim, Malarkey, & Kiecolt-Glaser, 1997).1 It is unusual to find psychotherapy research studies that involve only a single outcome variable. Nevertheless, few researchers employ multivariate techniques when evaluating multiple outcomes, especially when multilevel models are used to analyze those outcomes. As we will show, this omission results in a failure to test important theoretical questions that are best examined in the multivariate, multilevel context.

Historically, psychotherapy researchers regularly used multivariate data analysis methods, such as MANOVA, to control the experiment-wise Type I error rate. For example, a common strategy was to test for a treatment effect across outcomes using MANOVA and then follow up with a series of ANOVAs, one for each outcome. Thus, even when MANOVA was used, the fundamental focus of the analysis was on univariate hypotheses. However, multivariate models can be used to address multivariate hypotheses. Examples of multivariate hypotheses include testing whether outcomes have different average rates of change (Kaysen et al., 2011) or whether change in one outcome is related to change in another (Suvak, Walling, Iverson, Taft, & Resick, 2009).

The present paper introduces multivariate multilevel models for intervention research and illustrates how to fit and interpret the models. The example data throughout focus on relationships between primary and secondary outcomes in a randomized trial. We examine whether predictor variables (e.g., treatment condition) have different relationships across outcomes and model relationships between outcomes (e.g., how is change in one outcome related to change in another?). Although a brief overview of univariate multilevel models is included as a bridge to multivariate models, we assume that readers have a basic familiarity with univariate multilevel models. Readers desiring a more in-depth discussion of univariate models can consult a number of excellent textbooks (Gelman & Hill, 2007; Raudenbush & Bryk, 2002; Singer & Willet, 2003). We have included an online appendix that provides annotated code for fitting these models in Stata, SAS, SPSS, R, and Mplus.

Differential Fixed Effects Across Outcomes

Most intervention studies include multiple outcomes. For example, a study comparing cognitive-behavioral therapy and acceptance and commitment therapy for depression may include depression measures as well as quality of life measures, which could be classified as primary (depression) or secondary (quality of life) outcomes. Researchers may be interested in whether treatment effects differ by outcome type for both clinical and theoretical reasons. Clinically, patients presenting for treatment of depression may be most keenly interested in the impact of treatment on their depressive symptoms, though improvement in all areas of their life would likely be welcome. Further, a cognitive-behavioral therapy that explicitly targets depression symptoms might be expected to have an earlier and potentially larger impact on these symptoms (i.e., quality of life might improve as a result of decreases in depression). Theoretically, some have argued that larger treatment effects for primary outcomes are consistent with the hypothesis that factors specific to a treatment package, rather than factors common across treatment packages, are partially responsible for change (Hofmann & Lohr, 2010).

Most researchers do not directly test for differential treatment effects across outcomes, yet they often interpret their results as if they had tested for differential effects. Indeed, in our review work, we often find researchers invoking the “eyeball test” in such situations, noting simply that effects for primary outcomes were “larger” than those for secondary outcomes. Thus, results sections from clinical research usually involve using multilevel models, or a similar analysis technique, to examine intervention effects one outcome at a time. Even if the outcomes are grouped in primary and secondary categories, such grouping is usually for the purposes of the written report rather than incorporated directly into the data analysis. The results are then summarized with respect to which outcomes had significant treatment effects and which did not. For example, McDonagh et al. (2005) randomized participants meeting criteria for posttraumatic stress disorder (PTSD) to cognitive-behavioral therapy (CBT), present centered therapy (PCT), and wait-list (WL). In the analysis, the authors examined treatment effects across each outcome (using the group×time interaction in a repeated measures ANOVA and post-hoc tests). The authors found a statistically significant intervention effect for PTSD symptoms but not for depression. The authors interpret the statistical significance for one outcome but not the other as important: “The fact that this treatment [PCT] had more of an impact on PTSD symptoms than on depressive symptoms suggests its mechanism is not simply an antidepressant effect such as has already been demonstrated for problem-solving therapy” (p. 522).

However, comparisons of intervention effects require a statistical test of the difference (cf. Nieuwenhuis, Forstmann, & Wagenmakers, 2011). In other words, we need an explicit test of whether the size of the intervention effect depends upon the outcome type—we need to test the intervention effect by outcome interaction. The idea is similar to moderator analyses in psychotherapy meta-analyses, in which one tests whether an effect size (i.e., treatment effect) varies as a function of study characteristics.

To get a sense of how often researchers analyze multiple outcomes and do not investigate whether treatment effects vary across outcomes, we reviewed randomized trials that used multilevel models published in the Journal of Consulting and Clinical Psychology from 2009 to 2012. We coded whether the study included multiple outcomes and whether the authors used multivariate methods to investigate differential treatment effects. We identified 60 randomized trials that used multilevel modeling to estimate treatment effects for multiple outcomes. Of these 60, one tested for differential treatment effects across outcomes (Jouriles et al., 2009), suggesting a profound mismatch between study design—with multiple outcomes—and study analysis.

Extending the Univariate Multilevel Model

Testing multivariate hypotheses about fixed effects can be accomplished by extending multilevel models to accommodate two or more outcomes. To illustrate the extension, we simulated data to mimic a clinical trial comparing cognitive-behavioral therapy (CBT) to a no-treatment control for the treatment of depression, three timepoints (baseline, midtreatment, posttreatment), 100 participants (50 per condition), and two outcomes—depression and quality of life. We coded time as 0, 1, and 2, with 0 representing the baseline timepoint. We also coded treatment condition (Tx) as 1 for CBT and 0 for control. In the population model, the treatment effect for depression was a .5 standard deviation difference at posttreatment (time 2) between CBT and control and there was no treatment effect for quality of life.

The univariate growth-curve models for each outcome can be written as follows (Singer & Willet, 2003):

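$$y_{1ij} = \beta_{10} + \beta_{11}\,\mathrm{Time}_{ij} + \beta_{12}\,\mathrm{Tx}_{j} + \beta_{13}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_{j} + u_{1j} + v_{1j}\,\mathrm{Time}_{ij} + e_{1ij}, \tag{1}$$

$$y_{2ij} = \beta_{20} + \beta_{21}\,\mathrm{Time}_{ij} + \beta_{22}\,\mathrm{Tx}_{j} + \beta_{23}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_{j} + u_{2j} + v_{2j}\,\mathrm{Time}_{ij} + e_{2ij}. \tag{2}$$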

Focusing on Equation (1), $y_{1ij}$ is the depression outcome at time $i$ for person $j$. $\beta_{10}$ is the overall intercept and, like all intercepts, it represents the expected (i.e., mean) value of the outcome when all predictors are equal to 0, in this case when $\mathrm{Time}_{ij} = 0$ (baseline) and $\mathrm{Tx}_{j} = 0$ (control). Changing the coding of time or treatment condition, or including other variables in the model, will alter the specific interpretation of the intercept (see Singer and Willet, 2003, for a discussion of alternative methods for coding time). $\beta_{11}$ is the average rate of change in depression symptoms during treatment for the control condition, $\beta_{12}$ is the mean difference between CBT and control at baseline, $\beta_{13}$ is the difference in rate of change between CBT and control (i.e., the treatment effect), $u_{1j}$ is a random effect representing person-specific differences at baseline (i.e., unique baseline values for each participant), $v_{1j}$ is a random effect representing person-specific differences in change during treatment (i.e., a unique rate of change for each participant), and $e_{1ij}$ is residual error. The parameters in Equation (2) have identical interpretations except that they pertain to quality of life.
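The interpretation of the fixed effects in Equation (1) can be checked numerically. The following is a minimal sketch, assuming illustrative coefficient values that are not estimates from the paper; the function name `expected_outcome` and the dictionary `coefs` are hypothetical.

```python
def expected_outcome(b0, b1, b2, b3, time, tx):
    """Fixed part of Equation (1): b0 + b1*Time + b2*Tx + b3*Time*Tx."""
    return b0 + b1 * time + b2 * tx + b3 * time * tx


# Illustrative coefficients only (hypothetical, not estimates from the paper).
coefs = dict(b0=0.0, b1=0.05, b2=0.10, b3=-0.25)

# Model-implied means at baseline, midtreatment, and posttreatment.
for tx, label in [(0, "control"), (1, "CBT")]:
    means = [expected_outcome(time=t, tx=tx, **coefs) for t in (0, 1, 2)]
    print(label, [round(m, 2) for m in means])

# Rate of change per unit of time within each condition.
slope_control = (expected_outcome(time=1, tx=0, **coefs)
                 - expected_outcome(time=0, tx=0, **coefs))
slope_cbt = (expected_outcome(time=1, tx=1, **coefs)
             - expected_outcome(time=0, tx=1, **coefs))

# The difference in slopes recovers b3, the time-by-treatment interaction.
slope_difference = slope_cbt - slope_control

# The difference in expected values at baseline recovers b2.
baseline_difference = (expected_outcome(time=0, tx=1, **coefs)
                       - expected_outcome(time=0, tx=0, **coefs))
```

Under this coding, the control slope equals `b1`, the baseline group difference equals `b2`, and the slope difference equals `b3`, which matches the verbal interpretation of the coefficients.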

Because the data in our example are longitudinal, the repeated observations within an individual are correlated. The random effects described above allow us to accommodate this correlation. Specifically, the models in Equations (1) and (2) assume that observations are independent conditional on the random effects (i.e., uncorrelated once the random effects are taken into account) and that the random effects are normally distributed (Singer & Willet, 2003):

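$$\begin{bmatrix} u_{1j} \\ v_{1j} \end{bmatrix} \sim \mathrm{MVN}\left(\mathbf{0}, \begin{bmatrix} \sigma^2_{u_1} & \\ \sigma_{u_1 v_1} & \sigma^2_{v_1} \end{bmatrix}\right), \tag{3}$$

$$\begin{bmatrix} u_{2j} \\ v_{2j} \end{bmatrix} \sim \mathrm{MVN}\left(\mathbf{0}, \begin{bmatrix} \sigma^2_{u_2} & \\ \sigma_{u_2 v_2} & \sigma^2_{v_2} \end{bmatrix}\right). \tag{4}$$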

Focusing on Equation (3), the random effects for depression, $u_{1j}$ and $v_{1j}$, come from a bivariate normal distribution with means of 0, variances of $\sigma^2_{u_1}$ and $\sigma^2_{v_1}$, and a covariance of $\sigma_{u_1 v_1}$. The parameters in Equation (4) have identical interpretations except that they pertain to quality of life. It may not be obvious at first glance, but Equations (3) and (4) encapsulate one of the advantages of multilevel models. The models shown in Equations (1) and (2) capture the person-to-person variability in intercepts and slopes via random effects, and they do so by assuming that these intercepts and slopes come from a normal distribution of intercepts and slopes. Thus, as opposed to estimating 100 distinct intercepts and 100 distinct slopes to capture between-person heterogeneity, multilevel models estimate only two variances and a covariance.2 The residual errors are also normally distributed, with unique residual variances for each outcome:

$$e_{1ij} \sim N\left(0, \sigma^2_{e_1}\right), \tag{5}$$

$$e_{2ij} \sim N\left(0, \sigma^2_{e_2}\right). \tag{6}$$

By fitting the two independent models in Equations (1) through (6), we implicitly assume that depression and quality of life are independent. This assumption is untenable because two outcomes from the same participant are almost certainly related. For example, change in depression is likely correlated with change in quality of life; thus, we should estimate the covariance between the random slopes ($v_{1j}$ and $v_{2j}$). Further, because the outcomes are repeated measures on the same person, there will likely be a relationship between the residual errors (Fieuws & Verbeke, 2004). As discussed below, ignoring these relationships across outcomes will cause problems for tests of the difference between treatment effects.

A multivariate model can address these limitations because it allows us to model the relationships between the variables via correlations among the random effects and among the residuals (Hox, 2010; MacCallum et al., 1997). To help bridge the univariate and fully multivariate models, we describe a form of the multivariate multilevel model that makes the same assumptions as the two univariate models described above and is identical to them except that both outcomes are modeled simultaneously rather than sequentially. We then show how the assumptions inherent to the univariate models can be relaxed so that a fully multivariate model can be estimated.

In the univariate models, we described the random effects as coming from two distinct multivariate normal distributions described in Equations (3) and (4). In a multivariate model—that is, a model that includes both depression and quality of life simultaneously—the random effects are drawn from a single multivariate normal distribution (Snijders & Bosker, 2012). A multivariate normal distribution consistent with Equations (3) and (4) can be expressed as follows:

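$$\begin{bmatrix} u_{1j} \\ v_{1j} \\ u_{2j} \\ v_{2j} \end{bmatrix} \sim \mathrm{MVN}\left(\mathbf{0}, \Omega_G\right), \quad \Omega_G = \begin{bmatrix} \sigma^2_{u_1} & & & \\ \sigma_{u_1 v_1} & \sigma^2_{v_1} & & \\ 0 & 0 & \sigma^2_{u_2} & \\ 0 & 0 & \sigma_{u_2 v_2} & \sigma^2_{v_2} \end{bmatrix}. \tag{7}$$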

The covariance matrix for the random effects in Equation (7), $\Omega_G$, contains the same four variance components and the same two covariances as in Equations (3) and (4). However, Equation (7) makes explicit that we assume no relationship among the random effects across outcomes by constraining the between-outcome covariances to 0. For example, the covariance in the third row and first column is fixed to 0 and represents the covariance between the random intercept for depression and the random intercept for quality of life. As with the random effects, the multivariate model assumes the residuals come from a single multivariate normal distribution rather than two univariate normal distributions (Snijders & Bosker, 2012). A multivariate normal representation of Equations (5) and (6) is:

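$$\begin{bmatrix} e_{1ij} \\ e_{2ij} \end{bmatrix} \sim \mathrm{MVN}\left(\mathbf{0}, \Omega_R\right), \quad \Omega_R = \begin{bmatrix} \sigma^2_{e_1} & \\ 0 & \sigma^2_{e_2} \end{bmatrix}. \tag{8}$$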

We refer to a multivariate model that estimates the growth models in Equations (1) and (2) and the covariance matrices described in Equations (7) and (8) as the multivariate independent outcomes model. The independent outcomes model is simply a multivariate version of our earlier univariate models, and we use it as a baseline against which to compare multivariate models that allow a relationship between the outcomes.

As noted previously, the problem with the independent outcomes model is that it assumes that depression and quality of life are unrelated. If we are interested in testing parameters that are fundamentally multivariate, the independence model will lead to problems. For example, if our goal was to understand whether CBT had a stronger effect relative to no treatment on depression than on quality of life (i.e., a parameter that involves multiple outcomes), then the independence model will produce an incorrect hypothesis test and confidence interval, as we show below.

A multivariate model that estimates the relationships between random effects across outcomes and between residuals across outcomes will produce appropriate estimates and standard errors for multivariate parameters. We call this type of model a multivariate related outcomes model. In the related outcomes model, all covariances among the random effects are estimated:

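$$\begin{bmatrix} u_{1j} \\ v_{1j} \\ u_{2j} \\ v_{2j} \end{bmatrix} \sim \mathrm{MVN}\left(\mathbf{0}, \Omega_G\right), \quad \Omega_G = \begin{bmatrix} \sigma^2_{u_1} & & & \\ \sigma_{u_1 v_1} & \sigma^2_{v_1} & & \\ \sigma_{u_1 u_2} & \sigma_{v_1 u_2} & \sigma^2_{u_2} & \\ \sigma_{u_1 v_2} & \sigma_{v_1 v_2} & \sigma_{u_2 v_2} & \sigma^2_{v_2} \end{bmatrix}. \tag{9}$$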

Thus, we now estimate the covariance between any combination of random intercepts and slopes; we no longer constrain the between-outcome covariances to 0 as we did in Equation (7). For example, the covariance between the random slopes, which describes the covariance between the rate of change in depression and the rate of change in quality of life, is $\sigma_{v_1 v_2}$. The related outcomes model also estimates a covariance among the residual errors rather than constraining the covariance to 0 as we did in Equation (8):

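$$\begin{bmatrix} e_{1ij} \\ e_{2ij} \end{bmatrix} \sim \mathrm{MVN}\left(\mathbf{0}, \Omega_R\right), \quad \Omega_R = \begin{bmatrix} \sigma^2_{e_1} & \\ \sigma_{e_1 e_2} & \sigma^2_{e_2} \end{bmatrix}. \tag{10}$$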

The independent outcomes and related outcomes models differ only in the covariances among the random effects across outcomes and the covariance among the residuals. These covariances can be important because they (a) can be interpreted substantively (e.g., what is the correlation between intercepts and slopes across the two outcomes?) and (b) impact statistical tests for multivariate parameters.
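To make the substantive interpretation concrete, covariances can be rescaled to correlations. The sketch below assembles $\Omega_G$ from the rounded related outcomes estimates reported in Table 2 and converts it to a correlation matrix; it also forms the independent outcomes version by zeroing the between-outcome entries, as in Equation (7). The helper names (`cov`, `to_correlation`) are hypothetical, and because the inputs are rounded, the resulting correlations are approximate.

```python
import math

# Related outcomes estimates as reported in Table 2 (rounded values).
names = ["u1", "v1", "u2", "v2"]
variances = {"u1": 0.49, "v1": 0.04, "u2": 0.73, "v2": 0.14}
covariances = {
    ("u1", "v1"): 0.004, ("u2", "v2"): -0.14,  # within outcome
    ("u1", "u2"): 0.16, ("u1", "v2"): -0.04,   # between outcomes
    ("v1", "u2"): -0.04, ("v1", "v2"): 0.06,   # between outcomes
}


def cov(a, b):
    """Look up a variance or covariance regardless of the order of a and b."""
    if a == b:
        return variances[a]
    return covariances.get((a, b), covariances.get((b, a)))


def to_correlation(matrix):
    """Rescale a covariance matrix to a correlation matrix."""
    n = len(matrix)
    sd = [math.sqrt(matrix[i][i]) for i in range(n)]
    return [[matrix[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Full covariance matrix of the random effects, as in Equation (9).
omega_g = [[cov(a, b) for b in names] for a in names]
corr_g = to_correlation(omega_g)

# Independent outcomes version, as in Equation (7): between-outcome entries are 0.
outcome_of = {"u1": 1, "v1": 1, "u2": 2, "v2": 2}
omega_g_independent = [
    [omega_g[i][j] if outcome_of[names[i]] == outcome_of[names[j]] else 0.0
     for j in range(4)]
    for i in range(4)
]

# Correlation between the random slopes (change in depression vs. quality of life).
slope_corr = corr_g[1][3]

# Residual correlation from the Table 2 residual variances and covariance.
residual_corr = 0.15 / math.sqrt(0.39 * 0.30)
```

With these rounded inputs, the correlation between the random slopes is roughly .80 and the residual correlation is roughly .44.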

Data Set-up for Multivariate Models

Before turning to the application and results of the multivariate multilevel models, we briefly comment on data set-up. Multilevel modeling software developed from a structural equation model framework (e.g., Mplus) will typically expect the data to be in the wide format shown in Table 1. Consequently, multiple equations much like those represented above in Equations (1), (2), (9), and (10) can be specified. However, for most multilevel software packages (e.g., xtmixed in Stata, PROC MIXED in SAS, lme4 in R), we must combine Equations (1) and (2) into a single equation using a set of indicator variables to define which observations go with depression and which go with quality of life. The long dataset in Table 1 provides an example of how the data must be organized for a single-equation model. The values for depression and quality of life are combined into a single outcome variable, $y_{hij}$, where $h$ indexes the outcome measure. We also create two indicator variables, $d_j$ and $q_j$, where $d_j = 1$ for depression and 0 for quality of life, and $q_j = 1$ for quality of life and 0 for depression. (Connecting the symbols to the actual column names in Table 1: $y_{hij}$ = Y, $d_j$ = Depression, and $q_j$ = Quality of Life.) We then combine Equations (1) and (2) with the indicator variables to produce a single equation:

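$$\begin{aligned} y_{hij} = {} & \beta_{10} d_j + \beta_{20} q_j + \beta_{11}\,\mathrm{Time}_{ij}\, d_j + \beta_{21}\,\mathrm{Time}_{ij}\, q_j + \beta_{12}\,\mathrm{Tx}_{j}\, d_j + \beta_{22}\,\mathrm{Tx}_{j}\, q_j \\ & + \beta_{13}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_{j}\, d_j + \beta_{23}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_{j}\, q_j + u_{1j} d_j + u_{2j} q_j \\ & + v_{1j}\,\mathrm{Time}_{ij}\, d_j + v_{2j}\,\mathrm{Time}_{ij}\, q_j + e_{1ij} d_j + e_{2ij} q_j. \end{aligned} \tag{11}$$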

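Table 1.

Wide and Long Datasets for Multivariate Multilevel Data.

Wide

| ID | Time | Depression | Quality of Life |
|----|------|------------|-----------------|
| 1 | 0 | −1.37 | −0.58 |
| 1 | 1 | −1.31 | 0.24 |
| 1 | 2 | −0.52 | 0.01 |
| 2 | 0 | 0.05 | 2.00 |
| 2 | 1 | −0.84 | 1.83 |
| 2 | 2 | −1.86 | 0.31 |

Long

| ID | Time | Y | Depression | Quality of Life |
|----|------|---|------------|-----------------|
| 1 | 0 | −1.37 | 1 | 0 |
| 1 | 1 | −1.31 | 1 | 0 |
| 1 | 2 | −0.52 | 1 | 0 |
| 1 | 0 | −0.58 | 0 | 1 |
| 1 | 1 | 0.24 | 0 | 1 |
| 1 | 2 | 0.01 | 0 | 1 |
| 2 | 0 | 0.05 | 1 | 0 |
| 2 | 1 | −0.84 | 1 | 0 |
| 2 | 2 | −1.86 | 1 | 0 |
| 2 | 0 | 2.00 | 0 | 1 |
| 2 | 1 | 1.83 | 0 | 1 |
| 2 | 2 | 0.31 | 0 | 1 |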

In Equation (11), the random effects ($u_{1j}$, $u_{2j}$, $v_{1j}$, and $v_{2j}$) and residual errors ($e_{1ij}$ and $e_{2ij}$) are distributed as in Equations (9) and (10), respectively.
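The restructuring from the wide to the long layout in Table 1 can be sketched in a few lines. This is an illustration in plain Python rather than the appendix code; the function name `wide_to_long` is hypothetical, and the keys `d` and `q` stand for the Depression and Quality of Life indicator columns of the long dataset.

```python
# Wide rows from Table 1: (ID, Time, Depression, Quality of Life).
wide = [
    (1, 0, -1.37, -0.58),
    (1, 1, -1.31, 0.24),
    (1, 2, -0.52, 0.01),
    (2, 0, 0.05, 2.00),
    (2, 1, -0.84, 1.83),
    (2, 2, -1.86, 0.31),
]


def wide_to_long(rows):
    """Stack both outcomes into one column Y and add indicators d and q."""
    long_rows = []
    ids = list(dict.fromkeys(r[0] for r in rows))  # unique IDs, original order
    for pid in ids:
        person = [r for r in rows if r[0] == pid]
        # Depression rows first (d = 1, q = 0), as in Table 1.
        for _, time, dep, _ in person:
            long_rows.append({"ID": pid, "Time": time, "Y": dep, "d": 1, "q": 0})
        # Quality of life rows next (d = 0, q = 1).
        for _, time, _, qol in person:
            long_rows.append({"ID": pid, "Time": time, "Y": qol, "d": 0, "q": 1})
    return long_rows


long_data = wide_to_long(wide)
for row in long_data:
    print(row)
```

Each wide row contributes two long rows, so the six wide rows above yield the twelve long rows shown in Table 1.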

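We can verify that Equation (11) is identical to Equations (1) and (2) by examining the value of $y_{hij}$ when plugging in the appropriate values of $d_j$ and $q_j$ for depression and quality of life. The value of $y_{hij}$ for the depression outcome is:

$$\left(y_{hij} \mid d_j = 1, q_j = 0\right) = \beta_{10} + \beta_{11}\,\mathrm{Time}_{ij} + \beta_{12}\,\mathrm{Tx}_{j} + \beta_{13}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_{j} + u_{1j} + v_{1j}\,\mathrm{Time}_{ij} + e_{1ij}. \tag{12}$$

Likewise, the value of $y_{hij}$ for quality of life is:

$$\left(y_{hij} \mid d_j = 0, q_j = 1\right) = \beta_{20} + \beta_{21}\,\mathrm{Time}_{ij} + \beta_{22}\,\mathrm{Tx}_{j} + \beta_{23}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_{j} + u_{2j} + v_{2j}\,\mathrm{Time}_{ij} + e_{2ij}. \tag{13}$$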
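As a rough illustration of the data-generating process implied by Equations (9) through (13), the sketch below simulates long-format data with correlated random effects and residuals. All population values (`OMEGA_G`, `OMEGA_R`, `BETA`) are assumptions chosen for illustration; the paper does not report the values used for its simulated data beyond the treatment effects. With the assumed variances, a slope difference of −0.25 per time unit corresponds only approximately to a .5 standard deviation difference at posttreatment.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def draw_mvn(lower, rng):
    """One draw from MVN(0, L L') given the Cholesky factor L."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(lower))]


# ASSUMED population values (hypothetical; not taken from the paper).
OMEGA_G = [  # random effects, order: u1, v1, u2, v2
    [0.50, 0.00, 0.15, 0.00],
    [0.00, 0.05, 0.00, 0.03],
    [0.15, 0.00, 0.50, 0.00],
    [0.00, 0.03, 0.00, 0.05],
]
OMEGA_R = [[0.40, 0.15], [0.15, 0.40]]  # residuals, order: e1, e2
BETA = {  # (intercept, time, treatment, time x treatment) for each outcome
    "dep": (0.0, 0.0, 0.0, -0.25),
    "qol": (0.0, 0.0, 0.0, 0.0),
}


def simulate(n_per_arm=50, times=(0, 1, 2), seed=1):
    """Generate long-format rows; each row follows Equation (12) or (13)."""
    rng = random.Random(seed)
    chol_g, chol_r = cholesky(OMEGA_G), cholesky(OMEGA_R)
    rows = []
    for person in range(2 * n_per_arm):
        tx = 1 if person < n_per_arm else 0
        u1, v1, u2, v2 = draw_mvn(chol_g, rng)  # person-level random effects
        for time in times:
            e1, e2 = draw_mvn(chol_r, rng)      # occasion-level residuals
            for d, q, key, u, v, e in ((1, 0, "dep", u1, v1, e1),
                                       (0, 1, "qol", u2, v2, e2)):
                b0, b1, b2, b3 = BETA[key]
                y = b0 + b1 * time + b2 * tx + b3 * time * tx + u + v * time + e
                rows.append({"id": person + 1, "time": time, "tx": tx,
                             "d": d, "q": q, "y": y})
    return rows


data = simulate()
```

With 100 participants, three timepoints, and two outcomes, the simulated dataset has 600 rows, half of them flagged as depression observations and half as quality of life observations.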

Differential Treatment Effects

Using the simulated treatment data, we examined the differential treatment effects of CBT versus control on depression and quality of life. The online appendix provides the data and annotated syntax for estimating these models in Stata, SPSS, SAS, Mplus, and R. We report the output from xtmixed in Stata using maximum likelihood estimation, although the software packages provide results that agree to 4 to 5 decimal places. In this section, we first compare the results and fit of the univariate, multivariate independent outcomes, and multivariate related outcomes models. Second, we show how to test for differential treatment effects using a post-analysis contrast and how the standard error for this contrast differs across the multivariate independent outcomes and multivariate related outcomes models. Third, we describe two methods in addition to the post-analysis contrast for testing differential treatment effects. Finally, testing differential treatment effects across outcomes requires that the outcomes be on the same metric. We discuss why outcomes need to be on the same metric and how one can use standardized scores to test differential effects when outcomes are on different metrics.

Table 2 provides the estimates and standard errors for the fixed effects, the variance and covariance estimates for the random effects and residuals, and the deviance for each model. Table 2 makes clear that the independent outcomes model is identical to the two univariate models put together. The independent outcomes model estimates all the parameters in the two univariate models. Furthermore, the deviance, a measure of model fit (Singer & Willet, 2003), of the independent outcomes model is the sum of the deviance of the two univariate models. Thus, we can use the independent outcomes model as a comparison to investigate whether the multivariate related outcomes model improves upon the univariate models typically used in clinical research.

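Table 2.

Results of Univariate and Multivariate Models for Depression and Quality of Life Outcomes.

| Parameter | Symbol | Univariate: Depression | Univariate: Quality of Life | Multivariate: Independent Outcomes | Multivariate: Related Outcomes |
|---|---|---|---|---|---|
| **Depression** | | | | | |
| Intercept | $\beta_{10}$ | −.08 (.13) | -- | −.08 (.13) | −.08 (.13) |
| Time | $\beta_{11}$ | .06 (.07) | -- | .06 (.07) | .06 (.07) |
| Treatment | $\beta_{12}$ | .16 (.18) | -- | .16 (.18) | .16 (.18) |
| Time × Treatment | $\beta_{13}$ | −.41* (.10) | -- | −.41* (.10) | −.41* (.10) |
| **Quality of Life** | | | | | |
| Intercept | $\beta_{20}$ | -- | −.08 (.14) | −.08 (.14) | −.08 (.14) |
| Time | $\beta_{21}$ | -- | .002 (.08) | .002 (.08) | .002 (.08) |
| Treatment | $\beta_{22}$ | -- | .08 (.20) | .08 (.20) | .08 (.20) |
| Time × Treatment | $\beta_{23}$ | -- | −.09 (.14) | −.09 (.14) | −.09 (.14) |
| **Depression** | | | | | |
| $u_{1j}$ | $\sigma^2_{u_1}$ | .49 | -- | .49 | .49 |
| $v_{1j}$ | $\sigma^2_{v_1}$ | .04 | -- | .04 | .04 |
| cov($u_{1j}$, $v_{1j}$) | $\sigma_{u_1 v_1}$ | .004 | -- | .004 | .004 |
| **Quality of Life** | | | | | |
| $u_{2j}$ | $\sigma^2_{u_2}$ | -- | .73 | .73 | .73 |
| $v_{2j}$ | $\sigma^2_{v_2}$ | -- | .14 | .14 | .14 |
| cov($u_{2j}$, $v_{2j}$) | $\sigma_{u_2 v_2}$ | -- | −.14 | −.14 | −.14 |
| **Between Outcomes** | | | | | |
| cov($u_{1j}$, $u_{2j}$) | $\sigma_{u_1 u_2}$ | -- | -- | -- | .16 |
| cov($u_{1j}$, $v_{2j}$) | $\sigma_{u_1 v_2}$ | -- | -- | -- | −.04 |
| cov($v_{1j}$, $u_{2j}$) | $\sigma_{v_1 u_2}$ | -- | -- | -- | −.04 |
| cov($v_{1j}$, $v_{2j}$) | $\sigma_{v_1 v_2}$ | -- | -- | -- | .06 |
| **Residuals** | | | | | |
| $e_{1ij}$ | $\sigma^2_{e_1}$ | .39 | -- | .39 | .39 |
| $e_{2ij}$ | $\sigma^2_{e_2}$ | -- | .30 | .30 | .30 |
| cov($e_{1ij}$, $e_{2ij}$) | $\sigma_{e_1 e_2}$ | -- | -- | -- | .15 |
| Deviance | | 752.268 | 749.363 | 1501.631 | 1444.798 |

Note. Standard errors are in parentheses. * p < .05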

Because the independent outcomes model is nested within the related outcomes model, we can test whether the addition of the covariances among outcomes in the random effects and residuals significantly improves fit using a likelihood ratio test. The independent outcomes model is considered nested within the related outcomes model because the independent outcomes model (a) uses the same data as the related outcomes model and (b) is a constrained version of the related outcomes model (i.e., the between-outcome covariances are constrained to zero). The likelihood ratio test compares the difference between the deviances of the models (1501.63 − 1444.80 = 56.83; smaller deviance indicates better fit) to a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of parameters between models (21 − 16 = 5). There are five additional parameters because the related outcomes model includes four additional covariances among the random intercepts and slopes and also allows the residual errors to be correlated. The likelihood ratio test equals $\chi^2(5) = 56.83$, $p < .001$, which indicates that the related outcomes model fits the data better than the independent outcomes model. This test is also a joint test of the significance of the additional covariance parameters in the related outcomes model.
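The arithmetic of this likelihood ratio test can be reproduced from the deviances in Table 2. The sketch below is an illustration, not the appendix code: `chi2_sf` is a hypothetical helper that computes the upper-tail chi-square probability for integer degrees of freedom using only the standard library, and `n_parameters` simply counts the parameters described in the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, x/2)
    # Recurrence Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def n_parameters(related):
    """Count parameters in the independent or related outcomes model."""
    fixed = 8                       # four fixed effects per outcome
    within = 2 * 3                  # per outcome: 2 variances + 1 covariance
    residual_variances = 2
    between = (4 + 1) if related else 0  # 4 random-effect covs + 1 residual cov
    return fixed + within + residual_variances + between


# Deviances as reported in Table 2.
deviance_independent = 1501.631
deviance_related = 1444.798

lr_statistic = deviance_independent - deviance_related
df = n_parameters(related=True) - n_parameters(related=False)
p_value = chi2_sf(lr_statistic, df)

print(f"LR chi-square({df}) = {lr_statistic:.2f}, p = {p_value:.2e}")
```

The same helper can be reused for the one-degree-of-freedom comparisons of nested fixed-effects models.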

Although the related outcomes model has the best fit, Table 2 indicates that the parameter estimates do not change from the independent outcomes model to the fully multivariate model. Thus, it is reasonable to ask why we should bother with the multivariate model. All parameter estimates in Table 2, except for the covariances among the random effects between outcomes and the covariance among residuals between outcomes, are univariate parameters: only data from one of the outcomes contribute to each estimate. Consequently, the estimates and standard errors, for both the fixed effects and the variances and covariances, are identical across models. In the univariate and independent outcomes models, some parameters are not estimated and are thus assumed to be zero, but that does not affect the estimates of the univariate parameters. However, when we consider multivariate parameters, the differences between the independent outcomes and related outcomes models are important.

The additional covariances will affect tests of whether the treatment effects differ across outcomes. First, consider the univariate treatment effects for depression and quality of life, which are the two time×treatment interaction terms, $\beta_{13}$ and $\beta_{23}$. Each coefficient describes the difference in rate of change between the intervention conditions for each outcome separately, with the null hypothesis being that there is no difference between conditions. The time×treatment interaction for depression was $\beta_{13} = -.41$ (p < .05, 95% CI = −.60, −.22), whereas the time×treatment interaction for quality of life was $\beta_{23} = -.09$ (p = .41, 95% CI = −.30, .12). Thus, we reject the null hypothesis of no treatment effect for depression but not for quality of life. We may be tempted to also conclude that the effects of treatment versus control are larger for depression than for quality of life. However, this would be incorrect because we have only tested the univariate null hypothesis that each coefficient is zero. We have not tested whether the coefficients differ from one another; that is, we have not tested the multivariate null hypothesis that $\beta_{13} - \beta_{23} = 0$.

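To test the null hypothesis that $\beta_{13} - \beta_{23} = 0$, we can use a post-analysis contrast. A common form for this test is:

$$z = \frac{\beta_{13} - \beta_{23}}{se_{\beta_{13} - \beta_{23}}}; \quad se_{\beta_{13} - \beta_{23}} = \sqrt{\sigma^2_{\beta_{13}} + \sigma^2_{\beta_{23}} - 2\sigma_{\beta_{13}\beta_{23}}}, \tag{14}$$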

where the denominator is the standard error of the difference between treatment effects, $\sigma^2_{\beta_{13}}$ and $\sigma^2_{\beta_{23}}$ are the sampling variances of the treatment effects (i.e., their expected variability across samples), and $\sigma_{\beta_{13}\beta_{23}}$ is the covariance between the treatment effects across samples.3 As we have seen, the independent outcomes and related outcomes models provide identical estimates of $\beta_{13}$ and $\beta_{23}$, so the numerator in Equation (14) will be identical across models. Thus, the key part of Equation (14) is the standard error, specifically the covariance, $\sigma_{\beta_{13}\beta_{23}}$. If depression and quality of life are correlated, then the independent outcomes model misspecifies the relationships among the random effects and among the residuals, and $\sigma_{\beta_{13}\beta_{23}}$ will incorrectly be set to 0. In contrast, the related outcomes model correctly specifies the relationships among the random effects and residuals, and $\sigma_{\beta_{13}\beta_{23}}$ will be positive. Because a positive covariance is subtracted in Equation (14), the misspecification in the independent outcomes model means its standard error will be too large, which reduces power.
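The role of the covariance term in Equation (14) can be seen in a small numerical sketch. The point estimates below are the time×treatment coefficients from Table 2, but the sampling variances and the covariance are hypothetical values chosen only to illustrate the formula, so the resulting statistics are not the paper's results. The function name `contrast_z` is also hypothetical.

```python
import math


def contrast_z(b_a, b_b, var_a, var_b, cov_ab=0.0):
    """Equation (14): z test of the difference between two coefficients."""
    se = math.sqrt(var_a + var_b - 2.0 * cov_ab)
    z = (b_a - b_b) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value, standard normal
    return se, z, p


# Point estimates from Table 2.
b13, b23 = -0.41, -0.09

# HYPOTHETICAL sampling variances and covariance, for illustration only.
var_b13, var_b23 = 0.010, 0.012
assumed_cov = 0.006

# Independent outcomes: the covariance between the estimates is fixed at 0.
se_indep, z_indep, p_indep = contrast_z(b13, b23, var_b13, var_b23, cov_ab=0.0)

# Related outcomes: a positive covariance shrinks the standard error.
se_rel, z_rel, p_rel = contrast_z(b13, b23, var_b13, var_b23, cov_ab=assumed_cov)

print(f"independent: se = {se_indep:.3f}, z = {z_indep:.2f}, p = {p_indep:.4f}")
print(f"related:     se = {se_rel:.3f}, z = {z_rel:.2f}, p = {p_rel:.4f}")
```

The numerator is the same in both calls; only the standard error changes, which is why the two models can disagree about significance even though they agree on the estimates.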

The test of differential treatment effects was significant in both the independent outcomes model ($\beta_{13} - \beta_{23} = -0.32$, se = 0.15, z = −2.23, p = 0.03) and the related outcomes model ($\beta_{13} - \beta_{23} = -0.32$, se = 0.10, z = −3.14, p = 0.002). Note, however, that the standard error in the independent outcomes model was 50% larger than in the related outcomes model because the independent outcomes model does not take into account the correlation among the outcomes. To provide a sense of how the larger standard error will impact power, we simulated 10,000 additional datasets using the population parameters described above and fit the independent and related outcomes models to each dataset. To assess power, we computed how often the test of the differential treatment effects was significant across the 10,000 datasets. Power was 0.81 for the related outcomes model and 0.68 for the independent outcomes model, a 16% relative decrease in power.

Equation (14) is not the only method for evaluating differential intervention effects. A second method is to use a likelihood ratio test to compare the fit of a model that estimates distinct treatment effects by outcome such as Equation (11) to a model that estimates a common treatment effect for outcomes. The model with a common treatment effect is identical to Equation (11) except in the second model we estimate a common time×treatment interaction (underlined):

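$$\begin{aligned} y_{hij} = {} & \beta_{10} d_j + \beta_{20} q_j + \beta_{11}\,\mathrm{Time}_{ij}\, d_j + \beta_{21}\,\mathrm{Time}_{ij}\, q_j + \beta_{12}\,\mathrm{Tx}_{j}\, d_j + \beta_{22}\,\mathrm{Tx}_{j}\, q_j \\ & + \underline{\beta_{3}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_{j}} + u_{1j} d_j + u_{2j} q_j + v_{1j}\,\mathrm{Time}_{ij}\, d_j + v_{2j}\,\mathrm{Time}_{ij}\, q_j + e_{1ij} d_j + e_{2ij} q_j. \end{aligned} \tag{15}$$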

The common time × treatment interaction is constructed by multiplying the time variable and the treatment indicator irrespective of outcome type. This is in contrast to Equation (11), where the time × treatment interaction is multiplied by the outcome indicator variables so that unique treatment effects can be estimated for each outcome. The null hypothesis for the likelihood ratio test is that there is no difference in model fit between the model with a common time×treatment interaction, Equation (15), and the model with unique time×treatment interactions, Equation (11).4 In our example, the likelihood ratio test was significant, $\chi^2(1) = 9.43$, $p < .01$, indicating that the model with unique treatment effects for each outcome type fit the data better than the model with a common treatment effect.
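The difference between the two specifications lies entirely in how the interaction columns are built from a long-format row. The following sketch constructs the fixed-effect predictor columns for Equation (11) and Equation (15); the function and column names are hypothetical and do not correspond to any particular software package.

```python
def columns_eq11(time, tx, d, q):
    """Fixed-effect columns for Equation (11): outcome-specific interactions."""
    return {
        "d": d, "q": q,
        "time_d": time * d, "time_q": time * q,
        "tx_d": tx * d, "tx_q": tx * q,
        "time_tx_d": time * tx * d, "time_tx_q": time * tx * q,
    }


def columns_eq15(time, tx, d, q):
    """Fixed-effect columns for Equation (15): one common interaction."""
    cols = columns_eq11(time, tx, d, q)
    # Drop the two outcome-specific interactions ...
    del cols["time_tx_d"], cols["time_tx_q"]
    # ... and replace them with a single product that ignores outcome type.
    cols["time_tx"] = time * tx
    return cols


# A treated participant at posttreatment (time = 2), one row per outcome.
dep_row_11 = columns_eq11(time=2, tx=1, d=1, q=0)
qol_row_11 = columns_eq11(time=2, tx=1, d=0, q=1)
dep_row_15 = columns_eq15(time=2, tx=1, d=1, q=0)
qol_row_15 = columns_eq15(time=2, tx=1, d=0, q=1)
```

Equation (11) uses eight fixed-effect columns and Equation (15) uses seven, which is the single degree of freedom in the likelihood ratio test.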

The third method for testing for differential treatment effects is to add a three-way interaction between time, treatment, and either dj or qj to Equation (15):

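\[
\begin{aligned}
y_{hij} ={}& \beta_{10} d_j + \beta_{20} q_j + \beta_{11}\,\mathrm{Time}_{ij} d_j + \beta_{21}\,\mathrm{Time}_{ij} q_j + \beta_{12}\,\mathrm{Tx}_j d_j + \beta_{22}\,\mathrm{Tx}_j q_j \\
&+ \beta_{3}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_j + \underline{\beta_{4}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_j d_j} + u_{1j} d_j + u_{2j} q_j + v_{1j}\,\mathrm{Time}_{ij} d_j + v_{2j}\,\mathrm{Time}_{ij} q_j + e_{1ij} d_j + e_{2ij} q_j.
\end{aligned}
\tag{16}
\]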

It does not matter whether we use dj or qj; however, because we used dj, β3 is interpreted as the time×treatment interaction for quality of life and β4 is the difference between the time×treatment interaction for depression and the time×treatment interaction for quality of life. Thus, the null hypothesis for β4 is that there is no difference between the time×treatment interactions across outcomes; that is, the treatment effect does not differ with respect to outcome. The significance test for β4 is identical to the post-analysis contrast of the difference between β13 and β23 from Equation (11).

Suppose that none of the tests of the differential treatment effect was significant. This would indicate that the outcomes share a common treatment effect. In that case, the model in Equation (15), which estimates a common treatment effect across outcomes, could be used in place of the model in Equation (11), which estimates unique treatment effects. Note that this simplification is justified because we formally tested whether the coefficients differed. Simply evaluating the significance of the coefficients in Equation (11) does not provide a basis for estimating a common effect.

Although we have focused on using these tests to evaluate differential treatment effects across outcomes, the three methods we described can be used to test whether any fixed effect differs across outcomes. For example, in a psychopathology study, we could use these methods to evaluate whether the average growth in one variable differs from the average growth in a second variable. Furthermore, these tests can be extended to more than two outcomes, in which case one can test whether all effects are equal.

In order for the tests of differential treatment effects, or any test of differences between fixed effects, to be interpretable, it is essential that the outcome variables be on the same metric, as they were in our example. Recall that the null hypothesis for the differential treatment effect is that β13 − β23=0. The scales of β13 and β23 are dictated by the scale of the corresponding outcome variable. If depression and quality of life are on different metrics, then the difference between the coefficients may be non-zero as a function of the metrics rather than a real difference. Consequently, the difference will be difficult, if not impossible, to interpret and the hypothesis test will be incorrect. This is precisely the same problem faced in meta-analysis. In a meta-analysis of randomized trials we aim to compare treatment effects across different measures with different scales. To solve this problem we standardize the treatment effects using effect sizes. In treatment meta-analyses, we often use Cohen’s d, which expresses mean differences in terms of standard deviation units. We recommend a similar strategy for multivariate models where the outcome variables are on different metrics and where the aim of the analysis is to test differences in fixed effects across outcomes. The standardization could take several forms, but a reasonable choice would be to create z-scores using the mean and standard deviation from all time points.
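The z-score standardization suggested above can be sketched as follows. This is a minimal Python illustration, assuming each outcome is standardized separately using the mean and sample standard deviation pooled over all persons and time points. The scores are hypothetical and the function name is illustrative.

```python
from statistics import mean, stdev

def standardize(scores):
    """Convert raw scores to z-scores using the pooled mean and sample SD."""
    m = mean(scores)
    s = stdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical raw scores pooled over all persons and time points,
# with the two outcomes measured on different metrics.
depression_raw = [28, 25, 21, 30, 24, 18, 22, 19, 15]
quality_of_life_raw = [41, 47, 52, 38, 45, 55, 50, 58, 63]

depression_z = standardize(depression_raw)
quality_of_life_z = standardize(quality_of_life_raw)
```

After this step both outcomes have a mean of 0 and a standard deviation of 1, so a difference between coefficients is expressed in standard deviation units for both outcomes.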

Examining the Relationship Between Outcomes

Up to this point we have used the multivariate multilevel model to examine whether the relationship between a predictor variable (e.g., treatment condition) and the outcome variable differed across outcomes type (e.g., depression versus quality of life)—that is, we compared the fixed effects across outcomes. We can also use the multivariate model to examine the relationship between the outcomes. Suvak et al. (2009) used multivariate multilevel models to explore the relationship between intrusion and avoidance over time in a sample of trauma survivors. For example, they examined the correlation between rate of change in intrusion symptoms and avoidance symptoms (i.e., the correlation between the random slopes) as well as the correlation between initial intrusion symptoms and rate of change in avoidance symptoms (i.e., the correlation between the random intercept and random slope) and vice versa.

In our example data, we considered the two categories of correlations used by Suvak et al. (2009). The first category of correlation is to examine the relationship between similar parameters across outcomes. We can examine the relationship between the random intercepts from each outcome to assess how the person-specific baseline values are related. Likewise, we can examine the relationship between the random slopes from each outcome to assess how the person-specific rates of change are related. The formulae for the correlation between intercepts, rI, and slopes, rS, are (Fieuws & Verbeke, 2004):

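\[
r_I = \frac{\sigma_{u_1 u_2}}{\sqrt{\sigma_{u_1}^2\,\sigma_{u_2}^2}} \tag{17}
\]
\[
r_S = \frac{\sigma_{v_1 v_2}}{\sqrt{\sigma_{v_1}^2\,\sigma_{v_2}^2}} \tag{18}
\]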

Each of the components in Equations (17) and (18) is drawn from the covariance matrix of the multivariate related outcomes model described in Equation (9) and displayed in Table 2. Thus, the numerator of Equation (17) is the covariance between the random intercepts and the denominator is the product of the square roots of the variances of the intercepts. Equation (18) is identical except it involves the variances and covariance of the random slopes. The correlation among the intercepts was rI =.26,5 indicating a small positive relationship between the person-specific baseline values of depression and quality of life. The correlation among the slopes was rS =.76, indicating a strong positive relationship between the person-specific rates of change for depression and quality of life.
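As a small illustration of how these correlations are obtained from the estimated variance components, the Python sketch below divides a covariance by the square root of the product of the two variances. The variance and covariance values are hypothetical and are not the estimates in Table 2; the function name is illustrative.

```python
import math

def random_effect_correlation(covariance, variance_a, variance_b):
    """Correlation between two random effects from their (co)variance components."""
    return covariance / math.sqrt(variance_a * variance_b)

# Hypothetical variance components for the two random intercepts.
var_u1 = 4.0     # intercept variance, outcome 1
var_u2 = 9.0     # intercept variance, outcome 2
cov_u1_u2 = 1.5  # covariance between the two intercepts

r_intercepts = random_effect_correlation(cov_u1_u2, var_u1, var_u2)  # 1.5 / 6 = 0.25
```

The same function applies to any pair of random effects, for example two random slopes or an intercept from one outcome and a slope from the other, provided the matching variances and covariance are supplied.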

Power can be a significant challenge when estimating the relationship between slopes. Hertzog, Lindenberger, Ghisletta, and von Oertzen (2006) showed that power to detect relationships among slopes can be low even with large samples and several measurement occasions.6 A key factor in determining power is what they called Growth Curve Reliability, which can be defined as the proportion of the total variability in an outcome that is accounted for by person-to-person variability in intercepts and slopes. If the reliability is low, power for examining the relationship between slopes will typically be low, even when sample sizes are large. Researchers wishing to examine relationships between outcomes will need to consider these issues when designing their studies and interpreting the results of the multivariate models.

The second category of correlation is to examine the relationship between distinct parameters across outcomes. Substantively, we might consider whether baseline quality of life affects the rate of change in depression symptoms. The formula for the correlation between the random intercepts for quality of life and the rate of change for depression is:

\[
r_{IS} = \frac{\sigma_{u_2 v_1}}{\sqrt{\sigma_{u_2}^2\,\sigma_{v_1}^2}}. \tag{19}
\]

The formula for the correlation between the random intercepts for depression and the rate of change for quality of life has the same form but uses the appropriate parameters. There was a small negative correlation between baseline values for quality of life and rate of change in depression, rIS = −.23. The correlation between baseline values in depression and rate of change in quality of life was also small and negative, rIS = −.16. Thus, there was little relationship between a participant’s baseline standing on one outcome and the rate of change on the other outcome.

Extending the Model to Three or More Outcomes

The multivariate model can be extended to three or more outcomes by adding fixed and random effects for the third outcome. For example, suppose in addition to depression and quality of life we also measured anxiety symptoms. To Equations (1) and (2) we would add the following equation:

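\[
y_{3ij} = \beta_{30} + \beta_{31}\,\mathrm{Time}_{ij} + \beta_{32}\,\mathrm{Tx}_j + \beta_{33}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_j + u_{3j} + v_{3j}\,\mathrm{Time}_{ij} + e_{3ij}, \tag{20}
\]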

where the parameters have the same interpretation as before except they pertain to anxiety. In order to accommodate the model with three outcomes in software that expects the data in long format, we would again use indicator variables to identify which parameters go with particular outcomes. In this case, we create three indicator variables, dj, qj, and aj, where dj = 1 for depression and 0 for the others, qj = 1 for quality of life and 0 for the others, and aj = 1 for anxiety and 0 for the others. As previously, we use the indicator variables to create a single equation:

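\[
\begin{aligned}
y_{hij} ={}& \beta_{10} d_j + \beta_{20} q_j + \beta_{30} a_j + \beta_{11}\,\mathrm{Time}_{ij} d_j + \beta_{21}\,\mathrm{Time}_{ij} q_j + \beta_{31}\,\mathrm{Time}_{ij} a_j \\
&+ \beta_{12}\,\mathrm{Tx}_j d_j + \beta_{22}\,\mathrm{Tx}_j q_j + \beta_{32}\,\mathrm{Tx}_j a_j + \beta_{13}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_j d_j + \beta_{23}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_j q_j + \beta_{33}\,\mathrm{Time}_{ij}\,\mathrm{Tx}_j a_j \\
&+ u_{1j} d_j + u_{2j} q_j + u_{3j} a_j + v_{1j}\,\mathrm{Time}_{ij} d_j + v_{2j}\,\mathrm{Time}_{ij} q_j + v_{3j}\,\mathrm{Time}_{ij} a_j + e_{1ij} d_j + e_{2ij} q_j + e_{3ij} a_j
\end{aligned}
\tag{21}
\]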
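The long-format layout with indicator variables can be sketched in Python. This is an illustration only: the column names (id, time, tx, depression, qol, anxiety) and the data values are hypothetical, and only the indicator names d, q, and a follow the notation in the text.

```python
# Map each (hypothetical) outcome column to its indicator variable name.
INDICATORS = {"depression": "d", "qol": "q", "anxiety": "a"}

def wide_to_long(wide_rows):
    """Stack one row per person-occasion into one row per outcome."""
    long_rows = []
    for row in wide_rows:
        for outcome, flag in INDICATORS.items():
            record = {
                "id": row["id"],
                "time": row["time"],
                "tx": row["tx"],
                "outcome": outcome,
                "y": row[outcome],
            }
            # Exactly one indicator equals 1 in each long-format row.
            for other_flag in INDICATORS.values():
                record[other_flag] = 1 if other_flag == flag else 0
            long_rows.append(record)
    return long_rows

# One hypothetical person-occasion in wide format.
wide_data = [
    {"id": 1, "time": 0, "tx": 1, "depression": 24.0, "qol": 40.0, "anxiety": 18.0},
]
long_data = wide_to_long(wide_data)
```

Each wide row yields three long rows, one per outcome. Products such as time × d or time × tx × a can then be formed from these columns to supply the outcome-specific terms of the single equation.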

The variance/covariance matrices for the random effects and the residuals are also extended. The random effects matrix includes all six random effects:

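\[
\begin{bmatrix} u_{1j} \\ v_{1j} \\ u_{2j} \\ v_{2j} \\ u_{3j} \\ v_{3j} \end{bmatrix}
\sim \mathrm{MVN}(\mathbf{0}, \Omega_G), \qquad
\Omega_G =
\begin{bmatrix}
\sigma_{u_1}^2 & & & & & \\
\sigma_{u_1 v_1} & \sigma_{v_1}^2 & & & & \\
\sigma_{u_1 u_2} & \sigma_{v_1 u_2} & \sigma_{u_2}^2 & & & \\
\sigma_{u_1 v_2} & \sigma_{v_1 v_2} & \sigma_{u_2 v_2} & \sigma_{v_2}^2 & & \\
\sigma_{u_1 u_3} & \sigma_{v_1 u_3} & \sigma_{u_2 u_3} & \sigma_{v_2 u_3} & \sigma_{u_3}^2 & \\
\sigma_{u_1 v_3} & \sigma_{v_1 v_3} & \sigma_{u_2 v_3} & \sigma_{v_2 v_3} & \sigma_{u_3 v_3} & \sigma_{v_3}^2
\end{bmatrix}.
\tag{22}
\]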

The residual matrix includes all three residuals:

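\[
\begin{bmatrix} e_{1ij} \\ e_{2ij} \\ e_{3ij} \end{bmatrix}
\sim \mathrm{MVN}(\mathbf{0}, \Omega_R), \qquad
\Omega_R =
\begin{bmatrix}
\sigma_{e_1}^2 & & \\
\sigma_{e_1 e_2} & \sigma_{e_2}^2 & \\
\sigma_{e_1 e_3} & \sigma_{e_2 e_3} & \sigma_{e_3}^2
\end{bmatrix}.
\tag{23}
\]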

For both matrices the interpretation of the parameters is the same as discussed previously except now the matrices include additional variances and covariances involving the third outcome.

The types of questions we can examine with three or more outcomes are similar to those for the model with two outcomes. We can compute the correlations among the random effects. Likewise, we can test whether treatment effects differ across the three outcomes or whether the treatment effect on depression differs from the average of the other outcomes. As before, scaling of the outcome variables needs to be identical for the comparisons to be meaningful. We can also estimate a joint test of whether all three treatment effects are zero using a post-analysis contrast. However, the simplest way is to use a likelihood ratio test that compares the fit of the model in Equation (21) to a model that is identical to Equation (21) except that it excludes all time × treatment effects (i.e., fixes all treatment effects to zero).

The model with three outcomes is considerably more complex than even the model with two outcomes. For example, the model with three outcomes involves 39 parameters and the model with two outcomes involves 21 parameters. It can be challenging to ensure that the more complicated model is correctly specified in software programs as there are many variables and interactions in both the fixed and random effects portions of the model. Furthermore, most multilevel software was not explicitly designed to model multivariate data in this way. Consequently, estimation can be slow, especially with large datasets. Multilevel software that is multivariate in nature, such as Mplus, typically does not have much trouble estimating these models, even with large datasets.
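The parameter counts can be verified with a short calculation. The Python sketch below assumes the structure used in this paper: four fixed effects per outcome (intercept, time, treatment, time × treatment), a random intercept and random slope per outcome with an unstructured covariance matrix, and an unstructured residual covariance matrix. The function name is illustrative.

```python
def count_parameters(n_outcomes, fixed_per_outcome=4, random_per_outcome=2):
    """Count fixed effects plus unique variance/covariance parameters."""
    n_fixed = fixed_per_outcome * n_outcomes
    n_random = random_per_outcome * n_outcomes
    # Unique elements of a symmetric k x k matrix: k(k + 1) / 2.
    n_random_cov = n_random * (n_random + 1) // 2
    n_residual_cov = n_outcomes * (n_outcomes + 1) // 2
    return n_fixed + n_random_cov + n_residual_cov

two_outcomes = count_parameters(2)    # 8 + 10 + 3 = 21
three_outcomes = count_parameters(3)  # 12 + 21 + 6 = 39
```

Under the same assumptions a fourth outcome would bring the total to 62 parameters, because the number of covariance parameters grows quadratically with the number of random effects.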

Extensions

In the present paper, we detailed how to use multivariate multilevel models to examine the relationship between multiple outcomes in a clinical trial. However, this is one of many possible applications of multivariate models that psychotherapy researchers might use. For example, models for dyadic psychotherapy data can be framed as a multivariate model. Suppose one had collected marital satisfaction data three times for 100 couples. One way to model these data is to use a 3-level multilevel model, with observations at level-1, participant at level-2, and dyad at level-3. Alternatively, we could use a 2-level multivariate model with two outcomes, one for each partner (Atkins, 2005, describes the benefits and drawbacks for both 2- and 3-level models). This multivariate model is similar to our differential treatment effects examples, except we would exchange the primary and secondary outcomes with the partners’ outcomes. The actor-partner interdependence model for dyadic data can also be estimated using a multivariate multilevel model (Kenny, Kashy, & Cook, 2006).

Multivariate multilevel models can be used in social relations modeling (Kenny, 1994; Kenny et al., 2006) approaches to psychotherapy data. For example, Marcus, Kashy, and Baldwin (2009) used a multivariate multilevel model to apply a social relations model to therapeutic alliance data (see also Marcus, Kashy, Wintersteen, & Diamond, 2011). In this study, patients and therapists each rated the alliance. The primary question was whether therapists and clients agree with respect to their alliance ratings. That is, if a therapist consistently rates her alliances as high across her caseload, do her patients also rate the alliance high across her caseload—the correlation among the patient and therapist alliance ratings at the therapist-level. Additionally, they examined whether therapists and patients agree within caseloads—the correlation among the patient and therapist ratings at the patient-level (see also Imel, Hubbard, Rutter, & Simon, 2013).

Other examples of multivariate multilevel models relevant to psychotherapy data include using them: (a) to examine mediation hypotheses in a multilevel context (Bauer, Preacher, & Gil, 2006), in which both the dependent variable and the mediator(s) are included as outcomes in the multilevel model; (b) to fit measurement models where item responses are included as outcomes in the multilevel model (Zheng & Rabe-Hesketh, 2007); (c) to fit multivariate meta-analyses where two or more effect sizes from a single study are analyzed simultaneously (Riley, Abrams, Sutton, Lambert, & Thompson, 2007; White, 2011); (d) to examine the relationship between outcomes with different distributional forms (e.g., comparing a normally distributed outcome to a binary outcome; Hadfield, 2010); and (e) to simultaneously examine predictors unique to each outcome or different functional forms for change across outcomes (e.g., linear change for one outcome and quadratic change for another; MacCallum et al., 1997).

Another extension to the models we have presented is to estimate them from a structural equation modeling perspective or from a perspective that combines multilevel modeling and structural equation modeling.7 For example, Bollen and Curran (2006) discuss a multivariate latent growth curve model similar to our example but where one estimates regression paths between the random intercepts and slopes, rather than covariances only. For example, in an alcohol treatment study, if we hypothesized that variability among patients in depressive symptoms prior to treatment (i.e., random intercept for depression) would predict variability in the rate of change in drinking behavior (i.e., random slope for drinking behavior), we could estimate the regression path between the random intercept for depression and random slope for drinking behavior.

Mehta and Neale (2005) also illustrate how multilevel factor analysis is an extension of the multivariate multilevel models. They fit a multilevel factor analysis model to literacy data that included five measures of literacy taken on students clustered within schools. They fit a multilevel model with random intercepts for each of the schools. Rather than estimating correlations among the random intercepts and residuals, they estimated a common factor at both the school-level (i.e., random intercept) and student-level (i.e., residual). To be sure, fitting such a model is challenging and may require more data than is typically available to psychotherapy researchers. Nevertheless, Mehta and Neale’s (2005) example provides an additional illustration of how we could extend the models discussed in this paper. Other interesting methodological developments are likely to occur at the intersection of multilevel modeling and structural equation modeling.

Conclusion

We introduced the multivariate multilevel model as a way to examine hypotheses that are important to psychotherapy research. As with any methodology, advances in psychotherapy research are not going to take place simply because we apply multivariate multilevel models to our data. However, many, if not most, theories about how and why psychotherapy works are multivariate in nature. As we have shown, these multivariate hypotheses can be addressed with a multivariate multilevel model in ways that are not possible with univariate models. We suspect as more researchers become familiar with these techniques we will see more creative uses of multivariate models that improve our understanding of psychotherapy.

Supplementary Material

Dataset_long_format
Dataset_wide_format
Supplemental_Appendix

Acknowledgments

David C. Atkins’ effort was supported by Award Number R01 AA019511 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIAAA or the National Institutes of Health.

Footnotes

1

Occasionally, the term multivariate is used to refer to models with a single outcome variable but with multiple predictor variables. In this paper, we use the term multivariate to refer to models with two or more outcome variables.

2

Note that when using maximum likelihood methods to estimate these models, the random effects are not estimated. Rather, the variance/covariance matrix of the random effects is estimated. The random effects can be predicted using empirical Bayes methods (Raudenbush & Bryk, 2002).

3

These variances and covariances are part of the asymptotic variance/covariance matrix of the fixed effects. See the online appendix for instructions on how to request this information within statistical packages.

4

Because Equations (11) and (15) differ with respect to the fixed effects, the likelihood ratio test can only be used if maximum likelihood rather than restricted maximum likelihood estimation was used (see Singer & Willet, 2003, for an introductory discussion of likelihood ratio tests).

5

Standard errors for this correlation, as well as the others described in this section, can be obtained using the delta method (Fieuws & Verbeke, 2004). Stata and Mplus will provide standard errors based on the delta method. However, the delta method assumes that the sampling distribution of these correlations is normal, which it is not. Consequently, the delta method can produce problematic results, such as confidence intervals that exceed the boundary of a correlation. Alternatives include fitting a model with the covariance freely estimated and a model with the covariance constrained to zero and using a likelihood ratio test to compare their fit. Additionally, bootstrapping the confidence intervals can be helpful. Finally, Bayesian methods provide a useful alternative for interval estimation for correlations (Baldwin & Fellingham, 2012).

6

We thank an anonymous reviewer for pointing us toward this reference.

7

The intersection between structural equation modeling and multilevel models has been discussed at length (e.g., Bauer, 2003; Curran, 2003; Skrondal & Rabe-Hesketh, 2004), with a number of writers illustrating how one can parameterize structural equation models to reproduce a multilevel model (Bauer, 2003; Curran, 2003; Mehta & Neale, 2005). Theoretical work has been done to frame multilevel models and structural equation models as special cases of a broader class of models known as generalized latent variable models (Skrondal & Rabe-Hesketh, 2004). In fact, the notion that multilevel models and structural equation models are both latent variable models is a foundational idea for some software programs including gllamm (which stands for Generalized Linear Latent and Mixed Models) in Stata (Rabe-Hesketh, Skrondal, & Pickles, 2002) and Mplus (Muthén & Muthén, 2012).

Contributor Information

Scott A. Baldwin, Brigham Young University

Zac E. Imel, University of Utah

Scott R. Braithwaite, Brigham Young University

David C. Atkins, University of Washington

References

  1. Atkins DC. Using multilevel models to analyze couple and family treatment data: Basic and advanced issues. Journal of Family Psychology. 2005;19:98–110. doi: 10.1037/0893-3200.19.1.98.
  2. Baldwin SA, Bauer DJ, Stice E, Rohde P. Evaluating models for partially clustered designs. Psychological Methods. 2011;16:149–165. doi: 10.1037/a0023464.
  3. Baldwin SA, Fellingham GW. Bayesian methods for the analysis of small sample multilevel data with a complex variance structure. Psychological Methods. 2012 doi: 10.1037/a0030642. Advance Online Publication.
  4. Baldwin SA, Imel ZE. Therapist effects: Findings and methods. In: Lambert MJ, editor. Bergin and Garfield’s Handbook of Psychotherapy and Behavior Change. 6. New York: Wiley; 2013. pp. 258–297.
  5. Baldwin SA, Murray DM, Shadish WR. Empirically supported treatments or Type I errors? Problems with the analysis of data from group-administered treatments. Journal of Consulting and Clinical Psychology. 2005;73:924–935. doi: 10.1037/0022-006X.73.5.924.
  6. Bauer DJ. Estimating multilevel linear models as structural equation models. Journal of Educational and Behavioral Statistics. 2003;28:135–167.
  7. Bauer DJ, Preacher KJ, Gil KM. Conceptualizing and testing random indirect effects and moderated mediation in multilevel models: New procedures and recommendations. Psychological Methods. 2006;11:142–163. doi: 10.1037/1082-989X.11.2.142.
  8. Bauer DJ, Sterba SK, Hallfors DD. Evaluating group-based interventions when control participants are ungrouped. Multivariate Behavioral Research. 2008;43:210–236. doi: 10.1080/00273170802034810.
  9. Bollen KA, Curran PJ. Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley; 2006.
  10. Crits-Christoph P, Baranackie K, Kurcias JS, Beck AT, Carroll K, Perry K, Zitrin C. Meta-analysis of therapist effects in psychotherapy outcome studies. Psychotherapy Research. 1991;1:81–91.
  11. Crits-Christoph P, Mintz J. Implications of therapist effects for the design and analysis of comparative studies of psychotherapies. Journal of Consulting and Clinical Psychology. 1991;59:20–26. doi: 10.1037//0022-006x.59.1.20.
  12. Curran PJ. Have multilevel models been structural equation models all along? Multivariate Behavioral Research. 2003;38:529–569. doi: 10.1207/s15327906mbr3804_5.
  13. Fieuws S, Verbeke G. Joint modelling of multivariate longitudinal profiles: Pitfalls of the random-effects approach. Statistics in Medicine. 2004;23:3093–3104. doi: 10.1002/sim.1885.
  14. Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press; 2007.
  15. Hadfield JD. MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. Journal of Statistical Software. 2010;33:1–22.
  16. Hertzog C, Lindenberger U, Ghisletta P, Oertzen Tv. On the power of multivariate latent growth curve models to detect correlated change. Psychological Methods. 2006;11:244–252. doi: 10.1037/1082-989X.11.3.244.
  17. Hofmann SG, Lohr JM. To kill a dodo bird. The Behavior Therapist. 2010;33:14–15.
  18. Hox JJ. Multilevel analysis: Techniques and applications. 2. New York: Routledge; 2010.
  19. Imel ZE, Baldwin SA, Bonus K, Maccoon D. Beyond the individual: Group effects in mindfulness-based stress reduction. Psychotherapy Research. 2008;18:735–742. doi: 10.1080/10503300802326038.
  20. Imel ZE, Hubbard RA, Rutter CM, Simon G. Patient-rated alliance as a measure of therapist performance in two clinical settings. Journal of Consulting and Clinical Psychology. 2013;81:154–165. doi: 10.1037/a0030903.supp.
  21. Jouriles EN, McDonald R, Rosenfield D, Stephens N, Corbitt-Shindler D, Miller PC. Reducing conduct problems among children exposed to intimate partner violence: A randomized clinical trial examining effects of project support. Journal of Consulting and Clinical Psychology. 2009;77:705–717. doi: 10.1037/a0015994.
  22. Kaysen D, Atkins DC, Moore SA, Lindgren KP, Dillworth T, Simpson T. Alcohol use, problems, and the course of posttraumatic stress disorder: A prospective study of female crime victims. Journal of Dual Diagnosis. 2011;7:262–279. doi: 10.1080/15504263.2011.620449.
  23. Kenny DA. Interpersonal perception: A social relations analysis. New York: Guilford; 1994.
  24. Kenny DA, Kashy DA, Cook WL. Analysis of dyadic data. New York: Guilford; 2006.
  25. MacCallum RC, Kim C, Malarkey WB, Kiecolt-Glaser JK. Studying multivariate change using multilevel models and latent curve models. Multivariate Behavioral Research. 1997;32:215–253. doi: 10.1207/s15327906mbr3203_1.
  26. Marcus DK, Kashy DA, Baldwin SA. Studying psychotherapy using the one-with-many design: The therapeutic alliance as an exemplar. Journal of Counseling Psychology. 2009;56:537–548. doi: 10.1037/a0017291.
  27. Marcus DK, Kashy DA, Wintersteen MB, Diamond GS. The therapeutic alliance in adolescent substance abuse treatment: A one-with-many analysis. Journal of Counseling Psychology. 2011;58:449–455. doi: 10.1037/a0023196.
  28. McDonagh A, Friedman M, McHugo G, Ford J, Sengupta A, Mueser K, Descamps M. Randomized trial of cognitive-behavioral therapy for chronic posttraumatic stress disorder in adult female survivors of childhood sexual abuse. Journal of Consulting and Clinical Psychology. 2005;73:515–524. doi: 10.1037/0022-006X.73.3.515.
  29. Mehta PD, Neale MC. People are variables too: Multilevel structural equations modeling. Psychological Methods. 2005;10:259–284. doi: 10.1037/1082-989X.10.3.259.
  30. Muthén LK, Muthén BO. Mplus user’s guide. 7. Los Angeles, CA: Muthén & Muthén; 2012.
  31. Nieuwenhuis S, Forstmann BU, Wagenmakers EJ. Erroneous analyses of interactions in neuroscience: A problem of significance. Nature Neuroscience. 2011;14:1105–1107. doi: 10.1038/nn.2886.
  32. Rabe-Hesketh S, Skrondal A. Multilevel and longitudinal modeling using Stata. College Station, TX: Stata Press; 2008.
  33. Rabe-Hesketh S, Skrondal A, Pickles A. Reliable estimation of generalized linear mixed models using adaptive quadrature. The Stata Journal. 2002;2:1–21.
  34. Raudenbush SW, Bryk AS. Hierarchical linear models: Applications and data analysis methods. Thousand Oaks, CA: Sage; 2002.
  35. Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Medical Research Methodology. 2007;7:3. doi: 10.1186/1471-2288-7-3.
  36. Saxon D, Barkham M. Patterns of therapist variability: Therapist effects and the contribution of patient severity and risk. 2012;80:535–546. doi: 10.1037/a0028898.
  37. Singer JD, Willet JB. Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press; 2003.
  38. Skrondal A, Rabe-Hesketh S. Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Boca Raton, FL: Chapman & Hall/CRC Press; 2004.
  39. Snijders TAB, Bosker RJ. Multilevel analysis: An introduction to basic and advanced multilevel modeling. 2. Thousand Oaks, CA: Sage; 2012.
  40. Suvak MK, Walling SM, Iverson KM, Taft CT, Resick PA. Multilevel regression analyses to investigate the relationship between two variables over time: Examining the longitudinal association between intrusion and avoidance. Journal of Traumatic Stress. 2009;22:622–631. doi: 10.1002/jts.20476.
  41. Wampold BE, Serlin RC. The consequences of ignoring a nested factor on measures of effect size in analysis of variance. Psychological Methods. 2000;5:425–433. doi: 10.1037/1082-989x.5.4.425.
  42. White IR. Multivariate random-effects meta-regression: Updates to mvmeta. The Stata Journal. 2011;11:255–270.
  43. Zheng X, Rabe-Hesketh S. Estimating parameters of dichotomous and ordinal item response models with gllamm. The Stata Journal. 2007;7:313–333.
